Maxime Nay, Lead Data Engineer at GumGum, gave a talk on GumGum's data architecture and the challenges associated with it on February 15th, 2018 at the South Bay Java User's Group. GumGum produces over 50 TB of new raw data every day, which amounts to more than 100 billion events per day. These events are processed using a typical lambda architecture: for a given use case, we have a batch pipeline and a real-time pipeline, and the data produced by both pipelines is then merged to give a complete view of the data. GumGum has more than 70 such pipelines, some of which do not have a real-time component. Processing data at this scale involves maneuvering through many challenges. Maxime talks about those challenges and the steps taken to solve some of the problems we faced.
The slides used in this presentation can be viewed at http://bit.ly/100-billion-events
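
For readers new to the lambda architecture mentioned above, the merge step can be pictured with a minimal Python sketch. This is an illustration only, not GumGum's actual implementation: the function name `merge_views`, the per-ad event counts, and the assumption that the batch view and the real-time view cover non-overlapping time ranges are all hypothetical.

```python
from collections import Counter


def merge_views(batch_view, realtime_view):
    """Combine per-key counts from a batch view and a real-time view.

    Assumes the batch view covers events up to some cutoff and the
    real-time view covers only events after it, so counts can be summed.
    """
    merged = Counter(batch_view)
    merged.update(realtime_view)  # adds counts key by key
    return dict(merged)


# Hypothetical event counts per ad, for illustration only.
batch_view = {"ad_1": 1000, "ad_2": 250}
realtime_view = {"ad_2": 5, "ad_3": 12}

merged = merge_views(batch_view, realtime_view)
print(merged)
```

A pipeline without a real-time component would correspond to serving the batch view on its own.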