Most analytics systems rely on large offline computations, which means results arrive hours or days late. Twitter is a realtime system, and it’s critical that our analytics be realtime as well. But with over 160 million users producing over 90 million tweets per day, we needed infrastructure that scaled horizontally in addition to being realtime. In this talk we’ll discuss the high-volume realtime analytics system we have built, as well as the benefits and challenges of this model compared with standard offline models. We’ll look at some products that we are building on top of this infrastructure, and discuss where we’re hoping to take this system in the future.
Kevin Weil specializes in the technology behind distributed systems, parallel processing, and analytics, especially in the context of large datasets. He currently leads the analytics team at Twitter, using Hadoop and other big data analytics tools to crunch Twitter’s massive data set and applying the findings to improve the product. Prior to Twitter, he was the first employee at next-generation web media startup Cooliris, backed and incubated by Kleiner Perkins. At Cooliris, he innovated on user growth and advertising-focused analytics on a server cluster running Hadoop, Pig, and Hive, open-source implementations of the Google technology stack that are central to companies like Facebook, Yahoo, and A9. Mr. Weil has also worked at municipal wireless network provider Tropos Networks, where he optimized the performance of citywide wireless mesh networks, as well as at Microsoft Research and at SLAC.