Our Data Science tech stack has shifted from best-of-breed, “classic” business intelligence technologies to a hybrid environment, fully leveraging Hadoop and other Big Data solutions. Our philosophy has also evolved, now distilled in thinking and practice into “data science as a service”. Why did we do it? What does it look like? What are the benefits? Come find out.
A few years ago, we had the standard BI setup – source system databases, an ETL tool, a data warehouse DB, and a reporting tool. However, the world and our business have been changing (e.g. over 1 billion Netflix video hours streamed in June 2012).
Our current tech stack now includes these tools plus extensive use of Hadoop, Hive, Pig, Chukwa / Honu, R, Cassandra, and the Amazon cloud. I’ll dive into how we leverage all these technologies to get data science done, be it for algorithms, ad hoc analysis, or reports.
I’ll also discuss the enabling services we’ve developed to open this all up to everyone at Netflix, notably our Event Service (what happened and when) and our Execution Service (RESTful execution of Hadoop, Hive, and Pig jobs).
Kurt leads the Data Science & Engineering (DSE) Platform team at Netflix. His group architects and manages the technical infrastructure that enables Netflix’s data-centric decision making. The Netflix DSE infrastructure includes both traditional BI tools (e.g. Teradata and MicroStrategy) and various Big Data technologies (e.g. Hadoop, Hive, and Pig).