Our Data Science tech stack has shifted from best-of-breed, “classic” business intelligence technologies to a hybrid environment that fully leverages Hadoop and other Big Data solutions. Our philosophy has also evolved and is now distilled, in both thinking and practice, into “data science as a service”. Why did we do it? What does it look like? What are the benefits? Come find out.
A few years ago, we had the standard BI setup – source system databases, an ETL tool, a data warehouse DB, and a reporting tool. However, the world and our business have been changing (e.g. over 1 billion hours of Netflix video were streamed in June 2012).
Our current tech stack now includes these tools plus extensive use of Hadoop, Hive, Pig, Chukwa / Honu, R, Cassandra, and the Amazon cloud. I’ll dive into how we leverage all these technologies to get data science done, be it for algorithms, ad hoc analysis, or reports.
I’ll also discuss the enabling services we’ve developed to open this all up to everyone at Netflix, notably our Event Service (what happened and when) and our Execution Service (RESTful execution of Hadoop, Hive, and Pig jobs).
Kurt leads the Data Platform team at Netflix. His group architects and manages the technical infrastructure underpinning the company’s analytics. The Netflix data infrastructure includes various Big Data technologies (e.g. Hadoop, Hive, and Pig), Netflix open-sourced applications and services (e.g. Lipstick and Genie), and traditional BI tools (e.g. Teradata and MicroStrategy).