This tutorial will introduce BDAS, the Berkeley Data Analytics Stack. BDAS is an open source, next-generation software stack being developed by the UC Berkeley AMPLab in collaboration with several leading technology companies. It aims to tackle two major challenges in data analytics – the need for lower-latency processing (e.g. streaming and interactive queries) and more complex analytics (e.g. graph and machine learning) – while staying compatible with the Hadoop stack.
In this tutorial, we will survey the following components of BDAS and show how each one can be used in real applications:
Many of the components are already in use in organizations large and small, including Yahoo!, Adobe, Intel, Conviva, Ooyala, Bizo, Baidu, and Alibaba.
Tathagata Das is a third-year Ph.D. student in the AMP Lab at UC Berkeley, working with Scott Shenker and Ion Stoica. He leads the development of the Spark Streaming project. His research interests include datacenter networks and frameworks for large-scale data processing. Before graduate school, he worked as an Assistant Researcher at Microsoft Research Lab India.
Haoyuan Li is a second-year Computer Science PhD student in the AMP Lab at UC Berkeley, working with Scott Shenker and Ion Stoica on computer systems and cloud computing. He is the lead developer of the Tachyon distributed file system. Before Berkeley, he studied at Cornell University and Peking University, and worked at Conviva and Google.
Ion Stoica is a Professor of Computer Science at UC Berkeley, where he does research on cloud computing and networked computer systems. His past work includes Dynamic Packet State (DPS), the Chord DHT, the Internet Indirection Infrastructure (i3), declarative networks, replay-debugging, and multi-layer tracing in distributed systems. His current research includes resource management and scheduling for data centers, cluster computing frameworks, and network architectures. He is the recipient of a SIGCOMM Test of Time Award, the CoNEXT Rising Star Award, the PECASE Award, and the ACM Doctoral Dissertation Award. Ion also co-founded Conviva, a startup to commercialize technologies for large-scale video distribution.
Reynold Xin is an Apache Spark committer and the lead developer for Shark and GraphX, two computation frameworks built on top of Spark. He is also a co-founder of Databricks. Before Databricks, he was pursuing a PhD focusing on large-scale data systems in the UC Berkeley AMPLab.
Sameer Agarwal is a Ph.D. student in the AMPLab at Berkeley working on large-scale approximate query processing frameworks. His research interests are at the intersection of distributed systems, databases and machine learning. He received his B.Tech in Computer Science and Engineering from the Indian Institute of Technology and was awarded the President of India Gold Medal in 2009. He is supported by the Qualcomm Innovation Fellowship during 2012-13 and the Facebook Graduate Fellowship during 2013-14.
View a complete list of Strata + Hadoop World 2013 contacts