This tutorial will introduce BDAS, the Berkeley Data Analytics Stack. BDAS is an open source, next-generation software stack being developed by the UC Berkeley AMPLab in collaboration with several leading technology companies. It aims to tackle two major challenges in data analytics: the need for lower-latency processing (e.g., streaming and interactive queries) and for more complex analytics (e.g., graph processing and machine learning). At the same time, it stays compatible with the Hadoop stack.
In this tutorial, we will survey the following components of BDAS and show how each one can be used in real applications:
Many of the components are already in use in organizations large and small, including Yahoo!, Adobe, Intel, Conviva, Ooyala, Bizo, Baidu, and Alibaba.
Tathagata Das is a third-year Ph.D. student in the AMPLab at UC Berkeley, working with Scott Shenker and Ion Stoica. He leads the development of the Spark Streaming project. His research interests include datacenter networks and frameworks for large-scale data processing. Before graduate school, he worked as an Assistant Researcher at Microsoft Research Lab India.
Haoyuan Li is a Computer Science Ph.D. candidate in the AMPLab at UC Berkeley, where he works with Prof. Scott Shenker and Prof. Ion Stoica on big data and cloud computing. He leads Tachyon, an open source, memory-centric distributed file system that enables reliable file sharing at memory speed across cluster frameworks. He is a founding committer of Apache Spark and a co-creator of Spark Streaming. Before Berkeley, he worked at Conviva and at Google, where he co-created the PFPGrowth algorithm, which is included in Apache Mahout. Haoyuan holds an M.S. from Cornell University and a B.S. from Peking University, both in Computer Science.
Ion Stoica is a Professor of Computer Science at UC Berkeley, where he does research on cloud computing and networked computer systems. His past work includes Dynamic Packet State (DPS), the Chord DHT, the Internet Indirection Infrastructure (i3), declarative networks, replay debugging, and multi-layer tracing in distributed systems. His current research covers resource management and scheduling for data centers, cluster computing frameworks, and network architectures. He is the recipient of a SIGCOMM Test of Time Award, the CoNEXT Rising Star Award, the PECASE Award, and the ACM Doctoral Dissertation Award. Ion also co-founded Conviva, a startup to commercialize technologies for large-scale video distribution.
Reynold Xin is an Apache Spark committer and the lead developer for Shark and GraphX, two computation frameworks built on top of Spark. He is also a co-founder of Databricks. Before Databricks, he was pursuing a Ph.D. in the UC Berkeley AMPLab, focusing on large-scale data systems.
Sameer Agarwal is a final-year Ph.D. student in the AMPLab at UC Berkeley, working on large-scale approximate query processing frameworks. His research interests lie at the intersection of distributed systems, databases, and machine learning, and he has published more than 10 articles in top-tier conferences, including NSDI, EuroSys, SIGMOD, VLDB, and KDD. He received his B.Tech. in Computer Science and Engineering from the Indian Institute of Technology and was awarded the President of India Gold Medal in 2009. He was supported by the Qualcomm Innovation Fellowship in 2012-13 and is supported by the Facebook Graduate Fellowship in 2013-14.
View a complete list of Strata + Hadoop World 2013 contacts