Big Data for the Masses: How We Opened Up the Doors to Google’s Dremel

Michael Manoochehri (Google, Inc.), Jim Caputo (Google, Inc.)
Hadoop & Beyond, Grand West (NY Hilton)

Sixty hours of video are uploaded to YouTube every minute. The Google search index contained 100 million gigabytes of data in 2010. Other Google services have hundreds of millions of users. Each of these products generates massive amounts of data. Google has developed custom technologies to analyze this data and make intelligent product decisions.

Dremel is a scalable, interactive ad-hoc query system. By combining multi-level execution trees and columnar data layout, Dremel allows users to run queries in a SQL-like language over tables with billions of rows in seconds. Dremel uses an architecture distinct from MapReduce-based platforms to improve efficiency when running multiple simultaneous query jobs. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google querying web logs, ad analytics and financial data.
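To make the two ideas above concrete, here is a minimal sketch in Python of a columnar layout combined with a multi-level aggregation tree. It is a toy illustration only, not Dremel's implementation: the sample rows, the function names (`to_columns`, `leaf_count`, `merge_counts`, `count_by`) and the `shard_size` and `fanout` parameters are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical web-log records in row-oriented form.
rows = [
    {"country": "US", "latency_ms": 120, "url": "/a"},
    {"country": "DE", "latency_ms": 95, "url": "/b"},
    {"country": "US", "latency_ms": 210, "url": "/a"},
    {"country": "US", "latency_ms": 80, "url": "/c"},
    {"country": "DE", "latency_ms": 130, "url": "/a"},
]


def to_columns(records):
    """Pivot row records into one list per field (a columnar layout).

    Assumes every record has the same fields.
    """
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def leaf_count(column_shard):
    """Leaf level: compute a partial aggregate over one shard of one column."""
    return Counter(column_shard)


def merge_counts(partials):
    """Intermediate or root level: combine partial aggregates from children."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def count_by(columns, field, shard_size=2, fanout=2):
    """Answer 'SELECT field, COUNT(*) ... GROUP BY field' via a merge tree."""
    column = columns[field]  # only the queried column is touched
    shards = [column[i:i + shard_size] for i in range(0, len(column), shard_size)]
    level = [leaf_count(shard) for shard in shards]
    # Merge partial results level by level until a single root result remains.
    while len(level) > 1:
        level = [merge_counts(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return dict(level[0]) if level else {}


print(count_by(to_columns(rows), "country"))  # {'US': 3, 'DE': 2}
```

The query reads a single column rather than whole rows, and each level of the tree only has to merge small partial results, which is the intuition behind scanning very large tables interactively.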

Google’s situation is no longer unique. As more and more companies collect massive amounts of data, they need to quickly analyze it without large investments in infrastructure or human capital. We want everyone to have the power of Dremel.

BigQuery puts the powerful interactive querying capabilities of Dremel into the hands of users everywhere. It is designed for accessibility and ease of use, featuring a REST API as well as a web-based interface. BigQuery enables users to ingest 1 TB of data and run hundreds of queries on it with a SQL-like language in less than an hour.
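As an illustration of what submitting a query through a REST API can look like, here is a minimal Python sketch that only builds the HTTP request and does not send it. It is not taken from the session: the endpoint path is assumed to follow the BigQuery v2 `jobs.query` method, and the project ID, table name, access token and the helper `build_query_request` are hypothetical.

```python
import json
import urllib.request


def build_query_request(project_id, sql, access_token):
    """Build (but do not send) a POST request carrying a SQL-like query.

    The URL layout is an assumption based on the BigQuery v2 jobs.query method.
    """
    url = f"https://www.googleapis.com/bigquery/v2/projects/{project_id}/queries"
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )


# Hypothetical project, table and token, for illustration only.
request = build_query_request(
    "my-project",
    "SELECT country, COUNT(*) FROM mydataset.weblogs GROUP BY country",
    "ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
```

Sending the request would additionally require a valid OAuth 2.0 access token and an existing project and table.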

This session will discuss the development and capabilities of Dremel, in particular its performance characteristics and its ability to support interactive ad-hoc querying on a multi-tenant architecture. We’ll also dive into the design challenges involved in making the Dremel technology accessible and performant for third-party developers and business users who work with massive data sets.

Michael Manoochehri

Google, Inc.

Michael is a Developer Programs Engineer supporting developers who work with Google’s Cloud and Data platforms. With many years of experience working for research and non-profit organizations, he is interested in making data analysis on large-scale computing infrastructure more accessible and affordable. Michael has written for the tech blog ProgrammableWeb.com, has spent time in rural Uganda researching mobile phone use, and has a master’s degree in Information Management and Systems from UC Berkeley’s School of Information.

Jim Caputo

Google, Inc.

Jim Caputo is the technical lead for Google’s BigQuery team and heads the engineering efforts to externalize Google’s large-scale data processing for developers and enterprise customers. Prior to his tenure at Google, Jim worked on product teams at Expedia and Microsoft.
