Big Data for the Masses: How We Opened Up the Doors to Google’s Dremel

Ryan Boyd (Google), Siddartha Naidu (Google)
Data Science
Location: Room 1-6

Sixty hours of video are uploaded to YouTube every minute. The Google search index contained 100 million gigabytes of data in 2010. Other Google services have hundreds of millions of users. Each of these products generates massive amounts of data. Google has developed custom technologies to analyze this data and make intelligent product decisions.

Dremel is a scalable, interactive ad-hoc query system. By combining multi-level execution trees and columnar data layout, Dremel allows users to run queries in a SQL-like language over tables with billions of rows in seconds. Dremel uses an architecture distinct from MapReduce-based platforms to improve efficiency when running multiple simultaneous query jobs. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google querying web logs, ad analytics and financial data.
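To make the two ideas above concrete, here is a minimal toy sketch in Python. It is not Dremel's implementation: the shard data, the function names (leaf_scan, merge, run_query) and the fanout parameter are all hypothetical, chosen only to illustrate how a columnar layout lets leaf workers read just the columns a query needs, and how a multi-level tree merges partial aggregates on the way up to the root.

```python
from collections import defaultdict

# Hypothetical columnar shards: each column is stored as its own list,
# so a query touches only the columns it actually needs.
SHARDS = [
    {"country": ["US", "DE", "US"], "latency_ms": [120, 80, 100]},
    {"country": ["DE", "US"], "latency_ms": [90, 110]},
    {"country": ["JP"], "latency_ms": [70]},
]


def leaf_scan(shard):
    """Leaf worker: scan two columns and return partial [sum, count] per country."""
    partial = defaultdict(lambda: [0, 0])
    for country, latency in zip(shard["country"], shard["latency_ms"]):
        partial[country][0] += latency
        partial[country][1] += 1
    return partial


def merge(partials):
    """Intermediate node: combine partial [sum, count] pairs from its children."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return merged


def run_query(shards, fanout=2):
    """Roughly: SELECT country, AVG(latency_ms) ... GROUP BY country."""
    if not shards:
        return {}
    level = [leaf_scan(shard) for shard in shards]
    # Merge results level by level until a single root result remains.
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return {key: total / count for key, (total, count) in level[0].items()}


result = run_query(SHARDS)
print(result)
```

Each leaf returns sums and counts rather than averages so that intermediate nodes can combine them exactly; the division happens only once, at the root.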

Google’s situation is no longer unique. As more and more companies collect massive amounts of data, they need to quickly analyze it without large investments in infrastructure or human capital. We want everyone to have the power of Dremel.

BigQuery puts the powerful interactive querying capabilities of Dremel into the hands of users everywhere. It is designed for accessibility and ease of use, featuring a REST API as well as a web-based interface. BigQuery enables users to ingest 1 TB of data and run hundreds of queries on it with a SQL-like language in less than an hour.

This session will discuss the development and capabilities of Dremel, in particular its performance characteristics and its ability to support interactive ad-hoc querying on a multi-tenant architecture. We’ll also dive into the design challenges involved in making the Dremel technology accessible and performant for third-party developers and business users working with massive data sets.

Ryan Boyd

Google

Ryan is a Developer Advocate at Google, focused on cloud data services. He’s been at Google for 5 years and previously helped build out the Google Apps ISV ecosystem. He recently published his first book “Getting Started with OAuth 2.0” with O’Reilly.

Siddartha Naidu

Google

Siddartha has been crunching large data sets at Google since 2005 for a variety of products, and did the same as a physics grad student before that. He has worked with or on almost every data processing framework at Google and is still looking for ways to make his job easier.
