Hadoop Platform

In this track, we look at the components of the Hadoop ecosystem such as HBase, HDFS, YARN, Impala, Hive, and more.
Who should attend: Technical teams who want to understand and implement established Hadoop projects within their organizations; contributors and developers looking to integrate Hadoop technologies into their existing systems.

Track Hosts

Eric Baldeschwieler served as VP Hadoop Software Engineering for Yahoo!, where he led the evolution of Apache Hadoop from a 20-node prototype to a 42,000-node service that is behind every click at Yahoo!. Eric also served as a technology leader for Inktomi’s web service engine, which Yahoo! acquired in 2003. Prior to Inktomi, Eric developed software for video games, video post-production systems, and 3D modeling systems. Eric has a Master’s degree in Computer Science from the University of California, Berkeley and a Bachelor’s degree in Mathematics and Computer Science from Carnegie Mellon University.

Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He is the author of "Hadoop: The Definitive Guide" for O'Reilly. Previously he worked as an independent consultant specializing in Hadoop, and before that was co-founder and Lead Developer at Kizoom, a UK mobile application startup. Tom has a Bachelor's degree in Mathematics from the University of Cambridge, and a Master's degree in History and Philosophy of Science from the Universities of Leeds, UK, and Florence, Italy.

Eric Sammer is currently a Principal Solution Architect at Cloudera where he helps customers plan, deploy, develop for, and use Hadoop and the related projects at scale. His background is in the development and operations of distributed, highly concurrent, data ingest and processing systems. He's been involved in the open source community and has contributed to a large number of projects over the last decade.

Grand Ballroom West
Tutorial. Please note: To attend, your registration must include Tutorials on Monday.
John Akred (Silicon Valley Data Science), Richard Williamson (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
Average rating: 3.71 (17 ratings)
What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive, and realtime analytical workloads.

Grand Ballroom East
Jun Fang (Facebook)
Average rating: 3.64 (14 ratings)
Morse is a new system developed at Facebook to transform its ETL pipeline from daily batch to realtime. It continuously moves, transforms, and loads data from distributed logs and sharded MySQL databases into the Hive data warehouse. HBase is used as the underlying storage for incrementally updated tables, while the data is exposed to Hive as external tables for read processing.

Grand Ballroom East
Henry Robinson (Cloudera)
Average rating: 3.40 (5 ratings)
The increasing diversity of frameworks and workloads that run atop a Hadoop cluster gives more flexibility and power to users, but makes it very difficult for an administrator to ensure that SLAs are met while allowing exploratory, ad-hoc usage to continue to use all spare capacity. We present our vision and implementation for generalised resource management on Hadoop, suitable for all uses.

Grand Ballroom East
Nick Dimiduk (Hortonworks, Inc)
Average rating: 4.50 (12 ratings)
Your application is outgrowing its database, and you've started shopping for NoSQL options. Maybe you've adopted Hadoop into your data warehouse. You've heard HBase might be an appropriate technology, but you need to know more. This talk is for you. To understand its use, first understand how it works. This talk explores the design of HBase and its critical paths to ground an understanding of its use.

Grand Ballroom East
Jonathan Hsieh (Cloudera, Inc)
Average rating: 4.57 (7 ratings)
Apache HBase is a robust random-access distributed datastore built upon Apache Hadoop’s HDFS and Apache ZooKeeper. This talk will describe themes emerging from recent features slated for the upcoming post-0.96 release. These include improvements for multi-tenant deployments, a focus on predictable latencies, and the proliferation of new extensions for features traditionally found in databases.

Grand Ballroom East
Aaron Myers (Cloudera, Inc.), Shreepadma Venugopalan (Cloudera)
Average rating: 3.22 (9 ratings)
When Hadoop is used for sensitive data, security requirements arise that call for strong authentication, authorization of data and resources, and data confidentiality. This session covers how various parts of the Hadoop ecosystem can interact in a secure way to address these requirements. We will focus on the advanced Apache Hive authorization features enabled by the Apache Sentry (incubating) project.

Grand Ballroom East
Jing Zhao (Hortonworks, Inc.), Tsz-Wo Sze (Hortonworks Inc.)
Average rating: 4.50 (8 ratings)
In this talk, attendees will learn the high-level design of HDFS snapshots, along with how snapshots can be used for data protection and disaster recovery. We will also talk about details of snapshot development and testing. Finally, we will explore how to build and improve other features on top of HDFS snapshots, including DistCp, HBase snapshots, and Hive table snapshots.

Grand Ballroom East
Siddharth Seth (Hortonworks Inc), Hitesh Shah (Hortonworks Inc)
Average rating: 3.67 (6 ratings)
Apache Hadoop became popular for its specialization in the execution of MapReduce programs. However, it has been hard to leverage existing Hadoop infrastructure for other processing paradigms such as real-time streaming, graph processing, and message passing. Learn how this barrier was removed and how new applications are being built and run on Apache Hadoop.

Grand Ballroom East
Jayant Shekhar (Cloudera Inc)
Average rating: 4.25 (8 ratings)
Hadoop has evolved significantly in recent years, today serving as a unified platform for near-real-time (NRT) and batch workflows, such as querying, analysis, and alerting for logs and machine data. In this session, we'll dive into the details of using SolrCloud and Cloudera Impala together to serve search queries, by integrating Flume to stream events into Solr, Impala, and HBase.

Grand Ballroom East
Paul Kent (SAS)
Average rating: 1.00 (1 rating)
Analytically focused organizations are building general-purpose Hadoop clusters and want to deploy a wide range of analytic software. As the level of data sharing goes up and the variety of tools used to access data increases, you’ll be faced with choices: what format to store your data in; what catalog to describe the data and its layouts; and how/when/where to decide between tools.

Murray Hill Suite
Greg Rahn (Cloudera)
Average rating: 4.75 (8 ratings)
Impala brings SQL to Hadoop, but it also brings SQL performance tuning to those using the platform. This technical session will cover several topics in Impala performance analysis to aid in answering the question “why is my query slow?” as well as practical tips and techniques to get the best performance from Impala.

Murray Hill Suite
Tanel Poder (Enkitec)
Average rating: 1.50 (2 ratings)
If you are a developer or DBA with an Oracle background and want to learn how Hadoop works, this session is for you. We will go through the Hadoop HDFS and MapReduce data processing flow and compare it to the already familiar Oracle database parallel processing, which should make understanding the internals of this new technology a breeze.

Murray Hill Suite
Philip Zeyliger (Cloudera)
Average rating: 4.50 (6 ratings)
All is quiet on the log file front, yet the system is down. What next? This talk will cover the tricks of the trade for debugging distributed systems. Motivated by experience gained diagnosing Hadoop, we’ll dig into the JVM, Linux esoterica, and outlier visualization.

Contact Us

View a complete list of Strata + Hadoop World 2013 contacts