Strata 2013 Hadoop in Practice Sessions

A deep dive into the dominant big data stack, with practical lessons, integration tricks, and a glimpse of the road ahead.

Track Hosts

Sarah Sproehnle is the Director of Educational Services for Cloudera, where she helps customers learn to use Apache Hadoop for big data processing. Cloudera provides commercial support, training, and services for the Apache Hadoop platform.

Russell Jurney cut his data teeth in casino gaming, building web apps to analyze the performance of slot machines in the US and Mexico. After dabbling in entrepreneurship, interactive media, and journalism, he moved to Silicon Valley to build analytics applications at scale at Ning and LinkedIn. He lives on the ocean in Pacifica, California with his wife Kate and two fuzzy dogs.

Great America Ballroom K
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.
Jonathan Hsieh (Cloudera, Inc.), Himanshu Vashishtha (Cloudera, Inc.)
Average rating: ***..
(3.12, 16 ratings)
HBase is one of the more popular open source NoSQL databases that have cropped up over the last few years. Building applications that use HBase effectively is challenging. This tutorial is geared towards teaching the basics of building applications using HBase and covers concepts that a developer should know while using HBase as a backend store for their application.
Ballroom E
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.
Dean Wampler (Typesafe)
Average rating: ****.
(4.69, 13 ratings)
This hands-on tutorial teaches you how to use Hive, a high-level data warehouse tool for Hadoop. Hive provides a SQL-like query language, HiveQL, that is easy to learn for people with prior SQL experience, making Hive attractive for data warehousing teams. Hive leverages the power of Hadoop for working with massive data sets without requiring expertise in MapReduce programming.
Ballroom H
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.
Karan Bhatia (Amazon Web Services), Parviz Deyhim (Amazon Web Services)
Average rating: ***..
(3.25, 4 ratings)
This hands-on tutorial will give you an overview of how AWS can quickly and easily enable you to start generating insights from your company’s data.
Great America Ballroom K
Barry Fischer (Opower)
Average rating: **...
(2.67, 3 ratings)
Opower, the global leader in the field of energy information and analysis, works with 80 utility companies worldwide to give families context, insights, and advice about how to save energy. With access to an unprecedented (and still growing) amount of energy data, currently drawn from 50 million US homes, Opower is uncovering unique trends in how people are using energy at home.
Great America Ballroom K
Matt Walker (Etsy), Wil Stuckey (Etsy), Steve Mardenfeld (Etsy)
Average rating: ****.
(4.00, 2 ratings)
As an ecommerce site with more than 800,000 different sellers, Etsy is particularly interested in understanding how shoppers find the items they seek. This talk will discuss the challenges of funnel analysis at Etsy, the corresponding deficiencies of several widely used web analytics tools, and our event sequence matching tool implemented in Hadoop.
Great America Ballroom K
Sam William (Stumbleupon Inc)
Average rating: ***..
(3.57, 7 ratings)
The Infrastructure team at Stumbleupon leverages state-of-the-art tools and technologies to build platforms that enable us to collect, categorize, organize, store, and analyze huge volumes of data. The platform is so fast and robust that it adds minimal latency to the site. Timely collection and analysis of data helps data scientists, analysts, and executives make the best decisions and validate them.
Great America Ballroom K
Alan Gates (Hortonworks)
Average rating: ***..
(3.50, 4 ratings)
Big Data is about more than petabytes; it is also about new paradigms, languages, and tools. This talk will cover work going on in Hadoop projects to coordinate sharing of data and user code between tools.
Great America Ballroom K
Matt Winkler (Microsoft)
Average rating: ****.
(4.00, 2 ratings)
In this session we’ll first discuss our experience extending Hadoop development to new platforms & languages and then discuss our experiments and experiences building supporting developer tools and plugins for those platforms.
Great America Ballroom K
Philip Zeyliger (Cloudera)
Average rating: ****.
(4.50, 2 ratings)
All is quiet on the log file front, and yet the system is down. What next? Three parts practical know-how (“here’s my toolbox”) and one part position paper (“must-haves for comprehensibility”), this talk will cover the tricks of the trade for debugging distributed systems. Motivated by experience gained diagnosing Hadoop, we’ll dig into the JVM, Linux esoterica, and outlier visualization.
Great America Ballroom K
Philip Kromer (Infochimps)
Average rating: *....
(1.00, 1 rating)
Join Flip Kromer, co-founder and CTO of Infochimps, as he walks you through a series of decision trees, making you rethink your use of Hadoop in the cloud and opening up possibilities for new patterns of work that are uniquely developer-friendly. Patterns of work like tuning your cluster to the job, and why the first priority of any analytics cluster should be downtime.
Great America Ballroom K
Shaun Connolly (Hortonworks), Tasso Argyros (Teradata Aster)
Average rating: ***..
(3.00, 2 ratings)
Apache Hadoop is an innovative emerging technology causing CIOs to rethink their data architecture, making this an exciting time to be a “big data” technologist. This tag-team presentation brings leaders in both Apache Hadoop and data warehousing onto the stage to answer these questions by sharing their vision for the future of big data management and analytics.
Great America Ballroom K
Paco Nathan (The Data Guild)
Average rating: *****
(5.00, 2 ratings)
This talk examines the notion of a "workflow" as a general abstraction for common use cases encountered in Data Science, particularly for building Enterprise apps. Patterns of workflows provide recipes for integrating different frameworks, plus the means for optimizing large-scale apps. We review this approach in the context of a sample app based on the Cascading open source project.
Great America Ballroom K
Milind Bhandarkar (Greenplum, A Division of EMC), Chaitan Baru (SDSC/UC San Diego)
Average rating: **...
(2.00, 1 rating)
We will describe the BigData Top100 List initiative, a new, open, community-based effort for benchmarking big data systems.
Great America Ballroom K
Jayant Shekhar (Cloudera Inc)
Average rating: ***..
(3.60, 5 ratings)
This talk dives deep into the details of building recommendation platforms. It covers the end-to-end architecture and design of such a system, and the various ML algorithms to be used along with their details. It also covers solutions to commonly seen recommendation patterns and detailed use cases along with their solutions.

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the conference, contact Susan Stewart at sstewart@oreilly.com

Media Partner Opportunities

For information on trade opportunities with O'Reilly conferences contact Kathy Yu at mediapartners@oreilly.com

Press and Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com