Strata 2013 Schedule

Below are the confirmed and scheduled talks at Strata Conference in Santa Clara 2013 (schedule subject to change).

Customize Your Own Schedule

Create your own Strata schedule using the personal scheduler. Mark the tutorials, sessions, keynotes, and events you want to attend by selecting the calendar icon next to each listing, then go to your personal schedule to view your customized schedule.

See the list of all events happening onsite, including events on Monday, February 25: Women's Community Meetup, Big Data Camp, and Ignite.

Great America Ballroom J
9:00am Big Data for Enterprise IT Edd Dumbill (Silicon Valley Data Science)
Great America Ballroom K
9:00am Using HBase effectively - What You Need to Know as an Application Developer Jonathan Hsieh (Cloudera, Inc.), Himanshu Vashishtha (Cloudera, Inc.)
1:30pm D3.js tutorial Scott Murray (University of San Francisco), Jerome Cukier (Jerome Cukier)
Ballroom CD
9:00am Data Driven Business Alistair Croll (Solve For Interesting)
Ballroom E
9:00am Hadoop Data Warehousing with Hive Dean Wampler (Typesafe)
1:30pm Think Like a Data Journalist: How the Guardian Turns Data into Stories Every Day Simon Rogers (Guardian), Feilding Cage (Guardian)
Ballroom F
9:00am Data Wrangling with R Garrett Grolemund (RStudio)
1:30pm Python for Data Analysis Wes McKinney (Cloudera)
6:00pm Plenary
Room: Mission City Ballroom
Strata 2013 Startup Showcase - Sponsored by Google Cloud Platform
Ballroom G
9:00am An Introduction to the Berkeley Data Analytics Stack (BDAS) Featuring Spark, Spark Streaming, and Shark - Part 1 Ion Stoica (UC Berkeley), Matei Zaharia (Databricks), Reynold Xin (Databricks), Shivaram Venkataraman (UC Berkeley), Andy Konwinski (UC Berkeley), Tathagata Das (Databricks)
1:30pm Introduction to Apache Hadoop Sarah Sproehnle (Cloudera, Inc.)
Room 204
9:00am Communicating Data Clearly Naomi Robbins (NBR)
1:30pm Hands-on with BDAS - Learn Spark, Spark Streaming and Shark via Real Data Analysis - Part 2 Matei Zaharia (Databricks), Reynold Xin (Databricks), Andy Konwinski (UC Berkeley), Tathagata Das (Databricks), Patrick Wendell (Databricks)
Ballroom H
9:00am Big Data on Amazon Web Services Karan Bhatia (Amazon Web Services), Parviz Deyhim (Amazon Web Services)
1:30pm Google Cloud for Data Crunchers Ryan Boyd (Google), Michael Manoochehri (Google, Inc.), Julia Ferraioli (Google)
Ballroom AB
9:00am Just the Basics: Core Data Science Skills with Kaggle’s Top Competitors William Cukierski (Kaggle), Ben Hamner (Kaggle)
1:30pm Search and Real Time Analytics on Big Data Ryan Tabora (Think Big Analytics), Jason Rutherglen (Datastax)
12:30pm Lunch - Sponsored by MapR Technologies
Room: Santa Clara Ballroom
5:00pm Plenary
Room: Expo Hall AB
Expo Hall Reception
8:00am Coffee Break - Sponsored by NetApp
Room: Ballroom DE Foyer
9:00am-5:00pm (8h) Enterprise IT
Big Data for Enterprise IT
Edd Dumbill (Silicon Valley Data Science)
For CIOs, IT executives, and technology professionals, Strata's Enterprise Big Data day lays out the roadmap to get your organization up to speed on big data. In this all-day event, hear how to create a big data strategy, understand the issues of managing data, and learn how data science can be used powerfully in your organization.
9:00am-12:30pm (3h 30m) Hadoop in Practice
Using HBase effectively - What You Need to Know as an Application Developer
Jonathan Hsieh (Cloudera, Inc.) et al
HBase is one of the more popular open source NoSQL databases that have cropped up over the last few years. Building applications that use HBase effectively is challenging. This tutorial is geared towards teaching the basics of building applications using HBase and covers concepts that a developer should know while using HBase as a backend store for their application.
1:30pm-5:00pm (3h 30m) Design
D3.js tutorial
Scott Murray (University of San Francisco) et al
An introduction to D3, one of the most powerful JavaScript data visualization libraries.
9:00am-5:00pm (8h) DDBD
Data Driven Business
Alistair Croll (Solve For Interesting)
For business strategists, marketers, product managers, and entrepreneurs, Data Driven Business looks at how to use data to make better business decisions faster. Packed with case studies, panels, and eye-opening presentations, this fast-paced day focuses on how to solve today's thorniest business problems with Big Data. It's the missing MBA for a data-driven, always-on business world.
9:00am-12:30pm (3h 30m) Hadoop in Practice
Hadoop Data Warehousing with Hive
Dean Wampler (Typesafe)
This hands-on tutorial teaches you how to use Hive, a high-level data warehouse tool for Hadoop. Hive provides a SQL-like query language, HiveQL, that is easy to learn for people with prior SQL experience, making Hive attractive for data warehousing teams. Hive leverages the power of Hadoop for working with massive data sets without requiring expertise in MapReduce programming.
1:30pm-5:00pm (3h 30m) Data Science
Think Like a Data Journalist: How the Guardian Turns Data into Stories Every Day
Simon Rogers (Guardian) et al
This hands-on session will show how a dataset turns into a story, the narrative process the Guardian's team goes through, the tools used and the lessons learned.
9:00am-12:30pm (3h 30m) Data Science
Data Wrangling with R
Garrett Grolemund (RStudio)
Learn how to wrangle data in R: from acquiring and cleaning data, to changing data formats and performing targeted, groupwise calculations. This course will emphasize the 'reshape2' and 'plyr' packages.
1:30pm-5:00pm (3h 30m) Data Science
Python for Data Analysis
Wes McKinney (Cloudera)
This tutorial will be a hands-on introduction to the essential tools for working with structured data in Python: 'pandas' and 'NumPy'.
6:00pm-8:00pm (2h)
Strata 2013 Startup Showcase - Sponsored by Google Cloud Platform
Don't miss Startup Showcase, Strata's live demo program and competition for startups and early-stage companies. The judges will pick winners from 10 finalist companies selected to present at the showcase.
9:00am-12:30pm (3h 30m) Beyond Hadoop
An Introduction to the Berkeley Data Analytics Stack (BDAS) Featuring Spark, Spark Streaming, and Shark - Part 1
Ion Stoica (UC Berkeley) et al
An introduction to Spark and Shark, two components of the open-source Berkeley Data Analytics Stack (BDAS) in development at UC Berkeley. Spark is a high-speed cluster computing system compatible with Hadoop that can outperform it by up to 100x. Shark is a port of Apache Hive onto Spark that is fully compatible with, and up to 100x faster than, Hive.
1:30pm-5:00pm (3h 30m) Data Science
Introduction to Apache Hadoop
Sarah Sproehnle (Cloudera, Inc.)
This tutorial provides a solid foundation for those seeking to understand large scale data processing with MapReduce and Hadoop, plus its associated ecosystem. This session is intended for those who are new to Hadoop and are seeking to understand where Hadoop is appropriate and how it fits with existing systems. No programming experience is required.
9:00am-12:30pm (3h 30m) Design
Communicating Data Clearly
Naomi Robbins (NBR)
Communicating Data Clearly describes how to draw clear, concise, accurate graphs that are easier to understand than many of the graphs one sees today. The tutorial emphasizes how to avoid common mistakes that produce confusing or even misleading graphs. Graphs for one, two, three, and many variables are covered as well as general principles for creating effective graphs.
1:30pm-5:00pm (3h 30m) Beyond Hadoop
Hands-on with BDAS - Learn Spark, Spark Streaming and Shark via Real Data Analysis - Part 2
Matei Zaharia (Databricks) et al
Building on our previous tutorial introducing BDAS, the open-source Berkeley Data Analytics Stack, in this tutorial we will provide each audience member with a Spark/Shark cluster on EC2 and walk through hands-on coding examples. Lessons will cover the Spark and Shark command line interfaces, writing a standalone program, and data clustering using a distributed machine learning algorithm on Spark.
9:00am-12:30pm (3h 30m) Hadoop in Practice
Big Data on Amazon Web Services
Karan Bhatia (Amazon Web Services) et al
This hands-on tutorial will give you an overview of how AWS can quickly and easily enable you to start generating insights from your company’s data.
1:30pm-5:00pm (3h 30m) Beyond Hadoop, Data Science
Google Cloud for Data Crunchers
Ryan Boyd (Google) et al
When data volume and velocity become massive, processing and analysis solutions require specialized technologies for different parts of the data pipeline. Google’s Cloud Platform is designed to help you focus on building applications, not infrastructure. We’ll demonstrate how to build end-to-end Big Data applications, from data collection, to analysis, to reporting and visualization.
9:00am-12:30pm (3h 30m) Data Science
Just the Basics: Core Data Science Skills with Kaggle’s Top Competitors
William Cukierski (Kaggle) et al
As more industries adopt data-driven policies, people untrained in the formal analysis of data are finding themselves staring at a spreadsheet and asking what they did to deserve it. In this tutorial, two of Kaggle’s top data scientists will walk attendees through the basics of solving an analytics challenge, from defining the problem, to performing basic analysis, to visualizing the output.
1:30pm-5:00pm (3h 30m) Beyond Hadoop
Search and Real Time Analytics on Big Data
Ryan Tabora (Think Big Analytics) et al
In this hands-on tutorial, you will learn the importance of distributed search through our industry experience and knowledge of real use cases. We’ll introduce different architectures that incorporate distributed search techniques, and share the pain points we experienced and the lessons we learned. For the hands-on part of the tutorial, you will learn how to install and use Apache Solr for real-time search on big data.
12:30pm-1:30pm (1h)
Break: Lunch - Sponsored by MapR Technologies
5:00pm-6:00pm (1h)
Expo Hall Reception
Grab a drink, mingle with fellow Strata participants, and see the latest technologies and products from leading companies in the data space.
8:00am-9:00am (1h)
Break: Coffee Break - Sponsored by NetApp

Sponsors

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the conference, contact Susan Stewart at sstewart@oreilly.com

Media Partner Opportunities

For information on trade opportunities with O'Reilly conferences contact Kathy Yu at mediapartners@oreilly.com

Press and Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

Contact Us

View a complete list of Strata contacts