Strata + Hadoop World Schedule

Below are the confirmed and scheduled talks at Strata + Hadoop World 2013. Note: The schedule is subject to change.

Customize Your Own Schedule

Create your own conference schedule using the personal scheduler. Mark the Tutorials, Sessions, Keynotes, and Events you want to attend by clicking the calendar icon next to each listing, then go to your personal schedule to generate a customized schedule.

Sutton Center - Sutton South
9:00am How to Build a Hadoop Data Application Tom White (Cloudera), Eric Sammer (ScalingData), Joey Echeverria (Cloudera)
1:30pm Teaching the Elephant to Read: Hadoop + Python + NLP Sean Murphy (JHU), Benjamin Bengfort (Cobrain Company and University of Maryland)
Gramercy Suite
Beekman Parlor - Sutton North
9:00am Data-Driven Business Day Alistair Croll (Solve For Interesting)
Grand Ballroom West
1:30pm Building a Data Platform John Akred (Silicon Valley Data Science), Richard Williamson (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
Regent Parlor
9:00am Getting Started With Python, Matplotlib, and Pandas Matt Harrison (MetaSnake)
Nassau Suite
9:00am Using R and Hadoop for Statistical Computation at Scale Antonio Piccolboni (Per data LLC), Joseph Rickert (Revolution Analytics)
1:30pm Getting Started with Julia Leah Hanson (Google)
Murray Hill Suite
9:00am Data is Beautiful Julie Rodriguez (Sapient Global Markets)
Rhinelander South
9:00am An Introduction to the Berkeley Data Analytics Stack With Spark, Spark Streaming, Shark, Tachyon, and BlinkDB Tathagata Das (Databricks), Haoyuan Li (UC Berkeley), Ion Stoica (UC Berkeley), Reynold Xin (Databricks), Sameer Agarwal (UC Berkeley)
1:30pm Mining Social Web APIs with IPython Notebook Matthew Russell (Digital Reasoning)
5:00pm Plenary
Room: Sponsor Pavilion
Opening Reception
8:00pm Plenary
Room: Grand Ballroom
Ignite
12:30pm Lunch
Room: America's Hall 1 & 2
6:30pm Plenary
Room: 3rd Floor Foyer
Startup Showcase
8:00am Coffee Break
Room: Sutton Foyer
Hadoop World
9:00am-12:30pm (3h 30m) Hadoop in Action
How to Build a Hadoop Data Application
Tom White (Cloudera) et al
In this tutorial we'll use the Cloudera Development Kit (CDK) to build a Java web app that logs application events to Hadoop, and then run ad hoc and scheduled queries against the collected data.
Hadoop World
1:30pm-5:00pm (3h 30m) Hadoop in Action
Teaching the Elephant to Read: Hadoop + Python + NLP
Sean Murphy (JHU) et al
Much of the world’s data (and your own) is text. The key to unlocking its value is a series of Natural Language Processing transformations that turn raw strings into a machine-usable form. We will use Hadoop alongside Python’s NLTK to perform these steps and discuss why each is necessary in your application.
9:00am-5:00pm (8h) Hardcore Data Science
Hardcore Data Science
Strata's regular data science track has great talks with real world experience from leading edge speakers. But we didn't just stop there—we added the Hardcore Data Science day to give you a chance to go even deeper. The Hardcore day will add new techniques and technologies to your data science toolbox, shared by leading data science practitioners from startups, industry, consulting and academia.
9:00am-5:00pm (8h) Data-Driven Business
Data-Driven Business Day
Alistair Croll (Solve For Interesting)
For business strategists, marketers, product managers, and entrepreneurs, Data-Driven Business looks at how to use data to make better business decisions faster. Packed with case studies, panels, and eye-opening presentations, this fast-paced day focuses on how to solve today's thorniest business problems with Big Data. It's the missing MBA for a data-driven, always-on business world.
Hadoop World
9:00am-12:30pm (3h 30m) Hadoop & Beyond
Massive Data Aggregation, Monitoring, Processing and Visualization with Apache Flume, ElasticSearch and D3.js
Israel Ekpo (Walt Disney Parks and Resorts Online)
This is a 3-hour tutorial on how to use Apache Flume to aggregate massive quantities of structured or unstructured data from sources such as log data, click streams, social media data, graph data and network traffic into centralized data stores such as HDFS, ElasticSearch, Neo4j and MongoDB so that they can be processed, digested and visualized in realtime using D3.js and HTML5 WebSockets.
Hadoop World
1:30pm-5:00pm (3h 30m) Hadoop Platform
Building a Data Platform
John Akred (Silicon Valley Data Science) et al
What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive and realtime analytical workloads.
9:00am-12:30pm (3h 30m) Data Science
Getting Started With Python, Matplotlib, and Pandas
Matt Harrison (MetaSnake)
This tutorial will jumpstart your Python experience. Learn the basics: enough Python to be dangerous. Then use two of the most popular packages for analysis: Matplotlib for plotting and Pandas for data wrangling. This will be a hands-on tutorial, so bring a laptop with Python 2.7 installed, and the gumption to hit the ground running and see what everyone is raving about.
Hadoop World
1:30pm-5:00pm (3h 30m) Hadoop & Beyond
An Introduction to Real-Time Analytics with Cassandra and Hadoop
Patricia Gorla (The Last Pickle)
Before you analyze your big data, you need a way to store and access it. Here we examine the benefits of using a highly-available, eventually consistent storage system, and what impact this has on real-time analytics. This session will prepare you to set up a multi-node working Cassandra and Hadoop cluster.
9:00am-12:30pm (3h 30m) Data Science
Using R and Hadoop for Statistical Computation at Scale
Antonio Piccolboni (Per data LLC) et al
This tutorial is aimed at R users who want to use Hadoop to work on big data and Hadoop users who want to do sophisticated analytics. We will introduce R, Hadoop, and the RHadoop project. We will then cover three R packages for Hadoop and the MapReduce model. We will present numerous examples of increasing complexity, including the combination of rmr and RevoScaleR to solve modeling problems.
Hadoop World
1:30pm-5:00pm (3h 30m) Data Science, Hadoop & Beyond
Getting Started with Julia
Leah Hanson (Google)
Julia is a high-performance, open source language with great tools for numerical and statistical work. If you know R, MATLAB, or NumPy, you will feel at home in Julia. Unlike these systems, however, Julia takes advantage of modern compiler technology, combining an intuitive programming model with the speed of a low-level language. This workshop will take you from installed to productive in Julia.
9:00am-12:30pm (3h 30m) Design
Data is Beautiful
Julie Rodriguez (Sapient Global Markets)
Learn how to find beauty in data. The beauty of a visual is that it can communicate so much. As we become more sophisticated with the amount of data we can harness, it will become more important for us to be equally good at visually communicating that data. This workshop will guide attendees through a method that will aid in selecting the right visualization.
1:30pm-5:00pm (3h 30m) Data Science
How to Create Predictive Models in R Using Ensembles
Giovanni Seni (Intuit)
This tutorial, based on a published book by the speaker, offers a hands-on intro to ensemble models, which combine multiple models into a single predictive system that’s often more accurate than the best of its components. Participants will use data sets and snippets of R code to experiment with the methods to gain a practical understanding of this breakthrough technology.
Hadoop World
9:00am-12:30pm (3h 30m) Hadoop & Beyond
An Introduction to the Berkeley Data Analytics Stack With Spark, Spark Streaming, Shark, Tachyon, and BlinkDB
Tathagata Das (Databricks) et al
An introduction to the open-source Berkeley Data Analytics Stack (BDAS). Spark is a high-speed cluster computing engine that supports rich analytics (e.g. machine learning) and lower-latency processing (e.g. streaming). Tachyon provides in-memory storage, letting Spark and Hadoop jobs share data efficiently. Shark and GraphX provide high-speed Hive SQL queries and graph processing on top of Spark.
1:30pm-5:00pm (3h 30m) Data Science
Mining Social Web APIs with IPython Notebook
Matthew Russell (Digital Reasoning)
A code-intensive workshop that breaks down the nuts and bolts of using IPython Notebook to uncover insights from social web APIs such as Twitter, Facebook, LinkedIn, and Google+. Attendees with a basic programming background will walk away with a working knowledge of how to access and mine valuable information from the social web.
5:00pm-6:30pm (1h 30m) Event
Opening Reception
Grab a drink, mingle with fellow Strata participants on Monday, October 28, and see the latest technologies and products from leading companies in the data space.
8:00pm-9:00pm (1h) Event
Ignite
Ignite is back at Strata + Hadoop World. The theme reflects the conference’s focus on data science and visualization, with an emphasis on the wonder and mysteries that data science is stumbling into.
12:30pm-1:30pm (1h)
Break: Lunch
6:30pm-8:00pm (1h 30m) Event
Startup Showcase
Part of NYC DataWeek (http://oreilly.com/dataweek). Don't miss Startup Showcase, Strata Conference + Hadoop World's live demo program and competition for startups and early-stage companies. The judges will pick winners from 10 finalist companies selected to present at the showcase.
8:00am-9:00am (1h)
Break: Coffee Break

Sponsorship Opportunities

For exhibition and sponsorship opportunities, contact Susan Stewart at sstewart@oreilly.com

Media Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email mediapartners@oreilly.com

Press & Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

Contact Us

View a complete list of Strata + Hadoop World 2013 contacts