Strata 2012 Schedule

Below are the confirmed and scheduled talks at Strata 2012 (schedule subject to change).


Ballroom AB
9:00am Deep Data
Ballroom CD
9:00am Introduction to Apache Hadoop Sarah Sproehnle (Cloudera, Inc.)
1:30pm The Two Most Important Algorithms in Predictive Modeling Today Jeremy Howard (Kaggle), Mike Bowles (Biomatica)
Ballroom E
9:00am Large scale web mining Ken Krugler (Scale Unlimited)
1:30pm The Craft of Data Journalism Simon Rogers (Guardian), Michael Brunton-Spall (Guardian News and Media)
Ballroom F
9:00am Big Data Without the Heavy Lifting James Dixon (Pentaho), Chris Deptula (OpenBI)
1:30pm Big Data Entity Extraction With Less Work and Less Code Richard Taylor (HPCC Systems from LexisNexis Risk Solutions)
GA K
GA J
9:00am Designing Data Visualizations Workshop Noah Iliinsky (IBM)
1:30pm Developing applications for Apache Hadoop Sarah Sproehnle (Cloudera, Inc.)
Ballroom G
9:00am Introduction to R for Data Mining Joseph Rickert (Revolution Analytics)
1:30pm Building Applications with Apache Cassandra Nate McCall (Apigee)
Ballroom H
9:00am Hadoop Data Warehousing with Hive Dean Wampler (Typesafe), Jason Rutherglen (Datastax)
1:30pm Hands-on Visualization with Tableau Jock Mackinlay (Tableau Software), Ross Perez (Tableau Software)
12:30pm Lunch Sponsored by HPCC Systems
Room: Santa Clara Ballroom
7:00pm Plenary
Room: Mission City Ballroom Foyer
Strata Mini Maker Faire & Data Crush
5:00pm Break
Room: On Your Own
9:00am-5:00pm (8h) Deep Data
Deep Data
Deep Data is a no-holds-barred program for data scientists. The advanced technical content will keep you up to speed with the latest techniques, and give you the opportunity to debate and network with the most skilled data scientists in our industry.
9:00am-12:30pm (3h 30m) Data Science
Introduction to Apache Hadoop
Sarah Sproehnle (Cloudera, Inc.)
This tutorial provides a solid foundation for those seeking to understand large scale data processing with MapReduce and Hadoop, plus its associated ecosystem. This session is intended for those who are new to Hadoop and are seeking to understand where Hadoop is appropriate and how it fits with existing systems. No programming experience is required.
1:30pm-5:00pm (3h 30m) Data Science
The Two Most Important Algorithms in Predictive Modeling Today
Jeremy Howard (Kaggle) et al
Wouldn't it be great if there were just two algorithms that could handle most of your predictive modeling needs? It turns out that this is actually the case. Noted machine learning instructor Dr Mike Bowles and champion data miner Jeremy Howard will teach you everything you need to know to apply them successfully.
9:00am-12:30pm (3h 30m) Data Science
Large scale web mining
Ken Krugler (Scale Unlimited)
Want to extract and process Big Data from the web? This tutorial will show you how to use key open source technologies such as Hadoop, Cascading, Bixo, Tika, Mahout and Solr to create scalable, reliable web mining solutions.
1:30pm-5:00pm (3h 30m) Data Science
The Craft of Data Journalism
Simon Rogers (Guardian) et al
Learn firsthand from award-winning Guardian journalists how they mix data, journalism and visualization to break and tell compelling stories, all at newsroom speed.
9:00am-12:30pm (3h 30m) Sponsored Session
Big Data Without the Heavy Lifting
James Dixon (Pentaho) et al
The big data world is extremely chaotic and built on technology still in its infancy. Learn how to tame this chaos, integrate it within your existing data environments (RDBMS, analytic databases, applications), manage the workflow, orchestrate jobs, improve productivity and make big data technologies accessible to a much wider spectrum of developers, analysts and data scientists.
1:30pm-5:00pm (3h 30m) Sponsored Session
Big Data Entity Extraction With Less Work and Less Code
Richard Taylor (HPCC Systems from LexisNexis Risk Solutions)
While extracting entities from massive amounts of text is a major problem, a proven solution exists. This tutorial will demonstrate a natural language parsing technology to extract entities from all kinds of text using massively parallel clusters.
9:00am-5:00pm (8h) JumpStart
Strata Jumpstart
Jumpstart looks at how building and running businesses changes in a data-driven world. It's the missing MBA for Big Data.
9:00am-12:30pm (3h 30m) Visualization & Interface
Designing Data Visualizations Workshop
Noah Iliinsky (IBM)
This workshop is a jumpstart lesson on how to get from a blank page and a pile of data to a useful data visualization. We'll focus on the design process, not specific tools. Bring your sample data and paper or a laptop; leave with new visualization ideas.
1:30pm-5:00pm (3h 30m) Data Science
Developing applications for Apache Hadoop
Sarah Sproehnle (Cloudera, Inc.)
Learn how to use a Hadoop cluster for data analysis using Java MapReduce, Apache Hive and Apache Pig, and get an overview of using the HBase Hadoop database. Some programming experience is strongly recommended for this session.
9:00am-12:30pm (3h 30m) Data Science
Introduction to R for Data Mining
Joseph Rickert (Revolution Analytics)
This tutorial will enable anyone with some programming experience to begin analyzing data with the R programming language.
1:30pm-5:00pm (3h 30m) Data Science
Building Applications with Apache Cassandra
Nate McCall (Apigee)
This presentation goes beyond the hype, buzzwords, and rehashed slides and actually presents the attendees with a hands-on, step-by-step tutorial on how to write a Java application on top of Apache Cassandra. It focuses on concepts such as idempotence, tunable consistency, and shared-nothing clusters to help attendees get started with Apache Cassandra quickly while avoiding common pitfalls.
9:00am-12:30pm (3h 30m) Data Science
Hadoop Data Warehousing with Hive
Dean Wampler (Typesafe) et al
This hands-on tutorial teaches you how to set up and use Hive, a high-level data warehouse tool for Hadoop. Hive provides a SQL-like query language, HiveQL, that is easy to learn for people with prior SQL experience, making Hive attractive for data warehousing teams. Hive leverages the power of Hadoop for working with massive data sets without requiring expertise in MapReduce programming.
1:30pm-5:00pm (3h 30m) Data Science
Hands-on Visualization with Tableau
Jock Mackinlay (Tableau Software) et al
In this hands-on class, learn how to turn data into effective, interactive visualizations. You do not require a Tableau license to participate, but must bring a Windows laptop or virtual machine.
12:30pm-1:30pm (1h)
Break: Lunch Sponsored by HPCC Systems
7:00pm-9:00pm (2h)
Strata Mini Maker Faire & Data Crush
Two events happening at the same time and place: Mini Maker Faire, a showcase of innovative data-related hardware, apps, and projects; and Data Crush, an experiment combining wine-tasting with the gathering, analysis, and application of data to track behavioral trends and influencing factors.
5:00pm-7:00pm (2h)
Break

Sponsors

  • EMC
  • Microsoft
  • HPCC Systems™ from LexisNexis® Risk Solutions
  • MarkLogic
  • Shared Learning Collaborative
  • Cloudera
  • Digital Reasoning Systems
  • Pentaho
  • Rackspace Hosting
  • Teradata Aster
  • VMware
  • IBM
  • NetApp
  • Oracle
  • 1010data
  • 10gen
  • Acxiom
  • Amazon Web Services
  • Calpont
  • Cisco
  • Couchbase
  • Cray
  • Datameer
  • DataSift
  • DataStax
  • Esri
  • Facebook
  • Feedzai
  • Hadapt
  • Hortonworks
  • Impetus
  • Jaspersoft
  • Karmasphere
  • Lucid Imagination
  • MapR Technologies
  • Pervasive
  • Platform Computing
  • Revolution Analytics
  • Scaleout Software
  • Skytree, Inc.
  • Splunk
  • Tableau Software
  • Talend

For information on exhibition and sponsorship opportunities at the conference, contact Susan Stewart at sstewart@oreilly.com.

For information on trade opportunities with O'Reilly conferences, contact Kathy Yu at mediapartners@oreilly.com.

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com.
