Strata 2013 Tutorials

Ballroom E
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Dean Wampler (Typesafe)
Average rating: ****.
(4.69, 13 ratings)
This hands-on tutorial teaches you how to use Hive, a high-level data warehouse tool for Hadoop. Hive provides a SQL-like query language, HiveQL, that is easy to learn for people with prior SQL experience, making Hive attractive for data warehousing teams. Hive leverages the power of Hadoop for working with massive data sets without requiring expertise in MapReduce programming.
SOLD OUT
Room 204
Average rating: ****.
(4.67, 3 ratings)
Communicating Data Clearly describes how to draw clear, concise, accurate graphs that are easier to understand than many of the graphs one sees today. The tutorial emphasizes how to avoid common mistakes that produce confusing or even misleading graphs. Graphs for one, two, three, and many variables are covered as well as general principles for creating effective graphs.
Great America Ballroom K
Jonathan Hsieh (Cloudera, Inc.), Himanshu Vashishtha (Cloudera, Inc.)
Average rating: ***..
(3.12, 16 ratings)
HBase is one of the more popular open source NoSQL databases that have cropped up over the last few years. Building applications that use HBase effectively is challenging. This tutorial is geared towards teaching the basics of building applications using HBase and covers concepts that a developer should know while using HBase as a backend store for their application.
Ballroom AB
William Cukierski (Kaggle), Ben Hamner (Kaggle)
Average rating: ***..
(3.67, 12 ratings)
As more industries adopt data-driven policies, people untrained in the formal analysis of data find themselves staring at a spreadsheet and asking what they did to deserve it. In this tutorial, two of Kaggle’s top data scientists will walk attendees through the basics of solving an analytics challenge, from defining the problem, to performing basic analysis, to visualizing the output.
Ballroom G
Ion Stoica (UC Berkeley), Matei Zaharia (Databricks), Reynold Xin (Databricks), Shivaram Venkataraman (UC Berkeley), Andy Konwinski (UC Berkeley), Tathagata Das (Databricks)
Average rating: *****
(5.00, 3 ratings)
An introduction to Spark and Shark, two components of the open-source Berkeley Data Analytics Stack (BDAS) in development at UC Berkeley. Spark is a high-speed cluster computing system compatible with Hadoop that can outperform it by up to 100x. Shark is a port of Apache Hive onto Spark that is fully compatible with, and up to 100x faster than, Hive.
Ballroom F
Garrett Grolemund (RStudio)
Average rating: ****.
(4.38, 8 ratings)
Learn how to wrangle data in R: from acquiring and cleaning data, to changing data formats and performing targeted, groupwise calculations. This course will emphasize the 'reshape2' and 'plyr' packages.
Great America Ballroom J
Edd Dumbill (Silicon Valley Data Science)
Average rating: ****.
(4.33, 3 ratings)
For CIOs, IT executives, and technology professionals, Strata's Enterprise Big Data day lays out the roadmap to get your organization up to speed on big data. In this all-day event, hear how to create a big data strategy, understand the issues of managing data, and learn how data science can be used powerfully in your organization.
Ballroom CD
Alistair Croll (Solve For Interesting)
Average rating: ***..
(3.75, 4 ratings)
For business strategists, marketers, product managers, and entrepreneurs, Data Driven Business looks at how to use data to make better business decisions faster. Packed with case studies, panels, and eye-opening presentations, this fast-paced day focuses on how to solve today's thorniest business problems with Big Data. It's the missing MBA for a data-driven, always-on business world.
Ballroom H
Karan Bhatia (Amazon Web Services), Parviz Deyhim (Amazon Web Services)
Average rating: ***..
(3.25, 4 ratings)
This hands-on tutorial will give you an overview of how AWS can quickly and easily enable you to start generating insights from your company’s data.
Ballroom AB
Ryan Tabora (Think Big Analytics), Jason Rutherglen (Datastax)
Average rating: **...
(2.31, 13 ratings)
In this hands-on tutorial, you will learn the importance of distributed search through our industry experience and knowledge of real use cases. We’ll introduce different architectures that incorporate distributed search techniques, and share the pain points we experienced and the lessons we learned. For the hands-on part of the tutorial, you will learn how to install and use Apache Solr for real-time search on big data.
Great America Ballroom K
Scott Murray (University of San Francisco), Jerome Cukier (Jerome Cukier)
Average rating: ***..
(3.00, 5 ratings)
An introduction to D3, one of the most powerful JavaScript data visualization libraries.
Room 204
Matei Zaharia (Databricks), Reynold Xin (Databricks), Andy Konwinski (UC Berkeley), Tathagata Das (Databricks), Patrick Wendell (Databricks)
Average rating: ****.
(4.00, 1 rating)
Building on our previous tutorial introducing BDAS, the open-source Berkeley Data Analytics Stack, in this tutorial we will provide each audience member with a Spark/Shark cluster on EC2 and walk through hands-on coding examples. Lessons will cover the Spark and Shark command line interfaces, writing a standalone program, and data clustering using a distributed machine learning algorithm on Spark.
Ballroom E
Simon Rogers (Guardian), Feilding Cage (Guardian)
Average rating: ****.
(4.67, 6 ratings)
This hands-on session will show how a dataset turns into a story, the narrative process the Guardian's team goes through, the tools used and the lessons learned.
Ballroom F
Wes McKinney (DataPad Inc.)
Average rating: ***..
(3.88, 8 ratings)
This tutorial will be a hands-on introduction to the essential tools for working with structured data in Python: 'pandas' and 'NumPy'.
Ballroom H
Ryan Boyd (Google), Michael Manoochehri (Google, Inc.), Julia Ferraioli (Google)
Average rating: **...
(2.57, 7 ratings)
When data volume and velocity become massive, processing and analysis solutions require specialized technologies for different parts of the data pipeline. Google’s Cloud Platform is designed to help you focus on building applications, not infrastructure. We’ll demonstrate how to build end-to-end Big Data applications, from data collection, to analysis, to reporting and visualization.
Ballroom G
Sarah Sproehnle (Cloudera, Inc.)
Average rating: ****.
(4.71, 7 ratings)
This tutorial provides a solid foundation for those seeking to understand large-scale data processing with MapReduce and Hadoop, plus its associated ecosystem. This session is intended for those who are new to Hadoop and are seeking to understand where Hadoop is appropriate and how it fits with existing systems. No programming experience is required.
