Strata 2014 Schedule

Below are the confirmed and scheduled talks at Strata 2014. Note: The schedule is subject to change.

Ballroom AB
Ballroom CD
GA Ballroom J
Ballroom E
9:00am IPython In Depth Brian Granger (Cal Poly San Luis Obispo), Fernando Pérez (University of California at Berkeley)
1:30pm From Scattered to Scatterplots: An Introduction to d3.js Scott Murray (University of San Francisco)
GA Ballroom K
9:00am Faster and Smarter Big Data Analysis with BlinkDB, MLbase, GraphX, and Tachyon: New Components of the Berkeley Data Analytics Stack (BDAS) Sameer Agarwal (UC Berkeley), Tathagata Das (Databricks), Ali Ghodsi (UC Berkeley), Ion Stoica (UC Berkeley), Ameet Talwalkar (Databricks), Reynold Xin (Databricks), Matei Zaharia (Databricks), Joseph Gonzalez (UC Berkeley)
1:30pm Hands-on training with the newest BDAS components: Learn BlinkDB, MLbase, Spark, Spark Streaming, GraphX, and Shark Andy Konwinski (Databricks), Sameer Agarwal (UC Berkeley), Tathagata Das (Databricks), Ameet Talwalkar (Databricks), Shivaram Venkataraman (UC Berkeley), Patrick Wendell (Databricks), Reynold Xin (Databricks), Matei Zaharia (Databricks), Joseph Gonzalez (UC Berkeley), Haoyuan Li (UC Berkeley)
Room 204
9:00am Design Thinking for Dummies (Data Scientists) Michael Stringer (Datascope Analytics), Dean Malmgren (Datascope Analytics), Laurie Skelly (Datascope Analytics)
1:30pm Big Data Workflows on Mesos Clusters Florian Leibert (Mesosphere), Paco Nathan (Databricks), Benjamin Hindman (Apache Mesos)
Ballroom F
9:00am Building a Data Platform John Akred (Silicon Valley Data Science), Richard Williamson (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
1:30pm Effective Data Science With Scalding Vitaly Gordon (LinkedIn)
3:30pm Data Transformation: Skills of the Agile Data Wrangler Joe Hellerstein (Trifacta and UC Berkeley), Jeffrey Heer (Trifacta Inc. / Univ of Washington)
Ballroom G
1:30pm Large-scale Machine Learning Cookbook using GraphLab Carlos Guestrin (GraphLab Inc.)
Ballroom H
9:00am Introduction to Hadoop 2.0 Rich Raposa (Hortonworks)
1:30pm Building Real-Time Apps with Apache HBase Ronan Stokes (Cloudera)
5:00pm Plenary
Room: Exhibit Hall
Opening Reception
6:30pm Plenary
Room: Mission City
Startup Showcase
12:30pm Break
Room: Lunch
9:00am-5:00pm (8h) Hardcore Data Science
Hardcore Data Science
All-Day: Strata's regular data science track has great talks with real-world experience from leading-edge speakers. But we didn't stop there: we added the Hardcore Data Science day to give you a chance to go even deeper. The Hardcore day will add new techniques and technologies to your data science toolbox, shared by leading data science practitioners from startups, industry, consulting...
9:00am-5:00pm (8h) Data Driven Business
Data-Driven Business Day
All-Day: For business strategists, marketers, product managers, and entrepreneurs, Data-Driven Business looks at how to use data to make better business decisions faster. Packed with case studies, panels, and eye-opening presentations, this fast-paced day focuses on how to solve today's thorniest business problems with Big Data. It's the missing MBA for a data-driven, always-on business world.
9:00am-12:30pm (3h 30m) Data Science
Dissecting Data Science Algorithms using Spreadsheets
John Foreman (MailChimp)
Data science algorithms (think machine learning, clustering, outlier detection) often get conflated with the industry-standard tools and programming languages that run them. In this tutorial, John Foreman will use only spreadsheets to build models from his book Data Smart to demonstrate exactly how data science techniques work step-by-step.
1:30pm-5:00pm (3h 30m) Data Science
Adviser: Learning How to get A Second Opinion on Your Analysis when it's Important to get it Right
Leland Wilkinson (Skytree)
3-Hours: Adviser is a new and unique statistics and machine learning application that provides a second opinion on the results of your analysis. It incorporates a full range of analytic methods plus an expert system that flags outliers, model misspecifications, and other anomalies. This workshop will illustrate its use in real data analyses for both novices and experts.
9:00am-12:30pm (3h 30m) Data Science
IPython In Depth
Brian Granger (Cal Poly San Luis Obispo) et al
3-Hours: IPython is an open source project that provides tools for interactive and parallel computing in Python. This includes the IPython Notebook, a web-based interactive computing environment that enables users to author documents that combine code, text, equations, figures, and videos. This tutorial will provide a hands-on tour of the IPython Shell, Notebook, and parallel computing architecture.
1:30pm-5:00pm (3h 30m) Design
From Scattered to Scatterplots: An Introduction to d3.js
Scott Murray (University of San Francisco)
d3.js is a powerful tool for creating interactive charts on the web with data. But digging into D3 from scratch can make your head spin. This tutorial will take you from scattered to building your own working, interactive scatterplots in three hours.
9:00am-12:30pm (3h 30m) Hadoop and Beyond
Faster and Smarter Big Data Analysis with BlinkDB, MLbase, GraphX, and Tachyon: New Components of the Berkeley Data Analytics Stack (BDAS)
Sameer Agarwal (UC Berkeley) et al
3-Hours: An introduction to the newest components of the open-source Berkeley Data Analytics Stack (BDAS) in development at UC Berkeley (and an overview of existing ones). BlinkDB is a SQL engine that provides fast approximate distributed query results. MLbase includes a library to make machine learning at scale easy. Tachyon is a file system that provides memory-speed sharing across frameworks.
1:30pm-5:00pm (3h 30m) Hadoop and Beyond
Hands-on training with the newest BDAS components: Learn BlinkDB, MLbase, Spark, Spark Streaming, GraphX, and Shark
Andy Konwinski (Databricks) et al
3-Hours: Get hands-on training with the newest components of the open-source Berkeley Data Analytics Stack (BDAS). Lessons will cover BlinkDB, MLbase, Spark, Spark Streaming, and Shark. We will provide each audience member with an EC2 cluster and walk through hands-on exercises using these technologies to analyze real-world datasets.
9:00am-12:30pm (3h 30m) Design
Design Thinking for Dummies (Data Scientists)
Michael Stringer (Datascope Analytics) et al
As with many other types of projects, the most crucial part of any data-oriented project is choosing an appropriate problem or opportunity to focus on in the first place.
1:30pm-5:00pm (3h 30m) Hadoop and Beyond
Big Data Workflows on Mesos Clusters
Florian Leibert (Mesosphere) et al
3-Hours: Mesos is a cluster manager that provides efficient resource isolation for distributed frameworks, much like Google's "Borg" for warehouse-scale computing. We'll provide hands-on experience in how to build scalable, fault-tolerant data workflows atop Mesos. We'll use Chronos to orchestrate Hadoop jobs and other data prep, then use Marathon to launch a Rails + Redis app to serve results.
9:00am-12:30pm (3h 30m) Hadoop and Beyond
Building a Data Platform
John Akred (Silicon Valley Data Science) et al
3-Hours: What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
1:30pm-3:00pm (1h 30m) Data Science
Effective Data Science With Scalding
Vitaly Gordon (LinkedIn)
90-Minutes: Machine learning is software. As such, it should follow standard software engineering practices. However, the current tools of the trade are not modular, maintainable, or reusable. In this tutorial we will learn to work with Scalding, a Scala DSL that provides both the simplicity of languages like Apache Pig and the power of a functional, fully JVM-based language.
3:30pm-5:00pm (1h 30m) Data Science
Data Transformation: Skills of the Agile Data Wrangler
Joe Hellerstein (Trifacta and UC Berkeley) et al
90-Minutes: Data analysts routinely report spending more time "wrangling" their data than performing analysis per se. In this tutorial we focus on the ever-present yet oft-overlooked challenges of data transformation, including discovery, structure, content, and curation. We highlight recent approaches that jointly emphasize interaction and inference, leveraging both human acuity and...
9:00am-12:30pm (3h 30m) Data Science
Introduction to Machine Learning with IPython and scikit-learn
Olivier Grisel (INRIA)
3-Hours: Hands-on introductory workshop on predictive modelling and machine learning with open source tools from the Python community, such as scikit-learn and IPython.
1:30pm-5:00pm (3h 30m) Data Science
Large-scale Machine Learning Cookbook using GraphLab
Carlos Guestrin (GraphLab Inc.)
3-Hours: This tutorial will provide an introduction to modern machine learning. Attendees will learn how to leverage some of the most popular techniques used in fraud detection, social network analysis, and personalized recommendation services.
9:00am-12:30pm (3h 30m) Hadoop and Beyond
Introduction to Hadoop 2.0
Rich Raposa (Hortonworks)
This workshop provides a detailed discussion of the new features of Apache Hadoop 2.0. We will discuss how YARN turns Hadoop from a single-use system for batch data processing into a multi-use platform for storing and processing data in many ways other than batch. We will also discuss the details of the new HDFS improvements, such as High Availability, Federation, and Snapshots.
1:30pm-5:00pm (3h 30m) Hadoop and Beyond
Building Real-Time Apps with Apache HBase
Ronan Stokes (Cloudera)
3-Hours: Apache HBase is a distributed, column-oriented, key-value store for Apache Hadoop (via integration with HDFS). In this tutorial, you will learn the basic elements of building a real-time application that uses Apache HBase as a persistent data store.
5:00pm-6:30pm (1h 30m) Event
Opening Reception
Grab a drink, mingle with fellow Strata participants, and see the latest technologies and products from leading companies in the data space.
6:30pm-8:00pm (1h 30m) Event
Startup Showcase
Once again at Strata, we’ll be inviting the best of the best to demonstrate their innovations at Startup Showcase.
12:30pm-1:30pm (1h)
Break