Strata 2014 Tutorials

All confirmed Tutorials for Strata 2014 are listed below. Please note: to attend, your registration package must include Tutorials on Tuesday.

Ballroom G
Olivier Grisel (INRIA)
Average rating: ****.
(4.47, 17 ratings)
3-Hours: Hands-on introductory workshop on predictive modelling and machine learning with open source tools from the Python community, such as scikit-learn and IPython.
Ballroom E
Brian Granger (Cal Poly San Luis Obispo), Fernando Pérez (University of California at Berkeley)
Average rating: ****.
(4.44, 9 ratings)
3-Hours: IPython is an open source project that provides tools for interactive and parallel computing in Python. This includes the IPython Notebook, a web-based interactive computing environment that enables users to author documents that combine code, text, equations, figures, and videos. This tutorial will provide a hands-on tour of the IPython Shell, Notebook, and parallel computing architecture.
SOLD OUT
Ballroom F
John Akred (Silicon Valley Data Science), Richard Williamson (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
Average rating: ***..
(3.27, 22 ratings)
3-Hours: What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive and realtime analytical workloads.
GA Ballroom K
Sameer Agarwal (UC Berkeley), Tathagata Das (University of California Berkeley), Ali Ghodsi (UC Berkeley), Ion Stoica (UC Berkeley), Ameet Talwalkar (UC Berkeley), Reynold Xin (Databricks), Matei Zaharia (Databricks), Joseph Gonzalez (UC Berkeley)
Average rating: ****.
(4.29, 7 ratings)
3-Hours: An introduction to the newest components of the open-source Berkeley Data Analytics Stack (BDAS) in development at UC Berkeley, with an overview of the existing ones. BlinkDB is a SQL engine that provides fast, approximate, distributed query results. MLbase includes a library to make machine learning at scale easy. Tachyon is a file system that provides memory-speed sharing across frameworks.
Room 204
Michael Stringer (Datascope Analytics), Dean Malmgren (Datascope Analytics), Laurie Skelly (Datascope Analytics)
Average rating: ***..
(3.80, 5 ratings)
As with many other types of projects, the most crucial part of any data-oriented project is choosing an appropriate problem or opportunity to focus on in the first place.
Ballroom CD
Average rating: ***..
(3.46, 13 ratings)
All-Day: For business strategists, marketers, product managers, and entrepreneurs, Data-Driven Business looks at how to use data to make better business decisions faster. Packed with case studies, panels, and eye-opening presentations, this fast-paced day focuses on how to solve today's thorniest business problems with Big Data. It's the missing MBA for a data-driven, always-on business world.
Ballroom AB
Average rating: ***..
(3.64, 14 ratings)
All-Day: Strata's regular data science track has great talks with real-world experience from leading-edge speakers. But we didn't stop there: we added the Hardcore Data Science day to give you a chance to go even deeper. The Hardcore day will add new techniques and technologies to your data science toolbox, shared by leading data science practitioners from startups, industry, consulting...
Ballroom H
Rich Raposa (Hortonworks)
Average rating: ****.
(4.30, 10 ratings)
This workshop provides a detailed discussion of the new features of Apache Hadoop 2.0. We will discuss how YARN turns Hadoop from a single-use system for batch data processing into a multi-use platform for storing and processing data in many ways other than batch. We will also discuss the details of the new HDFS improvements, such as High Availability, Federation, and Snapshots.
GA Ballroom J
John Foreman (MailChimp)
Average rating: ****.
(4.87, 15 ratings)
Data science algorithms (think machine learning, clustering, outlier detection) often get conflated with the industry-standard tools and programming languages that run them. In this tutorial, John Foreman will use only spreadsheets to build models from his book Data Smart to demonstrate exactly how data science techniques work step-by-step.
SOLD OUT
Room 204
Florian Leibert (Mesosphere), Paco Nathan (Databricks), Benjamin Hindman (Apache Mesos)
Average rating: ****.
(4.40, 5 ratings)
3-Hours: Mesos is a cluster manager that provides efficient resource isolation for distributed frameworks, much like Google's "Borg" for warehouse-scale computing. We'll provide hands-on experience in how to build scalable, fault-tolerant data workflows atop Mesos. We'll use Chronos to orchestrate Hadoop jobs and other data prep, then use Marathon to launch a Rails + Redis app to serve results.
Ballroom F
Vitaly Gordon (LinkedIn)
Average rating: ****.
(4.00, 3 ratings)
90-Minutes: Machine learning is software. As such, it should follow standard software engineering practices; however, the current tools of the trade are not modular, maintainable, or reusable. In this tutorial we will learn to work with Scalding, a Scala DSL that provides both the simplicity of languages like Apache Pig and the power of a functional, fully JVM language.
Ballroom G
Carlos Guestrin (GraphLab Inc.)
Average rating: ****.
(4.00, 10 ratings)
3-Hours: This tutorial will provide an introduction to modern machine learning. Attendees will learn how to leverage some of the most popular techniques used in fraud detection, social network analysis, and personalized recommendation services.
GA Ballroom K
Andy Konwinski (Databricks), Sameer Agarwal (UC Berkeley), Tathagata Das (University of California Berkeley), Ameet Talwalkar (UC Berkeley), Shivaram Venkataraman (UC Berkeley), Patrick Wendell (Databricks), Reynold Xin (Databricks), Matei Zaharia (Databricks), Joseph Gonzalez (UC Berkeley), Haoyuan Li (UC Berkeley)
Average rating: ***..
(3.10, 10 ratings)
3-Hours: Get hands-on training with the newest components of the open-source Berkeley Data Analytics Stack (BDAS). Lessons will cover BlinkDB, MLbase, Spark, Spark Streaming, and Shark. We will provide each audience member with an EC2 cluster and walk through hands-on exercises using these technologies to analyze real-world datasets.
GA Ballroom J
Leland Wilkinson (Skytree)
Average rating: ***..
(3.50, 2 ratings)
3-Hours: Adviser is a new and unique statistics and machine learning application that provides a second opinion on the results of your analysis. It incorporates a full range of analytic methods plus an expert system that flags outliers, model misspecifications, and other anomalies. This workshop will illustrate its use in real data analyses for both novices and experts.
Ballroom E
Scott Murray (University of San Francisco)
Average rating: ****.
(4.85, 13 ratings)
d3.js is a powerful tool for creating interactive charts on the web with data. But digging into D3 from scratch can make your head spin. This tutorial will take you from scattered to building your own working, interactive scatterplots in three hours.
Ballroom H
Ronan Stokes (Cloudera)
Average rating: *....
(1.30, 20 ratings)
3-Hours: Apache HBase is a distributed, column-oriented, key-value store for Apache Hadoop (via integration with HDFS). In this tutorial, you will learn the basic elements of building a real-time application that uses Apache HBase as a persistent data store.
Ballroom F
Joe Hellerstein (Trifacta and UC Berkeley), Jeffrey Heer (Trifacta Inc. / Univ of Washington)
Average rating: ****.
(4.80, 10 ratings)
90-Minutes: Data analysts routinely report spending more time "wrangling" their data than performing analysis per se. In this tutorial we focus on the ever-present yet oft-overlooked challenges of data transformation, including discovery, structure, content, and curation. We highlight recent approaches that jointly emphasize interaction and inference, leveraging both human acuity and...