
Schedule: Data Science sessions

Data scientists are polymaths, increasingly mixing math, social science, distributed computing, and narrative to turn raw data into insights and a better understanding of human behavior. We’ll look into the latest techniques in machine learning, prediction, and technology as well as the softer topics of building effective data teams and creating a more data-driven organizational culture that can change and adapt.

Track Hosts

Rachel Kalmar is a neuroscientist who is passionate about making sensor data accessible, actionable, and predictive. How do we take sensor data to the next level, from tracking to actions? Rachel is active in the Bay Area hardware community, and runs a sensor meetup (meetup.com/Sensored) and discussion group (bit.ly/sensorites).

Joseph Adler has many years of experience in data mining and data analysis at companies including DoubleClick, American Express, and VeriSign. He graduated from MIT with a B.Sc. and an M.Eng. in Computer Science and Electrical Engineering. He holds several patents in computer security and cryptography, and is the author of "Baseball Hacks" and "R in a Nutshell". Currently, he is a senior data scientist at LinkedIn.

Ballroom G
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.
Olivier Grisel (INRIA)
Average rating: 4.47 (17 ratings)
3-Hours: A hands-on introductory workshop on predictive modelling and machine learning with open source tools from the Python community, such as scikit-learn and IPython.
Ballroom E
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.
Brian Granger (Cal Poly San Luis Obispo), Fernando Pérez (University of California at Berkeley)
Average rating: 4.44 (9 ratings)
3-Hours: IPython is an open source project that provides tools for interactive and parallel computing in Python. This includes the IPython Notebook, a web-based interactive computing environment that enables users to author documents that combine code, text, equations, figures, and videos. This tutorial will provide a hands-on tour of the IPython shell, the Notebook, and the parallel computing architecture.
GA Ballroom J
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.
John Foreman (MailChimp)
Average rating: 4.87 (15 ratings)
Data science algorithms (think machine learning, clustering, outlier detection) often get conflated with the industry-standard tools and programming languages that run them. In this tutorial, John Foreman will use only spreadsheets to build models from his book Data Smart, demonstrating exactly how data science techniques work, step by step.
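As a rough illustration of that step-by-step idea (not the spreadsheet model from Data Smart), here is a minimal k-means clustering sketch in plain Python. It alternates the two steps a spreadsheet version would lay out in columns: assign each point to its nearest centroid, then move each centroid to the mean of its points. The data points, k = 2, and the starting centroids are made up for illustration.

```python
import math

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    labels = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new = []
    for k, c in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            new.append(c)  # an empty cluster keeps its old centroid
            continue
        new.append(tuple(sum(m[d] for m in members) / len(members)
                         for d in range(len(c))))
    return new

def kmeans(points, centroids, max_iter=100):
    """Repeat assign/update until the assignments stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Toy data: two obvious groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Seeding the centroids from actual data points is just one simple choice; in practice results depend on initialization, so k-means is often run from several starts.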
Ballroom G
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.
Average rating: 4.00 (10 ratings)
3-Hours: This tutorial will provide an introduction to modern machine learning. Attendees will learn how to leverage some of the most popular techniques used in fraud detection, social network analysis, and personalized recommendation services.
GA Ballroom J
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.
Leland Wilkinson (Skytree)
Average rating: 3.50 (2 ratings)
3-Hours: Adviser is a new and unique statistics and machine learning application that provides a second opinion on the results of your analysis. It incorporates a full range of analytic methods plus an expert system that flags outliers, model misspecifications, and other anomalies. This workshop will illustrate its use in real data analyses for both novices and experts.
Ballroom F
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.
Vitaly Gordon (LinkedIn)
Average rating: 4.00 (3 ratings)
90-Minutes: Machine learning is software. As such, it should follow standard software engineering practices; however, the current tools of the trade are not modular, maintainable, or reusable. In this tutorial we will learn to work with Scalding, a Scala DSL that provides both the simplicity of languages like Apache Pig and the power of a full functional JVM language.
Ballroom F
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.
Joe Hellerstein (Trifacta and UC Berkeley), Jeffrey Heer (Trifacta | University of Washington)
Average rating: 4.80 (10 ratings)
90-Minutes: Data analysts routinely report spending more time "wrangling" their data than performing analysis per se. In this tutorial we focus on the ever-present yet oft-overlooked challenges of data transformation, including discovery, structure, content, and curation. We emphasize recent approaches that combine interaction and inference, leveraging both human acuity and...
Ballroom AB
Chris Re (Stanford University)
Average rating: 4.18 (11 ratings)
A new generation of data processing systems, including web search, Google's Knowledge Graph, IBM's Watson, and several different recommendation systems, combine rich databases with software driven by machine learning. This talk describes our recent thoughts on one crucial pain point in the construction of trained systems: feature engineering.
Ballroom AB
Olivier Grisel (INRIA)
Average rating: 3.86 (7 ratings)
IPython and scikit-learn offer a nice environment for interactive data analytics in general and predictive modelling in particular. This presentation will give an overview of how to use both to perform tasks such as distributed model parameter tuning and parallel training of Random Forests on ad hoc compute clusters provisioned in the cloud.
Ballroom AB
Adam Marcus (Locu / GoDaddy)
Average rating: 4.50 (6 ratings)
Machine learning and paid crowdsourcing power several virtuous cycles in Locu's data processing pipeline. To solve various problems, we interact with hundreds of long-term crowd workers on oDesk and tens of thousands of shorter-term workers on CrowdFlower. Come learn about Locu's magic with examples based on problems we solve every day.
Ballroom AB
Lukas Biewald (CrowdFlower)
Average rating: 3.83 (6 ratings)
Data scientists know how hard it is to collect, categorize and label vast amounts of data. But some smart data scientists are effectively leveraging the human intelligence of the crowd to solve these problems, resulting in better training of machine learning models and improved system performance.
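One common baseline for turning redundant crowd judgments into training labels is a per-item majority vote, keeping the agreement rate as a rough confidence score. The sketch below assumes judgments arrive as hypothetical `(item_id, worker_id, label)` tuples; it is not CrowdFlower's method, and production pipelines often go further by weighting workers by estimated accuracy.

```python
from collections import Counter

def aggregate_labels(judgments):
    """Majority-vote aggregation of crowd judgments.

    judgments: iterable of (item_id, worker_id, label) tuples (a hypothetical format).
    Returns {item_id: (winning_label, agreement_fraction)}.
    Ties go to the label encountered first, since Counter.most_common
    orders equal counts by first appearance.
    """
    by_item = {}
    for item_id, _worker_id, label in judgments:
        by_item.setdefault(item_id, []).append(label)
    result = {}
    for item_id, labels in by_item.items():
        label, votes = Counter(labels).most_common(1)[0]
        result[item_id] = (label, votes / len(labels))
    return result

judgments = [
    ("img1", "w1", "cat"), ("img1", "w2", "cat"), ("img1", "w3", "dog"),
    ("img2", "w1", "dog"), ("img2", "w3", "dog"),
]
agg = aggregate_labels(judgments)
# Keep only items with strong agreement as training labels (0.7 is an arbitrary threshold).
confident = {item: lab for item, (lab, agree) in agg.items() if agree >= 0.7}
print(confident)  # {'img2': 'dog'}
```

Items that fall below the threshold, like `img1` here, are natural candidates for collecting additional judgments.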
Ballroom AB
Marc Smith (Connected Action Consulting Group)
Average rating: 2.67 (3 ratings)
Social network analysis (SNA) is a powerful technique for making sense of a connected world. But the skills needed to collect, analyze, visualize, and gain insights into collections of connections are hard to find. Now, new tools make networks as easy to manage as a pie chart. Using the familiar Excel spreadsheet, NodeXL enables end users to gain insights into Twitter, Facebook & more.
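To make "collections of connections" concrete: a network is often stored as an edge list, much like two spreadsheet columns, and one of the simplest SNA metrics is normalized degree centrality, each node's neighbour count divided by the maximum possible, n - 1. This is a minimal plain-Python sketch with hypothetical node names; it makes no claim about how NodeXL computes its metrics.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as an edge list.

    Each node's number of distinct neighbours is divided by (n - 1), so scores
    fall in [0, 1]. Duplicate edges and self-loops are ignored; nodes that only
    appear in self-loops are not counted.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}

# Hypothetical "who mentions whom" edges, e.g. two columns of a sheet.
edges = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"), ("bob", "carol")]
dc = degree_centrality(edges)
print(max(dc, key=dc.get))  # alice
```

Using sets of neighbours rather than raw edge counts keeps repeated interactions between the same pair from inflating a node's score.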
Ballroom AB
Ted Willke (Intel)
Average rating: 4.11 (9 ratings)
Graph analytics promises to uncover new patterns in big data, but it's not easy to use commercially. Why is it so tough for data scientists to construct graphs and extract insight? This talk discusses Intel's efforts to deliver a graph cluster solution that is as easy to work with as it is powerful.
Ballroom AB
Diane Chang (Intuit), Steven Hillion (Alpine Data Labs), Nick Kolegraff (Rackspace), Matthew Gee (Effortless Energy / University of Chicago)
Average rating: 3.78 (9 ratings)
In this panel discussion, experts from four different industries will share their first-hand experiences building and deploying teams of data scientists.
Ballroom AB
Abe Gong (Jawbone)
Average rating: 4.00 (9 ratings)
Creating value from big, messy data sets can be a daunting task. The session introduces the Sidekick Pattern: using small, curated data to increase the value of Big Data. Drawing on lessons from data science for Jawbone’s UP fitness tracker, we will see how smart selection of data sidekicks can accelerate analysis, solve cold start problems, and simplify complicated data pipelines.
Ballroom AB
Beau Cronin (Salesforce)
Average rating: 4.10 (10 ratings)
Probabilistic programming is a new paradigm for modeling and inference that offers hope for a fundamental shift in our approach to understanding the stories behind our data. This talk will provide an overview of the systems currently available and their relative strengths, show examples of their usage, and offer a peek at the road ahead.
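The core idea is to write the "story" behind the data as an ordinary program that makes random choices, then ask the system which random choices are consistent with what was observed. Dedicated systems automate that inference step; the hand-rolled rejection sampler below only illustrates the idea on an assumed toy problem (inferring a coin's bias from 7 heads in 10 flips under a uniform prior) and is not how any particular system works internally.

```python
import random

def model(rng):
    """Generative story: draw a coin bias from a uniform prior, then flip it 10 times."""
    p = rng.random()
    heads = sum(rng.random() < p for _ in range(10))
    return p, heads

def rejection_sample(observed_heads, n_samples, seed=0):
    """Approximate the posterior over p by keeping only runs that reproduce the data."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_samples:
        p, heads = model(rng)
        if heads == observed_heads:  # condition on the observation
            accepted.append(p)
    return accepted

posterior = rejection_sample(observed_heads=7, n_samples=5000)
mean = sum(posterior) / len(posterior)
# Should land close to the exact Beta(8, 4) posterior mean, 8/12 (about 0.67).
print(round(mean, 2))
```

Rejection sampling is simple but wastes most runs (here roughly 10 of every 11), which is why practical systems rely on smarter inference algorithms.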
Ballroom AB
Chris Harland (Microsoft)
Average rating: 5.00 (13 ratings)
Predictive models are popular for their ability to grapple with massive data and bring to light features which are non-obvious to even the best domain experts. Solving practical problems with real world data involves creating models that balance predictive accuracy with practical significance. This talk provides examples of this balance in optimizing Chicago area bars and extends to Bing search.
Ballroom E
Michael Abbott (Kleiner Perkins Caufield & Byers)
Average rating: 5.00 (3 ratings)
Everyone knows that massive, real-time data processing is behind many of the hottest new companies in technology. But what’s really going on underneath the covers? In this session, investor and technology entrepreneur Michael Abbott unboxes three startups to look at the technology, architecture, and innovations they’ve harnessed to deliver their products and services.
Ballroom AB
Emil Eifrem (Neo Technology / Neo4j)
Average rating: 3.33 (9 ratings)
Recent years have seen an explosion of technologies for managing and analyzing graphs. While most people associate "graph" with "the social graph," there's a wide variety of non-social use cases for graph technologies. This session will explore graph adoption in finance, telecom, healthcare, HR & recruiting, gaming and beyond, using concrete case studies from actual graph production deployments.
Ballroom AB
Wes McKinney (Cloudera)
Average rating: 3.00 (9 ratings)
This talk will address some of the pressing problems in data preparation, analysis, visualization, and collaboration facing the modern data analyst. We will discuss the ways in which both programmatic and UI-driven tools are helping solve these problems and the areas in which more work and innovation are needed.
Ballroom AB
Neal Ford (ThoughtWorks)
Average rating: 3.60 (5 ratings)
Analytics and agility sometimes seem like natural enemies, but analytics projects suffer the same shifting requirements and uncertainty as other projects. This talk describes techniques for incorporating analytics and data science into an agile rhythm.
Ballroom AB
Bin Yu (UC Berkeley)
Average rating: 4.00 (3 ratings)
In a thrilling breakthrough at the intersection of neuroscience and statistics, penalized least-squares methods have been used to construct a "mind-reading" algorithm that reconstructs movies from fMRI brain signals.
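For readers new to the term, ridge regression is one standard penalized least-squares method: it minimizes ||y - Xb||^2 + lam * ||b||^2, whose solution satisfies (X^T X + lam * I) b = X^T y. The penalty shrinks coefficients toward zero and keeps the system solvable even when features outnumber observations. The abstract does not say which penalty the fMRI work uses, so ridge here is an assumed generic example, and the tiny data set is made up.

```python
def ridge_fit(X, y, lam):
    """Ridge regression (an assumed example of penalized least squares).

    Minimizes ||y - X b||^2 + lam * ||b||^2 by solving the normal equations
    (X^T X + lam * I) b = X^T y with Gauss-Jordan elimination.
    X is a list of rows; no intercept is fitted. lam > 0 guarantees a solution.
    """
    p = len(X[0])
    # A = X^T X + lam * I and rhs = X^T y.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    M = [A[i] + [rhs[i]] for i in range(p)]  # augmented matrix
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(ridge_fit(X, y, lam=0.0))  # ordinary least squares, approximately [1.0, 2.0]
print(ridge_fit(X, y, lam=1.0))  # shrunk toward zero, approximately [0.875, 1.375]
```

Larger values of `lam` trade a worse fit on the training data for smaller, more stable coefficients; in practice it is chosen by cross-validation.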
Ballroom AB
Ameet Talwalkar (Databricks), Evan Sparks (UC Berkeley)
Average rating: 4.14 (7 ratings)
Implementing and consuming machine learning techniques at scale is difficult for both ML developers and end users. MLbase (www.mlbase.org) is an open-source platform under active development that addresses the issues of both groups. In this talk we will describe the high-level functionality of MLbase and demonstrate its scalability and ease of use via real-world examples.