Strata 2013 Data Science Sessions

Inside the world of data practitioners, from the hard science of the latest algorithms to thorny issues of cultural change and team-building.

Track Hosts

Monica Rogati is one of the founding members of the LinkedIn product data science team. She leads a team of data scientists and turns data into products, actionable insights and (news) stories.

Peter Skomoroch is a Principal Data Scientist at LinkedIn, where he leads a team focused on identity, reputation, information extraction, and building data-driven products. He was also the creator of LinkedIn Skills.

Ballroom AB
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
William Cukierski (Kaggle), Ben Hamner (Kaggle)
Average rating: ***..
(3.67, 12 ratings)
As more industries adopt data-driven policies, people untrained in the formal analysis of data are finding themselves staring at a spreadsheet and asking what they did to deserve it. In this tutorial, two of Kaggle’s top data scientists will walk attendees through the basics of solving an analytics challenge, from defining the problem, to performing basic analysis, to visualizing the output.
Ballroom F
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Garrett Grolemund (RStudio)
Average rating: ****.
(4.38, 8 ratings)
Learn how to wrangle data in R: from acquiring and cleaning data, to changing data formats and performing targeted, groupwise calculations. This course will emphasize the 'reshape2' and 'plyr' packages.
Ballroom E
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Simon Rogers (Guardian), Feilding Cage (Guardian)
Average rating: ****.
(4.67, 6 ratings)
This hands-on session will show how a dataset turns into a story, the narrative process the Guardian's team goes through, the tools used and the lessons learned.
Ballroom F
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Wes McKinney (DataPad Inc.)
Average rating: ***..
(3.88, 8 ratings)
This tutorial will be a hands-on introduction to the essential tools for working with structured data in Python: 'pandas' and 'NumPy'.
Ballroom G
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Sarah Sproehnle (Cloudera, Inc.)
Average rating: ****.
(4.71, 7 ratings)
This tutorial provides a solid foundation for those seeking to understand large scale data processing with MapReduce and Hadoop, plus its associated ecosystem. This session is intended for those who are new to Hadoop and are seeking to understand where Hadoop is appropriate and how it fits with existing systems. No programming experience is required.
Ballroom H
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Ryan Boyd (Google), Michael Manoochehri (Google, Inc.), Julia Ferraioli (Google, Inc.)
Average rating: **...
(2.57, 7 ratings)
When data volume and velocity become massive, processing and analysis solutions require specialized technologies for different parts of the data pipeline. Google’s Cloud Platform is designed to help you focus on building applications, not infrastructure. We’ll demonstrate how to build end to end Big Data applications - from data collection, to analysis, to reporting and visualization.
Great America Ballroom J
Kate Matsudaira (Decide, Inc.)
Average rating: **...
(2.00, 1 rating)
Data science can power incredible innovation, but the most important insights typically aren't known ahead of time. This makes it challenging to manage schedules, expectations, and goals. At Decide, data science is core to our product. This talk will share lessons learned from both sides, and provide the audience with strategies to improve process and communication in their own teams.
Ballroom AB
Elisabeth Crawford (Birchbox)
Average rating: ****.
(4.00, 4 ratings)
Every month Birchbox delivers a box of samples to each of its subscribers. Boxes are targeted to subscribers based on their profile, history, and behavior. In this talk we discuss the mathematics behind allocating samples to customers (aka solving for happiness).
Ballroom AB
Michael Bailey (Facebook)
Average rating: ****.
(4.25, 16 ratings)
Everyone wants to predict the future; fame and fortune follow those who succeed. I cover the basics of forecasting including tips, tricks, and best practices, and how forecasting differs from prediction analysis. I walk through simple examples using R and link to several resources to put you on the path to becoming the next Nostradamus.
Ballroom AB
Rachel Schutt (Johnson Research Labs)
Average rating: ***..
(3.45, 11 ratings)
Rachel Schutt, Senior Research Scientist at Johnson Research Labs, will discuss her Columbia Data Science course: her motivations for teaching it, how she designed the curriculum, how the NYC tech community was involved, and what impact, if any, she had on her students. She thought about the course as testing the hypothesis: It is possible to incubate awesome data science teams in the classroom.
Ballroom AB
Bradley Voytek (UCSF & Uber, Inc.)
Average rating: **...
(2.50, 4 ratings)
With more data come more problems. Did you know Excel dates begin on January 1, 1900? Unless you're using the OS X version, in which case dates begin on January 1, 1904. Or Unix time, which begins January 1, 1970. These pervasive, easily overlooked gremlins are the bane of any data scientist, and in this session I will explore a variety of these little nuisances.
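The three epochs named in this abstract can be illustrated with a minimal Python sketch. It is not taken from the talk: the function names are hypothetical, and the special handling of serial 60 is an added assumption based on Excel's documented behavior of treating 1900 as a leap year, which the abstract does not mention.

```python
from datetime import datetime, timedelta

# Hypothetical helpers, for illustration only.
EXCEL_1904_EPOCH = datetime(1904, 1, 1)  # serial 0 in the 1904 date system
UNIX_EPOCH = datetime(1970, 1, 1)        # second 0 of Unix time (UTC)


def excel_1900_to_datetime(serial: int) -> datetime:
    """Convert a whole-day serial from Excel's 1900 date system."""
    if serial < 1:
        raise ValueError("the 1900 date system starts at serial 1")
    if serial == 60:
        # Assumption: Excel counts a February 29, 1900 that never existed.
        raise ValueError("serial 60 is Excel's nonexistent 1900-02-29")
    # Serials after the phantom leap day are shifted by one extra day.
    base = datetime(1899, 12, 31) if serial < 60 else datetime(1899, 12, 30)
    return base + timedelta(days=serial)


def excel_1904_to_datetime(serial: int) -> datetime:
    """Convert a whole-day serial from Excel's 1904 date system."""
    return EXCEL_1904_EPOCH + timedelta(days=serial)


def unix_to_datetime(seconds: int) -> datetime:
    """Convert Unix seconds to a naive datetime in UTC."""
    return UNIX_EPOCH + timedelta(seconds=seconds)


if __name__ == "__main__":
    # The same calendar day, January 1, 2013, in all three encodings.
    print(excel_1900_to_datetime(41275))
    print(excel_1904_to_datetime(39813))
    print(unix_to_datetime(1356998400))
```

The same calendar day has three different numeric encodings here, so a spreadsheet read with the wrong date system is silently off by 1,462 days.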
Ballroom AB
Vishwanath Ramarao (Impermium)
Average rating: **...
(2.60, 5 ratings)
Classic data science problems involve finding stationary patterns in big datasets. However, in adversarial settings, enemies deliberately shift their approach to avoid detection. They can challenge learning systems by randomizing behavior, hiding tracks, lacing traffic and more. Successful application of machine learning requires new approaches to feature engineering, training and classification.
Ballroom AB
Justin Langseth (Zoomdata, Inc.), Byron Ellis (Spongecell)
Average rating: ***..
(3.83, 6 ratings)
Learn how LivePerson and Zoomdata perform real-time stream processing and visualization of structured site traffic and unstructured chat data on mobile devices for business decision making. Technologies include Kafka, Storm, and d3.js for visualization on mobile devices. Byron Ellis, Data Scientist for LivePerson, will join Justin Langseth of Zoomdata to discuss and demonstrate the solution.
Ballroom AB
Philipp Janert (Principal Value, LLC)
Average rating: **...
(2.67, 3 ratings)
Most stable systems rely on feedback - from central heating to industrial plants and biological organisms. This introductory talk will explain what feedback is, why it is relevant to enterprise software development, and how to apply it to some typical problems arising in business and technical situations.
Ballroom AB
Alexander Gray (Skytree, Inc.)
Average rating: ***..
(3.30, 10 ratings)
Given a machine learning (ML) problem, which method(s) should you use, and how does big data affect your choices? I will discuss some principles derived from decades of theory and practice, illustrated through real-world ML success stories in medicine, marketing, financial services, and astronomy.
Ballroom AB
Dr. Vijay Srinivas Agneeswaran (Impetus Technologies)
Average rating: ***..
(3.00, 2 ratings)
The key takeaway from this session will be an understanding of the third generation of tools for realizing machine learning algorithms; examples include Twister, HaLoop, and GraphLab. Attendees will also understand why second-generation tools such as Mahout have not implemented some of the machine learning algorithms for big data. The session will also cover real-life use cases.
Ballroom AB
Michael Bean (Forio Simulations)
Average rating: ****.
(4.50, 2 ratings)
Julia is a new mathematical programming language that is scalable, high-performance, and open source. Julia is fast, approaching and often matching the performance of C/C++, easy to learn, and designed for distributed computation. This session will demonstrate some of the special capabilities of Julia and give you the tools you need to get started using this exciting technical computing language.
Ballroom F
Nadav Aharony (Behavio)
Average rating: ****.
(4.29, 7 ratings)
Today's smartphones have evolved into incredibly rich sensing and computing devices that can be used to infer complex and interesting things about us, our environment, and our communities. This talk will give an overview of user-centric, continuous mobile sensing, and our work, originating at the MIT Media Lab, to develop open tools to democratize this capability.
