Schedule: Data Science sessions

Data Science, Sutton Center / Sutton South (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Roy Hyunjin Han (CrossCompute)
Average rating: ***..
(3.62, 8 ratings)
Python is the language of choice when it comes to integrating analytical components. We will present a series of concepts and walkthroughs that illustrate how easy scientific computing is in Python, from machine learning and time series to spatial relationships and network analysis. Read more.
Data Science, Regent Parlor (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Susan E. McGregor (Columbia University), Alice Brennan (The New York World), Michael Sullivan (The New York World)
Average rating: ***..
(3.14, 7 ratings)
This tutorial will provide novice users with an overview of a range of common tools used for data cleaning and analysis - including Microsoft Excel, Google Refine, Python and R - along with their relative strengths and weaknesses. Attendees will not only learn useful new skills, but will also know what kind of expertise they need to seek out for help with more complex tasks. Read more.
Business & Industry Data Science, Grand East (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Robert Grossman (Open Data Group), Collin Bennett (Open Data Group)
Average rating: ****.
(4.25, 4 ratings)
A successful big data analytic project is not just about selecting the right algorithm for building a predictive model, but also about how to deploy the model efficiently into operational systems, how to evaluate the effectiveness of the model, and how to continuously improve it. In this tutorial we cover best practices for each of these phases in the life cycle of a predictive model. Read more.
Data Science Hadoop: Case Studies, Sutton Center / Sutton South (NY Hilton)
Donald Miner (ClearEdge IT Solutions)
Average rating: ****.
(4.20, 10 ratings)
The Hadoop and data science communities have matured to the point that common design patterns across domains are beginning to emerge. Now that Hadoop is maturing and its user base is gaining momentum, experienced users can start documenting design patterns that can be shared. In this talk, we'll discuss what makes up a MapReduce design pattern and give some examples. Read more.
Data Science, Beekman / Sutton North (NY Hilton)
Anne Milgram (NYU Law Center on the Administration of Criminal Law)
Average rating: ****.
(4.00, 1 rating)
Anne Milgram, Senior Fellow at the NYU Law Center on the Administration of Criminal Law. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Ilya Grigorik (Google), Brian Doll (GitHub)
Average rating: ****.
(4.80, 5 ratings)
Open-source developers all over the world contribute to millions of projects every day on GitHub: writing and reviewing code, filing bug reports and updating docs. Data from these events provides an amazing window into open source trends: project momentum, language adoption, community demographics, and more. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Cathy O'Neil (Intent Media)
Average rating: ***..
(3.62, 8 ratings)
In this talk, techniques from mathematical financial models will be compared and contrasted with methods from machine learning. Specifically, we will discuss the concept of time series data, accounting for seasonality, how to avoid overfitting, continuous updating, and fitting a Bayesian prior to your data science model. We will also discuss the question of when to use which tools. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Chang She (Cloudera)
Average rating: ***..
(3.50, 2 ratings)
Proper tooling and good habits that maximize reproducibility are essential to being productive as a data scientist. From management of raw data to model version control, the entire workflow must be carefully controlled end to end to produce quality research that scales with the quantity and complexity of data being analyzed. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Michael Stringer (Datascope Analytics)
Average rating: ****.
(4.50, 4 ratings)
An effective data science team looks a lot like an effective design team: brainstorming creative ideas, making prototypes, receiving feedback, telling stories, and deeply understanding the needs of others. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Amy O'Connor (Nokia), Danielle Dean (Nokia)
Average rating: ***..
(3.71, 7 ratings)
Amy O'Connor, Sr. Director of Nokia Analytics, together with her daughter and Nokia intern, Danielle Dean, will share what makes a great data scientist and their different paths to acquiring the diverse skill sets that are needed. Finally, Amy will discuss how to spot, attract, and train emerging data scientists in what is quickly becoming a heated market. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Wes McKinney (Cloudera)
Average rating: **...
(2.40, 5 ratings)
Data manipulation, cleaning, integration, and preparation can be among the most time-consuming parts of the data science process. In this talk I will discuss key points in the design and implementation of data structures and algorithms for structured data manipulation. It is an accumulation of lessons learned and experience building pandas, a widely used Python data analysis toolkit. Read more.
Data Science Hadoop: Tools & Technology, Grand East (NY Hilton)
Aaron Kimball (Magnify Consulting), Kiyan Ahmadizadeh (WibiData, Inc.)
Average rating: ****.
(4.33, 3 ratings)
Performing investigative analysis on data stored in HBase is challenging. Most tools operate on files stored in HDFS, and interact poorly with HBase's data model. This talk will describe characteristics of data in HBase and exploratory analysis patterns. We will describe best practices for modeling this data efficiently and survey tools and techniques appropriate for data science teams. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Ted Dunning (MapR)
Average rating: ***..
(3.67, 6 ratings)
This talk will describe how real-time learning can be used for advanced A/B testing as well as a variety of advertising and document targeting problems. The crux of these applications is the Bayesian Bandit algorithm. This algorithm is simple but provides state-of-the-art performance. This talk will be intuitive and practical, but not simple-minded. All code examples are available on GitHub. Read more.
Data Science Hadoop: Tools & Technology, Beekman / Sutton North (NY Hilton)
Justin Erickson (Cloudera), Marcel Kornacker (Cloudera, Inc.)
Average rating: ****.
(4.00, 4 ratings)
This talk will cover which tools and techniques work and don’t work well for data scientists working on Hadoop today and how Cloudera Impala increases the productivity of data science and analysis on Hadoop. Cloudera Impala builds upon experiences and leading-edge technology from big data systems at Facebook, Google, and Yahoo. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Justin Moore (Facebook)
Average rating: ***..
(3.43, 7 ratings)
Nearly a billion people actively create and modify nodes and their structured associations in the Facebook object graph. In this talk, Justin Moore describes how a small team within Facebook uses a combination of product, machine learning, and crowdsourcing to maintain and gain insight into this dataset. Read more.
Data Science, Murray West (NY Hilton)
Blake Shaw (Foursquare)
By applying machine learning algorithms to large aggregations of spatiotemporal data, we can better understand how people interact with cities and build novel tools to help people navigate the real world. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Claudia Perlich (Dstillery)
Average rating: ****.
(4.25, 4 ratings)
Building a reliable data-driven solution to a complex business problem is like designing a pocket watch from scratch. At the heart of successful analytics is the art of decomposing the looming big objective into smaller components, each of which may have its own data feed, modeling technique and runtime constraint. We showcase this process using the example of M6D’s online display advertising. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Roger Barga (Microsoft)
Average rating: *****
(5.00, 1 rating)
How do you build and deploy predictive analytics into ongoing business processes so results can be used in real-time to improve operations? This is a common request, in applications ranging from machine-to-machine to oil & gas and utilities. Learn how to leverage all your data assets – including sensor data – to build and operationalize predictive models that improve business operations. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Stefan Karpinski (The Julia Language), Jeff Bezanson (The Julia Language)
Average rating: ****.
(4.00, 1 rating)
Julia is a high-level, high-performance dynamic language for efficient, large-scale scientific and technical computing, which provides simple, flexible primitives for distributed computing out of the box. These primitives allow various approaches to distributed computation to be implemented succinctly and easily, with high performance, entirely in Julia. Read more.
