Deep Data is a no-holds-barred program for data scientists. The advanced technical content will keep you up to speed with the latest techniques, and give you the opportunity to debate and network with the most skilled data scientists in our industry.
9:00am – 9:45am
SQL and NoSQL Are Two Sides Of The Same Coin
Contrary to popular belief, SQL and NoSQL are not at odds with each other; they are duals. In fact, NoSQL should really be called coSQL. Recognizing this duality can change the way we think about which technology to use when, and what we need to invest in next.
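The duality can be made concrete with a toy example (hypothetical data, sketched in Python, not the speaker's own formulation): in the SQL shape, child rows point at their parent through a foreign key; in the coSQL shape, the parent embeds its children, and either form can be mechanically derived from the other.

```python
# The same order data in "SQL shape" (children point to parents via keys)
# and "coSQL shape" (parents embed children) -- duals of one another.
sql_orders = [{"id": 1, "customer": "ann"}]
sql_items = [{"order_id": 1, "sku": "x"}, {"order_id": 1, "sku": "y"}]

nosql_orders = [{"id": 1, "customer": "ann",
                 "items": [{"sku": "x"}, {"sku": "y"}]}]

def embed(orders, items):
    # Mechanically turn the normalized (SQL) shape into the nested (coSQL)
    # shape; no information is gained or lost in the translation.
    return [dict(o, items=[{"sku": i["sku"]}
                           for i in items if i["order_id"] == o["id"]])
            for o in orders]

print(embed(sql_orders, sql_items) == nosql_orders)  # True
```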
9:45am – 10:30am
From Knowing ‘What’ To Understanding ‘Why’
Now that almost every piece of information about your customers is being collected, you can start asking your data the right questions: Why do they do what they do? And, even more: What would they do if I could interact with them? Using online display advertising as a case study, we show how causal analysis gives interesting new answers about the right (and wrong) ways of spending your money.
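The gap between "what" and "why" can be sketched with made-up data (this is an illustration, not the speakers' actual method): compare conversion rates between a randomly held-out control group and the exposed group, rather than crediting the ad with every conversion that followed it.

```python
# Hypothetical exposure log: (was the ad shown, did the user convert).
# A randomized control group lets us estimate the causal lift of the ad,
# instead of crediting it for conversions that would have happened anyway.
def causal_lift(log):
    treated = [conv for shown, conv in log if shown]
    control = [conv for shown, conv in log if not shown]
    return sum(treated) / len(treated) - sum(control) / len(control)

log = [(True, 1), (True, 1), (True, 0), (True, 0),
       (False, 1), (False, 0), (False, 0), (False, 0)]
print(causal_lift(log))  # 0.5 - 0.25 = 0.25
```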
10:30am – 11:00am Break
11:00am – 11:30am
The Model and the Train Wreck: A Training Data How-to
Getting training data for a recommender system is easy: if users clicked it, it’s a positive – if they didn’t, it’s a negative. … Or is it? In this talk, we use examples from production recommender systems to bring training data to the forefront: from overcoming presentation bias to the art of crowdsourcing subjective judgments to creative data exhaust exploitation and feature creation.
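The naive labeling scheme the abstract questions is easy to sketch (hypothetical log format): only items that were actually shown can ever become negatives, which is exactly the presentation bias the talk takes on.

```python
# Hypothetical impression log: (user, item, clicked).
log = [
    ("u1", "item_a", True),
    ("u1", "item_b", False),
    ("u2", "item_a", False),
]

# Naive labeling: clicked -> positive, shown-but-not-clicked -> negative.
examples = [((user, item), 1 if clicked else 0) for user, item, clicked in log]

# Presentation bias: "item_c" was never shown to anyone, so it can never
# become a negative example, no matter how poor a recommendation it is.
labeled_items = {item for (_, item), _ in examples}
print("item_c" in labeled_items)  # False
```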
11:30am – 12:00pm
Corpus Bootstrapping with NLTK
Learn various ways to bootstrap a custom corpus for training highly accurate natural language processing models. Real world examples will be presented with Python code samples using NLTK. Each example will show you how, starting from scratch, you can rapidly produce a highly accurate custom corpus for training the kinds of natural language processing models you need.
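As a flavor of the workflow (with a made-up seed corpus; the talk's actual examples will differ), one bootstrapping round might look like this: train an NLTK classifier on a small hand-labeled seed, then use it to label new text and grow the corpus for retraining.

```python
import nltk

# Hypothetical hand-labeled seed corpus; in practice you would start from
# a handful of annotated documents and grow the corpus iteratively.
seed = [
    ("great fantastic wonderful", "pos"),
    ("great superb service", "pos"),
    ("awful terrible boring", "neg"),
    ("awful dreadful mess", "neg"),
]

def features(text):
    # Bag-of-words features, the simplest NLTK feature format.
    return {word: True for word in text.split()}

train_set = [(features(text), label) for text, label in seed]
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Bootstrapping step: label unannotated text with the seed model, then
# fold the predictions back into the corpus before retraining.
unlabeled = ["wonderful superb film", "terrible dreadful plot"]
grown = seed + [(t, classifier.classify(features(t))) for t in unlabeled]
print(len(grown))  # corpus has grown from 4 to 6 documents
```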
12:00pm – 12:30pm
The Importance of Importance: An Introduction to Feature Selection
Twenty-first century big data is being used to train predictive models of emotional sentiment, customer churn, patient health, and other behavioral complexities. Variable importance and feature selection reduce the dimensionality of our models, so that an otherwise infeasibly complex problem becomes more tractable.
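A minimal univariate feature-selection sketch (toy data and a deliberately simple score, not a production method): rank each feature by how far apart its class-conditional means sit, and keep only the top-scoring features.

```python
# Score each feature by the absolute difference in class-conditional means,
# then keep the indices of the k highest-scoring features.
def select_features(X, y, k):
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(sum(pos) / len(pos) - sum(neg) / len(neg)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

# Made-up data: feature 0 separates the classes, feature 1 is noise.
X = [[5.0, 0.1], [4.8, 0.2], [1.0, 0.15], [1.2, 0.12]]
y = [1, 1, 0, 0]
print(select_features(X, y, 1))  # [0]
```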
12:30pm – 1:30pm Lunch
Social Network Analysis Isn’t Just For People
The tools of social network analysis are based on mathematical network theory, and very little in these techniques actually requires that the data represent social activity. We'll show how they can be applied to data from areas such as geo, linguistics, and the Wikipedia link graph. We'll visualise and explore the data using Gephi, the “Photoshop for graphs”.
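That the metrics are purely structural can be seen in a few lines (a made-up miniature link graph; Gephi handles the visual exploration, but nothing in the math cares what the nodes are):

```python
from collections import Counter

# A tiny hypothetical Wikipedia link graph, treated as undirected edges.
edges = [
    ("Python", "Programming language"),
    ("Haskell", "Programming language"),
    ("Programming language", "Computer science"),
]

# Degree centrality: nothing here knows or cares that the nodes are
# articles rather than people.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

hub, _ = degree.most_common(1)[0]
print(hub)  # the best-connected article
```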
Array Theory vs. Set Theory in Managing Data
Relational databases were based on set theory, which insists that the order of items does not matter. For many (most?) data problems, however, order does matter. By building on array theory instead, a relational-style database gains a considerable advantage over set-theory-based engines.
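The order-sensitivity argument in one toy example (Python lists standing in for arrays, sets for relations; the data is made up):

```python
# A miniature price series: with array semantics, the ordering is data.
ticks = [101.2, 101.5, 101.1]

# Set semantics throw the ordering (and any duplicates) away...
as_set = set(ticks)

# ...but the interesting question -- how did the price move? -- needs it.
deltas = [b - a for a, b in zip(ticks, ticks[1:])]
print(deltas)  # roughly [0.3, -0.4]
```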
3:00pm – 3:30pm Break
3:30pm – 4:00pm
Survival Analysis for Cache Time-to-Live Optimization
We examine the effectiveness of a statistical technique known as survival analysis to optimize the cache time-to-live for hotel rates in a hotel rate cache. We describe how we collect and prepare nearly a billion records per day utilizing MongoDB and Hadoop. Finally, we show how this analysis is improving the operation of our hotel rate cache.
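The core of survival analysis here is an estimator such as Kaplan-Meier, sketched below on made-up data (this is a generic sketch, not the speakers' pipeline): a cached rate "dies" when the underlying rate changes, and entries still valid when observation stops are censored, not dead.

```python
from itertools import groupby

def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve.

    durations: how long each cache entry was tracked (hypothetical units).
    observed:  True if the rate changed (an event), False if censored.
    Returns [(t, S(t))] at each event time.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    curve, s = [], 1.0
    for t, group in groupby(data, key=lambda pair: pair[0]):
        group = list(group)
        events = sum(1 for _, obs in group if obs)
        if events:
            s *= 1 - events / at_risk  # survival drops only at event times
            curve.append((t, s))
        at_risk -= len(group)  # censored entries leave the risk set too
    return curve

# Three entries whose rates changed at t=1, 2, 3; one censored at t=2.5.
print(kaplan_meier([1, 2, 3, 2.5], [True, True, True, False]))
```

A curve like this suggests a time-to-live: pick the t at which S(t) falls below the staleness risk you are willing to tolerate.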
The Data Science Debate
Peter Skomoroch, Michael Driscoll, DJ Patil, Amy Heineike, Pete Warden, Toby Segaran
End the day by joining leading data scientists in debating the hot issues in the profession.
For information on exhibition and sponsorship opportunities at the conference, contact Susan Stewart at firstname.lastname@example.org.
For information on trade opportunities with O'Reilly conferences contact Kathy Yu at mediapartners
For media-related inquiries, contact Maureen Jennings at email@example.com