Strata 2011 Schedule

Below are the confirmed and scheduled talks at Strata 2011 (schedule subject to change).

Customize Your Own Schedule

Create your own Strata schedule using the personal scheduler function. Mark the tutorials, sessions, keynotes, and events you want to attend by clicking on the calendar icon next to each listing. Then click on "personal schedule" below to generate your own customized schedule.

Mission City M
11:30am MAD Skills: A Magnetic, Agile and Deep Approach to Scalable Analytics Brian Dolan (Discovix), Joe Hellerstein (Trifacta and UC Berkeley)
1:40pm Prototyping With Data Matt Biddulph (Product Club)
2:30pm Big Data, Lean Startup: Data Science on a Shoestring Philip (Flip) Kromer (CSC)
4:10pm New Developments in Large Data Techniques Joseph Turian (MetaOptimize)
5:00pm Google Cloud for Data Crunchers Patrick Chanezon (VMware), Ryan Boyd (Google)
7:45pm OpenTSDB: A Scalable, Distributed Time Series Database Benoit Sigoure (StumbleUpon, Inc.)
8:35pm Avro Data Doug Cutting (Cloudera)
Mission City B1
10:40am Crowdsourcing and the Democratization of Data Lukas Biewald (CrowdFlower)
11:30am Extracting Business Value from Semi-Structured Data Cindi Thompson (Deloitte Consulting LLP)
1:40pm Making Data Science a Sport Anthony Goldbloom (Kaggle)
2:30pm Real World Applications Panel: Healthcare and Medicine Andrew Odewahn (O'Reilly Media), Carol McCall (Tenzing Health), David Van Sickle (Asthmapolis), Jim Golden (Accenture), Indu Subaiya (Health 2.0)
4:10pm Real World Applications Panel: Machine Learning and Decision Support Samy Bengio (Google), Jonathan Seidman (Orbitz Worldwide), Robert Lancaster (Orbitz Worldwide), Alasdair Allan (Babilim Light Industries)
5:00pm Real World Applications Panel: Enterprise and Industry Kenneth Cukier (The Economist), Adam Hurwitz (Happtique), Jinesh Varia (Amazon Web Services), Mario Veiga Pereira (PSR)
7:45pm Real-Time Searching of Big Data with Solr and Hadoop Rod Cope (OpenLogic, Inc.)
8:35pm Esperwhispering: Using Esper to Find Problems in Real-time Data Theo Schlossnagle (OmniTI/Circonus)
Mission City B5
10:40am Telling Great Data Stories Online Jock Mackinlay (Tableau Software)
11:30am Designing For Infinity Dustin Kirk (Neustar)
2:30pm Building and Pricing the Open Data Marketplace Pete Soderling (Stratus Security), Pete Forde (BuzzData)
4:10pm Visualizing Shared, Distributed Data Roman Stanek (GoodData), Pete Warden (Jetpac), Alon Halevy (Google)
7:45pm Building Data Products with Hadoop Sam Shah (LinkedIn)
8:35pm Scaling Data Analysis with Apache Mahout Isabel Drost-Fromm (Apache Software Foundation/Nokia Gate 5 GmbH)
Mission City B4
10:40am Sponsored by EnterpriseDB
An In-Depth Look at How to Survive the Data Deluge: It's About Dollars and Sense Ed Boyajian (EnterpriseDB)
11:30am Sponsored by Digital Reasoning
Generating Dynamic Social Networks from Large Scale Unstructured Data Tim Estes (Digital Reasoning)
1:40pm Sponsored by Thomson Reuters
Human Expertise and Artificial Intelligence in Vertical Search Peter Jackson (Thomson Reuters)
2:30pm Sponsored by Pervasive
Supercharge Development and Performance of Hadoop Applications Davin Potts (Pervasive)
4:10pm Sponsored by Microsoft
Microsoft DataMarket: Leveraging cloud to deliver public domain and commercial data to millions Sudhir Hasbe (Microsoft), Bruno Aziza (Microsoft)
5:00pm Sponsored by EMC
Social Media Analytics Using Greenplum's Data Computing Appliance Johann Schleier-Smith (Tagged.com)
7:45pm Unleashing Twitter Data for Fun and Insight Matthew Russell (Digital Reasoning Systems)
12:10pm Lunch
Room: Mezzanine
Wednesday Lunchtime BoF Sessions
6:45pm Plenary
Room: Mission City Ballroom Foyer
Strata Science Fair
10:15am Morning Break: Sponsored by Microsoft
Room: Ballroom ABCD
3:10pm Afternoon Break: Sponsored by EnterpriseDB
Room: Ballroom ABCD
5:40pm Sponsor Pavilion Reception
Room: Ballroom ABCD
Sponsor Pavilion Reception
8:45am Plenary
Room: Mission City Ballroom
Opening Welcome Edd Dumbill (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
9:00am Plenary
Room: Mission City Ballroom
What Data Tells Us Hilary Mason (Accel Partners)
9:10am Plenary
Room: Mission City Ballroom
Privacy and Big Behavioral Data in the B2B Space James Powell (Thomson Reuters)
9:25am Plenary
Room: Mission City Ballroom
The Mythology of Big Data Mark Madsen (Third Nature)
9:35am Plenary
Room: Mission City Ballroom
Data Without Limits Werner Vogels (Amazon.com)
9:50am Plenary
Room: Mission City Ballroom
Data Everywhere: There Ought to Be a Marketplace for It Zane Adam (Microsoft Corp)
10:00am Plenary
Room: Mission City Ballroom
Delivering Big Data: A Conversation with Mike Olson and Rod Smith Abhishek Mehta (Tresata), Mike Olson (Cloudera), Rod Smith (IBM Emerging Internet Technologies)
10:10am Plenary
Room: Mission City Ballroom
The $3 Million Heritage Health Prize Anthony Goldbloom (Kaggle)
10:40am-11:20am (40m) Practitioner
Distilling Data Exhaust: How to Surface Insights and Build Products
Peter Skomoroch (Data Wrangling)
Learn how to leverage data exhaust, the digital byproduct of our online activities, to solve problems and discover insights about the world around you. We will walk through a real world example which combines several datasets and statistical techniques to discover insights and make predictions about attendees at O'Reilly Strata.
11:30am-12:10pm (40m) Practitioner
MAD Skills: A Magnetic, Agile and Deep Approach to Scalable Analytics
Brian Dolan (Discovix) et al
A discussion of Big Data approaches to analysis problems in marketing, forecasting, academia and enterprise computing. We focus on practices to enhance collaboration and employ rich statistical methods: a Magnetic, Agile and Deep (MAD) approach to analytics. While the approach is language-agnostic, we show that sophisticated statistics can be easily scaled in traditional environments like SQL.
1:40pm-2:20pm (40m) Practitioner
Prototyping With Data
Matt Biddulph (Product Club)
If you're a new startup looking for investment, or a team at a large company seeking the green light for a new product, nothing convinces like real running code. But how do you solve the chicken-and-egg problem of filling your early prototype with real data? We'll discuss how to use open datasets and public web APIs as a proxy for the final product while you're still in the development stage.
2:30pm-3:10pm (40m) Practitioner
Big Data, Lean Startup: Data Science on a Shoestring
Philip (Flip) Kromer (CSC)
How do you build a crack team of data scientists on a shoestring budget? In this 40-minute presentation from the co-founder of Infochimps, Flip Kromer will draw from his experiences as a teacher and his vast programming and data experience to share lessons learned in building a team of smart, enthusiastic hires.
4:10pm-4:50pm (40m) Practitioner
New Developments in Large Data Techniques
Joseph Turian (MetaOptimize)
Certain recent academic developments in large data have immediate and sweeping applications in industry. They offer forward-thinking businesses the opportunity to achieve technical competitive advantages. However, these little-known techniques have not been discussed outside academia until now. What if you knew about important new large data techniques that your competition doesn't yet know about?
5:00pm-5:40pm (40m) Practitioner
Google Cloud for Data Crunchers
Patrick Chanezon (VMware) et al
Many of the tools Google created to store, query, analyze, and visualize data are exposed to external developers. This talk will give you an overview of Google services for Data Crunchers: Google Storage for developers, BigQuery, Machine Learning API, App Engine, and Visualization API.
7:45pm-8:30pm (45m) Practitioner
OpenTSDB: A Scalable, Distributed Time Series Database
Benoit Sigoure (StumbleUpon, Inc.)
OpenTSDB is an open-source, distributed time series database designed to monitor large clusters of commodity machines at an unprecedented level of granularity. OpenTSDB allows operation teams to keep track of all the metrics exposed by operating systems, applications and network equipment, and makes the data easily accessible.
8:35pm-9:20pm (45m) Practitioner
Avro Data
Doug Cutting (Cloudera)
Apache Avro provides an expressive, efficient standard for representing large data sets. Avro data is programming-language neutral and MapReduce-friendly. Hopefully it can replace gzipped CSV-like formats as a dominant format for data.
10:40am-11:20am (40m) Disruption & Opportunity
Crowdsourcing and the Democratization of Data
Lukas Biewald (CrowdFlower)
Topics for any discipline that focuses on quantitative or technical data have always depended on the datasets that were available at the time. Crowdsourcing has changed that — democratizing the data-collection process and cutting researchers’ reliance on stagnant, overused datasets. Tools like Amazon Mechanical Turk allow anyone to gather data overnight, rather than waiting years.
11:30am-12:10pm (40m) Disruption & Opportunity
Extracting Business Value from Semi-Structured Data
Cindi Thompson (Deloitte Consulting LLP)
Much useful business data is in "semi-structured" form: government filings, insurance claims, customer comment forms, etc. Although most search tools don't take advantage of it, knowing a little structure goes a long way. This talk will show how semi-structured data can be interpreted, summarized, and applied to produce business value in several real-life examples.
1:40pm-2:20pm (40m) Disruption & Opportunity
Making Data Science a Sport
Anthony Goldbloom (Kaggle)
Data modeling competitions allow companies and researchers to post a problem and have it scrutinised by the world's best data scientists. By exposing a problem to a wide audience, competitions are a great way to get the most out of a dataset. In just a few months, Kaggle's competitions have helped to progress the state of the art in chess ratings and HIV research.
2:30pm-3:10pm (40m) Real World
Real World Applications Panel: Healthcare and Medicine
Andrew Odewahn (O'Reilly Media) et al
Information is changing healthcare forever. From the study of epidemics, to machine learning that can improve diagnosis, to the sequencing of the human genome, we're doing the math of life itself. This panel of practitioners will show us what they're doing in healthcare, pharmaceuticals, and genomics, and how it will change the way we discover, treat, and eliminate disease.
4:10pm-4:50pm (40m) Real World
Real World Applications Panel: Machine Learning and Decision Support
Samy Bengio (Google) et al
Can machines help us make better decisions? In this panel, real-world practitioners from the travel, finance, and energy industries give us an inside look at how they're applying machine learning to their industries, optimizing the use of resources and helping with decision support.
5:00pm-5:40pm (40m) Real World
Real World Applications Panel: Enterprise and Industry
Kenneth Cukier (The Economist) et al
Join practitioners from a range of industries to learn how they're putting new tools and massive data sets to work. We'll hear how music, geophysics, and the legal system are all changing by putting huge, rich information into the hands of business.
7:45pm-8:30pm (45m) Practitioner
Real-Time Searching of Big Data with Solr and Hadoop
Rod Cope (OpenLogic, Inc.)
Hadoop and HBase make it easy to store terabytes of data, but how do you scale your search mechanism to sift through these mountains of bits and retrieve large result sets in a matter of milliseconds? Careful use of the Solr search server, based on Lucene, made these requirements come to life in our production environment. Come learn how we query terabytes of data in a highly available system.
8:35pm-9:20pm (45m) Practitioner
Esperwhispering: Using Esper to Find Problems in Real-time Data
Theo Schlossnagle (OmniTI/Circonus)
With thousands of datapoints per second from nodes around the world, how can you tell when something isn't right? The bottom line is: it's hard, but with the right tools it is achievable.
10:40am-11:20am (40m) Interfaces
Telling Great Data Stories Online
Jock Mackinlay (Tableau Software)
Interactive visualizations have become the new media for telling stories online. This session will focus on going from a good visualization to a great visualization by focusing on organization, user interface, and formatting. You should expect to leave this session confident in your ability to consistently create excellent interactive visuals.
11:30am-12:10pm (40m) Interfaces
Designing For Infinity
Dustin Kirk (Neustar)
When faced with endless data and the need to manage it, there are a variety of proven design patterns that will help designers create usable, efficient, and effective interfaces. From distributing workload to reducing sensory overload, we’ll review the techniques that are enabling the highly scalable user interfaces of today and tomorrow.
1:40pm-2:20pm (40m) Interfaces
Small is the New Big: Lessons in Visual Economy
Kim Rees (Periscopic)
While the majority of charts were designed to handle a variety of data, there is a certain novelty in presenting data in a very succinct way. By designing a presentation method restricted to specific data points, we can realize an economy of space and interface.
2:30pm-3:10pm (40m) The Data Business
Building and Pricing the Open Data Marketplace
Pete Soderling (Stratus Security) et al
The state of open data today is a real mess. It's very difficult to find the data you need and be confident that it's timely and accurate. There is a growing list of companies now vying to become the key destinations for people to gather around new datasets and be excited together. What projects, partnerships and even ventures would be created if there was a marketplace for data?
4:10pm-4:50pm (40m) Interfaces
Visualizing Shared, Distributed Data
Roman Stanek (GoodData) et al
"Many hands make light work", as the saying goes. That's true when thousands of people can collaborate on a data set. In this session, we'll look at collective interfaces that allow many distributed users to examine and share data with one another, and how that's changing traditional desktop visualization tools.
5:00pm-5:40pm (40m) Disruption & Opportunity
Wolfram|Alpha: Answering Questions with the World's Factual Data
Joshua Martell (Wolfram|Alpha)
The world's available scientific and factual data is growing at an alarming pace, but how do we use all this information? How do we incorporate it into our decision making process? Joshua Martell will give an inside look into how Wolfram|Alpha works and what it takes to make data "computable", understand user input, and present meaningful results.
7:45pm-8:30pm (45m) Practitioner
Building Data Products with Hadoop
Sam Shah (LinkedIn)
How do you go about building a product around data using Hadoop? This talk will present how LinkedIn builds and maintains such features as People You May Know. We will present our architecture for doing so (open-sourced) as well as knowledge we've gained in the process.
8:35pm-9:20pm (45m) Practitioner
Scaling Data Analysis with Apache Mahout
Isabel Drost-Fromm (Apache Software Foundation/Nokia Gate 5 GmbH)
With growing amounts of digital data at the fingertips of software developers, the need for a scalable, easy-to-use framework is tremendous. This talk introduces Apache Mahout - a project with the goal of implementing scalable machine learning algorithms for the masses.
10:40am-11:20am (40m)
An In-Depth Look at How to Survive the Data Deluge: It's About Dollars and Sense
Ed Boyajian (EnterpriseDB)
The move to cloud infrastructure and the need to handle big data have created the perfect catalysts for organizations to introduce new infrastructure software and break ties from their expensive incumbent vendors. Ed will share a detailed strategy on how to leverage open source database solutions like PostgreSQL to contain database cost and free budget for other, more valuable initiatives.
11:30am-12:10pm (40m)
Generating Dynamic Social Networks from Large Scale Unstructured Data
Tim Estes (Digital Reasoning)
Developing a social network map is fundamental to comprehensively understanding a person. Social networks are dynamic and better derived from real-world data than static configurations. However, the vast majority of this real-world data is unstructured. This presentation will show how Synthesys uses very large scale unstructured data to create social network maps for reporting and further analysis.
1:40pm-2:20pm (40m)
Human Expertise and Artificial Intelligence in Vertical Search
Peter Jackson (Thomson Reuters)
Our talk summarizes some recent thinking in the field of vertical search and illustrates it in the context of a new version of Westlaw, called WestlawNext. We argue that getting the right allocation of function between person and machine is the key to making specialist content more findable and search results more understandable.
2:30pm-3:10pm (40m)
Supercharge Development and Performance of Hadoop Applications
Davin Potts (Pervasive)
This session explores how to get more done, faster with high-performance Map/Reduce and expand the universe of Hadoop possibilities with tools to speed and simplify development and deployment of analytic applications.
4:10pm-4:50pm (40m)
Microsoft DataMarket: Leveraging cloud to deliver public domain and commercial data to millions
Sudhir Hasbe (Microsoft) et al
Windows Azure Marketplace includes data, imagery, and real-time web services from leading commercial data providers and authoritative public data sources. Customers have access to datasets such as demographic, environmental, financial, retail, weather and sports.
5:00pm-5:40pm (40m)
Social Media Analytics Using Greenplum's Data Computing Appliance
Johann Schleier-Smith (Tagged.com)
Social media websites are producing ginormous amounts of data and creating a massive demand for insight related to users, how they engage with features, where they are coming from, why they are visiting, what excites them, and so forth.
7:45pm-8:30pm (45m) Practitioner
Unleashing Twitter Data for Fun and Insight
Matthew Russell (Digital Reasoning Systems)
This talk demonstrates how an eclectic blend of storage, analysis, and visualization techniques can be used to gain a lot of serious insight from Twitter data, but also to answer fun questions such as "What do Justin Bieber and the Tea Party have (and not have) in common?"
8:35pm-9:20pm (45m) Practitioner
Riak Core: Scalable, Highly-Available Distributed Systems Infrastructure
Justin Sheehy (Basho Technologies)
Riak Core is a general implementation of a distributed systems model, enabling you to build a customized, scalable, highly-available distributed system without too huge an investment. Justin will explain that model, its history, and how it can be used to build new data processing systems.
12:10pm-1:40pm (1h 30m)
Wednesday Lunchtime BoF Sessions
Birds of a Feather (BoF) sessions provide face to face exposure to those interested in the same projects and concepts. BoFs can be organized for individual projects or broader topics (best practices, open data, standards). BoF topics are entirely up to you. Wednesday's Lunchtime BoF sessions will happen on the hotel side of the Hyatt Regency, Mezzanine Level.
6:45pm-7:45pm (1h)
Strata Science Fair
As part of Strata, we'll be holding a Science Fair. It's a place to demonstrate cutting-edge technologies and cool toys — the more hands-on, the better. Whether it's software that breaks the rules of computing, a compelling new interface, or a prototype that pushes the envelope, we want to see it.
10:15am-10:40am (25m)
Morning Break: Sponsored by Microsoft
3:10pm-4:10pm (1h)
Afternoon Break: Sponsored by EnterpriseDB
5:40pm-6:40pm (1h)
Sponsor Pavilion Reception
Join us in the Sponsor Pavilion immediately following sessions on Wednesday, February 2. Have a drink and some delectable nibbles, network with other Strata attendees, and visit our Sponsors who are at the leading edge of the data conversation.
8:45am-9:00am (15m)
Opening Welcome
Edd Dumbill (Silicon Valley Data Science) et al
Edd Dumbill and Alistair Croll welcome you to Strata.
9:00am-9:10am (10m)
What Data Tells Us
Hilary Mason (Accel Partners)
Data science is evolving rapidly. I'll talk about our current and slightly future technical and philosophical challenges, including realtime vs non-realtime analysis, streams of data vs traditional databases, and some of the opportunities we have to learn amazing things about the world through our data and what this means for those of us who are immersed in working with it.
9:10am-9:25am (15m)
Privacy and Big Behavioral Data in the B2B Space
James Powell (Thomson Reuters)
Ours is a new era of big behavioral data. Unprecedented business model experimentation is rapidly eroding individual privacy despite rising consumer concerns. Successfully managing privacy is a key differentiator for services providers. In the B2B space, the stakes to get privacy right are even higher. This talk will discuss the implications of privacy in order to succeed in the B2B space.
9:25am-9:35am (10m)
The Mythology of Big Data
Mark Madsen (Third Nature)
Big data and analytics have developed a mythology rooted in underlying assumptions. We need to ignore these myths and think clearly about how organizations use data, which means understanding how people use information and make decisions.
9:35am-9:50am (15m)
Data Without Limits
Werner Vogels (Amazon.com)
The new data centricity means that we have to rethink how we collect, store, manage, analyze and share our data, as all these processes now require limitless resources. This talk will focus on the changes in infrastructure requirements to support the new world and how innovations are removing barriers for companies to be successful.
9:50am-10:00am (10m)
Data Everywhere: There Ought to Be a Marketplace for It
Zane Adam (Microsoft Corp)
Zane Adam from Microsoft speaks about the Azure Data Marketplace.
10:00am-10:10am (10m)
Delivering Big Data: A Conversation with Mike Olson and Rod Smith
Abhishek Mehta (Tresata) et al
The tools we use play a key role in how we use and respond to big data. Hear about the changes being led by key architects of future big data systems.
10:10am-10:15am (5m)
The $3 Million Heritage Health Prize
Anthony Goldbloom (Kaggle)
Data competitions come of age: from movie recommendations to life and death. Possibly the biggest news at Strataconf is Heritage Provider Network's $3 million predictive modeling prize - the biggest data mining competition ever. It requires data scientists to build algorithms that predict who will go to hospital in the next year, so that preventive action can be taken.

Sponsors

  • Thomson Reuters
  • EMC Data Computing Division
  • EnterpriseDB
  • Microsoft
  • Gnip
  • Rackspace Hosting
  • IBM
  • Windows Azure MarketPlace DataMarket
  • Amazon Mechanical Turk
  • Amazon Web Services
  • Aster Data
  • Cloudera
  • Clustrix
  • DataStax, Inc. (formerly Riptano, Inc.)
  • Digital Reasoning Systems
  • Heritage Provider Network
  • Impetus
  • Jaspersoft
  • Karmasphere
  • LinkedIn
  • MarkLogic
  • Pentaho
  • Pervasive
  • Revolution Analytics
  • Splunk
  • Urban Mapping
  • Wolfram|Alpha
  • Esri
  • ParAccel
  • Tableau Software

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the conference, contact Susan Young at syoung@oreilly.com

Download the Strata Sponsor/Exhibitor Prospectus

Contact Us

View a complete list of Strata Contacts