Strata New York Speaker Slides & Video

Presentation slides will be made available after the session has concluded and the speaker has given us the files. Check back if you don't see the file you're looking for—it might be available later! (However, please note some speakers choose not to share their presentations.) Also, check out the presentation files from the 2011 California edition of Strata.

Jer Thorp (The New York Times)
In this presentation, Jer Thorp will discuss his work with names--designing an arrangement algorithm for the 9/11 Memorial in Manhattan. He’ll walk through collaborative processes, admit to a series of failures and ultimately show how humans and software can combine to solve extraordinary problems.
Jon Jenkins (NASA)
The Kepler Mission began its science observations just over two years ago on May 12, 2009, initiating NASA’s first search for Earth-like planets. Initial results and light curves from Kepler are simply breathtaking, including confirmation of the first unquestionable rocky planet, Kepler-10b, and Kepler-11, a system of 6 transiting planets orbiting one Sun-like star.
Elissa Fink (Tableau Software)
Sometimes the hardest part about making a viz is knowing where to start. Check out the winning vizzes from the Strata/Tableau Data Visualization Contest and get inspired to create your own beautiful visualizations.
Paul Brown (Paradigm4 Inc.)
The science and commercial worlds share requirements for a high performance informatics platform to support collection, curation, collaboration, exploration, and analysis of massive datasets. SciDB is an open source analytical database that provides better analytical performance than relational databases and supports key features such as provenance and versioning.
Peter Sirota (Amazon Web Services), Justin Moore (Facebook)
This session will address specific use cases relevant to customers with big data needs. We will highlight customers already successfully utilizing this service as well as showcase top scenarios and explain why it makes sense to leverage the cloud for Big Data needs.
Ken Bado (MarkLogic)
Big Data is more than just volume and velocity. MarkLogic CEO Ken Bado will address why complexity is the key gotcha for organizations trying to outflank their competition by managing Big Data in real time. Learn how winners today are using MarkLogic to manage the complexity of their unstructured information to drive revenue and results.
Jon Jenkins (NASA)
The Kepler spacecraft launched on March 7, 2009, initiating NASA's first search for Earth-size planets orbiting Sun-like stars, with stunning results after being on the job for just over two years. Designing and building the Kepler science pipeline software that processes and analyzes the resulting data to make the discoveries presented a daunting set of challenges.
Jeremy Howard (Kaggle)
'Crowdsourcing big data' might sound like a randomly generated selection of buzz words, but it turns out to represent a powerful leap forward in the accuracy of predictive analytics. This session will explore the reasons why this is the case, using case studies from the fields of astronomy, sports ratings systems and tourism forecasting.
Scott Nicholson (Poynt)
Economists utilize a data analysis toolkit and intuition that can be very helpful to Data Scientists. In particular, econometric methods are quite useful in disentangling correlation and causation, a use case not well-handled by standard machine learning and statistical techniques. This session will cover examples of econometric methods in action, as well as other economics-related insights.
Hjalmar Gislason (DataMarket)
Presentation: external link
Statistics, math and data analysis would easily make most people's "Top 10 Most Boring Topics" list. But an effective data visualization can bring new insights, raise awareness and tell a great story. We want to share our insights from efforts to enable data visualizations on top of massive amounts of data, explore good - and bad - examples and share some of the tools and techniques we use.
Rachel Sterne (City of New York)
From hackathons to API-enabled civic data, learn how New York City government is evolving thanks to deeper engagement with the technology community.
Tyler Bell (Factual), Leo Polovets (Factual)
Factual creates canonical reference sets of 40 million entities from over 2.5 billion fragmentary inputs. This talk explains the Hadoop-based science of our approach combined with what we believe to be a necessary art -- the application of domain-specific knowledge -- in creating pragmatic data services.
A jumpstart lesson on how to get from a blank page and a pile of data to a useful data visualization. We'll focus on the design process, not specific tools. The talk will include discussion of figuring out what story to tell, selecting the right data, and picking appropriate encodings. We'll briefly discuss tools and visualization styles, and look at several examples.
Bill Schmarzo (EMC Consulting)
“Big data” provides the opportunity to combine new, rich data sources in novel ways to discover business insights. How do you use analytics to exploit this data so that it will yield real business value? Learn a proven technique that ensures you identify where and how big data analytics can be successfully deployed within your organization. Case study examples will demonstrate its use.
Jake Porway (DataKind), Drew Conway (IA Ventures)
Data scientists and technology companies are rapidly recognizing the immense power of data for drawing insights about their impact and operations, yet NGOs and non-profits are increasingly being left behind with mounting data and few resources to make use of it.
Robert Munro (Idibon)
We will talk about Global Viral Forecasting's 'EpidemicIQ' project, which tracks all the globe's known and potential disease outbreaks. It is the largest humanitarian application of machine-learning and crowdsourcing to date, dynamically adapting to new threats and data sources in real-time.
Mark Madsen (Third Nature)
The first person to conceive of something is usually not the first. They're the first to re-conceive at a point where the current technology caught up to someone else's idea. We're at a point today where many old ideas are being reinvented. Hear why looking to the past, beyond your core field of interest, is worthwhile.
Ryan Boyd (Google), Chris Schalk (Google)
Google is a data business: over the past few years, many of the tools Google created to store, query, analyze, and visualize its data have been exposed to developers as services. This talk will give you an overview of Google services for data crunchers.
Amr Awadallah (Cloudera, Inc.)
Dr. Amr Awadallah, CTO at Cloudera, illustrates how Apache Hadoop is changing the business intelligence data stack, and how the evolving architecture delivers advanced capabilities for solving key business challenges. By enabling the complete value to be derived from both unstructured and structured data, organizations are able to ask and get answers to previously un-addressable big questions.
Chris van der Walt (United Nations Global Pulse), Dane Petersen (Adaptive Path), Sara Farmer (UN Global Pulse)
United Nations Global Pulse and Adaptive Path have been collaborating on a new global crisis impact tool called HunchWorks that allows experts to post hypotheses about emerging crises and crowdsource verification. The presentation will focus on lessons learned from a complex project that combines human expertise and big data algorithms using human-centered design and assistive intelligence.
Ben Gimpert (Altos Research)
All big data models are wrong but some are useful, as George Box might have said. Models are not the end result of a big data architecture, but exploratory tools in their own right. They are most useful when data scientists try to understand the business, and when our users learn a bit about data. How can the actual process of modeling improve a big data system, and teach the organization?
Tim Moreton (Acunu)
At the heart of every system that harnesses big data is a pipeline that comprises collecting large volumes of raw data, extracting value from it through analytics or data transformations, then delivering that condensed set of results back out -- potentially to millions of users. This talk examines the challenges of building manageable, robust pipelines.
Hilary Mason (Accel Partners)
The flow of data across the social web tells us what people, around the world, are paying attention to at any given moment. Understanding this flow is both a mathematical and a human problem, as we develop and adapt techniques to find stories in the data. Come hear about the expected and the surprises in the bitly data, as well as generalized techniques that apply to any 'realtime' data system.
Randy Lea (Teradata Corporation)
This session will show you how you can bring the science of data to the art of business and empower more business users and analysts to operationalize insights and drive results.
Ken Farmer (ProtectWise)
While most of the focus in data science is on the rapid analysis of vast volumes of data, the hardest part of most solutions is the data acquisition, movement, transformation, and loading - the "data logistics". This presentation will describe the common challenges and solutions - including the best and worst practices that can be reused from Data Warehousing.
John Rauser (Pinterest)
Quantitative Engineer? Business Intelligence Analyst? Data Scientist? The data deluge has come upon us so quickly that we don't even know what to call ourselves, much less how to make a career of working with data. This talk examines the critical traits that lead to success by looking back to what may be the first act of data science.

Sponsors

  • Aster Data
  • EMC Greenplum
  • GE
  • Lexis Nexis
  • MarkLogic
  • Tableau Software
  • Cloudera
  • DataStax
  • Informatica
  • DataSift
  • Splunk
  • Amazon Web Services
  • Datameer
  • Impetus
  • Karmasphere
  • MapR Technologies
  • Pervasive
  • Platform Computing
  • Revolution Analytics
  • Sybase
  • Xeround
  • Media-Science
  • Platfora

Sponsorship Opportunities

For information on sponsorship opportunities at the conference, contact Susan Stewart at sstewart@oreilly.com

Press & Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

Contact Us

View a complete list of Strata Contacts