Strata 2011 Speaker Slides & Videos

Presentation slides will be made available after the session has concluded and the speaker has given us the files. Check back if you don't see the file you're looking for; it might be available later. (However, please note that some speakers choose not to share their presentations.) After the conference ends, all recorded video from Strata 2011 will be available on YouTube and at strataconf.blip.tv.

J. C. Herz (Batchtags LLC)
This presentation lays bare the dark underbelly of analytics in the enterprise. Drawing on darkly humorous experiences, the speaker will explain why executives treat analytics as an occult phenomenon. The talk will give executives the mental tools to separate strategically valuable analytics projects from fishing expeditions, and provide litmus tests to keep the witch doctors honest.
Jonathan Ellis (DataStax)
Apache Cassandra is a second-generation distributed database originally open-sourced by Facebook. Its write-optimized shared-nothing architecture results in excellent performance and scalability. This tutorial will cover application design with Cassandra through a series of exercises with Twissandra, a simple Twitter clone written in Python and Django.
Sam Shah (LinkedIn)
How do you go about building a product around data using Hadoop? This talk will present how LinkedIn builds and maintains features such as People You May Know. We will present our (open-sourced) architecture for doing so, as well as the lessons we've learned in the process.
Carol McCall (Tenzing Health)
In 2001, the Institute of Medicine declared that “between the care we have and the care we could have lies not just a gap, but a chasm,” yet nothing’s really changed. Healthcare remains one of the most richly endowed yet poorly equipped knowledge industries anywhere. Using real-world examples, we’ll see how BIG DATA may be just what the doctor ordered, but only if we pick the right problems.
This tutorial describes how to draw clear, concise, accurate graphs that are easier to understand than many of the graphs one sees today. The tutorial emphasizes how to avoid common mistakes that produce confusing or even misleading graphs. Graphs for one, two, three, and many variables are covered as well as general principles for creating effective graphs.
Zane Adam (Microsoft Corp)
Zane Adam from Microsoft speaks about the Azure Data Marketplace.
Werner Vogels (Amazon.com)
The new data centricity forces us to rethink how we collect, store, manage, analyze, and share our data, as all these processes now require limitless resources. This talk will focus on the changes in infrastructure requirements needed to support this new world and on how innovations are removing the barriers to companies' success.
Abhishek Mehta (Tresata), Mike Olson (Cloudera), Rod Smith (IBM Emerging Internet Technologies)
The tools we use play a key role in how we use and respond to big data. Hear about the changes being led by key architects of future big data systems.
Dustin Kirk (Neustar)
When faced with endless data and the need to manage it, designers can draw on a variety of proven design patterns to create usable, efficient, and effective interfaces. From distributing workload to reducing sensory overload, we’ll review the techniques that are enabling the highly scalable user interfaces of today and tomorrow.
Mark Madsen (Third Nature)
There has been an explosion in database technology designed to handle big data and deep analytics from both established vendors and startups. This session will provide a quick tour of the primary technology innovations and systems powering the analytic database landscape.
Simon Rogers (Guardian)
90,000 items on Afghanistan, 291,000 on Iraq, and another 251,000 cables. Managing the WikiLeaks release is just one of the huge data journalism projects the Guardian's data team has embarked on. This talk will look at how journalists can make sense of data and get stories out of it, and at our role in supplying open data to the world.
Tim Estes (Digital Reasoning)
Developing a social network map is fundamental to comprehensively understanding a person. Social networks are dynamic and are better derived from real-world data than from static configurations. However, the vast majority of this real-world data is unstructured. This presentation will show how Synthesys uses very large-scale unstructured data to create social network maps for reporting and further analysis.
Patrick Chanezon (VMware), Ryan Boyd (Google)
Many of the tools Google created to store, query, analyze, and visualize data are exposed to external developers. This talk will give you an overview of Google services for data crunchers: Google Storage for Developers, BigQuery, Machine Learning API, App Engine, and Visualization API.
Peter Jackson (Thomson Reuters)
Our talk summarizes some recent thinking in the field of vertical search and illustrates it in the context of a new version of Westlaw, called WestlawNext. We argue that getting the right allocation of function between person and machine is the key to making specialist content more findable and search results more understandable.
DJ Patil (Greylock Partners)
Details coming soon.
Brian Dolan (Discovix), Joe Hellerstein (Trifacta and UC Berkeley)
A discussion of Big Data approaches to analysis problems in marketing, forecasting, academia and enterprise computing. We focus on practices to enhance collaboration and employ rich statistical methods: a Magnetic, Agile and Deep (MAD) approach to analytics. While the approach is language-agnostic, we show that sophisticated statistics can be easily scaled in traditional environments like SQL.
Data modeling competitions allow companies and researchers to post a problem and have it scrutinised by the world's best data scientists. By exposing a problem to a wide audience, competitions are a great way to get the most out of a dataset. In just a few months, Kaggle's competitions have helped to progress the state of the art in chess ratings and HIV research.
Sudhir Hasbe (Microsoft), Bruno Aziza (Microsoft)
Windows Azure Marketplace includes data, imagery, and real-time web services from leading commercial data providers and authoritative public data sources. Customers have access to datasets such as demographic, environmental, financial, retail, weather and sports.
Joseph Turian (MetaOptimize)
Certain recent academic developments in large data have immediate and sweeping applications in industry. They offer forward-thinking businesses the opportunity to achieve a technical competitive advantage. However, these little-known techniques have not been discussed outside academia, until now. What if you knew about important new large data techniques that your competitors don't yet know about?
Edd Dumbill (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
Edd Dumbill and Alistair Croll welcome you to Strata.
Benoit Sigoure (StumbleUpon, Inc.)
OpenTSDB is an open-source, distributed time series database designed to monitor large clusters of commodity machines at an unprecedented level of granularity. OpenTSDB allows operation teams to keep track of all the metrics exposed by operating systems, applications and network equipment, and makes the data easily accessible.
Ben Werther (DataStax (formerly Riptano))
If you are a leading enterprise or web company, then two things are almost certainly true. Data is the lifeblood of your business. And you face an ever-increasing need to scale your applications and data services.
Alistair Croll (Solve For Interesting), Toby Segaran (Google), Amber Case (Geoloqi), Bradford Cross (Flightcaster) Moderated by: Alistair Croll
The convergence of big, open data, ubicomp, and new interfaces will change the way humans work, play, learn, and love. It's a slow transformation that happens one tweet, one blog, and one game at a time, but it's also an inexorable road towards the singularity. In this panel discussion, we'll look beyond the bytes and algorithms to think about humanity awash in a sea of information.
Benjamin Black (Boundary)
The rise of sensor network data and the expectation of low-latency query responses combine to make available databases and storage platforms obsolete. We have built a platform for web-scale OLAP, and in this talk I will cover how we made our infrastructure capable of real-time update and query performance over hundreds of terabytes of multidimensional data.
James Powell (Thomson Reuters)
Ours is a new era of big behavioral data. Unprecedented business model experimentation is rapidly eroding individual privacy despite rising consumer concerns. Successfully managing privacy is a key differentiator for service providers. In the B2B space, the stakes for getting privacy right are even higher. This talk will discuss what privacy means for companies that want to succeed in the B2B space.
Samy Bengio (Google), Jonathan Seidman (Orbitz Worldwide), Robert Lancaster (Orbitz Worldwide), Alasdair Allan (Babilim Light Industries)
Can machines help us make better decisions? In this panel, real-world practitioners from the travel, finance, and energy industries give us an inside look at how they're applying machine learning to their industries, optimizing the use of resources and helping with decision support.
Rod Cope (OpenLogic, Inc.)
Hadoop and HBase make it easy to store terabytes of data, but how do you scale your search mechanism to sift through these mountains of bits and retrieve large result sets in a matter of milliseconds? Careful use of the Solr search server, based on Lucene, made these requirements come to life in our production environment. Come learn how we query terabytes of data in a highly available system.
Kevin Weil (Twitter, Inc.)
Most analytics systems rely on large offline computations, which means results arrive hours or days behind. Twitter is all about realtime, but with over 160 million users producing over 90 million tweets per day, we need realtime analytics that scale horizontally. This talk discusses the development of that infrastructure, as well as the products we are beginning to build on top of it.
Brian Wilson (Thomson Reuters)
New technologies are driving a new era of global collaboration among scientists and researchers. Digital scholarship, the ability to create, collect, publish and collaborate in new digital mediums, is driving the exponential growth of data related to scholarly research. This talk will highlight evolving strategies used to appraise and predict success of institutions and researchers.
Marilyn Craig (Logitech), Terence Craig (PatternBuilders)
Retailers and their suppliers have always operated on the cutting edge of data science. In fact, this industry is responsible for many of the technology advances that have contributed to the exponential growth of data, analytics, and related technology. This session covers the history of data science in retail, current trends, and explores future directions in the “big” data age.
Kim Rees (Periscopic)
While most charts are designed to handle a variety of data, there is a certain novelty in presenting data very succinctly. By designing a presentation method restricted to specific data points, we can realize an economy of space and interface.
Johann Schleier-Smith (Tagged.com)
Social media websites are producing ginormous amounts of data and creating a massive demand for insight related to users, how they engage with features, where they are coming from, why they are visiting, what excites them, and so forth.
Stephen Sorkin (Splunk), Narayan Bharadwaj (ClearStory Data)
From customer behaviors and usage statistics to security postures and operational analytics, Splunk's ability to make sense of all types of machine data, structured or unstructured, and mash it up with other business data provides complete real-time visibility and operational intelligence. This tutorial demos a new approach to analyzing your organization's petabytes of data to derive real-time insights.
Data competitions come of age: from movie recommendations to life and death. Possibly the biggest news at Strataconf is Heritage Provider Network's $3 million predictive modeling prize - the biggest data mining competition ever. It requires data scientists to build algorithms that predict who will go to hospital in the next year, so that preventive action can be taken.
Barry Devlin (9sight Consulting)
With Big Data comes Big Promises. Mine the blogosphere and discover the secret of eternal wealth. Feast on the Twitter feeds for the wisdom of the ages. We have visited this land in the past, naming it data warehousing and business intelligence. Will we learn the lessons of history? Can we do it differently today? Let’s take this present moment to review the past and imagine the future.
Barry Devlin (9sight Consulting)
For more than 20 years now, data warehousing has put manners on unruly enterprise data. Yet, physics tells us that disorder inexorably increases unless we endlessly fight it. As information volumes and types explode into chaos, is it time to declare the warehouse dead? Or we could move from classical to quantum physics and create a new information architecture. It’s time to make some new choices…
Mark Madsen (Third Nature)
Big data and analytics have developed a mythology rooted in underlying assumptions. We need to ignore these myths and think clearly about how organizations use data, which means understanding how people use information and make decisions.
Hilary Mason (Accel Partners)
Data science is evolving rapidly. I'll talk about our current and near-future technical and philosophical challenges, including realtime vs. non-realtime analysis, streams of data vs. traditional databases, and some of the opportunities we have to learn amazing things about the world through our data, and what this means for those of us who are immersed in working with it.
Ed Boyajian (EnterpriseDB)
Companies must choose to spend their money and time on the right software initiatives. With exploding volumes of critical data, gaining new insight and mastery over business operations demands new investments in BI at multiple levels. Ed will show a proven path to avoiding exorbitant database software fees and shifting that spending to areas like BI, where you can realize a stronger ROI.
John Fritz (AMD)
Big Data and predictive analytics can deliver incredible insight that can be used for purposes both good and not so good. Drawing on real-world examples, this session will examine the fine line between competitive advantage and bad behavior, and the implications for a complex cast of stakeholders. Let’s begin a dialog on ethics now instead of waiting for our first major crisis.
Joshua Martell (Wolfram|Alpha)
The world's available scientific and factual data is growing at an alarming pace, but how do we use all this information? How do we incorporate it into our decision-making process? Joshua Martell will give an inside look into how Wolfram|Alpha works: what it takes to make data "computable", understand user input, and present meaningful results.
Scott Yara (Greenplum, a division of EMC)
A defining characteristic of modern life is the incredible proliferation of digital information. The Economist estimates that the amount of information created each year is growing at a 60% compounded rate. According to the Harvard Business Review, we humans generated more data last year than in all of previous human history.

Sponsors

  • Thomson Reuters
  • EMC Data Computing Division
  • EnterpriseDB
  • Microsoft
  • Gnip
  • Rackspace Hosting
  • IBM
  • Windows Azure Marketplace DataMarket
  • Amazon Mechanical Turk
  • Amazon Web Services
  • Aster Data
  • Cloudera
  • Clustrix
  • DataStax, Inc. (formerly Riptano, Inc.)
  • Digital Reasoning Systems
  • Heritage Provider Network
  • Impetus
  • Jaspersoft
  • Karmasphere
  • LinkedIn
  • MarkLogic
  • Pentaho
  • Pervasive
  • Revolution Analytics
  • Splunk
  • Urban Mapping
  • Wolfram|Alpha
  • Esri
  • ParAccel
  • Tableau Software

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the conference, contact Susan Young at syoung@oreilly.com
