Speaker Slides and Video

Presentation slides will be made available after the session has concluded and the speaker has given us the files. Check back if you don't see the file you're looking for—it might be available later! (However, please note some speakers choose not to share their presentations.)

Slides:   1-PDF 
The 2013 update to IBM’s Big Data Analytics Survey examines in depth the key components (people, process, technology, culture, leadership and governance) required for organizations to excel at deriving value from their information assets (structured/unstructured, streaming/static, big data/little data) in a digital landscape that includes big data, mobile and cloud technologies.
Boyd Davis (Intel)
At Intel, we envision a future in which every organization in the world can use new sources of data to enhance its operational intelligence, fostering discoveries and innovation in science, industry, and medicine.
Oscar Boykin (Twitter)
Slides:   1-PDF 
Abstractions are what enable us to think clearly about complex systems. In this talk, we will see how some simple abstractions, such as Monoids, can be used to pattern analytics platforms.
Ted Dunning (MapR)
Slides:   1-PDF 
There are many practical details involved in building an anomaly detection system. In this presentation, I will describe the major classes of these systems, and show you how to build anomaly detection systems for:
* Determining when an event rate shifts
* Determining when new topics appear in content streams
* Determining when systems with defined inputs and outputs act strangely
Vinod Kumar Vavilapalli (Hortonworks)
Slides:   1-PPTX 
The Hadoop 2.0 revolution is in full force! Organizations, companies, and users are gearing up for the move from 1.0 to 2.0. In this talk, we will discuss what Hadoop 2.0 is about, what YARN is, what features HDFS2 unlocks and what it means to move to 2.0. We'll discuss this major migration from 1.0 to 2.0 from various perspectives - admins, frameworks, end users & data processing platforms.
Eli Collins (Cloudera)
Slides:   1-PPTX 
In this talk, we'll explore how Apache Hadoop has rapidly evolved to become the new foundation for enterprise analytics - the enterprise data hub - and learn about the state-of-the-art in deploying a modern data warehouse on top of the Hadoop stack.
Paco Nathan (Databricks)
Slides:   1-PDF 
Google "Omega" research: 80% cluster jobs are batch, 60% cluster resources go to services. Batch is simple, services are hard, mixing workloads is key to building efficient distributed apps. This talk examines case studies of Mesos workloads: ranging from Twitter (100% on prem) to Airbnb (100% cloud). How did they leverage "data center OS" building blocks for orders of magnitude gains at scale?
Nenshad Bardoliwalla (Paxata, Inc.)
Slides:   1-PDF 
Join Paxata’s Nenshad Bardoliwalla for a look at the new breed of data preparation tools that use semantic algorithms to detect data types, apply machine learning to find hidden patterns, and link related columns of data automatically.
Pamela Peele (UPMC)
Slides:   1-PDF 
Many organizations are jumping on the analytic wagon and hiring their own in-house analytic teams. This talk addresses the dos and don'ts of building a data team, including some surprising skill sets you will need on your data team, where to find them, how to organize them and then how to manage knowledge discovery to drive business optimization at the corporate level.
Krishna Raj Raja (Cloudphysics), Balaji Parimi (Cloudphysics)
Slides:   1-PDF 
In this talk we discuss the challenges associated with data center operations management and provide details on how the CloudPhysics big data platform solves these problems and enables new capabilities that were previously not possible.
Brett Sargent (LumaSense Technologies Inc.)
Slides:   1-PDF 
Smart meters may be the most visible element of the so-called smart grid, but how smart is it if the plants producing the energy are dumb? To ensure the integrity of the grid, every stage of our electrical power infrastructure – including generation, transmission and distribution – has to get “smart.” Sophisticated sensors connected to big data analytics are key to keeping the power flowing.
Narendra Mulani (Accenture)
Slides:   1-PPTX 
In this session, Accenture’s Narendra Mulani takes us beneath the streets of one of the world’s biggest cities and shows how big data architectures, data science algorithms, and advanced visualizations tackle the management challenges of large-scale, infrastructure-heavy networks such as water utilities, and how analytics can replace capital to extend the effectiveness of current infrastructure.
Joe Hellerstein (Trifacta and UC Berkeley), Tutti Taygerly (Trifacta)
Slides:   1-PDF 
If Big Data is the grand challenge of our time, most analytic effort is like ground control: the hard work behind the scenes that enables ambitious analysis to occur.
Mike Wendt (Accenture Technology Labs)
Slides:   1-PPTX 
In this session, we will share the results of our study, a price-performance comparison of a bare-metal Hadoop cluster and cloud-based Hadoop clusters.
Quentin Clark (Microsoft)
Slides:   1-PPTX 
How does the world change when big data reaches a billion people? What happens when anyone, from farmers to criminal investigators, gains the power to quickly derive meaningful insights from vast and varied data sources? Join Quentin Clark, Microsoft Corporate Vice President, who will highlight how simple, familiar tools and cutting-edge cloud technologies are bringing big data to all.
John Akred (Silicon Valley Data Science), Richard Williamson (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
Slides:   external link
3-Hours: What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive and realtime analytical workloads.
Eric Pugh (OpenSource Connections)
Slides:   1-ZIP 
The United States Patent and Trademark Office wanted a simple, lightweight, yet modern and rich discovery interface for Chinese patent data. This is how we did it.
Ronan Stokes (Cloudera)
Slides:   1-PDF    2-PDF 
3-Hours: Apache HBase is a distributed, column-oriented, key-value store for Apache Hadoop (via integration with HDFS). In this tutorial, you will learn the basic elements of building a real-time application that uses Apache HBase as a persistent data store.
Bill Franks (Teradata Corporation)
Slides:   1-PDF 
Attend this session to learn how you can take advantage of the new economics of data. This session will present examples of how leading organizations are evolving their enterprise data architectures to bring together the Data Warehouse, Hadoop & Data Discovery Platforms so All Users can benefit from ALL Analytics on ALL Data.
Sebastian Thrun (Udacity)
Slides:   1-PDF 
Today, we are facing a looming job skills gap in any area that uses Big Data. McKinsey estimates there will be a need for 1.5 million data-savvy managers and analysts in the next 5 years with this number increasing exponentially around the world. Can digital education be the catalyst to start closing this gap?
John Schitka (SAP)
Slides:   1-PPTX 
Crowdsourcing can be an effective way to collect massive amounts of data to enable deeper analysis in many situations. Explore the foundational steps that can lead to successfully crowdsourcing data through the lenses of the International Barcode of Life and Technical University Munich (TUM) ProteomicsDB projects. SAP is proud to be involved with driving the success of both these projects.
Geoffrey Moore (Geoffrey Moore Consulting)
Slides:   1-PPTX 
Crossing the Chasm has been a key reference point for high-tech marketing since its publication in 1990, but a lot has changed since then, especially with the rise of cloud computing, software as a service, mobile endpoints, big data analytics, and viral marketing.
Kaushik Das (Pivotal)
The emerging Internet of Things (IoT) enables us to build smart systems. We already have the sensory and motor parts of these systems available, but we don't have the brain. This is where data science comes into the picture! I will talk about how we are using big data technologies in conjunction with data science here at Pivotal to build the digital brain that makes a system smart.
Joe Hellerstein (Trifacta and UC Berkeley), Jeffrey Heer (Trifacta Inc. / Univ of Washington)
Slides:   1-PDF 
90-Minutes: Data analysts routinely report spending more time "wrangling" their data than performing analysis per se. In this tutorial we focus on the ever-present yet oft-overlooked challenges of Data Transformation, including discovery, structure, content and curation. We highlight recent approaches that jointly emphasize interaction and inference, leveraging both human acuity and...
Diane Chang (Intuit)
Slides:   1-PPTX 
You have data. You’ve hired data scientists. Now, how do you structure your teams? Do you keep the data scientists together to allow them to learn from each other? Or do you assign them individually to project teams so they can share their knowledge and become closer to the business? Intuit experimented with both ways and learned what it takes to get great outcomes.
John Foreman (MailChimp)
Slides:   1-PDF    2-XLSX    3-XLSX    4-FILE 
Data science algorithms (think machine learning, clustering, outlier detection) often get conflated with the industry-standard tools and programming languages that run them. In this tutorial, John Foreman will use only spreadsheets to build models from his book Data Smart to demonstrate exactly how data science techniques work step-by-step.
Ian Huston (Pivotal), Alexander Kagoshima (Pivotal), Noelle Sio (Pivotal)
Slides:   external link
With increased road congestion around the globe and growing amounts of car data we need more intelligent analytical methods to beat the traffic. This talk presents our work on traffic velocity and travel disruption analytics. We describe our approach in detail, how we went from idea to implemented algorithm and how our methods can be applied to gain deep insight into influential factors.
Ramona Pierson (Declara)
Slides:   1-PPTX 
Humans are constantly curious and learning should be about making new discoveries. With big data, we have the potential to take formal learning which is taught and combine it with informal learning which is experienced, to create personalized learning paths for every individual.
Amr Awadallah (Cloudera, Inc.)
Slides:   1-PPTX 
In this talk Dr. Amr Awadallah will present the Enterprise Data Hub (EDH) as the new foundation for the modern information architecture. Built with Apache Hadoop at the core, the EDH is an extremely scalable, flexible, and fault-tolerant data processing system designed to put data at the center of your business.
Hadley Wickham (Rice University / RStudio)
Slides:   external link
A well-designed domain-specific language makes all parts of the data science process easier. In this talk I'll discuss two DSLs implemented in R that make data manipulation and visualisation both easier to describe and faster to compute.
Milan Vaclavik (CenturyLink Technology Solutions)
Slides:   1-PPTX 
We will discuss the strategic significance of infrastructure core services (compute, storage, network, and comprehensive security) required for robust big data solutions. We will also cover the strategic significance of Hadoop 2.0, Hadoop/NoSQL convergence, and the critical need for effective modeling, query formulation, and data analysis capabilities as Hadoop becomes an enterprise platform for big data.
John Schroeder (MapR Technologies)
Slides:   1-PPTX 
This five-minute keynote will provide a quick overview of some of the more surprising things Hadoop is capable of in 5 minutes or less.
Patrick Shumate (Comcast Cable)
Slides:   1-PPTX 
How Comcast Turns Big Data into Real-Time Operational Insights
Matei Zaharia (Databricks)
Slides:   1-PDF    2-PPTX 
While the first big data systems made a new class of applications possible, organizations must now compete on the speed and sophistication with which they can draw value from data. Future data processing platforms will need not just to scale cost-effectively, but also to allow ever more real-time analysis and to support both simple queries and today's most sophisticated analytics algorithms.
Yann Ramin (Twitter, Inc.)
Slides:   external link
Twitter's Observability stack collects, processes, monitors and visualizes over 170 million real-time time series from all service and system components. This session covers how the stack is built and scales to enable developers and reliability engineers to build fault-tolerant distributed services. In this talk, you will learn what works and what doesn’t, from architecture to implementation.
Big Data without analytics is just data, but how do you perform the analytics? In this session, learn how In-Hadoop analytics is changing the game for the possibilities of Hadoop.
Alistair Croll (Solve For Interesting)
Slides:   1-BIN 
Alistair Croll, Strata Program Chair
Rob Rosen (Pentaho), Tim Garnto (edo)
Slides:   1-PPTX 
edo Interactive shares how they drive agile, improved decision-making by complementing native Hadoop technologies with analytical databases and ETL optimization and data visualization solutions from vendors such as Pentaho.
Peter Pirnejad (City of Palo Alto)
Slides:   1-PPTX 
Open data, for many local governments, is an experiment they are not willing to risk. In these lean times we continuously look for ways to add value and lower expenses. The City of Palo Alto has ventured into this space and has learned lessons as well as discovered proven approaches that will help transform open data from a trial to a successful business model for local agencies.
David Andrzejewski (Sumo Logic)
Slides:   1-PDF 
Organizations of all types and sizes are experiencing an explosion of machine log data whose literally inhuman diversity and scale overwhelms traditional analysis tools and techniques. We will discuss how machine learning can complement human expertise, enabling the extraction of valuable and actionable insights from log data.
Fernand Pajot (Change.org)
Slides:   1-PDF 
With more than 45 million users and over 40,000 petitions created every month, Change.org is the biggest online platform for social change around the world. This talk is about how both bleeding edge and simple machine learning algorithms are used at Change.org to connect users to petitions and social issues which are most relevant to them.
Baron Schwartz (VividCortex)
Slides:   1-BIN 
What if data doesn't need to be big? Many use cases are served as well, or nearly as well, by a Small Data mindset, storage, processing, and algorithms. This talk presents ideas and options you might not have considered for reducing big problems to comparatively small and cheap ones.
J.R. Arredondo (Rackspace)
Slides:   1-PDF 
We will discuss Rackspace’s vision for Data-as-a-Service, and provide a few key questions that could help you complement your technical analysis when choosing a database service. Along the way, we will also discuss parts of the portfolio of data services available at Rackspace, including SQL, MongoDB, Redis and Hadoop-based solutions.
Justin Langseth (Zoomdata, Inc.), Eva Andreasson (Cloudera)
Slides:   1-PDF 
Storing massive data is one challenge. Making it useful throughout all levels of a company in real time is quite another. The ability to intuitively sort, sift and analyze data through touch and gesture is here. We will review several case studies of how companies are creating intuitive data-driven cultures through Cloudera Search, leveraging Impala coupled with Zoomdata visualization.
Brian Abelson (CSV Soundsystem), Thomas Levine (csv soundsystem)
Slides:   1-PDF 
We have developed some open-source tools for building and scaling systems for realtime data analysis with data music videos and data gastronomification. We'll discuss the theory behind these two data analysis methods, and then we'll present case studies on how our tools are used to enable business analytics and instill a data-driven culture.
Ilya Sutskever (Google Inc)
Slides:   1-PDF 
Neural Networks (also known as Deep Learning) are biologically inspired machine learning models. In this talk, I will explain what neural networks are, how they work, and how they were used to achieve the recent record-breaking performance on speech recognition and visual object recognition.
Rodney Mullen (Almost Skateboards)
Rodney is probably the most influential skateboarder in history. He’ll gladly discuss how to balance the analytic methods that help us learn with the internal feel for what we are learning. You’ll get tips for perfecting your Heelflip, and discover how skateboarders perform death-defying stunts without, you know, dying.
Jen van der Meer (Luminary Labs)
Slides:   1-PPTX 
Open data has been established in government circles, particularly with the launch of open data government initiatives such as Data.gov and Data.gov.uk. These efforts are grounded on an overall philosophy that data should be available to use and share without restrictions, and that government data has value as public infrastructure. But what about all of that commercial data – what role do...
Henrik Brink (wise.io), Joshua Bloom (University of California, Berkeley)
Slides:   1-PDF    external link
Going from raw data to reproducible and production-ready machine-learning in data pipelines and applications is an unsolved problem, leaving businesses with their valuable data unused. New algorithms and frameworks aim to improve the situation and this talk will introduce some of these using examples of real-world machine learning projects.
Farrah Bostic (The Difference Engine)
We feel safer in big numbers, and we believe that numbers don't lie. But numbers don't actually speak for themselves - people speak for them.
Raj Bains (Clustrix, Inc.)
Slides:   1-PPTX 
NewSQL has followed quickly on the heels of NoSQL - providing scale-out of NoSQL along with SQL and ACID guarantees. We'll discuss NewSQL with customer examples and contrast it with SQL on Hadoop implementations.
Ben Redman (Citus Data)
Slides:   1-PDF 
PostgreSQL is an advanced open source database known for its reliability. It also features a rich extension ecosystem that enables features like semi-structured data types, new SQL operators, and a columnar data store. This talk examines extensions available to PostgreSQL users and how CitusDB turns PostgreSQL into a scalable data platform for addressing real world analytics problems.
Max Richman (Mobile Accord - GeoPoll)
Slides:   1-PDF 
At GeoPoll we are building a mobile integration platform to poll millions around the world via their own mobile phones. We do this by integrating with mobile carriers in places like Afghanistan and Congo to target users by location, make messages free, & pay users directly. This is hard. We have learned many dos and don'ts which we would like to share.
Ben Waber (Sociometric Solutions)
Slides:   1-PPTX 
Today companies have no idea what makes their best employees tick, or why one team outperforms another that has the exact same processes. With the explosion of sensors in the workplace, however, we can now discover these best practices in real time. Using real-world case studies, we'll discuss how this fundamentally changes how people work, manage, and change.
David Epstein (Sports Illustrated)
Slides:   1-PPTX 
The gap between legend and anonymity in elite sports is often less than a 1% difference in performance. Thus, finding the core, modifiable variables that determine performance and tweaking them ever so slightly can alchemize silver medals into gold ones.
Susan Etlinger (Altimeter Group)
Slides:   1-PDF 
Join industry analyst Susan Etlinger as she demonstrates how leading brands are deriving actionable intelligence from a holistic view of social and enterprise data, the challenges and opportunities in doing so, and the criteria required to achieve social data intelligence maturity.
Cameran Hetrick (VMware), Kimberly Stedman (Freelance)
Slides:   1-PPTX 
Combine your best algorithms and smartest data architecture, and what do you get? Without humans, you have an expensive, high tech brick. Humans generate data, which is used by and for humans to achieve human goals. If you want your data department to earn its keep by showing real value, you must build your social systems as meticulously as you build your pipeline.
Leo Meyerovich (Graphistry)
Slides:   1-PPTX 
Visualization is a weak link in big data tools: shoving 1MM rows into standard charts breaks their visual design and kills interactivity. In our mission to scale charts, we built the Superconductor language. It automatically compiles declarative visualizations into GPU code (WebCL+WebGL). This talk will explore how we're redesigning and optimizing core charts like heat maps and line graphs.
Rodney Mullen (Almost Skateboards)
The better we tune our practice, the more practice will make perfect.
Jane Kell (AutoTrader.com)
Slides:   1-PDF 
This presentation highlights the business processes, data architecture and analytic tools AutoTrader.com has put in place to enable robust analysis across subject areas, yielding improvements in the consumer experience and ultimately in customer value.
Krista Schnell (Accenture)
Slides:   1-PPTX 
Everyone wants to know, how do I get the most value out of my data? Data requires time and money for proper ingestion, transformation, and analysis, so there had better be some concrete ROI. While there are many internal uses for this data (the importance of which cannot be overstated), in this “Insight Economy,” companies must realize that the value of data extends outside the organization as well.
Kurt Brown (Netflix)
Slides:   1-PPTX 
Netflix is a data-driven company. While "data-driven" is often no more than a lofty buzzword, we'll discuss how we make it a reality. We'll dive into the technologies we use and the philosophies underpinning how we get things done. We'll cover our "cloud native" data infrastructure, our use of and contributions to open source software, and our open and enabling data environment.
Marcel Kornacker (Cloudera, Inc.)
Slides:   1-PDF 
Learn how and why it is now possible for Apache Hadoop to serve as a virtual Enterprise Data Warehouse (EDW) framework for native Big Data (stored in HDFS) - making it no longer necessary to move that data into the EDW at great expense simply for analysis. In this session, attendees will get an architect-level view of the solution and explore an example configuration and benchmark numbers.
André Karpištšenko (Marinexplore Inc)
Slides:   1-PDF 
Millions of sensors measure oceans and atmosphere to guide decisions made in offshore, renewables, maritime and other industries. A fast and responsive big data solution created by Marinexplore allows many organizations to plan their ocean activities next to others for the first time. The resulting new workflows reduce financial, safety and environmental risks through improved decision making.
Kira Radinsky (SalesPredict)
Slides:   1-PDF    external link
In our highly data-driven environment, businesses are essentially becoming semi-autonomous agents, constantly competing for resources, customers and talent.
Abe Gong (Jawbone)
Slides:   1-PDF 
Creating value from big, messy data sets can be a daunting task. The session introduces the Sidekick Pattern: using small, curated data to increase the value of Big Data. Drawing on lessons from data science for Jawbone’s UP fitness tracker, we will see how smart selection of data sidekicks can accelerate analysis, solve cold start problems, and simplify complicated data pipelines.
Ryan Cunningham (ClipCard)
Slides:   1-PDF 
We're failing at big data, and bigger technology isn't helping. Complex infrastructure shouldn't justify complicated experiences. Let's apply the principles of consumer app culture to enterprise decision-making in a way that goes beyond dashboards. Let's use design thinking and metadata to connect people to information in a world where complexity is inevitable and technology alone is insufficient.
Max Shron (Polynumeral)
Slides:   1-BIN 
Why have powerful tools if you aren't asking the right questions? Good questions trump shiny tools, but our community has done little to improve how we train people in the "soft side" of data science. We will show how to borrow ideas from design, the humanities, and consulting practices to structure problems and improve the questions we ask of our data.
Farrah Bostic (The Difference Engine)
Slides:   1-PDF 
Using the right tool for the job, understanding how the right data helps make better decisions, and having a sound data infrastructure are needed before big data will come to your rescue. I'll tell a few stories of marketers failing at data, and one or two about the rare client who does it right.
Slides:   1-PPT 
This presentation discusses how we used complex event processing (CEP) and MapReduce-based technologies to track and process data from a soccer match as part of the annual DEBS event processing challenge while achieving throughput in excess of 100,000 events/sec.
Edith Harbaugh (LaunchDark.ly)
Slides:   1-PPTX 
In the haste to build and ship product, metrics to measure effectiveness and learn from user behavior can't be left behind. The heart of TripIt is parsing travel itineraries into trips that users can access anywhere, on web, mobile or tablet. TripIt uses data to navigate and prioritize support for a wide range of travel confirmation templates.