Strata 2014 Schedule

Below are the confirmed and scheduled talks at Strata 2014. Note: The schedule is subject to change.

Ballroom AB
10:40am Thorn in the Side of Big Data: Too Few Artists Chris Re (Stanford University)
1:50pm Organizing Big Data with the Crowd Lukas Biewald (CrowdFlower)
2:20pm Network Science Made Simple: SNA for Pie Chart Makers Marc Smith (Connected Action Consulting Group)
4:00pm Data Science – How to Build and Deploy a Team of Data Scientists Diane Chang (Intuit), Steven Hillion (Alpine Data Labs), Nick Kolegraff (Rackspace), Matthew Gee (Impact Lab / University of Chicago)
Ballroom CD
10:40am 10,000: The Most Dangerous Number in Sports David Epstein (Sports Illustrated)
11:30am Mining Student Notes in Real Time to Provide Study Guides Perry Samson (University of Michigan)
1:30pm Building a Lightweight Discovery Interface for Chinese Patents Eric Pugh (OpenSource Connections)
2:20pm How Twitter Monitors Millions of Time-series Yann Ramin (Twitter, Inc.)
4:50pm Exascale Data Analytics @ Facebook Sambavi Muthukrishnan (Facebook)
GA Ballroom J
10:40am Navigating the Big Data Vendor Landscape Edd Dumbill (Silicon Valley Data Science)
11:30am LinkedIn's Stream Experimentation Framework Joseph Adler (Interana, Inc.), Xin Fu (LinkedIn Corporation), Bee-Chung Chen (LinkedIn, Inc.)
1:30pm Making Big Data Portable Soam Acharya (Altiscale), Charles Wimmer (Altiscale), David Chaiken (Altiscale)
1:50pm Break Down Data Silos with Apache Accumulo Adam Fuchs (Sqrrl)
2:20pm Stand Back, I'm Going To Try Science! Rachel Poulsen (Silicon Valley Data Science), John Akred (Silicon Valley Data Science)
2:40pm Working With Time Series Data Using Apache Cassandra Patrick McFadin (Datastax)
4:00pm Secrets of Apache Hive Queries and UDFs Shrikanth Shankar (Qubole Inc.)
4:50pm Apache Hadoop 2.0: Migration from 1.0 to 2.0 Vinod Kumar Vavilapalli (Hortonworks)
Ballroom E
11:30am Machine Learning for Social Change Fernand Pajot (Change.org)
1:30pm Evolving Data Governance for the Big Data Enterprise Scott Lee (EMC), Rachel Haines (EMC)
4:00pm Soylent Mean: Data Science is Made of People Cameran Hetrick (VMware), Kimberly Stedman (Freelance)
4:50pm Exploring the Notability Gender Gap (Freebase, BigQuery, Maps) Felipe Hoffa (Google), Shawn Simister (Google), Ewa Gasperowicz (Google)
Mission City M
10:40am You're Halfway There: Moving from Insight to Action Bob Filbin (Crisis Text Line)
11:30am Thinking with Data Max Shron (Polynumeral)
1:30pm Not Your Typical VC Panel Michael Dauber (Amplify), Renee DiResta (Haven), Jake Flomenberg (Accel Partners), Matthew Ocko (Data Collective), Ross Fubini (Canaan Partners)
2:20pm Harvard's Clean Energy Project: Big Data Maps To Renewable Energy Kai Trepte (Harvard Clean Energy Project)
4:00pm Bedtime Stories: Learning from Sleep Data Monica Rogati (Jawbone)
4:50pm Sending Millions of Surveys Around the World on Mobile Phones Max Richman (Mobile Accord - GeoPoll)
GA Ballroom K
11:30am Minority Report Meets Big Data: Touch and Interactive Big Data is Here Justin Langseth (Zoomdata, Inc.), Eva Andreasson (Cloudera)
1:50pm Superconductor: Scaling Charts with Design and GPUs Leo Meyerovich (Graphistry)
2:20pm Unlocking the Secrets of Gertrude Stein Ian Timourian (Paxata)
4:00pm Music Videos and Gastronomification for Big Data Analysis Brian Abelson (CSV Soundsystem), Thomas Levine (csv soundsystem)
4:50pm Making Data Human Shelley Evenson (Fjord)
Ballroom F
10:40am Fighting Global Cybercrime and BotNets using Big Data Bryan Hurd (Microsoft Cybercrime Center), Herain Oberoi (Microsoft)
11:30am Harness Data in Real-Time with Infinite Storage Yuvaraj Athur Raghuvir (SAP Labs LLC.)
1:30pm Making Big Data Cost Effective in a Bare Metal Cloud Harold Hannon (SoftLayer)
2:20pm Delivering on the Promise of Big Data Arvind Parthasarathi (YarcData)
4:00pm Big Data: Beyond Bare-Metal? Mike Wendt (Accenture Technology Labs)
Ballroom G
2:20pm Scalable PostgreSQL as your data platform Ben Redman (Citus Data)
4:50pm Tracking a Soccer Game with Big Data Srinath Perera (WSO2)
Ballroom H
11:30am You Don't Need to Boil the Big Data Ocean with Hadoop Ben Werther (Platfora), Sanjay Mathur (Silicon Valley Data Science)
1:30pm How Evernote Measures Conversion Using Hadoop Analytics Damon Cool (Evernote), John Santaferraro (Actian Corporation)
4:00pm Twitter and HP HAVEn: The Big Data Big Picture. Sanjay Goil (Autonomy IDOL)
10:10am Morning Break sponsored by Cloudera
Room: Exhibit Hall
3:00pm Afternoon Break sponsored by MapR
Room: Exhibit Hall
5:30pm Plenary
Room: Exhibit Hall
Booth Crawl
12:10pm Plenary
Room: Exhibit Hall & Hyatt Santa Clara
Wednesday Lunchtime BoF Tables
8:45am Plenary
Room: Mission City
Wednesday Keynote Welcome Roger Magoulas (O'Reilly Media), Alistair Croll (Solve For Interesting)
8:50am Plenary
Room: Mission City
Crossing the Chasm: What’s New, What’s Not Geoffrey Moore (Geoffrey Moore Consulting)
9:05am Plenary
Room: Mission City
Evolution from Apache Hadoop to the Enterprise Data Hub Amr Awadallah (Cloudera, Inc.)
9:10am Plenary
Room: Mission City
Collecting Massive Data via Crowdsourcing John Schitka (SAP)
9:15am Plenary
Room: Mission City
Empowering Personalized Learning with Big Data Ramona Pierson (Declara)
9:25am Plenary
Room: Mission City
Hadoop in 5 Minutes or Less John Schroeder (MapR Technologies)
9:30am Plenary
Room: Mission City
People are Data Too Farrah Bostic (The Difference Engine)
9:35am Plenary
Room: Mission City
Bringing Big Data to One Billion People Quentin Clark (Microsoft)
9:45am Plenary
Room: Mission City
Small Data in Sports: Little Differences that Mean Big Outcomes David Epstein (Sports Illustrated)
9:55am Plenary
Room: Mission City
The Art of Good Practice Rodney Mullen (Almost Skateboards)
8:00pm Plenary
Room: Santa Clara Ballroom
Data After Dark: Club Strata
7:00pm Dinner
Room: On Your Own
10:40am-11:20am (40m) Data Science
Thorn in the Side of Big Data: Too Few Artists
Chris Re (Stanford University)
A new generation of data processing systems, including web search, Google's Knowledge Graph, IBM's Watson, and several different recommendation systems, combine rich databases with software driven by machine learning. This talk describes our recent thoughts on one crucial pain point in the construction of trained systems: feature engineering.
11:30am-12:10pm (40m) Data Science
Predictive Modeling in the Cloud with Scikit-learn and IPython
Olivier Grisel (INRIA)
IPython and scikit-learn offer a nice environment for interactive data analytics in general and predictive modelling in particular. This presentation will give an overview on how to use both to perform tasks such as distributed model parameter tuning and parallel training of Random Forests on ad hoc compute clusters provisioned in the cloud.
1:30pm-1:50pm (20m) Data Science
Crowdsourcing at Locu: How I Learned to Stop Worrying and Love the Crowd
Adam Marcus (Independent)
Machine learning and paid crowdsourcing power several virtuous cycles in Locu's data processing pipeline. To solve various problems, we interact with hundreds of long-term crowd workers on oDesk and tens of thousands of shorter-term workers on CrowdFlower. Come learn about Locu's magic with examples based on problems we solve every day.
1:50pm-2:10pm (20m) Data Science
Organizing Big Data with the Crowd
Lukas Biewald (CrowdFlower)
Data scientists know how hard it is to collect, categorize and label vast amounts of data. But some smart data scientists are effectively leveraging the human intelligence of the crowd to solve these problems, resulting in better training of machine learning models and improved system performance.
2:20pm-2:40pm (20m) Data Science
Network Science Made Simple: SNA for Pie Chart Makers
Marc Smith (Connected Action Consulting Group)
SNA, social network analysis, is a powerful technique for making sense of a connected world. But the skills needed to collect, analyze, visualize, and gain insights into collections of connections are hard to find. Now, new tools make networks as easy to manage as a pie chart. Using the familiar Excel spreadsheet, NodeXL enables end users to gain insights into Twitter, Facebook & more.
2:40pm-3:00pm (20m) Data Science
Friending Graph Analytics: Large-Scale Graph Processing Made Easy
Ted Willke (Intel)
Graph analytics promises to uncover new patterns in big data - but it's not easy to use commercially. Why is it so tough for data scientists to construct graphs and extract insight? This talk discusses Intel's efforts to deliver a graph cluster solution that is as easy to work with as it is powerful.
4:00pm-4:40pm (40m) Data Science
Data Science – How to Build and Deploy a Team of Data Scientists
Diane Chang (Intuit) et al
In this panel discussion, experts from four different industries will share their first-hand experiences building and deploying teams of data scientists.
4:50pm-5:30pm (40m) Data Science
The Sidekick Pattern: Using Small Data to Increase the Value of Big Data
Abe Gong (Human Centric Data Science)
Creating value from big, messy data sets can be a daunting task. The session introduces the Sidekick Pattern: using small, curated data to increase the value of Big Data. Drawing on lessons from data science for Jawbone’s UP fitness tracker, we will see how smart selection of data sidekicks can accelerate analysis, solve cold start problems, and simplify complicated data pipelines.
10:40am-11:20am (40m) Data in Action
10,000: The Most Dangerous Number in Sports
David Epstein (Sports Illustrated)
Epstein explains the origins of the "magic number," how it should be used, and how it is often misused in a manner that hinders performance science and the development of athletes, and leads sports executives to overlook simple but important data.
11:30am-12:10pm (40m) Data in Action
Mining Student Notes in Real Time to Provide Study Guides
Perry Samson (University of Michigan)
What if students could be given helpful feedback in real time based on the notes they are typing in class? This talk presents a prototype that has been in use in multiple courses at the University of Michigan to both challenge students' understanding based on the words they type in class and offer resources for further study.
1:30pm-2:10pm (40m) Data in Action
Building a Lightweight Discovery Interface for Chinese Patents
Eric Pugh (OpenSource Connections)
The United States Patent and Trademark Office wanted a simple, lightweight, yet modern and rich discovery interface for Chinese patent data. This is how we did it.
2:20pm-3:00pm (40m) Data in Action
How Twitter Monitors Millions of Time-series
Yann Ramin (Twitter, Inc.)
Twitter's Observability stack collects, processes, monitors and visualizes over 170 million real-time time series from all service and system components. This session covers how the stack is built and scales to enable developers and reliability engineers to build fault-tolerant distributed services. In this talk, you will learn what works and what doesn’t, from architecture to implementation.
4:00pm-4:40pm (40m) Data in Action
The Netflix Data Platform - A Recipe for High Business Impact
Kurt Brown (Netflix)
Netflix is a data-driven company. While "data-driven" is often no more than a lofty buzzword, we'll discuss how we make it a reality. We'll dive into the technologies we use and the philosophies underpinning how we get things done. We'll cover our "cloud native" data infrastructure, our use of and contributions to open source software, and our open and enabling data environment.
4:50pm-5:30pm (40m) Data in Action
Exascale Data Analytics @ Facebook
Sambavi Muthukrishnan (Facebook)
Data analytics is at the heart of product development at Facebook. Facebook’s data warehouse has grown rapidly over the years, and poses unique scalability challenges. This talk will briefly outline the evolution of the analytics software stack in the last year (both storage and query engines) and then delve deeper into the data management and compute challenges at this scale.
10:40am-11:20am (40m) Hadoop and Beyond
Navigating the Big Data Vendor Landscape
Edd Dumbill (Silicon Valley Data Science)
A maze of twisty databases, all of which look the same, and each claim they're best for the job. Welcome to the world of choosing big data vendors. In this session we'll map out the data tool landscape, and lay out a framework to help you choose a solution, or elect to build one yourself.
11:30am-12:10pm (40m) Hadoop and Beyond
LinkedIn's Stream Experimentation Framework
Joseph Adler (Interana, Inc.) et al
This talk describes how LinkedIn's engineering, data science, and reporting teams work together to develop, test, and rank new insights, recommendations, and updates shown on our home page stream.
1:30pm-1:50pm (20m) Hadoop and Beyond
Making Big Data Portable
Soam Acharya (Altiscale) et al
The growing popularity of Hadoop has led to an increasing number of clusters worldwide. Priming these clusters with data from existing client repositories is difficult due to a number of issues including data size, network constraints, security & lack of domain knowledge. In this talk, we present a number of techniques & best practices for uploading large amounts of data to remote Hadoop clusters.
1:50pm-2:10pm (20m) Hadoop and Beyond
Break Down Data Silos with Apache Accumulo
Adam Fuchs (Sqrrl)
Apache Accumulo has evolved from a niche government project to a key component of the Hadoop ecosystem with adopters across a variety of industries. One important differentiator for Accumulo is the concept of "cell-level security." Learn how to properly implement cell-level security concepts from the former technical director of the Accumulo project at NSA.
2:20pm-2:40pm (20m) Hadoop and Beyond
Stand Back, I'm Going To Try Science!
Rachel Poulsen (Silicon Valley Data Science) et al
Design of Experiments (DOE) is a scientific approach to understanding causality using data collection and applied statistical techniques. Through a series of relevant case studies, this session will review the “design” and the “experiment” side of DOE, including systematic data collection and basic statistical applications, and discuss relevant applications beyond A/B testing websites.
2:40pm-3:00pm (20m) Hadoop and Beyond
Working With Time Series Data Using Apache Cassandra
Patrick McFadin (Datastax)
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. This talk will give an overview of the many ways you can be successful.
4:00pm-4:40pm (40m) Hadoop and Beyond
Secrets of Apache Hive Queries and UDFs
Shrikanth Shankar (Qubole Inc.)
Shrikanth Shankar, Qubole’s VP of Engineering, shares his best practices for building high-performance, scalable queries and deploying User Defined Functions (UDFs) to Big Data applications in Apache Hive. For data analysts and data scientists in the trenches, this is a key session to attend.
4:50pm-5:30pm (40m) Hadoop and Beyond
Apache Hadoop 2.0: Migration from 1.0 to 2.0
Vinod Kumar Vavilapalli (Hortonworks)
The Hadoop 2.0 revolution is in full force! Organizations, companies, and users are gearing up for the move from 1.0 to 2.0. In this talk, we will discuss what Hadoop 2.0 is about, what YARN is, what features HDFS2 unlocks, and what it means to move to 2.0. We'll discuss this major migration from 1.0 to 2.0 from various perspectives: admins, frameworks, end users & data processing platforms.
10:40am-11:20am (40m) Ethics, Policy, and Privacy
Adaptive Adversaries: Building Systems to Fight Fraud and Cyber Intruders
Ari Gesher (Palantir Technologies)
Statistical methods tend to fail when there is someone on the other side of a problem actively evading detection. Here we look at three systems successfully used to fight adaptive adversaries engaged in fraud and cyber attacks. Using a combination of big data techniques and interactive human analysis, these systems are protecting commercial banks, pharmaceutical companies, and governments.
11:30am-12:10pm (40m) Ethics, Policy, and Privacy
Machine Learning for Social Change
Fernand Pajot (Change.org)
With more than 45 million users and over 40,000 petitions created every month, Change.org is the biggest online platform for social change around the world. This talk is about how both bleeding edge and simple machine learning algorithms are used at Change.org to connect users to petitions and social issues which are most relevant to them.
1:30pm-2:10pm (40m) Ethics, Policy, and Privacy
Evolving Data Governance for the Big Data Enterprise
Scott Lee (EMC) et al
Earlier Data Governance generations (that support BI-DW or MDM) succeeded by aligning stakeholders and improving data interoperability. But in the world of Big Data, interoperability is table-stakes, and next-gen Data Governance must provide contextual intelligence sufficient to reason out complex inquiries across diverse data. How? Would you believe a mash-up of building codes and game theory?
2:20pm-3:00pm (40m) Ethics, Policy, and Privacy
A Different Look at Data and Security - Learning to Live with Fear
Pablos Holman (Turing AI)
We are at the beginning of creating a generation of scientists & analysts who can relate to data in entirely new ways. The feeble computational models we’ve created in Excel over the course of our lives are fundamentally different than what is just becoming possible.
4:00pm-4:40pm (40m) Ethics, Policy, and Privacy
Soylent Mean: Data Science is Made of People
Cameran Hetrick (VMware) et al
Combine your best algorithms and smartest data architecture, and what do you get? Without humans, you have an expensive, high tech brick. Humans generate data, which is used by and for humans to achieve human goals. If you want your data department to earn its keep by showing real value, you must build your social systems as meticulously as you build your pipeline.
4:50pm-5:30pm (40m) Ethics, Policy, and Privacy
Exploring the Notability Gender Gap (Freebase, BigQuery, Maps)
Felipe Hoffa (Google) et al
What can an SQL query teach us about the gender gap? We'll dive into the 20 million Freebase entities to focus on people notable enough to be part of it. What percentage of them are women? How is the gender gap divided by profession? How is it changing throughout the years? How do all these variables look when mapped at a country, state, and neighborhood level?
10:40am-11:20am (40m) Connected World
You're Halfway There: Moving from Insight to Action
Bob Filbin (Crisis Text Line)
The measure of success for a data scientist is not number of insights, but impact on co-workers' behavior. Moving from insight to action requires an art underutilized by the data science community: storytelling. I will cover techniques including the Fogg model, loss aversion, and minimum viable stories, using examples of my failures and successes in driving behavioral change with data.
11:30am-12:10pm (40m) Connected World
Thinking with Data
Max Shron (Polynumeral)
Why have powerful tools if you aren't asking the right questions? Good questions trump shiny tools, but our community has done little to improve how we train people in the "soft side" of data science. We will show how to borrow ideas from design, the humanities, and consulting practices to structure problems and improve the questions we ask of our data.
1:30pm-2:10pm (40m) Connected World
Not Your Typical VC Panel
Michael Dauber (Amplify) et al
A group of VCs who invest from very early through later-stage investments talk about all things Big Data. There will be no “3 Vs” discussion here. The panelists are committed to making this a lively discussion about topics ranging from the typical (what sectors do they want to invest in?) to the atypical (what’s out there that they don’t like?).
2:20pm-3:00pm (40m) Connected World
Harvard's Clean Energy Project: Big Data Maps To Renewable Energy
Kai Trepte (Harvard Clean Energy Project)
The present fossil-fuel-based economy must give way to a renewable-energy-based future. The Harvard Clean Energy Project set out to discover new molecular materials for the next generation of organic solar cells. In studying 2.3 million (m) compounds with 24m conformers in 150m density functional theory calculations, this Big Data project will benefit mankind by aiding the quest for clean energy.
4:00pm-4:40pm (40m) Connected World
Bedtime Stories: Learning from Sleep Data
Monica Rogati (Jawbone)
We optimize ads, but not our mood. We know more about our tweets than our own bodies. That's all about to change. As wearables transform the 'quantified self' from a niche to a mainstream market, they are generating vast amounts of data about our health, habits, and lifestyles.
4:50pm-5:30pm (40m) Connected World
Sending Millions of Surveys Around the World on Mobile Phones
Max Richman (Mobile Accord - GeoPoll)
At GeoPoll we are building a mobile integration platform to poll millions around the world via their own mobile phones. We do this by integrating with mobile carriers in places like Afghanistan and Congo to target users by location, make messages free, & pay users directly. This is hard. We have learned many dos and don'ts which we would like to share.
10:40am-11:20am (40m) Design
Information Visualization for Large-Scale Data Workflows
Michael Conover (LinkedIn)
A core element of product innovation and successful predictive modeling, information visualization plays a central role in effective data processing pipelines. In this talk, we will explore how the technologies and workflow patterns used by LinkedIn data scientists can be applied to analytical challenges found across a wide variety of problem domains.
11:30am-12:10pm (40m) Design
Minority Report Meets Big Data: Touch and Interactive Big Data is Here
Justin Langseth (Zoomdata, Inc.) et al
Storing massive data is one challenge. Making it useful throughout all levels of a company in real time is quite another. The ability to intuitively sort, sift and analyze data through touch and gesture is here. We will review several case studies of how companies are creating intuitive data-driven cultures through Cloudera Search, leveraging Impala coupled with Zoomdata visualization.
1:30pm-1:50pm (20m) Design
Napoleon’s March to d3.js--The Future of Big, Real-time Interactive Data Visualization
Justin Langseth (Zoomdata, Inc.) et al
The true power of big data will be realized when average people can use complex analytics to solve everyday problems. We will describe a future engagement model derived from work in the Intelligence Community, reviewing real-world use cases showing how user-centric design is transforming big data from a science requiring specialists to elegant visualizations that deliver insight to average users.
1:50pm-2:10pm (20m) Design
Superconductor: Scaling Charts with Design and GPUs
Leo Meyerovich (Graphistry)
Visualization is a weak link in big data tools: shoving 1MM rows into standard charts breaks their visual design and kills interactivity. In our mission to scale charts, we built the Superconductor language. It automatically compiles declarative visualizations into GPU code (WebCL+WebGL). This talk will explore how we're redesigning and optimizing core charts like heat maps and line graphs.
2:20pm-3:00pm (40m) Design
Unlocking the Secrets of Gertrude Stein
Ian Timourian (Paxata)
Happy accidents can influence one's creative process. Ian Timourian will discuss his exploration of the algorithms and techniques utilized by the famous poet Gertrude Stein through visualization.
4:00pm-4:40pm (40m) Design
Music Videos and Gastronomification for Big Data Analysis
Brian Abelson (CSV Soundsystem) et al
We have developed some open-source tools for building and scaling systems for realtime data analysis with data music videos and data gastronomification. We'll discuss the theory behind these two data analysis methods, and then we'll present case studies on how our tools are used to enable business analytics and instill a data-driven culture.
4:50pm-5:30pm (40m) Design
Making Data Human
Shelley Evenson (Fjord)
This talk by Shelley Evenson, Executive Director of Organizational Evolution at Fjord, will outline the key tenets of designing for big data: the difference between using personal or aggregate data, how to identify and utilize data patterns, how to build trust, and ways to deliver ongoing value at the right moments.
10:40am-11:20am (40m) Sponsored
Fighting Global Cybercrime and BotNets using Big Data
Bryan Hurd (Microsoft Cybercrime Center) et al
BotNets and cybercrime are by their very nature Big Data problems. The Microsoft Cybercrime Center is working in conjunction with law enforcement, public sector, commercial and academic partners to investigate, disable and prosecute cyber criminals...
11:30am-12:10pm (40m) Sponsored
Harness Data in Real-Time with Infinite Storage
Yuvaraj Athur Raghuvir (SAP Labs LLC.)
To seize the future, data must be harnessed in actionable time. Based on a real deployment, see how to achieve instant results with infinite storage: filter large amounts of cold data in Hadoop, analyze in real time with SAP HANA, and visualize using SAP Lumira. Learn how solutions from SAP and our Hadoop partners can help your organization seize the future and gain unprecedented insight from Big Data.
1:30pm-2:10pm (40m) Sponsored
Making Big Data Cost Effective in a Bare Metal Cloud
Harold Hannon (SoftLayer)
The cloud provides an easy onramp to building and deploying Big Data solutions, particularly the latest technologies that favor scale-out architectures. Transitioning from initial deployment to a large-scale, highly performant operation without breaking the bank may not be easy.
2:20pm-3:00pm (40m) Sponsored
Delivering on the Promise of Big Data
Arvind Parthasarathi (YarcData)
The real promise of big data isn’t about merely doing analytics cost-effectively and at scale; it’s about discovery. Data discovery means uncovering hidden patterns from disparate sources without needing to know which questions to ask or the data relationships in advance...
4:00pm-4:40pm (40m) Sponsored
Big Data: Beyond Bare-Metal?
Mike Wendt (Accenture Technology Labs)
In this session, we will share the results of our study, a price-performance comparison of a bare-metal Hadoop cluster and cloud-based Hadoop clusters.
4:50pm-5:30pm (40m) Sponsored
Data Transformation: A User-Centric Approach to Accessing and Analyzing Big Data
Joe Hellerstein (UC Berkeley)
Join Trifacta's founders and their customers to learn how Data Transformation is changing the way people work with data. By increasing data analyst productivity and giving business analysts direct access to Big Data for the first time, Trifacta's technology increases the breadth of data they work with, significantly shortens "time to insight", and enables better business decisions.
10:40am-11:20am (40m) Sponsored
Apache Hadoop and the Emergence of the Enterprise Data Hub
Eli Collins (Cloudera)
In this talk, we'll explore how Apache Hadoop has rapidly evolved to become the new foundation for enterprise analytics - the enterprise data hub - and learn about the state-of-the-art in deploying a modern data warehouse on top of the Hadoop stack.
11:30am-12:10pm (40m) Sponsored
Building the Next Generation Data Architecture with Hadoop, Data Warehouse & Data Discovery Platform
Bill Franks (Teradata Corporation)
Attend this session to learn how you can take advantage of the new economics of data. This session will present examples of how leading organizations are evolving their enterprise data architectures to bring together the Data Warehouse, Hadoop & Data Discovery Platforms so All Users can benefit from ALL Analytics on ALL Data.
1:30pm-2:10pm (40m) Sponsored
Building a Data-centered Data Center for Agile Development
Justin Makeig (MarkLogic)
Most data centers are filled with rigid data servers that are tightly linked to specific applications, leading to data duplication, lengthy development cycles, and unnecessary costs. Learn how you can use the MarkLogic Enterprise NoSQL database platform to help create a flexible, agile data fabric that will allow you to iterate your application development, optimize your data, and reduce costs.
2:20pm-3:00pm (40m) Sponsored
Scalable PostgreSQL as your data platform
Ben Redman (Citus Data)
PostgreSQL is an advanced open source database known for its reliability. It also features a rich extension ecosystem that enables features like semi-structured data types, new SQL operators, and a columnar data store. This talk examines extensions available to PostgreSQL users and how CitusDB turns PostgreSQL into a scalable data platform for addressing real-world analytics problems.
4:00pm-4:40pm (40m) Sponsored
Transforming Search Engine Marketing at Ask.com
Mohit Sati (Ask.com)
Search Engine Marketing is an important revenue opportunity for Ask.com, planned to nearly double in 2014. Fueled by growth and acquisitions such as About.com and Investopedia, the keyword portfolio will grow by 90x through 2014. SEM Analytics at Ask.com involves tens of millions of cost metrics stored daily, hundreds of millions of portfolio keywords, and billions of historical costs.
4:50pm-5:30pm (40m) Sponsored
Tracking a Soccer Game with Big Data
Srinath Perera (WSO2)
This presentation discusses how we used complex event processing (CEP) and MapReduce-based technologies to track and process data from a soccer match as part of the annual DEBS event processing challenge while achieving throughput in excess of 100,000 events/sec.
10:40am-11:20am (40m) Sponsored
Best Practices for Hadoop In Production - Panel Discussion Facilitated by Forrester Analyst
Mike Gualtieri (Forrester Research)
Mike Gualtieri, principal analyst at Forrester Research, Inc., will facilitate a panel of production Hadoop users – including Cisco, The Climate Corporation, The Rubicon Project, and Solutionary – to discuss the challenges and best practices for deploying Hadoop in production. Join us for an engaging conversation on tips and tricks in deploying Hadoop in production.
11:30am-12:10pm (40m) Sponsored
You Don't Need to Boil the Big Data Ocean with Hadoop
Ben Werther (Platfora) et al
Join us as we discuss the real-world applications of big data, examine what's working and what isn't, and discuss why you don't need to boil the big data ocean with Hadoop.
1:30pm-2:10pm (40m) Sponsored
How Evernote Measures Conversion Using Hadoop Analytics
Damon Cool (Evernote) et al
In 2012, Evernote took proactive steps to prepare for a rapidly expanding customer base by making the transition from 18-hour queries on a MySQL server to ad hoc analytics for 200 million daily events, while staying on a budget. This session explains how Evernote is scaling to hundreds of terabytes and analyzing 200 million events per day using a two-tier architecture that includes Hadoop and an analytic platform.
2:20pm-3:00pm (40m) Sponsored
Collaborative Predictive Analytics: How Sony, Havas, and Aridhia Opened the Black Box.
Bruno Aziza (Alpine Data Labs)
In this panel discussion, we’ll hear from entertainment, healthcare, and media industry leaders as they discuss their strategy to demystify analytics end to end. We’ll have a question and answer session moderated by Alpine Data Labs.
4:00pm-4:40pm (40m) Sponsored
Twitter and HP HAVEn: The Big Data Big Picture.
Sanjay Goil (Autonomy IDOL)
Forget the 140 characters: Twitter is Big Data. Every day sees around 100 TB of data ingested and tens of thousands of Hadoop jobs. Join us to hear how Twitter is using HP’s HAVEn platform to run their Big Data analytics. Learn why they’ve integrated HP Vertica with their Hadoop infrastructure to deliver the scale and speed needed for their analytics.
4:50pm-5:30pm (40m) Sponsored
Getting a Handle on Hadoop and its Potential to Catalyze a New Information Architecture Model
Milan Vaclavik (CenturyLink Technology Solutions)
We will discuss the strategic significance of infrastructure core services (compute, storage, network, and comprehensive security) required for robust big data solutions. We will also cover the strategic significance of Hadoop 2.0, Hadoop/NoSQL convergence, and the critical need for effective modeling, query formulation, and data analysis capabilities as Hadoop becomes an enterprise platform for big data.
10:10am-10:40am (30m)
Break: Morning Break sponsored by Cloudera
3:00pm-4:00pm (1h)
Break: Afternoon Break sponsored by MapR
5:30pm-7:00pm (1h 30m) Event
Booth Crawl
Quench your thirst with vendor-hosted libations and snacks while you check out all the cool stuff in the Expo Hall.
12:10pm-1:30pm (1h 20m)
Wednesday Lunchtime BoF Tables
Birds of a Feather (BoF) sessions are informal roundtable discussions happening during lunch on Wednesday, February 12 and Thursday, February 13. You can join any BoF table or start your own with a topic of your choice. The BoF sign-up board will be near the Registration area.
8:45am-8:50am (5m)
Wednesday Keynote Welcome
Roger Magoulas (O'Reilly Media) et al
Strata Program Chairs, Roger Magoulas and Alistair Croll, welcome you to the first day of keynotes.
8:50am-9:05am (15m)
Crossing the Chasm: What’s New, What’s Not
Geoffrey Moore (Geoffrey Moore Consulting)
Crossing the Chasm has been a key reference point for high-tech marketing since its publication in 1990, but a lot has changed since then, especially with the rise of cloud computing, software as a service, mobile endpoints, big data analytics, and viral marketing.
9:05am-9:10am (5m) Sponsored
Evolution from Apache Hadoop to the Enterprise Data Hub
Amr Awadallah (Cloudera, Inc.)
In this talk, Dr. Amr Awadallah will present the Enterprise Data Hub (EDH) as the new foundation for the modern information architecture. Built with Apache Hadoop at the core, the EDH is an extremely scalable, flexible, and fault-tolerant data processing system designed to put data at the center of your business.
9:10am-9:15am (5m) Sponsored
Collecting Massive Data via Crowdsourcing
John Schitka (SAP)
Crowdsourcing can be an effective way to collect massive amounts of data to enable deeper analysis in many situations. Explore the foundational steps that can lead to successfully crowdsourcing data through the lenses of the International Barcode of Life and Technical University Munich (TUM) ProteomicsDB projects. SAP is proud to be involved with driving the success of both of these projects.
9:15am-9:25am (10m)
Empowering Personalized Learning with Big Data
Ramona Pierson (Declara)
Humans are constantly curious, and learning should be about making new discoveries. With big data, we have the potential to take formal learning, which is taught, and combine it with informal learning, which is experienced, to create personalized learning paths for every individual.
9:25am-9:30am (5m) Sponsored
Hadoop in 5 Minutes or Less
John Schroeder (MapR Technologies)
This five-minute keynote will provide a quick overview of some of the more surprising things Hadoop is capable of in 5 minutes or less.
9:30am-9:35am (5m)
People are Data Too
Farrah Bostic (The Difference Engine)
We feel safer in big numbers, and we believe that numbers don't lie. But numbers don't actually speak for themselves - people speak for them.
9:35am-9:45am (10m) Sponsored
Bringing Big Data to One Billion People
Quentin Clark (Microsoft)
How does the world change when big data reaches a billion people? What happens when anyone, from farmers to criminal investigators, gains the power to quickly derive meaningful insights from vast and varied data sources? Join Quentin Clark, Microsoft Corporate Vice President, who will highlight how simple, familiar tools and cutting-edge cloud technologies are bringing big data to all.
9:45am-9:55am (10m)
Small Data in Sports: Little Differences that Mean Big Outcomes
David Epstein (Sports Illustrated)
The gap between legend and anonymity in elite sports is often less than a 1% difference in performance. Thus, finding the core, modifiable variables that determine performance and tweaking them ever so slightly can alchemize silver medals into gold ones.
9:55am-10:05am (10m)
The Art of Good Practice
Rodney Mullen (Almost Skateboards)
The better we tune our practice, the more practice will make perfect.
8:00pm-11:00pm (3h) Event
Data After Dark: Club Strata
Help us kick off Strata 2014 with a festive gathering featuring a poker tournament. But even if you're not a card shark, join us for plenty of networking, refreshments, and great music, played by DJs whose day job is data science.
7:00pm-8:00pm (1h)
Break: Dinner