Strata 2014 Schedule

Below are the confirmed and scheduled talks at Strata 2014. Note: The schedule is subject to change.

Ballroom AB
10:40am Probabilistic Programming: What, Why, How, and When Beau Cronin (Salesforce)
1:30pm Graph All The Things!11: Graph Database Use Cases That Aren't Social Emil Eifrem (Neo Technology / Neo4j)
2:20pm Agile Analytics Neal Ford (ThoughtWorks)
4:00pm MLbase: Distributed Machine Learning Made Easy Ameet Talwalkar (Databricks), Evan Sparks (UC Berkeley)
Ballroom CD
10:40am Beyond Hadoop MapReduce: Interactive Advertising Insights with Shark @ Yahoo! Nandu Jayakumar (Yahoo! Inc./Stanford University), Tim Tully (Yahoo!)
11:30am Open Source Big Data for Defense Peter Wang (Continuum Analytics), Chris White (DARPA)
2:20pm Socializing Search. Professionally. Sriram Sankar (LinkedIn), Daniel Tunkelang (LinkedIn)
4:00pm Real-time Analytics with Open Source Technologies Fangjin Yang (Metamarkets), Gian Merlino (Metamarkets)
GA Ballroom J
10:40am Spreadsheets: The Dark Matter of Big Data Felienne Hermans (Delft University of Technology)
1:30pm Querying Petabytes of Data in Seconds Reynold Xin (Databricks), Sameer Agarwal (UC Berkeley)
2:40pm One Size Does Not Fit All: Analyzing Data at Scale with AWS Rahul Pathak (Amazon Web Services)
4:00pm Making Data Move: Stream Processing in Go Matvey Arye (Princeton University/Cloudflare), Albert Strasheim (CloudFlare)
Ballroom E
10:40am Data Journalism - Organized Crime and Corruption Reporting Drew Sullivan (Organized Crime and Corruption Reporting Project)
11:30am Government Data on Both Sides of the Bridge Jesse Robbins (OnBeep, Inc.), Shannon Spanhake (City & County of San Francisco), Eddie Tejeda (Civic Insight / OpenOakland / Public Ethics Commission)
1:30pm Unboxing Data Startups Michael Abbott (Kleiner Perkins Caufield & Byers)
2:20pm Data for Good Jake Porway (DataKind), Drew Conway (IA Ventures), Rayid Ghani (Edgeflip | University of Chicago), Elena Eneva (Accenture)
4:00pm The Great Debate: Technology Creates More Jobs than it Destroys Jim Stogdill (O'Reilly Media, Inc.), Brian Behlendorf (Mithril Capital Management LLC), Adrian Cockcroft (Battery Ventures), Ari Gesher (Palantir Technologies), Kimberly Stedman (Freelance)
Mission City M
10:40am Machine Learning for Machine Data David Andrzejewski (Sumo Logic)
11:30am Big Industrial Internet Data: Connecting and Optimizing at New Scales Steven Gustafson (GE Global Research), Parag Goradia (GE)
1:30pm Big Data for Big Power: Smart Meters ≠ Smart Grid Brett Sargent (LumaSense Technologies Inc.)
2:20pm Big Data for Better Data Centers Krishna Raj Raja (CloudPhysics), Balaji Parimi (CloudPhysics)
4:00pm Driving the Future of Smart Cities - How to Beat the Traffic Ian Huston (Pivotal), Alexander Kagoshima (Pivotal), Noelle Sio (Pivotal)
GA Ballroom K
10:40am Expressing Yourself in R Hadley Wickham (Rice University / RStudio)
11:30am The IPython Notebook: Get Close to Your Data with Python and JavaScript Brian Granger (Cal Poly San Luis Obispo)
1:30pm The Urgent Need to Appify Big Data Ryan Cunningham (ClipCard)
2:20pm Session with Ben Fry Ben Fry (Fathom Information Design)
Ballroom F
11:30am Enabling Business Transformation with Analytics over Real-time Streaming Data Anand Venugopal (Impetus Technologies Inc.), Pranay Tonpay (Impetus)
1:30pm Apache Hive & Stinger: Petabyte Scale SQL, IN Hadoop Owen O'Malley (Hortonworks), Alan Gates (Hortonworks)
Ballroom G
10:40am Business Data Lake: An Evolution in Data Infrastructure Jeffrey Kelly (The Wikibon Project)
1:30pm FAST and FURIOUS – Big Data Analytics Meets Hadoop Wayne Thompson (SAS), Paul Kent (SAS)
2:20pm Are We Data Scientists or Data Janitors? Nenshad Bardoliwalla (Paxata, Inc.)
Ballroom H
10:40am Scale-Invariant Intelligence Vin Sharma (Intel)
11:30am How Comcast Turns Big Data into Real-Time Operational Insights Patrick Shumate (Comcast Cable)
4:00pm Think Big Data is the Answer? Think Again. Matt Quinn (TIBCO Software, Inc.)
10:10am Morning Break sponsored by Intel
Room: Exhibit Hall
3:00pm Afternoon Break sponsored by SAP
Room: Exhibit Hall
12:10pm Plenary
Room: Exhibit Hall & Hyatt Santa Clara
Thursday Lunchtime BoF Tables
8:45am Plenary
Room: Mission City
Thursday Keynote Welcome Alistair Croll (Solve For Interesting), Roger Magoulas (O'Reilly Media)
8:50am Plenary
Room: Mission City
Big Data Moonshots and Ground Control Joe Hellerstein (UC Berkeley), Tutti Taygerly (Trifacta)
9:00am Plenary
Room: Mission City
Data Science and Smart Systems: Creating the Digital Brain Kaushik Das (Pivotal)
9:10am Plenary
Room: Mission City
A Datacenter OS for a Data-Rich Society Boyd Davis (Intel)
9:15am Plenary
Room: Mission City
How Companies are Using Spark, and Where the Edge in Big Data Will Be Matei Zaharia (Databricks)
9:25am Plenary
Room: Mission City
In-Hadoop Analytics: Bringing analytics to big data Anjul Bhambhri (IBM)
9:30am Plenary
Room: Mission City
TBC
9:40am Plenary
Room: Mission City
The Future Isn't What it Used to Be James Burke
4:50pm Plenary
Room: Mission City
Closing Keynotes
4:55pm Plenary
Room: Mission City
Record Linkage and Other Statistical Models for Quantifying Conflict Casualties in Syria Megan Price (Human Rights Data Analysis Group)
5:05pm Plenary
Room: Mission City
Keynote with Ben Fry Ben Fry (Fathom Information Design)
5:15pm Plenary
Room: Mission City
Survivorship Bias and the Psychology of Luck David McRaney (Author)
10:40am-11:20am (40m) Data Science
Probabilistic Programming: What, Why, How, and When
Beau Cronin (Salesforce)
Probabilistic programming is a new paradigm for modeling and inference that offers hope for a fundamental shift in our approach to understanding the stories behind our data. This talk will provide an overview of the systems currently available and their relative strengths, show examples of their usage, and offer a peek at the road ahead.
11:30am-12:10pm (40m) Data Science
Chicago Bars, Prisoner’s Dilemma, and Practical Models in Search
Chris Harland (Microsoft)
Predictive models are popular for their ability to grapple with massive data and bring to light features which are non-obvious to even the best domain experts. Solving practical problems with real world data involves creating models that balance predictive accuracy with practical significance. This talk provides examples of this balance in optimizing Chicago area bars and extends to Bing search.
1:30pm-1:50pm (20m) Data Science
Graph All The Things!11: Graph Database Use Cases That Aren't Social
Emil Eifrem (Neo Technology / Neo4j)
Recent years have seen an explosion of technologies for managing and analyzing graphs. While most people associate "graph" with "the social graph," there's a wide variety of non-social use cases for graph technologies. This session will explore graph adoption in finance, telecom, healthcare, HR & recruiting, gaming and beyond, using concrete case studies from actual graph production deployments.
1:50pm-2:10pm (20m) Data Science
The Last Mile: Challenges and Opportunities in Data Tools
Wes McKinney (Cloudera)
This talk will address some of the pressing problems in data preparation, analysis, visualization, and collaboration facing the modern data analyst. We will discuss the ways in which both programmatic and UI-driven tools are helping solve these problems and the areas in which more work and innovation are needed.
2:20pm-2:40pm (20m) Data Science
Agile Analytics
Neal Ford (ThoughtWorks)
Analytics and agility sometimes seem like natural enemies, but analytics suffer the same shifting requirements and uncertainty as other projects. This talk describes techniques for incorporating analytics and data science into an agile rhythm.
2:40pm-3:00pm (20m) Data Science
Movie Reconstruction from Brain Signals: "Mind-Reading"
Bin Yu (UC Berkeley)
In a thrilling breakthrough at the intersection of neuroscience and statistics, penalized Least Squares methods have been used to construct a "mind-reading" algorithm that reconstructs movies from fMRI brain signals.
4:00pm-4:40pm (40m) Data Science
MLbase: Distributed Machine Learning Made Easy
Ameet Talwalkar (Databricks) et al
Implementing and consuming Machine Learning techniques at scale are difficult tasks for ML Developers and End Users. MLbase (www.mlbase.org) is an open-source platform under active development addressing the issues of both groups. In this talk we will describe the high-level functionality of MLbase and demonstrate its *scalability* and *ease-of-use* via real-world examples.
10:40am-11:20am (40m) Data in Action
Beyond Hadoop MapReduce: Interactive Advertising Insights with Shark @ Yahoo!
Nandu Jayakumar (Yahoo! Inc./Stanford University) et al
Yahoo! ingests hundreds of TB of advertising data into Hadoop each day. This talk describes how we are building our next-generation data architecture on top of Shark and Spark that is orders of magnitude faster than the previous one. We will focus on the advanced streaming algorithms implemented in this new architecture, and how the new architecture has enabled deeper insights for our data scientists.
11:30am-12:10pm (40m) Data in Action
Open Source Big Data for Defense
Peter Wang (Continuum Analytics) et al
DARPA's XDATA program seeks to develop open source software to address government Big Data at all stages, from analysis to operations, in the areas of scalable analytics, processing, visualizations, and UIs. This new multi-year effort involves over 25 teams from academia, research labs, and small and large businesses, and includes efforts around Hadoop, Python, R, and other technologies.
1:30pm-2:10pm (40m) Hadoop and Beyond
Graph Analysis with One Trillion Edges on Apache Giraph
Avery Ching (Facebook)
Analyzing graphs can lead to useful insights that drive product and business decisions. This talk describes our efforts at Facebook to scale Apache Giraph to very large graphs (up to one trillion edges) and how we run Apache Giraph in production. We will also talk about how to build applications, some of the algorithms that we have implemented, and their use cases.
2:20pm-3:00pm (40m) Data in Action
Socializing Search. Professionally.
Sriram Sankar (LinkedIn) et al
Social networks bring a new dimension to search. Instead of looking for web pages, users search a world of entities connected by a rich graph of relationships. Serving billions of deeply personalized searches creates unique infrastructure and relevance challenges for LinkedIn. We'll describe how we've addressed those challenges and discuss implications of social networks for the future of search.
4:00pm-4:40pm (40m) Data in Action
Real-time Analytics with Open Source Technologies
Fangjin Yang (Metamarkets) et al
The maturation and development of open source technologies has made it easier than ever for companies to derive insights from vast quantities of data. In this session, we will cover how to build a real-time analytics stack using Kafka, Storm, and Druid. This combination of technologies can power a robust data pipeline that supports real-time ingestion and flexible, low-latency queries.
10:40am-11:20am (40m) Hadoop and Beyond
Spreadsheets: The Dark Matter of Big Data
Felienne Hermans (Delft University of Technology)
Spreadsheets are used extensively in industry: they are the number one tool for financial analysis. But they are as easy to build as they are difficult to analyze, maintain and check. Felienne’s research aims at developing methods to help spreadsheet users understand, update and improve spreadsheets.
11:30am-12:10pm (40m) Hadoop and Beyond
The Next Wave of SQL-on-Hadoop: Building a Virtual EDW on Native Hadoop Data
Marcel Kornacker (Cloudera, Inc.)
Learn how and why it is now possible for Apache Hadoop to serve as a virtual Enterprise Data Warehouse (EDW) framework for native Big Data (stored in HDFS) - making it no longer necessary to move that data into the EDW at great expense simply for analysis. In this session, attendees will get an architect-level view of the solution and explore an example configuration and benchmark numbers.
1:30pm-2:10pm (40m) Hadoop and Beyond
Querying Petabytes of Data in Seconds
Reynold Xin (Databricks) et al
BlinkDB is an approximate query engine that answers queries in seconds on extremely large datasets by leveraging data sampling. It exploits advances in machine learning and distributed query processing to allow trading off response times and accuracy. BlinkDB is being integrated into Shark and Presto. We will cover real world use case scenarios of BlinkDB at adopters such as Facebook.
2:20pm-2:40pm (20m) Hadoop and Beyond
Apache Mesos as an SDK for Building Distributed Frameworks
Paco Nathan (Databricks)
Google "Omega" research: 80% of cluster jobs are batch, 60% of cluster resources go to services. Batch is simple, services are hard, and mixing workloads is key to building efficient distributed apps. This talk examines case studies of Mesos workloads, ranging from Twitter (100% on prem) to Airbnb (100% cloud). How did they leverage "data center OS" building blocks for orders of magnitude gains at scale?
2:40pm-3:00pm (20m) Hadoop and Beyond
One Size Does Not Fit All: Analyzing Data at Scale with AWS
Rahul Pathak (Amazon Web Services)
Learn how AWS thinks about big data and how we and our customers have approached managing large datasets using services such as Amazon S3, Amazon Elastic MapReduce, Amazon DynamoDB, and Amazon Redshift.
4:00pm-4:40pm (40m) Hadoop and Beyond
Making Data Move: Stream Processing in Go
Matvey Arye (Princeton University/Cloudflare) et al
Big data is evolving. The state of the art has gone from running large batch queries over static data sets updated rarely to handling high-velocity data with low processing latency. In this session we present a new data framework that is geared toward processing data with a very high update frequency. The framework utilizes the Go language's advanced concurrency primitives and extensibility.
10:40am-11:20am (40m) Ethics, Policy, and Privacy
Data Journalism - Organized Crime and Corruption Reporting
Drew Sullivan (Organized Crime and Corruption Reporting Project)
Endemic organized crime, augmented by corrupt governments and business interests, can threaten local and regional security throughout the world. In this session we'll show how journalists can use data and technology to ferret out, investigate and combat corruption.
11:30am-12:10pm (40m) Ethics, Policy, and Privacy
Government Data on Both Sides of the Bridge
Jesse Robbins (OnBeep, Inc.) et al
Join Kiran Jain, the Senior Deputy Attorney for the City of Oakland, and Shannon Spanhake, the Deputy Innovation Officer for the City and County of San Francisco, to learn how governments are changing, and being changed, by the digital age.
1:30pm-2:10pm (40m) Data Science
Unboxing Data Startups
Michael Abbott (Kleiner Perkins Caufield & Byers)
Everyone knows that massive, real-time data processing is behind many of the hottest new companies in technology. But what’s really going on underneath the covers? In this session, investor and technology entrepreneur Michael Abbott unboxes three startups to look at the technology, architecture, and innovations they’ve harnessed to deliver their products and services.
2:20pm-3:00pm (40m) Ethics, Policy, and Privacy
Data for Good
Jake Porway (DataKind) et al
In this session, Edgeflip and Data Science for the Social Good’s Rayid Ghani, IA Ventures Scientist-in-residence and DataKind co-founder Drew Conway, and DataKind co-founder and executive director Jake Porway look at where data is making a difference today, what it promises tomorrow, and what’s holding it back.
4:00pm-4:40pm (40m) Ethics, Policy, and Privacy
The Great Debate: Technology Creates More Jobs than it Destroys
Jim Stogdill (O'Reilly Media, Inc.) et al
The always-popular Great Debate series returns to Strata. In this Oxford-style debate, two teams take opposing positions. We poll the audience, and the teams try to sway opinions. It’ll be a fast-paced, sometimes irreverent look at some of the core challenges of putting data to work.
10:40am-11:20am (40m) Machine Data
Machine Learning for Machine Data
David Andrzejewski (Sumo Logic)
Organizations of all types and sizes are experiencing an explosion of machine log data whose literally inhuman diversity and scale overwhelms traditional analysis tools and techniques. We will discuss how machine learning can complement human expertise, enabling the extraction of valuable and actionable insights from log data.
11:30am-12:10pm (40m) Machine Data
Big Industrial Internet Data: Connecting and Optimizing at New Scales
Steven Gustafson (GE Global Research) et al
This presentation will introduce Big Data in the context of the Industrial Internet, describe some of the unique software and analytics opportunities, and present several current research topics making the Industrial Internet a reality.
1:30pm-2:10pm (40m) Machine Data
Big Data for Big Power: Smart Meters ≠ Smart Grid
Brett Sargent (LumaSense Technologies Inc.)
Smart meters may be the most visible element of the so-called smart grid, but how smart is it if the plants producing the energy are dumb? To ensure the integrity of the grid, every stage of our electrical power infrastructure – including generation, transmission and distribution – has to get “smart.” Sophisticated sensors connected to big data analytics are key to keeping the power flowing.
2:20pm-3:00pm (40m) Machine Data
Big Data for Better Data Centers
Krishna Raj Raja (CloudPhysics) et al
In this talk we discuss the challenges associated with data center operations management and provide details on how the CloudPhysics big data platform solves these problems and enables new capabilities that were not previously possible.
4:00pm-4:40pm (40m) Machine Data
Driving the Future of Smart Cities - How to Beat the Traffic
Ian Huston (Pivotal) et al
With increased road congestion around the globe and growing amounts of car data, we need more intelligent analytical methods to beat the traffic. This talk presents our work on traffic velocity and travel disruption analytics. We describe our approach in detail, how we went from idea to implemented algorithm, and how our methods can be applied to gain deep insight into influential factors.
10:40am-11:20am (40m) Design
Expressing Yourself in R
Hadley Wickham (Rice University / RStudio)
A well-designed domain specific language makes all parts of the data science process easier. In this talk I'll discuss two DSLs implemented in R that make data manipulation and visualisation both easier to describe and faster to compute.
11:30am-12:10pm (40m) Design
The IPython Notebook: Get Close to Your Data with Python and JavaScript
Brian Granger (Cal Poly San Luis Obispo)
The IPython Notebook is an open-source, web-based interactive computing environment that enables users to create documents that combine live code and data with text, equations, plots and HTML. In this talk I will describe a new interactive widget architecture for the Notebook that allows the seamless integration of JavaScript (d3.js,...) and Python for data exploration and visualization purposes.
1:30pm-2:10pm (40m) Design
The Urgent Need to Appify Big Data
Ryan Cunningham (ClipCard)
We're failing at big data, and bigger technology isn't helping. Complex infrastructure shouldn't justify complicated experiences. Let's apply the principles of consumer app culture to enterprise decision-making in a way that goes beyond dashboards. Let's use design thinking and metadata to connect people to information in a world where complexity is inevitable and technology alone is insufficient.
2:20pm-3:00pm (40m) Design
Session with Ben Fry
Ben Fry (Fathom Information Design)
Ben Fry, Principal, Fathom
4:00pm-4:40pm (40m) Design
StatusWolf: Creating Dashboards That Don't Suck Using Art and Engineering
Mark Troyer (Box, Inc.)
You might think that art has nothing to do with dashboarding - dealing with your data architecture is an engineering/operations problem, right? On the contrary, understanding how to deal with your data in a way that is consumable by humans is fundamentally a design problem. Learn how art and design influenced the process for developing a new dashboarding tool called StatusWolf.
10:40am-11:20am (40m) Sponsored
The Inflection Point - Hadoop and Big Data Analytics
Anjul Bhambhri (IBM)
The Inflection Point - Hadoop and Big Data Analytics
11:30am-12:10pm (40m) Sponsored
Enabling Business Transformation with Analytics over Real-time Streaming Data
Anand Venugopal (Impetus Technologies Inc.) et al
This session will address the exciting possibilities of bringing dramatic improvements to various industry verticals using big data analytics, especially real-time analytics over high-volume data in motion.
1:30pm-2:10pm (40m) Sponsored
Apache Hive & Stinger: Petabyte Scale SQL, IN Hadoop
Owen O'Malley (Hortonworks) et al
Apache Hive is the de-facto standard for SQL-in-Hadoop today, with more enterprises relying on this open source project than on any alternative. Enterprises have asked for Hive to become more real-time and interactive, and the Hive community has responded.
2:20pm-3:00pm (40m) Sponsored
NonStop HBase - Making HBase Continuously Available for Enterprise Deployment
Jagane Sundar (WANdisco)
Application of the Paxos Protocol Towards Building a Continuously Available HBase
4:00pm-4:40pm (40m) Sponsored
Real-Time Analytics with NewSQL: Why Hadoop is not enough
Raj Bains (Clustrix, Inc.)
NewSQL has followed quickly on the heels of NoSQL, providing the scale-out of NoSQL along with SQL and ACID guarantees. We'll discuss NewSQL with customer examples and contrast it with SQL on Hadoop implementations.
10:40am-11:20am (40m) Sponsored
Business Data Lake: An Evolution in Data Infrastructure
Jeffrey Kelly (The Wikibon Project)
Organizations are now moving beyond rigid and high latency data warehouse environments to more flexible and cost-effective "Data Lake(s)": centrally managed repositories that use low-cost technologies such as Hadoop, SQL, In-Memory, and others to land any and all data that might be valuable for analysis and to operationalize the resulting insight.
11:30am-12:10pm (40m) Sponsored
Lessons from the Trenches: edo Interactive Leverages Hadoop to Build Customer Loyalty
Rob Rosen (Pentaho) et al
edo Interactive shares how they drive agile, improved decision-making by complementing native Hadoop technologies with analytical databases and ETL optimization and data visualization solutions from vendors such as Pentaho.
1:30pm-2:10pm (40m) Sponsored
FAST and FURIOUS – Big Data Analytics Meets Hadoop
Wayne Thompson (SAS) et al
In a world of ever-growing data volumes, how do you extract insight, trends and meaning from all that data in Hadoop? Getting relevant information in seconds (instead of hours or days) from big data requires a different approach. Join Paul Kent and Wayne Thompson from SAS as they share how to reveal insights in your big data and redefine how your organization solves complex problems.
2:20pm-3:00pm (40m) Sponsored
Are We Data Scientists or Data Janitors?
Nenshad Bardoliwalla (Paxata, Inc.)
Join Paxata’s Nenshad Bardoliwalla for a look at the new breed of data preparation tools that use semantic algorithms to detect data types, apply machine learning to find hidden patterns, and link related columns of data automatically.
4:00pm-4:40pm (40m) Sponsored
Making Choices: What Kind of Relationship are You Seeking with Your Database?
J.R. Arredondo (Rackspace)
We will discuss Rackspace’s vision for Data-as-a-Service, and provide a few key questions that could help you complement your technical analysis when choosing a database service. Along the way, we will also discuss parts of the portfolio of data services available at Rackspace, including SQL, MongoDB, Redis and Hadoop-based solutions.
10:40am-11:20am (40m) Sponsored
Scale-Invariant Intelligence
Vin Sharma (Intel)
In this session, I will illustrate these architectures with real-world examples of city governments, retail banks, food manufacturers, pharmaceutical companies, and Intel itself applying intelligence wherever data lives.
11:30am-12:10pm (40m) Sponsored
How Comcast Turns Big Data into Real-Time Operational Insights
Patrick Shumate (Comcast Cable)
How Comcast Turns Big Data into Real-Time Operational Insights
1:30pm-2:10pm (40m) Sponsored
The Need for Speed & Scale: A Database for Real-Time Analytics
Eric Frenkiel (MemSQL)
In this session, MemSQL CEO Eric Frenkiel will discuss the benefits for companies that augment their existing information architecture with a versatile real-time database platform to handle high volume and velocity transactional and analytical workloads.
2:20pm-3:00pm (40m) Sponsored
Hadoop From Batch Data Processing to Real-time Query and Streaming Platform
Peter Sirota (Amazon Web Services)
Learn from the Amazon Elastic MapReduce team's recent experience with streaming services such as Amazon Kinesis and low-latency query engines like Impala and Phoenix. We'll clarify many of the implementation details of our Hadoop InputFormat for Amazon Kinesis and demonstrate the power and flexibility of applying existing Hadoop ecosystem technologies to the real-time data paradigm.
4:00pm-4:40pm (40m) Sponsored
Think Big Data is the Answer? Think Again.
Matt Quinn (TIBCO Software, Inc.)
Big Data is really a small data mindset. At the enterprise-level, where the potential for data collection is greatest, companies are still stuck compartmentalizing data. TIBCO CTO Matt Quinn will share how the world’s leading sports teams, airlines, banks and retailers are those that change their Big Data mindset to an All Data one.
10:10am-10:40am (30m)
Break: Morning Break sponsored by Intel
3:00pm-4:00pm (1h)
Break: Afternoon Break sponsored by SAP
12:10pm-1:30pm (1h 20m)
Thursday Lunchtime BoF Tables
Birds of a Feather (BoF) sessions are informal roundtable discussions happening during lunch on Wednesday, February 12 and Thursday, February 13. You can join any BoF table or start your own with a topic of your choice. The BoF sign-up board will be near the Registration area.
8:45am-8:50am (5m)
Thursday Keynote Welcome
Alistair Croll (Solve For Interesting) et al
Strata Program Chairs, Alistair Croll and Roger Magoulas, welcome you to the second day of keynotes.
8:50am-9:00am (10m)
Big Data Moonshots and Ground Control
Joe Hellerstein (UC Berkeley) et al
If Big Data is the grand challenge of our time, most analytic effort is like ground control: the hard work behind the scenes that enables ambitious analysis to occur.
9:00am-9:10am (10m) Sponsored
Data Science and Smart Systems: Creating the Digital Brain
Kaushik Das (Pivotal)
The emerging Internet of Things (IoT) enables us to build smart systems. We already have the sensory and motor parts of these systems available, but we don't have the brain. This is where data science comes into the picture! I will talk about how we are using big data technologies in conjunction with data science here at Pivotal to build the digital brain that makes a system smart.
9:10am-9:15am (5m) Sponsored
A Datacenter OS for a Data-Rich Society
Boyd Davis (Intel)
At Intel, we envision a future in which every organization in the world can use new sources of data to enhance its operational intelligence, fostering discoveries and innovation in science, industry, and medicine.
9:15am-9:25am (10m)
How Companies are Using Spark, and Where the Edge in Big Data Will Be
Matei Zaharia (Databricks)
While the first big data systems made a new class of applications possible, organizations must now compete on the speed and sophistication with which they can draw value from data. Future data processing platforms will need not just to scale cost-effectively, but also to allow ever more real-time analysis and to support both simple queries and today's most sophisticated analytics algorithms.
9:25am-9:30am (5m) Sponsored
In-Hadoop Analytics: Bringing analytics to big data
Anjul Bhambhri (IBM)
Big Data without analytics is just data, but how do you perform the analytics? In this session, learn how In-Hadoop analytics is changing the game for the possibilities of Hadoop.
9:30am-9:40am (10m)
Plenary
To be confirmed
9:40am-10:10am (30m)
The Future Isn't What it Used to Be
James Burke
Keynote by James Burke, science and technology historian, futurist, and author.
4:50pm-4:55pm (5m)
Closing Keynotes
Strata Program Chairs, Roger Magoulas and Alistair Croll, welcome you to the Strata Closing Keynotes.
4:55pm-5:05pm (10m)
Record Linkage and Other Statistical Models for Quantifying Conflict Casualties in Syria
Megan Price (Human Rights Data Analysis Group)
How do we know how many people have been killed in Syria? Or whether violence is escalating or decreasing? The hard answer is we don't. But through careful application of machine learning and other statistical techniques, we can quantify what we do, and don't, know.
5:05pm-5:15pm (10m)
Keynote with Ben Fry
Ben Fry (Fathom Information Design)
Ben Fry, Principal, Fathom
5:15pm-5:35pm (20m)
Survivorship Bias and the Psychology of Luck
David McRaney (Author)
David McRaney will tell the story of how the Department of War Math in World War II helped bring to light the psychology of how we miss what is important when it comes to failure, and how the modern understanding of the psychology of luck provides the best game plan for getting the best out of life.