
Tuesday, 02/11/2014

9:00am

Data Science
Ballroom G
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Olivier Grisel (INRIA)
Average rating: ****.
(4.47, 17 ratings)
3-Hours: Hands on introductory workshop on Predictive Modelling and Machine Learning with open source tools from the Python community such as scikit-learn and IPython. Read more.
Hadoop and Beyond
GA Ballroom K
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Sameer Agarwal (UC Berkeley), Tathagata Das (University of California Berkeley), Ali Ghodsi (UC Berkeley), Ion Stoica (UC Berkeley), Ameet Talwalkar (UC Berkeley), Reynold Xin (Databricks), Matei Zaharia (Databricks), Joseph Gonzalez (UC Berkeley)
Average rating: ****.
(4.29, 7 ratings)
3-Hours: An introduction to the newest components of the open-source Berkeley Data Analytics Stack (BDAS) in development at UC Berkeley (and an overview of existing ones). BlinkDB is a SQL engine that provides fast approximate distributed query results. MLbase includes a library to make machine learning at scale easy. Tachyon is a file system that provides memory speed sharing across frameworks. Read more.
Data Science
Ballroom E
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Brian Granger (Cal Poly San Luis Obispo), Fernando Pérez (University of California at Berkeley)
Average rating: ****.
(4.44, 9 ratings)
3-Hours: IPython is an open source project that provides tools for interactive and parallel computing in Python. This includes the IPython Notebook, a web-based interactive computing environment that enables users to author documents that combine code, text, equations, figures and videos. This tutorial will provide a hands-on tour of the IPython Shell, Notebook and parallel computing architecture. Read more.
SOLD OUT
Hadoop and Beyond
Ballroom F
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
John Akred (Silicon Valley Data Science), Richard Williamson (Silicon Valley Data Science), Stephen OSullivan (Silicon Valley Data Science)
Average rating: ***..
(3.27, 22 ratings)
3-Hours: What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive and realtime analytical workloads. Read more.
Design
Room 204
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Michael Stringer (Datascope Analytics), Dean Malmgren (Datascope Analytics), Laurie Skelly (Datascope Analytics)
Average rating: ***..
(3.80, 5 ratings)
As with many other types of projects, the most crucial part of any data-oriented project is choosing an appropriate problem or opportunity to focus on in the first place. Read more.
Data Science
GA Ballroom J
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
John Foreman (MailChimp)
Average rating: ****.
(4.87, 15 ratings)
Data science algorithms (think machine learning, clustering, outlier detection) often get conflated with the industry-standard tools and programming languages that run them. In this tutorial, John Foreman will use only spreadsheets to build models from his book Data Smart to demonstrate exactly how data science techniques work step-by-step. Read more.
Hadoop and Beyond
Ballroom H
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Rich Raposa (Hortonworks)
Average rating: ****.
(4.30, 10 ratings)
This workshop provides a detailed discussion of the new features of Apache Hadoop 2.0. We will discuss how YARN turns Hadoop from a single use system for batch data processing into a multi-use platform for storing and processing data in many ways other than batch. We will also discuss the details of the new HDFS improvements like High Availability, Federation, and Snapshots. Read more.
Hardcore Data Science
Ballroom AB
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Average rating: ***..
(3.64, 14 ratings)
All-Day: Strata's regular data science track has great talks with real world experience from leading edge speakers. But we didn't just stop there—we added the Hardcore Data Science day to give you a chance to go even deeper. The Hardcore day will add new techniques and technologies to your data science toolbox, shared by leading data science practitioners from startups, industry, consulting... Read more.
Data Driven Business
Ballroom CD
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Average rating: ***..
(3.46, 13 ratings)
All-Day: For business strategists, marketers, product managers, and entrepreneurs, Data-Driven Business looks at how to use data to make better business decisions faster. Packed with case studies, panels, and eye-opening presentations, this fast-paced day focuses on how to solve today's thorniest business problems with Big Data. It's the missing MBA for a data-driven, always-on business world. Read more.

12:30pm


Lunch
Break (1h)

1:30pm

Data Science
Ballroom G
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Carlos Guestrin (GraphLab Inc.)
Average rating: ****.
(4.00, 10 ratings)
3-Hours: This tutorial will provide an introduction to modern machine learning. Attendees will learn how to leverage some of the most popular techniques used in fraud detection, social network analysis, and personalized recommendation services. Read more.
Hadoop and Beyond
GA Ballroom K
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Andy Konwinski (Databricks), Sameer Agarwal (UC Berkeley), Tathagata Das (University of California Berkeley), Ameet Talwalkar (UC Berkeley), Shivaram Venkataraman (UC Berkeley), Patrick Wendell (Databricks), Reynold Xin (Databricks), Matei Zaharia (Databricks), Joseph Gonzalez (UC Berkeley), Haoyuan Li (UC Berkeley)
Average rating: ***..
(3.10, 10 ratings)
3-Hours: Get hands-on training with the newest components of the open-source Berkeley Data Analytics Stack (BDAS). Lessons will cover BlinkDB, MLbase, Spark, Spark Streaming, and Shark. We will provide each audience member with an EC2 cluster and walk through hands-on exercises using these technologies to analyze real-world datasets. Read more.
Design
Ballroom E
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Scott Murray (University of San Francisco)
Average rating: ****.
(4.85, 13 ratings)
d3.js is a powerful tool for creating interactive charts on the web with data. But digging into D3 from scratch can make your head spin. This tutorial will take you from scattered to building your own working, interactive scatterplots in three hours. Read more.
Data Science
Ballroom F
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Vitaly Gordon (LinkedIn)
Average rating: ****.
(4.00, 3 ratings)
90-Minutes: Machine learning is software. As such, it should follow standard software engineering practices; however, the current tools of the trade are not modular, maintainable or reusable. In this tutorial we will learn to work with Scalding, a Scala DSL which provides both the simplicity of languages like Apache Pig and the power of a fully functional JVM language. Read more.
SOLD OUT
Hadoop and Beyond
Room 204
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Florian Leibert (Mesosphere), Paco Nathan (Databricks), Benjamin Hindman (Apache Mesos)
Average rating: ****.
(4.40, 5 ratings)
3-Hours: Mesos is a cluster manager that provides efficient resource isolation for distributed frameworks--much like Google's "Borg" for warehouse scale computing. We'll provide hands-on experience in how to build scalable, fault-tolerant data workflows atop Mesos. We'll use Chronos to orchestrate Hadoop jobs and other data prep, then use Marathon to launch a Rails + Redis app to serve results. Read more.
Data Science
GA Ballroom J
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Leland Wilkinson (Skytree)
Average rating: ***..
(3.50, 2 ratings)
3-Hours: Adviser is a new and unique statistics and machine learning application that provides a second opinion on the results of your analysis. It incorporates a full range of analytic methods plus an expert system that flags outliers, model misspecifications, and other anomalies. This workshop will illustrate its use in real data analyses for both novices and experts. Read more.
Hadoop and Beyond
Ballroom H
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Ronan Stokes (Cloudera)
Average rating: *....
(1.30, 20 ratings)
3-Hours: Apache HBase is a distributed, column-oriented, key-value store for Apache Hadoop (via integration with HDFS). In this tutorial, you will learn the basic elements of building a real-time application that uses Apache HBase as a persistent data store. Read more.

3:30pm

Data Science
Ballroom F
Tutorial Please note: to attend, your registration must include Tutorials on Tuesday.
Joe Hellerstein (Trifacta and UC Berkeley), Jeffrey Heer (Trifacta Inc. / Univ of Washington)
Average rating: ****.
(4.80, 10 ratings)
90-Minutes: Data analysts routinely report spending more time "wrangling" their data than performing analysis per se. In this tutorial we focus on the ever-present yet oft-overlooked challenges of Data Transformation, including discovery, structure, content and curation. We emphasize recent approaches that jointly emphasize interaction and inference, leveraging both human acuity and... Read more.

5:00pm

Exhibit Hall
Average rating: ***..
(3.75, 8 ratings)
Grab a drink, mingle with fellow Strata participants, and see the latest technologies and products from leading companies in the data space. Read more.

6:30pm

Mission City
Average rating: *****
(5.00, 2 ratings)
Once again at Strata, we’ll be inviting the best of the best to demonstrate their innovations at Startup Showcase. Read more.

Wednesday, 02/12/2014

8:45am

Mission City
Roger Magoulas (O'Reilly Media), Alistair Croll (Solve For Interesting)
Average rating: ***..
(3.75, 4 ratings)
Strata Program Chairs, Roger Magoulas and Alistair Croll, welcome you to the first day of keynotes. Read more.

8:50am

Mission City
Geoffrey Moore (Geoffrey Moore Consulting)
Average rating: ****.
(4.19, 21 ratings)
Crossing the Chasm has been a key reference point for high-tech marketing since its publication in 1990, but a lot has changed since then, especially with the rise of cloud computing, software as a service, mobile endpoints, big data analytics, and viral marketing. Read more.

9:05am

Mission City
Amr Awadallah (Cloudera, Inc.)
Average rating: **...
(2.95, 19 ratings)
In this talk Dr. Amr Awadallah will present the Enterprise Data Hub (EDH) as the new foundation for the modern information architecture. Built with Apache Hadoop at the core, the EDH is an extremely scalable, flexible, and fault-tolerant data processing system designed to put data at the center of your business. Read more.

9:10am

Mission City
John Schitka (SAP)
Average rating: **...
(2.35, 20 ratings)
Crowdsourcing can be an effective way to collect massive amounts of data to enable deeper analysis in many situations. Explore the foundational steps that can lead to successfully crowdsourcing data through the lenses of the International Barcode of Life and Technical University Munich (TUM) ProteomicsDB projects. SAP is proud to be involved with driving the success of both these projects. Read more.

9:15am

Mission City
Ramona Pierson (Declara)
Average rating: **...
(2.30, 20 ratings)
Humans are constantly curious and learning should be about making new discoveries. With big data, we have the potential to take formal learning which is taught and combine it with informal learning which is experienced, to create personalized learning paths for every individual. Read more.

9:25am

Mission City
John Schroeder (MapR Technologies)
Average rating: ***..
(3.22, 23 ratings)
This five-minute keynote will provide a quick overview of some of the more surprising things Hadoop is capable of in 5 minutes or less. Read more.

9:30am

Mission City
Farrah Bostic (The Difference Engine)
Average rating: ****.
(4.23, 26 ratings)
We feel safer in big numbers, and we believe that numbers don't lie. But numbers don't actually speak for themselves - people speak for them. Read more.

9:35am

Mission City
Quentin Clark (Microsoft)
Average rating: **...
(2.53, 19 ratings)
How does the world change when big data reaches a billion people? What happens when anyone, from farmers to criminal investigators, gains the power to quickly derive meaningful insights from vast and varied data sources? Join Quentin Clark, Microsoft Corporate Vice President, who will highlight how simple, familiar tools and cutting-edge cloud technologies are bringing big data to all. Read more.

9:45am

Mission City
David Epstein (Sports Illustrated)
Average rating: ****.
(4.89, 27 ratings)
The gap between legend and anonymity in elite sports is often less than a 1% difference in performance. Thus, finding the core, modifiable variables that determine performance and tweaking them ever so slightly can alchemize silver medals into gold ones. Read more.

9:55am

Mission City
Rodney Mullen (Almost Skateboards)
Average rating: ***..
(3.45, 22 ratings)
The better we tune our practice, the more practice will make perfect. Read more.

10:40am

Sponsored
Ballroom G
Eli Collins (Cloudera)
Average rating: *....
(1.50, 2 ratings)
In this talk, we'll explore how Apache Hadoop has rapidly evolved to become the new foundation for enterprise analytics - the enterprise data hub - and learn about the state-of-the-art in deploying a modern data warehouse on top of the Hadoop stack. Read more.
Design
GA Ballroom K
Michael Conover (LinkedIn)
Average rating: ****.
(4.42, 12 ratings)
A core element of product innovation and successful predictive modeling, information visualization plays a central role in effective data processing pipelines. In this talk, we will explore how the technologies and workflow patterns used by LinkedIn data scientists can be applied to analytical challenges found across a wide variety of problem domains. Read more.
Ari Gesher (Palantir Technologies)
Average rating: ***..
(3.25, 4 ratings)
Statistical methods tend to fail when there is someone on the other side of a problem actively evading detection. Here we look at three systems successfully used to fight adaptive adversaries engaged in fraud and cyber attacks. Using a combination of big data techniques and interactive human analysis, these systems are protecting commercial banks, pharmaceutical companies, and governments. Read more.
Sponsored
Ballroom F
Bryan Hurd (Microsoft Cybercrime Center), Herain Oberoi (Microsoft)
Average rating: **...
(2.50, 2 ratings)
BotNets and cybercrime are by their very nature Big Data problems. The Microsoft Cybercrime Center is working in conjunction with law enforcement, public sector, commercial and academic partners to investigate, disable and prosecute cyber criminals... Read more.
Hadoop and Beyond
GA Ballroom J
Edd Dumbill (Silicon Valley Data Science)
Average rating: ***..
(3.76, 17 ratings)
A maze of twisty databases, all of which look the same, and each of which claims to be best for the job. Welcome to the world of choosing big data vendors. In this session we'll map out the data tool landscape, and lay out a framework to help you choose a solution, or elect to build one yourself. Read more.
Sponsored
Ballroom H
Mike Gualtieri (Forrester Research)
Average rating: ***..
(3.67, 3 ratings)
Mike Gualtieri, principal analyst at Forrester Research, Inc., will facilitate a panel of production Hadoop users – including Cisco, The Climate Corporation, The Rubicon Project, and Solutionary – to discuss the challenges and best practices for deploying Hadoop in production. Join us for an engaging conversation on tips and tricks in deploying Hadoop in production. Read more.
Data Science
Ballroom AB
Chris Re (Stanford University)
Average rating: ****.
(4.18, 11 ratings)
A new generation of data processing systems, including web search, Google's Knowledge Graph, IBM's Watson, and several different recommendation systems, combine rich databases with software driven by machine learning. This talk describes our recent thoughts on one crucial pain point in the construction of trained systems: feature engineering. Read more.
Data in Action
Ballroom CD
David Epstein (Sports Illustrated)
Average rating: ****.
(4.89, 9 ratings)
Epstein explains the origins of the "magic number," how it should be used, and how it is often misused in a manner that hinders performance science (and leads sports executives to overlook simple but important data) as well as the development of athletes. Read more.
Connected World
Mission City M
Bob Filbin (Crisis Text Line)
Average rating: ****.
(4.25, 4 ratings)
The measure of success for a data scientist is not number of insights, but impact on co-workers' behavior. Moving from insight to action requires an art underutilized by the data science community: storytelling. I will cover techniques including the Fogg model, loss aversion, and minimum viable stories, using examples of my failures and successes in driving behavioral change with data. Read more.

11:30am

Sponsored
Ballroom G
Bill Franks (Teradata Corporation)
Average rating: ***..
(3.50, 6 ratings)
Attend this session to learn how you can take advantage of the new economics of data. This session will present examples of how leading organizations are evolving their enterprise data architectures to bring together the Data Warehouse, Hadoop & Data Discovery Platforms so All Users can benefit from ALL Analytics on ALL Data. Read more.
Design
GA Ballroom K
Justin Langseth (Zoomdata, Inc.), Eva Andreasson (Cloudera)
Average rating: ***..
(3.75, 4 ratings)
Storing massive data is one challenge. Making it useful throughout all levels of a company in real time is quite another. The ability to intuitively sort, sift and analyze data through touch and gesture is here. We will review several case studies of how companies are creating intuitive, data-driven cultures through Cloudera Search, leveraging Impala coupled with Zoomdata visualization. Read more.
Fernand Pajot (Change.org)
Average rating: ***..
(3.00, 3 ratings)
With more than 45 million users and over 40,000 petitions created every month, Change.org is the biggest online platform for social change around the world. This talk is about how both bleeding edge and simple machine learning algorithms are used at Change.org to connect users to petitions and social issues which are most relevant to them. Read more.
Sponsored
Ballroom F
Yuvaraj Athur Raghuvir (SAP Labs LLC.)
To seize the future, data must be harnessed in actionable time. Based on a real deployment, see how to achieve instant results with infinite storage: filter large amounts of cold data in Hadoop, analyze in real time with SAP HANA, and visualize using SAP Lumira. Learn how solutions from SAP and our Hadoop partners can help your organization seize the future and gain unprecedented insight from Big Data. Read more.
Hadoop and Beyond
GA Ballroom J
Joseph Adler (LinkedIn, Inc.), Xin Fu (LinkedIn Corporation), Bee-Chung Chen (LinkedIn, Inc.)
Average rating: ****.
(4.00, 17 ratings)
This talk describes how LinkedIn's engineering, data science, and reporting teams work together to develop, test, and rank new insights, recommendations, and updates shown on our home page stream. Read more.
Sponsored
Ballroom H
Ben Werther (Platfora), Sanjay Mathur (Silicon Valley Data Science)
Average rating: ***..
(3.67, 3 ratings)
Join us as we discuss the real-world applications of big data, examine what's working and what isn't, and discuss why you don't need to boil the big data ocean with Hadoop. Read more.
Data Science
Ballroom AB
Olivier Grisel (INRIA)
Average rating: ***..
(3.86, 7 ratings)
IPython and scikit-learn offer a nice environment for interactive data analytics in general and predictive modelling in particular. This presentation will give an overview of how to use both to perform tasks such as distributed model parameter tuning and parallel training of Random Forests on ad hoc compute clusters provisioned in the cloud. Read more.
Data in Action
Ballroom CD
Perry Samson (University of Michigan)
Average rating: ****.
(4.67, 6 ratings)
What if students could be provided helpful feedback in real-time based on the notes they are typing in class? This talk presents a prototype that has been in use in multiple courses at the University of Michigan to both challenge students' understanding based on the words they type in class and offer resources for further study. Read more.
Connected World
Mission City M
Max Shron (Shron & Co.)
Average rating: **...
(2.71, 14 ratings)
Why have powerful tools if you aren't asking the right questions? Good questions trump shiny tools, but our community has done little to improve how we train people in the "soft side" of data science. We will show how to borrow ideas from design, the humanities, and consulting practices to structure problems and improve the questions we ask of our data. Read more.

12:10pm

Exhibit Hall & Hyatt Santa Clara
Average rating: *****
(5.00, 1 rating)
Birds of a Feather (BoF) sessions are informal roundtable discussions happening during lunch on Wednesday, February 12 and Thursday, February 13. You can join any BoF table or start your own with a topic of your choice. The BoF sign-up board will be near the Registration area. Read more.

1:30pm

Sponsored
Ballroom G
Justin Makeig (MarkLogic)
Average rating: ***..
(3.00, 1 rating)
Most data centers are filled with rigid data servers that are tightly linked to specific applications, leading to data duplication, lengthy development cycles, and unnecessary costs. Learn how you can use the MarkLogic Enterprise NoSQL database platform to help create a flexible, agile data fabric that will allow you to iterate your application development, optimize your data, and reduce costs. Read more.
Design
GA Ballroom K
Justin Langseth (Zoomdata, Inc.), Gus Hunt (CIA)
Average rating: **...
(2.75, 12 ratings)
The true power of big data will be realized when average people can use complex analytics to solve everyday problems. We will describe a future engagement model derived from work in the Intelligence Community, reviewing real-world use cases showing how user-centric design is transforming big data from a science requiring specialists to elegant visualizations that deliver insight to average users. Read more.
Scott Lee (EMC), Rachel Haines (EMC)
Average rating: **...
(2.00, 6 ratings)
Earlier Data Governance generations (that support BI-DW or MDM) succeeded by aligning stakeholders and improving data interoperability. But in the world of Big Data, interoperability is table-stakes, and next-gen Data Governance must provide contextual intelligence sufficient to reason out complex inquiries across diverse data. How? Would you believe a mash-up of building codes and game theory? Read more.
Sponsored
Ballroom F
Harold Hannon (SoftLayer Technologies)
The cloud provides an easy onramp to building and deploying Big Data solutions, particularly the latest technologies that favor scale-out architectures. Transitioning from initial deployment to a large-scale, highly performant operation without breaking the bank may not be easy. Read more.
Hadoop and Beyond
GA Ballroom J
Soam Acharya (Altiscale), Charles Wimmer (Altiscale), David Chaiken (Altiscale)
Average rating: **...
(2.33, 6 ratings)
The growing popularity of Hadoop has led to an increasing number of clusters worldwide. Priming these clusters with data from existing client repositories is difficult due to a number of issues including data size, network constraints, security & lack of domain knowledge. In this talk, we present a number of techniques & best practices for uploading large amounts of data to remote Hadoop clusters. Read more.
Sponsored
Ballroom H
Damon Cool (Evernote), John Santaferraro (Actian Corporation)
Average rating: **...
(2.33, 9 ratings)
In 2012, Evernote took proactive steps to prepare for a rapidly expanding customer base by making the transition from 18-hour queries on a MySQL server to ad hoc analytics for 200 million daily events, all while on a budget. This session explains how Evernote is scaling to hundreds of terabytes and analyzing 200 million events per day using a two-tier architecture that includes Hadoop and an analytic platform. Read more.
Data Science
Ballroom AB
Adam Marcus (Locu / GoDaddy)
Average rating: ****.
(4.50, 6 ratings)
Machine learning and paid crowdsourcing power several virtuous cycles in Locu's data processing pipeline. To solve various problems, we interact with hundreds of long-term crowd workers on oDesk and tens of thousands of shorter-term workers on CrowdFlower. Come learn about Locu's magic with examples based on problems we solve every day. Read more.
Data in Action
Ballroom CD
Eric Pugh (OpenSource Connections)
Average rating: ****.
(4.25, 4 ratings)
The United States Patent and Trademark Office wanted a simple, lightweight, yet modern and rich discovery interface for Chinese patent data. This is how we did it. Read more.
Connected World
Mission City M
Moderated by:
Michael Dauber (Battery Ventures)
Panelists:
Renee DiResta (OATV), Jake Flomenberg (Accel Partners), Matthew Ocko (Data Collective), Ross Fubini (Canaan Partners)
Average rating: ****.
(4.25, 4 ratings)
A group of VCs who invest from very early through later stages talk about all things Big Data. There will be no “3 Vs” discussion here. The panelists are committed to making this a lively discussion about topics ranging from the typical (what sectors do they want to invest in?) to the atypical (what’s out there that they don’t like?). Read more.

1:50pm

Design
GA Ballroom K
Leo Meyerovich (Graphistry / UC Berkeley)
Average rating: ***..
(3.67, 6 ratings)
Visualization is a weak link in big data tools: shoving 1MM rows into standard charts breaks their visual design and kills interactivity. In our mission to scale charts, we built the Superconductor language. It automatically compiles declarative visualizations into GPU code (WebCL+WebGL). This talk will explore how we're redesigning and optimizing core charts like heat maps and line graphs. Read more.
Hadoop and Beyond
GA Ballroom J
Adam Fuchs (Sqrrl)
Average rating: ***..
(3.57, 7 ratings)
Apache Accumulo has evolved from a niche government project to a key component of the Hadoop ecosystem with adopters across a variety of industries. One important differentiator for Accumulo is the concept of "cell-level security." Learn how to properly implement cell-level security concepts from the former technical director of the Accumulo project at NSA. Read more.
Data Science
Ballroom AB
Lukas Biewald (CrowdFlower)
Average rating: ***..
(3.83, 6 ratings)
Data scientists know how hard it is to collect, categorize and label vast amounts of data. But some smart data scientists are effectively leveraging the human intelligence of the crowd to solve these problems, resulting in better training of machine learning models and improved system performance. Read more.

2:20pm

Sponsored
Ballroom G
Ben Redman (Citus Data)
Average rating: *****
(5.00, 2 ratings)
PostgreSQL is an advanced open source database known for its reliability. It also features a rich extension ecosystem that enables features like semi-structured data types, new SQL operators, and a columnar data store. This talk examines extensions available to PostgreSQL users and how CitusDB turns PostgreSQL into a scalable data platform for addressing real world analytics problems. Read more.
Design
GA Ballroom K
Ian Timourian (Paxata)
Average rating: ****.
(4.33, 3 ratings)
Happy accidents can influence one's creative process. Ian Timourian will discuss his exploration of the algorithms and techniques utilized by the famous poet Gertrude Stein through visualization. Read more.
Pablos Holman (Turing AI)
Average rating: ****.
(4.40, 5 ratings)
We are at the beginning of creating a generation of scientists & analysts who can relate to data in entirely new ways. The feeble computational models we’ve created in Excel over the course of our lives are fundamentally different than what is just becoming possible. Read more.
Sponsored
Ballroom F
Average rating: ****.
(4.00, 3 ratings)
The real promise of big data isn’t about merely doing analytics cost-effectively and at scale; it’s about discovery. Data discovery means uncovering hidden patterns from disparate sources without needing to know which questions to ask or the data relationships in advance... Read more.
Hadoop and Beyond
GA Ballroom J
Rachel Poulsen (Silicon Valley Data Science), John Akred (Silicon Valley Data Science)
Average rating: ***..
(3.43, 7 ratings)
Design of Experiments (DOE) is a scientific approach to understanding causality using data collection and applied statistical techniques. Through a series of relevant case studies, this session will review the “design” and the “experiment” side of DOE, including systematic data collection and basic statistical applications, and discuss relevant applications beyond A/B testing websites. Read more.
Sponsored
Ballroom H
Bruno Aziza (Alpine Data Labs)
Average rating: *....
(1.00, 2 ratings)
In this panel discussion, we’ll hear from entertainment, healthcare, and media industry leaders as they discuss their strategy to demystify analytics end to end. We’ll have a question and answer session moderated by Alpine Data Labs. Read more.
Data Science
Ballroom AB
Marc Smith (Connected Action Consulting Group)
Average rating: **...
(2.67, 3 ratings)
SNA, social network analysis, is a powerful technique for making sense of a connected world. But the skills needed to collect, analyze, visualize, and gain insights into collections of connections are hard to find. Now, new tools make networks as easy to manage as a pie chart. Using the familiar Excel spreadsheet, NodeXL enables end users to gain insights into Twitter, Facebook & more. Read more.
Add to your personal schedule
Data in Action
Ballroom CD
Yann Ramin (Twitter, Inc.)
Average rating: **...
(2.68, 25 ratings)
Twitter's Observability stack collects, processes, monitors and visualizes over 170 million real-time time series from all service and system components. This session covers how the stack is built and scales to enable developers and reliability engineers to build fault-tolerant distributed services. In this talk, you will learn what works and what doesn’t, from architecture to implementation. Read more.
Add to your personal schedule
Connected World
Mission City M
Kai Trepte (Harvard Clean Energy Project)
Average rating: *****
(5.00, 1 rating)
The present fossil-fuel-based economy must give way to a renewable-energy-based future. The Harvard Clean Energy Project set out to discover new molecular materials for the next generation of organic solar cells. In studying 2.3 million (m) compounds with 24m conformers in 150m density functional theory calculations, this Big Data project will benefit mankind by aiding the quest for clean energy. Read more.

2:40pm

Add to your personal schedule
Hadoop and Beyond
GA Ballroom J
Patrick McFadin (Datastax)
Average rating: ****.
(4.67, 3 ratings)
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. This talk will give an overview of the many ways you can be successful. Read more.
Add to your personal schedule
Data Science
Ballroom AB
Ted Willke (Intel)
Average rating: ****.
(4.11, 9 ratings)
Graph analytics promises to uncover new patterns in big data - but it's not easy to use commercially. Why is it so tough for data scientists to construct graphs and extract insight? This talk discusses Intel's efforts to deliver a graph cluster solution that is as easy to work with as it is powerful. Read more.

4:00pm

Add to your personal schedule
Sponsored
Ballroom G
Mohit Sati (Ask.com)
Average rating: ****.
(4.00, 1 rating)
Search Engine Marketing is an important revenue opportunity for Ask.com, planned to nearly double in 2014. Fueled by growth and acquisitions such as About.com and Investopedia, the keyword portfolio will grow by 90x through 2014. SEM Analytics at Ask.com involves tens of millions of cost metrics stored daily, hundreds of millions of portfolio keywords, and billions of historical costs. Read more.
Add to your personal schedule
Design
GA Ballroom K
Brian Abelson (CSV Soundsystem), Thomas Levine (csv soundsystem)
Average rating: ***..
(3.00, 1 rating)
We have developed some open-source tools for building and scaling systems for realtime data analysis with data music videos and data gastronomification. We'll discuss the theory behind these two data analysis methods, and then we'll present case studies on how our tools are used to enable business analytics and instill a data-driven culture. Read more.
Add to your personal schedule
Cameran Hetrick (VMware), Kimberly Stedman (Freelance)
Average rating: ****.
(4.60, 5 ratings)
Combine your best algorithms and smartest data architecture, and what do you get? Without humans, you have an expensive, high tech brick. Humans generate data, which is used by and for humans to achieve human goals. If you want your data department to earn its keep by showing real value, you must build your social systems as meticulously as you build your pipeline. Read more.
Add to your personal schedule
Sponsored
Ballroom F
Mike Wendt (Accenture Technology Labs)
Average rating: ****.
(4.50, 2 ratings)
In this session, we will share the results of our study, a price-performance comparison of a bare-metal Hadoop cluster and cloud-based Hadoop clusters. Read more.
Add to your personal schedule
Hadoop and Beyond
GA Ballroom J
Shrikanth Shankar (Qubole Inc.)
Average rating: ****.
(4.25, 4 ratings)
Shrikanth Shankar, Qubole’s VP of Engineering, shares his best practices for building high-performance, scalable queries and deploying User Defined Functions (UDFs) to Big Data applications in Apache Hive. For data analysts and data scientists in the trenches, this is a key session to attend. Read more.
Add to your personal schedule
Sponsored
Ballroom H
Sanjay Goil (Autonomy IDOL)
Average rating: *****
(5.00, 1 rating)
Forget the 140 characters, Twitter is Big Data. Every day sees around 100TBs of data ingested and tens of thousands of Hadoop jobs. Join us to hear how Twitter is using HP’s HAVEn platform to run their Big Data analytics. Learn why they’ve integrated HP Vertica with their Hadoop infrastructure to deliver the scale and speed needed for their analytics. Read more.
Add to your personal schedule
Data Science
Ballroom AB
Diane Chang (Intuit), Steven Hillion (Alpine Data Labs), Nick Kolegraff (Rackspace), Matthew Gee (Effortless Energy / University of Chicago)
Average rating: ***..
(3.78, 9 ratings)
In this panel discussion, experts from four different industries will share their first-hand experiences building and deploying teams of data scientists. Read more.
Add to your personal schedule
Data in Action
Ballroom CD
Kurt Brown (Netflix)
Average rating: ****.
(4.78, 18 ratings)
Netflix is a data-driven company. While "data-driven" is often no more than a lofty buzzword, we'll discuss how we make it a reality. We'll dive into the technologies we use and the philosophies underpinning how we get things done. We'll cover our "cloud native" data infrastructure, our use and contributions to open source software, and our open and enabling data environment. Read more.
Add to your personal schedule
Connected World
Mission City M
Monica Rogati (Jawbone)
Average rating: ****.
(4.50, 10 ratings)
We optimize ads, but not our mood. We know more about our tweets than our own bodies. That's all about to change. As wearables transform the 'quantified self' from a niche to a mainstream market, they are generating vast amounts of data about our health, habits, and lifestyles. Read more.

4:50pm

Add to your personal schedule
Sponsored
Ballroom G
Average rating: ***..
(3.00, 2 ratings)
This presentation discusses how we used complex event processing (CEP) and MapReduce based technologies to track and process data from a soccer match as part of the annual DEBS event processing challenge while achieving throughput in excess of 100,000 events/sec. Read more.
Add to your personal schedule
Design
GA Ballroom K
Shelley Evenson (Fjord)
Average rating: ***..
(3.00, 6 ratings)
This talk by Shelley Evenson, Executive Director of Organizational Evolution at Fjord, will outline the key tenets of designing for big data: the difference between using personal or aggregate data, how to identify and utilize data patterns, how to build trust, and ways to deliver ongoing value at the right moments. Read more.
Add to your personal schedule
Felipe Hoffa (Google), Shawn Simister (Google), Ewa Gasperowicz (Google)
Average rating: ****.
(4.00, 4 ratings)
What can an SQL query teach us about the gender gap? We'll dive into the 20 million Freebase entities to focus on people notable enough to be part of it. What percentage of them are women? How is the gender gap divided by profession? How is it changing throughout the years? How do all these variables look when mapped at a country, state, and neighborhood level? Read more.
Add to your personal schedule
Sponsored
Ballroom F
Joe Hellerstein (Trifacta and UC Berkeley)
Average rating: ****.
(4.00, 1 rating)
Join Trifacta's founders and their customers to learn how Data Transformation is changing the way people work with data. By increasing data analyst productivity and giving business analysts direct access to Big Data for the first time, Trifacta's technology increases the breadth of data they work with, significantly shortens "time to insight", and enables better business decisions. Read more.
Add to your personal schedule
Hadoop and Beyond
GA Ballroom J
Vinod Kumar Vavilapalli (Hortonworks)
Average rating: ****.
(4.25, 4 ratings)
The Hadoop 2.0 revolution is in full force! Organizations, companies, and users are gearing up for the move from 1.0 to 2.0. In this talk, we will discuss what Hadoop 2.0 is about, what YARN is, what features HDFS2 unlocks, and what it means to move to 2.0. We'll discuss this major migration from 1.0 to 2.0 from various perspectives - admins, frameworks, end users & data processing platforms. Read more.
Add to your personal schedule
Sponsored
Ballroom H
Milan Vaclavik (CenturyLink Technology Solutions)
We will discuss the strategic significance of infrastructure core services (compute, storage, network, and comprehensive security) required for robust big data solutions. We will also cover the strategic significance of Hadoop 2.0, Hadoop/NoSQL convergence, and the critical need for effective modeling, query formulation, and data analysis capabilities as Hadoop becomes an enterprise platform for big data. Read more.
Add to your personal schedule
Data Science
Ballroom AB
Abe Gong (Jawbone)
Average rating: ****.
(4.00, 9 ratings)
Creating value from big, messy data sets can be a daunting task. The session introduces the Sidekick Pattern: using small, curated data to increase the value of Big Data. Drawing on lessons from data science for Jawbone’s UP fitness tracker, we will see how smart selection of data sidekicks can accelerate analysis, solve cold start problems, and simplify complicated data pipelines. Read more.
Add to your personal schedule
Data in Action
Ballroom CD
Average rating: ****.
(4.36, 11 ratings)
Data analytics is at the heart of product development at Facebook. Facebook’s data warehouse has grown rapidly over the years, and poses unique scalability challenges. This talk will briefly outline the evolution of the analytics software stack in the last year (both storage and query engines) and then delve deeper into the data management and compute challenges at this scale. Read more.
Add to your personal schedule
Connected World
Mission City M
Max Richman (Mobile Accord - GeoPoll)
Average rating: ****.
(4.50, 2 ratings)
At GeoPoll we are building a mobile integration platform to poll millions around the world via their own mobile phones. We do this by integrating with mobile carriers in places like Afghanistan and Congo to target users by location, make messages free, & pay users directly. This is hard. We have learned many dos and don'ts which we would like to share. Read more.

5:30pm

Add to your personal schedule
Exhibit Hall
Average rating: ***..
(3.25, 4 ratings)
Quench your thirst with vendor-hosted libations and snacks while you check out all the cool stuff in the Expo Hall. Read more.

8:00pm

Add to your personal schedule
Santa Clara Ballroom
Average rating: *....
(1.50, 2 ratings)
Help us kick off Strata 2014 with a festive gathering featuring a poker tournament. But even if you're not a card shark, join us for plenty of networking, refreshments, and great music, played by DJs whose day job is data science. Read more.

Thursday, 02/13/2014

8:45am

Add to your personal schedule
Mission City
Alistair Croll (Solve For Interesting), Roger Magoulas (O'Reilly Media)
Average rating: ***..
(3.00, 2 ratings)
Strata Program Chairs, Alistair Croll and Roger Magoulas, welcome you to the second day of keynotes. Read more.

8:50am

Add to your personal schedule
Mission City
Joe Hellerstein (Trifacta and UC Berkeley), Tutti Taygerly (Trifacta)
Average rating: ****.
(4.00, 23 ratings)
If Big Data is the grand challenge of our time, most analytic effort is like ground control: the hard work behind the scenes that enables ambitious analysis to occur. Read more.

9:00am

Add to your personal schedule
Mission City
Kaushik Das (Pivotal)
Average rating: **...
(2.47, 19 ratings)
The emerging Internet of Things (IoT) enables us to build smart systems. We already have the sensory and motor parts of these systems available, but we don't have the brain. This is where data science comes into the picture! I will talk about how we are using big data technologies in conjunction with data science here at Pivotal to build the digital brain that makes a system smart. Read more.

9:10am

Add to your personal schedule
Mission City
Boyd Davis (Intel)
Average rating: ***..
(3.06, 17 ratings)
At Intel, we envision a future in which every organization in the world can use new sources of data to enhance its operational intelligence, fostering discoveries and innovation in science, industry, and medicine. Read more.

9:15am

Add to your personal schedule
Mission City
Matei Zaharia (Databricks)
Average rating: ****.
(4.21, 19 ratings)
While the first big data systems made a new class of applications possible, organizations must now compete on the speed and sophistication with which they can draw value from data. Future data processing platforms will need not just to scale cost-effectively, but also to allow ever more real-time analysis and to support both simple queries and today's most sophisticated analytics algorithms. Read more.

9:25am

Add to your personal schedule
Mission City
Average rating: *....
(1.65, 17 ratings)
Big Data without analytics is just data, but how do you perform the analytics? In this session, learn how In-Hadoop analytics is changing the game for the possibilities of Hadoop. Read more.

9:30am

Mission City
TBC

9:40am

Add to your personal schedule
Mission City
Average rating: ****.
(4.88, 43 ratings)
Keynote by James Burke, science and technology historian, futurist, and author. Read more.

10:40am

Add to your personal schedule
Sponsored
Ballroom G
Moderated by:
Jeffrey Kelly (The Wikibon Project)
Panelists:
Average rating: ***..
(3.60, 5 ratings)
Organizations are now moving beyond rigid, high-latency data warehouse environments to more flexible and cost-effective "Data Lake(s)": centrally managed repositories that use low-cost technologies such as Hadoop, SQL, In-Memory, and others to land any and all data that might be valuable for analysis and for operationalizing the resulting insight. Read more.
Add to your personal schedule
Design
GA Ballroom K
Hadley Wickham (Rice University / RStudio)
Average rating: ****.
(4.92, 12 ratings)
A well-designed domain-specific language makes all parts of the data science process easier. In this talk I'll discuss two DSLs implemented in R that make data manipulation and visualisation both easier to describe and faster to compute. Read more.
Add to your personal schedule
Drew Sullivan (Organized Crime and Corruption Reporting Project)
Average rating: ***..
(3.40, 5 ratings)
Endemic organized crime, augmented by corrupt governments and business interests, can threaten local and regional security throughout the world. In this session we'll show how journalists can use data and technology to ferret out, investigate and combat corruption. Read more.
Add to your personal schedule
Sponsored
Ballroom F
The Inflection Point - Hadoop and Big Data Analytics Read more.
Add to your personal schedule
Hadoop and Beyond
GA Ballroom J
Felienne Hermans (Delft University of Technology)
Average rating: ****.
(4.40, 5 ratings)
Spreadsheets are used extensively in industry: they are the number one tool for financial analysis. But they are as easy to build as they are difficult to analyze, maintain and check. Felienne’s research aims at developing methods to help spreadsheet users understand, update and improve spreadsheets. Read more.
Add to your personal schedule
Sponsored
Ballroom H
Vin Sharma (Intel)
In this session, I will illustrate these architectures with real-world examples of city governments, retail banks, food manufacturers, pharmaceutical companies, and Intel itself applying intelligence wherever data lives. Read more.
Add to your personal schedule
Data Science
Ballroom AB
Beau Cronin (Salesforce)
Average rating: ****.
(4.10, 10 ratings)
Probabilistic programming is a new paradigm for modeling and inference that offers hope for a fundamental shift in our approach to understanding the stories behind our data. This talk will provide an overview of the systems currently available and their relative strengths, show examples of their usage, and offer a peek at the road ahead. Read more.
Add to your personal schedule
Data in Action
Ballroom CD
Nandu Jayakumar (Yahoo! Inc./Stanford University), Tim Tully (Yahoo!)
Average rating: ****.
(4.00, 7 ratings)
Yahoo! ingests hundreds of TB of advertising data into Hadoop each day. This talk describes how we are building our next-generation data architecture on top of Shark and Spark that is orders of magnitude faster than the previous one. We will focus on the advanced streaming algorithms implemented in this new architecture, and how the new architecture has enabled deeper insights for our data scientists. Read more.
Add to your personal schedule
Machine Data
Mission City M
David Andrzejewski (Sumo Logic)
Average rating: ****.
(4.67, 6 ratings)
Organizations of all types and sizes are experiencing an explosion of machine log data whose literally inhuman diversity and scale overwhelms traditional analysis tools and techniques. We will discuss how machine learning can complement human expertise, enabling the extraction of valuable and actionable insights from log data. Read more.

11:30am

Add to your personal schedule
Sponsored
Ballroom G
Rob Rosen (Pentaho), Tim Garnto (edo)
edo Interactive shares how they drive agile, improved decision-making by complementing native Hadoop technologies with analytical databases and ETL optimization and data visualization solutions from vendors such as Pentaho. Read more.
Add to your personal schedule
Design
GA Ballroom K
Brian Granger (Cal Poly San Luis Obispo)
Average rating: ****.
(4.56, 16 ratings)
The IPython Notebook is an open-source, web-based interactive computing environment that enables users to create documents that combine live code and data with text, equations, plots and HTML. In this talk I will describe a new interactive widget architecture for the Notebook that allows the seamless integration of JavaScript (d3.js,...) and Python for data exploration and visualization purposes. Read more.
Add to your personal schedule
Moderated by:
Jesse Robbins (OnBeep, Inc.)
Panelists:
Shannon Spanhake (City & County of San Francisco), Eddie Tejeda (Civic Insight / OpenOakland / Public Ethics Commission)
Average rating: ***..
(3.50, 2 ratings)
Join Kiran Jain, the Senior Deputy Attorney for the City of Oakland, and Shannon Spanhake, the Deputy Innovation Officer for the City and County of San Francisco, to learn how governments are changing, and being changed, by the digital age. Read more.
Add to your personal schedule
Sponsored
Ballroom F
Anand Venugopal (Impetus Technologies Inc.), Pranay Tonpay (Impetus)
This session will address the exciting possibilities of bringing dramatic improvements in various industry verticals using big data analytics, especially real-time analytics over high-volume data in motion. Read more.
Add to your personal schedule
Hadoop and Beyond
GA Ballroom J
Marcel Kornacker (Cloudera, Inc.)
Average rating: ****.
(4.57, 7 ratings)
Learn how and why it is now possible for Apache Hadoop to serve as a virtual Enterprise Data Warehouse (EDW) framework for native Big Data (stored in HDFS) - making it no longer necessary to move that data into the EDW at great expense simply for analysis. In this session, attendees will get an architect-level view of the solution and explore an example configuration and benchmark numbers. Read more.
Add to your personal schedule
Sponsored
Ballroom H
Patrick Shumate (Comcast Cable)
Average rating: ***..
(3.40, 5 ratings)
How Comcast Turns Big Data into Real-Time Operational Insights Read more.
Add to your personal schedule
Data Science
Ballroom AB
Chris Harland (Microsoft)
Average rating: *****
(5.00, 13 ratings)
Predictive models are popular for their ability to grapple with massive data and bring to light features which are non-obvious to even the best domain experts. Solving practical problems with real world data involves creating models that balance predictive accuracy with practical significance. This talk provides examples of this balance in optimizing Chicago area bars and extends to Bing search. Read more.
Add to your personal schedule
Data in Action
Ballroom CD
Peter Wang (Continuum Analytics), Chris White (DARPA)
Average rating: ****.
(4.50, 4 ratings)
DARPA's XDATA program seeks to develop open source software to address government Big Data at all stages, from analysis to operations, in the areas of scalable analytics, processing, visualizations, and UIs. This new multi-year effort involves over 25 teams from academia, research labs, and small and large businesses, and includes efforts around Hadoop, Python, R, and other technologies. Read more.
Add to your personal schedule
Machine Data
Mission City M
Steven Gustafson (GE Global Research), Parag Goradia (GE)
Average rating: **...
(2.50, 2 ratings)
This presentation will introduce Big Data in the context of the Industrial Internet, describe some of the unique software and analytics opportunities, and present several current research topics making the Industrial Internet a reality. Read more.

12:10pm

Add to your personal schedule
Exhibit Hall & Hyatt Santa Clara
Birds of a Feather (BoF) sessions are informal roundtable discussions happening during lunch on Wednesday, February 12 and Thursday, February 13. You can join any BoF table or start your own with a topic of your choice. The BoF sign-up board will be near the Registration area. Read more.

1:30pm

Add to your personal schedule
Sponsored
Ballroom G
Wayne Thompson (SAS), Paul Kent (SAS)
In the world of ever-growing data volumes, how do you extract insight, trends and meaning from all that data in Hadoop? Getting relevant information in seconds (instead of hours or days) from big data requires a different approach. Join Paul Kent and Wayne Thompson from SAS as they share how to reveal insights in your big data and redefine how your organization solves complex problems. Read more.
Add to your personal schedule
Design
GA Ballroom K
Ryan Cunningham (ClipCard)
Average rating: ****.
(4.50, 2 ratings)
We're failing at big data, and bigger technology isn't helping. Complex infrastructure shouldn't justify complicated experiences. Let's apply the principles of consumer app culture to enterprise decision-making in a way that goes beyond dashboards. Let's use design thinking and metadata to connect people to information in a world where complexity is inevitable and technology alone is insufficient. Read more.
Add to your personal schedule
Data Science
Ballroom E
Michael Abbott (Kleiner Perkins Caufield & Byers)
Average rating: *****
(5.00, 3 ratings)
Everyone knows that massive, real-time data processing is behind many of the hottest new companies in technology. But what’s really going on underneath the covers? In this session, investor and technology entrepreneur Michael Abbott unboxes three startups to look at the technology, architecture, and innovations they’ve harnessed to deliver their products and services. Read more.
Add to your personal schedule
Sponsored
Ballroom F
Owen O'Malley (Hortonworks), Alan Gates (Hortonworks)
Average rating: ****.
(4.00, 2 ratings)
Apache Hive is the de-facto standard for SQL-in-Hadoop today, with more enterprises relying on this open source project than on any alternative. Enterprises have asked for Hive to become more real-time and interactive, and the Hive community has responded. Read more.
Add to your personal schedule
Hadoop and Beyond
GA Ballroom J
Reynold Xin (Databricks), Sameer Agarwal (UC Berkeley)
Average rating: ***..
(3.50, 6 ratings)
BlinkDB is an approximate query engine that answers queries in seconds on extremely large datasets by leveraging data sampling. It exploits advances in machine learning and distributed query processing to allow trading off response times and accuracy. BlinkDB is being integrated into Shark and Presto. We will cover real world use case scenarios of BlinkDB at adopters such as Facebook. Read more.
Add to your personal schedule
Sponsored
Ballroom H
Eric Frenkiel (MemSQL)
Average rating: *****
(5.00, 1 rating)
In this session, MemSQL CEO Eric Frenkiel will discuss the benefits for companies that augment their existing information architecture with a versatile real-time database platform to handle high volume and velocity transactional and analytical workloads. Read more.
Add to your personal schedule
Data Science
Ballroom AB
Emil Eifrem (Neo Technology / Neo4j)
Average rating: ***..
(3.33, 9 ratings)
Recent years have seen an explosion of technologies for managing and analyzing graphs. While most people associate "graph" with "the social graph," there's a wide variety of non-social use cases for graph technologies. This session will explore graph adoption in finance, telecom, healthcare, HR & recruiting, gaming and beyond, using concrete case studies from actual graph production deployments. Read more.
Add to your personal schedule
Hadoop and Beyond
Ballroom CD
Avery Ching (Facebook)
Average rating: ****.
(4.25, 4 ratings)
Analyzing graphs can lead to useful insights that drive product and business decisions. This talk describes our efforts at Facebook to scale Apache Giraph to very large graphs (up to one trillion edges) and how we run Apache Giraph in production. We will also talk about how to build applications, some of the algorithms that we have implemented, and their use cases. Read more.
Add to your personal schedule
Machine Data
Mission City M
Brett Sargent (LumaSense Technologies Inc.)
Average rating: **...
(2.80, 5 ratings)
Smart meters may be the most visible element of the so-called smart grid, but how smart is it if the plants producing the energy are dumb? To ensure the integrity of the grid, every stage of our electrical power infrastructure – including generation, transmission and distribution – has to get “smart.” Sophisticated sensors connected to big data analytics are key to keeping the power flowing. Read more.

1:50pm

Add to your personal schedule
Data Science
Ballroom AB
Wes McKinney (DataPad Inc.)
Average rating: ***..
(3.00, 9 ratings)
This talk will address some of the pressing problems in data preparation, analysis, visualization, and collaboration facing the modern data analyst. We will discuss the ways in which both programmatic and UI-driven tools are helping solve these problems and the areas in which more work and innovation are needed. Read more.

2:20pm

Add to your personal schedule
Sponsored
Ballroom G
Nenshad Bardoliwalla (Paxata, Inc.)
Join Paxata’s Nenshad Bardoliwalla for a look at the new breed of data preparation tools that use semantic algorithms to detect data types, apply machine learning to find hidden patterns, and link related columns of data automatically. Read more.
Add to your personal schedule
Design
GA Ballroom K
Ben Fry (Fathom Information Design)
Average rating: ****.
(4.75, 4 ratings)
Ben Fry, Principal, Fathom Read more.
Add to your personal schedule
Moderated by:
Jake Porway (DataKind)
Panelists:
Drew Conway (IA Ventures), Rayid Ghani (Edgeflip | University of Chicago), Elena Eneva (Accenture)
Average rating: ****.
(4.75, 8 ratings)
In this session, Edgeflip and Data Science for the Social Good’s Rayid Ghani, IA Ventures Scientist-in-residence and DataKind co-founder Drew Conway, and DataKind co-founder and executive director Jake Porway look at where data is making a difference today, what it promises tomorrow, and what’s holding it back. Read more.
Add to your personal schedule
Sponsored
Ballroom F
Jagane Sundar (WANdisco)
Average rating: ****.
(4.50, 2 ratings)
Application of the Paxos Protocol Towards Building a Continuously Available HBase Read more.
Add to your personal schedule
Hadoop and Beyond
GA Ballroom J
Paco Nathan (Databricks)
Average rating: ****.
(4.00, 4 ratings)
Google "Omega" research: 80% cluster jobs are batch, 60% cluster resources go to services. Batch is simple, services are hard, mixing workloads is key to building efficient distributed apps. This talk examines case studies of Mesos workloads: ranging from Twitter (100% on prem) to Airbnb (100% cloud). How did they leverage "data center OS" building blocks for orders of magnitude gains at scale? Read more.
Add to your personal schedule
Sponsored
Ballroom H
Peter Sirota (Amazon Web Services)
Average rating: ***..
(3.00, 4 ratings)
Learn from the Amazon Elastic MapReduce team's recent experience with streaming services such as Amazon Kinesis and low-latency query engines like Impala and Phoenix. We'll clarify many of the implementation details of our Hadoop InputFormat for Amazon Kinesis and demonstrate the power and flexibility of applying existing Hadoop ecosystem technologies to the real-time data paradigm. Read more.
Add to your personal schedule
Data Science
Ballroom AB
Neal Ford (ThoughtWorks)
Average rating: ***..
(3.60, 5 ratings)
Analytics and agility sometimes seem like natural enemies, but analytics suffer the same shifting requirements and uncertainty as other projects. This talk describes techniques for incorporating analytics and data science into an agile rhythm. Read more.
Add to your personal schedule
Data in Action
Ballroom CD
Sriram Sankar (LinkedIn), Daniel Tunkelang (LinkedIn)
Average rating: ****.
(4.75, 4 ratings)
Social networks bring a new dimension to search. Instead of looking for web pages, users search a world of entities connected by a rich graph of relationships. Serving billions of deeply personalized searches creates unique infrastructure and relevance challenges for LinkedIn. We'll describe how we've addressed those challenges and discuss implications of social networks for the future of search. Read more.
Add to your personal schedule
Machine Data
Mission City M
Krishna Raj Raja (Cloudphysics), Balaji Parimi (Cloudphysics)
In this talk we discuss the challenges associated with data center operations management and provide details on how the CloudPhysics big data platform solves these problems and enables new capabilities that were not previously possible. Read more.

2:40pm

Add to your personal schedule
Hadoop and Beyond
GA Ballroom J
Rahul Pathak (Amazon Web Services)
Average rating: ****.
(4.00, 3 ratings)
Learn how AWS thinks about big data and how we and our customers have approached managing large datasets using services such as Amazon S3, Amazon Elastic MapReduce, Amazon DynamoDB, and Amazon Redshift. Read more.
Add to your personal schedule
Data Science
Ballroom AB
Bin Yu (UC Berkeley)
Average rating: ****.
(4.00, 3 ratings)
In a thrilling breakthrough at the intersection of neuroscience and statistics, penalized Least Squares methods have been used to construct a "mind-reading" algorithm that reconstructs movies from fMRI brain signals. Read more.

4:00pm

Add to your personal schedule
Sponsored
Ballroom G
J.R. Arredondo (Rackspace)
We will discuss Rackspace’s vision for Data-as-a-Service, and provide a few key questions that could help you complement your technical analysis when choosing a database service. Along the way, we will also discuss parts of the portfolio of data services available at Rackspace, including SQL, MongoDB, Redis and Hadoop-based solutions. Read more.
Add to your personal schedule
Design
GA Ballroom K
Mark Troyer (Box, Inc.)
Average rating: **...
(2.50, 2 ratings)
You might think that art has nothing to do with dashboarding - dealing with your data architecture is an engineering/operations problem, right? On the contrary, understanding how to deal with your data in a way that is consumable by humans is fundamentally a design problem. Learn how art and design influenced the process for developing a new dashboarding tool called StatusWolf. Read more.
Add to your personal schedule
Moderated by:
Jim Stogdill (O'Reilly Media, Inc.)
Panelists:
Brian Behlendorf (Mithril Capital Management LLC), Adrian Cockcroft (Battery), Ari Gesher (Palantir Technologies), Kimberly Stedman (Freelance)
The always-popular Great Debate series returns to Strata. In this Oxford-style debate, two teams take opposing positions. We poll the audience, and the teams try to sway opinions. It’ll be a fast-paced, sometimes irreverent look at some of the core challenges of putting data to work. Read more.
Add to your personal schedule
Sponsored
Ballroom F
Raj Bains (Clustrix, Inc.)
NewSQL has followed quickly on the heels of NoSQL, providing the scale-out of NoSQL along with SQL and ACID guarantees. We'll discuss NewSQL with customer examples and contrast it with SQL-on-Hadoop implementations.
Hadoop and Beyond
GA Ballroom J
Matvey Arye (Princeton University/Cloudflare), Albert Strasheim (CloudFlare)
Average rating: ****.
(4.00, 1 rating)
Big data is evolving. The state of the art has gone from running large batch queries over rarely updated static data sets to handling high-velocity data with low processing latency. In this session we present a new data framework geared toward processing data with a very high update frequency. The framework utilizes the Go language's advanced concurrency primitives and extensibility.
Sponsored
Ballroom H
Matt Quinn (TIBCO Software, Inc.)
Big Data is really a small data mindset. At the enterprise level, where the potential for data collection is greatest, companies are still stuck compartmentalizing data. TIBCO CTO Matt Quinn will share how the world’s leading sports teams, airlines, banks and retailers are the ones changing their Big Data mindset to an All Data one.
Data Science
Ballroom AB
Ameet Talwalkar (UC Berkeley), Evan Sparks (UC Berkeley)
Average rating: ****.
(4.14, 7 ratings)
Implementing and consuming machine learning techniques at scale are difficult tasks for ML developers and end users. MLbase (www.mlbase.org) is an open-source platform under active development addressing the issues of both groups. In this talk we will describe the high-level functionality of MLbase and demonstrate its *scalability* and *ease-of-use* via real-world examples.
Data in Action
Ballroom CD
Fangjin Yang (Metamarkets), Gian Merlino (Metamarkets)
Average rating: *****
(5.00, 6 ratings)
The maturation and development of open source technologies have made it easier than ever for companies to derive insights from vast quantities of data. In this session, we will cover how to build a real-time analytics stack using Kafka, Storm, and Druid. This combination of technologies can power a robust data pipeline that supports real-time ingestion and flexible, low-latency queries.
Machine Data
Mission City M
Ian Huston (Pivotal), Alexander Kagoshima (Pivotal), Noelle Sio (Pivotal)
Average rating: ****.
(4.00, 1 rating)
With increased road congestion around the globe and growing amounts of car data, we need more intelligent analytical methods to beat the traffic. This talk presents our work on traffic velocity and travel disruption analytics. We describe our approach in detail, how we went from idea to implemented algorithm, and how our methods can be applied to gain deep insight into influential factors.

4:50pm

Mission City
Average rating: ***..
(3.00, 1 rating)
Strata Program Chairs Roger Magoulas and Alistair Croll welcome you to the Strata Closing Keynotes.

4:55pm

Mission City
Megan Price (Human Rights Data Analysis Group)
Average rating: *****
(5.00, 5 ratings)
How do we know how many people have been killed in Syria, or whether violence is escalating or decreasing? The hard answer is we don't. But through careful application of machine learning and other statistical techniques, we can quantify what we do, and don't, know.

5:05pm

Mission City
Ben Fry (Fathom Information Design)
Average rating: ****.
(4.67, 6 ratings)
Ben Fry, Principal, Fathom

5:15pm

Mission City
David McRaney (Author)
Average rating: ****.
(4.57, 7 ratings)
David McRaney will tell the story of how the Department of War Math in World War II helped bring to light the psychology of how we miss what is important when it comes to failure, and how the modern understanding of the psychology of luck provides the best game plan for getting the best out of life.
