
Schedule: Data in Action sessions

Practical lessons, integration tricks, and a glimpse of what’s next as we look at large-scale, innovative big data applications in production. This track is packed with case studies that highlight behind-the-scenes insights and hard-won lessons from some of the industry’s top practitioners.

Track Hosts

Jen van der Meer is a former Wall Street analyst and economist, and has held executive management roles at Organic and Frog Design. Jen was a partner at Drillteam, where she developed participatory customer engagement programs for brands such as Target, Toyota, Nestle, Saucony, and Neiman Marcus, including the first social media programs for several of these companies. Drillteam became part of Powered, Inc., which then joined Dachis Group in 2011, where Jen served as EVP Managing Director, overseeing demand generation, client service, strategy consulting, and experience design and delivery.

Roger Chen is an investor at O'Reilly AlphaTech Ventures (OATV), where he looks for collisions between unmet needs and enabling technologies. He spent his past life as a scientist and engineer, alternating between academia and industry while dabbling in the startup and venture capital world. When he is not tinkering with technology, Roger plays sports and wonders what lies beyond bell curves. Roger has a BS from Boston University and a PhD from UC Berkeley, both in Electrical Engineering.

Ballroom CD
David Epstein (Sports Illustrated)
Average rating: 4.89 (9 ratings)
Epstein explains the origins of the "magic number," how it should be used, and how its frequent misuse hinders both performance science and the development of athletes, while leading sports executives to overlook simple but important data.
Ballroom CD
Eric Pugh (OpenSource Connections)
Average rating: 4.25 (4 ratings)
The United States Patent and Trademark Office wanted a simple, lightweight, yet modern and rich discovery interface for Chinese patent data. This is how we did it.
Ballroom CD
Yann Ramin (Twitter, Inc.)
Average rating: 2.68 (25 ratings)
Twitter's Observability stack collects, processes, monitors, and visualizes more than 170 million real-time time series from all service and system components. This session covers how the stack is built and how it scales to let developers and reliability engineers build fault-tolerant distributed services. You will learn what works and what doesn't, from architecture to implementation.
Ballroom CD
Kurt Brown (Netflix)
Average rating: 4.78 (18 ratings)
Netflix is a data-driven company. While "data-driven" is often no more than a lofty buzzword, we'll discuss how we make it a reality. We'll dive into the technologies we use and the philosophies underpinning how we get things done. We'll cover our "cloud native" data infrastructure, our use of and contributions to open source software, and our open and enabling data environment.
Ballroom CD
Average rating: 4.36 (11 ratings)
Data analytics is at the heart of product development at Facebook. Facebook's data warehouse has grown rapidly over the years and poses unique scalability challenges. This talk will briefly outline how the analytics software stack (both storage and query engines) has evolved over the last year, then delve deeper into the data management and compute challenges at this scale.
Ballroom CD
Nandu Jayakumar (Yahoo! Inc./Stanford University), Tim Tully (Yahoo!)
Average rating: 4.00 (7 ratings)
Yahoo! ingests hundreds of TB of advertising data into Hadoop each day. This talk describes how we are building our next-generation data architecture on top of Shark and Spark, which is orders of magnitude faster than the previous one. We will focus on the advanced streaming algorithms implemented in this new architecture and on how it has enabled deeper insights for our data scientists.
Ballroom CD
Peter Wang (Continuum Analytics), Chris White (DARPA)
Average rating: 4.50 (4 ratings)
DARPA's XDATA program seeks to develop open source software that addresses government big data needs at all stages, from analysis to operations, in the areas of scalable analytics, processing, visualization, and user interfaces. This new multi-year effort involves more than 25 teams from academia, research labs, and small and large businesses, and includes work on Hadoop, Python, R, and other technologies.
Ballroom CD
Perry Samson (University of Michigan)
Average rating: 4.67 (6 ratings)
What if students could receive helpful feedback in real time based on the notes they type in class? This talk presents a prototype, used in multiple courses at the University of Michigan, that challenges students' understanding based on the words they type in class and offers resources for further study.
Ballroom CD
Sriram Sankar (LinkedIn), Daniel Tunkelang (LinkedIn)
Average rating: 4.75 (4 ratings)
Social networks bring a new dimension to search. Instead of looking for web pages, users search a world of entities connected by a rich graph of relationships. Serving billions of deeply personalized searches creates unique infrastructure and relevance challenges for LinkedIn. We'll describe how we've addressed those challenges and discuss the implications of social networks for the future of search.
Ballroom CD
Fangjin Yang (Stealth), Gian Merlino (Stealth)
Average rating: 5.00 (6 ratings)
The maturation and development of open source technologies has made it easier than ever for companies to derive insights from vast quantities of data. In this session, we will cover how to build a real-time analytics stack using Kafka, Storm, and Druid. This combination of technologies can power a robust data pipeline that supports real-time ingestion and flexible, low-latency queries.