
Strata + Hadoop World Schedule

Below are the confirmed and scheduled talks at Strata + Hadoop World 2013. Note: The schedule is subject to change.

Grand Ballroom East
1:45pm Apache HBase for Architects Nick Dimiduk (Hortonworks, Inc)
4:15pm Securing the Apache Hadoop Ecosystem Aaron Myers (Cloudera, Inc.), Shreepadma Venugopalan (Cloudera)
5:05pm HDFS Snapshots and Beyond Jing Zhao (Hortonworks, Inc.), Tsz-Wo Sze (Hortonworks Inc.)
Gramercy Suite
11:00am Scalable, Flexible Data Privacy in the Cloud Ahmed Radwan (Google's Motorola Mobility)
1:45pm Parquet: An Open Columnar Storage for Hadoop Julien Le Dem (Twitter), Nong Li (Cloudera)
4:15pm AtlasDB: ACID Transactions for Your Favorite Key-value Store Ari Gesher (Palantir Technologies), Danielle Kramer (Palantir Technologies)
5:05pm REEF - Retainable Evaluator Execution Framework Russell Sears (Microsoft)
Sutton Center - Sutton South
11:50am Hadoop Adventures At Spotify Adam Kawa (Spotify)
1:45pm Bringing Video Game Super Powers To Life with Hadoop BI Barry Livingston (Riot Games), Ben Werther (Platfora)
2:35pm Hadoop & Data Science for the Enterprise Mark Slusar (Allstate)
4:15pm Integrated Hadoop Management – Tying it All Together! Zach Snyder (The Walt Disney Company)
Grand Ballroom West
11:50am Hadoop and the Relational Data Warehouse – When to Use Which? Stephen Brobst (Teradata Corporation), Ari Zilka (Hortonworks)
1:45pm The Big Data Doctor Is In Bill Schmarzo (EMC Consulting), John Akred (Silicon Valley Data Science), Anand Raman (Impetus Technologies, Inc.), Scott Rose (Think Big Analytics)
2:35pm Ensuring 100% Database Uptime for Real-Time Big Data Srini Srinivasan (Aerospike Inc.)
4:15pm Is Your Cloud Ready for Big Data? Richard McDougall (VMware)
5:05pm Running On-premise Hadoop as a Business Sumeet Singh (Yahoo!, Inc.)
Beekman Parlor - Sutton North
11:50am Real-time Recommendations for Retail: Architecture, Algorithms, and Design Jonathan Natkins (WibiData), Juliet Hougland (Self)
1:45pm New York City: A Data Science Mecca Steve Lohr (The New York Times | Brown Institute for Media Innovation at Columbia University), Chris Wiggins (hackNY/Columbia), Yann LeCun (NYU), Deborah Estrin (Cornell NYC Tech)
2:35pm The Hidden Data Science Pipeline Mark Mims (Infochimps)
4:15pm Making Big Data Small Baron Schwartz (VividCortex)
5:05pm How to Stop Worrying and Start Modeling Big Data with Better Algorithms and H2O Srisatish Ambati (0xdata Inc), Cliff Click (0xdata)
Regent Parlor
11:00am Data Philanthropy: Private-Public Sector Big Data Partnerships Robert Kirkpatrick (UN Global Pulse), Mark Leiter (Nielsen)
11:50am Drought Prediction and Ecological Monitoring with the Internet of Things Adam Wolf (Princeton University), Kelly Caylor (Princeton University)
1:45pm Big Data to Enable Connected Products Ron Bodkin (Think Big Analytics)
2:35pm TBC
5:05pm Information Revolution In Government James Stewart (Government Digital Service), James Abley (Government Digital Service)
Nassau Suite
11:00am Disruptive Data Science Case Study: Visa's Big Data Response to Cyber Threats Ravi Devireddy (Visa Inc), Annika Jimenez (Pivotal)
11:50am Shift into High Gear: Dramatically Improve Hadoop and NoSQL Performance M. C. Srivas (MapR Technologies, Inc)
Murray Hill Suite
11:00am Data Design for Chicago Energy Aaron Wolf (Datascope Analytics), Burton Rast (IDEO)
11:50am Interactive Visualization of "Big" Data Sean Kandel (Trifacta)
5:05pm The Great Debate: A Connected World is a Better World Jim Stogdill (O'Reilly Media, Inc.), Mona Vernon (Thomson Reuters), Trevor Hughes (International Association of Privacy Professionals), Randy Smerik (Osunatech, Inc.), Lisa Green (Common Crawl Foundation)
Rhinelander South
11:00am Big Data in the Real World Eron Kelly (Microsoft Corporation), Albert Isern (BISmart)
11:50am Flexible Schema and the End of ETL Daniel Abadi (Yale University), Matthew Grace (Objective Logistics)
2:35pm Between Real and Ideal in Big Data Yongik Park (LG CNS)
4:15pm Achieving Real Success with Hadoop Amir Halfon (MarkLogic)
5:05pm The Sands of Time: How Cloud is Changing the Role of the CIO Rod Smith (IBM Emerging Internet Technologies)
Rhinelander Center
1:45pm Using A Visual Framework to Simplify ETL on Hadoop Without MapReduce or Programming Mike Hoskins (Actian Corporation), Ari Zilka (Hortonworks)
2:35pm Introducing a New Way to Interact with Insight Stephanie McReynolds (ClearStory Data), Vaibhav Nivargi (ClearStory Data), Brian Zotter (ClearStory Data), Stephen McDaniel (Freakalytics)
5:05pm Red Hat solutions for real-world big data Greg Kleiman (Red Hat), Syed Rasheed (Red Hat)
5:45pm Plenary
Room: Sponsor Pavilion
Sponsor Pavilion Reception
8:45am Plenary
Room: Grand Ballroom
Tuesday Keynote Welcome Edd Dumbill (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
8:55am Plenary
Room: Grand Ballroom
Hadoop's Impact on the Future of Data Management Mike Olson (Cloudera)
9:10am Plenary
Room: Grand Ballroom
Separating Hadoop Myths from Reality Jack Norris (MapR Technologies)
9:20am Plenary
Room: Grand Ballroom
Big Impact from Big Data Ken Rudin (Facebook)
9:30am Plenary
Room: Grand Ballroom
Five Surprising Mobile Trajectories in Five Minutes Tony Salvador (Intel Corporation)
9:35am Plenary
Room: Grand Ballroom
Can Big Data Reach One Billion People? Quentin Clark (Microsoft)
9:40am Plenary
Room: Grand Ballroom
What Makes Us Human? A Tale of Advertising Fraud Claudia Perlich (Dstillery)
9:50am Plenary
Room: Grand Ballroom
From Fiction to Facts with Big Data Analytics Ben Werther (Platfora)
9:55am Plenary
Room: Grand Ballroom
Towards Strata 2014 Roger Magoulas (O'Reilly Media)
10:00am Plenary
Room: Grand Ballroom
Startup Showcase Winners Announced Edd Dumbill (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
10:05am Plenary
Room: Grand Ballroom
The Economic Potential of Open Data Michael Chui (McKinsey Global Institute)
8:00am Coffee Break
Room: Grand Ballroom Foyer
Tuesday Coffee BoFs
10:30am Break sponsored by Intel
Room: Sponsor Pavilion
3:15pm Afternoon Break
Room: Sponsor Pavilion
Tuesday Afternoon BoFs
12:30pm Lunch sponsored by MapR
Room: America's Hall 1 & 2
Tuesday Lunchtime Industry BoFs
8:30pm Plenary
Room: West Village
Data After Dark
7:15pm Dinner
Room: On Your Own
Hadoop World
11:00am-11:40am (40m) Hadoop Platform
Morse: Realtime ETL in Facebook Analytics Platform
Jun Fang (Facebook)
Morse is a new system developed at Facebook to transform its ETL pipeline from daily batch to realtime. It continuously moves, transforms, and loads data from distributed logs and sharded MySQL databases into the Hive data warehouse. HBase is used as the underlying storage for incrementally updated tables, while the data is exposed as external tables in Hive for read processing.
Hadoop World
11:50am-12:30pm (40m) Hadoop Platform
From Promise to a Platform: Next Steps in Bringing Workload Diversity to Hadoop
Henry Robinson (Cloudera)
The increasing diversity of frameworks and workloads that run atop a Hadoop cluster gives more flexibility and power to users, but makes it very difficult for an administrator to ensure that SLAs are met while allowing exploratory, ad-hoc usage to continue to use all spare capacity. We present our vision and implementation for generalised resource management on Hadoop, suitable for all uses.
Hadoop World
1:45pm-2:25pm (40m) Hadoop Platform
Apache HBase for Architects
Nick Dimiduk (Hortonworks, Inc)
Your application is outgrowing its database, and you've started shopping for NoSQL options. Maybe you've adopted Hadoop into your Data Warehouse. You've heard HBase might be an appropriate technology, but you need to know more. This talk is for you. To understand its use, first understand how it works. This talk explores the design of HBase and its critical paths to ground an understanding of its use.
Hadoop World
2:35pm-3:15pm (40m) Hadoop Platform
What’s Next for Apache HBase: Multi-tenancy, Predictability, and Extensions.
Jonathan Hsieh (Cloudera, Inc)
Apache HBase is a robust random-access distributed datastore built upon Apache Hadoop’s HDFS and Apache ZooKeeper. This talk will describe themes emerging from recent features slated for the upcoming post-0.96 release. These include improvements for multi-tenant deployments; a focus on predictable latencies; and the proliferation of new extensions for features traditionally from databases.
Hadoop World
4:15pm-4:55pm (40m) Hadoop Platform
Securing the Apache Hadoop Ecosystem
Aaron Myers (Cloudera, Inc.) et al
When Hadoop is used for sensitive data, security requirements arise that require strong authentication, authorization of data/resources, and data confidentiality. This session covers how various parts of the Hadoop ecosystem can interact in a secure way to address these requirements. We will focus on the advanced Apache Hive authorization features enabled by the Apache Sentry (incubating) project.
Hadoop World
5:05pm-5:45pm (40m) Hadoop Platform
HDFS Snapshots and Beyond
Jing Zhao (Hortonworks, Inc.) et al
In this talk, attendees will understand the high level design of HDFS snapshots, along with how snapshots can be used for data protection and disaster recovery. We will also talk about details of snapshot development and testing. In the end, we will explore how to build and improve other features on top of HDFS snapshots, including Distcp, HBase snapshots, and Hive table snapshots.
Hadoop World
11:00am-11:40am (40m) Hadoop & Beyond
Scalable, Flexible Data Privacy in the Cloud
Ahmed Radwan (Google's Motorola Mobility)
Multi-tenancy is a reality for large-scale data systems, but it poses concerns about exposure of sensitive data. Using anonymization techniques, sensitive data can be protected in ways that maintain user privacy while preserving the ability to use the data effectively for operational needs. In this talk, we explore the challenges and lessons learned in building solutions for data anonymization.
Hadoop World
11:50am-12:30pm (40m) Hadoop & Beyond
Real-Time Analytical Processing (RTAP) using Spark and Shark
Jason (Jinquan) Dai (Intel)
There is increasing demand to discover and explore data iteratively, interactively, and for real-time insights, which we lump together under the term Real-Time Analytical Processing (RTAP). This talk presents our efforts and experience on building the real-time analytical processing framework for several large websites, leveraging Spark and Shark research from UC Berkeley.
Hadoop World
1:45pm-2:25pm (40m) Hadoop & Beyond
Parquet: An Open Columnar Storage for Hadoop
Julien Le Dem (Twitter) et al
Parquet is a columnar file format for Hadoop that brings performance and storage benefits. It supports deeply nested data structures and is easy to extend and integrate with existing type systems.
Hadoop World
2:35pm-3:15pm (40m) Hadoop & Beyond
Lessons Learned From A Decade’s Worth of Big Data At The U.S. National Security Agency (NSA)
Adam Fuchs (Sqrrl)
The National Security Agency works with some of the world’s largest, most complex, and most sensitive datasets. In order to analyze this data, NSA has developed some powerful tools, such as Apache Accumulo. Come learn about NSA’s key lessons learned about building a Big Data platform from the former Technical Director of the Accumulo project at the NSA.
Hadoop World
4:15pm-4:55pm (40m) Hadoop & Beyond
AtlasDB: ACID Transactions for Your Favorite Key-value Store
Ari Gesher (Palantir Technologies) et al
AtlasDB is a bolt-on layer for key-value stores (distributed or otherwise) that implements MVCC and guarantees ACID properties for eventually-consistent data stores. In this talk, we'll take a look at the protocol used to implement the transactions, talk about the performance tradeoffs from using transactions, and look at the transactions API it offers.
Hadoop World
5:05pm-5:45pm (40m) Hadoop in Action
REEF - Retainable Evaluator Execution Framework
Russell Sears (Microsoft)
REEF is a set of tools and services that make it easy to implement new scalable computational frameworks atop YARN, and to allow jobs to perform multiple types of computations, such as MapReduce, iterative machine learning and graph processing. We plan to support additional programming models over time. REEF is language-independent, allowing it to bridge the Java and .NET ecosystems.
Hadoop World
11:00am-11:40am (40m) Hadoop in Action
When Workflows Attack: In the Trenches with Azkaban, LinkedIn's Open-Source Workflow Scheduler
Richard Park (LinkedIn Corp)
Azkaban is an open-source workflow management application developed at LinkedIn to schedule and run our Hadoop workflows. Sporting a beautiful web UI, it is designed to be scalable, reliable, modular, secure and extensible. Azkaban has been battle tested on LinkedIn's Hadoop clusters, driving all of our data products over the last few years.
Hadoop World
11:50am-12:30pm (40m) Hadoop in Action
Hadoop Adventures At Spotify
Adam Kawa (Spotify)
A trip into the Hadoop jungle to show the most interesting, exciting and surprising places we have been while growing fast from a 60-node to a 690-node Hadoop cluster. We will expose our JIRA tickets, real graphs, statistics, even excerpts from our dialogues. We will share the mistakes that we made and describe the fixes that finally domesticated this love-demanding yellow elephant and its friends.
Hadoop World
1:45pm-2:25pm (40m) Hadoop in Action
Bringing Video Game Super Powers To Life with Hadoop BI
Barry Livingston (Riot Games) et al
Riot Games has built the most played video game in the world - League of Legends - and they need to constantly monitor, develop, and update their games to keep players engaged. Learn about different data architecture approaches and about Riot Games’ “Data Collection Pipeline,” which provides insights into what’s needed to continuously improve the gamers’ experience.
Hadoop World
2:35pm-3:15pm (40m) Hadoop in Action
Hadoop & Data Science for the Enterprise
Mark Slusar (Allstate)
After a successful round of Hadoop Data Science projects, a company will make a sizable Hadoop commitment. People, process, and technology stand at the tipping point for an exciting adventure in innovation and evolution that creates new possibilities. This presentation educates attendees on the changes from the traditional methods to the new methods and paints a vision of the future.
Hadoop World
4:15pm-4:55pm (40m) Hadoop in Action
Integrated Hadoop Management – Tying it All Together!
Zach Snyder (The Walt Disney Company)
Managing Hadoop clusters to meet business needs can be challenging. Learn how Disney uses an integrated approach, leveraging both Hadoop-specific tools and common IT management tools to create a comprehensive management toolkit for our Hadoop clusters.
Hadoop World
5:05pm-5:45pm (40m) Hadoop in Action
Working with Geospatial Data Using Hadoop and HBase and How Monsanto Used It to Help Farmers Increase Their Yield
Erich Hochmuth (Monsanto) et al
Monsanto is building new technology driven products for their customers that will leverage big data. This talk describes how Monsanto is building these scalable applications with geospatial data, using Hadoop and HBase as the backend systems.
11:00am-11:40am (40m) Enterprise Data
Defining your Big Data Arsenal: NoSQL, Hadoop, and RDBMS
Matt Asay (MongoDB, Inc.)
For some, Hadoop is synonymous with "Big Data." But Hadoop is just one component of a successful Big Data architecture. NoSQL solutions like MongoDB also play a dominant role for storage and real-time data processing, and RDBMS has a place, too. This session will drill down on the different types of NoSQL databases and how they fill out Hadoop and RDBMS in a modern Big Data architecture.
11:50am-12:30pm (40m) Enterprise Data
Hadoop and the Relational Data Warehouse – When to Use Which?
Stephen Brobst (Teradata Corporation) et al
Hortonworks Chief Product Officer, Ari Zilka, and Teradata CTO, Stephen Brobst, show you when to use Hadoop and when to use an MPP relational data warehouse. At a conceptual level, it is easy to think they are interchangeable, but the differences overwhelm the similarities. Two of the most trusted experts in their fields examine how big data technologies are being used in practical deployments.
1:45pm-2:25pm (40m) Enterprise Data
The Big Data Doctor Is In
Bill Schmarzo (EMC Consulting) et al
Opinions might be plentiful in big data, but experience is rare. Join this interactive session to quiz some of the industry's most seasoned minds about what works and what doesn't when it comes to bringing big data to business.
2:35pm-3:15pm (40m) Enterprise Data
Ensuring 100% Database Uptime for Real-Time Big Data
Srini Srinivasan (Aerospike Inc.)
Internet environments for consumer-facing applications routinely demand high throughput while SLAs require 100% uptime. This session reviews 10 practices for ensuring high performance and availability based on the real-world lessons of large-scale ad sector deployments where speed means 5 milliseconds, scale is 200,000 to 2 million TPS against terabytes of data, and downtime is not an option.
4:15pm-4:55pm (40m) Enterprise Data
Is Your Cloud Ready for Big Data?
Richard McDougall (VMware)
Big data is transforming the cloud as it moves from web giants into the enterprise. To run today’s multiple workload types, infrastructure must be architected as a common software-defined platform that supports the key workload components for today’s and tomorrow’s big data systems. We must plan now to accommodate explosive growth and the need for robust storage, networking and security.
5:05pm-5:45pm (40m) Enterprise Data
Running On-premise Hadoop as a Business
Sumeet Singh (Yahoo!, Inc.)
Enterprises thinking of adopting Hadoop are increasingly debating between on-premise and cloud-based models for their needs. We lay out a set of criteria to help enterprises evaluate their options. For the ones who have already made or have plans to make significant on-premise investments, we present an approach to manage Hadoop as a service with a P&L, and metering & billing provisions.
11:00am-11:40am (40m) Data Science
Viva la Revolution: How MailChimp is using Big Data to Help Users Help Themselves
John Foreman (MailChimp)
MailChimp's first big data effort, the Email Genome Project, was internal, focused on abuse-prevention. But once this centralized storage and analytics capability demonstrated its practical value, the company turned toward crafting user-facing big data products. This talk will detail the results of MailChimp's effort to democratize big data analysis in email marketing for their users.
11:50am-12:30pm (40m) Data Science
Real-time Recommendations for Retail: Architecture, Algorithms, and Design
Jonathan Natkins (WibiData) et al
Consumer expectations have dramatically increased and retailers must present relevant content to maintain a competitive advantage. This presentation will demo an e-commerce application with real-time, personalized recommendations and discuss combining open-source system architecture, based on HBase and Kiji, with good predictive model design to build a scalable, real-time recommendation system.
1:45pm-2:25pm (40m) Data Science
New York City: A Data Science Mecca
Steve Lohr (The New York Times | Brown Institute for Media Innovation at Columbia University) et al
What can Data Science do for NYC? What can NYC do for Data Science? Deborah Estrin, first faculty member at CornellTech NYC, Chris Wiggins, cofounder of hackNY and member of the Institute for Data Sciences and Engineering at Columbia, and Yann LeCun, Director of the Center for Data Science at NYU, will answer these questions and more about the present and future of Data Science in NYC.
2:35pm-3:15pm (40m) Data Science
The Hidden Data Science Pipeline
Mark Mims (Infochimps)
This is a talk about the practice of data science. It's about taking all the implicit bits of the data science pipeline and exposing them to the light of day. We'll walk through developing and managing such a data science "pipeline" and cherrypick a few practices from the software development world to improve the quality and stability of results.
4:15pm-4:55pm (40m) Data Science
Making Big Data Small
Baron Schwartz (VividCortex)
What if data doesn't need to be big? Many use cases can be served well by a Small Data mindset, trading off accuracy in return for decreased cost. Examples include Bloom Filters, moving averages, and downsampling. This talk presents ideas and options you might not have considered for reducing big problems to comparatively small and cheap ones.
5:05pm-5:45pm (40m) Data Science
How to Stop Worrying and Start Modeling Big Data with Better Algorithms and H2O
Srisatish Ambati (0xdata Inc) et al
Get both Big Data AND Better Algorithms with the open-source math and prediction engine H2O. Once data science gets past scale and sampling, asymmetric and unbalanced data and missing elements affect the yields of popular algorithms. We present the life cycle of Big Data modeling. H2O brings scale to the versatile R language, bringing scale to the math community.
11:00am-11:40am (40m) Data, Connectivity, and Society
Data Philanthropy: Private-Public Sector Big Data Partnerships
Robert Kirkpatrick (UN Global Pulse) et al
A multi-presenter session with representatives from top private-sector companies explaining how, why, and what data, tools, or data science expertise they have shared for social good. UN Global Pulse Director Robert Kirkpatrick will wrap up with a reflection on how these front-runners can inspire others to share Big Data and how these modalities can be scaled up.
11:50am-12:30pm (40m) Data, Connectivity, and Society
Drought Prediction and Ecological Monitoring with the Internet of Things
Adam Wolf (Princeton University) et al
This session summarizes our experiences as environmental scientists developing the hardware and network support to create the Internet of Things as we would wish it to be: to better link models with data. We describe the opportunities we envision and challenges we have faced in applying the hardware and propagating the data to a variety of model-intensive applications.
1:45pm-2:25pm (40m) Data, Connectivity, and Society
Big Data to Enable Connected Products
Ron Bodkin (Think Big Analytics)
Products of all kinds now include embedded software and sensors and are connected to the Internet. Their vendors can innovate with new analytic offerings, improve customer experience and improve service. This session looks at the business models emerging across industries, important data sets and emerging standards, the role of Big Data technologies, impediments to adoption and future directions.
2:35pm-3:15pm (40m)
Session
To be confirmed
4:15pm-4:55pm (40m) Data, Connectivity, and Society
Data Governance for Regulated Industries Using Hadoop
Justin Makeig (MarkLogic)
Securely and cost-effectively managing petabytes of data from siloed systems is both a threat and opportunity for banking, healthcare, and other organizations in highly regulated industries. Drawn from production projects, this session will examine best practices around the use of Hadoop as part of a regulated data environment, including retention, provenance, privacy, and security.
5:05pm-5:45pm (40m) Data, Connectivity, and Society
Information Revolution In Government
James Stewart (Government Digital Service) et al
The UK Government team behind the GOV.UK website talk about their work on the Performance Platform, a suite of services and a cultural shift taking people away from immensely detailed value stream maps about a call-centre and paper process (which might be an inherently 5-day long journey), to something that's digital, lightweight, fast and pleasant to use.
11:00am-11:40am (40m) Sponsored
Disruptive Data Science Case Study: Visa's Big Data Response to Cyber Threats
Ravi Devireddy (Visa Inc) et al
In this talk, Annika Jimenez will paint a picture of requirements for data science success, and Ravi Devireddy will discuss the challenges in cyber security and the opportunities with Hadoop and big data, and present some use cases and applications. Both will share lessons learned on the bleeding edge of data science.
11:50am-12:30pm (40m) Sponsored
Shift into High Gear: Dramatically Improve Hadoop and NoSQL Performance
M. C. Srivas (MapR Technologies, Inc)
This session steps through how to double performance for MapReduce jobs, achieve high-speed data ingestion, and execute HBase apps 10X faster with consistent low latency.
1:45pm-2:25pm (40m) Sponsored
Predictable Performance at Scale is the Key to Shorter Time to Results
Jeff Denworth (DDN)
Big Data analytics is becoming a competitive advantage. However, traditional storage systems used for analytics are challenged with the performance and scale requirements. This creates bottlenecks and delays the time to results. Join us to learn how organizations are using high performance storage designed for parallel IO to eliminate bottlenecks and accelerate their analytics infrastructure.
2:35pm-3:15pm (40m) Sponsored
Hardening Hadoop for the Enterprise: Managing Diverse Workloads, Securing and Governing your Big Data Platform
Paul Kent (SAS)
How does IT balance the tension between “one glorious cluster that serves them all” and “one cluster, one purpose – dedicated for the particular task and not to be interfered with by anything”? Kerberos, cgroups and YARN to the rescue! This talk describes the current practices and speculates how things get better under YARN.
4:15pm-4:55pm (40m) Sponsored
Inject Big Data into your Corporate DNA: Enable Every Employee to Make Data Driven Decisions
Anurag Tandon (MicroStrategy)
Big data and big analytics will fundamentally transform how organizations conduct business and make decisions. But for that to happen, everyone in the organization needs access to tools and information. In this session, we'll look at what it takes to enable every employee to make data-driven decisions.
5:05pm-5:45pm (40m) Sponsored
Getting a Handle on Hadoop and its Potential to Catalyze a New Information Architecture Model
Milan Vaclavik (CenturyLink Technology Solutions)
Depending on who you talk to, Hadoop is either a massive disruption in IT, or a logical progression of existing technology trends. In this session, Savvis executives will provide a straightforward view of how Hadoop and related big data market dynamics fit into the broader IT market landscape. They will discuss why Hadoop alone is not a panacea for achieving information insight success...
11:00am-11:40am (40m) Design
Data Design for Chicago Energy
Aaron Wolf (Datascope Analytics) et al
Using a one-of-a-kind dataset of gas and electric energy usage throughout the Chicago area, we built a tool that encourages Chicago citizens to be more energy efficient. The visual tool aligns with the goals of the City of Chicago while also being informative and educational and encouraging action.
11:50am-12:30pm (40m) Design
Interactive Visualization of "Big" Data
Sean Kandel (Trifacta)
Effective visualization techniques and interaction methods for large data sets.
1:45pm-2:25pm (40m) Design
Non-linear Storytelling: Towards New Methods and Aesthetics for Data Narrative
Giorgia Lupi (Accurat)
How can a data-driven visualization tell multiple interplaying stories, and achieve a viable result in an abstract visual composition?
2:35pm-3:15pm (40m) Design
How to Avoid Some Different Graphical Mistakes
Naomi Robbins (NBR)
Readers and preparers of graphs: Learn to recognize and avoid some common graphical mistakes to understand your data better and make better decisions from data. Examples and mistakes will be different from those used in a similar presentation at the 2011 conference.
4:15pm-4:55pm (40m) Data, Connectivity, and Society
Every Soldier is a Sensor: Countering Corruption in Afghanistan
Amy Gaskins (MetLife)
The Army's Every Soldier is a Sensor (ES2) concept is entrenched in the belief that all soldiers, no matter their rank or specialty, can provide useful information on the battlefield. While deployed to Kandahar, Afghanistan, the 43d Sustainment Brigade put ES2 to the test: training soldiers to obtain critical information about corruption and using it to figure out where our money actually goes.
5:05pm-5:45pm (40m) Data, Connectivity, and Society
The Great Debate: A Connected World is a Better World
Jim Stogdill (O'Reilly Media, Inc.) et al
The Strata Great Debates return to New York with a discussion of the merits and drawbacks of what are rapidly becoming our prosthetic brains. In a vigorous Oxford-style debate, two teams try to convince the audience that they're right. We take score before and after their arguments, and declare a winner. Join us and help us decide whether a connected world is indeed a better one.
11:00am-11:40am (40m) Sponsored
Big Data in the Real World
Eron Kelly (Microsoft Corporation) et al
Learn more about how Microsoft’s Big Data tools are being used to change the way we all do business. Hear from Eron Kelly, General Manager, Microsoft and Albert Isern, CEO, Bismart, on how one of Europe’s largest cities provides a smart-city template that boosts collaboration between the City of Barcelona, its citizens and businesses and other global cities...
11:50am-12:30pm (40m) Sponsored
Flexible Schema and the End of ETL
Daniel Abadi (Yale University) et al
Although there are several SQL-on-Hadoop tools (a concept that Hadapt pioneered in 2009), these tools still rely on ETL (or MapReduce jobs) to structure raw data into a SQL-queryable format. Hear how Hadapt continues to lead the innovation curve with the Data-Driven Schema and Multi-Structured Tables, dramatically improving time-to-insight and depth of analytic possibility.
1:45pm-2:25pm (40m) Sponsored
High Performance, Scalable Big Data Solutions in a Bare Metal Cloud
Harold Hannon (SoftLayer)
The cloud provides an easy onramp to building and deploying Big Data solutions. Transitioning from initial deployment to large-scale, highly performant operations may not be as easy. Understanding the benefits, weaknesses, and performance characteristics of public and bare metal cloud deployments can help you make the right decisions.
2:35pm-3:15pm (40m) Sponsored
Between Real and Ideal in Big Data
Yongik Park (LG CNS)
In this session, we'll discuss some of the real business problems that arise when enterprises embrace big data, from defining requirements, to integrating systems, to managing and sharing resources.
4:15pm-4:55pm (40m) Sponsored
Achieving Real Success with Hadoop
Amir Halfon (MarkLogic)
The flexibility of Apache Hadoop is one of its biggest assets, letting organizations generate value from data that was previously considered too expensive to be stored and processed in traditional databases. But organizations still struggle to get the greatest business value out of their Hadoop deployments. One key concern is how to avoid ...
5:05pm-5:45pm (40m) Sponsored
The Sands of Time: How Cloud is Changing the Role of the CIO
Rod Smith (IBM Emerging Internet Technologies)
The Cloud: it offers opportunities and challenges for organizations as it represents a fundamental shift in how IT organizations provide support and services both internally and externally. Join us as we examine the opportunities and challenges of utilizing the cloud including its impact on the traditional enterprise IT leader - the CIO.
11:00am-11:40am (40m) Sponsored
Data Governance with the Intel Distribution for Apache Hadoop version 3.0
Ritu Kama (Intel) et al
Hadoop is a powerful and extensible platform for big data storage and processing needs. Join Ritu Kama and Vin Sharma, Hadoop product leads at Intel, to learn how the latest release of the Intel Distribution for Apache Hadoop brings together a number of security mechanisms - from role-based access control to fine-grained data auditing - to help enterprises ensure governance of their data lake.
11:50am-12:30pm (40m) Sponsored
Go Beyond SQL on Hadoop, Deep Answers to Today’s Business Questions
Peter Schlampp (Platfora)
Are you getting what you need from big data? If you’re using BI tools and SQL on Hadoop, you’re not. You need deeper insights than are possible with yesterday’s tools...
1:45pm-2:25pm (40m) Sponsored
Using A Visual Framework to Simplify ETL on Hadoop Without MapReduce or Programming
Mike Hoskins (Actian Corporation) et al
This session will address some of the biggest challenges faced by companies trying to do ETL or ELT on Hadoop and highlight how they can reuse existing skills to solve these challenges.
2:35pm-3:15pm (40m) Sponsored
Introducing a New Way to Interact with Insight
Stephanie McReynolds (ClearStory Data) et al
See a whole new way to speed the data processing cycle, converge and analyze diverse data, and interact with insights. The old approach limits how much data you can access and slows down decision-making. Join us to see a whole new data architecture and data application that converges more data, faster, from diverse sources, and allows a new level of interactive insights.
4:15pm-4:55pm (40m) Sponsored
Hadoop Appliances: Engineered for the Enterprise
Dan McClary (Oracle)
Organizations are experimenting with Hadoop, but spending too much time in configuration and maintenance. In this session, we'll consider the benefits of an appliance model and the future functionality of pre-integrated Hadoop clusters. Learn about the requirements for an enterprise Hadoop cluster and how a pre-integrated appliance can most efficiently deliver enterprise Hadoop needs.
5:05pm-5:45pm (40m) Sponsored
Red Hat solutions for real-world big data
Greg Kleiman (Red Hat) et al
In this session, we will discuss real-world customer deployment scenarios that succeeded with the help of Red Hat. We’ll show how these technologies can help harness data from a multitude of sources and turn it into your business advantages (or assets).
5:45pm-7:15pm (1h 30m) Event
Sponsor Pavilion Reception
Join your fellow big data enthusiasts at the Strata Conference & Hadoop World Sponsor Pavilion Reception on Tuesday, October 29.
8:45am-8:55am (10m)
Tuesday Keynote Welcome
Edd Dumbill (Silicon Valley Data Science) et al
Program Chairs, Edd Dumbill and Alistair Croll, welcome you to the first day of keynotes.
8:55am-9:10am (15m)
Hadoop's Impact on the Future of Data Management
Mike Olson (Cloudera)
As Hadoop and the surrounding projects & vendors mature, their impact on the data management sector is growing. Mike will talk about his views on how that impact will change over the next five years. How central will Hadoop be to the data center of 2020? What industries will benefit most? Which technologies are at risk of displacement or encroachment?
9:10am-9:20am (10m) Sponsored
Separating Hadoop Myths from Reality
Jack Norris (MapR Technologies)
According to Gartner, Hadoop is near the top of the Hype Cycle. While some customers have questions about the enterprise capabilities of Hadoop, the answers are clear as production deployments continue to expand. This session will use successful customer experiences to highlight the power of Hadoop and separate the myths from reality.
9:20am-9:30am (10m)
Big Impact from Big Data
Ken Rudin (Facebook)
In this talk, Ken will discuss several best practices focused on getting the biggest impact from big data and driving a proactive, data-driven culture.
9:30am-9:35am (5m) Sponsored
Five Surprising Mobile Trajectories in Five Minutes
Tony Salvador (Intel Corporation)
This talk will cover five major mobile trajectories for the next 10 years, creating a brand new world: Seven billion futures, Hyper Digitization, Hyper Individualism, Hyper Collectivity, and Hyper Differentiation.
9:35am-9:40am (5m) Sponsored
Can Big Data Reach One Billion People?
Quentin Clark (Microsoft)
The idea that big data will transform businesses and the world is indisputable, but are there enough resources to fully embrace this opportunity? Join Quentin Clark, Microsoft Corporate Vice President, who will share Microsoft’s bold goal to consumerize big data - simplifying the data science process and providing easy access to data with everyday tools.
9:40am-9:50am (10m)
What Makes Us Human? A Tale of Advertising Fraud
Claudia Perlich (Dstillery)
Coverage of online advertising fraud finally hit the newsstands a few months ago, but the story really started much earlier. Somewhat surprisingly, it was predictive modeling on large data streams from real-time bidding environments that first picked up symptoms of the largest online advertising scam yet. We tell the tale of how models that were “too good to be true” lead to quite a sinister discovery.
9:50am-9:55am (5m) Sponsored
From Fiction to Facts with Big Data Analytics
Ben Werther (Platfora)
During the session, attendees will learn how Big Data Analytics is the difference between fact-based enterprises and those focused on the shallow BI beauty contest.
9:55am-10:00am (5m)
Towards Strata 2014
Roger Magoulas (O'Reilly Media)
Roger Magoulas, incoming Strata chair and Director of Research at O'Reilly, will share insights into the state of data science as a profession and preview Strata in 2014.
10:00am-10:05am (5m)
Startup Showcase Winners Announced
Edd Dumbill (Silicon Valley Data Science) et al
A presentation of the winners from the Strata New York + Hadoop World 2013 Startup Showcase.
10:05am-10:20am (15m)
The Economic Potential of Open Data
Michael Chui (McKinsey Global Institute)
Michael Chui, Senior Fellow, McKinsey Global Institute
8:00am-8:45am (45m) Event
Tuesday Coffee BoFs
Have a particular topic you’d like to discuss with other Strata Conference + Hadoop World attendees during morning coffee? Join in or organize a Birds of a Feather discussion table in the Attendee Lounge (3rd floor). Sign-up board is near the Attendee Lounge.
10:30am-11:00am (30m)
Break: Break sponsored by Intel
3:15pm-4:15pm (1h) Event
Tuesday Afternoon BoFs
Have a particular topic you’d like to discuss with other Strata Conference + Hadoop World attendees? Join in or organize a Birds of a Feather discussion table in the Attendee Lounge (3rd floor). Sign-up board is near the Attendee Lounge.
12:30pm-1:45pm (1h 15m) Event
Tuesday Lunchtime Industry BoFs
Birds of a Feather (BoF) sessions are informal roundtable discussions happening throughout the day on Tuesday and Wednesday. Lunch BoFs will be organized around industries such as finance, media, retail, and more.
8:30pm-11:00pm (2h 30m)
Data After Dark
The must-attend data party of the year, Data After Dark is hosted by O'Reilly Strata on Tuesday evening, October 29, from 8:30 to 11:00 pm at five venues in the West Village: The Madelyn: 82 West Third Street; Wicked Willy's: 149 Bleecker Street; GMT Tavern: 142 Bleecker Street; The Red Lion: 151 Bleecker Street; Amity Hall: 80 West Third Street
7:15pm-8:30pm (1h 15m)
Break: Dinner

Sponsors

Sponsorship Opportunities

For exhibition and sponsorship opportunities, contact Susan Stewart at sstewart@oreilly.com

Media Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email mediapartners@oreilly.com

Press & Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

Contact Us

View a complete list of Strata + Hadoop World 2013 contacts