
Monday, 10/28/2013

8:00am

Sutton Foyer
Coffee Break (1h)

9:00am

Hadoop & Beyond Grand Ballroom West
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
Israel Ekpo (Walt Disney Parks and Resorts Online)
Average rating: *....
(1.19, 47 ratings)
This is a 3-hour tutorial on how to use Apache Flume to aggregate massive quantities of structured or unstructured data from sources such as log data, click streams, social media data, graph data and network traffic into centralized data stores such as HDFS, Elasticsearch, Neo4j and MongoDB, where it can be processed, digested and visualized in real time using D3.js and HTML5 WebSockets.
Data Science Regent Parlor
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
Matt Harrison (FusionIO)
Average rating: ***..
(3.71, 7 ratings)
This tutorial will jumpstart your Python experience. Learn the basics: enough Python to be dangerous. Then use two of the most popular packages for analysis, Matplotlib for plotting and pandas for data wrangling. This will be a hands-on tutorial, so bring a laptop with Python 2.7 installed and the gumption to hit the ground running and see what everyone is raving about.
Hadoop in Action Sutton Center - Sutton South
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
Tom White (Cloudera), Eric Sammer (Cloudera), Joey Echeverria (Cloudera)
Average rating: ***..
(3.71, 14 ratings)
In this tutorial we'll use the Cloudera Development Kit (CDK) to build a Java web app that logs application events to Hadoop, and then run ad hoc and scheduled queries against the collected data.
Design Murray Hill Suite
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
Julie Rodriguez (Sapient Global Markets)
Average rating: *....
(1.00, 5 ratings)
Learn how to find beauty in data. The beauty of a visual is that it can communicate so much. As we become more sophisticated in the amount of data we can harness, it will become more important for us to be equally good at visually communicating that data. This workshop will guide attendees through a method that aids in selecting the right visualization.
Hardcore Data Science Gramercy Suite
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
Average rating: ***..
(3.14, 7 ratings)
Strata's regular data science track has great talks with real-world experience from leading-edge speakers. But we didn't stop there: we added the Hardcore Data Science day to give you a chance to go even deeper. The Hardcore day will add new techniques and technologies to your data science toolbox, shared by leading data science practitioners from startups, industry, consulting and academia.
Data Science Nassau Suite
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
Antonio Piccolboni (Per data LLC), Joseph Rickert (Revolution Analytics)
Average rating: ***..
(3.40, 5 ratings)
This tutorial is aimed at R users who want to use Hadoop to work on big data, and at Hadoop users who want to do sophisticated analytics. We will introduce R, Hadoop and the RHadoop project. We will then cover three R packages for Hadoop and the MapReduce model. We will present numerous examples of increasing complexity, including the combination of rmr and RevoScaleR to solve modeling problems.
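The MapReduce model that the RHadoop packages expose can be illustrated without a cluster. The following is a minimal, single-process Python sketch of the map, shuffle, and reduce phases for a word count. It is not rmr's API and not Hadoop itself; the function names and sample lines are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def mapreduce(lines):
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = mapreduce(["big data on Hadoop", "R on Hadoop", "big models"])
print(counts)  # {'big': 2, 'data': 1, 'on': 2, 'hadoop': 2, 'r': 1, 'models': 1}
```

On a real cluster the mappers and reducers run in parallel on different nodes and the shuffle moves data across the network; the sketch keeps only the data flow.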
Hadoop & Beyond Rhinelander South
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
Tathagata Das (University of California Berkeley), Haoyuan Li (UC Berkeley), Ion Stoica (UC Berkeley), Reynold Xin (Databricks), Sameer Agarwal (UC Berkeley)
Average rating: ****.
(4.80, 10 ratings)
An introduction to the open-source Berkeley Data Analytics Stack (BDAS). Spark is a high-speed cluster computing engine that supports rich analytics (e.g., machine learning) and lower-latency processing (e.g., streaming). Tachyon provides in-memory storage, letting Spark and Hadoop jobs share data efficiently. Shark and GraphX provide high-speed Hive SQL queries and graph processing, respectively, on top of Spark.
Data-Driven Business Beekman Parlor - Sutton North
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
Alistair Croll (Solve For Interesting)
Average rating: **...
(2.50, 4 ratings)
For business strategists, marketers, product managers, and entrepreneurs, Data-Driven Business looks at how to use data to make better business decisions faster. Packed with case studies, panels, and eye-opening presentations, this fast-paced day focuses on how to solve today's thorniest business problems with Big Data. It's the missing MBA for a data-driven, always-on business world.

1:30pm

Hadoop Platform Grand Ballroom West
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
John Akred (Silicon Valley Data Science), Richard Williamson (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
Average rating: ***..
(3.71, 17 ratings)
What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive and real-time analytical workloads.
Hadoop & Beyond Regent Parlor
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
Patricia Gorla (The Last Pickle)
Average rating: ***..
(3.00, 12 ratings)
Before you analyze your big data, you need a way to store and access it. Here we examine the benefits of using a highly available, eventually consistent storage system, and the impact this has on real-time analytics. This session will prepare you to set up a working multi-node Cassandra and Hadoop cluster.
Hadoop in Action Sutton Center - Sutton South
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
Sean Murphy (JHU), Benjamin Bengfort (Cobrain Company and University of Maryland)
Average rating: ****.
(4.56, 18 ratings)
Much of the world’s data (and your own) is text. The key to unlocking its value lies in a series of natural language processing (NLP) transformations that turn raw strings into a machine-usable form. We will use Hadoop alongside Python’s NLTK to perform these steps and discuss why each is necessary in your application.
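As a rough illustration of the kind of transformations meant here, this minimal Python sketch turns a raw string into bag-of-words counts using only the standard library. It is a stand-in for NLTK's richer tokenizers and stopword corpora, not the tutorial's code, and the tiny stopword list is hypothetical.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; NLTK ships much fuller lists per language.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def tokenize(text):
    """Normalize case and split the text into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stopwords(tokens):
    """Drop very common words that carry little meaning on their own."""
    return [token for token in tokens if token not in STOPWORDS]


def bag_of_words(text):
    """Turn a raw string into term counts that a downstream model can consume."""
    return Counter(remove_stopwords(tokenize(text)))


doc = "Much of the world's data is text, and the text is messy."
print(tokenize(doc))
print(bag_of_words(doc))
```

Each step is a plain function, so the same pipeline could be applied record by record inside a Hadoop mapper.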
Data Science Murray Hill Suite
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
Giovanni Seni (Intuit)
Average rating: ***..
(3.44, 9 ratings)
This tutorial, based on a published book by the speaker, offers a hands-on intro to ensemble models, which combine multiple models into a single predictive system that’s often more accurate than the best of its components. Participants will use data sets and snippets of R code to experiment with the methods and gain a practical understanding of this breakthrough technology.
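To see why combining models can beat the best single component, here is a minimal Python sketch (the tutorial itself uses R) of the simplest combination rule, a majority vote. It assumes the models' errors are independent, which is the idealized best case; real models are correlated, so the gain is smaller in practice. The function names and the 70% accuracy figure are hypothetical.

```python
import random
from collections import Counter
from math import comb


def majority_vote(predictions):
    """Return the label predicted by the most models (use an odd number to avoid ties)."""
    return Counter(predictions).most_common(1)[0][0]


def simulate(n_models=3, accuracy=0.7, n_cases=50_000, seed=0):
    """Simulate independent binary classifiers that are each correct with probability
    `accuracy`; return (accuracy of one model, accuracy of the majority vote)."""
    rng = random.Random(seed)
    single_correct = 0
    ensemble_correct = 0
    for _ in range(n_cases):
        truth = rng.randint(0, 1)
        # Each model independently predicts the true label or flips it.
        preds = [truth if rng.random() < accuracy else 1 - truth for _ in range(n_models)]
        single_correct += preds[0] == truth
        ensemble_correct += majority_vote(preds) == truth
    return single_correct / n_cases, ensemble_correct / n_cases


def exact_majority_accuracy(n_models, accuracy):
    """Closed form: probability that more than half of the independent models are right."""
    need = n_models // 2 + 1
    return sum(
        comb(n_models, k) * accuracy**k * (1 - accuracy) ** (n_models - k)
        for k in range(need, n_models + 1)
    )


single, ensemble = simulate()
print(f"single model: {single:.3f}, majority of 3: {ensemble:.3f}")
print(f"exact majority of 3: {exact_majority_accuracy(3, 0.7):.3f}")
```

With three independent models that are each right 70% of the time, the vote is right with probability 0.7³ + 3 × 0.7² × 0.3 = 0.784, which the simulation approximates. Methods such as bagging, boosting, and random forests are more refined ways of building and combining component models.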
Data Science, Hadoop & Beyond Nassau Suite
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
Leah Hanson (Google)
Average rating: ****.
(4.00, 1 rating)
Julia is a high-performance, open-source language with great tools for numerical and statistical work. If you know R, MATLAB, or NumPy, you will feel at home in Julia. Unlike these systems, however, Julia takes advantage of modern compiler technology, combining an intuitive programming model with the speed of a low-level language. This workshop will take you from a fresh install to productive work in Julia.
Data Science Rhinelander South
Tutorial Please note: to attend, your registration must include Tutorials on Monday.
Matthew Russell (Digital Reasoning)
Average rating: *****
(5.00, 8 ratings)
A code-intensive workshop that breaks down the nuts and bolts of using IPython Notebook to uncover insights from social web APIs such as Twitter, Facebook, LinkedIn, and Google+. Attendees with a basic programming background will walk away with a working knowledge of how to access and mine valuable information from the social web.

5:00pm

Sponsor Pavilion
Average rating: *....
(1.00, 1 rating)
Grab a drink, mingle with fellow Strata participants on Monday, October 28, and see the latest technologies and products from leading companies in the data space.

6:30pm

3rd Floor Foyer
Part of NYC DataWeek. Don't miss the Startup Showcase, Strata Conference + Hadoop World's live demo program and competition for startups and early-stage companies. The judges will pick winners from 10 finalist companies selected to present at the showcase.

8:00pm

Grand Ballroom
Average rating: ***..
(3.50, 4 ratings)
Ignite is back at Strata + Hadoop World. The theme reflects the conference’s focus on data science and visualization, with an emphasis on the wonder and mysteries that data science is stumbling into.

Tuesday, 10/29/2013

8:00am

Grand Ballroom Foyer
Have a particular topic you’d like to discuss with other Strata Conference + Hadoop World attendees during morning coffee? Join in or organize a Birds of a Feather discussion table in the Attendee Lounge (3rd floor). Sign-up board is near the Attendee Lounge.

8:45am

Grand Ballroom
Edd Dumbill (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
Average rating: ***..
(3.50, 10 ratings)
Program chairs Edd Dumbill and Alistair Croll welcome you to the first day of keynotes.

8:55am

Grand Ballroom
Mike Olson (Cloudera)
Average rating: ***..
(3.45, 22 ratings)
As Hadoop and the surrounding projects and vendors mature, their impact on the data management sector is growing. Mike will talk about how he expects that impact to change over the next five years. How central will Hadoop be to the data center of 2020? What industries will benefit most? Which technologies are at risk of displacement or encroachment?

9:10am

Grand Ballroom
Jack Norris (MapR Technologies)
Average rating: **...
(2.80, 15 ratings)
According to Gartner, Hadoop is near the top of the Hype Cycle. While some customers have questions about the enterprise capabilities of Hadoop, the answers are clear as production deployments continue to expand. This session will use successful customer experiences to highlight the power of Hadoop and separate the myths from reality.

9:20am

Grand Ballroom
Ken Rudin (Facebook)
Average rating: ****.
(4.41, 37 ratings)
In this talk, Ken will discuss several best practices focused on getting the biggest impact from big data and driving a proactive, data-driven culture.

9:30am

Grand Ballroom
Tony Salvador (Intel Corporation)
Average rating: ***..
(3.08, 13 ratings)
This talk will cover five major mobile trajectories that will create a brand-new world over the next 10 years: Seven Billion Futures, Hyper Digitization, Hyper Individualism, Hyper Collectivity and Hyper Differentiation.

9:35am

Grand Ballroom
Quentin Clark (Microsoft)
Average rating: **...
(2.50, 10 ratings)
The idea that big data will transform businesses and the world is indisputable, but are there enough resources to fully embrace this opportunity? Join Quentin Clark, Microsoft Corporate Vice President, who will share Microsoft’s bold goal to consumerize big data - simplifying the data science process and providing easy access to data with everyday tools.

9:40am

Grand Ballroom
Claudia Perlich (Dstillery)
Average rating: ****.
(4.30, 23 ratings)
Coverage of online advertising fraud finally hit the newsstands a few months ago. But the story really started much earlier. Somewhat surprisingly, it was predictive modeling on large data streams from a real-time bidding environment that first picked up symptoms of the largest online advertising scam yet. We tell the tale of how models that looked “too good to be true” led to quite a sinister discovery.

9:50am

Grand Ballroom
Ben Werther (Platfora)
Average rating: **...
(2.56, 9 ratings)
During the session, attendees will learn how big data analytics separates fact-based enterprises from those focused on the shallow BI beauty contest.

9:55am

Grand Ballroom
Roger Magoulas (O'Reilly Media)
Average rating: **...
(2.89, 9 ratings)
Roger Magoulas, incoming Strata chair and Director of Research at O'Reilly, will share insights into the state of data science as a profession and preview Strata in 2014.

10:00am

Grand Ballroom
Edd Dumbill (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
Average rating: ***..
(3.75, 4 ratings)
A presentation of the winners from the Strata New York + Hadoop World 2013 Startup Showcase.

10:05am

Grand Ballroom
Michael Chui (McKinsey Global Institute)
Average rating: ***..
(3.67, 15 ratings)
Michael Chui, Senior Fellow, McKinsey Global Institute

11:00am

Hadoop Platform Grand Ballroom East
Jun Fang (Facebook)
Average rating: ***..
(3.64, 14 ratings)
Morse is a new system developed at Facebook to transform its ETL pipeline from daily batch to real time. It continuously moves, transforms, and loads data from distributed logs and sharded MySQL databases into the Hive data warehouse. HBase serves as the underlying storage for incrementally updated tables, while the data is exposed to Hive as external tables for read processing.
Enterprise Data Grand Ballroom West
Matt Asay (MongoDB, Inc.)
Average rating: ***..
(3.89, 9 ratings)
For some, Hadoop is synonymous with "Big Data." But Hadoop is just one component of a successful Big Data architecture. NoSQL solutions like MongoDB also play a dominant role for storage and real-time data processing, and RDBMS has a place, too. This session will drill down on the different types of NoSQL databases and how they complement Hadoop and RDBMS in a modern Big Data architecture.
Robert Kirkpatrick (UN Global Pulse), Mark Leiter (Nielsen)
A multi-presenter session with representatives from top private-sector companies explaining how, why, and what data, tools, or data science expertise they have shared for social good. UN Global Pulse Director Robert Kirkpatrick will wrap up with a reflection on how these front-runners can inspire others to share big data and how these modalities can be scaled up.
Hadoop in Action Sutton Center - Sutton South
Richard Park (LinkedIn Corp)
Average rating: ****.
(4.17, 6 ratings)
Azkaban is an open-source workflow management application developed at LinkedIn to schedule and run our Hadoop workflows. Sporting a beautiful web UI, it is designed to be scalable, reliable, modular, secure and extensible. Azkaban has been battle-tested on LinkedIn's Hadoop clusters, driving all of our data products over the last few years.
Design Murray Hill Suite
Aaron Wolf (Datascope Analytics), Burton Rast (IDEO)
Average rating: ***..
(3.62, 8 ratings)
Using a one-of-a-kind dataset of gas and electric energy usage throughout the Chicago area, we built a tool that encourages Chicago citizens to be more energy efficient. The visual tool aligns with the goals of the City of Chicago while also informing, educating, and encouraging action.
Hadoop & Beyond Gramercy Suite
Ahmed Radwan (Google's Motorola Mobility)
Average rating: ****.
(4.20, 5 ratings)
Multi-tenancy is a reality for large-scale data systems, but it raises concerns about exposure of sensitive data. Using anonymization techniques, sensitive data can be protected in ways that maintain user privacy while preserving the ability to use the data effectively for operational needs. In this talk, we explore the challenges and lessons learned in building solutions for data anonymization.
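One common family of techniques, shown here as a minimal Python sketch and not necessarily what this talk covers, combines keyed-hash pseudonymization of identifiers with generalization of quasi-identifiers such as age. The record fields and the key are hypothetical. Pseudonymization on its own is not full anonymization, since the remaining attributes can still enable re-identification.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).
    The same input always maps to the same token, so joins and counts still work,
    but the raw identifier cannot be recomputed without the secret key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_age(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a band such as '30-39' to reduce re-identification risk."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize_record(record: dict, key: bytes) -> dict:
    """Return a copy of a record that is safer to share with other tenants.
    The field names are hypothetical."""
    return {
        "user_token": pseudonymize(record["user_id"], key),
        "age_band": generalize_age(record["age"]),
        "purchases": record["purchases"],  # non-identifying measure kept as-is
    }


key = b"load-this-secret-from-a-key-store"  # hypothetical placeholder key
print(anonymize_record({"user_id": "alice@example.com", "age": 34, "purchases": 3}, key))
```

Using a keyed HMAC rather than a plain hash matters here: without the key, an attacker cannot simply hash a list of known identifiers and match the tokens.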
Sponsored Nassau Suite
Ravi Devireddy (Visa Inc), Annika Jimenez (Pivotal)
Average rating: **...
(2.80, 10 ratings)
In this talk Annika Jimenez will paint a picture of the requirements for data science success, and Ravi Devireddy will discuss the challenges in cyber security and the opportunities with Hadoop and big data, and present some use cases and applications. Both will share lessons learned on the bleeding edge of data science.
Sponsored Rhinelander South
Eron Kelly (Microsoft Corporation), Albert Isern (BISmart)
Average rating: *****
(5.00, 1 rating)
Learn more about how Microsoft’s Big Data tools are being used to change the way we all do business. Hear from Eron Kelly, General Manager, Microsoft, and Albert Isern, CEO, Bismart, on how one of Europe’s largest cities provides a smart-city template that boosts collaboration between the City of Barcelona, its citizens and businesses, and other global cities...
Data Science Beekman Parlor - Sutton North
John Foreman (MailChimp)
Average rating: ****.
(4.50, 10 ratings)
MailChimp's first big data effort, the Email Genome Project, was internal, focused on abuse-prevention. But once this centralized storage and analytics capability demonstrated its practical value, the company turned toward crafting user-facing big data products. This talk will detail the results of MailChimp's effort to democratize big data analysis in email marketing for its users.
Sponsored Rhinelander Center
Ritu Kama (Intel), Vin Sharma (Intel)
Average rating: ****.
(4.00, 3 ratings)
Hadoop is a powerful and extensible platform for big data storage and processing needs. Join Ritu Kama and Vin Sharma, Hadoop product leads at Intel, to learn how the latest release of the Intel Distribution for Apache Hadoop brings together a number of security mechanisms - from role-based access control to fine-grained data auditing - to help enterprises ensure governance of their data lake.

11:50am

Hadoop Platform Grand Ballroom East
Henry Robinson (Cloudera)
Average rating: ***..
(3.40, 5 ratings)
The increasing diversity of frameworks and workloads that run atop a Hadoop cluster gives users more flexibility and power, but makes it very difficult for an administrator to ensure that SLAs are met while still allowing exploratory, ad-hoc usage to use all spare capacity. We present our vision and implementation for generalised resource management on Hadoop, suitable for all uses.
Enterprise Data Grand Ballroom West
Stephen Brobst (Teradata Corporation), Ari Zilka (Hortonworks)
Average rating: ****.
(4.00, 6 ratings)
Hortonworks Chief Product Officer, Ari Zilka, and Teradata CTO, Stephen Brobst, show you when to use Hadoop and when to use an MPP relational data warehouse. At a conceptual level, it is easy to think they are interchangeable, but the differences overwhelm the similarities. Two of the most trusted experts in their fields examine how big data technologies are being used in practical deployments.
Adam Wolf (Princeton University), Kelly Caylor (Princeton University)
Average rating: ***..
(3.80, 5 ratings)
This session summarizes our experiences as environmental scientists developing the hardware and network support to create the Internet of Things as we would wish it to be: to better link models with data. We describe the opportunities we envision and challenges we have faced in applying the hardware and propagating the data to a variety of model-intensive applications.
Hadoop in Action Sutton Center - Sutton South
Adam Kawa (Spotify)
Average rating: ****.
(4.56, 9 ratings)
A trip into the Hadoop jungle to show the most interesting, exciting and surprising places we visited while growing fast from a 60-node to a 690-node Hadoop cluster. We will expose our JIRA tickets, real graphs, statistics, even excerpts from our dialogues. We will share the mistakes that we made and describe the fixes that finally domesticated this love-demanding yellow elephant and its friends.
Design Murray Hill Suite
Sean Kandel (Trifacta)
Average rating: ****.
(4.38, 16 ratings)
Effective visualization techniques and interaction methods for large data sets.
Hadoop & Beyond Gramercy Suite
Average rating: ***..
(3.00, 11 ratings)
There is increasing demand to discover and explore data iteratively, interactively, and for real-time insights, which we lump together under the term Real-Time Analytical Processing (RTAP). This talk presents our efforts and experience on building the real-time analytical processing framework for several large websites, leveraging Spark and Shark research from UC Berkeley.
Sponsored Nassau Suite
M. C. Srivas (MapR Technologies, Inc)
Average rating: ***..
(3.00, 1 rating)
This session steps through how to double performance for MapReduce jobs, achieve high-speed data ingestion, and execute HBase apps 10X faster with consistent low latency.
Sponsored Rhinelander South
Daniel Abadi (Yale University), Matthew Grace (Objective Logistics)
Average rating: *....
(1.50, 2 ratings)
Although there are several SQL-on-Hadoop tools (a concept that Hadapt pioneered in 2009), these tools still rely on ETL (or MapReduce jobs) to structure raw data into a SQL-queryable format. Hear how Hadapt continues to lead the innovation curve with the Data-Driven Schema and Multi-Structured Tables, dramatically improving time-to-insight and depth of analytic possibility.
Data Science Beekman Parlor - Sutton North
Jonathan Natkins (WibiData), Juliet Hougland (Self)
Average rating: ****.
(4.20, 10 ratings)
Consumer expectations have dramatically increased and retailers must present relevant content to maintain a competitive advantage. This presentation will demo an e-commerce application with real-time, personalized recommendations and discuss combining open-source system architecture, based on HBase and Kiji, with good predictive model design to build a scalable, real-time recommendation system.
Sponsored Rhinelander Center
Peter Schlampp (Platfora)
Are you getting what you need from big data? If you’re using BI tools and SQL on Hadoop, you’re not. You need deeper insights than are possible with yesterday’s tools...

12:30pm

America's Hall 1 & 2
Birds of a Feather (BoF) sessions are informal roundtable discussions happening throughout the day on Tuesday and Wednesday. Lunch BoFs will be organized around industries such as finance, media, retail, and more.

1:45pm

Hadoop Platform Grand Ballroom East
Nick Dimiduk (Hortonworks, Inc)
Average rating: ****.
(4.50, 12 ratings)
Your application is outgrowing its database, and you've started shopping for NoSQL options. Maybe you've adopted Hadoop into your data warehouse. You've heard HBase might be an appropriate technology, but you need to know more. This talk is for you. To understand its use, first understand how it works. This talk explores the design of HBase and its critical paths to ground an understanding of its use.
Enterprise Data Grand Ballroom West
Bill Schmarzo (EMC Consulting), John Akred (Silicon Valley Data Science), Anand Raman (Impetus Technologies, Inc.), Scott Rose (Think Big Analytics)
Average rating: ***..
(3.67, 3 ratings)
Opinions might be plentiful in big data, but experience is rare. Join this interactive session to quiz some of the industry's most seasoned minds about what works and what doesn't when it comes to bringing big data to business.
Ron Bodkin (Think Big Analytics)
Average rating: ***..
(3.00, 3 ratings)
Products of all kinds now include embedded software and sensors and are connected to the Internet. Their vendors can innovate with new analytic offerings, improve customer experience and improve service. This session looks at the business models emerging across industries, important data sets and emerging standards, the role of Big Data technologies, impediments to adoption and future directions.
Hadoop in Action Sutton Center - Sutton South
Barry Livingston (Riot Games), Ben Werther (Platfora)
Average rating: ****.
(4.56, 9 ratings)
Riot Games built the most played video game in the world, League of Legends, and needs to constantly monitor, develop, and update its games to keep players engaged. Learn about different data architecture approaches and about Riot Games’ “Data Collection Pipeline,” which provides insights into what’s needed to continuously improve the gamer experience.
Design Murray Hill Suite
Giorgia Lupi (Accurat)
Average rating: ****.
(4.13, 15 ratings)
How can a data-driven visualization tell multiple interplaying stories, and achieve a viable result in an abstract visual composition?
Hadoop & Beyond Gramercy Suite
Julien Le Dem (Twitter), Nong Li (Cloudera)
Average rating: ****.
(4.50, 10 ratings)
Parquet is a columnar file format for Hadoop that brings performance and storage benefits. It supports deeply nested data structures and is easy to extend and integrate with existing type systems.
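The performance and storage benefits of a columnar layout can be sketched in a few lines of Python. This toy example is not Parquet's actual file format; it only shows that a query over one field touches only that field's values, and that storing similar values together makes simple encodings such as run-length encoding effective. The field names and data are hypothetical.

```python
from itertools import groupby

# Hypothetical click-stream records, stored row by row.
rows = [
    {"user": "u1", "country": "US", "clicks": 3},
    {"user": "u2", "country": "US", "clicks": 1},
    {"user": "u3", "country": "US", "clicks": 4},
    {"user": "u4", "country": "FR", "clicks": 1},
    {"user": "u5", "country": "FR", "clicks": 5},
]


def to_columns(records):
    """Pivot row-oriented records into a column-oriented layout: one list per field."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse runs of identical adjacent values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


columns = to_columns(rows)
# An aggregate over one field reads only that field's list, not whole rows.
total_clicks = sum(columns["clicks"])
# Repetitive values stored together compress well.
country_runs = run_length_encode(columns["country"])
print(total_clicks, country_runs)  # 14 [('US', 3), ('FR', 2)]
```

Real columnar formats add much more on top of this, such as type-specific encodings, compression, and per-chunk metadata; Parquet also represents nested records, which this flat sketch does not attempt.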
Sponsored Nassau Suite
Average rating: ****.
(4.00, 1 rating)
Big Data analytics is becoming a competitive advantage. However, traditional storage systems used for analytics struggle to meet the performance and scale requirements. This creates bottlenecks and delays the time to results. Join us to learn how organizations are using high-performance storage designed for parallel IO to eliminate bottlenecks and accelerate their analytics infrastructure.
Sponsored Rhinelander South
Harold Hannon (SoftLayer Technologies)
The cloud provides an easy onramp to building and deploying Big Data solutions. Transitioning from initial deployment to large-scale, highly performant operations may not be as easy. Understanding the benefits, weaknesses, and performance characteristics of public and bare metal cloud deployments can help you make the right decisions.
Data Science Beekman Parlor - Sutton North
Moderated by:
Steve Lohr (The New York Times | Brown Institute for Media Innovation at Columbia University)
Panelists:
Chris Wiggins (hackNY/Columbia), Yann LeCun (NYU), Deborah Estrin (Cornell NYC Tech)
Average rating: ***..
(3.50, 4 ratings)
What can Data Science do for NYC? What can NYC do for Data Science? Deborah Estrin, the first faculty member at CornellTech NYC; Chris Wiggins, cofounder of hackNY and member of the Institute for Data Sciences and Engineering at Columbia; and Yann LeCun, Director of the Center for Data Science at NYU, will answer these questions and more about the present and future of data science in NYC.
Sponsored Rhinelander Center
Mike Hoskins (Actian Corporation), Ari Zilka (Hortonworks)
Average rating: *****
(5.00, 1 rating)
This session will address some of the biggest challenges faced by companies trying to do ETL or ELT on Hadoop and highlight how they can reuse existing skills to solve these challenges.

2:35pm

Hadoop Platform Grand Ballroom East
Jonathan Hsieh (Cloudera, Inc)
Average rating: ****.
(4.57, 7 ratings)
Apache HBase is a robust random-access distributed datastore built upon Apache Hadoop’s HDFS and Apache ZooKeeper. This talk will describe themes emerging from recent features slated for the upcoming post-0.96 release. These include improvements for multi-tenant deployments; a focus on predictable latencies; and the proliferation of new extensions for features traditionally found in databases.
Enterprise Data Grand Ballroom West
Srini Srinivasan (Aerospike Inc.)
Average rating: *....
(1.80, 5 ratings)
Internet environments for consumer-facing applications routinely demand high throughput, while SLAs require 100% uptime. This session reviews 10 practices for ensuring high performance and availability based on the real-world lessons of large-scale ad sector deployments where speed means 5 milliseconds, scale is 200,000 to 2 million TPS against terabytes of data, and downtime is not an option.
Regent Parlor
TBC
Hadoop in Action Sutton Center - Sutton South
Mark Slusar (Allstate)
Average rating: ****.
(4.00, 4 ratings)
After a successful round of Hadoop Data Science projects, a company will make a sizable Hadoop commitment. People, process, and technology stand at the tipping point for an exciting adventure in innovation and evolution that creates new possibilities. This presentation educates attendees on the changes from the traditional methods to the new methods and paints a vision of the future.
Design Murray Hill Suite
Average rating: ***..
(3.80, 15 ratings)
Readers and preparers of graphs: Learn to recognize and avoid some common graphical mistakes to understand your data better and make better decisions from data. Examples and mistakes will be different from those used in a similar presentation at the 2011 conference.
Hadoop & Beyond Gramercy Suite
Adam Fuchs (Sqrrl)
Average rating: **...
(2.75, 12 ratings)
The National Security Agency works with some of the world’s largest, most complex, and most sensitive datasets. In order to analyze this data, NSA has developed some powerful tools, such as Apache Accumulo. Come learn about NSA’s key lessons learned about building a Big Data platform from the former Technical Director of the Accumulo project at the NSA.
Sponsored Nassau Suite
Paul Kent (SAS)
Average rating: ****.
(4.00, 2 ratings)
How does IT balance the tension between “one glorious cluster that serves them all” and “one cluster, one purpose – dedicated for the particular task and not to be interfered with by anything”? Kerberos, C-groups and YARN to the rescue! This talk describes current practices and speculates on how things will get better under YARN.
Sponsored Rhinelander South
Yongik Park (LG CNS)
Average rating: *****
(5.00, 1 rating)
In this session, we'll discuss some of the real business problems that arise when enterprises embrace big data, from defining requirements, to integrating systems, to managing and sharing resources.
Data Science Beekman Parlor - Sutton North
Mark Mims (Infochimps)
Average rating: **...
(2.67, 6 ratings)
This is a talk about the practice of data science. It's about taking all the implicit bits of the data science pipeline and exposing them to the light of day. We'll walk through developing and managing such a data science "pipeline" and cherry-pick a few practices from the software development world to improve the quality and stability of results.
Sponsored Rhinelander Center
Stephanie McReynolds (ClearStory Data), Vaibhav Nivargi (ClearStory Data), Brian Zotter (ClearStory Data), Stephen McDaniel (Freakalytics)
Average rating: *****
(5.00, 1 rating)
See a whole new way to speed the data processing cycle, converge and analyze diverse data, and interact with insights. The old approach limits how much data you can access and slows down decision-making. Join us to see a new data architecture and data application that converges more data, faster, from diverse sources, and enables a new level of interactive insight.

3:15pm

Sponsor Pavilion
Have a particular topic you’d like to discuss with other Strata Conference + Hadoop World attendees? Join in or organize a Birds of a Feather discussion table in the Attendee Lounge (3rd floor). The sign-up board is near the Attendee Lounge.

4:15pm

Hadoop Platform Grand Ballroom East
Aaron Myers (Cloudera, Inc.), Shreepadma Venugopalan (Cloudera)
Average rating: ***..
(3.22, 9 ratings)
When Hadoop is used for sensitive data, it must meet security requirements such as strong authentication, authorization of data and resources, and data confidentiality. This session covers how various parts of the Hadoop ecosystem can interact securely to address these requirements. We will focus on the advanced Apache Hive authorization features enabled by the Apache Sentry (incubating) project.
Enterprise Data Grand Ballroom West
Average rating: *****
(5.00, 3 ratings)
Big data is transforming the cloud as it moves from web giants into the enterprise. To run today’s multiple workload types, infrastructure must be architected as a common software-defined platform that supports the key workload components for today’s and tomorrow’s big data systems. We must plan now to accommodate explosive growth and the need for robust storage, networking, and security.
Justin Makeig (MarkLogic)
Average rating: *....
(1.00, 4 ratings)
Securely and cost-effectively managing petabytes of data from siloed systems is both a threat and opportunity for banking, healthcare, and other organizations in highly regulated industries. Drawn from production projects, this session will examine best practices around the use of Hadoop as part of a regulated data environment, including retention, provenance, privacy, and security.
Hadoop in Action Sutton Center - Sutton South
Zach Snyder (The Walt Disney Company)
Average rating: ***..
(3.29, 7 ratings)
Managing Hadoop clusters to meet business needs can be challenging. Learn how Disney uses an integrated approach, leveraging both Hadoop-specific tools and common IT management tools to create a comprehensive management toolkit for our Hadoop clusters.
Amy Gaskins (MetLife)
Average rating: ****.
(4.00, 6 ratings)
The Army's Every Soldier is a Sensor (ES2) concept is rooted in the belief that all soldiers, no matter their rank or specialty, can provide useful information on the battlefield. While deployed to Kandahar, Afghanistan, the 43d Sustainment Brigade put ES2 to the test, training soldiers to obtain critical information about corruption and using it to figure out where our money actually goes.
Hadoop & Beyond Gramercy Suite
Ari Gesher (Palantir Technologies), Danielle Kramer (Palantir Technologies)
Average rating: ***..
(3.67, 3 ratings)
AtlasDB is a bolt-on layer for key-value stores (distributed or otherwise) that implements MVCC and guarantees ACID properties for eventually consistent data stores. In this talk, we'll look at the protocol used to implement transactions, discuss the performance tradeoffs of using transactions, and examine the transactions API it offers.
Sponsored Nassau Suite
Anurag Tandon (MicroStrategy)
Big data and big analytics will fundamentally transform how organizations conduct business and make decisions. But for that to happen, everyone in the organization needs access to tools and information. In this session, we'll look at what it takes to enable every employee to make data-driven decisions.
Sponsored Rhinelander South
Amir Halfon (MarkLogic)
The flexibility of Apache Hadoop is one of its biggest assets, letting organizations generate value from data that was previously considered too expensive to be stored and processed in traditional databases. But organizations still struggle to get the greatest business value out of their Hadoop deployments. One key concern is how to avoid ...
Data Science Beekman Parlor - Sutton North
Baron Schwartz (VividCortex Inc)
Average rating: ***..
(3.47, 19 ratings)
What if data doesn't need to be big? Many use cases can be served well by a Small Data mindset, trading off accuracy in return for decreased cost. Examples include Bloom Filters, moving averages, and downsampling. This talk presents ideas and options you might not have considered for reducing big problems to comparatively small and cheap ones.
Sponsored Rhinelander Center
Dan McClary (Oracle)
Average rating: *****
(5.00, 1 rating)
Organizations are experimenting with Hadoop, but spending too much time in configuration and maintenance. In this session, we'll consider the benefits of an appliance model and the future functionality of pre-integrated Hadoop clusters. Learn about the requirements for an enterprise Hadoop cluster and how a pre-integrated appliance can most efficiently deliver enterprise Hadoop needs.

5:05pm

Hadoop Platform Grand Ballroom East
Jing Zhao (Hortonworks, Inc.), Tsz-Wo Sze (Hortonworks Inc.)
Average rating: ****.
(4.50, 8 ratings)
In this talk, attendees will learn the high-level design of HDFS snapshots and how snapshots can be used for data protection and disaster recovery. We will also cover details of snapshot development and testing. Finally, we will explore how to build and improve other features on top of HDFS snapshots, including DistCp, HBase snapshots, and Hive table snapshots.
Enterprise Data Grand Ballroom West
Sumeet Singh (Yahoo!)
Average rating: ***..
(3.33, 3 ratings)
Enterprises thinking of adopting Hadoop are increasingly debating between on-premises and cloud-based models. We lay out a set of criteria to help enterprises evaluate their options. For those that have already made, or plan to make, significant on-premises investments, we present an approach to managing Hadoop as a service with a P&L and metering and billing provisions.
James Stewart (Government Digital Service), James Abley (Government Digital Service)
Average rating: *****
(5.00, 1 rating)
The UK Government team behind the GOV.UK website talks about its work on the Performance Platform: a suite of services and a cultural shift that takes people away from immensely detailed value stream maps of a call-centre and paper process (which might be an inherently five-day journey) toward something digital, lightweight, fast, and pleasant to use.
Hadoop in Action Sutton Center - Sutton South
Erich Hochmuth (Monsanto), Amandeep Khurana (Cloudera)
Average rating: ***..
(3.75, 8 ratings)
Monsanto is building new technology-driven products for its customers that will leverage big data. This talk describes how Monsanto is building these scalable applications with geospatial data, using Hadoop and HBase as the backend systems.
Moderated by:
Jim Stogdill (O'Reilly Media, Inc.)
Panelists:
Mona Vernon (Thomson Reuters), Trevor Hughes (International Association of Privacy Professionals), Randy Smerik (Osunatech, Inc.), Lisa Green (Common Crawl Foundation)
Average rating: *****
(5.00, 1 rating)
The Strata Great Debates return to New York with a discussion of the merits and drawbacks of what are rapidly becoming our prosthetic brains. In a vigorous Oxford-style debate, two teams try to convince the audience that they're right. We take a score before and after their arguments and declare a winner. Join us and help decide whether a connected world is indeed a better one.
Hadoop in Action Gramercy Suite
Russell Sears (Microsoft)
REEF is a set of tools and services that make it easy to implement new scalable computational frameworks atop YARN, and to allow jobs to perform multiple types of computations, such as MapReduce, iterative machine learning and graph processing. We plan to support additional programming models over time. REEF is language-independent, allowing it to bridge the Java and .NET ecosystems.
Sponsored Nassau Suite
Milan Vaclavik (CenturyLink Technology Solutions)
Average rating: ****.
(4.00, 1 rating)
Depending on who you talk to, Hadoop is either a massive disruption in IT, or a logical progression of existing technology trends. In this session, Savvis executives will provide a straightforward view of how Hadoop and related big data market dynamics fit into the broader IT market landscape. They will discuss why Hadoop alone is not a panacea for achieving information insight success...
Sponsored Rhinelander South
Rod Smith (IBM Emerging Internet Technologies)
The cloud offers both opportunities and challenges for organizations, as it represents a fundamental shift in how IT organizations provide support and services, both internally and externally. Join us as we examine the opportunities and challenges of using the cloud, including its impact on the traditional enterprise IT leader: the CIO.
Data Science Beekman Parlor - Sutton North
Srisatish Ambati (0xdata Inc), Cliff Click (0xdata)
Average rating: ***..
(3.17, 6 ratings)
Get both Big Data and better algorithms with H2O, an open source math and prediction engine. Once data science gets past scale and sampling, asymmetric and unbalanced data and missing elements affect the yields of popular data science algorithms. We present the life cycle of Big Data modeling. H2O brings scale to the versatile R language, and with it to the math community.
Sponsored Rhinelander Center
Greg Kleiman (Red Hat), Syed Rasheed (Red Hat)
Average rating: ****.
(4.00, 1 rating)
In this session, we will discuss real-world customer deployment scenarios that succeeded with the help of Red Hat. We’ll show how these technologies can help harness data from a multitude of sources and turn it into business advantages and assets.

5:45pm

Sponsor Pavilion
Average rating: ****.
(4.00, 1 rating)
Join your fellow big data enthusiasts at the Strata Conference & Hadoop World Sponsor Pavilion Reception on Tuesday, October 29.

8:30pm

West Village
Average rating: ***..
(3.50, 2 ratings)
The must-attend data party of the year, Data After Dark is hosted by O'Reilly Strata on Tuesday evening, October 29, from 8:30 to 11:00 pm at five venues in the West Village: The Madelyn, 82 West Third Street; Wicked Willy's, 149 Bleecker Street; GMT Tavern, 142 Bleecker Street; The Red Lion, 151 Bleecker Street; and Amity Hall, 80 West Third Street.

Wednesday, 10/30/2013

8:00am

Sutton Foyer
Have a particular topic you’d like to discuss with other Strata Conference + Hadoop World attendees during morning coffee? Join in or organize a Birds of a Feather discussion table in the Attendee Lounge (3rd floor). The sign-up board is near the Attendee Lounge.

8:45am

Grand Ballroom
Edd Dumbill (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
Program Chairs, Edd Dumbill and Alistair Croll, welcome you to the second day of keynotes.

8:50am

Grand Ballroom
Doug Cutting (Cloudera)
Average rating: ***..
(3.31, 16 ratings)
Doug will talk broadly about the future capability of Hadoop in the context of the road traveled so far. What are the limits of Hadoop? How should you think about workloads like SQL and Search? What's next?

9:05am

Grand Ballroom
Josh Klahr (Pivotal)
Average rating: **...
(2.92, 13 ratings)
Data is coming at us from everywhere, in small quantities and large magnitudes, and in almost every format. As Pivotal’s Vice President of Data Platform Product Management, Josh Klahr offers insights on how to build an organization that strategically manages this data in today’s modern and complex enterprise environments.

9:15am

Grand Ballroom
David Parker (SAP)
Average rating: **...
(2.60, 10 ratings)
Big Data is impacting society in ways never possible before – enabling us all to gain insights that can transform the way we do business, work with others, and live our lives. SAP recognizes that this transformation needs grassroots support...

9:20am

Grand Ballroom
Shawndra Hill (University of Pennsylvania)
Average rating: ****.
(4.00, 12 ratings)
In this keynote I will discuss how TV networks and advertisers can derive value from all of the online social activity about TV.

9:30am

Grand Ballroom
Will Marshall (Planet Labs)
Average rating: ***..
(3.87, 15 ratings)
Planet Labs is launching the largest-ever fleet of Earth-imaging satellites in December. These will enable high-resolution imagery of the entire planet to be taken on a more frequent basis. The data is of large potential value: humanitarian applications range from monitoring deforestation and the ice caps to disaster relief and improving agricultural yields in developing nations.

9:40am

Grand Ballroom
John Choi (IBM)
Average rating: **...
(2.45, 11 ratings)
What is Big Data? What will it mean for my organization? What technologies do I need? In this session, we will provide a view of what Big Data really means for organizations and how people, processes, and technologies, when brought together, can catalyze a transformational journey.

9:45am

Grand Ballroom
Sharmila Shahani-Mulligan (ClearStory Data)
Average rating: **...
(2.10, 10 ratings)
Is your big data analysis constrained by slow cycles, specialist-only access, and a process of one-shot, big data analysis? Traditional approaches are painful, costly, and tedious. See a whole new way to speed the cycle, converge and analyze diverse data, and interact with insights.

9:50am

Grand Ballroom
Jim Kaskade (Infochimps)
Average rating: ****.
(4.12, 16 ratings)
Data and analytics are a means to an end. Jim highlights a new revolution in analytic applications, with some touching examples from the healthcare industry in cancer research and medication therapy management.

10:00am

Grand Ballroom
Peta Clarke (Black Girls Code - NY), Donna Knutt (Black Girls Code)
Average rating: ***..
(3.38, 8 ratings)
Details to come.

10:05am

Grand Ballroom
Douglas Merrill (ZestFinance)
Average rating: **...
(2.42, 12 ratings)
Most people think success in big data analysis is about the right mix of vast amounts of data, mathematics and Ph.D.’s (oh my!). Those people are wrong. You need artistry too. This talk will provide some examples of "pure" ML failures and give suggestions on how to build an appropriately artistic team.

10:15am

Grand Ballroom
Foster Provost (NYU | Stern)
Average rating: ***..
(3.55, 11 ratings)
Predictive analytics is one of the most mature areas of data science and an area where "big data" often is associated with competitive advantage. However, concrete results supporting the advantage conferred by big data are few and far between.

11:00am

Hadoop Platform Grand Ballroom East
Siddharth Seth (Hortonworks Inc), Hitesh Shah (Hortonworks Inc)
Average rating: ***..
(3.67, 6 ratings)
Apache Hadoop became popular through its specialization in executing MapReduce programs. However, it has been hard to leverage existing Hadoop infrastructure for other processing paradigms such as real-time streaming, graph processing, and message passing. Learn how this barrier was removed and how new applications are being built and run on Apache Hadoop.
Enterprise Data Grand Ballroom West
Q Ethan McCallum (@qethanm), Brett Goldstein (University of Chicago)
Average rating: **...
(2.78, 9 ratings)
Data analysis has become a key element of business, yet there is painfully little guidance for leaders tasked with building and managing this critical function. We've spoken with various companies to get their take on how to build an analytics shop, and we'd like to share that information with you.
Sponsored Regent Parlor
David Parker (SAP)
Average rating: ***..
(3.75, 4 ratings)
Learn how solutions from SAP and our Hadoop partners can help your organization gain unprecedented insight from Big Data.
Hadoop in Action Sutton Center - Sutton South
Feng Peng (Twitter.com)
Average rating: ****.
(4.25, 8 ratings)
At Twitter, our Hadoop-centric data analytics pipeline has been growing rapidly in both size and complexity. With thousands of evolving data sources and analytics programs, orchestrating analytics production becomes extremely difficult without a systematic solution. We will describe our production challenges and illustrate how the service we built helps us address them.
Hadoop & Beyond Murray Hill Suite
Colin Marc (Stripe)
Average rating: ***..
(3.33, 3 ratings)
Most startups don't think about building a real analytics platform until it's too late, and Stripe is certainly no exception. In this session, I'll describe how we approached building such a platform, and walk through the steps (and missteps) we took in making our production data available in Hadoop, in realtime, for processing and querying.
Design Gramercy Suite A
Average rating: ****.
(4.57, 7 ratings)
This talk discusses the broad design considerations necessary for effective visualizations. Attendees will learn about purpose, content, structure, and formatting. We will also discuss why they must be selected in this order, and the importance and impact each has on your visualization.
Sponsored Gramercy Suite B
John Choi (IBM)
Average rating: *****
(5.00, 2 ratings)
How can big data really help me? What's real and what's hype? How do I ensure my Big Data projects are successful? How do I get started? We will provide real world examples and heuristics from organizations successfully navigating their Big Data journey from early projects to organizational transformation.
Sponsored Rhinelander South
Brett Sheppard (Splunk)
Learn firsthand how a leading enterprise used Splunk and their Hadoop distribution to empower the organization with new access to Hadoop data. See how they got up and running in under an hour and enabled their developers to start writing big data apps.
Data Science Beekman Parlor - Sutton North
Ulrich Rueckert (Datameer)
Average rating: ****.
(4.50, 6 ratings)
Even if one has big data, sometimes there is a lack of key data. This is a problem for predictive analytics: if there is only a limited amount of training material (e.g. user ratings, categorized documents), then it is hard to generate accurate models. The talk introduces new semi-supervised learning methods to overcome this problem by utilizing the vast amount of unlabeled data.

11:50am

Hadoop Platform Grand Ballroom East
Jayant Shekhar (Cloudera Inc)
Average rating: ****.
(4.25, 8 ratings)
Hadoop has evolved significantly in recent years, today serving as a unified platform for near-real-time (NRT) and batch workflows, such as querying, analysis and alerting for logs and machine data. In this session, we'll dive into the details of using SolrCloud and Cloudera Impala together to serve search queries, by integrating Flume to stream events into Solr, Impala and HBase.
Enterprise Data Grand Ballroom West
John Akred (Silicon Valley Data Science), Stephen OSullivan (Silicon Valley Data Science)
Average rating: ****.
(4.50, 4 ratings)
A modern CIO rationalizing a company’s data architecture must consider a mix of deployment options, much as a utility executive must invest in a good generation mix. We articulate a framework for applying the deployment levers available to architects as they plot a course forward in this era of big data technologies, born of our deep experience implementing the world's largest data platforms.
Sponsored Regent Parlor
Jorge A Lopez (Syncsort), Matt Brandwein (Cloudera)
Average rating: ****.
(4.17, 6 ratings)
Mainframe is Big Data too! Leveraging it in Hadoop creates a remarkable competitive advantage, but exploiting it without the right tools is nearly impossible, requiring you to wrestle with thousands of lines of Java, Pig, Hive, COBOL and more. This session presents a smarter way to ingest and process mainframe data in Hadoop, and how to bridge the technical, skill and cost gaps between the two.
Hadoop in Action Sutton Center - Sutton South
Chris Lintz (Comcast), Gabriel Commeau (Comcast)
Average rating: ****.
(4.00, 8 ratings)
Real-time analytics produced by IP video players help ensure that Comcast delivers the highest-quality experience to customers. The system ingests as many messages each day as there are Tweets produced daily, and these real-time insights are achieved through an in-house architecture built on Flume NG and Storm.
Hadoop & Beyond Murray Hill Suite
Carlos Guestrin (GraphLab Inc.), Joseph Gonzalez (UC Berkeley)
Average rating: ****.
(4.86, 7 ratings)
GraphLab is like Hadoop for graphs. Users express graph processing algorithms using a simple API and the GraphLab runtime efficiently executes that computation on multicore and distributed architectures. By leveraging advances in graph representation, asynchronous communication, and scheduling, GraphLab is able to achieve orders-of-magnitude performance gains over existing systems like Hadoop.
Design Gramercy Suite A
Richard Brath (Oculus), David Jonker (Oculus)
Average rating: ***..
(3.00, 8 ratings)
Visualizations of big graphs often look like spaghetti and can be difficult to use. Working backwards from the analytic questions, we will show some very different 2D and 3D visualizations for social networks. We'll also cover some of the challenges and discuss some open source tools.
Sponsored Gramercy Suite B
Arun Murthy (Hortonworks), Alan Gates (Hortonworks), Owen O'Malley (HortonWorks)
Average rating: ****.
(4.75, 4 ratings)
Apache Hive is the de facto standard for SQL-in-Hadoop today with more enterprises relying on this open source project than any alternative. New enterprise requirements for Hive to become more real time or interactive have evolved… and the Hive community has responded. Please join Arun Murthy, Owen O'Malley and Alan Gates to learn more about Stinger and improvements to Apache Hive.
Sponsored Rhinelander South
Rob Rosen (Pentaho), Andrew Robbins (Paytronix), Ross Macleod (Paytronix)
Average rating: ***..
(3.00, 3 ratings)
Attend this session to learn what 'data blending' is, how you can take this "next step" with little investment or new skills, and how companies taking the "next step" in big data are benefiting from at-the-source data blending.
Data Science Beekman Parlor - Sutton North
Wes McKinney (DataPad Inc.)
Average rating: ****.
(4.00, 3 ratings)
This talk will look at end-to-end data workflows (i.e. the sequence of preparation, analysis, visualization, and collaboration) and discuss technologies and tools (both programming and UI-driven) that can help individuals and organizations do more with their data.

12:30pm

America's Hall 1 & 2
Birds of a Feather (BoF) sessions are informal roundtable discussions happening throughout the day on Tuesday and Wednesday. Lunch BoFs will be organized around industries such as finance, media, retail, and more.

1:45pm

Hadoop Platform Grand Ballroom East
Paul Kent (SAS)
Average rating: *....
(1.00, 1 rating)
Analytically focused organizations are building general-purpose Hadoop clusters and want to deploy a wide range of analytic software. As the level of data sharing rises and the variety of tools used to access data increases, you’ll face choices: what format to store your data in, what catalog to use to describe the data and its layouts, and how, when, and where to decide between tools.
Enterprise Data Grand Ballroom West
Erin Shellman (Nordstrom), David Von Lehman (Nordstrom)
Average rating: ****.
(4.50, 10 ratings)
Nordstrom started modestly in 1901 as a small shoe store in Seattle, and has since expanded to 117 full-line department stores and 138 Rack stores across the country. The art of retailing has changed dramatically over the last century and retailers today are concerned with understanding customer behavior and preferences both in the physical world and online.
Sponsored Regent Parlor
Matt Schumpert (Datameer)
Average rating: ***..
(3.00, 2 ratings)
With data scientists in short supply, it's surprising that much of their precious time is spent doing "data plumbing" (preparing data or servicing business users) rather than doing actual data science. In this session, we'll look at the gradual evolution of tools that's moving us towards self-service data science.
Hadoop in Action Sutton Center - Sutton South
Scott Sorensen (Ancestry.com)
Average rating: ***..
(3.50, 4 ratings)
New, affordable DNA sequencing will generate massive new flows of data. Ancestry.com currently manages 4 petabytes of searchable data and is on track to increase this figure exponentially with its new DNA product. Ancestry.com CTO, Scott Sorensen, explains how the company manages tremendous amounts of new data through two categories of Hadoop use cases: 1) analytics and 2) product features.
Hadoop & Beyond Murray Hill Suite
Dave Stokes (MySQL Community Team)
Average rating: **...
(2.40, 5 ratings)
MySQL 5.6 includes a NoSQL interface, using an integrated memcached daemon that can automatically store data in and retrieve it from InnoDB tables, turning the MySQL server into a fast “key-value store” for single-row insert, update, or delete operations. This session explores this interface and other 'simple' options for those with MySQL database instances who want to explore big data access.
Lyndon Estes (Princeton University)
Knowing where farming occurs and where it will expand is crucial for understanding food security and our changing environment. However, the satellite-based maps we currently rely on are often inaccurate, particularly in Africa. Our project is harnessing open source software, big data, and crowdsourcing to create better crop field maps for Africa.
Sponsored Gramercy Suite B
Samuel Kommu (Cisco Systems)
Average rating: ****.
(4.50, 2 ratings)
Is it possible to use a Big Data cluster for other applications? Should the cluster be virtualized or run on bare metal? Should it use local or shared storage? Which Hadoop version? Cisco will examine and discuss some of these questions to help you plan and optimize a Big Data cluster that runs multiple applications without impacting performance.
Sponsored Rhinelander South
Jim Englert (Gilt)
In July 2013 a team from Basho joined up with a team of Gilt engineers at Gilt's Dublin office to spend a few days testing how Riak would handle Gilt's production traffic on the company's main user store. In this talk Jim will discuss this process, the results of this stress test, and how Gilt--one of the top eCommerce companies in the U.S...
Data Science Beekman Parlor - Sutton North
Average rating: **...
(2.80, 10 ratings)
Voice of the customer (VOC) data is a rapidly growing, unstructured, untapped data source – for your web site and across social media sites. Topic discovery through clustering of user verbatims, integrated with decision support data, can unleash valuable, actionable insights from millions of customers.

2:35pm

Sponsored Regent Parlor
Paul Groom (Kognitio)
Is Hadoop ready for high-concurrency, complex BI, even with Hadoop 2.0 on the way? Advanced analytics requires rip-roaring performance and fast, low-latency execution. Disk is not the solution; in-memory is where the hot BI data needs to reside. This informative session will offer expert advice, opinions from the "bleeding edge," and some hidden secrets from 25 years of work with big data.
Enterprise Data Sutton Center - Sutton South
Micheline Casey (Federal Reserve Board)
Average rating: *....
(1.67, 3 ratings)
Traditionally, security has tended to mean "lock down and protect". But there is a balance between securing data and still supporting information sharing and reuse. This presentation is meant to educate data management professionals at all levels on how to manage this balance.
Hadoop Platform Murray Hill Suite
Greg Rahn (Cloudera)
Average rating: ****.
(4.75, 8 ratings)
Impala brings SQL to Hadoop, but it also brings SQL performance tuning to those using the platform. This technical session will cover several topics in Impala performance analysis to aid in answering the question “why is my query slow?” as well as practical tips and techniques to get the best performance from Impala.
Hadoop in Action Gramercy Suite A
RAVI HUBBLY (Lockheed Martin)
Enterprises continue to rely on legacy mainframe-based systems even though using them carries risk, largely because prior efforts to modernize these systems have been difficult. In this talk we will discuss usage scenarios where Hadoop has helped modernize legacy systems and positioned businesses for big data benefits.
Sponsored Gramercy Suite B
Michael Dobrovolsky (Morgan Stanley Wealth Management)
Average rating: **...
(2.60, 5 ratings)
Morgan Stanley is gaining deeper insights from big data to improve operational efficiency and customer value. In this session you’ll learn how Morgan Stanley matured its big data solution with Hadoop to scale big data deployments, leverage crowd innovation, and tackle the challenges associated with big data overload and complexity.
Rhinelander South
TBC
Data Science Beekman Parlor - Sutton North
Zack Exley (Wikimedia Foundation), Sahar Massachi (Independent)
Average rating: ****.
(4.00, 13 ratings)
There's something about A/B testing that invites statistical malpractice and makes communication between academics and practitioners very difficult. Wikipedia's revenue depends on doing testing right. We'd like to present simple methods that we believe accurately predict future performance from A/B test results while minimizing sample size, along with proofs from four years of test data.

3:45pm

Sponsored Regent Parlor
Luca Barone (Cisco), Charles Zedlewski (Cloudera), Timothy Weaver (Dannon), John Garris (UBS), Prakash Nanduri (Paxata), Howard Dresner (sandhill.com), Ben Haines (Box)
Average rating: ***..
(3.00, 1 rating)
Join a lively panel discussion moderated by the undisputed father of BI, Howard Dresner, featuring the emerging leaders of the Gen D Revolution: Luca Barone from Cisco, Timothy Weaver from Dannon, John Garris from UBS and Prakash Nanduri from Paxata.
Enterprise Data Sutton Center - Sutton South
Eddie Satterly (Splunk)
Average rating: ***..
(3.38, 8 ratings)
In this session you will hear from big data experts with real-world experience on the architectural patterns and tool integrations used to solve real business problems with data.
Hadoop Platform Murray Hill Suite
Tanel Poder (Enkitec)
Average rating: *....
(1.50, 2 ratings)
If you are a developer or DBA with an Oracle background and want to learn how Hadoop works, this session is for you. We will go through the Hadoop HDFS and MapReduce data processing flow and compare it to the already familiar Oracle database parallel processing, which should make understanding the internals of this new technology a breeze.
Hadoop in Action Gramercy Suite A
David Thompson (Western Union)
Average rating: ***..
(3.60, 5 ratings)
In business there are demands that, if not managed well, can cause friction. This friction can be between colleagues and it can be felt by customers and clients. Consider financial services. Leaders constantly face pressures, from meeting revenue targets and consumer needs to engaging in activities like honoring individuals’ privacy rights and protecting people and the business from fraud.
Sponsored Gramercy Suite B
Nirmal Ranganathan (Rackspace Hosting), David Dobbins (Rackspace Hosting)
Average rating: ***..
(3.00, 1 rating)
We'll discuss some of the use cases where a virtual Hadoop cluster makes sense, and share some of our experiences and the decisions that drove the product design of Rackspace Cloud Big Data, an upcoming HDP-as-a-service offering from Rackspace Hosting.
Data Science Beekman Parlor - Sutton North
Robert Johnson (Interana)
Average rating: ***..
(3.30, 10 ratings)
Many of the world's largest datasets are time series. With today's technology, the number of things in the world doesn't seem that big, but the record of how those things change over time is. Unfortunately, many data tools don't natively treat time as a first-class concept. I'll be talking about a variety of ways to organize your data and architect your data systems to get the most out of your time-based data.

4:35pm

Enterprise Data Sutton Center - Sutton South
Volkmar Uhlig (Adello)
Average rating: ****.
(4.00, 5 ratings)
Machine-generated data is getting stale fast. Operational data, sensor data, and video feeds require new automated approaches to capture value. In this session we will show how to apply the lessons learned from automated trading systems and high-frequency trading to today’s Big Data problems to monetize information.
Hadoop Platform Murray Hill Suite
Philip Zeyliger (Cloudera)
Average rating: ****.
(4.50, 6 ratings)
All is quiet on the log file front, yet the system is down. What next? This talk will cover the tricks of the trade for debugging distributed systems. Motivated by experience gained diagnosing Hadoop, we’ll dig into the JVM, Linux esoterica, and outlier visualization.
Amie Elcan (CenturyLink)
Average rating: ***..
(3.00, 1 rating)
As use of the Internet evolves, the data collected about Internet traffic must evolve in parallel to ensure the performance of applications and to keep access affordable. The ability to characterize how the Internet is being used is essential to the telecom industry. Case studies using R and Python Pandas will be presented to demonstrate the power of analytics to answer strategic questions.
Data Science Beekman Parlor - Sutton North
Vaclav Petricek (eHarmony)
Average rating: ****.
(4.45, 11 ratings)
Humans have a mixed record in choosing romantic partners. Are looks or brains more important for a happy marriage? This session will show you how big data and large-scale machine learning can help us model such complex behavior and tell us which traits in a partner actually matter. Who knows, maybe Hadoop will help you find love ;-)

5:15pm

Sponsor Pavilion
Average rating: ****.
(4.00, 2 ratings)
Test session for ratings and feedback from the mobile app.

Sponsors

Sponsorship Opportunities

For exhibition and sponsorship opportunities, contact Susan Stewart at sstewart@oreilly.com

Media Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email mediapartners@oreilly.com

Press & Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

Contact Us

View a complete list of Strata + Hadoop World 2013 contacts