Strata + Hadoop World 2012 Schedule

Below are the confirmed and scheduled talks at Strata + Hadoop World 2012 (schedule subject to change).

See the list of all events happening onsite, starting on Monday, October 22.

Tuesday, 10/23/2012

9:00am

Visualization & Interface, Beekman / Sutton North (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Average rating: ****.
(4.14, 7 ratings)
Communicating Data Clearly describes how to draw clear, concise, accurate graphs that are easier to understand than many of the graphs one sees today. The tutorial emphasizes how to avoid common mistakes that produce confusing or even misleading graphs. Graphs for one, two, three, and many variables are covered as well as general principles for creating effective graphs. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Roy Hyunjin Han (CrossCompute)
Average rating: ***..
(3.62, 8 ratings)
Python is the language of choice when it comes to integrating analytical components. We will present a series of concepts and walkthroughs that illustrate how easy scientific computing is in Python, from machine learning and time series to spatial relationships and network analysis. Read more.
Hadoop: Tools & Technology, Murray Hill (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Mark Fei (Cloudera)
Average rating: ****.
(4.17, 18 ratings)
Apache Hadoop is enabling companies across many different industries that need to process and analyze large data sets. In this tutorial you will learn why and how people are using Hadoop and related technologies like Hive, Pig and HBase. Read more.
Hadoop: Tools & Technology, Gramercy Suite (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Tom Wheeler (Cloudera, Inc.)
Average rating: ****.
(4.62, 8 ratings)
This tutorial will explore the tools and techniques you need to ensure that your MapReduce applications are both correct and efficient. You'll learn how to do unit testing, integration testing and performance testing for your Hadoop jobs, as well as how to interpret diagnostic information to isolate and solve problems in your code. Read more.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Amandeep Khurana (Cloudera), Matteo Bertozzi (Cloudera)
Average rating: **...
(2.50, 10 ratings)
HBase is one of the more popular open source NoSQL databases that have cropped up over the last few years. Building applications that use HBase effectively is challenging. This tutorial is geared towards teaching the basics of building applications using HBase and covers concepts that a developer should know while using HBase as a backend store for their application. Read more.
Business & Industry Data Driven Business Day, Grand West (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
For business strategists, marketers, product managers, and entrepreneurs, Data Driven Business Day looks at how to use data to make better business decisions faster. Packed with case studies, panels, and eye-opening presentations, this fast-paced day focuses on how to solve today's thorniest business problems with Big Data. It's the missing MBA for a data-driven, always-on business world. Read more.
Data Science, Regent Parlor (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Susan E. McGregor (Columbia University), Alice Brennan (The New York World), Michael Sullivan (The New York World)
Average rating: ***..
(3.14, 7 ratings)
This tutorial will provide novice users with an overview of a range of common tools used for data cleaning and analysis - including Microsoft Excel, Google Refine, Python and R - along with their relative strengths and weaknesses. Attendees will not only learn useful new skills, but will also know what kind of expertise they need to seek out for help with more complex tasks. Read more.
Bridge to Big Data, Nassau (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Average rating: ****.
(4.33, 6 ratings)
For CIOs, IT executives, and technology professionals, Strata's Bridge to Big Data lays out the roadmap to get your organization up to speed on big data. In this all-day event, learn how to create a big data strategy, manage your first pilot project, demystify vendor solutions and understand how big data differs from BI. Read more.

12:30pm

America's Hall (NY Hilton)
Lunch sponsored by Intel (1h)

1:30pm

Visualization & Interface, Beekman / Sutton North (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Average rating: ****.
(4.33, 6 ratings)
This workshop is a jumpstart lesson on how to get from a blank page and a pile of data to a useful data visualization. We'll focus on the design process, not specific tools. Bring your sample data and paper or a laptop; leave with new visualization ideas. Read more.
Hadoop: Tools & Technology, Sutton Center / Sutton South (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Ed Kohlwey (Booz Allen Hamilton), Stephanie Beben (Booz Allen Hamilton)
Average rating: ***..
(3.43, 7 ratings)
In this tutorial, we’ll provide an introduction to an open source Map/Reduce library for R called RHadoop that makes Map/Reduce programming convenient and easy to understand for statistical modeling users. The session will cover the basics of RHadoop, common techniques and best practices, and some interactive real-world examples. Read more.
Hadoop: Case Studies Hadoop: Tools & Technology, Murray Hill (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Sewook Wee (Accenture), Ryan Tabora (Think Big Analytics), Jason Rutherglen (Datastax)
Average rating: *....
(1.80, 5 ratings)
This tutorial will help participants understand why distributed search is important and teach them how to use the landscape of tools available. Based on our hands-on experience at NetApp, we will lead a tutorial session that will teach participants how to set up and use search technologies such as Apache Solr and Lucene to enable real-time Big Data analytics with Hadoop, HBase, and other NoSQL stores. Read more.
Hadoop: Tools & Technology, Gramercy Suite (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Dean Wampler (Typesafe)
Average rating: ***..
(3.75, 4 ratings)
This hands-on tutorial teaches you how to set up and use Hive, a high-level data warehouse tool for Hadoop. Hive provides a SQL-like query language, HiveQL, that is easy to learn for people with prior SQL experience, making Hive attractive for data warehousing teams. Hive leverages the power of Hadoop for working with massive data sets without requiring expertise in MapReduce programming. Read more.
Business & Industry Data Science, Grand East (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Robert Grossman (Open Data Group), Collin Bennett (Open Data Group)
Average rating: ****.
(4.25, 4 ratings)
A successful big data analytic project is not just about selecting the right algorithm for building a predictive model, but also about how to deploy the model efficiently into operational systems, how to evaluate the effectiveness of the model, and how to continuously improve it. In this tutorial we cover best practices for each of these phases in the life cycle of a predictive model. Read more.
Hadoop: Tools & Technology, Regent Parlor (NY Hilton)
Tutorial Please note: to attend, your registration must include Tutorials.
Hari Shreedharan (Cloudera Inc.), Will McQueen (Cloudera Inc.), Arvind Prabhakar (Cloudera), Prasad Mujumdar (Cloudera Inc.), Mike Percy (Cloudera)
Average rating: ***..
(3.00, 4 ratings)
Apache Flume (incubating) is a scalable, reliable, fault-tolerant, distributed system designed to collect and transfer massive amounts of event data from disparate systems into some storage tier such as Hadoop HDFS. In this tutorial we show how to easily build a large-scale data collection and transfer system in a scalable way using Flume NG, the next generation of Flume. Read more.

5:00pm

Grand Ballroom Foyer (NY Hilton)
Average rating: *****
(5.00, 1 rating)
Join your fellow big data enthusiasts at the Strata Conference & Hadoop World Attendee Reception on Tuesday, October 23. Read more.

6:30pm

Metropolitan West (Sheraton NY)
Average rating: *****
(5.00, 2 ratings)
Part of NYC Data Week. Don't miss Startup Showcase, Strata's live demo program and competition for startups and early-stage companies. Judges Tim O'Reilly and Fred Wilson will pick winners from 10 finalist companies selected to present at the showcase. Read more.

Wednesday, 10/24/2012

8:45am

Grand Ballroom (NY Hilton)
Edd Dumbill (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
Average rating: ****.
(4.00, 4 ratings)
Opening remarks by the Strata program chairs, Edd Dumbill and Alistair Croll. Read more.

8:55am

Grand Ballroom (NY Hilton)
Mike Olson (Cloudera)
Average rating: ***..
(3.72, 18 ratings)
Society confronts enormous challenges today: How will we feed nine billion people? How can we diagnose and treat diseases better, and more cheaply? How will we produce more energy, more cleanly, than ever before? Big questions like these demand new approaches, and "Big Data" is a crucial part of the toolkit we will use over the coming years to answer them. Read more.

9:10am

Sponsored, Grand Ballroom (NY Hilton)
Ben Werther (Platfora)
Average rating: ***..
(3.20, 20 ratings)
Hadoop is scalable, inexpensive and can store near-infinite amounts of data. But driving it requires exotic skills and hours of batch processing to answer straightforward questions. Learn how everything is about to change. Read more.

9:15am

Grand Ballroom (NY Hilton)
Michael Flowers (NYC Mayor's Office of Policy and Strategic Planning)
Average rating: ****.
(4.47, 36 ratings)
New York City is a complex, thriving organism. Hear how data science has played a surprising and effective role in helping the city government provide services to over 8 million people, from preventing public safety catastrophes to improving New Yorkers' quality of life. Read more.

9:25am

Sponsored, Grand Ballroom (NY Hilton)
Annika Jimenez (Pivotal), Anthony Goldbloom (Kaggle)
Average rating: **...
(2.33, 18 ratings)
Data science is a team sport. Collaboration inside and outside your organization is the ultimate Big Data technique. Success depends on having a collaboration platform and solving the number one problem of the Big Data era: the supply and demand for data scientists. Learn how you can take action today to accelerate the success of your data science efforts. Read more.

9:35am

Grand Ballroom (NY Hilton)
Rich Hickey (Datomic)
Average rating: ***..
(3.44, 18 ratings)
While moving away from single powerful servers, distributed databases still tend to be monolithic solutions. But key-value storage, for example, is rapidly becoming a commodity service on which richer databases might be built. What are the implications? Read more.

9:45am

Sponsored, Grand Ballroom (NY Hilton)
James Markarian (Informatica)
Average rating: **...
(2.54, 13 ratings)
Data integration for Big Data projects can consume up to 80% of the development effort and yet too many developers reinvent the wheel by hand-coding custom connectors, data parsers, and data integration transformations. A metadata-driven, codeless IDE with pre-built transformations and data quality rules has proven to be up to 10X more productive than hand coding and easier to maintain. Read more.

9:50am

Grand Ballroom (NY Hilton)
Sharmila Shahani-Mulligan (ClearStory Data)
Average rating: **...
(2.79, 19 ratings)
In recent years, "Big Data" has matured from a vague description of massive corporate data to a household term that refers to not just volume but the diversity of data and velocity of change. Today, there's a wealth of data trapped in corporate data repositories, new platforms like Hadoop, a new generation of data marketplaces and volumes generated hourly on the Web. Read more.

10:00am

Grand Ballroom (NY Hilton)
Tim Estes (Digital Reasoning)
Average rating: ***..
(3.75, 20 ratings)
The onset of the Big Data phenomenon has created a unique opportunity, but the challenge ahead of us is to move beyond Big Data infrastructure to morally and practically useful applications. This requires new technologies that close the "Understanding Gap" and, by doing so, can make great strides to prevent evil, reduce suffering, and create more actualized human potential. Read more.

10:50am

Hadoop: Case Studies, Beekman / Sutton North (NY Hilton)
Siraj Khaliq (The Climate Corporation)
Average rating: ***..
(3.50, 2 ratings)
Big Data takes on the planet’s toughest challenge by analyzing weather’s complex behavior. Using hundreds of terabytes of data and trillions of simulation datapoints, The Climate Corporation models weather’s impact on crops to create customized insurance for farmers facing the financial impact of extreme weather. Read more.
Data Science Hadoop: Case Studies, Sutton Center / Sutton South (NY Hilton)
Donald Miner (ClearEdge IT Solutions)
Average rating: ****.
(4.20, 10 ratings)
The Hadoop and data science communities have matured to the point now that common design patterns across domains are beginning to emerge. Now that Hadoop is maturing and the user base is gaining momentum, experienced users can start documenting design patterns that can be shared. In this talk, we'll discuss what makes up a MapReduce design pattern and give some examples. Read more.
Visualization & Interface, Murray Hill (NY Hilton)
Jesper Andersen (Bloom Studios)
Average rating: ***..
(3.45, 11 ratings)
In this session we will discuss how subjectivity can be encoded in data, and how this data can be used to help users experience a city more gracefully. We'll create maps and visualizations that reinforce the ways users engage with cities and augment these experiences using social and crowd-sourced data sources, analytics and both artistic and literal visualization to convey this information. Read more.
Hadoop: Case Studies, Gramercy Suite (NY Hilton)
Erik Shilts (Opower)
Average rating: ***..
(3.56, 9 ratings)
How does Opower deliver insights to millions of households with big (and getting bigger) data? I discuss how to effectively use Hadoop, integrate it with R and Python, and harness an engaged workforce to solve data science and efficiency problems. Read more.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Ben Werther (Platfora), Kevin Beyer (Platfora)
Average rating: ***..
(3.33, 6 ratings)
With traditional ETL (extract-transform-load) you need to decide how you want to transform and store the data before it arrives. Hadoop allows a much more agile pipeline – store the raw data, add a little metadata, and iteratively pull from it at whatever level of detail is needed right now by the application. We'll explore this approach and show you how you can start using it today. Read more.
Hadoop & Beyond, Grand West (NY Hilton)
Michael Manoochehri (Google, Inc.), Jim Caputo (Google, Inc.)
Average rating: ***..
(3.47, 15 ratings)
Google’s Dremel is a scalable, interactive ad-hoc query system capable of running SQL-like queries over trillion-row tables in seconds. BigQuery is the externalization of this technology as a REST API and web app. This session will discuss the capabilities of Dremel and dive into the design challenges necessary to make this technology accessible and performant for developers and business users. Read more.
Sponsored, Regent Parlor (NY Hilton)
Tomer Shiran (MapR Technologies), Jack Norris (MapR Technologies)
Average rating: ***..
(3.67, 3 ratings)
Google pioneered the use of the MapReduce framework and inspired the creation of Hadoop through their 2004 white paper. To understand the future of Hadoop and the future of Big Data, it’s important to understand how Google processes and analyzes Big Data internally. Read more.
Sponsored, Nassau (NY Hilton)
PayPal utilizes Hadoop as a cost-effective data platform to handle growing data volumes. Hadoop along with other traditional data platforms serves different business needs at PayPal for customer sentiment analysis, fraud detection, market segmentation, etc. PayPal will share some early experiences with Informatica on Hadoop to move & integrate data on Hadoop & between different data platforms. Read more.

11:40am

Data Science, Beekman / Sutton North (NY Hilton)
Anne Milgram (NYU Law Center on the Administration of Criminal Law)
Average rating: ****.
(4.00, 1 rating)
Anne Milgram, Senior Fellow at the NYU Law Center on the Administration of Criminal Law. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Ilya Grigorik (Google), Brian Doll (GitHub)
Average rating: ****.
(4.80, 5 ratings)
Open-source developers all over the world contribute to millions of projects every day on GitHub: writing and reviewing code, filing bug reports and updating docs. Data from these events provides an amazing window into open source trends: project momentum, language adoption, community demographics, and more. Read more.
Visualization & Interface, Murray Hill (NY Hilton)
Kim Rees (Periscopic)
Average rating: ***..
(3.80, 20 ratings)
Data has been locked in a mindset of rows and columns. Our brains are trapped by database schemas. To get out of that predisposition and communicate visually requires new thinking. This session covers techniques for reframing our thoughts about data, how to describe data, forming a narrative, and coming up with visual solutions. Read more.
Hadoop: Case Studies, Gramercy Suite (NY Hilton)
David Bauer (Data Tactics Corporation)
Average rating: ****.
(4.25, 4 ratings)
DCGS-Army Standard Cloud Multimedia (DSC-M) is focused on the Full Motion Video aspects of our Cloudera Hadoop-based implementation for the U.S. Army. Read more.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Eric Sammer (ScalingData)
Average rating: ****.
(4.25, 8 ratings)
While many of the necessary building blocks for data processing exist within the Hadoop ecosystem, it can be a challenge to assemble them as a production ETL platform. This presentation covers one approach to data ingest, organization, format selection, process orchestration, and external system integration, based on collective experience acquired across many production Hadoop deployments. Read more.
Hadoop & Beyond, Grand West (NY Hilton)
Frank Weigel (Couchbase, Inc.)
Average rating: ****.
(4.00, 4 ratings)
OMGPOP’s Draw Something broke all records when it went viral, skyrocketing to more than 50 million downloads and billions of drawings within a few weeks of launch – with no downtime. This session highlights the application architecture and data management technology that enabled this growth, and provides a real-time data management model for developers of any interactive web application. Read more.
Sponsored, Regent Parlor (NY Hilton)
Average rating: ***..
(3.00, 6 ratings)
This presentation provides an overview of how to comprehensively address big data, including emerging strategies for information management, analytics, and high performance computing. Read more.
Sponsored, Nassau (NY Hilton)
David Jonker (SAP)
Average rating: ***..
(3.00, 1 rating)
Opposites attract and that’s the case with Hadoop and Enterprise Data Warehouses. Both have a role to play in your Big Data projects. This session explores the various approaches to marrying Hadoop to your EDW, and why you’ll want to do that in the first place. Read more.

12:20pm

America's Hall (NY Hilton)
Average rating: ****.
(4.00, 3 ratings)
Birds of a Feather (BoF) sessions are informal roundtable discussions happening during lunch on both days of the conference. You can join any BoF table or start your own with a topic of your choice. The BoF sign-up board will be near the Registration area. Read more.

1:40pm

Business & Industry, Beekman / Sutton North (NY Hilton)
Stan Humphries (Zillow)
Average rating: ****.
(4.00, 2 ratings)
Real estate used to be an industry that had large information asymmetry between professionals and consumers. Zillow has leveled the playing field through its living database on over 100 million homes. Advanced statistical modeling gives consumers even more information and tools, such as the well-known Zestimate. Using data and analytics, Zillow has changed the real estate industry forever. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Cathy O'Neil (Intent Media)
Average rating: ***..
(3.62, 8 ratings)
In this talk, techniques from mathematical financial models will be compared and contrasted with methods coming from machine learning. Specifically, we will discuss the concept of time series data, taking account of seasonality, how to avoid overfitting, continuous updating, and fitting a Bayesian prior to your data science model. We will also discuss the question of when to use what tools. Read more.
Visualization & Interface, Murray Hill (NY Hilton)
Richard Brath (Oculus), Noah Schwartz (Bloomberg Sports)
Average rating: *****
(5.00, 3 ratings)
MLB captures 10Tb of game data every year. While the data is valuable, it quickly became clear that using it effectively required different visual front-ends for fans, players, coaches and scouts. The ability to adapt and address different audiences helped the success of this project and can help other big data projects. Read more.
Hadoop: Case Studies Hadoop: Tools & Technology, Gramercy Suite (NY Hilton)
Liyin Tang (Facebook), Vinod Venkataraman (Facebook), Charles Thayer (Facebook)
Average rating: ***..
(3.54, 13 ratings)
ODS is Facebook's internal large-scale monitoring system. HBase turns out to be a good fit for its workload and solves some manageability and scalability challenges with the previous MySQL-based setup. We would like to share a series of valuable experiences learnt from building this large-scale realtime system based on HBase. Read more.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Arun Murthy (Hortonworks Inc.)
Average rating: ***..
(3.40, 5 ratings)
Apache Hadoop MapReduce has undergone a complete overhaul to emerge as Apache Hadoop YARN, a generic compute fabric to support MapReduce and other application paradigms. This really changes the game to recast Hadoop as a much more powerful data-processing system, making Hadoop very different from itself 12 months ago. Now, ever wonder what it might look like in 12 months or 24 months or longer? Read more.
Hadoop & Beyond, Grand West (NY Hilton)
Rich Hickey (Datomic)
Average rating: ***..
(3.33, 3 ratings)
The big data movement has highlighted the value of historical information, and storage is readily available, so why are you still using an update-in-place database? In this talk we'll deconstruct the traditional monolithic database with an eye towards leveraging the scaling properties of distributed architectures, while meeting the business needs for complete historical information. Read more.
Sponsored, Regent Parlor (NY Hilton)
Shawn Bice (Microsoft)
Average rating: *****
(5.00, 4 ratings)
Big Data is attracting strong interest from technologists and business users alike. Yet few organizations can actually reap the benefits of Big Data today because the barriers to entry are still too high. Existing tools are complex and require deep expertise in Hadoop and Data Analysis that are both in short supply. Read more.
Sponsored, Nassau (NY Hilton)
Richard Daley (Pentaho Corporation)
Maximize the value of data stored in Hadoop via operational and ad-hoc reporting, highly interactive analysis, advanced visualizations and dashboards. Read more.

2:30pm

Business & Industry, Beekman / Sutton North (NY Hilton)
Oscar Padilla (Entravision Communications), Franklin Rios (Luminar), Vineet Tyagi (Impetus Technologies)
Average rating: ***..
(3.67, 3 ratings)
How a traditional Spanish-language media company made the strategic decision to build a robust analytics intelligence division to more effectively target the Hispanic market. Attendees will walk away with insights on how this traditional media company implemented big data and MapReduce operations from the ground up. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Chang She (Cloudera)
Average rating: ***..
(3.50, 2 ratings)
Proper tooling and good habits that maximize reproducibility are essential to being productive as a data scientist. From management of raw data to model version control, the entire workflow must be carefully controlled from end-to-end to produce quality research that scales with the quantity and complexity of data being analyzed. Read more.
Visualization & Interface, Murray Hill (NY Hilton)
Juhan Sonin (Involution Studios)
Average rating: ****.
(4.25, 4 ratings)
hGraph is a compelling, standardized visual representation of a patient's health status for clinicians and patients. Designed to increase awareness of the individual's factors that can affect one's health and lead to improved outcomes, hGraph aggregates all of an individual's health metrics in one location, in a single picture. Read more.
Hadoop: Case Studies, Gramercy Suite (NY Hilton)
Michael Radwin (Intuit)
Average rating: **...
(2.83, 6 ratings)
Imagine the social graph where personal relationships are replaced by commercial relationships based on real financial data. Imagine the possibilities for small businesses to grow, connect, transact and prosper. Read more.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Matt Winkler (Microsoft)
Average rating: ****.
(4.29, 7 ratings)
In this session we’ll discuss our experience extending Hadoop development to new platforms and languages, and key aspects of using non-JVM languages in the Hadoop environment. Read more.
Hadoop & Beyond, Grand West (NY Hilton)
Mike Driscoll (Metamarkets), Eric Tschetter (Metamarkets)
Average rating: ****.
(4.00, 9 ratings)
Hadoop is considered THE technology for addressing Big Data. While it shines as a processing platform, it does not respond anywhere close to "human time". In developing our solution, we needed the ability to query across billions of rows in seconds. Hear how and why we developed Druid, our distributed, in-memory OLAP data store after investigating various commercial and open source alternatives. Read more.
Sponsored, Regent Parlor (NY Hilton)
Greg Khairallah (Intel), Vin Sharma (Intel)
Average rating: **...
(2.50, 2 ratings)
Over the next decade, organizations will need to absorb, analyze, and act upon 50 times more data than they do today. To do this, they will need a scalable infrastructure that can support data-driven discovery and decision-making in real-time. Read more.
Sponsored, Nassau (NY Hilton)
Gary Dusbabek (Rackspace)
Average rating: ***..
(3.50, 4 ratings)
Monitoring thousands of servers generates a lot of data. Many organizations trying to harness the power of big data struggle with the same types of challenges as Rackspace's Cloud Monitoring team. Read more.

4:10pm

Business & Industry, Beekman / Sutton North (NY Hilton)
Kevin Foster (IBM)
Average rating: *....
(1.67, 3 ratings)
In this session, Kevin Foster, IBM Big Data Solution Architect, will provide an overview of big data analytic accelerators and how they are being used by organizations to speed up deployments and solve big data problems sooner. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Michael Stringer (Datascope Analytics)
Average rating: ****.
(4.50, 4 ratings)
An effective data science team looks a lot like an effective design team: brainstorming creative ideas, making prototypes, receiving feedback, telling stories, and deeply understanding the needs of others. Read more.
Visualization & Interface, Murray Hill (NY Hilton)
Lee Feinberg (DecisionViz)
Average rating: *....
(1.50, 4 ratings)
Attendees will learn, through practical examples, how to build a collaborative environment that accelerates the value of big data, with the goal of “making data part of every conversation.” Read more.
Hadoop: Case Studies, Gramercy Suite (NY Hilton)
Ryan Brush (Cerner Corporation)
Average rating: ****.
(4.00, 8 ratings)
A look at using Hadoop, HBase and other technologies to bring together and process health data from many sources in real time. This includes techniques for dealing with data that's incomplete or out-of-order when it arrives, merging bulk and real-time data sets, and creating search indexes and data models to enable better health care. Read more.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Sanjay Radia (Hortonworks), Todd Lipcon (Cloudera, Inc.)
Average rating: ****.
(4.57, 7 ratings)
Hadoop 2.0 offers significant HDFS improvements: new append-pipeline, federation, wire compatibility, NameNode HA, performance improvements, etc. We describe the new features and their benefits and our plans for HDFS over the next year, which include Snapshots, Disaster recovery, RAID, performance improvements, etc. We conclude with some of the misconceptions and myths about HDFS. Read more.
Hadoop & Beyond, Grand West (NY Hilton)
David Blair (Akamai Technologies)
Average rating: ****.
(4.00, 2 ratings)
Trecul is a dataflow system that powers Akamai's Online Advertising business, processing billions of events hourly. Trecul is built on top of HDFS & Hadoop Pipes to achieve fantastic runtime performance. We'll talk about its use of LLVM-based JIT compilation so everything runs as native C++ code, no Java and no runtime interpreter. Akamai has open-sourced Trecul and it is available on GitHub. Read more.
Sponsored, Regent Parlor (NY Hilton)
Roy Pea (Stanford University), Stephen Coller (Bill and Melinda Gates Foundation), H. Taylor Martin (Utah State University), Ken Koedinger (Carnegie Mellon)
Kicking off with an Ignite-style presentation on the growing importance of our topic, this panel will feature multiple perspectives on what K-12 education can learn from Big Data efforts underway in other industries. Read more.
Sponsored, Nassau (NY Hilton)
Patrick Shumate (Comcast Cable), Raanan Dagan (Splunk)
Average rating: ***..
(3.50, 4 ratings)
How do you keep up with the velocity and variety of data streaming in from the operational systems that power your business? What about getting analytics on your data even before you persist and replicate it? Read more.

5:00pm

Business & Industry, Beekman / Sutton North (NY Hilton)
Raymie Stata (Altiscale)
Average rating: **...
(2.50, 2 ratings)
Success in Big Data requires both finding new signals buried in your data sources ("explore"), and using those signals to drive business value ("exploit"). Based on his background in Web Search and Internet Advertising, the speaker will describe these two aspects of Big Data and some of the success factors for each. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Amy O'Connor (Nokia), Danielle Dean (Nokia)
Average rating: ***..
(3.71, 7 ratings)
Amy O'Connor, Sr. Director of Nokia Analytics, together with her daughter and Nokia Intern, Danielle Dean, will share what makes a great data scientist, their different paths to acquiring the diverse skill sets that are needed and finally Amy will discuss how to spot, attract and train emerging data scientists in what is quickly becoming a heated market. Read more.
Visualization & Interface, Murray Hill (NY Hilton)
Nigel Holmes (Explanation Graphics), Jon Peltier (Peltier Technical Services), Naomi Robbins (NBR)
Average rating: *....
(1.83, 6 ratings)
"Data visualization" means different things to different people. Some say that to be effective, visualizations need to be clear, concise and accurate. Others say that to be effective, visualizations need to be eye-catching, engaging, and innovative. Naomi Robbins will moderate a panel composed of Jon Peltier and Nigel Holmes. Read more.
Hadoop: Case Studies, Gramercy Suite (NY Hilton)
Charles Schmitt (Renaissance Computing Institute)
Your DNA, written out as a string of G, A, T, and C, is about three and a half gigabytes long. That string is about 99.9% identical to an arbitrary Reference Genome. Practically all of those differences are harmless, but a tiny fraction can cause disease, contribute to disease, or just change how your body reacts to drugs. We're using Hadoop to find the variants that actually matter. Read more.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Aaron Myers (Cloudera, Inc.), Todd Lipcon (Cloudera, Inc.)
Average rating: ***..
(3.67, 3 ratings)
The initial implementation of a highly-available HDFS NameNode successfully removed all single points of failure from HDFS. This talk discusses further improvements to this work, including automatic failure detection and failover initiation, as well as removing the dependency on an HA NFS filer. Read more.
Nilesh Jain (Intel Corp)
Average rating: ****.
(4.50, 2 ratings)
The exponential growth of graph-based data analysis is fueling the need for machine learning. Recently, frameworks have emerged to perform these computations at large scale. But feeding data to these frameworks is a challenge in itself. This talk introduces the GraphBuilder library for Hadoop, which makes the job easier for programmers. Several case studies showcase the utility of the library. Read more.
Sponsored, Regent Parlor (NY Hilton)
Rob Metcalf (Digital Reasoning), Laks Srinivasan (Opera Solutions)
This presentation will provide a detailed understanding of the latest techniques in entity resolution and simplified training of machine learning models and the direct impact on the quality of a comprehensive predictive analytics solution. Specific use cases in the financial services and intelligence communities will be featured. Read more.
Sponsored, Nassau (NY Hilton)
Jacob Rapp (Cisco Systems), Eric Sammer (ScalingData)
Average rating: *****
(5.00, 2 ratings)
In this joint session, experts from Cisco and Cloudera reveal the fundamental design considerations of Hadoop in the Enterprise Data Center. Drawing from lessons learned in the real world, they'll share best practices from deployments of Cloudera's Hadoop distribution alongside Cisco's networking components. Read more.

5:40pm

Grand Ballroom Foyer (NY Hilton)
Average rating: **...
(2.00, 1 rating)
Join your fellow big data enthusiasts at the Strata Conference & Hadoop World Attendee Reception on Wednesday, October 24. *Sponsored by Microsoft* Read more.

6:40pm

Liberty Theatre, 234 W 42nd Street
TBC

8:00pm

Liberty Theatre, 234 W 42nd Street
Average rating: *****
(5.00, 1 rating)
The must-attend data party of the year, Data After Dark is hosted by O'Reilly Strata at Liberty Theatre off Broadway, on Wednesday evening, October 24. Read more.

Thursday, 10/25/2012

8:45am

Grand Ballroom (NY Hilton)
Edd Dumbill (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
Opening remarks by the Strata program chairs, Edd Dumbill and Alistair Croll. Read more.

8:50am

Grand Ballroom (NY Hilton)
Rick Smolan (Against All Odds Productions)
Average rating: ****.
(4.75, 8 ratings)
Over the past two decades, Rick Smolan, creator of the best selling "Day in the Life" books, has produced a series of ambitious global projects in collaboration with hundreds of the world’s leading photographers, writers, and graphic designers. This year Smolan invited more than 100 journalists around the globe to explore the world of Big Data. Read more.

9:00am

Grand Ballroom (NY Hilton)
Edd Dumbill (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
Average rating: **...
(2.00, 1 rating)
We’re excited to launch the Strata Data Innovation Awards to recognize disruptive, innovative technologies in big data and data science, highlight the increasing importance of data science for companies, and showcase the highlights of the growing data community. Read more.

9:10am

Sponsored, Grand Ballroom (NY Hilton)
John Schroeder (MapR Technologies)
Average rating: ***..
(3.50, 2 ratings)
This session will provide insights into how the combination of scale, efficiency, and analytic flexibility creates the power to expand the applications for Hadoop to transform companies as well as entire industries. Read more.

9:20am

Grand Ballroom (NY Hilton)
Doug Cutting (Cloudera)
Average rating: ****.
(4.17, 6 ratings)
Hadoop started as an offline, batch-processing system. It made it practical to store and process much larger datasets than before. Subsequently, more interactive, online systems emerged, integrating with Hadoop. Read more.

9:30am

Sponsored, Grand Ballroom (NY Hilton)
Paul Kent (SAS)
Average rating: ****.
(4.25, 4 ratings)
In this rapid-fire keynote, we’ll introduce how virtually every new technology trend is inextricably linked – or should be, to attain maximum leverage. We’ll discuss how you can use technologies such as cloud and mobility to spread the value of analytics pervasively across your virtual organization, and how that positively impacts your employees, customers and partners. Read more.

9:35am

Grand Ballroom (NY Hilton)
Cathy O'Neil (Intent Media), Julie Steele (O'Reilly Media, Inc.)
Average rating: ***..
(3.50, 8 ratings)
A fireside chat with Cathy O'Neil about why universities can't make data scientists. Lots of companies want to hire data scientists, and there aren't enough to go around. Some universities are adding data science graduate departments, but they're facing an uphill battle, thanks to a lack of good data for academics, political infighting, and scalability issues. Read more.

9:45am

Sponsored, Grand Ballroom (NY Hilton)
Irfan Khan (SAP)
Average rating: ***..
(3.33, 6 ratings)
You need more than a database 'hammer' for today's Big Data projects. Organizations need a 'data platform' providing integrated tools to capture, store, process and present data. Without it, companies can achieve volume, velocity, or variety, but not all three. Join us to learn the extreme capabilities needed to distill new business signals from big data. Read more.

9:50am

Grand Ballroom (NY Hilton)
Joe Hellerstein (Trifacta and UC Berkeley)
The story of Big Data technology has centered on engines, algorithms, and statistical methods for data analysis. Less has been said, and too little has been done, regarding technology to improve the lives of data analysts. Read more.

10:00am

Grand Ballroom (NY Hilton)
Samantha Ravich (National Commission for the Review of R&D Programs in the Intelligence Community)
Average rating: ****.
(4.00, 7 ratings)
Samantha Ravich, former National Security Advisor to Vice President Richard Cheney, will discuss the challenges that face strategic decision makers from the wealth of data now provided by advances in technology. Read more.

10:50am

Business & Industry, Beekman / Sutton North (NY Hilton)
Robert Kirkpatrick (UN Global Pulse)
Average rating: ****.
(4.80, 5 ratings)
What can Big Data analysis tell us about human well-being? About how people cope with unemployment, rising food prices, or about people’s perceptions of HIV and other deadly diseases? A lot. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Wes McKinney (Cloudera)
Average rating: **...
(2.40, 5 ratings)
Data manipulation, cleaning, integration, and preparation can be one of the most time consuming parts of the data science process. In this talk I will discuss key points in the design and implementation of data structures and algorithms for structured data manipulation. It is an accumulation of lessons learned and experience building pandas, a widely-used Python data analysis toolkit. Read more.
Visualization & Interface, Murray Hill (NY Hilton)
Romy Misra (Visual.ly)
Average rating: **...
(2.60, 5 ratings)
How do you build technology to empower designers to create data visualizations? This talk explores principles and technologies that suggest a few ways to do so. Read more.
Hadoop: Case Studies, Gramercy Suite (NY Hilton)
Steve Yun (Allstate), Joseph Rickert (Revolution Analytics)
Average rating: ****.
(4.50, 2 ratings)
Building analytical models is a process of trial and error. Often it makes sense to sample down a data set so that numerous methods and new variables can be tried quickly. Consider moving to the entire data set with Hadoop only after the lessons gleaned from the failures have been incorporated into a few candidate models. Read more.
Data Science Hadoop: Tools & Technology, Grand East (NY Hilton)
Aaron Kimball (Magnify Consulting), Kiyan Ahmadizadeh (WibiData, Inc.)
Average rating: ****.
(4.33, 3 ratings)
Performing investigative analysis on data stored in HBase is challenging. Most tools operate on files stored in HDFS, and interact poorly with HBase's data model. This talk will describe characteristics of data in HBase and exploratory analysis patterns. We will describe best practices for modeling this data efficiently and survey tools and techniques appropriate for data science teams. Read more.
Paul Kent (SAS)
Average rating: ****.
(4.33, 3 ratings)
To unlock the value of Big Data, analytics must be applied. Some enterprises hire platoons of data analysts but many others can't afford to bring on such skilled and expensive resources. How do those businesses uncover opportunity and insight within Big Data assets? They use analytic tools that offload some data discovery to business professionals or deploy intelligent analytic applications. Read more.
Sponsored, Regent Parlor (NY Hilton)
Mike Maxey (Greenplum)
Average rating: **...
(2.00, 1 rating)
Join us for a live demonstration of how you can leverage a data science platform, an open-source model, internal and external data, analytics tools, and visualization using Hadoop. See how unprecedented access to data scientists can deliver entirely new levels of insight to push the boundaries of what’s possible. Find out what you can do NOW to move your data science efforts forward. Read more.
Sponsored, Nassau (NY Hilton)
Peter Schlampp (Platfora)
Richard Just, Big Data Program Manager at Capital One Labs, will share his experience using Hadoop and Platfora software to analyze several aspects of their business, including the adoption of their mobile application. The final solution provided interactive, self-service, web-based BI access to the data. Read more.

11:40am

Business & Industry, Beekman / Sutton North (NY Hilton)
Q Ethan McCallum (@qethanm), Brett Goldstein (University of Chicago)
Average rating: ****.
(4.67, 3 ratings)
We often hear of private-sector companies' use of sophisticated analytics in search of profit. What about the civic sector? How are local governments using their data to improve city services? This talk will explore how the Chicago Mayor's Office teamed up with civic-minded data scientists to pursue data mining solutions for some of the city’s experimental projects. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Ted Dunning (MapR)
Average rating: ***..
(3.67, 6 ratings)
This talk will describe how real-time learning can be used for advanced A/B testing as well as a variety of advertising and document targeting problems. The crux of these applications is the Bayesian Bandit algorithm. This algorithm is simple but provides state-of-the-art performance. This talk will be intuitive and practical, but not simple-minded. All code examples are available on GitHub. Read more.
Visualization & Interface, Murray Hill (NY Hilton)
Joe Lamantia (Oracle Endeca)
Average rating: ***..
(3.00, 7 ratings)
This session presents a simple analytical and generative toolkit for interface design. It provides designers with an effective starting point for creating satisfying and relevant user experiences for Big Data and discovery interfaces. The toolkit helps designers understand and describe users' activities and needs, and then define and design the interactions and interfaces necessary. Read more.
Hadoop: Case Studies, Gramercy Suite (NY Hilton)
Sam Shah (LinkedIn), Joseph Adler (Interana, Inc.)
Average rating: *****
(5.00, 2 ratings)
Many companies use Hadoop for traditional data warehousing applications including data analysis, business reporting, and data storage. But you can use Hadoop to do much more. In this talk, we'll describe how LinkedIn uses Hadoop to create new content, develop recommendations, and send messages to users. Read more.
Hadoop: Tools & Technology, Grand East (NY Hilton)
Jonathan Hsieh (Cloudera, Inc)
Average rating: ****.
(4.50, 2 ratings)
As Apache HBase matures, the community has augmented it with new features that are considered hard requirements for many enterprises. We will discuss how the upcoming HBase 0.96 release addresses many of these shortcomings by introducing new features that will help the administrator minimize downtime, monitor performance, control access to the system, and geo-replicate data across data centers. Read more.
Hadoop & Beyond, Grand West (NY Hilton)
Kurt Brown (Netflix)
Average rating: ****.
(4.29, 7 ratings)
Our Data Science tech stack has shifted from best-of-breed, "classic" business intelligence technologies to a hybrid environment, fully leveraging Hadoop and other Big Data solutions. Our philosophy has also evolved, now distilled in thinking and practice into "data science as a service". Why did we do it? What does it look like? What are the benefits? Come find out. Read more.
Sponsored, Regent Parlor (NY Hilton)
Paul Groom (Kognitio)
Average rating: *....
(1.00, 1 rating)
Business users' attitude to data is changing rapidly – remember when building an EDW was all consuming? Now Big Data is edging the EDW to the side or likely into obscurity. Is this good or bad? How do you bring the values and software investment surrounding the EDW to the wild west of Big Data? Read more.
Sponsored, Nassau (NY Hilton)
This session explores the benefits and implications of virtualizing Hadoop and highlights several VMware initiatives aimed at bridging Hadoop and virtualization. Read more.

12:20pm

America's Hall (NY Hilton)
Birds of a Feather (BoF) sessions are informal roundtable discussions happening during lunch on both days of the conference. You can join any BoF table or start your own with a topic of your choice. The BoF sign-up board will be near the Registration area. Read more.

1:40pm

Data Science Hadoop: Tools & Technology, Beekman / Sutton North (NY Hilton)
Justin Erickson (Cloudera), Marcel Kornacker (Cloudera, Inc.)
Average rating: ****.
(4.00, 4 ratings)
This talk will cover what tools and techniques work and don’t work well for data scientists working on Hadoop today and how Cloudera Impala increases the productivity of data science and analysis on Hadoop. Cloudera Impala builds upon experiences and leading edge technology from big data systems at Facebook, Google, and Yahoo. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Justin Moore (Facebook)
Average rating: ***..
(3.43, 7 ratings)
Nearly a billion people actively create and modify nodes and their structured associations in the Facebook object graph. In this talk, Justin Moore describes how a small team within Facebook uses a combination of product, machine learning, and crowdsourcing to maintain and gain insight into this dataset. Read more.
Sponsored, Regent Parlor (NY Hilton)
Val Bercovici (NetApp)
Average rating: *****
(5.00, 1 rating)
Hadoop continues to climb the IT hype cycle. Along the way, plenty of truth, myth and folklore has been created around Hadoop's business capabilities and technical infrastructure requirements. Come hear NetApp’s real-world discoveries about Hadoop and find out what myths need retiring, as well as which truths need uncovering. Read more.
Sponsored, Nassau (NY Hilton)
George Mathew (Alteryx, Inc.)
The convergence of Analytics and the Cloud creates an interesting opportunity to solve many Big Data challenges that were previously untenable. Alteryx has historically served retailers and consumer brands on optimizing merchandising and store operations decisions with its Strategic Analytics product. Read more.
Business & Industry, Gramercy East (NY Hilton)
Susan E. McGregor (Columbia University), Kathleen Duff
Average rating: *****
(5.00, 2 ratings)
Does self-destructing data protect individuals' right to privacy and offer journalists an essential tool to protect their sources? Or would such a technology be a fundamental threat to effective law enforcement? We will describe the basic design of such a self-destructing data technology and discuss its disparate implications for individuals and government entities. Read more.
Hadoop & Beyond, Gramercy West (NY Hilton)
Kenneth Duda (Arista Networks), Amr Awadallah (Cloudera, Inc.)
Explore the network capabilities and architecture necessary to build multi-petabyte clusters. Compare and contrast different networking architectures for Big Data. Use real-world case studies from many of the largest HDFS deployments. Explain how topology-aware file systems interact with the network substrate. Discuss differences in architecture based on workload profile and data set size. Read more.
Visualization & Interface, Murray East (NY Hilton)
Bitsy Bentley (GfK Custom Research)
Average rating: ****.
(4.60, 5 ratings)
An increasing number of organizations are embracing data to drive intelligent decisions. For many industries, this is a monumental shift in method and culture. Data communication strategies come in many flavors, from static metric reports to immersive data experiences. In this session I present a user-centered framework for designing or evaluating data delivery methods. Read more.
Data Science, Murray West (NY Hilton)
Blake Shaw (Foursquare)
By applying machine learning algorithms to large aggregations of spatiotemporal data we can better understand how people interact with cities and build novel tools to help people navigate the real world. Read more.

2:30pm

Hadoop: Tools & Technology, Beekman / Sutton North (NY Hilton)
Michael Segel (Segel & Associates)
Average rating: ***..
(3.00, 3 ratings)
This presentation covers how cluster design impacts performance. It will cover several different design options and the trade-offs in terms of performance and cost. The talk will also cover some of the tuning options based on the underlying hardware considerations. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Claudia Perlich (Dstillery)
Average rating: ****.
(4.25, 4 ratings)
Building a reliable data-driven solution to a complex business problem is like designing a pocket watch from scratch. At the heart of successful analytics is the art of decomposing the looming big objective into smaller components, each of which may have its own data feed, modeling technique and runtime constraint. We showcase this process using the example of M6D’s online display advertising. Read more.
Sponsored, Regent Parlor (NY Hilton)
Rohit Valia (IBM)
New techniques like Hadoop are leading the way to provide a scalable and cost-effective solution. This session reviews the technical requirements for a low-latency, multi-tenant 'big-data' cluster - one where different lines of business and multiple applications can be run with assured SLAs, resulting in higher ROI for these clusters. Read more.
Sponsored, Nassau (NY Hilton)
Chris Selland (HP Vertica), Jerome Levadoux (Autonomy)
Big data is everywhere, and it is increasingly complex and growing quickly, rendering manual and legacy approaches obsolete. Organizations can only realize the business value of big data with a meaning-based platform technology that automatically understands all data, structured and unstructured, in real time. Join this session to learn more about Big Data and the technologies around it. Read more.
Business & Industry, Gramercy East (NY Hilton)
Jim Adler (Metanautix)
Average rating: ****.
(4.88, 8 ratings)
Since the first human scrawled an image on a cave wall, the brain has been processing petabytes of data. Today, we're passing through an historical threshold where big data is leaching out of our braincases into the disembodied cloud. For the first time in human existence, we can "think" outside of our brains. What does this mean for privacy, morality, ethics, and the law? Read more.
Hadoop & Beyond, Gramercy West (NY Hilton)
Matt Wood (Amazon Web Services)
Average rating: ****.
(4.00, 3 ratings)
In this talk we will explore how businesses are marrying human judgment with large scale processing, improving the accuracy of Big Data analytics without sacrificing efficiency or scalability. Real-world examples will be discussed in which Hadoop and crowdsourcing are combined through the Amazon Web Services technologies Elastic MapReduce and Mechanical Turk. Read more.
Visualization & Interface, Murray East (NY Hilton)
Hjalmar Gislason (DataMarket)
Average rating: ***..
(3.50, 2 ratings)
You want to publish your data for clients, developers or the general public to use and enjoy. But which file formats to use? Which standards? How to provide an API? Should you visualize the data? And if so, how? DataMarket has been on the receiving end of data from many of the world's key data providers and is now helping leading information companies publish theirs. Here we share our findings. Read more.
Hadoop: Case Studies, Murray West (NY Hilton)
Average rating: *****
(5.00, 2 ratings)
Evaluating an experiment amidst the shifting landscape of continuous deployment is a difficult task as traditional methods of monitoring operational metrics don’t provide enough information to make product-level decisions. This talk will focus on the framework that we have built to solve this problem - from data logging to the final analysis that drives decision making and everything in between. Read more.

4:10pm

Hadoop: Tools & Technology, Beekman / Sutton North (NY Hilton)
Ron Bodkin (Think Big Analytics)
Average rating: ****.
(4.50, 2 ratings)
There has been a lot of excitement lately about streaming approaches to handling Big Data such as Storm, S4, SQLStream, and InfoStreams. But many use cases can be better handled by low latency access with NoSQL databases and search indexing backed by scoring with batch analytics in Hadoop. We compare such integrated Big Data with streaming systems and look to the future. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Roger Barga (Microsoft)
Average rating: *****
(5.00, 1 rating)
How do you build and deploy predictive analytics into ongoing business processes so results can be used in real-time to improve operations? This is a common request, in applications ranging from machine-to-machine to oil & gas and utilities. Learn how to leverage all your data assets – including sensor data – to build and operationalize predictive models that improve business operations. Read more.
Sponsored, Regent Parlor (NY Hilton)
Russ Kennedy (Cleversafe)
Average rating: ***..
(3.00, 2 ratings)
This session will delve into the MapReduce computation paradigm, introduced by Google and widely adopted via the open-source Hadoop platform, combined with commodity hardware to execute computation at the storage node where data exists. Read more.
Sponsored, Nassau (NY Hilton)
Yekesa Kosuru (Nokia), Jim Tommaney (InfiniDB)
Average rating: *****
(5.00, 1 rating)
Nokia’s Big Data analytics service is a strategic multi-tenant, multi-petabyte platform that executes 10,000 jobs each day. It is made up of technologies that provide location content processing, ETL, ad-hoc SQL, dashboards and advanced analytics, including Calpont InfiniDB for SQL, Scribe, REST, Hadoop, and R. This talk discusses the platform, motivations behind design choices, and challenges. Read more.
Business & Industry, Gramercy East (NY Hilton)
Adrian Woodhead (Expedia)
Being a data-driven organization is core to developing and growing a successful Internet company today. This session will delve into the data ownership implications and considerations product teams need to take into account as they build products and services aimed at growing their user base and scaling their companies’ business. Read more.
Hadoop: Tools & Technology, Gramercy West (NY Hilton)
Josh Patterson (Cloudera), Michael Katzenellenbogen (Cloudera)
Average rating: ****.
(4.00, 2 ratings)
In this session, we will introduce “Knitting Boar”, an open-source Java library for performing distributed online learning on a Hadoop cluster under YARN. We will give an overview of how Knitting Boar works and examine the lessons learned from YARN application construction. Read more.
Visualization & Interface, Murray East (NY Hilton)
Lynn Cherny (Ghostweather Research & Design, LLC)
Average rating: ***..
(3.50, 6 ratings)
As data scientists, we encounter large networks all the time. Recommendations, social ties, transactions, and other types of data are naturally represented as networks. To understand these networks, metrics help, but visualization is crucial. This talk will focus on tools, techniques, and frameworks to visualize networks cleanly, avoiding or at least minimizing “hairballs”. Read more.
Hadoop: Case Studies, Murray West (NY Hilton)
Bala Venkatrao (Cloudera), Erich Hochmuth (Monsanto), Aparna Ramani (Cloudera), Mark Seidenstricker (Monsanto)
Average rating: *....
(1.00, 1 rating)
Managing Hadoop clusters to meet business needs can be challenging. Learn how Monsanto has effectively tamed the elephant using Cloudera Manager. Read more.

5:00pm

Hadoop & Beyond, Beekman / Sutton North (NY Hilton)
Gabriel Eisbruch (MercadoLibre.com), Luis Darío Simonassi (MercadoLibre.com), Jonathan Leibiusky (MercadoLibre.com)
Average rating: ***..
(3.33, 3 ratings)
The quantity of digital information collected and processed every day is growing at an exponential rate. To make sense of this mountain of data we can no longer afford the delays of batch processing systems. In this talk we'll introduce Storm, a new, real-time analytic framework, and show how to use it to massively parallelize information analysis, to get instant results from your data. Read more.
Data Science, Sutton Center / Sutton South (NY Hilton)
Stefan Karpinski (The Julia Language), Jeff Bezanson (The Julia Language)
Average rating: ****.
(4.00, 1 rating)
Julia is a high-level, high-performance dynamic language for efficient, large-scale scientific and technical computing, which provides simple, flexible primitives for distributed computing, out of the box. These primitives allow various approaches to distributed computation to be implemented succinctly and easily, with high performance, entirely in Julia. Read more.
Sponsored, Regent Parlor (NY Hilton)
Thomas Strachan (GoodData)
Details to come... Read more.
Sponsored, Nassau (NY Hilton)
Steven Hillion (Alpine Data Labs)
It's not easy doing predictive analytics on Hadoop, with few tools that make it easier or more scalable than writing code from scratch. Join us to discuss a new paradigm that addresses the need for a scalable, powerful solution – one that is purpose-built for Big Data yet is easy to use – illustrated by a demonstration of predictive analytics run on the largest public Hadoop cluster in the world. Read more.
Business & Industry, Gramercy East (NY Hilton)
Jonathan Alexander (Vocalocity, Inc.)
Average rating: ***..
(3.00, 1 rating)
Jonathan Alexander, VP Engineering at Vocalocity and the author of Codermetrics (O’Reilly 2011) and Moneyball for Software Engineering (O’Reilly Radar 2011/2012) presents new ideas on how to gather data and use analytics to create more effective software development teams. Read more.
Hadoop & Beyond Hadoop: Tools & Technology, Gramercy West (NY Hilton)
Avi Bryant (Stripe)
Average rating: *****
(5.00, 3 ratings)
Start on low heat with a base of Hadoop; map, then reduce. Flavor, to taste, with Scala's concise, functional syntax and collections library. Simmer with some Pig bones: a tuple model and high-level join and aggregation operators. Mix in Cascading to hold everything together and boil until it's very, very hot, and you get Scalding, a new API for MapReduce out of Twitter. Read more.
Visualization & Interface, Murray East (NY Hilton)
Kevin Lynagh (Keming Labs), Kim Rees (Periscopic), Hadley Wickham (Rice University / RStudio), David Nolen (ShiftSpace)
Average rating: ****.
(4.00, 2 ratings)
Advances in browser and mobile technologies have made visualizing and interacting with data on the web a viable alternative to traditional tools used to visually explore data. Panelists will discuss the current state of web data visualization, as well as novel approaches made possible by recent advances. Read more.
Hadoop: Tools & Technology, Murray West (NY Hilton)
Thejas Madhavan Nair (Hortonworks Inc), Jianyong Dai (Hortonworks)
Average rating: **...
(2.00, 1 rating)
Apache Pig makes Apache Hadoop easier to use thanks to its high-level data flow language, Pig Latin. In this talk, we will discuss common data analysis tasks, the choices one can make while writing a query, and the impact of each on performance. The core principles behind the optimization recommendations shared during this presentation are applicable to all MapReduce applications. Read more.

Sponsors

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities, contact Susan Stewart at sstewart@oreilly.com.

Media Partner Opportunities

For information on trade opportunities, contact Kathy Yu at mediapartners@oreilly.com.

Press and Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

Contact Us

View a complete list of Strata contacts.