Strata + Hadoop World 2012 Schedule

Below are the confirmed and scheduled talks at Strata + Hadoop World 2012 (schedule subject to change).

See the list of all events happening onsite, starting on Monday, October 22.

Beekman / Sutton North (NY Hilton)
10:50am Beyond Targeted Ads: Big Data for a Better World Robert Kirkpatrick (UN Global Pulse)
11:40am Text-mining Your City Q Ethan McCallum (@qethanm), Brett Goldstein (University of Chicago)
1:40pm Data Science on Hadoop: How Cloudera Impala Unlocks New Productivity and Insights Justin Erickson (Cloudera), Marcel Kornacker (Cloudera, Inc.)
2:30pm Is Your Cluster a Leaning Tower of Pisa? Michael Segel (Segel & Associates.)
4:10pm Real-time Big Data Without Streaming Ron Bodkin (Think Big Analytics)
5:00pm Realtime Processing with Storm Gabriel Eisbruch (MercadoLibre.com), Luis Darío Simonassi (MercadoLibre.com), Jonathan Leibiusky (MercadoLibre.com)
Murray Hill (NY Hilton)
10:50am Storytelling with Data Romy Misra (Visual.ly)
Sutton Center / Sutton South (NY Hilton)
11:40am Real-time Learning with Bayesian Bandits Ted Dunning (MapR)
1:40pm Taming the Object Graph Justin Moore (Facebook)
2:30pm The Art of Analytical Decomposition Claudia Perlich (Dstillery)
5:00pm Simple, Flexible Distributed Computing in Julia Stefan Karpinski (The Julia Language), Jeff Bezanson (The Julia Language)
Gramercy Suite (NY Hilton)
10:50am Start Small Before Going Big Steve Yun (Allstate), Joseph Rickert (Revolution Analytics)
11:40am How to Win Friends and Influence People (Using Hadoop) Sam Shah (LinkedIn), Joseph Adler (Interana, Inc.)
Grand East (NY Hilton)
10:50am Performing Data Science with HBase Aaron Kimball (Magnify Consulting), Kiyan Ahmadizadeh (WibiData, Inc.)
11:40am Upcoming Enterprise features in Apache HBase 0.96 Jonathan Hsieh (Cloudera, Inc)
Grand West (NY Hilton)
10:50am Hadoop Enables Business Analytics Paul Kent (SAS)
11:40am Netflix's Evolving Data Science Architecture Kurt Brown (Netflix)
Regent Parlor (NY Hilton)
10:50am Demonstrating The Future of Data Science Mike Maxey (Greenplum)
11:40am The Death of the Enterprise Data Warehouse Paul Groom (Kognitio)
5:00pm BizData Monetization: Turn Your Data into Dollars Thomas Strachan (GoodData)
Murray East (NY Hilton)
1:40pm Designing for Data-driven Organizations Bitsy Bentley (GfK Custom Research)
2:30pm Best Practices for Publishing Data Hjalmar Gislason (DataMarket)
4:10pm Visualizing Networks Lynn Cherny (Ghostweather Research & Design, LLC)
5:00pm Web Data Visualization: What's Becoming Easy, What's Becoming Possible Kevin Lynagh (Keming Labs), Kim Rees (Periscopic), Hadley Wickham (Rice University / RStudio), David Nolen (ShiftSpace)
Murray West (NY Hilton)
4:10pm Taming the Elephant – Learn How Monsanto Manages Their Hadoop Cluster to Enable Genome/Sequence Processing Bala Venkatrao (Cloudera), Erich Hochmuth (Monsanto), Aparna Ramani (Cloudera), Mark Seidenstricker (Monsanto)
5:00pm Making Pig Fly: Optimizing Data Processing on Hadoop Thejas Madhavan Nair (Hortonworks Inc), Jianyong Dai (Hortonworks)
Gramercy East (NY Hilton)
1:40pm This Message Will Self Destruct: The Implications of Self-Destructing Digital Data Susan E. McGregor (Columbia University), Kathleen Duff
2:30pm Big Data is a Hotbed of Thoughtcrime. So What? Jim Adler (Metanautix)
5:00pm Using Data to Tune A Software Team Jonathan Alexander (Vocalocity, Inc.)
Gramercy West (NY Hilton)
1:40pm Designing Scalable Network Architectures for Fast Moving Big Data Kenneth Duda (Arista Networks), Amr Awadallah (Cloudera, Inc.)
2:30pm Combining Hadoop & Crowdsourcing Matt Wood (Amazon Web Services)
4:10pm Knitting Boar Josh Patterson (Cloudera), Michael Katzenellenbogen (Cloudera)
5:00pm Scala + Cascading = Scalding Avi Bryant (Stripe)
Nassau (NY Hilton)
1:40pm Building the Next Platform for Analytic Apps in the Cloud George Mathew (Alteryx, Inc.)
2:30pm Big Data: Turning the Information Overload into an Information Advantage Chris Selland (HP Vertica), Jerome Levadoux (Autonomy)
5:00pm Scalable, Accessible, Predictive Analytics on Hadoop Steven Hillion (Alpine Data Labs)
8:45am Plenary
Room: Grand Ballroom (NY Hilton)
Thursday Welcome Edd Dumbill (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
8:50am Plenary
Room: Grand Ballroom (NY Hilton)
The Human Face of Big Data Rick Smolan (Against All Odds Productions)
9:00am Plenary
Room: Grand Ballroom (NY Hilton)
Strata Data Innovation Awards 2012 Edd Dumbill (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
9:10am Plenary
Room: Grand Ballroom (NY Hilton)
Hadoop: Thinking Big John Schroeder (MapR Technologies)
9:20am Plenary
Room: Grand Ballroom (NY Hilton)
Beyond Batch Doug Cutting (Cloudera)
9:30am Plenary
Room: Grand Ballroom (NY Hilton)
Cloud, Mobile and Big Data – How Analytics Provides Value to the Buzzwords Paul Kent (SAS)
9:35am Plenary
Room: Grand Ballroom (NY Hilton)
They Don't Teach You That In School Cathy O'Neil (Intent Media), Julie Steele (O'Reilly Media, Inc.)
9:45am Plenary
Room: Grand Ballroom (NY Hilton)
From Traditional Database to Big Data Platform Irfan Khan (SAP)
9:50am Plenary
Room: Grand Ballroom (NY Hilton)
Of Rocket Ships and Washing Machines: Data Technology for People Joe Hellerstein (Trifacta and UC Berkeley)
10:00am Plenary
Room: Grand Ballroom (NY Hilton)
Are We Really Winning the Information Revolution? Samantha Ravich (National Commission for the Review of R&D Programs in the Intelligence Community)
8:00am Coffee break sponsored by Versant
Room: Grand Ballroom Foyer (NY Hilton)
10:20am Morning Break sponsored by SAS
Room: Break
3:10pm Afternoon Break sponsored by SAP
Room: Break
12:20pm Lunch sponsored by MapR Technologies
Room: America's Hall (NY Hilton)
Thursday Lunch and BoFs
10:50am-11:30am (40m) Business & Industry
Beyond Targeted Ads: Big Data for a Better World
Robert Kirkpatrick (UN Global Pulse)
What can Big Data analysis tell us about human well-being? About how people cope with unemployment, rising food prices, or about people’s perceptions of HIV and other deadly diseases? A lot.
11:40am-12:20pm (40m) Business & Industry
Text-mining Your City
Q Ethan McCallum (@qethanm) et al
We often hear of private-sector companies' use of sophisticated analytics in search of profit. What about the civic sector? How are local governments using their data to improve city services? This talk will explore how the Chicago Mayor's Office teamed up with civic-minded data scientists to pursue data mining solutions for some of the city’s experimental projects.
1:40pm-2:20pm (40m) Data Science, Hadoop: Tools & Technology
Data Science on Hadoop: How Cloudera Impala Unlocks New Productivity and Insights
Justin Erickson (Cloudera) et al
This talk will cover what tools and techniques work and don’t work well for data scientists working on Hadoop today and how Cloudera Impala increases the productivity of data science and analysis on Hadoop. Cloudera Impala builds upon experiences and leading edge technology from big data systems at Facebook, Google, and Yahoo.
2:30pm-3:10pm (40m) Hadoop: Tools & Technology
Is Your Cluster a Leaning Tower of Pisa?
Michael Segel (Segel & Associates.)
This presentation explains how cluster design impacts performance. It covers several design options and their trade-offs in performance and cost, along with tuning options based on the underlying hardware.
4:10pm-4:50pm (40m) Hadoop: Tools & Technology
Real-time Big Data Without Streaming
Ron Bodkin (Think Big Analytics)
There has been a lot of excitement lately about streaming approaches to handling Big Data such as Storm, S4, SQLStream, and InfoStreams. But many use cases can be better handled by low latency access with NoSQL databases and search indexing backed by scoring with batch analytics in Hadoop. We compare such integrated Big Data with streaming systems and look to the future.
5:00pm-5:40pm (40m) Hadoop & Beyond
Realtime Processing with Storm
Gabriel Eisbruch (MercadoLibre.com) et al
The quantity of digital information collected and processed every day is growing at an exponential rate. To make sense of this mountain of data we can no longer afford the delays of batch processing systems. In this talk we'll introduce Storm, a new, real-time analytic framework, and show how to use it to massively parallelize information analysis and get instant results from your data.
10:50am-11:30am (40m) Visualization & Interface
Storytelling with Data
Romy Misra (Visual.ly)
How do you build technology to empower designers to create data visualizations? This talk covers principles and technologies that suggest a few ways to do so.
11:40am-12:20pm (40m) Visualization & Interface
The Language of Discovery: A Toolkit for Designing Big Data Interfaces and Interactions
Joe Lamantia (Oracle Endeca)
This session presents a simple analytical and generative toolkit for interface design. It provides designers with an effective starting point for creating satisfying and relevant user experiences for Big Data and discovery interfaces. The toolkit helps designers understand and describe users' activities and needs, and then define and design the interactions and interfaces necessary.
10:50am-11:30am (40m) Data Science
Building Rich, High Performance Tools for Practical Data Analysis
Wes McKinney (DataPad Inc.)
Data manipulation, cleaning, integration, and preparation can be one of the most time consuming parts of the data science process. In this talk I will discuss key points in the design and implementation of data structures and algorithms for structured data manipulation. It is an accumulation of lessons learned and experience building pandas, a widely-used Python data analysis toolkit.
11:40am-12:20pm (40m) Data Science
Real-time Learning with Bayesian Bandits
Ted Dunning (MapR)
This talk will describe how real-time learning can be used for advanced A/B testing as well as a variety of advertising and document targeting problems. The crux of these applications is the Bayesian Bandit algorithm. This algorithm is simple but provides state-of-the-art performance. This talk will be intuitive and practical, but not simple-minded. All code examples are available on GitHub.
1:40pm-2:20pm (40m) Data Science
Taming the Object Graph
Justin Moore (Facebook)
Nearly a billion people actively create and modify nodes and their structured associations in the Facebook object graph. In this talk, Justin Moore describes how a small team within Facebook uses a combination of product, machine learning, and crowdsourcing to maintain and gain insight into this dataset.
2:30pm-3:10pm (40m) Data Science
The Art of Analytical Decomposition
Claudia Perlich (Dstillery)
Building a reliable data-driven solution to a complex business problem is like designing a pocket watch from scratch. At the heart of successful analytics is the art of decomposing the looming big objective into smaller components, each of which may have its own data feed, modeling technique and runtime constraint. We showcase this process on the example of M6D’s online display advertising.
4:10pm-4:50pm (40m) Data Science
Predictive Modeling and Operational Analytics over Streaming Data
Roger Barga (Microsoft)
How do you build and deploy predictive analytics into ongoing business processes so results can be used in real-time to improve operations? This is a common request, in applications ranging from machine-to-machine to oil & gas and utilities. Learn how to leverage all your data assets – including sensor data – to build and operationalize predictive models that improve business operations.
5:00pm-5:40pm (40m) Data Science
Simple, Flexible Distributed Computing in Julia
Stefan Karpinski (The Julia Language) et al
Julia is a high-level, high-performance dynamic language for efficient, large-scale scientific and technical computing, which provides simple, flexible primitives for distributed computing, out of the box. These primitives allow various approaches to distributed computation to be implemented succinctly and easily, with high performance, entirely in Julia.
10:50am-11:30am (40m) Hadoop: Case Studies
Start Small Before Going Big
Steve Yun (Allstate) et al
Building analytical models is a process of trial and error. Often it makes sense to sample down a data set so that numerous methods and new variables can be tried quickly. Consider moving to the entire data set with Hadoop only after the lessons gleaned from the failures have been incorporated into a few candidate models.
11:40am-12:20pm (40m) Hadoop: Case Studies
How to Win Friends and Influence People (Using Hadoop)
Sam Shah (LinkedIn) et al
Many companies use Hadoop for traditional data warehousing applications including data analysis, business reporting, and data storage. But you can use Hadoop to do much more. In this talk, we'll describe how LinkedIn uses Hadoop to create new content, develop recommendations, and send messages to users.
10:50am-11:30am (40m) Data Science, Hadoop: Tools & Technology
Performing Data Science with HBase
Aaron Kimball (Magnify Consulting) et al
Performing investigative analysis on data stored in HBase is challenging. Most tools operate on files stored in HDFS, and interact poorly with HBase's data model. This talk will describe characteristics of data in HBase and exploratory analysis patterns. We will describe best practices for modeling this data efficiently and survey tools and techniques appropriate for data science teams.
11:40am-12:20pm (40m) Hadoop: Tools & Technology
Upcoming Enterprise features in Apache HBase 0.96
Jonathan Hsieh (Cloudera, Inc)
As Apache HBase matures, the community has augmented it with new features that are considered hard requirements for many enterprises. We will discuss how the upcoming HBase 0.96 release addresses many of these shortcomings by introducing new features that will help the administrator minimize downtime, monitor performance, control access to the system, and geo-replicate data across data centers.
10:50am-11:30am (40m) Hadoop & Beyond, Hadoop: Tools & Technology
Hadoop Enables Business Analytics
Paul Kent (SAS)
To unlock the value of Big Data, analytics must be applied. Some enterprises hire platoons of data analysts but many others can't afford to bring on such skilled and expensive resources. How do those businesses uncover opportunity and insight within Big Data assets? They use analytic tools that offload some data discovery to business professionals or deploy intelligent analytic applications.
11:40am-12:20pm (40m) Hadoop & Beyond
Netflix's Evolving Data Science Architecture
Kurt Brown (Netflix)
Our Data Science tech stack has shifted from best-of-breed, "classic" business intelligence technologies to a hybrid environment, fully leveraging Hadoop and other Big Data solutions. Our philosophy has also evolved, now distilled in thinking and practice into "data science as a service". Why did we do it? What does it look like? What are the benefits? Come find out.
10:50am-11:30am (40m) Sponsored
Demonstrating The Future of Data Science
Mike Maxey (Greenplum)
Join us for a live demonstration of how you can leverage a data science platform, an open-source model, internal and external data, analytics tools, and visualization using Hadoop. See how unprecedented access to data scientists can deliver entirely new levels of insight to push the boundaries of what’s possible. Find out what you can do NOW to move your data science efforts forward.
11:40am-12:20pm (40m) Sponsored
The Death of the Enterprise Data Warehouse
Paul Groom (Kognitio)
Business users' attitude to data is changing rapidly – remember when building an EDW was all consuming? Now Big Data is edging the EDW to the side or likely into obscurity. Is this good or bad? How do you bring the values and software investment surrounding the EDW to the wild west of Big Data?
1:40pm-2:20pm (40m) Sponsored
Top 10 Things We Learned About Hadoop (since we started focusing on it)
Val Bercovici (NetApp)
Hadoop continues to climb the IT hype cycle. Along the way, plenty of truth, myth and folklore has been created around Hadoop's business capabilities and technical infrastructure requirements. Come hear NetApp’s real-world discoveries about Hadoop and find out what myths need retiring, as well as which truths need uncovering.
2:30pm-3:10pm (40m) Sponsored
Maximizing ROI by Sharing your Hadoop Big Data Center
Rohit Valia (IBM)
New techniques like Hadoop are leading the way to provide a scalable and cost effective solution. This session reviews the technical requirements for a low latency multi-tenant 'big-data' cluster - one where different lines of business and multiple applications can be run with assured SLAs, resulting in higher ROI for these clusters.
4:10pm-4:50pm (40m) Sponsored
Combining the Power of Hadoop MapReduce with Object-based Dispersed Storage
Russ Kennedy (Cleversafe)
This session will delve into the MapReduce computation paradigm, introduced by Google and widely adopted via the open-source Hadoop platform, combined with commodity hardware to execute computation at the storage node where data exists.
5:00pm-5:40pm (40m) Sponsored
BizData Monetization: Turn Your Data into Dollars
Thomas Strachan (GoodData)
Details to come...
1:40pm-2:20pm (40m) Visualization & Interface
Designing for Data-driven Organizations
Bitsy Bentley (GfK Custom Research)
An increasing number of organizations are embracing data to drive intelligent decisions. For many industries, this is a monumental shift in method and culture. Data communication strategies come in many flavors, from static metric reports to immersive data experiences. In this session I present a user-centered framework for designing or evaluating data delivery methods.
2:30pm-3:10pm (40m) Visualization & Interface
Best Practices for Publishing Data
Hjalmar Gislason (DataMarket)
You want to publish your data for clients, developers or the general public to use and enjoy. But which file formats to use? Which standards? How to provide an API? Should you visualize the data? And if so, how? DataMarket has been on the receiving end of data from many of the world's key data providers and is now helping leading information companies publish theirs. Here we share our findings.
4:10pm-4:50pm (40m) Visualization & Interface
Visualizing Networks
Lynn Cherny (Ghostweather Research & Design, LLC)
As data scientists, we encounter large networks all the time. Recommendations, social ties, transactions, and other types of data are naturally represented as networks. To understand these networks, metrics help, but visualization is crucial. This talk will focus on tools, techniques, and frameworks to visualize networks cleanly, avoiding or at least minimizing “hairballs”.
5:00pm-5:40pm (40m) Visualization & Interface
Web Data Visualization: What's Becoming Easy, What's Becoming Possible
Kevin Lynagh (Keming Labs) et al
Advances in browser and mobile technologies have made visualizing and interacting with data on the web a viable alternative to the traditional tools used to visually explore data. Panelists will discuss the current state of web data visualization, as well as novel approaches made possible by recent advances.
1:40pm-2:20pm (40m) Data Science
What Can We Learn from Billions of Foursquare Check-ins?
Blake Shaw (Foursquare)
By applying machine learning algorithms to large aggregations of spatiotemporal data, we can better understand how people interact with cities and build novel tools to help people navigate the real world.
2:30pm-3:10pm (40m) Hadoop: Case Studies
Continuous Experimentation with Continuous Deployment
Steve Mardenfeld (etsy)
Evaluating an experiment amidst the shifting landscape of continuous deployment is a difficult task, as traditional methods of monitoring operational metrics don’t provide enough information to make product-level decisions. This talk will focus on the framework that we have built to solve this problem, from data logging to the final analysis that drives decision making, and everything in between.
4:10pm-4:50pm (40m) Hadoop: Case Studies
Taming the Elephant – Learn How Monsanto Manages Their Hadoop Cluster to Enable Genome/Sequence Processing
Bala Venkatrao (Cloudera) et al
Managing Hadoop clusters to meet business needs can be challenging. Learn how Monsanto has effectively tamed the elephant using Cloudera Manager.
5:00pm-5:40pm (40m) Hadoop: Tools & Technology
Making Pig Fly: Optimizing Data Processing on Hadoop
Thejas Madhavan Nair (Hortonworks Inc) et al
Apache Pig makes Apache Hadoop easier to use thanks to its high-level data flow language, Pig Latin. In this talk, we will discuss common data analysis tasks, the choices one can make while writing a query, and the impact of each on performance. The core principles behind the optimization recommendations shared during this presentation are applicable to all MapReduce applications.
1:40pm-2:20pm (40m) Business & Industry
This Message Will Self Destruct: The Implications of Self-Destructing Digital Data
Susan E. McGregor (Columbia University) et al
Does self-destructing data protect individuals' right to privacy and offer journalists an essential tool to protect their sources? Or would such a technology be a fundamental threat to effective law enforcement? We will describe the basic design of such a self-destructing data technology and discuss its disparate implications for individuals and government entities.
2:30pm-3:10pm (40m) Business & Industry
Big Data is a Hotbed of Thoughtcrime. So What?
Jim Adler (Metanautix)
Since the first human scrawled an image on a cave wall, the brain has been processing petabytes of data. Today, we're passing through an historical threshold where big data is leaching out of our braincases into the disembodied cloud. For the first time in human existence, we can "think" outside of our brains. What does this mean for privacy, morality, ethics, and the law?
4:10pm-4:50pm (40m) Business & Industry
UGD (User Generated Data), Product Development, and Privacy
Adrian Woodhead (Expedia)
Being a data-driven organization is core to developing and growing a successful Internet company today. This session will delve into the data ownership implications and considerations product teams need to take into account as they build products and services aimed at growing their user base and scaling their companies’ business.
5:00pm-5:40pm (40m) Business & Industry
Using Data to Tune A Software Team
Jonathan Alexander (Vocalocity, Inc.)
Jonathan Alexander, VP Engineering at Vocalocity and the author of Codermetrics (O’Reilly 2011) and Moneyball for Software Engineering (O’Reilly Radar 2011/2012) presents new ideas on how to gather data and use analytics to create more effective software development teams.
1:40pm-2:20pm (40m) Hadoop & Beyond
Designing Scalable Network Architectures for Fast Moving Big Data
Kenneth Duda (Arista Networks) et al
Explore the network capabilities and architecture necessary to build multi-petabyte clusters. Compare and contrast different networking architectures for Big Data. Use real-world case studies from many of the largest HDFS deployments. Explain how topology-aware file systems interact with the network substrate. Discuss differences in architecture based on workload profile and data set size.
2:30pm-3:10pm (40m) Hadoop & Beyond
Combining Hadoop & Crowdsourcing
Matt Wood (Amazon Web Services)
In this talk we will explore how businesses are marrying human judgment with large scale processing, improving the accuracy of Big Data analytics without sacrificing efficiency or scalability. Real-world examples will be discussed in which Hadoop and crowdsourcing are combined through the Amazon Web Services technologies Elastic MapReduce and Mechanical Turk.
4:10pm-4:50pm (40m) Hadoop: Tools & Technology
Knitting Boar
Josh Patterson (Cloudera) et al
In this session, we will introduce “Knitting Boar”, an open-source Java library for performing distributed online learning on a Hadoop cluster under YARN. We will give an overview of how Knitting Boar works and examine the lessons learned from YARN application construction.
5:00pm-5:40pm (40m) Hadoop & Beyond, Hadoop: Tools & Technology
Scala + Cascading = Scalding
Avi Bryant (Stripe)
Start on low heat with a base of Hadoop; map, then reduce. Flavor, to taste, with Scala's concise, functional syntax and collections library. Simmer with some Pig bones: a tuple model and high-level join and aggregation operators. Mix in Cascading to hold everything together and boil until it's very, very hot, and you get Scalding, a new API for MapReduce out of Twitter.
10:50am-11:30am (40m) Sponsored
Turning Raw Data in Hadoop into Interactive BI (Capital One Labs Case Study)
Peter Schlampp (Platfora)
Richard Just, Big Data Program Manager at Capital One Labs, will share his experience using Hadoop and Platfora software to analyze several aspects of their business, including the adoption of their mobile application. The final solution provided interactive, self-service, web-based BI access to the data.
11:40am-12:20pm (40m) Sponsored
Deploy a Highly Available, Elastic, Multi-tenant Hadoop Cluster in 10 Minutes
Richard McDougall (VMware)
This session explores the benefits and implications of virtualizing Hadoop and highlights several VMware initiatives aimed at bridging Hadoop and virtualization.
1:40pm-2:20pm (40m) Sponsored
Building the Next Platform for Analytic Apps in the Cloud
George Mathew (Alteryx, Inc.)
The convergence of Analytics and the Cloud creates an interesting opportunity to solve many Big Data challenges that were previously untenable. Alteryx has historically served retailers and consumer brands on optimizing merchandising and store operations decisions with its Strategic Analytics product.
2:30pm-3:10pm (40m) Sponsored
Big Data: Turning the Information Overload into an Information Advantage
Chris Selland (HP Vertica) et al
Big data is everywhere, and it is increasingly complex and growing quickly, rendering manual and legacy approaches obsolete. Organizations can only realize the business value of big data with a meaning-based platform technology that automatically understands all data, structured and unstructured, in real time. Join this session to learn more about Big Data and the technologies around it.
4:10pm-4:50pm (40m) Sponsored
Big Data Analytics Platform at Nokia – Selecting the Right Tool for the Right Workload
Yekesa Kosuru (Nokia) et al
Nokia’s Big Data analytics service is a strategic multi-tenant, multi-petabyte platform that executes 10,000 jobs each day. It is made up of technologies that provide location content processing, ETL, ad-hoc SQL, dashboards and advanced analytics, including Calpont InfiniDB for SQL, Scribe, REST, Hadoop, and R. This talk discusses the platform, motivations behind design choices, and challenges.
5:00pm-5:40pm (40m) Sponsored
Scalable, Accessible, Predictive Analytics on Hadoop
Steven Hillion (Alpine Data Labs)
It's not easy doing predictive analytics on Hadoop, with few tools that make it easier or more scalable than writing code from scratch. Join us to discuss a new paradigm that addresses the need for a scalable, powerful solution – one that is purpose-built for Big Data yet is easy to use – illustrated by a demonstration of predictive analytics run on the largest public Hadoop cluster in the world.
8:45am-8:50am (5m)
Thursday Welcome
Edd Dumbill (Silicon Valley Data Science) et al
Opening remarks by the Strata program chairs, Edd Dumbill and Alistair Croll.
8:50am-9:00am (10m)
The Human Face of Big Data
Rick Smolan (Against All Odds Productions)
Over the past two decades, Rick Smolan, creator of the best selling "Day in the Life" books, has produced a series of ambitious global projects in collaboration with hundreds of the world’s leading photographers, writers, and graphic designers. This year Smolan invited more than 100 journalists around the globe to explore the world of Big Data.
9:00am-9:10am (10m)
Strata Data Innovation Awards 2012
Edd Dumbill (Silicon Valley Data Science) et al
We’re excited to launch the Strata Data Innovation Awards to recognize disruptive, innovative technologies in big data and data science, highlight the increasing importance of data science to companies, and showcase the highlights of the growing data community.
9:10am-9:20am (10m) Sponsored
Hadoop: Thinking Big
John Schroeder (MapR Technologies)
This session will provide insights into how the combination of scale, efficiency, and analytic flexibility creates the power to expand the applications for Hadoop to transform companies as well as entire industries.
9:20am-9:30am (10m)
Beyond Batch
Doug Cutting (Cloudera)
Hadoop started as an offline, batch-processing system. It made it practical to store and process much larger datasets than before. Subsequently, more interactive, online systems emerged, integrating with Hadoop.
9:30am-9:35am (5m) Sponsored
Cloud, Mobile and Big Data – How Analytics Provides Value to the Buzzwords
Paul Kent (SAS)
In this rapid-fire keynote, we’ll introduce how virtually every new technology trend is inextricably linked – or should be to attain maximum leverage. We’ll discuss how you can use technologies such as cloud and mobility to spread the value of analytics pervasively across your virtual organization, and how that positively impacts your employees, customers and partners.
9:35am-9:45am (10m)
They Don't Teach You That In School
Cathy O'Neil (Intent Media) et al
A fireside chat with Cathy O'Neil about why universities can't make data scientists. Lots of companies want to hire data scientists, and there aren't enough to go around. Some universities are adding data science graduate departments, but they're facing an uphill battle, thanks to a lack of good data for academics, political infighting, and scalability issues.
9:45am-9:50am (5m) Sponsored
From Traditional Database to Big Data Platform
Irfan Khan (SAP)
You need more than a database 'hammer' for today's Big Data projects. Organizations need a 'data platform' providing integrated tools to capture, store, process and present data. Without it, companies can achieve volume, velocity, or variety, but not all three. Join us to learn the extreme capabilities needed to distill new business signals from big data.
9:50am-10:00am (10m)
Of Rocket Ships and Washing Machines: Data Technology for People
Joe Hellerstein (Trifacta and UC Berkeley)
The story of Big Data technology has centered on engines, algorithms, and statistical methods for data analysis. Less has been said, and too little has been done, regarding technology to improve the lives of data analysts.
10:00am-10:15am (15m)
Are We Really Winning the Information Revolution?
Samantha Ravich (National Commission for the Review of R&D Programs in the Intelligence Community)
Samantha Ravich, former National Security Advisor to Vice President Richard Cheney, will discuss the challenges that face strategic decision makers from the wealth of data now provided by advances in technology.
8:00am-8:45am (45m)
Break: Coffee break sponsored by Versant
10:20am-10:50am (30m)
Break: Morning Break sponsored by SAS
3:10pm-4:10pm (1h)
Break: Afternoon Break sponsored by SAP
12:20pm-1:40pm (1h 20m)
Thursday Lunch and BoFs
Birds of a Feather (BoF) sessions are informal roundtable discussions happening during lunch on both days of the conference. You can join any BoF table or start your own with a topic of your choice. The BoF sign-up board will be near the Registration area.
