Tuesday, 02/28/2012

9:00am

Deep Data, Ballroom AB
Deep Data is a no-holds-barred program for data scientists. The advanced technical content will keep you up to speed with the latest techniques, and give you the opportunity to debate and network with the most skilled data scientists in our industry. Read more.
Data Science, Ballroom CD
Please note: to attend, your registration must include Tutorials.
Sarah Sproehnle (Cloudera, Inc.)
Average rating: ****.
(4.83, 6 ratings)
This tutorial provides a solid foundation for those seeking to understand large scale data processing with MapReduce and Hadoop, plus its associated ecosystem. This session is intended for those who are new to Hadoop and are seeking to understand where Hadoop is appropriate and how it fits with existing systems. No programming experience is required. Read more.
Data Science, Ballroom E
Please note: to attend, your registration must include Tutorials.
Ken Krugler (Scale Unlimited)
Average rating: **...
(2.75, 4 ratings)
Want to extract and process Big Data from the web? This tutorial will show you how to use key open source technologies such as Hadoop, Cascading, Bixo, Tika, Mahout and Solr to create scalable, reliable web mining solutions. Read more.
Data Science, Ballroom G
Please note: to attend, your registration must include Tutorials.
Joseph Rickert (Revolution Analytics)
Average rating: ****.
(4.50, 4 ratings)
This tutorial will enable anyone with some programming experience to begin analyzing data with the R programming language. Read more.
Data Science, Ballroom H
Please note: to attend, your registration must include Tutorials.
Dean Wampler (Typesafe), Jason Rutherglen (Datastax)
Average rating: ***..
(3.00, 1 rating)
This hands-on tutorial teaches you how to set up and use Hive, a high-level data warehouse tool for Hadoop. Hive provides a SQL-like query language, HiveQL, that is easy to learn for people with prior SQL experience, making Hive attractive for data warehousing teams. Hive leverages the power of Hadoop for working with massive data sets without requiring expertise in MapReduce programming. Read more.
Sponsored Session, Ballroom F
Please note: to attend, your registration must include Tutorials.
James Dixon (Pentaho), Chris Deptula (OpenBI)
The big data world is extremely chaotic, built on technology still in its infancy. Learn how to tame this chaos, integrate it within your existing data environments (RDBMS, analytic databases, applications), manage the workflow, orchestrate jobs, improve productivity and make big data technologies accessible to a much wider spectrum of developers, analysts and data scientists. Read more.
JumpStart, GA K
JumpStart looks at how building and running businesses changes in a data-driven world. It's the missing MBA for Big Data. Read more.
Visualization & Interface, GA J
Please note: to attend, your registration must include Tutorials.
Average rating: ****.
(4.50, 2 ratings)
This workshop is a jumpstart lesson on how to get from a blank page and a pile of data to a useful data visualization. We'll focus on the design process, not specific tools. Bring your sample data and paper or a laptop; leave with new visualization ideas. Read more.
JumpStart, Great America-K
Alistair Croll (Solve For Interesting)
Opening remarks by Program Chair Alistair Croll, Founder, Bitcurrent. Read more.
Deep Data, A-B
Michael Rys (Microsoft Corp.)
Average rating: ***..
(3.00, 1 rating)
Contrary to popular belief, SQL and NoSQL are not at odds with each other; they are duals. In fact, NoSQL should really be called coSQL. Recognizing this duality can change the way we think about which technology to use when, and what we need to invest in next. Read more.

9:20am

JumpStart, Great America-K
Avinash Kaushik (Market Motive)
Average rating: *****
(5.00, 1 rating)
Author and digital marketing evangelist Avinash Kaushik shares his perspective, drawing from experience with some of the world's largest online marketers, and looks at how an analyst mentality is quickly permeating all aspects of business and marketing. Read more.

9:45am

Deep Data, A-B
Claudia Perlich (Dstillery)
Average rating: ****.
(4.00, 1 rating)
With the collection of almost every piece of information about your customers comes the ability to start asking your data the right question: Why do they do what they do? And even more: what would they do if I could interact with them? We show, for the case of online display advertising, how causal analysis gives interesting new answers about the right (and wrong) ways of spending your money. Read more.

10:00am

JumpStart, Great America-K
Moderated by:
Terence Craig (PatternBuilders)
Panelists:
Lora Cecere (Supply Chain Insights), Pervinder Johar (CCC Information Services), Marilyn Craig (Logitech)
The effect of big data on all business models cannot be denied. This panel of SCM experts looks at how businesses are using, or should be using, big data to tackle supply chain management, focusing on the broader manufacturing issues that must be addressed as well as practical tips for dealing with supply chains that now span the globe. Read more.

11:00am

JumpStart, Great America-K
J. C. Herz (Batchtags LLC)
Average rating: *****
(5.00, 1 rating)
This presentation lays out some clear, concrete gating conditions for when it makes sense to pull the trigger on big data initiatives, and how they should be procured, depending on the use case, the data assets, and the resources available. Read more.
Deep Data, A-B
Monica Rogati (Jawbone)
Average rating: ****.
(4.50, 2 ratings)
Getting training data for a recommender system is easy: if users clicked it, it’s a positive - if they didn’t, it’s a negative. … Or is it? In this talk, we use examples from production recommender systems to bring training data to the forefront: from overcoming presentation bias to the art of crowdsourcing subjective judgments to creative data exhaust exploitation and feature creation. Read more.

11:25am

JumpStart, Great America-K
Diego Saenz (Data Driven CEO)
What are the fundamental skills that a CEO needs to become “Data Driven”? In this session we will discuss the 3 essential skills that will enable CEOs to effectively lead their organizations into the Data Revolution. These organizations will harness the power of data to innovate, grow profits and beat the competition. Read more.

11:30am

Deep Data, A-B
Jacob Perkins (Weotta)
Average rating: ***..
(3.00, 1 rating)
Learn various ways to bootstrap a custom corpus for training highly accurate natural language processing models. Real world examples will be presented with Python code samples using NLTK. Each example will show you how, starting from scratch, you can rapidly produce a highly accurate custom corpus for training the kinds of natural language processing models you need. Read more.

11:55am

JumpStart, Great America-K
Felix Hamilton (e22 Alloy), Josh Gold (e22 Alloy)
There are many rapidly evolving technologies that provide objective metrics and analytics for most outward facing business interactions. The evolution of similar inward facing tools has not kept pace. In this presentation we discuss which sources of internal organizational data are frequently neglected, approaches for automating data collection, and what valuable insights can result from analysis. Read more.

12:00pm

Deep Data, A-B
Ben Gimpert (Altos Research)
Average rating: ***..
(3.00, 1 rating)
Twenty-first century big data is being used to train predictive models of emotional sentiment, customer churn, patient health, and other behavioral complexities. Variable importance and feature selection reduce the dimensionality of our models, so an unfeasible and complex problem may become somewhat more predictable. Read more.

12:30pm

Santa Clara Ballroom
Lunch Sponsored by HPCC Systems (1h)

1:30pm

Data Science, Ballroom CD
Please note: to attend, your registration must include Tutorials.
Jeremy Howard (Kaggle), Mike Bowles (Biomatica)
Average rating: ****.
(4.44, 9 ratings)
Wouldn't it be great if there were just two algorithms that could handle most of your predictive modeling needs? It turns out that this is actually the case. Noted machine learning instructor Dr Mike Bowles and champion data miner Jeremy Howard will teach you everything you need to know to apply them successfully. Read more.
Data Science, Ballroom E
Please note: to attend, your registration must include Tutorials.
Simon Rogers (Guardian), Michael Brunton-Spall (Guardian News and Media)
Average rating: ****.
(4.00, 1 rating)
Learn first hand from award-winning Guardian journalists how they mix data, journalism and visualization to break and tell compelling stories: all at newsroom speeds. Read more.
Data Science, Ballroom G
Please note: to attend, your registration must include Tutorials.
Nate McCall (Apigee)
This presentation goes beyond the hype, buzzwords, and rehashed slides and actually presents the attendees with a hands-on, step-by-step tutorial on how to write a Java application on top of Apache Cassandra. It focuses on concepts such as idempotence, tunable consistency, and shared-nothing clusters to help attendees get started with Apache Cassandra quickly while avoiding common pitfalls. Read more.
Data Science, Ballroom H
Please note: to attend, your registration must include Tutorials.
Jock Mackinlay (Tableau Software), Ross Perez (Tableau Software)
Average rating: ****.
(4.00, 2 ratings)
In this hands-on class, learn how to turn data into effective, interactive visualizations. You do not require a Tableau license to participate, but must bring a Windows laptop or virtual machine. Read more.
Sponsored Session, Ballroom F
Please note: to attend, your registration must include Tutorials.
Richard Taylor (HPCC Systems from LexisNexis Risk Solutions)
While extracting entities from massive amounts of text is a major problem, a proven solution exists. This tutorial will demonstrate a natural language parsing technology to extract entities from all kinds of text using massively parallel clusters. Read more.
Data Science, GA J
Please note: to attend, your registration must include Tutorials.
Sarah Sproehnle (Cloudera, Inc.)
Average rating: ****.
(4.25, 4 ratings)
Learn how to use a Hadoop cluster for data analysis using Java MapReduce, Apache Hive and Apache Pig, and get an overview of using the HBase Hadoop database. Some programming experience is strongly recommended for this session. Read more.
JumpStart, Great America-K
Mark Madsen (Third Nature)
Mark Madsen talks about how regular businesses will eventually embrace a data-driven mindset, with some trademark 'Madsen' history background to put it in context. People throw around 'industrial revolution of data' and 'new oil' a lot without really thinking about what things like the scientific method, or steam power, or petrochemicals did as a result. Read more.
Deep Data, A-B
Matt Biddulph (Product Club)
Average rating: ****.
(4.00, 1 rating)
The tools of social network analysis are based on mathematical network theory. There is very little in these techniques that actually requires that the data represents social activity. We'll show how these techniques can be applied to data from areas such as geo, linguistics and the Wikipedia link graph. We'll visualise and explore the data using Gephi, the "Photoshop for graphs". Read more.

2:10pm

JumpStart, Great America-K
Bill Schmarzo (EMC Consulting)
"Big data" provides the opportunity to combine new, rich data sources in novel ways to discover business insights. How do you use analytics to exploit this data so that it will yield real business value? Learn a proven technique that ensures you identify where and how big data analytics can be successfully deployed within your organization. Case study examples will demonstrate its use. Read more.

2:15pm

Deep Data, A-B
Average rating: ****.
(4.00, 1 rating)
Relational databases were based on Set theory — which insists that the order of items does not matter. For many (most?) data problems, however, order does matter. By using Array theory, a relational-like database gains a considerable advantage over set-theory based engines. Read more.

2:30pm

JumpStart, Great America-K
Michael Hugos (Center for Systems Innovation [c4si])
Average rating: *****
(5.00, 1 rating)
In this session, business agility expert Michael Hugos will present examples from his work in applying immersive animation techniques and gaming dynamics, and discuss how they can address the challenges of consuming - and responding to - the data deluge, turning information overload into business advantage. Read more.

3:30pm

JumpStart, Great America-K
Marcia Tal (Tal Solutions, LLC)
In this session, Marcia Tal will demonstrate how significant business value is being realized through sophisticated understanding of intent and interconnectedness, at scale. Read more.
Deep Data, A-B
Robert Lancaster (Orbitz Worldwide)
Average rating: ****.
(4.00, 1 rating)
We examine the effectiveness of a statistical technique known as survival analysis to optimize the cache time-to-live for hotel rates in a hotel rate cache. We describe how we collect and prepare nearly a billion records per day utilizing MongoDB and Hadoop. Finally, we show how this analysis is improving the operation of our hotel rate cache. Read more.

3:50pm

JumpStart, Great America-K
Marti Hearst (UC Berkeley)
Search user interfaces are slow to change; ideas for new search interfaces rarely take hold. This talk will forecast how search is likely to change and what will stay the same in the coming years. Read more.

4:00pm

Deep Data, A-B
Peter Skomoroch (Data Wrangling), Michael Driscoll (Metamarkets), DJ Patil (Greylock Partners), Toby Segaran (Google), Pete Warden (Jetpac), Amy Heineike (Quid)
Average rating: ****.
(4.00, 1 rating)
Join leading data scientists in debating hot issues in the profession. Read more.

4:20pm

JumpStart, Great America-K
Jon Bruner (O'Reilly Media)
Jon Bruner leads a panel discussion with a few of the day’s presenters and takes final questions from the audience. Read more.

7:00pm

Mission City Ballroom Foyer
Average rating: *****
(5.00, 1 rating)
Two events happening at the same time and place: Mini Maker Faire, a showcase of innovative data-related hardware, apps, and projects; and Data Crush, an experiment combining wine-tasting with the gathering, analysis, and application of data to track behavioral trends and influencing factors. Read more.

Wednesday, 02/29/2012

8:45am

Mission City Ballroom
This presentation will be streamed live.
Edd Dumbill (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
Opening remarks by the Strata program chairs, Edd Dumbill and Alistair Croll. Read more.

8:50am

Mission City Ballroom
This presentation will be streamed live.
Doug Cutting (Cloudera)
Average rating: ****.
(4.00, 5 ratings)
Apache Hadoop forms the kernel of an operating system for Big Data. This ecosystem of interdependent projects enables institutions to affordably explore ever vaster quantities of data. The platform is young, but it is strong and vibrant, built to evolve. Read more.

9:00am

Mission City Ballroom
This presentation will be streamed live.
Dave Campbell (Microsoft)
Average rating: ***..
(3.50, 2 ratings)
The explosion of data is both a challenge and opportunity for businesses. In order to thrive in this new world, organizations will need a technical strategy for sifting through all of this data and driving insights. Read more.

9:10am

Mission City Ballroom
This presentation will be streamed live.
Abhishek Mehta (Tresata)
Average rating: ***..
(3.67, 3 ratings)
How big data tools and technologies give us back our individual identity ... because if you didn't know you were unique and special, well, you are. Big data can be applied to solving socio-economic problems that rival the scale and importance of building ad optimization models. Read more.

9:20am

Mission City Ballroom
This presentation will be streamed live.
Mike Olson (Cloudera)
Average rating: ****.
(4.00, 3 ratings)
Tools for attacking big data problems originated at consumer internet companies, but the number and variety of big data problems have spread across industries and around the world. I'll present a brief summary of some of the critical social and business problems that we're attacking with the open source Apache Hadoop platform. Read more.

9:30am

Mission City Ballroom
This presentation will be streamed live.
Flavio Villanustre (LexisNexis Risk Solutions and HPCC Systems)
Average rating: ***..
(3.00, 1 rating)
Back in the late 80s, artificial intelligence was set to take over the world; it didn’t happen. In 2012, AI has been stripped down, dressed up and reborn as machine learning. Will it take over the world this time? What makes a Big Data - Machine Learning solution ‘better’? Read more.

9:35am

Mission City Ballroom
This presentation will be streamed live.
Average rating: *****
(5.00, 4 ratings)
The increasing use of online software and digital devices in the classroom provides a source of high-frequency data streams that can be analyzed to better understand student progress, identify individual needs, and develop personal recommendations. Read more.

9:40am

Mission City Ballroom
Avinash Kaushik (Market Motive)
Average rating: ****.
(4.62, 8 ratings)
So you've hoarded the world's data within your enterprise. Now what? Author and digital marketing evangelist Avinash Kaushik shares lessons from the nascent world of Web Analytics on how multiplicity, scale and outsourcing power a data democracy, and how that in turn drives business action. Read more.

9:55am

Mission City Ballroom
This presentation will be streamed live.
Ben Goldacre (Bad Science)
Average rating: ****.
(4.67, 6 ratings)
Negative results from clinical trials go missing far too often, leading us to overestimate the benefits of treatments. Attempts to remedy this problem haven't worked well. Ben Goldacre, both a doctor and data geek, will talk about how to fix this, and other, problems in medicine. Read more.

10:40am

Data Science, Mission City B1
Q Ethan McCallum (@qethanm)
Average rating: *....
(1.00, 1 rating)
The biggest problem in data science is ... the data itself. Read more.
Business & Industry, Mission City B4
Josh Green (Panjiva)
Despite the hype, Big Data has yet to live up to its potential. Why? Because we’ve spent too much time thinking about the data itself and not enough time considering which business decisions can be improved through the intelligent application of data. Panjiva CEO Josh Green will discuss an alternative approach: starting with a challenging business problem and then tracking down relevant data. Read more.
Visualization & Interface, Ballroom AB
This presentation will be streamed live.
Jock Mackinlay (Tableau Software)
Average rating: ***..
(3.50, 4 ratings)
Visual analysis is an iterative process for working with data that exploits the power of the human visual system. The formal core of visual analysis is the mapping of data to appropriate visual representations. Learn what years of research have taught us about designing visualizations people can learn from and understand. Read more.
Eric Baldeschwieler (Independent)
In this session, Hortonworks CEO Eric Baldeschwieler will look at the current state of Apache Hadoop, how the ecosystem is evolving by working together to close the existing technological and knowledge gaps, and present a roadmap for the future of the project. Read more.
Domain Data, Ballroom E
Chris Moody (Gnip)
Average rating: ***..
(3.67, 3 ratings)
With billions of social activities passing through the ever-growing realtime social web each day, companies are beginning to harness the power of social data. In this session, participants will learn from real-world case studies in Financial Services, Emergency Response, Brand Analytics and other industries about how businesses are applying social data to their operations to drive value. Read more.
Sponsored Session, Ballroom G
Billy Bosworth (DataStax)
In this panel discussion, DataStax CEO Billy Bosworth will spotlight real mission-critical Big Data use cases from "hands-on" practitioners. With companies like Walmart, Netflix, & Apigee among many others adopting Apache Cassandra and other new database technologies, there's never been a more exciting time to be building data-intensive applications. Read more.
Sponsored Session, Ballroom H
Eddie Satterly (Splunk), Sanjay Mehta (Splunk)
In this session, Expedia, one of the world’s leading online travel companies, describes how they tapped into their massive machine data to deliver unprecedented insights across key IT and business areas – from ad metrics and risk analysis, to capacity planning, security, and availability analysis. Read more.
Antonio Piccolboni (Per data LLC)
R and Hadoop, the two hottest stars on the Analytics stage, were meant to be together. The open source RHadoop project was established to make it happen. We'll go over what RHadoop does for you, how to use it, and why you should add it to your toolset. Read more.

11:30am

Data Science, Mission City B1
Peter Skomoroch (Data Wrangling)
Average rating: ***..
(3.00, 2 ratings)
New analysts or engineers are often lost when textbook approaches fail on real world data. Drawing inspiration from problem solving techniques in mathematics and physics, we will walk through examples that illustrate how to come up with creative solutions and solve real world problems with data. Read more.
Business & Industry, Mission City B4
Dave Rubin (Oracle)
Average rating: **...
(2.00, 1 rating)
There is a revolution at hand centering on this groundswell of data, and it will change how we execute our businesses through greater efficiencies, new revenue discovery and even new innovation. It is the revolution of Big Data. Management Strategies for Big Data will explain this new wave of technology and provide a roadmap for businesses to take advantage of this growing trend. Read more.
Hjalmar Gislason (DataMarket)
Average rating: **...
(2.00, 1 rating)
With the rise of big data, more and more people need effective visualizations. Needs may range from simple charts to massive interactive network graphs. A range of tools exists, but many still find none that meets all their requirements: ask for cross-browser usage, server-side rendering, iOS support, and full control of look and feel, and your options are suddenly very slim. We share our lessons and approach. Read more.
Jack Norris (MapR Technologies)
Average rating: ****.
(4.00, 2 ratings)
This session will draw on numerous customer examples to reveal powerful tips, tricks, and in-depth use cases to show how Hadoop can easily integrate, scale, and analyze important data. Read more.
Domain Data, Ballroom E
Marcel Salathé (Penn State University)
Who influences whom? Data science can help answer this question, which is of fundamental importance to business, politics, public health and many other fields. Read more.
Sponsored Session, Ballroom G
Mike Maxey (Greenplum), Katrin Ribant (Havas Digital), Jeff Carey, Keaton Adams (McAfee)
Average rating: **...
(2.50, 2 ratings)
The race is on to create the next competitive advantage. Attend this customer session for a brief introduction to Greenplum’s Big Data Analytics platform. Read more.
Sponsored Session, Ballroom H
Alexander Stojanovic (Microsoft), Martin Hall (Karmasphere), Eric Baldeschwieler (Independent)
Microsoft's Big Data solution turns signals into information. Learn how Microsoft's Hadoop service and rich BI capabilities can drive your business forward. Read more.
Henry Robinson (Cloudera)
At Cloudera, we've found that monitoring Apache Hadoop is itself a big data problem. Here I'll present work we've been doing on turning the vast amounts of monitoring data a Hadoop cluster generates into meaningful signals to help us wrestle with the biggest challenges of maintaining large distributed systems: failure of machines, processes and people, and root-cause analysis after-the-fact. Read more.

12:10pm

Exhibit Hall
Birds of a Feather (BoF) sessions are informal roundtable discussions happening during lunch on Wed 2/29 and Thu 3/1. You can join any BoF table or start your own with a topic of your choice. The BoF sign-up board will be near the Registration area. Read more.

1:30pm

Data Science, Mission City B1
Tony Middleton (HPCC Systems from LexisNexis Risk Solutions)
Average rating: ****.
(4.00, 1 rating)
How to simplify the data integration process and save a significant amount of development time by automatically generating code for processes (data profiling, data cleansing, and record linkage). A case study will show a complex, Big Data linking application, where insurance data was converted to HPCC using the SALT tool and reduced 20,000+ lines of source code to a 48-line SALT specification. Read more.
Business & Industry, Mission City B4
Martin Hall (Karmasphere), Ron Bodkin (Think Big Analytics)
Average rating: *****
(5.00, 1 rating)
While enterprises see an opportunity to increase revenues and decrease costs by becoming a data-driven organization, it is not easy to decide where and how to begin. This session highlights some principles for success through examining two real-world big data case studies. Read more.
Visualization & Interface, Ballroom AB
This presentation will be streamed live.
Jesper Andersen (Bloom Studios)
Average rating: ****.
(4.50, 4 ratings)
See how applying traditional data analysis tools, as well as more esoteric ones like computer vision, to multiple disparate data sets and data types can create a more complete and nuanced narrative of one of San Francisco’s most vibrant streets. Read more.
Sam Shah (LinkedIn)
Average rating: ****.
(4.50, 2 ratings)
In this talk, we'll build a complete, scalable collaborative filtering ("people who X also Y") system that is almost identical to what prominent Internet properties use today. We'll talk about model improvements, performance enhancements, and practical considerations. This is a practical talk accessible to all. Read more.
Domain Data, Ballroom E
John Mulholland (Fannie Mae)
Pascal Boillat, Fannie Mae’s Chief Information Officer, will address how changing data standards and implementation strategies are having a profound effect on the financial services industry. Read more.
Sponsored Session, Ballroom G
Moderated by:
Jim Tommaney (Calpont Corporation)
Panelists:
Fernanda Foertter (Genus plc)
Advances in columnar databases are creating bio-science opportunities that were previously not possible. Fernanda Foertter and the team at Genus discovered an innovative way to store and access the huge volumes of data being generated modeling genotypes. She and Jim Tommaney discuss the benefits of column storage and how InfiniDB’s Map Reduce empowers high performance Big Data analytics. Read more.
Sponsored Session, Ballroom H
Carter Shanklin (VMware), Jags Ramnarayan (VMware)
Today's users won't tolerate slow applications. More often than not, the database is the bottleneck in the application. Learn how VMware vFabric SQLFire can give you the speed and scale you need in a substantially simpler way. SQLFire is a memory-optimized and horizontally-scalable distributed SQL database. Attend this session to learn how SQLFire gives high performance without the complexity. Read more.
Mark Pollack (SpringSource/VMware)
Hadoop is not an island. To deliver a complete Big Data solution, a data pipeline needs to be developed that incorporates and orchestrates many diverse technologies. Using an example of real-time weblog processing, in this session we will demonstrate how the open source Spring Batch and Spring Integration projects can be used to build manageable and robust pipeline solutions around Hadoop. Read more.

2:20pm

Data Science, Mission City B1
Philip Kromer (Infochimps)
Average rating: *****
(5.00, 1 rating)
Instead of working too hard to define the parameters in an attempt to completely remove the ambiguity, look at what people do, interact with and talk about. We can watch what people do and decide from there what a coffee shop is and where the boundaries of your neighborhood are. It might not be the “truth”, but it can be darn close. Read more.
Business & Industry, Mission City B4
Larry Murdock (Lyris)
Leapfrog enabled their learning toys and set up a system to have millions of toy owners upload their play logs. This talk covers the business strategy and the technical implementation hurdles from the perspective of the former Director of Data Services who implemented it. Read more.
Bitsy Bentley (GfK Custom Research)
Average rating: ***..
(3.67, 3 ratings)
Data visualization is just one tool that designers use to communicate data-driven recommendations. In this session I present a case study on the use of user-centered design practices to craft meaningful and actionable data presentations for business users. Data visualization and UX work best when they work together. Read more.
Asad Khan (Microsoft)
As more companies adopt Hadoop to perform data intensive tasks for large data sets, there is a burning need to make Hadoop available to a broader set of developers. This talk covers two approaches Microsoft is exploring for this purpose: 1. JavaScript interfaces to run Hadoop jobs and 2. web interfaces for Hadoop that let developers write and run MapReduce jobs from any platform. Read more.
Domain Data, Ballroom E
Jen Zeralli (S&P Capital IQ), Jeff Sternberg (S&P Capital IQ)
Average rating: ****.
(4.00, 2 ratings)
Topics will span the data flow lifecycle from data collection, curation and quality, to aggregation and standardization of a multitude of complex data sources, to the creation of valuable analytics, including recommendations that connect users to the data. Read more.
Sponsored Session, Ballroom G
Swaminathan Sivasubramanian (Amazon Web Services)
Running large scale datastores requires us to handle various challenges such as scalability, reliability, performance, and reduced operational overhead. In this talk, we will discuss how Amazon DynamoDB was designed to address these problems. Read more.
Sponsored Session, Ballroom H
Tasso Argyros (Teradata Aster)
This session will explore a new class of analytic platforms and technologies such as SQL-MapReduce® which bring the science of data to the art of business. By fusing standard business intelligence and analytics with next-generation data processing techniques such as MapReduce, big data analysis is no longer just in the hands of the few data science or MapReduce specialists in an organization! Read more.
Josh Wills (Cloudera)
Cloudera Data Scientist Josh Wills will share insights and “how to” tricks about Crunch, a Java library that aims to make writing, testing and running MapReduce pipelines that run over any type of data easy, efficient and even fun. Read more.

4:00pm

Data Science, Mission City B1
This presentation will be streamed live.
Xavier Amatriain (Netflix)
Average rating: ****.
(4.50, 2 ratings)
Netflix is known for pushing the envelope of recommendation technologies. The Netflix Prize put a spotlight on recommender system research and a focus on predicting ratings. But, predicting a rating is only part of the recommendation problem. In this talk I will describe how other sources of implicit and contextualized information can be used to create a personalized experience. Read more.
Business & Industry, Mission City B4
Kuntal Malia (ModCloth), Kate Zimmerman (ModCloth)
Learn how data is used at a fashion retailer on a rapid growth path. At ModCloth we don't believe in dictating fashion trends to our customer; we are inverting the pyramid and democratizing fashion. Buying patterns and user interactions are leveraged to help us understand how we can meet our customers' desires. Read more.
Michael Edgcumbe (Columbia University), Eric Mika (The Department of Objects)
Average rating: ***..
(3.25, 4 ratings)
Custom data exploration tools can provide efficient and exciting interfaces for audiences not well served by out-of-the-box business intelligence solutions. Frameworks not only beautify data but also surface novel observations from the set. In this session, we survey the creative coding frameworks that lend themselves to visualization and offer some insight into their strengths and weaknesses. Read more.
Data Science, Ballroom CD
Average rating: ***..
(3.00, 1 rating)
How do you architect big data systems that leverage virtualization and platform as a service? We will walk through a layered approach to building a unified analytics platform using virtualization, provisioning tools and platform as a service. Read more.
Domain Data, Ballroom E
Ian White (Urban Mapping, Inc)
Average rating: ***..
(3.00, 2 ratings)
Federal transparency initiatives have spawned millions of rows of data, state and local programs engage developers and wonks with APIs, contests and data galore. Private industry offers attribute-laden device exhaust, forming a geo-footprint of who is going where, when, how and (maybe) for what. Who decides data provenance? Does curated data get treated the same as heterogeneous data? Read more.
Sponsored Session, Ballroom G
Rohit Valia (Platform Computing)
This session looks at the requirements for a multi-tenant big data cluster: one where different lines of businesses, different projects, and multiple applications can be run with assured SLAs, resulting in higher utilization and ROI for these clusters. Read more.
Sponsored Session, Ballroom H
Tim Estes (Digital Reasoning)
Average rating: *****
(5.00, 1 rating)
Data Scientists must deal with many Big Data challenges including volume, velocity and variety of data. These challenges require a new solution - Automated Understanding - a new evolution in software. In this session Tim Estes will show the power of this new capability on a large and valuable dataset that has never been deeply understood by software before. Read more.
Steve Francia (10gen)
Learn how to integrate MongoDB with Hadoop for large-scale distributed data processing. Read more.

4:50pm

Data Science, Mission City B1
Joris Poort (Startup)
Data science applied in engineering-driven industries is revolutionizing how highly complex products are developed. Unprecedented access to computing power, combined with advanced data science tools, provides the opportunity to not only increase the speed of development but also improve the final design. Using a practical aerospace example, Joris will illustrate the tools and techniques described. Read more.
Business & Industry, Mission City B4
Christopher Berry (Syncapse)
Moneyball is to marketing science as CSI is to forensic science. The expectations are high and marketers are shouting "where's the insight?" and "ENHANCE!". Data is long and marketing scientists are short. We can only scale through technology. This is the story of how a developer and two marketing scientists became data scientists in crossing that gap. Read more.
Fabien Girardin (Near Future Laboratory)
Average rating: **...
(2.50, 2 ratings)
In this talk we report on the value of tools that support a human-driven approach to revealing innovation opportunities hidden within big datasets. Based on our experience in data science projects involving multiple stakeholders, we found that sketching with data and rapidly sharing interactive information visualizations is a key practice to transform information into useful services and products. Read more.
Data Science, Ballroom CD
Ana Martinez (CityGrid Media), Kin Lane (API Evangelist)
Learn how CityGrid built a world-class platform to aggregate the data powering its publicly available local places, content and ads APIs using Hadoop, Solr and MongoDB. Read more.
Domain Data, Ballroom E
Leigh Dodds (Kasabi)
Average rating: ****.
(4.00, 2 ratings)
Facebook's Open Graph, Schema.org, and a recent scramble towards a "Rosetta Stone" for geodata are all examples of a trend towards linking data across the web. Weaving data into the web simplifies integration. Big Data offers ways to mine huge datasets for insight. Linked Data turns the web into a dataset. Read more.
Ballroom G
TBC
Ballroom H
TBC
Stefan Groschupf (Datameer)
Average rating: ***..
(3.00, 1 rating)
Using Hadoop based business intelligence analytics, we analyzed Hadoop source code over time. This talk illustrates text and related analytics with Hadoop on Hadoop to reveal the true hidden secrets of the elephant. This entertaining session highlights the value of data correlation across multiple datasets and the visualization of those correlations to reveal hidden data relationships. Read more.

5:30pm

Exhibit Hall
Grab a drink, mingle with fellow Strata participants, and see the latest technologies and products from leading companies in the data space. Read more.

6:30pm

Mission City Ballroom Foyer
Average rating: *....
(1.00, 1 rating)
Don't miss Startup Showcase, Strata's live demo program and competition for startups and early-stage companies. With a panel of industry experts providing real-time feedback, Startup Showcase happens during Strata Conference on Wednesday, February 29, 2012. Read more.

Thursday, 03/01/2012

8:45am

Mission City Ballroom
This presentation will be streamed live.
Alistair Croll (Solve For Interesting), Edd Dumbill (Silicon Valley Data Science)
Opening remarks by the Strata program chairs, Alistair Croll and Edd Dumbill. Read more.

8:50am

Mission City Ballroom
This presentation will be streamed live.
Jonathan Gosier (metaLayer Inc.)
Average rating: **...
(2.00, 2 ratings)
Big data isn't just an abstract problem for corporations, financial firms, and tech companies. To your mother, a 'big data' problem might simply be too much email, or a lost file on her computer. We need to democratize access to the tools used for understanding information by taking the hard work out of drawing insight from excessive quantities of information. Read more.

9:05am

Mission City Ballroom
This presentation will be streamed live.
Luke Lonergan (Greenplum, a division of EMC)
Average rating: ***..
(3.33, 3 ratings)
How are businesses using big data to connect with their customers, deliver new products or services faster and create a competitive advantage? Learn about the changing nature of customer intimacy and how the technologies and techniques around big data analysis provide business advantage in today's social, mobile environment – and why it is imperative to adopt a big data analytics strategy. Read more.

9:15am

Mission City Ballroom
This presentation will be streamed live.
Coco Krumme (MIT Media Lab)
Average rating: ***..
(3.00, 3 ratings)
Why data can tell us only so much about food, flavor, and our preferences. Read more.

9:25am

Mission City Ballroom
This presentation will be streamed live.
Pete Warden (Jetpac)
Average rating: ***..
(3.50, 4 ratings)
Why unstructured data beats structured. Read more.

9:35am

Mission City Ballroom
This presentation will be streamed live.
Usman Haque (Pachube.com)
Average rating: ****.
(4.33, 3 ratings)
The expected massive growth of connected device, appliance and sensor markets in the coming years - often called 'The Internet of Things' - will need a richer concept of 'open data' than is currently common. Read more.

9:45am

Mission City Ballroom
This presentation will be streamed live.
Gary Lang (MarkLogic)
Average rating: **...
(2.33, 3 ratings)
Big Data is about extracting value from fast, huge, varied, complex data sets. But simply crunching data is only the first step. As adoption of MapReduce and data analytic technologies increases, forward thinking companies are starting to build applications on their core data assets. Read more.

9:50am

Mission City Ballroom
This presentation will be streamed live.
Richard Merkin (Heritage Provider Network)
Average rating: ***..
(3.50, 2 ratings)
Dr. Richard Merkin, President and CEO of Heritage Provider Network, which was recently named one of Fast Company’s 10 most innovative healthcare companies for 2012, will announce the winner of the second progress prize in the $3 million Heritage Health Prize competition. Read more.

9:55am

Mission City Ballroom
This presentation will be streamed live.
Hal Varian (Google)
Average rating: ****.
(4.60, 5 ratings)
Google Insights for Search provides an index of search activity for millions of queries. These queries can sometimes help understand consumer behavior. Hal describes some of the issues that arise in trying to use this data for short-term economic forecasts and provides examples. Read more.

10:40am

Data Science, Mission City B1
This presentation will be streamed live.
Theo Schlossnagle (OmniTI/Circonus)
Average rating: ***..
(3.67, 3 ratings)
In today's environments, we're often forced to collect data before we know if it will be useful. This tendency leads to seas of data, flowing in real-time with very little structure or understanding of what the data means. Given that, how can you tell when data "is normal?" Let's find out. Read more.
Business & Industry, Mission City B4
Kirkland Barrett (Microsoft)
Average rating: ****.
(4.00, 2 ratings)
A high-level overview of Microsoft IT's BI strategy and its various applications, focusing on Self Service BI, Scorecards and Dashboards, Data Visualizations, and Leadership Decision making through robust BI tools. Read more.
Max Gadney (After The Flood)
The use of video to communicate data is on the rise, but what is the most effective way to do this? Highlighting our current work with the BBC in this field we will look at best practice from storytelling principles to choosing the right visual treatment. Read more.
Vipul Sharma (Eventbrite)
Average rating: *****
(5.00, 1 rating)
This talk will go into the details, architecture and challenges of building a recommendation system on a massive social graph. The talk will describe how we applied learning on large datasets using Apache Hadoop and how we scaled millions of reads and writes. We will also showcase Eventbrite's data platform architecture. Read more.
Policy & Privacy, Ballroom E
Moderated by:
Alexander Howard (O'Reilly Media)
Panelists:
Jim Adler (inome), Solon Barocas (New York University)
Average rating: ****.
(4.00, 2 ratings)
So much of the privacy discussion is about data access, fear of future dystopia, and the complexities of law. There is a vacuum around how societal norms should be mapped to rapidly growing capabilities of big data, leaving data professionals in a "don’t ask don't tell" privacy conundrum. This conversation will discuss specific use-cases and frameworks to guide data pros. Read more.
Sponsored Session, Ballroom G
Arun Murthy (Hortonworks Inc.)
This presentation will cover the next generation of Apache Hadoop, known as hadoop-0.23. Learn how MapReduce has been re-architected by the community to improve reliability, availability and scalability as well as adding support for alternate programming paradigms. Also learn about HDFS Federation, which allows for significant scalability improvements, as well as other important advancements. Read more.
Sponsored Session, Ballroom H
Gary Dusbabek (Rackspace)
Monitoring thousands of servers generates a lot of data. Many organizations trying to harness enormous amounts of data struggle with the same types of challenges as the Rackspace cloud monitoring team. Find out how Rackspace uses NoSQL technology, distributed concepts, and open source software in novel ways to produce a multi-region cloud monitoring API. Read more.
Jonathan Ellis (DataStax)
NoSQL, Big Data, massive scale, real-time, in the cloud, do I need it, do I want it, how the heck can I even know if it’s right for me? Choosing any database solution is a critical and tricky decision. Navigating the murky waters of NoSQL can be even tougher. Read more.

11:30am

Data Science, Mission City B1
Jeremy Howard (Kaggle)
Average rating: ****.
(4.50, 4 ratings)
In "The Evolution of Data Products", O'Reilly Media's Mike Loukides notes: "the question of how we take the next step — where data recedes into the background — is surprisingly tough." Jeremy Howard will show why this is tough, and what to do about it. He will show how predictive modelling, simulation, and optimization can be combined to deliver results instead of just delivering data. Read more.
Business & Industry, Mission City B4
Schwark Satyavolu (Truaxis)
By charging interchange fees for retailers and account fees for customers, banks have taken a ‘combative’ approach to revenue generation. However, technologies are emerging that enable financial institutions to leverage big data drawn from the transaction data stream to provide new, pro-consumer revenue streams. Read more.
Ryan Ismert (Sportvision, Inc)
Average rating: ****.
(4.00, 1 rating)
Long a staple of broadcast sports, augmented reality (AR) effects (like the virtual "1st and 10" line) are increasingly being driven by digital records of sports events (DREs), collected and distributed live, such as NASCAR's race car tracking system and MLB's PitchFX. The next generation of DRE-derived data will expand the use of AR to more effectively show key "invisible" elements of the game. Read more.
Stefan Groschupf (Datameer)
This session discusses financial services use cases and challenges in using Hadoop analytics, including long-term storage and analytics of transactions, identifying cross-sell and up-sell opportunities by analyzing web log files and customer profiles, value-at-risk analytics, and understanding the SLA issues and identifying problems in a thousands-of-nodes, big-services oriented architecture. Read more.
Policy & Privacy, Ballroom E
Kaitlin Thaney (Mozilla Science Lab), Betsy Masiello (Google), John Wilbanks (Kauffman Foundation for Entrepreneurship)
Average rating: ***..
(3.00, 2 ratings)
Making sense of the privacy issues around personal data is way too complicated. Pretty Simple Data Privacy builds on the idea that users need three options - Yes, No, Maybe - to control privacy settings on their personal data. We'll explore existing projects and codebases that implement legal and technical tools for all three of the settings. Read more.
Sponsored Session, Ballroom G
David Miller (LexisNexis)
In this session, attendees will learn about a new method for solving big data analytics via HPCC Systems, an open-source, enterprise-proven platform for Big Data. A case study will be given using patent data to demonstrate how big data can be processed, linked, analyzed, searched and delivered to answer various queries. Read more.
Sponsored Session, Ballroom H
Nick Halstead (DataSift)
Nick Halstead, CTO of DataSift, will talk about Hadoop, HBase and dealing with storing and processing a billion tweets every 3 days. You will get insights into the architecture, pitfalls and real-world lessons on using Big Data technologies. Read more.
Nathan Marz (Twitter)
Average rating: **...
(2.00, 1 rating)
Storm is an open-source realtime computation system relied upon by Twitter for much of its analytics. Storm does for realtime computation what Hadoop did for batch computation. It has a huge range of applications and combines ease of use with a robust foundation. Read more.

12:10pm

Exhibit Hall
Birds of a Feather (BoF) sessions are informal roundtable discussions happening during lunch on Wed 2/29 and Thu 3/1. You can join any BoF table or start your own with a topic of your choice. The BoF sign-up board will be near the Registration area. Read more.

1:30pm

Data Science, Mission City B1
Alyona Medelyan (Pingar), Anna Divoli (Pingar)
Average rating: ***..
(3.00, 1 rating)
In this session we discuss approaches to mining unstructured data that gradually find their way into the real world. Text mining and analytics algorithms strive to identify documents’ categories, main topics, mentioned names and other entities; they summarize and detect sentiment. We describe case studies that take advantage of such algorithms in the legal, forensics and healthcare sectors. Read more.
Business & Industry, Mission City B4
This presentation will be streamed live.
DJ Patil (Greylock Partners)
Average rating: ****.
(4.50, 4 ratings)
What does it really take to build a data product? Recall and relevancy are only parts of the challenge. In fact, an entire new approach is required to build consistently great data products. Read more.
Jason Sundram (Facebook)
Average rating: ****.
(4.50, 2 ratings)
With the explosion of mobile devices, there is a plethora of geo-tagged data available for mining and visualization. To make compelling visualizations, it is often necessary to build tools that allow users to easily explore, mine, map, and market this data. This talk will focus on how to use several open-source frameworks to build such visualizations. Read more.
JP Morgenthal (EMC Consulting)
Hundreds of hours of video recordings culled from multiple cameras. Most of these recordings hold little value as the scene does not change for extended periods of time. For organizations that must keep the original intact, analyzing these recordings can be very difficult. Using Map/Reduce we can harness parallel processing to identify and tag useful periods of time for faster analysis. Read more.
Business & Industry, Ballroom E
J. C. Herz (Batchtags LLC)
Average rating: ****.
(4.00, 1 rating)
This talk uses the OODA Loop concept (Observe, Orient, Decide, Act) as a framework to categorize Big Data use cases and data-driven services and the front-ends to those services. Rather than starting with the underlying technology or the data sources, the OODA loop starts with WHY the user needs information. It answers the question of when a black box beats an analytic tool, and vice versa. Read more.
Sponsored Session, Ballroom G
Alexander Gray (Skytree, Inc.)
Average rating: ***..
(3.50, 2 ratings)
Machine learning (ML) holds the key to the most advanced uses of big data. But is ML really possible on big data with state-of-the-art methods, or just simple ones? Can ML really be done in real time today? Is MapReduce the right answer? The cloud? I will review the current state of ML technology both at the research level and the industry-readiness level, and current best solution options. Read more.
Sponsored Session, Ballroom H
Gary Lang (MarkLogic)
Average rating: *....
(1.00, 1 rating)
Gary Lang, Senior VP Engineering, MarkLogic, will discuss the concept of Big Data Applications and walk through three in-production implementations of Big Data Applications in action. Read more.
Sean Byrnes (Flurry, Inc.)
Flurry provides an analytics and advertising platform for smartphone applications. Every month we track over 20 billion sessions across over 330 million devices. This talk will go over the Hadoop and HBase architecture we run and the challenges we face managing a massively growing data set. Read more.

2:20pm

Data Science, Mission City B1
Alasdair Allan (The Thing System, Inc.)
Average rating: *****
(5.00, 1 rating)
Big data isn't just about multi-terabyte data sets hidden inside eventually-concurrent distributed databases in the cloud. It's also about the hidden data you carry with you all the time. This talk will discuss the data that you carry with you all the time; the data on your cell phone and other mobile devices, along with the possibilities for making use of that hidden data. Read more.
Business & Industry, Mission City B4
Piyush Lumba (Microsoft), Francis Irving (ScraperWiki Ltd.)
One of the most significant challenges faced by individuals and organizations is how to discover and collaborate with data within and across their organizations, which often stays trapped in application and organizational silos. Read more.
Mano Marks (Google, Inc.), Chris Broadfoot (Google)
Beautiful, useful and scalable techniques for analysing and displaying spatial information are key to unlocking important trends in geospatial and geotemporal data. Recent developments in HTML 5 enable rendering of complex visualisations within the browser, facilitating fast, dynamic user interfaces built around web maps. This session will examine emerging technologies that will shape the geoweb. Read more.
Ed Kohlwey (Booz Allen Hamilton)
Average rating: *....
(1.00, 1 rating)
Map/Reduce has created tremendous interest in parallel programming and big data analytics, but it isn't always the right tool for the job. Many new projects have emerged in this space over the last year including two cluster schedulers (YARN and Mesos) and numerous parallel computing environments. We'll provide an introduction to these new technologies, including some you might not have heard of. Read more.
Policy & Privacy, Ballroom E
Kaitlin Thaney (Mozilla Science Lab), Mark Hahnel (FigShare), Ben Goldacre (Bad Science)
Average rating: ***..
(3.00, 1 rating)
In a research environment, under the current operating system, most data and figures collected or generated during your work are lost, intentionally tossed aside or classified as “junk”, or at worst trapped in silos or locked behind embargo periods. In the digital age, this does not need to be the case - and it's imperative we change that reality. Read more.
Sponsored Session, Ballroom G
Vineet Tyagi (Impetus Technologies)
The session will talk about costs involved in Big Data projects, covering the apparent and also hidden aspects of these costs. It will also discuss how to build a Big Data solution with lower cost of “per TB Data Managed and Analyzed”. Read more.
Sponsored Session, Ballroom H
Max Yankelevich (Crowd Computing Systems, Inc.)
Average rating: ***..
(3.00, 2 ratings)
There’s a big opportunity for big data: human processing. Max Yankelevich explores the latest innovations combining the scalable quality control of artificial intelligence with the scalable human judgment of crowdsourcing to solve big data problems. Learn the surprisingly easy methods to leverage the crowd to collect, control, validate and enrich data. Read more.
James Phillips (Couchbase, Inc.)
Average rating: **...
(2.00, 2 ratings)
Mobile devices offer boundless opportunities for collection and presentation of temporally- and spatially-relevant data. But there are obstacles: intermittent connectivity as well as processing, storage and other constraints. Featuring real-world apps, this session covers device data collection; device-device and device-cloud data synchronization; and data aggregation and analysis in the cloud. Read more.

4:00pm

Data Science, Mission City B1
Daniel Tunkelang (LinkedIn), Claire Hunsaker (Samasource)
Average rating: ****.
(4.00, 1 rating)
In this talk, we will analyze various dimensions of microwork that characterize applications, tasks, and crowds. Drawing on our experience at companies that have pioneered the use of microwork (Samasource) and data science (LinkedIn), we will offer practical advice to help you design crowdsourcing workflows to meet your data product needs. Read more.
Business & Industry, Mission City B4
Moderated by:
Siraj Khaliq (The Climate Corporation)
Panelists:
Average rating: ***..
(3.00, 1 rating)
Due to recent advancements in Big Data, cloud computing, and network maturity it's now possible to work with extremely large weather-related data sets. The Climate Corporation CTO Siraj Khaliq discusses how to apply big data principles to the real-world challenge of protecting people and businesses from the financial impact of weather. Read more.
Business & Industry, Ballroom AB
Robbie Allen (Automated Insights, Inc.)
Average rating: ****.
(4.00, 1 rating)
The ultimate utility of Big Data is transforming it into Big Insights. Charts, graphs, and tables of aggregated data are useful but still require interpretation by the end user. With advances in linguistic algorithms and data processing it is now possible to derive meaningful insights from data and present them in digestible narrative content. Read more.
Ron Bodkin (Think Big Analytics), Kumar Palaniappan (NetApp)
NetApp collects 250 TB per year of unstructured data from devices that phone home. They need to be able to do ad hoc analysis and build predictive models for device support and cross-sales. We discuss our experiences building a Big Data system with NetApp using Hadoop and HBase to improve customer service, drive sales and develop better products. Read more.
Policy & Privacy, Ballroom E
Virginia Carlson (Urban Rubrics), Jake Porway (DataKind)
The “common good” challenge for Big Data is to deliver actionable information that can be used by nonprofits and civic orgs. But that challenge isn’t new. Existing data intermediaries for NGOs have a rich history of working in common-good territory. Let’s discuss. What is this history? What can we take away from it to inform new, perhaps disruptive, approaches to meet this challenge? Read more.
Sage Weil (Inktank)
Average rating: ****.
(4.00, 1 rating)
Data storage needs are increasing at an exponential rate. Incumbent storage systems are proprietary, expensive to buy and expensive to maintain. With the advent of the cloud, everyone expects auto scaling. Ceph storage is a massively scalable storage system that aims to fill the distributed storage system void. Read more.

4:50pm

Data Science, Mission City B1
Jan Reichelt (Mendeley Ltd.), William Gunn (Mendeley Research Networks)
Mendeley is a New York and London-based startup that has crowdsourced the world's largest database of academic literature. Over 1M researchers strong, Mendeley is taking academia to the cloud. Read more.
Business & Industry, Mission City B4
Jacomo Corbo (QuantumBlack)
Measuring productivity remains a notoriously difficult problem. We will show how real-time collaboration data are being leveraged to measure, model and forecast organizational productivity and performance in the innovation teams at Boeing and in 3 Formula One teams. On the back of these forecasts, we will show how investment yields were improved by 15% and productivity raised by nearly 20%. Read more.
Cheryl Phillips (The Seattle Times)
A story or report on a subject by its very nature summarizes the underlying data. But readers may have questions specific to a time, date or place. Visualizing the data and providing effective, targeted ways to drill deeper is key to giving the reader more than just the story. Read more.
Data Science, Ballroom CD
Paul Brown (Paradigm4 Inc.)
The science and commercial worlds share requirements for a high performance informatics platform to support collection, curation, collaboration, exploration, and analysis of massive datasets. SciDB is an open source analytical database that provides seamlessly integrated massively scalable analytics. We present performance and scalability for non-embarrassingly parallel operations. Read more.
Data Science, Ballroom E
Peter Kuhn (Scripps Physics Oncology)
Average rating: ****.
(4.50, 6 ratings)
Metastasis is the lethal form of cancer. Metastasis arises through cancer cells traveling through the blood of the patient and colonizing in other organs. Finding and characterizing these cells enables the prediction and monitoring of response to cancer treatments. Read more.
Marc Smith (Social Media Research Foundation)
Average rating: *****
(5.00, 1 rating)
Maps of the complex connections that form when people link, like, reply, rate, review, favorite, friend, follow, edit, and mention one another can reveal important trends. It is possible to create network maps with free and open tools that identify key people and sub-groups in any social media population with just a few key clicks. Can you make a pie chart? You can now make a network chart. Read more.

Sponsors

  • EMC
  • Microsoft
  • HPCC Systems™ from LexisNexis® Risk Solutions
  • MarkLogic
  • Shared Learning Collaborative
  • Cloudera
  • Digital Reasoning Systems
  • Pentaho
  • Rackspace Hosting
  • Teradata Aster
  • VMware
  • IBM
  • NetApp
  • Oracle
  • 1010data
  • 10gen
  • Acxiom
  • Amazon Web Services
  • Calpont
  • Cisco
  • Couchbase
  • Cray
  • Datameer
  • DataSift
  • DataStax
  • Esri
  • Facebook
  • Feedzai
  • Hadapt
  • Hortonworks
  • Impetus
  • Jaspersoft
  • Karmasphere
  • Lucid Imagination
  • MapR Technologies
  • Pervasive
  • Platform Computing
  • Revolution Analytics
  • Scaleout Software
  • Skytree, Inc.
  • Splunk
  • Tableau Software
  • Talend

For information on exhibition and sponsorship opportunities at the conference, contact Susan Stewart at sstewart@oreilly.com.

For information on trade opportunities with O'Reilly conferences, contact Kathy Yu at mediapartners@oreilly.com.

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com
