Sessions tagged with 'hadoop'

Doug Cutting (Cloudera)
Apache Avro provides an expressive, efficient standard for representing large data sets. Avro data is programming-language neutral and MapReduce-friendly. Hopefully it can replace gzipped CSV-like formats as a dominant format for data.
Sam Shah (LinkedIn)
How do you go about building a product around data using Hadoop? This talk will present how LinkedIn builds and maintains such features as People You May Know. We will present our architecture for doing so (open-sourced) as well as knowledge we've gained in the process.
Peter Skomoroch (Data Wrangling)
Learn how to leverage data exhaust, the digital byproduct of our online activities, to solve problems and discover insights about the world around you. We will walk through a real world example which combines several datasets and statistical techniques to discover insights and make predictions about attendees at O'Reilly Strata.
Abe Taha (Karmasphere), Shevek - (Karmasphere), Ken Krugler (Bixo Labs), Chris Wensel (Concurrent, Inc)
This tutorial will explain MapReduce and how to develop big data applications in Java and high level languages such as Pig and Hive SQL. Using examples it will cover how to prototype, debug, monitor, test and optimize big data applications for Hadoop. Attendees will get hands-on instruction and will leave with a solid understanding of how to analyze data on Hadoop clusters and practical examples.
Rod Cope (OpenLogic, Inc.)
Hadoop and HBase make it easy to store terabytes of data, but how do you scale your search mechanism to sift through these mountains of bits and retrieve large result sets in a matter of milliseconds? Careful use of the Solr search server, based on Lucene, made these requirements come to life in our production environment. Come learn how we query terabytes of data in a highly available system.
Isabel Drost-Fromm (Apache Software Foundation/ Nokia Gate 5 GmbH)
With growing amounts of digital data at the fingertips of software developers the need for a scalable, easy to use framework is tremendous. This talk introduces Apache Mahout - a project with the goal of implementing scalable machine learning algorithms for the masses.

Sponsors

  • Thomson Reuters
  • EMC Data Computing Division
  • EnterpriseDB
  • Microsoft
  • Gnip
  • Rackspace Hosting
  • IBM
  • Windows Azure MarketPlace DataMarket
  • Amazon Mechanical Turk
  • Amazon Web Services
  • Aster Data
  • Cloudera
  • Clustrix
  • DataStax, Inc. (formerly Riptano, Inc.)
  • Digital Reasoning Systems
  • Heritage Provider Network
  • Impetus
  • Jaspersoft
  • Karmasphere
  • LinkedIn
  • MarkLogic
  • Pentaho
  • Pervasive
  • Revolution Analytics
  • Splunk
  • Urban Mapping
  • Wolfram|Alpha
  • Esri
  • ParAccel
  • Tableau Software

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the conference, contact Susan Young at syoung@oreilly.com

Download the Strata Sponsor/Exhibitor Prospectus

Contact Us

View a complete list of Strata Contacts