Beyond Map/Reduce: Getting Creative With Parallel Processing

Ed Kohlwey (Booz Allen Hamilton)
Average rating: *....
(1.00, 1 rating)

While Map/Reduce is an excellent environment for some parallel computing tasks, there are many ways to use a cluster beyond Map/Reduce. Within the last year, the YARN and NextGen Map/Reduce has been contributed into the Hadoop trunk, Mesos has been released as an open source project, and a variety of new parallel programming environments have emerged such as Spark, Giraph, Golden Orb, Accumulo, and others.

We will discuss the features of YARN and Mesos, and talk about obvious yet relatively unexplored uses of these cluster schedulers as simple work queues. Examples will be provided in the context of machine learning. Next, we will provide an overview of the Bulk-Synchronous-Parallel model of computation, and compare and contrast the implementations that have emerged over the last year. We will also discuss two other alternative environments: Spark, an in-memory version of Map/Reduce which features a Scala-based interpreter; and Accumulo, a BigTable-style database that implements a novel model for parallel computation and was recently released by the NSA.

Photo of Ed Kohlwey

Ed Kohlwey

Booz Allen Hamilton

Edmund Kohlwey is a developer and data scientist at Booz Allen Hamilton. For the last three years, he has helped government clients adopt and develop their big data capabilities across many different problem domains.

Sponsors

  • EMC
  • Microsoft
  • HPCC Systems™ from LexisNexis® Risk Solutions
  • MarkLogic
  • Shared Learning Collaborative
  • Cloudera
  • Digital Reasoning Systems
  • Pentaho
  • Rackspace Hosting
  • Teradata Aster
  • VMware
  • IBM
  • NetApp
  • Oracle
  • 1010data
  • 10gen
  • Acxiom
  • Amazon Web Services
  • Calpont
  • Cisco
  • Couchbase
  • Cray
  • Datameer
  • DataSift
  • DataStax
  • Esri
  • Facebook
  • Feedzai
  • Hadapt
  • Hortonworks
  • Impetus
  • Jaspersoft
  • Karmasphere
  • Lucid Imagination
  • MapR Technologies
  • Pervasive
  • Platform Computing
  • Revolution Analytics
  • Scaleout Software
  • Skytree, Inc.
  • Splunk
  • Tableau Software
  • Talend

For information on exhibition and sponsorship opportunities at the conference, contact Susan Stewart at sstewart@oreilly.com.

For information on trade opportunities with O'Reilly conferences contact Kathy Yu at mediapartners
@oreilly.com

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

View a complete list of Strata contacts