MapReduce Design Patterns

Donald Miner (ClearEdge IT Solutions)
Data Science Hadoop: Case Studies, Sutton Center / Sutton South (NY Hilton)

MapReduce and Hadoop are new territory, in terms of both the system itself and the computing paradigm behind it. Much of an experienced developer’s prior knowledge does not translate directly into the world of MapReduce, so there is a steep learning curve in figuring out what works well and what doesn’t. It often takes a seasoned Hadoop developer to recognize the best approach to a new problem. Design patterns document the knowledge and lessons learned by seasoned Hadoop developers so that new developers can leverage the experts’ experience.

First, we’ll give some examples of design patterns, such as “random sampling” and “reduce-side join with Bloom filter”. These examples will illustrate how to take a design pattern documented by an expert and apply it to a new problem. This can save new and intermediate Hadoop developers time that would otherwise be spent on approaches that lead to dead ends.
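As a rough illustration of the “random sampling” pattern named above, here is a minimal sketch in plain Python rather than Hadoop code. It assumes the pattern is implemented as a map-only filter that keeps each record independently with a fixed probability; the function name `sampling_mapper` and its parameters are hypothetical and are not taken from the talk or the book.

```python
import random


def sampling_mapper(records, keep_fraction, rng=None):
    """Yield each input record with probability keep_fraction.

    Hypothetical stand-in for a map-only job: every record is decided
    independently, so no shuffle or reduce phase is needed.
    """
    rng = rng or random.Random()
    for record in records:
        # random() returns a float in [0.0, 1.0), so a keep_fraction of
        # 1.0 keeps every record and 0.0 keeps none.
        if rng.random() < keep_fraction:
            yield record


if __name__ == "__main__":
    lines = [f"record-{i}" for i in range(1000)]
    # A seeded generator makes the sample repeatable for this demo.
    sample = list(sampling_mapper(lines, 0.1, random.Random(42)))
    print(len(sample), "of", len(lines), "records kept")
```

Because each record is handled on its own, a filter of this shape can run in parallel across input splits without coordination, which is one reason sampling is often described as a simple starting pattern.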

Second, we’ll talk about how experts can document design patterns and why they should. It’s important to watch for recurring themes in your solutions so that, at some point, they can be captured as a design pattern and reapplied. Design patterns in general have to be explained in context, with pitfalls and caveats clearly identified. This is even more true of MapReduce design patterns, where clear documentation helps you avoid common design mistakes when modeling your Big Data analytics.

Donald Miner

ClearEdge IT Solutions

Dr. Donald Miner serves as a Solutions Architect at EMC Greenplum, advising customers and helping them implement and use Greenplum’s big data systems. Prior to working with Greenplum, Dr. Miner architected several large-scale, mission-critical Hadoop deployments in the U.S. Intelligence Community. He is the author of the upcoming book “MapReduce Design Patterns”, which will be published by O’Reilly in the fall of 2012. He is also involved in teaching, having previously taught industry classes on Hadoop and a variety of artificial intelligence courses at the University of Maryland, Baltimore County. Dr. Miner received his PhD in Computer Science from the University of Maryland, Baltimore County, where his dissertation focused on Machine Learning and Multi-Agent Systems.
