In this talk, we describe REEF, a framework that makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models. REEF is designed to provide best-in-class performance on a diverse set of workloads, including extract-transform-load tasks written in MapReduce, iterative machine learning algorithms, and ad-hoc declarative query processing. REEF builds atop YARN to provide retainable hardware resources with lifetimes that are decoupled from those of computational tasks. This allows us to build persistent (cross-job) caches and cluster-wide services, but, more importantly, supports high-performance iterative graph processing and machine learning algorithms.
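To illustrate why decoupling resource lifetimes from task lifetimes helps iterative algorithms, here is a minimal sketch in plain Python. It is not REEF's API: `RetainedContainer`, `read_partition`, `make_step`, and `run_iterations` are hypothetical names invented for this example. The point is only that when the container outlives each task, the input data is loaded once and reused by every iteration.

```python
from typing import Callable, Dict, List


class RetainedContainer:
    """Hypothetical stand-in for a resource that outlives any single task."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, List[float]] = {}  # survives across tasks
        self.loads = 0  # counts expensive loads, for illustration only

    def load(self, key: str, loader: Callable[[], List[float]]) -> List[float]:
        # Only the first task to ask for `key` pays the loading cost.
        if key not in self.cache:
            self.cache[key] = loader()
            self.loads += 1
        return self.cache[key]

    def run_task(self, task: Callable[["RetainedContainer"], float]) -> float:
        # The task finishes and is discarded; the container and its cache remain.
        return task(self)


def read_partition() -> List[float]:
    """Assumed placeholder for an expensive read from distributed storage."""
    return [1.0, 2.0, 3.0, 4.0]


def make_step(estimate: float) -> Callable[[RetainedContainer], float]:
    """Build one short-lived task: a gradient step toward the data mean."""

    def step(container: RetainedContainer) -> float:
        data = container.load("partition-0", read_partition)
        gradient = sum(estimate - x for x in data) / len(data)
        return estimate - 0.5 * gradient  # learning rate 0.5

    return step


def run_iterations(container: RetainedContainer, iterations: int) -> float:
    estimate = 0.0
    for _ in range(iterations):
        # Each iteration is a separate task submitted to the same container.
        estimate = container.run_task(make_step(estimate))
    return estimate
```

Running `run_iterations(RetainedContainer("c0"), 20)` executes twenty separate tasks but calls `read_partition` only once. If each task instead ran in a fresh container, every iteration would pay the load cost again.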
Unlike existing systems, REEF aims for composability of jobs across computational models, providing significant performance and usability gains, even with legacy code. REEF includes a library of interoperable data management primitives optimized for communication and data movement (which are distinct from HDFS’ notion of storage locality). The library also allows REEF applications to access external services, such as user-facing relational databases.
We were careful to decouple lower levels of REEF from the data models and semantics of systems built atop it. The result was two new standalone systems: Tang, a configuration manager and dependency injector, and Wake, a state-of-the-art event-driven programming and data movement framework. Both are language independent, allowing REEF to bridge the JVM and .NET ecosystems.
Russell Sears is one of the core developers of REEF in Microsoft’s new Cloud and Information Services Laboratory, and was previously a member of Yahoo! Research. He obtained his PhD at UC Berkeley, where he was advised by Eric Brewer. He works on scalable storage systems for analytical processing and log-structured indexing for online and low-latency applications.
View a complete list of Strata + Hadoop World 2013 contacts