
Introduction to Hadoop 2.0

Rich Raposa (Hortonworks)
Hadoop and Beyond
Ballroom H
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.
Average rating: 4.30 (10 ratings)

3 hours: This workshop provides a detailed discussion of the new features of Apache Hadoop 2.0. We will discuss how YARN turns Hadoop from a single-use system for batch data processing into a multi-use platform that can store and process data in many ways beyond batch. We will also cover the details of the new HDFS improvements: High Availability, Federation, and Snapshots.

Apache Hadoop 2.0 is more than a new major release number; it represents a generational shift in the architecture of Apache Hadoop. With YARN, Apache Hadoop is recast as a significantly more powerful platform – one that takes Hadoop beyond batch applications and positions it as a ‘data operating system’.

In this presentation, we will discuss the details of YARN and provide an overview of how you might develop your own YARN application. We will also cover the components of HDFS High Availability, how to protect your enterprise data with HDFS Snapshots, and how Federation can help you use your cluster resources more effectively. The session closes with a brief discussion of migrating from Hadoop 1.x to 2.0.

Attendees should be familiar with the basic components of Hadoop 1.x, and should bring pen and paper for taking notes.

Rich Raposa

Sr. Curriculum Developer, Hortonworks

Rich Raposa, Sr. Curriculum Developer at Hortonworks, has been an author and trainer for over 15 years, having published several programming books and travelled the country teaching software development at companies of all sizes. He joined Hortonworks in July of 2012 and has created their Hadoop 2.0 developer curriculum and certification exams.