3-Hours: This workshop provides a detailed discussion of the new features of Apache Hadoop 2.0. We will discuss how YARN turns Hadoop from a single-use system for batch data processing into a multi-use platform that stores and processes data in many ways beyond batch. We will also cover the details of the new HDFS features: High Availability, Federation, and Snapshots.
Apache Hadoop 2.0 is more than a major release number; it represents a generational shift in the architecture of Apache Hadoop. With YARN, Apache Hadoop is recast as a significantly more powerful platform, one that takes Hadoop beyond batch applications and positions it as a ‘data operating system’.
In this presentation, we will discuss the details of YARN and provide an overview of how you might develop your own YARN application. We will also discuss the components of HDFS High Availability, how to protect your enterprise data with HDFS Snapshots, and how Federation can help you use your cluster resources more effectively. We will close with a brief discussion of migrating from Hadoop 1.x to 2.0.
Attendees should be familiar with the basic components of Hadoop 1.x, and should bring pen and paper for taking notes.
Rich Raposa, Sr. Curriculum Developer at Hortonworks, has been an author and trainer for over 15 years, having published several programming books and travelled the country teaching software development at companies of all sizes. He joined Hortonworks in July of 2012 and has created their Hadoop 2.0 developer curriculum and certification exams.