Prerequisites for attendees
This is a hands-on tutorial, so you will need to bring a laptop running a 64-bit OS. To participate in the hands-on exercises, you MUST do the following in advance:
For common troubleshooting tips during installation, read this.
With such a large number of components in the Hadoop ecosystem, writing Hadoop applications can be a challenge for users who are new to the platform. The Cloudera Development Kit (CDK) is an open source project that aims to simplify Hadoop application development. It codifies best practices for writing Hadoop applications by providing documentation, examples, tools, and APIs for Java developers.
We will discuss the architecture of a common data pipeline, from ingesting data from an application through to generating reports. Hadoop concepts and components (including HDFS, Avro, Flume, Crunch, HCatalog, Hive, Impala, and Oozie) will be introduced along the way, each explained in the context of solving a concrete problem for the application. The goal is to build a simple end-to-end Hadoop data application that you can take away and adapt to your own use cases.
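The shape of such a pipeline (ingest, store, process, report) can be pictured without any Hadoop dependencies. The following is a minimal in-memory Java sketch, assuming a hypothetical page-view event. The names `PipelineSketch`, `PageView`, `ingest`, `viewsPerUrl`, and `report` are illustrative only; they are not CDK, Flume, Crunch, Hive, or Impala APIs. The comments note which component might roughly play each role in a real deployment. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Conceptual, in-memory stand-in for an ingest -> store -> process -> report
 * pipeline. Every name here is hypothetical; no Hadoop APIs are used.
 */
public class PipelineSketch {

    // Hypothetical application event: one page view. In a real pipeline this
    // would typically be serialized with a schema, e.g. as an Avro record.
    record PageView(String userId, String url) {}

    // Ingest stage: append an event to durable storage. Here the "storage" is
    // just a list; in the tutorial's pipeline, roughly Flume writing to HDFS.
    static void ingest(List<PageView> store, PageView event) {
        store.add(event);
    }

    // Processing stage: a batch aggregation over all stored events, the kind
    // of job one might write with Crunch or express as a Hive query.
    static Map<String, Long> viewsPerUrl(List<PageView> events) {
        Map<String, Long> counts = new TreeMap<>(); // sorted by URL
        for (PageView e : events) {
            counts.merge(e.url(), 1L, Long::sum);
        }
        return counts;
    }

    // Reporting stage: render the aggregate as tab-separated lines, standing
    // in for a report produced by an interactive query engine such as Impala.
    static List<String> report(Map<String, Long> counts) {
        List<String> lines = new ArrayList<>();
        counts.forEach((url, n) -> lines.add(url + "\t" + n));
        return lines;
    }

    public static void main(String[] args) {
        List<PageView> store = new ArrayList<>();
        ingest(store, new PageView("alice", "/home"));
        ingest(store, new PageView("bob", "/home"));
        ingest(store, new PageView("alice", "/cart"));
        report(viewsPerUrl(store)).forEach(System.out::println);
    }
}
```

Running `main` prints `/cart` with a count of 1 followed by `/home` with a count of 2, one tab-separated pair per line. Keeping each stage as a separate function mirrors the point of the pipeline architecture: each stage can be swapped for a distributed component without changing the others.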
Attendees should be familiar with Java and common enterprise APIs such as Servlets. No prior experience with Hadoop is necessary, although an awareness of what the components in the Hadoop stack do is a plus.
Tom White is one of the foremost experts on Hadoop. He has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. His book Hadoop: The Definitive Guide (O’Reilly) is recognized as the leading reference on the subject. In 2011, Whirr, the project he founded to run Hadoop and other distributed systems in the cloud, became a top-level Apache project.
Tom is a software engineer at Cloudera, where he has worked since its founding on the core distributions from Cloudera and Apache. Previously he was an independent Hadoop consultant, working with companies to set up, use, and extend Hadoop. He has written numerous articles for O’Reilly, java.net and IBM’s developerWorks, and has spoken at several conferences, most recently at ApacheCon and OSCON in 2011. Tom has a Bachelor’s degree in Mathematics from the University of Cambridge and a Master’s in Philosophy of Science from the University of Leeds, UK.
Eric Sammer is currently a Principal Solution Architect at Cloudera where he helps customers plan, deploy, develop for, and use Hadoop and the related projects at scale. His background is in the development and operations of distributed, highly concurrent, data ingest and processing systems. He’s been involved in the open source community and has contributed to a large number of projects over the last decade.
Joey Echeverria is a Senior Solutions Architect at Cloudera, where he works directly with customers to deploy production Hadoop clusters and solve a diverse range of business and technical problems. Joey joined Cloudera from the NSA, where he worked on data mining, network security, and clustered data processing using Hadoop. Prior to working full time for the NSA, Joey attended Carnegie Mellon University, where he earned an M.S. and a B.S. in Electrical and Computer Engineering.
For exhibition and sponsorship opportunities, contact Susan Stewart at email@example.com
For information on trade opportunities with O'Reilly conferences email mediapartners
For media-related inquiries, contact Maureen Jennings at firstname.lastname@example.org
View a complete list of Strata + Hadoop World 2013 contacts