Hadoop Data Warehousing with Hive

Dean Wampler (Typesafe)
Hadoop: Tools & Technology, Gramercy Suite (NY Hilton)
Tutorial. Please note: to attend, your registration must include Tutorials.
Average rating: 3.75 (4 ratings)

In this hands-on tutorial, you’ll learn how to install and use Hive for Hadoop-based data warehousing. You’ll also learn some tricks of the trade and how to handle known issues.

Writing Hive Queries

We’ll spend most of the tutorial working through hands-on exercises with actual Hive queries, so you can learn by doing. We’ll cover the main features of Hive’s query language, HiveQL, and how Hive works with data in Hadoop.

Advanced Techniques

Hive is very flexible about the formats of data files, the “schema” of records, and so forth. We’ll discuss options for customizing these and other aspects of your Hive and cluster setup. We’ll also briefly examine how you can write Java user-defined functions (UDFs), and other plugins that extend Hive to handle data formats that aren’t supported natively.
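
As a rough illustration of what a Java UDF involves, here is a minimal sketch. It is not taken from the tutorial materials: the class name EmailDomain and its behavior are hypothetical, and the class is written as plain Java so it compiles without Hive on the classpath. In an actual Hive deployment, the class would typically extend org.apache.hadoop.hive.ql.exec.UDF, and Hive would find the evaluate method by reflection.

```java
// Hypothetical example: the class name and logic are illustrative assumptions,
// not part of the tutorial. In a real Hive UDF this class would extend
// org.apache.hadoop.hive.ql.exec.UDF (requires the hive-exec jar).
final class EmailDomain {

    /**
     * Returns the lower-cased domain of an email address,
     * or null if the input is null or has no usable domain part.
     */
    public String evaluate(String email) {
        if (email == null) {
            return null; // SQL NULL in, NULL out
        }
        int at = email.lastIndexOf('@');
        // Reject inputs with no '@', nothing before it, or nothing after it.
        if (at <= 0 || at == email.length() - 1) {
            return null;
        }
        return email.substring(at + 1).toLowerCase(java.util.Locale.ROOT);
    }

    // Small driver so the logic can be tried outside of Hive.
    public static void main(String[] args) {
        EmailDomain udf = new EmailDomain();
        System.out.println(udf.evaluate("Dean@Example.COM"));
        System.out.println(udf.evaluate("not-an-email"));
    }
}
```

Once packaged in a jar, a function like this would usually be made available in a Hive session with ADD JAR followed by CREATE TEMPORARY FUNCTION, after which it can be called in queries like a built-in function.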

Hive in the Hadoop Ecosystem

You’ll learn Hive’s place in the Hadoop ecosystem, including how it compares to other available tools. We’ll discuss installation and configuration choices that help ensure good performance and ease of use in a real production cluster. In particular, we’ll discuss how to create Hive’s separate “metadata” store in a traditional relational database, such as MySQL. We’ll also offer tips on data formats and layouts that improve performance in various scenarios.

Dean Wampler

Typesafe

Dean Wampler is Principal Consultant at Think Big Analytics, specialists in “Big Data”, Machine Learning, and the Hadoop ecosystem. He speaks frequently at conferences on various big data and other programming topics.

Dean is the author of Functional Programming for Java Developers (O’Reilly, 2011), the co-author of Programming Scala (O’Reilly, 2009) and the co-author of the forthcoming Programming Hive, also from O’Reilly.

Comments

Dean Wampler
10/22/2012 7:33pm EDT

Robert and Michael, there are no prerequisites, other than some prior exposure to SQL. I’ll give you a download link tomorrow with the tutorial contents, and we’ll all log into Amazon Elastic MapReduce clusters for the exercises. So, if you’re on Windows, install PuTTY. That’s it!

Robert Mancuso
10/22/2012 5:23pm EDT

Same question, any pre-reqs?

michael semb wever
10/18/2012 1:18am EDT

Are there prerequisites for this tutorial? A list of tools (and versions) we should have installed to save time on any exercises?

Dean Wampler
07/23/2012 3:39pm EDT

Salman, I’m not sure if they are available separately. Send email to confreg@oreilly.com to find out. Also, note that I’ll be presenting an updated version in NYC in October. Finally, another option is to get the Hive book I co-wrote, which O’Reilly is publishing in September.

Salman Ahmed
07/23/2012 3:22pm EDT

Would this tutorial be available for purchase individually, or will I have to pay $595 for access to all tutorials?
