Using HBase

Amandeep Khurana (Cloudera), Matteo Bertozzi (Cloudera)
Hadoop: Tools & Technology, Grand East (NY Hilton)
Tutorial. Please note: to attend, your registration must include Tutorials.
Average rating: 2.50 (10 ratings)

HBase is a NoSQL data store that has emerged in recent years and is rapidly gaining popularity. It is an open source implementation of Google's Bigtable design and is part of the Hadoop ecosystem. HBase is known to scale to hundreds of nodes, providing fast random access to terabytes or petabytes of data. This tutorial will get you started with HBase so you can build a scalable application of your own.

We’ll accomplish this by covering the following aspects:

  • The background of HBase as a datastore
  • Setting up HBase on a *nix machine (bring a laptop running Linux; a Mac works just as well, as does a remote EC2 instance)
  • Getting familiar with the client libraries through hands-on exercises
  • HBase data model and schema design basics
  • Overview of HBase internals and design assumptions

At the end of the tutorial, you’ll have an understanding of how to build applications that use HBase as the backend store.
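
As a rough illustration of the data model topic listed above, here is a minimal in-memory sketch of how an HBase-style table can be pictured: a map from row key to column (family plus qualifier) to timestamped versions, with rows scanned in sorted key order. This is not the HBase client API. The class `ToyTable`, its method names, and the sample row keys and values are all hypothetical and exist only for this example.

```python
class ToyTable:
    """Toy, in-memory picture of an HBase-style table (NOT the real HBase API)."""

    def __init__(self, families):
        # Column families are fixed up front; qualifiers are free-form.
        self.families = set(families)
        # row key -> (family, qualifier) -> {timestamp: value}
        self._rows = {}

    def put(self, row, family, qualifier, value, ts):
        """Store one cell version; a write with the same timestamp overwrites."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = self._rows.setdefault(row, {}).setdefault((family, qualifier), {})
        versions[ts] = value

    def get(self, row, family, qualifier):
        """Return the value with the highest timestamp, or None if absent."""
        versions = self._rows.get(row, {}).get((family, qualifier))
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield (row, latest cells) for start_row <= row < stop_row, in key order."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                latest = {col: v[max(v)] for col, v in self._rows[row].items()}
                yield row, latest


# Hypothetical sample data: row keys and values are raw bytes, as in HBase.
table = ToyTable(families=["info"])
table.put(b"user1", "info", "email", b"a@example.com", ts=1)
table.put(b"user1", "info", "email", b"b@example.com", ts=2)
table.put(b"user2", "info", "email", b"c@example.com", ts=1)

latest_email = table.get(b"user1", "info", "email")
rows_in_range = [row for row, _ in table.scan(b"user1", b"user2")]
```

Because rows are kept in byte-wise sorted order and range scans run over that order, the choice of row key largely determines which access patterns are cheap. That is one reason schema design is treated as its own topic.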

Requirement: Bring your laptop (Mac or Linux, or access to an EC2 instance) and, if possible, download the HBase 0.94.1 tarball from the Apache website (http://hbase.apache.org) beforehand so we can get to work right away. The tutorial includes hands-on exercises.

Amandeep Khurana

Cloudera

Amandeep is a Solutions Architect at Cloudera where he’s involved in the entire lifecycle of Hadoop adoption for customers – from use case discovery to taking systems to production. Amandeep is also a co-author of HBase In Action, a book geared towards building applications using HBase. Prior to Cloudera, Amandeep was at Amazon Web Services, where he was a part of the Elastic MapReduce team and built the first version of EMR’s HBase offering.

Matteo Bertozzi

Cloudera

Matteo is a software engineer at Cloudera, currently focused on the Apache HBase project.

Comments on this page are now closed.

Comments

Warren Pfeffer
10/22/2012 10:16pm EDT

Would the Cloudera CDH3 version be OK?

Name : hadoop-hbase
Version : 0.90.6+84.73
Repo : cloudera-cdh3

Eric Czech
10/22/2012 5:30pm EDT

Are there any best practices for serving low-latency random reads from HBase using a cluster that is simultaneously running a lot of MapReduce jobs? More specifically, how do you keep the MapReduce jobs from creating intermittent, large spikes in read latency? Is replication typically the best option for dealing with this?

Amandeep Khurana
10/22/2012 3:11pm EDT

Matthew, not really. Any Linux instance should do fine as long as you are able to connect to it from your laptop. I’d recommend not using EC2, because you’ll need reliable internet connectivity for as long as you are doing the exercises.

Jack, we’ll work in standalone mode. You don’t need Hadoop installed. In fact, it’s cleaner to keep Hadoop out of the picture for this tutorial.

Matthew Kleiderman
10/22/2012 12:53pm EDT

Any configuration suggestions for EC2 instances?

Jack Zhou
10/21/2012 10:55pm EDT

Hi Amandeep, are we going to run some examples on top of a pseudo cluster? If we do, does the cluster version matter? I have installed Hadoop 1.0.4, but HBase 0.94.1 has a hadoop-core-1.0.3.jar in its lib directory. Does this matter? Thanks,

Amandeep Khurana
10/21/2012 7:19pm EDT

0.94.0 would work just fine and so would 0.92.x. We’ll be doing some basic work with the API and any of those versions would suffice.

Robert Goretsky
10/21/2012 2:13pm EDT

I’m just preparing my Mac laptop for the tutorial on Tuesday. I have been using the ‘brew’ package manager to install Hadoop and HBase. The latest version of HBase currently supported by brew is 0.94.0. Is there anything critical in the upgrade to 0.94.1 that is needed for this tutorial? If so, I could take a stab at updating the brew formula. I think it just involves pointing it to the correct tarball.
