Upcoming Enterprise features in Apache HBase 0.96

Jonathan Hsieh (Cloudera, Inc)
Hadoop: Tools & Technology, Grand East (NY Hilton)
Average rating: 4.50 (2 ratings)

Apache HBase is a distributed data store that is in production today at many enterprises and sites, serving large volumes of near-real-time random accesses. As Apache HBase matures, the community has augmented the system with new features that many enterprises consider to be hard requirements. We will discuss how the upcoming HBase 0.96 release addresses many of these requirements by introducing new features that help the administrator monitor and control access to the system, and new mechanisms to minimize downtime due to expected and unexpected outages.

Administrators need to understand and control the workloads on an HBase cluster. To support this, metrics can be used to understand performance and detect problems. HBase’s metrics have been improved to provide more useful latency metrics as well as metrics at the fine-grained region level. For control, new access control features, implemented in the new security coprocessor, limit which users can view and edit HBase tables.

Downtime can usually be attributed either to expected events, such as upgrades or maintenance, or to unexpected events, such as failures. Expected downtime is often due to software upgrades. One major feature of the new release simplifies upgrades by adding a future-proofing wire- and data-compatibility layer. This decouples the client version from the server version and reduces the downtime caused by lockstep upgrades.

Unexpected downtime is often due to metadata corruption or service outages. Two major improvements help handle these cases. In the rare event of corruption, the hbck tool has been improved to handle many new classes of problems. For service outages, the replication mechanisms have been hardened.

The talk will conclude with highlights of upcoming features, such as table snapshot support and new features that reduce performance variability, which will widen the range of applications for which enterprises can consider using HBase.

Jonathan Hsieh

Cloudera, Inc

Jonathan is a Software Engineer with Cloudera, currently focused on the Apache HBase project. He is an Apache HBase committer and PMC member, as well as a committer on the Apache Sqoop project, and a committer and founder of the Apache Flume (incubating) project. Jonathan has an M.S. in Computer Science from the University of Washington, and an M.S. and a B.S. in Electrical and Computer Engineering from Carnegie Mellon University.
