
The Next Wave of SQL-on-Hadoop: Building a Virtual EDW on Native Hadoop Data

Marcel Kornacker (Cloudera, Inc.)
Hadoop and Beyond
GA Ballroom J

Apache Hadoop has become an attractive platform for exabyte-capacity data storage. In the enterprise data warehouse (EDW) area in particular, Hadoop now increasingly serves as complementary technology for cost-efficient data loading and cleaning, supporting the EDW’s role in enabling interactive analysis and reporting on relational data. However, thanks to recent advances in the Hadoop ecosystem that expand the range of EDW-equivalent analytic capabilities entirely in open source software, it is now also possible for Hadoop to serve as a virtual EDW for native Big Data (stored in HDFS). Thus, costly processes for moving that data into the traditional EDW just for the purpose of analysis are no longer required.

In this session, attendees will get an architect-level view of this solution (comprising HDFS, Cloudera Impala, and the Parquet columnar storage format) and explore an example configuration and benchmark numbers that demonstrate how it offers a high level of performance, functionality, and ability to handle a multi-user workload, while retaining Hadoop’s traditional strengths of flexibility and ease of scaling.


Marcel Kornacker

Software Engineer, Cloudera, Inc.

Marcel Kornacker is a tech lead at Cloudera for new product development and the creator of the Cloudera Impala project. Following his graduation in 2000 with a PhD in databases from UC Berkeley, he held engineering positions at several database-related start-up companies. Marcel joined Google in 2003, where he worked on several ad-serving and storage infrastructure projects, then became tech lead for the distributed query engine component of Google’s F1 project.