Making Pig Fly: Optimizing Data Processing on Hadoop

Thejas Madhavan Nair (Hortonworks Inc), Jianyong Dai (Hortonworks)
Hadoop: Tools & Technology, Murray West (NY Hilton)

Apache Pig makes Apache Hadoop easier to use thanks to its high-level data flow language, Pig Latin. When writing a Pig Latin query, you make choices in each statement. This starts with the very first statement in your query, the load statement, which reflects your choice of storage format. These decisions can have a significant impact on the performance of your query. For example, the choice of join algorithm could result in orders-of-magnitude differences in performance. In this talk, we will discuss common data analysis tasks, the choices one makes when writing a query, and the impact of each on query run time. The core principles behind the optimization recommendations shared during this presentation are applicable to all MapReduce applications.

Knowledge of the following will be useful:

  • Basics of Apache Hadoop MapReduce and HDFS.
  • Basics of Apache Pig.

Thejas Madhavan Nair

Hortonworks Inc

Thejas Nair is a software engineer working on the Apache Pig, HCatalog, and Hive projects at Hortonworks. He is a committer and PMC member of the Apache Pig project. Previously, he worked at Yahoo for 9 years, developing solutions for large-scale distributed data processing.

Jianyong Dai

Hortonworks

Jianyong Dai is an Apache Pig PMC member and committer who has worked on Pig for almost 3 years, first at Yahoo and later at Hortonworks. He received his PhD in computer science from the University of Central Florida, specializing in computer security, data mining, and distributed computing. He is interested in data science, large-scale processing, Hadoop, Pig, HCatalog, Hive, and more.
