Software testing is hard enough, but it becomes especially challenging when you’re doing large-scale, distributed data processing. This tutorial will present a mix of lecture and instructor-led demonstrations to explain how you can verify that your code performs exactly as you intended.
This session will focus on four key topics:
We will also discuss several problems developers commonly introduce into their code, as well as ways to recognize and solve them.
Tom Wheeler’s career spans more than fifteen years in the communications, biotech, financial, healthcare, aerospace and defense industries. Before joining Cloudera, he developed engineering software at Boeing, helped to design a high-volume data processing system for WebMD and served as senior programmer/analyst for a brokerage firm. Mr. Wheeler is a frequent presenter at both user groups and software conferences.
Comments
The current version that Tom provided below is now available at the top of this page as a ZIP file.
Hello attendees:
It seems that this Web site doesn’t have the latest version. I am working with O’Reilly to correct this, but in the meantime, you can find the current version of the slides and demos (including the MiniMRCluster and MiniDFSCluster example) here:
tomwheeler.com/tmp/TomWheel...
Hi Michael,
There are no specific prerequisites. It’s not practical to do this session as a hands-on workshop, so it will be a mix of lecture and demonstration. Thus, you needn’t have anything in particular on your computer (or even have a computer with you at all). I do plan on making my slides and all code used in the demos available following the session.
Although my demos will use Cloudera’s CDH4 distribution, I expect that they would run equally well on any modern version of Apache Hadoop, whether it comes from the Apache site or through another vendor’s distribution.
Are there prerequisites for this tutorial? Is there a list of tools (and versions) we should have installed to save time on the demonstrations?