As storage costs have dropped, organizations can now afford to save the vast majority of data that passes through them. Systems like Hadoop’s MapReduce permit such data to be easily analyzed and mined to improve businesses. However, classic data formats like CSV, XML and gzipped archives serve such uses poorly. Some have weak data models. Others support rich data structures but are inefficient. Most integrate poorly with MapReduce.
Apache Avro data files define an expressive, efficient standard for representing large data collections. Avro supports rich, recursive datatypes and includes facilities for datatype evolution. In Avro, new datatypes may be defined and processed on the fly, which is useful for dynamic scripting and query languages. Avro data is compact and fast to process. Avro data files are compressed and MapReduce-friendly.
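To make the claims about recursive datatypes and compactness concrete, here is a minimal sketch in plain Python. It does not use the Avro library; the helper names `encode_long` and `encode_long_list` are hypothetical and exist only for illustration. It assumes Avro's documented binary rules: int and long values are zig-zag coded and written as variable-length integers, a record is its fields written in schema order with no field names, and a union is written as the branch index followed by the value. It covers only the record body, not the data file header, blocks, or compression.

```python
import json

# A recursive Avro schema, written as JSON: a linked list of longs.
# The "next" field is a union of null and the record type itself.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "LongList",
  "fields": [
    {"name": "value", "type": "long"},
    {"name": "next", "type": ["null", "LongList"]}
  ]
}
""")


def encode_long(n: int) -> bytes:
    """Zig-zag code a 64-bit integer, then emit it as a base-128 varint."""
    # Zig-zag maps small magnitudes (positive or negative) to small
    # unsigned values: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_long_list(values) -> bytes:
    """Encode a non-empty sequence of ints as nested LongList records."""
    out = bytearray()
    for i, v in enumerate(values):
        out += encode_long(v)  # field "value"
        is_last = i == len(values) - 1
        # Field "next": union branch index, 0 = null, 1 = LongList.
        # A null value itself occupies zero bytes.
        out += encode_long(0 if is_last else 1)
    return bytes(out)


encoded = encode_long_list([1, -1, 64])
print(encoded.hex(), len(encoded), "bytes")
```

No field names or type tags appear in the output, because the reader is expected to have the schema. That is a large part of why the binary form stays small compared with self-describing text formats.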
This talk will describe how Avro achieves these capabilities and how applications can start incorporating Avro data today.
Founder of the Apache Lucene, Nutch, Hadoop and Avro projects.