Data anonymization addresses data privacy while still allowing the data to be analyzed and used effectively. It typically employs formal models of data security, including K-anonymity and L-diversity, which require primitives such as encryption, hashing, and unique or custom mappings to ensure the desired degree of anonymity and diversity in the anonymized results. The relative complexity of these primitives, combined with the ever-growing size of the collected data, poses real big data challenges.
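To make the two models concrete, here is a minimal sketch of how the K-anonymity and L-diversity of an already-anonymized table could be measured. It is not taken from HAT: the function names, the column names, and the sample rows are all hypothetical, and L-diversity is computed in its simplest "distinct values" form. Both functions assume a non-empty list of rows.

```python
from collections import defaultdict


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    quasi-identifier values. A table is K-anonymous for any K up to this."""
    group_sizes = defaultdict(int)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        group_sizes[key] += 1
    return min(group_sizes.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values found in any
    quasi-identifier group (the 'distinct' variant of L-diversity)."""
    group_values = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        group_values[key].add(row[sensitive])
    return min(len(values) for values in group_values.values())


# Hypothetical, already-generalized records (illustration only).
rows = [
    {"zip": "130**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "130**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
]

k = k_anonymity(rows, ["zip", "age"])
l = l_diversity(rows, ["zip", "age"], "diagnosis")
print(f"k={k}, l={l}")
```

In this sample, every quasi-identifier group holds two rows, but the second group contains only one distinct diagnosis, so the table can be 2-anonymous and still reveal the sensitive value for that group. That gap is what L-diversity is meant to catch.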
Hadoop Anonymization Toolbox (HAT) is a configuration-driven framework built on top of Hadoop to address such big data anonymization challenges. HAT provides a simple and flexible way of composing Hadoop anonymization jobs from out-of-the-box anonymization primitives and tools. The framework is highly scalable and adapts easily to sudden spikes or fluctuations in data size. It is also designed to deal effectively with structure and schema evolution in the collected data. It is inherently extensible: existing primitives can be extended and new primitives added. HAT currently processes hundreds of terabytes of data on a daily basis. We believe that HAT fills a gap in the existing big data ecosystem for a scalable data anonymization solution, and we are excited to announce our plans to make it available to the open source community. We anticipate that open sourcing HAT will also help other organizations meet their data anonymization needs.
The objective of this talk is to educate potential enterprise cloud users about big data anonymization, its potential benefits and pitfalls, and ways to determine vulnerabilities. We’ll discuss data anonymization challenges and evaluate current anonymization tools. We’ll then describe HAT’s architectural design and implementation details, and share our experiences in building such a solution.
I have a range of interests centered on cloud computing, data/metadata management, semantics, and data integration, with an emphasis on using advancements in these areas to build solutions that are useful for customers. I am an Apache Sqoop PMC member and committer, and an Apache Flume PMC member and committer.
Currently I am working at Google’s Motorola Mobility as part of the Cloud team. Before that, and after finishing my PhD, I worked at Yahoo and Cloudera.
View a complete list of Strata + Hadoop World 2013 contacts