©2011, O'Reilly Media, Inc.
Big Noise always accompanies Big Data, especially when extracting entities from the tangle of duplicate, partial, fragmented, and heterogeneous information we call the Internet. The roughly 17 million physical businesses in the US, for example, appear on over 1 billion webpages and endpoints across 5 million domains and applications. Organizing such a disparate collection of pages into a canonical set of things requires a combination of distributed data processing and human domain knowledge. This presentation stresses the importance of entity resolution within a business context and provides real-world examples of, and pragmatic insight into, the process of canonicalization.
Tyler Bell is the Director of Product for Factual, an LA-based startup that is, amongst other things, building global coverage of the world’s places and local businesses. He previously taught archaeology at the University of Oxford and, more recently and topically, was the Product Lead for Yahoo’s Geo Technologies Group. He writes about semantic and geo technologies for O’Reilly Radar at http://radar.oreilly.com/tylerb/
Leo is a software engineer at Factual, where he works on data cleaning tools and entity resolution. Before joining Factual, Leo was a software engineer at Google and an early engineer at LinkedIn.