Hadoop is responsible for computing a wide array of data products at LinkedIn, including People You May Know (LinkedIn’s people recommendation service), People Who Viewed This Also Viewed (LinkedIn’s collaborative filtering), Who’s Viewed My Profile?, Career Center, LinkedIn’s job recommendations, and more. These products are immensely successful and extremely data intensive: People You May Know, for example, generates a significant portion of the invitations on LinkedIn and churns through over 50 TB of data every day.
In this talk, I will detail the pieces of infrastructure that make this possible (all open sourced), so that attendees can build their own data products. I will also share tips and tricks that we have learned, sometimes painfully, along the way. The talk is geared toward the intermediate Hadoop user who perhaps has a few jobs that compute some data but wants to learn how to turn them into a production process. There will also be some nuggets for advanced users on how LinkedIn deals with big data.
The talk will be divided into four “proverbs.”
Sam Shah is a Senior Software Engineer in the Search, Network, and Analytics Team at LinkedIn, working on applied data products. He is particularly involved in the relevance backends behind “People You May Know,” LinkedIn’s people recommendation service, and LinkedIn’s collaborative filtering system. He holds a Ph.D. from the University of Michigan.
Comments
I enjoyed this session, though it was less about Hadoop itself and more about the practical aspects of designing, developing, and deploying large data analysis processes. Seeing all the real-world constraints laid out on top of the basic data flow, and the complexity they add, was valuable: it is an important but often ignored or underestimated consideration in live, real-world systems.