Hadoop is gaining momentum with most companies as a means to do log analysis and business reporting. Hadoop is a great tool for solving these problems, but it can be used to build much more interesting data applications.
Hadoop is a general purpose, high performance data processing pipeline. At LinkedIn, the largest professional social network, we use Hadoop for several uncommon and interesting use cases. For instance, we look at marketing as a recommendation problem, not a sales problem. To do this, we use Hadoop for our recommendation, data processing, and content delivery pipelines, approaching marketing as a scientific process that helps us learn how to advertise better. To this end, we’ve developed a Hadoop-based system that generates and prioritizes marketing email messages. As another example, we use Hadoop to generate updates in a member’s news feed. This system can be used to deliver rich analytical insights to members or to quickly prototype an idea for a new update, all with a 1-line command that’s easy enough for even product managers to use. As one final example, we use Hadoop to power several recommendation systems, including People You May Know.
In this talk, we’ll describe how LinkedIn leverages Hadoop for these use cases. We’ll give detailed descriptions of the systems and tools that we have built to use Hadoop for production pipelines (such as Azkaban and Kafka), and share interesting things we’ve learned along the way. We’ll talk about how Hadoop allows us to come up with ideas, rapidly test them, and quickly turn them into scalable production processes.
Sam Shah is a principal engineer on the LinkedIn data team. He leads many of the site’s large-scale recommendation and analytics systems, which analyze hundreds of terabytes of data daily to produce products and insights that serve LinkedIn’s members. His work involves pure research, product-focused features, and infrastructure development, including social network analysis, recommendation engines, distributed systems, and grid computing. Some of the products under his purview include “People You May Know”, “Who’s Viewed My Profile?”, Skills, related searches, job recommendations, and more. Sam holds a Ph.D. in Computer Science from the University of Michigan.
Joseph Adler has many years of experience in data mining and data analysis at companies including DoubleClick, American Express, and VeriSign. He graduated from MIT with a B.Sc. and M.Eng. in Computer Science and Electrical Engineering. He is the inventor of several patents for computer security and cryptography, and the author of “Baseball Hacks” and “R in a Nutshell”. Currently, he is a senior data scientist at LinkedIn.