Getting training data for a recommender system is easy: if users clicked it, it’s a positive – if they didn’t, it’s a negative.
… Or is it? Users can only click what your existing algorithm chose to show them, so you've probably learned an algorithm that runs on top of your existing algorithm, now and every time you re-train. And what do you do when the data product you're building doesn't have any users yet? Do you really launch with random results, hand-label 50K examples, or ask a Turker to pretend they're User #1337?
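To make the labeling rule above concrete, here is a minimal sketch that assumes a hypothetical impression log in which each row records a user, an item, the rank at which the existing recommender showed it, and whether it was clicked. The names `Impression`, `naive_labels` and `unlabeled_items` are illustrative, not from any library, and the log is a made-up toy. The point it shows: clicked items become positives, shown-but-unclicked items become negatives, and items the existing algorithm never displayed get no label at all.

```python
from dataclasses import dataclass


@dataclass
class Impression:
    """One row of a hypothetical impression log."""
    user_id: int
    item_id: int
    position: int  # rank at which the existing recommender displayed the item
    clicked: bool


def naive_labels(impressions):
    """Click = positive (1); shown but not clicked = negative (0)."""
    return [(imp.user_id, imp.item_id, 1 if imp.clicked else 0) for imp in impressions]


def unlabeled_items(impressions, catalog):
    """Catalog items the existing algorithm never showed: the naive rule cannot label them."""
    shown = {imp.item_id for imp in impressions}
    return sorted(set(catalog) - shown)


# Toy, made-up log: two users, each shown two items by the existing recommender.
log = [
    Impression(user_id=1, item_id=10, position=1, clicked=True),
    Impression(user_id=1, item_id=11, position=2, clicked=False),
    Impression(user_id=2, item_id=10, position=1, clicked=True),
    Impression(user_id=2, item_id=12, position=2, clicked=False),
]
catalog = [10, 11, 12, 13, 14]

print(naive_labels(log))              # [(1, 10, 1), (1, 11, 0), (2, 10, 1), (2, 12, 0)]
print(unlabeled_items(log, catalog))  # [13, 14]
```

Every label in this sketch comes from an item the existing algorithm already chose to show, which is why a model trained on it inherits that algorithm's choices.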
Unlike having a better algorithm, having better training data can improve your results by orders of magnitude. Yet training data generation is often an afterthought—a footnote in a formula-filled publication.
In this talk, we use examples from production recommender systems to bring training data to the forefront: from overcoming presentation bias to the art of crowdsourcing subjective judgments to creative data exhaust exploitation and feature creation.
As one of the founding members of the LinkedIn data science team, Monica turns data into products, actionable insights and (news) stories.
Monica obtained her PhD in Computer Science from Carnegie Mellon, where she focused on text mining and applied machine learning. At LinkedIn, she pioneered data-driven products with multi-million dollar business impact and is currently building mathematical models that power LinkedIn's personalized recommendations. When she isn't naming projects after Harry Potter, Monica finds stories in the LinkedIn data about the most overused buzzwords, trending job titles, entrepreneur DNA, promotion cycles for Millennials and first names that tend to succeed. Her stories have appeared in thousands of media outlets, from the Wall Street Journal and The Economist to NPR and CNN to Real Simple and (yes!) Howard Stern.
For information on exhibition and sponsorship opportunities at the conference, contact Susan Stewart at firstname.lastname@example.org.
For information on trade opportunities with O'Reilly conferences contact Kathy Yu at mediapartners
For media-related inquiries, contact Maureen Jennings at email@example.com
View a complete list of Strata contacts