From Alpha to a Data-Driven Product

Ben Smith (Top10)
Nerdcore
Location: Thames Suite

The original Top10 Alpha was made with two things in mind: to prove the Top10 concept worked and to get funding. A year on, Top10’s code-base and technologies have evolved to mirror the focus and ambitions of the product: to help people discover new products, as well as collect and rank their favorite things. Where there was a LAMP website that took some user data, there is now a data processing pipeline that takes, processes and analyses all kinds of data to visualise across different devices and inside different ecosystems.

This talk uses the redevelopment of Top10 as a super-practical case-study of taking an idea from Alpha to a fully data-driven product, describing the process and the technologies involved.

The first goal was to get the platform to scale from ‘very few’ users to ‘not loads, but enough’. The obvious bottleneck was the hyper-relational MySQL DB, so the even-more-obvious solution was to denormalise the data, but in a way that would still be useful when we had a significant number of users. We built a distributed, real-time processing pipeline based on Akka, whose first job was to denormalise and index the data into Redis.
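
As a rough illustration of what "denormalise and index" can mean in practice, here is a minimal sketch. It is not Top10's code: the talk describes an Akka-based pipeline writing to Redis, whereas this uses plain Python with in-memory dictionaries standing in for the store, and every table, field and key name below is a hypothetical example.

```python
from collections import defaultdict

# Hypothetical normalised rows, as they might come out of a relational database.
USERS = {1: {"name": "ana"}, 2: {"name": "raj"}}
LISTS = {10: {"user_id": 1, "title": "Top 10 Films"}}
ITEMS = [
    {"list_id": 10, "rank": 2, "label": "Alien"},
    {"list_id": 10, "rank": 1, "label": "Jaws"},
]


def denormalise(list_id):
    """Join the rows for one list into a single self-contained document."""
    lst = LISTS[list_id]
    items = sorted(
        (i for i in ITEMS if i["list_id"] == list_id),
        key=lambda i: i["rank"],
    )
    return {
        "id": list_id,
        "title": lst["title"],
        "author": USERS[lst["user_id"]]["name"],
        "items": [i["label"] for i in items],
    }


def index_document(store, indexes, doc):
    """Write the document under its key and record the key in a per-author index."""
    key = f"list:{doc['id']}"
    store[key] = doc
    indexes[f"lists_by_author:{doc['author']}"].add(key)
    return key


store = {}                  # stand-in for a key-value store (assumption)
indexes = defaultdict(set)  # stand-in for set-style indexes (assumption)
index_document(store, indexes, denormalise(10))
```

Once the joins are done at write time, a read becomes an index lookup followed by a fetch by key, with no relational query involved.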

In December 2011, Spotify launched their app platform. Our app was one of the first 10 to launch, so our ‘not loads, but enough’ users soon became ‘pretty sizable’. To complement the concurrency and asynchronicity of the processing pipeline, we built the API layer in Scala and replaced MySQL with Cassandra, as the Redis indexes meant we only needed to query by key (and we were well into multiple data-centers by then).

We then needed to rebuild the website to be responsive, mobile-focused and able to take in lots more data from lots more users. It was a no-brainer to redevelop the web-layer in Node.js to maintain a non-blocking infrastructure from the top to the bottom.

Now we can really get some value from the data. With the real-time processing pipeline and asynchronous stack we’ve been writing all kinds of adapters to:
  • combine incoming data with other data sources (Facebook, Freebase, Amazon, ...)
  • categorise it using Bayesian filters
  • parse and understand sentences in the data using NLP libraries
  • apply incremental updates on top of batch-processed machine learning (Hadoop / Mahout)
  • build up separate search indexes (Lucene)
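
To make the adapter idea concrete, the sketch below chains two adapters over a record: one that merges in data from another source, and one that categorises the record with a small naive Bayes classifier. It is an assumption-laden illustration in plain Python, not the talk's Akka implementation; the class, function and field names, the training phrases and the lookup table are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training docs seen
        self.vocab = set()

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for cat, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[cat].values())
            score = math.log(n_docs / total_docs)  # log prior
            for w in words:
                count = self.word_counts[cat][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = cat, score
        return best


def enrich_adapter(lookup):
    """Adapter that merges in extra fields from another (hypothetical) source."""
    def adapter(record):
        return {**record, **lookup.get(record["title"], {})}
    return adapter


def categorise_adapter(classifier):
    """Adapter that tags a record with the classifier's best category."""
    def adapter(record):
        return {**record, "category": classifier.classify(record["title"])}
    return adapter


def run_pipeline(record, adapters):
    """Pass the record through each adapter in order."""
    for adapter in adapters:
        record = adapter(record)
    return record


clf = NaiveBayes()
clf.train("top ten horror films", "film")
clf.train("best films of the decade", "film")
clf.train("favourite rock albums", "music")
clf.train("top ten guitar songs", "music")

lookup = {"best horror films ever": {"source": "example-catalogue"}}
result = run_pipeline(
    {"title": "best horror films ever"},
    [enrich_adapter(lookup), categorise_adapter(clf)],
)
```

Because each adapter takes a record and returns a new one, further steps (sentence parsing, search indexing and so on) could be added to the list without changing the existing ones.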

Ben Smith

Top10

Ben Smith is the Head of Technology for Top10. Previously he led the development of social-networking and identity-related web-services for the BBC and then became the CTO for MetaBroadcast.
