The Evolution of Hadoop at Stripe: Replicating MongoDB into HBase in Realtime, and How We Bolted Analytics onto an Existing System

Colin Marc (Stripe)
Hadoop & Beyond Murray Hill Suite
As a payments provider, Stripe has a veritable goldmine of data and lots of uses for it, from checkout conversion analysis to fraud prevention. But until recently, that data was stored only in disparate production systems, and the few aggregates we did have were very ad hoc.

We chose to approach this problem iteratively, in order to better understand the requirements and constraints and to explore the different technologies available. With some work (and lots of mistakes), we were able to build a system that streams data into HBase from our production services in real-time, making it available for analytics using MapReduce, Impala and other technologies in the ecosystem.

In my presentation, I’ll discuss the various architectures and technologies we tried, what worked well, and the lessons we learned.

Colin Marc

Stripe

Colin Marc is a developer at Stripe, where he’s recently been spending his time building analytics and modeling infrastructure. Besides programming all the things, Colin is also interested in Tuvan throat-singing and iambic tetrameter.

Comments

Marek Kolodziej
10/30/2013 8:32pm EDT

Would it be possible to post the slides here, like the other speakers have?

Contact Us

View a complete list of Strata + Hadoop World 2013 contacts