The physics approach to big data

Adam Kocoloski (Cloudant)
Data Science
Location: King's Suite - Balmoral

Physicists are natural problem solvers, equipped to think through which tools will work for particular data challenges. In the era of big data, these challenges are increasingly relevant. This talk will examine how particle physics research at Brookhaven National Laboratory and at CERN has informed how we, as physicists, see the evolution of data science.

In a way, we had it easy. Analyzing isolated collisions translated well to distributed university research systems and parallel models of computing. By comparison, the growing body of data collected in today’s Web applications and sensor networks is much harder to analyze: it involves modeling transactions between people and causality between events. In other ways, we shared the challenge of filtering big data to find useful information, which we addressed with blind analysis and machine learning.

We used blind analyses to protect against our inherent selection bias. As humans, we’ve evolved to recognize patterns. Data scientists are people too, and they shouldn’t become overly reliant on data visualization. It’s too easy for us to see things that aren’t really there. So we worked on ways to recognize the noise in our experiments - the data we didn’t want - so that we could select the data we wanted to keep by rejecting everything else.

We didn’t turn to machine learning because we thought it would be fun. We had to get more efficient at throwing away data we didn’t need. Writing everything to disk and refining it later was too cumbersome, even for the most advanced, internationally funded research systems. Most of what we would have saved would have been noise.

As physicists, we learned that, given where big data is headed, there’s no way we’ll be able to keep writing it all down. That’s the time vs. intelligence tradeoff today’s data scientists must learn: you need to decide what to throw away at the moment you collect the data.

Adam Kocoloski

Cloudant

Adam is an Apache CouchDB developer and one of the founders of Cloudant. He is the lead architect of a Dynamo-flavored clustering solution for CouchDB that serves as the core of Cloudant’s distributed data hosting platform. Adam received his Ph.D. in Physics from MIT in 2010, where he studied the gluon’s contribution to the spin structure of the proton using a motley mix of server farms running Platform LSF, SGE, and Condor. He and his wife Hillary are the proud parents of two beautiful girls.
