The $3 Million Heritage Health Prize

Location: Mission City Ballroom
Please note: This and all other keynotes will be live streamed and recorded.

Data competitions come of age: from movie recommendations to life and death. Possibly the biggest news at Strataconf is Heritage Provider Network’s $3 million predictive modeling prize – the biggest data mining competition ever. It requires data scientists to build algorithms that predict who will go to hospital in the next year, so that preventive action can be taken.

O’Reilly Strata Making Data Work Conference

February 2, 2011

Anthony Goldbloom

Jonathan Gluck (Heritage Provider Network)

Good morning. My name is Jonathan Gluck, and I’m from Heritage Provider Network, which is one of the largest physicians’ groups in the United States. Heritage Provider Network is not an insurance company, but instead is a group of doctors that provides the actual medical care to the members.

So why is a doctors’ group at a data conference? The answer is that better predictive modeling and data mining is precisely what health care needs. The health care system in the United States is currently not a health care system, but instead we can think of it as a sick care system. We provide care to people after they have already become ill. What we need to do is move towards a system that seeks to keep people healthy and prevents them from becoming sick.

So, the challenge we have for you is conceptually simple. Create an algorithm that will predict who will go to the hospital from a given patient population. If an algorithm of that sort can be created, doctors can intercede with the patients and seek to keep them healthy and prevent them from going to the hospital. That will help us move towards a health care system. If you can create the winning algorithm, you’ll win $3 million. There will also be milestone prizes along the way for teams leading at given points in time.

Today, we’re excited to announce that the competition phase will begin on April 4th, 2011. Now let me turn it over to Anthony, who is going to give you a few more details about the prize.

Anthony Goldbloom (Kaggle)

Thanks a lot, Jonathan. I’m Anthony Goldbloom, Kaggle’s founder, and we’ll be running the prize. So, if you choose to participate, you’ll see a lot more of us. Well, first off, welcome to the biggest ever data prize. This is really exciting.

Three things I wanted to say. First of all, why a prize? For those of you who don’t know about Kaggle, we run data predictive modeling competitions. So, we’ve run a whole range of things, and what we find is that by throwing the problem open to a wide audience, everybody tries their own different techniques and you very quickly make rapid improvements on the benchmark. We’ve never hosted a competition where there’s been a benchmark, a model that’s been built in-house, that hasn’t been outperformed.

On the slide at the moment, you can see some of our champions. They come from all over the place. These are the people who have finished in the top three in our competitions. So, we have a physicist from Portugal, a mathematician from Israel, an engineer from New Mexico. Very often, the best answers aren’t coming from where you might expect them to come from.

Our first competition was won by a 25-year-old computer scientist from the University of Ljubljana in Slovenia. He beat a team from MIT, a team from SAS, and I suspect, if I was contracting for that job, I would not have picked him, and I would have missed out on the best model.

What sort of data can you expect to see? HPN are releasing data on things like patients’ hospital visits, doctors’ visits, what lab tests they’ve had, have they had a blood test, what meds they’re on. Here’s an example of a sick patient that your algorithm might flag as being at risk. They might have diabetes, high cholesterol, hypertension, and they’re not filling their prescriptions. It’s pretty obvious that this guy’s crook, and there’s a high chance he or she is going to hospital. There are many others like him in the dataset, less obvious, and I guess we want you to find them.

I don’t think it’s a stretch to say this is probably the most exciting thing to happen in 2011 in this space. How many here have heard of the Netflix Prize? I suspect a majority. Yeah. How many competed? Yes, so there’s a smattering.

The Netflix Prize solved a grave problem. It helped prevent you from seeing bad movies. The $3 million Heritage Health Prize, aside from the fact the prize is three times bigger, solves a somewhat graver problem. It prevents people from going to the hospital and getting really sick where it’s preventable. I’m very excited about this, and I hope to see many more of you entering the competition.

I have just a couple of things to say. There’s a Heritage Health Prize booth in the Exhibitor Hall. For anybody who wants more information, just come by and have a chat, and I’ll be giving a talk at 1:40 where I’ll set more of this stuff out.

Meeting transcription services provided by:
Speechpad - Transcription Services

Anthony Goldbloom

Kaggle

Anthony is the Founder and CEO of Kaggle. He assists companies with framing modeling tasks as data prediction competitions, ensuring that competitions reflect real-life projects. Before founding Kaggle, Anthony worked in the macroeconomic modelling areas of the Reserve Bank of Australia and before that the Australian Treasury. In these roles, Anthony was responsible for building macroeconomic models, generating economic forecasts and simulating the impact of changes in interest rates and fiscal policy on the Australian economy. Anthony holds a first class honours degree in economics and econometrics from the University of Melbourne and has published in The Economist magazine and the Australian Economic Review.
