When doing predictive modelling, there are two situations in which you might find yourself:
For case (1), lasso and elastic-net regularized generalized linear models are a set of modern algorithms which meet all these needs. They are fast, work on huge data sets, and avoid over-fitting automatically. They are available in the “glmnet” package in R.
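To give a feel for how a lasso penalty limits over-fitting, here is a minimal sketch in plain Python. It is not the glmnet API and is not taken from the tutorial materials; it only illustrates coordinate descent with soft-thresholding, the general approach glmnet is documented as using. The synthetic data, the penalty value `lam=0.1`, and all function names are assumptions made for this illustration.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything within [-gamma, gamma] becomes 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimise (1/(2n)) * sum((y - X b)^2) + lam * sum(|b|), with no intercept.

    Cyclic coordinate descent: update one coefficient at a time while
    holding the others fixed, and repeat for a fixed number of sweeps.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            # Covariance of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Assumed synthetic data: only the first two of five features matter.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.1) for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.1)
print([round(b, 3) for b in beta])
```

In this sketch the penalty pulls the two real coefficients slightly toward zero and sets the coefficients of the three irrelevant features to zero, which is the sense in which the lasso selects variables while it fits. In practice the penalty strength is usually chosen by cross-validation rather than fixed by hand.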
For case (2), ensembles of decision trees (often known as “Random Forests”) have been the most successful general-purpose algorithm in modern times. For instance, most Kaggle competitions have at least one top entry that heavily uses this approach. This algorithm is very simple to understand, and is fast and easy to apply. It is available in the “randomForest” package in R.
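The idea behind an ensemble of decision trees can be sketched in a few dozen lines of plain Python. This is an illustrative toy, not the randomForest package and not the presenters' code: each tree is grown on a bootstrap sample of the rows, each split looks at a random subset of the features, and the forest averages the trees' predictions. The data set, the tree depth, the number of trees, and all names below are assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, depth, max_features, rng):
    """Grow a regression tree; every split sees a random subset of features."""
    if depth == 0 or len(set(targets)) == 1:
        return mean(targets)  # leaf: predict the average target
    best = None  # (error, feature index, threshold)
    for f in rng.sample(range(len(rows[0])), max_features):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:
        return mean(targets)  # no usable split among the sampled features
    _, f, t = best
    left_idx = [i for i, r in enumerate(rows) if r[f] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[f] > t]
    if not left_idx or not right_idx:
        return mean(targets)
    return (
        f,
        t,
        build_tree([rows[i] for i in left_idx], [targets[i] for i in left_idx],
                   depth - 1, max_features, rng),
        build_tree([rows[i] for i in right_idx], [targets[i] for i in right_idx],
                   depth - 1, max_features, rng),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf; leaves are plain numbers."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, targets, n_trees=30, depth=3, max_features=2, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx],
                                 depth, max_features, rng))
    return forest


def predict_forest(forest, row):
    return mean([predict_tree(tree, row) for tree in forest])


# Assumed toy data: the target depends only on the first of three features.
data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(3)] for _ in range(120)]
targets = [1.0 if r[0] > 0.5 else 0.0 for r in rows]

forest = fit_forest(rows, targets)
high = predict_forest(forest, [0.9, 0.5, 0.5])
low = predict_forest(forest, [0.1, 0.5, 0.5])
print(round(high, 2), round(low, 2))
```

Individual trees in this sketch can be misled when a split happens not to see the informative feature, but averaging many trees smooths those errors out, which is the main reason the ensemble tends to be more reliable than any single tree.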
Mike and Jeremy will explain in simple terms, using no complex math, how these algorithms work, and will also explain using numerous examples how to apply them using R. They will also provide advice on how to select from these algorithms, and will show how to prepare the data, and how to use the trained models in practice.
Jeremy Howard is President and Chief Scientist at Kaggle. Previously, he founded FastMail (sold to Opera Software) and Optimal Decisions (sold to ChoicePoint, now called LexisNexis Risk Solutions). Prior to that he worked in management consulting, at McKinsey & Company and A.T. Kearney. Jeremy’s passion is applying algorithms to data. At FastMail he used algorithms to automate nearly every part of the business; as a result the company needed a total of only 3 full-time staff, and got over a million signups. Optimal Decisions was a business built entirely to commercialise a new algorithm he designed for the optimal pricing of insurance. Jeremy competes regularly in data mining competitions, which he uses to test himself and stay on the leading edge of machine learning and predictive modelling technology. He is currently ranked #1 on Kaggle’s overall competitor rankings, out of over 16,000 data scientists.
Dr Mike Bowles’ career is one of the most extraordinary in Silicon Valley. It started out in research, as an assistant professor at MIT. He went on to found and run two companies, both of which went on to huge IPOs. The first was Com21, an early pioneer in developing cable modem networks, which Mike led to a successful NASDAQ IPO at a $300m valuation. He then created IBeam Broadcasting, a video distribution network, which after just 2.5 years he led to a $3b IPO. More recently he has been active as co-founder and instructor for the series of data mining courses run at Hacker Dojo. These courses are nearly always sold out, and have received great feedback from participants.
Comments
Hi, still very interested in seeing the correct video, i.e. the first one from this series, which goes into the random forests algorithm. Is there a link?
The video link is for the wrong presentation. This was a tutorial, available on the conference DVD/for online viewing. Someone needs to remove the video link from this page. Also, someone should replace the PDF that doesn’t read properly with the DOC Jeremy uploaded on another site.
The video link appears to be incorrect: it points to the session “From Predictive Modelling to Optimization”. Is there a correct link for the video?
@Pedro The video can be seen at youtu.be/vYrWTDxoeGg – we’ve also just updated this session page with the links to the video & the free book.
Hi there, will the video be made available? Even if not here, the Strata newsletter announced that one could get Jeremy’s new book and also see the video, but I cannot see the link for the video. Any help?
Thanks Jeremy. The doc file comes through cleanly. Just tried the ZIP file again. The PDF still renders with boxes around “?” where each character should be. Odd that I’m the only one reporting the problem.
Once again, great talk! Thanks Mike and Jeremy.
That’s odd. The PDF is working for me. I’ve popped the .doc file here: public.jhoward.fastmail.fm/... . Let me know if that works ok for you.
Anyone else having trouble with the PDF? The text does not display correctly. I just get a bunch of ????.
Where have the slides/code been posted? Please let me know. This was a great session!!
The slides and code for this presentation will be available shortly.
Hi, I was wondering when and where you’ll be posting the slides, code, and data from this talk. Thank you!