When doing predictive modelling, there are two situations in which you might find yourself:
For case (1), lasso and elastic-net regularized generalized linear models are a set of modern algorithms which meet all these needs. They are fast, work on huge data sets, and avoid over-fitting automatically. They are available in the “glmnet” package in R.
For case (2), ensembles of decision trees (often known as “Random Forests”) have been the most successful general-purpose algorithm in recent years. For instance, most Kaggle competitions have at least one top entry that relies heavily on this approach. The algorithm is simple to understand, and is fast and easy to apply. It is available in the “randomForest” package in R.
Mike and Jeremy will explain in simple terms, without complex math, how these algorithms work, and will show through numerous examples how to apply them in R. They will also advise on how to choose between these algorithms, how to prepare the data, and how to use the trained models in practice.
Jeremy Howard is President and Chief Scientist at Kaggle. Previously, he founded FastMail (sold to Opera Software) and Optimal Decisions (sold to ChoicePoint, now called LexisNexis Risk Solutions). Prior to that he worked in management consulting, at McKinsey & Company and A.T. Kearney. Jeremy’s passion is applying algorithms to data. At FastMail he used algorithms to automate nearly every part of the business; as a result the company needed a total of only three full-time staff, and got over a million signups. Optimal Decisions was a business built entirely to commercialise a new algorithm he designed for the optimal pricing of insurance. Jeremy competes regularly in data mining competitions, which he uses to test himself and stay on the leading edge of machine learning and predictive modelling technology. He is currently ranked #1 on Kaggle’s overall competitor rankings, out of over 16,000 data scientists.
Dr Mike Bowles’ career is one of the most extraordinary in Silicon Valley. Mike started out in research, as an assistant professor at MIT. He went on to found and run two companies, both of which went on to huge IPOs. The first was Com21, an early pioneer in developing cable modem networks, which Mike led to a successful NASDAQ IPO at a $300m valuation. He then created IBeam Broadcasting, a video distribution network, which he led to a $3b IPO after just 2.5 years. More recently he has been active as co-founder and instructor for the series of data mining courses run at Hacker Dojo. These courses are nearly always sold out, and have received great feedback from participants.