In this tutorial, we show how open source tools can be used for the entire life cycle of a predictive model built over big data. Specifically, for anyone who has built a model, we show how to: 1) perform an exploratory data analysis (EDA) of data managed by Hadoop using R and other open source tools; 2) leverage the EDA to build analytic and statistical models over data managed by Hadoop; 3) deploy these models into operational systems; and 4) measure the performance of the models and continuously improve them.
We cover the following topics:
Robert Grossman (@bobgrossman) is the Founder and a Partner of Open Data Group, which specializes in building predictive models over big data. He is a Core Faculty and Senior Fellow at the Institute for Genomics and Systems Biology (IGSB) and the Computation Institute (CI) at the University of Chicago. He has led the development of new open source software tools for analyzing big data, cloud computing, data mining, distributed computing and high performance networking. Prior to starting Open Data Group, he founded Magnify, Inc. in 1996, which provides data mining solutions to the insurance industry. Grossman was Magnify’s CEO until 2001 and its Chairman until it was sold to ChoicePoint in 2005. He blogs about big data, data science, and data engineering at rgrossman.com.
Collin Bennett is a principal at Open Data Group. In three and a half years with the company, Collin has worked on the open source Augustus scoring engine and a cloud-based environment for rapid analytic prototyping called RAP. Additionally, he has released open source projects for the Open Cloud Consortium. One of these, MalGen, has been used to benchmark several parallel computation frameworks. Previously, he led software development for the Product Development Team at Acquity Group, an IT consulting firm headquartered in Chicago. He also worked at the startups Orbitz (when it was still one) and Business Logic Corporation. He has co-authored papers on Weyl tensors, large data clouds, and high performance wide area cloud testbeds. He holds degrees in English, Mathematics, and Computer Science.