Introduction to R for Data Mining

Joseph Rickert (Revolution Analytics)
Data Science, Ballroom G
Please note: to attend, your registration must include Tutorials.
Average rating: 4.50 (4 ratings)
Attendees: We will be using Revolution R Enterprise 5.0.1 during the tutorial. Download a 90-day evaluation copy of Revolution R Enterprise software as well as the scripts and other auxiliary material. The file STRATA2012scripts.zip contains 40 R scripts that form the basis of the tutorial. The auxiliary material is in two files:

  1. pdfs.zip contains some optional reference material
  2. movies.zip contains data used in a database example. It is not essential for the tutorial but is included for completeness.

The R scripts were tested on R 2.13.2. However, there should be no problem running them on R 2.14.1. The scripts also require a number of packages to be loaded. The primary package is rattle. When rattle loads, it also loads most of the other packages that will be needed. Additionally, near the top of each script, required packages are identified by a line of code similar to library(ggplot2). CRAN packages can be found and downloaded at http://cran.r-project.org/web/packages/.
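As a rough sketch of the setup described above, the following R snippet reports which required packages are missing and compares the running R version with the tested one. The helper name `missing_packages` and the `needed` vector are illustrative assumptions; only rattle and ggplot2 are named on this page, so adjust the list to match the library() lines at the top of each script.

```r
# Packages assumed for illustration; only rattle and ggplot2 are named on
# this page. Extend the vector from the library() lines in each script.
needed <- c("rattle", "ggplot2")

# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # Uncomment to install from CRAN (needs a network connection):
  # install.packages(to_install)
}

# The scripts were tested on R 2.13.2; warn if the running version is older.
if (getRversion() < "2.13.2") {
  warning("R ", as.character(getRversion()),
          " is older than the tested version 2.13.2")
}
```

Once the packages are installed, each script's own library() calls load them. The install step is left commented out so the check can be run without changing the local library.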

Only 5 of the 40 scripts require Revolution R Enterprise. All of the others can be run in open source R, which you may download directly from CRAN.

This tutorial will enable anyone with some programming experience to begin analyzing data with the R programming language.

Syllabus

  • Where did R come from?
  • What makes R different from other statistical software?
  • Data structures in R
    • Reading and writing data sets
    • Manipulating Data
  • Basic statistics in R
    • Exploratory Data Analysis
    • Multiple Regression
    • Logistic Regression
  • Data mining in R
    • Cluster analysis
    • Classification algorithms
  • Working with Big Data
    • Challenges
    • Extensions to R for big data
  • Where to go from here?
    • The R community
    • Resources for learning R
    • Getting help
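
To give a flavor of three of the syllabus topics (multiple regression, logistic regression, and cluster analysis), here is a minimal sketch in base R. It uses the built-in mtcars and iris data sets and is not taken from the tutorial scripts; the choice of variables and the number of clusters are assumptions made purely for illustration.

```r
# Multiple regression: fuel economy modelled on weight and horsepower.
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(fit_lm))

# Logistic regression: probability of a manual transmission given weight.
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)
print(coef(fit_glm))

# Cluster analysis: k-means with three clusters on the scaled iris measurements.
set.seed(42)  # k-means starts from random centres, so fix the seed
km <- kmeans(scale(iris[, 1:4]), centers = 3)
print(table(km$cluster, iris$Species))
```

All three functions ship with the standard R distribution, so this runs without Revolution R Enterprise or any additional packages.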

Joseph Rickert

Revolution Analytics

I am a marketing manager at Revolution Analytics with a passion for analyzing data. I have worked at a number of successful Silicon Valley start-ups including Sytek, Alantec, Parallan Computer and Scotts-Valley Instruments. I have graduate degrees in both the Humanities and Statistics. I taught statistics briefly at SJSU and I blog at blog.revolutionanalytics.com.

Comments on this page are now closed.

Comments

Bharath Mundlapudi
03/17/2012 1:41pm PDT

Hi, how can I download the file STRATA2012scripts.zip? Please advise.

Sophia DeMartini
02/29/2012 10:56am PST

Hi Michael,

I'm sorry, the information I provided was actually incorrect.

This is actually called “Complete Video Compilation” and it is not available on oreilly.com. It can be pre-ordered at the following URL:

strataconf.com/strata2012/p...

Please let me know if you have any other questions. I can also be reached via email at sophia@oreilly.com.

Thank you, Sophia

Sophia DeMartini
02/29/2012 10:13am PST

Hi Michael,

We're recording all tutorials and sessions at Strata, and they'll be available after the conference as part of the All Access video compilation, which can be purchased on oreilly.com.

Best, Sophia

Micheál Keane
02/28/2012 6:23pm PST

Couldn’t make this because I was at the Deep Data series. Will a video of this be posted?

Zebulon Young
02/28/2012 4:16pm PST

Nathan: The ZIP files identified above are each available by following the link at the top of the instructions (“Download a 90-day evaluation copy of Revolution R Enterprise”), after completing a registration form. I think that since they are all behind this registration form, it wasn’t appropriate to link to them each directly. Hope this information helps.

Nathan Wenzel
02/28/2012 4:09pm PST

I see the names of the code zip files, but they don’t appear to be links. Where can the .R code from the presentation be found?

Devender Gollapally
02/27/2012 4:03pm PST

R studio seems nice

Joseph Rickert
02/27/2012 8:31am PST

Sorry for the late download notice. (1) Revolution R Enterprise is not required for running most of the scripts. I will use it to show R handling a fairly large data set.

(2) The other scripts will run on standard CRAN R.

(3) Revolution R Enterprise will not run on a Mac.

(4) Rattle would be nice to have loaded if you can. I am running it with Revolution R Enterprise, which has R 2.13.2 underneath.

Leigh Dodds
02/27/2012 7:28am PST

If anyone else on Ubuntu has Rattle installation issues, then the steps here worked for me:

groups.google.com/group/rat... users/msg/c6cff79f3960295b

Leigh Dodds
02/27/2012 7:22am PST

I only have Ubuntu available so can’t use Revolution. The community edition which does run on Ubuntu seems to be too old.

I’ve also encountered GTK+ issues when trying to install Rattle. This seems to be a common issue with the latest version.

Are there other options?

Matthew Feinberg
02/27/2012 6:59am PST

I’m in a similar boat to Richard. Just found out about the Windows/RHEL5 requirement when all I have is Ubuntu and OSX. Hopefully there won’t be much focus on Revolution specific features, because there’s no time to work around this requirement.

Richard Marciano
02/27/2012 4:17am PST

Can I run Revolution R Enterprise 5.0.1 on Mac OS X?

Mohit Anchlia
02/26/2012 5:56pm PST

Do I need to bring my own laptop or are machines available in the room?

Sophia DeMartini
02/23/2012 2:46pm PST

Hi Manish,

No deep knowledge of statistics is required. Some experience of working with data and a familiarity with regression analysis would be helpful.

Manish Bhatt
02/23/2012 1:06pm PST

How much statistics do I need to know in order to attend this session?
