Visualizing Geo Data

Jason Sundram (Facebook)
Presentation: Visualizing Geo Data [PDF]

In an increasingly mobile world, we are each generating tons of geo-tagged data. Photo uploads to Instagram, tweets, Foursquare check-ins, local searches, and even real-time public-transportation feeds are commonplace. The companies that gather this data make a lot of it freely available. The people who work for these companies have many opportunities to learn from this data. But in order to learn, we must first figure out what questions to ask. Visualization is a tool that helps us think of questions and begin to answer them.

There are three major ways to think about geodata:
  1. Over time
  2. Aggregated spatially (e.g. by county)
  3. Aggregated by density (e.g. heatmap)

Additionally, creating tools that allow users to explore data on multiple scales (i.e. zoom) is important, but adds complexity: you have to find a tile source and perhaps even render your data to tiles.
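To make the tile idea concrete, here is a minimal sketch of how a longitude/latitude pair maps to a tile index at a given zoom level. It assumes the widely used XYZ ("slippy map") tiling over Web Mercator; the abstract does not say which tile scheme the talk uses, and the function name lonlat_to_tile is only illustrative.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return (x, y) tile indices in the XYZ / slippy-map scheme.

    Assumption: Web Mercator tiling with tile (0, 0) at the top-left
    corner of the world and 2**zoom tiles along each axis.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the eastern edge or near the poles
    # still land on a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


if __name__ == "__main__":
    # Which tile holds a point near Boston at zoom level 10?
    print(lonlat_to_tile(-71.06, 42.36, 10))
```

Each extra zoom level quadruples the number of tiles, which is one reason rendering your own data to tiles adds work.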

Choice of projection is key. Most of us grew up with the Mercator projection, but an equal-area projection is often a better choice.
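As a rough illustration of why this matters, the sketch below compares Mercator with one simple equal-area alternative, the Lambert cylindrical equal-area projection. That particular projection is chosen here only because its formula is short; the abstract does not name a specific equal-area projection. Both functions assume a unit sphere.

```python
import math


def mercator(lon_deg, lat_deg):
    """Spherical Mercator on a unit sphere: returns (x, y)."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    return lam, math.log(math.tan(math.pi / 4.0 + phi / 2.0))


def lambert_cylindrical_equal_area(lon_deg, lat_deg):
    """Lambert cylindrical equal-area on a unit sphere: returns (x, y)."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    return lam, math.sin(phi)


def mercator_area_inflation(lat_deg):
    """Factor by which Mercator enlarges small areas at a latitude.

    Mercator stretches both axes by 1 / cos(latitude), so areas grow
    by the square of that factor relative to the equator.
    """
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


if __name__ == "__main__":
    for lat in (0, 30, 60, 75):
        print(lat, round(mercator_area_inflation(lat), 2))
```

At 60 degrees latitude, Mercator draws a region about four times larger than the same region at the equator, which skews any map where area is read as quantity.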

I will take one data set and walk through visualizing it using the three approaches described above.

The first example will use Processing and TileMill to generate a zoomable animated map, playing back a month’s worth of data. I’ll show how to render the map to a movie for easy distribution.

The second example will use d3.js to show the same data at a county level in a choropleth map. I’ll discuss color schemes and interaction, and compare what can be done with d3.js to Fathom’s Stats of the Union project.
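One decision behind any choropleth color scheme is how county values are grouped into color classes. The sketch below shows quantile classing in Python, as one possible approach and not necessarily the one used in the talk (which uses d3.js). The function name quantile_classes and the county keys are hypothetical.

```python
import bisect
import statistics


def quantile_classes(values_by_county, n_classes=5):
    """Map each county key to a class index in 0 .. n_classes - 1.

    Class breaks are quantiles of the observed values, so each class
    holds roughly the same number of counties. Requires Python 3.8+
    for statistics.quantiles.
    """
    values = list(values_by_county.values())
    # n_classes - 1 cut points that split the data into n_classes groups.
    breaks = statistics.quantiles(values, n=n_classes)
    return {
        county: bisect.bisect_right(breaks, value)
        for county, value in values_by_county.items()
    }


if __name__ == "__main__":
    # Hypothetical per-county counts; the keys are placeholders.
    counts = {"county_%02d" % i: i for i in range(1, 11)}
    print(quantile_classes(counts))
```

Each class index would then be looked up in a sequential color ramp. Quantile breaks keep skewed data from collapsing into one or two colors, at the cost of uneven class widths.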

The last example will talk about how to make a heatmap with millions of data points.
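A common way to keep a heatmap tractable with millions of points is to bin the points into a fixed grid in a single pass and draw the per-cell counts, not the individual points. The sketch below illustrates that idea under stated assumptions: the abstract does not describe the talk's actual method, and bin_points with its bounding-box parameters is a hypothetical helper.

```python
from collections import Counter


def bin_points(points, west, south, east, north, cols, rows):
    """Count (lon, lat) points per grid cell inside a bounding box.

    Returns a Counter keyed by (row, col), with row 0 at the northern
    edge to match image coordinates. Points outside the box are skipped.
    """
    cell_w = (east - west) / cols
    cell_h = (north - south) / rows
    counts = Counter()
    for lon, lat in points:
        if not (west <= lon < east and south <= lat < north):
            continue
        col = min(int((lon - west) / cell_w), cols - 1)
        row_from_south = min(int((lat - south) / cell_h), rows - 1)
        counts[(rows - 1 - row_from_south, col)] += 1
    return counts


if __name__ == "__main__":
    sample = [(0.5, 0.5), (0.6, 0.4), (1.5, 1.5), (5.0, 5.0)]
    print(bin_points(sample, 0.0, 0.0, 2.0, 2.0, cols=2, rows=2))
```

Memory use depends on the number of occupied cells, not on the number of points, so the same loop works on a stream read from disk.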

Jason Sundram

Facebook

I’m a senior data scientist at eBay/PayPal. The work I do looks closely at data generated by mobile users. I’ve worked on the WHERE PlaceGraph (http://site.where.com/blog/the-where-placegraph/), a tool that reveals the connections between places based on searches and check-ins.

In general, I do data visualization with big data, using Python and R for analysis, and Processing and JavaScript/canvas for display and interaction.

I’m also an accomplished violinist. My interest in music led me to work on creating The Echo Nest’s Music Analyzer, which listens to music the way people do, and extracts summary data that can be used to find out how danceable a song is. I co-created visualizer.fm, a site that syncs music to various visualizations of the Echo Nest’s analysis data. It’s hypnotic and interesting.
