Jon Bruner

How To Build an Interactive Map with Open-Source Tools

November 17, 2011 | 2:35 pm

My interactive migration map for Forbes, showing inbound (blue) and outbound (red) migration to and from Maricopa County, Arizona

My latest interactive migration map on Forbes.com improves on the previous version in a few ways: it’s got five years of data instead of one; a brand-new layout; and some much-requested features like a search tool and the ability to switch off the lines. But the upgrade that I’m most excited about is in the code: I built the map using nothing but open-source software, from Python and MySQL to handle the data right down to JavaScript to display the map. I’ve been steadily moving much of my data handling to Python and MySQL, but this is the first map I’ve made using JavaScript, and interactive JS maps are still rare elsewhere, too.

The previous map was built in Flash, and I used some other proprietary software to handle the data and tweak the presentation. Moving to JavaScript for interactive applications saves money you’d otherwise spend on Flash licenses, and it makes your work more widely available: this map works on the iPad, for instance (albeit very slowly, since it’s computationally intensive and involves fairly large downloads). Here, in case it’s useful for anyone else who makes these sorts of things, is a rundown of how I built the map.

Overview

This year’s map is similar in basic function to last year’s. When you visit the page, JavaScript code renders a county map of the United States and prepares it for interaction. When you roll over a county, an event listener fires, displaying a callout with the name of the county and turning the county’s edges red. When you click on a county, your browser downloads a corresponding file that includes a list of other counties to which and from which people migrated, along with relevant stats (income per capita of migrants) and the figures that are shown above the map (year-by-year migration, population). Your browser fills out the stats at the top of the screen, draws a graph (or animates a change from the previous graph, if you’ve already clicked on a county), and loops over the counties in the file, filling them with some shade of red or blue to indicate net inward or outward migration.

My JavaScript code deals with two big datasets: one—the migration data—is downloaded and rendered on the fly every time you click on a county. The other consists of the contours of the map itself: the locations of the boundaries that define the 3,143 counties in the United States.

The Map

I started by building a generalized interactive map of U.S. counties, where each county listens for rollover and click events and the appearance of each county can be changed programmatically. This is the kind of interaction that Flash has been critical for in the past, but faster browsers that better comply with web standards now make it possible to build this sort of map in JavaScript.

You can build a map like this with HTML5 Canvas, or, more promisingly, publish the map as an SVG image and use a library like jQuery to manipulate the appearance of the counties with CSS. But neither of those techniques works in Internet Explorer 7 or 8, which together still account for a significant share of users (roughly 15% of this map’s visitors). To get around this browser compatibility issue, I used the excellent Raphaël JavaScript library to draw counties and handle interactions with them. Raphael renders images as SVG for users with modern browsers and as VML for Internet Explorer users, and it provides a useful set of functions for interacting with shapes once they’ve been drawn.

We want Raphael to create each county as a polygon (or group of polygons). For this, we need polygon definitions for each county, and we can find those in a very useful SVG file available on Wikimedia. SVG is a text-based vector format whose markup looks a lot like HTML; open this SVG county map in a text editor and you’ll see a list of nodes that look like this:

<path
 d="M 404.13498,227.558 L 407.75898,227.324 L 407.95298,228.019
    L 408.99798,231.791 L 409.07498,232.061 L 405.21798,232.503
    L 404.57198,232.58 L 404.13498,227.558"
 id="01111"
 label="Randolph, AL" />

That definition draws and labels Randolph County, Alabama. The “d” attribute contains the county’s edges: “M” moves the pen to x = 404, y = 227, each “L” draws a line to the next point (407, 227, and so forth), and the last point returns to the start, closing the shape. We need to get these paths into Raphael so that we can draw them on the page. Fortunately, Raphael’s path definition syntax looks very similar; we can convert the SVG’s paths to the slightly more compact Raphael format using regular expressions, scaling them linearly as needed to the width and height of our eventual map.
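For anyone who wants to reproduce the conversion, here is a minimal Python sketch, not the code behind the Forbes map. It assumes the path uses only absolute M and L commands, as in the Randolph County snippet (curves, relative commands and a closing z are not handled), and `compact_path` is a hypothetical helper name. It scales each point by a constant factor and rounds to whole pixels; a uniform factor of 1.8 happens to reproduce exactly the compact Randolph County path that appears further down.

```python
import re

# An absolute M (move) or L (line) command followed by an "x,y" pair.
_POINT = re.compile(r"([ML])\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)")

def compact_path(svg_d, scale_x, scale_y, offset_x=0.0, offset_y=0.0):
    """Rewrite an SVG "d" string of M/L commands as a compact path string,
    scaling each point linearly and rounding it to whole pixels."""
    parts = []
    for cmd, x, y in _POINT.findall(svg_d):
        sx = round(float(x) * scale_x + offset_x)
        sy = round(float(y) * scale_y + offset_y)
        parts.append(f"{cmd}{sx},{sy}")
    return "".join(parts)

d = ("M 404.13498,227.558 L 407.75898,227.324 L 407.95298,228.019 "
     "L 408.99798,231.791 L 409.07498,232.061 L 405.21798,232.503 "
     "L 404.57198,232.58 L 404.13498,227.558")
print(compact_path(d, 1.8, 1.8))
# M727,410L734,409L734,410L736,417L736,418L729,419L728,419L727,410
```

Rounding to integers is what makes the output compact; at this scale the loss of sub-pixel precision isn’t visible.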

I extracted the path definition and county ID (a FIPS code; see below) from the SVG file with Python’s useful BeautifulSoup library and stored them in a MySQL database. I then queried that database, along with another one I’ve built to return properly styled place names (for example, “Randolph, AL” becomes “Randolph County (Roanoke), Ala.”), to create a single JSON file that contains a name, ID and path definition for each county. Here’s how Randolph County looks in that file (remember that I’ve enlarged the map to fit my page, so the path has been scaled linearly):

["01111", "Randolph County (Roanoke), Ala.",
    "M727,410L734,409L734,410L736,417L736,418L729,419L728,419L727,410"]

This JSON file is fairly large (mine is about 580KB), but it’s much smaller than the original SVG file (about 1.9MB). Now it becomes easy to download this definition file, loop over it, and draw the counties. In the map’s JavaScript, we write (after loading jQuery and Raphael):

$(document).ready(function() {
    $.getJSON("/path/to/counties.json", function(data) {
        drawMap(data);
    });
});

var map; //Global, so other functions can reach the Raphael paper

function drawMap(data) {
    //mapWidth and mapHeight are assumed to be defined elsewhere
    map = Raphael(document.getElementById("map_div_id"),
        mapWidth, mapHeight);
    var pathCount = data.length;
    //Loop over all of the counties in the JSON file
    for (var i = 0; i < pathCount; i++) {
        //The county's polygon definition is available at data[i][2]
        var thisPath = map.path(data[i][2]);
        //its ID is at data[i][0] and its styled name at data[i][1]
        thisPath.id = data[i][0];
        thisPath.name = data[i][1];
        //Give the paths whatever appearance you want
        thisPath.attr({stroke:"#FFFFFF", fill:"#CBCBCB",
            "stroke-width":"0.2"});
        //Add event listeners for rollovers
        thisPath.mouseover(function (e) {countyMouseOver(e)});
    }
}

Now the event functions will look something like this. You just have to retrieve the Raphael object attached to the event target, and then you’ve got an object that can take all of the Raphael methods. Avoid the temptation to operate directly on these targets with jQuery, because then you’ll lose Internet Explorer compatibility.

function countyMouseOver(e) {
    //Retrieve the mouseover target as a Raphael object
    var raph = e.target.raphael;
    //Use this to display a callout or whatever
    var thisCountyName = raph.name;
    //Change the color of the county's edges to indicate selection
    raph.attr({stroke:"#FF0000", "stroke-width":"1"});
    //Get ready for a click, binding the handler only once per county
    //(thisPath from drawMap is not in scope here, so use raph)
    if (!raph.clickBound) {
        raph.click(function (e) {countyClick(e)});
        raph.clickBound = true;
    }
}

There’s obviously a lot more than that going on in the migration map, but that’s the foundational structure of the map. It takes a moment for most browsers to render this, but there’s still room to load all of your data in this step if you’re doing something fairly simple with your map. If you need to show more data, you’ll have to make the map download it on the fly, as I do in the migration map.
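On the data-preparation side, building a county file like counties.json from the Wikimedia SVG might look roughly like the Python sketch below. It swaps in the standard library’s xml.etree.ElementTree for BeautifulSoup so that it has no dependencies, and it skips the MySQL step. `county_rows`, `counties_json` and the `styled_names` lookup (a stand-in for the place-name database) are hypothetical names. Any `<path>` whose id is a five-digit number is treated as a county; everything else is skipped.

```python
import json
import xml.etree.ElementTree as ET

# <path> tags with and without the SVG namespace.
SVG_PATH_TAGS = ("path", "{http://www.w3.org/2000/svg}path")

def county_rows(svg_text, styled_names, convert):
    """Return [fips, name, path] rows for every <path> whose id is a FIPS code.

    styled_names: dict of FIPS code -> display name (hypothetical stand-in
        for the place-name database).
    convert: function that turns an SVG "d" string into the map's path format.
    """
    rows = []
    for el in ET.fromstring(svg_text).iter():
        if el.tag not in SVG_PATH_TAGS:
            continue
        fips = el.get("id", "")
        if len(fips) != 5 or not fips.isdigit():
            continue  # not a county shape
        # Fall back to the SVG's own label if no styled name is available.
        name = styled_names.get(fips, el.get("label", fips))
        rows.append([fips, name, convert(el.get("d", ""))])
    return rows

def counties_json(rows):
    """Serialize rows with compact separators to trim the download a little."""
    return json.dumps(rows, separators=(",", ":"))
```

Passing the path converter in as a function keeps the extraction independent of whatever scaling the final map needs.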

Adding More Data

The migration map presents a little under 20 megabytes of data in total—that’s pairwise in- and out-migration totals for every county in the country for five years. We obviously can’t have users download all of this data at the outset, and that’d be overkill in any case because most users only look at a handful of counties in a single session. So I pre-compiled one JSON file for each county for each year (15,715 files altogether) and published them to Forbes.com. The map downloads and parses them as users click on counties. So the countyClick function looks something like this, specifying an individual county JSON file to download and initiating the process:

function countyClick(e) {
    var thisID = e.target.raphael.id;
    //Compose the path to the JSON file for this county
    var url = 'path/to/json/files/' + thisID + '.json';
    $.getJSON(url, function(data) {renderData(data)});
}

Then we do whatever we want with the data in the callback function renderData(data).
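On the publishing side, the per-county files can be generated in one pass over the migration rows. Here is a minimal Python sketch, assuming each flow is a normalized tuple of (year, origin FIPS, destination FIPS, returns, exemptions, AGI in thousands) and that each move appears only once in the input (for example, rows taken only from the outbound files). The `<year>/<fips>.json` layout, the `inbound`/`outbound` keys and both function names are hypothetical; the countyClick URL above would need to include the year to match this layout.

```python
import json
import os
from collections import defaultdict

def build_county_files(flows):
    """Group migration flows into one payload per (year, county).

    flows: iterable of (year, origin_fips, dest_fips, returns, exemptions,
    agi_thousands) tuples. Returns {relative_path: payload}.
    """
    files = defaultdict(lambda: {"inbound": [], "outbound": []})
    for year, origin, dest, returns, exemptions, agi in flows:
        stats = {"returns": returns, "exemptions": exemptions, "agi": agi}
        # A single move is outbound for the origin county...
        files[f"{year}/{origin}.json"]["outbound"].append({"county": dest, **stats})
        # ...and inbound for the destination county.
        files[f"{year}/{dest}.json"]["inbound"].append({"county": origin, **stats})
    return dict(files)

def write_county_files(files, out_dir):
    """Write each payload as compact JSON under out_dir/<year>/<fips>.json."""
    for rel_path, payload in files.items():
        full_path = os.path.join(out_dir, rel_path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "w", encoding="utf-8") as f:
            json.dump(payload, f, separators=(",", ":"))
```

A county with no reported moves in a given year gets no file under this scheme, so the client should be ready for a missing-file error.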

The IRS Data

A bit about the IRS data I used in the migration map, in case you’re interested.

This data comes in two files for each year, one for inbound moves by county and the other for outbound moves. Each file contains one line for each pair of counties with reported moves between them, along with tax return stats for the people who moved: number of returns, number of exemptions, and total adjusted gross income, in thousands of dollars, for those returns. So in the 2009 outbound CSV file, we see this line:

"01","001","01","047","AL","Dallas County",42,94,972

In dealing with these files it’s useful to know about FIPS codes, 5-digit unique identifiers for each county. The first two digits correspond to the state and the last three to the county. In the IRS files they’re broken apart. When concatenated, the two columns on the left give us the county code for Autauga County, Alabama (01001). The third and fourth columns give us the code for Dallas County, Alabama (01047), and the last three columns tell us that people who moved from Autauga County to Dallas County in 2009 filed a total of 42 income tax returns, on which they counted 94 exemptions, and that the total adjusted gross income on all of those returns was $972,000.
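Splitting each line into its fields and joining the state and county columns back into FIPS codes takes only a few lines of Python with the csv module. This sketch assumes the nine-column layout of the 2009 outbound line above; the `Flow` tuple and `parse_outbound` are hypothetical names, and any row whose returns column isn’t a plain number (a header, for instance) is skipped rather than parsed. Keeping the codes as strings matters: read as numbers, 01001 would lose its leading zero.

```python
import csv
import io
from typing import NamedTuple

class Flow(NamedTuple):
    origin_fips: str
    dest_fips: str
    dest_state: str
    dest_name: str
    returns: int
    exemptions: int
    agi_thousands: int

def parse_outbound(text):
    """Parse outbound-migration CSV text laid out like the 2009 line above."""
    flows = []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines, headers, and anything not in the 9-column layout.
        if len(row) != 9 or not row[6].strip().isdigit():
            continue
        st1, co1, st2, co2, state, name, n_returns, n_exempt, agi = row
        # Concatenate state and county parts into 5-digit FIPS strings.
        flows.append(Flow(st1 + co1, st2 + co2, state, name,
                          int(n_returns), int(n_exempt), int(agi)))
    return flows

line = '"01","001","01","047","AL","Dallas County",42,94,972'
print(parse_outbound(line)[0])
```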

Note that only people who file income tax returns are included in this data, so it leaves out some retirees, some young people, and some low-income people. Nevertheless, we can glean a lot from this single line that’s useful in comparing this migratory flow to others around the country. For instance, dividing $972,000 by 94 exemptions shows that adjusted gross income per capita among tax filers who moved from Autauga County to Dallas County in 2009 was $10,340. (Household AGI, if you’re willing to make the additional leap of equating a tax return with a household, averaged $23,100.) To preserve filers’ privacy, the IRS reports these figures only for groups of 10 or more returns.
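The per-capita and per-household figures are simple division once AGI is converted out of thousands of dollars. As a small Python sketch (`agi_stats` is a hypothetical helper), using the Autauga-to-Dallas line:

```python
def agi_stats(returns, exemptions, agi_thousands):
    """Return (AGI per exemption, AGI per return) in dollars.

    The IRS files report AGI in thousands of dollars, so convert first.
    """
    agi_dollars = agi_thousands * 1000
    return agi_dollars / exemptions, agi_dollars / returns

# Autauga County -> Dallas County, 2009: 42 returns, 94 exemptions, $972K AGI
per_capita, per_return = agi_stats(42, 94, 972)
print(round(per_capita))       # 10340
print(round(per_return, -2))   # 23100.0
```

The unrounded per-return value is about $23,143, which is why the household figure is quoted as roughly $23,100.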

Since the IRS data comes in the form of two CSV files per year, it’s best to consolidate all of the data in one place—I uploaded it to a MySQL database that was easy to query when it came time to build the individual county files that underlie the map.
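Here is a sketch of that consolidation step, using Python’s built-in sqlite3 module as a stand-in for MySQL; the table layout, column names and helper functions are hypothetical. Each row keeps the direction of the file it came from, and a query for one county and year nets inbound against outbound moves for every partner county. That is the kind of figure a map could shade blue (net inflow) or red (net outflow). Exemptions are used here as a rough proxy for people, which is an assumption rather than a description of what the published map uses.

```python
import sqlite3

# sqlite3 stands in for MySQL here; the schema is a hypothetical sketch.
SCHEMA = """
CREATE TABLE flows (
    year INTEGER,
    direction TEXT,        -- 'in' or 'out': which IRS file the row came from
    origin TEXT,           -- 5-digit FIPS code of the county people left
    dest TEXT,             -- 5-digit FIPS code of the county people moved to
    returns INTEGER,
    exemptions INTEGER,
    agi_thousands INTEGER
)"""

def load(conn, year, direction, rows):
    """Insert (origin, dest, returns, exemptions, agi_thousands) tuples."""
    conn.executemany(
        "INSERT INTO flows VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(year, direction, *row) for row in rows],
    )

def net_by_county(conn, year, fips):
    """Exemptions moving into `fips` minus those moving out, per partner county."""
    net = {}
    inbound = conn.execute(
        "SELECT origin, exemptions FROM flows "
        "WHERE year = ? AND direction = 'in' AND dest = ?", (year, fips))
    for other, n in inbound:
        net[other] = net.get(other, 0) + n
    outbound = conn.execute(
        "SELECT dest, exemptions FROM flows "
        "WHERE year = ? AND direction = 'out' AND origin = ?", (year, fips))
    for other, n in outbound:
        net[other] = net.get(other, 0) - n
    return net

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
load(conn, 2009, "out", [("01001", "01047", 42, 94, 972)])
```

A positive result means more people arrived from that county than left for it; a negative one means the reverse.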



Comments

Claus
November 18, 2011 | 12:24 AM
Thanks, that's exactly what I was looking for!
November 18, 2011 | 10:36 AM
I'm glad you found it useful!
Martin
November 18, 2011 | 4:38 AM
It seems that there is some problem loading the Raphael library.
craig
November 18, 2011 | 7:41 AM
From one programmer to another, that map is super cool!
November 18, 2011 | 9:30 AM
Thanks!
November 20, 2011 | 2:07 PM
Jon, I'm curious about how you made the determination for what constitutes a “properly-styled place name”.
November 21, 2011 | 12:13 PM
By that I meant styled according to AP standards, which mostly means state abbreviations that aren't postal codes (e.g., "W.Va." instead of "WV"). I also inserted the name of each county's largest city in parentheses for improved searchability.
November 21, 2011 | 6:00 AM
Hey this is good. Let me see if I can come up with something like this for my country. I'm from India :)
November 21, 2011 | 1:19 PM
Thanks for the info, Jon; the quotes were exactly that: quotes. My curiosity was whether the naming convention came from a source data file or was just a convention. Now I know: convention. I liked the naming style and was wondering mainly how much work I'd have to do to reproduce it.
Tim
November 24, 2011 | 12:24 PM
"Close to 40 million Americans move from one home to another every year." What percent of that number are folks who move within a county and thus don't appear on the interactive map?
November 24, 2011 | 2:52 PM
Most moves--about 2/3--are within the same county and are therefore not represented on the map. Here are some general stats on migration: http://www.census.gov/hhes/migration/data/cps/cps2011.html
November 26, 2011 | 3:54 PM
Excellent example of what can be done with python, javascript and json these days. Thanks for sharing the details. I often take a similar approach of precompiling many small datasets when I know users will only be asking for a few. Understanding how users will interact with your tool and designing to meet those expectations is key. Congratulations on a well conceived, well executed data visualization tool.
Karen
December 2, 2011 | 12:06 PM
Very nice map and good directions. Thanks.
Jeremy
January 7, 2012 | 2:12 PM
Great job! One optimization thought - what if you downloaded the initial county map so it didn't need to be rendered in the browser? The initial view would be presented sooner - but could you still load all the data through raphael and have it handle the interactive rendering but skip the initial rendering? And would it be any faster to load that data if you could skip the step of actually rendering the initial map?
January 9, 2012 | 1:42 PM
That's an excellent thought, and I'd definitely download the initial county map already rendered as an SVG if only some people weren't still using IE 6, 7 and 8, which can't render SVGs. If I'd had an extra week to make this map, I might have detected browser on load and used an SVG for modern browsers and VML for Internet Explorer. That's basically what Raphael is doing, but you're right--it'd be more efficient to pre-render the map.
April 30, 2012 | 7:32 PM
Outstanding work, Jon! Is there a simple way to embed a view of the American Migration map in an external site? Also, I've been working on a state-by-state map of BMV organ donation designation rates. They vary wildly from state-to-state and from county-to-county. Would you collaborate with me on the Opt US In map? http://www.optusin.org I've almost reached my technical limit; the next step of interactivity would be adding a special "on hover" effect for the map legend that lights up all the states in a given donor category, and adding a state donor percentage to a pop-up effect for each state. I'd love to get the organ donor data down to the county level, but there's no ready data source for that yet... the Organ Donor Organizations are organized like a confederacy, with each state BMV reporting data to one of the state's regional ODOs singly, and without apparent data consistency. (Something I'd love to see change.) With your help, the US Organ Donation map could look so much better, and be so much more effective at drawing awareness toward the inexplicable disparities in organ donor education and designation that have left us in a situation where 18 Americans die each day waiting on the organ donor waiting list, with ostensibly pro-life states falling short of 50% organ donor designation. Any tips or help or collaboration you'd offer would be very much appreciated, on embeds of the American Migration map or help or collaboration on Opt US In.
May 25, 2012 | 1:11 PM
Thanks for your note! The easiest way to embed a view of the map is to take a screen shot of the map using shift+prntscrn on a PC or command-4 on a Mac. Also, you can use the share button once you've clicked on a county to get a URL that will pre-load that county. Nice work on the organ donation map--I think you've got all of the basics that you'd need to add rollovers to the legend. The easiest way to do that would be to render the legend in Raphael along with the map, which would make the color boxes and labels Raphael objects that could have hover listeners added.
jason
May 24, 2012 | 8:03 PM
How do you do all this with your own image? I have a world map from an MMO that I would like to make interactive to show all the mobs, teleports, dungeons, etc. Here is the map I made: http://www3.picturepush.com/photo/a/6990271/img/6990271.jpg
May 25, 2012 | 1:06 PM
I'm afraid this only works by starting with a vector image--one that's defined as a group of shapes rather than one that's defined as pixels, like the JPG you've linked. In order to make that map image interactive with the method outlined above, you'd have to trace and label shapes over your image using a vector editor like Adobe Illustrator, and use the resulting SVG as the basis for your interactive application.
June 21, 2012 | 4:47 PM
Jon - amazing stuff. I really like it. I'm wondering if you know of a place to get individual state maps (for all 50) that have FIPS codes and SVG. We want to build an application that shows county results, but want it to only be one state at a time so the counties are bigger and more legible. Any thoughts you have on where to get a set of maps like that would be great. Thanks!
August 1, 2012 | 12:18 PM
Thanks, Joel. I can't think of a place where those would be pre-made, but you should be able to write a Python script that takes the national map with counties and breaks it up state-by-state. Counties are grouped into states by the first two digits of their FIPS codes, so you could loop over the counties in the national SVG and create a new SVG every time you come across a change of state prefix.
Robert Hudson
August 5, 2012 | 4:39 PM
Jon, thanks for sharing your experience. Very impressive. Good luck at O'Reilly. Joel - perhaps this helps you: http://libremap.org/data/boundary/
August 11, 2012 | 2:29 AM
I could recommend jVectorMap library (http://jvectormap.com/) for the web interactive data visualization.
Matthew Burroughs
October 31, 2012 | 3:30 PM
Great job. Quick question: I am trying to create a map with interactive variables to illustrate the rate of regional distribution associated with various economic models. Which language would be best for using these functions to illustrate this distribution?
October 31, 2012 | 4:26 PM
Thanks--if you build the map as outlined here, then you'd probably want to use JS to control it, since JS is used to build it in the first place.
Jon Truelove
November 12, 2012 | 5:30 PM
Great rundown. You've inspired me to move forward on an interactive world map here at work...seems like internal clients are constantly requesting map-driven data presentations. Thanks so much!
November 12, 2012 | 5:50 PM
Thanks! I'm delighted to hear that.
December 18, 2012 | 4:50 PM
Gotta say, I completely disagree with the "saving money" argument. I could probably build this in a day with Flash, SVG and some JSFL, and the performance and file size would kick open technologies into touch as well. Unfortunately Steve Jobs has seen to it that Flash (even when it IS the right tool for the job) won't survive, so we have to resort to all sorts of unrelated libraries and disparate technologies that only work in some browsers, and spend ages longer learning and debugging them. Flash licenses (now super cheap with Creative Cloud) vs learning multiple open-stack technologies? I'd put my money on Flash regarding cost, time and performance, but things move on, so must we.
December 18, 2012 | 5:11 PM
BTW - just to clarify, I meant the mapping component would be built in a day in Flash, the data wrangling and so forth would obviously take more work :)


© 2004-2014 Jonathan E. Bruner