Working with ZIP Code Data & MySQL

November 06, 2013

I’m working on a project currently where I need to be able to use zip code data, in relation to their longitude and latitude coordinates. I found what I felt was a half decent source of the data. The download file listed at the top says it includes CSV and SQL but I found the SQL implementation to not be quite what I was looking for, at least for MySQL.

At the bottom of this post you will find the download of the MySQL formatted zip codes.

With the MySQL database you have the options for doing geometric queries which I haven’t delved too far into, but it seemed really useful. I was using this data for proximity-based searches, using someone’s zip code.

The SQL that comes with the download is irrelevant to MySQL as it doesn’t work with that  AddGeometryColumn function.

create table zcta (
    zip char(5) primary key,
    city varchar(64),
    state char(2),
    type char(1),
    timezone int
);

select AddGeometryColumn('zipcode', 'zcta', 'location', -1, 'POINT', 2);

However, the CSV data was full of what I needed. So after some looking around I was able to mold the data into the right format that would be applicable to MySQL.

Here’s the create table syntax:

CREATE TABLE `zcta` (
  `zip` char(5) NOT NULL,
  `city` varchar(64) DEFAULT NULL,
  `state` char(2) DEFAULT NULL,
  `dst` char(1) DEFAULT NULL,
  `timezone` int(11) DEFAULT NULL,
  `location` point NOT NULL,
  PRIMARY KEY (`zip`),
  SPATIAL KEY `location` (`location`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

The MyISAM engine is apparently better for creating spatial indexes, according to the documentation. It sounds like newer versions of MySQL do have some support for InnoDB and spatial data types:

InnoDB tables do not support spatial data types before MySQL 5.0.16. As of 5.0.16, InnoDB supports spatial data types, but not indexes on them.

Alright, so I’m running with the MyISAM engine. I have the CSV file, and the SQL file doesn’t apply in this case. After building the table I am now able to import my data. The syntax for POINT columns is a little different so keep that in mind when inserting and selecting data. To get the CSV file into the right format I wrote this find/replace regex to run on the CSV data:

Find: ^(((\(|)".*?",){3})"(.*?){1}","(.*?){1}"(,.*)$
Replace: \1 GeomFromText('POINT(\4 \5)')\6

I then wrote the first part of the INSERT statement defining the necessary columns to be inserted. Run the script and you’re data should import and you’re ready to go. And that was able to get me the final format to insert the data into the SPATIAL-aware database setup with our long/lat POINT datatypes.

The end result

The SQL file of zip codes formatted for MySQL based on my create table syntax above:
MySQL ZIP Codes Data

So far this data has been working out really well for me, let me know if you notice any shortcomings or have any questions. Again, I pulled this data from this blog so if you need any information specific to the source of this data there are some details there. Cheers!