Example: Earthquake data via the USGS website

Recently we had a bunch of small earthquakes in the Oakland, CA area. I thought it would be fun to use Allegro Common Lisp to retrieve and display information about recent earthquakes.

I found that the USGS website has hourly, daily and weekly quake data available here:

http://earthquake.usgs.gov/eqcenter/recenteqsww/catalogs/
Possible formats of interest to me: XML (RSS) and CSV. I decided to use the CSV format, comma separated files, since XML seemed like overkill.

What I wanted to do was to find recent earthquakes near where I lived. However, the data from USGS is of the form:

Src,Eqid,Version,Datetime,Lat,Lon,Magnitude,Depth,NST
ak,00074326,5,"January 09, 2007 23:59:55 GMT",64.1478,-150.1976,1.8,1.00,07
Time, location (latitude and longitude in decimal degrees) and magnitude, among other bits of data. The data is nice, but I needed two things to make it useful:
  1. filter out the coordinates I didn't want, and
  2. convert latitude/longitude coordinates into city names.

An implication of (1) is that I needed an easy way to convert a place name to coordinates, because if I want the facility to work for more than just my location, others will need an easy way to look up their own location. Fortunately, Google Maps provides this information. Their API requires that you supply a key assigned to you. (You can obtain a key from Google here.) NOTE: see the installation instructions below for where to put the key.

That makes it fairly easy to write a function place-to-location:

cl-user(8): (place-to-location "Berkeley, CA")
#<location 37.871666,-122.27167>
cl-user(9): (location-to-place (place-to-location "Berkeley, CA"))
"Berkeley, California"
cl-user(10): (location-to-place
	      (place-to-location
	       "2629 College Ave, Berkeley, CA"))
"Berkeley, California"
cl-user(11): (place-to-location "2629 College Ave, Berkeley, CA")
#<location 37.86319,-122.25365>
cl-user(12): 

To help me get a feel for how to filter the data I wanted to know the distance between two locations. I used the "Great Circle Distance" Formula. Some examples (distances are in miles):

cl-user(17): (distance-between
	      (place-to-location "Boulder, CO")
	      (place-to-location "Piedmont, CA"))
923.7414773838934d0
cl-user(18): (distance-between
	      (place-to-location "Lake Wylie, SC")
	      (place-to-location "Piedmont, CA"))
2283.922367943316d0
cl-user(19): 

Converting location coordinates into city names turned out to be tricky. There is no readily available web resource that will do this (like Google Maps). I did find The Zip Code Database Project which contains location coordinates for each zip code, and the data includes the names of the cities for each zip code. It's a fairly simple matter to find the nearest match for a set of coordinates.

Speaking of coordinates, how should they be represented in Lisp? At first I was using a cons of latitude and longitude. However, using car and cdr to access the slots got old and I created a measures module to define the location object. The measures API also contains the distance-between and location-near-p functions.

The zipcodes module provides the location-to-zip function, and the google-maps module uses this to provide location-to-place and place-to-location.

Putting it all together, we can do things like this:

cl-user(8): (get-quake-info
	     :period :week
	     :larger-than nil
	     :within 1.0
	     :reference-location 
	     (place-to-location
	      "555 12th St, Oakland, CA, 94607, US"))
;; Downloading data...done.
January 09, 2007 22:33:42 GMT: Hollister, California: magnitude=3.1
January 09, 2007 22:16:21 GMT: Los Altos, California: magnitude=2.0
January 09, 2007 21:39:52 GMT: Hollister, California: magnitude=1.3
January 09, 2007 16:56:54 GMT: Aromas, California: magnitude=1.4
January 09, 2007 13:20:02 GMT: Cobb, California: magnitude=1.4
January 09, 2007 11:52:11 GMT: Hayward, California: magnitude=1.2
January 09, 2007 03:01:03 GMT: El Cerrito, California: magnitude=1.9
January 08, 2007 11:30:55 GMT: San Jose, California: magnitude=2.0
January 07, 2007 14:22:56 GMT: Walnut Creek, California: magnitude=1.2
January 07, 2007 09:20:38 GMT: Kenwood, California: magnitude=3.1
January 06, 2007 12:58:01 GMT: Yountville, California: magnitude=1.8
January 06, 2007 07:04:01 GMT: Oakland, California: magnitude=2.4
January 05, 2007 22:55:53 GMT: Walnut Creek, California: magnitude=1.3
January 05, 2007 20:18:51 GMT: Livermore, California: magnitude=2.4
January 05, 2007 17:54:17 GMT: Ladera, California: magnitude=1.1
January 05, 2007 11:09:43 GMT: Cobb, California: magnitude=2.4
January 05, 2007 10:40:26 GMT: Concord, California: magnitude=1.3
January 04, 2007 20:38:03 GMT: Los Altos, California: magnitude=1.7
January 04, 2007 20:21:41 GMT: Aromas, California: magnitude=2.6
January 04, 2007 13:42:36 GMT: Palo Alto, California: magnitude=1.2
January 04, 2007 03:42:10 GMT: Oakland, California: magnitude=1.9
nil
cl-user(9): 

The :within keyword value is in decimal degrees. A degree of latitude is approximately 69 miles. So 3.0 degrees is a little more than 200 miles in latitude. I use this for longitude, as well, but a degree of longitude changes with latitude, from 69 at the equator to 0 at the north and south poles.

The :period keyword value can be :hour, :day or :week.

The :larger-than keyword is a filter on magnitude. nil means all quakes. 2.0 means those larger than magnitude 2.0.

Implementation details

For those that want to know more about how this example was implemented, this section is for you.

Zip codes

The zip code data, from the The Zip Code Database Project, comes in a large CSV file and has the following form:

"zip code", "state abbreviation", "latitude", "longitude", "city", "state"
"35004", "AL", " 33.606379", " -86.50249", "Moody", "Alabama"
...

There are 33,178 entries in the file. The first questions that come to mind are:

  • How to efficiently represent the data in the fasl file (because I don't want to be dependent at fasl load time on the CSV file)?
  • How to efficiently represent the data in memory?

For representation in the fasl file, I first did the obvious and straightforward simple vector of zip structs written out to the fasl file. It turns out that because the compiler must do circularity detection and using make-load-form is fairly expensive, having the compiler write out a simple vector of 33000+ struct objects is a very slow process, one that creates a lot of garbage and uses lots of memory.

So, to avoid the make-load-form on the structure objects, I decided to spread the slots of all the zip code structures across a bunch of vectors--strings and numbers should easily be handled by the compiler in an efficient manner. In other words, the zip code data in the fasl file would consist of one vector per structure slot, each element of the vector being the slot value from one zip code structure instance. And, this would all be created at compile time. Like this:

(defvar *zipcode-data*
    #.(let ((zips
	     (prog2
		 (progn (format t "reading zips.cvs...")
			(force-output))
		 (read-zips-csv 
		  (merge-pathnames "zips.csv" *compile-file-pathname*))
	       (format t "done~%"))))
	(list 'list
	      (vector-of zips #'zip-code)
	      (vector-of zips #'zip-state-abbrev)
	      (vector-of zips (lambda (zip)
				(let ((loc (zip-location zip)))
				  (cons (location-latitude loc)
					(location-longitude loc)))))
	      (vector-of zips #'zip-city)
	      (vector-of zips #'zip-state))))

For those not familiar with #., it is used to evaluate Lisp code at `read' time. In this case, since the expression above is being read by the compiler, the code in the #.(let ...) is being evaluated at compiler read time. The result of reading that form is a list, like this:

(list #<...vector of zip codes...>
      ...
      #<...vector of states...>)

That list is then compiled into the fasl file as the value of the *zipcode-data* special variable. When the fasl file is loaded into Lisp, *zipcode-data* will be a list of simple vectors. *zipcode-data* is then used to construct the data structure we actually want to use, which leads us to...

How to efficiently represent the data in memory? I know, because of the nature of the types of searches of the data I will do, that sequential searches on the data will be necessary. To me, that means simple vector. (If, for example, only a single zip to location mapping was needed, then a more appropriate data structure would have been a hash table.) So, at load time, we reassemble the list of simple vectors into our single simple vector of zip structs:

(defvar *zipcodes*
    (destructuring-bind (codes abbrevs locations cities states)
	*zipcode-data*
      (do* ((max (length codes))
	    (zips (make-array max))
	    (i 0 (1+ i)))
	  ((= i max) zips)
	(setf (svref zips i)
	  (make-zip :code (svref codes i)
		    :state-abbrev (svref abbrevs i)
		    :location (let ((loc (svref locations i)))
				(make-location
				 :latitude (car loc)
				 :longitude (cdr loc)))
		    :city (svref cities i)
		    :state (svref states i))))))

For those that have not used destructuring-bind, it merely takes a list and destructures it, or breaks it down into component parts. In this case, I'm taking a flat list and assigning each element to a variable. The first being codes and the fifth being states, corresponding to the first and fifth elements of the list held by *zipcode-data*.

Next, we nil out *zipcode-data* so the list of simple vectors can be garbage collected, since we no longer need them.

(setq *zipcode-data* nil)

This same technique can be used on other structure types. Nothing in the solution is specific to the `zip' struct object.

Installation

You can download the sources from usgs.tar.gz or usgs.zip.

Unpack the .tgz file. You will now have a usgs/ directory where you unpacked. Into this directory create a file called "google-maps-key.cl" which contains your Google Maps API key, like this:

(setq util.google.maps:*default-key*
  "...key obtained from google...")
Where ...key obtained from google... is the key you received from Google. The formatting of this file is important: the key must be a string and bound to the variable util.google.maps:*default-key*.

Then, you are ready to start up Allegro CL in the directory where you unpacked the source files, type

:ld load.cl
That should print something like this:
cl-user(3): :ld load.cl
; Loading /home/layer/src/usgs/load.cl
;   Fast loading /backup/acl/acl80/code/asdf.001
;;; Installing asdf patch, version 1
; loading system definition from /home/layer/src/usgs/usgs.asd into
; #<The asdf0 package>
;   Loading /home/layer/src/usgs/usgs.asd
; registering #<system usgs @ #x71d13832> as usgs
; loading system definition from ../google/google.asd into #<The asdf0 package>
;   Loading /home/layer/src/google/google.asd
; registering #<system google @ #x71d1e01a> as google
; loading system definition from ../zipcodes/zipcodes.asd into
; #<The asdf0 package>
;   Loading /home/layer/src/zipcodes/zipcodes.asd
; registering #<system zipcodes @ #x71d26a92> as zipcodes
; loading system definition from ../measures/measures.asd into
; #<The asdf0 package>
;   Loading /home/layer/src/measures/measures.asd
; registering #<system measures @ #x71d2f99a> as measures
;;; Compiling file /home/layer/src/measures/measures.cl
;;; Writing fasl file /home/layer/src/measures/measures.fasl
;;; Fasl write complete
;   Fast loading /home/layer/src/measures/measures.fasl
;;; Compiling file /home/layer/src/zipcodes/zip-package.cl
;;; Writing fasl file /home/layer/src/zipcodes/zip-package.fasl
;;; Fasl write complete
;;; Compiling file /home/layer/src/zipcodes/zip-util.cl
;   Fast loading /home/layer/src/zipcodes/zip-package.fasl
;;; Writing fasl file /home/layer/src/zipcodes/zip-util.fasl
;;; Fasl write complete
;;; Compiling file /home/layer/src/zipcodes/zip-api.cl
;   Fast loading /home/layer/src/zipcodes/zip-package.fasl
;   Fast loading /home/layer/src/zipcodes/zip-util.fasl
reading zips.cvs...done
;;; Writing fasl file /home/layer/src/zipcodes/zip-api.fasl
9094928 bytes have been tenured, next gc will be global.
See the documentation for variable *global-gc-behavior* for more information.
;;; Fasl write complete
;   Fast loading /home/layer/src/zipcodes/zip-package.fasl
;   Fast loading /home/layer/src/zipcodes/zip-util.fasl
;   Fast loading /home/layer/src/zipcodes/zip-api.fasl
;;; Compiling file /home/layer/src/google/google-maps.cl
;   Fast loading /backup/acl/acl80/code/aserve.004
;;; Installing aserve patch, version 4
;     Fast loading from bundle code/acldns.fasl.
;;; Installing acldns patch, version 1
;;; Writing fasl file /home/layer/src/google/google-maps.fasl
;;; Fasl write complete
;   Fast loading /home/layer/src/google/google-maps.fasl
;;; Compiling file /home/layer/src/usgs/usgs.cl
;   Fast loading /backup/acl/acl80/code/regexp2.001
;;; Installing regexp2 patch, version 1
;     Fast loading /backup/acl/acl80/code/yacc.001
;;; Installing yacc patch, version 1
;   Fast loading /backup/acl/acl80/code/update.fasl
;;; Installing update patch, version 6
;     Fast loading /backup/acl/acl80/code/crc.fasl
;     Fast loading /backup/acl/acl80/code/inflate.fasl
;       Fast loading from bundle code/iodefs.fasl.
;         Fast loading from bundle code/iordefs.fasl.
;           Fast loading /backup/acl/acl80/code/efmacs.002
;;; Installing efmacs patch, version 1
;;; Writing fasl file /home/layer/src/usgs/usgs.fasl
;;; Fasl write complete
;   Fast loading /home/layer/src/usgs/usgs.fasl
;     Loading /home/layer/src/usgs/google-maps-key.cl
;; Try this:

(get-quake-info :period :week
  :larger-than nil
  :within 1.0
  :reference-location (place-to-location
			 "555 12th St, Oakland, CA, 94607, US"))
cl-user(4): 

Now, you can evaluate the form printed just before the prompt to see if it works for you. Make sure you are in the cl-user package when you do evaluate the above form. NOTE: the IDE starts up in the cg-user package.

Copyright © 2014 Franz Inc., All Rights Reserved | Privacy Statement
Delicious Google Buzz Twitter Google+