MySQL Great Circle Distance (Haversine formula)

I've got a working PHP script that gets Longitude and Latitude values and then inputs them into a MySQL query. I'd like to make it solely MySQL. Here's my current PHP Code:

if ($distance != "Any" && $customer_zip != "") { //get the great circle distance

    //get the origin zip code info
    $zip_sql = "SELECT * FROM zip_code WHERE zip_code = '$customer_zip'";
    $result = mysql_query($zip_sql);
    $row = mysql_fetch_array($result);
    $origin_lat = $row['lat'];
    $origin_lon = $row['lon'];

    //get the range
    $lat_range = $distance/69.172;
    $lon_range = abs($distance/(cos($details[0]) * 69.172));
    $min_lat = number_format($origin_lat - $lat_range, "4", ".", "");
    $max_lat = number_format($origin_lat + $lat_range, "4", ".", "");
    $min_lon = number_format($origin_lon - $lon_range, "4", ".", "");
    $max_lon = number_format($origin_lon + $lon_range, "4", ".", "");
    $sql .= "lat BETWEEN '$min_lat' AND '$max_lat' AND lon BETWEEN '$min_lon' AND '$max_lon' AND ";
    }

Does anyone know how to make this entirely MySQL? I've browsed the Internet a bit but most of the literature on it is pretty confusing.


From Google Code FAQ - Creating a Store Locator with PHP, MySQL & Google Maps:

Here's the SQL statement that will find the closest 20 locations that are within a radius of 25 miles to the 37, -122 coordinate. It calculates the distance based on the latitude/longitude of that row and the target latitude/longitude, and then asks for only rows where the distance value is less than 25, orders the whole query by distance, and limits it to 20 results. To search by kilometers instead of miles, replace 3959 with 6371.

SELECT id, ( 3959 * acos( cos( radians(37) ) * cos( radians( lat ) ) 
* cos( radians( lng ) - radians(-122) ) + sin( radians(37) ) * sin(radians(lat)) ) ) AS distance 
FROM markers 
HAVING distance < 25 
ORDER BY distance 
LIMIT 0 , 20;

$greatCircleDistance = acos( cos($latitude0) * cos($latitude1) * cos($longitude0 - $longitude1) + sin($latitude0) * sin($latitude1));

with latitude and longitude in radian.

so

SELECT 
  acos( 
      cos(radians( $latitude0 ))
    * cos(radians( $latitude1 ))
    * cos(radians( $longitude0 ) - radians( $longitude1 ))
    + sin(radians( $latitude0 )) 
    * sin(radians( $latitude1 ))
  ) AS greatCircleDistance 
 FROM yourTable;

is your SQL query

to get your results in Km or miles, multiply the result with the mean radius of Earth (3959 miles,6371 Km or 3440 nautical miles)

The thing you are calculating in your example is a bounding box. If you put your coordinate data in a spatial enabled MySQL column, you can use MySQL's build in functionality to query the data.

SELECT 
  id
FROM spatialEnabledTable
WHERE 
  MBRWithin(ogc_point, GeomFromText('Polygon((0 0,0 3,3 3,3 0,0 0))'))

If you add helper fields to the coordinates table, you can improve response time of the query.

Like this:

CREATE TABLE `Coordinates` (
`id` INT(10) UNSIGNED NOT NULL COMMENT 'id for the object',
`type` TINYINT(4) UNSIGNED NOT NULL DEFAULT '0' COMMENT 'type',
`sin_lat` FLOAT NOT NULL COMMENT 'sin(lat) in radians',
`cos_cos` FLOAT NOT NULL COMMENT 'cos(lat)*cos(lon) in radians',
`cos_sin` FLOAT NOT NULL COMMENT 'cos(lat)*sin(lon) in radians',
`lat` FLOAT NOT NULL COMMENT 'latitude in degrees',
`lon` FLOAT NOT NULL COMMENT 'longitude in degrees',
INDEX `lat_lon_idx` (`lat`, `lon`)
)    

If you're using TokuDB, you'll get even better performance if you add clustering indexes on either of the predicates, for example, like this:

alter table Coordinates add clustering index c_lat(lat);
alter table Coordinates add clustering index c_lon(lon);

You'll need the basic lat and lon in degrees as well as sin(lat) in radians, cos(lat)*cos(lon) in radians and cos(lat)*sin(lon) in radians for each point. Then you create a mysql function, smth like this:

CREATE FUNCTION `geodistance`(`sin_lat1` FLOAT,
                              `cos_cos1` FLOAT, `cos_sin1` FLOAT,
                              `sin_lat2` FLOAT,
                              `cos_cos2` FLOAT, `cos_sin2` FLOAT)
    RETURNS float
    LANGUAGE SQL
    DETERMINISTIC
    CONTAINS SQL
    SQL SECURITY INVOKER
   BEGIN
   RETURN acos(sin_lat1*sin_lat2 + cos_cos1*cos_cos2 + cos_sin1*cos_sin2);
   END

This gives you the distance.

Don't forget to add an index on lat/lon so the bounding boxing can help the search instead of slowing it down (the index is already added in the CREATE TABLE query above).

INDEX `lat_lon_idx` (`lat`, `lon`)

Given an old table with only lat/lon coordinates, you can set up a script to update it like this: (php using meekrodb)

$users = DB::query('SELECT id,lat,lon FROM Old_Coordinates');

foreach ($users as $user)
{
  $lat_rad = deg2rad($user['lat']);
  $lon_rad = deg2rad($user['lon']);

  DB::replace('Coordinates', array(
    'object_id' => $user['id'],
    'object_type' => 0,
    'sin_lat' => sin($lat_rad),
    'cos_cos' => cos($lat_rad)*cos($lon_rad),
    'cos_sin' => cos($lat_rad)*sin($lon_rad),
    'lat' => $user['lat'],
    'lon' => $user['lon']
  ));
}

Then you optimize the actual query to only do the distance calculation when really needed, for example by bounding the circle (well, oval) from inside and outside. For that, you'll need to precalculate several metrics for the query itself:

// assuming the search center coordinates are $lat and $lon in degrees
// and radius in km is given in $distance
$lat_rad = deg2rad($lat);
$lon_rad = deg2rad($lon);
$R = 6371; // earth's radius, km
$distance_rad = $distance/$R;
$distance_rad_plus = $distance_rad * 1.06; // ovality error for outer bounding box
$dist_deg_lat = rad2deg($distance_rad_plus); //outer bounding box
$dist_deg_lon = rad2deg($distance_rad_plus/cos(deg2rad($lat)));
$dist_deg_lat_small = rad2deg($distance_rad/sqrt(2)); //inner bounding box
$dist_deg_lon_small = rad2deg($distance_rad/cos(deg2rad($lat))/sqrt(2));

Given those preparations, the query goes something like this (php):

$neighbors = DB::query("SELECT id, type, lat, lon,
       geodistance(sin_lat,cos_cos,cos_sin,%d,%d,%d) as distance
       FROM Coordinates WHERE
       lat BETWEEN %d AND %d AND lon BETWEEN %d AND %d
       HAVING (lat BETWEEN %d AND %d AND lon BETWEEN %d AND %d) OR distance <= %d",
  // center radian values: sin_lat, cos_cos, cos_sin
       sin($lat_rad),cos($lat_rad)*cos($lon_rad),cos($lat_rad)*sin($lon_rad),
  // min_lat, max_lat, min_lon, max_lon for the outside box
       $lat-$dist_deg_lat,$lat+$dist_deg_lat,
       $lon-$dist_deg_lon,$lon+$dist_deg_lon,
  // min_lat, max_lat, min_lon, max_lon for the inside box
       $lat-$dist_deg_lat_small,$lat+$dist_deg_lat_small,
       $lon-$dist_deg_lon_small,$lon+$dist_deg_lon_small,
  // distance in radians
       $distance_rad);

EXPLAIN on the above query might say that it's not using index unless there's enough results to trigger such. The index will be used when there's enough data in the coordinates table. You can add FORCE INDEX (lat_lon_idx) to the SELECT to make it use the index with no regards to the table size, so you can verify with EXPLAIN that it is working correctly.

With the above code samples you should have a working and scalable implementation of object search by distance with minimal error.