How to efficiently find the closest locations nearby a given location
I'm making a script where a load of business are loaded into a mySQL database with a latitude and longitude. Then I am supplying that script with a latitude an longitude (of the end user) and the script has to calculate the distance from the supplied lat/long to EACH of the entries it gets from the database and order them in order of nearest to furthest.
I only realistically need about 10 or 20 "nearest" results, but I can't think of anyway to do this other than to get all the results from the database and run the function on each of them and then array sort.
This is what I have already:
<?php
function getDistance($point1, $point2){
$radius = 3958; // Earth's radius (miles)
$pi = 3.1415926;
$deg_per_rad = 57.29578; // Number of degrees/radian (for conversion)
$distance = ($radius * $pi * sqrt(
($point1['lat'] - $point2['lat'])
* ($point1['lat'] - $point2['lat'])
+ cos($point1['lat'] / $deg_per_rad) // Convert these to
* cos($point2['lat'] / $deg_per_rad) // radians for cos()
* ($point1['long'] - $point2['long'])
* ($point1['long'] - $point2['long'])
) / 180);
$distance = round($distance,1);
return $distance; // Returned using the units used for $radius.
}
include("../includes/application_top.php");
$lat = (is_numeric($_GET['lat'])) ? $_GET['lat'] : 0;
$long = (is_numeric($_GET['long'])) ? $_GET['long'] : 0;
$startPoint = array("lat"=>$lat,"long"=>$long);
$sql = "SELECT * FROM mellow_listings WHERE active=1";
$result = mysql_query($sql);
while($row = mysql_fetch_array($result)){
$thedistance = getDistance($startPoint,array("lat"=>$row['lat'],"long"=>$row['long']));
$data[] = array('id' => $row['id'],
'name' => $row['name'],
'description' => $row['description'],
'lat' => $row['lat'],
'long' => $row['long'],
'address1' => $row['address1'],
'address2' => $row['address2'],
'county' => $row['county'],
'postcode' => strtoupper($row['postcode']),
'phone' => $row['phone'],
'email' => $row['email'],
'web' => $row['web'],
'distance' => $thedistance);
}
// integrate google local search
$url = "http://ajax.googleapis.com/ajax/services/search/local?";
$url .= "q=Off+licence"; // query
$url .= "&v=1.0"; // version number
$url .= "&rsz=8"; // number of results
$url .= "&key=ABQIAAAAtG"
."Pcon1WB3b0oiqER"
."FZ-TRQgsWYVg721Z"
."IDPMPlc4-CwM9Xt"
."FBSTZxHDVqCffQ2"
."W6Lr4bm1_zXeYoQ"; // api key
$url .= "&sll=".$lat.",".$long;
// sendRequest
// note how referer is set manually
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, /* url */);
$body = curl_exec($ch);
curl_close($ch);
// now, process the JSON string
$json = json_decode($body, true);
foreach($json['responseData']['results'] as $array){
$thedistance = getDistance($startPoint,array("lat"=>$array['lat'],"long"=>$array['lng']));
$data[] = array('id' => '999',
'name' => $array['title'],
'description' => '',
'lat' => $array['lat'],
'long' => $array['lng'],
'address1' => $array['streetAddress'],
'address2' => $array['city'],
'county' => $array['region'],
'postcode' => '',
'phone' => $array['phoneNumbers'][0],
'email' => '',
'web' => $array['url'],
'distance' => $thedistance);
}
// sort the array
foreach ($data as $key => $row) {
$id[$key] = $row['id'];
$distance[$key] = $row['distance'];
}
array_multisort($distance, SORT_ASC, $data);
header("Content-type: text/xml");
echo '<?xml version="1.0" encoding="UTF-8"?>'."\n";
echo '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">'."\n";
echo '<plist version="1.0">'."\n";
echo '<array>'."\n";
for($i = 0; isset($distance[$i]); $i++){
//echo $data[$i]['id']." -> ".$distance[$i]."<br />";
echo '<dict>'."\n";
foreach($data[$i] as $key => $val){
echo '<key><![CDATA['.$key.']]></key>'."\n";
echo '<string><![CDATA['.htmlspecialchars_decode($val, ENT_QUOTES).']]></string>'."\n";
}
echo '</dict>'."\n";
}
echo '</array>'."\n";
echo '</plist>'."\n";
?>
Now, this runs fast enough with only 2 or 3 businesses in the database, but I'm currently loading 5k businesses into the database and I'm worried that its going to be incredibly slow running this for EACH entry? What do you think?
Its not the kind of data I could cache either, as the likelihood of two users having the same lat/long is liable to be incredibly rare, and therefore wouldn't help.
What can I do about this?
Thanks for any help and any suggestions. They're all much appreciated.
Solution 1:
I think what you're trying to achieve could be done better using the Haversine formula in your SQL. Google has a tutorial on how to get the nearest locations in a MySQL database but the general idea is this SQL:
SELECT id, ( 3959 * acos( cos( radians(37) ) * cos( radians( lat ) )
* cos( radians( lng ) - radians(-122) ) + sin( radians(37) )
* sin( radians( lat ) ) ) ) AS distance
FROM markers
HAVING distance < 25
ORDER BY distance LIMIT 0 , 20;
Then all the work you need to do is done on the database, so you don't have to pull all the businesses into your PHP script before you even check the distance.
Solution 2:
Option 1: Do the calculation on the database by switching to a database that supports GeoIP.
Option 2: Do the calculation on the database: you're using MySQL, so the following stored procedure should help
CREATE FUNCTION distance (latA double, lonA double, latB double, LonB double)
RETURNS double DETERMINISTIC
BEGIN
SET @RlatA = radians(latA);
SET @RlonA = radians(lonA);
SET @RlatB = radians(latB);
SET @RlonB = radians(LonB);
SET @deltaLat = @RlatA - @RlatB;
SET @deltaLon = @RlonA - @RlonB;
SET @d = SIN(@deltaLat/2) * SIN(@deltaLat/2) +
COS(@RlatA) * COS(@RlatB) * SIN(@deltaLon/2)*SIN(@deltaLon/2);
RETURN 2 * ASIN(SQRT(@d)) * 6371.01;
END//
EDIT
If you have an index on latitude and longitude in your database, you can reduce the number of calculations that need to be calculated by working out an initial bounding box in PHP ($minLat, $maxLat, $minLong and $maxLong), and limiting the rows to a subset of your entries based on that (WHERE latitude BETWEEN $minLat AND $maxLat AND longitude BETWEEN $minLong AND $maxLong). Then MySQL only needs to execute the distance calculation for that subset of rows.
FURTHER EDIT (as an explanation for the previous edit)
If you're simply using the SQL statement provided by Jonathon (or a stored procedure to calculate the distance) then SQL still has to look through every record in your database, and to calculate the distance for every record in your database before it can decide whether to return that row or discard it.
Because the calculation is relatively slow to execute, it would be better if you could reduce the set of rows that need to be calculated, eliminating rows that will clearly fall outside of the required distance, so that we're only executing the expensive calculation for a smaller number of rows.
If you consider that what you're doing is basically drawing a circle on a map, centred on your initial point, and with a radius of distance; then the formula simply identifies which rows fall within that circle... but it still has to checking every single row.
Using a bounding box is like drawing a square on the map first with the left, right, top and bottom edges at the appropriate distance from our centre point. Our circle will then be drawn within that box, with the Northmost, Eastmost, Southmost and Westmost points on the circle touching the borders of the box. Some rows will fall outside that box, so SQL doesn't even bother trying to calculate the distance for those rows. It only calculates the distance for those rows that fall within the bounding box to see if they fall within the circle as well.
Within PHP, we can use a very simple calculation that works out the minimum and maximum latitude and longitude based on our distance, then set those values in the WHERE clause of your SQL statement. This is effectively our box, and anything that falls outside of that is automatically discarded without any need to actually calculate its distance.
There's a good explanation of this (with PHP code) on the Movable Type website that should be essential reading for anybody planning to do any GeoPositioning work in PHP.
Solution 3:
If you have a lot of points, queries with distance formulas in them will be very slow because it's not using an index for the search. For efficiency you'd either have to use a rectangular bounding box to make it faster, or you can use a database with GIS features built in. PostGIS is free and here's an article on doing nearest neighbor search:
http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgis_nearest_neighbor_generic