Formulas to Calculate Geo Proximity
I need to implement a Geo proximity search in my application but I'm very confused regarding the correct formula to use. After some searches in the Web and in StackOverflow I found that the solutions are:
- Use the Haversine Formula
Use the Great-Circle Distance Formula- Use a Spatial Search Engine in the Database
Option #3 is really not an option for me ATM. Now I'm a little confused since I always though that the Great-Circle Distance Formula and Haversine Formula were synonymous but apparently I was wrong?
The above screen shot was taken from the awesome Geo (proximity) Search with MySQL paper, and uses the following functions:
ASIN, SQRT, POWER, SIN, PI, COS
I've also seen variations from the same formula (Spherical Law of Cosines), like this one:
(3956 * ACOS(COS(RADIANS(o_lat)) * COS(RADIANS(d_lat)) * COS(RADIANS(d_lon) - RADIANS(o_lon)) + SIN(RADIANS(o_lat)) * SIN(RADIANS(d_lat))))
That uses the following functions:
ACOS, COS, RADIANS, SIN
I am not a math expert, but are these formulas the same? I've come across some more variations, and formulas (such as the Spherical Law of Cosines and the Vincenty's formulae - which seems to be the most accurate) and that makes me even more confused...
I need to choose a good general purpose formula to implement in PHP / MySQL. Can anyone explain me the differences between the formulas I mentioned above?
- Which one is the fastest to compute?
- Which one provides the most accurate results?
- Which one is the best in terms of speed / accuracy of results?
I appreciate your insight on these questions.
Based on theonlytheory answer I tested the following Great-Circle Distance Formulas:
- Vincenty Formula
- Haversine Formula
- Spherical Law of Cosines
The Vincenty Formula is dead slow, however it's pretty accurate (down to 0.5 mm).
The Haversine Formula is way faster than the Vincenty Formula, I was able to run 1 million calculations in about 6 seconds which is pretty much acceptable for my needs.
The Spherical Law of Cosines Formula revealed to be almost twice as fast as the Haversine Formula, and the precision difference is neglectfulness for most usage cases.
Here are some test locations:
-
Google HQ (
37.422045
,-122.084347
) -
San Francisco, CA (
37.77493
,-122.419416
) -
Eiffel Tower, France (
48.8582
,2.294407
) -
Opera House, Sydney (
-33.856553
,151.214696
)
Google HQ - San Francisco, CA:
- Vincenty Formula:
49 087.066 meters
- Haversine Formula:
49 103.006 meters
- Spherical Law of Cosines:
49 103.006 meters
Google HQ - Eiffel Tower, France:
- Vincenty Formula:
8 989 724.399 meters
- Haversine Formula:
8 967 042.917 meters
- Spherical Law of Cosines:
8 967 042.917 meters
Google HQ - Opera House, Sydney:
- Vincenty Formula:
11 939 773.640 meters
- Haversine Formula:
11 952 717.240 meters
- Spherical Law of Cosines:
11 952 717.240 meters
As you can see there is no noticeable difference between the Haversine Formula and the Spherical Law of Cosines, however both have distance offsets as high as 22 kilometers compared to the Vincenty Formula because it uses an ellipsoidal approximation of the earth instead of a spherical one.
Solution 1:
The Law of Cosines and the Haversine Formula will give identical results assuming a machine with infinite precision. The Haversine formula is more robust to floating point errors. However, today's machines have double precision of the order of 15 significant figures, and the law of cosines may work just fine for you. Both these formulas assume spherical earth, whereas Vicenty's iterative solution (most accurate) assumes ellipsoidal earth (in reality the earth is not even an ellipsoid - it is a geoid). Some references: http://www.movable-type.co.uk/scripts/gis-faq-5.1.html
It gets better: note the latitude to be used in the law of cosines as well as the Haversine is the geocentric latitude, which is different from geodetic latitude. For a sphere, these two are the same.
Which one is fastest to compute?
In order from fastest to slowest are: law of cosines (5 trig. calls) -> haversine (involves sqrt) -> Vicenty (have to solve this iteratively in a for loop)
Which one is most accurate?
Vicenty.
Which one is best when speed and accuracy are both considered?
If your problem domain is such that for the distances you are trying to calculate, the earth can be considered as flat, then you can work out (I am not going to give details) a formula of the form x = kx * difference in longitude, y = ky * difference in latitude. Then distance = sqrt(dxdx + dydy). If your problem domain is such that it can be solved with distance squared, then you won't have to take sqrt, and this formula will be as fast as you get possibly get. It has the added advantage that you can calculate the vector distance - x is distance in east direction, and y is distance in the north direction. Otherwise, experiment with the 3 and choose what works best in your situation.
Solution 2:
So you want to:
- sort records by distance from p0
- select only records whose distance from p0 is less than r
The trick is that you don't exactly need to compute the great circle distance for that! You can do with any function from a pair of points to a real value that strictly grows with the great circle distance between the points. There are many such functions and some are much faster to compute than the various formulas for the exact great circle distance. One such function is the Euclidean distance in 3D. Converting latitude and longitude to a 3D point on the sphere doesn't involve inverse trigonometric functions.
Once you have x,Y,Z, you can realize that you don't actually need the distance from p0 to your point, because you can as well use the distance from the tangent plane at p0. That distance also strictly grows with the great circle distance, and is computed from X,Y,Z as a linear combination - not even a square root is needed. You just need to precompute the coefficients and the cutoff distance that corresponds to the desired great circle distance.