Whats the fastest way to lookup big tables for points within radius MySQL (latitude longitude)

Currently I have a few tables with 100k+ rows. I am trying to lookup the data like follows.

SELECT
*, SQRT(POW(69.1 * (latitude - '49.1044302'), 2) + POW(69.1 * ('-122.801094' - longitude) * COS(latitude / 57.3), 2)) AS distance
FROM stops
HAVING distance < 5
ORDER BY distance limit 100

But currently this method slows with high load. Some queries are taking 20+ seconds to complete.

If anyone knows any better ways to optimize this would be great.


Solution 1:

Well first of all if you have a lot of geospatial data, you should be using mysql's geospatial extensions rather than calculations like this. You can then create spatial indexes that would speed up many queries and you don't have to write long drawn out queries like the one above.

Using a comparision with ST_Distance or creating a geometry with the radius of interest along with ST_within might give you good results and could be a lot faster than the current. However the best and fastest way to achieve this, ST_Dwithin isn't implemented yet in mysql.

Solution 2:

Spatial indexes definitely depend on MySQL version. Our sites search lat/lons as well, but we are using an older version of MySQL (5.1-something) (no spatial indexes). Your query is similar to ours, but ours is based on radians. Depending on your exact needs, you can optimize it (from what you have) quite a bit.

  1. Definitely drop the sqrt() from the database query, it has to compute for every row -- only compute it at the end when displaying the actual distance to a user -- also square the "having distance < 5" to "< 25". Sqrt is expensive and easily moved to where it doesn't need to be computed.
  2. unquote the lat/lon '49.1044302' so it is strictly an int, and do lat/lon type checking outside the query. This won't speed it up, but will prevent incorrect casting due to spurious trailing whitespaces in your lat/lon variable.
  3. Convert the 5 into the actual lat/lot degrees difference in each direction +/5 to produce a limiting range (a box as it were). Add it to the "where" part of the query -- this restriction will get you a substantially reduced, almost exact result row set -- basically the x and y +/- range on the lat and lon is the upper bound on the results -- the diagonals being computed only nuance the results and their distances slightly.
  4. Move as much of the math outside the select and where -- it'll have to scan the whole table and create a temporary one, computed on every row, to give you those results. A lot of the math in the query can be converted to a constant.
  5. Speed up the row reduction (the selecting box) even more by lowering the resolution on the lat/lon (copying) into another field (and perhaps multiply by 10 or 100 & convert to an INT) and adding an index on that field, and using that field with +/- bounds in the where, at least then it will be able to use a key -- mysql can reduce and those results much faster.

At least that's how we do it.