MySQL Query IN() Clause Slow on Indexed Column

I Have a MySQL query that is being generated by a PHP script, the query will look something like this:

SELECT * FROM Recipe_Data WHERE 404_Without_200 = 0 AND Failures_Without_Success = 0 AND RHD_No IN (10, 24, 34, 41, 43, 51, 57, 59, 61, 67, 84, 90, 272, 324, 402, 405, 414, 498, 500, 501, 510, 559, 562, 595, 632, 634, 640, 643, 647, 651, 703, 714, 719, 762, 765, 776, 796, 812, 814, 815, 822, 848, 853, 855, 858, 866, 891, 920, 947, 956, 962, 968, 1049, 1054, 1064, 1065, 1070, 1100, 1113, 1119, 1130, 1262, 1287, 1292, 1313, 1320, 1327, 1332, 1333, 1335, 1340, 1343, 1344, 1346, 1349, 1352, 1358, 1362, 1365, 1482, 1495, 1532, 1533, 1537, 1549, 1550, 1569, 1571, 1573, 1574, 1596, 1628, 1691, 1714, 1720, 1735, 1755, 1759, 1829, 1837, 1844, 1881, 1919, 2005, 2022, 2034, 2035, 2039, 2054, 2076, 2079, 2087, 2088, 2089, 2090, 2091, 2092, 2154, 2155, 2156, 2157, 2160, 2162, 2164, 2166, 2169, 2171, 2174, 2176, 2178, 2179, 2183, 2185, 2186, 2187, 2201, 2234, 2236, 2244, 2245, 2250, 2255, 2260, 2272, 2280, 2281, 2282, 2291, 2329, 2357, 2375, 2444, 2451, 2452, 2453, 2454, 2456, 2457, 2460, 2462, 2464, 2465, 2467, 2468, 2469, 2470, 2473, 2474, 2481, 2485, 2487, 2510, 2516, 2519, 2525, 2540, 2545, 2547, 2553, 2571, 2579, 2580, 2587, 2589, 2597, 2602, 2611, 2629, 2660, 2662, 2700, 2756, 2825, 2833, 2835, 2858, 2958, 2963, 2964, 3009, 3090, 3117, 3118, 3120, 3121, 3122, 3123, 3126, 3127, 3129, 3130, 3133, 3135, 3137, 3138, 3139, 3141, 3142, 3145, 3146, 3147, 3151, 3152, 3155, 3193, 3201, 3204, 3219, 3221, 3222, 3223, 3224, 3225, 3226, 3227, 3228, 3229, 3231, 3232, 3233, 3234, 3235, 3237, 3239, 3246, 3250, 3253, 3259, 3261, 3291, 3315, 3328, 3377, 3381, 3383, 3384, 3385, 3387, 3388, 3389, 3390, 3396, 3436, 3463, 3465, 3467, 3470, 3471, 3484, 3507, 3515, 3554, 3572, 3641, 3672, 3683, 3689, 3690, 3692, 3693, 3694, 3697, 3698, 3705, 3711, 3713, 3715, 3716, 3717, 3719, 3720, 3722, 3726, 3727, 3732, 3737, 3763, 3767, 3770, 3771, 3772, 3773, 3803, 3810, 3812, 3816, 3846, 3847, 3848, 3851, 3874, 3882, 3902, 3903, 3906, 3908, 3916, 3924, 3967, 3987, 4006, 4030, 4043, 4045, 4047, 4058, 4067, 4107, 4108, 4114, 4115, 4131, 4132, 4133, 4137, 4138, 4139, 4140, 4141, 4142, 4146, 4150, 4151, 4152, 4153, 4157, 4158, 4160, 4163, 4166, 4167, 4171, 4179, 4183, 4221, 4225, 4242, 4257, 4435, 4437, 4438, 4443, 4446, 4449, 4450, 4451, 4452, 4454, 4460, 4550, 4557, 4618, 4731, 4775, 4804, 4972, 5025, 5026, 5039, 5042, 5294, 5578, 5580, 5599, 5602, 5649, 5726, 5779, 5783, 5931, 5934, 5936, 5939, 5940, 5941, 5978, 6044, 6056, 6113, 6116, 6118, 6122, 6123, 6125, 6127, 6128, 6129, 6130, 6131, 6135, 6141, 6145, 6147, 6150, 6152, 6153, 6154, 6160, 6166, 6169);

The column RHD_No is the primary key for this database, and there are about 400,000 rows total. The problem is, the query is extremely slow, it's often around 2 seconds, but I've seen it get as long as 10.

When I try to explain the query, everything seems like it should be fine:

+----+-------------+-------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table       | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | Recipe_Data | range | PRIMARY       | PRIMARY | 4       | NULL |  420 | Using where |
+----+-------------+-------------+-------+---------------+---------+---------+------+------+-------------+

When I profile the query I get:

mysql> show profile;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000015 |
| checking query cache for query | 0.000266 |
| Opening tables                 | 0.000009 |
| System lock                    | 0.000004 |
| Table lock                     | 0.000006 |
| init                           | 0.000115 |
| optimizing                     | 0.000038 |
| statistics                     | 0.000797 |
| preparing                      | 0.000047 |
| executing                      | 0.000002 |
| Sending data                   | 2.675270 |
| end                            | 0.000007 |
| query end                      | 0.000003 |
| freeing items                  | 0.000071 |
| logging slow query             | 0.000002 |
| logging slow query             | 0.000058 |
| cleaning up                    | 0.000005 |
+--------------------------------+----------+

I've been working on this problem for a long time and I haven't been able to find a solution. Is there anything overtly wrong with this query? I don't see how looking at 420 rows should take 2+ seconds.


Solution 1:

You are accessing 420 rows by primary key which will probably lead to an index access path. This could access 2 index pages and one data page per key. If these are in cache, the query should run fast. If not, every page access that goes to disk will incur the usual disk latency. If we assume 5ms disk latency and 80% cache hits, we arrive at 420*3*0.2*5ms=1.2 seconds which is on the order of what you're seeing.

Solution 2:

The problem is that IN is basically treated as a bunch of ORs (e.g.

col IN (1,2,3)

is

col = 1 OR col = 2 OR col = 3

This is a LOT slower than a join.

What you should do is to generate the SQL code which creates the temporary table, populates it with the values in the "IN" clause, and then join with that temp table

CREATE TEMPORARY TABLE numbers (n INT)

Then in a loop, add

INSERT numbers  VALUES ($next_number)

Then at the end

SELECT * FROM numbers, Recipe_Data 
WHERE numbers.n = RHD_No

Solution 3:

You should transform the IN clauses to INNER JOIN clauses.

You can transform a query like this one:

SELECT  foo   
FROM    bar   
WHERE bar.stuff IN  
       (SELECT  stuff FROM asdf)

Into a query like this other one:

SELECT  b.foo 
FROM    ( 
        SELECT  DISTINCT stuff 
        FROM    asdf ) a 
JOIN    bar b 
ON      b.stuff = a.stuff

You will gain a lot of performance.

As the php generate the query, try some kind of trick like a temporary table for the items inside the IN clause. Always try to avoid the IN clauses if you can, because they are very time consuming.