Which is fastest? SELECT SQL_CALC_FOUND_ROWS FROM `table`, or SELECT COUNT(*)
When you limit the number of rows to be returned by a SQL query, usually used in paging, there are two methods to determine the total number of records:
Method 1
Include the SQL_CALC_FOUND_ROWS
option in the original SELECT
, and then get the total number of rows by running SELECT FOUND_ROWS()
:
SELECT SQL_CALC_FOUND_ROWS * FROM table WHERE id > 100 LIMIT 10;
SELECT FOUND_ROWS();
Method 2
Run the query normally, and then get the total number of rows by running SELECT COUNT(*)
SELECT * FROM table WHERE id > 100 LIMIT 10;
SELECT COUNT(*) FROM table WHERE id > 100;
Which method is the best / fastest?
It depends. See the MySQL Performance Blog post on this subject: To SQL_CALC_FOUND_ROWS
or not to SQL_CALC_FOUND_ROWS
?
Just a quick summary: Peter says that it depends on your indexes and other factors. Many of the comments to the post seem to say that SQL_CALC_FOUND_ROWS
is almost always slower - sometimes up to 10x slower - than running two queries.
MySQL has started deprecating SQL_CALC_FOUND_ROWS
functionality with version 8.0.17 onwards.
So, it is always preferred to consider executing your query with LIMIT
, and then a second query with COUNT(*)
and without LIMIT
to determine whether there are additional rows.
From docs:
The SQL_CALC_FOUND_ROWS query modifier and accompanying FOUND_ROWS() function are deprecated as of MySQL 8.0.17 and will be removed in a future MySQL version.
COUNT(*) is subject to certain optimizations. SQL_CALC_FOUND_ROWS causes some optimizations to be disabled.
Use these queries instead:
SELECT * FROM tbl_name WHERE id > 100 LIMIT 10; SELECT COUNT(*) WHERE id > 100;
Also, SQL_CALC_FOUND_ROWS
has been observed to having more issues generally, as explained in the MySQL WL# 12615 :
SQL_CALC_FOUND_ROWS has a number of problems. First of all, it's slow. Frequently, it would be cheaper to run the query with LIMIT and then a separate SELECT COUNT() for the same query, since COUNT() can make use of optimizations that can't be done when searching for the entire result set (e.g. filesort can be skipped for COUNT(*), whereas with CALC_FOUND_ROWS, we must disable some filesort optimizations to guarantee the right result)
More importantly, it has very unclear semantics in a number of situations. In particular, when a query has multiple query blocks (e.g. with UNION), there's simply no way to calculate the number of “would-have-been” rows at the same time as producing a valid query. As the iterator executor is progressing towards these kinds of queries, it is genuinely difficult to try to retain the same semantics. Furthermore, if there are multiple LIMITs in the query (e.g. for derived tables), it's not necessarily clear to which of them SQL_CALC_FOUND_ROWS should refer to. Thus, such nontrivial queries will necessarily get different semantics in the iterator executor compared to what they had before.
Finally, most of the use cases where SQL_CALC_FOUND_ROWS would seem useful should simply be solved by other mechanisms than LIMIT/OFFSET. E.g., a phone book should be paginated by letter (both in terms of UX and in terms of index use), not by record number. Discussions are increasingly infinite-scroll ordered by date (again allowing index use), not by paginated by post number. And so on.
When choosing the "best" approach, a more important consideration than speed might be the maintainability and correctness of your code. If so, SQL_CALC_FOUND_ROWS is preferable because you only need to maintain a single query. Using a single query completely precludes the possibility of a subtle difference between the main and count queries, which may lead to an inaccurate COUNT.
According to the following article: https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
If you have an INDEX on your where clause (if id is indexed in your case), then it is better not to use SQL_CALC_FOUND_ROWS and use 2 queries instead, but if you don't have an index on what you put in your where clause (id in your case) then using SQL_CALC_FOUND_ROWS is more efficient.
IMHO, the reason why 2 queries
SELECT * FROM count_test WHERE b = 666 ORDER BY c LIMIT 5;
SELECT count(*) FROM count_test WHERE b = 666;
are faster than using SQL_CALC_FOUND_ROWS
SELECT SQL_CALC_FOUND_ROWS * FROM count_test WHERE b = 555 ORDER BY c LIMIT 5;
has to be seen as a particular case.
It in facts depends on the selectivity of the WHERE clause compared to the selectivity of the implicit one equivalent to the ORDER + LIMIT.
As Arvids told in comment (http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/#comment-1174394), the fact that the EXPLAIN use, or not, a temporay table, should be a good base for knowing if SCFR will be faster or not.
But, as I added (http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/#comment-8166482), the result really, really depends on the case. For a particular paginator, you could get to the conclusion that “for the 3 first pages, use 2 queries; for the following pages, use a SCFR” !