MySQL Long Query Progress Monitoring
Just to preface my question, I understand that there is no direct support for something like this. What I am looking for is any sort of work-around, or convoluted derivation that would get me a half-respectable result.
I am working with a rather large MySQL cluster (tables > 400 million rows) using the cluster engine.
Is anyone aware of a way to either directly retrieve or otherwise derive a somewhat (or better) accurate indication of progress through a long-running query in MySQL? I have some queries that can take up to 45 minutes, and I need to determine whether we're 10% or 90% of the way through the processing.
EDIT:
As requested in the comments, here is a distilled, genericized version of one of the queries that led to my original question...
SELECT `userId`
FROM `openEndedResponses` AS `oe`
WHERE
`oe`.`questionId` = 3 -- zip code
AND (REPLACE( REPLACE( `oe`.`value`, ' ', '' ), '-', '' ) IN ( '30071', '30106', '30122', '30134', '30135', '30168', '30180', '30185', '30187', '30317', '30004' ));
This query is run against a single table with ~95 million rows. It takes 8 seconds to run the query and another 13 to transfer the data (21 seconds total). Considering the size of the table and the fact that string manipulation functions are being used, I'd say it's running pretty damn fast. However, to the user, it's still 21 seconds of appearing either stuck or idle. Some indication of progress would be ideal.
I know this is an old question, but I was looking for a similar answer, when trying to figure out how much longer my update would take on a query of 250m rows.
If you run:
SHOW ENGINE INNODB STATUS \G
Then, under TRANSACTIONS, find the transaction in question and examine this section:
---TRANSACTION 34282360, ACTIVE 71195 sec starting index read
mysql tables in use 2, locked 2
1985355 lock struct(s), heap size 203333840, 255691088 row lock(s), undo log entries 21355084
The important bit is "undo log entries". In my case, each updated row seemed to add one undo log entry (try running the command again after a few seconds and see how many have been added).
If you skip to the end of the status report, you'll see this:
Number of rows inserted 606188224, updated 251615579, deleted 1667, read 54873415652
0.00 inserts/s, 1595.44 updates/s, 0.00 deletes/s, 3190.88 reads/s
Here we can see that the speed at which updates are being applied is 1595.44 rows per second (although if you're running other update queries in tandem, this rate may be split across your queries).
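Pulling those two numbers out of the status dump is easy to script. A minimal Python sketch (an illustration, not part of the original answer; the exact wording of the status output can vary between InnoDB versions, so the patterns may need adjusting):

```python
import re

# A fragment of SHOW ENGINE INNODB STATUS output, as quoted above.
status = """
---TRANSACTION 34282360, ACTIVE 71195 sec starting index read
mysql tables in use 2, locked 2
1985355 lock struct(s), heap size 203333840, 255691088 row lock(s), undo log entries 21355084
Number of rows inserted 606188224, updated 251615579, deleted 1667, read 54873415652
0.00 inserts/s, 1595.44 updates/s, 0.00 deletes/s, 3190.88 reads/s
"""

# Rows modified so far by the in-flight transaction.
undo_entries = int(re.search(r"undo log entries (\d+)", status).group(1))
# Current server-wide update rate.
updates_per_sec = float(re.search(r"([\d.]+) updates/s", status).group(1))

print(undo_entries)     # 21355084
print(updates_per_sec)  # 1595.44
```

In practice you'd capture the status text via `mysql -e "SHOW ENGINE INNODB STATUS\G"` (or your client library) and poll it periodically.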
So from this, I know ~21m rows have been updated, with (250m - 21m) 229m rows left to go.
229,000,000 / 1,600 = 143,125 seconds to go
143,125 / 60 / 60 = 39.76 hours to go
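That back-of-the-envelope estimate generalizes to a one-liner. A small sketch (the function name and rounding are my own, not from the answer):

```python
def eta_hours(total_rows, rows_done, rows_per_sec):
    """Estimate remaining hours for a long-running statement,
    assuming the current update rate holds steady."""
    remaining = total_rows - rows_done
    return remaining / rows_per_sec / 3600.0

# Figures from the answer above: 250m rows total, ~21m done, ~1,600 rows/s.
print(round(eta_hours(250_000_000, 21_000_000, 1600), 2))  # 39.76
```

Note the estimate is only as good as the assumption that the rate stays constant; re-polling the status output and recomputing gives a rolling estimate.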
So it would appear I can twiddle my thumbs for another couple of days. Unless this answer is wrong, in which case I'll update it sometime before then!
I was able to estimate something like this by querying the number of rows to process then breaking the processing into a loop, working on only a subset of the total rows at a time.
The full loop was rather involved, but the basic logic went like this (the syntax below is SQL Server-flavored pseudocode; `WITH (NOLOCK)` is a SQL Server table hint, so adapt it to your dialect):
SELECT @minID = Min(keyColumn) FROM table WHERE condition
SELECT @maxID = Max(keyColumn) FROM table WHERE condition
SELECT @potentialRows = (@maxID - @minID) / @iterations
WHILE @minID < @maxID
BEGIN
SET @breakID = @minID + @potentialRows
SELECT columns FROM table WITH (NOLOCK, ...)
WHERE condition AND keyColumn BETWEEN @minID AND @breakID
SET @minID = @breakID + 1
END
Note this works best if IDs are evenly distributed.
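The range-splitting step of the loop above can be sketched on its own. A small Python illustration (the helper name and chunking details are mine, not from the answer), where the number of completed chunks doubles as a progress indicator:

```python
def chunk_ranges(min_id, max_id, iterations):
    """Split the inclusive key range [min_id, max_id] into roughly
    equal sub-ranges, mirroring the @minID/@breakID loop above."""
    step = max(1, (max_id - min_id) // iterations)
    lo = min_id
    while lo <= max_id:
        hi = min(lo + step, max_id)
        yield lo, hi       # run the real query with: keyColumn BETWEEN lo AND hi
        lo = hi + 1

ranges = list(chunk_ranges(1, 100, 4))
print(ranges)  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```

As the note above says, this only tracks progress well when IDs are evenly distributed; with sparse or skewed keys, some chunks will contain far more rows than others.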