MongoDB cursor id not valid error
I am trying to iterate through this loop:
for doc in coll.find():
I get the following error somewhere past the 100,000th record.
File "build\bdist.win32\egg\pymongo\cursor.py", line 703, in next
File "build\bdist.win32\egg\pymongo\cursor.py", line 679, in _refresh
File "build\bdist.win32\egg\pymongo\cursor.py", line 628, in __send_message
File "build\bdist.win32\egg\pymongo\helpers.py", line 95, in _unpack_response
pymongo.errors.OperationFailure: cursor id '1236484850793' not valid at server
What does this error mean?
Solution 1:
Maybe your cursor timed out on the server. To see if this is the problem, try setting timeout=False (in PyMongo 3.0 and later, the equivalent option is no_cursor_timeout=True):
for doc in coll.find(timeout=False):
See http://api.mongodb.org/python/1.6/api/pymongo/collection.html#pymongo.collection.Collection.find
If it was a timeout problem, one possible solution is to set the batch_size (see the other answers).
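If you do disable the timeout, the server keeps the cursor open until it is exhausted or closed, so it helps to close it explicitly even when the loop fails. A minimal sketch, assuming a hypothetical helper process_all and a handle callback (neither is part of PyMongo); it relies only on the cursor having a close() method, which PyMongo cursors do:

```python
from contextlib import closing


def process_all(cursor, handle):
    """Call handle(doc) for every document, always closing the cursor."""
    # closing() calls cursor.close() on exit, including when handle() raises.
    with closing(cursor):
        for doc in cursor:
            handle(doc)
```

It could be called as process_all(coll.find(timeout=False), do_something), where do_something stands in for your own per-document function.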
Solution 2:
- Setting timeout=False is dangerous and should never be used, because the connection to the cursor can remain open for unlimited time, which will affect system performance. The docs specifically reference the need to manually close the cursor.
- Setting batch_size to a small number will work, but creates a big latency issue, because we need to access the DB more often than needed.
For example, reading 5M docs with a small batch can take hours, whereas the default batch_size returns the same data in several minutes.
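To see where the latency comes from, it may help to count how many times the driver has to go back to the server. A rough sketch, assuming every batch holds exactly batch_size documents; the function name round_trips and the two batch sizes are made up for illustration:

```python
def round_trips(total_docs, batch_size):
    """Number of batches needed to read total_docs documents."""
    # Ceiling division using integers only.
    return -(-total_docs // batch_size)


TOTAL_DOCS = 5_000_000
for batch_size in (10, 1_000):  # hypothetical batch sizes
    print(batch_size, round_trips(TOTAL_DOCS, batch_size))
```

With a batch of 10 this is 500,000 fetches against 5,000 with a batch of 1,000, and each fetch pays the network latency again.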
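My solution is to re-issue the query whenever the cursor expires and skip the documents already processed. Sorting the cursor is mandatory here, so that skip resumes at the right document: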
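import pymongo

done = False
skip = 0  # number of documents already processed
while not done:
    # Re-issue the query, resuming after the documents handled so far.
    cursor = coll.find()
    cursor.sort(indexed_parameter)  # recommended to use time or other sequential parameter.
    cursor.skip(skip)
    try:
        for doc in cursor:
            skip += 1
            do_something()
        done = True
    except pymongo.errors.OperationFailure as e:
        msg = str(e)
        # Only the expired-cursor error is retried; anything else is re-raised.
        if not (msg.startswith("cursor id") and msg.endswith("not valid at server")):
            raise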
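The same retry idea can be wrapped in a generator so the calling loop stays simple. This is only a sketch under assumptions: iter_with_resume, make_cursor and is_expired_cursor_error are hypothetical names, not PyMongo APIs. make_cursor(skip) must return a freshly sorted cursor that starts after the first skip documents, and is_expired_cursor_error(exc) decides whether an exception is the expired-cursor error.

```python
def iter_with_resume(make_cursor, is_expired_cursor_error):
    """Yield every document, re-opening the cursor if it expires mid-way."""
    seen = 0  # documents already handed to the caller
    while True:
        try:
            for doc in make_cursor(seen):
                seen += 1
                yield doc
            return  # cursor exhausted normally
        except Exception as exc:
            # Retry only for the expired-cursor error; re-raise anything else.
            if not is_expired_cursor_error(exc):
                raise
```

With PyMongo, make_cursor could be lambda skip: coll.find().sort(indexed_parameter).skip(skip), and the check could test for pymongo.errors.OperationFailure plus the message text as in the loop above. Keep in mind that skip still makes the server walk past the skipped documents on every retry.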
Solution 3:
Setting timeout=False is a very bad practice. A better way to get rid of the cursor id timeout exception is to estimate how many documents your loop can process within 10 minutes, and come up with a conservative batch size. This way, the MongoDB client (in this case, PyMongo) has to query the server again each time the documents in the previous batch are used up. This keeps the cursor active on the server, and you are still covered by the 10-minute timeout protection.
Here is how you set batch size for a cursor:
for doc in coll.find().batch_size(30):
    do_time_consuming_things()
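One way to arrive at a number like 30 is to measure how long a single document takes and leave some headroom. A minimal sketch, assuming the 10-minute timeout mentioned above; the function name conservative_batch_size, the safety_factor of 0.5 and the 10 seconds per document are illustrative assumptions, not values from PyMongo:

```python
def conservative_batch_size(seconds_per_doc, timeout_seconds=600, safety_factor=0.5):
    """Batch size that should be processed well within the cursor timeout."""
    if seconds_per_doc <= 0:
        raise ValueError("seconds_per_doc must be positive")
    docs_within_timeout = timeout_seconds / seconds_per_doc
    # Use only a fraction of the window, and never go below one document.
    return max(1, int(docs_within_timeout * safety_factor))


# Hypothetical measurement: each document takes about 10 seconds to process.
print(conservative_batch_size(10))
```

If a single document can take longer than the timeout itself, no batch size will help and the work should be restructured instead.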