MongoTransactionException: Query failed with error code 251

Faced this Mongo exception when query a document in a transaction, with a specific id only (all other ids are fine):

Exception occured: org.springframework.data.mongodb.MongoTransactionException: 
Query failed with error code 251 and error message 
'Given transaction number 1 does not match any in-progress transactions. The active transaction number is -1' on server mongo-1:27017; 
nested exception is com.mongodb.MongoQueryException: 
Query failed with error code 251 and error message 
'Given transaction number 1 does not match any in-progress transactions. The active transaction number is -1' on server mongo-1:27017 
at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:136) 
at org.springframework.data.mongodb.core.MongoTemplate.potentiallyConvertRuntimeException(MongoTemplate.java:2902) 
at org.springframework.data.mongodb.core.MongoTemplate.executeFindMultiInternal(MongoTemplate.java:2810) 
at org.springframework.data.mongodb.core.MongoTemplate.doFind(MongoTemplate.java:2555) 
at org.springframework.data.mongodb.core.ExecutableFindOperationSupport$ExecutableFindSupport.doFind(ExecutableFindOperationSupport.java:216) 
at org.springframework.data.mongodb.core.ExecutableFindOperationSupport$ExecutableFindSupport.oneValue(ExecutableFindOperationSupport.java:128) 
at org.springframework.data.mongodb.repository.query.AbstractMongoQuery.lambda$getExecution$4(AbstractMongoQuery.java:153) 
at org.springframework.data.mongodb.repository.query.AbstractMongoQuery.doExecute(AbstractMongoQuery.java:126) 
at org.springframework.data.mongodb.repository.query.AbstractMongoQuery.execute(AbstractMongoQuery.java:101) 
at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.doInvoke(RepositoryFactorySupport.java:618) 

The MongoDB is a sharded cluster, and similar action is performed against a completely similar cluster (in another test environment) but no exception occurs.

Another info, this exception is always after a MongoDB socket exception as below:

Got socket exception on connection [connectionId{localValue:9, serverValue:11760}] to mongo-1:27017. All connections to mongo-1:27017 will be closed.

We had a similar issue in our project: some tests were failing with 251 'Given transaction number * does not match any in-progress transactions'. We wrote a test to reproduce the issue and then debugged it. I am 90% sure that we found the issue.

So the hypothesis is as follows:
when we use spring mongo transactions, actual transaction is started when the first request is sent to database (it does not matter what kind of request - query, update etc.). So, when first request is sent in parallel with any other requests, but was not able to complete before them, and they think that the transaction has been started (but it hasn't), then we see this error occur. "The transaction x+1 does not exist. Last transaction is x". What led to this conclusion is the following log sequence (we enabled trace logs during test execution):

  • 2021-07-15 12:56:41.331 DEBUG 4900 --- [ Test worker] org.mongodb.driver.protocol.command : Sending command '{"find": "items", "filter": {"_id": "d1"}, "limit": 1, "$db": "...", ..., "txnNumber": 1, "startTransaction": true, "autocommit": false, ...}' with request id 23 to database ... on connection ...
  • ...
  • 2021-07-15 12:56:41.334 DEBUG 4900 --- [ Test worker] org.mongodb.driver.protocol.command : Sending command '{"find": "items", "filter": {"_id": "d2"}, "limit": 1, "$db": "...", ..., "txnNumber": 1, "autocommit": false, ...}' with request id 24 to database ... on connection ...
  • ...
  • 2021-07-15 12:56:41.337 DEBUG 4900 --- [ntLoopGroup-3-6] org.mongodb.driver.protocol.command : Execution of command with request id 24 failed to complete successfully ...
    com.mongodb.MongoCommandException: Command failed with error 251 (NoSuchTransaction): 'Given transaction number 1 does not match any in-progress transactions. The active transaction number is -1' on server ...
  • ...
  • 2021-07-15 12:56:41.379 DEBUG 4900 --- [ntLoopGroup-3-8] org.mongodb.driver.protocol.command : Execution of command with request id 23 completed successfully ...

Notice how in first log we have "startTransaction": true, but in second - we don't. Yet, second request finished first, thus it errored. This error also occures if second request is starting in between first request start and first request complition, e.g.: first request starts, second request start, first request completes, second request completes.

Test code (in kotlin):

Service:

@Service
class TestService {

    @Autowired private lateinit var itemRepository: ItemRepository

    @Transactional
    fun parallelQuery(): Mono<Void> {
        return Flux.fromIterable(listOf("d1", "d2"))
            .flatMap {
                itemRepository.findById(it)
            }
            .then()
    }
}

Test itself:

internal class MonoZipTest: BaseTest() {
    @Autowired private lateinit var testService: TestService

    @Test
    fun test() {

        val mono =
            Flux.fromIterable(1..1000)
                .flatMap {
                    testService.parallelQuery()
                }
                .then()

        StepVerifier.create(mono)
            .expectError(MongoTransactionException::class.java)
            .verify()
    }
}

Conclusion: Ensure that first request to mongodb in transaction is not done in parallel with any other request in that transaction.


A little disclaimer: this is not a proper answer as I'm not sure that it could help the original poster; also, I have no proper explanation for this effect. I just wanted to let people know about this effect; usually I would use a comment for this, but it's too long for a comment. I post this in a hope that it could help someone to solve their problem.

We saw a similar exception recently:

Caused by: com.mongodb.MongoQueryException: Query failed with error code 251 and error message 'Given transaction number 198 does not match any in-progress transactions. The active transaction number is 197' on server aaa.bbb.ccc:27017
    at com.mongodb.operation.FindOperation$3.onResult(FindOperation.java:822)

I cannot show the code, so I will try to explain what it does.

In a non-reactive setting, it would be like this:

X = getX()
Y = saveAndReturnY()
pair = (X, Y)
processPair(pair)

This logic was implemented, and it worked fine (I do not show the working code version, it's not relevant). This code was running under transaction.

Then, I refactored this code to simplify it. It ended up looking like this:

getX() // Mono<X>
    .zipWith(saveAndReturnY()) // Mono<Tuple<X, Y>>
    .flatMap(this::processPair)

And with this code we observed the exception above. It did not happen 100% of times, something like 50/50: sometimes it did, sometimes it did not. This was only happening on MongoDB Atlas. Our integration tests using a MongoDB of the same version (4.2.9) working in a Docker container (via testcontainers) worked fine and we never managed to reproduce the problem there.

Long story short: it turned out that if we replace zipWith() with zipWhen(), the problem goes away:

getX() // Mono<X>
    .zipWhen(x -> saveAndReturnY()) // Mono<Tuple<X, Y>>
    .flatMap(this::processPair)

The difference is that with zipWith(), subscriptions happen at the same time to both Monos zipped together, that is to Mono<X> and Mono<Y>, so they execute "in parallel". But for zipWhen(), they execute strictly sequentially: you first get X out of Mono<X>, then you subscribe to Mono<Y>. In the original code (before refactoring), it worked in the same (sequential, or, probably, it's better to say, causal) way.

I'm not sure why this happened for us, we are still investigating. Maybe there is some problem that was just surfaced by this parallelism. Or there is something in MongoDB transactions that I do not understand (yet) that makes such exceptions happen when parallelism kicks off.

Anyway, in our case, getting rid of the parallelism solved the problem. Try analyzing your code and see whether there are some operators like Mono.zipWith() or Flux.flatMap() that subscribe to more than one Publisher eagerly causing such a "parallelism" effect.