MongoDB, performance of query by regular expression on indexed fields

Actually according to the documentation,

If an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan. Further optimization can occur if the regular expression is a “prefix expression”, which means that all potential matches start with the same string. This allows MongoDB to construct a “range” from that prefix and only match against those values from the index that fall within that range.

http://docs.mongodb.org/manual/reference/operator/query/regex/#index-use

In other words:

For /Jon Skeet/ regex ,mongo will full scan the keys in the index then will fetch the matched documents, which can be faster than collection scan.

For /^Jon Skeet/ regex ,mongo will scan only the range that start with the regex in the index, which will be faster.

In case anyone still has an issue with search performance, there is a way to optimize regex search even if it searches for a word in a sentence (not necessarily at the beginning ^ or the end $ of the string).

The field should have a text index

db.someCollection.createIndex({ someField: "text" })

and the queries on should use regex only after performing a plain search first

db.someCollection.find({ $and: 
  [
    { $text: { $search: "someWord" }}, 
    { someField: { $elemMatch: {$regex: /test/ig, $regex: /other/ig}}}
  ]
})

This ensures that the regex will run only for the results of the initial, plain search, which should be quite fast thanks to the index on this field. It might have a huge impact on search performance, depending on how large the collection is.

MongoDB, performance of query by regular expression on indexed fields

Related

Recent Posts