Understanding IXSCAN and COLLSCAN in MongoDB logs

I'm attempting to grep through some MongoDB logs to find slow operations that I need to optimize. Slow query logging is at the default threshold, so operations over 100ms are logged.

I think it's safe to say that, generally speaking, a search for COLLSCAN will show queries that need attention. What's less clear is whether IXSCAN is a detail I should search on.

Considering the MongoDB documentation here:

https://docs.mongodb.com/manual/reference/explain-results/#collection-scan-vs-index-use

my understanding is that this is a binary situation: a query is either a COLLSCAN or an IXSCAN. So if I grep for IXSCAN, I will be looking at ALL the slow queries that are not COLLSCANs. Is this true?


I'm attempting to grep through some MongoDB logs to find slow operations that I need to optimize. Slow query logging is at the default threshold, so operations over 100ms are logged.

Rather than grepping through MongoDB logs, I would highly recommend using the scripts from the open source mtools project. NOTE: I'm not the original mtools author, but I am a contributor.

mtools is a collection of Python scripts that was inspired by the pain of grepping through GBs of logs to try to find information of interest for production MongoDB deployments. Key scripts are intended to fit in with the typical command line workflow of piping output through successive filters (eg. mlogfilter --scan | mplotqueries).

For example:

  • mloginfo --queries is a good starting point: it aggregates query patterns so you can focus on queries that run frequently and have more overall impact on your deployment.
  • mlogfilter is essentially a grep for MongoDB logs: you can filter log lines by namespace, duration, connection, pattern, and other criteria. The --scan option is helpful to identify inefficient queries that aren't necessarily a collection scan.
  • mplotqueries is a tool for visualizing logs, which can be very helpful to identify patterns and outliers.
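If you do want to stay with plain shell tools, a rough approximation of mlogfilter's namespace and duration filtering might look like the sketch below. The log lines, namespace, and field layout are made up for illustration (legacy pre-4.4 log format); real lines vary by server version.

```shell
# Hypothetical legacy-format slow query log lines for illustration only.
cat > sample.log <<'EOF'
2023-05-01T10:15:23.456+0000 I COMMAND [conn12] command test.users command: find { filter: { age: { $gt: 30 } } } planSummary: COLLSCAN docsExamined:100000 nreturned:42 protocol:op_msg 350ms
2023-05-01T10:15:24.100+0000 I COMMAND [conn13] command test.users command: find { filter: { _id: 5 } } planSummary: IXSCAN { _id: 1 } docsExamined:1 nreturned:1 protocol:op_msg 2ms
2023-05-01T10:15:25.200+0000 I COMMAND [conn14] command test.orders command: find { filter: {} } planSummary: COLLSCAN docsExamined:10 nreturned:10 protocol:op_msg 1ms
EOF

# Roughly what `mlogfilter sample.log --namespace test.users --slow 100`
# would do: keep lines for one namespace whose trailing duration >= 100ms.
awk '/ test\.users / && $NF ~ /^[0-9]+ms$/ {
  d = $NF; sub(/ms$/, "", d)
  if (d + 0 >= 100) print
}' sample.log
```

This prints only the first line: a slow (350ms) collection scan on test.users. The fast IXSCAN and the other namespace are filtered out.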

I think it's safe to say that, generally speaking, a search for COLLSCAN will show queries that need attention. What's less clear is whether IXSCAN is a detail I should search on.

Collection scans are generally of interest, but may also be the result of one-off queries or expected usage on a small collection. Instead of focusing on the query type, I would review slow queries (or slow operations in general) for your deployment to get a better understanding of what you might be able to improve. Using an index is usually good, but there are inefficient index usages (eg. in-memory sorting or case-insensitive regexes) that can be worth addressing.
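As a concrete example of inefficient index usage, slow query log lines flag a blocking in-memory sort with hasSortStage:1 (field names and layout vary by server version; the lines below are made up for illustration):

```shell
# Two hypothetical slow query log lines: both use an index, but the first
# also performs an in-memory (blocking) sort, flagged by hasSortStage:1.
cat > slow.log <<'EOF'
2023-05-01T11:00:01.000+0000 I COMMAND [conn20] command test.users command: find { filter: { age: { $gt: 30 } }, sort: { name: 1 } } planSummary: IXSCAN { age: 1 } hasSortStage:1 nreturned:500 150ms
2023-05-01T11:00:02.000+0000 I COMMAND [conn21] command test.users command: find { filter: { age: { $gt: 30 } } } planSummary: IXSCAN { age: 1 } nreturned:500 120ms
EOF

# Index usage alone isn't enough: surface slow queries that used an index
# but still had to sort results in memory.
grep 'hasSortStage:1' slow.log
```

Both lines would be missed by a COLLSCAN-only search, yet the first is a candidate for a compound index that covers the sort.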

my understanding is that this is a binary situation: a query is either a COLLSCAN or an IXSCAN. So if I grep for IXSCAN, I will be looking at ALL the slow queries that are not COLLSCANs. Is this true?

If you grep for IXSCAN you'll be finding all log lines that mention IXSCAN, but the slow query logging outcome is definitely not binary and will also vary by MongoDB server version. While efficient index usage is an obvious optimization, there are a number of internal query planner stages that may be relevant to understanding query performance.
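To see why a COLLSCAN/IXSCAN grep isn't exhaustive, you can pull the distinct plan summaries out of the log. The lines below are abbreviated and made up, but the stage names (COUNT_SCAN, IDHACK, etc.) are examples of summaries that are neither COLLSCAN nor IXSCAN:

```shell
# Abbreviated, hypothetical slow query log lines; real logs include more
# stages and (pre-4.4) different formatting.
cat > plans.log <<'EOF'
... planSummary: COLLSCAN ... 350ms
... planSummary: IXSCAN { a: 1 } ... 150ms
... planSummary: COUNT_SCAN { a: 1 } ... 110ms
... planSummary: IDHACK ... 105ms
EOF

# The plan summary is not just COLLSCAN-or-IXSCAN:
grep -o 'planSummary: [A-Z_]*' plans.log | sort -u
```

A grep for IXSCAN would match only one of these four slow queries, so it is a useful filter but not the complement of COLLSCAN.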

When you find an interesting slow query in the logs, the next step is usually reviewing more detailed explain output. I use explain(true) (aka allPlansExecution mode), as this shows details for the query plans that were considered as well as the winning plan. If you aren't sure how to interpret the explain output for a slow query, I'd recommend posting on DBA StackExchange.
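As a rough sketch of what to look for, the snippet below greps the stage names out of a saved explain document. The JSON is heavily abbreviated and hypothetical (real explain output is much larger, and rejectedPlans entries are full plan trees), but it shows that both the winning and rejected plans carry a stage hierarchy worth reading:

```shell
# Abbreviated, hypothetical explain("allPlansExecution") output.
cat > explain.json <<'EOF'
{
  "queryPlanner": {
    "winningPlan": {
      "stage": "FETCH",
      "inputStage": { "stage": "IXSCAN", "indexName": "age_1" }
    },
    "rejectedPlans": [ { "stage": "COLLSCAN" } ]
  },
  "executionStats": { "nReturned": 42, "totalDocsExamined": 42 }
}
EOF

# List every plan stage mentioned, winning and rejected alike:
grep -o '"stage": "[A-Z_]*"' explain.json
```

Here the winning plan is an index scan on a hypothetical age_1 index followed by a FETCH, and a collection scan was considered and rejected.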

It's worth noting that explaining a query is not a measure of the actual performance with a workload. In normal operation query plans are cached, whereas detailed explain output specifically re-evaluates the candidate indexes and query plan. See Query Plans in the MongoDB manual for more information.