How do mdfind and mdls derive what metadata a file has and where is it stored?

The man page of mdfind says the following:

The mdfind command consults the central metadata store and returns a list of files that match the given metadata query. The query can be a string or a query expression.

One would assume that the "central metadata store" refers to store.db under .Spotlight-V100 (as indicated in this answer), but mdls, which I assume uses the same "central metadata store", seems to work fine without any of the Spotlight files present. I tried the following commands on Catalina (10.15.7) and there was no delay in displaying the output of mdls, which suggests that the command does not use the Spotlight DB.

sudo rm -rf /System/Volumes/Data/.Spotlight-V100 ~/Library/Metadata/CoreSpotlight/ 
mdls ~/Downloads/MacVim.dmg

Manually using Spotlight with Command+Space, on the other hand, launches a lot of mdworker_shared processes and re-creates the .Spotlight-V100 directory.

I also do not see how mdls and mdfind could be using extended attributes, as indicated in this answer, since invoking xattr -l ~/Downloads/MacVim.dmg does not produce any output while mdls ~/Downloads/MacVim.dmg shows several kMD* attributes.


Solution 1:

mdfind consults the Spotlight database in order to provide search results faster than scanning the whole file system for each query.

mdls, on the other hand, does not rely on the Spotlight database, as it doesn't need to scan the file system: you specify the file you want to examine directly. It does, however, rely on the Spotlight API to provide the requested data.

That is, mdls does not itself contain code that tries to "parse" all sorts of file types or gather information from many places. Nor does it look anything up directly in the store.db file.
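The contrast between the two tools can be sketched as a small shell function. This is only an illustration: compare_lookups is a hypothetical helper name, and the two attributes queried (kMDItemContentType and kMDItemFSSize) are just examples of what mdls can report.

```shell
# compare_lookups FILE  (hypothetical helper, not part of macOS)
# Runs an index-backed name search with mdfind, then a direct
# per-file query with mdls, so the two outputs can be compared.
compare_lookups() {
    file=$1
    # mdfind and mdls ship with macOS only.
    if ! command -v mdfind >/dev/null 2>&1 || ! command -v mdls >/dev/null 2>&1; then
        echo "mdfind/mdls not available (macOS only)" >&2
        return 2
    fi
    if [ ! -e "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    # Index-backed: answered from the Spotlight database.
    mdfind -name "$(basename "$file")"
    # Per-file: the path is already known, so no search is needed.
    mdls -name kMDItemContentType -name kMDItemFSSize "$file"
}
```

Calling compare_lookups ~/Downloads/MacVim.dmg right after deleting the index would be expected to show mdls still answering, while mdfind may return nothing until the volume has been re-indexed.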

When you request the metadata for a specific file through the Spotlight API, information is gathered from a number of places and handed to the program (mdls in this case) in a uniform format. These sources include, for example:

  • the file system metadata
  • the extended attributes stored in the file system
  • information from application bundles and similar places
  • information gathered by the Spotlight importer plugin for the specific file type
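Some of the sources listed above can be inspected one at a time and compared with the merged view that mdls prints. The following is a rough sketch: show_sources is a hypothetical helper name, and ls -ld is used only as a simple stand-in for "file system metadata".

```shell
# show_sources FILE  (hypothetical helper)
# Prints the raw file system metadata, the extended attributes,
# and finally the combined view returned through the Spotlight API.
show_sources() {
    file=$1
    [ -e "$file" ] || { echo "no such file: $file" >&2; return 1; }

    echo "== file system metadata =="
    ls -ld "$file"

    # xattr prints nothing if the file has no extended attributes.
    if command -v xattr >/dev/null 2>&1; then
        echo "== extended attributes =="
        xattr -l "$file" || true
    fi

    # mdls is macOS only; skipped elsewhere.
    if command -v mdls >/dev/null 2>&1; then
        echo "== merged view from the Spotlight API =="
        mdls "$file" || true
    fi
}
```

For a file like the .dmg in the question, the extended attributes section can be empty while the mdls section still lists several kMD* attributes, because those come from the other sources.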

Note that some importer plugins are external, i.e. small programs stored separately from Spotlight itself. For example, in /System/Library/Spotlight you'll typically find importers for things such as audio files, video files, compressed archives, etc. Similarly, applications can come with their own Spotlight importers, stored in locations such as:

/Applications/Microsoft Outlook.app/Contents/Library/Spotlight/Microsoft Outlook Spotlight Importer.mdimporter

In this case, the importer brings data from Outlook into Spotlight.
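To see which external importers are present, the directories mentioned above can be searched for .mdimporter bundles. This is a minimal sketch: find_importers is a hypothetical helper name, and the directories passed to it are up to the caller.

```shell
# find_importers DIR...  (hypothetical helper)
# Prints every *.mdimporter bundle found under the given directories.
# -prune stops find from descending into a bundle once it is matched.
find_importers() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            find "$dir" -name '*.mdimporter' -prune -print
        fi
    done
}
```

For example, find_importers /System/Library/Spotlight /Applications lists both the system importers and those bundled inside applications. On macOS, mdimport -L prints the list of importers that Spotlight itself has registered, which is the more authoritative view.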

Other plugins are internal, i.e. they're self-contained within Spotlight and do not require external programs. Your example in the question was a .dmg file, and the importer for those is internal.

You can run the importer for a specific file to see exactly what attributes it would have given to the Spotlight index - without actually changing the Spotlight index. Run a command like this:

mdimport -t -d2 ~/Downloads/MacVim.dmg

Note that -d2 prints all the imported metadata attributes except kMDItemTextContent, which for a document is typically very large and impractical to read in Terminal output. To include that attribute as well, use -d3 instead.
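Because both the test import and mdls print attribute names starting with kMDItem, a small filter makes it easier to compare the two outputs. This is only a sketch: attr_names is a hypothetical helper that pulls names out of whatever text it is given and does not depend on the exact output format of either tool.

```shell
# attr_names  (hypothetical helper)
# Reads a metadata dump on stdin and prints the unique kMDItem*
# attribute names it contains, one per line, sorted.
attr_names() {
    grep -o 'kMDItem[A-Za-z0-9_]*' | sort -u
}

# Usage on macOS. mdimport may write its dump to standard error,
# so 2>&1 is included as a precaution (an assumption, not verified):
#   mdimport -t -d2 ~/Downloads/MacVim.dmg 2>&1 | attr_names
#   mdls ~/Downloads/MacVim.dmg | attr_names
```

Names that appear in the mdls list but not in the importer list would then point to attributes coming from one of the other sources, such as the file system metadata.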

As these importers are effectively run as general-purpose programs, their data sources can be almost anything. That is, an importer is not restricted to returning data found in the file system itself; it could even return attributes obtained by consulting cloud servers over the network (for example when you have a file stored in iCloud).