How do mdfind and mdls derive what metadata a file has and where is it stored?
The man page of mdfind says the following:
The mdfind command consults the central metadata store and returns a list of files that match the given metadata query. The query can be a string or a query expression.
One would assume that the "central metadata store" refers to store.db under .Spotlight-V100 (as indicated in this answer), but mdls, which I assume uses the same "central metadata store", seems to work fine without any of the Spotlight files present. I tried the following commands on Catalina (10.15.7) and there was no delay in displaying the output of mdls, indicating that the command does not use the Spotlight DB.
sudo rm -rf /System/Volumes/Data/.Spotlight-V100 ~/Library/Metadata/CoreSpotlight/
mdls ~/Downloads/MacVim.dmg
Manually using Spotlight with Command-Space, on the other hand, launches a lot of mdworker_shared processes and re-creates the .Spotlight-V100 directory.
I also do not see how mdls and mdfind could be using extended attributes, as indicated in this answer, since invoking xattr -l ~/Downloads/MacVim.dmg does not produce any output while mdls ~/Downloads/MacVim.dmg shows several kMD* attributes.
Solution 1:
mdfind consults the Spotlight database in order to provide search results faster than scanning the whole file system for each query.
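As an illustration of the index-backed side, here is a minimal Python sketch that shells out to mdfind. It assumes macOS and uses only the -0 (NUL-separated output) and -onlyin (restrict the search to a directory) options; the helper names are hypothetical. Because the results come from the Spotlight index, files that have not been indexed yet (for example right after deleting .Spotlight-V100) would simply not be returned.

```python
import subprocess
from typing import List, Optional


def build_mdfind_argv(query: str, onlyin: Optional[str] = None) -> List[str]:
    """Build the argument list for an mdfind query.

    -0 makes mdfind separate result paths with NUL bytes instead of
    newlines; -onlyin restricts the search to one directory tree.
    """
    argv = ["mdfind", "-0"]
    if onlyin is not None:
        argv += ["-onlyin", onlyin]
    argv.append(query)
    return argv


def split_nul_paths(raw: bytes) -> List[str]:
    """Split NUL-separated output into paths, dropping empty entries."""
    return [p.decode("utf-8") for p in raw.split(b"\0") if p]


def mdfind(query: str, onlyin: Optional[str] = None) -> List[str]:
    """Run mdfind (macOS only). Results come from the Spotlight index,
    so files that have not been indexed yet will not be returned."""
    result = subprocess.run(
        build_mdfind_argv(query, onlyin), capture_output=True, check=True
    )
    return split_nul_paths(result.stdout)
```

On a Mac, something like mdfind('kMDItemFSName == "*.dmg"', os.path.expanduser("~/Downloads")) would list the disk images Spotlight has indexed under Downloads. NUL separation is used so that paths containing newlines are not split incorrectly.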
mdls, on the other hand, does not rely on the Spotlight database, as it doesn't need to scan the file system: you're specifying the file you want to examine directly. It does, however, rely on the Spotlight API to provide the requested data.
That is, mdls is not a program that contains code that tries to "parse" all sorts of file types or gather information from many places. Nor does it look this up directly in the store.db file.
When you request the metadata for a specific file through the Spotlight API, information is gathered from a number of places and given to the program (mdls in this case) in a uniform format. These sources include, for example:
- the file system metadata
- the extended attributes stored in the file system
- information from application bundles and similar places
- information gathered by the Spotlight importer plugin for the specific file type
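To see the per-file side from a script, here is a minimal Python sketch that asks mdls for a few named attributes of one file. It assumes macOS and relies on mdls's -name and -raw options; the assumption that -raw prints the values in the requested order, separated by NUL bytes, with "(null)" for an attribute the file does not have, is stated in the code. The helper names are hypothetical.

```python
import subprocess
from typing import Dict, List, Optional


def parse_mdls_raw(raw: bytes, names: List[str]) -> Dict[str, Optional[str]]:
    """Pair requested attribute names with the values printed by `mdls -raw`.

    Assumption: with -raw, mdls prints the values in the order requested,
    separated by NUL bytes, and prints "(null)" for a missing attribute.
    """
    values = raw.decode("utf-8").split("\0")
    # Tolerate a trailing separator, in case one is emitted.
    if len(values) == len(names) + 1 and values[-1] == "":
        values.pop()
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return {n: (None if v == "(null)" else v) for n, v in zip(names, values)}


def mdls_raw(path: str, names: List[str]) -> Dict[str, Optional[str]]:
    """Ask mdls (macOS only) for specific attributes of a single file."""
    argv = ["mdls", "-raw"]
    for name in names:
        argv += ["-name", name]
    argv.append(path)
    out = subprocess.run(argv, capture_output=True, check=True).stdout
    return parse_mdls_raw(out, names)
```

The caller only names attributes and a path; it never sees which of the sources listed above supplied each value, which is the uniform format described in the answer.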
Note that some importer plugins are external, i.e. small programs stored separately from Spotlight itself. For example, in /System/Library/Spotlight you'll typically find importers for things such as audio files, video files, compressed archives, etc. Similarly, applications can come with their own Spotlight importers, stored in locations such as:
/Applications/Microsoft Outlook.app/Contents/Library/Spotlight/Microsoft Outlook Spotlight Importer.mdimporter
This one imports data from Outlook into Spotlight.
Other plugins are internal, i.e. they're self-contained within Spotlight and do not require external programs. Your example in the question was a .dmg file, and the importer for those is internal.
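To get a rough inventory of the external importers on a machine, a minimal Python sketch can scan for .mdimporter bundles. It assumes the search roots below: /System/Library/Spotlight and the application-bundle location Contents/Library/Spotlight come from this answer, while /Library/Spotlight and ~/Library/Spotlight are assumed to be the usual system-wide and per-user folders. Internal importers, such as the one for .dmg files, are not separate bundles and will not appear.

```python
from pathlib import Path
from typing import Iterable, List, Optional

# Assumed locations: /System/Library/Spotlight is named in the answer;
# /Library/Spotlight and ~/Library/Spotlight are the usual system-wide
# and per-user plugin folders.
DEFAULT_ROOTS = [
    Path("/System/Library/Spotlight"),
    Path("/Library/Spotlight"),
    Path.home() / "Library" / "Spotlight",
]


def find_importers(
    roots: Optional[Iterable[Path]] = None,
    apps_dir: Path = Path("/Applications"),
) -> List[Path]:
    """List external .mdimporter bundles found on disk.

    Internal importers (such as the one handling .dmg files) are part of
    Spotlight itself and will not show up here.
    """
    found: List[Path] = []
    for root in DEFAULT_ROOTS if roots is None else roots:
        if root.is_dir():
            found.extend(sorted(root.glob("*.mdimporter")))
    # Importers shipped inside application bundles, e.g. Microsoft Outlook.
    if apps_dir.is_dir():
        found.extend(
            sorted(apps_dir.glob("*.app/Contents/Library/Spotlight/*.mdimporter"))
        )
    return found
```

This only checks the top level of each application bundle in apps_dir; importers nested deeper, or in applications installed elsewhere, would be missed.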
You can run the importer for a specific file to see exactly what attributes it would have given to the Spotlight index - without actually changing the Spotlight index. Run a command like this:
mdimport -t -d2 ~/Downloads/MacVim.dmg
Note that -d2 means that you see all the imported metadata attributes except kMDItemTextContent, which for a document would typically be very large and impractical to view in Terminal output. You can view that attribute as well by using the -d3 argument instead.
As these importers are effectively run as general-purpose programs, the data sources for the importers themselves can be almost anything. That is, they are not restricted to returning data found in the file system itself; they could even provide attributes obtained by consulting cloud servers over the network (for example, when you have a file stored in iCloud).