How does a case-insensitive filesystem access files?
A programmer at my work, who has used Linux all his life, was berating Windows for having a case-insensitive file system (among other things), which he said is the worst idea possible and can never be beneficial. I said that was just because he was used to case-sensitive filesystems, and that it makes a lot more sense to have a case-insensitive file system (part of my reasoning being that my name is "David", but if you referred to me as "david" I would still know you meant me, and the same should apply to files). He then explained his position, stating that a case-insensitive filesystem must incur a performance hit.
So now I'm wondering... how does a case-insensitive filesystem access files? Let me try to explain what I'm thinking:
Say you have a case-sensitive filesystem (and OS kernel etc.) such that in practical terms, if a directory exists called "exampleDir", I must type exactly "cd exampleDir" to cd into it. If I type "cd exampledir", I should receive an error that the directory does not exist. This seems like a simple case in my mind. When I type the command, the filesystem can simply take the exact characters I typed (ignoring what the kernel might do to add the current working directory path to the string and so on) and begin running through the list of available filenames, doing a direct compare on each name; for example:
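    function exists(files, filename) {
        for (var i = 0; i < files.length; i++) {
            if (filename === files[i]) return true; // exact, character-for-character match
        }
        return false; // no entry matched
    }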
Now the interesting part: the case-insensitive filesystem (assuming case-preserving, as per Windows). In practical terms, if a directory exists called "exampleDir", I could type "cd exampleDir" or "cd eXamPleDIr" and I would still succeed in getting into the folder. What I really want to know is what the code looks like to achieve this. In order to preserve case, the directory name must be stored with its case. So does that mean you have to do two conversions to lower or upper case every time you want to access a file by its filename? How much of a performance hit does that translate into? Are there any tricks used to reduce the performance decrease from using a case-insensitive filesystem? This is how I imagine the filesystem code would have to look:
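    function existsIgnoringCase(files, filename) {
        for (var i = 0; i < files.length; i++) {
            // two conversions per comparison: the typed name and the stored name
            if (filename.toLowerCase() === files[i].toLowerCase()) return true;
        }
        return false; // no entry matched
    }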
Please Note: Since it seems this wasn't clear from my question, I'm absolutely not asking which type is better, nor am I asking what the advantages and disadvantages are. I am only asking how (in technical terms) a case-insensitive filesystem deals with the fact that humans can type a filename with random case.
Operating systems generally work with handles. An "open" function is called, which specifies the filename, and a handle is returned. Further I/O calls take a handle, not a filename.
Other functions that require a file name include creating files, listing a directory, and deleting files.
So any performance hit from handling case insensitivity is not really going to affect much actual I/O, just file management.
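To make the handle point concrete, here is a minimal sketch using Node.js's `fs` module (my choice for illustration, since the question's snippets are JavaScript; the scratch directory and file name are made up for the demo). The name is passed exactly once, to `openSync`; every read after that takes only the numeric descriptor, so whatever case handling the filesystem does is paid at open time and not per read.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file, created only for this demonstration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'handle-demo-'));
const file = path.join(dir, 'exampleFile.txt');
fs.writeFileSync(file, 'hello world');

// The name is resolved once, here. Any case handling happens inside this call.
const fd = fs.openSync(file, 'r');

// Every read below uses the descriptor, never the name.
const buf = Buffer.alloc(5);
const firstCount = fs.readSync(fd, buf, 0, 5, 0);   // bytes 0..4 of the file
const firstText = buf.toString('utf8', 0, firstCount);
const secondCount = fs.readSync(fd, buf, 0, 5, 6);  // bytes 6..10 of the file
const secondText = buf.toString('utf8', 0, secondCount);

// Clean up the descriptor and the scratch directory.
fs.closeSync(fd);
fs.rmSync(dir, { recursive: true });
```

However many reads follow the open, the filename comparison cost stays the same.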
Some programs use lock files to indicate resources are in use. This could translate to a lot of creates and deletes.
However, the overhead of case-folding both names before comparing them, instead of comparing them directly, is likely a matter of a few additional assembly language instructions per name: fewer than 50 or so cycles, or maybe 500 or 5000 if cache misses come into play.
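As for tricks: one way to avoid converting the stored name on every comparison is to fold it once, when the entry is created, and index entries by the folded form while keeping the original spelling for display. The sketch below is only an illustration of that idea, not a description of how Windows or any real filesystem implements it. The `Directory` class and its `fold` helper are hypothetical, and `toLowerCase()` is a simplification standing in for proper Unicode case folding.

```javascript
// Hypothetical in-memory directory: case-preserving but case-insensitive.
class Directory {
  constructor() {
    // folded name -> name exactly as it was typed at creation time
    this.entries = new Map();
  }

  // Assumption: lowercasing is "good enough" folding for this sketch.
  static fold(name) {
    return name.toLowerCase();
  }

  // Returns false if a name differing only in case already exists.
  create(name) {
    const key = Directory.fold(name);
    if (this.entries.has(key)) return false;
    this.entries.set(key, name); // the stored name is folded once, here
    return true;
  }

  // Folds only the typed name, then does a single keyed lookup.
  // Returns the preserved spelling, or undefined if there is no such entry.
  lookup(name) {
    return this.entries.get(Directory.fold(name));
  }
}

const dirListing = new Directory();
const created = dirListing.create('exampleDir');
const duplicate = dirListing.create('EXAMPLEDIR');
const found = dirListing.lookup('eXamPleDIr');
const missing = dirListing.lookup('otherDir');
```

With this layout a lookup costs one conversion of the typed name plus one map access, regardless of how many entries the directory holds.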
It's really, really not worth worrying about unless you literally are worried about the performance of creating/deleting billions of files in a short amount of time. High disk I/O applications include things like databases, and databases typically open a few very large files and keep them open while the database is being used. So those sorts of applications, the ones that typically use all the disk I/O there is, do not make a lot of calls where the filename has to be parsed.
The speed of the storage medium is going to be the bottleneck long before the time spent dealing with filenames even approaches it.