What's the best way to store thousands of images in a Windows folder structure?

We have hundreds of thousands of JPG images in a Windows folder structure like the one below, but it's hard to interact and work with them in a snappy way (listing takes time, copying takes time, etc.). Here's the structure:

images/
  1/
    10001/
      10001-a.jpg
      10001-b.jpg
      ...
      10001-j.jpg (10 images in each XXXXX folder)
    10002/
    10003/
    ...
    19999/
  2/
    20001/
    20002/
    20003/
    ...
    29999/
  3/
  4/
  5/
  6/
  7/
  8/
  9/

Now, browsing these images is slow because there are approximately 10,000 folders in each X/ folder, and listing those simply takes time.

Is there a better way to organize the images with fewer subfolders/items per folder? Would changing the structure to the following have any effect?

images/
  1/
    0/
      0/
        0/
          0/
            10000/ (image folder, same as path)
              10000-a.jpg
              10000-b.jpg
              ...
              10000-j.jpg (10 images in each image folder)
          1/
          2/
          3/
          4/
          5/
          6/
          7/
          8/
          9/
        1/
        2/
        3/
        4/
        5/
        6/
        7/
        8/
        9/
      1/
      2/
      3/
      4/
      5/
      6/
      7/
      8/
      9/
    1/
    2/
    3/
    4/
    5/
    6/
    7/
    8/
    9/
  2/
  3/
  4/
  5/
  6/
  7/
  8/
  9/

Thus, locating image 48617-c.jpg would translate to the path 4/8/6/1/7/48617/48617-c.jpg.

The reason for having a separate folder named with the full number, 48617, is to make copying a complete 10-image batch simple (you copy the entire folder).
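
For reference, computing the path from an image ID would be trivial (a minimal Python sketch; the function name is just for illustration):

    def image_path(image_id, root="images"):
        """Map an ID like '48617-c' to 'images/4/8/6/1/7/48617/48617-c.jpg'."""
        batch = image_id.split("-")[0]                     # '48617'
        parts = [root, *batch, batch, image_id + ".jpg"]   # one folder per digit
        return "/".join(parts)

    print(image_path("48617-c"))  # images/4/8/6/1/7/48617/48617-c.jpg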

Now, no folder will have more than 11 immediate subfolders, but there will be lots of extra single-digit folders purely for separation. Would this setup speed up browsing and interaction when multiple users are adding/copying/deleting images?


Solution 1:

Windows is a bit special when it comes to folder layout with kajillions of files. Especially images, since Windows Explorer treats them specially. That said, there are a few guidelines to follow to keep things from getting too far out of hand:

  • If you intend to browse the directory structure from Windows Explorer for any reason, keep it under 10,000 entries in a directory (files & sub-directories).
  • If you will be interacting with it solely from CLI utilities or code, the 10K limit is far more flexible.
  • Don't create TOO many sub-directories: each directory you create is another discrete operation a copy has to perform.
    • If each file brings N directories with it, the number of file-system objects created for that file is 1+N, which linearly scales your copy times.
    • A short, exponential tree (e.g. three tiers of directories, each with 256 sub-directories) can scale amazingly far before you run into the 10K-per-directory limit; see the sketch after this list.
  • If you're accessing it with code, go for direct opens instead of parsing directory listings prior to opening. A failed fopen() followed by a directory scan is, in many cases, faster than a directory scan followed by a guaranteed fopen().
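
To make the exponential-tree idea concrete, here is a minimal Python sketch (the hashing scheme and names are my own illustration, not a prescription): it buckets files into three tiers of 256 directories each by hashing the file name, and it opens files directly instead of listing directories first.

    import hashlib
    import os

    ROOT = "images"  # hypothetical root -- adjust to your own layout

    def bucketed_path(filename):
        """Bucket a file into a 3-tier tree, 256 sub-directories per tier.

        Three hex pairs from an MD5 of the name give 256**3 (~16.7M)
        buckets, so each directory stays far below the 10K limit.
        """
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
        return os.path.join(ROOT, digest[0:2], digest[2:4], digest[4:6], filename)

    def read_image(filename):
        """Direct open: compute the path and try it, no listing first."""
        try:
            with open(bucketed_path(filename), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None  # only now fall back to a directory scan, if ever

If you need to keep a 10-image batch together (as in the question), hash the batch number instead of the full file name so all ten images land in the same bucket.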

Caveats:

  • Your file count is fixed, but the directory count is up to you, and the SUM of those two counts determines how long copy operations take (see the quick count after this list).
  • If at all possible, don't browse the tree with Windows Explorer. It doesn't deal well with big directories, and there isn't much you can do about that.
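
To put a number on that sum, here's a back-of-the-envelope count (a Python sketch, assuming batch IDs 10001-99999 as the question's structure implies) of the directories each layout creates:

    batches = 99999 - 10001 + 1        # 89,999 ten-image batch folders

    # Current layout: images/X/XXXXX/
    flat_dirs = 9 + batches            # 90,008 directories

    # Proposed layout: images/X/X/X/X/X/XXXXX/ -- the fifth digit tier
    # holds one directory per batch, with the batch folder inside it.
    digit_dirs = 9 + 90 + 900 + 9000 + batches
    deep_dirs = digit_dirs + batches   # 189,997 directories

    print(flat_dirs, deep_dirs)        # the deep tree roughly doubles the
                                       # objects a full copy must create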

Solution 2:

There's plenty of good information on the math in my answer to How does directory complexity influences on i-nodes?

With that said, different filesystems handle large numbers of files per directory in different ways. Some are fine with 10,000 entries; others buckle. As a quickly invented rule of thumb, 1,000 is probably a good target cap if you have design control. Entries in a directory are usually stored as some kind of list, and it is up to the reading application to sort them. For example, ls in the Unix world reads entries into memory in directory order and then prints them out in alphabetical order.
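
To illustrate that last point (a minimal Python sketch; the directory name is only an example):

    import os

    # The filesystem hands entries back in whatever order it stores them
    # -- the "directory order" mentioned above.
    raw = [entry.name for entry in os.scandir("images/4")]

    # Sorting is the reading application's job; this is the step ls
    # performs before printing.
    print(sorted(raw))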

Take a look at the math in the other question. Also consider what sysadmin1338 said about Explorer behaving differently: Explorer creates thumbnails of anything it recognizes as an image and then reads those thumbnails back to display them. That's a lot of disk I/O just to look at a directory that's chock-full of files.

Solution 3:

If you have the resources to develop such a system, this sounds like a good candidate for a SQL Server database using FILESTREAM storage for the files. That way you leave the organization of the directories to SQL Server, and all you have to worry about is how you manage the data itself. You could probably use SQL Server Express, since FILESTREAM data isn't taken into account when calculating the database size limit.
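
A minimal sketch of what the application side could look like, assuming pyodbc and a hypothetical FILESTREAM-backed table dbo.Images (the connection string, table, and column names are mine; creating the FILESTREAM filegroup and the table is a separate SQL Server setup step):

    import pyodbc

    # Hypothetical connection string and table -- the FILESTREAM filegroup
    # and the dbo.Images table must already exist on the server.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
        "DATABASE=ImageStore;Trusted_Connection=yes;"
    )

    def store_image(batch, suffix, jpg_path):
        """Insert one image; SQL Server decides where the bytes live on disk."""
        with open(jpg_path, "rb") as f:
            data = f.read()
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO dbo.Images (BatchNumber, Suffix, ImageData) "
            "VALUES (?, ?, ?)",
            batch, suffix, data,
        )
        conn.commit()

    store_image(48617, "c", "48617-c.jpg")

Retrieving a whole 10-image batch then becomes a WHERE BatchNumber = ? query instead of a folder copy.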