So far I have only seen an article on performance and scalability that mainly focuses on how long it takes to add new links. Is there any information about limitations regarding the number of files, number of folders, total size, etc.?

Right now I have a single file server with millions of JPGs (approx 45 TB worth) that are shared on the network through several standard file shares. I plan to create a DFS namespace and replicate all these images to another server for high availability purposes. Will I encounter extra problems with DFS that I'm otherwise not experiencing with plain-jane file shares? Is there a more recommended way to replicate these millions of files and make them available on the network?

EDIT 2:

All files are written to disk once and never modified after that; the only change they ever see is being deleted, possibly years later. So everything is pretty static.

EDIT:

I would experiment on my own and write a blog post about it, but I don't have the hardware for the second server yet. I'd like to collect information before buying 45 TB of hard drive space...


Solution 1:

We are currently running 2008 R2 DFSR with 57 TB of replicated files (1.6 million of them) and an overall volume size in excess of 90 TB, without any issues.
So the MS tested limits are a bit conservative in this respect, and IMHO they should buy some more disk space and do some more testing. If you're not time-critical on the initial sync, DFSR can handle that too. What it especially doesn't like is the same file being modified on multiple hosts, because it then has to arbitrate which copy to keep.

Solution 2:

With 45 TB of data, you are above the tested limits of DFS-R on Server 2008, as per:

DFS-R: FAQ

Size of all replicated files on a server: 10 terabytes.

Number of replicated files on a volume: 8 million.

Maximum file size: 64 gigabytes.

Edit:

If your files will likely never change, you could use the namespace portion of DFS to create a virtualized path for your share, and then run robocopy from a scheduled task to keep the two servers in sync. You would need something like robocopy for the initial sync anyway, even if you were going to use DFS-R.
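For example, the scheduled task can simply run a robocopy mirror and report failure when robocopy does. Here is a minimal sketch in Python (so the task gets an unambiguous exit code plus a log file), assuming hypothetical server names and paths; the switches used (/MIR, /COPY:DAT, /MT, /R, /W, /NP, /NDL, /LOG) are standard robocopy options, and /MIR will propagate deletions, which fits the write-once/delete-eventually pattern described above:

```python
import subprocess
import sys

# Hypothetical paths - replace with your own shares and log location.
SOURCE = r"\\fileserver1\images"
DEST = r"\\fileserver2\images"
LOG = r"C:\logs\image-sync.log"

cmd = [
    "robocopy", SOURCE, DEST,
    "/MIR",          # mirror the tree: copy new files, remove deleted ones
    "/COPY:DAT",     # copy data, attributes, timestamps
    "/MT:32",        # multithreaded copy helps with millions of small files
    "/R:1", "/W:1",  # retry once, wait 1 s, so a locked file can't stall the job
    "/NP", "/NDL",   # keep the log readable: no progress output, no directory list
    f"/LOG:{LOG}",
]

result = subprocess.run(cmd)

# Robocopy exit codes 0-7 indicate success (with varying amounts of work done);
# 8 and above mean at least one file or directory failed to copy.
sys.exit(0 if result.returncode < 8 else 1)
```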

Solution 3:

"Is there a more recommended way to replicate these millions of files and make them available on the network?" Yup - either a SAN or NAS device to centralize them, or distributed storage like Isilon, Gluster, etc. DFS is nice, but it means that every server has a complete copy of everything, so that's not a good architecture if you need to scale a lot bigger.

Also, your architecture might have difficulty scaling anyway. I've seen some large image systems that don't store each image as its own file: they keep a database with the metadata and byte offsets of the images, and roll the images themselves up into big binary blob files laid out in a way that's easy on the disk and filesystem. When an image is needed, the system looks up which blob file it's in and pulls it out using the starting and ending byte.
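For illustration, here is a toy version of that idea: a minimal sketch, assuming a made-up SQLite schema and blob file name, that packs images into one big container file and records where each one landed so it can be read back with a single seek and a bounded read. A real system would add sharding across many blob files, checksums, and compaction of deleted entries, but the lookup path is the same:

```python
import sqlite3
from pathlib import Path

# Made-up names for the example: one container file and one index database.
BLOB_PATH = Path("images-000001.blob")
DB_PATH = "image-index.db"

db = sqlite3.connect(DB_PATH)
db.execute("""CREATE TABLE IF NOT EXISTS images (
    name   TEXT PRIMARY KEY,
    blob   TEXT NOT NULL,     -- which container file holds the image
    offset INTEGER NOT NULL,  -- starting byte within that container
    length INTEGER NOT NULL   -- size of the image in bytes
)""")

def store_image(name: str, data: bytes) -> None:
    """Append the image to the container file and record where it landed."""
    with BLOB_PATH.open("ab") as blob:
        offset = blob.tell()
        blob.write(data)
    db.execute("INSERT OR REPLACE INTO images VALUES (?, ?, ?, ?)",
               (name, BLOB_PATH.name, offset, len(data)))
    db.commit()

def load_image(name: str) -> bytes:
    """Look up the offset and length, then read just that slice of the container."""
    row = db.execute("SELECT blob, offset, length FROM images WHERE name = ?",
                     (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    blob_name, offset, length = row
    with open(blob_name, "rb") as blob:
        blob.seek(offset)
        return blob.read(length)

# Pack two tiny payloads and read one back.
store_image("cat-001.jpg", b"not really a jpeg")
store_image("cat-002.jpg", b"also not really a jpeg")
print(len(load_image("cat-002.jpg")), "bytes read back")
```

The practical upside for a case like yours is that the filesystem only ever sees a handful of large files instead of millions of small ones, which tends to keep directory enumeration, backups, and sync jobs much cheaper.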