What filesystem can I use for a large amount of small-sized data?
I was copying ~3.7 TB of data from one 4 TB external drive formatted with HFS+ to another 4 TB external drive formatted with exFAT. The new HDD filled up after only ~75% of the data had been transferred, I am guessing because exFAT's allocation unit size uses up extra space per file when the files are small.
I am copying a lot of files (~millions), most of them small (~1.5 kB each), so I am trying to figure out how to do this without wasting all that space.
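To see where the space went, here is a quick back-of-the-envelope shell calculation of the per-file slack for a 1.5 kB file at various allocation unit sizes (131072 B, i.e. 128 KB, is a commonly cited exFAT default for multi-terabyte volumes; treat the exact default as an assumption):

```shell
#!/bin/sh
# Per-file slack (wasted space) for a 1500-byte file at various
# allocation unit sizes. 131072 B (128 KB) is a commonly cited
# exFAT default for multi-terabyte volumes (assumption).
file=1500
for unit in 512 1024 4096 32768 131072; do
  # round the file size up to a whole number of allocation units
  ondisk=$(( ( (file + unit - 1) / unit ) * unit ))
  printf 'unit=%-7d on-disk=%-7d slack=%d\n' "$unit" "$ondisk" "$(( ondisk - file ))"
done
```

At 128 KB clusters each 1.5 kB file occupies a full 128 KB on disk, roughly 85 times its own size, which is consistent with the drive filling early.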
I therefore need a filesystem that fulfills the following requirements:
- Block size small enough that I can fit millions of ~1.5 kB files wasting minimal space. (here exFAT has a problem)
- Read/write compatible with Linux. (here HFS+ has a problem)
- Able to make a 4 TB partition on Linux. (here ext4 has problems)
Any alternative filesystem?
UPDATE: This question was flagged as a duplicate of another post (Optimizing file system for lots of small files?). However, the accepted answer there does not work for me:
mkfs.ext4 -b 1024 /dev/your_partition
Warning: specified blocksize 1024 is less than device physical sectorsize 4096
/dev/sdc: Cannot create filesystem with requested number of inodes while setting up superblock
File systems that you can use
- Ones that support some kind of block suballocation. There are several forms of this, like:
  - block subdivision: divide a block into two, recursively
  - tail packing: share one block among the last partial blocks of multiple files
  - variable block size: allows merging or dividing blocks

  Some filesystems with block suballocation: ReiserFS, Reiser4, JFS, NWFS, ZFS, Btrfs, UFS1/2, VMFS... For example, VMFS5/6 has a block size of 1 MB but supports small files of 1 KB.
- Ones that support resident/inline files, like NTFS, ext4 or Btrfs.
However, given the scale of your problem, which involves millions of files, ReiserFS/Reiser4, Btrfs and ZFS may be the best solutions.
For more details, read on.
Since your files are around 1.5 KB, the ideal block size for your case would be 512 bytes. However, your disk has a 4 KB physical sector size (a.k.a. Advanced Format), as can be seen from the error message:
Warning: specified blocksize 1024 is less than device physical sectorsize 4096
which means you can't use a block size smaller than that. You therefore need block suballocation to reduce the wasted space. You can open Comparison of file systems - Allocation and layout policies and sort on Block suballocation / Tail packing / Variable block size to see which file systems support such features.
Another alternative is to store the data in metadata space, where multiple records are allocated within a single block.
In NTFS each file is represented by an MFT record, which is the analog of an inode in *nix. Small files are stored directly in the MFT record, saving space and also improving access time, because no extra disk read is needed to get the actual data. These are called resident files. Later, a similar feature called inline files was added to ext4:
The inline data feature was designed to handle the case that a file's data is so tiny that it readily fits inside the inode, which (theoretically) reduces disk block consumption and reduces seeks. If the file is smaller than 60 bytes, then the data are stored inline in inode.i_block.
The 60-byte value applies to the default 256-byte inode: the inode structure itself consumes 156 bytes, and the remaining 40 bytes are reserved for extended features. However, you can raise the inode size to 2 KB with the -I inode-size option while formatting, so that all of your ~1.5 KB files fit inline.
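As a hedged sketch of that formatting step, exercised here against a throwaway image file rather than a real partition (swap in your device once the output looks right; the image path is a placeholder):

```shell
# Sketch: ext4 with 4 KB blocks but 2 KB inodes plus the inline_data
# feature, so ~1.5 KB files can live inside their inodes.
# /tmp/small-files.img is a scratch image, not a real device.
truncate -s 512M /tmp/small-files.img
mkfs.ext4 -F -b 4096 -I 2048 -O inline_data /tmp/small-files.img
# confirm the chosen sizes took effect
dumpe2fs -h /tmp/small-files.img | grep -E 'Inode size|Block size'
```

With a 2 KB inode, the space usable for inline data is roughly the inode size minus the ~160-byte inode core, comfortably above 1.5 kB.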
In Btrfs there's a similar feature where small files are written directly into the metadata tree:
max_inline=bytes (default: min(2048, page size))
Specify the maximum amount of space, that can be inlined in a metadata B-tree leaf. The value is specified in bytes, optionally with a K suffix (case insensitive).
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)
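As an illustration, a hypothetical /etc/fstab entry setting the option explicitly might look like this (the device name and mount point are placeholders):

```
# Hypothetical fstab entry: Btrfs with the inline-file limit set
# explicitly; /dev/sdc1 and /mnt/data are placeholders.
/dev/sdc1  /mnt/data  btrfs  max_inline=2048,noatime  0  2
```

Note that with the defaults a ~1.5 kB file already qualifies for inlining on a 4 KB-page system; the option matters mainly if you need to raise or lower the cutoff.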
It seems Reiser4 also has such a feature although I can't confirm it.
In NTFS the current default MFT record size is 1 KB, although it was 4 KB in NTFS 1.0 on Windows NT 3.1. That only allows files of around 600-900 bytes or less to be resident, so you'd have to change the MFT record size. That is possible, although you'll have a hard time finding formatting software that lets you change the default.
Some people have had roughly the same situation as yours:
- Millions of small files: best filesystem / best options
- filesystem for millions of small files
- What is the most high-performance Linux filesystem for storing a lot of small files (HDD, not SSD)?
There are also a few misunderstandings on your side:
Read/write compatible with Linux. (here HFS+ has a problem)
There are multiple read/write HFS+ drivers available on Linux, so this shouldn't be a problem. The only real issue with HFS+ is that it dates from the same era as ext2, so it's far inferior to modern file systems like ext4, NTFS, ZFS or Btrfs.
Able to make a 4T partition on Linux. (here ext4 has problems)
Neither ext4 nor exFAT has any issue creating a 4 TB partition with a 1 KB block size. In fact, any filesystem with 32-bit block addresses can create a 4 TB volume with a 1 KB block size, because 2^32 blocks × 2^10 bytes/block = 2^42 bytes = 4 TiB; with the default 4 KB block size the maximum partition size is 16 TiB. ext4 uses 48-bit block addresses, so its maximum size is far larger.
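That arithmetic can be sanity-checked in any POSIX shell with 64-bit arithmetic:

```shell
#!/bin/sh
# 2^32 block addresses x 1 KiB blocks = 4 TiB; x 4 KiB blocks = 16 TiB.
bytes_1k=$(( (1 << 32) * 1024 ))
bytes_4k=$(( (1 << 32) * 4096 ))
echo "1 KiB blocks: $bytes_1k bytes = $(( bytes_1k >> 40 )) TiB"
echo "4 KiB blocks: $bytes_4k bytes = $(( bytes_4k >> 40 )) TiB"
```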
Block size small enough that I can fit millions of files sized 1.5kB wasting minimal space. (here exFAT has a problem)
In fact, the only issue with exFAT here is that the default cluster size is too big. The minimum cluster size in exFAT is one sector, so it can use a 512-byte cluster on a disk with 512-byte sectors (see 9.2 Cluster Size Limits in the exFAT specification); Windows' format dialog offers 512-byte and 1 KB options on such disks. Unfortunately, your disk has 4 KB sectors, so exFAT can't go below that and doesn't work for your case.