File size mismatch

What's the reason behind the difference in reported file sizes?

[root@localhost]# ls -lah sendlog
-rw-rw-r-- 1 mail mail 1.3T Aug 15 17:30 sendlog

[root@localhost]# du -m sendlog
24M  sendlog

This came to our attention when a server's backups kept failing due to quota issues, so it wasn't only ls that was seeing this wrong size.

Terms like "sparse files" and "block allocation" come to mind, but I'm not sure why this would happen or what the real reason behind it is. Obviously the two commands determine the size in different ways; am I right to always trust du?

FYI, this should be a pretty standard mail log file.


The difference between the values is as follows.

From the manual of stat(2)

struct stat {
    // snip
    off_t     st_size;    /* total size, in bytes */
    // snip
    blkcnt_t  st_blocks;  /* number of blocks allocated */
    // snip
};

The st_blocks field indicates the number of blocks allocated to the file, in 512-byte units. (This may be smaller than st_size/512, for example when the file has holes.)

The size reported by ls is st_size; the size reported by du is st_blocks * 512.
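You can see both fields at once with stat(1) (a minimal check, assuming GNU coreutils; %s prints st_size, %b prints st_blocks, and %B the size in bytes of each block counted by %b):

stat -c 'st_size=%s bytes, st_blocks=%b blocks of %B bytes' sendlog

For your file, st_blocks * 512 should come out around the 24M that du reports, while st_size matches the 1.3T shown by ls.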

The value reported by du is the number of bytes the file occupies on the filesystem/disk, while the value reported by ls is the actual size/length of the file as you see it when interacting with it. (Besides reporting on-disk usage, du also counts hardlinked files only once.)

Which value is the "right one" depends on context: if you're after disk usage, du is correct; if you're wondering how many bytes are in the file, ls/st_size is correct.

In addition, various options let you cross over: du --apparent-size uses the size reported by st_size, and ls -s reports the number of blocks used.
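For example (assuming GNU coreutils; the exact numbers depend on your filesystem):

du -h sendlog                   # on-disk usage, from st_blocks
du -h --apparent-size sendlog   # file length, from st_size
ls -lhs sendlog                 # -s prepends the allocated size to the usual long listing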

Your assumption that your logfile is a sparse file sounds plausible; why that happened, however, I don't know.


Just as Kjetil explained, you have a sparse file. Blocks of blank data inside the file are not allocated on disk until you actually write to them. How that happened to a log file is a mystery. You'll have to check your audit logs from the last time sendlog had a correct size to the time when it acquired this huge hole. Perhaps the answer is in the log file itself.
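If you want to see where the hole actually sits, filefrag from e2fsprogs can print the file's extent map (a sketch, assuming an extent-based filesystem such as ext4; jumps in the logical offsets mark the unallocated regions):

filefrag -v sendlog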

Perhaps someone did that intentionally to cause havoc in your system. Or it was some software error.

You can create your own terabyte-sized file easily with:

dd if=/dev/zero of='OMG_Thats_a_1_terabyte_file!!.dat' seek=1T bs=1 count=1

That file will allocate only a few kilobytes of disk space on any current Linux version, given a filesystem that supports sparse files.
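You can verify it with the same two commands that exposed the mismatch in the first place:

ls -lh 'OMG_Thats_a_1_terabyte_file!!.dat'   # reports roughly 1.0T (st_size)
du -h  'OMG_Thats_a_1_terabyte_file!!.dat'   # reports a few KB (st_blocks * 512)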

Your backup solution needs a replacement. Any serious backup system nowadays handles sparse files efficiently. Even the simplest solution using GNU tar supports them (the -S or --sparse option).
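For example, with GNU tar (a minimal sketch; the archive name is just an example):

tar -cSf sendlog-backup.tar sendlog   # -S/--sparse detects holes instead of reading terabytes of zeroes
ls -lh sendlog-backup.tar             # archive size reflects the real data, not the 1.3T apparent size

On extraction, GNU tar recreates the holes, so the restored file is sparse again.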