Why is a directory copied with the cp command smaller than the original?
I am trying to copy a directory with a large number of files to another destination. I did:
cp -r src_dir another_destination/
Then I wanted to confirm that the size of the destination directory is the same as the original one:
du -s src_dir
3782288 src_dir
du -s another_destination/src_dir
3502320 another_destination/src_dir
Then I thought there might be symbolic links that the cp command was not following, so I added the -a flag:
-a Same as -pPR options. Preserves structure and attributes of files but not directory structure.
cp -a src_dir another_destination/
but du -s gave me the same results. Interestingly, both the source and the destination have the same number of files and directories:
tree src_dir | wc -l
4293
tree another_destination/src_dir | wc -l
4293
What am I doing wrong that I get different sizes with the du command?
UPDATE
When I try to get the sizes of individual directories with the du command, I get different results:
du -s src_dir/sub_dir1
1112 src_dir/sub_dir1
du -s another_destination/src_dir/sub_dir1
1168 another_destination/src_dir/sub_dir1
When I view the files with ls -la, the individual file sizes are the same but the totals are different:
ls -la src_dir/sub_dir1
total 1168
drwxr-xr-x 5 hirurg103 staff 160 Jan 30 20:58 .
drwxr-xr-x 1109 hirurg103 staff 35488 Jan 30 21:43 ..
-rw-r--r-- 1 hirurg103 staff 431953 Jan 30 20:58 file1.pdf
-rw-r--r-- 1 hirurg103 staff 126667 Jan 30 20:54 file2.png
-rw-r--r-- 1 hirurg103 staff 7386 Jan 30 20:49 file3.png
ls -la another_destination/src_dir/sub_dir1
total 1112
drwxr-xr-x 5 hirurg103 staff 160 Jan 30 20:58 .
drwxr-xr-x 1109 hirurg103 staff 35488 Jan 30 21:43 ..
-rw-r--r-- 1 hirurg103 staff 431953 Jan 30 20:58 file1.pdf
-rw-r--r-- 1 hirurg103 staff 126667 Jan 30 20:54 file2.png
-rw-r--r-- 1 hirurg103 staff 7386 Jan 30 20:49 file3.png
That is because du, by default, reports the disk space that the files occupy rather than their sizes. With GNU du, use the -b option (shorthand for --apparent-size --block-size=1) to get the sum of the file sizes instead of the total disk space used. For example:
% printf test123 > a
% ls -l a
-rw-r--r-- 1 mnalis mnalis 7 Feb 1 19:57 a
% du -h a
4,0K a
% du -hb a
7 a
Even though the file is only 7 bytes long, it will occupy a whole 4096 bytes of disk space (in my particular example; it will vary depending on the filesystem used, cluster size etc).
Also, some filesystems support so-called sparse files, which do not use any disk space for blocks which are all zeros. For example:
% dd if=/dev/zero of=regular.bin bs=4k count=10
10+0 records in
10+0 records out
40960 bytes (41 kB, 40 KiB) copied, 0,000131003 s, 313 MB/s
% cp --sparse=always regular.bin sparse.bin
% ls -l *.bin
-rw-r--r-- 1 mnalis mnalis 40960 Feb 1 20:04 regular.bin
-rw-r--r-- 1 mnalis mnalis 40960 Feb 1 20:04 sparse.bin
% du -h *.bin
40K regular.bin
0 sparse.bin
% du -hb *.bin
40960 regular.bin
40960 sparse.bin
In short, to verify all files were copied, you'd use du -sb instead of du -s.
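The -b option is specific to GNU du, and the listings in the question look like they come from a BSD or macOS system, where du may not accept it. As a rough, portable alternative, the sketch below sums the byte sizes of all regular files under a directory using only POSIX tools. The function name total_bytes is made up for illustration, and it deliberately counts regular files only, so symbolic links and the directories themselves are ignored.

```shell
# total_bytes DIR
# Print the sum of the sizes, in bytes, of all regular files under DIR.
# (Hypothetical helper; relies only on POSIX find, sh, wc and awk.)
total_bytes() {
    find "$1" -type f -exec sh -c '
        for f do
            wc -c < "$f"      # byte count of one file, no name printed
        done
    ' sh {} + |
        awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Example: the two totals should be equal if no file was skipped or truncated.
#   total_bytes src_dir
#   total_bytes another_destination/src_dir
```

Matching totals are only a sanity check on sizes; they do not prove that the contents are identical.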
It might be due to the size of the directory "files".
In most filesystems, a directory is stored on disk much like a regular file (mostly just a list of names and inode numbers), and it uses more blocks as it grows.
If you add many files, the directory itself grows. But if you remove them afterwards, in many filesystems, the directory will not shrink.
So if one of the directories in your original tree had many files at some point, which were later deleted, the copy of that directory will be "smaller", as it only uses as many blocks as it needs for the current number of files.
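To see whether the filesystem in use behaves this way, a small experiment along the following lines may help. It is only a sketch: the function names, the file-name pattern and the default of 5000 files are arbitrary choices, and the numbers printed depend entirely on the filesystem (some shrink directories again, and some report no blocks for directories at all).

```shell
# usage_kib DIR
# Print the disk usage of DIR in KiB (first field of POSIX "du -sk").
usage_kib() {
    du -sk "$1" | awk '{ print $1 }'
}

# dir_usage_demo [COUNT]
# Print the disk usage of a scratch directory when empty, after creating
# COUNT empty files in it, and after deleting all of them again.
dir_usage_demo() {
    count=${1:-5000}                 # arbitrary default
    dir=$(mktemp -d) || return 1

    printf 'empty:          %s KiB\n' "$(usage_kib "$dir")"

    i=0
    while [ "$i" -lt "$count" ]; do
        # Empty files normally need no data blocks, so any growth
        # comes from the directory itself.
        : > "$dir/placeholder_file_with_a_long_name_$i"
        i=$((i + 1))
    done
    printf 'with files:     %s KiB\n' "$(usage_kib "$dir")"

    find "$dir" -type f -exec rm -f {} +
    printf 'after deletion: %s KiB\n' "$(usage_kib "$dir")"

    rmdir "$dir"
}

# Example:
#   dir_usage_demo
```

If the last figure stays above the first one, the directory kept its blocks after the files were removed, which is the effect described above.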
In the listings in your update, there are 3 directories you haven't listed. Compare the size of those (or descendants of those) in your ls -al output.
To find where the difference is, you can run ls -alR (capital R, to recurse into subdirectories) on both directories, redirect each output to a file, and then diff the two files.
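As a variation on that idea, the sketch below compares per-directory disk usage rather than full listings, which avoids noise from timestamps. It is an illustration only: compare_usage is a made-up name, and it assumes mktemp is available. Running du from inside each tree keeps the paths relative, so the two lists line up.

```shell
# compare_usage SRC DST
# Diff the per-directory disk usage (KiB, from "du -k") of two trees.
# Prints nothing and returns 0 when every directory uses the same space.
compare_usage() {
    src_list=$(mktemp) && dst_list=$(mktemp) || return 1

    # cd into each tree so both lists contain the same relative paths,
    # then sort by path (field 2 onward) so diff compares like with like.
    (cd "$1" && du -k . | sort -k 2) > "$src_list"
    (cd "$2" && du -k . | sort -k 2) > "$dst_list"

    diff "$src_list" "$dst_list"
    status=$?

    rm -f "$src_list" "$dst_list"
    return "$status"
}

# Example:
#   compare_usage src_dir another_destination/src_dir
```

Because du reports cumulative totals, every parent of a differing directory shows up in the diff as well; the deepest path listed is the one worth inspecting with ls -al.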