How to find the n largest files in a folder?

How can I find the n largest files in a folder, excluding the files that sit directly in that folder (so only its subfolders are searched)?

In this example, for n=2:

dir
--file (size 50KB)
--dir1
--dir2
----file2_1.txt (size 25KB)
--dir3
----dir3_1
------file3_1.txt (size 35KB)
------file3_2 (size 25KB)

Result:

dir/dir3/dir3_1/file3_1.txt 35KB
dir/dir2/file2_1.txt 25KB

Solution 1:

find . -mindepth 2 -type f -printf "%s\t%p\n" | sort -n | cut -f 2- | tail -n "$n"

This lists the largest file last. To list the largest file first, reverse the sort and take the head instead:

find . -mindepth 2 -type f -printf "%s\t%p\n" | sort -nr | cut -f 2- | head -n "$n"
# .....................................................^...............^^^^
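To check the pipeline, you can rebuild the example tree in a scratch directory. This sketch assumes GNU coreutils (`truncate` creates files of a given size, with 1 KB = 1024 bytes) and runs `find dir` from the parent so the paths match the question's Result. Because file2_1.txt and file3_2 are both 25 KB, plain GNU `sort -nr` would break the tie by comparing whole lines in reverse and print the dir3 file second; the secondary key `-k2` (path, ascending) is an addition of this sketch, not part of the commands above.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the example tree and list the n largest files below the
# first level. Assumes GNU find (-printf), sort and coreutils (truncate).
export LC_ALL=C                                  # locale-independent ordering
top=$(mktemp -d)
mkdir -p "$top/dir/dir1" "$top/dir/dir2" "$top/dir/dir3/dir3_1"
truncate -s 50K "$top/dir/file"                  # top level: must be skipped
truncate -s 25K "$top/dir/dir2/file2_1.txt"
truncate -s 35K "$top/dir/dir3/dir3_1/file3_1.txt"
truncate -s 25K "$top/dir/dir3/dir3_1/file3_2.txt"
cd "$top" || exit 1

n=2
# Key 1: size, numeric, descending. Key 2 onward: path, ascending (tie-break).
result=$(find dir -mindepth 2 -type f -printf '%s\t%p\n' |
  sort -t $'\t' -k1,1nr -k2 |
  cut -f 2- |
  head -n "$n")
printf '%s\n' "$result"
```

This prints dir/dir3/dir3_1/file3_1.txt followed by dir/dir2/file2_1.txt.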

With the GNU tools, you can use NUL-terminated records to handle filenames that contain newlines (annoying, but valid):

find . -mindepth 2 -type f -printf "%s\t%p\0" | sort -znr | cut -zf 2- | head -zn "$n"
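To consume that NUL-terminated output in bash, you can read it with `read -d ''`, which splits on NUL bytes instead of newlines. A minimal sketch, assuming bash 4+ and the GNU tools; the file name containing a newline is made up purely to show that it survives intact:

```shell
#!/usr/bin/env bash
# Sketch: collect the NUL-terminated results into a bash array.
# Assumes GNU find/sort/cut/head (-printf and -z support) and truncate.
top=$(mktemp -d)
mkdir -p "$top/sub"
truncate -s 30K "$top/sub/"$'two\nlines.txt'    # hypothetical name with a newline
truncate -s 10K "$top/sub/plain.txt"
cd "$top" || exit 1

n=2
files=()
# -d '' makes read stop at each NUL, so an embedded newline stays in one element.
while IFS= read -r -d '' path; do
  files+=("$path")
done < <(find . -mindepth 2 -type f -printf '%s\t%p\0' |
           sort -znr | cut -zf 2- | head -zn "$n")

printf 'Got %d files\n' "${#files[@]}"
printf '[%s]\n' "${files[@]}"
```

Each array element holds one complete path, so the two-line name is reported as a single file.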

To get your desired output, with a human-readable size after each path:

find . -mindepth 2 -type f -printf "%s\t%p\n" |
  sort -nr |
  head -n "$n" |
  perl -MNumber::Bytes::Human=format_bytes -F'\t' -lane '
    push @F, format_bytes(shift @F);
    print join "\t", @F;
  '

This uses the Perl module Number::Bytes::Human from CPAN.
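If installing a CPAN module is not an option, GNU coreutils' `numfmt` can do a similar conversion. This swaps in a different tool than the one above; the sketch feeds it hard-coded sample lines in the same `size<TAB>path` format (sizes taken from the example) rather than real `find` output:

```shell
#!/usr/bin/env bash
# Sketch: human-readable sizes with GNU numfmt instead of Number::Bytes::Human.
# Assumes GNU coreutils; --to=iec uses 1024-based suffixes (K, M, G, ...).
export LC_ALL=C
n=2
# Sample input in the "%s\t%p\n" format; replace with the find ... output.
sample=$(printf '%s\t%s\n' \
  25600 'dir/dir2/file2_1.txt' \
  35840 'dir/dir3/dir3_1/file3_1.txt')

result=$(printf '%s\n' "$sample" |
  sort -nr |
  head -n "$n" |
  numfmt --to=iec --field=1 --delimiter=$'\t' |   # convert only the size column
  awk -F'\t' -v OFS='\t' '{ print $2, $1 }')      # path first, then size
printf '%s\n' "$result"
```

The suffixes and rounding follow numfmt's rules, so they may not match Number::Bytes::Human's output exactly.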

Solution 2:

Although you tagged your question bash, here is a zsh solution in case others find it useful.

Given

% tree -h dir
dir
├── [ 512]  dir1
├── [ 512]  dir2
│   └── [ 25K]  file2_1.txt
├── [ 512]  dir3
│   └── [ 512]  dir3_1
│       ├── [ 35K]  file3_1.txt
│       └── [ 25K]  file3_2.txt
└── [ 50K]  file

4 directories, 4 files

then using zsh with glob qualifiers:

% print -RC1 dir/*/**/*(.OLon[1,2])
dir/dir3/dir3_1/file3_1.txt
dir/dir2/file2_1.txt

where

  • dir/*/ ensures the match starts at least one directory below dir, the equivalent of find's -mindepth 2

  • **/* is a shell glob that matches recursively (the same is available in bash if the globstar option is set)

  • () encloses a collection of qualifiers, specifically

    • . matches regular files only (equivalent of find -type f)
    • OL orders the results by size (Length) descending, while on breaks ties by name ascending
    • [1,2] selects a range of the sorted results, here the first two

Unlike find, shell globs omit hidden files by default; to include them, add the D qualifier, i.e. (.DOLon[1,2])
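Bash's globstar provides the recursive match but has no glob qualifiers, so the filtering and ordering have to happen outside the glob. A rough bash counterpart, assuming bash 4+ and GNU stat/sort; the dotglob option plays the role of zsh's D:

```shell
#!/usr/bin/env bash
# Sketch: bash analogue of dir/*/**/*(.OLon[1,2]). Sizes come from GNU stat,
# ordering from sort. Assumes bash 4+ (globstar) and GNU coreutils.
shopt -s globstar nullglob        # add dotglob to include hidden files, like zsh's D
export LC_ALL=C

top=$(mktemp -d)
mkdir -p "$top/dir/dir1" "$top/dir/dir2" "$top/dir/dir3/dir3_1"
truncate -s 50K "$top/dir/file"
truncate -s 25K "$top/dir/dir2/file2_1.txt"
truncate -s 35K "$top/dir/dir3/dir3_1/file3_1.txt"
truncate -s 25K "$top/dir/dir3/dir3_1/file3_2.txt"
cd "$top" || exit 1

n=2
files=()
for f in dir/*/**/*; do
  # Plain files only (not directories or symlinks), like zsh's "." qualifier.
  [[ -f $f && ! -L $f ]] && files+=("$f")
done
# Size descending, then name ascending (like OLon); keep the first n.
result=$(stat --printf '%s\t%n\n' -- "${files[@]}" |
  sort -t $'\t' -k1,1nr -k2 |
  cut -f 2- |
  head -n "$n")
printf '%s\n' "$result"
```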

Solution 3:

Off the top of my head:

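ls -lsR */ | awk '$2 ~ /^-/ {print $6, $10}' | sort -nr | head -n "$n"

Passing */ instead of * leaves out the files in the top-level folder, and the awk filter keeps only regular files (mode string starting with -). Parsing ls output is fragile, though: names containing spaces are cut off at the first space, and ls -R prints bare names without their directory.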