Longest filename in a large directory

Solution 1:

@steeldriver's solution is probably the better choice, but here is an alternative: you can combine a few commands to find the two (or more) longest file names.

find . | awk 'function base(f){sub(".*/", "", f); return f;} \
{print length(base($0)), $0}'| sort -nr | head -2

The output has this form:

length ./path/to/file

Here is a real example:

42 ./path/to/this-file-got-42-character-right-here.txt
31 ./path/to/this-file-got-31-character.txt

Notes

find lists every path under the current directory, one per line, like:

./path/to/this-file-got-31-character.txt

Using awk, we prepend the length of the file name to each line (the length of the name only, not of the whole path):

31 ./path/to/this-file-got-31-character.txt

Finally, we sort the lines numerically by that length in descending order and take the first two lines with head.
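To see the three steps work together, here is a minimal sketch that runs the same pipeline on a throwaway tree. The directory layout and file names are made up for illustration, and the scratch directory comes from mktemp, so nothing in your real tree is touched.

```shell
# Build a small scratch tree (all names here are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/a/short.txt" \
      "$demo/a/b/a-much-longer-file-name.txt" \
      "$demo/medium-name.txt"

# Same pipeline as above: prefix each path with the length of its
# basename, sort numerically in descending order, keep the top two.
top_two=$(cd "$demo" && find . | awk '
  function base(f) { sub(".*/", "", f); return f }
  { print length(base($0)), $0 }' | sort -nr | head -2)

printf '%s\n' "$top_two"

# Clean up the scratch tree.
rm -rf "$demo"
```

Because the pipeline is line-based, a file name that itself contains a newline would be split into two records; for ordinary names this does not matter.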

Solution 2:

Based on the comments, what you really need in this case is a list of all the files whose names reach or exceed some number of characters (255 in this example). Fortunately, that is relatively easy with a find regex:

find "$PWD" -regextype posix-extended -regex '.*[^/]{255,}$'
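The regex matches any path whose final component (the run of non-slash characters at the end) is at least the given length. As a rough sketch of how it behaves, the following uses a much smaller, arbitrary threshold of 20 on a scratch directory. The file names, the threshold, and the use of -mindepth 1 and -printf '%f\n' (to skip the starting directory and print bare names) are assumptions for the demo and require GNU find.

```shell
# Scratch directory with one short and one long name (hypothetical names).
demo=$(mktemp -d)
touch "$demo/short.txt" "$demo/this-name-is-rather-long.txt"

# List names of 20 or more characters; 20 stands in for the real limit.
long_names=$(find "$demo" -mindepth 1 -regextype posix-extended \
                  -regex '.*[^/]{20,}$' -printf '%f\n')

printf '%s\n' "$long_names"

# Clean up the scratch directory.
rm -rf "$demo"
```

Only the long name is reported, since short.txt has fewer than 20 characters after the last slash.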

For such a large number of files and directories, you probably want to avoid sorting - instead let's just keep a running record of the longest and second longest filenames, and their full pathnames:

find "$PWD" -printf '%p\0' | awk -v RS='\0' '
  {
    # get the length of the basename of the current filepath
    n = split($0,a,"/");
    currlen = length(a[n]);

    if (currlen > l[1]) {
      # bump the current longest to 2nd place
      l[2] = l[1]; p[2] = p[1];
      # store the new 1st place length and pathname
      l[1] = currlen; p[1] = $0;
    }
    else if (currlen > l[2]) {
      # store the new 2nd place length and pathname
      l[2] = currlen; p[2] = $0;
    }
  }

  END {
      for (i in l) printf "(%d) %d : %s\n", i, l[i], p[i];
  }'

Or, with GNU awk (which supports arrays of arrays):

find "$PWD" -printf '%p\0' | gawk -v RS='\0' '
  {
    # get the length of the basename of the current filepath
    n = split($0,a,"/");
    currlen = length(a[n]);

    if (currlen > p[1][1]) {
      # bump the current longest to 2nd place
      p[2][1] = p[1][1]; p[2][2] = p[1][2];
      # store the new 1st place length and pathname
      p[1][1] = currlen; p[1][2] = $0;
    }
    else if (currlen > p[2][1]) {
      # store the new 2nd place length and pathname
      p[2][1] = currlen; p[2][2] = $0;
    }
  }

  END {
      # loop over the ranks (1 and 2), i.e. the first-level indices of p
      for (i in p) printf "(%d) %d : %s\n", i, p[i][1], p[i][2];
  }'
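
If you need this more than once, the running-record approach can be wrapped in a small shell function. This is only a sketch: longest_two is a made-up name, it relies on GNU find's -printf and on an awk that accepts RS='\0' (as the commands above already do), and it prints the two ranks with an explicit 1..2 loop so the output order does not depend on awk's array traversal order.

```shell
# longest_two DIR: hypothetical helper, not part of any standard tool.
# Prints the longest and second longest basenames found under DIR.
longest_two() {
  find "$1" -printf '%p\0' | awk -v RS='\0' '
    {
      # length of the basename of the current path
      n = split($0, a, "/")
      currlen = length(a[n])

      if (currlen > l[1]) {
        # bump the current longest to 2nd place, then store the new 1st
        l[2] = l[1]; p[2] = p[1]
        l[1] = currlen; p[1] = $0
      } else if (currlen > l[2]) {
        # store the new 2nd place length and pathname
        l[2] = currlen; p[2] = $0
      }
    }
    END {
      # explicit loop gives a predictable rank order
      for (i = 1; i <= 2; i++) printf "(%d) %d : %s\n", i, l[i], p[i]
    }'
}
```

Usage would be along the lines of longest_two "$PWD". Note that the starting directory itself is one of the paths find prints, so its own name takes part in the comparison.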