Tail terminal command is extremely slow when processing files with long lines

The tail command has become unusably slow for large files (e.g., a 1 GB text file).

EXAMPLE:

$ tail -265000 file.txt > last_265000.txt

takes 10+ minutes, whereas

$ head -265000 file.txt > first_265000.txt

is basically instantaneous.

My workaround is to use the tac command (installed via brew install coreutils; Homebrew names it gtac unless its gnubin directory is on your PATH) to reverse the file's lines and then use head:

$ tac file.txt | head -265000 | tac > last_265000.txt
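For repeated use, the workaround can be wrapped in a small shell function. This is only a sketch: the name last_lines is hypothetical, and it assumes Homebrew's coreutils, which installs the GNU tools with a g prefix (gtail, gtac). It prefers GNU tail, then the tac pipeline, and only falls back to the system tail if neither is installed.

```shell
#!/bin/sh
# last_lines N FILE: print the last N lines of FILE.
# Hypothetical helper; prefers GNU tail, then the tac | head | tac
# workaround, and only falls back to the system tail if neither exists.
last_lines() {
    if command -v gtail >/dev/null 2>&1; then
        # GNU tail from Homebrew's coreutils (installed with a "g" prefix).
        gtail -n "$1" "$2"
    elif command -v gtac >/dev/null 2>&1; then
        # Reverse the file, keep the first N lines, then restore the order.
        gtac "$2" | head -n "$1" | gtac
    elif command -v tac >/dev/null 2>&1; then
        # Same pipeline, for systems where tac is available unprefixed.
        tac "$2" | head -n "$1" | tac
    else
        tail -n "$1" "$2"
    fi
}
# Example: last_lines 265000 file.txt > last_265000.txt
```

Asking for more lines than the file has simply returns the whole file, as with tail itself.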

I don't remember this being an issue on my previous Macs. I tried switching the shell from zsh to bash, but got the same result.


Specs: MacBook Pro (13-inch, M1, 2020), macOS Big Sur

The text file: https://www1.nyc.gov/assets/finance/downloads/tar/tc1_22.zip


Summary of the findings below, so far:

  • /usr/bin/tail in Big Sur is very slow compared with similar utilities when a large line count is requested.
  • When presented with a text file with long lines, it can become unusably slow.
  • We don't know if this is unique to this version of macOS.
  • We don't know if this is on Apple's radar.
  • gtail from coreutils is an effective replacement.
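To reproduce the long-line behavior without downloading the NYC file, a synthetic file can be generated with standard tools (head, tr, yes). The function name make_long_lines is hypothetical, and the sizes in the example comment are illustrative, chosen to mirror the 265,000 lines of 1,900 characters discussed in this thread; timings will vary by machine.

```shell
#!/bin/sh
# make_long_lines LINES WIDTH FILE: write LINES identical lines of WIDTH
# 'x' characters each (plus a newline) to FILE. Hypothetical helper.
make_long_lines() {
    # Turn WIDTH NUL bytes into WIDTH 'x' characters to build one line.
    line=$(head -c "$2" /dev/zero | tr '\0' 'x')
    # yes repeats the line indefinitely; head stops after LINES lines.
    yes "$line" | head -n "$1" > "$3"
}
# Example (roughly 500 MB on disk):
#   make_long_lines 265000 1900 long.txt
#   time /usr/bin/tail -n 265000 long.txt > /dev/null
#   time gtail -n 265000 long.txt > /dev/null
```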

It looks like the default tail in /usr/bin is very slow when a large line count is requested, and is orders of magnitude worse when given a binary file:

text file:

> time tail -3 threeGB.txt > /dev/null
tail -3 threeGB.txt > /dev/null  0.00s user 0.01s system 89% cpu 0.011 total


> time tail -26500 threeGB.txt > /dev/null
tail -26500 threeGB.txt > /dev/null  1.31s user 1.85s system 98% cpu 3.195 total

> time gtail -26500 threeGB.txt > /dev/null
gtail -26500 threeGB.txt > /dev/null  0.00s user 0.00s system 42% cpu 0.018 total


> time tail -265000 threeGB.txt > /dev/null
tail -265000 threeGB.txt > /dev/null  12.80s user 17.76s system 99% cpu 30.700 total

> time gtail -265000 threeGB.txt > /dev/null
gtail -265000 threeGB.txt > /dev/null  0.02s user 0.02s system 87% cpu 0.038 total

binary file:

> time tail -3 twoGB.mpg > /dev/null    
tail -3 twoGB.mpg > /dev/null  0.00s user 0.00s system 40% cpu 0.011 total

> time tail -26500 twoGB.mpg > /dev/null 
tail -26500 twoGB.mpg > /dev/null  338.10s user 265.03s system 78% cpu 12:52.17 total

> time gtail -26500 twoGB.mpg > /dev/null
gtail -26500 twoGB.mpg > /dev/null  0.01s user 0.04s system 24% cpu 0.193 total


(MacBook Pro 2017, Intel i7, 16 GB RAM, Big Sur, internal SSD)

I don't know whether this is a new problem in Big Sur. It may simply have gone unnoticed before, depending on some combination of file size, file type, and line count.

A workaround, as demonstrated above, is to use gtail from GNU coreutils (brew install coreutils). If you want, you can link it into your PATH as tail (this post has some info on that).
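One way to get GNU tail as plain `tail` with Homebrew is to put the coreutils gnubin directory, which holds the GNU tools under their unprefixed names, ahead of /usr/bin in PATH. The snippet below is a sketch for ~/.zshrc (or ~/.bash_profile) and assumes Homebrew and coreutils are installed; it does nothing otherwise.

```shell
# Sketch for ~/.zshrc or ~/.bash_profile.
# Assumes Homebrew's coreutils formula is installed; skipped otherwise.
if command -v brew >/dev/null 2>&1; then
    gnubin="$(brew --prefix)/opt/coreutils/libexec/gnubin"
    if [ -d "$gnubin" ]; then
        # GNU tools (tail, head, ls, sed, ...) now shadow the BSD ones.
        export PATH="$gnubin:$PATH"
    fi
fi
# Narrower alternative: redirect only tail, and only in interactive
# shells (scripts that call tail are unaffected by an alias).
# alias tail=gtail
```

The PATH approach also replaces ls, sed, date and the other overlapping commands with their GNU versions, whose options differ from the BSD ones, so the alias is the safer choice if only tail needs changing.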


Added: testing with the OP's text file, using the same MacBook Pro listed above.

> time tail -265000 tc1_22.TXT >/dev/null
tail -265000 tc1_22.TXT > /dev/null  577.89s user 808.23s system 99% cpu 23:11.05 total

> time gtail -265000 tc1_22.TXT >/dev/null
gtail -265000 tc1_22.TXT > /dev/null  0.46s user 0.66s system 97% cpu 1.154 total

The file is ASCII text with CRLF line terminators and tab-delimited fields. Significantly, each line is 1,900 characters long; the performance here fits the pattern found by @John Palmieri.
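To check whether another file fits the same long-line pattern, awk can report its line count and its maximum and average line length. line_stats is a hypothetical helper, not a standard command. With LC_ALL=C, lengths are in bytes, exclude the newline, and include the trailing carriage return on CRLF lines. Because awk reads each line into memory, this is meant for text files rather than binaries like the .mpg above.

```shell
#!/bin/sh
# line_stats FILE: print the number of lines and the maximum and average
# line length in bytes (hypothetical helper).
line_stats() {
    # LC_ALL=C makes length() count bytes rather than multibyte characters.
    LC_ALL=C awk '
        BEGIN { max = 0; total = 0 }
        {
            len = length($0)    # excludes the newline; includes CR on CRLF lines
            total += len
            if (len > max) max = len
        }
        END {
            avg = (NR > 0) ? total / NR : 0
            printf "lines=%d max=%d avg=%.1f\n", NR, max, avg
        }' "$1"
}
# Example: line_stats tc1_22.TXT
```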