Tail terminal command is extremely slow when processing files with long lines
The tail command has become unusably slow for large files (e.g. a 1 GB text file).
EXAMPLE:
$ tail -265000 file.txt > last_265000.txt
takes 10+ minutes, whereas
$ head -265000 file.txt > first_265000.txt
is basically instantaneous.
My workaround is to use the tac command (installed via brew install coreutils) to reverse the file's lines and then use head:
$ tac file.txt | head -265000 | tac > last_265000.txt
I don't remember this being an issue on my previous Macs. I tried switching the shell from zsh to bash, but same deal.
Specs: MacBook Pro (13-inch, M1, 2020) Big Sur
The text file: https://www1.nyc.gov/assets/finance/downloads/tar/tc1_22.zip
Summary of below, so far:
- /usr/bin/tail in Big Sur is very slow when compared to similar utilities.
- When presented with a text file with long lines, it can become unusably slow.
- We don't know if this is unique to this version of macOS.
- We don't know if this is on Apple's radar.
- gtail from coreutils is an effective replacement.
It looks like the default tail in /usr/bin is very slow when a large line count is requested, and is orders of magnitude worse when given a binary file:
text file:
> time tail -3 threeGB.txt > /dev/null
tail -3 threeGB.txt > /dev/null 0.00s user 0.01s system 89% cpu 0.011 total
> time tail -26500 threeGB.txt > /dev/null
tail -26500 threeGB.txt > /dev/null 1.31s user 1.85s system 98% cpu 3.195 total
> time gtail -26500 threeGB.txt > /dev/null
gtail -26500 threeGB.txt > /dev/null 0.00s user 0.00s system 42% cpu 0.018 total
> time tail -265000 threeGB.txt > /dev/null
tail -265000 threeGB.txt > /dev/null 12.80s user 17.76s system 99% cpu 30.700 total
> time gtail -265000 threeGB.txt > /dev/null
gtail -265000 threeGB.txt > /dev/null 0.02s user 0.02s system 87% cpu 0.038 total
binary file:
> time tail -3 twoGB.mpg > /dev/null
tail -3 twoGB.mpg > /dev/null 0.00s user 0.00s system 40% cpu 0.011 total
> time tail -26500 twoGB.mpg > /dev/null
tail -26500 twoGB.mpg > /dev/null 338.10s user 265.03s system 78% cpu 12:52.17 total
> time gtail -26500 twoGB.mpg > /dev/null
gtail -26500 twoGB.mpg > /dev/null 0.01s user 0.04s system 24% cpu 0.193 total
(MacBook Pro 2017, Intel i7, 16GB, Big Sur, internal ssd)
I don't know if this is a new problem with Big Sur. It could be that this wasn't previously noticed due to some combination of file size, file type, and line count.
A workaround, as demonstrated above: use gtail from coreutils. If you want, you can link it into your PATH as tail (this post has some info for that).
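If you would rather not shadow the system tail on your PATH, a small wrapper function is another option. The sketch below is only an illustration: the name last_lines is hypothetical, and it assumes gtail is on your PATH after installing coreutils with Homebrew. It falls back to the default tail when gtail is missing.

```shell
# Hypothetical helper (not part of coreutils or macOS):
# last_lines COUNT FILE prints the last COUNT lines of FILE.
# It prefers GNU tail (gtail) when it is installed and otherwise
# falls back to whichever tail comes first on PATH.
last_lines() {
    if command -v gtail >/dev/null 2>&1; then
        gtail -n "$1" "$2"
    else
        tail -n "$1" "$2"
    fi
}
```

With this in your shell startup file, the original command would become something like last_lines 265000 file.txt > last_265000.txt.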
Added: testing with the OP's text file, using the same MacBook Pro listed above.
> time tail -265000 tc1_22.TXT >/dev/null
tail -265000 tc1_22.TXT > /dev/null 577.89s user 808.23s system 99% cpu 23:11.05 total
> time gtail -265000 tc1_22.TXT >/dev/null
gtail -265000 tc1_22.TXT > /dev/null 0.46s user 0.66s system 97% cpu 1.154 total
The file is ASCII text, CRLF, with tab-delimited fields. Significantly, each line has 1,900 characters; the performance here fits in with the pattern found by @John Palmieri.
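To check the long-line pattern on your own machine without downloading the file, you could generate a synthetic file with lines of a chosen width. This is a rough sketch: the function name make_long_lines and the file name long.txt are made up for illustration, and the generated lines use LF endings and a single repeated character rather than the CRLF, tab-delimited layout of the real file.

```shell
# Hypothetical helper: make_long_lines COUNT WIDTH
# prints COUNT identical lines, each WIDTH characters long.
make_long_lines() {
    awk -v count="$1" -v width="$2" 'BEGIN {
        line = ""
        # Build one line of WIDTH "x" characters.
        for (j = 0; j < width; j++) line = line "x"
        # Emit it COUNT times.
        for (i = 1; i <= count; i++) print line
    }'
}

# Example usage (commented out because it writes roughly 950 MB):
#   make_long_lines 500000 1900 > long.txt
#   time tail -n 265000 long.txt > /dev/null
#   time gtail -n 265000 long.txt > /dev/null
```

Varying WIDTH while keeping the requested line count fixed should show whether the slowdown tracks line length on a given macOS version.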