Use `less` pager on file with non-standard encoding

I often use the less pager to view logfiles. Usually I use less +F to follow the progress of the log à la tail.

However, some logfiles use national characters in a non-standard encoding (Latin-1, while the system uses UTF-8). Obviously, these will not be displayed correctly.
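
To make the mismatch concrete, here is a small sketch (assuming iconv and od are installed; hex is just a helper name made up for this example). It shows that "é" is a single byte in Latin-1, which is not a valid UTF-8 sequence on its own, and two bytes once converted:

```shell
#!/bin/sh
# hex: hypothetical helper that prints stdin as a compact hex string.
hex() { od -An -tx1 | tr -d ' \n'; }

# '\351' is the octal escape for byte 0xE9, which is "é" in Latin-1.
latin1_hex=$(printf '\351' | hex)

# The same character after conversion to UTF-8 takes two bytes.
utf8_hex=$(printf '\351' | iconv -f ISO-8859-1 -t UTF-8 | hex)

echo "Latin-1: $latin1_hex  UTF-8: $utf8_hex"
```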

How can I view such files with less?

The only solutions I found:

  • Correct the encoding of the file (recode or iconv). This does not work while the file is still being written, so it does not let me use less +F. Plus it destroys the logfile's original timestamp, which is bad from an auditing perspective.
  • Use a pipe (recode latin1... | less). This works for files in progress, but unfortunately less +F then does not appear to work (it just does not update; I believe the recode process exits once it's done).

Any solution that lets me "tail" a logfile and still shows national characters correctly?


Solution 1:

Hm, apparently less cannot do this. The part of less's source code that implements the "following" seems to be:

A_F_FOREVER:
                        /*
                         * Forward forever, ignoring EOF.
                         */
                        if (ch_getflags() & CH_HELPFILE)
                                break;
                        cmd_exec();
                        jump_forw();
                        ignore_eoi = 1;
                        while (!sigs)
                        {
                                make_display();
                                forward(1, 0, 0);
                        }
                        ignore_eoi = 0;

As far as my (limited) knowledge of C goes, this means that if "follow" is activated, less will:

  1. seek to the end of input
  2. read and update the display in a loop, until Ctrl-C is pressed

If the input is a pipe, step 1 will not return until the pipe signals EOF. If I use tail -f xx | less, the pipe will never signal EOF, so less hangs :-(.

I did however find a way to get what I want:

 tail -f inputfile | recode latin1.. > /tmp/tmpfile

then

less +F /tmp/tmpfile

This will work, because it lets less +F work on a real file. It's still somewhat awkward, because recode apparently only processes data in blocks of 4096 bytes, but it works...
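
The two steps above could be wrapped in a couple of shell functions, roughly like this. It is only a sketch: start_follow, stop_follow and the CONVERT_CMD override are names I made up, and it assumes recode is installed. I use a FIFO instead of a plain pipe here so that tail can be stopped on its own, which lets the converter see EOF and flush whatever it still holds.

```shell
#!/bin/sh
# Sketch only: follow a Latin-1 logfile through a converted temporary copy.
# start_follow, stop_follow and CONVERT_CMD are hypothetical names, not part
# of less, tail or recode.

# start_follow LOG OUT: keep writing a converted copy of LOG's tail to OUT.
start_follow() {
    fifo_dir=$(mktemp -d) || return 1
    fifo=$fifo_dir/pipe
    mkfifo "$fifo" || return 1
    # The converter reads the FIFO and writes a real file that less +F can
    # follow. "recode latin1.." converts to the default charset, as above.
    ${CONVERT_CMD:-recode latin1..} < "$fifo" > "$2" &
    conv_pid=$!
    # tail feeds the last lines of the log, then everything appended later.
    tail -f "$1" > "$fifo" &
    tail_pid=$!
}

# stop_follow: end tail so the converter sees EOF, flushes and exits.
stop_follow() {
    kill "$tail_pid" 2>/dev/null
    wait "$tail_pid" 2>/dev/null
    wait "$conv_pid" 2>/dev/null
    rm -rf "$fifo_dir"
}
```

Usage would be along the lines of start_follow inputfile /tmp/tmpfile, then less +F /tmp/tmpfile, then stop_follow once less has been quit. The block-wise delay from recode is still there; this only tidies up starting and stopping.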

Solution 2:

It's possible that recode is buffering output in the pipe so output only comes through when the buffer, probably 4K, is full. You can try using the unbuffer script that comes with expect.
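
Something like the following is what I have in mind. It is an untested sketch that assumes the unbuffer script from expect is installed (convert_unbuffered is just a name for this example); the -p option is the one unbuffer provides for use inside a pipeline, so that it also passes stdin through. Whether it actually removes the delay depends on how recode buffers internally.

```shell
#!/bin/sh
# Sketch only: run recode under unbuffer so that it sees a pseudo-terminal
# instead of a pipe. convert_unbuffered is a hypothetical helper name.
convert_unbuffered() {
    # -p makes unbuffer read from stdin, which is needed in a pipeline.
    unbuffer -p recode latin1..
}
```

It would then replace the plain recode call: tail -f inputfile | convert_unbuffered > /tmp/tmpfile, followed by less +F /tmp/tmpfile as before.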