Did `man ls > temp.txt`. Output textfile is corrupted

When I run `man ls > temp.txt`, the output text file is corrupted. By corrupted I mean that the letters in some words are doubled.

Here are the first several lines of temp.txt:

    LS(1)                     BSD General Commands Manual                    LS(1)

    NNAAMMEE
         llss -- list directory contents

    SSYYNNOOPPSSIISS
         llss [--AABBCCFFGGHHLLOOPPRRSSTTUUWW@@aabbccddeeffgghhiikkllmmnnooppqqrrssttuuwwxx11] [_f_i_l_e _._._.]

    DDEESSCCRRIIPPTTIIOONN

And so on. Without redirection, the output of `man ls` looks perfectly normal. What's happening?


Solution 1:

From man man:

To get a plain text version of a man page, without backspaces and underscores, try

    # man foo | col -b > foo.mantxt

man prints a formatted version of the man page; the underscores and doubled letters are parsed

It's not so much that they are 'parsed' but rather "if you don't have a terminal, bold format is to be displayed as a repeated character". Once you hook it up to a terminal (vt100, xterm, Terminal, etc.), man recognizes the terminal and sends the appropriate control codes to do color, bold, underline and the like. It's being parsed correctly, just for a null terminal type.

comment by MichaelT
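To see what `col -b` is undoing, here is a minimal sketch that strips the simplest kind of overstrike by hand. It assumes the redirected file contains "character, backspace, character" sequences, which matches the doubled letters and underscores in the question once the backspaces are taken into account. The function name `strip_overstrike` is made up for this illustration, and it only handles plain character-plus-backspace pairs; it is not a full replacement for `col -b`.

```shell
# Delete every "any character + backspace" pair, so only the last
# character struck at each position survives.
strip_overstrike() {
    bs=$(printf '\b')          # a literal backspace byte (0x08)
    sed "s/.$bs//g"
}

# Hand-made sample imitating a bold "NAME" and an underlined "file".
printf 'N\bNA\bAM\bME\bE  _\bf_\bi_\bl_\be\n' | strip_overstrike
# prints: NAME  file
```

In practice `man ls | col -b > temp.txt` is still the simpler choice; the sketch is only meant to show which bytes are being removed.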

Solution 2:

Once upon a time, computers were routinely hooked up to teletypes (teleprinters) which would print all text on paper in real time as it was received. Although teleprinters didn't have any facilities for underlined or bold-faced text, outputting an underline, backspacing, and printing something else would cause that something else to appear underlined. Likewise outputting a character, backspacing, and outputting the same character would tend to make the character appear darker, though the effectiveness of that would vary depending upon the quality of the installed ribbon (if the ribbon was old and feeble, typing the same character twice would make it significantly darker; with a new ribbon typing the character even once would achieve close to maximum blackness). Further, even if a user wasn't attached to a printer, redirecting the output of man to a print spooler would have been pretty common, which probably explains why man would behave that way even when the output was redirected.

BTW, on some printers (and even teleprinters), the performance of _←U_←N_←D_←E_←R_←L_←I_←N_←I_←N_←G would be noticeably worse than ___________←←←←←←←←←←←UNDERLINING, since the former requires the printhead to repeatedly reverse direction (and typically overshoot its target at both ends). The same would be true when using multi-strike boldface as well, but there the behavior could actually be advantageous since the first time each character is printed would immediately follow a backspace character and the second would not. If the print head was accelerating while printing the first character, that would cause it to be misaligned slightly relative to the second, making the bold-face effect more effective.
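The per-character overstrike sequences described in this answer can be reproduced by hand. The following is a rough sketch, not how man or its formatter is actually implemented; the function names `overstrike_bold` and `overstrike_underline` are hypothetical.

```shell
bs=$(printf '\b')   # a literal backspace byte (0x08)

# Bold: strike each character, back up one position, strike it again.
overstrike_bold() {
    printf '%s\n' "$1" | sed "s/./&$bs&/g"
}

# Underline: strike an underscore, back up, then strike the character.
overstrike_underline() {
    printf '%s\n' "$1" | sed "s/./_$bs&/g"
}

# Dump the raw bytes so the backspaces are visible as \b.
overstrike_bold NAME | od -c
overstrike_underline file | od -c
```

Viewing the same output in a text editor that ignores or hides backspaces would show `NNAAMMEE` and `_f_i_l_e`, which is the pattern seen in the question.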

Solution 3:

Mateusz's answer is correct, but it is worth pointing out that rather than stripping out formatting intended for a tty, you can have man format differently.

For example, you can get a nicely formatted PDF instead with:

    man -t ls | pstopdf -i -o ~/ls.pdf