How to clean up output of linux 'script' command

I'm using the Linux 'script' command http://www.linuxcommand.org/man_pages/script1.html to record some interactive sessions. The output files it produces contain unprintable characters, including my backspace keystrokes.

Is there a way to tidy these output files up so they only contain what was displayed on screen?

Or is there another way to record an interactive shell session (input and output)?


Solution 1:

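To view the file, send it through col -bp, which interprets control characters such as backspaces: -b keeps only the last character written to each column, and -p passes unrecognized control sequences through unchanged. You can then pipe the result through less, if you like.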

col -bp typescript | less -R

On some systems col does not accept a filename argument and reads only from standard input. In that case, use input redirection instead:

col -bp <typescript | less -R
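
To see what col -b does with a backspace, here is a minimal sketch on a made-up input rather than a real typescript file. It simulates typing "hellp", pressing backspace once, and typing "o"; the variable name fixed is only illustrative, and the snippet assumes col is installed.

```shell
# Simulate a corrected typo: "hellp", one backspace (\b), then "o".
# col -b drops the backspace and keeps only the last character
# written to each column position.
if command -v col >/dev/null 2>&1; then
    fixed=$(printf 'hellp\bo\n' | col -b)
    printf '%s\n' "$fixed"
else
    echo "col is not installed on this system" >&2
fi
```

With a working col this should print hello. A real session recording may also contain escape sequences around the backspaces, which col alone does not remove.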

Solution 2:

cat typescript | perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)//g' | col -b > typescript-processed

Here is how to read the expression passed to perl:

  • s/pattern//g replaces every match of pattern with nothing, i.e. deletes it; the g option applies the substitution to every match in each input line instead of stopping after the first one

Here is how to read the regex pattern:

  • \e matches the "escape" control character (ASCII 0x1B)
  • ( and ) are the beginning and end of a group
  • | separates alternatives, so the group matches any one of these three patterns:
    • [^\[\]] or
    • \[.*?[a-zA-Z] or
    • \].*?\a
  • [^\[\]] means
    • match any single character other than [ or ]
  • \[.*?[a-zA-Z] means
    • match a string starting with [ then a non-greedy .*? up to and including the first ASCII letter (this covers sequences such as colour codes)
  • \].*?\a means
    • match a string starting with ] then a non-greedy .*? up to and including the control character called "the alert (bell) character" (ASCII 0x07)