Why are special characters such as "carriage return" represented as "^M"?
Why is ^M used to represent a carriage return in VIM and other contexts?
My guess is that M is the 13th letter of the Latin alphabet and a carriage return is \x0D or decimal 13. Is this the reason? Is this representation documented anywhere?
I notice that Tab is represented by ^I, and I is the ninth letter of the Latin alphabet. Correspondingly, Tab is \x09 or decimal 9, which supports my theory stated above. However, where might this be documented as fact?
I believe that what OP was actually asking about is called Caret Notation.
Caret notation is a notation for unprintable control characters in ASCII encoding. The notation consists of a caret (^) followed by a capital letter; this digraph stands for the ASCII code whose numerical value equals the letter's position in the alphabet. For example, the EOT character, with a value of 4, is represented as ^D because D is the 4th letter of the alphabet. The NUL character, with a value of 0, is represented as ^@ (@ is the ASCII character just before A). The DEL character, with the value 127, is usually represented as ^?, because '?' is the ASCII character just before '@', and -1 is the same as 127 when masked to 7 bits. An alternative formulation of the translation is that the printed character is found by inverting the 7th bit (value 0x40) of the ASCII code.
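The bit-inversion rule described above can be sketched in a few lines of Python. This is only an illustration; the function name caret is made up for this example and does not come from any library.

```python
def caret(code: int) -> str:
    """Return the caret notation for an ASCII control code (0-31 or 127)."""
    if not (0 <= code <= 31 or code == 127):
        raise ValueError(f"{code} is not an ASCII control code")
    # Inverting bit 0x40 maps 0..31 onto '@'..'_' and 127 onto '?'.
    return "^" + chr(code ^ 0x40)


for code in (0, 4, 9, 13, 127):
    print(f"{code:3d}  0x{code:02X}  {caret(code)}")
```

This prints ^@ for NUL, ^D for EOT, ^I for Tab, ^M for carriage return, and ^? for DEL, matching the examples in the paragraph above.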
The full list of ASCII control characters along with caret notation can be found here
Regarding vim and other text editors: You'll typically only see ^M if you open a Windows-formatted (CRLF) text file in an editor that expects Linux line endings (LF). The 0x0A is rendered as a line break, the 0x0D right before it gets printed as ^M. Most of the time, editor default settings include 'automatically recognize line endings'.
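To make the CRLF case concrete, here is a rough Python sketch of what an editor that only treats LF as a line break ends up displaying. The helper show_controls is hypothetical and only approximates how a real editor renders control characters.

```python
def show_controls(line: str) -> str:
    """Render ASCII control characters in caret notation."""
    out = []
    for ch in line:
        code = ord(ch)
        if code < 32 or code == 127:
            out.append("^" + chr(code ^ 0x40))
        else:
            out.append(ch)
    return "".join(out)


data = b"first line\r\nsecond line\r\n"  # Windows-style (CRLF) content

# Splitting on LF only leaves the CR (0x0D) at the end of each line.
lines = [l for l in data.decode("ascii").split("\n") if l]
rendered = [show_controls(l) for l in lines]
print(rendered)  # ['first line^M', 'second line^M']
```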
That is exactly the reason.
ASCII defines characters 0-31 as non-printing control codes. Here's an extract from the ascii(7) manual page from a random Linux system (man ascii), up to and including CR (13):
Oct   Dec   Hex   Char
─────────────────────────────────────────────
000   0     00    NUL '\0'
001   1     01    SOH (start of heading)
002   2     02    STX (start of text)
003   3     03    ETX (end of text)
004   4     04    EOT (end of transmission)
005   5     05    ENQ (enquiry)
006   6     06    ACK (acknowledge)
007   7     07    BEL '\a' (bell)
010   8     08    BS  '\b' (backspace)
011   9     09    HT  '\t' (horizontal tab)
012   10    0A    LF  '\n' (new line)
013   11    0B    VT  '\v' (vertical tab)
014   12    0C    FF  '\f' (form feed)
015   13    0D    CR  '\r' (carriage ret)
Conventionally, these characters are generated by holding Control and pressing the letter that corresponds to the required character. Teletypes and early terminal keyboards had 'BELL' written above the G key for this reason.
The standards document that defined ASCII is ASA X3.4-1963, which was published by the American Standards Association in 1963. I can't find the original document on their website, but this extract from the original document shows the character table, including the control codes above.
The notation goes back to the earliest ASCII Teletypes (ca. 1963). There was a CTRL key that cleared the 0x40 bit, so that CTRL-M (carriage return) would be 0D instead of 4D, CTRL-G (bell) would be 07 instead of 47, and CTRL-L (form feed) would be 0C instead of 4C.
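As a small illustration of that CTRL behaviour, the following Python sketch clears the 0x40 bit of an uppercase letter's code. For these keys, clearing and toggling the bit give the same result, because the bit is set in every uppercase letter. The function name ctrl is just a label for this example.

```python
def ctrl(key: str) -> int:
    """Code produced by CTRL plus an uppercase letter key: clear bit 0x40."""
    return ord(key) & ~0x40


for key in "MGL":
    print(f"CTRL-{key}: 0x{ord(key):02X} -> 0x{ctrl(key):02X}")
```

The loop prints the three pairs mentioned above: 0x4D becomes 0x0D, 0x47 becomes 0x07, and 0x4C becomes 0x0C.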
There was no "design" in assigning particular letters to particular functions; it was just chance that, when the dust settled from assigning ASCII codes, the code for M was one bit different from carriage return, and hence carriage return became CTRL-M.
Here is the best shot I can find of an ASR-33 keyboard. As you can see, the control character names are printed in small letters on the corresponding alpha keys.
Image by Marcin Wichary, User:AlanM1 (Derived (cropped) from File:ASR-33 2.jpg) [CC BY 2.0], via Wikimedia Commons
The M key does not have a notation on it because there is a dedicated "RETURN" key, so CTRL-M is redundant.