How are character encodings related to fonts?

I mean, does a font have to support every character encoding? Or does a character encoding have to support every font?

What does "Unicode font" mean? Are they fonts that support only Unicode and don't support, say, Windows-1252?


To start with the basics, everything is based on US-ASCII, which is a 7-bit code with 128 code points, numbered hex 00 through 7F or decimal 0-127. These are mapped to control codes, English alphanumerics, and basic punctuation characters.

Adding 1 bit to this for an 8-bit code (a byte) gives us another 128 code points, commonly called Extended ASCII.

Character sets/code pages were required early on to change how the upper 128 code points (hex 80 through FF) mapped to characters, so that they covered the alphabet of the particular language you wished to represent. This works reasonably well for most western European languages. ISO 8859-1/Latin-1 is an example of such a character set. Another is Windows-1252, which differs from ISO 8859-1 in places in order to cover more or different characters.
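To make the code page idea concrete, here is a minimal Python sketch that decodes one byte value under several single-byte encodings. The codec names are the ones shipped with Python's standard library; the choice of byte 0xE9 and of these particular code pages is only for illustration.

```python
# One byte value from the upper half of the 8-bit range.
raw = bytes([0xE9])

# Decode the same byte under several single-byte code pages.
codecs_to_try = ("latin-1", "cp1252", "cp1251", "iso8859-7")
results = {codec: raw.decode(codec) for codec in codecs_to_try}

for codec, ch in results.items():
    print(f"{codec:10} 0xE9 -> {ch!r} (U+{ord(ch):04X})")
```

The two western European code pages agree on this byte, while the Cyrillic and Greek ones assign it to letters from their own alphabets.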

Languages with more complex character sets like Chinese, Japanese, and Korean exceed the capabilities of the 256 code point set and use a double-byte code to enable their representation.

Unicode is a single character set with room for over 1 million code points, and its first 256 code points match ISO 8859-1/Latin-1. UTF-8 is a multi-byte encoding scheme for Unicode (1-4 bytes per code point) that is backward compatible with US-ASCII: the first 128 code points are encoded as the same single bytes. With that much room, each code point can represent exactly one character, unlike the mucking around done with Extended ASCII, where a code point maps to a different character depending on the character set/code page/encoding.
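As a rough illustration of the 1-4 byte range, this Python sketch encodes four sample characters as UTF-8 and counts the bytes. The sample characters are my own picks, not anything special.

```python
# Sample characters chosen to need 1, 2, 3 and 4 bytes in UTF-8.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, an emoji

utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch, n in utf8_lengths.items():
    print(f"U+{ord(ch):04X} -> {ch.encode('utf-8').hex()} ({n} byte(s))")

# ASCII characters keep their single-byte value in UTF-8.
ascii_same = "A".encode("utf-8") == "A".encode("ascii")

# A Latin-1 character above 0x7F is one byte in Latin-1 but two in UTF-8.
latin1_bytes = "\u00e9".encode("latin-1")
utf8_bytes = "\u00e9".encode("utf-8")
```

Note that only the ASCII range survives byte-for-byte; characters from the upper half of Latin-1 keep their code point number in Unicode but not their byte value in UTF-8.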

A font is a collection of glyphs that are mapped to code points and visually represent characters. The contents of a font depend on which languages it was originally meant to cover. On Windows, you can use Character Map to see which glyphs a font contains.

Unicode fonts don't necessarily cover all the code points; you need to see where they were intended to be used. For example, in Windows 7, fire up Character Map and view the characters in Calibri, then compare them to Ebrima, Meiryo and Raavi. Note that they are vastly different, because each one is tailored to a different geographic region.

As to Unicode fonts and the Windows-1252 character set: Windows uses a mapping table to translate Windows-1252 to Unicode. Where Windows-1252 doesn't match ISO 8859-1, the table gives a "Best Fit" mapping, and in that scenario some characters in the Windows-1252 character set may not display.
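For a feel of what such a mapping table contains, the sketch below walks the byte range where Windows-1252 and ISO 8859-1 disagree (hex 80 through 9F) using Python's built-in cp1252 codec. This is Python's table, which I'm assuming is close to but not necessarily identical to the one Windows itself uses, particularly for the unassigned bytes.

```python
mapping = {}      # byte value -> Unicode character, per Python's cp1252 codec
undefined = []    # byte values the codec has no character for

for byte in range(0x80, 0xA0):
    try:
        mapping[byte] = bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(byte)

for byte, ch in mapping.items():
    print(f"0x{byte:02X} -> U+{ord(ch):04X} {ch}")
print("no mapping in this codec:", [f"0x{b:02X}" for b in undefined])

# The same byte under ISO 8859-1 is a C1 control code, not a printable character.
latin1_0x80 = bytes([0x80]).decode("latin-1")
```

The mapped characters land well outside the first 256 Unicode code points (the euro sign at byte 0x80 becomes U+20AC, for instance), so a font needs glyphs at those code points to show them.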


Character set

A character set is a collection of characters, to each of which a number is assigned.

A well-known character set is ASCII. This is a set of 128 characters numbered from 0 to 127. These numbers can all be expressed in 7 bits (therefore it is a 7-bit character set).

Most, but not all, other character sets include the ASCII set with the same numbering. Examples of character sets that are not like ASCII include EBCDIC. There were also European variants of ASCII that had differing characters in certain positions (e.g. to include £).
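A small Python sketch of "a number assigned to each character", assuming Python's bundled codecs: "ascii" for ASCII and "cp037" as one of several EBCDIC variants.

```python
# The number assigned to "A" in ASCII (and in Unicode, which keeps ASCII's numbering).
number_for_A = ord("A")
char_for_65 = chr(65)

# The same letter gets a different number in an EBCDIC code page.
ascii_byte = "A".encode("ascii")
ebcdic_byte = "A".encode("cp037")
print(f"ASCII: 0x{ascii_byte.hex()}  EBCDIC (cp037): 0x{ebcdic_byte.hex()}")

# The pound sign is not in the ASCII set at all.
try:
    "\u00a3".encode("ascii")
    pound_in_ascii = True
except UnicodeEncodeError:
    pound_in_ascii = False
```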

Encodings

Large character sets such as Unicode, with room for more than a million characters, would require three or four bytes per character to accommodate the large range of numbers that can be assigned to characters. Instead they use a system that allows that number to be "encoded" as one, two, three or more bytes. With the UTF-8 encoding scheme, the characters that are the same as ASCII characters are encoded as single bytes with the same byte value as in ASCII.
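To show that the number and its encoding are separate things, this sketch takes two code points and prints their bytes under three Unicode encodings, using Python's standard codecs. The big-endian variants are used only so that no byte order mark is added to the output.

```python
def encodings_of(ch: str) -> dict:
    """Return the hex bytes of one character under three Unicode encodings."""
    return {
        "utf-8": ch.encode("utf-8").hex(),
        "utf-16-be": ch.encode("utf-16-be").hex(),
        "utf-32-be": ch.encode("utf-32-be").hex(),
    }

euro = encodings_of("\u20ac")        # U+20AC, euro sign
emoji = encodings_of("\U0001F600")   # U+1F600, outside the 16-bit range

for name, table in (("U+20AC", euro), ("U+1F600", emoji)):
    for codec, hexbytes in table.items():
        print(f"{name} {codec:10} {hexbytes}")
```

The code point stays the same in every row; only the byte sequence used to store it changes.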

The above encodings are used when storing text in files.

Typefaces

A typeface is a specific design of the visual representation (i.e. shape) for a set of characters. The shapes are called glyphs. A typeface might have several glyphs for one character (consider "a"). It might have glyphs for pairs of characters, called ligatures (e.g. "ff" or "fi"). The set of characters for which a typeface has glyphs therefore often differs from the set of characters in well-known character sets (typefaces usually do not include glyphs for ASCII control characters).

Fonts

In the context of computers, a font means a file containing glyphs ordered according to some numbering scheme (which often is not the same as the numbering in any well-known character set). Historically there were bit-mapped fonts which represented a specific size (in pixels or points) of a typeface. Currently most fonts use mathematical curves to describe glyphs and so can be scaled to represent any size of typeface.

Putting it all together

When you display a text file, the computer has to be told (or guess) the encoding used in the file. It will then use a different numbering (e.g. a 16-bit variant of Unicode) to represent the text in memory. Finally, it will use information in a font file to map the internal representation to the numbering (encoding) used in the font file.
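The steps can be sketched roughly in Python as follows. The `toy_cmap` table and its glyph numbers are invented for illustration; a real font file carries a much larger table of this kind, and real text rendering involves more steps than a per-character lookup.

```python
# 1. Bytes as stored in a file (here: UTF-8 for "A", e-acute, euro sign).
file_bytes = "A\u00e9\u20ac".encode("utf-8")

# 2. Decode with the encoding the computer was told (or guessed).
text = file_bytes.decode("utf-8")

# 3. An in-memory form; some systems use 16-bit units like these.
utf16_units = text.encode("utf-16-le")

# 4. Hypothetical font table: code point -> glyph number inside the font.
toy_cmap = {0x41: 36, 0xE9: 112}   # invented values, no euro glyph in this "font"
NOTDEF = 0                          # placeholder glyph for missing characters
glyph_ids = [toy_cmap.get(ord(ch), NOTDEF) for ch in text]
print(glyph_ids)

# Guessing the wrong encoding in step 2 gives different characters entirely.
wrong_guess = file_bytes.decode("cp1252")
print(repr(wrong_guess))
```

A wrong guess in step 2 and a missing glyph in step 4 are different failures: the first produces the wrong characters, the second produces the right character with nothing to draw it.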