Range of valid characters for a Base64 encoding

Solution 1:

Here is what I could turn up: RFC 4648

It includes this convenient table:

                  Table 1: The Base 64 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 A            17 R            34 i            51 z
     1 B            18 S            35 j            52 0
     2 C            19 T            36 k            53 1
     3 D            20 U            37 l            54 2
     4 E            21 V            38 m            55 3
     5 F            22 W            39 n            56 4
     6 G            23 X            40 o            57 5
     7 H            24 Y            41 p            58 6
     8 I            25 Z            42 q            59 7
     9 J            26 a            43 r            60 8
    10 K            27 b            44 s            61 9
    11 L            28 c            45 t            62 +
    12 M            29 d            46 u            63 /
    13 N            30 e            47 v
    14 O            31 f            48 w         (pad) =
    15 P            32 g            49 x
    16 Q            33 h            50 y
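
As a rough sketch, the table above can be rebuilt in Python from the standard `string` constants. The names `ALPHABET`, `PAD`, `VALUE_TO_CHAR` and `CHAR_TO_VALUE` are made up for this illustration and do not come from the RFC.

```python
import string

# Standard RFC 4648 alphabet in value order 0..63:
# A-Z (0-25), a-z (26-51), 0-9 (52-61), then + (62) and / (63).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
PAD = "="  # padding character, not one of the 64 values

# Lookups in both directions (hypothetical helper names).
VALUE_TO_CHAR = dict(enumerate(ALPHABET))
CHAR_TO_VALUE = {ch: value for value, ch in VALUE_TO_CHAR.items()}

print(len(ALPHABET))                           # 64
print(VALUE_TO_CHAR[26], VALUE_TO_CHAR[62])    # a +
print(CHAR_TO_VALUE["z"])                      # 51
```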

So a regular expression that matches any character that should never appear in Base 64 encodings would be:

[^A-Za-z0-9+/=]
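
For example, here is a minimal Python sketch that uses this expression to list offending characters. The function name `find_invalid_chars` is invented for the example, and the check only covers the standard alphabet, not padding position or length.

```python
import re

# Matches any single character outside the standard alphabet and the pad.
INVALID_BASE64_CHAR = re.compile(r"[^A-Za-z0-9+/=]")

def find_invalid_chars(text):
    """Return every character in text that standard Base64 does not allow."""
    return INVALID_BASE64_CHAR.findall(text)

print(find_invalid_chars("SGVsbG8="))    # []
print(find_invalid_chars("SGVs bG8_"))   # [' ', '_']
```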

However, as kapep's answer points out, this is only the recommendation. Specific implementations might choose a different set of 64 characters. (In fact, even the linked RFC contains an alternative table for URL and filename safe encoding, which replaces characters 62 and 63 with - and _ respectively.) So it really depends on the implementation that created the encoding.
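
To see the difference between the two RFC alphabets, here is a small Python sketch using the standard library's `base64` module. The input bytes are chosen only because they happen to encode to values 62 and 63.

```python
import base64

# Two bytes whose first two 6-bit groups are 62 and 63.
raw = b"\xfb\xff"

standard = base64.b64encode(raw)          # standard alphabet
url_safe = base64.urlsafe_b64encode(raw)  # URL and filename safe alphabet

print(standard)  # b'+/8='
print(url_safe)  # b'-_8='
```

A validator built around the standard alphabet would reject the second output, even though it is valid under the URL-safe table.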

Solution 2:

You are probably safe with the other answers in most situations, but according to the Wikipedia article on Base64 there is no single definitive list you can rely on:

The particular choice of character set selected for the 64 characters required for the base varies between implementations.

RFC 4648 mentions other alphabets, such as the "URL and Filename safe" Base 64 Alphabet, where + and / are replaced with - and _.

There's a table of Base64 variants which use different characters. Keep in mind that there are implementation-specific rules about line separators, which you can find in the same table. Some implementations, such as MIME, even allow (and ignore) characters that are not in the alphabet.
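
Python's own decoder illustrates this kind of leniency: by default `base64.b64decode` discards characters outside the alphabet, and only rejects them when `validate=True` is passed. The wrapper names `decode_lenient` and `decode_strict` below are made up for this sketch, and other libraries may behave differently.

```python
import base64
import binascii

def decode_lenient(data):
    """Decode, silently discarding characters outside the alphabet."""
    return base64.b64decode(data)

def decode_strict(data):
    """Decode, returning None if any non-alphabet character is present."""
    try:
        return base64.b64decode(data, validate=True)
    except binascii.Error:
        return None

wrapped = b"SGVs\nbG8="  # "Hello" with a line break in the middle

print(decode_lenient(wrapped))  # b'Hello'
print(decode_strict(wrapped))   # None
```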

Solution 3:

Standard Base64 contains only A–Z, a–z, 0–9, +, / and =. So the list of characters not to be used is: all possible characters minus the ones mentioned above.

Some special-purpose variants also use other characters, such as . and _.