Why is "ss" equal to the German sharp-s character 'ß'?
Coming from this question I'm wondering why ä and ae are different (which makes sense) but ß and ss are treated as equal. I haven't found an answer on SO, even though this question seems to be related and even mentions "that ß will compare equal to SS in Germany, or similar", but not why.
The only resource I found on MSDN was this: How to: Compare Strings
It mentions the following, but also doesn't explain why:
// "They dance in the street."
// Linguistically (in Windows), "ss" is equal to
// the German essetz: 'ß' character in both en-US and de-DE cultures.
.....
So why does this evaluate to true, both with the de-DE culture and with any other culture:
using System;
using System.Globalization;

var ci = new CultureInfo("de-DE");
// Culture-aware comparison using the German (Germany) rules
int result = ci.CompareInfo.Compare("strasse", "straße", CompareOptions.IgnoreNonSpace); // 0
bool equals = String.Equals("strasse", "straße", StringComparison.CurrentCulture); // true
equals = String.Equals("strasse", "straße", StringComparison.InvariantCulture); // true
If you look at the Ä page, you'll see that Ä is not always a replacement for Æ (or ae), and that it is still used in various languages.
For the letter ß, on the other hand:
While the letter "ß" has been used in other languages, it is now only used in German. However, it is not used in Switzerland, Liechtenstein or Namibia.[1] German speakers in Germany, Austria, Belgium,[2] Denmark,[3] Luxembourg[4] and South Tyrol, Italy[5] follow the standard rules for ß.
So the ß is used in a single language, with a single rule (ß == ss), while the Ä is used in multiple languages with multiple rules.
Note also how case folding is described:
Case folding is primarily used for caseless comparison of text, such as identifiers in a computer program, rather than actual text transformation
The official Unicode 7.0 Case Folding Properties tells us that
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
where 00DF is ß, F marks a full case folding (one that can change the length of the string) and 0073 is s, so ß can be considered, for caseless comparison, as ss.
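As a rough illustration of the folding rule above, here is a minimal C# sketch. SimpleFold and FoldedEquals are hypothetical helpers, not .NET APIs. The sketch hard-codes only the single mapping quoted above (00DF to 0073 0073) and otherwise falls back on char.ToLowerInvariant, so it is not a complete implementation of Unicode case folding.

```csharp
using System;
using System.Text;

public static class FoldingSketch
{
    // Hypothetical helper: applies the one full case folding quoted above
    // (U+00DF -> "ss") and lowercases every other character.
    // This is an assumption-laden sketch, not the full CaseFolding.txt table.
    public static string SimpleFold(string input)
    {
        var sb = new StringBuilder(input.Length + 4);
        foreach (char c in input)
        {
            if (c == '\u00DF')
                sb.Append("ss");                      // ß folds to two characters
            else
                sb.Append(char.ToLowerInvariant(c));  // simple per-character lowercasing
        }
        return sb.ToString();
    }

    // Caseless comparison: fold both sides, then compare code units only.
    public static bool FoldedEquals(string a, string b) =>
        string.Equals(SimpleFold(a), SimpleFold(b), StringComparison.Ordinal);
}
```

With this sketch, FoldingSketch.FoldedEquals("STRASSE", "straße") is true, while a plain String.Equals("strasse", "straße", StringComparison.Ordinal) is false, because an ordinal comparison looks only at the raw code units and the two strings differ in length.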
Some background info for you. Taken from here.
Windows Alt Codes
In Windows, combinations of the ALT key plus a numeric code can be used to type a non-English character (accented letter or punctuation symbol) in any Windows application.
Note: The letters ü, ö, ä and ß can be replaced by "ue", "oe", "ae" or "ss" respectively.
German ALT Codes
Sym Windows ALT Code
Ä ALT+0196
ä ALT+0228
Ö ALT+0214
ö ALT+0246
Ü ALT+0220
ü ALT+0252
ß ALT+0223
€ ALT+0128
Taken from here.
In the German alphabet, the letter ß, called "Eszett" (IPA: [ɛsˈtsɛt]) or "scharfes S", in English "sharp S", is a consonant that evolved as a ligature of "long s and z" (ſz) and "long s over round s" (ſs). When speaking it is pronounced [s] (see IPA). Since the German orthography reform of 1996, it is used only after long vowels and diphthongs, while ss is written after short vowels. The name eszett comes from the two letters S and Z as they are pronounced in German. It is also called scharfes S (IPA: [ˈʃaɐ̯.fəs ˈʔɛs, ˈʃaː.fəs ˈʔɛs]) in German, meaning "sharp S". Its Unicode encoding is U+00DF.