Swift countElements() returns an incorrect value when counting flag emoji
Update for Swift 4 (Xcode 9)
As of Swift 4 (tested with Xcode 9 beta) grapheme clusters break after every second regional indicator symbol, as mandated by the Unicode 9 standard:
let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪"
print(str1.count) // 5
print(Array(str1)) // ["🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪"]
Also, String is a collection of its characters (again), so one can obtain the character count with str1.count.
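To make the difference between grapheme clusters and the underlying encodings concrete, here is a small sketch (assuming Swift 4 or later; the variable name flags is only illustrative) that counts the same string through its different views:

```swift
// Two German flags; each flag is a pair of regional indicator scalars.
let flags = "🇩🇪🇩🇪"

print(flags.count)                 // 2 (Characters, i.e. extended grapheme clusters)
print(flags.unicodeScalars.count)  // 4 (two regional indicator scalars per flag)
print(flags.utf16.count)           // 8 (each scalar needs a surrogate pair)
print(flags.utf8.count)            // 16 (each scalar needs four bytes)
```

Only the first number matches what a user perceives as "characters"; the others depend on the encoding.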
(Old answer for Swift 3 and older:)
From "3 Grapheme Cluster Boundaries" in "Unicode Standard Annex #29: Unicode Text Segmentation" (emphasis added):
A legacy grapheme cluster is defined as a base (such as A or カ) followed by zero or more continuing characters. One way to think of this is as a sequence of characters that form a “stack”.
The base can be single characters, or be any sequence of Hangul Jamo characters that form a Hangul Syllable, as defined by D133 in The Unicode Standard, or be any sequence of Regional_Indicator (RI) characters. The RI characters are used in pairs to denote Emoji national flag symbols corresponding to ISO country codes. Sequences of more than two RI characters should be separated by other characters, such as U+200B ZWSP.
(Thanks to @rintaro for the link).
A Swift Character represents an extended grapheme cluster, so it is (according to this reference) correct that any sequence of regional indicator symbols is counted as a single character.
You can separate the "flags" by a ZERO WIDTH NON-JOINER:
let str1 = "🇩🇪\u{200C}🇩🇪"
print(str1.characters.count) // 2
or insert a ZERO WIDTH SPACE:
let str2 = "🇩🇪\u{200B}🇩🇪"
print(str2.characters.count) // 3
This also resolves possible ambiguities, e.g. should the sequence 🇫 🇷 🇺 🇸 (the regional indicators F, R, U, S, shown here with spaces) be read as 🇫 🇷🇺 🇸 (F, RU, S) or as 🇫🇷 🇺🇸 (FR, US)?
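For comparison, the Unicode 9 rule mentioned in the Swift 4 update settles this ambiguity by pairing regional indicators from the left. The following is only a rough sketch of that pairing idea, not how the standard library implements it; the function name regionalIndicatorPairs is hypothetical, and it assumes Swift 4 or later:

```swift
// Hypothetical helper: groups regional indicator scalars (U+1F1E6...U+1F1FF)
// into pairs, left to right. Any other scalar is skipped and ends an unfinished pair.
func regionalIndicatorPairs(in text: String) -> [String] {
    let regionalIndicators: ClosedRange<UInt32> = 0x1F1E6...0x1F1FF
    var groups: [String] = []
    var pending: Unicode.Scalar? = nil

    for scalar in text.unicodeScalars {
        guard regionalIndicators.contains(scalar.value) else {
            // A separator such as U+200B leaves the pending indicator on its own.
            if let single = pending {
                groups.append(String(Character(single)))
                pending = nil
            }
            continue
        }
        if let first = pending {
            // Second indicator of a pair: emit both scalars as one string.
            var pair = ""
            pair.unicodeScalars.append(first)
            pair.unicodeScalars.append(scalar)
            groups.append(pair)
            pending = nil
        } else {
            pending = scalar
        }
    }
    // An odd indicator at the end stays unpaired.
    if let single = pending {
        groups.append(String(Character(single)))
    }
    return groups
}

print(regionalIndicatorPairs(in: "🇫🇷🇺🇸"))  // ["🇫🇷", "🇺🇸"]
```

With left-to-right pairing, the four indicators F, R, U, S come out as FR and US, never as F, RU, S.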
See also "How to know if two emojis will be displayed as one emoji?" about a possible method to count the number of "composed characters" in a Swift string, which would return 5 for your let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪".
Here's how I solved that problem, for Swift 3:
import Foundation // enumerateSubstrings(in:options:_:) comes from Foundation

let str = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪" // or whatever the string of emojis is
let range = str.startIndex..<str.endIndex
var length = 0
// The closure is called once per composed character sequence; only the calls are counted.
str.enumerateSubstrings(in: range, options: .byComposedCharacterSequences) { _, _, _, _ in
    length += 1
}
print("Character Count: \(length)")
This fixed the character-count problems with emoji that I ran into, and it is the simplest method I have found.
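If the count is needed in several places, the same enumeration could be wrapped in an extension. This is a minimal sketch, assuming Foundation is available; the property name composedCharacterCount is made up for illustration and is not part of any library:

```swift
import Foundation

extension String {
    // Hypothetical convenience property: the number of composed character
    // sequences, as enumerated by Foundation.
    var composedCharacterCount: Int {
        var total = 0
        enumerateSubstrings(in: startIndex..<endIndex,
                            options: .byComposedCharacterSequences) { _, _, _, _ in
            total += 1
        }
        return total
    }
}

print("abc".composedCharacterCount)       // 3
print("e\u{301}".composedCharacterCount)  // 1 ("e" plus a combining acute accent)
```

On Swift 4 and later, the count property of String should make such a helper unnecessary for flag emoji.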