Find out if a Character in a String is an emoji?

I need to find out whether a character in a string is an emoji.

For example, I have this character:

let string = "😀"
let character = Array(string)[0]

I need to find out if that character is an emoji.


What I stumbled upon is the difference between characters, unicode scalars and glyphs.

For example, the glyph 👨‍👨‍👧‍👧 consists of 7 unicode scalars:

  • Four emoji scalars: 👨👨👧👧
  • In between each emoji is a zero-width joiner (U+200D), which works like character glue; see the specs for more info

Another example, the glyph 👌🏿 consists of 2 unicode scalars:

  • The regular emoji: 👌
  • A skin tone modifier: 🏿

Last one, the glyph 1️⃣ consists of three unicode scalars:

  • The digit one: 1
  • A variation selector (U+FE0F), which requests emoji presentation
  • The Combining Enclosing Keycap: ⃣ (U+20E3)

So when rendering the characters, the resulting glyphs really matter.
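
To see this decomposition yourself, you can print each scalar of a Character in a playground. A quick sketch (the unicodeScalars and properties.name APIs are available from Swift 5):

let family: Character = "👨‍👨‍👧‍👧"

for scalar in family.unicodeScalars {
    // Print each scalar's code point and its official Unicode name
    print("U+\(String(scalar.value, radix: 16, uppercase: true))", scalar.properties.name ?? "")
}

// U+1F468 MAN
// U+200D ZERO WIDTH JOINER
// U+1F468 MAN
// U+200D ZERO WIDTH JOINER
// U+1F467 GIRL
// U+200D ZERO WIDTH JOINER
// U+1F467 GIRL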

Swift 5.0 and above makes this process much easier and gets rid of some of the guesswork we needed to do. Unicode.Scalar's new Properties type helps us determine what we're dealing with. However, those properties only make sense when checked alongside the other scalars within the glyph. This is why we'll be adding some convenience methods to the Character type to help us out.

For more detail, I wrote an article explaining how this works.

For Swift 5.0, this leaves you with the following result:

extension Character {
    /// A simple emoji is one scalar and presented to the user as an Emoji
    var isSimpleEmoji: Bool {
        guard let firstScalar = unicodeScalars.first else { return false }
        return firstScalar.properties.isEmoji && firstScalar.value > 0x238C
    }

    /// Checks if the scalars will be merged into an emoji
    var isCombinedIntoEmoji: Bool { unicodeScalars.count > 1 && unicodeScalars.first?.properties.isEmoji ?? false }

    var isEmoji: Bool { isSimpleEmoji || isCombinedIntoEmoji }
}

extension String {
    var isSingleEmoji: Bool { count == 1 && containsEmoji }

    var containsEmoji: Bool { contains { $0.isEmoji } }

    var containsOnlyEmoji: Bool { !isEmpty && !contains { !$0.isEmoji } }

    var emojiString: String { emojis.map { String($0) }.reduce("", +) }

    var emojis: [Character] { filter { $0.isEmoji } }

    var emojiScalars: [UnicodeScalar] { filter { $0.isEmoji }.flatMap { $0.unicodeScalars } }
}

Which will give you the following results:

"Aฬ›อšฬ–".containsEmoji // false
"3".containsEmoji // false
"Aฬ›อšฬ–โ–ถ๏ธ".unicodeScalars // [65, 795, 858, 790, 9654, 65039]
"Aฬ›อšฬ–โ–ถ๏ธ".emojiScalars // [9654, 65039]
"3๏ธโƒฃ".isSingleEmoji // true
"3๏ธโƒฃ".emojiScalars // [51, 65039, 8419]
"๐Ÿ‘Œ๐Ÿฟ".isSingleEmoji // true
"๐Ÿ™Ž๐Ÿผโ€โ™‚๏ธ".isSingleEmoji // true
"๐Ÿ‡น๐Ÿ‡ฉ".isSingleEmoji // true
"โฐ".isSingleEmoji // true
"๐ŸŒถ".isSingleEmoji // true
"๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ง".isSingleEmoji // true
"๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ".isSingleEmoji // true
"๐Ÿด๓ ง๓ ข๓ ฅ๓ ฎ๓ ง๓ ฟ".containsOnlyEmoji // true
"๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ง".containsOnlyEmoji // true
"Hello ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ง".containsOnlyEmoji // false
"Hello ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ง".containsEmoji // true
"๐Ÿ‘ซ Hรฉllo ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ง".emojiString // "๐Ÿ‘ซ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ง"
"๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ง".count // 1

"๐Ÿ‘ซ Hรฉllล“ ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ง".emojiScalars // [128107, 128104, 8205, 128105, 8205, 128103, 8205, 128103]
"๐Ÿ‘ซ Hรฉllล“ ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ง".emojis // ["๐Ÿ‘ซ", "๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ง"]
"๐Ÿ‘ซ Hรฉllล“ ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ง".emojis.count // 2

"๐Ÿ‘ซ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ง๐Ÿ‘จโ€๐Ÿ‘จโ€๐Ÿ‘ฆ".isSingleEmoji // false
"๐Ÿ‘ซ๐Ÿ‘จโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ง๐Ÿ‘จโ€๐Ÿ‘จโ€๐Ÿ‘ฆ".containsOnlyEmoji // true

For older Swift versions, check out this gist containing my old code.


The simplest, cleanest, and swiftiest way to accomplish this is to simply check the Unicode code points for each character in the string against known emoji and dingbats ranges, like so:

extension String {

    var containsEmoji: Bool {
        for scalar in unicodeScalars {
            switch scalar.value {
            case 0x1F600...0x1F64F, // Emoticons
                 0x1F300...0x1F5FF, // Misc Symbols and Pictographs
                 0x1F680...0x1F6FF, // Transport and Map
                 0x2600...0x26FF,   // Misc symbols
                 0x2700...0x27BF,   // Dingbats
                 0xFE00...0xFE0F,   // Variation Selectors
                 0x1F900...0x1F9FF, // Supplemental Symbols and Pictographs
                 0x1F1E6...0x1F1FF: // Flags
                return true
            default:
                continue
            }
        }
        return false
    }

}
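
For example (🚀 is U+1F680, the start of the Transport and Map range):

"Hello 🚀".containsEmoji // true
"Hello".containsEmoji // false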

Swift 5.0

โ€ฆ introduced a new way of checking exactly this!

You have to break your String into its unicode scalars. Each scalar has a properties value, which exposes isEmoji!

You can even check whether a scalar is an emoji modifier, and more. Check out Apple's documentation: https://developer.apple.com/documentation/swift/unicode/scalar/properties

You may want to consider checking for isEmojiPresentation instead of isEmoji, because Apple states the following for isEmoji:

This property is true for scalars that are rendered as emoji by default and also for scalars that have a non-default emoji rendering when followed by U+FE0F VARIATION SELECTOR-16. This includes some scalars that are not typically considered to be emoji.
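
A quick sketch of the difference, using Unicode.Scalar literals in Swift 5:

let digit: Unicode.Scalar = "3"
digit.properties.isEmoji              // true: "3" can take on emoji presentation, as in 3️⃣
digit.properties.isEmojiPresentation  // false: the digit is not rendered as emoji by default

let grin: Unicode.Scalar = "😀"
grin.properties.isEmoji               // true
grin.properties.isEmojiPresentation   // true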


This approach splits emojis up into all of their modifiers, but it is much simpler to handle. And since Swift now counts emojis with modifiers (e.g. 👨‍👩‍👧‍👦, 👨🏻‍💻, 🏴) as one Character, you can do all kinds of things.

var string = "🤓 test"

for scalar in string.unicodeScalars {
    let isEmoji = scalar.properties.isEmoji

    print("\(scalar.description) \(isEmoji)")
}

// 🤓 true
//   false
// t false
// e false
// s false
// t false
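
And since a multi-scalar emoji is still a single Character, counting happens at the glyph level:

"🤓 test".count // 6
"👨‍👩‍👧‍👦".count // 1
"👨‍👩‍👧‍👦".unicodeScalars.count // 7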

NSHipster points out an interesting way to get all emojis:

import Foundation

var emoji = CharacterSet()

for codePoint in 0x0000...0x1F0000 {
    guard let scalarValue = Unicode.Scalar(codePoint) else {
        continue
    }

    // Implemented in Swift 5 (SE-0221)
    // https://github.com/apple/swift-evolution/blob/master/proposals/0221-character-properties.md
    if scalarValue.properties.isEmoji {
        emoji.insert(scalarValue)
    }
}
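
One way to use the resulting set, as a sketch; note that isEmoji is broad, so this set also contains scalars such as the digits 0-9:

let message = "Cats 🐱 rule"
let containsEmojiScalar = message.unicodeScalars.contains { emoji.contains($0) }
// containsEmojiScalar == true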

With Swift 5 you can now inspect the unicode properties of each character in your string. This gives us the convenient isEmoji property on each character. The problem is that isEmoji returns true for any character that can be converted into an emoji, such as the digits 0-9.

We can look at isEmoji and additionally check for the presence of a modifier scalar to determine whether these ambiguous characters will display as an emoji.

This solution should be much more future proof than the regex solutions offered here.

extension String {
    func containsOnlyEmojis() -> Bool {
        if count == 0 {
            return false
        }
        for character in self {
            if !character.isEmoji {
                return false
            }
        }
        return true
    }
    
    func containsEmoji() -> Bool {
        for character in self {
            if character.isEmoji {
                return true
            }
        }
        return false
    }
}

extension Character {
    // An emoji is either a single unicode scalar that is presented as emoji on its own,
    // or a scalar followed by modifier scalars, as is the case with 3️⃣.
    // 0x238C is the first scalar value of an emoji that requires no modifier.
    // `isEmoji` alone evaluates to true for any character that can be turned into an emoji
    // by adding a modifier, such as the digit "3". To avoid that, we confirm that any
    // character at or below 0x238C also has a modifier scalar attached.
    var isEmoji: Bool {
        guard let scalar = unicodeScalars.first else { return false }
        return scalar.properties.isEmoji && (scalar.value > 0x238C || unicodeScalars.count > 1)
    }
}

Giving us

"hey".containsEmoji() //false

"Hello World ๐Ÿ˜Ž".containsEmoji() //true
"Hello World ๐Ÿ˜Ž".containsOnlyEmojis() //false

"3".containsEmoji() //false
"3๏ธโƒฃ".containsEmoji() //true

This is my fix, with updated ranges:

extension String {
    func containsEmoji() -> Bool {
        for scalar in unicodeScalars {
            switch scalar.value {
            case 0x3030, 0x00AE, 0x00A9, // Special characters (wavy dash, ®, ©)
            0x1D000...0x1F77F,           // Musical symbols through emoticons and transport
            0x2100...0x27BF,             // Letterlike symbols through Dingbats
            0xFE00...0xFE0F,             // Variation Selectors
            0x1F900...0x1F9FF:           // Supplemental Symbols and Pictographs
                return true
            default:
                continue
            }
        }
        return false
    }
}

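For example:

"😎 test".containsEmoji() // true
"test".containsEmoji() // false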