Is there a way to check whether unicode text is in a certain language?

I'll be getting text from a user that I need to validate is a Chinese character.

Is there any way I can check this?

You can use regular expression to match with Supported Named Blocks:

private static readonly Regex cjkCharRegex = new Regex(@"\p{IsCJKUnifiedIdeographs}");
public static bool IsChinese(this char c)
{
    return cjkCharRegex.IsMatch(c.ToString());
}

Then, you can use:

if (sometext.Any(z=>z.IsChinese()))
     DoSomething();

According to the information provided here in unicode website you can find the block of Chinese or any other language and then implement a parser to check if a word is in the range or no. just like

public bool IsChinese(string text)
{
    return text.Any(c => c >= 0x20000 && c <= 0xFA2D);
}

Note that

As a handy reference, the Unicode Consortium here provides a search interface to the Unicode Hàn (漢) Database (Unihan).

The database link I'd provided above is showing you the characters

As several people mentioned here, in unicode, chinese, japan, and Korean characters are encoded together, and there are several ranges to it. https://en.wikipedia.org/wiki/CJK_Compatibility

For the simplicity, here's a code sample that detects all the CJK range:

public bool IsChinese(string text)
{
    return text.Any(c => (uint)c >= 0x4E00 && (uint)c <= 0x2FA1F);
}

Is there a way to check whether unicode text is in a certain language?

Related

Recent Posts