What's the complete range for Chinese characters in Unicode?
U+4E00..U+9FFF is part of the complete set, but not all of it.
Maybe you can find a complete list through the CJK Unicode FAQ (which covers "Chinese, Japanese, and Korean" characters).
The "East Asian Script" document does mention:
Blocks Containing Han Ideographs
Han ideographic characters are found in five main blocks of the Unicode Standard, as shown in Table 12-2
Table 12-2. Blocks Containing Han Ideographs
| Block | Range | Comment |
|---|---|---|
| CJK Unified Ideographs | 4E00-9FFF | Common |
| CJK Unified Ideographs Extension A | 3400-4DBF | Rare |
| CJK Unified Ideographs Extension B | 20000-2A6DF | Rare, historic |
| CJK Unified Ideographs Extension C | 2A700-2B73F | Rare, historic |
| CJK Unified Ideographs Extension D | 2B740-2B81F | Uncommon, some in current use |
| CJK Unified Ideographs Extension E | 2B820-2CEAF | Rare, historic |
| CJK Compatibility Ideographs | F900-FAFF | Duplicates, unifiable variants, corporate characters |
| CJK Compatibility Ideographs Supplement | 2F800-2FA1F | Unifiable variants |
Note: the block ranges can evolve over time; the latest ones are listed in CJK Unified Ideographs.
See also Wikipedia:
- CJK Unified Ideographs Extension A
- CJK Unified Ideographs Extension B
- CJK Unified Ideographs Extension C
- CJK Unified Ideographs Extension D
- CJK Unified Ideographs Extension E
- CJK Unified Ideographs Extension F (Unicode 10)
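As a rough illustration of how the ranges in Table 12-2 could be used, here is a minimal Python sketch. The names `HAN_BLOCKS` and `han_block` are made up for this example, and the ranges are simply copied from the table above, so they will miss any block added in later Unicode versions (Extension F, for instance). Note also that it tests block membership only, so unassigned code points inside a block still match.

```python
# Inclusive code point ranges copied from Table 12-2 above.
# Assumption: later blocks (e.g. Extension F) are deliberately not listed.
HAN_BLOCKS = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
    (0x2F800, 0x2FA1F, "CJK Compatibility Ideographs Supplement"),
]


def han_block(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for start, end, name in HAN_BLOCKS:
        if start <= cp <= end:
            return name
    return None


def is_han(ch):
    """True if ch falls inside any of the blocks listed above."""
    return han_block(ch) is not None
```

For example, `han_block("中")` falls in the main CJK Unified Ideographs block, while `han_block("a")` returns `None`.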
Unicode currently has 74605 CJK characters. CJK characters include not only characters used in Chinese, but also Japanese Kanji, Korean Hanja, and Vietnamese Chu Nom. Some CJK characters are not Chinese characters.
1) 20941 characters from the CJK Unified Ideographs block.
Code points U+4E00 to U+9FCC.
- U+4E00 - U+62FF
- U+6300 - U+77FF
- U+7800 - U+8CFF
- U+8D00 - U+9FCC
2) 6582 characters from the CJKUI Ext A block.
Code points U+3400 to U+4DB5. Unicode 3.0 (1999).
3) 42711 characters from the CJKUI Ext B block.
Code points U+20000 to U+2A6D6. Unicode 3.1 (2001).
- U+20000 - U+215FF
- U+21600 - U+230FF
- U+23100 - U+245FF
- U+24600 - U+260FF
- U+26100 - U+275FF
- U+27600 - U+290FF
- U+29100 - U+2A6DF
4) 4149 characters from the CJKUI Ext C block.
Code points U+2A700 to U+2B734. Unicode 5.2 (2009).
5) 222 characters from the CJKUI Ext D block.
Code points U+2B740 to U+2B81D. Unicode 6.0 (2010).
6) CJKUI Ext E block.
Coming soon
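The per-block counts above follow directly from the first and last code point of each range (last minus first, plus one), and they add up to the 74605 total. A small Python sketch to check the arithmetic; the dictionary name `RANGES` and the short block labels are just for illustration:

```python
# First and last assigned code points, as quoted in the list above.
RANGES = {
    "CJK Unified Ideographs": (0x4E00, 0x9FCC),
    "Ext A": (0x3400, 0x4DB5),
    "Ext B": (0x20000, 0x2A6D6),
    "Ext C": (0x2A700, 0x2B734),
    "Ext D": (0x2B740, 0x2B81D),
}

# Each range is inclusive, hence the +1.
counts = {name: end - start + 1 for name, (start, end) in RANGES.items()}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n}")
print("total:", total)
```

Ext E is left out because the list above gives no count for it.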
If the above is not spaghetti enough, take a look at known issues. Have fun =)
The exact ranges for Chinese characters (except the extensions) are [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC\uF900-\uFAAD].
[\u2e80-\u2fd5]
CJK Radicals Supplement is a Unicode block containing alternative, often positional, forms of the Kangxi radicals. They are used as headers in dictionary indices and other CJK ideograph collections organized by radical-stroke. Note that this range also spans the Kangxi Radicals block itself (U+2F00 to U+2FD5).
[\u3190-\u319f]
Kanbun is a Unicode block containing annotation characters used in Japanese copies of classical Chinese texts, to indicate reading order.
[\u3400-\u4DBF]
CJK Unified Ideographs Extension-A is a Unicode block containing rare Han ideographs.
[\u4E00-\u9FCC]
CJK Unified Ideographs is a Unicode block containing the most common CJK ideographs used in modern Chinese and Japanese.
[\uF900-\uFAAD]
CJK Compatibility Ideographs is a Unicode block created to contain Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain round-trip compatibility between Unicode and those encodings.
For details, please refer to the linked reference; the extension blocks are covered in the other answers.