Weird grep behavior with CJK characters? (bash)

grep fails to match certain strings with CJK characters. For example:

  1. Create a text file (test.txt) with the content below:
==ShellType.サモナ\u30FC==
  2. Use grep.
    >> grep "ShellType.サモナ\u30FC" test.txt
    (empty output)
    >> grep "ShellType.サモナ.*\u30FC" test.txt
    ==ShellType.サモナ\u30FC==

Is this a grep bug, or do CJK characters need special handling?
How do I properly search for CJK strings with grep, or with another reliable tool?

System: Ubuntu 20.04
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
grep (GNU grep) 3.4


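It has nothing to do with CJK. The problem is the backslash: grep reads \u as a plain u, not as a literal backslash followed by u. You can use -o to (more or less) see what \u actually means in grep: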

[tom@ideapad ~]$ cat /tmp/meh 
==ShellType.サモナ\u30FC==
[tom@ideapad ~]$ grep -o '\u' /tmp/meh 
u
[tom@ideapad ~]$ grep -o '.\u' /tmp/meh 
\u
[tom@ideapad ~]$ grep -o '.*\u' /tmp/meh 
==ShellType.サモナ\u
[tom@ideapad ~]$ grep -o '.*.*\u' /tmp/meh 
==ShellType.サモナ\u
[tom@ideapad ~]$ grep -o '==ShellType.サモナ.*\u' /tmp/meh
==ShellType.サモナ\u
[tom@ideapad ~]$ grep -o '==ShellType.サモナ.\u' /tmp/meh
==ShellType.サモナ\u

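This is also why your second command appeared to work: .* matched the literal backslash and \u matched the u. Note that I've been using single quotes, since with \, double quotes could make things even more complicated. The proper ways to do the grep you (seem to) desire are: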

[tom@ideapad ~]$ grep -o '==ShellType\.サモナ\\u' /tmp/meh 
==ShellType.サモナ\u
[tom@ideapad ~]$ grep -o "==ShellType\\.サモナ\\\\u" /tmp/meh 
==ShellType.サモナ\u
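
If all you need is to find the literal text, another option is grep's -F (fixed strings) flag, which switches off regex interpretation so that neither the dot nor the backslash needs escaping. A minimal sketch, assuming a throwaway file created with mktemp in place of /tmp/meh (the variable names f and match are just for illustration):

```shell
# Recreate the sample file. printf '%s\n' writes its argument verbatim,
# so the backslash and "u30FC" end up in the file as literal text.
f=$(mktemp)
printf '%s\n' '==ShellType.サモナ\u30FC==' > "$f"

# -F: treat the pattern as a fixed string, not a regular expression.
# Single quotes keep the shell from touching the backslash.
match=$(grep -F 'ShellType.サモナ\u30FC' "$f")
printf '%s\n' "$match"    # prints ==ShellType.サモナ\u30FC==

rm -f "$f"
```

With -F you lose regex features such as .* of course, so this only fits when the whole pattern is meant literally.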

As far as I know, grep does not consider \u30FC (however further escaped) to be a Unicode character the way printf in a shell does. To actually grep for a character by its code point, you can make the shell expand it first with ANSI-C quoting (it might not work in every POSIX shell though):

[tom@ideapad ~]$ printf '\u30FC' > /tmp/heh
[tom@ideapad ~]$ grep $'\u30FC' /tmp/heh 
ー
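
For shells that lack $'...', one possible workaround is to let printf build the character from its UTF-8 bytes using octal escapes, which POSIX specifies for the printf format string. This is only a sketch: it assumes the file is UTF-8 encoded, where U+30FC is the byte sequence E3 83 BC (octal 343 203 274), and the names f, chouon and hits are made up for the example:

```shell
f=$(mktemp)
# Sample line containing the real character U+30FC, not the text "\u30FC".
printf '%s\n' 'ShellType.サモナー' > "$f"

# Build U+30FC from its UTF-8 bytes E3 83 BC, written as octal escapes.
chouon=$(printf '\343\203\274')

# -c prints the number of matching lines.
hits=$(grep -c "$chouon" "$f")
printf '%s\n' "$hits"    # prints 1

rm -f "$f"
```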

P.S. It might be worth mentioning that, while ANSI-C quoting makes use of single quotes in its syntax, it does NOT mean that it works like single quotes for the parts other than the code point expansion:

[tom@ideapad ~]$ grep -o $'==ShellType\.サモナ\\u' /tmp/meh 
[tom@ideapad ~]$ grep -o $'==ShellType\\.サモナ\\\\u' /tmp/meh 
==ShellType.サモナ\u
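
A quick way to check what the shell really hands to grep is to print the quoted string with printf '%s\n' first. The sketch below assumes bash (for the $'...' syntax); plain and ansi are illustrative variable names:

```shell
#!/usr/bin/env bash
# Plain single quotes: the shell passes every character through unchanged.
plain='==ShellType\.サモナ\\u'

# ANSI-C quoting: the shell collapses each \\ into one backslash first,
# so twice as many backslashes are needed to hand grep the same pattern.
ansi=$'==ShellType\\.サモナ\\\\u'

# Both lines printed here are identical.
printf '%s\n' "$plain" "$ansi"
```

If the two printed lines differ, grep is not getting the pattern you think it is.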