How to forbid non-UTF-8 filenames?

Solution 1:

No. You'd have to modify either the Linux kernel or the filesystem implementation, or use a pass-through filter filesystem (perhaps implemented with FUSE) that enforces the restriction.

It's a nice idea, but probably very difficult to get consensus on:

  • The old-school purists will insist that a filename should be allowed to be any NUL-terminated byte string.
  • Others will say that if you enforce valid UTF-8, you should also go further and forbid other Unicode errors like combining characters without base characters, unassigned code points, and so on.
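To make the two positions concrete, here is a minimal Python sketch of what each check could look like. The function names are made up for illustration, and the "stricter" policy is only one possible reading of the second bullet. Note that Unicode category Cn covers noncharacters as well as unassigned code points, and the result depends on the Unicode version bundled with your Python.

```python
import unicodedata


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode as strict UTF-8."""
    try:
        # Python's strict decoder also rejects overlong forms and surrogates.
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def stricter_problems(name: bytes) -> list:
    """List the reasons a name fails a stricter, hypothetical policy."""
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]
    problems = []
    # A non-zero canonical combining class on the first character means
    # the name starts with a combining mark that has no base character.
    if text and unicodedata.combining(text[0]) != 0:
        problems.append("starts with a combining character")
    # Category "Cn" = not assigned (includes noncharacters such as U+FFFF).
    if any(unicodedata.category(ch) == "Cn" for ch in text):
        problems.append("contains an unassigned code point")
    return problems
```

The first function is the uncontroversial part; where to stop in the second is exactly the kind of thing people would argue about.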

Solution 2:

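ZFS has a utf8only property that will enforce this. Note that it is a dataset property rather than a mount option: it has to be set when the dataset is created and cannot be changed afterwards.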

There is a patch to add this to ext4, but it didn't seem to get much response.

Solution 3:

The filesystem itself (and by extension the Linux filesystem layer) allows any byte in a filename other than NUL and /. Modifying the driver to reject non-UTF-8 names is theoretically possible, but it might have unwanted side effects. For example, what happens when you mount a filesystem that already contains such files? Are they invisible? Do you get a kernel panic? Do you escape those names on display? And if you do escape the names, does that break any userland tools or make certain files inaccessible (see "rootkit")? Also, maintaining a patched kernel means you have to reapply your patch and rebuild for every kernel update, which is a bit annoying.
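Before deciding how to handle names that already exist, it helps to know whether you have any. A rough sketch in Python (the function name is my own) that walks a tree using byte paths, so the raw names are seen exactly as stored on disk:

```python
import os


def find_non_utf8_names(root: bytes):
    """Yield the full path (as bytes) of each entry under root whose
    own name is not valid UTF-8."""
    # Passing bytes makes os.walk return raw byte names rather than
    # str names with surrogate escapes.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                yield os.path.join(dirpath, name)
```

Call it as `list(find_non_utf8_names(b"/some/dir"))`. Whatever it finds is what a patched driver or filter layer would have to hide, escape, or refuse.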

If you do want to go ahead with it, though, the easiest way to do so is to create a FUSE layer. It does adversely affect performance, but it's certainly the easiest way to get started and test your idea. You could read the documentation and write such a program in just a few hours.
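As a starting point, the check itself is tiny. The sketch below is not a FUSE filesystem; it only shows, in plain Python, the validation that the create and rename handlers of such a layer might perform before passing the call through to the real filesystem. The function names and the choice of EILSEQ as the error code are my assumptions.

```python
import errno
import os


def check_name(path: bytes) -> None:
    """Raise OSError(EILSEQ) if the last path component is not valid UTF-8."""
    name = os.path.basename(path)
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        raise OSError(errno.EILSEQ, os.strerror(errno.EILSEQ), path) from None


def guarded_create(path: bytes, mode: int = 0o644) -> int:
    """Create a new file only if its name passes check_name.

    Returns an open file descriptor, like os.open.
    """
    check_name(path)
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)


def guarded_rename(old: bytes, new: bytes) -> None:
    """Rename, validating only the new name so that an existing
    non-UTF-8 name can still be renamed to a valid one."""
    check_name(new)
    os.rename(old, new)
```

In a real pass-through layer you would apply the same check in every handler that can introduce a new name (mkdir, symlink, link, mknod, and so on), and you would still have to decide what to do when listing directories that already contain bad names.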