Trying to find files that contain only NULs, but getting some others
The files I am trying to find/list are:
- Any size (0 bytes accepted)
- Consist only of ASCII NUL characters (0x00)
- If there are any characters other than 0x00, the file shouldn't be listed.
The command I have now is:
grep -RLP '[^\x00]' .
This works, but it also lists a file that consists of only two bytes: 0xFF, 0xFE. I don't know why.
Is there a better command to find such files?
Solution 1:
In short, grep is trying to interpret your file as Unicode data. The sequence 0xFF, 0xFE is a byte order mark (BOM) for UTF-16.
(In my testing, other sequences such as two 0xFF bytes or two 0xFE bytes also failed to match the '[^\x00]'
regex, since these bytes are not considered characters when the input is decoded as UTF-8 either.)
Using a locale that doesn't use Unicode for character types should fix this. You can do that by setting the LC_CTYPE environment variable. Use the C locale to force ASCII encoding (so no Unicode enabled):
LC_CTYPE=C grep -RLP '[^\x00]' .
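To illustrate the difference between the two views of the same data, here is a minimal Python sketch. It is only an analogy for what grep does, not a reproduction of grep's internals: a bytes regex stands in for byte-oriented matching in the C locale, and a UTF-8 decode attempt stands in for character-oriented matching.

```python
import re

data = b"\xff\xfe"

# Byte-oriented view (analogous to the C locale): every non-zero byte
# counts as a character, so the pattern finds something.
byte_match = re.search(rb"[^\x00]", data) is not None

# Character-oriented view: 0xFF and 0xFE never occur in valid UTF-8,
# so decoding fails and there are no characters to match against.
try:
    data.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(byte_match, valid_utf8)  # prints: True False
```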
UPDATE: As pointed out by @steeldriver, grep still works line by line, so files that contain only NUL bytes and newlines will still be listed.
@DavidFoerster's solution using grep's -z option solves this problem: using NUL bytes as the line separators does the trick.
Alternatively, I came up with a short Python 3 script (allzeroes.py) to check whether the file's contents are all zeroes:
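#!/usr/bin/python3
import sys

assert len(sys.argv) == 2
with open(sys.argv[1], 'rb') as f:
    # Read in 4 KiB blocks; any() is true as soon as a block has a non-zero byte.
    for block in iter(lambda: f.read(4096), b''):
        if any(block):
            sys.exit(1)
# Falling off the end means exit status 0: the file is empty or all zeroes.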
You can use it with find to locate all matches recursively (make the script executable and put it on your PATH, or give its path explicitly):
$ find . -type f -exec allzeroes.py {} \; -print
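The find command above starts one Python process per file. As a rough alternative, the same check can be done in a single process by walking the tree from Python. This is only a sketch: the function names `is_all_nul` and `find_all_nul` are made up for illustration, and unreadable files are simply skipped.

```python
import os

def is_all_nul(path, block_size=4096):
    """Return True if the file is empty or contains only 0x00 bytes."""
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            if any(block):  # a non-zero byte was found
                return False
    return True

def find_all_nul(root):
    """Yield regular files under root that consist only of NUL bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like `find -type f`, skip symlinks and non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                if is_all_nul(path):
                    yield path
            except OSError:
                continue  # unreadable file: skip it
```

Calling `for path in find_all_nul("."): print(path)` would then print the matching files under the current directory.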
I hope that helps.
Solution 2:
You can abuse grep’s alternative null-terminated line mode and thus search for files that contain only empty lines:
grep -L -z -e . ...
Replace ... with the file set that you want to scan (here: -R .).
Explanation
- -z, --null-data: Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. [1]
- -e .: Use . as the search pattern, i.e. match any character.
- -L, --files-without-match: Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match. [1]
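The combined effect of these options can be mimicked in a few lines of Python. This is a sketch of the idea rather than of grep itself, and it assumes byte-oriented matching (every byte counts as a character); the helper name `only_empty_records` is hypothetical.

```python
def only_empty_records(data: bytes) -> bool:
    # Split the input into NUL-terminated records, as -z does.
    # The pattern `.` needs at least one character, so only a
    # non-empty record can match; -L lists files with no match.
    return not any(data.split(b"\0"))

samples = {
    "empty": b"",
    "zero": b"\0" * 100,
    "foobar": b"foo\0bar\0",
}
listed = [name for name, data in samples.items() if only_empty_records(data)]
print(listed)  # prints: ['empty', 'zero']
```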
Test case
Set-up:
: > empty
truncate -s 100 zero
printf '%s\0' foo bar > foobar
Run test:
$ grep -L -z -e . empty zero foobar
empty
zero
[1] From the grep(1) manual page.