Find and report line numbers of empty lines in text file
I have a text file containing 14000+ lines. It contains data I am using to train a speech recognition system.
I generated the file with a Java program, and due to a semantic error in the code a few of the lines are empty. Every time I run training, it fails after about 30 minutes, complaining that there is an empty line.
Is there any code, script, or command that can give me a list of the line numbers of the empty lines, so I can fill them in and save time?
It should work like this:
I will input a file.txt
and it will give me
line number 1121,1212,1450,13000 and so on ...
are empty in file.txt
Solution 1:
You can find the empty lines, and their line numbers, with
grep -E --line-number --with-filename '^$' file.txt
An example:
w3@aardvark:~(0)$ grep -E --line-number --with-filename '^$' file.txt
file.txt:1:
file.txt:3:
file.txt:4:
w3@aardvark:~(0)$ cat -n file.txt
1
2 Not empty
3
4
5 Not empty
w3@aardvark:~(0)$
If your "empty" lines contain blanks or TABs, use:
grep -E --line-number --with-filename '^\s*$' file.txt
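To get the comma-separated list asked for in the question, the grep output can be reduced to just the numbers. This is only a minimal sketch: the function name empty_lines_csv is made up for illustration, and it uses the POSIX class [[:space:]] in place of \s, which is a GNU grep extension.

```shell
# Hypothetical helper (name is an assumption, not part of the answer above):
# print the numbers of blank or whitespace-only lines as "1,3,4".
empty_lines_csv() {
    # grep -n prefixes each matching line with "N:";
    # cut keeps only the number before the first colon;
    # paste -s joins all numbers onto one line, separated by commas.
    grep -n '^[[:space:]]*$' "$1" | cut -d: -f1 | paste -sd, -
}
```

It would be called as empty_lines_csv file.txt. Unlike tr, paste does not leave a trailing separator after the last number.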
Solution 2:
sed will report the line number with the = command, so you can use this expression to report line numbers of empty lines (lines with nothing between ^ (start of line) and $ (end of line)):
sed -n '/^$/=' file
We use the -n option to suppress printing the stream (line numbers are printed separately from the lines themselves when we use =, so there is no p command here), so the only output is line numbers of the matching lines.
$ sed -n '/^$/=' foo
1
3
5
7
(if lines 1, 3, 5 and 7 are empty in foo)
Here's an example to show how you can get the user interaction you wanted. You could use any solution in place of the sed expression in these structures...
$ cat foo

2

4

6

8
So:
$ read -r -p "Enter file name: "; echo -e "The following lines are empty in $REPLY:\n$(sed -n '/^$/=' "$REPLY" | tr '\n' ' ')"
Enter file name: foo
The following lines are empty in foo:
1 3 5 7
(Use tr '\n' ',' to get commas instead of spaces)
You could save as a script (I'm naming mine empline):
#!/bin/bash
read -r -p "Enter file name: "
echo -e "The following lines are empty in $REPLY:\n\
$(sed -n '/^$/=' "$REPLY" | tr '\n' ' ')"
Make the script executable:
chmod u+x empline
Then you can run it like this
$ ./empline
Enter file name: foo
The following lines are empty in foo:
1 3 5 7
You could skip the read line and replace "$REPLY" with "$1" to use the filename as a positional parameter (so run ./empline foo). To simplify usage, you could make a function and add it to the end of your ~/.bashrc:
function empline() {
    echo -e "The following lines are empty in $1:\n\
$(sed -n '/^$/=' "$1" | tr '\n' ' ')"
}
This takes the filename as argument:
$ empline foo
The following lines are empty in foo:
1 3 5 7
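If the function is going to live in ~/.bashrc, it may be worth guarding against a missing or unreadable file and reporting when nothing was found. The following is only a sketch under those assumptions; the name empline_checked and the wording of the messages are made up for illustration.

```shell
# Hypothetical variant of empline (name and messages are assumptions).
empline_checked() {
    # Require exactly one argument that names a readable file.
    if [ "$#" -ne 1 ] || [ ! -r "$1" ]; then
        echo "usage: empline_checked FILE (FILE must be readable)" >&2
        return 1
    fi
    local nums
    # sed prints one line number per empty line; paste joins them with spaces.
    nums=$(sed -n '/^$/=' "$1" | paste -sd' ' -)
    if [ -z "$nums" ]; then
        printf 'No empty lines in %s\n' "$1"
    else
        printf 'The following lines are empty in %s:\n%s\n' "$1" "$nums"
    fi
}
```

Using printf with %s instead of echo -e means a file name containing a backslash is printed literally rather than being interpreted as an escape sequence.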
Solution 3:
Using awk
The method for multiple file input (see end of post) is the most robust.
Single file input:
awk 'BEGIN { printf "Line numbers of empty lines in " ARGV[1] ": " } !NF { printf sep NR ; sep="," } END { printf "\n" }' file.txt
The BEGIN section runs before the input file is processed.
ARGV[1] is the name of the input file. This corresponds to awk's FILENAME variable, which does not work in the BEGIN section.
!NF matches lines that are blank or that only contain field separators. The default field separators are space and tab characters, so lines that contain only spaces and tabs count as empty. NF (without the exclamation point) matches lines that contain data, and adding ! inverts the match.
NR is the input file's line number currently being evaluated. NR does not reset to 1 if additional input files are specified on the command line.
To prevent a comma from appearing in front of the first matching line number, leave the string sep undefined until after printing the first match.
The END section runs after the input file is processed. In this example, it terminates the output cleanly by printing a Unix-style newline character.
Example output:
Line numbers of empty lines in file.txt: 8,13,15,20,25,28
It's a bit sloppy to use a string name without first setting it, even if you initially want it to be empty. You could explicitly set the sep string to be empty in the BEGIN section:
awk 'BEGIN { sep="" ; printf "Line numbers of empty lines in " ARGV[1] ": " } !NF { printf sep NR ; sep="," } END { printf "\n" }' file.txt
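Because !NF also counts whitespace-only lines, its result can differ from the strict /^$/ pattern used in the grep and sed answers. The sketch below puts the two side by side; the function names awk_blank_nf and awk_blank_exact are invented for this comparison.

```shell
# Hypothetical helpers (names are assumptions) that print comma-separated
# line numbers using the two different notions of "empty".

# Counts lines with no fields: truly empty, or only spaces/tabs.
awk_blank_nf() {
    awk '!NF { printf "%s%s", sep, NR; sep="," } END { printf "\n" }' "$1"
}

# Counts only lines with no characters at all.
awk_blank_exact() {
    awk '/^$/ { printf "%s%s", sep, NR; sep="," } END { printf "\n" }' "$1"
}
```

For a file whose second line is empty and whose third line holds only spaces and a tab, awk_blank_nf reports 2,3 while awk_blank_exact reports only 2. Which one is wanted depends on whether the training tool also rejects whitespace-only lines.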
Multiple file input:
awk 'FNR==1 && NR>1 { printf "\n" } FNR==1 { sep="" ; printf "Line numbers of empty lines in " FILENAME ": " } !NF { printf sep FNR ; sep="," } END { printf "\n" }' file1.txt file2.txt file3.txt
FNR is similar to NR, except that the FNR line number counter resets to 1 at the start of each file.
The section FNR==1 && NR>1 { printf "\n" } causes each file's output to print on a separate line. It prints a newline character when the first line of each additional input file is processed, but not for the first line of the first file.
Example output:
Line numbers of empty lines in file1.txt: 8,13,15,20,25,28
Line numbers of empty lines in file2.txt: 1,2,4,6,7,9,10
Line numbers of empty lines in file3.txt: 3,8,9,11,13,15
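To see the difference between NR and FNR directly, it can help to print both counters for every input line. This is just an illustrative sketch; the function name show_counters is made up.

```shell
# Hypothetical helper (name is an assumption): for every input line, print
# the file name, the running line count NR, and the per-file count FNR.
show_counters() {
    awk '{ print FILENAME, NR, FNR }' "$@"
}
```

Run as show_counters file1.txt file2.txt on two files of two lines each, the NR column would read 1, 2, 3, 4 while the FNR column would read 1, 2, 1, 2, which is why the multiple-file command reports FNR.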
Solution 4:
Pure Bash, using the example file foo from Zanna's answer:
i=0
# -r keeps a trailing backslash from joining two lines and shifting the count
while read -r line; do
    ((++i))
    if [[ $line == '' ]]; then
        echo "$i"
    fi
done < foo
Output:
1
3
5
7
Or you might prefer the Bash equivalent of the Python solution using enumerate():
cat -n foo |
while read -r i line; do
    if [[ $line == '' ]]; then
        echo "$i"
    fi
done
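The loops above print one number per line. A variant that collects the numbers and prints them comma-separated, closer to the output requested in the question, might look like the sketch below. The function name empty_lines_bash is an assumption, and because it sets IFS= for read, it counts only lines with no characters at all, not whitespace-only lines.

```shell
# Hypothetical pure-Bash helper (name is an assumption).
empty_lines_bash() {
    local line i=0 nums=()
    # IFS= preserves leading/trailing whitespace, -r keeps backslashes literal,
    # and "|| [[ -n $line ]]" still processes a last line with no newline.
    while IFS= read -r line || [[ -n $line ]]; do
        i=$((i + 1))
        if [[ -z $line ]]; then
            nums+=("$i")
        fi
    done < "$1"
    # With IFS set to a comma, "${nums[*]}" joins the elements with commas.
    local IFS=,
    printf '%s\n' "${nums[*]}"
}
```

It would be called as empty_lines_bash foo. For a 14000-line file a shell loop is noticeably slower than grep, sed, or awk, so this is mainly useful where those tools are unavailable.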