Find and report line numbers of empty lines in text file

I have a text file containing 14000+ lines. It contains data I am using to train a speech recognition system.

I generated that file with Java code, and because of a semantic error in that code a few of the lines are empty. Every time I run the training, it fails after about 30 minutes, complaining that there is an empty line.

Is there any code, script, or command that can give me a list of the line numbers of the empty lines, so that I can fill them in and save time?

It should work like this:

I give it file.txt as input and it tells me:

line number 1121,1212,1450,13000 and so on ... are empty in file.txt


Solution 1:

You can find the empty lines, and their line numbers, with:

grep -E --line-number --with-filename '^$' file.txt

An example:

w3@aardvark:~(0)$ grep -E --line-number --with-filename '^$' file.txt
file.txt:1:
file.txt:3:
file.txt:4:
w3@aardvark:~(0)$ cat -n file.txt
     1  
     2  Not empty
     3  
     4  
     5  Not empty
w3@aardvark:~(0)$ 

If your "empty" lines may contain spaces or tabs, use:

grep -E --line-number --with-filename '^\s*$' file.txt
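The question asks for the numbers as a single comma-separated list, and `\s` is a GNU extension that some grep implementations may not accept. A minimal sketch of both points, assuming a helper name (`blank_lines`) that is purely illustrative; it uses the POSIX class `[[:space:]]` and the short option `-n` in place of `--line-number`:

```shell
#!/bin/sh
# blank_lines is a hypothetical helper name, not part of grep.
# It prints the numbers of all empty or whitespace-only lines in "$1"
# as one comma-separated list.
blank_lines() {
    # grep -n prefixes each matching line with "N:".
    grep -n '^[[:space:]]*$' "$1" |
        cut -d: -f1 |     # keep only the line number
        paste -s -d, -    # join the numbers with commas
}

# Demo on a throwaway sample file (lines 1, 3 and 4 are blank).
sample=$(mktemp)
printf '\nNot empty\n\n \t\nNot empty\n' > "$sample"
blank_lines "$sample"
rm -f "$sample"
```

For the sample file built in the demo this prints 1,3,4.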

Solution 2:

sed will report the line number with the = command, so you can use this expression to report line numbers of empty lines (lines with nothing between ^ (start of line) and $ (end of line)):

sed -n '/^$/=' file

We use the -n option to suppress printing the stream (line numbers are printed separately from the lines themselves when we use =, so there is no p command here), so the only output is line numbers of the matching lines.

$ sed -n '/^$/=' foo 
1
3
5
7

(if lines 1, 3, 5 and 7 are empty in foo)
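The `/^$/` address only matches lines with no characters at all. If lines holding only spaces or tabs should count as empty too, the pattern can be widened. A small sketch, assuming an illustrative function name (`blank_line_numbers`):

```shell
#!/bin/sh
# blank_line_numbers is a hypothetical name used for this sketch.
# [[:space:]]* allows any amount of whitespace between ^ and $.
blank_line_numbers() {
    sed -n '/^[[:space:]]*$/=' "$1"
}

# Demo: line 1 is empty, line 3 holds a single space.
sample=$(mktemp)
printf '\ntext\n \nmore text\n' > "$sample"
blank_line_numbers "$sample"
rm -f "$sample"
```

The demo prints 1 and 3, one number per line, whereas the stricter `/^$/` would report only line 1.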


Here's an example to show how you can get the user interaction you wanted. You could use any solution in place of the sed expression in these structures...

$ cat foo

2

4

6

8

So:

$ read -rp "Enter file name: "; echo -e "The following lines are empty in $REPLY:\n$(sed -n '/^$/=' "$REPLY" | tr '\n' ' ')"
Enter file name: foo
The following lines are empty in foo:
1 3 5 7 

(Use tr '\n' ',' to get commas instead of spaces)

You could save this as a script (I'm naming mine empline):

#!/bin/bash
# Ask for a file name (stored in REPLY), then list its empty lines
read -rp "Enter file name: "
echo -e "The following lines are empty in $REPLY:\n\
$(sed -n '/^$/=' "$REPLY" | tr '\n' ' ')"

Make the script executable:

chmod u+x empline

Then you can run it like this:

$ ./empline
Enter file name: foo
The following lines are empty in foo:
1 3 5 7 

You could skip the read line and replace "$REPLY" with "$1" to use the filename as a positional parameter (so run ./empline foo). To simplify usage, you could make a function and add it to the end of your ~/.bashrc:

function empline() {
    # $1 is the file name passed as the first argument
    echo -e "The following lines are empty in $1:\n\
$(sed -n '/^$/=' "$1" | tr '\n' ' ')"
}

This takes the filename as argument:

$ empline foo
The following lines are empty in foo:
1 3 5 7 
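With a 14000-line file it can also help to know when there is nothing to fix, or when the file name was mistyped. Here is a rough sketch of the positional-parameter variant with those two checks added; the function name `report_empty` and the message wording are my own assumptions, not part of the answer above:

```shell
#!/bin/sh
# report_empty is a hypothetical name; the messages are illustrative.
report_empty() {
    local numbers
    if [ ! -r "$1" ]; then
        printf 'Cannot read file: %s\n' "$1" >&2
        return 1
    fi
    # sed prints one number per line; paste joins them with commas.
    numbers=$(sed -n '/^$/=' "$1" | paste -s -d, -)
    if [ -z "$numbers" ]; then
        printf 'No empty lines in %s\n' "$1"
    else
        printf 'Empty lines in %s: %s\n' "$1" "$numbers"
    fi
}

# Demo on a throwaway file whose lines 1 and 3 are empty.
sample=$(mktemp)
printf '\n2\n\n4\n' > "$sample"
report_empty "$sample"
rm -f "$sample"
```

The demo prints "Empty lines in", the temporary file's name, and then 1,3. The non-zero return status for an unreadable file lets a calling script stop before the 30-minute training run starts.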

Solution 3:

Using awk

The method for multiple file input (see end of post) is the most robust.

Single file input:

awk 'BEGIN { printf "Line numbers of empty lines in %s: ", ARGV[1] } !NF { printf "%s%d", sep, NR ; sep="," } END { printf "\n" }' file.txt

The BEGIN section runs before the input file is processed.

ARGV[1] is the name of the input file. It is used here instead of awk's FILENAME variable, which is not yet set when the BEGIN section runs.

!NF matches lines that are blank or that only contain field separators. The default field separators are space and tab characters, so lines that contain only spaces and tabs count as empty. NF (without the exclamation point) matches lines that contain data, and adding ! inverts the match.

NR is the number of the input line currently being evaluated. NR does not reset to 1 if additional input files are specified on the command line.

To prevent a comma from appearing in front of the first matching line number, leave the string sep undefined until after printing the first match.

The END section runs after the input file is processed. In this example, it terminates the output cleanly by printing a Unix-style newline character.

Example output:

Line numbers of empty lines in file.txt: 8,13,15,20,25,28

It's a bit sloppy to use a variable without first setting it, even if you initially want it to be empty. You could explicitly set sep to an empty string in the BEGIN section:

awk 'BEGIN { sep="" ; printf "Line numbers of empty lines in %s: ", ARGV[1] } !NF { printf "%s%d", sep, NR ; sep="," } END { printf "\n" }' file.txt
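To see what the !NF test buys you compared with a strict empty-line pattern such as /^$/, here is a small comparison sketch. Both function names (`strict_empty` and `blank_by_fields`) are made up for the illustration:

```shell
#!/bin/sh
# Both helper names are hypothetical; each prints a comma-separated list.

# Strict match: only lines with no characters at all.
strict_empty() {
    awk '/^$/ { print NR }' "$1" | paste -s -d, -
}

# !NF: lines with zero fields, which includes whitespace-only lines.
blank_by_fields() {
    awk '!NF { print NR }' "$1" | paste -s -d, -
}

# Demo: line 1 is empty, line 2 holds a space and a tab, line 3 has text.
sample=$(mktemp)
printf '\n \t\ntext\n' > "$sample"
printf 'strict: %s\n' "$(strict_empty "$sample")"
printf '!NF:    %s\n' "$(blank_by_fields "$sample")"
rm -f "$sample"
```

The strict pattern reports only line 1, while !NF reports lines 1 and 2. Which one is right depends on whether the training tool also rejects whitespace-only lines.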

Multiple file input:

awk 'FNR==1 && NR>1 { printf "\n" } FNR==1 { sep="" ; printf "Line numbers of empty lines in %s: ", FILENAME } !NF { printf "%s%d", sep, FNR ; sep="," } END { printf "\n" }' file1.txt file2.txt file3.txt

FNR is similar to NR, except that the FNR line number counter resets to 1 at the start of each file.

The section FNR==1 && NR>1 { printf "\n" } causes each file's output to print on a separate line. It prints a newline character when the first line of each additional input file is processed, but not for the first line of the first file.

Example output:

Line numbers of empty lines in file1.txt: 8,13,15,20,25,28
Line numbers of empty lines in file2.txt: 1,2,4,6,7,9,10
Line numbers of empty lines in file3.txt: 3,8,9,11,13,15
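If the difference between NR and FNR is hard to picture, a quick way to check it is to print both counters for every line of two small files. This is only a demonstration sketch; `show_counters` and the file names one.txt and two.txt are invented for it:

```shell
#!/bin/sh
# show_counters is a hypothetical helper: it prints the file name,
# the per-file counter FNR and the running counter NR for each line.
show_counters() {
    awk '{ print FILENAME, FNR, NR }' "$@"
}

# Demo with two throwaway two-line files.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\nd\n' > "$dir/two.txt"
(cd "$dir" && show_counters one.txt two.txt)
rm -rf "$dir"
```

The demo prints:

one.txt 1 1
one.txt 2 2
two.txt 1 3
two.txt 2 4

FNR starts again at 1 for two.txt while NR keeps counting, which is why the multiple-file command prints FNR and uses FNR==1 to detect the start of each file.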

Solution 4:

Pure Bash, using the example file foo from Zanna's answer:

i=0
# IFS= and -r make read keep each line verbatim, so only truly empty lines match
while IFS= read -r line; do
    ((++i))
    if [[ -z $line ]]; then
        echo "$i"
    fi
done < foo

Output:

1
3
5
7

Or you might prefer the Bash equivalent of the Python solution using enumerate():

cat -n foo |
    while read -r i line; do
        if [[ $line == '' ]]; then
            echo "$i"
        fi
    done
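A read loop can also collect the numbers into the comma-separated form the question asks for, and cope with a last line that has no trailing newline. A sketch under those assumptions, using only shell built-ins; the function name `empty_lines` is hypothetical, and the code avoids Bash-only syntax so it should run in other POSIX-style shells that support `local`:

```shell
#!/bin/bash
# empty_lines is a hypothetical helper: it prints the numbers of the
# truly empty lines in "$1" as one comma-separated list.
empty_lines() {
    local i=0 out='' line
    # IFS= keeps surrounding whitespace and -r keeps backslashes; the
    # extra [ -n "$line" ] test picks up a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        i=$((i + 1))
        if [ -z "$line" ]; then
            # Add a comma only when out already holds a number.
            out="${out:+$out,}$i"
        fi
    done < "$1"
    printf '%s\n' "$out"
}

# Demo on a copy of the example file foo (lines 1, 3, 5 and 7 empty).
sample=$(mktemp)
printf '\n2\n\n4\n\n6\n\n8\n' > "$sample"
empty_lines "$sample"
rm -f "$sample"
```

The demo prints 1,3,5,7. Unlike the cat -n pipeline, this loop does not treat whitespace-only lines as empty.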