Playing with awk: trying to remove the first line of input

Below is my input file. The result keeps coming back as "Breed has Votes votes".

    Breed, Votes
    Black Lab, 30
    Chihuahua, 2
    Pug, 1
    Corgi, 45
    Shar Pei, 21
    Shih Tzu, 5
    Maltese, 7

    #!/bin/sh/awk
    # The awk script below runs on the file dog_breed.txt. FS refers to the
    # field separator, which is a comma in this case. We initialize our max
    # variable to 0 and max_breed to the first breed, then iterate over the
    # rows to find the max voted breed.

    BEGIN{
            FS=", "
            max=0;
            max_breed=$1
    }
    {
            if(max<($2)){
                    max=$2;
                    max_breed=$1;
            }
    }
    END{
            print max_breed " has " max " votes"
    }

Solution 1:

You can skip the first record (line) by adding the pattern NR > 1 in front of the main block, so that it only runs from the second record onwards:

    NR > 1 {
            if(max<($2)){
                    max=$2;
                    max_breed=$1;
            }
    }

However, it's worth understanding why you get that result when you don't exclude the first line:

  • when NR==1, the value of max is 0 (numeric, assigned in the BEGIN block) but the value of $2 is Votes (a string). So the expression max<($2) converts max to a string and performs a lexicographical comparison. If 0 sorts before V in your locale, the result is true, and max is assigned the string value Votes.

  • for subsequent lines, $2 looks numeric, but max is now a string, so $2 is treated as a string and the comparison is again lexicographical. Assuming V has a greater lexicographical weight than any of the digits 0 through 9, V always wins.
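The two steps above can be traced with a small experiment. This is only a sketch: it feeds three of the question's lines to awk on standard input, forces the C locale with LC_ALL=C so the ordering of 0 and V is predictable, and uses a helper variable named result that is not part of the original script.

```shell
# Print how max<$2 evaluates on each record when the header is NOT skipped.
out=$(printf 'Breed, Votes\nBlack Lab, 30\nCorgi, 45\n' | LC_ALL=C awk -F', ' '
BEGIN { max = 0 }                     # max starts out numeric
{
    result = (max < $2)               # string comparison as soon as either side is a string
    printf "NR=%d max=%s $2=%s max<$2=%d\n", NR, max, $2, result
    if (result) max = $2              # on the header line this stores the string "Votes"
}')
echo "$out"
```

In the C locale this should show the comparison succeeding only on the header line, after which max stays stuck at Votes:

    NR=1 max=0 $2=Votes max<$2=1
    NR=2 max=Votes $2=30 max<$2=0
    NR=3 max=Votes $2=45 max<$2=0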

As an aside, your shebang is invalid; it likely should be

    #!/usr/bin/awk -f

or

    #!/bin/awk -f

depending on your Ubuntu version. Also, assignments like max_breed=$1 aren't really meaningful in a BEGIN block, since it is executed before any records have been processed.
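Putting these points together, one possible corrected version is sketched below. It is run from the shell rather than through a shebang so that it does not depend on where awk is installed. The $2 + 0 coercion is an extra safeguard added here, not something the original script had: it forces a numeric comparison even if a non-numeric field slips through. The file name dog_breed.txt is taken from the comment in the question.

```shell
# Recreate the sample input from the question.
cat > dog_breed.txt <<'EOF'
Breed, Votes
Black Lab, 30
Chihuahua, 2
Pug, 1
Corgi, 45
Shar Pei, 21
Shih Tzu, 5
Maltese, 7
EOF

result=$(awk '
BEGIN { FS = ", "; max = 0 }          # no max_breed=$1 here: no record has been read yet
NR > 1 && $2 + 0 > max {              # skip the header; compare numerically
    max = $2 + 0
    max_breed = $1
}
END { print max_breed " has " max " votes" }
' dog_breed.txt)

echo "$result"    # Corgi has 45 votes
```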