AWK script that sums a column with a header

Everyone eventually writes awk '{s+=$3} END{print s}'. It works on TSV, it works on CSV without quoted fields, and then one day a column has an empty cell or the file has a header row and the number is wrong.

This version handles a header, lets you pick the column by name, and skips blanks. It is one awk invocation, no head | tail dance.

# usage:  sumcol.awk col=latency_ms file.tsv
# or:     awk -f sumcol.awk -v col=bytes -F, file.csv

NR == 1 {
    for (i = 1; i <= NF; i++)
        if ($i == col) { idx = i }
    if (!idx) { print "no column: " col > "/dev/stderr"; exit 2 }
    next
}

$idx ~ /^[0-9eE.+\-]+$/ {
    sum += $idx
    n++
}

END {
    if (n == 0) { print "no numeric rows" > "/dev/stderr"; exit 1 }
    printf "sum=%.3f  n=%d  mean=%.3f\n", sum, n, sum/n
}

The regex guard is the thing I used to forget. Without it, a stray - or N/A silently contributes zero and you do not notice until your number is 5% off. Default field separator is whitespace, pass -F, for CSV without quoted commas. See also /snippets/find-modified-today-multi-repo/.