AWK script that sums a column with a header
Everyone eventually writes awk '{s+=$3} END{print s}'. It works on TSV, it works on CSV without quoted fields, and then one day a column has an empty cell or the file has a header row and the number is wrong.
This version handles a header, lets you pick the column by name, and skips blanks. It is one awk invocation, no head | tail dance.
# usage: sumcol.awk col=latency_ms file.tsv
# or: awk -f sumcol.awk -v col=bytes -F, file.csv
NR == 1 {
for (i = 1; i <= NF; i++)
if ($i == col) { idx = i }
if (!idx) { print "no column: " col > "/dev/stderr"; exit 2 }
next
}
$idx ~ /^[0-9eE.+\-]+$/ {
sum += $idx
n++
}
END {
if (n == 0) { print "no numeric rows" > "/dev/stderr"; exit 1 }
printf "sum=%.3f n=%d mean=%.3f\n", sum, n, sum/n
}
The regex guard is the thing I used to forget. Without it, a stray - or N/A silently contributes zero and you do not notice until your number is 5% off. Default field separator is whitespace, pass -F, for CSV without quoted commas. See also /snippets/find-modified-today-multi-repo/.