awk

"pattern-directed scanning and processing language" - man awk

Examples

Some of these require GNU awk.

Print the first column of a file

awk '{print $1}' filename.txt

Print column 2 if column 1 matches a string

ps aux | awk '$1 == "root" {print $2}'

Pass in a variable and value

ps | awk -v host="$HOSTNAME" '{print host,$0}'

Sort a file by line lengths

awk '{print length, $0}' testfile.txt | sort -n
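The command above leaves the length prefix on every line. If you only want the sorted lines back, `cut` can drop everything up to the first space. Here is a minimal sketch; `by_length` is a hypothetical helper name, not part of awk.

```shell
# by_length (hypothetical name): prefix each line with its length,
# sort numerically on that prefix, then cut the prefix off again.
by_length() {
  awk '{print length, $0}' "$@" | sort -n | cut -d' ' -f2-
}

printf 'ccc\na\nbb\n' | by_length
```

Because `cut -d' ' -f2-` keeps everything after the first space, lines that contain spaces of their own come back intact.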

TSV to CSV

awk '{gsub("\t","\",\"",$0); print;}' | sed 's#^#"#;s#$#"#;'
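The `gsub`/`sed` pipeline above does not escape double quotes that already appear inside a field. The sketch below does the whole conversion in awk and doubles embedded quotes, assuming RFC 4180-style CSV is the target; `tsv_to_csv` is a hypothetical helper name.

```shell
# tsv_to_csv (hypothetical name): split on tabs, double any embedded
# double quotes, wrap every field in quotes, and rejoin with commas.
tsv_to_csv() {
  awk 'BEGIN { FS = "\t"; OFS = "," }
       {
         for (i = 1; i <= NF; i++) {
           gsub(/"/, "\"\"", $i)     # " becomes "" inside a field
           $i = "\"" $i "\""         # quote the field
         }
         print                       # assigning to $i rebuilt $0 with OFS
       }' "$@"
}

printf 'name\tnote\nbob\tsays "hi"\n' | tsv_to_csv
```

One difference from the original: an empty input line stays empty here instead of becoming `""`.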

Print the first column of every other line

% is the modulus operator, which gives the remainder after integer division.

awk 'NR % 2 == 0 { print $1 }'
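To see why `NR % 2 == 0` selects every other line, print `NR` next to its remainder. The remainder is 1 on odd-numbered lines and 0 on even-numbered ones. The sample input is made up.

```shell
# NR is the current line number; NR % 2 alternates 1, 0, 1, 0, ...
seq 1 4 | awk '{print NR, NR % 2}'

# Applied to the command above: only the first column of lines 2 and 4 is printed.
printf 'a 1\nb 2\nc 3\nd 4\n' | awk 'NR % 2 == 0 { print $1 }'
```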

Print only even-numbered lines

ls | awk 'NR % 2 == 0 { print $0 }'

Print only odd-numbered lines

ls | awk 'NR % 2 != 0 { print $0 }'

Join each pair of lines, printing the even-numbered line before the odd-numbered line that precedes it

awk '{if (NR%2==0) { print $0 " " prev } else { prev=$0 }}'
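With an odd number of input lines, the one-liner above silently drops the last line, because it is stored in `prev` but never printed. A sketch that flushes it in an `END` block; `join_pairs` is a hypothetical helper name.

```shell
# join_pairs (hypothetical name)
join_pairs() {
  awk 'NR % 2 == 0 { print $0 " " prev; next }   # even line: print it, then the stored odd line
       { prev = $0 }                              # odd line: remember it
       END { if (NR % 2 == 1) print prev }' "$@"  # leftover odd line at end of input
}

seq 1 5 | join_pairs
```

For `seq 1 5` this prints `2 1`, `4 3`, and finally `5` on its own.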

Print the sum of the first column of every line in a file

awk '{sum += $1} END {print sum}' filename

Print count, sum, and average of the first column of stdin

for _ in {1..100} ; do echo $((RANDOM % 100)) ; done |
awk '{sum += $1} END {avg = sum/NR ; printf "Count:   %s\nSum:     %s\nAverage: %s\n", NR, sum, avg}'
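On empty input `NR` is 0, so `sum/NR` divides by zero, which gawk treats as a fatal error. A minimal sketch of the same calculation with a guard; `col1_stats` is a hypothetical helper name and the sample numbers are made up.

```shell
# col1_stats (hypothetical name): count, sum, and average of column 1,
# with an explicit check for empty input before dividing by NR.
col1_stats() {
  awk '{ sum += $1 }
       END {
         if (NR == 0) { print "no input"; exit }
         printf "Count:   %d\nSum:     %s\nAverage: %s\n", NR, sum, sum / NR
       }' "$@"
}

printf '10\n20\n30\n' | col1_stats
```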

Split file by recurring string

This will create a new file every time the string "SERVER" is found, essentially splitting the file by that string. Concatenating all of the output files would create the original file (potentially adding an extra newline).

awk '/SERVER/{n++} {print > ("out" sprintf("%02d", n) ".txt")}' example.txt
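A small end-to-end run of the split in a scratch directory, with made-up input. Lines before the first `SERVER` land in `out00.txt`, and each matching line starts the next file. This sketch also calls `close()` on the previous file, since some awk implementations limit how many output files can be open at once; the variable name `out` is just illustrative.

```shell
# Scratch directory so the generated out*.txt files are easy to inspect and delete.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up input: one header line, then two SERVER sections.
printf 'header\nSERVER a\nx\nSERVER b\ny\n' > example.txt

awk '/SERVER/ { if (out != "") close(out); n++ }        # finish the previous file
     { out = "out" sprintf("%02d", n) ".txt"; print > out }' example.txt

# Concatenating the pieces in name order gives back the original file.
cat out*.txt | cmp - example.txt && echo identical
```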

Show count of syslog messages per minute

awk -F: '{print $1 ":" $2}' /var/log/messages | uniq -c
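Why this works: with `-F:` a syslog line such as `Jan  1 10:00:01 host ...` splits at the colons, so `$1` is `Jan  1 10` and `$2` is `00`. Rejoining them gives a timestamp truncated to the minute, and `uniq -c` counts runs of identical minutes, which is enough because syslog is written in time order. A sketch with made-up log lines; `per_minute` is a hypothetical helper name.

```shell
# per_minute (hypothetical name): truncate each syslog timestamp to the
# minute, then count adjacent identical minutes.
per_minute() {
  awk -F: '{print $1 ":" $2}' "$@" | uniq -c
}

printf '%s\n' \
  'Jan  1 10:00:01 host sshd[1]: a' \
  'Jan  1 10:00:30 host sshd[1]: b' \
  'Jan  1 10:01:05 host cron[2]: c' |
  per_minute
```

This reports 2 messages for `Jan  1 10:00` and 1 for `Jan  1 10:01`. The padding `uniq -c` puts before the counts differs between implementations.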

Show count of root logins per minute

awk -F: '/root/{print $1 ":" $2}' /var/log/auth.log |uniq -c

Print lines from ls -la where the owner is a numeric UID

ls -la | awk '$3 ~ /^[0-9]+$/'
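An unanchored pattern like `/[0-9]/` matches any owner name that merely contains a digit, such as `user2`. Anchoring with `^` and `$` requires the whole field to be digits. A sketch on made-up `ls -la` lines; `numeric_owner` is a hypothetical helper name.

```shell
# numeric_owner (hypothetical name): keep lines whose 3rd field is all digits.
numeric_owner() {
  awk '$3 ~ /^[0-9]+$/' "$@"
}

printf '%s\n' \
  '-rw-r--r-- 1 alice staff 10 Jan 1 10:00 a.txt' \
  '-rw-r--r-- 1 1001  staff 10 Jan 1 10:00 b.txt' \
  '-rw-r--r-- 1 user2 staff 10 Jan 1 10:00 c.txt' |
  numeric_owner
```

Only the `b.txt` line is printed.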

Show only zfs snapshots whose size is zero

zfs list -t snapshot | awk '$2 == 0'

Print a line if the third field does not match a regex

echo {100..200} | fold -w 12 | awk '$3 !~ /[13579]$/ {print}'

Show 5xx errors in a standard Apache access log

awk '$9 ~ /^5[0-9][0-9]$/' access.log
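In the common and combined log formats, whitespace splitting makes the HTTP status code field 9 (the bracketed timestamp and the quoted request line each span several fields). A sketch with two made-up log lines; `status_5xx` is a hypothetical helper name.

```shell
# status_5xx (hypothetical name): print requests whose status is 500-599.
status_5xx() {
  awk '$9 ~ /^5[0-9][0-9]$/' "$@"
}

printf '%s\n' \
  '1.2.3.4 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512' \
  '1.2.3.4 - - [01/Jan/2024:10:00:01 +0000] "GET /x HTTP/1.1" 503 0' |
  status_5xx
```

Only the `503` request is printed.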

Show total RSS and VSZ for all cronolog processes

ps aux |
  grep -i 'cronolo[g]' |
  awk '{vsz += $5; rss += $6} END {print "vsz total = " vsz; print "rss total = " rss}'

Get IPv4 address on BSD/OSX

ifconfig | awk '$1 == "inet" && $2 != "127.0.0.1" {print $2}'

Get IPv6 address on BSD/OSX

ifconfig | awk '$1 == "inet6" && $2 !~ "::1|.*lo" {print $2}'

Print the last element

ls -la | awk -F" " '{print $NF}'

Print 2nd to last element

ls -la | awk -F" " '{print $(NF - 1)}'
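`NF` is the number of fields in the current record, so `$NF` is the last field and `$(NF - 1)` is the one before it. The parentheses matter: `$NF - 1` subtracts 1 from the value of the last field instead. The sample input is made up.

```shell
# Last and second-to-last fields.
echo 'alpha beta gamma delta' | awk '{print $NF, $(NF - 1)}'

# Without parentheses: the value of the last field (10), minus 1.
echo '1 2 10' | awk '{print $NF - 1}'
```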

Print the previous line on string match

This works by storing the previous line. If the current line matches the regex, the previous line is printed from the stored value.

$ awk '/32 host/ { print previous_line } {previous_line=$0}' /proc/net/fib_trie | column -t | sort -u
|--  10.134.243.137
|--  127.0.0.1
|--  169.50.9.172
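Rule order is what makes this work: the matching rule runs first and prints the stored line, and only then does the pattern-less rule overwrite it with the current line. A generalized sketch that takes the regex as an argument; `prev_on_match` is a hypothetical helper name and the input is made up.

```shell
# prev_on_match (hypothetical name): print the line before each line matching $1.
prev_on_match() {
  awk -v pat="$1" '
    $0 ~ pat { print previous_line }   # runs first, sees the old value
    { previous_line = $0 }             # runs on every line, stores it for next time
  '
}

printf '%s\n' 'addr 10.0.0.1' '/32 host LOCAL' 'addr 10.0.0.2' '/24 link' |
  prev_on_match '/32 host'
```

This prints `addr 10.0.0.1`.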

Add content to line 1 if there is no match

This adds a YAML document separator (---) to the beginning of every .yaml file in the current directory that does not already start with one.

tempfile=$(mktemp)
for file in ./*.yaml ; do
  awk 'NR == 1 && $0 != "---" {print "---"} {print}' "${file}" > "${tempfile}" \
  && mv "${tempfile}" "${file}"
done
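Because awk runs once per file, `NR == 1` is each file's first line (with several files in one awk invocation you would need `FNR` instead). The check also makes the loop safe to run repeatedly. A sketch in a scratch directory with made-up files; `add_yaml_separator` is a hypothetical wrapper around the loop above.

```shell
# Scratch directory with two made-up files, one already starting with ---.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a: 1\n' > one.yaml
printf '%s\n' '---' 'b: 2' > two.yaml

# add_yaml_separator (hypothetical name): the loop from above, unchanged.
add_yaml_separator() {
  tempfile=$(mktemp)
  for file in ./*.yaml ; do
    awk 'NR == 1 && $0 != "---" {print "---"} {print}' "${file}" > "${tempfile}" \
    && mv "${tempfile}" "${file}"
  done
}

add_yaml_separator
add_yaml_separator   # second run adds nothing: every file already starts with ---
```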

Show all docker images in a helm chart and their https links

helm template . --set global.baseDomain=foo.com -f /Users/danielh/a/google-environments/prod/cloud/app/config.yaml 2>/dev/null |
awk '/image: / {match($2, /(([^"]*):[^"]*)/, a) ; printf "https://%s %s\n", a[2], a[1] ;}' |
sort -u |
column -t

A simpler version that keeps the awk minimal and leans on other shell commands:

helm template . --set global.baseDomain=foo.com -f /Users/danielh/a/google-environments/prod/cloud/app/config.yaml 2>/dev/null |
grep 'image: ' |
awk '{print $2}' |
sed 's/"//g' |
sed 's/\(\(.*\):.*\)/https:\/\/\2 \1/' |
sort -u |
column -t

So it really depends on where you want to put the complexity, how performant it needs to be, and how readable you want it to be. The first version needs GNU awk for the three-argument match(). Both produce identical output, but some people find a chain of short commands with simpler syntax easier to read, which helps maintainability when performance is not a concern.

https://quay.io/astronomer/ap-alertmanager  quay.io/astronomer/ap-alertmanager:0.23.0
https://quay.io/astronomer/ap-astro-ui      quay.io/astronomer/ap-astro-ui:0.25.4
https://quay.io/astronomer/ap-base          quay.io/astronomer/ap-base:3.14.2
https://quay.io/astronomer/ap-cli-install   quay.io/astronomer/ap-cli-install:0.25.2
...snip...
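If GNU awk is not available, the same transformation can be sketched with POSIX `gsub()` and `sub()` instead of the three-argument `match()`. Like the greedy regex in the original, it treats everything before the last colon as the repository. `image_links` is a hypothetical helper name and the input line is a made-up sample in `helm template` style.

```shell
# image_links (hypothetical name): POSIX-awk version of the match() one-liner.
image_links() {
  awk '/image: / {
         image = $2
         gsub(/"/, "", image)        # drop the surrounding quotes
         repo = image
         sub(/:[^:]*$/, "", repo)    # strip the last ":tag"
         printf "https://%s %s\n", repo, image
       }' "$@"
}

printf '%s\n' '  image: "quay.io/astronomer/ap-base:3.14.2"' | image_links
```

This prints `https://quay.io/astronomer/ap-base quay.io/astronomer/ap-base:3.14.2`, matching the corresponding line of output above.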

Show a list of dns hostname queries with domain stripped, sorted by hostname length

This samples 100k DNS queries, strips the domain from each queried hostname, prints the length of the first component of the FQDN (the bare hostname) along with the bare hostname itself, and shows the 25 longest entries.

tcpdump -c 100000 -l -n -e dst port 53 |
awk '$14 == "A?" {gsub(/\..*/, "", $15) ; print(length($15), $15) ; fflush("/dev/stdout") ;}' |
sort -u |
sort -n |
tail -n 25

Run this on your kube-dns nodes to see how close you are getting to the 63-character limit. You won't see the failures here, though: a name with any component longer than 63 characters is never sent over the wire, so you'll need to check your logs for those. A good string to search for is "63 characters".
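The hostname-stripping step can be tried without tcpdump. The first helper repeats the `gsub` logic on plain FQDNs. The second checks every dot-separated component against the 63-character limit, which is useful when auditing names from logs or config. Both helper names are hypothetical and the input is made up.

```shell
# bare_host_lengths (hypothetical name): drop everything from the first dot,
# then print the length of what is left and the bare hostname.
bare_host_lengths() {
  awk '{ host = $1; sub(/\..*/, "", host); print length(host), host }' "$@"
}

# long_labels (hypothetical name): print any component longer than 63 characters.
long_labels() {
  awk '{
    n = split($1, label, "[.]")      # bracket expression: a literal dot in every awk
    for (i = 1; i <= n; i++)
      if (length(label[i]) > 63) print length(label[i]), label[i]
  }' "$@"
}

printf '%s\n' 'web01.example.com.' 'db.internal.example.com.' | bare_host_lengths
```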
