Pipestat

Some days it feels like reading and analysing logs is the only thing I do at work.
And when I analyse logs it's usually something like

$ zcat -f *log* | grep -v something | awk '/something else/ { print $3 }' | sed 's/something/else/g' | sort | uniq -c | sort -gr > output

And when this does not fully comply to what i need to do i fall back to perl and write a ugly oneliner of the type

$ tail -f foo | perl -lane 's/foo(.*)/bar$1bar/g'

So when my wife and daughter got a visit from Mr Noro I had some spare time and wrote tailstat. It's a small perlprogram that does what I manualy do every day

It has a few different modes

Plain

[ay@wildbill logs]# for i in foo foo bar bar foo foobar;do echo $i;done | pipestat
foo                       : (50.0%) (107546.3 hits/s) (3 hits / 6)
bar                       : (33.3%) (71697.5 hits/s) (2 hits / 6)
foobar                    : (16.7%) (35848.8 hits/s) (1 hits / 6)

Parsed 6 lines in 0.00 secs (46863.7 lines/s)

a single regexp

[ay@wildbill logs]# for i in foo foo bar bar foo foobar;do echo $i;done | pipestat foo
foo                       : (66.7%) (100462.4 hits/s) (4 hits / 6)
                            : (33.3%) (50231.2 hits/s) (2 hits / 6)

Parsed 6 lines in 0.00 secs (25317.7 lines/s)

More real examples,

[ay@wildbill logs]# cat www-access_log | pipestat 'css' 'jpe?g' 'gif' 'png'
jpe?g                     : (30.3%) (24590.5 hits/s) (122851 hits / 405830)
css                       : (11.7%) (9494.6 hits/s) (47434 hits / 405830)
gif                       : (6.6%) (5355.4 hits/s) (26755 hits / 405830)
png                       : (0.4%) (341.3 hits/s) (1705 hits / 405830)
<rest>                    : (51.0%) (41451.2 hits/s) (207085 hits / 405830)

Parsed 405830 lines in 5.00 secs (81230.8 lines/s)

Or if you just want field number 5…

[root@wildbill logs]# cat www-access_log | pipestat --field=9 --limit=5
200                       : (88.2%) (77506.3 hits/s) (350721 hits / 397549)
HTTP/1.1"                 : (3.3%) (2858.5 hits/s) (12935 hits / 397549)
404                       : (2.9%) (2530.8 hits/s) (11452 hits / 397549)
301                       : (2.4%) (2092.8 hits/s) (9470 hits / 397549)
403                       : (1.5%) (1321.3 hits/s) (5979 hits / 397549)
<limited>                 : (1.8%) (1545.2 hits/s) (6992(1311) hits(uniq / 397549)

Parsed 397549 lines in 4.53 secs (87795.5 lines/s)

This takes a apache accesslogg and print out the 9th whitespace seperated field and writes out the 5 most common hits, the rest of the hits are sumed up in But as you can see there is something in the logs that confuses the fieldseperator

So then we go into matchmodus.

[root@wildbill logs]# cat www-access_log | pipestat -m 'HTTP/1.1" (\d+)'
200                       : (92.3%) (218082.1 hits/s) (375934 hits / 407426)
404                       : (3.1%) (7229.9 hits/s) (12463 hits / 407426)
301                       : (2.5%) (5910.1 hits/s) (10188 hits / 407426)
403                       : (1.4%) (3339.1 hits/s) (5756 hits / 407426)
302                       : (0.4%) (873.1 hits/s) (1505 hits / 407426)
405                       : (0.0%) (37.1 hits/s) (64 hits / 407426)
206                       : (0.0%) (19.1 hits/s) (33 hits / 407426)
304                       : (0.0%) (4.6 hits/s) (8 hits / 407426)
<rest>                    : (0.4%) (855.7 hits/s) (1475 hits / 407426)

Parsed 407426 lines in 1.72 secs (236328.4 lines/s)

Or we could run it on the live log for say 60 seconds and what article is the most popular

[ay@joanie ~]$ varnishlog -i RxURL -I artid | pipestat --runtime=60 --limit 5 -m 'artid=(d+)'
542689                    : (7.9%) (1.5 hits/s) (91 hits / 1154)
542739                    : (7.5%) (1.4 hits/s) (86 hits / 1154)
542568                    : (4.4%) (0.8 hits/s) (51 hits / 1154)
542723                    : (4.1%) (0.8 hits/s) (47 hits / 1154)
542718                    : (3.7%) (0.7 hits/s) (43 hits / 1154)
<limited>                 : (72.4%) (13.5 hits/s) (835(244) hits(uniq) / 1154)
<rest>                    : (0.1%) (0.0 hits/s) (1 hits / 1154)

Parsed 1154 lines in 61.80 secs (18.7 lines/s)

Have fun. The code is here