Command line tools to analyze Apache log files [closed]

Solution 1:

While the tools above are all cool, I think I know what the questioner was asking. It often pains me that I can't pull information out of an access log the way I can with other files.

It's because of the dumb access log format:

127.0.0.1 - - [16/Aug/2014:20:47:29 +0100] "GET /manual/elisp/index.html HTTP/1.1" 200 37230 "http://testlocalhost/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"

Why did they use [] for the date and "" for other things? Did they think we wouldn't know a date was in field 4? It's incredibly frustrating.

The best tool right now for this is gawk:

gawk 'BEGIN { FPAT="([^ ]+)|(\"[^\"]+\")|(\\[[^\\]]+\\])" } { print $5 }'

On the data above, this gives you:

"GET /manual/elisp/index.html HTTP/1.1"

In other words, FPAT lets you pull out the fields of the Apache log as if they were actual fields instead of just space-separated tokens. This is always what I want. I can then parse them a bit more with a pipeline.
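
As a rough sketch of that further parsing, the request field can be stripped of its quotes and split into method and path. This assumes gawk 4.0 or later (the first release with FPAT), and request_parts is only an illustrative helper name, not part of any tool:

```shell
#!/usr/bin/env bash
# Sketch: pull the quoted request field with FPAT, then split it further.
# Assumes GNU awk (gawk) 4.0 or later, which introduced FPAT.

# Sample line copied from the answer above.
line='127.0.0.1 - - [16/Aug/2014:20:47:29 +0100] "GET /manual/elisp/index.html HTTP/1.1" 200 37230 "http://testlocalhost/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"'

# request_parts is a hypothetical helper: it reads log lines on stdin
# and prints "METHOD PATH STATUS" for each one.
request_parts() {
  gawk 'BEGIN { FPAT = "([^ ]+)|(\"[^\"]+\")|(\\[[^\\]]+\\])" }
  {
    req = $5
    gsub(/^"|"$/, "", req)      # strip the surrounding double quotes
    split(req, part, " ")       # part[1]=method, part[2]=path, part[3]=protocol
    print part[1], part[2], $6  # $6 is the status code
  }'
}

printf '%s\n' "$line" | request_parts
```

On the sample line this should print GET /manual/elisp/index.html 200.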

How FPAT works is documented here: http://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html

You can therefore set up an alias to make a gawk that can parse apache logs:

alias apacheawk="gawk -vFPAT='([^ ]+)|(\"[^\"]+\")|(\\\\[[^\\\\]]+\\\\])' "

apacheawk '$6 ~ /200/ { print $5 }' | sort | uniq

This gave me:

"GET / HTTP/1.1"
"GET /manual/elisp/index.html HTTP/1.1"
"GET /manual/elisp/Index.html HTTP/1.1"
"GET /scripts/app.js HTTP/1.1"
"GET /style.css HTTP/1.1"

And of course almost anything else is now possible.
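
For instance, here is a minimal sketch that counts requests per status code with the same FPAT. It assumes gawk 4.0 or later; status_counts is an illustrative name, and the three log lines are made up for the example rather than taken from a real log:

```shell
#!/usr/bin/env bash
# Sketch: count requests per status code using the same FPAT as the alias.
# Assumes GNU awk (gawk) 4.0 or later. The sample lines below are invented.

# status_counts is a hypothetical helper: it reads log lines on stdin
# and prints "STATUS COUNT" pairs, sorted by status.
status_counts() {
  gawk -v FPAT='([^ ]+)|("[^"]+")|(\\[[^\\]]+\\])' '
    { count[$6]++ }                              # $6 is the status code
    END { for (s in count) print s, count[s] }   # one line per distinct status
  ' | sort
}

status_counts <<'EOF'
127.0.0.1 - - [16/Aug/2014:20:47:29 +0100] "GET / HTTP/1.1" 200 512 "-" "curl/7.35.0"
127.0.0.1 - - [16/Aug/2014:20:47:30 +0100] "GET /style.css HTTP/1.1" 200 1024 "-" "curl/7.35.0"
127.0.0.1 - - [16/Aug/2014:20:47:31 +0100] "GET /missing.html HTTP/1.1" 404 209 "-" "curl/7.35.0"
EOF
```

With those three lines it should print 200 2 followed by 404 1. The final sort is there because awk's for-in loop does not guarantee any particular order.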

Enjoy!

Solution 2:

wtop is cool. There are other utilities as well. Often, I'll parse logs using bash, sed, and awk.
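
One common way to do the plain awk part, offered here as a sketch and not necessarily what this answer has in mind, is to split each line on the double quote character. It needs no gawk extensions, but it assumes the quoted fields contain no escaped quotes. The name status_and_request is purely illustrative:

```shell
#!/usr/bin/env bash
# Sketch: split on double quotes with any POSIX awk (no FPAT needed).
# With -F'"' the request line is $2, " status bytes " is $3,
# the referer is $4 and the user agent is $6.
# Assumption: no escaped quotes appear inside the quoted fields.

# Sample line in the combined log format.
line='127.0.0.1 - - [16/Aug/2014:20:47:29 +0100] "GET /manual/elisp/index.html HTTP/1.1" 200 37230 "http://testlocalhost/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"'

# status_and_request is a hypothetical helper: it reads log lines on stdin
# and prints the status code followed by the unquoted request line.
status_and_request() {
  awk -F'"' '{
    split($3, s, " ")   # s[1]=status, s[2]=bytes
    print s[1], $2
  }'
}

printf '%s\n' "$line" | status_and_request
```

On the sample line this should print 200 GET /manual/elisp/index.html HTTP/1.1.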