Command line tools to analyze Apache log files [closed]
Solution 1:
While the tools above are all cool, I think I know what the questioner was asking. It often pains me that I can't pull information out of an access log the way I can with other files.
It's because of the dumb access log format:
127.0.0.1 - - [16/Aug/2014:20:47:29 +0100] "GET /manual/elisp/index.html HTTP/1.1" 200 37230 "http://testlocalhost/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0"
Why did they use [] for the date and "" for other things? Did they think we wouldn't know a date was in field 4? It's incredibly frustrating.
The best tool right now for this is gawk:
gawk 'BEGIN { FPAT="([^ ]+)|(\"[^\"]+\")|(\\[[^\\]]+\\])" } { print $5 }'
On the data above this would give you:
"GET /manual/elisp/index.html HTTP/1.1"
In other words, FPAT lets you pull out the fields of the Apache log as if they were actual fields instead of just space-separated entities. This is always what I want. I can then parse that a bit more with a pipeline.
How FPAT works is documented here: http://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html
You can therefore set up an alias to make a gawk that can parse apache logs:
alias apacheawk="gawk -vFPAT='([^ ]+)|(\"[^\"]+\")|(\\\\[[^\\\\]]+\\\\])' "
apacheawk '$6 ~ /200/ { print $5 }' | sort | uniq
gave me this:
"GET / HTTP/1.1"
"GET /manual/elisp/index.html HTTP/1.1"
"GET /manual/elisp/Index.html HTTP/1.1"
"GET /scripts/app.js HTTP/1.1"
"GET /style.css HTTP/1.1"
and of course almost anything else is now possible.
Enjoy!
Solution 2:
wtop is cool. There are other utilities as well. Often, I'll parse logs using bash, sed, and awk.
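For a rough idea of what that looks like with plain awk, here is a sketch that lists the most requested paths. It assumes the combined log format, where default whitespace splitting puts the request path in field 7. The name top_paths is made up for illustration.

```shell
# top_paths (hypothetical): most requested paths, busiest first.
# With default whitespace splitting the quoted request "GET /path HTTP/1.1"
# spans fields 6 to 8, so the path itself is field 7.
top_paths() {
  awk '{ print $7 }' "$@" | sort | uniq -c | sort -rn
}
```

You would call it as top_paths followed by your log file name, or pipe a log into it.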