How to see if incoming HTTP requests are successful (view the responses) on Linux?
I'm really new to managing servers, and I've discovered that I can run ngrep and see connections from Google to our site (which is currently getting thousands upon thousands of 5xx errors and "crawl anomaly" errors in Search Console).
My question is: how can I see whether the connections are successful (whether Google is actually receiving a 200 response code or not)? Also, does Connection: close. mean that the connection was closed and they did not receive a response?
Example of output from ngrep.
$ sudo ngrep -il -d eth0 -W byline "Chrome/41.0.2272.96 Mobile Safari/537.36" port 80
GET / HTTP/1.1.
Host: subdomain.com.example.com.
User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) X-Middleton/1.
Accept: text/html,application/xhtml+xml,application/signed-exchange;v=b3,application/xml;q=0.9,*/*;q=0.8.
Amp-Cache-Transform: google;v="1..3".
Cookie: BLAH.
From: googlebot(at)googlebot.com.
If-Modified-Since: Mon, 04 May 2020 15:51:07 GMT.
X-Forwarded-For: IP, IP.
X-Middleton: 1.
X-Middleton-Ip: IP.
X-Real-Ip: IP.
X-Snipe: BLAH.
Accept-Encoding: gzip.
Connection: close.
Solution 1:
Apache Logging
As per @hermanb's response to the other question you asked on this topic, the easiest way to do this is not with tcpdump or ngrep but by simply grepping the Apache logs, which record the information you want in an easier-to-read form, including the response code.
The log file location and layout will vary depending on your setup, and both are very customizable. The default logs will likely have the information you want, but if not, you can create your own logs.
On my server, the config file contains these lines:
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
CustomLog ${APACHE_LOG_DIR}/other_vhosts_access.log vhost_combined
The LogFormat line defines a format and gives it the nickname vhost_combined; the CustomLog line tells Apache to write each request to ${APACHE_LOG_DIR}/other_vhosts_access.log (on Debian and Ubuntu, /var/log/apache2/other_vhosts_access.log) using that format. The %>s field in the format is the final status code sent back to the client, which is exactly what you want to check.
You can, of course, have more than one log file and more than one log format. This is all described at https://httpd.apache.org/docs/2.4/mod/mod_log_config.html#customlog
Once you have a format, you can grep the log file in real time for what you are looking for. For example,
tail -f /var/log/apache2/other_vhosts_access.log | grep --line-buffered " 200 "
looks at each line of the log file as it is written and shows those containing " 200 ". Using grep -v inverts the search, so grep -v --line-buffered " 200 " shows the lines that do not contain " 200 ", which roughly means the requests that did not get a 200 response. There are many more complex ways to get information out, by using more complex grep expressions and/or more or different logging.
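As a rough sketch of the "more complex" route, the shell function below summarizes the log instead of scrolling through it: it counts how often each status code was returned to requests that mention Googlebot. It assumes the vhost_combined format shown above and the Debian-style log path from the tail example; the name count_googlebot_statuses is made up for illustration, and matching on the User-Agent text is only a heuristic, since anyone can send a Googlebot User-Agent.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count how many times each final status code (%>s)
# was returned to requests whose log line mentions "Googlebot"
# (case-insensitive match, so it is a heuristic, not a verified crawler check).
# Assumes the vhost_combined LogFormat, where %>s is the 10th
# whitespace-separated field; that only holds when the quoted request
# line ("%r") is the usual three words, e.g. "GET / HTTP/1.1".
count_googlebot_statuses() {
  local log="${1:-/var/log/apache2/other_vhosts_access.log}"
  grep -i 'googlebot' "$log" \
    | awk '{ count[$10]++ } END { for (s in count) print s, count[s] }' \
    | sort -n
}
```

Running count_googlebot_statuses with no argument reads the default file and prints one line per status code followed by how many times it was returned; pass a different path as the first argument if your log lives elsewhere.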
You should also read the page linked above, as it has lots of information on formatting your own logs. For example, you could try
LogFormat "%>s %v%U" statuscodelog
CustomLog /path/to/mylogfile.log statuscodelog
which should produce simple output in the form
200 example.com/requested/path1
500 example.com/requested/file/causes/error
(%>s is the final status code, %v the server name and %U the requested URL path without the query string.)
You can then load this into a spreadsheet for easy analysis, or grep it more simply and specifically with something like
tail -f /path/to/mylogfile.log | grep --line-buffered "^5"
to get a list of URLs returning a 5xx status code. Of course, this can be tailored to catch whatever you want, in whatever format you decide to create; this is just a fairly minimal example.
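To go one step beyond a live grep, a sketch like the following ranks the URLs that fail most often, which can show whether the errors are concentrated on a few pages or spread across the whole site. It assumes log lines in the "<status> <host><path>" form shown above; top_5xx_urls and its optional second argument (how many URLs to show, default 20) are hypothetical names chosen for this example, and /path/to/mylogfile.log is the same placeholder as above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the URLs that most often return a 5xx status,
# most frequent first, as "<count> <host><path>".
# Assumes each log line looks like "500 example.com/requested/file/causes/error".
top_5xx_urls() {
  local log="${1:-/path/to/mylogfile.log}"   # placeholder path, replace it
  local limit="${2:-20}"                     # how many URLs to print
  grep '^5' "$log" \
    | awk '{ print $2 }' \
    | sort | uniq -c | sort -rn \
    | head -n "$limit"
}
```

For example, top_5xx_urls /path/to/mylogfile.log 10 prints the ten URLs with the most 5xx responses, each preceded by its count.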
Connection Close
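This part is a duplicate of What does "Connection: close" mean when used in the response message? The accepted answer there shows that "Connection: close" is a header the web server or client sends to signal that the connection should not be kept open (persisted) after the current request and response. It does not mean the request failed or that no response was sent: in your capture it is Googlebot asking the server to close the TCP connection once the response has been delivered. The trailing "." is not part of the header either; ngrep prints non-printable characters, such as the carriage return at the end of each HTTP line, as ".". The linked answer goes into more depth, but this is simply following the HTTP/1.1 specification for applications that do not support persistent connections.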
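If you would rather keep using ngrep, note that the command in the question only shows Googlebot's requests; the responses travel in the other direction, with source port 80 on your server. A minimal sketch, assuming plain HTTP on port 80 and interface eth0 as in the question (ngrep cannot read HTTPS traffic), is to capture packets whose payload starts with an HTTP/1.x status line and tally the codes. Response packets do not carry the User-Agent, so this counts responses to every client, not just Google. The helper tally_status_lines is hypothetical; the ngrep options used (-q, -n, -d, -W byline) are standard ones.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tally HTTP status codes found in raw HTTP/1.x
# response text read on stdin, e.g. ngrep output of the server's replies.
# Only status lines ("HTTP/1.1 503 ...") at the start of a line, optionally
# indented, are counted; request lines such as "GET / HTTP/1.1." are ignored.
tally_status_lines() {
  grep -oE '^[[:space:]]*HTTP/1\.[01] [0-9]{3}' \
    | awk '{ count[$2]++ } END { for (s in count) print s, count[s] }' \
    | sort -n
}

# Example (needs root): stop after 200 matching response packets,
# then print one line per status code with its count.
#   sudo ngrep -q -n 200 -d eth0 -W byline '^HTTP/1\.[01] [0-9]' src port 80 \
#     | tally_status_lines
```

Using -n rather than stopping with Ctrl-C lets ngrep exit on its own, so the tally is printed once the capture ends.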