Why does mod_security require an Accept HTTP header field?
After some debugging, I found that the mod_security Core Rule Set (CRS) blocks requests that don't have the (optional!) Accept header field.
This is what I find in the logs:
ModSecurity: Warning. Match of "rx ^OPTIONS$" against "REQUEST_METHOD" required. [file "/etc/apache2/conf.d/modsecurity/modsecurity_crs_21_protocol_anomalies.conf"] [line "41"] [id "960015"] [msg "Request Missing an Accept Header"] [severity "CRITICAL"] [tag "PROTOCOL_VIOLATION/MISSING_HEADER"] [hostname "example.com"] [uri "/"] [unique_id "T4F5@H8AAQEAAFU6aPEAAAAL"]
ModSecurity: Access denied with code 400 (phase 2). Match of "rx ^OPTIONS$" against "REQUEST_METHOD" required. [file "/etc/apache2/conf.d/modsecurity/optional_rules/modsecurity_crs_21_protocol_anomalies.conf"] [line "41"] [id "960015"] [msg "Request Missing an Accept Header"] [severity "CRITICAL"] [tag "PROTOCOL_VIOLATION/MISSING_HEADER"] [hostname "example.com"] [uri "/"] [unique_id "T4F5@H8AAQEAAFU6aPEAAAAL"]
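The rule that fires appears to be a chained check along these lines (paraphrased from modsecurity_crs_21_protocol_anomalies.conf; the exact actions differ between CRS versions):

    # If the method is anything but OPTIONS, AND no Accept header is present, block.
    SecRule REQUEST_METHOD "!^OPTIONS$" \
        "chain,phase:2,t:none,block,id:'960015',severity:'2',\
        msg:'Request Missing an Accept Header',tag:'PROTOCOL_VIOLATION/MISSING_HEADER'"
        SecRule &REQUEST_HEADERS:Accept "@eq 0" "t:none"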
Why is this header required? I understand that "most" clients send it, but why is its absence considered a security threat?
I didn't write these rules, but as I understand it there is a strong correlation between clients that omit this header and malicious clients, and likewise between clients that include it and benign clients.
You may find certain bots that don't send this header (in a quick grep through my logs: Pingdom, HostTracker, UpDowner, magpie-crawler, Yandex, Yodao, MJ12, GigaBot and the LinkedInBot). However, if you combine this rule with one that matches "normal" User-Agents such as Chrome, Firefox, IE, Safari and Opera, you can avoid blocking those bots, as sketched below.
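A sketch of that exemption (hypothetical: the rule id 1000001 and the bot list are placeholders to adapt, not part of the CRS) is a phase-1 rule that disables 960015 when a known bot User-Agent is seen:

    # Skip the Accept-header check (960015) for known bots that legitimately omit it.
    # The id and the phrase list are examples only; adjust to taste.
    SecRule REQUEST_HEADERS:User-Agent "@pm Pingdom HostTracker UpDowner magpie-crawler" \
        "phase:1,t:none,nolog,pass,id:'1000001',ctl:ruleRemoveById=960015"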
There are some clients (or possibly a proxy that modifies the headers) that send an accept: header (and most other headers in lower case). I haven't yet been able to determine whether these are malicious, but they all claim to be "Firefox/3.6.8" and have:
Via:HTTP/1.1 silk
X-Forwarded-For:10.161.106.98
or some other 10.x.x.x IP address (private RFC 1918 space) in their headers... which is suspicious.
RFC 2616 states that the Accept header SHOULD be present in all requests. Note that this isn't an absolute requirement, so a user-agent is still conditionally compliant (as defined in the RFC) if it doesn't send this header.
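For illustration, the following is a syntactically valid HTTP/1.1 request even though it carries no Accept header (Host is the only request header that HTTP/1.1 strictly requires):

    GET / HTTP/1.1
    Host: example.com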
The rationale for denying requests without an Accept header is that all regular web browsers send the header, while many bots do not. In practice, though, after seeing millions of requests I can say that some "good" bots don't send the Accept header either. So this rule is not perfect and does generate false positives.
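If the false positives outweigh the benefit for your site, you can soften or drop the rule from your own configuration instead of editing the CRS files. A sketch using ModSecurity 2.x directives (check your version; SecRuleUpdateActionById needs 2.6 or later):

    # Option 1: disable the check entirely.
    SecRuleRemoveById 960015

    # Option 2: keep the check but log instead of blocking.
    SecRuleUpdateActionById 960015 "pass,auditlog"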