Detecting Slashdot effect in nginx
Is there a way to make nginx notify me if hits from a referrer go beyond a threshold?
E.g., if my website is featured on Slashdot and all of a sudden I have 2K hits coming in within an hour, I want to be notified when traffic goes beyond 1K hits an hour.
Is it possible to do this in nginx, preferably without Lua? (My production build is not compiled with Lua.)
The most efficient solution might be to write a daemon that would tail -f the access.log, and keep track of the $http_referer field.
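As a rough illustration of the daemon idea, here is a minimal sketch that counts hits per Referer from a stream of log lines and prints an alert the moment one of them reaches a threshold. It assumes nginx's default "combined" log format, where the Referer is the fourth double-quote-delimited field. The function name watch_referers and the default of 1000 are made up for this example.

```shell
#!/bin/sh
# Sketch: count hits per Referer from combined-format log lines on stdin
# and print one alert line when a referrer reaches the threshold.
# Assumption: default "combined" log_format, so the Referer is the 4th
# field when the line is split on double quotes.

watch_referers() {
    # $1: hits from a single referrer before alerting (hypothetical default: 1000)
    awk -F'"' -v threshold="${1:-1000}" '
        $4 != "" && $4 != "-" {
            count[$4]++
            if (count[$4] == threshold) {
                print "ALERT: " threshold " hits from " $4
                fflush()
            }
        }'
}

# Usage (runs until interrupted):
#   tail -F /var/log/nginx/access.log | watch_referers 1000
```

Note that the counters in this sketch never reset, so it measures hits since startup, not hits per hour. A real daemon would need to expire or reset the counts on a schedule.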
However, a quick and dirty solution would be to add an extra access_log file, to log only the $http_referer variable with a custom log_format, and to automatically rotate the log every X minutes.
This can be accomplished with the help of standard logrotate scripts, which might need to do graceful restarts of nginx in order to have the files reopened (e.g., the standard procedure, take a look at /a/15183322 on SO for a simple time-based script)…
Or, by using variables within access_log, possibly by getting the minute specification out of $time_iso8601 with the help of the map or an if directive (depending on where you'd like to put your access_log).
So, with the above, you may have 6 log files, each covering a period of 10 minutes, http_referer.Txx{0,1,2,3,4,5}x.log, e.g., by getting the first digit of the minute to differentiate each file.
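A sketch of what the map variant could look like. To keep the examples in one language, the nginx fragment is wrapped in a small shell function that prints it. The names $minute_tens and referer_only, the function name, and the /var/log/nginx path are assumptions for illustration, not anything nginx mandates.

```shell
#!/bin/sh
# Sketch: print an nginx config fragment that logs only the Referer into
# one of six files, chosen by the tens digit of the current minute.
# $minute_tens, referer_only and the log path are hypothetical names.

print_referer_log_conf() {
    cat <<'EOF'
# Place inside the http {} block.
log_format referer_only '$http_referer';

# $time_iso8601 looks like 2024-01-01T12:34:56+00:00; capture the "3" of "34".
map $time_iso8601 $minute_tens {
    default                  0;
    "~T\d\d:(?<tens>\d)\d:"  $tens;
}

# Yields http_referer.Txx0x.log through http_referer.Txx5x.log
access_log /var/log/nginx/http_referer.Txx${minute_tens}x.log referer_only;
EOF
}

# Usage (assumes conf.d is included from within http {}):
#   print_referer_log_conf > /etc/nginx/conf.d/referer_log.conf
```

Two caveats from the nginx documentation apply when the access_log path contains variables: the worker processes' user needs permission to create files in that directory, and the file is opened and closed on every write unless open_log_file_cache is configured.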
Now, all you have to do is have a simple shell script that runs every 10 minutes, cats all of the above files together, pipes the result to sort, then to uniq -c, to sort -rn, and to head -16, and you have a list of the 16 most common Referer variations. You are then free to decide whether any combination of counts and fields exceeds your criteria, and to send a notification.
Subsequently, after a single successful notification, you could remove all of these 6 files, and, in subsequent runs, not issue any notification UNLESS all six of the files are present (and/or a certain other number as you see fit).
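The pipeline described above could be sketched roughly as follows. The function name top_referers, the directory argument and the default threshold of 1000 are assumptions. The final awk stage is one possible way to apply "your criteria" by keeping only the rows whose count meets the threshold.

```shell
#!/bin/sh
# Sketch of the periodic check: merge the per-interval Referer logs, count
# each distinct value, keep the 16 most common, and print those at or
# above a threshold. Names and defaults here are hypothetical.

top_referers() {
    # $1: directory holding the http_referer.Txx?x.log files
    # $2: minimum hit count worth reporting (default 1000)
    cat "$1"/http_referer.Txx?x.log 2>/dev/null |
        grep -v -e '^-$' -e '^$' |
        sort | uniq -c | sort -rn | head -16 |
        awk -v min="${2:-1000}" '$1 >= min'
}

# Usage from cron, every 10 minutes (assumes mail(1) is set up):
#   report=$(top_referers /var/log/nginx 1000)
#   [ -n "$report" ] && printf '%s\n' "$report" | mail -s 'Referer spike' you@example.com
```

The grep stage drops lines consisting of a lone "-", which is what nginx writes when a request carries no Referer header. Otherwise direct visits would usually top the list.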
I think this would be far better done with logtail and grep. Even if it's possible to do with lua inline, you don't want that overhead for every request and you especially don't want it when you have been Slashdotted.
Here's a 5-second version. Stick it in a script and put some more readable text around it and you're golden.
5 * * * * logtail -f /var/log/nginx/access_log -o /tmp/nginx-logtail.offset | grep -c "http://[^ ]*slashdot\.org"
Of course, that completely ignores reddit.com and facebook.com and all of the million other sites that could send you lots of traffic. Not to mention 100 different sites sending you 20 visitors each. You should probably just have a plain old traffic threshold that causes an email to be sent to you, regardless of referrer.
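A referrer-agnostic threshold along those lines might look like the sketch below. It only counts the lines it is fed and prints a message when there are too many, relying on cron's default behaviour of mailing any job output to the crontab owner (or MAILTO). The function name check_traffic and the offset file path in the usage line are assumptions. The limit of 1000 comes from the question.

```shell
#!/bin/sh
# Sketch: alert on total request volume, whatever the referrer.
# Reads new log lines on stdin and prints a message if there are more
# of them than the limit. Silent otherwise, so cron sends no mail.

check_traffic() {
    # $1: maximum hits tolerated per run (default 1000)
    limit="${1:-1000}"
    hits=$(wc -l | tr -d ' ')
    if [ "$hits" -gt "$limit" ]; then
        echo "Traffic spike: $hits hits since last check (limit $limit)"
    fi
}

# Usage from an hourly cron job, with a separate (hypothetical) offset file:
#   logtail -f /var/log/nginx/access_log -o /tmp/nginx-traffic.offset | check_traffic 1000
```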