Recommended LogParser queries for IIS monitoring?

A good indicator of hacking activities or other attacks is the number of errors per hour. The following script returns the dates and hours in which a single error status code was returned more than 25 times. Adjust the value depending on the amount of traffic on the site (and the quality of your web application ;-) ).

SELECT date as Date, QUANTIZE(time, 3600) AS Hour, 
       sc-status as Status, count(*) AS ErrorCount
FROM   {filename} 
WHERE  sc-status >= 400 
GROUP BY date, hour, sc-status 
HAVING ErrorCount > 25
ORDER BY ErrorCount DESC

The result could look something like this:

Date       Hour     Status ErrorCount
---------- -------- ------ ------
2009-07-24 18:00:00 404    187
2009-07-17 13:00:00 500    99
2009-07-21 21:00:00 404    80
2009-07-03 04:00:00 404    45
...

The next query detects an unusually high number of hits on a single URL from one IP address. In this example I chose a threshold of 500 hits per day, but you may have to change the query for edge cases (excluding the IP address of Google London, for example ;-) ).

SELECT date AS Date, cs-uri-stem AS URL,
      c-ip AS IPAddress, Count(*) AS Hits
FROM  {filename}
GROUP BY date, c-ip, cs-uri-stem
HAVING Hits > 500
ORDER BY Hits Desc

The result could look something like this:

Date       URL                                 IPAddress       Hits
---------- ----------------------------------- --------------- ----
2009-07-24 /Login.aspx                         111.222.111.222 1889
2009-07-12 /AccountUpdate.aspx                 11.22.33.44     973
2009-07-19 /Login.aspx                         123.231.132.123 821
2009-07-21 /Admin.aspx                         44.55.66.77     571
...
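
Excluding a known crawler could be sketched as below. This is only an illustration: 203.0.113.10 is a placeholder from the documentation address range, not the address of any real crawler, and the threshold of 500 is carried over from the query above.

```sql
-- Same query as above, minus one known-good client.
-- 203.0.113.10 is a hypothetical placeholder; substitute the address you want to ignore.
SELECT date AS Date, cs-uri-stem AS URL,
       c-ip AS IPAddress, COUNT(*) AS Hits
FROM   {filename}
WHERE  c-ip <> '203.0.113.10'
GROUP BY date, c-ip, cs-uri-stem
HAVING Hits > 500
ORDER BY Hits DESC
```

For more than a handful of addresses, it is probably easier to export the result and filter it afterwards than to keep extending the WHERE clause.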

One thing you could consider to filter out legitimate traffic (and broaden your scope) is to enable cs(Cookie) in your IIS logs, add a bit of code that sets a small cookie using JavaScript, and add WHERE cs(Cookie)='' to the query (or WHERE cs(Cookie) IS NULL, if LogParser reads the empty "-" field as NULL in your setup).

Because of your small bit of code, every user should have a cookie unless they manually disabled cookies (which a small percentage of people might do) or unless that user is actually a bot that doesn't support JavaScript (for example, wget, httpclient, etc. don't support JavaScript).

I suspect that if a user has a high volume of activity, but they accept cookies and have JavaScript enabled, they are more likely to be a legitimate user, whereas if you find a user with a high volume of activity but no cookie/JavaScript support, they are more likely to be a bot.
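
Applied to the hits-per-URL query from the earlier answer, the cookie filter might look something like the sketch below. It assumes cs(Cookie) logging is enabled and that the JavaScript cookie is already being set. It checks for both NULL and an empty string, since I'm not certain which one a missing cookie shows up as in every setup.

```sql
-- Only count requests that arrived without any cookie (likely bots).
-- Assumption: a missing cookie is read as NULL or as an empty string.
SELECT date AS Date, cs-uri-stem AS URL,
       c-ip AS IPAddress, COUNT(*) AS Hits
FROM   {filename}
WHERE  cs(Cookie) IS NULL OR cs(Cookie) = ''
GROUP BY date, c-ip, cs-uri-stem
HAVING Hits > 500
ORDER BY Hits DESC
```

Since legitimate users are now filtered out, you can probably lower the 500 threshold considerably without drowning in false positives.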


Sorry, can't comment yet so I'm forced to answer.

There's a minor bug with the 'Top bandwidth usage by URL' query. While most of the time you'd be okay taking the number of requests for a page and multiplying it by the file size, in this case, since the query doesn't pay attention to any query parameters, the same URL can return responses of very different sizes, and you're going to run into some slightly-to-very inaccurate numbers.

For a more accurate value, just use SUM(sc-bytes) AS ServedBytes instead of MUL(Hits, AvgBytes) AS ServedBytes.
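
A corrected query might look roughly like this. It is a sketch rather than the original query with one line swapped: the TOP 20 limit and the column aliases are my own choices, and it assumes sc-bytes is enabled in your IIS logging.

```sql
-- Bandwidth per URL, summing the bytes actually sent for each request.
-- TOP 20 and the aliases are illustrative, not taken from the original query.
SELECT TOP 20 cs-uri-stem AS Url,
       COUNT(*) AS Hits,
       SUM(sc-bytes) AS ServedBytes
FROM   {filename}
GROUP BY cs-uri-stem
ORDER BY ServedBytes DESC
```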