Response time slows as the day goes on: where do I start troubleshooting?

My IT people just shrug their shoulders when I bring this up, so I am turning to SE for some help.

I think this image shows it best.

[Image: graph of HTTP performance]

Quite simply, as the day goes on, response time gets worse and worse until, sometime around midnight, something happens and it drops back to almost normal. We are on IIS. This particular page is still Classic ASP, but the slowdown occurs on all pages, even the plain HTML pages, which I think rules out a SQL connectivity issue.

I guess my question is, where do I start looking? I went through the regular logs and saw nothing that jumped out at me. But something is obviously going on, and I don't know where to start.


Solution 1:

There are a number of things that could be causing this - unfortunately, we probably need a bit more information.

Before I get into my actual response, just a quick point on your HTML pages: generally speaking, the application pool can only respond to a certain number of requests at a time. If it is busy responding to requests for dynamic pages, then it may not have any threads left to serve the static pages. For this reason, a code problem on a dynamic page can create the illusion that the static pages are being served "slowly". My point is, don't rule out code or SQL.

As an example: if you have 100 pages all hitting a database or API at the same time, and all 100 are waiting for a response, request 101 may be blocked until one of the first 100 completes.
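The blocking effect described above can be reproduced with a toy model. The Python sketch below is not how IIS or Classic ASP schedules requests internally; it simply assumes a fixed-size pool of worker threads (the hypothetical `POOL_SIZE`) and a hypothetical 0.2 s backend wait, and measures how long a trivial "static" request takes when the pool is idle versus when every worker is busy.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical figures for illustration only.
POOL_SIZE = 4          # worker threads available to serve requests
BACKEND_DELAY = 0.2    # seconds a "dynamic" request spends waiting on SQL or an API


def slow_dynamic_page() -> str:
    time.sleep(BACKEND_DELAY)  # blocked on a slow database or external API
    return "dynamic"


def static_page() -> str:
    return "static"  # a plain HTML page: effectively no work


def time_static_request(busy_requests: int) -> float:
    """Submit `busy_requests` slow requests, then one static request, and
    return the seconds the static request took from submission to completion."""
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        for _ in range(busy_requests):
            pool.submit(slow_dynamic_page)
        start = time.perf_counter()
        pool.submit(static_page).result()  # waits for a free worker, then runs
        return time.perf_counter() - start


if __name__ == "__main__":
    print(f"pool idle:      {time_static_request(0):.3f} s")
    print(f"pool saturated: {time_static_request(POOL_SIZE):.3f} s")
```

With the pool saturated, the static request has to wait for one of the slow requests to finish even though it does no work itself, which is how slow dynamic pages can make plain HTML pages look slow too.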

Now, there are plenty of things you can do to help you diagnose this problem:

  • What is your load profile like normally? This makes a big difference: it may be that you always have an issue, but you can't see the impact until your site actually receives load. You could test this in a staging environment with a load-testing tool such as Apache JMeter.

  • Enable IIS logging (if you haven't already), making sure the time-taken field is included, and then look at the logs to see which requests are taking the longest. You can use something like Log Parser (from Microsoft) to run SQL-like queries against your logs (or even load your logs into a SQL database), if that makes life easier. Once you know which pages are taking the longest, you can focus your attention on them.

  • Does your application have logs? If not, you should consider adding some logging. If you already have logs, what do they say? Are there exceptions being thrown by your application? Is there something that is consistently failing?

  • How much memory is your application pool using? A memory leak is an obvious candidate, but it should be quite easy to spot. Use Windows' built-in Performance Monitor to track the memory used by your application pool's worker process (w3wp.exe) over the day, and see whether it increases as the day goes on.

  • As I mentioned in the opening, SQL may still be an issue. I would recommend having a look at the database server to see if there are any long-running or blocked queries (e.g. in sys.dm_exec_requests, look at the wait_type, wait_time, blocking_session_id and total_elapsed_time columns).

  • Check how many connections your application pool has open, using something like TCPView (another Microsoft tool). Your application pool will try to reuse connections where possible, but you'll probably still see a lot of open connections to it. One useful thing you can see from this is how many connections you have open to your SQL database or to any external APIs your application is using.

  • Use an Application Performance Monitoring (APM) tool. AppDynamics, or a similar tool, can help pinpoint the slow parts of your code. Unfortunately, there's a bit of a learning curve to using these tools effectively, but they can be very powerful for diagnosing problems with your applications.
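Once IIS logging is enabled with the time-taken field included, a short script can show whether average response time really climbs hour by hour, and which URLs are slowest. The sketch below assumes IIS's W3C extended log format (a `#Fields:` directive naming space-separated columns, with `time-taken` in milliseconds); the function name `summarize_w3c_log` is hypothetical, and Log Parser can do the same job if you prefer it. Note that IIS writes W3C log times in UTC by default, so the hour buckets may not line up with your local midnight.

```python
import sys
from collections import defaultdict
from statistics import mean
from typing import Iterable


def summarize_w3c_log(lines: Iterable[str]):
    """Return (average time-taken per hour, [(average ms, uri), ...] slowest first)."""
    fields: list[str] = []
    by_hour: dict[str, list[int]] = defaultdict(list)
    by_uri: dict[str, list[int]] = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            # The #Fields directive names the columns of every following entry.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # blank line, other directive, or no header seen yet
        row = dict(zip(fields, line.split()))
        if "time-taken" not in row:
            continue  # time-taken is not being logged for this entry
        taken_ms = int(row["time-taken"])
        by_hour[row.get("time", "??")[:2]].append(taken_ms)  # "HH:MM:SS" -> "HH"
        by_uri[row.get("cs-uri-stem", "?")].append(taken_ms)
    hourly = {hour: mean(ms) for hour, ms in sorted(by_hour.items())}
    slowest = sorted(((mean(ms), uri) for uri, ms in by_uri.items()), reverse=True)
    return hourly, slowest


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python summarize.py u_exYYMMDD.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        hourly, slowest = summarize_w3c_log(log)
    for hour, avg in hourly.items():
        print(f"{hour}:00 UTC  avg {avg:7.0f} ms")
    for avg, uri in slowest[:10]:
        print(f"{avg:7.0f} ms  {uri}")
```

If the hourly averages rise steadily for every URL, including static ones, that points towards something server-wide (memory, thread or connection exhaustion) rather than one bad page.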

Update

Restarting your application pool may help solve the problem if you have a memory leak, but you need to be careful with this: there may be some adverse impacts. After you restart your application pool, your application will begin loading static objects into memory, etc. Depending on how complex your application is, this may take a long time (could be 5-10 minutes or more). During this time, requests to your server may be delayed, making it appear that the problem is exacerbated.

If you are running a single server, your site may become temporarily unavailable while the application restarts (due to the app pool being busy, and being unable to respond to requests). If you are running in a farm, with a load balancer, the load balancer may drop your server out while the app pool restarts, which could direct all traffic to other servers and overload them. Don't restart the app pool on all of your servers at the same time, and try to "warm up" app pools (by simulating requests against the server) before reintroducing servers to the farm.
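A warm-up step can be scripted. The sketch below is one possible approach, not a feature of IIS or of any particular load balancer: it repeatedly requests a list of hypothetical URLs (`WARMUP_URLS`) and reports success only once a full pass completes with every page returning HTTP 200 within a chosen threshold (the `max_seconds` and `rounds` defaults are arbitrary assumptions).

```python
import time
import urllib.request
from typing import Callable, Sequence

# Hypothetical pages to request; point these at the restarted server itself.
WARMUP_URLS = [
    "http://localhost/default.asp",
    "http://localhost/index.html",
]


def fetch(url: str, timeout: float = 60.0) -> int:
    """Request a URL and return its HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def warm_up(urls: Sequence[str], fetch_fn: Callable[[str], int] = fetch,
            max_seconds: float = 0.5, rounds: int = 5) -> bool:
    """Request every URL once per round; return True as soon as a whole
    round succeeds with each response being HTTP 200 within max_seconds."""
    for _ in range(rounds):
        round_ok = True
        for url in urls:
            start = time.perf_counter()
            try:
                ok = fetch_fn(url) == 200
            except OSError:  # refused connection, timeout, HTTP 4xx/5xx error
                ok = False
            if not ok or time.perf_counter() - start > max_seconds:
                round_ok = False  # keep going so every page still gets warmed
        if round_ok:
            return True
    return False
```

You would run `warm_up(WARMUP_URLS)` against the restarted server directly (not through the load balancer) and only return it to the farm once it returns `True`. The fetch function is a parameter so the logic can be exercised without a live server.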

In other words: unless it's definitely an issue with memory leaking, it may not be worth restarting the app pool, because the issue may re-emerge straight away.

Note: Restarting the application pool will not affect any currently running requests. These will continue to completion, unless you forcibly shut down the app pool (e.g. by ending the worker process from Task Manager via Ctrl + Alt + Del).