Unexplained periodic humps in system CPU usage on W2K3E

We have a Windows 2003 R2 Enterprise 64-bit server running a custom workload that suffers from an odd performance problem. The pared-down version below shows a smaller hump, but the behaviour is qualitatively the same.

We've reduced it to a trivial app that does nothing more than:

  • listening on a socket
  • joining a multicast group
  • listening for packets coming in on that group
  • reading and discarding the packets

The test application itself is a slightly modified version of the Boost ASIO multicast receiver example, so there's not much that ought to be going wrong. Actual code(!) below…

Every so often while running this program under load, the CPU for this process will ramp up with all of the processing happening in kernel code:

[perfmon graph: only CPU 6 is shown here; for the duration of this test (3h17m) all other processors are idle]

As you can see from the graph, when the load spikes hit, all of the processing time is in kernel code, mostly in Deferred Procedure Calls (max 16.8%) and interrupt handling (max 8.5%). It looks like there's some sort of deferred cleanup happening, but we have no idea what it could be.
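
To line the humps up against application logs, the same counters can be sampled programmatically via the PDH API, which ships with Server 2003. A minimal sketch, assuming English counter names and the CPU 6 shown in the graph (the one-second interval is an arbitrary choice):

// Log the same counters perfmon shows for CPU 6, once per second.
// Counter paths assume English counter names; adjust the CPU index as needed.
#include <windows.h>
#include <pdh.h>
#include <stdio.h>
#pragma comment(lib, "pdh.lib")

int main()
{
    PDH_HQUERY query;
    PDH_HCOUNTER dpc, intr, priv;
    if (PdhOpenQuery(NULL, 0, &query) != ERROR_SUCCESS)
        return 1;
    PdhAddCounter(query, TEXT("\\Processor(6)\\% DPC Time"), 0, &dpc);
    PdhAddCounter(query, TEXT("\\Processor(6)\\% Interrupt Time"), 0, &intr);
    PdhAddCounter(query, TEXT("\\Processor(6)\\% Privileged Time"), 0, &priv);
    PdhCollectQueryData(query); // first sample primes the rate counters

    for (;;)
    {
        Sleep(1000);
        PdhCollectQueryData(query);
        PDH_FMT_COUNTERVALUE d, i, p;
        PdhGetFormattedCounterValue(dpc, PDH_FMT_DOUBLE, NULL, &d);
        PdhGetFormattedCounterValue(intr, PDH_FMT_DOUBLE, NULL, &i);
        PdhGetFormattedCounterValue(priv, PDH_FMT_DOUBLE, NULL, &p);
        printf("DPC %5.1f%%  Interrupt %5.1f%%  Privileged %5.1f%%\n",
            d.doubleValue, i.doubleValue, p.doubleValue);
    }
}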

As far as we can tell it's happening on W2K3E-64 only.

It is happening on different hardware (HS21, HS22, HS22V, HP DL380).

Running the test application on Windows 2008 demonstrates the problem to a much smaller extent (more often but smaller humps).

How can we fix this or where should we look next?


Actual code from the example:

void handle_receive_from(const boost::system::error_code& error,
    size_t bytes_recvd)
{
    if (!error)
    {
        // Track totals and print a summary every 64 packets.
        ++m_receivedPackets;
        m_receivedBytes += bytes_recvd;
        m_last64TotalBytes += bytes_recvd;
        if ( ( m_receivedPackets & 0x3F ) == 0 )
        {
            // Casts keep the %u/%x specifiers valid on 64-bit builds, and
            // a zero-length datagram no longer indexes m_buffer[-1].
            printf( "Received %u bytes in %u packets. The average size of the last 64 packets was %u bytes, and the last byte received was %x.\n",
                (unsigned)m_receivedBytes, (unsigned)m_receivedPackets,
                (unsigned)( m_last64TotalBytes / 64 ),
                bytes_recvd ? (unsigned)(unsigned char)m_buffer[ bytes_recvd - 1 ] : 0u );
            m_last64TotalBytes = 0;
        }

        // Re-arm the asynchronous read for the next datagram.
        m_socket.async_receive_from(
            boost::asio::buffer(m_buffer, max_length), m_senderEndpoint,
            boost::bind(&receiver::handle_receive_from, this,
            boost::asio::placeholders::error,
            boost::asio::placeholders::bytes_transferred));
    }
    else
    {
        std::cerr << "An error occurred when performing an asynchronous read." << std::endl;
        m_socket.get_io_service().stop();
    }
}
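
For completeness, the setup the handler relies on follows the stock Boost ASIO multicast receiver example. A sketch of the surrounding class, with member names matching the handler above; the port number, buffer size, and counter types are placeholders, not the actual values:

// Sketch of the receiver class from the stock Boost ASIO multicast
// receiver example, adapted to the member names used above.
#include <iostream>
#include <boost/asio.hpp>
#include <boost/bind.hpp>

class receiver
{
public:
    enum { max_length = 1024 }; // placeholder buffer size

    receiver(boost::asio::io_service& io_service,
        const boost::asio::ip::address& listen_address,
        const boost::asio::ip::address& multicast_address)
      : m_socket(io_service),
        m_receivedPackets(0), m_receivedBytes(0), m_last64TotalBytes(0)
    {
        // Bind to the listen port; reuse_address allows several receivers.
        boost::asio::ip::udp::endpoint listen_endpoint(listen_address, 30001);
        m_socket.open(listen_endpoint.protocol());
        m_socket.set_option(boost::asio::ip::udp::socket::reuse_address(true));
        m_socket.bind(listen_endpoint);

        // Join the multicast group.
        m_socket.set_option(
            boost::asio::ip::multicast::join_group(multicast_address));

        // Post the first asynchronous read; handle_receive_from re-arms it.
        m_socket.async_receive_from(
            boost::asio::buffer(m_buffer, max_length), m_senderEndpoint,
            boost::bind(&receiver::handle_receive_from, this,
                boost::asio::placeholders::error,
                boost::asio::placeholders::bytes_transferred));
    }

    void handle_receive_from(const boost::system::error_code& error,
        size_t bytes_recvd); // as shown above

private:
    boost::asio::ip::udp::socket m_socket;
    boost::asio::ip::udp::endpoint m_senderEndpoint;
    char m_buffer[max_length];
    unsigned m_receivedPackets;
    unsigned m_receivedBytes;
    unsigned m_last64TotalBytes;
};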

Solution 1:

"It looks like there's some sort of deferred cleanup happening, but we have no idea what it could be."

This could be garbage collection, but I am not sure whether garbage collection shows up as privileged time. If this is a .NET application, you can look at the .NET CLR Memory performance counters (Gen 2 collections in particular are expensive).

In the end, guessing at possible issues seems a bit backwards. Your best bet would be to profile your application during one of these spikes and see what calls it is making. You might be able to get away with just using Process Monitor to watch the syscalls.

Solution 2:

I assume that the system is receiving multicast packets. Can you prevent it from receiving the packets and see whether the same problem occurs?

What about joining the multicast group but then not listening for packets?
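
A minimal sketch of that experiment: open, bind, and join exactly as before, but never post a read, so the NIC and driver still see the group traffic while the application does nothing. The group address and port are placeholders:

// Join the multicast group but never post a receive.
#include <iostream>
#include <boost/asio.hpp>

int main()
{
    boost::asio::io_service io_service;
    boost::asio::ip::udp::socket socket(io_service);

    boost::asio::ip::udp::endpoint listen_endpoint(
        boost::asio::ip::address_v4::any(), 30001);
    socket.open(listen_endpoint.protocol());
    socket.set_option(boost::asio::ip::udp::socket::reuse_address(true));
    socket.bind(listen_endpoint);

    // Membership is active from here on, but no async_receive_from is
    // ever posted: datagrams just fill and overflow the socket buffer.
    socket.set_option(boost::asio::ip::multicast::join_group(
        boost::asio::ip::address::from_string("239.255.0.1")));

    std::cout << "Joined; receiving nothing. Press Enter to quit." << std::endl;
    std::cin.get();
    return 0;
}

If the humps disappear in this mode, the trigger is the receive path (reads, completions, buffer copies); if they persist, it points further down, at group membership, the driver, or the NIC itself.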

You say it's happening on different systems, but what about the actual NIC hardware? It's possible that it's the same across those different systems.

Update: If all of the systems are using Broadcom NICs, it's possible that the problem is with the NIC. In particular, the Microsoft-supplied Broadcom drivers are lousy; the ones at Broadcom's web site are much better.