How to determine CPU and memory consumption from inside a process

I once had the task of determining the following performance parameters from inside a running application:

  • Total virtual memory available
  • Virtual memory currently used
  • Virtual memory currently used by my process
  • Total RAM available
  • RAM currently used
  • RAM currently used by my process
  • % CPU currently used
  • % CPU currently used by my process

The code had to run on Windows and Linux. Even though this seems to be a standard task, finding the necessary information in the manuals (WIN32 API, GNU docs) as well as on the Internet took me several days, because there's so much incomplete/incorrect/outdated information on this topic to be found out there.

In order to save others from going through the same trouble, I thought it would be a good idea to collect all the scattered information plus what I found by trial and error here in one place.


Windows

Some of the above values are easily available from the appropriate Win32 API, I just list them here for completeness. Others, however, need to be obtained from the Performance Data Helper library (PDH), which is a bit "unintuitive" and takes a lot of painful trial and error to get to work. (At least it took me quite a while, perhaps I've been only a bit stupid...)

Note: for clarity all error checking has been omitted from the following code. Do check the return codes...!

  • Total Virtual Memory:

    #include "windows.h"
    
    MEMORYSTATUSEX memInfo;
    memInfo.dwLength = sizeof(MEMORYSTATUSEX);
    GlobalMemoryStatusEx(&memInfo);
    DWORDLONG totalVirtualMem = memInfo.ullTotalPageFile;
    

    Note: The name "TotalPageFile" is a bit misleading here. In reality this parameter gives the "Virtual Memory Size", which is size of swap file plus installed RAM.

  • Virtual Memory currently used:

    Same code as in "Total Virtual Memory" and then

     DWORDLONG virtualMemUsed = memInfo.ullTotalPageFile - memInfo.ullAvailPageFile;
    
  • Virtual Memory currently used by current process:

    #include "windows.h"
    #include "psapi.h"
    
    PROCESS_MEMORY_COUNTERS_EX pmc;
    GetProcessMemoryInfo(GetCurrentProcess(), (PROCESS_MEMORY_COUNTERS*)&pmc, sizeof(pmc));
    SIZE_T virtualMemUsedByMe = pmc.PrivateUsage;
    
  • Total Physical Memory (RAM):

    Same code as in "Total Virtual Memory" and then

    DWORDLONG totalPhysMem = memInfo.ullTotalPhys;
    
  • Physical Memory currently used:

    Same code as in "Total Virtual Memory" and then

    DWORDLONG physMemUsed = memInfo.ullTotalPhys - memInfo.ullAvailPhys;
    
  • Physical Memory currently used by current process:

    Same code as in "Virtual Memory currently used by current process" and then

    SIZE_T physMemUsedByMe = pmc.WorkingSetSize;
    
  • CPU currently used:

    #include "TCHAR.h"
    #include "pdh.h"
    
    static PDH_HQUERY cpuQuery;
    static PDH_HCOUNTER cpuTotal;
    
    void init(){
        PdhOpenQuery(NULL, NULL, &cpuQuery);
        // You can also use L"\\Processor(*)\\% Processor Time" and get individual CPU values with PdhGetFormattedCounterArray()
        PdhAddEnglishCounter(cpuQuery, L"\\Processor(_Total)\\% Processor Time", NULL, &cpuTotal);
        PdhCollectQueryData(cpuQuery);
    }
    
    double getCurrentValue(){
        PDH_FMT_COUNTERVALUE counterVal;
    
        PdhCollectQueryData(cpuQuery);
        PdhGetFormattedCounterValue(cpuTotal, PDH_FMT_DOUBLE, NULL, &counterVal);
        return counterVal.doubleValue;
    }
    
  • CPU currently used by current process:

    #include "windows.h"
    
    static ULARGE_INTEGER lastCPU, lastSysCPU, lastUserCPU;
    static int numProcessors;
    static HANDLE self;
    
    void init(){
        SYSTEM_INFO sysInfo;
        FILETIME ftime, fsys, fuser;
    
        GetSystemInfo(&sysInfo);
        numProcessors = sysInfo.dwNumberOfProcessors;
    
        GetSystemTimeAsFileTime(&ftime);
        memcpy(&lastCPU, &ftime, sizeof(FILETIME));
    
        self = GetCurrentProcess();
        GetProcessTimes(self, &ftime, &ftime, &fsys, &fuser);
        memcpy(&lastSysCPU, &fsys, sizeof(FILETIME));
        memcpy(&lastUserCPU, &fuser, sizeof(FILETIME));
    }
    
    double getCurrentValue(){
        FILETIME ftime, fsys, fuser;
        ULARGE_INTEGER now, sys, user;
        double percent;
    
        GetSystemTimeAsFileTime(&ftime);
        memcpy(&now, &ftime, sizeof(FILETIME));
    
        GetProcessTimes(self, &ftime, &ftime, &fsys, &fuser);
        memcpy(&sys, &fsys, sizeof(FILETIME));
        memcpy(&user, &fuser, sizeof(FILETIME));
        percent = (sys.QuadPart - lastSysCPU.QuadPart) +
            (user.QuadPart - lastUserCPU.QuadPart);
        percent /= (now.QuadPart - lastCPU.QuadPart);
        percent /= numProcessors;
        lastCPU = now;
        lastUserCPU = user;
        lastSysCPU = sys;
    
        return percent * 100;
    }
    

Linux

On Linux the choice that seemed obvious at first was to use the POSIX APIs like getrusage() etc. I spent some time trying to get this to work, but never got meaningful values. When I finally checked the kernel sources themselves, I found out that apparently these APIs are not yet completely implemented as of Linux kernel 2.6!?

In the end I got all values via a combination of reading the pseudo-filesystem /proc and kernel calls.

  • Total Virtual Memory:

    #include "sys/types.h"
    #include "sys/sysinfo.h"
    
    struct sysinfo memInfo;
    
    sysinfo (&memInfo);
    long long totalVirtualMem = memInfo.totalram;
    //Add other values in next statement to avoid int overflow on right hand side...
    totalVirtualMem += memInfo.totalswap;
    totalVirtualMem *= memInfo.mem_unit;
    
  • Virtual Memory currently used:

    Same code as in "Total Virtual Memory" and then

    long long virtualMemUsed = memInfo.totalram - memInfo.freeram;
    //Add other values in next statement to avoid int overflow on right hand side...
    virtualMemUsed += memInfo.totalswap - memInfo.freeswap;
    virtualMemUsed *= memInfo.mem_unit;
    
  • Virtual Memory currently used by current process:

    #include "stdlib.h"
    #include "stdio.h"
    #include "string.h"
    
    int parseLine(char* line){
        // This assumes that a digit will be found and the line ends in " Kb".
        int i = strlen(line);
        const char* p = line;
        while (*p <'0' || *p > '9') p++;
        line[i-3] = '\0';
        i = atoi(p);
        return i;
    }
    
    int getValue(){ //Note: this value is in KB!
        FILE* file = fopen("/proc/self/status", "r");
        int result = -1;
        char line[128];
    
        while (fgets(line, 128, file) != NULL){
            if (strncmp(line, "VmSize:", 7) == 0){
                result = parseLine(line);
                break;
            }
        }
        fclose(file);
        return result;
    }
    
  • Total Physical Memory (RAM):

    Same code as in "Total Virtual Memory" and then

    long long totalPhysMem = memInfo.totalram;
    //Multiply in next statement to avoid int overflow on right hand side...
    totalPhysMem *= memInfo.mem_unit;
    
  • Physical Memory currently used:

    Same code as in "Total Virtual Memory" and then

    long long physMemUsed = memInfo.totalram - memInfo.freeram;
    //Multiply in next statement to avoid int overflow on right hand side...
    physMemUsed *= memInfo.mem_unit;
    
  • Physical Memory currently used by current process:

    Change getValue() in "Virtual Memory currently used by current process" as follows:

    int getValue(){ //Note: this value is in KB!
        FILE* file = fopen("/proc/self/status", "r");
        int result = -1;
        char line[128];
    
        while (fgets(line, 128, file) != NULL){
            if (strncmp(line, "VmRSS:", 6) == 0){
                result = parseLine(line);
                break;
            }
        }
        fclose(file);
        return result;
    }
    

  • CPU currently used:

    #include "stdlib.h"
    #include "stdio.h"
    #include "string.h"
    
    static unsigned long long lastTotalUser, lastTotalUserLow, lastTotalSys, lastTotalIdle;
    
    void init(){
        FILE* file = fopen("/proc/stat", "r");
        fscanf(file, "cpu %llu %llu %llu %llu", &lastTotalUser, &lastTotalUserLow,
            &lastTotalSys, &lastTotalIdle);
        fclose(file);
    }
    
    double getCurrentValue(){
        double percent;
        FILE* file;
        unsigned long long totalUser, totalUserLow, totalSys, totalIdle, total;
    
        file = fopen("/proc/stat", "r");
        fscanf(file, "cpu %llu %llu %llu %llu", &totalUser, &totalUserLow,
            &totalSys, &totalIdle);
        fclose(file);
    
        if (totalUser < lastTotalUser || totalUserLow < lastTotalUserLow ||
            totalSys < lastTotalSys || totalIdle < lastTotalIdle){
            //Overflow detection. Just skip this value.
            percent = -1.0;
        }
        else{
            total = (totalUser - lastTotalUser) + (totalUserLow - lastTotalUserLow) +
                (totalSys - lastTotalSys);
            percent = total;
            total += (totalIdle - lastTotalIdle);
            percent /= total;
            percent *= 100;
        }
    
        lastTotalUser = totalUser;
        lastTotalUserLow = totalUserLow;
        lastTotalSys = totalSys;
        lastTotalIdle = totalIdle;
    
        return percent;
    }
    
  • CPU currently used by current process:

    #include "stdlib.h"
    #include "stdio.h"
    #include "string.h"
    #include "sys/times.h"
    #include "sys/vtimes.h"
    
    static clock_t lastCPU, lastSysCPU, lastUserCPU;
    static int numProcessors;
    
    void init(){
        FILE* file;
        struct tms timeSample;
        char line[128];
    
        lastCPU = times(&timeSample);
        lastSysCPU = timeSample.tms_stime;
        lastUserCPU = timeSample.tms_utime;
    
        file = fopen("/proc/cpuinfo", "r");
        numProcessors = 0;
        while(fgets(line, 128, file) != NULL){
            if (strncmp(line, "processor", 9) == 0) numProcessors++;
        }
        fclose(file);
    }
    
    double getCurrentValue(){
        struct tms timeSample;
        clock_t now;
        double percent;
    
        now = times(&timeSample);
        if (now <= lastCPU || timeSample.tms_stime < lastSysCPU ||
            timeSample.tms_utime < lastUserCPU){
            //Overflow detection. Just skip this value.
            percent = -1.0;
        }
        else{
            percent = (timeSample.tms_stime - lastSysCPU) +
                (timeSample.tms_utime - lastUserCPU);
            percent /= (now - lastCPU);
            percent /= numProcessors;
            percent *= 100;
        }
        lastCPU = now;
        lastSysCPU = timeSample.tms_stime;
        lastUserCPU = timeSample.tms_utime;
    
        return percent;
    }
    

TODO: Other Platforms

I would assume, that some of the Linux code also works for the Unixes, except for the parts that read the /proc pseudo-filesystem. Perhaps on Unix these parts can be replaced by getrusage() and similar functions?


Mac OS X

Total Virtual Memory

This one is tricky on Mac OS X because it doesn't use a preset swap partition or file like Linux. Here's an entry from Apple's documentation:

Note: Unlike most Unix-based operating systems, Mac OS X does not use a preallocated swap partition for virtual memory. Instead, it uses all of the available space on the machine’s boot partition.

So, if you want to know how much virtual memory is still available, you need to get the size of the root partition. You can do that like this:

struct statfs stats;
if (0 == statfs("/", &stats))
{
    myFreeSwap = (uint64_t)stats.f_bsize * stats.f_bfree;
}

Total Virtual Currently Used

Calling systcl with the "vm.swapusage" key provides interesting information about swap usage:

sysctl -n vm.swapusage
vm.swapusage: total = 3072.00M  used = 2511.78M  free = 560.22M  (encrypted)

Not that the total swap usage displayed here can change if more swap is needed as explained in the section above. So the total is actually the current swap total. In C++, this data can be queried this way:

xsw_usage vmusage = {0};
size_t size = sizeof(vmusage);
if( sysctlbyname("vm.swapusage", &vmusage, &size, NULL, 0)!=0 )
{
   perror( "unable to get swap usage by calling sysctlbyname(\"vm.swapusage\",...)" );
}

Note that the "xsw_usage", declared in sysctl.h, seems not documented and I suspect there there is a more portable way of accessing these values.

Virtual Memory Currently Used by my Process

You can get statistics about your current process using the task_info function. That includes the current resident size of your process and the current virtual size.

#include<mach/mach.h>

struct task_basic_info t_info;
mach_msg_type_number_t t_info_count = TASK_BASIC_INFO_COUNT;

if (KERN_SUCCESS != task_info(mach_task_self(),
                              TASK_BASIC_INFO, (task_info_t)&t_info,
                              &t_info_count))
{
    return -1;
}
// resident size is in t_info.resident_size;
// virtual size is in t_info.virtual_size;

Total RAM available

The amount of physical RAM available in your system is available using the sysctl system function like this:

#include <sys/types.h>
#include <sys/sysctl.h>
...
int mib[2];
int64_t physical_memory;
mib[0] = CTL_HW;
mib[1] = HW_MEMSIZE;
length = sizeof(int64_t);
sysctl(mib, 2, &physical_memory, &length, NULL, 0);

RAM Currently Used

You can get general memory statistics from the host_statistics system function.

#include <mach/vm_statistics.h>
#include <mach/mach_types.h>
#include <mach/mach_init.h>
#include <mach/mach_host.h>

int main(int argc, const char * argv[]) {
    vm_size_t page_size;
    mach_port_t mach_port;
    mach_msg_type_number_t count;
    vm_statistics64_data_t vm_stats;

    mach_port = mach_host_self();
    count = sizeof(vm_stats) / sizeof(natural_t);
    if (KERN_SUCCESS == host_page_size(mach_port, &page_size) &&
        KERN_SUCCESS == host_statistics64(mach_port, HOST_VM_INFO,
                                        (host_info64_t)&vm_stats, &count))
    {
        long long free_memory = (int64_t)vm_stats.free_count * (int64_t)page_size;

        long long used_memory = ((int64_t)vm_stats.active_count +
                                 (int64_t)vm_stats.inactive_count +
                                 (int64_t)vm_stats.wire_count) *  (int64_t)page_size;
        printf("free memory: %lld\nused memory: %lld\n", free_memory, used_memory);
    }

    return 0;
}

One thing to note here are that there are five types of memory pages in Mac OS X. They are as follows:

  1. Wired pages that are locked in place and cannot be swapped out
  2. Active pages that are loading into physical memory and would be relatively difficult to swap out
  3. Inactive pages that are loaded into memory, but haven't been used recently and may not even be needed at all. These are potential candidates for swapping. This memory would probably need to be flushed.
  4. Cached pages that have been some how cached that are likely to be easily reused. Cached memory probably would not require flushing. It is still possible for cached pages to be reactivated
  5. Free pages that are completely free and ready to be used.

It is good to note that just because Mac OS X may show very little actual free memory at times that it may not be a good indication of how much is ready to be used on short notice.

RAM Currently Used by my Process

See the "Virtual Memory Currently Used by my Process" above. The same code applies.


Linux

In Linux, this information is available in the /proc file system. I'm not a big fan of the text file format used, as each Linux distribution seems to customize at least one important file. A quick look as the source to 'ps' reveals the mess.

But here is where to find the information you seek:

/proc/meminfo contains the majority of the system-wide information you seek. Here it looks like on my system; I think you are interested in MemTotal, MemFree, SwapTotal, and SwapFree:

Anderson cxc # more /proc/meminfo
MemTotal:      4083948 kB
MemFree:       2198520 kB
Buffers:         82080 kB
Cached:        1141460 kB
SwapCached:          0 kB
Active:        1137960 kB
Inactive:       608588 kB
HighTotal:     3276672 kB
HighFree:      1607744 kB
LowTotal:       807276 kB
LowFree:        590776 kB
SwapTotal:     2096440 kB
SwapFree:      2096440 kB
Dirty:              32 kB
Writeback:           0 kB
AnonPages:      523252 kB
Mapped:          93560 kB
Slab:            52880 kB
SReclaimable:    24652 kB
SUnreclaim:      28228 kB
PageTables:       2284 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   4138412 kB
Committed_AS:  1845072 kB
VmallocTotal:   118776 kB
VmallocUsed:      3964 kB
VmallocChunk:   112860 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

For CPU utilization, you have to do a little work. Linux makes available overall CPU utilization since system start; this probably isn't what you are interested in. If you want to know what the CPU utilization was for the last second, or 10 seconds, then you need to query the information and calculate it yourself.

The information is available in /proc/stat, which is documented pretty well at http://www.linuxhowtos.org/System/procstat.htm; here is what it looks like on my 4-core box:

Anderson cxc #  more /proc/stat
cpu  2329889 0 2364567 1063530460 9034 9463 96111 0
cpu0 572526 0 636532 265864398 2928 1621 6899 0
cpu1 590441 0 531079 265949732 4763 351 8522 0
cpu2 562983 0 645163 265796890 682 7490 71650 0
cpu3 603938 0 551790 265919440 660 0 9040 0
intr 37124247
ctxt 50795173133
btime 1218807985
processes 116889
procs_running 1
procs_blocked 0

First, you need to determine how many CPUs (or processors, or processing cores) are available in the system. To do this, count the number of 'cpuN' entries, where N starts at 0 and increments. Don't count the 'cpu' line, which is a combination of the cpuN lines. In my example, you can see cpu0 through cpu3, for a total of 4 processors. From now on, you can ignore cpu0..cpu3, and focus only on the 'cpu' line.

Next, you need to know that the fourth number in these lines is a measure of idle time, and thus the fourth number on the 'cpu' line is the total idle time for all processors since boot time. This time is measured in Linux "jiffies", which are 1/100 of a second each.

But you don't care about the total idle time; you care about the idle time in a given period, e.g., the last second. Do calculate that, you need to read this file twice, 1 second apart.Then you can do a diff of the fourth value of the line. For example, if you take a sample and get:

cpu  2330047 0 2365006 1063853632 9035 9463 96114 0

Then one second later you get this sample:

cpu  2330047 0 2365007 1063854028 9035 9463 96114 0

Subtract the two numbers, and you get a diff of 396, which means that your CPU had been idle for 3.96 seconds out of the last 1.00 second. The trick, of course, is that you need to divide by the number of processors. 3.96 / 4 = 0.99, and there is your idle percentage; 99% idle, and 1% busy.

In my code, I have a ring buffer of 360 entries, and I read this file every second. That lets me quickly calculate the CPU utilization for 1 second, 10 seconds, etc., all the way up to 1 hour.

For the process-specific information, you have to look in /proc/pid; if you don't care abut your pid, you can look in /proc/self.

CPU used by your process is available in /proc/self/stat. This is an odd-looking file consisting of a single line; for example:

19340 (whatever) S 19115 19115 3084 34816 19115 4202752 118200 607 0 0 770 384 2
 7 20 0 77 0 266764385 692477952 105074 4294967295 134512640 146462952 321468364
8 3214683328 4294960144 0 2147221247 268439552 1276 4294967295 0 0 17 0 0 0 0

The important data here are the 13th and 14th tokens (0 and 770 here). The 13th token is the number of jiffies that the process has executed in user mode, and the 14th is the number of jiffies that the process has executed in kernel mode. Add the two together, and you have its total CPU utilization.

Again, you will have to sample this file periodically, and calculate the diff, in order to determine the process's CPU usage over time.

Edit: remember that when you calculate your process's CPU utilization, you have to take into account 1) the number of threads in your process, and 2) the number of processors in the system. For example, if your single-threaded process is using only 25% of the CPU, that could be good or bad. Good on a single-processor system, but bad on a 4-processor system; this means that your process is running constantly, and using 100% of the CPU cycles available to it.

For the process-specific memory information, you ahve to look at /proc/self/status, which looks like this:

Name:   whatever
State:  S (sleeping)
Tgid:   19340
Pid:    19340
PPid:   19115
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 256
Groups: 0 1 2 3 4 6 10 11 20 26 27
VmPeak:   676252 kB
VmSize:   651352 kB
VmLck:         0 kB
VmHWM:    420300 kB
VmRSS:    420296 kB
VmData:   581028 kB
VmStk:       112 kB
VmExe:     11672 kB
VmLib:     76608 kB
VmPTE:      1244 kB
Threads:        77
SigQ:   0/36864
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: fffffffe7ffbfeff
SigIgn: 0000000010001000
SigCgt: 20000001800004fc
CapInh: 0000000000000000
CapPrm: 00000000ffffffff
CapEff: 00000000fffffeff
Cpus_allowed:   0f
Mems_allowed:   1
voluntary_ctxt_switches:        6518
nonvoluntary_ctxt_switches:     6598

The entries that start with 'Vm' are the interesting ones:

  • VmPeak is the maximum virtual memory space used by the process, in kB (1024 bytes).
  • VmSize is the current virtual memory space used by the process, in kB. In my example, it's pretty large: 651,352 kB, or about 636 megabytes.
  • VmRss is the amount of memory that have been mapped into the process' address space, or its resident set size. This is substantially smaller (420,296 kB, or about 410 megabytes). The difference: my program has mapped 636 MB via mmap(), but has only accessed 410 MB of it, and thus only 410 MB of pages have been assigned to it.

The only item I'm not sure about is Swapspace currently used by my process. I don't know if this is available.


Linux

A portable way of reading memory and load numbers is the sysinfo call

Usage

   #include <sys/sysinfo.h>

   int sysinfo(struct sysinfo *info);

DESCRIPTION

   Until Linux 2.3.16, sysinfo() used to return information in the
   following structure:

       struct sysinfo {
           long uptime;             /* Seconds since boot */
           unsigned long loads[3];  /* 1, 5, and 15 minute load averages */
           unsigned long totalram;  /* Total usable main memory size */
           unsigned long freeram;   /* Available memory size */
           unsigned long sharedram; /* Amount of shared memory */
           unsigned long bufferram; /* Memory used by buffers */
           unsigned long totalswap; /* Total swap space size */
           unsigned long freeswap;  /* swap space still available */
           unsigned short procs;    /* Number of current processes */
           char _f[22];             /* Pads structure to 64 bytes */
       };

   and the sizes were given in bytes.

   Since Linux 2.3.23 (i386), 2.3.48 (all architectures) the structure
   is:

       struct sysinfo {
           long uptime;             /* Seconds since boot */
           unsigned long loads[3];  /* 1, 5, and 15 minute load averages */
           unsigned long totalram;  /* Total usable main memory size */
           unsigned long freeram;   /* Available memory size */
           unsigned long sharedram; /* Amount of shared memory */
           unsigned long bufferram; /* Memory used by buffers */
           unsigned long totalswap; /* Total swap space size */
           unsigned long freeswap;  /* swap space still available */
           unsigned short procs;    /* Number of current processes */
           unsigned long totalhigh; /* Total high memory size */
           unsigned long freehigh;  /* Available high memory size */
           unsigned int mem_unit;   /* Memory unit size in bytes */
           char _f[20-2*sizeof(long)-sizeof(int)]; /* Padding to 64 bytes */
       };

   and the sizes are given as multiples of mem_unit bytes.