Monitor system CPU / system calls in Linux
I have a couple processes that are eating up a lot of system CPU time (as determined by looking at vmstat). Is there an easy way to find out what kind of system calls are being made?
I know there is strace, but is there a quicker and easier way? Does there exist something like a "top" for system calls?
I think strace with the -c
flag is probably the closest I know of. If you haven't used the -c
flag, try this:
$ sudo strace -c -p 12345
Where 12345 is the Process ID (PID) of the process in question. Note that stracing a process does add additional overhead, so while you're tracing it, the process will run slower.
After running that for however long you want to collect data, press Ctrl-C
to stop your data collection and output the results. It'll produce something like this:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
31.88 0.001738 145 12 futex
16.79 0.000915 11 80 tgkill
12.36 0.000674 34 20 read
9.76 0.000532 266 2 statfs
8.42 0.000459 13 35 time
4.38 0.000239 6 40 gettimeofday
3.65 0.000199 4 48 sigprocmask
2.94 0.000160 18 9 open
2.88 0.000157 12 13 stat64
1.32 0.000072 9 8 munmap
0.90 0.000049 6 8 mmap2
0.88 0.000048 3 14 7 sigreturn
0.79 0.000043 5 9 close
0.77 0.000042 4 10 rt_sigprocmask
0.64 0.000035 3 12 setitimer
0.55 0.000030 5 6 6 rt_sigsuspend
0.53 0.000029 4 8 fstat64
0.29 0.000016 8 2 setresuid32
0.13 0.000007 4 2 _llseek
0.09 0.000005 3 2 prctl
0.04 0.000002 2 1 geteuid32
------ ----------- ----------- --------- --------- ----------------
100.00 0.005451 341 13 total
As you can see, this is a breakdown of all system calls made by the application, sorted by total time, and including the average time per call and number of calls for each syscall. If you want to sort them differently, see the man page for strace, as there's a couple of options.
The type of strace switches that I tend to use is this.
strace -ffttT -p pid -o /tmp/strace.out
An example of this would look like,
19:35:57.485493 mprotect(0x7f35e7472000, 16384, PROT_READ) = 0 <0.000037>
19:35:57.485599 mprotect(0x7f35e7692000, 4096, PROT_READ) = 0 <0.000030>
19:35:57.485697 mprotect(0x7f35e78b7000, 4096, PROT_READ) = 0 <0.000030>
19:35:57.485782 munmap(0x7f35e7896000, 129588) = 0 <0.000037>
19:35:57.485875 set_tid_address(0x7f35e78949d0) = 10730 <0.000029>
19:35:57.485960 set_robust_list(0x7f35e78949e0, 0x18) = 0 <0.000024>
19:35:57.486048 futex(0x7fff8f58628c, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000025>
19:35:57.486131 futex(0x7fff8f58628c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 7f35e7894700) = -1 EAGAIN (Resource temporarily unavailable) <0.000024>
You see the time difference at the right hand side of the system call showing how much time it has taken to go from one system call to another.
It will catch you the time difference between the system calls. So, when you see that a system call has quite a few seconds gap with the next system call, then there it is making some noise.
Another method is to coredump it with gcore. However, that requires a little experience navigating through gdb.
But, if the thread is a kernel thread, then you can't strace or coredump it. In that case, we have to use something more complex. In RHEL5 kernel, we use oprofile. In RHEL6, we use perf. I prefer perf over oprofile. Perf data can be collected with graph like format showing the system call where maximum percentage of CPU is being used.
With a test perf, I see this.
38.06% swapper [kernel.kallsyms] [k] mwait_idle_with_hints ↑
29.45% swapper [kernel.kallsyms] [k] read_hpet
4.90% swapper [kernel.kallsyms] [k] acpi_os_read_port ▒
4.74% swapper [kernel.kallsyms] [k] hpet_next_event
It shows the kernel function where 38% CPU time is being spent. Now, we can check the function and see what it is doing and what it is supposed to do.
With a few examples, it is not that much hard.
Maybe try out one of the sampling profilers, such as oprofile, or for newer kernels, perf. If you're lucky, "perf top" might tell you precisely what you want. See here for some examples