Find out which task is generating a lot of context switches on Linux

According to vmstat, my Linux server (2xCore2 Duo 2.5 GHz) is constantly doing around 20k context switches per second.

# vmstat 3
procs -----------memory----------  ---swap-- -----io----  -system-- ----cpu----
 r  b   swpd   free   buff  cache    si   so    bi    bo   in    cs us sy id wa
 2  0   7292 249472  82340 2291972    0    0     0     0    0     0  7 13 79  0
 0  0   7292 251808  82344 2291968    0    0     0   184   24 20090  1  1 99  0
 0  0   7292 251876  82344 2291968    0    0     0    83   17 20157  1  0 99  0
 0  0   7292 251876  82344 2291968    0    0     0    73   12 20116  1  0 99  0

... but uptime shows a low load (load average: 0.01, 0.02, 0.01), and top doesn't show any process with high %CPU usage.

How do I find out what exactly is generating those context switches? Which process/thread?

I tried to analyze pidstat output:

# pidstat -w 10 1

12:39:13          PID   cswch/s nvcswch/s  Command
12:39:23            1      0.20      0.00  init
12:39:23            4      0.20      0.00  ksoftirqd/0
12:39:23            7      1.60      0.00  events/0
12:39:23            8      1.50      0.00  events/1
12:39:23           89      0.50      0.00  kblockd/0
12:39:23           90      0.30      0.00  kblockd/1
12:39:23          995      0.40      0.00  kirqd
12:39:23          997      0.60      0.00  kjournald
12:39:23         1146      0.20      0.00  svscan
12:39:23         2162      5.00      0.00  kjournald
12:39:23         2526      0.20      2.00  postgres
12:39:23         2530      1.00      0.30  postgres
12:39:23         2534      5.00      3.20  postgres
12:39:23         2536      1.40      1.70  postgres
12:39:23        12061     10.59      0.90  postgres
12:39:23        14442      1.50      2.20  postgres
12:39:23        15416      0.20      0.00  monitor
12:39:23        17289      0.10      0.00  syslogd
12:39:23        21776      0.40      0.30  postgres
12:39:23        23638      0.10      0.00  screen
12:39:23        25153      1.00      0.00  sshd
12:39:23        25185     86.61      0.00  daemon1
12:39:23        25190     12.19     35.86  postgres
12:39:23        25295      2.00      0.00  screen
12:39:23        25743      9.99      0.00  daemon2
12:39:23        25747      1.10      3.00  postgres
12:39:23        26968      5.09      0.80  postgres
12:39:23        26969      5.00      0.00  postgres
12:39:23        26970      1.10      0.20  postgres
12:39:23        26971     17.98      1.80  postgres
12:39:23        27607      0.90      0.40  postgres
12:39:23        29338      4.30      0.00  screen
12:39:23        31247      4.10     23.58  postgres
12:39:23        31249     82.92     34.77  postgres
12:39:23        31484      0.20      0.00  pdflush
12:39:23        32097      0.10      0.00  pidstat

It looks like some PostgreSQL tasks are doing more than 10 context switches per second, but all of the tasks added together still come nowhere near 20k.
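
To check that sum instead of eyeballing it, the two rate columns of the pidstat output can be added up with awk. This is only a sketch: sum_cswch is a made-up helper name, and it assumes the column headers are spelled cswch/s and nvcswch/s as in the output above.

```shell
# sum_cswch (hypothetical helper): read `pidstat -w` output on stdin and
# print the totals of the cswch/s and nvcswch/s columns.
sum_cswch() {
    awk '
        # Skip the trailing "Average:" section so rows are not counted twice.
        /^Average:/ { next }
        # Header line: remember which fields hold the two rate columns.
        /cswch\/s/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "cswch/s")   v = i
                if ($i == "nvcswch/s") n = i
            }
            next
        }
        # Data rows: only counted once a header has been seen.
        v && NF >= n { vol += $v; invol += $n }
        END {
            printf "voluntary=%.2f involuntary=%.2f total=%.2f\n",
                   vol, invol, vol + invol
        }
    '
}

# Example (not executed here):
#   pidstat -w 10 1 | sum_cswch
```

If the total stays far below the cs column of vmstat, the missing switches are probably coming from tasks that this pidstat invocation does not list, for example individual threads.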

Any idea how to dig a little deeper for an answer?


Solution 1:

Well, quite an interesting case. Try observing watch -tdn1 cat /proc/interrupts. Do you see any significant changes there?
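
If watching the numbers flicker by is hard to read, two saved copies of /proc/interrupts can be compared instead. The following is a rough sketch, not a polished tool: irq_delta is a hypothetical helper name, and it assumes the usual layout of /proc/interrupts (a header line naming the CPUs, then one row per interrupt with one counter per CPU).

```shell
# irq_delta BEFORE AFTER (hypothetical helper): print how much each interrupt
# counter grew between two saved copies of /proc/interrupts, largest first.
irq_delta() {
    awk '
        # The header line lists the CPUs; its field count tells us how many
        # counter columns follow each interrupt label.
        FNR == 1 { ncpu = NF; next }
        {
            label = $1                  # e.g. "19:" or "LOC:"
            total = 0
            for (i = 2; i <= ncpu + 1 && i <= NF; i++)
                if ($i ~ /^[0-9]+$/) total += $i
            if (NR == FNR)              # still reading the first file
                before[label] = total
            else if (total - before[label] > 0)
                printf "%s %d\n", label, total - before[label]
        }
    ' "$1" "$2" | sort -k2,2nr
}

# Example (not executed here):
#   cat /proc/interrupts > /tmp/irq.1; sleep 3; cat /proc/interrupts > /tmp/irq.2
#   irq_delta /tmp/irq.1 /tmp/irq.2
```

An interrupt source whose count grows by thousands per second would be a good candidate to correlate with the context-switch rate.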

Solution 2:

Try using

pidstat -wt

The -t option also shows threads. It might be a single thread that is doing the context switches.
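
If pidstat is not available, the same per-thread counters can be read from /proc directly: each /proc/PID/task/TID/status file carries voluntary_ctxt_switches and nonvoluntary_ctxt_switches lines on reasonably recent kernels. Below is a minimal sketch under that assumption; ctxt_snapshot and ctxt_top are made-up helper names, and only the first word of each thread name is kept.

```shell
# ctxt_snapshot [PROCROOT] (hypothetical helper): print one line per thread as
# "tid name voluntary nonvoluntary". PROCROOT defaults to /proc.
ctxt_snapshot() {
    root=${1:-/proc}
    for f in "$root"/[0-9]*/task/[0-9]*/status; do
        [ -r "$f" ] || continue         # thread may have exited meanwhile
        tid=${f%/status}
        tid=${tid##*/}
        awk -v tid="$tid" '
            /^Name:/                       { name = $2 }
            /^voluntary_ctxt_switches:/    { v = $2 }
            /^nonvoluntary_ctxt_switches:/ { nv = $2 }
            END { print tid, name, v + 0, nv + 0 }
        ' "$f" 2>/dev/null
    done
}

# ctxt_top BEFORE AFTER (hypothetical helper): compare two snapshots and print
# "delta tid name" for every thread whose counters grew, largest first.
ctxt_top() {
    awk '
        NR == FNR { seen[$1] = 1; before[$1] = $3 + $4; next }
        ($1 in seen) {
            d = $3 + $4 - before[$1]
            if (d > 0) print d, $1, $2
        }
    ' "$1" "$2" | sort -k1,1nr
}

# Example (not executed here):
#   ctxt_snapshot > /tmp/cs.1; sleep 10; ctxt_snapshot > /tmp/cs.2
#   ctxt_top /tmp/cs.1 /tmp/cs.2 | head
```

Dividing the delta by the sleep interval gives a per-second rate that can be compared with the cs column of vmstat.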

Solution 3:

On newer kernel versions:

sudo perf record -e context-switches -a  # record the events

# then press Ctrl+C to stop recording

sudo perf report # inspect the result

This gives you exact results for the context-switch events.

You may also be able to find out what caused the context switches by appending the -g flag (how readable the result is depends on the available symbol information):

sudo perf record -e context-switches -a -g

Solution 4:

Context switches are normal. Each process is assigned a quantum of CPU time; when it finishes what it has to do (or pauses because it is waiting for a resource), it gives up the processor.

That said, counting how many context switches each process performs (this is turning into a stackoverflow.com answer) would require the kernel's internal schedule() function to write into the process table. As far as I know there is no such facility. If you build your own kernel you will be able to see it, but that is quite difficult.