Selecting a Linux I/O Scheduler
As documented in /usr/src/linux/Documentation/block/switching-sched.txt
, the I/O scheduler on any particular block device can be changed at runtime. There may be some latency as the previous scheduler's requests are all flushed before bringing the new scheduler into use, but it can be changed without problems even while the device is under heavy use.
# cat /sys/block/hda/queue/scheduler
noop deadline [cfq]
# echo anticipatory > /sys/block/hda/queue/scheduler
# cat /sys/block/hda/queue/scheduler
noop [deadline] cfq
Ideally, there would be a single scheduler to satisfy all needs. It doesn't seem to exist yet. The kernel often doesn't have enough knowledge to choose the best scheduler for your workload:
-
noop
is often the best choice for memory-backed block devices (e.g. ramdisks) and other non-rotational media (flash) where trying to reschedule I/O is a waste of resources -
deadline
is a lightweight scheduler which tries to put a hard limit on latency -
cfq
tries to maintain system-wide fairness of I/O bandwidth
The default was anticipatory
for a long time, and it received a lot of tuning, but was removed in 2.6.33 (early 2010). cfq
became the default some while ago, as its performance is reasonable and fairness is a good goal for multi-user systems (and even single-user desktops). For some scenarios -- databases are often used as examples, as they tend to already have their own peculiar scheduling and access patterns, and are often the most important service (so who cares about fairness?) -- anticipatory
has a long history of being tunable for best performance on these workloads, and deadline
very quickly passes all requests through to the underlying device.
It's possible to use a udev rule to let the system decide on the scheduler based on some characteristics of the hw.
An example udev rule for SSDs and other non-rotational drives might look like
# set noop scheduler for non-rotating disks
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="noop"
inside a new udev rules file (e.g., /etc/udev/rules.d/60-ssd-scheduler.rules
). This answer is based on the debian wiki
To check whether ssd disks would use the rule, it's possible to check for the trigger attribute in advance:
for f in /sys/block/sd?/queue/rotational; do printf "$f "; cat $f; done
The aim of having the kernel support different ones is that you can try them out without a reboot; you can then run test workloads through the sytsem, measure performance, and then make that the standard one for your app.
On modern server-grade hardware, only the noop one appears to be at all useful. The others seem slower in my tests.