Linux IO monitoring per file?
I am interested in a utility or process for monitoring disk IO per file on CentOS.
On Win2008, the resmon utility allows this type of drilldown, but none of the Linux utilities I have found do this (iostat, iotop, dstat, nmon).
My interest in monitoring IO bottlenecks on database servers. With MSSQL, I have found it an informative diagnostic to know which files / filespaces are getting hit the hardest.
SystemTap is probably your best option.
Here is how the output from the iotime.stp example looks like:
825946 3364 (NetworkManager) access /sys/class/net/eth0/carrier read: 8190 write: 0
825955 3364 (NetworkManager) iotime /sys/class/net/eth0/carrier time: 9
[...]
117061 2460 (pcscd) access /dev/bus/usb/003/001 read: 43 write: 0
117065 2460 (pcscd) iotime /dev/bus/usb/003/001 time: 7
[...]
3973737 2886 (sendmail) access /proc/loadavg read: 4096 write: 0
3973744 2886 (sendmail) iotime /proc/loadavg time: 11
The disadvantage (aside from the learning curve) is that you will need to install kernel-debug, which may not be possible on a production system. However, you can resort to cross-instrumentation where you compile a module on a development system, and run that .ko on the production system.
Or if you are impatient, look at Chapter 4. Useful SystemTap Scripts from the beginners guide.
SystemTap* script:
#!/usr/bin/env stap
#
# Monitor the I/O of a program.
#
# Usage:
# ./monitor-io.stp name-of-the-program
global program_name = @1
probe begin {
printf("%5s %1s %6s %7s %s\n",
"PID", "D", "BYTES", "us", "FILE")
}
probe vfs.read.return, vfs.write.return {
# skip other programs
if (execname() != program_name)
next
if (devname=="N/A")
next
time_delta = gettimeofday_us() - @entry(gettimeofday_us())
direction = name == "vfs.read" ? "R" : "W"
# If you want only the filename, use
// filename = kernel_string($file->f_path->dentry->d_name->name)
# If you want only the path from the mountpoint, use
// filename = devname . "," . reverse_path_walk($file->f_path->dentry)
# If you want the full path, use
filename = task_dentry_path(task_current(),
$file->f_path->dentry,
$file->f_path->mnt)
printf("%5d %1s %6d %7d %s\n",
pid(), direction, $return, time_delta, filename)
}
The output looks like this:
[root@sl6 ~]# ./monitor-io.stp cat
PID D BYTES us FILE
3117 R 224 2 /lib/ld-2.12.so
3117 R 512 3 /lib/libc-2.12.so
3117 R 17989 700 /usr/share/doc/grub-0.97/COPYING
3117 R 0 3 /usr/share/doc/grub-0.97/COPYING
Or if you choose to display only the path from the mountpoint:
[root@f19 ~]# ./monitor-io.stp cat
PID D BYTES us FILE
26207 R 392 4 vda3,usr/lib64/ld-2.17.so
26207 R 832 3 vda3,usr/lib64/libc-2.17.so
26207 R 1758 4 vda3,etc/passwd
26207 R 0 1 vda3,etc/passwd
26208 R 392 3 vda3,usr/lib64/ld-2.17.so
26208 R 832 3 vda3,usr/lib64/libc-2.17.so
26208 R 35147 16 vdb7,ciupicri/doc/grub2-2.00/COPYING
26208 R 0 2 vdb7,ciupicri/doc/grub2-2.00/COPYING
[root@f19 ~]# mount | grep -E '(vda3|vdb7)'
/dev/vda3 on / type ext4 (rw,relatime,seclabel,data=ordered)
/dev/vdb7 on /mnt/mnt1/mnt11/data type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
Limitations/bugs:
-
mmap based I/O doesn't show up because
devname
is"N/A"
- files on tmpfs don't show up because
devname
is"N/A"
- it doesn't matter if the reads are from the cache or the writes are to the buffer
The results for Matthew Ife's programs:
-
for mmaptest private:
PID D BYTES us FILE 3140 R 392 5 vda3,usr/lib64/ld-2.17.so 3140 R 832 5 vda3,usr/lib64/libc-2.17.so 3140 W 23 9 N/A,3 3140 W 23 4 N/A,3 3140 W 17 3 N/A,3 3140 W 17 118 N/A,3 3140 W 17 125 N/A,3
-
for mmaptest shared:
PID D BYTES us FILE 3168 R 392 3 vda3,usr/lib64/ld-2.17.so 3168 R 832 3 vda3,usr/lib64/libc-2.17.so 3168 W 23 7 N/A,3 3168 W 23 2 N/A,3 3168 W 17 2 N/A,3 3168 W 17 98 N/A,3
-
for diotest (direct I/O):
PID D BYTES us FILE 3178 R 392 2 vda3,usr/lib64/ld-2.17.so 3178 R 832 3 vda3,usr/lib64/libc-2.17.so 3178 W 16 6 N/A,3 3178 W 1048576 509 vda3,var/tmp/test_dio.dat 3178 R 1048576 244 vda3,var/tmp/test_dio.dat 3178 W 16 25 N/A,3
*Quick setup instructions for RHEL 6 or equivalent: yum install systemtap
and debuginfo-install kernel
You'd actually want to use blktrace
for this. See Visualizing Linux IO with Seekwatcher and blktrace.
I'll see if I can post one of my examples soon.
Edit:
You don't mention distribution of Linux, but maybe this is a good case for a dtrace script on Linux or even System Tap, if you're using a RHEL-like system.