How to tell if Linux disk I/O is causing excessive (> 1 second) application stalls
I have a Java application performing a large volume (hundreds of MB) of continuous output (streaming plain text) to about a dozen files on an ext3 SAN filesystem. Occasionally, this application pauses for several seconds at a time. I suspect that something in the ext3/VxFS (Veritas File System) layer (and/or how it interacts with the OS) is the culprit.
What steps can I take to confirm or refute this theory? I am aware of iostat and /proc/diskstats as starting points.
(Revised the title to de-emphasize journaling and emphasize "stalls".)
I have done some googling and found at least one article that seems to describe the behavior I am observing: Solving the ext3 latency problem
Additional Information
- Red Hat Enterprise Linux Server release 5.3 (Tikanga)
- Kernel: 2.6.18-194.32.1.el5
- Primary application disk is fibre-channel SAN:
  lspci | grep -i fibre
  >> 14:00.0 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
- Mount info:
  type vxfs (rw,tmplog,largefiles,mincache=tmpcache,ioerror=mwdisable) 0 0
- I/O scheduler:
  cat /sys/block/VxVM123456/queue/scheduler
  >> noop anticipatory [deadline] cfq
My guess is that some other process is hogging the disk I/O capacity for a while. iotop can help you pinpoint it, if you have a recent enough kernel.
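As a rough sketch of one reasonable invocation (iotop needs root, and a kernel with I/O accounting enabled):
iotop -o -P -a -d 5
# -o: only show processes actually doing I/O
# -P: aggregate per process rather than per thread
# -a: show accumulated I/O instead of instantaneous bandwidth
# -d 5: refresh every 5 seconds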
If this is the case, it's not about the filesystem, much less about journaling. It's the I/O scheduler that is responsible for arbitrating between competing applications. An easy test: check the current scheduler and try a different one. This can be done on the fly, without restarting. For example, on my desktop, to check the first disk (/dev/sda):
cat /sys/block/sda/queue/scheduler
=> noop deadline [cfq]
shows that it's using CFQ, which is a good choice for desktops but not so much for servers. Better to set 'deadline':
echo 'deadline' > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler
=> noop [deadline] cfq
and wait a few hours to see if it improves. If so, set it permanently in the startup scripts (the exact mechanism depends on the distribution).
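As a sketch, on RHEL 5 two common ways to make it stick (paths are the usual defaults; adjust for your system):
# Option 1: set the default elevator for all devices at boot by appending
# elevator=deadline to the kernel line in /boot/grub/grub.conf, e.g.
kernel /vmlinuz-<version> ro root=<root-device> elevator=deadline
# Option 2: re-apply the per-device setting at boot, e.g. from /etc/rc.local
echo 'deadline' > /sys/block/sda/queue/scheduler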
Well, one easy test would be to mount that ext3 filesystem as ext2 and then profile the application's performance.
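A minimal sketch of that test, assuming the application data lives on a hypothetical /dev/sdb1 mounted at /data (stop the application first so the filesystem unmounts cleanly):
umount /data
# mount the same ext3 filesystem with the ext2 driver, i.e. without the journal
mount -t ext2 /dev/sdb1 /data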
The answer is "Yes" (journaling ALWAYS adds latency :-)
The question of how significant it is can really only be answered by a direct test, but as a rough rule of thumb assume that each journaled operation takes around twice as long as it would with journaling disabled.
Since you mentioned in your comments on another answer that you can't do the direct test in your production environment (and presumably don't have a dev/test environment you can use), you do have one other option: look at your disk statistics and see how much time you spend writing to the journal device.
Unfortunately this only really helps if your journal device is discrete and can be instrumented separately from the "main" disk.
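For example, if the journal were on its own block device (say the data filesystem on a hypothetical sdb with its external journal on sdc), iostat can report the two side by side:
iostat -d -x -k sdb sdc 5
# compare w/s, wkB/s and await for the two devices: if sdc (the journal)
# shows high await while sdb is mostly idle, the time is going into
# journal writes rather than the data writes themselves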
Second time I'm plugging a McKusick video today, but if you wade through it there's a great discussion of some of the work a journaling filesystem has to do (and the performance impact involved).
Not directly useful/relevant to you and your particular question, but a great general background on filesystems and journaling.
Yes, journaling causes latency. But it's a small piece of the equation. I'd consider it the 5th or 6th item to look at... However, this is another in a trend of systems storage questions that do not include enough relevant information.
- What type of server hardware are you using? (make and model)
- Please describe the storage setup (RAID controller, cache configuration, number and arrangement of disks)
- What operating system are you using? Distribution and kernel versions would be helpful.
Why do I ask for this information?
Your hardware setup and RAID level can have a HUGE impact on your observed performance. Read and write caching on hardware RAID controllers can and should be tuned to accommodate your workload and I/O patterns. The operating system matters because it impacts the tool recommendations and tuning techniques that would be helpful to you. Different distributions and kernels have different default settings, thus performance characteristics vary between them.
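If it helps, most of this can be collected with a few standard commands (the RAID-controller CLI, if any, depends on the vendor and is not shown):
dmidecode -s system-manufacturer
dmidecode -s system-product-name   # server make and model
cat /etc/redhat-release            # distribution release
uname -r                           # kernel version
lspci | grep -i raid               # whether a hardware RAID controller is present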
So in this case, there are a number of possibilities:
- Your RAID array may not be able to keep up with the workload (not enough spindles).
- Or you could benefit from write caching.
- You may have fragmentation issues (how full is the filesystem?).
- You could have an ill-fitting RAID level that works against the performance characteristics you need.
- Your RAID controller may need tuning.
- You may need to change your system's I/O scheduler and apply some block-device tuning (see the sketch after this list).
- You could consider a more performance-optimized filesystem like XFS.
- You could drop the journal and remount your filesystems as ext2. This can be done on the fly.
- You might have cheap SATA disks that are experiencing bus timeouts.
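As a sketch of what "block-device tuning" can mean here (sdb is a placeholder for the device backing the busy filesystem; the values are starting points, not recommendations):
cat /sys/block/sdb/queue/scheduler             # confirm the active elevator
echo 512  > /sys/block/sdb/queue/nr_requests   # deepen the request queue
echo 1024 > /sys/block/sdb/queue/read_ahead_kb # raise readahead for streaming reads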
But as-is, we don't have enough information to go on.