Extremely Slow IO with Simple PostgreSQL 8.4.4 Queries on CentOS 5.5

Solution 1:

The fact that your device is /dev/xvdb1 implies that you're running under Xen. How is your storage configured? Is there contention for the underlying device, and how does iostat look on that?

Unless you can rule that out as a likely cause, that's where I'm going to point the whirling spinner of poor-performance blame.

Basically, the overall approach to untangling a performance problem like this is to think about all the layers where a bottleneck could occur, and then devise tests to eliminate each one until you isolate the problem.

Solution 2:

Here are some suggestions, in more or less random order:

  1. Autovacuum is not turned on by default in CentOS. There are multiple settings you have to set to enable it. Double-check that the vacuum process actually runs; it's easy to miss one of the required settings (see the SQL sketch after this list for a quick way to verify).

  2. Note that the planner has to do a second filter step for that query, which can be expensive depending on how many rows come back. I would consider an index such as:

    CREATE INDEX consumer_m_lower_last ON consumer_m (lower(last_name));

    This index will match your query directly and remove the Recheck step (the EXPLAIN sketch after this list shows how to check that).

  3. Also, as mattdm points out, you can't trust iostat in virtualized environments.

  4. You should probably check http://lonesysadmin.net/2008/02/21/elevatornoop/ if you have IO problems in a Xen environment. Elevator settings can have an impact, but not one this large.

  5. Is the underlying disk using LVM snapshots? While these are very useful from a management perspective, they can murder IO performance. This is true both when the block device you are using is itself a snapshot and when a snapshot has been taken of that block device.
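
For point 1, a quick way to confirm whether autovacuum is actually enabled and doing work is to query the server settings and the per-table statistics. This is only a minimal sketch, assuming PostgreSQL 8.4 and the consumer_m table from your query; adjust the table name as needed.

    -- Autovacuum depends on track_counts being on; check it and the
    -- autovacuum-related settings in one go.
    SELECT name, setting
    FROM pg_settings
    WHERE name IN ('autovacuum', 'track_counts')
       OR name LIKE 'autovacuum%';

    -- See whether autovacuum has ever touched the table; NULLs here
    -- mean it has never run against it.
    SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'consumer_m';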
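
For point 2, once the expression index exists you can verify with EXPLAIN ANALYZE that the planner uses it and no longer re-applies lower(last_name) as a separate filter. The query below is only a stand-in for the one in your question; the literal 'smith' is a made-up placeholder.

    -- The plan should show an index scan (or bitmap index scan) on
    -- consumer_m_lower_last instead of a sequential scan with a
    -- Filter on lower(last_name).
    EXPLAIN ANALYZE
    SELECT *
    FROM consumer_m
    WHERE lower(last_name) = 'smith';  -- placeholder value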

Solution 3:

I doubt this is a problem with PostgreSQL; it's more likely just an issue with disk IO. As the comments on another answer mention, if it is a disk IO issue, you really should measure from Dom0 so you get a picture of everything that is happening.

I had a very similar problem a while back, and it turned out to be an issue with the disk controller. Very slow disk access was causing the system to bottleneck while waiting for disk IO (which showed up as very high load averages and wait times, but also caused processes waiting for disk to consume more CPU than they would otherwise). It turned out that the kernel wasn't recognizing the controller properly and was falling back to the old-school IDE driver instead of a fast SATA one.

The fix was to boot with

hda=noprobe hda=none 

at the end of the kernel line in /etc/grub.conf. (Of course, add entries for all the disks you have, e.g. hdc=noprobe hdc=none, hdd=...)