Putting Oracle redo logs on a DRAM SSD for a write-heavy database?

I have a Sun M4000 connected to an EMC CX4-120 array with a write-heavy database. Writes peak at around 1200 IO/s and 12MB/s.

According to EMC, I am saturating the write cache on the EMC array.

I think the simplest solution is to move the redo logs to a DRAM-based SSD. That should roughly halve the write load on the EMC array, and the apps won't be seeing log buffer waits. Yes, DBWR may then become a bottleneck, but the apps won't be waiting on it (the way they do on redo commits!)

I currently cycle through about four 4 GB redo logs, so even 20 GB or so of SSD would make a big difference. Since this is short-term storage that is constantly being overwritten, Flash-based SSDs probably aren't a great idea.
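If I do end up going that way, relocating the logs themselves looks straightforward - something along the lines of this sketch (the /dram_ssd mount point and group numbers are made up for illustration):

    -- check current redo log groups and sizes
    SELECT group#, bytes/1024/1024 AS mb, status FROM v$log;

    -- add new groups on the (hypothetical) DRAM SSD mount point
    ALTER DATABASE ADD LOGFILE GROUP 5 ('/dram_ssd/redo05.log') SIZE 4096M;
    ALTER DATABASE ADD LOGFILE GROUP 6 ('/dram_ssd/redo06.log') SIZE 4096M;

    -- switch until the old groups show INACTIVE in v$log, then drop them
    ALTER SYSTEM SWITCH LOGFILE;
    ALTER DATABASE DROP LOGFILE GROUP 1;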

The M4000 doesn't have any spare drive slots, so a PCIe card would be perfect; alternatively, I could go external, or move the boot volumes to the EMC and free up the local drives.

Sun sells a Flash Accelerator F20 PCIe card, but that seems to be a cache for some SATA disks, not a DRAM SSD solution. Details are sketchy, it doesn't list the M4000 as supported, and I'm tired of fighting Sun's phone tree looking for human help. :(

Do others agree that a DRAM SSD is the way to go? Any hardware recommendations?

UPDATE: In addition to the info in a comment below, I tried various settings for "commit_write" and they didn't make a difference.
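(For reference, COMMIT_WRITE is a dynamic parameter, so the combinations can be tried per session or system-wide - a sketch of the sort of settings involved, not necessarily the exact statements I ran:)

    ALTER SESSION SET commit_write = 'BATCH,NOWAIT';
    ALTER SYSTEM  SET commit_write = 'IMMEDIATE,WAIT';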


First - I'm guessing you have very few disks in the array. 1200 IOPS can easily be supported by 12 spinning disks (100 IOPS per disk is very reasonable). If the cache can't handle it, it means that your sustained write rate of 1200 IOPS is way more than your disks can support.

Anyway, an SSD for the redo logs isn't likely to help. First, are your sessions mostly waiting on COMMIT? Check the top wait events in statspack / AWR to verify. I would guess ~95% of your I/O is not to the redo logs at all. For example, a single-row insert into a table with 5 indexes can do 1 I/O to read a table block (that has space for the row), read 5 index blocks (to update them), write 1 data block, 1 undo block and 5 index blocks (or more, if non-leaf blocks are updated) and 1 redo block. So check statspack and look at your wait events; you are likely waiting a lot on both READs and WRITEs of data / index blocks. Waiting on reads slows down the INSERT, and the WRITE activity makes the READs even slower - it's the same disks. (BTW - do you really need all the indexes? Dropping the ones that aren't must-haves will speed up the inserts.)
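If you don't have a statspack / AWR report handy, even a quick look at the cumulative wait events since startup will show roughly where the time goes - a rough sketch, assuming 10g-style views:

    -- top non-idle wait events since instance startup
    SELECT *
      FROM (SELECT event, total_waits,
                   ROUND(time_waited_micro / 1000000) AS seconds_waited
              FROM v$system_event
             WHERE wait_class <> 'Idle'
             ORDER BY time_waited_micro DESC)
     WHERE ROWNUM <= 10;

If 'log file sync' isn't near the top of that list, a DRAM SSD for the redo logs won't buy you much.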

Another thing to check is the RAID layout - is it RAID 1/10 (mirroring, so each write is two physical writes) or RAID 5 (each write is two reads plus two writes for the parity calculation)? RAID 5 is much slower under a write-intensive load.
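As a rough worked example (using the ~100 IOPS-per-spindle figure above): 1200 front-end write IOPS become roughly 2400 back-end IOPS on RAID 1/10 (write penalty of 2) but roughly 4800 on RAID 5 (write penalty of 4) - i.e. on the order of 24 vs. 48 disks' worth of sustained random writes once the cache is saturated.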

BTW - if the disks can't handle the write load, DBWR becomes the bottleneck. Your buffer cache will fill up with dirty blocks, and there will be no room left to read new blocks (like the index blocks that need to be read / updated) until DBWR can write some dirty blocks out to disk. Again, check the statspack / AWR / ADDM reports to diagnose the bottleneck, typically starting from the top 5 wait events.
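A quick way to spot that situation (a sketch, same caveats as the query above) is to look specifically for the buffer-related waits:

    -- significant time here means DBWR can't keep up with the dirtying rate
    SELECT event, total_waits,
           ROUND(time_waited_micro / 1000000) AS seconds_waited
      FROM v$system_event
     WHERE event IN ('free buffer waits', 'write complete waits', 'buffer busy waits');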


A dd test is nothing like real database block I/O.

For some other views, check around: anandtech.com did an exhaustive test (granted, with MS SQL Server) of rotating SAS vs. SSD in various combinations, and the Solaris world has ZFS using SSDs for various roles (log devices, cache, etc.).

But yes, if RAID 5 and RAID 10 perform the same (for writes), you are doing something wrong. With linear writes RAID 5 could even be faster (it can compute the parity in memory, then write the stripes and parity all at once), but with small random blocks (4-8 KB) you get killed updating stripes (as noted by others). RAID 10 should be more than 2x faster; if it isn't, something is wrong.

You need to dig deeper before you spend money on hardware.


I saw a post about mounting UFS partitions using the "forcedirectio" option and setting the Oracle parameter "filesystemio_options" to "setall".
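For reference, the parameter change is the static-parameter kind, so it goes roughly like this (a sketch only; the UFS filesystem itself gets the forcedirectio option via /etc/vfstab or a remount):

    -- FILESYSTEMIO_OPTIONS is not dynamic: set it in the spfile and restart
    ALTER SYSTEM SET filesystemio_options = 'SETALL' SCOPE=SPFILE;
    SHUTDOWN IMMEDIATE
    STARTUP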

I tried it and saw a 4-5x improvement in Oracle writes! Yea!

The key symptoms were low throughput but good response times on the disk. This seems to help some people but not others. It certainly did the job for me.

I may consider SSDs for new servers, but this server is running fine now.

Robert


If this box had been an x86/64 box running Linux, I'd happily have recommended one of the FusionIO PCIe drive cards - they're astonishingly fast and don't 'die' under heavy writes the way conventional Flash SSDs do. Unfortunately they're not supported on either SPARC or Solaris; you might want to contact them to discuss it, though.