Pay no attention to that SAN behind the curtain
Once upon a time, I built my own SQL servers and had control over drive configuration, RAID levels, etc. The traditional advice of separating data, logs, tempdb, and backups (depending on budget!) was always a pretty important part of the SQL Server design process.
Now with an enterprise-level SAN, I just request a specific amount of drive space for a new SQL server, divided into logical drives for data, backups, and fileshares. Certainly makes my job easier, but there is a part of me that doesn't feel completely comfortable that I can't really peek "behind the curtain" to see what is really going on back there.
My understanding is that the SAN team doesn't configure different "types" of drives any differently (optimizing data drives for random access vs log drives for streaming writes). Some of this may depend on the SAN product itself (we have an HP XP12000 and an HP XP24000), but I've been assured that the HP software does all sorts of dynamic performance configuration (watching for IO hotspots and reconfiguring on the fly to optimize those LUNs), so that the app teams and DBAs don't need to worry about any of that stuff. Something about "spreading the load of all the servers over a huge number of spindles" or something like that.
My questions/discussion:
Without making enemies on the SAN team, how can I reassure myself and the application developers that our SQL servers aren't suffering from poorly configured storage? Just use perfmon stats? Other benchmarks like sqlio?
If I load test on these SAN drives, does that really give me a reliable, repeatable measure of what I will see when we go live? (assuming that the SAN software might "dynamically configure" differently at different points in time.)
Does heavy IO in one part of the SAN (say the Exchange server) impact my SQL servers? (assuming they aren't giving dedicated disks to each server, which I've been told they are not)
Would requesting separate logical drives for different functions (data vs log vs tempdb) help here? Would the SAN see the different IO activity on these and optimally configure them differently?
We're in a bit of a space crunch right now. Application teams being told to trim data archives, etc. Would space concerns cause the SAN team to make different decisions on how they configure internal storage (RAID levels, etc) that could impact my server's performance?
Thanks for your thoughts (similar topic briefly discussed in this SF question)
Solution 1:
Without making enemies on the SAN team, how can I reassure myself and the application developers that our SQL servers aren't suffering from poorly configured storage? Just use perfmon stats? Other benchmarks like sqlio?
In short, there probably isn't a way to be truly sure. What I would say (I am a SAN admin) is that if your applications are performing up to your expectations, don't worry about it. If you start to see performance issues that you believe could be related to SAN/disk IO performance, then it might be wise to inquire. I don't use much HP storage like you do, but in the IBM/NetApp world I can say from experience that there aren't many options which would allow you to configure it "poorly". Most enterprise storage these days takes a lot of the guesswork out of building RAID arrays and doesn't really let you do it wrong. Unless they are mixing drive speeds and capacities within the same RAID groups, you can rest assured in most cases that your disks are performing fine.
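If you do want a host-side sanity check before raising it with the SAN team, a rough sketch like the following (assuming a Windows host with the standard LogicalDisk perfmon counters; the 20 ms threshold is only a rule of thumb, not a spec) will tell you whether latency is even worth worrying about:

```python
# Sample the standard LogicalDisk latency counters via typeperf and flag any
# volume whose average read/write latency exceeds a rough rule-of-thumb value.
# This is a host-side sanity check, not a substitute for the SAN team's tools.
import csv
import io
import subprocess

COUNTERS = [
    r"\LogicalDisk(*)\Avg. Disk sec/Read",
    r"\LogicalDisk(*)\Avg. Disk sec/Write",
]
THRESHOLD_SECONDS = 0.020  # ~20 ms; tune to your own workload's tolerance

def sample(samples=30, interval=1):
    """Collect perfmon samples via typeperf and average each counter instance."""
    out = subprocess.run(
        ["typeperf", *COUNTERS, "-sc", str(samples), "-si", str(interval)],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = [r for r in csv.reader(io.StringIO(out)) if len(r) > 1]
    header, data = rows[0], rows[1:]
    averages = {}
    for col, name in enumerate(header[1:], start=1):
        values = []
        for row in data:
            try:
                values.append(float(row[col]))
            except (ValueError, IndexError):
                pass  # skip missing samples
        if values:
            averages[name] = sum(values) / len(values)
    return averages

if __name__ == "__main__":
    for counter, avg in sorted(sample().items()):
        flag = "  <-- worth a conversation" if avg > THRESHOLD_SECONDS else ""
        print(f"{counter}: {avg * 1000:.1f} ms{flag}")
```

If the averages stay in the single-digit milliseconds during your busy periods, the storage is almost certainly not your problem.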
If I load test on these SAN drives, does that really give me a reliable, repeatable measure of what I will see when we go live? (assuming that the SAN software might "dynamically configure" differently at different points in time.)
Load testing should be plenty reliable. Just keep in mind that when you are load testing one box on a shared SAN/disk array, its performance can (and will) be affected by other systems using the same storage.
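One cheap way to see that variability for yourself is to repeat the same small test several times over the day and look at the spread rather than any single number. A toy sketch, not a replacement for sqlio or a proper load test (the path and sizes are only examples):

```python
# Write the same test file repeatedly and report the spread of throughput,
# so you can see how much the shared array's behavior varies between runs.
import os
import statistics
import time

TEST_FILE = r"D:\sqltest\probe.dat"   # hypothetical path on the SAN-backed volume
BLOCK = b"\0" * (1024 * 1024)         # 1 MiB writes
FILE_MB = 512                         # big enough that caching doesn't dominate (example)

def one_run():
    """Write FILE_MB of data, fsync it, and return the throughput in MB/s."""
    start = time.perf_counter()
    with open(TEST_FILE, "wb", buffering=0) as f:
        for _ in range(FILE_MB):
            f.write(BLOCK)
        os.fsync(f.fileno())
    return FILE_MB / (time.perf_counter() - start)

if __name__ == "__main__":
    rates = [one_run() for _ in range(5)]
    print("runs (MB/s):", [f"{r:.0f}" for r in rates])
    print(f"mean {statistics.mean(rates):.0f} MB/s, stdev {statistics.stdev(rates):.0f} MB/s")
    os.remove(TEST_FILE)
```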
Does heavy IO in one part of the SAN (say the Exchange server) impact my SQL servers? (assuming they aren't giving dedicated disks to each server, which I've been told they are not)
It can. It is not all about the disks, or which disks the servers are on. All of the data is being served up via a disk controller and then a SAN switch. The performance you will see greatly depends on how the disk controller is connected to its corresponding disk shelves and to the SAN itself. If the entire array connects to the backbone SAN over one single strand of 4 Gbps fiber, then clearly performance will be impacted. If the array is connected across two redundant, load-balanced SANs using trunked links, then it would be impossible for Exchange alone to suck up too much bandwidth. Another thing which needs to be considered is how many IO/sec the array is capable of. As long as the array and the SAN it is connected to are scaled correctly, heavy IO in other parts of the SAN environment should not impact your SQL performance.
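A back-of-envelope calculation makes the point; every number below is illustrative, not a description of the XP arrays:

```python
# The front-end link and the aggregate spindle count both put hard ceilings on
# what the array can deliver, no matter how the LUNs are carved up or shared.
FC_LINK_GBPS = 4          # single 4 Gbps fibre-channel strand into the array (example)
USABLE_MB_PER_GBPS = 100  # ~100 MB/s of payload per Gbps of nominal FC line rate
SPINDLES = 120            # disks behind the controller (example)
IOPS_PER_SPINDLE = 150    # rough figure for a 15k RPM FC disk

print(f"front-end ceiling: ~{FC_LINK_GBPS * USABLE_MB_PER_GBPS} MB/s, shared by every attached host")
print(f"back-end ceiling : ~{SPINDLES * IOPS_PER_SPINDLE:,} random IOPS before RAID write penalties")
```

Whatever the real figures are, both ceilings are shared by every host attached to that array, which is why the scaling question ultimately belongs to the SAN team.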
Would requesting separate logical drives for different functions (data vs log vs tempdb) help here? Would the SAN see the different IO activity on these and optimally configure them differently?
That is probably a matter of preference, and it also greatly depends on how your storage admins configure it. They could give you three LUNs in the same array or volume, in which case it's all the same anyway. If they gave you individual LUNs on different arrays, in different volumes (physically different disks), then it might be worth it for you to separate them.
We're in a bit of a space crunch right now. Application teams being told to trim data archives, etc. Would space concerns cause the SAN team to make different decisions on how they configure internal storage (RAID levels, etc) that could impact my server's performance?
I don't imagine your storage admin would change the RAID level in order to free up space. If he would, then he should probably be fired. Space concerns can lead to things being configured differently, but not normally in a performance-impacting way. They might just become a little tighter about how much space they give you. They might enable features such as data de-duplication (if the array supports it), which can hinder the performance of the array while the process runs, but not around the clock.
Solution 2:
The SAN team should have tools that can help you reveal if your app is hotspotting. Obviously, you should monitor and measure on your end too.
Most of my experience is with EMC so YMMV. But the following should apply to most SAN equipment.
There are only so many ports going into the array. Sometimes there is a SAN switch in between on which you can define zones. Just because the array is essentially a big pool of storage does not mean that you should not worry about IO performance.
So if you feel that you are having IO issues, you need to narrow down where the bottleneck is. If it is somewhere between the HBA and the array, you can then figure out if the HBA is maxed out or if the SAN port on the switch/array side is oversubscribed. Additionally, you should have the SAN team monitor access patterns for your app, both from a cold start and running hot.
Obviously, the underlying storage does make a difference, say slow, big RAID 5 vs speedy RAID 10, as at some point you will have to hit the disks regardless of the different levels of cache.
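As a rough illustration of why that still matters once you miss cache, the standard write-penalty rules of thumb (about 4 back-end IOs per host write for RAID 5, 2 for RAID 10) give very different effective numbers for the same spindles; the counts below are made up:

```python
# Compare effective host IOPS for the same spindles under RAID 5 vs RAID 10,
# using the usual write-penalty rules of thumb and an example workload mix.
SPINDLES = 16
IOPS_PER_SPINDLE = 150
READ_FRACTION = 0.6   # 60% reads / 40% writes (example)

def effective_host_iops(write_penalty):
    raw = SPINDLES * IOPS_PER_SPINDLE
    return raw / (READ_FRACTION + (1 - READ_FRACTION) * write_penalty)

print(f"RAID 10 (penalty 2): ~{effective_host_iops(2):,.0f} host IOPS")
print(f"RAID 5  (penalty 4): ~{effective_host_iops(4):,.0f} host IOPS")
```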
HTH. You can ping me offline if you have a specific issue as this could take a while to dig through.
Solution 3:
Without making enemies on the SAN team, how can I reassure myself and the application developers that our SQL servers aren't suffering from poorly configured storage? Just use perfmon stats? Other benchmarks like sqlio?
The first thing you need to know before doing any sort of benchmarking is what tolerances your own workload needs to run within. So benchmark your own stuff before checking out the new system. That way, if you find you're pushing a maximum of, say, 56MB/s during peak loads (backups?), and then find that the SAN-attached disk array 'only' pushes 110MB/s under simulated peak loads, you can be assured that the limit isn't going to be the I/O channel.
When checking out a new disk array I've done this kind of performance testing. The new array used SATA drives instead of fibre-channel (SCSI) drives, and I needed to assure myself that it would work in our environment. I was deeply dubious. But after characterization, I found out that the new system had enough I/O headroom under peak to keep up with the measured peak on the more reliable disks. It surprised me.
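If it helps, this is roughly what that first characterization step can look like, assuming you already have a perfmon log of the production server exported to CSV (the path is hypothetical; the counters are the standard LogicalDisk set):

```python
# Scan a perfmon CSV export for the LogicalDisk byte counters and report the
# peak combined disk throughput observed in the log.
import csv

LOG_FILE = r"C:\perflogs\sqlprod_disk.csv"   # hypothetical export
INTERESTING = ("Disk Read Bytes/sec", "Disk Write Bytes/sec")

peak_mb_s = 0.0
with open(LOG_FILE, newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    cols = [i for i, name in enumerate(header) if any(k in name for k in INTERESTING)]
    for row in reader:
        total = 0.0
        for i in cols:
            try:
                total += float(row[i])
            except (ValueError, IndexError):
                pass  # skip gaps in the log
        peak_mb_s = max(peak_mb_s, total / (1024 * 1024))

print(f"peak combined disk throughput in this log: ~{peak_mb_s:.0f} MB/s")
```

That peak is the number to hold up against whatever the array can demonstrably deliver.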
If I load test on these SAN drives, does that really give me a reliable, repeatable measure of what I will see when we go live? (assuming that the SAN software might "dynamically configure" differently at different points in time.)
Due to the shared nature of SAN-attached disk arrays, performance is variable across the week. If you already know when your peak I/O load is, run a series of load tests during that window. That way you can better characterize what kind of I/O headroom is available during the periods you're most interested in. Load tests during non-peak times will give you a feel for how 'snappy' things will get, but peak testing will give you true bounds checking.
Does heavy IO in one part of the SAN (say the Exchange server) impact my SQL servers? (assuming they aren't giving dedicated disks to each server, which I've been told they are not)
If the Exchange LUNs share disks with your SQL LUNs, they absolutely will. We use HP EVAs, not XPs, but I think they use the same "disk group" terminology. LUNs in the same disk group share disks, and therefore contend for I/O on those physical devices. The more disks you put into a disk group, the more wiggle room the array has to juggle I/O. The arrays (at least the EVAs do this, and I presume the more expensive XPs do the same) distribute logical LUN blocks across the physical disks in a non-sequential way. This allows them to do what you suggest, which is dynamically distribute groups of frequently accessed blocks to different physical devices to increase parallelism and reduce I/O contention at the disk level.
The question to ask is how much I/O budget that disk group has, and whether or not the applications using those LUNs are oversubscribed for I/O. That's a question the storage admins will have to keep track of. It could be that peak I/O for Exchange (probably during the backups) doesn't coincide with the SQL loads, and both systems can coexist happily.
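If the storage admins will share their numbers, the arithmetic itself is trivial; a sketch with made-up figures:

```python
# Compare the disk group's rough IOPS budget to the summed peak demand of
# everything that lives on it, and note whether the peaks even coincide.
DISK_GROUP_IOPS = 12_000   # what the storage team says the group can sustain (example)

# (application, peak IOPS, peak window) -- illustrative figures only
tenants = [
    ("SQL prod",    6_500, "09:00-17:00"),
    ("Exchange",    4_000, "01:00-03:00 backups"),
    ("File shares", 1_500, "09:00-17:00"),
]

total = sum(iops for _, iops, _ in tenants)
print(f"summed peak demand: {total:,} IOPS vs budget {DISK_GROUP_IOPS:,}")
if total > DISK_GROUP_IOPS:
    print("oversubscribed on paper -- check whether the peak windows actually overlap:")
    for name, iops, window in tenants:
        print(f"  {name}: {iops:,} IOPS during {window}")
```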
Would requesting separate logical drives for different functions (data vs log vs tempdb) help here? Would the SAN see the different IO activity on these and optimally configure them differently?
For the HP arrays, you'd need to put the different I/O patterns into different disk groups, not just different LUNs. Database I/O patterns shouldn't coexist with web-serving access patterns, for instance. Different LUNs won't markedly improve your performance unless they're in different disk groups. If they're in the same disk group, the only real advantage is to the operating system, which can do I/O scheduling in the kernel to improve parallelism to the disk subsystem. That said...
The HP arrays, to my understanding anyway, are aware of different access patterns on LUNs, but pay close attention to actual logical blocks. Putting the logs on a different LUN puts a bound on the logical blocks that'll get that kind of I/O traffic, and that'll ease the task of correctly sorting logical blocks on the physical disks.
We're in a bit of a space crunch right now. Application teams being told to trim data archives, etc. Would space concerns cause the SAN team to make different decisions on how they configure internal storage (RAID levels, etc) that could impact my server's performance?
Definitely. If space is tight, you're not going to get dedicated disk groups for your I/O (unless your storage environment is large enough to justify dedicating 7TB of physical disk for your exclusive use, at which point that just may be the case). The RAID 5/RAID 10 debate depends in large part on the policies of the organization, and asking is your best bet.
Solution 4:
I suggest opening up a dialog with your SAN team and vendor to address your concerns. One of the problems you're going to have with running your own benchmarks is that your tests may not have any bearing on what happens in production, particularly at peak loads. Most SANs have tons of battery-backed cache, which in many cases (particularly when you run synthetic benchmarks) means that you're writing to RAM and getting kick-ass performance.
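One crude way to see whether cache is flattering your benchmark is to keep writing well past the array's cache size and watch the per-interval throughput; if the first few gigabytes are dramatically faster than the rest, you were measuring the controller's RAM, not the disks. A sketch with a hypothetical path and example sizes:

```python
# Keep writing past the array's cache size and report throughput per gigabyte,
# so a drop-off in the later intervals shows where cache stops helping.
import os
import time

TEST_FILE = r"D:\sqltest\cache_probe.dat"   # hypothetical path on the SAN volume
BLOCK = b"\0" * (1024 * 1024)               # 1 MiB writes
TOTAL_GB = 32                                # aim well past the array's cache size (example)

with open(TEST_FILE, "wb", buffering=0) as f:
    for gb in range(1, TOTAL_GB + 1):
        start = time.perf_counter()
        for _ in range(1024):                # 1 GiB per interval
            f.write(BLOCK)
        os.fsync(f.fileno())
        rate = 1024 / (time.perf_counter() - start)
        print(f"GB {gb:2d}: {rate:6.0f} MB/s")
os.remove(TEST_FILE)
```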
Depending on your environment and the solution you're using, some vendor CE may have just flown in and set up the SAN to whatever standard he prefers. That happens more often than you think. You're going to have to chip away at the "the SAN team knows all" shell until you have confidence that the solution is meeting your requirements.
Good luck.
Solution 5:
I was at an Oracle conference once with a talk on this topic: sane SAN for databases.
The gist of the talk is available in this PDF file or at the author's site here.