Solution 1:

The big one that's likely to kill you is disk IO. Collecting both the transactions per second and the sectors read/written per second will give you a start on determining what you'll need on the SAN. Keep an eye on memory and pagefile usage too: heavy paging can do bad things to your disk IO stats, and provisioning your VMs with some extra memory is simple.

Network is probably the next most important one, but that's pretty simple -- aggregate transfer and packets per second, make sure it's not too ridiculous.

CPU is the least likely bottleneck on a modern system, in my experience. I'd be inclined not to worry about it unless you've got multiple machines that are pegging their CPU consistently. Provisioning an extra VM server if you're running out of CPU is straightforward.
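As a rough illustration of turning collected disk IO samples into a SAN sizing figure, here is a minimal Python sketch. The server names and sample values are made up, and it assumes every server was sampled at the same instants, so that adding the samples together approximates the combined load the SAN would see.

```python
import math
from statistics import mean


def summarize(samples):
    """Return the average, nearest-rank 95th percentile, and peak of a sample list."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank percentile, 1-based
    return {"avg": mean(ordered), "p95": ordered[rank - 1], "peak": ordered[-1]}


def combined_load(per_server):
    """Add up samples taken at the same instants across all servers."""
    return [sum(values) for values in zip(*per_server.values())]


# Hypothetical transactions-per-second samples, one list per server.
iops = {
    "web01": [120, 150, 900, 130],
    "db01": [400, 420, 1500, 410],
}

total = combined_load(iops)
summary = summarize(total)
print(summary)
```

Sizing against the peak or the 95th percentile rather than the average is a judgment call. The average alone will hide the bursts that actually hurt.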

Solution 2:

After a bit more research, I think this is a good generic list of counters:

Logical Disk

  • Avg Disk sec/Read
  • Avg Disk sec/Write
  • % Idle Time

Memory

  • % Committed Bytes in Use
  • Available MBytes
  • Free System Page Table Entries
  • Pages/sec
  • Pool NonPaged Bytes
  • Pool Paged Bytes

Network

  • Bytes Total/sec
  • Output Queue Length

Physical Disk

  • % Idle Time
  • Avg Disk sec/Read
  • Avg Disk sec/Write
  • Avg Disk Queue Length
  • Avg Disk Bytes/sec

Process

  • Handle Count
  • Private Bytes
  • Thread Count

Processor

  • % Interrupt Time
  • % Processor Time
  • % User Time

System

  • Processor Queue Length

Terminal Server (Optional)

  • Active Sessions
  • Inactive Sessions
  • Total Sessions
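If you would rather log these from the command line than click through perfmon, something like the following Python sketch builds a `typeperf` command for a few of the counters above. It only assembles the argument list and does not run anything. The helper names, sample interval, sample count, and output file name are my own placeholders, and the exact counter spellings on your system (for example "Avg. Disk sec/Read", with a period) should be checked with `typeperf -q` first.

```python
def counter_path(obj, counter, instance=None):
    """Build a perfmon counter path such as \\Processor(_Total)\\% Processor Time."""
    inst = f"({instance})" if instance else ""
    return f"\\{obj}{inst}\\{counter}"


def typeperf_args(paths, interval_s=15, samples=240, out_file="baseline.csv"):
    """Argument list for typeperf: sample interval, sample count, CSV output file."""
    return [
        "typeperf", *paths,
        "-si", str(interval_s),
        "-sc", str(samples),
        "-f", "CSV",
        "-o", out_file,
    ]


# A small subset of the list above; spellings are assumed, verify with `typeperf -q`.
paths = [
    counter_path("PhysicalDisk", "Avg. Disk sec/Read", "_Total"),
    counter_path("Memory", "Available MBytes"),
    counter_path("Processor", "% Processor Time", "_Total"),
    counter_path("System", "Processor Queue Length"),
]

args = typeperf_args(paths)
print(" ".join(f'"{a}"' if " " in a else a for a in args))
```

On the Windows box itself the list could be handed to `subprocess.run(args)`. With the defaults here, that would be one hour of samples at 15-second intervals.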

Solution 3:

For disk-bound workloads I like to monitor '\PhysicalDisk( ... )\Current Disk Queue Length' for each physical disk.

For your problem viewing things with perfmon: although this might be outside the scope of what you are doing, I monitor Windows counters with Nagios using the check_nt plugin, with nsclient++ installed on the client. I can then graph everything using n2rrd, and I can also use rrdtool to create custom graphs.

All the stuff you listed is often run in a VMware/SAN environment. It is really just a question of how powerful the SAN and virtual server will need to be, and of getting the architecture right. If you are willing to spend the cash for an expensive SAN, the vendors should be able to tell you what you need.

Solution 4:

Depending on your usage, disk IO and network are likely to be the biggest causes for concern in moving to a VMWare type infrastructure, especially if your VMs are being stored on the SAN. You should definitely be assessing network usage and disk IO for all the machines you would migrate. Most servers for VMWare type usage should come with a nice number of NICs; however, it's still worth bearing in mind how many you will be able to use, as well as the speed of the disks on the SAN. VMWare ESX supports the ability to not write all disk changes back to the VM immediately, so you can save on some performance that way.

To measure performance we used RRDTool, as Kyle said; it is really useful.

Solution 5:

Virtual machines are not like typical servers, in that you run into problems in different areas. Most of the time, CPU isn't the bottlenecking resource, but RAM is. The things to really know before you go in:

  • Disk throughput: How hard do you pound your storage? Know your MB read and MB written, both average and peak (as mentioned elsewhere in this thread, RRDTool is good for this). Do you know when your peaks are, and whether or not they'll coincide with I/O peaks on other VMs stored on the same ESX cluster? In our environment backups are the peak I/O time, but we get bursts during the day. The answer to this will tell you whether you can get away with file-backed disks, or whether you have to direct-present LUNs to VMs.
  • Network throughput: Know how fast you need to be. As above, backups are when we start attempting to saturate our NICs. Know how much data you're pounding out. I'm pretty sure there are NICs out there that can do VLAN tagging, which can ease load-balancing problems if your network infrastructure supports it.
  • RAM creep: Know your programs. We have one that will consume every bit of memory given to it, which causes the VMWare console to whine and complain about usage and recommend giving it more. If you're not as tragically underfunded as we are, hopefully your ESX servers will be provisioned with a lot of RAM. In our environment, we consider a VM to be 'piggy' if it needs over 1GB of RAM. Yours may be different.
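The "do your peaks coincide" question can be checked mechanically once you have per-VM throughput history. Below is a minimal Python sketch under some assumptions of mine: the VM names and numbers are invented, each list holds one MB/s figure per time slot, and a VM counts as "near peak" when it is at or above 80% of its own maximum.

```python
def peak_slots(series, threshold=0.8):
    """Indices of the time slots where a VM is at or above threshold * its own peak."""
    peak = max(series)
    if peak == 0:
        return set()  # an idle VM has no meaningful peak
    return {i for i, value in enumerate(series) if value >= threshold * peak}


def coinciding_peaks(per_vm, threshold=0.8):
    """Map each time slot to the VMs near peak in it, keeping slots with two or more."""
    busy = {}
    for vm, series in per_vm.items():
        for slot in peak_slots(series, threshold):
            busy.setdefault(slot, []).append(vm)
    return {slot: sorted(vms) for slot, vms in busy.items() if len(vms) > 1}


# Hypothetical MB/s figures, one value per time slot for each VM.
mb_per_s = {
    "mail": [5, 6, 40, 8],
    "files": [3, 4, 35, 30],
    "web": [10, 2, 2, 2],
}

overlap = coinciding_peaks(mb_per_s)
print(overlap)
```

Slots that show up in the result are the ones where the shared storage has to absorb several peaks at once, which is the case to size for.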

Determining whether you can use file-backed disks or require direct-presented LUNs can take a bit of knowing. Direct-presented LUNs are where your storage array presents LUNs directly to VMs, which is made easier by using NPIV. You can do it without NPIV, but it may be too perilous for your blood. All brand-new Fibre Channel hardware should support NPIV, and ESX 3.5 certainly does. Direct presentation removes a layer of abstraction between the storage array and the virtual machine pounding I/O, and in that sense it can provide better performance. However, direct presentation is trickier to set up and has a higher start-up time in the "wrap your head around it" stage.

File-backed disks are just plain easier. Plus, they can be moved between storage arrays pretty simply (for certain values of simple, where copying multi-GB files is concerned), something that direct presentation requires (usually very expensive) array-level replication software to accomplish. Low I/O load things work just peachy on file-backed disks, and even some higher I/O things as well. We're running a full Exchange 2007 install for over 3000 users on file-backed disks. The backups could be faster, but during the day the users don't notice any slow-downs.