Single-thread app 50% slower on VMware X5650 than physical E5450
Depending on the type of virtualization, 5% overhead is pretty much the best-case scenario. With full paravirtualization, you can achieve that overhead on IO-light workloads quite easily. With hardware-assisted virtualization (the technology used by VMware), such a low overhead is possible on IO-light workloads on a hypervisor running few VMs. With full virtualization (no CPU extensions), 5% overhead is pretty much a dream.
Keep in mind that this depends on many factors. Virtualization tends to add a significant amount of latency between the disks and the guest OS. This increases IO wait, and therefore load averages, while CPU usage stays rather low. If your storage is on the lower side of the IOPS scale, the impact will be very big. If you are using network storage, latency will almost always be higher, because each IO has to cross a network instead of just an internal bus.
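To check whether storage latency is what hurts you, you can time small synchronous writes on both the physical box and the VM. The following is only a rough sketch: the function names, the 4 KiB block size and the 50 writes are my own arbitrary choices, and a dedicated tool will give better numbers. It is still enough to see whether the two machines are in the same ballpark.

```python
import os
import statistics
import tempfile
import time


def measure_fsync_latency(directory, writes=50, block_size=4096):
    """Return the latency in seconds of each small write followed by fsync.

    Hypothetical helper for illustration; writes and block_size are
    arbitrary defaults, not values taken from the original question.
    """
    latencies = []
    payload = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push the data to the storage device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to median, 95th percentile and max, in ms."""
    ordered = sorted(latencies)
    return {
        "median_ms": statistics.median(ordered) * 1000,
        "p95_ms": ordered[int(0.95 * (len(ordered) - 1))] * 1000,
        "max_ms": ordered[-1] * 1000,
    }


if __name__ == "__main__":
    # Point this at a directory on the storage your application really uses.
    print(summarize(measure_fsync_latency(tempfile.gettempdir())))
```

Run it against the directory your application actually writes to. A median that is several times higher on the VM points at the storage path rather than at the CPU.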
Virtualization can also add extra network latency if you use special network components such as virtual switches, but this is usually not very significant.
Virtualization tends to add many extra interrupts, which are required to switch from one VM to another. Depending on the hypervisor's scheduler, this can be significant. There isn't much you can do about it, since it is simply in the nature of virtualization, but it is worth keeping in mind as an explanation for lower performance.
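If you want a rough feel for how often your single thread gets paused, you can spin in a tight loop and record every gap between two consecutive clock reads that is longer than some threshold. This is only a sketch: `find_stalls` and its 1 ms threshold are my own illustrative choices, and from inside the guest a gap cannot tell you whether the guest OS or the hypervisor took the CPU away.

```python
import time


def find_stalls(duration_s=1.0, threshold_s=0.001):
    """Busy-loop for duration_s and return every gap longer than threshold_s.

    Hypothetical helper for illustration. A gap means the thread did not
    run for that long; it may have been preempted by the guest OS or the
    vCPU may have been descheduled by the hypervisor.
    """
    stalls = []
    end = time.perf_counter() + duration_s
    prev = time.perf_counter()
    while prev < end:
        now = time.perf_counter()
        if now - prev > threshold_s:
            stalls.append(now - prev)
        prev = now
    return stalls


if __name__ == "__main__":
    stalls = find_stalls(duration_s=5.0)
    total_ms = sum(stalls) * 1000
    print(f"{len(stalls)} stalls over 1 ms, {total_ms:.1f} ms lost in total")
```

Comparing the output on the physical machine and on the VM, both otherwise idle, gives an idea of how much time the scheduler is taking from you.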
Due to the single-threaded nature of your application, having more cores will yield no significant performance improvement. Both CPUs have similar frequencies, but note that without "Turbo Boost" the X5650 has the lower clock (2.66 GHz base against 3.0 GHz for the E5450). You may want to check that the feature is supported and enabled in your setup.
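A quick way to compare raw single-core speed on the two machines, with no IO involved, is a small CPU-bound loop timed several times. This is a sketch with made-up names (`cpu_bound_work`, `best_of`) and an arbitrary workload, not a real benchmark, but if the VM is much slower even here, the problem is not your storage.

```python
import time


def cpu_bound_work(iterations):
    """Pure integer arithmetic on one thread, no IO and no allocation."""
    total = 0
    for i in range(iterations):
        total = (total + i * i) % 1000003
    return total


def best_of(runs=5, iterations=2_000_000):
    """Return the fastest wall-clock time in seconds over several runs.

    Taking the minimum filters out runs that were disturbed by other
    activity on the machine.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        cpu_bound_work(iterations)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    print(f"best run: {best_of():.3f} s")
```

Run it with the same Python version on both hosts. If the ratio between the two results is close to the ratio of the clock speeds, the CPU is behaving as expected and Turbo Boost is the first thing to look at.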
A 33% overhead on an IO-intensive workload is, I find, not so bad. Try separating the storage for your two VMs and see if it helps.