What are the limits of running NTP servers in virtual machines?
The simple fact is that clock accuracy within a VM is still really bad. This comes from a few sources, but the killer is that the time drift is not constant; the drift rate changes from moment to moment. NTP has clock compensation built in, but it was designed around a roughly static drift rate. For example, if a physical machine loses 12 seconds every 30 days, NTP can compensate for that and does so very well. But if that machine can lose anywhere from 4 to 70 seconds every 30 days, NTP isn't so good at tracking that level of change.
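To put those numbers on the scale NTP actually works in, here is a rough sketch that converts "seconds lost per 30 days" into an average frequency error in parts per million (ppm). The function name `drift_ppm` is just an illustration, not part of any NTP tool.

```python
SECONDS_PER_DAY = 86_400


def drift_ppm(seconds_lost: float, days: float) -> float:
    """Average frequency error, in parts per million, over the given period."""
    return seconds_lost / (days * SECONDS_PER_DAY) * 1_000_000


# The steady physical machine from the example: 12 seconds per 30 days.
steady = drift_ppm(12, 30)

# The unpredictable machine: anywhere from 4 to 70 seconds per 30 days.
low, high = drift_ppm(4, 30), drift_ppm(70, 30)

print(f"steady:   {steady:.2f} ppm")              # steady:   4.63 ppm
print(f"variable: {low:.2f} to {high:.2f} ppm")   # variable: 1.54 to 27.01 ppm
```

A constant 4.63 ppm is easy to correct for once it has been measured. A rate that wanders between roughly 1.5 and 27 ppm is a moving target.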
What makes it really hard for NTP to keep up in a VM environment is that the local clock it sees can change its drift factor over the course of a minute. Depending on how often it checks its parent time sources, this can cause major swings in the drift factor it calculates and push it out-of-sync far more often. Out-of-sync time then cascades throughout your organization.
NTP for a local network is a relatively low impact protocol with a very small memory footprint, and can happily piggy-back on your other network infrastructure servers like your DNS and DHCP servers. Some routers can also provide NTP functionality, so you may want to look into that.
Ideally you want two separate servers in separate locations that each sync against a different set of upstream servers (ones closer to the reference clock, i.e. with a lower stratum number). It would also be a very good idea if both time-servers were configured to use the other server as a 'peer', which will minimize the impact to time-service should one of the upstream time-sources go awry; there will be a stratum change but at least it won't report out-of-sync. And finally, be nice to your upstream time providers and configure your servers to go a very long time between polls once time is well established. This is the 'maxpoll' parameter on the 'server' line; its value is an exponent, so the longest interval between sync attempts is 2 to that power, in seconds.
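As a sketch of what that looks like for one of the two servers, the snippet below builds the relevant ntp.conf lines: one 'server' line per upstream with a 'maxpoll' value, plus a 'peer' line for the other local time-server. The hostnames and the maxpoll value of 12 are placeholders I made up; the upper bound of 17 is the limit documented for the reference ntpd, so treat it as an assumption for other implementations.

```python
def poll_seconds(exponent: int) -> int:
    """maxpoll/minpoll are exponents: the interval is 2**exponent seconds."""
    return 2 ** exponent


def render_ntp_conf(upstreams: list[str], peer: str, maxpoll: int = 12) -> str:
    """Build ntp.conf lines for one time-server (hypothetical helper)."""
    # Assumption: 17 is the largest maxpoll the reference ntpd accepts.
    if not 0 <= maxpoll <= 17:
        raise ValueError("maxpoll must be between 0 and 17")
    lines = [f"server {host} maxpoll {maxpoll}" for host in upstreams]
    lines.append(f"peer {peer}")
    return "\n".join(lines)


# Placeholder hostnames; the second server would list different upstreams
# and name this server as its peer.
conf = render_ntp_conf(
    ["0.upstream.example.net", "1.upstream.example.net"],
    peer="ntp2.example.net",
)
print(conf)
print(f"# longest poll interval: {poll_seconds(12)} seconds")
```

With maxpoll 12 the server polls each upstream at most once every 4096 seconds (a bit over 68 minutes) once its time has settled.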
If you absolutely had to use VMs for this, I'd set up no fewer than three such NTP servers. Each of those needs to be on a different host, and if possible in a different data-center. As with what I just suggested, they need different time-sources and should peer with each other. Then configure all of your NTP clients to use all three as parent sources. Make sure your maxpoll values are low enough that the servers never go more than an hour and a half between sync packets to off-network sources, and 30 minutes to on-network ones. Chances are good at least one of the three will be in-sync at any given time. Clients that can only talk to one time-host will just have to put up with the occasional out-of-sync event. Overall, time-quality in this scenario would not be as exact as it would be with physical servers.
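Since maxpoll is an exponent rather than a number of seconds, it takes a moment of arithmetic to turn "an hour and a half" and "30 minutes" into values. A minimal sketch, with a helper name (`max_poll_exponent`) of my own invention:

```python
def max_poll_exponent(limit_seconds: int) -> int:
    """Largest exponent n such that 2**n seconds does not exceed the limit."""
    if limit_seconds < 1:
        raise ValueError("limit must be at least one second")
    # For a positive integer, bit_length() - 1 is floor(log2(value)).
    return limit_seconds.bit_length() - 1


off_network = max_poll_exponent(90 * 60)   # an hour and a half
on_network = max_poll_exponent(30 * 60)    # 30 minutes

print(off_network, 2 ** off_network)   # 12 4096
print(on_network, 2 ** on_network)     # 10 1024
```

So maxpoll 12 (about 68 minutes) stays within the off-network limit and maxpoll 10 (about 17 minutes) within the on-network one. The next step up in each case (13 and 11) would overshoot.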
If I had to ball-park, I'd say your consensus time in the pure-VM environment would probably be within, oh, 30 to 100ms of true. In a purely physical environment, your consensus time would probably be within 10ms once the time servers had been up long enough for time to settle.
See the VMware timekeeping document. Running an NTP daemon in a VM is probably not a good idea, particularly if you need reliable time.
Unfortunately NTP and virtualisation do not go very well together. Clients are OK in most cases, but an NTP server (especially stratum 2 and above) generally won't work reliably on a virtual server.
I'm commenting from a Xen and Xen Enterprise perspective, but I believe VMware/KVM will be just the same.
Re different servers: yes, you are right, ideally they should be in different environments as well, so that temperature and humidity are not affecting the accuracy either, but I don't bother with that. Also don't forget that whatever you do, it still won't be as accurate as a proper atomic clock, so just accept this (very slight) deviation.