Why did robocopy cause my Windows 2012 server to hang last night?

It sounds like a network card driver issue. To check whether this is a bug with your dual-NIC setup, set the /IPG parameter to about 20 milliseconds and remove your /MT:128 parameter (/IPG and /MT cannot be used together). Based on the "switches I specified" line in your original post, it would look like this:

/MIR /COPYALL /R /W /IPG:20 /Z

The /IPG:20 (inter-packet gap, in milliseconds) slows the transfer considerably but provides stability.

The /Z (restartable mode) is important for copies over the network, in case of network disruptions (caused by bad cards, drivers, or by actual network issues) because it will allow the copy to pick up where it left off.
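If you script the copy, here is a minimal sketch of assembling that switch line while guarding against mixing /MT with /IPG. The function name, the paths, and the retry/wait values are placeholders I made up for illustration (your original post doesn't show the values for /R and /W), so substitute your own.

```python
def build_robocopy_args(source, dest, ipg_ms=20, retries=3, wait_s=10, threads=None):
    """Build a robocopy argument list for a throttled, restartable mirror.

    retries and wait_s are placeholder values (assumptions), not the
    values from the original post.
    """
    # robocopy does not allow /MT together with /IPG, so refuse that combination.
    if threads is not None and ipg_ms:
        raise ValueError("/MT cannot be combined with /IPG")

    args = ["robocopy", source, dest,
            "/MIR", "/COPYALL",
            f"/R:{retries}", f"/W:{wait_s}",
            "/Z"]
    if ipg_ms:
        args.append(f"/IPG:{ipg_ms}")    # inter-packet gap in milliseconds
    if threads is not None:
        args.append(f"/MT:{threads}")    # multithreaded copy, only without /IPG
    return args


if __name__ == "__main__":
    # Hypothetical source and destination paths.
    cmd = build_robocopy_args(r"D:\data", r"\\backupsrv\share\data")
    print(" ".join(cmd))
```

The returned list can be handed to subprocess.run on the Windows box; passing threads=128 together with the default gap raises an error instead of producing a command robocopy would reject.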

If this completes successfully, you've got an issue with your network driver: whatever driver you're using can't handle the throughput of /IPG:0.
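One way to judge "completes successfully" from a script is robocopy's exit code, which is a set of bit flags where any value of 8 or higher means at least one failure. A rough sketch of decoding it follows; the helper names are my own, not anything robocopy provides.

```python
# Robocopy exit code bit flags (the descriptions are paraphrased).
EXIT_FLAGS = {
    1: "one or more files were copied",
    2: "extra files or directories exist in the destination",
    4: "mismatched files or directories were found",
    8: "some files or directories could not be copied",
    16: "serious error, nothing was copied",
}


def describe_exit_code(code):
    """Return a human-readable description for each flag set in the exit code."""
    if code == 0:
        return ["no files copied, no failures"]
    return [text for bit, text in EXIT_FLAGS.items() if code & bit]


def copy_failed(code):
    """Any exit code of 8 or higher means at least one failure occurred."""
    return code >= 8


if __name__ == "__main__":
    for code in (0, 1, 3, 9, 16):
        print(code, copy_failed(code), describe_exit_code(code))
```

Keep in mind that a hung server never returns an exit code at all, so this only helps you tell a clean run apart from one that finished with copy errors.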

The final nail in the coffin for the NIC driver being the root cause of your server hanging would be to replace the card and rerun the command that caused the hang. Alternatively, you could unplug one of the connections so the multiplexing doesn't occur, then rerun the command that produced the error.

The suggestion came from cb42 on TechNet.

http://social.technet.microsoft.com/Forums/en-US/itprovistaapps/thread/9555a996-1301-4f68-b9d3-82a87fc6ba46/

...and ss64 rocks (just sayin!) http://ss64.com/nt/robocopy.html


It appears to me that Robocopy is A) buggy, and B) hooks into the kernel in some way that can make the entire system incredibly unstable when it bugs out. We've seen this happen quite often (especially with the /MT option) when syncing over reasonably high-speed WAN links (20 Mbps to 100 Mbps). So I'm pretty sure it's not a NIC driver struggling with traffic volume: we do things in production that abuse the NICs far worse than this, and we see this even with 10 Gbps LAN connections on Cisco UCS / VMware 5.5, with everything patched current and Robocopy v6.3.9600.17415 dated 10/28/2014.

I'd love it if somebody can definitively prove we're all doing something stupid, but it looks like Microsoft is just putting out some unbelievably dangerous code.