Using an i7 "gamer" CPU in an HPC cluster

I'm running the WRF weather model. It's a RAM-intensive, highly parallel application.

I need to build an HPC cluster for it, using a 10Gb InfiniBand interconnect.

WRF's performance doesn't depend so much on core count as on memory bandwidth. That's why a Core i7-3820 or i7-3930K can perform better than high-grade Xeon E5-2600 or E7 parts.
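
Since the argument rests on memory bandwidth rather than core count, a quick sanity check on any candidate node is to time a large streaming array operation. Here's a minimal sketch, assuming Python with numpy is available, of a crude STREAM-style "copy" measurement; the array size is an assumption, and this is no substitute for running WRF itself:

```python
# Rough memory-bandwidth sanity check: a crude numpy stand-in for the
# STREAM "copy" kernel. N is an assumption -- make the arrays much larger
# than the CPU caches but small enough to fit in RAM.
import time
import numpy as np

N = 128_000_000                # ~1 GB per float64 array, ~2 GB total
src = np.random.rand(N)
dst = np.zeros(N)

np.copyto(dst, src)            # warm-up pass: faults in all pages first

start = time.perf_counter()
np.copyto(dst, src)            # timed copy: read src, write dst
elapsed = time.perf_counter() - start

bytes_moved = 2 * N * 8        # one read stream + one write stream of float64
print(f"Approx. sustained bandwidth: {bytes_moved / elapsed / 1e9:.1f} GB/s")
```

Running that (or the real STREAM benchmark) on an i7 with 1600 MHz vs 2400 MHz RAM and on a Xeon node would show whether the bandwidth gap is actually there before buying a rack of anything.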

Universities seem to use the Xeon E5-2670 for WRF. It costs about $1500. The SPEC CPU2006 fp_rate WRF benchmark shows the $580 i7-3930K performing about the same with 1600 MHz RAM.

What's interesting is that the i7 can handle RAM up to 2400 MHz, which gives a significant performance increase for WRF; at that point it really outperforms the Xeon. Power consumption is a bit higher, but the difference still costs less than €20 a year. Even including the additional parts I'll need (PSU, InfiniBand adapter, case), the i7 route is still about €700 per CPU cheaper than the Xeon.
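
To make the per-node arithmetic explicit, here's a rough sketch using the CPU prices quoted above; the extra-parts prices are hypothetical placeholders, only there to show how the comparison is put together:

```python
# Back-of-the-envelope per-node cost comparison.
# CPU prices are the figures quoted above; the extra-parts prices are
# HYPOTHETICAL placeholders -- substitute real quotes before deciding.
xeon_cpu = 1500                    # Xeon E5-2670, approx. price
i7_cpu = 580                       # i7-3930K, approx. price

i7_extra_parts = {                 # parts the i7 build needs on top of the CPU
    "psu": 60,                     # placeholder
    "case": 50,                    # placeholder
    "infiniband_adapter": 120,     # placeholder
}

i7_node = i7_cpu + sum(i7_extra_parts.values())
xeon_node = xeon_cpu               # assumes chassis/PSU/IB costs are comparable

print(f"i7 node total:    {i7_node}")
print(f"Xeon node total:  {xeon_node}")
print(f"Savings per node: {xeon_node - i7_node}")
```

With those placeholder numbers the savings land around the €700-per-CPU figure above, but I'd plug in actual quotes before committing.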

So, is it OK to use "gamer" hardware in an HPC cluster, or should I do it properly with Xeons?

(This is not a critical application; I can tolerate downtime. I don't think I need ECC?)


Solution 1:

We did this in the high-frequency financial trading world for a short while (a close parallel to HPC, given the application architecture I was working with)...

Around early 2010, I was deploying to custom 3U rack-chassis, single-socket i7 "gaming systems" with 10GbE Solarflare NICs (using OpenOnload UDP kernel-bypass messaging) and/or InfiniBand interconnects.

I had no IPMI/out-of-band management, no power management, single power supplies and no hot-swappable parts. We used both SSDs and internal SAS disks at different points, but disk was not critical to the compute nodes. The operating systems were Fedora Linux with a highly-customized and tuned kernel.

This worked as a proof of concept, and was a holdover until we got a feel for how our trading applications would react in production with live data. However, as things grew, it became a management nightmare...

Issues like cooling, data center heat/space/density (these things were 3U boxes), support, and remote management ended up killing the effort. While the CPUs never technically failed, every other component had issues! And this was with a duty cycle of only 8 hours of daily production use...

What did we do long-term?

We abandoned the gamer PCs and went with proper purpose-built server hardware. Yes, this was a financial firm, so we didn't have budget limitations, but I still needed to be conscious of pricing, considering the potential scale of an unproven application. There are good servers in all price ranges, and if you're planning to scale up, deals can be worked out with manufacturers. You don't think the big HPC research lab clusters pay retail price for gear, do you? Neither did we...

So, if you're looking to do this, think about the big picture. If you're just thinking about using a desktop-grade CPU in otherwise server-grade hardware, it will work... But I would not recommend it for full-time use.

If you know the CPU limitations and availability issues going into this, then I can only offer a data point for consideration.

Solution 2:

The i7 can't use buffered ECC RAM and can't be installed in dual- or quad-socket configurations. That seems like reason enough not to use it - but of course, your needs might dictate otherwise.

Solution 3:

Personally, I would still lean towards the Xeon, as it is designed to sustain heavier concurrent load over longer periods. If you are running CPU-intensive work for extended stretches, say hours or weeks at a time, the Xeon has a much longer mean time to failure than the i7 does.

Other than that, I defer to @MDMarra's answer regarding the server load-out.