How does one cluster multiple machines to act as one to run multiple virtual machines on this single-seeming machine? [duplicate]

Solution 1:

The type of cluster that presents as a single operating system with lots of memory and many CPUs, and that can run whatever would normally run on the non-clustered version of that OS, is called a Single System Image (SSI). It takes multiple cluster nodes and does just what you describe: merges them into a single OS instance.

This is not commonly done because such a system is extremely hard to engineer correctly, and systems that cluster at the application level instead of the OS level are a lot easier to set up and often perform much better.

The reason for the performance difference comes down to assumptions. A process running on a single OS assumes all of its available resources are local. A cluster-ready process (such as a render farm) assumes that some resources are local and some are remote. Because of that difference in assumptions, the way resources are allocated is very different.

Taking a general-purpose single-node operating system like Linux and converting it into an SSI-style cluster takes a lot of reworking of kernel internals. Concepts such as memory locality (see also: NUMA) are extremely important on such a system, and the cost of migrating a process to a different CPU can be a lot higher. Secondly, CPU locality, a concept not really present in Linux, is also very important; if you have a multi-threaded process, having two of its threads running on one node and two on another can perform a lot slower than all four running on the same node. It is up to the operating system to make local-versus-remote choices for processes that are likely blind to such distinctions.
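To give a feel for what CPU and memory locality mean in practice (this is not how an SSI kernel handles it internally), here is a minimal Linux-only sketch that pins a process onto one NUMA node's cores with Python's os.sched_setaffinity; the core-to-node layout is an assumption about the hardware.

```python
import os

# Hypothetical layout: cores 0-3 sit on NUMA node 0, cores 4-7 on node 1.
# Keeping all of a process's threads on one node keeps its memory local;
# letting them spread across nodes turns memory accesses into remote ones.
NODE0_CORES = {0, 1, 2, 3}  # illustrative core IDs, adjust for real hardware

def pin_to_node0():
    # Restrict the calling process (pid 0 means "this process") to node 0.
    os.sched_setaffinity(0, NODE0_CORES)
    print("now restricted to cores", sorted(os.sched_getaffinity(0)))

if __name__ == "__main__":
    pin_to_node0()
```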

However, if you have a cluster-ready application (such as those listed by Chopper), the application itself makes the local/remote decisions. It is fully aware of the local-versus-remote implications of its operations and acts accordingly.
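As a rough sketch of what "the application makes the local/remote decisions" looks like, here is a minimal MPI-style example; it assumes an MPI runtime and the mpi4py package are installed, and the sum-of-squares job is purely illustrative.

```python
# Run with something like:  mpirun -n 4 python sum_squares.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # which process/node am I?
size = comm.Get_size()   # how many processes are cooperating?

N = 1_000_000
# Each rank computes only its own slice -- an explicit local/remote decision
# made by the application itself, not by the operating system.
local_sum = sum(i * i for i in range(rank, N, size))

# Only the tiny partial results ever cross the (comparatively slow) network.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print("sum of squares:", total)
```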

Solution 2:

Note: I'm not an expert in this topic.

The way I understand you, you're interested in high-performance computing clusters (as opposed to other cluster approaches like high-availability or load-balancing clusters). What you probably want is supercomputing, grid computing, or distributed computing.

How does one take multiple computers and make them act as one, such that all their processors and memory are combined, and you can run any application as if you were running it on a single very fast computer?

Without specialized hardware (see for example Torus interconnect or InfiniBand), you're limited to connecting the computers using Ethernet (meaning you can do either distributed or grid computing). But you should not forget or underestimate the speed difference between a local high-speed computer bus and Ethernet!
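To put a rough number on that difference, here is a small back-of-the-envelope calculation; the bandwidth figures are assumed ballpark values (a DDR4-class memory channel versus gigabit Ethernet), not measurements.

```python
# Assumed ballpark figures, for intuition only.
MEM_BANDWIDTH_GBPS = 25.0   # local memory bus, ~GB/s
ETH_BANDWIDTH_GBPS = 0.125  # gigabit Ethernet, ~GB/s (1 Gbit/s)

data_gb = 4.0  # moving a 4 GB working set
print(f"local memory:     {data_gb / MEM_BANDWIDTH_GBPS:.2f} s")
print(f"gigabit Ethernet: {data_gb / ETH_BANDWIDTH_GBPS:.1f} s")
# Roughly 0.16 s versus 32 s: about two hundred times slower over the wire.
```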

Now, whether grid or distributed computing is something you should strive for depends heavily on the tasks you want to accomplish. With a bottleneck like Ethernet, grid or distributed computing only makes sense for tasks/applications that don't need to be very responsive and that do very computation-intensive work, which (broadly speaking) disqualifies any application that isn't of a scientific nature. The application should also be programmed in a way that lets it fully take advantage of the distributed nature of its hosts.
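As a crude illustration of that trade-off (the model and numbers here are assumptions, not a real benchmark), a job is only worth distributing over Ethernet when the compute time saved outweighs the cost of shipping its data across the network:

```python
def distributed_time(compute_s, data_gb, nodes,
                     eth_gbps=0.125, latency_s=0.001):
    # Crude model: compute shrinks ~linearly with node count, but the input
    # data must first cross Ethernet, plus some fixed coordination latency.
    return compute_s / nodes + data_gb / eth_gbps + latency_s

# Compute-heavy job: an hour of work on 100 MB of input -- distributing wins.
print(distributed_time(3600, 0.1, 8))   # ~450.8 s instead of 3600 s
# Responsive job: 50 ms of work on 1 GB of input -- distributing loses badly.
print(distributed_time(0.05, 1.0, 8))   # ~8.0 s instead of 0.05 s
```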

If you're still interested, here is a list of compatible operating systems: Single system image