Is it possible to create a faster computer from many computers?
My post below was (I think rather unfairly) closed on Stack Overflow, hence my reposting here.
Original Question: How can I use several computers to create a faster environment? I have about 12 computers, each with 4 GB of RAM and a 2 GHz CPU. I need to run some time-consuming data transformations and would like to use the combined power of these machines. They are all running Windows Server 2003.
Basically we have a large number of video files that we need to transform so our analysts can do their analysis. The problem is complicated by the fact I can't tell you more about the project.
Original Post: https://stackoverflow.com/questions/1126710/is-it-possible-to-create-a-faster-computer-from-many-computers
What you're asking about is at the crux of a major question in computing today. Individual processing cores aren't going to get much faster, so we need programmers to start writing code that breaks larger problems down into smaller problems that can be processed in parallel on multiple computers.
Short answer for you: If your processing software already allows you to break jobs apart and run them in parallel, then do that. If it doesn't, then talk to the people who wrote it about re-tooling it to work in a more parallel fashion.
The mechanics of getting the data out to the individual computers, starting up parallel tasks, making sure the jobs actually finished, and bringing the data back is a lot of what the cluster management software that other posters are mentioning does. There are some non-trivial problems involved, but in general cluster management software is about job scheduling and resource management. The cluster management software doesn't handle actually doing the parallel work; that's what your "processing software" is going to have to do.
There's no "magic" that you can throw at a group of computers to make them "act like" a single faster machine. You're not going to get out of this without software that's built from the start to take advantage of multiple processors.
To combine the processing power of multiple machines, you're going to want to run some sort of clustering software; the result is often called a compute or computing cluster. Some examples of applications that can do this are:
- Microsoft Compute Cluster
- Windows High Performance Cluster
- Beowulf Cluster
These are, however, complicated bits of software. Creating a usable high-performance cluster is a complicated and potentially expensive job and should not be undertaken lightly. You're also going to need special software that can run on a cluster to do your work. You can't simply connect a bunch of Windows computers together, install a standard bunch of Windows applications on them, and magically have a cluster.
A cluster of computers works well when you can answer yes to this question:
Could I have someone sit at each computer and do part of the problem, and would that speed things up? In other words, can the problem be split into chunks that are basically independent?
Given that you describe the problem as "a large number of video files to transform", yes, this would work on a cluster.
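As a rough illustration of what "basically independent" chunks look like, here is a minimal Python sketch that deals a list of files out to 12 machines round-robin. The function name and the file names are made up for the example, and Python is just my choice of language; nothing about the actual transform is assumed.

```python
def split_round_robin(files, machine_count):
    """Deal files out like cards: file 0 to machine 0, file 1 to machine 1, and so on."""
    if machine_count < 1:
        raise ValueError("machine_count must be at least 1")
    # files[i::machine_count] picks every machine_count-th file starting at index i.
    return [files[i::machine_count] for i in range(machine_count)]


# Hypothetical file names: 30 videos dealt out to 12 machines.
videos = ["video_%03d.avi" % n for n in range(30)]
batches = split_round_robin(videos, 12)
print(len(batches), "batches; first batch:", batches[0])
```

A fixed split like this works best when the files take roughly the same time to transform. If they vary a lot, some machines will finish early and sit idle while others are still working.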
Microsoft does have clustering software that I know nothing about. It might be the way to go.
Probably easier would be:
Dedicate one computer as the master. All the rest are clients.
Put all your video files on this master in a folder.
Make a shared folder per client and have each client mount that shared folder.
The clients run a script in some Windows scripting language (Perl would work) that wakes up every so often. If a file is in the shared folder, the script renames it to work_yourfilenamehere and runs your transform; when the transform is done, it renames the file to done_yourfilenamehere. If the folder is empty, it just waits for 60 secs or so and checks again.
The master computer runs a script that looks in each of the shared folders. If a folder has nothing in it, the script puts one file to be transformed there. If a folder has a file named done_yourfilenamehere, the script moves it to the done folder. Loop until nothing is left in the master folder.
Basically, all the clients are kept as busy as possible, with each client taking whatever time it needs to transform each file.
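The polling scheme above could be sketched roughly like this. I've used Python rather than Perl, and this is only a sketch under a few assumptions: `transform` is a placeholder for whatever command or function does your real work, the function and folder names are invented, no original file name starts with work_ or done_, and the master folder and the shared folders sit on the same volume so that moving a file is a quick rename rather than a slow copy the client might see half-finished.

```python
import os
import shutil
import time

WORK_PREFIX = "work_"
DONE_PREFIX = "done_"


def client_step(shared_dir, transform):
    """One client pass: claim a waiting file, transform it, mark it done.
    Returns True if a file was processed, False if there was nothing to do."""
    for name in sorted(os.listdir(shared_dir)):
        if name.startswith((WORK_PREFIX, DONE_PREFIX)):
            continue  # already in progress or already finished
        working = os.path.join(shared_dir, WORK_PREFIX + name)
        os.rename(os.path.join(shared_dir, name), working)
        transform(working)  # placeholder for the real video transform
        os.rename(working, os.path.join(shared_dir, DONE_PREFIX + name))
        return True
    return False


def master_step(master_dir, shared_dirs, done_dir):
    """One master pass: collect finished files and feed idle clients.
    Returns True once the master folder and every shared folder are empty."""
    busy = False
    for shared in shared_dirs:
        for name in os.listdir(shared):
            if name.startswith(DONE_PREFIX):
                # Strip the done_ prefix when filing the result (my choice).
                target = os.path.join(done_dir, name[len(DONE_PREFIX):])
                shutil.move(os.path.join(shared, name), target)
        if os.listdir(shared):
            busy = True  # client still has a file waiting or in progress
            continue
        pending = sorted(os.listdir(master_dir))
        if pending:
            shutil.move(os.path.join(master_dir, pending[0]),
                        os.path.join(shared, pending[0]))
            busy = True
    return not busy and not os.listdir(master_dir)


def run_client(shared_dir, transform, poll_seconds=60):
    """Client loop: work when there is work, otherwise sleep and check again."""
    while True:
        if not client_step(shared_dir, transform):
            time.sleep(poll_seconds)


def run_master(master_dir, shared_dirs, done_dir, poll_seconds=60):
    """Master loop: keep handing out files until everything has come back."""
    while not master_step(master_dir, shared_dirs, done_dir):
        time.sleep(poll_seconds)
```

Each client would call `run_client` on its mounted shared folder, and the master would call `run_master` with the list of all shared folders. Because a client only ever holds one file at a time, a slow file on one machine doesn't hold up the others.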