How can my Go program keep all the CPU cores busy?

Solution 1:

From the Go FAQ:

Why doesn't my multi-goroutine program use multiple CPUs?

You must set the GOMAXPROCS shell environment variable or use the similarly-named function of the runtime package to allow the run-time support to utilize more than one OS thread.

Programs that perform parallel computation should benefit from an increase in GOMAXPROCS. However, be aware that concurrency is not parallelism.

(UPDATE 8/28/2015: As of Go 1.5, the default value of GOMAXPROCS is the number of CPUs on your machine, so this shouldn't be a problem anymore.)

And

Why does using GOMAXPROCS > 1 sometimes make my program slower?

It depends on the nature of your program. Problems that are intrinsically sequential cannot be sped up by adding more goroutines. Concurrency only becomes parallelism when the problem is intrinsically parallel.

In practical terms, programs that spend more time communicating on channels than doing computation will experience performance degradation when using multiple OS threads. This is because sending data between threads involves switching contexts, which has significant cost. For instance, the prime sieve example from the Go specification has no significant parallelism although it launches many goroutines; increasing GOMAXPROCS is more likely to slow it down than to speed it up.

Go's goroutine scheduler is not as good as it needs to be. In future, it should recognize such cases and optimize its use of OS threads. For now, GOMAXPROCS should be set on a per-application basis.

In short: it is very difficult to get Go to make efficient use of all your cores. Simply spawning a billion goroutines and increasing GOMAXPROCS is just as likely to degrade your performance as to speed it up, because the program will be switching thread contexts all the time. If you have a large program that is parallelizable, then increasing GOMAXPROCS to the number of parallel components works fine. If you have a parallel problem embedded in a largely non-parallel program, it may speed up, or you may have to make creative use of functions like runtime.LockOSThread() to ensure the runtime distributes everything correctly (generally speaking, Go just dumbly spreads currently non-blocking goroutines haphazardly and evenly among all active threads).
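As a rough illustration of the "parallelizable" case, here is a sketch that splits CPU-bound work into one contiguous chunk per worker instead of spawning a goroutine per item. The function name parallelSum and the summing workload are assumptions chosen for the example, not something from the question. Each goroutine accumulates locally and writes its result once, so there is no channel traffic in the hot loop.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelSum (hypothetical example) splits nums into at most `workers`
// contiguous chunks and sums each chunk in its own goroutine.
func parallelSum(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	partial := make([]int, workers) // one slot per worker, so no locking is needed
	chunk := (len(nums) + workers - 1) / workers

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(nums) {
			break // fewer items than workers: nothing left to hand out
		}
		hi := lo + chunk
		if hi > len(nums) {
			hi = len(nums)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			sum := 0
			for _, n := range nums[lo:hi] {
				sum += n
			}
			partial[w] = sum // single write per goroutine
		}(w, lo, hi)
	}
	wg.Wait()

	total := 0
	for _, s := range partial {
		total += s
	}
	return total
}

func main() {
	nums := make([]int, 10000)
	for i := range nums {
		nums[i] = i + 1
	}
	// One worker per thread the runtime is allowed to run in parallel.
	fmt.Println(parallelSum(nums, runtime.GOMAXPROCS(0))) // 50005000
}
```

Whether this is actually faster than a plain loop depends on the size of the input and the cost per item, so it is worth timing both.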

Also, GOMAXPROCS limits how many OS threads can execute Go code at the same time; it isn't strictly equal to the number of threads in the process. As far as I know the runtime does not clamp it to NumCPU, so you can set it higher, but the extra threads just compete for the same cores. I'm not 100% sure of exactly when the runtime decides to spawn new threads, but threads blocked in system calls do not count against the limit, and another instance is when the number of blocking goroutines using runtime.LockOSThread() is greater than or equal to GOMAXPROCS: it will spawn more threads than cores so it can keep the rest of the program running sanely.

Basically, it's quite simple to increase GOMAXPROCS and make Go use all cores of your CPU. It's quite another thing, at this point in Go's development, to get it to use all of those cores smartly and efficiently; that takes a lot of program design and finagling to get right.

Solution 2:

This question cannot be answered; it is much too broad.

Take your problem, your algorithm and your workload and measure what is best for this combination.

Nobody can answer a question like "Is there any heuristic that adding twice as much salt to my lunch will make it taste better?", as this depends on the lunch (tomatoes benefit much more from salt than strawberries), your taste, and how much salt there is already. Try it.

One more thing: runtime.GOMAXPROCS(runtime.NumCPU()) has achieved cult status, but controlling the number of threads by setting the GOMAXPROCS environment variable from the outside might be the much better option.

Solution 3:

runtime.GOMAXPROCS() sets the number of (virtual) CPU cores that your program can use simultaneously. Allowing Go to use more CPU cores than you actually have won't help, as your system only has so many CPU cores.

In order to run in more than one thread, your program has to have several goroutines, typically started with go someFunc(). If your program doesn't start any additional goroutines, it will naturally run in only one thread no matter how many CPUs/cores you allow it to use.
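Here is a small sketch of starting goroutines with the go keyword and waiting for them to finish. The names worker and squareAll are invented for illustration, and squaring a number stands in for real work; sync.WaitGroup is the standard way to wait for a group of goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// worker (hypothetical) does one unit of work and stores the result in
// its own slot of out, so goroutines never write to the same element.
func worker(i, n int, out []int, wg *sync.WaitGroup) {
	defer wg.Done()
	out[i] = n * n
}

// squareAll starts one goroutine per input value and waits for all of them.
func squareAll(in []int) []int {
	out := make([]int, len(in))
	var wg sync.WaitGroup
	for i, n := range in {
		wg.Add(1)
		go worker(i, n, out, &wg) // runs concurrently with the loop
	}
	wg.Wait() // block until every worker has called Done
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // [1 4 9 16]
}
```

Without the wg.Wait() call, squareAll could return before the goroutines have finished, and main exiting ends the program regardless of any goroutines still running.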

Check out this and the following exercises on how to create goroutines.