Difference between omp critical and omp single
I am trying to understand the exact difference between #pragma omp critical and #pragma omp single in OpenMP.
Microsoft definitions for these are:
- Single: Lets you specify that a section of code should be executed on a single thread, not necessarily the master thread.
- Critical: Specifies that code is only be executed on one thread at a time.
So it means that in both cases the section of code that follows is executed by just one thread, and other threads will not enter that section; e.g. if we print something, we will see the result on screen once, right?
How about the difference? It looks like critical takes care of the time of execution, but single does not! But I don't see any difference in practice! Does it mean that some kind of waiting or synchronization for the other threads (which do not enter that section) is considered in critical, but nothing holds the other threads in single? How can it change the outcome in practice?
I would appreciate it if anyone could clarify this for me, especially with an example. Thanks!
Solution 1:
single and critical are two very different things. As you mentioned:
- single specifies that a section of code should be executed by a single thread (not necessarily the master thread)
- critical specifies that code is executed by one thread at a time
So the former will be executed only once, while the latter will be executed as many times as there are threads.
For example the following code
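    #include <stdio.h>

    int main(void)
    {
        int a = 0, b = 0;
        #pragma omp parallel num_threads(4)
        {
            #pragma omp single
            a++;    /* executed once, by one thread */
            #pragma omp critical
            b++;    /* executed by every thread, one at a time */
        }
        printf("single: %d -- critical: %d\n", a, b);
        return 0;
    }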
will print
single: 1 -- critical: 4
I hope you see the difference better now.
For the sake of completeness, I can add that:
- master is very similar to single, with two differences:
  - master will be executed by the master thread only, while single can be executed by whichever thread reaches the region first; and
  - single has an implicit barrier upon completion of the region, where all threads wait for synchronization, while master doesn't have any.
- atomic is very similar to critical, but is restricted to a selection of simple operations.
I added these clarifications since these two pairs of constructs are the ones people tend to mix up.
Solution 2:
single and critical belong to two completely different classes of OpenMP constructs. single is a worksharing construct, alongside for and sections. Worksharing constructs are used to distribute a certain amount of work among the threads. Such constructs are "collective" in the sense that in correct OpenMP programs all threads must encounter them while executing and moreover in the same sequential order, also including the barrier constructs. The three worksharing constructs cover three different general cases:
- for (a.k.a. loop construct) automatically distributes the iterations of a loop among the threads - in most cases all threads get work to do;
- sections distributes a sequence of independent blocks of code among the threads - some threads get work to do. This is a generalisation of the for construct, as a loop with 100 iterations could be expressed as e.g. 10 sections of loops with 10 iterations each.
- single singles out a block of code for execution by one thread only, often the first one to encounter it (an implementation detail) - only one thread gets work. single is to a great extent equivalent to sections with a single section only.
A common trait of all worksharing constructs is the presence of an implicit barrier at their end. The barrier might be turned off by adding the nowait clause to the corresponding OpenMP construct, but the standard does not require such behaviour, and with some OpenMP runtimes the barrier might remain despite the presence of nowait. Incorrectly ordered (i.e. out of sequence in some of the threads) worksharing constructs might therefore lead to deadlocks. A correct OpenMP program will never deadlock when the barriers are present.
critical is a synchronisation construct, alongside master, atomic, and others. Synchronisation constructs are used to prevent race conditions and to bring order in the execution of things.
- critical prevents race conditions by preventing the simultaneous execution of code among the threads in the so-called contention group. This means all threads from all parallel regions encountering similarly named critical constructs get serialised;
- atomic turns certain simple memory operations into atomic ones, usually by utilising special assembly instructions. Atomics complete at once as a single non-breakable unit. For example, an atomic read from some location by one thread, which happens concurrently with an atomic write to the same location by another thread, will either return the old value or the updated value, but never some kind of an intermediate mash-up of bits from both the old and the new values;
- master singles out a block of code for execution by the master thread (thread with ID of 0) only. Unlike single, there is no implicit barrier at the end of the construct and also there is no requirement that all threads must encounter the master construct. Also, the lack of implicit barrier means that master does not flush the shared memory view of the threads (this is an important but very poorly understood part of OpenMP). master is basically a shorthand for if (omp_get_thread_num() == 0) { ... }.
critical is a very versatile construct as it is able to serialise different pieces of code in very different parts of the program, even in different parallel regions (significant in the case of nested parallelism only). Each critical construct has an optional name provided in parentheses immediately after. Anonymous critical constructs share the same implementation-specific name. Once a thread enters such a construct, any other thread encountering another construct of the same name is put on hold until the original thread exits its construct. Then the serialisation process continues with the rest of the threads.
An illustration of the concepts above follows. The following code:
    #pragma omp parallel num_threads(3)
    {
        foo();
        bar();
        ...
    }
results in something like:
thread 0: -----< foo() >< bar() >-------------->
thread 1: ---< foo() >< bar() >---------------->
thread 2: -------------< foo() >< bar() >------>
(thread 2 is purposely a latecomer)
Having the foo(); call within a single construct:
    #pragma omp parallel num_threads(3)
    {
        #pragma omp single
        foo();
        bar();
        ...
    }
results in something like:
thread 0: ------[-------|]< bar() >----->
thread 1: ---[< foo() >-|]< bar() >----->
thread 2: -------------[|]< bar() >----->
Here [ ... ] denotes the scope of the single construct and | is the implicit barrier at its end. Note how the latecomer thread 2 makes all other threads wait. Thread 1 executes the foo() call as the example OpenMP runtime chooses to assign the job to the first thread to encounter the construct.
Adding a nowait clause might remove the implicit barrier, resulting in something like:
thread 0: ------[]< bar() >----------->
thread 1: ---[< foo() >]< bar() >----->
thread 2: -------------[]< bar() >---->
Having the foo(); call within an anonymous critical construct:
    #pragma omp parallel num_threads(3)
    {
        #pragma omp critical
        foo();
        bar();
        ...
    }
results in something like:
thread 0: ------xxxxxxxx[< foo() >]< bar() >-------------->
thread 1: ---[< foo() >]< bar() >------------------------->
thread 2: -------------xxxxxxxxxxxx[< foo() >]< bar() >--->
Here xxxxx... shows the time a thread spends waiting for other threads executing a critical construct of the same name before it can enter its own construct.
Critical constructs of different names do not synchronise with each other. E.g.:
    #pragma omp parallel num_threads(3)
    {
        if (omp_get_thread_num() > 1) {
            #pragma omp critical(foo2)
            foo();
        }
        else {
            #pragma omp critical(foo01)
            foo();
        }
        bar();
        ...
    }
results in something like:
thread 0: ------xxxxxxxx[< foo() >]< bar() >---->
thread 1: ---[< foo() >]< bar() >--------------->
thread 2: -------------[< foo() >]< bar() >----->
Now thread 2 does not synchronise with the other threads because its critical construct is named differently, and it therefore makes a potentially dangerous simultaneous call into foo().
On the other hand, anonymous critical constructs (and in general constructs with the same name) synchronise with one another no matter where in the code they are:
    #pragma omp parallel num_threads(3)
    {
        #pragma omp critical
        foo();
        ...
        #pragma omp critical
        bar();
        ...
    }
and the resulting execution timeline:
thread 0: ------xxxxxxxx[< foo() >]< ... >xxxxxxxxxxxxxxx[< bar() >]------------>
thread 1: ---[< foo() >]< ... >xxxxxxxxxxxxxxx[< bar() >]----------------------->
thread 2: -------------xxxxxxxxxxxx[< foo() >]< ... >xxxxxxxxxxxxxxx[< bar() >]->