A. Examples

The following are examples of the constructs defined in this document. A statement following a directive is compound only when necessary, and a non-compound statement is indented from a directive preceding it.

A.1 A simple loop in parallel

The following example demonstrates how to parallelize a loop using the parallel for directive. The loop iteration variable is private by default, so it isn't necessary to specify it explicitly in a private clause.

#pragma omp parallel for
    for (i=1; i<n; i++)
        b[i] = (a[i] + a[i-1]) / 2.0;

A.2 Conditional compilation

The following examples illustrate the use of conditional compilation using the OpenMP macro _OPENMP. With OpenMP compilation, the _OPENMP macro becomes defined.

# ifdef _OPENMP
    printf_s("Compiled by an OpenMP-compliant implementation.\n");
# endif

The defined preprocessor operator allows more than one macro to be tested in a single directive.

# if defined(_OPENMP) && defined(VERBOSE)
    printf_s("Compiled by an OpenMP-compliant implementation.\n");
# endif

A.3 Parallel regions

The parallel directive can be used in coarse-grain parallel programs. In the following example, each thread in the parallel region decides what part of the global array x to work on, based on the thread number:

#pragma omp parallel shared(x, npoints) private(iam, np, ipoints)
{
    iam = omp_get_thread_num();
    np =  omp_get_num_threads();
    ipoints = npoints / np;
    subdomain(x, iam, ipoints);
}

A.4 The nowait clause

If there are many independent loops within a parallel region, you can use the nowait clause to avoid the implied barrier at the end of the for directive, as follows:

#pragma omp parallel
{
    #pragma omp for nowait
        for (i=1; i<n; i++)
             b[i] = (a[i] + a[i-1]) / 2.0;
    #pragma omp for nowait
        for (i=0; i<m; i++)
            y[i] = sqrt(z[i]);
}

A.5 The critical directive

The following example includes several critical directives. The example illustrates a queuing model in which a task is dequeued and worked on. To guard against many threads dequeuing the same task, the dequeuing operation must be in a critical section. Because the two queues in this example are independent, they're protected by critical directives with different names, xaxis and yaxis.

#pragma omp parallel shared(x, y) private(x_next, y_next)
{
    #pragma omp critical ( xaxis )
        x_next = dequeue(x);
    work(x_next);
    #pragma omp critical ( yaxis )
        y_next = dequeue(y);
    work(y_next);
}

A.6 The lastprivate clause

Correct execution sometimes depends on the value that the last iteration of a loop assigns to a variable. Such programs must list all such variables as arguments to a lastprivate clause so that the values of the variables are the same as when the loop is executed sequentially.

#pragma omp parallel
{
   #pragma omp for lastprivate(i)
      for (i=0; i<n-1; i++)
         a[i] = b[i] + b[i+1];
}
a[i]=b[i];

In the preceding example, the value of i at the end of the parallel region will equal n-1, as in the sequential case.

A.7 The reduction clause

The following example demonstrates the reduction clause:

#pragma omp parallel for private(i) shared(x, y, n) \
                         reduction(+: a, b)
    for (i=0; i<n; i++) {
        a = a + x[i];
        b = b + y[i];
    }

A.8 Parallel sections

In the following example (for section 2.4.2), functions xaxis, yaxis, and zaxis can be executed concurrently. The first section directive is optional. All section directives need to appear in the lexical extent of the parallel sections construct.

#pragma omp parallel sections
{
    #pragma omp section
        xaxis();
    #pragma omp section
        yaxis();
    #pragma omp section
        zaxis();
}

A.9 Single directives

The following example demonstrates the single directive. In the example, only one thread (usually the first thread that encounters the single directive) prints the progress message. The user must not make any assumptions as to which thread will execute the single section. All other threads will skip the single section and stop at the barrier at the end of the single construct. If other threads can proceed without waiting for the thread executing the single section, a nowait clause can be specified on the single directive.

#pragma omp parallel
{
    #pragma omp single
        printf_s("Beginning work1.\n");
    work1();
    #pragma omp single
        printf_s("Finishing work1.\n");
    #pragma omp single nowait
        printf_s("Finished work1 and beginning work2.\n");
    work2();
}

A.10 Sequential ordering

Ordered sections are useful for sequentially ordering the output from work that's done in parallel. The following program prints out the indexes in sequential order:

#pragma omp for ordered schedule(dynamic)
    for (i=lb; i<ub; i+=st)
        work(i);
void work(int k)
{
    #pragma omp ordered
        printf_s(" %d", k);
}

A.11 A fixed number of threads

Some programs rely on a fixed, prespecified number of threads to execute correctly. Because the default setting for the dynamic adjustment of the number of threads is implementation-defined, such programs can choose to turn off the dynamic threads capability and set the number of threads explicitly to keep portability. The following example shows how to do this using omp_set_dynamic, and omp_set_num_threads:

omp_set_dynamic(0);
omp_set_num_threads(16);
#pragma omp parallel shared(x, npoints) private(iam, ipoints)
{
    if (omp_get_num_threads() != 16)
      abort();
    iam = omp_get_thread_num();
    ipoints = npoints/16;
    do_by_16(x, iam, ipoints);
}

In this example, the program executes correctly only if it's executed by 16 threads. If the implementation isn't capable of supporting 16 threads, the behavior of this example is implementation-defined.

The number of threads executing a parallel region stays constant during a parallel region, regardless of the dynamic threads setting. The dynamic threads mechanism determines the number of threads to use at the start of the parallel region and keeps it constant for the duration of the region.

A.12 The atomic directive

The following example avoids race conditions (simultaneous updates of an element of x by many threads) by using the atomic directive:

#pragma omp parallel for shared(x, y, index, n)
    for (i=0; i<n; i++)
    {
        #pragma omp atomic
            x[index[i]] += work1(i);
        y[i] += work2(i);
    }

The advantage of using the atomic directive in this example is that it allows updates of two different elements of x to occur in parallel. If a critical directive is used instead, then all updates to elements of x are executed serially (though not in any guaranteed order).

The atomic directive applies only to the C or C++ statement immediately following it. As a result, elements of y aren't updated atomically in this example.

A.13 A flush directive with a list

The following example uses the flush directive for point-to-point synchronization of specific objects between pairs of threads:

int   sync[NUMBER_OF_THREADS];
float work[NUMBER_OF_THREADS];
#pragma omp parallel private(iam,neighbor) shared(work,sync)
{
    iam = omp_get_thread_num();
    sync[iam] = 0;
    #pragma omp barrier

    // Do computation into my portion of work array
    work[iam] = ...;

    //  Announce that I am done with my work
    // The first flush ensures that my work is
    // made visible before sync.
    // The second flush ensures that sync is made visible.
    #pragma omp flush(work)
    sync[iam] = 1;
    #pragma omp flush(sync)

    // Wait for neighbor
    neighbor = (iam>0 ? iam : omp_get_num_threads()) - 1;
    while (sync[neighbor]==0)
    {
        #pragma omp flush(sync)
    }

    // Read neighbor's values of work array
    ... = work[neighbor];
}

A.14 A flush directive without a list

The following example (for section 2.6.5) distinguishes the shared objects affected by a flush directive with no list from the shared objects that aren't affected:

// omp_flush_without_list.c
#include <omp.h>

int x, *p = &x;

void f1(int *q)
{
    *q = 1;
    #pragma omp flush
    // x, p, and *q are flushed
    //   because they are shared and accessible
    // q is not flushed because it is not shared.
}

void f2(int *q)
{
    #pragma omp barrier
    *q = 2;

    #pragma omp barrier
    // a barrier implies a flush
    // x, p, and *q are flushed
    //   because they are shared and accessible
    // q is not flushed because it is not shared.
}

int g(int n)
{
    int i = 1, j, sum = 0;
    *p = 1;

    #pragma omp parallel reduction(+: sum) num_threads(10)
    {
        f1(&j);
        // i, n and sum were not flushed
        //   because they were not accessible in f1
        // j was flushed because it was accessible
        sum += j;
        f2(&j);
        // i, n, and sum were not flushed
        //   because they were not accessible in f2
        // j was flushed because it was accessible
        sum += i + j + *p + n;
    }
    return sum;
}

int main()
{
}

A.15 The number of threads used

Consider the following incorrect example (for section 3.1.2):

np = omp_get_num_threads(); // misplaced
#pragma omp parallel for schedule(static)
    for (i=0; i<np; i++)
        work(i);

The omp_get_num_threads() call returns 1 in the serial section of the code, so np will always be equal to 1 in the preceding example. To determine the number of threads that will be deployed for the parallel region, the call should be inside the parallel region.

The following example shows how to rewrite this program without including a query for the number of threads:

#pragma omp parallel private(i)
{
    i = omp_get_thread_num();
    work(i);
}

A.16 Locks

In the following example (for section 3.2), the argument to the lock functions should have type omp_lock_t, and that there's no need to flush it. The lock functions cause the threads to be idle while waiting for entry to the first critical section, but to do other work while waiting for entry to the second. The omp_set_lock function blocks, but the omp_test_lock function doesn't, allowing the work in skip() to be done.

// omp_using_locks.c
// compile with: /openmp /c
#include <stdio.h>
#include <omp.h>

void work(int);
void skip(int);

int main() {
   omp_lock_t lck;
   int id;

   omp_init_lock(&lck);
   #pragma omp parallel shared(lck) private(id)
   {
      id = omp_get_thread_num();

      omp_set_lock(&lck);
      printf_s("My thread id is %d.\n", id);

      // only one thread at a time can execute this printf
      omp_unset_lock(&lck);

      while (! omp_test_lock(&lck)) {
         skip(id);   // we do not yet have the lock,
                     // so we must do something else
      }
      work(id);     // we now have the lock
                    // and can do the work
      omp_unset_lock(&lck);
   }
   omp_destroy_lock(&lck);
}

A.17 Nestable locks

The following example (for section 3.2) demonstrates how a nestable lock can be used to synchronize updates both to a whole structure and to one of its members.

#include <omp.h>
typedef struct {int a,b; omp_nest_lock_t lck;} pair;

void incr_a(pair *p, int a)
{
    // Called only from incr_pair, no need to lock.
    p->a += a;
}

void incr_b(pair *p, int b)
{
    // Called both from incr_pair and elsewhere,
    // so need a nestable lock.

    omp_set_nest_lock(&p->lck);
    p->b += b;
    omp_unset_nest_lock(&p->lck);
}

void incr_pair(pair *p, int a, int b)
{
    omp_set_nest_lock(&p->lck);
    incr_a(p, a);
    incr_b(p, b);
    omp_unset_nest_lock(&p->lck);
}

void f(pair *p)
{
    extern int work1(), work2(), work3();
    #pragma omp parallel sections
    {
        #pragma omp section
            incr_pair(p, work1(), work2());
        #pragma omp section
            incr_b(p, work3());
    }
}

A.18 Nested for directives

The following example of for directive nesting is compliant because the inner and outer for directives bind to different parallel regions:

#pragma omp parallel default(shared)
{
    #pragma omp for
        for (i=0; i<n; i++)
        {
            #pragma omp parallel shared(i, n)
            {
                #pragma omp for
                    for (j=0; j<n; j++)
                        work(i, j);
            }
        }
}

A following variation of the preceding example is also compliant:

#pragma omp parallel default(shared)
{
    #pragma omp for
        for (i=0; i<n; i++)
            work1(i, n);
}

void work1(int i, int n)
{
    int j;
    #pragma omp parallel default(shared)
    {
        #pragma omp for
            for (j=0; j<n; j++)
                work2(i, j);
    }
    return;
}

A.19 Examples showing incorrect nesting of work-sharing directives

The examples in this section illustrate the directive nesting rules.

The following example is noncompliant because the inner and outer for directives are nested and bind to the same parallel directive:

void wrong1(int n)
{
  #pragma omp parallel default(shared)
  {
      int i, j;
      #pragma omp for
      for (i=0; i<n; i++) {
          #pragma omp for
              for (j=0; j<n; j++)
                 work(i, j);
     }
   }
}

The following dynamically nested version of the preceding example is also noncompliant:

void wrong2(int n)
{
  #pragma omp parallel default(shared)
  {
    int i;
    #pragma omp for
      for (i=0; i<n; i++)
        work1(i, n);
  }
}

void work1(int i, int n)
{
  int j;
  #pragma omp for
    for (j=0; j<n; j++)
      work2(i, j);
}

The following example is noncompliant because the for and single directives are nested, and they bind to the same parallel region:

void wrong3(int n)
{
  #pragma omp parallel default(shared)
  {
    int i;
    #pragma omp for
      for (i=0; i<n; i++) {
        #pragma omp single
          work(i);
      }
  }
}

The following example is noncompliant because a barrier directive inside a for can result in deadlock:

void wrong4(int n)
{
  #pragma omp parallel default(shared)
  {
    int i;
    #pragma omp for
      for (i=0; i<n; i++) {
        work1(i);
        #pragma omp barrier
        work2(i);
      }
  }
}

The following example is noncompliant because the barrier results in deadlock due to the fact that only one thread at a time can enter the critical section:

void wrong5()
{
  #pragma omp parallel
  {
    #pragma omp critical
    {
       work1();
       #pragma omp barrier
       work2();
    }
  }
}

The following example is noncompliant because the barrier results in deadlock due to the fact that only one thread executes the single section:

void wrong6()
{
  #pragma omp parallel
  {
    setup();
    #pragma omp single
    {
      work1();
      #pragma omp barrier
      work2();
    }
    finish();
  }
}

A.20 Bind barrier directives

The directive binding rules call for a barrier directive to bind to the closest enclosing parallel directive. For more information on directive binding, see section 2.8.

In the following example, the call from main to sub2 is compliant because the barrier (in sub3) binds to the parallel region in sub2. The call from main to sub1 is compliant because the barrier binds to the parallel region in subroutine sub2. The call from main to sub3 is compliant because the barrier doesn't bind to any parallel region and is ignored. Also, the barrier only synchronizes the team of threads in the enclosing parallel region and not all the threads created in sub1.

int main()
{
    sub1(2);
    sub2(2);
    sub3(2);
}

void sub1(int n)
{
    int i;
    #pragma omp parallel private(i) shared(n)
    {
        #pragma omp for
        for (i=0; i<n; i++)
            sub2(i);
    }
}

void sub2(int k)
{
     #pragma omp parallel shared(k)
     sub3(k);
}

void sub3(int n)
{
    work(n);
    #pragma omp barrier
    work(n);
}

A.21 Scope variables with the private clause

The values of i and j in the following example are undefined on exit from the parallel region:

int i, j;
i = 1;
j = 2;
#pragma omp parallel private(i) firstprivate(j)
{
  i = 3;
  j = j + 2;
}
printf_s("%d %d\n", i, j);

For more information on the private clause, see section 2.7.2.1.

A.22 The default(none) clause

The following example distinguishes the variables that are affected by the default(none) clause from the variables that aren't:

// openmp_using_clausedefault.c
// compile with: /openmp
#include <stdio.h>
#include <omp.h>

int x, y, z[1000];
#pragma omp threadprivate(x)

void fun(int a) {
   const int c = 1;
   int i = 0;

   #pragma omp parallel default(none) private(a) shared(z)
   {
      int j = omp_get_num_thread();
             //O.K.  - j is declared within parallel region
      a = z[j];       // O.K.  - a is listed in private clause
                      //      - z is listed in shared clause
      x = c;          // O.K.  - x is threadprivate
                      //      - c has const-qualified type
      z[i] = y;       // C3052 error - cannot reference i or y here

      #pragma omp for firstprivate(y)
         for (i=0; i<10 ; i++) {
            z[i] = y;  // O.K. - i is the loop control variable
                       // - y is listed in firstprivate clause
          }
       z[i] = y;   // Error - cannot reference i or y here
   }
}

For more information on the default clause, see section 2.7.2.5.

A.23 Examples of the ordered directive

It's possible to have many ordered sections with a for specified with the ordered clause. The first example is noncompliant because the API specifies the following rule:

"An iteration of a loop with a for construct must not execute the same ordered directive more than once, and it must not execute more than one ordered directive." (See section 2.6.6.)

In this noncompliant example, all iterations execute two ordered sections:

#pragma omp for ordered
for (i=0; i<n; i++)
{
    ...
    #pragma omp ordered
    { ... }
    ...
    #pragma omp ordered
    { ... }
    ...
}

The following compliant example shows a for with more than one ordered section:

#pragma omp for ordered
for (i=0; i<n; i++)
{
    ...
    if (i <= 10)
    {
        ...
        #pragma omp ordered
        { ... }
    }
    ...
    (i > 10)
    {
        ...
        #pragma omp ordered
        { ... }
    }
    ...
}

A.24 Example of the private clause

The private clause of a parallel region is only in effect for the lexical extent of the region, not for the dynamic extent of the region. Therefore, in the example that follows, any uses of the variable a within the for loop in the routine f refers to a private copy of a, while a usage in routine g refers to the global a.

int a;

void f(int n)
{
    a = 0;

    #pragma omp parallel for private(a)
    for (int i=1; i<n; i++)
    {
        a = i;
        g(i, n);
        d(a);     // Private copy of "a"
        ...
    }
    ...

void g(int k, int n)
{
    h(k,a); // The global "a", not the private "a" in f
}

A.25 Examples of the copyprivate data attribute clause

Example 1: The copyprivate clause can be used to broadcast values acquired by a single thread directly to all instances of the private variables in the other threads.

float x, y;
#pragma omp threadprivate(x, y)

void init( )
{
    float a;
    float b;

    #pragma omp single copyprivate(a,b,x,y)
    {
        get_values(a,b,x,y);
    }

    use_values(a, b, x, y);
}

If routine init is called from a serial region, its behavior isn't affected by the presence of the directives. After the call to the get_values routine has been executed by one thread, no thread leaves the construct until the private objects designated by a, b, x, and y in all threads have become defined with the values read.

Example 2: In contrast to the previous example, suppose the read must be performed by a particular thread, say the master thread. In this case, the copyprivate clause can't be used to do the broadcast directly, but it can be used to provide access to a temporary shared object.

float read_next( )
{
    float * tmp;
    float return_val;

    #pragma omp single copyprivate(tmp)
    {
        tmp = (float *) malloc(sizeof(float));
    }

    #pragma omp master
    {
        get_float( tmp );
    }

    #pragma omp barrier
    return_val = *tmp;
    #pragma omp barrier

    #pragma omp single
    {
       free(tmp);
    }

    return return_val;
}

Example 3: Suppose that the number of lock objects required within a parallel region can't easily be determined prior to entering it. The copyprivate clause can be used to provide access to shared lock objects that are allocated within that parallel region.

#include <omp.h>

omp_lock_t *new_lock()
{
    omp_lock_t *lock_ptr;

    #pragma omp single copyprivate(lock_ptr)
    {
        lock_ptr = (omp_lock_t *) malloc(sizeof(omp_lock_t));
        omp_init_lock( lock_ptr );
    }

    return lock_ptr;
}

A.26 The threadprivate directive

The following examples demonstrate how to use the threadprivate directive to give each thread a separate counter.

Example 1

int counter = 0;
#pragma omp threadprivate(counter)

int sub()
{
    counter++;
    return(counter);
}

Example 2

int sub()
{
    static int counter = 0;
    #pragma omp threadprivate(counter)
    counter++;
    return(counter);
}

A.27 C99 variable length arrays

The following example demonstrates how to use C99 Variable Length Arrays (VLAs) in a firstprivate directive.

Note

Variable length arrays aren't currently supported in Visual C++.

void f(int m, int C[m][m])
{
    double v1[m];
    ...
    #pragma omp parallel firstprivate(C, v1)
    ...
}

A.28 The num_threads clause

The following example demonstrates the num_threads clause. The parallel region is executed with a maximum of 10 threads.

#include <omp.h>
main()
{
    omp_set_dynamic(1);
    ...
    #pragma omp parallel num_threads(10)
    {
        ... parallel region ...
    }
}

A.29 Work-sharing constructs inside a critical construct

The following example demonstrates using a work-sharing construct inside a critical construct. This example is compliant because the work-sharing construct and the critical construct don't bind to the same parallel region.

void f()
{
  int i = 1;
  #pragma omp parallel sections
  {
    #pragma omp section
    {
      #pragma omp critical (name)
      {
        #pragma omp parallel
        {
          #pragma omp single
          {
            i++;
          }
        }
      }
    }
  }
}

A.30 Reprivatization

The following example demonstrates the reprivatization of variables. Private variables can be marked private again in a nested directive. You don't need to share those variables in the enclosing parallel region.

int i, a;
...
#pragma omp parallel private(a)
{
  ...
  #pragma omp parallel for private(a)
  for (i=0; i<10; i++)
     {
       ...
     }
}

A.31 Thread-safe lock functions

The following C++ example demonstrates how to initialize an array of locks in a parallel region by using omp_init_lock.

// A_13_omp_init_lock.cpp
// compile with: /openmp
#include <omp.h>

omp_lock_t *new_locks() {
   int i;
   omp_lock_t *lock = new omp_lock_t[1000];
   #pragma omp parallel for private(i)
   for (i = 0 ; i < 1000 ; i++)
      omp_init_lock(&lock[i]);

   return lock;
}

int main () {}