Tuesday, September 27, 2011

Creating a Barrier between Processes with SysV Semaphore

Running deterministic simulations gets much more complicated when you are dealing with more than one process. In each simulation run you need to make sure that processes are assigned to the same CPUs every time and that the kernel scheduler does not introduce any randomness. For a single process with multiple threads it's easy to synchronize them using thread-level primitives such as mutexes and barriers. With multiple processes, things get a little more complicated unless resources are properly allocated to each process and its threads. The point at which we create a checkpoint also matters, because we need to make sure that all resource allocation by the kernel is done and the benchmarks are ready to start their Region-Of-Interest (ROI).

In this post I'll explain how to use the System V IPC mechanism to create a barrier between multiple processes. The general idea is to have each thread wait on a barrier just before it starts executing its ROI; once all threads reach this point, we create a checkpoint and run our simulations from that checkpoint only. The trick is to use a SysV semaphore as a barrier across processes: SysV semaphores provide an operation mode in which each process/thread waits for the semaphore value to become 0. In short, we initialize a semaphore to the total number of threads, and each thread decrements the semaphore and waits until its value reaches 0.

Create/Get a Semaphore
First we create a semaphore using semget. As described in the man page, if we pass a specific value as the key parameter, semget creates a semaphore that is shared between processes: any other process that calls semget with the same key gets the same semaphore.
static int SEM_ID = 786;

int get_semaphore ()
{
    int sem_id;

    sem_id = semget(SEM_ID, 1, IPC_CREAT | 0666);

    if (sem_id == -1) {
        perror("get_semaphore: semget");
        exit(1);
    }

    return sem_id;
}
In the semget call we pass the common SEM_ID value as the key (despite its name, SEM_ID is the key, not the identifier that semget returns). The get_semaphore function returns the semaphore id, which the later functions use to set the semaphore's value and perform operations on it.
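
A hard-coded key such as 786 can collide with a key used by some unrelated program on the same machine. One common alternative is to derive the key from a file path with ftok. The sketch below is only an illustration: the function name get_semaphore_for_path and its parameters are hypothetical and not part of sem_helper.h.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Hypothetical alternative to get_semaphore(): derive the key from an
 * existing file path plus a small project id instead of a fixed number.
 * Every process that passes the same (path, proj_id) pair gets the same
 * semaphore. Returns the semaphore id, or -1 on error. */
int get_semaphore_for_path(const char *path, int proj_id)
{
    key_t key;
    int sem_id;

    assert(path != NULL);

    /* ftok() fails if the path does not exist or cannot be accessed. */
    key = ftok(path, proj_id);
    if (key == (key_t)-1) {
        perror("get_semaphore_for_path: ftok");
        return -1;
    }

    sem_id = semget(key, 1, IPC_CREAT | 0666);
    if (sem_id == -1)
        perror("get_semaphore_for_path: semget");

    return sem_id;
}
```

Both the program that sets the semaphore and the benchmarks would have to agree on the same path and project id, for example a file inside the benchmark directory.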

Set Semaphore's Value
Once we have the semaphore id, we call semctl to set the value of the semaphore. Remember that this function must be called only once, before any process reaches the barrier.
int set_semaphore (int sem_id, int val)
{
    return semctl(sem_id, 0, SETVAL, val);
}
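
One way to make sure the value is set only once is to let whichever process actually creates the semaphore initialize it. The sketch below assumes Linux and uses the IPC_EXCL flag of semget to find out whether this call created the semaphore. The helper name create_or_open_semaphore and the union name sem_arg are hypothetical; the union has the same layout as the "union semun" described in the semctl man page.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Same layout as the "union semun" from semctl(2); the name is ours. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical helper: create the semaphore for `key` and set it to
 * `initial_val` only if it does not exist yet; otherwise just open it.
 * Returns the semaphore id, or -1 on error. *created is set to 1 when
 * this call created and initialized the semaphore, 0 otherwise. */
int create_or_open_semaphore(key_t key, int initial_val, int *created)
{
    union sem_arg arg;
    int sem_id;

    assert(created != NULL);

    /* IPC_EXCL makes semget fail with EEXIST if the key is already in use. */
    sem_id = semget(key, 1, IPC_CREAT | IPC_EXCL | 0666);
    if (sem_id != -1) {
        /* We are the creator, so we are the one who initializes. */
        arg.val = initial_val;
        if (semctl(sem_id, 0, SETVAL, arg) == -1) {
            perror("create_or_open_semaphore: semctl");
            return -1;
        }
        *created = 1;
        return sem_id;
    }

    if (errno != EEXIST) {
        perror("create_or_open_semaphore: semget");
        return -1;
    }

    /* Somebody else created it already: open it without touching the value. */
    *created = 0;
    return semget(key, 1, 0666);
}
```

There is still a short window between the creating semget and the SETVAL call in which another process could open the semaphore, so this sketch is safest when the initializing program is started before the benchmarks.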


Decrement Semaphore
When a thread/process reaches the start of its ROI, it should decrement the semaphore value and wait until the value becomes zero. To decrement the semaphore we use the semop function. semop takes a pointer to a sembuf structure that specifies the operation to perform on the semaphore.
void decrement_semaphore (int sem_id)
{
    struct sembuf sem_op;

    sem_op.sem_num  = 0;
    sem_op.sem_op   = -1; /* <-- Specify decrement operation */
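    sem_op.sem_flg  = 0;

    /* Report a failed decrement instead of ignoring it silently */
    if (semop(sem_id, &sem_op, 1) == -1)
        perror("decrement_semaphore: semop");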
}
Setting the sem_op field to -1 tells the kernel to perform an atomic decrement on the semaphore.
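
One thing to keep in mind: a decrement with sem_flg set to 0 blocks when the semaphore's value is already 0, for example when a benchmark arrives before the semaphore has been initialized or when more threads arrive than the initial value allows. The sketch below shows a non-blocking variant that uses the IPC_NOWAIT flag so that this case is reported instead of hanging. The name try_decrement_semaphore and its return codes are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Hypothetical variant of decrement_semaphore(). Returns
 *    0 if the semaphore was decremented,
 *    1 if its value was already 0 (nothing left to decrement),
 *   -1 on any other error (errno is left as set by semop). */
int try_decrement_semaphore(int sem_id)
{
    struct sembuf sem_op;

    assert(sem_id >= 0);

    sem_op.sem_num = 0;
    sem_op.sem_op  = -1;          /* decrement by one */
    sem_op.sem_flg = IPC_NOWAIT;  /* fail with EAGAIN instead of blocking */

    while (semop(sem_id, &sem_op, 1) == -1) {
        if (errno == EINTR)
            continue;             /* interrupted by a signal: retry */
        if (errno == EAGAIN)
            return 1;             /* value is 0, the decrement would block */
        return -1;
    }

    return 0;
}
```

A benchmark could call this before wait_semaphore and print a warning when it returns 1, which usually means the semaphore was not set to the right thread count.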

Wait for other Threads/Processes
Now we just have to wait until all the processes reach our barrier. For that we use semop again. This time we set the sem_op field to 0, which tells the kernel to wake up this thread/process once the semaphore value is 0.
void wait_semaphore (int sem_id)
{
    struct sembuf sem_op;

    sem_op.sem_num  = 0;
    sem_op.sem_op   = 0;
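    sem_op.sem_flg  = 0;

    /* Report a failed wait instead of silently passing the barrier */
    if (semop(sem_id, &sem_op, 1) == -1)
        perror("wait_semaphore: semop");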
}
This function call will block until all the threads have decremented the semaphore and its value is 0.
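
If the barrier never opens, it helps to look at the semaphore's state from a separate process. The sketch below is a hypothetical debugging helper (report_semaphore is not part of the original sem_helper.h) that uses the GETVAL, GETZCNT and GETNCNT commands of semctl to print the current value and the number of waiters.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Hypothetical debugging helper: print the state of semaphore 0 in the
 * set `sem_id`. Returns the current value, or -1 on error. */
int report_semaphore(int sem_id)
{
    int value, zero_waiters, incr_waiters;

    assert(sem_id >= 0);

    value        = semctl(sem_id, 0, GETVAL);   /* current value */
    zero_waiters = semctl(sem_id, 0, GETZCNT);  /* blocked waiting for 0 */
    incr_waiters = semctl(sem_id, 0, GETNCNT);  /* blocked waiting for an increase */

    if (value == -1 || zero_waiters == -1 || incr_waiters == -1) {
        perror("report_semaphore: semctl");
        return -1;
    }

    printf("value=%d, waiting for zero=%d, waiting for increase=%d\n",
           value, zero_waiters, incr_waiters);

    return value;
}
```

A value that stays above 0 means that fewer threads arrived than the initial count. A non-zero "waiting for increase" count means that some process tried to decrement a semaphore that was already at 0.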

I have put all these functions into a single file: sem_helper.h.

Putting it all Together
Now we create a small application (set_semaphore.cpp) that creates the semaphore and initializes it to a specific value.
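#include <cstdio>     /* perror */
#include <cstdlib>    /* atoi, exit */
#include <iostream>   /* cout */
#include <sys/sem.h>  /* semget, semctl, semop */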

#include "sem_helper.h"
#include "ptlcalls.h"

using namespace std;

int main (int argc, char** argv)
{
    int sem_id;
    int sem_val;
    int rc;

    if (argc < 2) {
        cout << "Please specify the initial semaphore value.\n";
        return 1;
    }

    /* Get semaphore */
    sem_id = get_semaphore();

    /* Retrieve semaphore value from command line arg */
    sem_val = atoi(argv[1]);

    /* Set semaphore value */
    rc = set_semaphore(sem_id, sem_val);
    if (rc == -1) {
        perror("set_semaphore: semctl");
        return 1;
    }

    /* Now wait for semaphore to reach 0 */
    wait_semaphore (sem_id);

    /* All threads will be in ROI so either create a checkpoint or
     * switch to simulation mode. */
    ptlcall_checkpoint_and_shutdown("checkpoint_1");

    return 0;
}
Compile this code into an application and run it as shown below. (We run it in the background because it blocks until all threads reach the barrier, at which point it calls the 'ptlcall' to create a checkpoint.)
$ ./set_semaphore NUM_THREADS &
In the next step we modify our benchmarks to use these semaphore functions to wait at the barrier. Add the following code and call sync_all_processes just before the ROI begins.
#include "sem_helper.h"

void sync_all_processes ()
{
    int sem_id;

    /* Get semaphore */
    sem_id = get_semaphore();

    /* Decrement semaphore value */
    decrement_semaphore (sem_id);

    /* Now wait till semaphore value reaches 0 */
    wait_semaphore (sem_id);
}
Tip: For PARSEC benchmarks you can modify the ROI hook to call sync_all_processes.
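
To try the barrier without a simulator or a benchmark, the whole scheme can be sketched in one file using fork. This is only an illustration under a few assumptions: the parent stands in for set_semaphore, each child stands in for one benchmark process, no checkpoint call is made, and the semaphore is created with IPC_PRIVATE because forked children inherit the id and need no shared key. The names run_barrier_demo, sem_apply and sem_arg are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same layout as the "union semun" from semctl(2); the name is ours. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* One operation on semaphore 0: delta < 0 decrements, delta == 0 waits
 * until the value is zero. */
static int sem_apply(int sem_id, short delta)
{
    struct sembuf op;

    op.sem_num = 0;
    op.sem_op  = delta;
    op.sem_flg = 0;
    return semop(sem_id, &op, 1);
}

/* Fork n_procs children that all meet at the barrier. Returns the number
 * of children that passed it, or -1 if the setup failed. */
int run_barrier_demo(int n_procs)
{
    union sem_arg arg;
    int sem_id, i, status, passed = 0;

    assert(n_procs > 0);

    sem_id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (sem_id == -1)
        return -1;

    arg.val = n_procs;  /* one count per process expected at the barrier */
    if (semctl(sem_id, 0, SETVAL, arg) == -1) {
        semctl(sem_id, 0, IPC_RMID);
        return -1;
    }

    for (i = 0; i < n_procs; i++) {
        pid_t pid = fork();

        if (pid == 0) {
            /* Child: announce arrival, then wait for everybody else. */
            if (sem_apply(sem_id, -1) == -1 || sem_apply(sem_id, 0) == -1)
                _exit(1);
            _exit(0);
        }
        if (pid == -1) {
            /* The barrier can never open now. Removing the semaphore makes
             * the blocked children fail in semop and exit. */
            semctl(sem_id, 0, IPC_RMID);
            while (wait(NULL) > 0)
                ;
            return -1;
        }
    }

    /* Parent: block until every child has decremented the semaphore. */
    sem_apply(sem_id, 0);

    /* Count the children that got through the barrier cleanly. */
    while (wait(&status) > 0)
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            passed++;

    /* SysV semaphores outlive the processes, so remove it explicitly. */
    semctl(sem_id, 0, IPC_RMID);
    return passed;
}
```

Calling run_barrier_demo(4) from a main function should report that all four children passed. The last step also applies to the real setup: a SysV semaphore stays in the kernel after the programs exit, so a stale value from an earlier run remains until the semaphore is set again or removed.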

14 comments:

  1. Thanks Avadh! I've been spending a few months banging my head against this! I ran into a lot of trouble when getting >2 threads to synchronize with each other (where the waiting time for the last process to exit kernel mode was >4billion cycles). I tried System V shared memory regions, semaphore, and even pthreads (using sched_affinity) to eliminate this but to no avail. I'll give the semwait a go and see.

  2. Hi Zhong, I hope that you'll find the solution using this. Let me know if you have more questions :)

  3. Indeed I did. Amazingly the latest git pull and your single counting semaphore solved the problem!
    Thank you once again!

  4. Hi Avadh,

    Thanks for the post! However, I am facing one issue - I am not able to re-compile the hooks library in the PARSEC benchmarks to make use of the sync_all_process() routine. The command I am using is:

    parsecmgmt -a buld -p hooks -c gcc-hooksROI

    It complains about some other pkgs not having the gcc-hooksROI.bldconf file. Any thoughts on how I could fix it ?

    Replies
    1. I think configuration name is just 'gcc-hooks' which includes hooks for ROI.

    2. Thanks Avadh. It is indeed just gcc-hooks, but had to give -x roi as well.

  5. Hi Avadh,
    To do the simulations with PARSEC benchmarks, I modify the hooks.c to include the sync_all_processes() routine and call it from inside __parsec_roi_begin of hooks.c
    However, the run freezes after printing out "Entering ROI". Any ideas on what could be the problem.
    My code :
    void __parsec_roi_begin()
    {
    ...
    printf(HOOKS_PREFIX"Entering ROI\n");
    fflush(NULL);
    sync_all_processes();
    ...
    }

    Replies
    1. Before starting any benchmark, do you set semaphore value to number of threads? Take a look at 'set_semaphore.cpp' file given above, compile it and run it to set semaphore value before you start the benchmark.

    2. I do compile it and run it before starting benchmark. What I do is :

      ./set_semaphore 4 &
      parsecmgmt -a run -c gcc-hooks -x roi -n 4 -p bodytrack

    3. Can you check that semaphore ID used in both parsec and set_semaphore binary is same. It is defined in the header file.

      PS: This background in threaded reply is horrible :). I should change it.

    4. After including the routine in the hook.c and its signature in hooks.h, I do a fulluninstall, fullclean and build.

      parsecmgmt -a fulluninstall -p hooks
      parsecmgmt -a fullclean -p hooks
      parsecmgmt -a build -c gcc-hooks -x roi -p hooks

      Is it the correct way to rebuild the hooks library ? Because while running any benchmark, I get an error with loading shared libraries.

      /root/parsec-2.1/pkgs/libs/hooks/inst/amd64-linux.gcc-hooks.roi/lib/libhooks.so.0 : Invalid ELF header

    5. This comment has been removed by the author.

    6. I have partially figured out what the problem is. The semaphore ids of both parsec and set_semaphore binaries are same, but because of some problem in the parsecmgmt script, even when I pass -n N, only 1 instead of N threads are spawned, hence they can't bring down the semaphore to zero. This behaviour is tested by running the parsecmgmt command N times when I use ./set_semaphore N &, and true enough, the "checkpoint_1" is created.

      Now I need to figure out why parsecmgmt won't behave the way it is supposed to. I have also been sourcing the global build and system config and setting the variables manually, as parsecmgmt isn't doing its job.

    7. Also, in my case, it seems that the semaphore can't be brought down to zero. I tested that with one thread running. Were you able to figure out the problem?
