cs357 Lecture Notes
Spring 2000
Week 9, Thursday

For today you should have read the POSIX handout about mutexes,
and read the rest of Chapter Six, and done significant work on
the assignment.

For Tuesday you should finish the assignment.

Lock
----

If we can read and write a shared variable atomically,
then we can build a lock.

A lock is a shared value plus two operations:

aquire (lock): when control returns from this method,
	       we know that we have the lock (and no
	       one else does)

release (lock): we're done, and someone else can have
		the lock.


Here's a very simple implementation of a lock that turns
out not to be correct (although it does demonstrate a
style of C programming that I want to recommend)...


typedef struct {
  int value;
} Lock;

Lock *make_lock ()
{
  Lock *lock = (Lock *) malloc (sizeof(Lock));
  lock->value = 0;
  return lock;
}

void acquire (Lock *lock)
{
  while (lock->value == 1);
  lock->value = 1;
}

void release (Lock *lock)
{
  lock->value = 0;
}

Why isn't this correct?  What could go wrong?

What if we could read a value from memory and write it
at the same time?

   (Why does the value have to be in memory?)


The Intel x86 instruction set provides Bit Test and Set

For fun, I compiled lock.c to get lock.s

    gcc -S lock.c

yields

acquire:
        pushl %ebp              // save the value of the reg
        movl %esp,%ebp          // get the stack pointer
        nop
        .p2align 4,,7
.L3:
        movl 8(%ebp),%eax       // move lock into reg
        cmpl $1,(%eax)          // compare lock->value to 1
        je .L5                  // if it's 1, jump to bottom
        jmp .L4                 // otherwise, jump out
        .p2align 4,,7
.L5:
        jmp .L3                 // jump from bottom to top
        .p2align 4,,7
.L4:
        movl 8(%ebp),%eax       // move lock into reg
        movl $1,(%eax)          // set lock->value to 1
.L2:
        leave                   // restore the saved reg, I think
        ret


I rewrote it as follows:

acquire:
        pushl %ebp
        movl %esp,%ebp
        nop
        movl 8(%ebp), %eax      // move lock into reg (once!)
        .p2align 4,,7
.L2:
        btsl $0,0(%eax)         // bit test and set lock->value
        jc .L2                  // if value = 1, jump to top
        .p2align 4,,7
.L3:
        leave                   // if we get here, we know that
        ret                     // value was 0 and that we set it


I couldn't help doing a little optimization while I was in there.

Downside: atomic test and set is hard to implement on multiprocessors.


POSIX mutexes
-------------

There are several problems with the implementation of locks
we have so far:

1) it does not provide bounded waiting.

2) it uses spin waiting

Spin waiting means that when a thread is waiting for the
lock it executes a tight loop, consuming CPU time, until
the lock becomes available.

In the best case, the waiting thread wastes the rest of
the current quantum and all of any future quanta it is
given.

In the worst case, the waiting thread might have higher
priority than the thread that has the lock, causing deadlock

(depending on the scheduler -- this is one other argument in
favor of lottery scheduling over strict priorities)

Ideally, we would like a waiting process to tell the OS
that it cannot run.

There are several ways to do this:

1) sched_yield:  this tells the OS that you would like to
   give up the rest of your quantum, but the OS is free to
   reschedule you in the future

2) sleep: tells the OS that you would like to give up the
   rest of the current quantum, and prevents you from being
   scheduled again for _at_least_ x seconds (maybe more)

(UNIX provides a system call named sleep that takes an
argument in units of seconds.  There is also a system
call named select that is more complicated (and intended
for something else altogether) that can be used to sleep
for more precise intervals (in ms))

Neither of these has quite the behavior we want, which
is to yield and prohibit rescheduling until some application-
level event occurs.

mutexes make this possible!

mutexes have the same interface as locks, which means that
in one sense they are an implementation of locks, but their
behavior is different.

From the man page:

       "A  mutex  has  two possible states: unlocked (not owned by
       any thread), and locked (owned by one thread). A mutex can
       never  be owned by two different threads simultaneously. A
       thread attempting to lock a mutex that is  already  locked
       by  another  thread  is  suspended until the owning thread
       unlocks the mutex first."


You can use pthread_mutex_trylock to busy wait on a mutex.

In multiprocessor environments, it is often optimal to
busy wait for a short period of time and then block.

1) if you get the lock quickly, you save the overhead of
   a context switch

2) if not, you let someone else use the CPU

   (Why not busy-wait in a single processor environment?)


Semaphores
----------

Ok, before we go on I have to clarify something.

Whenever people talk about synchronization, they
talk about locks, condition variables and semaphores.

But it is often hard to figure out what they are
implementing in terms of what.

For example, in our book, they use semaphores to
solve the mutual exclusion problem, but then they
admit that in order to implement semaphores you
need to be able to provide mutual exclusion!

At this point, your head explodes.

Here's the answer:

ALL SYNCHRONIZATION MECHANISMS ARE EQUIVALENT

meaning that if you have any of them, you can
implement the others.

The differences among them include:

1) performance issues

   is there busy waiting?

   if there is a lock, how long is the critical section
   (since this affects the amount of contention)?

2) expressability

   how well does the mechanism match the problem you
   are trying to solve; can you express the solution
   naturally?


To demonstrate equivalence, I will show how to implement
semaphores using mutexes and condition variables.

To demonstrate expressability, we will look at several
classical problems in synchronization and solve them
using these tools.

A semaphore is an integer that is initialized and
then accessed only through two operations:

P   proberen   decrement   "test and wait if necessary"
V   verhogen   increment   "signal and wake if necessary" 

wait (Semaphore *s)
{
    acquire (s->mutex);
    s->value--;
    if (s->value < 0) {
        wait (s->cond_var);   // note that this releases the mutex
    }
    release (s->mutex);
}
	   
signal (Semaphore *s)
{
    acquire (s->mutex);
    s->value++;
    if (s->value < 0) {
        signal (s->cond_var);   
    }
    release (s->mutex);   
}

How to interpret this:

Think of the value as being the number of items available,
if non-negative, or the number of people in queue if negative.

If you invoke P while value >= 1, you don't have to wait.
If you invoke P while value <= 0, you have to wait

If you invoke V while value < 0, you are reducing the length
                                 of the queue, so you have
				 to wake someone up 


Semaphores are a good synchronization primitive because

1) they are efficient (little, if any, spin waiting,
   depending on implementation)

2) they are versatile

POSIX provides an implementation of Semaphores, but not
as part of the Pthread package.  See the man page for
sem_init.


Bounded buffer problem
----------------------

a.k.a. producer-consumer problem

1) producer puts things into a shared buffer (think of an
   array of integers)

2) consumer takes things out

Maximum capacity is fixed.

Example: coke machine

Synchronization requirements:

1) can't take coke from an empty machine

2) can't put coke into a full machine

3) have to maintain mutually exclusive access to machine


Initialize three Semaphores:

fullBuffers = 0
emtpyBuffers = capacity
machineFree = 1
 
Producer code (coke delivery man)

wait (emptyBuffers)
wait (machineFree)
    put one Coke in machine
signal (machineFree)
signal (fullBuffers)


Consumer code

wait (fullBuffers)
wait (machineFree)
    put one Coke in machine
signal (machineFree)
signal (emptyBuffers)


First, notice that the Semaphores are not all being
used the same way.

machineFree is simply acting as a mutex; it's value
is always 0 or 1

fullBuffers counts the number of cans of coke, or
the number of people waiting in queue

emptyBuffers counts the number of empty slots, or
the number of producers waiting with cans of coke.


Likely scenario: there is only one producer, running
in a loop, so emptyBuffers is never lower than -1.


Moral: high-level sync mechanisms like Semaphores allow
complicated problems to be solved with readably,
demonstrably correct, symmetric code.

Well, almost.

What would happen if you reversed
the order of the first two statements?


Readers/writers problem
-----------------------

Similar to mutual exclusion, except that it is ok to
have multiple readers, but not multiple writers and not
readers and writers at the same time.

1) readers can proceed as long as there are zero writers
2) writers can proceed as long as there are zero readers and zero writers

Variables:

integer readcount;    // how many readers in the room?
Semaphore mutex;      // protect access to readcount
Semaphore wrt;        // can a writer enter?

Mutex is being used as a lock; wrt is being used
as a condition variable!


The writer code is simple:

wait (wrt)
    do some writing
signal (wrt)

This provides mutual exclusion among writers, but also
gives readers an ability to control writers.  Here's
the reader code:

wait (mutex)
  readcount++
  if (readcount == 1) wait (wrt);
signal (mutex)

    do some reading

wait (mutex)
  readcount--;
  if (readcount == 0) signal (wrt);
signal (mutex)


The use of mutex is obviously correct, in the sense of
providing mutual exclusion.

The first reader to enter has to bar writers by decrementing wrt

    if there were no writers, then the reader proceeds
    and future writers are barred

    if there is a writer then we wait for him to leave

    while we wait, no other readers can get mutex!

Subsequent readers don't have to wait on wrt, because
the presence of a prior reader indicates that it is
safe to enter, and that writers are already barred.

For similar reasons, the last reader to leave has to
unbar the writers.

Problem with this solution: writer might have to wait
forever while readers come and go.  We might like to
have a system where the readers already in the system
can finish, but once a writer arrives, future readers
have to wait.


Dining philosophers!