cs357 Lecture Notes
Spring 2000
Week 9, Tuesday

For today you should have read Chapter 6, Sections 6.1 through 6.4

You should have picked up hw6, scheduled meeting
times with your partner, and gotten started on at least the first
experiment.  Homework 6 is due next Tuesday.

For Thursday you should read the POSIX handout about mutexes.
Also you should read the rest of Chapter Six.

Wednesday: cs fete!


Chapter 6: Synchronization
--------------------------

In general we cannot make any claims about the order in
which concurrent code runs.

      Thread A			Thread B
      --------			--------
      a1			b1
      a2			b2

We can say for sure that a2 > a1      > means "happens after"
We can sat for sure that b2 > b1

But, we cannot compare any of the as to any of the bs.

Shared variables can impose constraints on the ordering.
On the exam we saw something like:

     Thread A                  Thread B
     --------                  --------
                                b1
      while (x == 5) {          x = 6
        // wait                 b2
      }
      a1

Now, if we can prove that the initial state of x is not 5
and that B is the only thread that can make the value of
x be 5, then we can prove that a1 > b1.

But we can still make no claim about a1 <> b2.

This is an example of a simple form of synchronization.

It is not a particularly useful form because it only allow
us to make a few simple claims, and it is not easy to generalize
it to solve the problems we are interested in.

synchonization:  together time

   any mechanism that creates a temporal relationship between
   statements of distinct threads


Mutual exclusion
----------------

Remember that if two threads access a shared variable concurrently,
the outcome is often nondeterministic.

    (Except in special cases, e.g. all accesses are reads,
    or all writes are writing the same value.)

Non-deterministic is not always bad.  But correctness of programs
often demands determinism.

The fundamental source of the problem is that we cannot control
when interrupts happen.

Any non-atomic operating might get interrupted.


Interrupts off!
--------------

One simple solution is to turn interrupts off before doing
something non-atomic and potentially non-deterministic:

     Thread A                  Thread B
     --------                  --------
     temp1 = x		       temp2 = x
     x = temp1 + 1	       x = temp2 + 1

Obviously it would be bad if either thread got preemtped.
between statements.

     Thread A                  Thread B
     --------                  --------
     interrupts off	       interrupts off
     temp1 = x		       temp2 = x
     x = temp1 + 1	       x = temp2 + 1
     interrupts on	       interrupts on

This solves the problem, but it raises some difficulties:

1) we don't want to give users the ability to turn off
   interrupts

   a) it would violate CPU protection, because they could
      run forever

   b) it would violate interprocess protection, since if
      a process neglected to turn interrupts back on, it
      would crash the OS

   c) if a process leaves interrupts off too long, it might
      cause an I/O buffer to overflow, causing I/O operations
      to fail.

2) it's hard to turn interrupts off on a multiprocessor

Nevertheless, this is sometimes an appropriate mechanism
inside the kernel, where other mechanisms might not be
available, or might be unacceptably inefficient, and where
the code is more carefully controlled.


Mutual exclusion problem
------------------------

Before we look at other solutions, it is useful to state the
problem more carefully.

For many problems, what we really want is not necessarily
that threads be immune from interruption.

There is a more conservative requirement that serves the
same purpose, but might be cheaper to implement.

We can identify sections of code in different threads,
called critical sections, that should never run concurrently
with each other.

In the previous example, the critical section is between
"enter" and "leave".

     Thread A                  Thread B
     --------                  --------
     enter		       enter
     temp1 = x		       temp2 = x
     x = temp1 + 1	       x = temp2 + 1
     leave		       leave

The requirement is that if one of these threads is running
in its critical section, then the other one (or others) cannot.

This is sufficient to guarantee determinism, but it is weaker
than the no interrupt requirement, since we are free to interrupt
A in its critical section if necessary.

If we do, we are free to run any other thread, including B, as
long as B does not try to enter its critical section.

In that case --- A was running in its critical section and
was interrupted, and now B is trying to get in --- B will have
to wait at "enter" until A has resumed and left the critical
section.


Software solution for two processes
-----------------------------------

The book's algorithm 3

flag[i] indicates that thread i would like to enter it's
critical section.

     Thread A                  Thread B
     --------                  --------
     i = 0		       i = 1           // my id
     j = 1		       j = 0           // other id

Enter section

     flag[0] = true	       flag[1] = true    // set my flag
     turn = 1		       turn = 0	         // set turn to other
     while (flag[1] &&         while (flag[0] && // wait
	    turn = 1)                 turn = 0)

Critical section

     assert (!flag[1] ||       assert (!flag[0] ||
	    turn = 0)                  turn = 1)

     temp1 = x		       temp2 = x
     x = temp1 + 1	       x = temp2 + 1

Laeve section

     flag[0] = false	       flag[1] = false

Can we prove this is correct?

Only thread i writes flag[i], so flag[i] is not a shared variable.

Turn is a shared variable and both threads write it concurrently,
so the outcome is non-deterministic, but it doesn't matter which
value "sticks".

Inside the critical section, both threads can assert that
"either the other thread is not trying to enter or the other
thread is trying to enter but it is my turn."

This proves that mutual exclusion is guaranteed.  In addition, we
might want to prove the properties of progress and bounded waiting.

Progress: if a thread wants to enter, it might have to wait for
	  another thread that is trying to enter, but it shouldn't
	  have to wait for a thread that is not trying to enter

	  (this is the problem with Algorithm 1, which requires
	  strict alternation)

Bounded waiting: if a thread is waiting to enter, there has to
	be a bound on the number of times other threads enter before
	it does

	(can't break ties by choosing the thread with a smaller
	id, for example) 


Bakery algorithm
----------------

We can generalize the previous solution to n processes, at the
cost of some additional complexity.

It is hard to solve this problem in software, and this is
a relatively simple synchronization problem!

Can we get a little help from the hardware?


Lock
----

Yes!  If we can read and write a shared variable atomically,
then we can build a lock.

A lock is a shared value plus two operations:

aquire (lock): when control returns from this method,
	       we know that we have the lock (and no
	       one else does)

release (lock): we're done, and someone else can have
		the lock.


Here's a very simple implementation of a lock that turns
out not to be correct:

typedef struct {
  int value;
} Lock;

Lock *make_lock ()
{
  Lock *lock = (Lock *) malloc (sizeof(Lock));
  lock->value = 0;
  return lock;
}

void acquire (Lock *lock)
{
  while (lock->value == 1);
  lock->value = 1;
}

void release (Lock *lock)
{
  lock->value = 0;
}

Why isn't this correct?  What could go wrong?

What if we could read a value from memory and write it
at the same time?

   (Why does the value have to be in memory?)


The Intel x86 instruction set provides Bit Test and Set

For fun, I compiled lock.c to get lock.s

    gcc -S lock.c

yields

acquire:
        pushl %ebp              // save the value of the reg
        movl %esp,%ebp          // get the stack pointer
        nop
        .p2align 4,,7
.L3:
        movl 8(%ebp),%eax       // move lock into reg
        cmpl $1,(%eax)          // compare lock->value to 1
        je .L5                  // if it's 1, jump to bottom
        jmp .L4                 // otherwise, jump out
        .p2align 4,,7
.L5:
        jmp .L3                 // jump from bottom to top
        .p2align 4,,7
.L4:
        movl 8(%ebp),%eax       // move lock into reg
        movl $1,(%eax)          // set lock->value to 1
.L2:
        leave                   // restore the saved reg, I think
        ret


I rewrote it as follows:

acquire:
        pushl %ebp
        movl %esp,%ebp
        nop
        movl 8(%ebp), %eax      // move lock into reg (once!)
        .p2align 4,,7
.L2:
        btsl $0,0(%eax)         // bit test and set lock->value
        jc .L2                  // if value = 1, jump to top
        .p2align 4,,7
.L3:
        leave                   // if we get here, we know that
        ret                     // value was 0 and that we set it


I couldn't help doing a little optimization while I was in there.

Note: this solution does not provide bounded waiting.  The
book shows how to augment it.

Downside: atomic test and set is hard to implement on multiprocessors.