cs357 Lecture Notes Spring 2000 Week 9, Thursday For today you should have read the POSIX handout about mutexes, and read the rest of Chapter Six, and done significant work on the assignment. For Tuesday you should finish the assignment. Lock ---- If we can read and write a shared variable atomically, then we can build a lock. A lock is a shared value plus two operations: aquire (lock): when control returns from this method, we know that we have the lock (and no one else does) release (lock): we're done, and someone else can have the lock. Here's a very simple implementation of a lock that turns out not to be correct (although it does demonstrate a style of C programming that I want to recommend)... typedef struct { int value; } Lock; Lock *make_lock () { Lock *lock = (Lock *) malloc (sizeof(Lock)); lock->value = 0; return lock; } void acquire (Lock *lock) { while (lock->value == 1); lock->value = 1; } void release (Lock *lock) { lock->value = 0; } Why isn't this correct? What could go wrong? What if we could read a value from memory and write it at the same time? (Why does the value have to be in memory?) The Intel x86 instruction set provides Bit Test and Set For fun, I compiled lock.c to get lock.s gcc -S lock.c yields acquire: pushl %ebp // save the value of the reg movl %esp,%ebp // get the stack pointer nop .p2align 4,,7 .L3: movl 8(%ebp),%eax // move lock into reg cmpl $1,(%eax) // compare lock->value to 1 je .L5 // if it's 1, jump to bottom jmp .L4 // otherwise, jump out .p2align 4,,7 .L5: jmp .L3 // jump from bottom to top .p2align 4,,7 .L4: movl 8(%ebp),%eax // move lock into reg movl $1,(%eax) // set lock->value to 1 .L2: leave // restore the saved reg, I think ret I rewrote it as follows: acquire: pushl %ebp movl %esp,%ebp nop movl 8(%ebp), %eax // move lock into reg (once!) .p2align 4,,7 .L2: btsl $0,0(%eax) // bit test and set lock->value jc .L2 // if value = 1, jump to top .p2align 4,,7 .L3: leave // if we get here, we know that ret // value was 0 and that we set it I couldn't help doing a little optimization while I was in there. Downside: atomic test and set is hard to implement on multiprocessors. POSIX mutexes ------------- There are several problems with the implementation of locks we have so far: 1) it does not provide bounded waiting. 2) it uses spin waiting Spin waiting means that when a thread is waiting for the lock it executes a tight loop, consuming CPU time, until the lock becomes available. In the best case, the waiting thread wastes the rest of the current quantum and all of any future quanta it is given. In the worst case, the waiting thread might have higher priority than the thread that has the lock, causing deadlock (depending on the scheduler -- this is one other argument in favor of lottery scheduling over strict priorities) Ideally, we would like a waiting process to tell the OS that it cannot run. There are several ways to do this: 1) sched_yield: this tells the OS that you would like to give up the rest of your quantum, but the OS is free to reschedule you in the future 2) sleep: tells the OS that you would like to give up the rest of the current quantum, and prevents you from being scheduled again for _at_least_ x seconds (maybe more) (UNIX provides a system call named sleep that takes an argument in units of seconds. There is also a system call named select that is more complicated (and intended for something else altogether) that can be used to sleep for more precise intervals (in ms)) Neither of these has quite the behavior we want, which is to yield and prohibit rescheduling until some application- level event occurs. mutexes make this possible! mutexes have the same interface as locks, which means that in one sense they are an implementation of locks, but their behavior is different. From the man page: "A mutex has two possible states: unlocked (not owned by any thread), and locked (owned by one thread). A mutex can never be owned by two different threads simultaneously. A thread attempting to lock a mutex that is already locked by another thread is suspended until the owning thread unlocks the mutex first." You can use pthread_mutex_trylock to busy wait on a mutex. In multiprocessor environments, it is often optimal to busy wait for a short period of time and then block. 1) if you get the lock quickly, you save the overhead of a context switch 2) if not, you let someone else use the CPU (Why not busy-wait in a single processor environment?) Semaphores ---------- Ok, before we go on I have to clarify something. Whenever people talk about synchronization, they talk about locks, condition variables and semaphores. But it is often hard to figure out what they are implementing in terms of what. For example, in our book, they use semaphores to solve the mutual exclusion problem, but then they admit that in order to implement semaphores you need to be able to provide mutual exclusion! At this point, your head explodes. Here's the answer: ALL SYNCHRONIZATION MECHANISMS ARE EQUIVALENT meaning that if you have any of them, you can implement the others. The differences among them include: 1) performance issues is there busy waiting? if there is a lock, how long is the critical section (since this affects the amount of contention)? 2) expressability how well does the mechanism match the problem you are trying to solve; can you express the solution naturally? To demonstrate equivalence, I will show how to implement semaphores using mutexes and condition variables. To demonstrate expressability, we will look at several classical problems in synchronization and solve them using these tools. A semaphore is an integer that is initialized and then accessed only through two operations: P proberen decrement "test and wait if necessary" V verhogen increment "signal and wake if necessary" wait (Semaphore *s) { acquire (s->mutex); s->value--; if (s->value < 0) { wait (s->cond_var); // note that this releases the mutex } release (s->mutex); } signal (Semaphore *s) { acquire (s->mutex); s->value++; if (s->value < 0) { signal (s->cond_var); } release (s->mutex); } How to interpret this: Think of the value as being the number of items available, if non-negative, or the number of people in queue if negative. If you invoke P while value >= 1, you don't have to wait. If you invoke P while value <= 0, you have to wait If you invoke V while value < 0, you are reducing the length of the queue, so you have to wake someone up Semaphores are a good synchronization primitive because 1) they are efficient (little, if any, spin waiting, depending on implementation) 2) they are versatile POSIX provides an implementation of Semaphores, but not as part of the Pthread package. See the man page for sem_init. Bounded buffer problem ---------------------- a.k.a. producer-consumer problem 1) producer puts things into a shared buffer (think of an array of integers) 2) consumer takes things out Maximum capacity is fixed. Example: coke machine Synchronization requirements: 1) can't take coke from an empty machine 2) can't put coke into a full machine 3) have to maintain mutually exclusive access to machine Initialize three Semaphores: fullBuffers = 0 emtpyBuffers = capacity machineFree = 1 Producer code (coke delivery man) wait (emptyBuffers) wait (machineFree) put one Coke in machine signal (machineFree) signal (fullBuffers) Consumer code wait (fullBuffers) wait (machineFree) put one Coke in machine signal (machineFree) signal (emptyBuffers) First, notice that the Semaphores are not all being used the same way. machineFree is simply acting as a mutex; it's value is always 0 or 1 fullBuffers counts the number of cans of coke, or the number of people waiting in queue emptyBuffers counts the number of empty slots, or the number of producers waiting with cans of coke. Likely scenario: there is only one producer, running in a loop, so emptyBuffers is never lower than -1. Moral: high-level sync mechanisms like Semaphores allow complicated problems to be solved with readably, demonstrably correct, symmetric code. Well, almost. What would happen if you reversed the order of the first two statements? Readers/writers problem ----------------------- Similar to mutual exclusion, except that it is ok to have multiple readers, but not multiple writers and not readers and writers at the same time. 1) readers can proceed as long as there are zero writers 2) writers can proceed as long as there are zero readers and zero writers Variables: integer readcount; // how many readers in the room? Semaphore mutex; // protect access to readcount Semaphore wrt; // can a writer enter? Mutex is being used as a lock; wrt is being used as a condition variable! The writer code is simple: wait (wrt) do some writing signal (wrt) This provides mutual exclusion among writers, but also gives readers an ability to control writers. Here's the reader code: wait (mutex) readcount++ if (readcount == 1) wait (wrt); signal (mutex) do some reading wait (mutex) readcount--; if (readcount == 0) signal (wrt); signal (mutex) The use of mutex is obviously correct, in the sense of providing mutual exclusion. The first reader to enter has to bar writers by decrementing wrt if there were no writers, then the reader proceeds and future writers are barred if there is a writer then we wait for him to leave while we wait, no other readers can get mutex! Subsequent readers don't have to wait on wrt, because the presence of a prior reader indicates that it is safe to enter, and that writers are already barred. For similar reasons, the last reader to leave has to unbar the writers. Problem with this solution: writer might have to wait forever while readers come and go. We might like to have a system where the readers already in the system can finish, but once a writer arrives, future readers have to wait. Dining philosophers!