cs357 Lecture Notes
Spring 2000
Week 6, Thursday

For today you should have read the handout about threads
and finished reading Chapter 4.

Also, you should have picked up homework 4, compiled and run the code,
printed documentation, and done the first experiment.

For next time you should finish assignment 4 and prepare for
the second exam, which is next Tuesday!


Topics for today
----------------

1) Threads, completed.

2) Homework questions?

3) Virtual memory.


Atomicity
---------

Imagine a threaded program with the initial state x = 1
(x is a shared variable).

    Thread 1		     Thread 2
    --------		     --------
     x++                      x++

It seems like it doesn't matter what order these happen in;
either way, x gets incremented twice.

But it turns out that when this code gets compiled, it looks
more like:

    Thread 1		     Thread 2
    --------		     --------
     t1 <- x                  t2 <- x
     t1 <- t1 + 1             t2 <- t2 +1
     x <- t1                  x <- t1

Now it is not clear whether x will get incremented once
or twice!

This example show that in order to understand syncronization,
we have to understand which operations are atomic, and which
are subject to interruption.

If the hardware provides an instruction that increments the
value of a memory location, as the VAX did, then it is usually
not possible to interrupt in the middle.

If the hardware is a register machine that has to load,
modify and store, then it usually is possible.

But if the compiler had made x a register variable, then it
would not be possible!!!

Here's another unpleasant hardware evilness.

Reading and writing a word is atomic, but on some machines
reading or writing a double is not!

    Thread 1		     Thread 2
    --------		     --------
     x = -1.0                 x = 2.0

It is possible for this code to result in x = -2.0!


Assignment 4, experiment 1
--------------------------

How do all these stacks get packed into an address space?

How can we find out?


Assignment 4, experiment 2
--------------------------

The race is on!

    Thread 1		     Thread 2
    --------		     --------
     i = n
     pthread_create
     while (i < 2n)	      while (i > 0)
       i++                      i--
     print "parent wins!"     print "child wins!"


This is an example of unsynchronized access to a shared variable.

Unsynrchonized means we cannot make a statement about the relative
order of statements in different threads.

Synchronization constraints are claims that we can make about
the order of statements in different threads.


Practice exam
-------------

Here are the solutions for the practice exam.  I strongly
recommend that you read the practice exam and think hard
about the questions before you look at the answers.


True or False

Creating a thread is faster than creating a process.

	 Usually true.

One way for processes to communicate with each other is
through direct memory access.

	 False.  DMA is a means of I/O

With a virtual memory system, user programs may access any
location in physical memory.

         False.

On some systems, a process's page tables are stored in the
process's address space.

         False.  If that were true, you would not be able
         to find a page that was paged out.

As the quantum size approaches zero, round-robin scheduling
becomes the same thing as FIFO scheduling.

         False.  But we haven't covered this yet.

If two threads make concurrent accesses to the same variable,
the results of the program will be nondeterministic.

         Not if both accesses are reads.

In 1997, the Manila stock exchange outperformed the Bangkok
exchange.

         We didn't cover this this year.


(4 points) On a parallel processor, a given process might not run on
the same processor all the time, but it is likely to run faster
if it does.  Why?

         Cache affinity.  As it runs, it accumulates state in cache.
	 If it switches to another processor, it loses it.

(4 points) Explain two reasons why a process's CPU time might be less
than its wall clock time.

         I spends time blocked on I/O; it spends time waiting
	 for other processes to run.

(4 points) What two events might occur that would require the
scheduler to choose a new process to run?

	 The process blocks on I/O; the process terminates.

(4 points) Name two other events that might occur that might inspire
the scheduler to preempt a running process and start another process.

         The timer expires; some other interrupt occurs.

(5 points) If context switches are very expensive, there are
circumstances when the best decision for the scheduler is to
leave the CPU idle, even if there are ready processes.  Why?

         If you know that an interrupt is going to occur soon,
	 it might not be worth starting a process.


Virtual memory
--------------

Virtual memory and page tables have come up several times.
I have been finding it awkward to talk around them, so I want
to take a minute to explain.

In most systems, there is special hardware on the CPU that
translates the memory addresses generated by running programs
from 

VIRTUAL ADDRESS:  an address relative to the address space
	          of a specific process

to

PHYSICAL ADDRESS: an address relative to the beginning of
	          physical memory

The unit that does this translation is sometimes called
the memory management unit (MMU).

The size of the VAS is determined by the size of the
addresses.  32-bit addresses yield a VAS of 2^32 bytes,
which is 4 GB.

Clearly, that is bigger than most physical memories (although
not for long), which means that not all virtual memory locations
are in memory at the same time.

There are three possible fates for a virtual memory location:

1) it might be in memory

2) it might be on disk

3) it might not be allocated yet, so it might not exist anywhere
   (example: all the space between the heap and the stack)

Most of the time, running processes occupy less than 4 MB
of memory, which is 1/1024 of the the available address space.

That's why, most of the time, you can fit all the _allocated_
memory for multiple processes in physical memory.


Pages
-----

The way this all works is that both VAS and PM are divided into
pages that are the same size.  Each allocated virtual page has
to be put somewhere in PM, but the nice thing is that it doesn't
matter where.

The pages associated with a process don't have to be in contiguous
physical memory!

The MMU translates VA to PA by

1) break the address into two pieces, a page number and an offset
   within that page

2) look the page number up in a table, and find the physical
   page (sometimes called a frame) where it resides

3) attach the offset to the physical page number... that's
   the address that gets sent to the memory unit


Example
-------

8-bit addresses, 64 byte pages/frames

How many pages in VAS?

Assume there are 32 frames in physical memory.  How
big is a physical page number?

Translation:

1) think of a VA as a 2-bit page number and a 6-bit offset

2) translate the page number into a 5-bit physical page
   number

3) append the 6-bit offset and send it to the memory unit


In real life, with big addresses, we have a couple of problems:

1) page tables are too big

2) translation is too slow.


Implementation/interface
------------------------

Later, we will look at the implementation details that solve
these problems.

For now, the important thing is to know the capabilities
this interface provides:

1) processes can allocate new pages any time there are physical
   pages available

2) if a process tries to access an unallocated page, a trap
   occurs

3) a page can be marked read only, in which case a write
   access will cause a trap

   a) useful for implementing copy on right

   b) useful for implementing shared text

4) the virtual address spaces of different processes overlap,
   but when one process generates address X, it maps to a different
   location than when another process generates address X