cs357 Lecture Notes
Spring 2000
Week 11, Thursday

For today you should have finished homework 7 and the take
home exam.

Homework 8 is available now.  It is due next Thursday 27 April.
As always, you should meet with your partner as soon as possible!

For next time you should read Chapter 8.  You can skim Sections
8.1 through 8.4 -- they develop the idea of memory management
gradually, but the stuff that is most applicable to current 
systems is in Sections 8.5, 8.6 and 8.7

As always, make sure you get through the chapter and don't get
bogged down.

Exam solutions
--------------

CPU Scheduling
--------------

1) Give priority to a process that has accumulated
the least amount of total CPU.

A process that has not run for long is expected to
complete soon, so this heuristic is an approximation
of SJF.  As such, we expect it to yield good 
average turnaround time.

Also, short processes are more likely to be interactive
processes, so this heuristic might help with response
time.


2) Give priority to a process that has made I/O requests
during most of its recent quanta.

By allowing an I/O bound process to proceed, we maximize
the chance of overlapping the I/O of one process with
the CPU of another, so this heuristic is likely to help
with CPU utilization.

Also, I/O bound processes are more likely to be interactive
jobs, so this heuristic might help with response
time.


3) Give a larger quantum to a process that has accumulated
a lot of CPU time.

This reduces the context-switch overhead of a long process,
which helps CPU utilization.

It probably does not help with either of the other goals.


Concurrency
-----------

Assume that two threads are running concurrently with the
following shared variables (and a shared output stream):


A gets printed before B.

  Maybe

A gets printed before D.

  True

Thread 1 prints i before Thread 2 prints D.

  Maybe

When Thread 1 prints i, it prints the value 5.

  False

G gets printed before B.

  Maybe


If G gets printed before D then the final value of i is 5.

  True

If H gets printed before G then the final value of i is 7.

  False  (the premiss does not imply the conclusion)


Describe in words how s1 is being used.  

   As a condition variable.

Describe in words how s2 is being used.  

   As a mutex.


Threads
-------

Is i a shared variable?  Why or why not?

   No.  It is a local varaible in entry, so it is allocated
   on the stack, and stacks are not shared between threads.


At first it may seem odd that the program prints two
different values, but both variables appear to be at the same
location.  One possibility is that we are seing the result of two
consecutive writes to the same location.  What is another possibility?

   The other possibility is that the two threads have their
   stack in the same virtual location, but that the addresses
   are mapped to different physical locations (and hence different
   values)

What experiment could you do to determine which
explanation is correct?  Explain both what you would do and how
you would interpret the results.

   You could print the values of both variable after both
   have been written.  If they are still different, then
   there must be two locations with two values.

   There are two ways to do this.  If you just print the
   variables repeatedly, eventually you will see that both
   writes have taken place.

   Or you can use a synchronization mechanism to make sure
   that both threads wait until both threads have finished
   the write.  This kind of sync is known as a barrier, and
   it can be implemented using sempahores thusly:

   Semaphore bar1 = 0;
   Semaphore bar2 = 0;
   
   Thread 1                              Thread 2
   --------                              --------

   int i = random ();                    int i = random ();

   signal (bar1);                        wait (bar1);
   wait (bar2);                          signal (bar2);

   print the value of i                  print the value of i

   This is not the most efficient thing in the world, but it
   creates the desired effect -- neither thread can proceed
   until both have reached the barrier.

   Challenge for the reader: write a general barrier that
   works for n threads such that none can proceed until
   all n reach the barrier, and then all n can proceed.


Memory management
-----------------


Physical memory systems
  old PC OS, some current Crays

1) Contiguous Allocation

2) Loading and linking

3) Swapping


Virtual memory systems
   just about everything else

1) Frame allocation (page replacement policy)

2) Loading becomes trivial

3) Swapping is subsumed by page replacement

4) Address translation


Loading and linking
-------------------

loading = bringing a program from disk into memory

linking = resolving the addresses that are embedded in
          the executable file

Some linking happens at compile time.

Some might have to be left until load time.

Compiler (which generates the executable) and the loader
(which is part of the OS) have to share a data structure
that identifies any addresses that have to be resolved.


Contiguous allocation
---------------------

Hard problem, exactly analogous to heap allocation (malloc).

As processes start and complete, they tend to leave lots
of small, unused spaces between the allocated space.

   (see page 253)

External fragmentation: available space is broken up
into little bits such that there is not enough _contiguous_
space for a request.

Strategies

1) first-fit

2) best-fit

3) worst-fit

Is it feasible to move a program after it has started
running?

Depends on whether it contains absolute or relative addresses
for its data references and jumps.

Also depends on what information from the executable we
can access at run time.


Variations on the theme
-----------------------

Swapping: while a process is suspended, we might write
	  its memory to disk and then copy it back, either
	  in the same location or elsewhere.

Overlays: we might not have to have the entire program in
	  memory; we might go get the rest of it later, or
	  have chunks of code that occupy the same space
	  alternately

Segmentation: we might require contiguous allocation within
              each segment, but there is no reason to require
	      the segments to be contiguous

    (Why is it better to allocate lots of little pieces?)


Virtual memory systems
----------------------

No need for contiguous allocation, so external fragmentation
goes away, but...

     fixed size pages => internal fragmentation

Loading becomes trivial, since all (virtual) addresses can be
resolved at compile time.

We know at compile time exactly where in VM the program will
reside, and we don't care where in PM.


Address translation (review)
-------------------

CPU generates virtual (logical) addresses

MMU translates to physical addresses and presents translated
address to the memory system.

VA broken into a page number and offset within page.


Page size
---------

Page size determines how many bits are in the offset.

1K = 10 bits
4K = 12 bits

How big is the page table for a 32-bit address space?

1K = 22 bits of page number = 2^22 pages = 4 million
4K = 20 bits of page number = 2^20 pages = 1 million

Bigger pages -> smaller page tables

Bigger pages -> more internal fragmentation

       (on average 1/2 page per segment per process)


Page table representation
-------------------------

Even 1 million entries is too much.  Assuming each
entry is 4 bytes, that's 4 MB just for the page table!

Fortunately, most processes don't use most of memory
most of the time.

Solution: two level page table

Break VAS into 2^10 books.

Each book contains 2^10 pages.

Each page contains 2^12 bytes.


Level one: find the right book

      2^10 4-byte entries = 4K

Level two: find the right page

      2^10 4-byte entries = 4K

The book table fits on a page!

One part of the page table fits on a page!

Convenient to put page tables in physical memory.

    (or maybe even in virtual memory)

Most books are completely empty.

Probably just two books for most processes; total of 3 pages
for the page table!


Access time
-----------

Every memory access (cache miss) requires:

1) read an entry from the book table (in memory)

2) read an entry from the page table (in memory)

3) read the value from memory

We just tripled the memory access time!  Aaarg!

Fortunately, we have our old friend locality.

Cache address translations!!!

Address translation cache = translation lookaside buffer (TLB)

TLB entry

    tag  = 20 bit book number/page number
    data = physical page number

TLB usually fully associative, to minimize contention,
maximize hit rate, because hit rate is going to be
very important for performance.

     (See the problem from the book).


TLB behavior
------------

How many entries will a typical process have?

1 MB of allocated memory = 256 4K pages

How many bits to name a physical page?

128 MB = 15 bits of physical page name

How big is a TLB entry?

20 bits + 15 bits + flags = roughly 40 bits = 5 bytes

How much TLB space per process?

5 bytes * 256 pages = 1-2 KB


What happens when we switch processes?

Maybe have to flush the TLB!


Can we avoid that?


Paging
------

Do all of a process's pages have to be in memory for
the process to run?  Nope.

Page table entry has two possible results.  Either:

1) a physical address, in which case address translation
   proceeds and the process continues running

OR

2) a location on disk, in which case

   a) the process blocks

   b) the OS finds a free frame

   c) the OS issues a disk read

   d) when the read completes, the OS updates the page table entry

   e) the process resumes and reissues the instruction

This process is called a page fault.


Demand paging
-------------

Like lazy swapping.  Instead of moving all of a process's pages
back into memory, we leave them on disk until the first time
they are needed.

Con: page faults take a long time (8 ms); screw up interactive
     response

Pro: many pages never loaded!  e.g. rare error-handling code

     (Implications for code generation?)


Page replacement
----------------

What if a page fault occurs and there are no physical
pages available?

We have to kick someone out!

b1) OS selects a frame

b2) OS issues a disk write

b3) when write completes, OS resumes page fault handler


Optimizations

1) keep at least one free page so you can overlap
   writing the old page and reading the new one

2) if the victim page has not been modified (and is
   already on disk), you don't have to write it


Page replacement algorithms
---------------------------