cs357 Lecture Notes
Spring 2000
Week 8, Tuesday

For today you should have picked up homework 5, scheduled meeting
times with your partner, and gotten started on at least the first
experiment.  Homework 5 is due this Thursday.

Today (and Thursday)...

1) Chapter 5: CPU Scheduling

2) Homework 5 intro

3) Exam discussion

4) Homework 4 results


Homework 4 results
------------------

Virtual address space looks like what we thought.

	(Draw the picture!)

Children's stacks have about 2 MB each.

	(Express values in human-comprehensible terms!)

Can't get them to crash, segmentation fault happens first.

        (Most people were good about reporting accurately,
	 even if they did not have an explanation.)

	(Do a little math: how many recursions before the
	seg fault?  Is that enough to consume 2 MB?)

Tug of war: lousy results!

        (Plot run time versus n!)


CPU Scheduling continued...

Goals
-----

1) minimize the average turnaround time for all processes

2) satisfy real-time constraints, like requirements for
   interactive behavior

3) maintain high utilization of the CPU and the other hardware
   by overlapping the CPU of one process with the I/O of
   another

4) behave predictably and fairly


Strategies (heuristics)
-----------------------

FIFO:  first in first out

       pro: simple, and meets some definition of fairness

       con: bad average turnaround when short waits for long

            no priority mechanism: hard to meet real-time constraints


SJF:   shortest (time to completion or blocking) first

       pro: provably optimal (minimal) average turnaround time
            for a set of processes all available at t0

       con: we generally don't know the duration of each process's
            next CPU burst, and also the model assumed in the 
	    proof does not consider the dependency of successive
	    bursts


Round Robin:  give each process a little bit of time

       pro: generally good when there is large variability in duration

       con: overhead becomes signficant as quantum gets smalle

            pessimal when jobs are the same length


External priorities:

       pro: satisfies real time requirements

       con: burden on users/programmers

            starvation / bidding wars

Multi-level feedback queue:
     
       pro: reduces user burden

       con: can get fooled, manipulated

            occasional pathological behavior

	    (example: priority increases with I/O; what
	              happens to a thrashing process?)

             can be expensive to keep track of needed info


Non-deterministic scheduling:

     Example: lottery scheduling.

     Problem: for any deterministic system you can contruct
              pathological cases that yield bad performance

     Solution: a little randomness goes a long way

     Con: unpredictable execution model for programmers


Any idea what Linux is doing?

    Intro to Homework 5.


Real-time scheduling
--------------------

When processes arrive, we generally need to know

1) their resource requirements

2) their deadlines

3) sometimes a cost function (how bad would it be if
   we missed the deadline)

The goal of the real time scheduler is either

1) find any schedule that misses no deadlines

   (or maybe the best such schedule)

2) if not possible, minimize the total cost of all misses


Some real-time schedulers involve recurring tasks, in which
case they sometimes break the time line into cycles that
are the LCM of the cycle times of recurring tasks.

In the case where there is guaranteed to be a schedule that
misses no deadlines, and the cost of a context switch is zero,
there is a simple algorithm that is guaranteed to find a
successful schedule:

   Work on the Process with the Nearest Deadline First

Sound familiar?

Problem: if the schedule is untenable, this algorithm is
often pessimal.


Dependencies
------------

Complication: there are often dependencies among tasks.

1) the sooner you issue an I/O command, the sooner it completes

2) users might not submit the next process until this one
   completes

These dependencies limit the usefulness of simple queueing
models.

Instead, designers use heuristics like:

1) give processes priority if you expect them to do I/O soon

   (and why might you expect that?)

2) degrade the priority of things that have been running for
   a long time

   a) they are less likely to be interactive

   b) the relative cost of delays is declining


Evaluation techniques
---------------------

1) Analysis: only works for very simple strategies/models

2) Simulation: useful for checking out wide range of
               possibilities.

               depends on workload model or traces

3) Implementation: most reliable, but also depends on workload

(Which do Waldspurger and Weihl do?)

(What is their workload or workload model?)

(What is their performance metric?)


Exam 2 solutions
----------------

Part One: Fork

void recurse_and_fork (int n)
{
  pid_t pid;

  if (n == 0) return;

  printf ("Hello from level %d\n", n);
  pid = fork ();

  recurse_and_fork (n-1);
}  

int main ()
{
  recurse_and_fork (3);
}


(4 points) Is this program legal?  If not, why not?  If
so, what is the output?

Legal.  The output is

Hello from level 3.
Hello from level 2.
Hello from level 2.
Hello from level 1.
Hello from level 1.
Hello from level 1.
Hello from level 1.


(2 points) How many child processes did the program create?

   3000

(2 points) Did the child processes run concurrently with
the parent process?

    Yes.

(2 points) Did the child processes run concurrently with
each other?

    No.

(2 points) Name all the system calls that are invoked by
the parent process inside the loop.

    fork, wait

(2 points) Name all the system calls that are invoked by
the child processes.

    exit

(3 points) How much system time is charged to the parent
process for each child it creates?

    0.41 / 3000 = 140 us

(3 points) How much system time is charged to each child
process?

    0.88 / 3000 = 290 us

(4 points) Based on these results, do you think the child
processes are being charged for some part of the fork and wait
system calls?  Give evidence to support your claim, but also
indicate any alternative explanations.

    It seems likely that the child is being charged for part
    of the fork time (or maybe the wait time), since 290 us
    is a lot for just exiting, and it seems unlikely that it
    takes longer to do nothing and then exit than it does to
    create a new process.

(4 points) Let's say that you ran the program again and
got the following result...

What changes, if any, would you make to the answers you have
given above (questions 2-9)?

    I would modify my estimates to reflect the variability
    in the results.  Instead of saying 140 us, I would say
    "somewhere in the range from 90 us to 140 us".

(4 points) If you needed a more accurate estimate of the
per-process costs of forking, how would you do it?

    Increasing the number of processes would yield a more
    precise estimate of what we are already measuring, but
    it would not yield a more accurate estimate of the
    fork time.  To do that I would have to redesign the
    experiment to get at just the fork time.


*Part Two: True or False

When a process executes an I/O operation, it moves from
the ready state into the waiting/blocked state.

If two threads make unsynchronized access to a shared variable,
then the program is nondeterministic.

A context switch between threads of the same program is
generally slower than a context switch between processes.

When an interrupt occurs, the running process moves into
the ready state, and the scheduler must choose a new process to run.

Interprocess protection prevents one thread from reading
and writing the memory of other threads.

The sum of a process's CPU time and system time is usually
roughly equal to its wall clock time.

In a virtual memory system, a process can read and write any
address in the physical address space.


	All are false, except Number 4, which was sufficiently
	ambiguous that I decided to accept either answer.


*Part Three: Short answer questions

(4 points) Assume that the quantum allocated to a process
is 2 ms, and that each time the timer exipires it takes the OS
35 s to handle the interrupt, select a process and start
it.  How much overhead would be imposed on a single process
running by itself?

	35 us out of every 2 ms + 35 us = 1.72 %

	So many people misinterpreted this question that I
	decided not to count it.

(4 points) Linux implements fork using copy-on-write.
Other systems provide versions of fork that copy all the pages
in the parent's address space at the time of the fork.  Name
one advantage of each scheme.

    Advantages of copy-on-write: fork time is reduced, child
    can begin execution sooner, any pages that are never
    copied yield net benefit.

    Advantages of copy-at-fork-time: simpler to implement,
    no need to incur overhead of future traps.

    Note: neither one _necessarily_ yields less overhead;
    it depends on how many of the pages are eventually copied.

(4 points) How does a virtual address system facilitate
the implementation of copy-on-write?

    By marking a page read-only you can force future
    writes to cause a trap, giving the OS a chance to
    copy the page, update the page table, and return
    control to the process.

(4 points) How can you tell whether an operation like
x++ is atomic?  Why might it be important to know?

    The only way to tell is by looking at the implementation
    or documentation, or possibly by testing it many times
    for non-atomic behavior (although this is a one-way test).

    It is important to know you have to know in order to tell
    whether a program that contains x++ is deterministic.

(4 points) What is the difference between a process
and a thread?

    Processes have their own address spaces.  Threads share
    part of their address spaces, usually the text, static
    segment and heap.

(4 points) Name one advantage and two disadvantages of
user-level threads over kernel-level threads.

    This comes right out of the book, page 105

    Advantage of user-level: faster context switch (no kernel
                             intervention required)

    Disadvantages: unfair scheduling

                   one thread's I/O blocks the whole process


* Part Four: Synchronization

(4 points) Initial state:  x = 1

     Thread A                  Thread B
     --------                  --------
      x++                       x++
      x++                       print x

Output = 2, 3 or 4


(4 points) Initial state: x = 1, y = 3

     Thread A                  Thread B
     --------                  --------
      x = y                     y = x
      print x                   x = 7
      print y

Output = 11, 33, 71, 73

(4 points) Initial state: x = 5

     Thread A                  Thread B
     --------                  --------
      while (x == 5) {          x = 6
        // wait                 print x
      }
      x = 7
      print x

Output = 77, 67