cs357 Lecture Notes
Spring 2000
Week 4, Thursday

There was no reading assignment for today, but you
should be ready to hand in Homework 2.

The reading assignment for next time is Chapter 4,
Sections 4.1 through 4.4.

Homework 3 will be distributed soon, and will be due
next Thursday.

Topics for today
----------------

1) exam review

2) homework 2

3) Chapter 4 intro


Homework 2
----------

Start with a model of the system you are trying to explore.

Use the model to predict the behavior of a simple experiment.

Run the experiment and see what you get.

Anomalies?

1) add to the model

2) make new "prediction"

3) test the new prediction independently

Once you have a model that can explain the observed
phenomena, you are ready to run a more complicated
experiment.


Model #1
--------

Simple two-level cache system, block size = word size

Prediction:

1) L1 speed up to L1 cache size

2) L2 speed up to L2 cache size

3) memory speed thereafter

Actual: not even close

Blocks > words  =>  graceful degradation as cache size is
                    exceeded


Model #2
--------

L1 block size = L2 block size

Prediction:

1) as stride increases, performance degrades until
   stride = block size, then stops degrading

Actual: pretty good agreement; we can infer block size

One anomaly:

1) Performance starts to degrade before cache size is exceeded!

   (add associativity to the model)

Everything explained (more or less).

Complicate the experiment!


Big strides
-----------

Prediction: when stride >= 2 * block size, it should delay
the onset of bad performance!

Actual:

1) unexpected effect: performance of L2 cache is getting
   worse!

2) delayed onset doesn't appear until much bigger than the
   L1 block size


What is still missing from this model?

1) no discussion of write-through/write-back behavior
 

Exam
----

If we start with the r/w head in the middle of the
disk (over the 20th track), and a request comes in to read
a random track from somewhere on the disk, how long, on
average, do we expect the seek time to be?

	 10 ms

If the disk is rotating at 5400 rpm, how long do we
expect the rotational latency to be, on average?

       5400 rpm = 90 rps

       average rotational latency = 1/2 rev = 1/180 s


If the transfer rate between the disk and memory is
20 MB/s, how big would the block size have to be to make
the transfer time equal 400 $\mu$s? 

    20 KB would take 1 ms, or 2.5 time the allotted time.

    So we only have time for 8 KB.

When the disk is idle, where would you place the r/w
heads in order to minimize the average seek time of the next
request that arrives?

	At the 1/4 and 3/4 points.

What would the average seek time be for this two-head
system?

	5 ms

In a real system, would you expect the average seek time between
consecutive requests to be higher or lower than the previous answer?
Explain why in a sentence or two.

	Lower because the sequence of disk accesses is likely
	to have high spatial locality, meaning that many seeks
	move a small number of tracks.


If we expect an I/O operation to take a long time,
it is generally a good idea to handle it asynchronously.

   True

Some people type so fast that the CPU spends more
than 10% of its cycles just handling keystrokes.

   False, not even close.

Timesharing is similar to multiprogramming, except that
it switches between processes more often in order to
minimize response time.

   True

While a DMA device is accessing memory, the bus is
busy and the CPU cannot continue processing.

   False.  The bus may be busy, but the CPU can chug away.

Given a fixed number of transistors, you can generally
build a bigger set-associative cache than you can build a
fully-associative cache.

   True.  The tags are smaller and the machinery is simpler.
   That leaves more space for data.

Getting a block of data from the memory of another machine
on a LAN is typically faster than getting a block of data from
local disk.

   True.  See the notes for an estimate.

A typical CPU can execute more than 200 instructions in
the time it takes to fetch a value from memory.

   No way.  Memory access times are like 60 ns.  Cycle
   times are like 2 ns, so you can do about 30 instructions
   per memory access.


Every time a trap or interrupt occurs, the mode bit
must be switched to kernel mode.  What would happen if
this mechanism failed?

     Kernel code would be executing in user mode, meaning
     that if it tried to execute a priviledged instruction,
     another trap would occur, and the OS would probably
     fail.

Why do I/O devices have to be protected by the OS?
Name two bad things an evil user could do, given direct
access to an I/O device.

       1) hog the device
       2) make uncoordinated access to the device

Imagine a machine that implements memory protection using
base and limit registers as described in the book.  Also
imagine that it has a trap/interrupt vector that is stored
in registers on the CPU.  Finally imagine that there are
special instructions to write values in those registers,
and that these instructions are not priviledged.

Explain exactly how an evil user could violate interprocess
protection and read/write data in the memory space of another
process.

   1) change one of the vector entries to point to user code
   2) wait for the corresponding interrupt/trap to happen
   3) when it does, the evil user code will be executing in
      kernel mode
   4) change the base and limit registers
   5) access all the data you want!  Boo-wa-ha-ha!

Name one advantage of interrupts over polling and one
advantage of polling over interrupts.

   Interrupts are more efficient, since the CPU does not waste
   cycles checking on pending operations.

   Polling is simpler to implement.


This section is based on a cache-testing program like
the one in HW2.  You can assume that the array size,
stride, and block size are all measured in 4-byte words.
Assume:


The size of the L1 cache is 4 KW (4096 words),
and the access time is 10 ns.

The size of the L2 cache is 32 KW, and the access time is
30 ns.

The size of the blocks that are transferred from
L2 to L1 cache are 8 words.

The access time for memory is 60 ns.


If the size of the array is 2 KW, how long does it
take to traverse the array, reading and writing each word
ONCE?

After the first traversal, how long do each of the
subsequent traversals take?

    It fits entirely in L1 cache, so there are 2048 reads
    and 2048 writes at 10 ns each = 40960 ns = 40.96 us

If we increase the size of the array to 8 KW, how
long does it take to traverse the array (ignoring the first
traversal)?

What if keep the array size at 8 KW, but we increase
the stride to 2 W; in other words, we only access every
other byte?


Chapter 4 intro
---------------

Processes, threads, interprocess communication.

Processes
---------

What's the API?

What does the OS implementation look like?

What does the HW interface look like (implementation of
     context switching)?


Threads
-------

What's the difference between a thread and a process?

Again, what are the interfaces (API, HW) and implementation issues?


Interprocess communication
--------------------------

Implicit -- through shared memory
	    I write, you read, we've communicated

Explicit -- we create a socket and pass messages

	 (underlying mechanism might be the OS, passing notes
	 between processes on the same machine, or a network,
	 passing between processes on different machines).