cs357 Lecture Notes
Spring 2000
Week 2, Thursday

For today you should have read Sections 2.1 through 2.4
and prepared an answer for Question 2.4

Last time we got up through Section 2.2
and talked about all the possible ways I/O can be done

For next time, read Sections 2.5 - 2.7.
Read the questions at the end of the chapter, but you don't
have to write answers for them.

We'll have a quiz that will include a question from material
we have already covered in lecture and a question from the
reading.

Section 2.2
-----------

How does data get from the CPU to somewhere else?

1) CPU talks to the controller in one of two possible ways
   a) special instruction
   b) pseudo-memory access  (memory-mapped I/O)

2) CPU might wait for the controller to finish before
   proceeding (synchronous I/O) or go off and do something
   else (asynchronous).

3) If the operation produces a result, the CPU might have to
   poll (ask repeatedly) to see if it is done, or the
   device might generate an interrupt when it is done.

4) If the destination of the result is memory, we have another
   choice.  The CPU can explicitly move the data from the
   device into memory, or the device might move it into
   memory without interrupting the CPU.  The latter is called
   DMA -- direct memory access.
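
To make choice 3 concrete, here is a minimal Python sketch of polling
versus interrupts.  Everything in it is made up for illustration: the
SimulatedDevice class, the idea of time advancing in "ticks", and the
doubling that stands in for real device work.  It is a toy model of
the control flow only, not of how any real controller behaves.

```python
class SimulatedDevice:
    """Hypothetical device that finishes an operation after a fixed number of ticks."""

    def __init__(self, latency_ticks, on_done=None):
        self.latency_ticks = latency_ticks
        self.on_done = on_done      # stand-in for an interrupt handler (may be None)
        self.remaining = 0
        self.pending = None
        self.result = None

    def start(self, value):
        """Begin an operation; the result appears latency_ticks ticks later."""
        self.remaining = self.latency_ticks
        self.pending = value
        self.result = None

    def tick(self):
        """Advance simulated time by one step."""
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.result = self.pending * 2   # placeholder for real work
                if self.on_done is not None:
                    self.on_done(self.result)    # "raise the interrupt"

    def is_done(self):
        return self.result is not None


def run_polling(latency_ticks):
    """CPU asks repeatedly whether the device is done and does nothing else."""
    dev = SimulatedDevice(latency_ticks)
    dev.start(21)
    polls = 0
    while not dev.is_done():
        polls += 1          # each pass is a wasted status check
        dev.tick()
    return dev.result, polls


def run_interrupt(latency_ticks):
    """CPU does other work; the device delivers the result via a callback."""
    received = []
    dev = SimulatedDevice(latency_ticks, on_done=received.append)
    dev.start(21)
    useful_work = 0
    while not received:     # in hardware the CPU would not check; this just ends the simulation
        useful_work += 1    # time spent on something else
        dev.tick()
    return received[0], useful_work
```

Both functions get the same result after the same number of ticks.
The difference is what the CPU was doing in the meantime: counting
status checks in one case, doing other work in the other.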


We talked about 2-4 last time.  The one we left out was the
first one.

Question 2.4: For what types of operations is DMA useful?

What are the two axes along which we categorize I/O devices?


Section 2.3
-----------

CPU stores a small amount of information (32 words)
in registers.

Additional information (128 KB) on chip in cache.

More information (1 MB) in an off-chip cache.

More information (128 MB) in main memory.

More information (10 GB) on a hard drive.


Notice that the abstract picture of the memory hierarchy
is not necessarily similar to our model of the system
structure.


Each layer of the memory hierarchy is

1) bigger than the previous

2) slower than the previous

3) cheaper per unit of storage than the previous


Let's put some numbers in the table to help start calibrating
our instincts for system performance.


Device           Access time    Typical size*    Cost/byte

register           2 ns           128 B          hard to quantify

on-chip cache     10 ns           128 KB         hard to quantify

off-chip cache    20 ns             1 MB

main memory       60 ns           128 MB

disk               8 ms             8 GB


* on a current single-processor, $1000 workstation
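
One way to calibrate with these numbers is to compute how much slower
each level is than the one above it.  A small Python sketch follows,
using only the access times from the table; the names ACCESS_TIME_NS
and slowdown_between_levels are invented here.

```python
# Access times from the table, converted to nanoseconds.
ACCESS_TIME_NS = {
    "register": 2,
    "on-chip cache": 10,
    "off-chip cache": 20,
    "main memory": 60,
    "disk": 8_000_000,   # 8 ms
}


def slowdown_between_levels(times):
    """Return {(faster, slower): ratio} for each pair of adjacent levels."""
    names = list(times)
    return {(a, b): times[b] / times[a] for a, b in zip(names, names[1:])}


ratios = slowdown_between_levels(ACCESS_TIME_NS)
for (fast, slow), ratio in ratios.items():
    print(f"{fast} -> {slow}: {ratio:,.0f}x slower")
```

Every step down costs a single-digit factor until the last one.
Main memory to disk is a factor of more than 100,000.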


Where does that huge gap come from, between memory and disk?

That's the difference between mechanical things and electronic
things.  Electrical signals move faster than big metal things.


Magnetic disks
--------------

Time to get something from disk =

    time to move the arm from track to track  (2-18 ms)

+   time to wait for the data to come under the head (0-16 ms)

+   time to transfer the data (block size / transfer rate)

                          (512 B / 20 Mbps = about 200 us)

512 B is the amount of data in each sector on the drive,
but it doesn't have to be the unit of transfer between
the drive and memory (the block size).
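
The formula above can be sketched in Python.  The function name
disk_access_time_ms and its parameters are made up for this example,
and it assumes Mbps means 10^6 bits per second.  The seek and rotation
ranges are the ones given above.

```python
def disk_access_time_ms(seek_ms, rotation_ms, block_bytes, rate_bits_per_s):
    """Seek time + rotational latency + transfer time, in milliseconds."""
    transfer_ms = block_bytes * 8 / rate_bits_per_s * 1000
    return seek_ms + rotation_ms + transfer_ms


RATE = 20e6   # 20 Mbps, assumed to be 20 * 10^6 bits per second

# Best and worst cases for one 512 B sector, using the ranges above.
best_ms = disk_access_time_ms(2, 0, 512, RATE)
worst_ms = disk_access_time_ms(18, 16, 512, RATE)
print(f"512 B: best {best_ms:.2f} ms, worst {worst_ms:.2f} ms")

# How the transfer term alone grows with block size.
for block_bytes in (512, 4096, 65536):
    transfer_ms = disk_access_time_ms(0, 0, block_bytes, RATE)
    print(f"{block_bytes:6d} B block: transfer {transfer_ms:.2f} ms")
```

For a 512 B block the transfer term is a small fraction of the total,
which is worth keeping in mind for the questions below.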

What do we think of the block size for this drive?

How can we speed up seek time?

How can we speed up rotational latency?

Is it important to speed up transfer time?


Consequences
------------

1) Why bother moving programs from disk into memory?
   Why not run programs from disk?

   Disks are too damn slow.


2) In that case, why not keep everything in memory all
   the time?

   a) too small, but that's not a good answer because the
      natural question is _why_ is it too small

   b) too expensive to have enough memory to do that**

   c) too volatile


Desperate times
---------------

Both memory and disk are getting bigger over time.

Memory has been growing faster, so the size discrepancy
has shrunk (and will probably continue to shrink).

The performance gap has not changed at all.

People are becoming increasingly desperate to compensate
for the inadequacy of disks:

1) In effect, people do keep all their programs resident
   in memory.  Only data has to go to disk.

2) Frequently-used data gets cached on disk (although we
   will examine some weirdnesses about this design later).

3) Non-volatile memory has been the next great thing for
   a long time.

4) Lots of attempts to find something to fill this gap.
   Has to be

   a) faster than disk
   b) cheaper per unit than memory

5) One possibility is to harvest unused memory in idle machines.

   What is the transfer time of a 4 KB block on a 100 Mbps Ethernet?

   200 us + 4 KB / 100 Mbps = 200 us + 320 us = 520 us, or about 0.5 ms

   About a factor of 10 better than disk, and FREE !!
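
   The same arithmetic as a Python sketch.  The function name
   remote_fetch_us is invented, the 200 us is treated as a fixed
   per-request overhead, 4 KB is taken as 4096 bytes, and the 8 ms
   disk figure is the one from the table in Section 2.3.  With 4096
   bytes the wire time comes to about 328 us rather than the rounded
   320 us.

```python
def remote_fetch_us(block_bytes, link_bits_per_s, fixed_overhead_us):
    """Fixed overhead plus time on the wire, in microseconds."""
    wire_us = block_bytes * 8 / link_bits_per_s * 1e6
    return fixed_overhead_us + wire_us


# 4 KB block, 100 Mbps Ethernet, 200 us overhead (all from the notes above).
remote_us = remote_fetch_us(4 * 1024, 100e6, 200)

DISK_ACCESS_US = 8000   # 8 ms, from the Section 2.3 table
speedup = DISK_ACCESS_US / remote_us

print(f"remote memory: {remote_us:.0f} us, about {speedup:.0f}x faster than disk")
```

   Under these assumptions the ratio lands in the same ballpark as
   the factor of 10 quoted above.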


Caching
-------

Definition: moving data up a level in the storage hierarchy,
	    usually temporarily

There are lots of variations on the idea that appear in
different places.

who?	  user explicit (tape to disk)
	  user implicit (execute a program)
	  compiler	(memory to registers)
	  system	(virtual memory, disk to memory)
	  hardware	(L1 and L2 caches)

what?	  how much data gets moved at a time depends on
	  the performance characteristics of the two levels
	  involved, the data transfer rate between them,
	  and the access patterns

	  1) bigger performance gap?  get more data.
	  2) slower transfer rate?  get more data.
	  3) predictable access pattern?  get more data.

when?	  on demand or before (prefetching)

where?	  (a) everywhere in the hierarchy

	  (b) cache policy -- who goes where in the cache,
	                      who gets kicked out first

why?      Locality, locality, locality.

	  1) temporal locality: use it recently, use it soon
	     library books, data, text
	     suggests policies like LRU, LFU

	  2) spatial locality: use me, use my neighbor
	     library books, data, text
             suggests policies like prefetching, large block size

how?	  What are the implementation difficulties:

	  1) reliability:  well, you think you wrote it, but you
	                   didn't!
          2) consistency:  modify a cached copy, when does the
			   original get updated?
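
To tie the "where?" and "how?" items together, here is a minimal
Python sketch of a cache that evicts by LRU and writes modified
entries back only on eviction.  It is one possible policy, not the
only one.  The class WriteBackLRUCache and the dict standing in for
the slower level are invented for illustration.

```python
from collections import OrderedDict


class WriteBackLRUCache:
    """Tiny LRU cache in front of a slower store (modeled here as a dict)."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing = backing_store    # stand-in for the next level down
        self.entries = OrderedDict()    # key -> (value, dirty), oldest first

    def read(self, key):
        if key in self.entries:                  # hit: mark as most recently used
            self.entries.move_to_end(key)
            return self.entries[key][0]
        value = self.backing[key]                # miss: fetch from the slower level
        self._install(key, value, dirty=False)
        return value

    def write(self, key, value):
        # Only the cached copy changes; the original is now stale.
        self._install(key, value, dirty=True)

    def _install(self, key, value, dirty):
        if key in self.entries:
            dirty = dirty or self.entries[key][1]
        self.entries[key] = (value, dirty)
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            # Evict the least recently used entry; write it back if modified.
            old_key, (old_value, old_dirty) = self.entries.popitem(last=False)
            if old_dirty:
                self.backing[old_key] = old_value

    def flush(self):
        """Write every modified entry back to the slower level."""
        for key, (value, dirty) in list(self.entries.items()):
            if dirty:
                self.backing[key] = value
                self.entries[key] = (value, False)


disk = {"a": 1, "b": 2, "c": 3}
cache = WriteBackLRUCache(2, disk)
cache.read("a")
cache.write("b", 20)
stale = disk["b"]     # the original has not been updated yet
cache.read("c")       # cache is full: evicts "a" (least recently used, clean)
cache.read("a")       # evicts "b", which is dirty, so it is written back now
```

Between the write and the eviction, the cached copy and the original
disagree.  If the machine lost power in that window, the write would
be gone, which is the reliability problem in item 1.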