cs357 Lecture Notes
Spring 2000
Week 2, Thursday

For today you should have read Sections 2.1 through 2.4 and prepared an
answer for Question 2.4.

Last time we got up through Section 2.2 and talked about all the
possible ways I/O can be done.

For next time, read Sections 2.5 - 2.7. Read the questions at the end of
the chapter, but you don't have to write answers for them. We'll have a
quiz that will include a question from material we have already covered
in lecture and a question from the reading.

Section 2.2
-----------

How does data get from the CPU to somewhere else?

1) CPU talks to the controller in one of two possible ways
   a) special instruction
   b) pseudo-memory access (memory-mapped I/O)

2) CPU might wait for the controller to finish before proceeding
   (synchronous I/O) or go off and do something else (asynchronous I/O).

3) If the operation produces a result, the CPU might have to poll (ask
   repeatedly) to see if it is done, or the device might generate an
   interrupt when it is done.

4) If the destination of the result is memory, we have another choice.
   The CPU can explicitly move the data from the device into memory, or
   the device might move it into memory without interrupting the CPU.
   The latter is called DMA -- direct memory access.

We talked about 2-4 last time. The one we left out was the first one.

Question 2.4: For what types of operations is DMA useful?

What are the two axes along which we categorize I/O devices?

Section 2.3
-----------

CPU stores a small amount of information (32 words) in registers.
Additional information (128 KB) on chip in cache.
More information (1 MB) in an off-chip cache.
More information (128 MB) in main memory.
More information (10 GB) on a hard drive.

Notice that the abstract picture of the memory hierarchy is not
necessarily similar to our model of the system structure.

Each layer of the memory hierarchy is
1) bigger than the previous
2) slower than the previous
3) cheaper per unit of storage than the previous

Let's put some numbers in the table to help start calibrating our
instincts for system performance.

Device           Access time   Typical size*   Cost/byte
register         2 ns          128 B           hard to quantify
on-chip cache    10 ns         128 KB          hard to quantify
off-chip cache   20 ns         1 MB
main memory      60 ns         128 MB
disk             8 ms          8 GB

* on a current single-processor, $1000 workstation

Where does that huge gap come from, between memory and disk? That's the
difference between mechanical things and electronic things. Electrical
signals move faster than big metal things.

Magnetic disks
--------------

Time to get something from disk =
    time to move the arm from track to track (2-18 ms)
  + time to wait for the data to come under the head (0-16 ms)
  + time to transfer the data (block size / transfer rate)
    (512 B / 20 Mbps = 200 us)

512 B is the amount of data in each sector on the drive, but it doesn't
have to be the unit of transfer between the drive and memory (the block
size).

What do we think of the block size for this drive?
How can we speed up seek time?
How can we speed up rotational latency?
Is it important to speed up transfer time?

Consequences
------------

1) Why bother moving programs from disk into memory? Why not run
   programs from disk? Disks are too damn slow.

2) In that case, why not keep everything in memory all the time?
   a) too small, but that's not a good answer because the natural
      question is _why_ is it too small
   b) too expensive to have enough memory to do that**
   c) too volatile

Desperate times
---------------

Both memory and disk are getting bigger over time. Memory has been
growing faster, so the size discrepancy has shrunk (and will probably
continue to shrink). The performance gap has not changed at all.

People are becoming increasingly desperate to compensate for the
inadequacy of disks:

1) In effect, people do keep all their programs resident in memory.
   Only data has to go to disk.

2) Frequently-used data gets cached on disk (although we will examine
   some weirdnesses about this design later).

3) Non-volatile memory has been the next great thing for a long time.

4) Lots of attempts to find something to fill this gap. Has to be
   a) faster than disk
   b) cheaper per unit than memory

5) One possibility is to harvest unused memory in idle machines.
   What is the transfer time of a 4 KB block on a 100 Mbps Ethernet?
   200 us + 4 KB / 100 Mbps = 200 us + 320 us = about 0.5 ms
   About a factor of 10 better than disk, and FREE !!

Caching
-------

Definition: moving data up a level in the storage hierarchy, usually
temporarily

There are lots of variations on the idea that appear in different
places.

who?   user explicit (tape to disk)
       user implicit (execute a program)
       compiler (memory to registers)
       system (virtual memory, disk to memory)
       hardware (L1 and L2 caches)

what?  how much data gets moved at a time depends on the performance
       characteristics of the two levels involved, the data transfer
       rate between them, and the access patterns
       1) bigger performance gap? get more data.
       2) slower transfer rate? get more data.
       3) predictable access pattern? get more data.

when?  on demand or before (prefetching)

where? (a) everywhere in the hierarchy
       (b) cache policy -- who goes where in the cache, who gets kicked
           out first

why?   Locality, locality, locality.
       1) temporal locality: use it recently, use it soon
          library books, data, text
          suggests policies like LRU, LFU
       2) spatial locality: use me, use my neighbor
          library books, data, text
          suggests policies like prefetching, large block size

how?   What are the implementation difficulties:
       1) reliability: well, you think you wrote it, but you didn't!
       2) consistency: modify a cached copy, when does the original get
          updated?
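
To make the "why?" point concrete, here is a minimal sketch of an LRU
(least recently used) cache, the kind of policy temporal locality
suggests. It is only an illustration, not how any particular system
implements caching. The names LRUCache, backing_store, and read are
hypothetical, the "disk" is just a Python dict, and the cache is
read-only, so the reliability and consistency problems listed under
"how?" are deliberately left out.

```python
from collections import OrderedDict


class LRUCache:
    """Read-only LRU cache in front of a slower backing store (a sketch)."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity            # max number of cached blocks
        self.backing_store = backing_store  # stand-in for the slower level
        self.entries = OrderedDict()        # oldest use first, newest last
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.entries:
            # Hit: mark the block as most recently used.
            self.entries.move_to_end(block)
            self.hits += 1
            return self.entries[block]
        # Miss: fetch on demand from the slower level and cache it.
        self.misses += 1
        data = self.backing_store[block]
        self.entries[block] = data
        if len(self.entries) > self.capacity:
            # Evict the least recently used block (front of the dict).
            self.entries.popitem(last=False)
        return data


# Assumed toy workload: 8 "disk" blocks, room for 2 in the cache.
disk = {n: f"block-{n}" for n in range(8)}
cache = LRUCache(capacity=2, backing_store=disk)
for n in [0, 1, 0, 2, 1]:
    cache.read(n)
print(cache.hits, cache.misses, list(cache.entries))
```

In this trace the second read of block 0 is the only hit. Reading
block 2 then evicts block 1 (the least recently used), so the final
read of block 1 misses again and evicts block 0.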