cs357 Lecture Notes Spring 2000 Week 4, Thursday There was no reading assignment for today, but you should be ready to hand in Homework 2. The reading assignment for next time is Chapter 4, Sections 4.1 through 4.4. Homework 3 will be distributed soon, and will be due next Thursday. Topics for today ---------------- 1) exam review 2) homework 2 3) Chapter 4 intro Homework 2 ---------- Start with a model of the system you are trying to explore. Use the model to predict the behavior of a simple experiment. Run the experiment and see what you get. Anomalies? 1) add to the model 2) make new "prediction" 3) test the new prediction independently Once you have a model that can explain the observed phenomena, you are ready to run a more complicated experiment. Model #1 -------- Simple two-level cache system, block size = word size Prediction: 1) L1 speed up to L1 cache size 2) L2 speed up to L2 cache size 3) memory speed thereafter Actual: not even close Blocks > words => graceful degradation as cache size is exceeded Model #2 -------- L1 block size = L2 block size Prediction: 1) as stride increases, performance degrades until stride = block size, then stops degrading Actual: pretty good agreement; we can infer block size One anomaly: 1) Performance starts to degrade before cache size is exceeded! (add associativity to the model) Everything explained (more or less). Complicate the experiment! Big strides ----------- Prediction: when stride >= 2 * block size, it should delay the onset of bad performance! Actual: 1) unexpected effect: performance of L2 cache is getting worse! 2) delayed onset doesn't appear until much bigger than the L1 block size What is still missing from this model? 1) no discussion of write-through/write-back behavior Exam ---- If we start with the r/w head in the middle of the disk (over the 20th track), and a request comes in to read a random track from somewhere on the disk, how long, on average, do we expect the seek time to be? 10 ms If the disk is rotating at 5400 rpm, how long do we expect the rotational latency to be, on average? 5400 rpm = 90 rps average rotational latency = 1/2 rev = 1/180 s If the transfer rate between the disk and memory is 20 MB/s, how big would the block size have to be to make the transfer time equal 400 $\mu$s? 20 KB would take 1 ms, or 2.5 time the allotted time. So we only have time for 8 KB. When the disk is idle, where would you place the r/w heads in order to minimize the average seek time of the next request that arrives? At the 1/4 and 3/4 points. What would the average seek time be for this two-head system? 5 ms In a real system, would you expect the average seek time between consecutive requests to be higher or lower than the previous answer? Explain why in a sentence or two. Lower because the sequence of disk accesses is likely to have high spatial locality, meaning that many seeks move a small number of tracks. If we expect an I/O operation to take a long time, it is generally a good idea to handle it asynchronously. True Some people type so fast that the CPU spends more than 10% of its cycles just handling keystrokes. False, not even close. Timesharing is similar to multiprogramming, except that it switches between processes more often in order to minimize response time. True While a DMA device is accessing memory, the bus is busy and the CPU cannot continue processing. False. The bus may be busy, but the CPU can chug away. Given a fixed number of transistors, you can generally build a bigger set-associative cache than you can build a fully-associative cache. True. The tags are smaller and the machinery is simpler. That leaves more space for data. Getting a block of data from the memory of another machine on a LAN is typically faster than getting a block of data from local disk. True. See the notes for an estimate. A typical CPU can execute more than 200 instructions in the time it takes to fetch a value from memory. No way. Memory access times are like 60 ns. Cycle times are like 2 ns, so you can do about 30 instructions per memory access. Every time a trap or interrupt occurs, the mode bit must be switched to kernel mode. What would happen if this mechanism failed? Kernel code would be executing in user mode, meaning that if it tried to execute a priviledged instruction, another trap would occur, and the OS would probably fail. Why do I/O devices have to be protected by the OS? Name two bad things an evil user could do, given direct access to an I/O device. 1) hog the device 2) make uncoordinated access to the device Imagine a machine that implements memory protection using base and limit registers as described in the book. Also imagine that it has a trap/interrupt vector that is stored in registers on the CPU. Finally imagine that there are special instructions to write values in those registers, and that these instructions are not priviledged. Explain exactly how an evil user could violate interprocess protection and read/write data in the memory space of another process. 1) change one of the vector entries to point to user code 2) wait for the corresponding interrupt/trap to happen 3) when it does, the evil user code will be executing in kernel mode 4) change the base and limit registers 5) access all the data you want! Boo-wa-ha-ha! Name one advantage of interrupts over polling and one advantage of polling over interrupts. Interrupts are more efficient, since the CPU does not waste cycles checking on pending operations. Polling is simpler to implement. This section is based on a cache-testing program like the one in HW2. You can assume that the array size, stride, and block size are all measured in 4-byte words. Assume: The size of the L1 cache is 4 KW (4096 words), and the access time is 10 ns. The size of the L2 cache is 32 KW, and the access time is 30 ns. The size of the blocks that are transferred from L2 to L1 cache are 8 words. The access time for memory is 60 ns. If the size of the array is 2 KW, how long does it take to traverse the array, reading and writing each word ONCE? After the first traversal, how long do each of the subsequent traversals take? It fits entirely in L1 cache, so there are 2048 reads and 2048 writes at 10 ns each = 40960 ns = 40.96 us If we increase the size of the array to 8 KW, how long does it take to traverse the array (ignoring the first traversal)? What if keep the array size at 8 KW, but we increase the stride to 2 W; in other words, we only access every other byte? Chapter 4 intro --------------- Processes, threads, interprocess communication. Processes --------- What's the API? What does the OS implementation look like? What does the HW interface look like (implementation of context switching)? Threads ------- What's the difference between a thread and a process? Again, what are the interfaces (API, HW) and implementation issues? Interprocess communication -------------------------- Implicit -- through shared memory I write, you read, we've communicated Explicit -- we create a socket and pass messages (underlying mechanism might be the OS, passing notes between processes on the same machine, or a network, passing between processes on different machines).