cs357 Lecture Notes Spring 2000 Week 3, Thursday For today you should have read Chapter 3, and also picked up the HW2 code, compiled and run it, and started to think about what to do with the data. For next time you should work on the practice exam, finish hw2, and get ready for the exam. Also, the first exam is next Tuesday! It will cover Chapters 1-3 and the first homework. It may not take the whole class period. Interprocess protection (continued) ----------------------- I/O protection -------------- Special instructions? Make them priviledged. Memory-mapped I/O? Protect that part of memory. Trap/interrupt vector? Protect the instructions that write it. Memory protection ----------------- Book gives an example using a memory scheme with special registers for base and bounds. This is not a common scheme compared to virtual memory, but it makes for a simple example of how to use dual-mode operation to build memory protection. Before starting a user process, OS 1) sets the base and bound registers (remember that registers are on chip. setting registers is usually an instruction in the ISA. It would have to be protected. 2) sets the mode bit to USER 3) jumps to a known location in the user code Is it ok for the user process to _read_ the base and bound registers (assuming there is an operation that does it)? Every time the user code makes a memory access, the hardware checks the base and bounds, causing a trap if the address is out of bounds. CPU protection -------------- How do we make sure the user code eventually returns control to the OS? One way is by making a system call. Since the address of the system call is in system memory, trying to jump there causes a trap. Another way is if an interrupt occurs while the user code is running. But what if neither one happens for a really long time? What we need is the ability to generate an interrupt at a given future time. The timer does that. The OS can write a value into the timer that is decremented in real time. When it gets to zero, the hardware causes an interrupt, which returns control of the CPU to the OS. What does the OS do if there are no other processes to run? Memory caches reviewed ---------------------- Direct-mapped: For every memory location, there is exactly one place in the cache it can go. For every cache location, there are many possible values that might be there. The tag indicates which of the possible values is there. If two memory locations collide (map to the same cache location) only one of them can be resident at a time. Fully-associative: Any memory location can go anywhere in the cache. The tags have to be bigger, because there are more possible values that might be in any location. If the cache is full, we have to make a decision about which value to remove (usually random or a coarse approximation of LRU) Collisions are less deterministic, pathological behavior rarer, overall performance better. Price: bigger tags, more complicated circuitry, smaller for a given number of gates. Set associative: Multiple small fully-associative caches. Direct mapping from memory locations to caches, but multiple locations within each cache where the value can go. Experimental design ------------------- What we want to know: 1) how big are the L1 and L2 caches? 2) what is the block size for transferring things from memory to L2, L2 to L1, memory to L1? 3) what is the associativity of the caches? Eventually, I will ask you to design the experiment for yourself, but this time I am giving you the experimental apparatus and asking you to interpret the data. Here's how you should go about it. 1) Start with a simple experiment and figure out what behavior you expect in theory. 2) Look at the real data and compare it with your theoretical expectations. 3) Gradually make the experiment more complicated. Simple experiment ----------------- Create a big array and traverse it many times, reading and writing each word (4 bytes) once per traversal. Calculate the average time per read/write (How to measure something small. Repeat many times and divide by the number of repetitions.) (How to handle loop overhead) What performance do you expect when the array is smaller than the L1 cache? What about when it is bigger than the L2 cache? What about in between? How would you plot the data in order to test this theory? Next experiment --------------- What happens when 1) the size of the array is bigger than the L1 cache, but the stride is two words? four words? What happens when the stride is the same as the block size? And the process repeats from there. Add something to the experiment, look at results, check for anomalies and see if you can explain them, repeat. This is, by the way, a good way to present the results, as a series of experiments like this.