cs357 Lecture Notes Spring 2000 Week 11, Thursday For today you should have finished homework 7 and the take home exam. Homework 8 is available now. It is due next Thursday 27 April. As always, you should meet with your partner as soon as possible! For next time you should read Chapter 8. You can skim Sections 8.1 through 8.4 -- they develop the idea of memory management gradually, but the stuff that is most applicable to current systems is in Sections 8.5, 8.6 and 8.7 As always, make sure you get through the chapter and don't get bogged down. Exam solutions -------------- CPU Scheduling -------------- 1) Give priority to a process that has accumulated the least amount of total CPU. A process that has not run for long is expected to complete soon, so this heuristic is an approximation of SJF. As such, we expect it to yield good average turnaround time. Also, short processes are more likely to be interactive processes, so this heuristic might help with response time. 2) Give priority to a process that has made I/O requests during most of its recent quanta. By allowing an I/O bound process to proceed, we maximize the chance of overlapping the I/O of one process with the CPU of another, so this heuristic is likely to help with CPU utilization. Also, I/O bound processes are more likely to be interactive jobs, so this heuristic might help with response time. 3) Give a larger quantum to a process that has accumulated a lot of CPU time. This reduces the context-switch overhead of a long process, which helps CPU utilization. It probably does not help with either of the other goals. Concurrency ----------- Assume that two threads are running concurrently with the following shared variables (and a shared output stream): A gets printed before B. Maybe A gets printed before D. True Thread 1 prints i before Thread 2 prints D. Maybe When Thread 1 prints i, it prints the value 5. False G gets printed before B. Maybe If G gets printed before D then the final value of i is 5. True If H gets printed before G then the final value of i is 7. False (the premiss does not imply the conclusion) Describe in words how s1 is being used. As a condition variable. Describe in words how s2 is being used. As a mutex. Threads ------- Is i a shared variable? Why or why not? No. It is a local varaible in entry, so it is allocated on the stack, and stacks are not shared between threads. At first it may seem odd that the program prints two different values, but both variables appear to be at the same location. One possibility is that we are seing the result of two consecutive writes to the same location. What is another possibility? The other possibility is that the two threads have their stack in the same virtual location, but that the addresses are mapped to different physical locations (and hence different values) What experiment could you do to determine which explanation is correct? Explain both what you would do and how you would interpret the results. You could print the values of both variable after both have been written. If they are still different, then there must be two locations with two values. There are two ways to do this. If you just print the variables repeatedly, eventually you will see that both writes have taken place. Or you can use a synchronization mechanism to make sure that both threads wait until both threads have finished the write. This kind of sync is known as a barrier, and it can be implemented using sempahores thusly: Semaphore bar1 = 0; Semaphore bar2 = 0; Thread 1 Thread 2 -------- -------- int i = random (); int i = random (); signal (bar1); wait (bar1); wait (bar2); signal (bar2); print the value of i print the value of i This is not the most efficient thing in the world, but it creates the desired effect -- neither thread can proceed until both have reached the barrier. Challenge for the reader: write a general barrier that works for n threads such that none can proceed until all n reach the barrier, and then all n can proceed. Memory management ----------------- Physical memory systems old PC OS, some current Crays 1) Contiguous Allocation 2) Loading and linking 3) Swapping Virtual memory systems just about everything else 1) Frame allocation (page replacement policy) 2) Loading becomes trivial 3) Swapping is subsumed by page replacement 4) Address translation Loading and linking ------------------- loading = bringing a program from disk into memory linking = resolving the addresses that are embedded in the executable file Some linking happens at compile time. Some might have to be left until load time. Compiler (which generates the executable) and the loader (which is part of the OS) have to share a data structure that identifies any addresses that have to be resolved. Contiguous allocation --------------------- Hard problem, exactly analogous to heap allocation (malloc). As processes start and complete, they tend to leave lots of small, unused spaces between the allocated space. (see page 253) External fragmentation: available space is broken up into little bits such that there is not enough _contiguous_ space for a request. Strategies 1) first-fit 2) best-fit 3) worst-fit Is it feasible to move a program after it has started running? Depends on whether it contains absolute or relative addresses for its data references and jumps. Also depends on what information from the executable we can access at run time. Variations on the theme ----------------------- Swapping: while a process is suspended, we might write its memory to disk and then copy it back, either in the same location or elsewhere. Overlays: we might not have to have the entire program in memory; we might go get the rest of it later, or have chunks of code that occupy the same space alternately Segmentation: we might require contiguous allocation within each segment, but there is no reason to require the segments to be contiguous (Why is it better to allocate lots of little pieces?) Virtual memory systems ---------------------- No need for contiguous allocation, so external fragmentation goes away, but... fixed size pages => internal fragmentation Loading becomes trivial, since all (virtual) addresses can be resolved at compile time. We know at compile time exactly where in VM the program will reside, and we don't care where in PM. Address translation (review) ------------------- CPU generates virtual (logical) addresses MMU translates to physical addresses and presents translated address to the memory system. VA broken into a page number and offset within page. Page size --------- Page size determines how many bits are in the offset. 1K = 10 bits 4K = 12 bits How big is the page table for a 32-bit address space? 1K = 22 bits of page number = 2^22 pages = 4 million 4K = 20 bits of page number = 2^20 pages = 1 million Bigger pages -> smaller page tables Bigger pages -> more internal fragmentation (on average 1/2 page per segment per process) Page table representation ------------------------- Even 1 million entries is too much. Assuming each entry is 4 bytes, that's 4 MB just for the page table! Fortunately, most processes don't use most of memory most of the time. Solution: two level page table Break VAS into 2^10 books. Each book contains 2^10 pages. Each page contains 2^12 bytes. Level one: find the right book 2^10 4-byte entries = 4K Level two: find the right page 2^10 4-byte entries = 4K The book table fits on a page! One part of the page table fits on a page! Convenient to put page tables in physical memory. (or maybe even in virtual memory) Most books are completely empty. Probably just two books for most processes; total of 3 pages for the page table! Access time ----------- Every memory access (cache miss) requires: 1) read an entry from the book table (in memory) 2) read an entry from the page table (in memory) 3) read the value from memory We just tripled the memory access time! Aaarg! Fortunately, we have our old friend locality. Cache address translations!!! Address translation cache = translation lookaside buffer (TLB) TLB entry tag = 20 bit book number/page number data = physical page number TLB usually fully associative, to minimize contention, maximize hit rate, because hit rate is going to be very important for performance. (See the problem from the book). TLB behavior ------------ How many entries will a typical process have? 1 MB of allocated memory = 256 4K pages How many bits to name a physical page? 128 MB = 15 bits of physical page name How big is a TLB entry? 20 bits + 15 bits + flags = roughly 40 bits = 5 bytes How much TLB space per process? 5 bytes * 256 pages = 1-2 KB What happens when we switch processes? Maybe have to flush the TLB! Can we avoid that? Paging ------ Do all of a process's pages have to be in memory for the process to run? Nope. Page table entry has two possible results. Either: 1) a physical address, in which case address translation proceeds and the process continues running OR 2) a location on disk, in which case a) the process blocks b) the OS finds a free frame c) the OS issues a disk read d) when the read completes, the OS updates the page table entry e) the process resumes and reissues the instruction This process is called a page fault. Demand paging ------------- Like lazy swapping. Instead of moving all of a process's pages back into memory, we leave them on disk until the first time they are needed. Con: page faults take a long time (8 ms); screw up interactive response Pro: many pages never loaded! e.g. rare error-handling code (Implications for code generation?) Page replacement ---------------- What if a page fault occurs and there are no physical pages available? We have to kick someone out! b1) OS selects a frame b2) OS issues a disk write b3) when write completes, OS resumes page fault handler Optimizations 1) keep at least one free page so you can overlap writing the old page and reading the new one 2) if the victim page has not been modified (and is already on disk), you don't have to write it Page replacement algorithms ---------------------------