cs357 Lecture Notes Spring 2000 Week 6, Thursday For today you should have read the handout about threads and finished reading Chapter 4. Also, you should have picked up homework 4, compiled and run the code, printed documentation, and done the first experiment. For next time you should finish assignment 4 and prepare for the second exam, which is next Tuesday! Topics for today ---------------- 1) Threads, completed. 2) Homework questions? 3) Virtual memory. Atomicity --------- Imagine a threaded program with the initial state x = 1 (x is a shared variable). Thread 1 Thread 2 -------- -------- x++ x++ It seems like it doesn't matter what order these happen in; either way, x gets incremented twice. But it turns out that when this code gets compiled, it looks more like: Thread 1 Thread 2 -------- -------- t1 <- x t2 <- x t1 <- t1 + 1 t2 <- t2 +1 x <- t1 x <- t1 Now it is not clear whether x will get incremented once or twice! This example show that in order to understand syncronization, we have to understand which operations are atomic, and which are subject to interruption. If the hardware provides an instruction that increments the value of a memory location, as the VAX did, then it is usually not possible to interrupt in the middle. If the hardware is a register machine that has to load, modify and store, then it usually is possible. But if the compiler had made x a register variable, then it would not be possible!!! Here's another unpleasant hardware evilness. Reading and writing a word is atomic, but on some machines reading or writing a double is not! Thread 1 Thread 2 -------- -------- x = -1.0 x = 2.0 It is possible for this code to result in x = -2.0! Assignment 4, experiment 1 -------------------------- How do all these stacks get packed into an address space? How can we find out? Assignment 4, experiment 2 -------------------------- The race is on! Thread 1 Thread 2 -------- -------- i = n pthread_create while (i < 2n) while (i > 0) i++ i-- print "parent wins!" print "child wins!" This is an example of unsynchronized access to a shared variable. Unsynrchonized means we cannot make a statement about the relative order of statements in different threads. Synchronization constraints are claims that we can make about the order of statements in different threads. Practice exam ------------- Here are the solutions for the practice exam. I strongly recommend that you read the practice exam and think hard about the questions before you look at the answers. True or False Creating a thread is faster than creating a process. Usually true. One way for processes to communicate with each other is through direct memory access. False. DMA is a means of I/O With a virtual memory system, user programs may access any location in physical memory. False. On some systems, a process's page tables are stored in the process's address space. False. If that were true, you would not be able to find a page that was paged out. As the quantum size approaches zero, round-robin scheduling becomes the same thing as FIFO scheduling. False. But we haven't covered this yet. If two threads make concurrent accesses to the same variable, the results of the program will be nondeterministic. Not if both accesses are reads. In 1997, the Manila stock exchange outperformed the Bangkok exchange. We didn't cover this this year. (4 points) On a parallel processor, a given process might not run on the same processor all the time, but it is likely to run faster if it does. Why? Cache affinity. As it runs, it accumulates state in cache. If it switches to another processor, it loses it. (4 points) Explain two reasons why a process's CPU time might be less than its wall clock time. I spends time blocked on I/O; it spends time waiting for other processes to run. (4 points) What two events might occur that would require the scheduler to choose a new process to run? The process blocks on I/O; the process terminates. (4 points) Name two other events that might occur that might inspire the scheduler to preempt a running process and start another process. The timer expires; some other interrupt occurs. (5 points) If context switches are very expensive, there are circumstances when the best decision for the scheduler is to leave the CPU idle, even if there are ready processes. Why? If you know that an interrupt is going to occur soon, it might not be worth starting a process. Virtual memory -------------- Virtual memory and page tables have come up several times. I have been finding it awkward to talk around them, so I want to take a minute to explain. In most systems, there is special hardware on the CPU that translates the memory addresses generated by running programs from VIRTUAL ADDRESS: an address relative to the address space of a specific process to PHYSICAL ADDRESS: an address relative to the beginning of physical memory The unit that does this translation is sometimes called the memory management unit (MMU). The size of the VAS is determined by the size of the addresses. 32-bit addresses yield a VAS of 2^32 bytes, which is 4 GB. Clearly, that is bigger than most physical memories (although not for long), which means that not all virtual memory locations are in memory at the same time. There are three possible fates for a virtual memory location: 1) it might be in memory 2) it might be on disk 3) it might not be allocated yet, so it might not exist anywhere (example: all the space between the heap and the stack) Most of the time, running processes occupy less than 4 MB of memory, which is 1/1024 of the the available address space. That's why, most of the time, you can fit all the _allocated_ memory for multiple processes in physical memory. Pages ----- The way this all works is that both VAS and PM are divided into pages that are the same size. Each allocated virtual page has to be put somewhere in PM, but the nice thing is that it doesn't matter where. The pages associated with a process don't have to be in contiguous physical memory! The MMU translates VA to PA by 1) break the address into two pieces, a page number and an offset within that page 2) look the page number up in a table, and find the physical page (sometimes called a frame) where it resides 3) attach the offset to the physical page number... that's the address that gets sent to the memory unit Example ------- 8-bit addresses, 64 byte pages/frames How many pages in VAS? Assume there are 32 frames in physical memory. How big is a physical page number? Translation: 1) think of a VA as a 2-bit page number and a 6-bit offset 2) translate the page number into a 5-bit physical page number 3) append the 6-bit offset and send it to the memory unit In real life, with big addresses, we have a couple of problems: 1) page tables are too big 2) translation is too slow. Implementation/interface ------------------------ Later, we will look at the implementation details that solve these problems. For now, the important thing is to know the capabilities this interface provides: 1) processes can allocate new pages any time there are physical pages available 2) if a process tries to access an unallocated page, a trap occurs 3) a page can be marked read only, in which case a write access will cause a trap a) useful for implementing copy on right b) useful for implementing shared text 4) the virtual address spaces of different processes overlap, but when one process generates address X, it maps to a different location than when another process generates address X