Software Systems
Spring 2005

For today, you should have:

1) finished Homework 2
2) read Tanenbaum: Processes and Threads

Outline:

1) Homework 2 discussion
2) The process abstraction
3) Address space
4) CPU scheduling

For next time you should:

1) Prepare for the exam.
2) Read Tanenbaum pages 132 to 142, and answer the reading questions below.

Quiz 2
------

Not quite as good as the first one... avg = 7.5

2) Seek time is _only_ the time to move the read/write head. Rotational latency is the time for the data to come around.

       total latency = seek time + rotational latency + overhead

   See the example on page 684 of Hennessy and Patterson.

3) Many people misunderstood Question 3. In general, "data rate" means something pertaining to bandwidth. In this case, I was asking about the rate at which data can be read off the medium, which is the rate at which data passes under the read/write head. Think of it like a lathe tool peeling data off a spinning piece. I didn't grade this part of the quiz.

5) Moving a bigger block into cache only helps if there is _spatial_ locality. If there is only temporal locality, big blocks actually hurt.

Workload reports: Of the 18 people who took the quiz,

     7 reported that they are spending   6 - 7.5 hours per week
    10                                    8 - 9.5
     1                                   10 +

So I think we are right on target; maybe a little low. Since you have additional time you could/should be allocating to this class, I will expect high quality homework, well presented, and I expect people to do the reading.

The exam
--------

Thinking about the exam (next Tuesday!) ...

There will be short answer questions on the exam. Your answers should be responsive, correct, concise, clear and legible! Correct is usually good for about half credit.

Resist the temptation to quote from the readings/notes. The quote is likely to be true, but very unlikely to be specifically responsive to the question, and sometimes context-sensitive.
Also, part of the reason for these questions is to test your ability to express your ideas using the vocabulary of the class. Use vocabulary precisely, but speak plainly! Address your answer as if another student in the class had asked the question.

The first exam will be only one hour. We will have class for the first 40 minutes, a ten minute break, and the exam for the last 60 minutes.

The exam will include both completed topics and topics in progress. (Questions on topics in progress will be easier!)

I love to introduce new things on an exam, for two reasons:

1) it makes the exam a learning experience,
2) it tests your ability to generalize from the examples we have seen and apply abstract principles to concrete examples.

Homework 2
----------

Review of results.

Some thoughts about experimental design, data analysis and data visualization:

1) We often take the axes for granted, but the choice of parameter space is a crucial part of experimental design. You have to adjust the range for size and stride. A log transform is a _really_ good idea.
2) Cartooning data is important for figuring out what to expect.
3) Start with something simple enough to understand, and add complexity gradually.
4) Strike a balance between enough data to mitigate noise and so much data that you can't understand it.

Complications:

1) When the stride is much larger than the block size, the program is only using a small part of the array, and the part it uses might fit into cache, even if the array doesn't.
2) For some reason, the smallest array sizes see L2 latencies!

Reading questions from Tanenbaum
--------------------------------

1) What is the difference between a program and a process?
2) If a program contains two consecutive statements, A and B, how much real time will elapse between the execution of A and the execution of B?
3) At the bottom of page 73, the list of "events that cause processes to be created" is nonsense. Why?
4) What's a daemon?
5) How do processes end?
6) In UNIX, what is the first process to execute, and how does it get loaded?
7) What states can a process be in, and what events cause a process to transition from one state to another?
8) What are the entries in the process table? What information is stored in an entry?
9) What is an interrupt? What is the interrupt vector?
10) When an interrupt occurs, how does the hardware state of the running process get saved?

Address space
-------------

Review of what we learned last time:

Address space: The set of memory addresses a process can access. With 32-bit addresses, the address space is the set of hex numbers from 0000 0000 to ffff ffff.

The compiler and run-time system work together to arrange data in the address space.

At compile time:

1) the compiler generates the program text, which goes in the TEXT segment
2) if the compiler can tell how big something is, it can allocate it statically, in the STATIC segment
3) the compiler also generates procedure entry-exit code

At run time:

1) when the procedure entry-exit code runs, it (de)allocates frames on the run-time stack, in the STACK segment
2) if the code executes malloc, the run-time system allocates space dynamically, in the HEAP segment

The standard arrangement of these things is something like:

    STACK (growing down)
    space
    HEAP (growing up)
    STATIC
    TEXT

In the experiment we ran yesterday, I got the following output, which I have sorted by address.

    Address of x      is 0xbfe21194
    Address of a      is 0xbfe1d2f0    STACK
    Address of a      is 0xbfe19440

    Address of rv     is 0x085eaf60
    Address of rv     is 0x085e9018    HEAP
    Address of r1     is 0x085e9008

    Address of global is 0x08049724    STATIC

    Address of main   is 0x08048414    TEXT

It looks like the stack might have started at bfff ffff rather than the top of memory at ffff ffff. So that makes us wonder what those high addresses are being used for.

Google challenge:

1) what are addresses > bfff ffff used for?
2) is there anything else in the address space?

Experiment 2: Download a new version of address_space:

    rm address_space.tgz      (if necessary)
    wget wb/ss/code/address_space.tgz
    tar -xzf address_space.tgz
    cd address_space
    make sleep
    (./sleep 1 &) ; ./sleep 2

What sense can we make of the output?

CPU scheduling
--------------

Read Tanenbaum pages 132--142, and answer the following questions:

1) What are the two reasons Tanenbaum gives for his claim that scheduling doesn't matter much on simple PCs?
2) What is a CPU-bound process? What is an I/O-bound process?
3) Why does Tanenbaum claim that the percentage of I/O-bound processes is increasing? Why is he wrong?
4) What are the four occasions when the scheduler has to make a decision?
5) Why are virtually all schedulers in the real world preemptive?
6) Why is "fairness" a strange way to describe the goal of a scheduler?
7) What is throughput? What is turnaround time?
8) Why does Tanenbaum think utilization is a bad metric? While we're at it, what is a metric?
9) In an interactive system, what is the most important metric?
10) In a real-time system, what is the most important metric?