Software Systems
Spring 2005

Outline:

1) class organization
2) LB model of data transfer
3) start in on Homework 1

For next time:

1) read Patterson's article
2) work on Homework 1 (due next Tuesday)


Four topics, one uber-topic, 2 kinds of work
--------------------------------------------

The topics are Operating Systems, Networks, Databases and
Run-time systems.

The uber-topic is performance evaluation, which includes
workload characterization, experimental design, measurement,
modeling, analysis, simulation, implementation, and verification.

The two kinds of work are

homeworks: reinforce the ideas and practice the techniques
           with short well-defined projects.

projects: apply the techniques to a long-term, open-ended project.


A few words about this class
----------------------------

1) This class is a little different from what you see in a lot
   of curricula.

   Necessity: We can only offer a few CS electives, so we want
   to make sure we get the good stuff.

   Virtue: I will be combining material from several advanced
   classes, looking for connections and cross-cutting ideas.

2) I am making it up as we go along.

   Necessity: I have a plan, but at this point there are lots
   of configurable parameters.

   Virtue: In the spirit of Olin, please help me create this
   class. In particular, I am hoping that some projects will
   become future homeworks.

3) Reading technical papers is one of the goals.

   Necessity: There are no textbooks for this sort of thing;
   we are working without a net.

   Virtue: Textbooks are nice, but eventually you will need to
   read research papers, data sheets, etc. So we'll get some
   practice. I will assign a few papers by my favorite author!


LB model of data transfer
-------------------------

The LB model is based on the observation that in many systems,
the time to transfer a data object from one place to another
is roughly linear with the size of the data.
So we can characterize the line with:

latency: the time to transfer some standard size chunk,
         usually the smallest relevant size
         measured in units of time

bandwidth: the marginal rate at which additional data is sent
           measured in units of data size / time

Graphically, latency is the height of the line at size=min_size
(its intercept there), and bandwidth is the inverse of the slope
of the line.

Complications:

1) depending on context, latency might measure a one-way transfer
   (in which case it is often called delay) or a round trip
   (in which case it is often called a round trip time)

2) bandwidth, strictly speaking, is measured in Hz, because
   it measures the width of a band, which is a range of
   frequencies. "data rate" or "capacity" would probably be
   more correct.

   The reason they are used interchangeably is that Shannon's
   theorem relates them:

   C = B log2 (1 + S/N)

   where:

   C is the maximum information-carrying capacity of a channel
     in bits/second
   B is the width of the band in Hz
   S/N is the signal-to-noise ratio

   In other words, the maximum data rate is the bandwidth
   multiplied by a factor that depends on noise.

3) The actual data rate an application achieves, which is
   sometimes called throughput, is often related to bandwidth,
   but the relationship can be complicated.

4) Some systems are only roughly linear, and some are not very
   linear at all.

   Examples from tcp/scatter:

   In a packet-switched network, there are often flat spots
   in the time-size curve.

Nevertheless:

1) the LB model is used frequently.

2) latency and bandwidth are possibly the two most important
   metrics of system performance.
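
To make the definitions above concrete, here is a minimal Python sketch that estimates latency and bandwidth from (size, time) measurements with an ordinary least-squares line, and evaluates Shannon's formula. The function names (fit_lb, predict_time, shannon_capacity) and the units (bytes, seconds) are assumptions made for illustration; they are not part of the class materials, and a real measurement would also need to deal with the non-linearities listed under Complications.

```python
import math


def fit_lb(sizes, times):
    """Fit time = intercept + slope * size by least squares.

    Returns (latency, bandwidth):
      latency   = fitted time at the smallest measured size (seconds)
      bandwidth = 1 / slope (bytes per second)
    Assumes at least two distinct sizes and a positive slope.
    """
    n = len(sizes)
    mean_s = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((s - mean_s) ** 2 for s in sizes)
    sxt = sum((s - mean_s) * (t - mean_t) for s, t in zip(sizes, times))
    slope = sxt / sxx                      # seconds per byte
    intercept = mean_t - slope * mean_s    # fitted time at size 0
    latency = intercept + slope * min(sizes)
    bandwidth = 1.0 / slope
    return latency, bandwidth


def predict_time(size, latency, bandwidth, min_size):
    """LB model prediction: latency plus the marginal cost of extra data."""
    return latency + (size - min_size) / bandwidth


def shannon_capacity(band_hz, snr):
    """C = B log2(1 + S/N), in bits/second; snr is a plain ratio, not dB."""
    return band_hz * math.log2(1 + snr)
```

For example, fit_lb(sizes, times) on measurements from a hypothetical link returns the two parameters, and predict_time(size, latency, bandwidth, min(sizes)) gives the model's estimate for a transfer size that was not measured.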


Sometimes you only care about one of them
-----------------------------------------

For example, in networks:

1) interactive applications tend to send lots of short messages,
   so performance depends on latency

2) moving large files tends to depend on bandwidth

3) often, startup depends on latency, steady state on bandwidth
   (which is why remaining time estimates converge from above)

In operating systems:

1) sizes are chosen to amortize latency

2) therefore, both parameters matter more often than you would
   expect by chance

Mnemonic of the day:

"Bandwidth is for big things, Latency is for little things"
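
One way to put numbers on the mnemonic is to ask what fraction of a transfer's time is latency. The sketch below assumes the simplified form time = latency + size / bandwidth (that is, min_size is treated as 0); the function names are hypothetical. Under that assumption the two terms are equal when size = latency * bandwidth, so transfers much smaller than that are latency-bound and transfers much larger are bandwidth-bound.

```python
def crossover_size(latency, bandwidth):
    """Size at which the latency term equals the size / bandwidth term.

    latency in seconds, bandwidth in bytes/second, result in bytes.
    """
    return latency * bandwidth


def latency_fraction(size, latency, bandwidth):
    """Fraction of total transfer time spent on latency (0 to 1)."""
    total = latency + size / bandwidth
    return latency / total
```

Choosing a transfer size well above crossover_size(latency, bandwidth) is the sense in which a size "amortizes" latency: the fixed cost becomes a small fraction of the total.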