cs357 Lecture Notes Spring 2000 Week 13, Thursday For today you should have printed and read "A Fast File System for UNIX" Also, you should have finished Homework 9. The final exam is Friday 12 May from 12:30 to 2:30. The exam will be comprehensive. The format will be similar to the other exams, but about 50% longer. Disk block allocation --------------------- Contiguous, linked, indexed. Contiguous 1) not much info in the inodes 2) fast sequential access 3) fast random access 4) very difficult allocation Linked 1) not much info in the inodes 2) easy allocation 3) a little overhead at the end of each block 4) possibly bad sequential access 5) very bad random access Indexed 1) same overhead as linked, but collected into index block 2) easy allocation 3) good sequential and random access 4) inodes are now variable size Multi-level indexed (UNIX inodes) 1) first few links are in the inode itself 2) single indirect block is similar to indexed 3) double indirect block / triple indirect block 4) indirection always comes with a price (either cache space or access time) UNIX inode performance ---------------------- How big is the biggest file in UNIX? 32-bit block numbers, 4K disk blocks. Can we refer to all of that with a single inode? 12 indices in inode Indirect block is 4K, each address is 32 bits. (How many bytes can the double indirect block span?) (If we switch to a 64-bit file system address, can a single inode handle the whole thing?) Free space management --------------------- FAT = bitmap Linked list of free blocks Hybrids. For example, keep a free inode. Out of core data structures! Directory implementation ------------------------ It's a mapping from strings to inodes. Hash table. File system caching -------------------- Use some of physical memory to cache frequently-used file blocks 1) directory blocks 2) inodes 3) recently read-written data Limitation: has to be non-volatile or write-through. Interesting conflict: 1) is it ever a good idea to page out a process's memory in order to cache a disk block? 2) how can we unify the page replacement policy and the disk cache policy? Systems research/design ----------------------- Four phases 1) empirical observation/measurement 2) analysis/design 3) implementation (a simple matter of programming) 4) evaluation This semester we have emphasized (1) in the labs and (2) in the lectures. (3) is partly an issue of software engineering, but it is also influenced by a) hardware issues (example: test and set instruction) b) interations among software modules (compiler-loader-OS) c) autonomy, access and control (who can implement what, and what are the consequences for other parties?) Finally, (4) has many of the same characteristics of (1). Fast file system ---------------- What are the observations that motivate their design? Fundamental: Block size too small and seek times too long to use disk-memory bandwidth. Correlary: Distribution of file sizes. Importance of sequential access. Locality within directories. inode layout is as important as data layout. What are the critical elements of the design? Bigger blocks. Special partial-block file mechanism. Better data layout. Better inode layout. What is parameterization? (Section 3.2) How reliable are these results are technology changes? What are the relevant performance characteristics? Information about implementation? Autonomy issues -- too late to change the interface! Evaluation (Section 4) What is their metric of performance? What environment did they use to measure performance? Empirical skepticism -------------------- A fundamental aspect of life is the necessity of making decisions under uncertainty. Empirical skepticism has evolved as a heuristic for decision making that appears to be particularly appropriate in the domains of engineering and science. The fundamental ideas are 1) there is a domain of public information that the vast majority of reasonably people can agree about. 2) within this domain, evidence can be used to persuade reasonable people to adopt a new view or change an old view. 3) decisions can often be characterized as a process of weighing evidence, generating probablistic beliefs, and using those beliefs to minimize costs/maximize benefit. 4) probablistic decision making almost always involves risks. 5) the weight of evidence we require depends on the risks, costs, and benefits If you were reluctant to invest the time and money to upgrade your OS, would this paper persuade you of the benefits? What if you were the author of a rival OS, would this paper convince you to reimplement your file system? Conclusion: in the domains of engineering and science, there is a practical benefit to resist new ideas and changes in the absence of persuasive evidence.