Software Design Spring 2007 For today you should have: 1) started Homework 5 For Monday: 1) read Chapter 13 2) prepare for the exam For Wednesday: 1) read Chapter 14 2) finish Homework 5 Warmup 5 -------- Write a function named multiset that takes two lists called keys and values. You can assume that these lists have the same length. The same key might appear more than once in keys The function should return a dictionary that maps each unique key to a list of corresponding values. Two ways to loop through two lists: 1) use indices for i in range(len(keys)): k = keys[i] v = vals[i] 2) use the zip function for k,v in zip(keys, values): Several ways to update the elements of the dictionary: 1) Check to see if it's already there: if k not in d: d[k] = [v] # list with one elt else: d[k] = d[k] + [v] # both operands are lists OR if k not in d: d[k] = [v] # list with one elt else: d[k].append(v) # the parameter is an elt Choose one or the other, but DON'T MIX! d[k] = d[k].append([v]) # BAD BAD BAD BAD BAD 2) Use setdefault to avoid the special case: def multiset(ks, vs): d = {} for k, v in zip(ks, vs): d.setdefault(k, []).append(v) return d Pro: concise, Con: dense Homework 4 Solutions -------------------- 1) The function process_file returns a dictionary that contains the words from the file and their frequencies. I used this function twice, to analyze the book, and also to process the word list. def process_file(filename): d = {} fp = open(filename, 'r') for line in fp: process_line(line, d) return d This function is an adaptation of the counter pattern in Section 7.8. 2) I used the function subtract to find the words that appear in the book but not in the dictionary. This is a general-purpose function that is likely to be reusable. def subtract(d1, d2): """return a dictionary with all keys that appear in d1 but not d2""" res = {} for key in d1.keys(): if key not in d2: res[key] = True return res 3) I used a pattern called "decorate-sort-undecorate" (DSU) to find the ten most common words. def sorted_list(d): t = [] for key, value in d.items(): pair = (value, key) t.append(pair) t.sort() t.reverse() return t Remember that sort() and reverse() are modifiers that return None. So the following doesn't work: return t.sort().reverse() # ERROR! But sort takes an optional parameter (now): t.sort(reverse=True) Data structure design --------------------- In hw05 (for the first time) you have to make some decisions about which data structure to use. Choices: string, list, tuple, dictionary. How to choose? Think about the operations you will need. For the mapping between prefixes and possible suffixes, a dictionary is the obvious choice. What about: 1) the prefixes (the keys of the dictionary) 2) the collection of suffixes (the values of the dictionary) 3) the buffer that holds the current window of words Once you know what operations you need, you should consider 1) efficiency of those operations for each data structure 2) ease of implementation For us, for now, (2) is MUCH more important than (1).