Software Design
Spring 2007

Running other programs from Python
----------------------------------

http://docs.python.org/lib/os-newstreams.html

    import os
    cmd = 'ls -l'
    fp = os.popen(cmd)      # run cmd in a subshell; fp reads its output
    res = fp.read()
    print res
    stat = fp.close()       # exit status, or None if the command succeeded
    print stat

Persistence
-----------

Most of the programs we have written this semester are transient:

1) they run for a short time.

2) most of their data disappears when the program ends.

3) the next time the program runs, it starts from the same place.

Persistent programs are different on all three axes:

1) they run for a long time or all the time.

2) they keep at least some of their data in permanent storage.

3) when the program (re)starts, it picks up from where it left off.

In this context, "permanent storage" usually means disk, but
there are alternatives.

Some ways of storing the "state" of a program:

1) reading and writing plain text files.

   + readable by humans and other applications

2) reading and writing binary files.

   + space-efficient and obscure (but not secure)

3) reading and writing databases.

   + provide higher-level features like

     concurrency (many programs can access at the same time)

     consistency (avoiding corruption by mid-transaction failures)

Databases
---------

The design and implementation of database management systems
(DBMS) is a broad and fascinating topic that I will grossly
oversimplify with the following summary:

A database is a data structure on disk. Different kinds of
databases use different data structures, but one of the most
common kinds is the relational database, which is similar to a hash
table (dictionary) in the sense that it maps keys to values.

A database can contain many tables, a table can contain many
records (also known as rows or tuples), and a record can contain
many attributes. At least one of the attributes in a record is a
unique key that can be used to access the record quickly.

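As a rough mental model only, the hierarchy of database, table, record and attribute can be sketched with nested Python dictionaries. This is a toy that lives in memory; a real DBMS keeps its structures on disk and does far more. The table names, attribute names, sample values and the lookup helper are all made up for illustration, and the sketch is written for Python 3.

```python
# A toy, in-memory model of the hierarchy:
#   database -> tables -> records -> attributes
# All names and values here are hypothetical, chosen only for illustration.
database = {
    'students': {
        # Each record is stored under its unique key (the 'id' attribute).
        101: {'id': 101, 'name': 'Ada', 'year': 2},
        102: {'id': 102, 'name': 'Grace', 'year': 3},
    },
    'courses': {
        'SD': {'code': 'SD', 'title': 'Software Design'},
    },
}


def lookup(db, table, key):
    """Return the record with the given unique key, or None if absent."""
    return db[table].get(key)


record = lookup(database, 'students', 102)
print(record['name'])
```

Because each table is keyed by the unique attribute, finding a record is a single dictionary lookup rather than a scan of every record, which is the sense in which the key gives quick access.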
Ultimately a database is a file on a disk, but you don't (or can't)
access it through the conventional file system; you access it
through the database management system.

Database systems are often distributed, meaning that the database
itself is stored on a server (or servers) and accessed across the
network by clients.

Database systems often provide a query language that allows users
to specify database operations in a concise language (sometimes one
that resembles natural language).

The database management systems most popular with Python
programmers are:

1) MySQL ("My ess cue ell"): powerful open-source distributed DBMS
   that supports the query language SQL.

2) anydbm: very simple interface to a simple local DBMS

   2a) shelve: a persistent-object module built on top of anydbm

Here is an example using anydbm
(from http://docs.python.org/lib/module-anydbm.html)

    import anydbm

    # Open database file, creating it if necessary.
    db = anydbm.open('cache.db', 'c')

    # Record some values
    db['www.python.org'] = 'Python Website'
    db['www.cnn.com'] = 'Cable News Network'

    # Loop through contents.  Other dictionary methods
    # such as .keys(), .values() also work.
    for k, v in db.iteritems():
        print k, '\t', v

    # Storing a non-string key or value will raise an exception (most
    # likely a TypeError).
    db['www.yahoo.com'] = 4

    # Close when done.
    db.close()

1) anydbm.open returns a db object that you can think of as a proxy
   for the file on disk. Changing db changes the file on disk (or
   fails cleanly).

2) The db object maps string keys to string values.

3) In every other way (including syntax) the db object behaves like
   a dictionary.

4) If your program ends or crashes, the database persists.

5) When the program restarts, it can reopen the database and pick
   up from where it left off.

Pickling
--------

A limitation of anydbm is that the keys and values have to be
strings. Fortunately, most Python objects can be rendered as
strings in a way that allows them to be reconstituted later.
This process is called "serialization", and it is provided by
the pickle module:

http://docs.python.org/lib/module-pickle.html

anydbm and pickle go together so naturally that they have been
combined for you in a module called shelve:

Example (from http://docs.python.org/lib/module-shelve.html)

    import shelve

    d = shelve.open(filename) # open -- file may get suffix added by low-level
                              # library

    d[key] = data   # store data at key (overwrites old data if
                    # using an existing key)
    data = d[key]   # retrieve a COPY of data at key (raise KeyError if no
                    # such key)
    del d[key]      # delete data stored at key (raises KeyError
                    # if no such key)
    flag = d.has_key(key)   # true if the key exists
    klist = d.keys() # a list of all existing keys (slow!)

    # as d was opened WITHOUT writeback=True, beware:
    d['xx'] = range(4)  # this works as expected, but...
    d['xx'].append(5)   # *this doesn't!* -- d['xx'] is STILL range(4)!!!

    # having opened d without writeback=True, you need to code carefully:
    temp = d['xx']      # extracts the copy
    temp.append(5)      # mutates the copy
    d['xx'] = temp      # stores the copy right back, to persist it

    # or, d=shelve.open(filename,writeback=True) would let you just code
    # d['xx'].append(5) and have it work as expected, BUT it would also
    # consume more memory and make the d.close() operation slower.

    d.close()       # close it

WARNING: the dictionary syntax might lead you to think of your
database as a dictionary, but there are limitations and some
potential for subtle bugs. When you use this interface
(unfortunately) you _do_ have to think about what is happening in
the implementation.
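The warning can be checked with a short experiment. The sketch below is written for Python 3 (the examples above are Python 2), so it uses list(range(4)) because range no longer returns a list there. The function name, the shelf file name and the use of a temporary directory are choices made only for this illustration.

```python
import os
import shelve
import shutil
import tempfile


def demo_shelve_copies():
    """Show that d[key] hands back a copy when writeback is off.

    Returns (value after an in-place append, value after storing back).
    """
    workdir = tempfile.mkdtemp()            # scratch directory for the shelf
    path = os.path.join(workdir, 'demo_shelf')
    try:
        d = shelve.open(path)               # writeback defaults to False
        d['xx'] = list(range(4))
        d['xx'].append(5)                   # appends to a throwaway copy
        after_append = d['xx']              # the stored list is unchanged

        temp = d['xx']                      # extract a copy
        temp.append(5)                      # mutate the copy
        d['xx'] = temp                      # store it back to persist it
        d.close()

        d = shelve.open(path)               # reopen: the data is still on disk
        after_store = d['xx']
        d.close()
    finally:
        shutil.rmtree(workdir)              # remove the scratch files
    return after_append, after_store


after_append, after_store = demo_shelve_copies()
print(after_append)
print(after_store)
```

Each access to d['xx'] unpickles a fresh object from disk, so a change made to that object is lost unless it is assigned back to the shelf.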