Software Systems Spring 2008 For today, you should have: 1) worked on your project (draft report due today or Friday) 2) read Tanenbaum I/O Sections 5.1 and 5.2 3) Read the Stevens handout Section 12.5: I/O Multiplexing 4) Optional: the H20 problem from LBoS Outline: 1) Tanenbaum Chapter 5: I/O (notes21) 2) I/O multiplexing 3) pipes 4) peer-to-peer intro For next time you should: 1) work on your project 2) prepare for a quiz on I/O and database implementation. 3) flip through the LBoS, choose a problem, and solve it... http://greenteapress.com/semaphores/ make up your own synchronization problem, WITH A SOLUTION, and bring it to class. Write up a problem and a hint in a form you could give to another student to solve. 4) read the wikipedia article about distributed hash tables and Balakrishnan et al. "Looking up Data in P2P Systems" http://en.wikipedia.org/wiki/Distributed_hash_table http://www.project-iris.net/irisbib/papers/dht:cacm03/paper.pdf pipes ----- In the shell, the pipe character | connects two processes with a pipe so that the output from one is used as the input to the next. In the C interface, you have to 1) call the pipe function, which creates two file descriptors such that anything written on one can be read by the other, and vice versa 2) calling the fork function, which creates a child process that has the same two file descriptors. 3) if you don't need two way communication, the parent and child each close the connections they don't need 4) the child process can call exec in order to run another process, and then the parent and child can talk to each other. Here's a simple (!) example (also at http://wb/ss/code/pipe/pipe.c) #include #include #include #include typedef struct pipe { int fd; /* file descriptor from which to read */ pid_t pid; /* id of the child process */ } Pipe; Pipe *make_pipe() { Pipe *p = (Pipe *) malloc (sizeof(Pipe)); return p; } Pipe *open_pipe (char *command, char *name, char *arg) { int n, fd[2]; pid_t pid; FILE *fp; Pipe *p = make_pipe(); if (pipe(fd) < 0) { fprintf (stderr, "Unable to open pipe.\n"); exit(-1); } if ((pid = fork()) < 0) { fprintf (stderr, "Unable to fork.\n"); exit(-1); } else if (pid > 0) { /* parent code*/ close (fd[1]); /* close write end */ p->fd = fd[0]; p->pid = pid; return p; } /* child code */ close (fd[0]); /* close read end */ /* make stdout go to the write end of the pipe */ if (fd[1] != STDOUT_FILENO) { if (dup2 (fd[1], STDOUT_FILENO) != STDOUT_FILENO) { fprintf (stderr, "Unable to redirect child's output.\n"); exit(-1); } } if (execl (command, name, arg, (char *)0) < 0) { fprintf (stderr, "Unable to execute command %s %s.\n", name, arg); exit(-1); } } #define BUFFSIZE 8192 int main() { Pipe *p = open_pipe("/bin/ls", "ls", "/etc"); int n; char buf[BUFFSIZE]; pid_t child; while ( (n = read(p->fd, buf, BUFFSIZE)) > 0) { if (write(STDOUT_FILENO, buf, n) != n) { fprintf (stderr, "Write error.\n"); exit(-1); } } if (n < 0) { fprintf (stderr, "Read error.\n"); exit(-1); } child = wait(p->pid); if (child < 0) { fprintf (stderr, "Wait failed.\n"); exit(-1); } } Download and run this to see what it does! select ------ In UNIX, the lowest-level C interface for asynchronous I/O is the select function. It allows you to specify a set of file descriptors you want to wait for and then sleep until one of them is ready. When you wake up, you have to figure out which one(s) are ready and then deal with them. So this is a form of event-driven programming. Good news: select can handle any asynchronous I/O that fits into the file descriptor API. Bad news: not everything does. For example, signals. One example is trout, my version of traceroute (based on more code from Stevens!) wget http://allendowney.com/research/trout/trout.tar.gz tar -xzf trout.tar.gz cd trout make # the executable might work for you Trout needs to be able to open a "raw" socket, which can only be done with root permissions. The best way to do that is to make trout a "setuid root" executable. $ su $ chown root trout $ chmod +s trout $ ls -l trout -rwsrwsr-x 1 root downey 34250 Nov 29 14:13 trout $ exit The fastest way is to run it as root or sudo it. Either way, try: ./trout www.google.com (or whatever) And then look at trout.c: /* recv_dgram: reads all incoming datagrams and checks for returning ICMP packets. returns -3 on timeout, -2 on ICMP time exceeded in transit (we reached a router) -1 on ICMP port unreachable (we reached the destination) 0 on ICMP that has nothing to do with us */ /* as Stevens points out in Section 18.5 of Unix Network Programming, many programs with alarms have a race condition, which is that the alarm might go off before we get to the recvfrom, in which case it does nothing and the recvfrom might wait indefinitely. In earlier versions of this code, this problem seemed to pop up occasionally (although I am not positive about that). The use of select here solves that problem. When the alarm goes off, the alarm handler sends a message through the pipe, which is one of the things select waits for. When select returns, we know that we have received a datagram OR the alarm has gone off OR both. We then use rset to find out which, and deal with it. According to the specification of select, it should not be possible to get to the recvfrom unless there is a datagram waiting, and therefore the recvfrom should never block. Nevertheless, it sometimes does, which is why, when we opened it, we set the NONBLOCK flag and why, if it fails (errno = EAGAIN) we just go on. */ int recv_dgram () { int err; socklen_t len; ssize_t n; struct ip *ip; int maxfdp1 = max (recvfd, pipefd[0]) + 1; fd_set rset[1]; FD_ZERO (rset); alarm(3); /* set the timeout alarm to handle dropped packets */ while (1) { FD_SET (recvfd, rset); FD_SET (pipefd[0], rset); n = select (maxfdp1, rset, NULL, NULL, NULL); if (n < 0 && errno != EINTR) { err_sys ("select error"); } if (FD_ISSET (recvfd, rset)) { len = salen; n = recvfrom (recvfd, recvbuf, sizeof(recvbuf), 0, sarecv, &len); err = errno; Gettimeofday (recvtv, NULL); /* get time of packet arrival */ if (n < 0 && err != EAGAIN) { err_sys ("recvfrom error"); } } if (FD_ISSET (pipefd[0], rset)) { Read (pipefd[0], &n, 1); return -3; /* timeout */ } ip = (struct ip *) recvbuf; return process_ip (ip, n); } } Alarm handler: /* sig_alrm: alarm handler sends a message to the process through a pipe, causing select to return */ void sig_alrm (int signo) { Write (pipefd[1], "", 1); /* write 1 null byte to pipe */ return; } Here's how you set the alarm handler: Signal (SIGALRM, sig_alrm); Peer to peer concepts --------------------- The idea of p2p networks makes the most sense in contrast to the primary alternative, centralized client-server networks. "centralized" means that server functions tend to be co-located and unreplicated most common example in a LAN is room full of machines running web servers client-server means that most machines are entirely servers or entirely clients p2p is different on both of these axes 1) content is distributed and replicated 2) machines that use services are expected or required to provide service, too. there are no dedicated servers. well-known p2p services include (the original) Napster, Gnutella, FreeNet, BitTorrent The advantages of p2p include 1) high system throughput (both bandwidth and server processing) capacity scales with popularity 2) robustness (no single point of failure) Disadvantages include 1) vulnerability to maliciousness wikipedia mentions: poisoning, DOS, defection, etc. 2) evasion of legal responsibility this can be an advantage, too In a PURE p2p network, all peers are truly equal, but many deployed systems use centralized indices, or at least super-peers. Many p2p networks are based on the concept of an "overlay network" which is an abstract connectivity model unrelated to the physical connections in the network. This takes some explaining. Indexing in p2p networks ------------------------ Indexing = building metadata that makes it possible to find what you are looking for. 1) a card catalog is an index for finding books by author, title or subject -- basically three sorted lists 2) Yahoo is/was an attempt to organize the WWW into a semantic hierarchy -- basically a tree of unordered lists 3) Google is a content-addressable index -- a mapping from word/phrase lists to ordered lists of URLs p2p networks have historically used a variety of indexing techniques. In some cases they have resorted to a centralized index. Alternatives include 1) flooding (overlay = fixed outdegree graph) 2) superpeers (overlay = tree) 3) unstructured routing (overlay = dynamic graph) 4) distributed hash table (overlay = fixed outdegree graph) Hash tables are potentially the most efficient, BUT each data item must have a unique key. So how do we get from "I want X" to a unique key? How Chord works: Imagine the set of possible keys arranged in a ring. New peers are placed randomly around the ring. Each peer is "responsible for" the closest keys. lookup(key) is equivalent to finding the closest peer to a given point in the ring. Routing = how does a given node find other nodes? Chord uses "finger tables" How many other nodes does each node know about? How long does it take to handle a query? Alternatives to Chord: 1) PAST and others use a tree. 2) CAN uses a multi-dimensional Cartesian space. Summary: DHT is a pure p2p solution for mapping from unique ids to nodes. The more general problem of indexing in p2p networks is open. Holy grail: pure p2p implementation of Google-like searching.