Chapter 4 --------- 4.1 Internetworking --------------- With lower-case i, any network made up of smaller, heterogeneous networks. What used to be bridges are now routers, where the primary distinction is that routers have to convert from one network format to another. Canonical example: Ethernet onto FDDI or ATM. With capital I, the Internet is specifically the global internet that uses IP (Internet Protocol) as its lingua franca. Figure 4.2 on page 249 shows a protocol diagram for an Internet path. Hosts are running protocols on top of IP. Routers need to be able to "speak" IP as well as two different network frame formats. IP packets are encapsulated inside network-level frames. 4.1.2 Service model ------------- datagrams -- all packets contain all info needed for delivery. no setup before sending no way of knowing if there is a path, or if the recipient is up unreliable -- packets can get lost, reordered, or delivered more than once! best-effort -- if a router discovers that something has gone wrong, it has no obligation to inform either party (although sometimes is does) Header format ------------- Notice: things that handle IP packets tend to be written in software, so 1) byte alignment is nice 2) CRC is less appealing Header is 20 bytes (5 words) long; most fields are byte-aligned Version (4 bits): IPv4 or IPv6 Hlen: length of header including options (in words) Example option: source routing... up to nine router IP addresses route tracing... same thing TOS: type of service. In theory there is min delay, min cost, etc. Length: total length of packet including header (in bytes) Indent: unique identifier used when packet is fragmented Flags: most important, M=are there More fragments in this packet? Offset: where in the fragmented packet does this fragment fall (specified in 8-byte units) (Why?) TTL: literally time to live, but actually number of hops to live, but actually number of well-behaved routers to live Protocol: used at the next level up TCP = 6, UDP = 17 Checksum: just protects the header, fast algorithm for software Source addr and dest addr: 32 buts each, as explained below Options, padding. Fragmentation ------------- Each network has a maximum frame size. MTU = maximum transmission unit = maximum frame size - IP header size We _could_ require IP to use a packet size = min packet size over all network types, but that's bad because 1) new network types all the time, and IP needs to be universal 2) not all paths include all network types. We want hosts to be able to use large packet sizes when possible. Therefore, hosts can send any packet size (usually up the the MTU of the network they are connected to directly), and the network fragments if necessary along the way. The fragments are themselves valid IP packets that are delivered independently to the destination. Once fragmented, packets are not reassembled until they arrive at the destination. If one fragment is lost, the whole packet is dropped, since there is no one we can get just the dropped fragment from. Fragments might themselves get fragmented, if they pass through a link with an even smaller MTU. Not a particularly efficient process, since reassembly is non-trivial (possibly multi-level). Better if hosts do MTU discovery before sending. 4.1.3 Global addresses ---------------- Every interface on every host has a 32-bit IP address that is globally unique (this is now only almost true). How many addresses are there? How many interfaces are out there now? Hierarchical: 32-bit address contains a network part and a host part. Pro: simplifies routing information... the router need only look at the network part Con: not the most efficient allocation scheme. Not all networks have the maximum number of hosts on them, so many IP addresses are wasted. (Actually it's even worse than that, because ranges of network numbers have been allocated to institutions that are not using them.) The split between the network part and the host part depends on the class of the address. Addresses are written as 4 decimal numbers with . between them first byte class split 0 -127 A 1 byte network, 3 bytes host 128-195 B 2 byte network, 2 bytes host 196-255 C 3 byte network, 1 bytes host Rocky is (was) 137.146.194.45 First bit is 1, second is 0, so this is a Class B address. 137.146 is Colby, there can be 65536 interfaces on campus. Colby uses one byte to identify Ethernet, one byte for each host on the net. Recurrence relations -------------------- In the spirit of Math-on-demand, here's a few tips about recurrence relations. See Sedgewick, pages 76-78. Problem for next time: Given the following recurrence relation: f(n) = a f(n/2) + b lg n f(1) = f1 Assume that there is a solution of the form: f(n) = c n lg n + d lg n + en Find a general solution by plugging the assumed solution into the recurrence and solving for c, d and e in terms of a, b, and f1. Is there some condition on a, b, and f1 that is required in order for this functional form to solve the recurrence relation? 4.1.5 ARP --- Translate the IP address and encapsulate the IP packet One possibility is to use physical addresses as the host part of the IP address, but that's not generally possible. Instead, each host maintains a table. 1) updated dynamically 2) entries time out after about 15 minutes 3) if you don't know, broadcast If you get an ARP query, you (incidentally) find out the physical address of the sender. 1) if the sender is already in your table, refresh 2) if the sender is asking about you, reply, and also add an entry to your table (on the assumption that you will need it soon) 3) otherwise, do nothing ATMARP ------ Divide net into subnets. Each subnet gets an ARP server. Each node configured with physical address of ARP server. Everyone has a VC to the ARP server. Everyone registers with the ARP server when they create their VC. Server answers all ARP queries. 4.1.6 DHCP ---- Dynamic host configuration protocol Can't allocate IP addresses at manufacture time. (Why not?) Have to configure upon installation on a network. In addition to their own IP addresses, hosts need to be configured with the IP address of a default server. Administrator can assign addresses and configure manually. Alternative: 1) create a DHCP server somewhere on the network (in this case, think campus-wide) 2) on startup, host broadcasts DHCPDISCOVER on its subnet if DHCP server is on the subnet, it replies otherwise, a DHCP relay on the subnet relays (by unicast) the message to the DHCP server (it knows the IP address) 3) DHCP might have a table of static IP addresses, in which case it will look up the physical address of the sender otherwise it will choose an IP address dynamically from the range of available addresses This makes things more manageable, mostly. Problem: IP address of a given host changes more often than might be desireable, especially if this host provides a service of some kind (like web pages) There might be bad interactions with DNS, about which more later. 4.1.7 ICMP ---- Internet Control Message Protocol Something goes wrong, routers (or recipients) can send messages back to sender 1) host unreachable! 2) reassembly failed! 3) ttl = 0, time expired! 4) checksum failed! Quick note: both DHCP and ICMP use UDP, which is a protocol that runs on top of IP, so we are getting just a little ahead of ourselves, but that's ok. 4.1.8 Virtual private networks ------------------------ You might not want everyone to be able to send packets to you from a shared link. On nets that support VC, this is easy to do. You just configure your router to refuse to create VCs from addresses you don't like. Tunnel ------ In IP land (no virtual circuits), we can get some of the same behavior with tunnels. Tunnel entrance takes incoming packets, encapsulates them and sends them to the other end. Tunnel exit unwraps the packets and delivers them locally. Tunnel exit can restrict access to local network by refusing to unpack packet from undesireables, or packets that aren't wrapped at all. Benefits: 1) privacy, security 2) multicast: multicast packet travels through non-multicast intermediate network incognito 3) non-IP protocols over IP (similar to above) Downside: 1) space overhead, time on router overhead 2) management 3) confounds path discovery, since intermediate routers don't decrement ttl of inner packet 4.2 Routing ----------- See the table on page 266. How did it get built? Building the table is routing. When you hear routing, think, "building routes," which means "building routing tables." You can't route a packet. You can forward a packet using a routing table that got built by the routing mechanism. (What information is in the forwarding table that is often not in the routing table?) (Why might/should be routing table and the forwarding table be different data structures?) Fundamental problem with decentralized algorithms -- possibility of inconsistent state... I think you have the fastest route to the dest; you think I do. We keep handing the packet back and forth. Best we can promise is that this situation will resolve itself in a reasonable amount of time. 4.2.2 Distance vector routing (RIP) ----------------------------- Everyone knows who their neighbors are and their cost, which is one. Distance to non-neighbors is initially infinity. Periodically, everyone sends that information to their neighbors, who update their own table. Eventually, the system converges (everyone has a consistent view of the topology and knows the shortest distance to everyone). Updates are periodic (seconds to minutes) and triggered (tell your neighbors if something changes). How might a node detect that one of its neighbors (or the intervening link) has failed? What does it do in that case? In the graph on page 284, if E goes down, A sends messages to everyone. If B gets the message before C, and C sends an update to B before B gets the message from A, all heck breaks loose. (What is the error in the book?) 4.2.3 Link state (OSPF) ----------------- Instead of everyone telling the neighbors everything they know, everyone tells everyone only what they know well! Then everyone has global information and they can use global algorithm to find minimal paths. Problem: how can you multicast when you have not yet got routes established? Answer: reliable flooding. 1) send to all your neighbors. 2) they send to all of theirs, and so on. 3) packets have sequence numbers, so you only pass on new ones. 4) every connection is reliable (uses ACKS) Total amount of traffic is more, per update, but updates need to happen less often, since convergence is fast and reliable. Once hosts have global info, they use Dijkstra's shortest path algorithm. Downside: 1) amount of info stored at each router is large Ultimately, neither of these algorithms scales well, which is why we need Section 4.3: Global Internet. 4.2.4 Metrics ------- So far, we have been using costs = 1, which means we are choosing the shortest path in number of hops, but that is not necessarily best. Neither is the first draft of ARPANET 1) cost = queue length 2) cost = average delay of the last few packets delay = latency + transmit time + (depart - arrival) latency, in this context, means the fixed part of the time it takes to send a packet, regardless of size transmit time is the variable part of the time it takes to send a packer, depending on size (depart - arrival) is usually queue delay, but also includes timeouts and resend times Latency and transmit time are static, (depart - arrival) is updated periodically (presumably when we recompute routes). I assume that transmit time is based on some average packet size. 3) total hack! What is instability? On one level is means that routes are changing more than you want. The more interesting meaning is that routes oscillate forever and never converge. COMPRESS effect of queue delays to reduce the appeal of long paths SMOOTH by replacying delay with utilization, and averaging over a longer period of time, and bounding rate of change over time. This supposedly reduces instability. SIMPLIFY STATIC INFO instead of latency and transmit time, just look at the type of the link -- in other words, the precomputed functions on page 303 take into account the link characteristics. Empricial Internet Behavior --------------------------- The box on page 304 describes an interesting phenomenon. Although the Internet is made up of deterministic, man-made pieces, the system as a whole cannot be understood using the conventional tools of reductionism. It is best treated as a complex system, which means 1) it is made up of multiple components 2) the interactions of those components contribute something essential to the behavior of the system, making it impossible to understand the system by looking at the parts in isolation. complex != complicated complex means made up of multiple components, not monolothic therefore, you either are or aren't; there's no "so complex..." complicated has an underlying element of human confusion, which comes in degrees, as in "the system is so complicated that we can't make good predictions about it" Alternatively, "the system is complex, so the best way to make predictions about it is to model the components and also the interactions among the components." Let's read Paxson's paper! Let's skip routing for mobile hosts 4.3 Global Internet --------------- Two problems 1) routing algorithms don't scale up to 10,000 networks routers might have 10,000 entries! 2) address allocation is getting tight 4.3.1 Subnetting ---------- You might have noticed that I have been vague for the last week about what I mean by a network. Is it a single Ethernet on a campus or is it the whole campus. The answer is that from the point of view of a router that is off campus, the entire Colby campus is one network. So all traffic headed for Colby is going to go to the same place, which is fine becauce 1) there is only one way to get here anyway, and 2) even if there were more than one way, any will do. Once the packet gets onto campus, OUR routers use additional bits of the "host" part of the address to forward the packet to the right subnet (which usually corresponds to a physical network). We can use as many bits as we want, even non-contiguous bits (although the latter is a bad idea). The subnet mask indicates which bits it is. Only the routers (and host) in our domain need to know which part of our IP addresses idenifies the subnet. Why do hosts have to know the subnet mask? Can have more than one physical network per subnet! (we saw this in Chapter 3 -- bridges forward from one physical network to another) Can have more than one subnet per physical network! (this just means that hosts that are actually on the same network will think they are not and send packets through the router when they don't have to) There are several good things about subnetting, but the biggest benefit is that it greatly reduces the amount of information off-campus routers need to have about our network. 4.3.2 CIDR ---- Classless interdomain routing. We have already blurred the line between the network part of an IP address and the host part. CIDR does the same thing, except in the other direction. If you (a router) use only a prefix of the network part, and ignore the low order bits, you are effectively aggregating a bunch of networks. Anything that starts with 137.0011 is going to the same place. If you hand out IP addresses in clever ways, you can make this work. For example, give a set of contiguous, power-of-two addresses to a network servive provider and let them hand out network addresses to customers. For the point-of-view of other networks, you can send packets for any of MCIs customers to the same place. This is sometimes called supernetting, since you are taking a bunch of network and treating them as one big unit. CIDR prefixes can be any length. Routers match IP addresses with entries in the forwarding table by choosing the longest prefix match. There might be something like: For example, maybe Denmark is one big supernet, but it includes networks in Denmark but also in Greenland. A router in Reykjavik might have entries that say... Anything going to Denmark, send it here. Except things in Greenland (a Danish colony), which go somewhere else. 4.3.3 BGP --- That's still not good enough. Even with CIDR, there are still something like 50,000 prefixes out there that every router would have to know (and that was a couple of years ago). 1) the world is broken into AUTONOMOUS SYSTEMS. Forget about getting packets to the right network; just get them to the right AS. 2) most AS don't participate in interdomain routing, so that simplifies the routing process if not the forwarding process (we can assign a unique ID to all the transit AS) 3) forget about minimal paths, just try to get a loop-free path 4) appoint only one router in each AS to participate in routing -- the speaker 5) when you advertise a path, advertise the complete path so that someone deciding whether to use it can check for loops. 6) withing each AS, aggregating works well because finding a good path means finding the right border router, and there usually aren't that many per AS Injection --------- In a stub AS, all the routers know that the border router is the default router. (I don't understand why the book indicates that this information is injected by the border router into the intradomain routing process.) In a multihomed AS, is it actually useful for the border routers to advertise routes (with local costs) to the other routers in the domain, so that internal routers can choose the best border router for a given destination from a given place in the AS. In a transit AS, you might want to use IBGP, which uses BGP to identify the best border router and intradomain routing to get there. NAT --- NAT is evil. Instead of maintaining the very desireable property of globally unique addresses, NAT makes it possible to hand out addresses within an AS that are not unique. The often begin with 10. Imagine that Colby got a single Class C address. Within campus, we use addresses that begin with 10, so there are 2^24 local addresses. Other people might be using the same ones elsewhere. Somewhere between us and the outside world is a NAT box. It translates our local addresses into globally unique addresses, using our pool of 256 Class C addresses. As long as there are no more than 256 people on campus trying to send or receive packets to or from off campus, we're fine. Otherwise we might have to queue for a free UID. Other (big) problem. Embedded addresses. NAT box has to know when and how to translate addresses that are embedded inside messages, according to higher-level protocols. NAT is evil. 4.3.5 IPv6 ---- Need to change the size of addresses. Therefore... Need to change the size/format of the header. Therefore... All bets are off! It's a free-for-all! Step right up, pile on the cruft! IPv6 has been acting as a feature filter for a decade. Some of the good ideas (except big addresses) have been implemented in IPv4. That may have delayed actual implementation. (What is anarchy, and what is the implication for Internet?) Addressing ---------- 128 bits, one address for every atom in the universe. Even with suboptimal allocation, that's 1500/square foot. Notation: colon-separated 4-digit hex numbers Aggregatable Global Unicast Address ----------------------------------- Aggregatable: you can use CIDR because they are handed out such that common prefix implies common route. Global: globally unique Unicast: this address identifies a single machine (not a set of machines) IPv4-compatible IPv6 address: zero extend the IPv4 address. even IPv4 routers can refer to you. IPv4-mapped IPv6 address: prefix FF and then zero-extend IPv6 routers refer to you this way, but you are clueless. Transition ---------- Goal: hosts and routers that only understand IPv4 can continue to lead productive lives. You just have to talk LOUDLY AND SLOWLY. For a long time (maybe forever), new routers will be expected to be bilingual (dual stacks). To get IPv4 packets from one IPv6 compliant zone to another, we can use tunneling. Remember? The last IPv6 router wraps the packet inside an IPv4 packets and forwards it to the first IPv6 router at the destination. Any IPv6 router or host along the way can do the unpacking, as long as there are no more IPv4 routers. If the recipient has an IPv4-mapped IPv6 address, then and IPv6 router or host can do the unwrapping and generate an IPv4 header with no prior configuration. Otherwise, the wrapper has to know the IPv4 address of either the recipient or an IPv6 router between the recipient and the last IPv4 router. Aggregration ------------ Internet has a natural (intrinsic) hierarchy: A small number of direct providers (backbones) connected to A larger number of indirect providers connected to A very large number of subscribers (nontransit AS). And so... God assigns prefixes to registries (gubment or NGO) Registries assign prefixes to direct providers. Direct providers assign prefixes to indirect providers. Indirect providers assign prefixes to subscribers (like Colby) Subscribers hand out the rest according to local policy. See the picture on page 335. There is a natural mapping between aggregated addresses and trees. Packet format ------------- Version. Traffic class: QoS Flow label: remember day one we talked about multiple flows between the same pair of machines? (Why would the net care which flow a packet belonged to?) PayloadLen NextHeader: either an indicator of options to follow, or a protocol identifier. HopLimit = ttl 40 bytes total! Autoconfiguration ----------------- Currently, a new host needs to know: It's IP address, the local subnet mask, the address of a name server. DHCP servers can solve the first problem. Similar approach might solve the second, but: 1) depends on the consistency and availability of a server! Instead, we can use a unique interfece ID (like an Ethernet address) as a host ID and get the network address by snooping.