README 0.2.1 (1999-Dec-05-Sun) Adam M. Costello Overview of the EECS 122 (fall 1999) project code Introduction ============ This overview is a supplement to, not a replacement for, the .h files. You cannot fully understand and comply with the interfaces without reading the comments in the .h files. Interface you must implement (do not change this): * transport.h Interfaces that transport.c can depend on (do not change these): Indispensible: * bytestream.h * network.h * timer.h Very useful: * crc32.h * integer.h Possibly useful: * random.h * warnf.h Code we will link with transport.c (do not change these): * crc32.c * integer.c * random.c * warnf.c Example code (use it, change it, or ignore it, but transport.c must not depend on it): * deliver_packet.h * event.c * main.c * Makefile * network.c * streamers.c * streamers.h * timer_expire.h * transport.c We assume a model in which each host has one network address and one transport layer, which is either a transport source (for outgoing connections) or a transport sink (for incoming connections). Of course it's unrealistic for a host to be able to have only outgoing or only incoming connections--in real life, connections are bidirectional, but in this project they are unidirectional. transport.h defines an interface to the transport layer, that is, functions which entities outside the transport layer can call to communicate with the transport layer. You must write transport.c, which implements the functions declared in transport.h. That is the only file you will turn in. We could have separated the source and sink interfaces into transport_source.h and transport_sink.h, and separated the implementations into transport_source.c and transport_sink.c. Don't be misled by the fact the code for the source and sink is in the same file--the two entities are independent, and can interact only by exchanging packets. In order for a connection to exist between the transport layers on two hosts, both must know about it. In real life, they must do connection setup: one waits for a connection request, and the other sends a connection request, then the two exchange a few messages to get in sync with each other. In this project, you are not expected to do connection setup. Both ends will simply be told to create the connection, so each can (independently) initialize itself into a connected state. Similarly, you are not expected to connection tear-down, where one end tells the other to close the connection. Instead, both ends will simply be told to delete the connection, and any unacknowledged data may be discarded. Interface to the higher layer ============================= The transport layer is supposed to provide reliable one-way bytestream connections. In order for the transport layer to have something to do, the source end of each connection must be supplied with bytes to carry, and the sink end must have a place to deliver them. The transport layer doesn't care where these bytes are coming from or where they're going, so we use abstract entities: for each connection, there is a "byte source" producing the bytes, and a "byte sink" consuming them. bytestream.h defines an interface for moving a stream of bytes from one entity (a byte source) to another entity (a byte sink). So when the byte source transfers bytes to the transport source, the transport source is playing the role of a byte sink. Similarly, when the transport sink transfers bytes to the byte sink, the transport sink is playing the role of a byte source. Notice the distinction between "the" byte source for a connection, and "a" byte source (anything that sends a stream of bytes). When a connection is created, the transport source is given the addresses and port numbers that identify the connection, and an identifier for the byte source. It must remember the association between the byte source and the connection (if multiple connections are being supported), because the functions for transfering bytes and deleting the connection refer to the connection using only the byte source identifier. The story is the similar for the transport sink. Whenever bytes are moved from the byte source to the transport source, the transport source must save them in a buffer until they are acknowledged sometime later. When the connection is created, the transport source is told an upper limit on the number of unacknowledged bytes it is allowed to buffer. If the byte source offers more bytes than the transport source is allowed to accept, the transport source must accept fewer bytes than were offered. Any time a byte sink accepts fewer bytes than were offered, it is obligated to call the byte source's go-ahead method at some later time, when it is able to accept more bytes. In the case of the transport source, the only sensible time to refuse bytes is when the buffer is full, so the only sensible time to call the go-ahead method is when the the buffer changes from full to non-full. The only way the buffer can become non-full is when bytes are acknowledged, which can happen only when a packet arrives. Similarly, the transport sink is told an upper limit on the number of bytes it may buffer. If the byte sink refuses to accept all the bytes, the transport sink may refrain from offering any more bytes until its go-ahead method, transport_sink_okay_to_deliver(), is called by the byte sink. Interface to the lower layer ============================ In order for the transport layers of two hosts to communicate, they must use the network layer to send packets. The interface for sending packets is defined by network.h. Notice that at the interface between two layers, the "payload" is whatever the the higher layer gives to the lower layer to be carried. So the transport layer will have its own notion of header (sequence number, CRC, etc) and data (bytes produced by the higher layer), but will pass all of that to the network layer as the "payload". Whenever a packet arrives for a transport source or sink, the function transport_*_receive_packet() is called (see transport.h). The network layer is best-effort, meaning it can drop, reorder, delay, alter, and misroute packets. Implications of the project requirements ======================================== The connection must be reliable in spite of the unreliability of the network layer, so you must use an error detection code to detect corruption of packets (both data and ack packets). The crc32.h interface is provided for your convenience. To detect errors, simply have the receiver perform the same CRC calculation as the sender, and see if it gets the same answer (the sender must include the answer in the packet). The easiest way to detect misrouted packets is to include the network addresses in the transport-layer header, so they get covered by the CRC. (Since the network addresses also appear in the network-layer header, TCP includes them in its checksum calculation, but not in the actual TCP header. The receiver must retrieve the addresses from the network layer in order to check the CRC. You can do the same trick in this project, but it will make no difference to your grade.) Reliability also requires the use of sequence numbers. You can number packets ("this is packet number 3"), or individual bytes ("this packet begins with byte number 1536"), or both. Either way can work, and people will disagree about which is easier. You need to make sure that sequence numbers don't get reused before they leave the network, which means you need to have enough distinct sequence numbers. You are not given an upper bound on the network delay or the data rate, so in theory you need an infinite number, but in practice 32-bit sequence numbers are adequate (though as networks get faster we may need more bits in the not-too-distant future). Reliability also requires you to heed the warnings in network.h about data types being represented differently on different machines. Bytes can be copied into and out of packet payloads using memcpy(), while integers can be copied into and out of packet payloads using the functions in integer.h. In order to be efficient, you must not wait for one packet to be acknowledged before sending more packets; that is, you must use a sliding window protocol. You can represent the window as a number of packets or as a number of bytes. Larger windows are generally more efficient, but there are two restrictions on the window size: the transport source's buffer (the transport source needs to be able to retransmit anything it sends), and flow control (the transport source must not send data that the transport sink cannot acknowledge for lack of buffer space). The easiest way to prevent the transport source from overflowing the transport sink's buffer is to shrink the window size as the buffer fills up, which is typically accomplished by indicating the amount of free buffer space in every ack. The transport_*_create_connection() functions specify upper limits on the buffer sizes for each connection. Efficiency also requires you to avoid sending too many unnecessary retransmissions, which means you need to measure the round-trip time. The congestion control lecture notes and homework 9 talk about how to do this (taking into account the fact that the RTT can be partly random and can vary over time). One slightly tricky part of flow control is making sure the transport source can restart after it has stopped. The transport source will not know when it's okay to send more data unless the transport sink tells it, but the sink usually sends ack packets only in response to data packets. You could have the sink send an ack when its buffer changes from full to non-full, but that ack could get lost, and then you'll still have both ends waiting for the other to do something. One way to prevent the deadlock is to have the "stopped" source periodically send empty data packets (which is permissible), which will provoke acks. One might wonder how the transport sink can honor its buffer limit when packets arrive out of order. If the buffer fills up with out-of-order packets, then when the missing packet finally arrives, there's no space for it. But if the transport source is doing its job correctly, it sends only as much data as the transport sink has space for, so this situation will never arise. Concentrate on getting a transport layer that is correct (always gets all the bytes across) before worrying too much about flow control or efficiency. If you support multiple connections, you must run multiple instances of your protocol. Your transport layer functions that get called by the higher layer can use the byte_source or byte_sink arguments to determine which connection the call pertains to, while your *_receive_packet() functions can use the network addresses and port numbers to make that determination. If you wish to do congestion control, you must adjust your sending rate in response to the channel rate, which can vary over time. One way to do this is to adjust your window size. The network will indicate congestion by dropping packets. In addition to finding a rate that's not too fast and not too slow, a congestion control algorithm should be fair--when two connections are using the same bottleneck, they should get roughly the same bandwidth, or storage in the network, or somewhere in between. The algorithms used by TCP Tahoe (congestion avoidance and slow start) are considered "fair enough" for this project. I suspect that using congestion avoidance without slow start would be just as fair and almost as efficient, but I'm not sure. Simulation framework ==================== When you work on this project, you need to play two roles. While writing transport.c, you need to think as if you know nothing about what exists outside the transport layer, except what is said in transport.h, bytestream.h, network.h, and timer.h. That leave many possibilities for what *could* be happening outside the transport layer. In order to test your transport layer, you need some entities to act like byte sources, byte sinks, a network, timers, etc. The example code shows one way this could all be done, but you will need to greatly modify it to see if your transport layer meets the requirements. For example, you'll need a network that drops and reorders packets (random.h might be useful for that), and you'll need a byte sink that doesn't always accept bytes as fast as they arrive (to see whether the transport source eventually stops sending). While writing your test code, you just need to create the illusion of timers, a network, etc. In the example code, timers do not keep track of real time. Setting a timer just adds an item to a list of future expiration events, and main() repeatedly calls timer_expire(), which gets the next event off the list, advances the pretend clock to the expiration time of that event, and calls the handler for that event. You probably have no reason to modify event.c or timer_expire.h . The example network.c delays packets (and could drop them, alter them, etc), but calls deliver_packet() to actually deliver packets to their destinations. The deliver_packet() function is implemented in the example main.c, because that's where addresses were assigned to transport sources and sinks, so that's the easiest place to figure out which transport source/sink to pass the packet to. The example streamers.h and streamers.c provide objects that produce and consume bytes, using the bytestream.h interface. You could use this as inspiration for a variety of byte producers and consumers, each with its own .h and .c file, or you could generalize the streamers.h interface to allow you to customize the behavior of each produce/consumer at run-time.