README 0.2.1 (1999-Dec-05-Sun)
Adam M. Costello <amc@cs.berkeley.edu>
Overview of the EECS 122 (fall 1999) project code


Introduction
============

This overview is a supplement to, not a replacement for, the .h files.
You cannot fully understand and comply with the interfaces without
reading the comments in the .h files.

  Interface you must implement (do not change this):
      * transport.h 

  Interfaces that transport.c can depend on (do not change these):
    Indispensible:
      * bytestream.h 
      * network.h 
      * timer.h 
    Very useful:
      * crc32.h 
      * integer.h 
    Possibly useful:
      * random.h
      * warnf.h 

  Code we will link with transport.c (do not change these):
      * crc32.c
      * integer.c
      * random.c
      * warnf.c

  Example code (use it, change it, or ignore it, but transport.c must
  not depend on it):
      * deliver_packet.h 
      * event.c 
      * main.c 
      * Makefile 
      * network.c 
      * streamers.c 
      * streamers.h 
      * timer_expire.h 
      * transport.c 

We assume a model in which each host has one network address and one
transport layer, which is either a transport source (for outgoing
connections) or a transport sink (for incoming connections).  Of course
it's unrealistic for a host to be able to have only outgoing or only
incoming connections--in real life, connections are bidirectional, but
in this project they are unidirectional.

transport.h defines an interface to the transport layer, that is,
functions which entities outside the transport layer can call to
communicate with the transport layer.  You must write transport.c, which
implements the functions declared in transport.h.  That is the only
file you will turn in.  We could have separated the source and sink
interfaces into transport_source.h and transport_sink.h, and separated
the implementations into transport_source.c and transport_sink.c.
Don't be misled by the fact the code for the source and sink is in the
same file--the two entities are independent, and can interact only by
exchanging packets.

In order for a connection to exist between the transport layers on two
hosts, both must know about it.  In real life, they must do connection
setup: one waits for a connection request, and the other sends a
connection request, then the two exchange a few messages to get in sync
with each other.  In this project, you are not expected to do connection
setup.  Both ends will simply be told to create the connection, so
each can (independently) initialize itself into a connected state.
Similarly, you are not expected to connection tear-down, where one end
tells the other to close the connection.  Instead, both ends will simply
be told to delete the connection, and any unacknowledged data may be
discarded.


Interface to the higher layer
=============================

The transport layer is supposed to provide reliable one-way bytestream
connections.  In order for the transport layer to have something to do,
the source end of each connection must be supplied with bytes to carry,
and the sink end must have a place to deliver them.  The transport layer
doesn't care where these bytes are coming from or where they're going,
so we use abstract entities: for each connection, there is a "byte
source" producing the bytes, and a "byte sink" consuming them.

bytestream.h defines an interface for moving a stream of bytes from
one entity (a byte source) to another entity (a byte sink).  So when
the byte source transfers bytes to the transport source, the transport
source is playing the role of a byte sink.  Similarly, when the
transport sink transfers bytes to the byte sink, the transport sink is
playing the role of a byte source.  Notice the distinction between "the"
byte source for a connection, and "a" byte source (anything that sends a
stream of bytes).

When a connection is created, the transport source is given the
addresses and port numbers that identify the connection, and an
identifier for the byte source.  It must remember the association
between the byte source and the connection (if multiple connections
are being supported), because the functions for transfering bytes and
deleting the connection refer to the connection using only the byte
source identifier.  The story is the similar for the transport sink.

Whenever bytes are moved from the byte source to the transport
source, the transport source must save them in a buffer until they
are acknowledged sometime later.  When the connection is created, the
transport source is told an upper limit on the number of unacknowledged
bytes it is allowed to buffer.  If the byte source offers more bytes
than the transport source is allowed to accept, the transport source
must accept fewer bytes than were offered.  Any time a byte sink accepts
fewer bytes than were offered, it is obligated to call the byte source's
go-ahead method at some later time, when it is able to accept more
bytes.  In the case of the transport source, the only sensible time to
refuse bytes is when the buffer is full, so the only sensible time to
call the go-ahead method is when the the buffer changes from full to
non-full.  The only way the buffer can become non-full is when bytes are
acknowledged, which can happen only when a packet arrives.

Similarly, the transport sink is told an upper limit on the number of
bytes it may buffer.  If the byte sink refuses to accept all the bytes,
the transport sink may refrain from offering any more bytes until its
go-ahead method, transport_sink_okay_to_deliver(), is called by the byte
sink.


Interface to the lower layer
============================

In order for the transport layers of two hosts to communicate, they
must use the network layer to send packets.  The interface for sending
packets is defined by network.h.  Notice that at the interface between
two layers, the "payload" is whatever the the higher layer gives to the
lower layer to be carried.  So the transport layer will have its own
notion of header (sequence number, CRC, etc) and data (bytes produced by
the higher layer), but will pass all of that to the network layer as the
"payload".  Whenever a packet arrives for a transport source or sink,
the function transport_*_receive_packet() is called (see transport.h).
The network layer is best-effort, meaning it can drop, reorder, delay,
alter, and misroute packets.


Implications of the project requirements
========================================

The connection must be reliable in spite of the unreliability of the
network layer, so you must use an error detection code to detect
corruption of packets (both data and ack packets).  The crc32.h
interface is provided for your convenience.  To detect errors, simply
have the receiver perform the same CRC calculation as the sender, and
see if it gets the same answer (the sender must include the answer
in the packet).  The easiest way to detect misrouted packets is to
include the network addresses in the transport-layer header, so they get
covered by the CRC.  (Since the network addresses also appear in the
network-layer header, TCP includes them in its checksum calculation, but
not in the actual TCP header.  The receiver must retrieve the addresses
from the network layer in order to check the CRC.  You can do the same
trick in this project, but it will make no difference to your grade.)

Reliability also requires the use of sequence numbers.  You can number
packets ("this is packet number 3"), or individual bytes ("this packet
begins with byte number 1536"), or both.  Either way can work, and
people will disagree about which is easier.  You need to make sure that
sequence numbers don't get reused before they leave the network, which
means you need to have enough distinct sequence numbers.  You are not
given an upper bound on the network delay or the data rate, so in theory
you need an infinite number, but in practice 32-bit sequence numbers are
adequate (though as networks get faster we may need more bits in the
not-too-distant future).

Reliability also requires you to heed the warnings in network.h about
data types being represented differently on different machines.  Bytes
can be copied into and out of packet payloads using memcpy(), while
integers can be copied into and out of packet payloads using the
functions in integer.h.

In order to be efficient, you must not wait for one packet to be
acknowledged before sending more packets; that is, you must use a
sliding window protocol.  You can represent the window as a number
of packets or as a number of bytes.  Larger windows are generally
more efficient, but there are two restrictions on the window size:
the transport source's buffer (the transport source needs to be able
to retransmit anything it sends), and flow control (the transport
source must not send data that the transport sink cannot acknowledge
for lack of buffer space).  The easiest way to prevent the transport
source from overflowing the transport sink's buffer is to shrink the
window size as the buffer fills up, which is typically accomplished
by indicating the amount of free buffer space in every ack.  The
transport_*_create_connection() functions specify upper limits on the
buffer sizes for each connection.

Efficiency also requires you to avoid sending too many unnecessary
retransmissions, which means you need to measure the round-trip time.
The congestion control lecture notes and homework 9 talk about how to do
this (taking into account the fact that the RTT can be partly random and
can vary over time).

One slightly tricky part of flow control is making sure the transport
source can restart after it has stopped.  The transport source will not
know when it's okay to send more data unless the transport sink tells
it, but the sink usually sends ack packets only in response to data
packets.  You could have the sink send an ack when its buffer changes
from full to non-full, but that ack could get lost, and then you'll
still have both ends waiting for the other to do something.  One way to
prevent the deadlock is to have the "stopped" source periodically send
empty data packets (which is permissible), which will provoke acks.

One might wonder how the transport sink can honor its buffer limit when
packets arrive out of order.  If the buffer fills up with out-of-order
packets, then when the missing packet finally arrives, there's no space
for it.  But if the transport source is doing its job correctly, it
sends only as much data as the transport sink has space for, so this
situation will never arise.

Concentrate on getting a transport layer that is correct (always gets
all the bytes across) before worrying too much about flow control or
efficiency.

If you support multiple connections, you must run multiple instances of
your protocol.  Your transport layer functions that get called by the
higher layer can use the byte_source or byte_sink arguments to determine
which connection the call pertains to, while your *_receive_packet()
functions can use the network addresses and port numbers to make that
determination.

If you wish to do congestion control, you must adjust your sending rate
in response to the channel rate, which can vary over time.  One way
to do this is to adjust your window size.  The network will indicate
congestion by dropping packets.  In addition to finding a rate that's
not too fast and not too slow, a congestion control algorithm should be
fair--when two connections are using the same bottleneck, they should
get roughly the same bandwidth, or storage in the network, or somewhere
in between.  The algorithms used by TCP Tahoe (congestion avoidance and
slow start) are considered "fair enough" for this project.  I suspect
that using congestion avoidance without slow start would be just as fair
and almost as efficient, but I'm not sure.


Simulation framework
====================

When you work on this project, you need to play two roles.  While
writing transport.c, you need to think as if you know nothing about what
exists outside the transport layer, except what is said in transport.h,
bytestream.h, network.h, and timer.h.

That leave many possibilities for what *could* be happening outside
the transport layer.  In order to test your transport layer, you need
some entities to act like byte sources, byte sinks, a network, timers,
etc.  The example code shows one way this could all be done, but you
will need to greatly modify it to see if your transport layer meets
the requirements.  For example, you'll need a network that drops and
reorders packets (random.h might be useful for that), and you'll need a
byte sink that doesn't always accept bytes as fast as they arrive (to
see whether the transport source eventually stops sending).

While writing your test code, you just need to create the illusion of
timers, a network, etc.  In the example code, timers do not keep track
of real time.  Setting a timer just adds an item to a list of future
expiration events, and main() repeatedly calls timer_expire(), which
gets the next event off the list, advances the pretend clock to the
expiration time of that event, and calls the handler for that event.
You probably have no reason to modify event.c or timer_expire.h .

The example network.c delays packets (and could drop them, alter
them, etc), but calls deliver_packet() to actually deliver packets to
their destinations.  The deliver_packet() function is implemented in
the example main.c, because that's where addresses were assigned to
transport sources and sinks, so that's the easiest place to figure out
which transport source/sink to pass the packet to.

The example streamers.h and streamers.c provide objects that produce and
consume bytes, using the bytestream.h interface.  You could use this as
inspiration for a variety of byte producers and consumers, each with its
own .h and .c file, or you could generalize the streamers.h interface
to allow you to customize the behavior of each produce/consumer at
run-time.