TCP congestion control (RFC 2581), continued: TCP-Reno (1990) included fast retransmit and fast recovery (also developed by Van Jacobson).

fast retransmit: Interpret three duplicate acks (dupacks) as an early indication of loss, before the timeout. (Duplicate acks could also be caused by reordering or duplication in the network.) Retransmit the packet and set the slow-start threshold to half the amount of unacked data, just as would be done after a timeout, but don't do slow start; just set the congestion window to the threshold.

fast recovery: After a fast retransmit, as long as the acks are duplicates, inflate the congestion window by one packet for each dupack (because each dupack presumably means that a packet has left the network). Once the missing data is acknowledged, deflate the congestion window back to the threshold.

These work well for one loss within a window, but not for more. NewReno (RFC 2582) interprets partial acks (acks that acknowledge the retransmitted packet, but not all the data sent before it) as indications of more losses, and sends more fast retransmissions. SACK (selective acknowledgment) (RFC 2018) is an extension to the TCP header that lets the receiver say exactly what has arrived and what hasn't. This gets quite complicated.

TCP-Vegas (not recommended) tries to use a constant amount of space in router buffers, rather than divide the entire available amount among contenders. It compares the round-trip time to the minimum round-trip time to infer the queueing time, which it tries to keep small. One potential problem is that the source might never see an undelayed packet. Also, it can't compete against Tahoe or Reno, so who would be willing to use it?

Ack clock: This is only tangentially related to congestion control. In any sliding window protocol, the source typically sends a new data packet whenever it receives an ack. Therefore, even if the first link is much faster than the slowest link on the path, making it possible for the source to send its data in bursts, the data packets will become spread out in time on the slow link, causing the acks to be spread out, causing the subsequent data packets to be spread out. Congestion is less severe when data flows are steady than when they are bursty.

Fairness: When network resources are shared by multiple TCP connections, there is a question of whether each connection gets a fair share of the resources.

bandwidth vs. storage: Users generally care about how much bandwidth they get, so one definition of fairness would be that each connection should get the same bandwidth. On the other hand, a path of ten 10 Mbps links can support either a single 10 Mbps connection spanning all ten links, or ten 10 Mbps connections each spanning one link. This suggests another notion of what resources are being shared: storage in the network (on links and in queues). If storage is divided equally, then each TCP connection should get the same bandwidth-delay product, which is roughly proportional to the window size. (The window size is the product of bandwidth and round-trip time, rather than one-way delay.) There is no consensus on what fairness should mean. TCP's congestion control algorithms tend to give roughly equal windows to competing connections.

additive increase multiplicative decrease: TCP's congestion control increases the window additively (adding one packet at a time in the absence of loss) and decreases it multiplicatively (cutting it in half when a loss is detected).
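To tie the window adjustments above together, here is a rough sketch in C of a Reno-style sender's reaction to acks, duplicate acks, and timeouts, counting the window in packets as these notes do. The structure and names (struct reno, on_new_ack, and so on) are made up for illustration; a real implementation keeps the window in bytes and tracks considerably more state (see RFC 2581).

    /* Sketch of Reno-style window adjustment.  The window is counted in
     * packets; "flight" is the amount of unacknowledged data, also in
     * packets.  This is an illustration, not a real implementation. */

    struct reno {
        double cwnd;       /* congestion window */
        double ssthresh;   /* slow-start threshold */
        int    dupacks;    /* consecutive duplicate acks seen */
        int    recovering; /* between a fast retransmit and the ack that ends it */
    };

    /* An ack for new data arrived. */
    void on_new_ack(struct reno *c)
    {
        if (c->recovering) {
            c->cwnd = c->ssthresh;     /* fast recovery ends: deflate the window */
            c->recovering = 0;
        } else if (c->cwnd < c->ssthresh) {
            c->cwnd += 1;              /* slow start: one packet per ack */
        } else {
            c->cwnd += 1.0 / c->cwnd;  /* congestion avoidance: about one packet per RTT */
        }
        c->dupacks = 0;
    }

    /* A duplicate ack arrived. */
    void on_dupack(struct reno *c, double flight)
    {
        c->dupacks++;
        if (c->dupacks == 3 && !c->recovering) {
            /* fast retransmit: treat three dupacks as a loss, halve the
             * threshold, and skip slow start */
            c->ssthresh = flight / 2;
            c->cwnd = c->ssthresh;
            c->recovering = 1;
            /* ...retransmit the missing packet here... */
        } else if (c->recovering) {
            c->cwnd += 1;              /* inflate: each dupack means a packet left the network */
        }
    }

    /* The retransmission timer expired. */
    void on_timeout(struct reno *c, double flight)
    {
        c->ssthresh = flight / 2;
        c->cwnd = 1;                   /* fall all the way back to slow start */
        c->dupacks = 0;
        c->recovering = 0;
    }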
One consequence of additive increase multiplicative decrease is that if the resources are allocated unfairly to start with, all connections increase their allocation at the same rate, but a connection with a larger share will decrease more quickly than one with a smaller share. Therefore an additive increase multiplicative decrease policy tends to make the allocation more fair over time.

different round-trip times: Actually, if competing TCP connections have different round-trip times (RTTs), one with a larger RTT will increase its window more slowly, because they all add one packet per RTT. On the other hand, a connection with a larger window is more likely to experience a loss (because it has more packets in the network that could be lost). Different RTTs complicate analysis of TCP, but to first order, competing connections get roughly equal window sizes on average.

RTT estimation: If TCP's retransmission timeout is too small, a slow packet will be misinterpreted as being lost, and an unnecessary retransmission will be sent. Before the introduction of congestion control in 1988, this was a minor annoyance--if 5% of the packets were mistakenly thought to be lost, then the source would send at 105% of the rate it needed to. But with congestion control, every window of packets that experiences at least one loss causes the congestion window to be halved. This will cause a connection that mistakenly thinks 5% of its packets are lost to have an average window size of about four packets, even if the network is uncongested. (For a packet loss probability of p, the average window size will be roughly sqrt(1/p) packets.)

The best predictor of future round-trip times is past round-trip times, so the addition of congestion control was accompanied by an improvement in round-trip time estimation. The old method used an exponentially weighted moving average to estimate only the mean RTT. Each ack provides one sample RTT, which is used to update the estimated mean, which is used to set the retransmission timeout:

    error = sampleRTT - mean
    mean += a * error      /* 0 <= a <= 1 */
    timeout = d * mean     /* d >= 1 */

The rationale behind exponentially weighted moving averages is that they are easy to compute (as shown above), they approach the true mean (because the error is more likely to push the estimate toward the true mean than away from it), and they track changes in the true mean (because more recent samples contribute more to the estimate than older samples). RFC 793 suggested values of 0.1 to 0.2 for a, and 1.3 to 2.0 for d.

The problem with estimating only the mean is that if d is too small, the random variation in the round-trip time may cause a significant number of packets to be mistakenly thought to be lost, while if d is too large, retransmissions will be delayed unnecessarily. The 1988 improvement estimates the variation in round-trip time as well as the mean. Technically, it estimates the "mean deviation", which is the average absolute difference between the samples and the mean. The mean deviation is estimated using an exponentially weighted moving average just like the one used to estimate the mean:

    error = sampleRTT - mean
    mean += a * error                /* 0 <= a <= 1 */
    sample_dev = |error|
    dev_error = sample_dev - mean_dev
    mean_dev += b * dev_error        /* 0 <= b <= 1 */
    timeout = mean + c * mean_dev    /* c >= 0 */

The usual values for the constants are a = 1/8, b = 1/4, c = 4. In practice this does a good job of adapting to the round-trip times of the channel, setting timeouts that are long enough but not much too long.
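The pseudocode above can be transcribed directly into C. The struct and function names below are mine, and the initialization of mean and mean_dev from the first sample (as well as the fixed-point arithmetic a real TCP would use) is omitted:

    /* The improved retransmission-timeout estimator, with the usual
     * constants a = 1/8, b = 1/4, c = 4.  rtt_update() takes one measured
     * round-trip time and returns the new timeout. */

    struct rtt_estimator {
        double mean;      /* estimated mean RTT */
        double mean_dev;  /* estimated mean deviation of the RTT */
    };

    double rtt_update(struct rtt_estimator *e, double sample_rtt)
    {
        double error = sample_rtt - e->mean;

        e->mean += error / 8;                      /* a = 1/8 */
        if (error < 0)
            error = -error;                        /* |error| */
        e->mean_dev += (error - e->mean_dev) / 4;  /* b = 1/4 */

        return e->mean + 4 * e->mean_dev;          /* timeout = mean + c * mean_dev, c = 4 */
    }

Choosing a and b to be powers of two is deliberate: in a fixed-point implementation the multiplications become shifts.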
Congestion control challenges: In recent years, the amount of UDP traffic has increased. It used to be used mainly for DNS, but now it is also used for real-time audio and video, since the reliability and delays of TCP are not suited to such applications. Since UDP currently has no congestion control, this poses a risk of congestion collapse if the majority of traffic on the Internet becomes UDP traffic. There is research being done to factor the congestion control out of TCP into a separate layer, so that it could be used by both UDP and TCP.

Another risk comes from TCP implementations that, either accidentally or intentionally, do not back off as much as they are supposed to. If too many TCP connections try to get more than their fair share of the network, we will have congestion collapse. One proposed solution is to add mechanisms to routers to detect high-bandwidth sources that don't respond to congestion signals and put their connections into "penalty boxes", so their packets get lower quality service than normal packets. This would provide an incentive for implementors to do congestion control properly.

One disadvantage of TCP's congestion control is that it always tries to keep the router queues full. A proposed method for keeping the queues mostly empty is called Random Early Detection (RED). Whenever a packet arrives, the router could randomly decide whether to drop a packet from the queue, with a higher probability if the queue is more full (a simplified sketch of this decision appears at the end of these notes). At low loads, the queue would usually be nearly empty, so very few packets would be dropped, while at higher loads, enough packets would be dropped to cause the sources to back off, without requiring the queue to fill up. The packet to be dropped would not necessarily be the one that just arrived, but would be selected at random, so that a source sending at a higher rate (one with more packets in the queue) is more likely to be signalled to slow down. If the packet to be dropped has the ECN-capable bit set, then the router can set the congestion-experienced bit rather than drop it (see the previous lecture notes).

With the growth of the web, more and more traffic takes the form of short-lived connections. Often the entire amount of data to be sent fits on the network path (within the bandwidth-delay product), meaning it could be transferred in one round-trip time. But the source has no way of knowing the capacity of the network beforehand, so to avoid congestion it gradually increases its window from one packet over several round-trip times. (For example, with an initial window of one packet, slow start takes four round-trip times to send 15 packets--1, then 2, then 4, then 8--even if all 15 would have fit in the pipe at once.) This is a frustrating underuse of the network, and it is a hard problem.
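As a rough illustration of the RED idea (the sketch referred to above), the drop decision might look like the following. The thresholds and maximum drop probability are made-up example values, the real algorithm works on an exponentially weighted average of the queue length rather than the instantaneous length, and this sketch only decides whether to drop: choosing which queued packet to drop (or to mark instead, if it is ECN-capable) is left out.

    #include <stdlib.h>

    /* Simplified RED-style drop decision.  MIN_THRESH, MAX_THRESH, and
     * MAX_PROB are illustrative values, not recommendations. */

    #define MIN_THRESH 5     /* below this queue length (packets), never drop */
    #define MAX_THRESH 15    /* at or above this, always drop (or mark) */
    #define MAX_PROB   0.1   /* drop probability as the queue nears MAX_THRESH */

    /* Return 1 if the arrival of a packet should trigger a drop (or an ECN
     * mark), given the current queue length in packets. */
    int red_should_drop(int queue_len)
    {
        double p;

        if (queue_len < MIN_THRESH)
            return 0;
        if (queue_len >= MAX_THRESH)
            return 1;

        /* probability rises linearly from 0 to MAX_PROB between the thresholds */
        p = MAX_PROB * (queue_len - MIN_THRESH) / (MAX_THRESH - MIN_THRESH);
        return ((double)rand() / RAND_MAX) < p;
    }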