by Christos Papadopoulos,
Guru Parulkar, and George Varghese
summarized by Adam M. Costello
A loss always affects a subtree of the multicast distribution tree.
A member just below the loss link sends a request to a member just above the loss link, who sends a reply to the loss subtree.
Each router selects one of its downward links to be the replier link.
This means each subtree has a replier (leader member) chosen from among the repliers of the child subtrees.
The paper says that certain routers choose their upward link as replier. For the root router, this increases exposure. For other routers, it can cause a request moving downward to backtrack. I think there should be no exceptions; each router should choose a downward link.
A request is forwarded onto the replier link, unless it came from the replier link, in which case it is forwarded upward.
If a router receives a request from each of its child links, it will forward one of them (from the replier link) upward, and the others downward (onto the replier link).
By induction, for any subtree, a request sent by its replier will escape the subtree; requests sent by other members will not.
In particular, if every member of a loss subtree sends a request, exactly one request will escape the loss subtree, even though no one knows where the loss occurred. The other requests will be ignored, because the receivers don't have the data.
In my opinion, that last point is the essence of the proposed scheme, and the major contribution of this paper.
A request goes up for a while, then goes down until it hits a leaf. The router that bounces it back down is the turning point. The turning point writes into the request its own address and an identifier for the link on which the request arrived.
The responder uses the address and link ID in the request to send the reply to the subtree that the request escaped. This can be implemented by encapsulating a multicast packet inside a unicast packet (roughly).
A router can aim to minimize the loss rate advertised by the member and/or the distance to the replier.
I think it's premature to nail down the details. Multicast routing is a moving target, and Cisco is willing to change the routers.
If a loss occurs on a replier link, the request will go higher than necessary, and the reply will go to members that don't need it.
For a binary tree, the average exposure for a loss at height h is h/2 versus 2^h for schemes without local recovery.
I would have liked to see a more general analysis.
When a loss occurs above a subtree, that subtree's replier receives a request from each sibling-link along the replier path.
The routers can keep track of the implosion for each replier and take it into account when choosing replier links.
That doesn't always fix the problem. For a perfectly balanced tree, all your choices are equivalent.
In CBT, the routers lack per-source state, so they can't distinguish up from down.
No, take a step back. If you can construct a tree, you can use it. Someone must have known the root of the tree, if not the routers, then the end nodes. If the root address does not appear in the routing tables, it can appear in the packet.
At the leaves, elect one member on the LAN to be replier.
But what about inside the tree? Multicast routing already has to deal with this. We should look at what it does.
How do you choose the timer?
Soft state will eventually time out, but members can also test repliers with a special handshake. If it fails, the member can send a message to the turning point router telling it to pick a different replier link.
This may cause every router along the replier path of the loss subtree to change its replier link.
A router with many links causes large implosion. It can logically partition itself into multiple small routers.
Binary tree, uniform loss probability across all links, only original data packets can be lost.
The average recovery latency seen by a receiver is typically a tad less than the round-trip time between the receiver and the source.
The average nuisance (replies / losses) seen by a receiver is about 0.1 for 8 members and decreases for larger groups.
I would have liked to hear about the variance as well as the mean.
For an imaginary wide-area network, the average latency was about half the RTT, and the nuisance was about the same.
Again, it's too early for me to be interested in this, but they have implemented some of the endpoint and router functions in NetBSD. 250 lines of code appears in the appendix.