Tangle specification 0.3.0 (2003-Jul-29-Tue)
Adam M. Costello
http://www.nicemice.net/amc/


Introduction
============

Tangle is a peer-to-peer protocol for mapping resource labels to
resource locations; it is a kind of distributed hash table.  The design
of Tangle has been influenced by a particular higher-layer application,
Buzz, which is like the Web except that resources are not associated
with any particular servers.

This specification is not final.  Some details have yet to be filled in,
and there is not yet a strong commitment to backward compatibility as
the specification evolves.


Model
=====

The same Tangle protocol is run on every node.  Each node has a node
address, plus a message address for sending Tangle protocol messages to
the node, and a connection address used by higher layers (like Buzz,
which uses the connection address for transporting entities).  The
message and connection addresses are relative to the node address, and
are useful only in conjunction with it.  (For example, on the internet,
the node address is an IP address, the message address is a UDP port
number, and the connection address is a TCP port number.)  A well-known
one-way function, applied to the node address, yields the node's label,
which is a bit string.  Resources also have labels, and a node will
tend to know the locations of resources whose labels are "close" to the
node's label.  The closeness of two labels is measured by the length of
their longest common prefix.

The one-way function from addresses to labels has not yet been defined.


Data structures
===============

Historic peers
--------------

Each node maintains a list of historic peers, where each peer consists
of a node address and a message address.  This list is kept in stable
storage, and changes relatively slowly while the node runs.  Before
a node is run for the first time, this list needs to be initialized
manually.

Peers are removed from the list only as they are added, so that the
list never shrinks.  Whenever a peer is added to the list, it is added
at the front.  Then, if the list length exceeds some threshold r, one
peer is removed as follows:  With probability p, the front peer is
removed, otherwise with probability p, the next peer is removed, and so
on.  If the list is exhausted before the process stops, then no peer is
removed.  Therefore the list grows over time, and its expected length
is O(log(t)).  More precisely, after n additions to an initially empty
list, the length is n for n <= r, and about ln(cn-cr+exp(cr)) / c for
n > r, where c = ln(1/(1-p)).  [PARAMETER: The probability p, and the
threshold r.]  [The goal of this procedure is to achieve the following
property:  If peers are added at a uniform rate, then the peers in the
list will have ages such that log(age) is uniformly distributed.  The
purpose of the threshold is to grow the list quickly at the beginning.]


Unconfirmed peers
-----------------

Each node X maintains an array of sets of unconfirmed peers, one set for
each bit position in its label.  The set corresponding to the first bit
position is called the "top" set, and the set corresponding to the last
bit position is called the "bottom" set.  Each unconfirmed peer consists
of its node address and its message address.

The set corresponding to particular bit position contains peers whose
labels disagree with X's label in that position, but agree with X's
label in all earlier positions.

Notice that for any label, there is exactly one set that is appropriate
for that label.  (Exception: there is no set appropriate for the node's
own label.)

Sets do not contain duplicate node addresses; if a peer is being added
with the same node address as a peer already in the set, the new peer
replaces the old one (the message addresses might be different).  There
is an upper limit on the size of a set; whenever a set momentarily
exceeds this limit (which should be finite), a member is chosen
uniformly at random and removed, until the set size is back down to the
limit.  [PARAMETER: Unconfirmed peer set limit.]

Syntactically invalid addresses should not be admitted into the sets.

Confirmed peers
---------------

Each node X maintains an array of sets of confirmed peers, one set for
each bit position in its label, similar to the unconfirmed array, but a
confirmed peer contains more data:

    node and message and connection addresses of the peer
    label of the peer (computable from addresses)
    round-trip time between X and the peer
    node parameters of the peer

The node parameters are:

    uptime (how long the peer had been up when it was added to the set)
    bandwidth between the peer and the "core" of the network
    free space on the peer (available storage)

Confirmed peers are removed from the confirmed set whenever they've been
in the set too long.  [PARAMETER: Confirmed peer age limit, which should
vary in proportion to sqrt(max(n,1)), where n is the number of nonempty
unconfirmed peer sets above the uppermost empty one.]  Implementations
may also limit the size of confirmed sets in the same way they limit the
size of unconfirmed sets (and should do so if operations on the set take
O(n) time).  [PARAMETER: Confirmed peer set limit.]  Like unconfirmed
sets, confirmed sets do not contain duplicate node addresses; if a peer
is being added with the same node address as a peer already in the set,
the new peer replaces the old one (and the age is reset to zero).

A collection of node & messages addresses of recently confirmed peers
should occasionally be written to stable storage, to be used to
initialize the unconfirmed peers when the node starts up next time.

Location map
------------

Each node maintains a map from resource labels to sets of locations.  A
location consists of:

    node and message and connection addresses
      of a node caching the resource
    resource parameters

The resource parameters are:

    timestamp of the resource (which version is cached)
    completeness of the resource (how much of it is cached)
    old flag (is this version thought to be out of date?)

Locations are removed from the map whenever they reach a certain age
(counted from the time they were added to the map).  Implementations
should not impose any limit on the number of resources in the map.  They
may impose a limit on the number of locations in the set corresponding
to a single resource label (and should do so if operations on the set
take O(n) time).  When a set momentarily exceeds this limit, remove
one location, either the oldest or one chosen uniformly at random.
[PARAMETER: Location map set limit.]

Among the locations corresponding to a single resource label, each
<timestamp, node address> pair appears at most once.  Adding a new
location that would collide with an existing location causes the old
location to be replaced by the new location (whose age is then zero).

There is an invariant regarding the old flags:  Within the set of
locations corresponding to a single resource label, a location's flag
is turned on if the set contains another location with a strictly newer
timestamp.  Whenever a location is added, flags are turned on in order
to preserve the invariant.  Flags are never turned off.

For resource parameters appearing outside the location map (in
advertisements and local resources, see below), the flags are not
automatically changed, and there is no invariant.

Advertisement buffers
---------------------

Each node X maintains an array of buffers of advertisements, one buffer
for each bit position in X's label.  Each advertisement consists of:

    label of a resource
    location of the resource

The buffer for a particular bit position contains advertisements whose
labels disagree with X's label in that position, but agree with X's
label in all earlier positions (the same rule as for the peer sets).
There is a limit on the occupancy of a buffer, and a limit on how much
time an advertisement may stay in the buffer.  When either limit is
reached, all advertisements in the buffer are removed and sent in a post
message.  [PARAMETER: Advertisement buffer occupancy and age limits.
The occupancy cannot exceed 255, because a post message cannot hold more
than 255 advertisements.  An implementation can additionally impose
another limit to keep the message size below some maximum.]

Local resources
---------------

Each node maintains a set of local resources.  For each resource located
at that node, the set contains an element consisting of:

    label of the resource
    resource parameters

(See the location map section above for the resource parameters.)


Message types
=============

Every message contains a group identifier, which indicates which
population of nodes the message belongs to.  Messages are not
intentionally sent from one group to another.  Each group agrees on a
single version of Tangle, and on a higher-layer protocol to use on top
of Tangle.

There are five types of messages, with the following meanings and
contents (in addition to the group identifier):

post: "Store these resource locations in the distributed hash table."
    list of advertisements

query: "Tell me locations for this resource."
    node and message addresses of the requestor
    label of the resource
    cookie (opaque data to be copied blindly)
        
report: "This resource can be found at these locations."
    node and message addresses of the final destination (optional)
    cookie (if evoked by a query)
    label of the resource
    list of locations (empty means not found)

ping: "Are you there?"
    node and message addresses of the sender
    cookie (opaque data to be copied into a pong)

pong: "Yes, I'm here, and here are some other nodes to try."
    cookie (optional, to be copied into another pong)
    cookie (copied)
    node and message and connection addresses of the sender
    list of node and message addresses of other nodes
    node parameters of the sender


Protocol activities
===================

Quasi-periodicity
-----------------

Something is said to be quasi-periodic with period T if it happens
repeatedly and the time between successive occurrences is T on average.
It is recommended that each inter-occurrence time be an independent
random variable uniformly distributed between T/2 and 3T/2.

Choosing confirmed peers
------------------------

Several actions described below require confirmed peers to be chosen
for various purposes.  Such choices are made as follows.  First, if a
particular confirmed set has not been specified then choose a confirmed
set.  Second, choose a peer from the designated set.  If the specified
set is empty, or if no set could be chosen because all sets are empty,
then no peer can be chosen, and the action that required a confirmed
peer cannot be performed.

If a set needs to be chosen, start at the specified starting set (or
at the top set if no starting set is specified), then flip a coin
repeatedly and move down one set for each head, until a tail comes up.
Then, if that set is empty, move up until a non-empty set is found; if
you run past the top, move down until a non-empty set is found.

When choosing a peer from a set, use a uniform random distribution
unless some other distribution is explicitly invited.

If the distribution to be used is non-random and no confirmed set has
been specified (whenever this happens no starting set is specified
either) then don't choose a set, but instead choose the peer from among
the union of all the sets.

Peer discovery
--------------

Quasi-periodically send a ping to a peer.  Choose the peer as follows:
Let n be the number of nonempty unconfirmed sets.  For the purposes
of this activity, treat all sets beyond the first empty set as being
part of last nonempty set (or of the first set, if it is empty).  Flip
a coin repeatedly, and if n heads come up before the first tails,
choose a historic peer uniformly at random and send the ping to it
(notice that if there are no unconfirmed peers, we are guaranteed
to use a historic peer).  Otherwise, choose an unconfirmed set
uniformly at random from among the n nonempty sets (or cycle through
them in a round-robin fashion), then choose a peer from that set
uniformly at random, remove it from the set, and send the ping to it.
[PARAMETER: Unconfirmed peer ping period, which is suggested to vary
in proportion to 1/sqrt(max(n,1)).]  [The use of historic peers allows
probable recovery from network partitions.]

When a ping arrives, reply with a pong containing two cookies (one newly
created, and a second copied from the ping).

When sending a pong, choose confirmed peers at random to form the
list of other peers.  For each selection, the starting set is
determined as follows:  From the set appropriate for the label of
the pong-destination, flip coins and move up one set for each head
until a tail comes up (or until the top is reached).  [PARAMETER: How
many peer selections should be performed for one pong?  Tentative
recommendation: 2.]

When a pong arrives, use the copied cookie to validate the pong
and obtain a round-trip time (see the message formats section for
suggestions how to do this).  If the pong is valid, add the peer to
the appropriate confirmed set.  Add the other peers to the appropriate
unconfirmed sets.  If the pong contains two cookies, reply with a pong
containing one cookie (copied from the first cookie of the received
pong).

When a confirmed peer reaches its age limit, remove it from the set.

Quasi-periodically choose a confirmed peer and add it to the list of
historic peers.  [PARAMETER: Historic peer addition period.]

Hash table maintenance
----------------------

When a post message arrives, add each advertisement to the appropriate
buffer, and add the location to the label's set in the map.  But don't
add the advertisement to the buffer if enough locations for this label
are already known in the map.  Only locations at least as complete *and*
at least as new count toward inhibiting the addition.  [PARAMETER: How
many locations is enough?]

For each local resource, quasi-periodically create an advertisement for
it and process it as if it had arrived in a post message, except choose
the buffer randomly instead of using the buffer appropriate for the
label.  Start with the top buffer, then flip coins and move down one
buffer for each head until a tail comes up.  [PARAMETER: Advertisement
creation period.]

When an advertisement buffer reaches its limit on the number of
advertisements or the age of the oldest advertisement, remove all
the advertisements and send them in a post message to a peer chosen
uniformly at random from the confirmed set corresponding to the same bit
position as the buffer.

When a location is added to the map, old flags might change from off to
on.  Send cookieless indirect reports (see below) to the locations whose
flags changed.  There are two cases:  If the location that was added
gets flagged (because the set contains a newer timestamp), then send the
report to that location informing it of all the newer locations.  If
locations already in the set get flagged (because they are older than
the location that was added), then send one report to each location
that got flagged (unless its node address matches the location that was
added) informing it of the location that was added.  Both cases cannot
happen at once because of the invariant on the flags.

Hash table lookups
------------------

To look up locations for a resource label, send queries to
yourself repeatedly for as long as additional reports are desired.
[PARAMETER: Query period, which should perhaps adapt to the average
response time.]  Reports can deliver additional locations, or
already-learned locations, or indications that the resource appears not
to exist.

When a query arrives compare its resource label to your own node label
and find the first bit position in which they differ.  The confirmed
peer set corresponding to that bit position is the "closer-set"; it
contains all the confirmed peers who are "closer" to the resource label
than you are.  If that confirmed set is empty, or if any locations of
the requested resource are known, send to the requestor an indirect
report (see below) containing all known locations of the resource (even
if there are none) and containing the cookie copied from the query.
If the query arrived from yourself then choose a confirmed peer (and
if using a non-random distribution then restrict the choice to the
closer-set) and forward the query to that peer.  If the query did not
arrive from yourself and the number of known locations is not "enough"
then forward the query to a peer chosen from the closer-set.  Locations
that are incomplete or flagged as old don't count for inhibiting the
forwarding.  [PARAMETER: How many locations is enough?  Same as for
advertisements.]  The distribution for choosing the peer can be uniform
random, or skewed to favor lower-cost nodes (but too much skewing
can impair fault tolerance), or any distribution invited by options
in the query message.  The cost of a node might be proportional to
its round-trip time and/or inversely proportional to the number of
additional label bits it has in common with the requested label.

Reports
-------

When composing a report, the list of locations can be split across
multiple report messages in order to bound the message size.

When sending an indirect report, put the final destination in the
message, choose a confirmed peer, and send the message to it.  For
reports sent in response to queries, the distribution may be skewed
to favor peers with shorter round-trip times.  But when the final
destination is yourself, don't choose a peer at all, send it to yourself
instead (to avoid unnecessary network usage and to allow a one-node
community to function) (and in this case you can optionally bypass the
indirection and send a direct report instead).

When an indirect report (one containing a message address) arrives,
remove the message address and forward the resulting report to that
address.

[Without the indirection it would be trivial for any node to discover
other nodes whose labels are close to any given label.  That ability
would be much more useful to attackers than to legitimate users of the
system.]

When a direct report (one lacking a message address) arrives with a
cookie, and the resource label is one of interest to the higher layer,
deliver the locations up to the higher layer (or if the report contains
no locations, deliver a non-definitive indication that the resource
appears not to exist).

When a direct report arrives without a cookie, compare its locations to
the local resources.  For any local resource whose label matches the
report, and whose old flag is off, and whose timestamp is older than
any of the timestamps in the report, inform the higher layer of the
locations of the newer versions.


Message formats
===============

message ::= body options type group_identifier
group_identifier ::= byte*N length_byte (N == length_byte)
length_byte ::= byte
type ::= byte
options ::= byte*

body ::=    ping              (type==1==001)
         |  pong_one_cookie   (type==2==010)
         |  pong_two_cookies  (type==3==011)
         |  report_no_cookie  (type==4==100)
         |  report_cookie     (type==5==101)
         |  post              (type==6==110)
         |  query             (type==7==111)

ping ::= node_address message_address cookie

pong_one_cookie ::=
  cookie node_address message_address connection_address
  node_list node_parameters
node_list ::= length_byte node_message_address*N (N == length_byte)
node_message_address ::= node_address message_address
node_parameters ::= uptime bandwidth free_space

pong_two_cookies ::= cookie pong_one_cookie

report_no_cookie ::= destination label location_list
report_cookie ::= destination cookie label location_list
destination ::= direct | indirect
direct ::= byte==0
indirect ::= byte==1 node_address message_address byte==0
location_list ::= length_byte location*N (N == length_byte)
location ::=
  node_address message_address connection_address resource_parameters
resource_parameters ::= timestamp_flag completeness

post ::= length_byte advertisement*N (N == length_byte)
advertisement ::= label location

query ::= node_address message_address label cookie

node_address ::= length_byte byte*N (N == length_byte)
    (A length_byte of 4 means IPv4.  A length_byte of 16 means IPv6.
    Other address formats may be added in the future.)

message_address ::= length_byte byte*N (N == length_byte)
    (A length_byte of 2 means UDP.  Other address formats may be added
    in the future.)

connection_address ::= length_byte byte*N (N == length_byte)
    (A length_byte of 2 means TCP.  Other address formats may be added
    in the future.)

label ::= byte*N (N is implied by the group identifier)

cookie ::= length_byte byte*N (N == length_byte)
    (For pings, it is suggested that the cookie contain a timestamp and
    a keyed one-way hash covering the timestamp and the peer's node
    & message addresses.  Then when the cookie comes back in a pong,
    both the timestamp and the peer's addresses can be verified.  For
    queries, the cookie could contain a timestamp and a keyed one-way
    hash covering the timestamp and the label being sought.  Then when
    the cookie comes back in a report, the timestamp can be verified and
    used to estimate the average response time.  The inclusion of the
    peer address or label in the hash reduces (but does not eliminate)
    the vulnerability to replay attacks.  The representations can be
    platform-specific, since the cookie is interpreted only by its
    creator.)

uptime ::= usf (in seconds)
bandwidth ::= usf (in bits per second)
free_space ::= usf (in bytes)

usf ::= byte*2
  (Unsigned short float.  For bytes B1 B2, the value is B2 if B1 is
  zero, otherwise (256 + B2) << (B1 - 1).)

timestamp_flag ::= byte*5
  (The most significant bit of the first byte is the old flag.  The
  rest is a POSIX timestamp, in network byte order (big-endian, most
  significant first).)

completeness ::= byte
  (A value of N means that at least the first N/255 of the resource
  is present.  A value of zero means that a certain minimal amount is
  present, specified by the upper layer.)


Message options
===============

Options are defined for query and report messages; for other message
types, options must not be sent, and must be ignored if received.

For report messages, only the first option byte is defined; any more
must not be sent, and must be ignored if received.  Within this byte,
the least significant seven bits are reserved, and must be sent as
zeroes, and ignored on receipt.  If the most significant bit is set, it
indicates an exact match between the label in the message and the label
of its sender (the original sender, not the relayer).  This option can
be useful for testing the reachability of a particular node: do a lookup
on the node's label and watch for a report with the option bit set.

For query messages, only the first option byte is defined; any more must
not be sent, and must be ignored if received.  Within this byte, the
least significant six bits are reserved, and must be sent as zeros, and
ignored on receipt.  The two most significant bits provide a forwarding
hint, indicating the kind of distribution to be used when choosing the
next hop for the query and for any indirect reports evoked by the query.

    00: any random distribution designed to reduce the total time
    01: any random distribution designed to reduce the total hop count
    10: uniform random
    11: any deterministic choice designed to minimize the total time

If the option is not present, the default forwarding hint is 00.  The
hint should not be changed when the query is relayed.  Nodes are not
required to respect the hint, but note that non-random next-hops are not
allowed except when hint 11 is present.

Implementations are encouraged to set and respect the defined options,
but cannot assume that their peers do so.


Group identifiers
=================

Group identifiers with length 0 to 4 are reserved.  Group identifiers
with length greater than 4 are available for private use.  They are not
registered, so collisions are possible, but the choice of identifier can
make them arbitrarily unlikely.