$Id: dir-spec.txt 18266 2009-01-25 11:26:11Z kloesing $

                      Tor directory protocol, version 3

0. Scope and preliminaries

   This directory protocol is used by Tor version 0.2.0.x-alpha and later.
   See dir-spec-v1.txt for information on the protocol used up to the
   0.1.0.x series, and dir-spec-v2.txt for information on the protocol
   used by the 0.1.1.x and 0.1.2.x series.

   Caches and authorities must still support older versions of the
   directory protocols, until the versions of Tor that require them are
   finally out of commission.  See Section XXXX on backward compatibility.

   This document merges and supersedes the following proposals:

       101  Voting on the Tor Directory System
       103  Splitting identity key from regularly used signing key
       104  Long and Short Router Descriptors

   AS OF 14 JUNE 2007, THIS SPECIFICATION HAS NOT YET BEEN COMPLETELY
   IMPLEMENTED, OR COMPLETELY COMPLETED.

   XXX when to download certificates.
   XXX timeline
   XXX fill in XXXXs

0.1. History

   The earliest versions of Onion Routing shipped with a list of known
   routers and their keys.  When the set of routers changed, users needed to
   fetch a new list.

   The Version 1 Directory protocol
   --------------------------------

   Early versions of Tor (0.0.2) introduced "Directory authorities": servers
   that served signed "directory" documents containing a list of signed
   "router descriptors", along with a short summary of the status of each
   router.  Thus, clients could get up-to-date information on the state of
   the network automatically, and be certain that the list they were getting
   was attested by a trusted directory authority.

   Later versions (0.0.8) added directory caches, which download
   directories from the authorities and serve them to clients.  Non-caches
   fetch from the caches in preference to fetching from the authorities, thus
   distributing bandwidth requirements.

   Also added during the version 1 directory protocol were "router status"
   documents: short documents that listed only the up/down status of the
   routers on the network, rather than a complete list of all the
   descriptors.  Clients and caches would fetch these documents far more
   frequently than they would fetch full directories.

   The Version 2 Directory Protocol
   --------------------------------

   During the Tor 0.1.1.x series, Tor revised its handling of directory
   documents in order to address two major problems:

      * Directories had grown quite large (over 1MB), and most directory
        downloads consisted mainly of router descriptors that clients
        already had.

      * Every directory authority was a trust bottleneck: if a single
        directory authority lied, it could make clients believe for a time
        an arbitrarily distorted view of the Tor network.  (Clients
        trusted the most recent signed document they downloaded.)  Thus,
        adding more authorities would make the system less secure, not
        more.

   To address these, we extended the directory protocol so that
   authorities now published signed "network status" documents.  Each
   network status listed, for every router in the network: a hash of its
   identity key, a hash of its most recent descriptor, and a summary of
   what the authority believed about its status.  Clients would download
   the authorities' network status documents in turn, and believe
   statements about routers iff they were attested to by more than half of
   the authorities.

   Instead of downloading all router descriptors at once, clients
   downloaded only the descriptors that they did not have.  Descriptors
   were indexed by their digests, in order to prevent malicious caches
   from giving different versions of a router descriptor to different
   clients.

   Routers began working harder to upload new descriptors only when their
   contents were substantially changed.

0.2. Goals of the version 3 protocol

   Version 3 of the Tor directory protocol tries to solve the following
   issues:

      * A great deal of the bandwidth used to transmit router descriptors
        was consumed by two fields that Tor routers do not actually use
        (namely read-history and write-history).  We save about 60% by
        moving them into a separate document that most clients do not
        fetch or use.

      * It was possible under certain perverse circumstances for clients
        to download an unusual set of network status documents, thus
        partitioning themselves from clients who have a more recent and/or
        typical set of documents.  Even under the best of circumstances,
        clients were sensitive to the ages of the network status documents
        they downloaded.  Therefore, instead of having the clients
        correlate multiple network status documents, we have the
        authorities collectively vote on a single consensus network status
        document.

      * The most sensitive data in the entire network (the identity keys
        of the directory authorities) needed to be stored unencrypted so
        that the authorities could sign network-status documents on the
        fly.  Now, the authorities' identity keys are stored offline, and
        used to certify medium-term signing keys that can be rotated.

0.3. Some Remaining questions

   Things we could solve on a v3 timeframe:

     The SHA-1 hash is showing its age.  We should do something about our
     dependency on it.  We could probably future-proof ourselves here in
     this revision, at least so far as documents from the authorities are
     concerned.

     Too many things about the authorities are hardcoded by IP.

     Perhaps we should start accepting longer identity keys for routers
     too.

   Things to solve eventually:

     Requiring every client to know about every router won't scale forever.

     Requiring every directory cache to know every router won't scale
     forever.

1. Outline

   There is a small set (say, around 5-10) of semi-trusted directory
   authorities.  A default list of authorities is shipped with the Tor
   software.
   Users can change this list, but are encouraged not to do so, in order
   to avoid partitioning attacks.

   Every authority has a very-secret, long-term "Authority Identity Key".
   This is stored encrypted and/or offline, and is used to sign "key
   certificate" documents.  Every key certificate contains a medium-term
   (3-12 months) "authority signing key" that is used by the authority to
   sign other directory information.  (Note that the authority identity
   key is distinct from the router identity key that the authority uses
   in its role as an ordinary router.)

   Routers periodically upload signed "router descriptors" to the
   directory authorities describing their keys, capabilities, and other
   information.  Routers may also upload signed "extra info documents"
   containing information that is not required for the Tor protocol.
   Directory authorities serve router descriptors indexed by router
   identity, or by hash of the descriptor.

   Routers may act as directory caches to reduce load on the directory
   authorities.  They announce this in their descriptors.

   Periodically, each directory authority generates a view of
   the current descriptors and status for known routers.  They send a
   signed summary of this view (a "status vote") to the other
   authorities.  The authorities compute the result of this vote, and sign
   a "consensus status" document containing the result of the vote.

   Directory caches download, cache, and re-serve consensus documents.

   Clients, directory caches, and directory authorities all use consensus
   documents to find out when their list of routers is out-of-date.
   (Directory authorities also use vote statuses.)  If it is, they download
   any missing router descriptors.  Clients download missing descriptors
   from caches; caches and authorities download from authorities.
   Descriptors are downloaded by the hash of the descriptor, not by the
   server's identity key: this prevents servers from attacking clients by
   giving them descriptors nobody else uses.
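
   The digest-indexed download described above can be illustrated with a
   minimal Python sketch.  It assumes, purely for illustration, that a
   descriptor's digest is the SHA-1 hash of its complete encoded bytes;
   which part of a descriptor the digest really covers is not stated in
   this outline.  The function names and the sample descriptor lines are
   hypothetical.

```python
import hashlib


def descriptor_digest(descriptor: bytes) -> str:
    """Return the uppercase hex SHA-1 digest of the given bytes.

    Assumption: the whole byte string is hashed. The real protocol
    defines which part of a descriptor the digest covers.
    """
    return hashlib.sha1(descriptor).hexdigest().upper()


def accept_descriptor(requested_digest: str, received: bytes) -> bool:
    """Accept a downloaded descriptor only if it hashes to the digest asked for."""
    return descriptor_digest(received) == requested_digest.upper()


# A client learns the digest from a consensus, then fetches by that digest.
genuine = b"router example 192.0.2.1 9001 0 0\n"
wanted = descriptor_digest(genuine)

# A server that substitutes a different descriptor is detected.
forged = b"router example 192.0.2.99 9001 0 0\n"
print(accept_descriptor(wanted, genuine))  # True
print(accept_descriptor(wanted, forged))   # False
```

   Because the client names the exact digest it wants, a server cannot
   hand it a descriptor that differs from the one everyone else receives
   without the mismatch being noticed.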

   All directory information is uploaded and downloaded with HTTP.

   [Authorities also generate and caches also cache documents produced and
   used by earlier versions of this protocol; see section XXX for notes.]

1.1. What's different from version 2?

   Clients used to download multiple network status documents,
   corresponding roughly to "status votes" above.  They would compute the
   result of the vote on the client side.

   Authorities used to sign documents using the same private keys they used
   for their roles as routers.  This forced them to keep these extremely
   sensitive keys in memory unencrypted.

   All of the information in extra-info documents used to be kept in the
   main descriptors.

1.2. Document meta-format

   Router descriptors, directories, and running-routers documents all obey
   the following lightweight extensible information format.

   The highest level object is a Document, which consists of one or more
   Items.  Every Item begins with a KeywordLine, followed by zero or more
   Objects.  A KeywordLine begins with a Keyword, optionally followed by
   whitespace and more non-newline characters, and ends with a newline.  A
   Keyword is a sequence of one or more characters in the set [A-Za-z0-9-].
   An Object is a block of encoded data in pseudo-Open-PGP-style armor.
   (cf. RFC 2440)

   More formally:

    NL = The ascii LF character (hex value ...
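
   The Document, Item, KeywordLine, and Object structure described in
   prose above can be sketched as a small Python parser.  This is an
   illustration under stated assumptions, not the reference parser: it
   assumes that an Object is delimited by "-----BEGIN <label>-----" and
   "-----END <label>-----" lines in the style of RFC 2440 armor, that
   whitespace means spaces or tabs, and that every line ends with LF.
   The names "Item" and "parse_document" and the sample document are
   hypothetical.

```python
import re
from dataclasses import dataclass, field

# Keyword, optionally followed by whitespace and more non-newline characters.
KEYWORD_LINE = re.compile(r"([A-Za-z0-9-]+)(?:[ \t]+(.*))?")
# Assumed armor header, in the style of RFC 2440.
BEGIN = re.compile(r"-----BEGIN (.+)-----")


@dataclass
class Item:
    keyword: str
    arguments: str = ""
    objects: list = field(default_factory=list)  # (label, body lines) pairs


def parse_document(text: str) -> list:
    """Split a document into Items, each with the Objects that follow it."""
    if not text.endswith("\n"):
        raise ValueError("every line must end with a newline")
    items = []
    lines = iter(text[:-1].split("\n"))
    for line in lines:
        # Check for armor first: its header also starts with Keyword characters.
        begin = BEGIN.fullmatch(line)
        if begin:
            if not items:
                raise ValueError("Object appears before any KeywordLine")
            label, body = begin.group(1), []
            end_marker = f"-----END {label}-----"
            for inner in lines:  # consumes lines from the same iterator
                if inner == end_marker:
                    break
                body.append(inner)
            else:
                raise ValueError(f"unterminated Object: {label}")
            items[-1].objects.append((label, body))
            continue
        match = KEYWORD_LINE.fullmatch(line)
        if not match:
            raise ValueError(f"not a KeywordLine: {line!r}")
        items.append(Item(match.group(1), match.group(2) or ""))
    if not items:
        raise ValueError("a Document needs at least one Item")
    return items


sample = (
    "router example 192.0.2.1 9001 0 0\n"
    "published 2009-01-25 11:26:11\n"
    "router-signature\n"
    "-----BEGIN SIGNATURE-----\n"
    "AAAA\n"
    "-----END SIGNATURE-----\n"
)
items = parse_document(sample)
print([item.keyword for item in items])
# ['router', 'published', 'router-signature']
print(items[-1].objects)
# [('SIGNATURE', ['AAAA'])]
```

   The sketch attaches each Object to the most recent Item, which
   mirrors the rule that an Item is a KeywordLine followed by zero or
   more Objects.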