How does DHT in torrents work?

I'm coding a p2p implementation that I would like to make decentralized however I'm having some trouble grasping how DHT in protocols like bittorrent work. How does the client know where the peers are if there is no tracker? Are peers stored in the actual torrent file?


With trackerless/DHT torrents, peer IP addresses are stored in the DHT using the BitTorrent infohash as the key. Since all a tracker does, basically, is respond to put/get requests, this functionality corresponds exactly to the interface that a DHT (distributed hash table) provides: it allows you to look up and store IP addresses in the DHT by infohash.

So a "get" request would look up a BT infohash and return a set of IP addresses. A "put" stores an IP address for a given infohash. This corresponds to the "announce" request you would otherwise make to the tracker to receive a dictionary of peer IP addresses.

In a DHT, peers are randomly assigned to store values belonging to a small fraction of the key space; the hashing ensures that keys are distributed randomly across participating peers. The DHT protocol (Kademlia for BitTorrent) ensures that put/get requests are routed efficiently to the peers responsible for maintaining a given key's IP address lists.


The general theory can be found in wikipedia's article on Kademlia. The specific protocol specification used in bittorrent is here: http://wiki.theory.org/BitTorrentDraftDHTProtocol


What happens with bittorrent and a DHT is that at the beginning bittorrent uses information embedded in the torrent file to go to either a tracker or one of a set of nodes from the DHT. Then once it finds one node, it can continue to find others and persist using the DHT without needing a centralized tracker to maintain it.

The original information bootstraps the later use of the DHT.


DHT nodes have unique identifiers, termed, Node ID. Node IDs are chosen at random from the same 160-bit space as BitTorrent info-hashes. Closeness is measured by comparing Node ID's routing tables, the closer the Node, the more detailed, resulting in optimal

What then makes them more optimal than it's predecessor "Kademlia" which used simple unsigned integers: distance(A,B) = |A xor B| Smaller values are closer. XOR. Besides not being secure, its logic was flawed.

If your client supports DHT, there are 8-bytes reserved in which contains 0x09 followed by a 2-byte payload with the UDP Port and DHT node. If the handshake is successful the above will continue.