How do BitTorrent magnet links work?

A BitTorrent magnet link identifies a torrent using1 a SHA-1 or truncated SHA-256 hash value known as the "infohash". This is the same value that peers (clients) use to identify torrents when communicating with trackers or other peers. A traditional .torrent file contains a data structure with two top-level keys: announce, identifying the tracker(s) to use for the download, and info, containing the filenames and hashes for the torrent. The "infohash" is the hash of the encoded info data.

Some magnet links include trackers or web seeds, but they often don't. Your client may know nothing about the torrent except for its infohash. The first thing it needs to is find other peers who are downloading the torrent. It does this using a separate peer-to-peer network2 operating a "distributed hash table" (DHT). A DHT is a big distributed index which maps torrents (identified by infohashes) to lists of peers (identified by IP address and ports) who are participating in a swarm for that torrent (uploading/downloading data or metadata).

The first time a client joins the DHT network it generates a random 160-bit ID from the same space as infohashes. It then bootstraps its connection to the DHT network using either hard-coded addresses of clients controlled by the client developer, or DHT-supporting clients previously encountered in a torrent swarm. When it wants to participate in a swarm for a given torrent, it searches the DHT network for several other clients whose IDs are as close3 as possible to the infohash. It notifies these clients that it would like to participate in the swarm, and asks them for the connection information of any peers they already know of who are participating in the swarm.

When peers are uploading/downloading a particular torrent, they try to tell each other about all of the other peers they know of that are participating in the same torrent swarm. This lets peers know of each other quickly, without subjecting a tracker or DHT to constant requests. Once you've learned of a few peers from the DHT, your client will be able to ask those peers for the connection information of yet more peers in the torrent swarm, until you have all of the peers you need.

Finally, we can ask these peers for the torrent's info metadata, containing the filenames and hash list. Once we've downloaded this information and verified that it's correct using the known infohash, we're in practically the same position as a client that started with a regular .torrent file and got a list of peers from the included tracker.

The download may begin.

1 The infohash is typically hex-encoded, but some old clients used base 32 instead. v1 (urn:btih:) uses the SHA-1 digest directly, while v2 (urn:bimh:) adds a multihash prefix to identify the hash algorithm and digest length.
2 There are two primary DHT networks: the simpler "mainline" DHT, and a more complicated protocol used by Azureus.
3 The distance is measured by XOR.

Further Reading

  • BEP-3: The BitTorrent Protocol Specification
  • BEP-52: The BitTorrent Protocol Specification v2
  • BEP-5: DHT Protocol
  • BEP-9: Extension for Peers to Send Metadata Files
  • BEP-10: Extension Protocol
  • BEP-11: Peer Exchange (PEX)
  • Azureus DHT Description

Peer discovery and resource discovery (files in your case) are two different things.

I am more familiar with JXTA but all peer to peer networks work on the same basic principles.

The first thing that needs to happen is peer discovery.

Peer Discovery

Most p2p networks are "seeded" networks: when first starting a peer will connect to a well-known (hard-coded) address to retrieve a list of running peers. It can be direct seeding like connecting to dht.transmissionbt.com as mentioned in another post or indirect seeding as usually done with JXTA where the peer connects to an address that only delivers a plain text list of other peers network addresses.

Once connection is established with the first (few) peer(s), the connecting peer performs a discovery of other peers (by sending requests out) and maintains a table of them. Since the number of other peers can be huge, the connecting peer only maintains part of a Distributed Hash Table (DHT) of the peers. The algorithm to determine which part of the table the connecting peer should maintain varies depending on Network. BitTorrent uses Kademlia with 160 bit identifiers/keys.

Resource Discovery

Once a few peers have been discovered by the connecting peer, the latter sends a few requests out for discovery of resources to them. Magnet links identifies those resources and are built in such a way that they are a "signature" for a resource and guarantee that they uniquely identify the requested content among all the peers. The connecting peer will then send a discovery request for the magnet link/resource to peers around it. The DHT is built in such a way that it helps determine which peers should be asked first for the resource (read on Kademlia in Wikipedia for more). If the requested peer does not hold the requested resource it will usually "pass on" the query to additional peers fetched from its own DHT.

The number of "hops" the query can be passed on is usually limited; 4 is an usual number with JXTA type networks.

When a peer holds the resource, it replies with its full details. The connecting peer can then connect to the peer holding the resource (directly or via a relay - I won't go into details here) and start fetching it.

Resources/Services in P2P networks are not directly attached to network addresses: they are distributed and that is the beauty of these highly scalable networks.