Peek-A-Booty Resource Discovery

Drafted by Jeremy Avnet, Bram Cohen and Jonathan Moore; February to March 2002.

Introduction

The Peek-A-Booty peer-to-peer (p2p) network provides anonymous and uncensored web access to individuals residing in countries where web surfing is filtered.

To use the Peek-A-Booty network, an individual must know the address of an uncensored node through which they can proxy their requests for web documents. If this node becomes censored, then a new node must be used. The more Peek-A-Booty nodes a user knows about, the better the chance they will be able to surf the web uncensored. The more nodes the censoring agency knows about, the better the chance it will be able to block web content. In other words, both the web surfer and the censoring agency want to know about as many nodes as possible. How can we furnish information about new nodes without making the network unusable?

The number of nodes an individual Peek-A-Booty user needs to know can be drastically less than the number the censoring agency needs to know. To effectively use the Peek-A-Booty network, an individual user only needs to know about a small number of uncensored nodes. To effectively block the Peek-A-Booty network, the censoring agency needs to know about almost all of the nodes (in order to block them).

We assume that the censoring agency is vastly more powerful than the individual user. This power comes in the form of computing resources as well as fear tactics. This document mainly deals with computing resources. A small discussion is at the end regarding the problem of fear tactics.

New and Improved Time-Released Formula

We solve the problem of nodes both needing to be found and needing to stay hidden by making their discovery a slow process. There are two ways to slow down the discovery of new nodes.

  1. Require the completion of a computational challenge.
  2. Require the completion of a human challenge.

A computational challenge

There are many problems that are difficult for computers to solve. Furthermore, by choosing a problem whose difficulty level can be parameterized, one can effectively decide the minimum time required to solve it. The best example is time-release crypto[1], which not only takes a very precise amount of time to compute, but isn't parallelizable.

To discover a new node on the Peek-A-Booty network, a client asks a previously known node for the address of one of its peers. The node could then respond with a computational challenge and the peer information encrypted using the challenge solution as the key. This gives some control over the minimum amount of time or resources needed to obtain a new node address.
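
A minimal sketch of how such a challenge could work, based on the repeated-squaring time-lock puzzle from [1]: the node, which knows the factorization of the modulus, can lock the peer address cheaply, while the client must perform the squarings one at a time and cannot parallelize them. The parameters, key sizes, and XOR keystream below are illustrative, not part of any actual Peek-A-Booty protocol.

  import hashlib
  import secrets

  def is_probable_prime(n, rounds=20):
      """Miller-Rabin probabilistic primality test."""
      if n < 2:
          return False
      for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
          if n % p == 0:
              return n == p
      d, r = n - 1, 0
      while d % 2 == 0:
          d //= 2
          r += 1
      for _ in range(rounds):
          a = secrets.randbelow(n - 3) + 2
          x = pow(a, d, n)
          if x in (1, n - 1):
              continue
          for _ in range(r - 1):
              x = pow(x, 2, n)
              if x == n - 1:
                  break
          else:
              return False
      return True

  def random_prime(bits):
      while True:
          candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
          if is_probable_prime(candidate):
              return candidate

  def keystream(key, length):
      """Expand a key into a keystream by counter-mode hashing (demo only)."""
      out, counter = b"", 0
      while len(out) < length:
          out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
          counter += 1
      return out[:length]

  def make_puzzle(peer_info, squarings, bits=512):
      """Node side: lock peer_info behind ~`squarings` sequential squarings.

      Knowing phi(n) lets the node compute the unlock value cheaply; the
      solver has to perform the squarings one by one.
      """
      p, q = random_prime(bits), random_prime(bits)
      n, phi = p * q, (p - 1) * (q - 1)
      a = secrets.randbelow(n - 2) + 2
      b = pow(a, pow(2, squarings, phi), n)     # a^(2^t) mod n via the trapdoor
      key = hashlib.sha256(b.to_bytes((n.bit_length() + 7) // 8, "big")).digest()
      locked = bytes(m ^ k for m, k in zip(peer_info, keystream(key, len(peer_info))))
      return (n, a, squarings), locked

  def solve_puzzle(puzzle, locked):
      """Client side: do the sequential squarings to recover the peer address."""
      n, a, squarings = puzzle
      b = a
      for _ in range(squarings):
          b = (b * b) % n
      key = hashlib.sha256(b.to_bytes((n.bit_length() + 7) // 8, "big")).digest()
      return bytes(c ^ k for c, k in zip(locked, keystream(key, len(locked))))

  if __name__ == "__main__":
      puzzle, locked = make_puzzle(b"peer 198.51.100.7:8000", squarings=100_000)
      print(solve_puzzle(puzzle, locked))

The number of squarings plays the role of the difficulty parameter: the node dials it up or down to set the minimum time a client (or censoring agency) must spend per peer address.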

By making this computational challenge sufficiently difficult we require that a censoring agency spend a large amount of computing resources and time in order to discover as many nodes as possible. If the number of new nodes becoming available over time is larger than the number of nodes the censoring agency can discover over time, the Peek-A-Booty network will remain usable.

A human challenge

For some Peek-A-Booty users, the disparity between their computing resources and those of their censoring agency may be too large. Such a user would have to wait a very long time to discover a new node, since the computational challenge is calibrated to the censoring agency's capabilities. By creating a challenge that is very difficult for computers to solve but easy for humans, we can shift the balance of resources from the censoring agency to the Peek-A-Booty user.

One such problem is the identification of distorted text that has been treated with random noise. This leverages the assumption that humans are better than computers at this kind of pattern recognition. By making such a requirement, a censoring agency would have to employ a large number of people to solve such pattern recognition problems. If the rate at which the agency is solving these problems is less than the rate at which new nodes are joining the network, the network will remain usable.
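
As a rough illustration of the kind of challenge meant here, the sketch below (assuming the Pillow imaging library is available) renders a word with a slight rotation, salt-and-pepper noise, and a blur. The word, sizes, and noise levels are made up, and a real deployment would need far stronger distortion to resist OCR.

  import random
  from PIL import Image, ImageDraw, ImageFilter, ImageFont

  def make_text_challenge(word):
      """Render `word` with rotation, random pixel noise, and a blur."""
      img = Image.new("L", (60 + 12 * len(word), 60), color=255)
      draw = ImageDraw.Draw(img)
      draw.text((20, 22), word, fill=0, font=ImageFont.load_default())
      img = img.rotate(random.uniform(-12, 12), fillcolor=255)   # slight skew
      for _ in range(800):                                       # salt-and-pepper noise
          x = random.randrange(img.width)
          y = random.randrange(img.height)
          img.putpixel((x, y), random.randrange(256))
      return img.filter(ImageFilter.GaussianBlur(0.6))

  if __name__ == "__main__":
      make_text_challenge("peekaboo").save("challenge.png")

The node would only hand over the locked peer address once the user types the hidden word back correctly.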

Preventing sinkholes

An attacker may disrupt the network by creating a 'sinkhole': a group of nodes under the attacker's control that look attractive as peering targets. Once these nodes have been left in place long enough to account for a large fraction of all the peering in the entire network, the attacker removes them, causing significant network fragmentation.

This attack can be prevented by requiring a computational challenge be solved not just by the initiating side of a new connection, but by the receiving side as well. This implies that relationships should be bilateral, which makes sense for several other reasons as well, including that both sides can see the other's IP anyway.

Network Robustness

Network Concepts

Each peek-a-booty node has 'connections' to several other peers. Connected peers don't literally keep a TCP connection open at all times, but they do run servers and check from time to time that the other is still there. New connections are formed whenever a peer has too few peers which are still up, and initiating a new peering relationship requires the expenditure of resources on both ends, to prevent the sinkhole attacks described above.

In addition to simple ping information, peers make sure the peers they're connected to have an up-to-date list of all of their peers. The information is, of course, encrypted using a time-release or human-challenge lock, but plaintext hashes are included so that duplicates can be detected, for reasons which will be made clear below.
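
A small sketch of what such a shared peer list might look like, with the plaintext hash used purely for duplicate detection. The field names and wire format are illustrative, not the actual protocol.

  import hashlib

  def make_entry(address, locked_blob):
      """One shared peer entry: a challenge-locked address plus a plaintext
      hash of that address, so duplicates can be spotted without unlocking."""
      return {"hash": hashlib.sha256(address.encode()).hexdigest(),
              "locked": locked_blob}

  class PeerDirectory:
      def __init__(self):
          self.entries = {}                      # plaintext hash -> locked blob

      def merge(self, shared_list):
          """Merge a neighbour's peer list, keeping only unseen entries."""
          for entry in shared_list:
              self.entries.setdefault(entry["hash"], entry["locked"])

  directory = PeerDirectory()
  directory.merge([make_entry("198.51.100.7:8000", b"<time-locked bytes>")])
  directory.merge([make_entry("198.51.100.7:8000", b"<same peer, new lock>")])
  print(len(directory.entries))                  # 1 -- duplicate caught by its hash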

Each peek-a-booty node also runs a web proxy, which is the whole point behind peek-a-booty, but that's separate on a protocol level from peer discovery.

Network Topology

Each peer is connected to enough other peers to make network fragmentation unlikely, but no more, because more peers give away more usage pattern information and require more resources to maintain.

Peers have a minimum number of peers they're willing to have, and find more peers when the number they're connected to is below that, either because they just joined or because one of their peers went down. Peers can have more than the minimum number of connections due to connections which other peers initiated. It makes sense for peers to also have a maximum number they're willing to be connected to, to prevent them from becoming hubs.

We've created a simulation of peer churn and tried several different strategies for picking new peers. The one which works best is to make a list of all the peers your own peers have, and connect to the one which is connected to the fewest of your peers. In simulations, it seems the only way this strategy fragments is by all the external peers of some subset going down simultaneously. Since there's no way of preventing that from happening except by increasing the minimum number of peers, this strategy appears optimal. A minimum of 10 peers is very robust against huge amounts of churn; maximum peer numbers haven't been tested yet.
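
In code, the strategy reads roughly as follows; the argument names and the tiny example at the bottom are illustrative, with peer_lists standing in for the up-to-date peer lists each neighbour shares.

  def pick_new_peer(self_id, my_peers, peer_lists):
      """Pick the peer-of-a-peer connected to the fewest of our own peers.

      my_peers:   set of peer ids we are currently connected to
      peer_lists: mapping of each of our peers to the set of peers it reports
      """
      candidates = set()
      for neighbours in peer_lists.values():
          candidates |= neighbours
      candidates -= my_peers
      candidates.discard(self_id)
      if not candidates:
          return None
      # Fewest shared neighbours first: connecting "far away" keeps the
      # topology spread out and, per the simulations above, hard to fragment.
      def overlap(candidate):
          return sum(candidate in neighbours for neighbours in peer_lists.values())
      return min(candidates, key=overlap)

  print(pick_new_peer("a", {"b", "c"},
                      {"b": {"a", "c", "d"}, "c": {"a", "b", "e"}}))   # "d" or "e"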

Client Usage

Clients have a minimum number of peers they're willing to use, and check periodically that their existing ones are still up. When one goes down, they find another. Their protocol for forming connections to new peers is similar to the intra-peer protocol, in that it requires due diligence on both sides and is long-term, but it isn't completely bilateral, since the peer won't put the client in the list of peers it shares.

Regarding Fear Tactics

Draconian Problem

If I were the censoring agency in a sufficiently draconian country, I would run a bunch of nodes and wait for the users in my country to discover them. If a citizen of my country directly used one of these nodes to access unlawful materials, I would cut off their hands. This may be sufficiently scary to deter use of the software.

Possible Solution

Allow users to indicate IP blocks they deem "safe" to use. The user could then specify subnets that are owned by ISPs in more hospitable locations. Notably, the anonymity features currently built into peek-a-booty don't guard against this attack. It may make sense to abandon those features in favor of different anonymity mechanisms, possibly outside the scope of peek-a-booty altogether.
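
A minimal sketch of such a check, using Python's standard ipaddress module; the subnets shown are documentation example ranges, not real ISP allocations.

  import ipaddress

  # Subnets the user has marked as safe, e.g. blocks owned by ISPs in more
  # hospitable jurisdictions (these particular networks are placeholders).
  SAFE_BLOCKS = [
      ipaddress.ip_network("198.51.100.0/24"),
      ipaddress.ip_network("203.0.113.0/24"),
  ]

  def is_safe(address):
      ip = ipaddress.ip_address(address)
      return any(ip in block for block in SAFE_BLOCKS)

  print(is_safe("198.51.100.7"))   # True  -- inside a user-approved block
  print(is_safe("192.0.2.55"))     # False -- such a node would simply be ignored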