General Architecture

PWS is a Firefox plugin that runs a Tor client and an HTTP proxy. When the user executes a query, Firefox connects to the HTTP proxy. The proxy filters the HTTP request and then sends it to the search engine over the Tor network. When the response comes back through Tor, the proxy filters the HTML to remove all active components and returns the result to Firefox for display. Table 1 shows which PWS modules handle each type of information leak that may occur during a search.
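The request path through the proxy can be sketched as three composed stages. This is a minimal sketch, not PWS's code: the function `handle_query` and its three callable parameters (`filter_request`, `send_over_tor`, `filter_response`) are hypothetical names standing in for the HTTP filter, the Tor client, and the HTML filter.

```python
from typing import Callable, Dict

Headers = Dict[str, str]


def handle_query(
    path: str,
    headers: Headers,
    filter_request: Callable[[Headers], Headers],
    send_over_tor: Callable[[str, Headers], str],
    filter_response: Callable[[str], str],
) -> str:
    """Hypothetical proxy pipeline: filter request, forward via Tor, filter response."""
    clean_headers = filter_request(headers)        # stage 1: HTTP filter
    raw_html = send_over_tor(path, clean_headers)  # stage 2: Tor client
    return filter_response(raw_html)               # stage 3: HTML filter, then back to Firefox
```

Passing the stages in as functions keeps each module independently replaceable and lets the pipeline be exercised with stubs, without a running Tor client.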

HTTP Filter

The goal of the HTTP filter is to normalize each HTTP request so that it looks as similar as possible across all PWS users. The query terms will of course differ, but all client-specific protocol details of the connection should be removed.
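One way to normalize requests is a whitelist: drop every client-supplied header except `Host`, then add a fixed set of headers that is identical for every user. This is a minimal sketch under that assumption; the header values in `CANONICAL_HEADERS` and the function name `normalize_request_headers` are hypothetical, since the exact header set PWS sends is not given here.

```python
from typing import Dict

# Hypothetical canonical values; the actual PWS header set is not specified here.
CANONICAL_HEADERS: Dict[str, str] = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html",
    "Accept-Language": "en-US",
    "Accept-Encoding": "identity",
}


def normalize_request_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Keep only Host from the client, then add fixed values shared by all users.

    Cookies, Referer, and browser- or OS-specific headers are discarded, so two
    users querying the same host send identical header sets.
    """
    normalized: Dict[str, str] = {}
    for name, value in headers.items():
        if name.lower() == "host":
            normalized["Host"] = value  # still needed to reach the search engine
    normalized.update(CANONICAL_HEADERS)
    return normalized
```

A whitelist is safer than a blacklist here: any header the filter does not know about is dropped by default instead of leaking through.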

Tor Client

The Tor client serves two purposes. First, it makes it hard for the search engine to link the source IP address of a query to the user who issued it. Second, it lets us change that source IP address between queries, which reduces query linkability. Every query is issued through a different channel and, as long as the query rate stays below our channel-rebuild rate, no channel is reused. Therefore, the source IP address of every query is selected randomly and independently from the routers in the Tor network (more precisely, from the exit nodes).
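The relationship between query rate and channel reuse can be illustrated with a toy model in which at most one new channel (circuit) is built per rebuild interval. This is a hypothetical simulation, not Tor's or PWS's scheduler: `CircuitScheduler`, `rebuild_interval`, and the exit node names are assumptions, and exits are drawn uniformly for simplicity, whereas real Tor weights exit selection by bandwidth. In a real deployment the proxy would instead ask Tor for fresh circuits, for example via the control protocol's NEWNYM signal.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Circuit:
    circuit_id: int
    exit_node: str


class CircuitScheduler:
    """Toy model: at most one new circuit is built per `rebuild_interval` seconds."""

    def __init__(self, exit_nodes: List[str], rebuild_interval: float,
                 rng: Optional[random.Random] = None) -> None:
        self.exit_nodes = list(exit_nodes)
        self.rebuild_interval = rebuild_interval
        self.rng = rng or random.Random()
        self._next_id = 0
        self._last_build: Optional[float] = None
        self._current: Optional[Circuit] = None

    def circuit_for_query(self, now: float) -> Circuit:
        """Return a fresh circuit if one may be built yet, else reuse the current one."""
        can_rebuild = (self._last_build is None
                       or now - self._last_build >= self.rebuild_interval)
        if can_rebuild:
            # Simplification: exit drawn uniformly and independently per new circuit.
            exit_node = self.rng.choice(self.exit_nodes)
            self._current = Circuit(self._next_id, exit_node)
            self._next_id += 1
            self._last_build = now
        assert self._current is not None
        return self._current
```

With queries spaced further apart than `rebuild_interval`, every query gets its own circuit; a burst of queries faster than that rate ends up sharing one, which is exactly the condition under which queries become linkable by source IP.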

HTML Filter

The HTML filter's job is to remove any component that could provide feedback to the search engine. This is the additional privacy protection that PWS provides over and above that of Tor+Privoxy. The filter parses the response HTML and extracts only the information needed to present an answer to the user: the result description, the text abstract, and the result URL, each extracted with regular expressions. From this text, PWS generates an entirely new HTML page that contains only the results and no embedded objects. As a result, the user performs only one HTTP GET per query, which prevents cache timing attacks. All active components, such as JavaScript and Flash, are removed, so no extra code is executed.
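The extract-and-rebuild approach can be sketched with Python's `re` and `html` modules. This is a minimal sketch under stated assumptions: the result markup matched by `RESULT_RE` is hypothetical (a real filter needs patterns tailored to each search engine's actual HTML, which changes over time), and the check that rejects non-`http(s)` URLs such as `javascript:` links is an addition not described above.

```python
import html
import re
from typing import List, Tuple

# Hypothetical result markup; real patterns depend on the search engine's HTML.
RESULT_RE = re.compile(
    r'<a class="result" href="(?P<url>[^"]+)">(?P<title>.*?)</a>\s*'
    r'<p class="abstract">(?P<abstract>.*?)</p>',
    re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]*>")


def extract_results(page: str) -> List[Tuple[str, str, str]]:
    """Return (url, title, abstract) triples as plain text."""
    results = []
    for m in RESULT_RE.finditer(page):
        url = html.unescape(m.group("url"))
        if not url.startswith(("http://", "https://")):
            continue  # drop javascript: and other non-web links
        title = html.unescape(TAG_RE.sub("", m.group("title")))
        abstract = html.unescape(TAG_RE.sub("", m.group("abstract")))
        results.append((url, title, abstract))
    return results


def build_clean_page(results: List[Tuple[str, str, str]]) -> str:
    """Generate a page with no scripts, images, or other embedded objects."""
    items = "".join(
        '<li><a href="{0}">{1}</a><p>{2}</p></li>'.format(
            html.escape(url, quote=True), html.escape(title), html.escape(abstract))
        for url, title, abstract in results
    )
    return "<html><body><ol>" + items + "</ol></body></html>"
```

Because every byte of the output page is generated by the filter and all extracted text is escaped, nothing from the original response (scripts, tracking images, stylesheets) can trigger an additional request.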

Information handling summary

Level                   Identifying information             Solution
1  TCP/IP               IP address                          Tor
                        Institution or ISP
                        Operating system
                        Uptime
                        Timing (RTT)
2  HTTP headers         Cookies                             HTTP filter
                        Operating system make and version
                        Browser make and version
                        Encoding and language
3  HTML                 JavaScript                          HTML filter
                        Timing (web timing attacks)
4  Application          Query terms                         Open problem
                        Time of day of the query
5  Active components    ...                                 HTML filter
                        ...
                        ...

Table 1

More details

For further details, read our WPES07 paper.