A new analysis of website fingerprinting (WF) attacks aimed at the Tor web browser has revealed that an adversary can infer which website a victim visits, but only in scenarios where the threat actor is interested in a specific subset of the websites visited by users.
"While attacks can exceed 95% accuracy when monitoring a small set of five popular websites, indiscriminate (non-targeted) attacks against sets of 25 and 100 websites fail to exceed an accuracy of 80% and 60%, respectively," researchers Giovanni Cherubin, Rob Jansen, and Carmela Troncoso said in a newly published paper.
Tor browser offers "unlinkable communication" to its users by routing internet traffic through an overlay network of more than six thousand relays, with the goal of concealing users' originating location and usage from third parties conducting network surveillance or traffic analysis. It achieves this by building a circuit that passes through an entry relay, a middle relay, and an exit relay before forwarding the requests to the destination IP addresses.
On top of that, the requests are encrypted once for each relay to further hinder analysis and avoid information leakage. Tor clients are not anonymous to their entry relays, but because the traffic is encrypted and passes through multiple hops, an entry relay cannot identify a client's destination, and for the same reason an exit node cannot identify the client.
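The layering idea can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not Tor's actual cryptography: the XOR keystream derived from SHA-256, the three made-up relay keys, and the function names `keystream`, `xor_layer`, `wrap`, and `peel_along_circuit` are all hypothetical and exist only to show that each relay removes exactly one layer and that only the last hop sees the original request.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Expand a key into `length` pseudo-random bytes (toy construction, not secure)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(payload: bytes, relay_keys: list[bytes]) -> bytes:
    """Client side: apply one layer per relay, starting with the exit relay's key."""
    cell = payload
    for key in reversed(relay_keys):
        cell = xor_layer(key, cell)
    return cell


def peel_along_circuit(cell: bytes, relay_keys: list[bytes]) -> list[bytes]:
    """Each relay, in circuit order, strips its own layer.

    Returns what each relay holds after peeling. Because this toy uses XOR,
    the peeling order is not enforced the way a real cipher would enforce it.
    """
    views = []
    for key in relay_keys:
        cell = xor_layer(key, cell)
        views.append(cell)
    return views


# Hypothetical keys shared between the client and the entry, middle, and exit relays.
relay_keys = [b"entry-key", b"middle-key", b"exit-key"]
request = b"GET / HTTP/1.1"

onion = wrap(request, relay_keys)
entry_view, middle_view, exit_view = peel_along_circuit(onion, relay_keys)
```

In this sketch only `exit_view` equals the original request; the entry and middle relays still hold data covered by the remaining layers.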
Website fingerprinting attacks on Tor aim to break these anonymity protections and enable an adversary observing the encrypted traffic patterns between a victim and the Tor network to predict the website visited by the victim. The threat model devised by the academics presupposes an attacker running an exit node, so as to capture the diversity of traffic generated by real users. The exit node then serves as a source of Tor traffic traces, on which the attacker builds a machine-learning-based classification model to infer users' website visits.
The adversary model involves an "online training phase that uses observations of genuine Tor traffic collected from an exit relay (or relays) to continuously update the classification model over time," explained the researchers, who ran entry and exit relays for a week in July 2020 using a custom version of Tor v0.4.3.5 to extract the relevant exit information.
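The general idea of a classifier that is updated continuously as new labeled traces arrive can be sketched as follows. This is not the researchers' model: the nearest-centroid technique, the class name `OnlineCentroidClassifier`, the two-number feature vectors, and the site labels and values are all hypothetical, synthetic stand-ins chosen to keep the example small.

```python
import math
from collections import defaultdict


class OnlineCentroidClassifier:
    """Toy incremental classifier: keeps one running-mean feature vector per label."""

    def __init__(self):
        self.counts = defaultdict(int)  # number of traces seen per label
        self.centroids = {}             # label -> running mean of feature vectors

    def update(self, label, features):
        """Fold one newly observed labeled trace into that label's running mean."""
        self.counts[label] += 1
        n = self.counts[label]
        if label not in self.centroids:
            self.centroids[label] = [float(x) for x in features]
            return
        centroid = self.centroids[label]
        for i, x in enumerate(features):
            centroid[i] += (x - centroid[i]) / n

    def predict(self, features):
        """Return the label whose centroid is closest (Euclidean distance)."""
        return min(
            self.centroids,
            key=lambda label: math.dist(self.centroids[label], features),
        )


# Synthetic features, assumed for illustration: (cells sent, cells received).
model = OnlineCentroidClassifier()
model.update("site_a", [10, 200])
model.update("site_b", [50, 900])
model.update("site_a", [12, 220])   # the model keeps learning as traces arrive
model.update("site_b", [54, 940])

guess = model.predict([11, 205])
```

The point of the incremental update is that no stored history is needed: each new observation adjusts the existing model in place, which is what lets training continue over time.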
To mitigate any ethical and privacy concerns arising out of the study, the paper's authors stressed the safety precautions incorporated to prevent leakage of sensitive websites that users may visit via the Tor browser.
"The results of our real-world evaluation demonstrate that WF attacks can only be successful in the wild if the adversary aims to identify websites within a small set," the researchers concluded. "In other words, untargetted adversaries that aim to generally monitor users' website visits will fail, but focused adversaries that target one particular client configuration and website may succeed."