== Website Fingerprinting Cybersecurity Artifacts ==

=== Description ===

This README details the specifics of the artifacts available at https://www.informatik.tu-cottbus.de/~andriy/zwiebelfreunde/.
Most artifacts also contain their own README with exact information on how to apply them in practice.
For additional background and information, please refer to the corresponding publications.
Feel free to reach out to the authors with any questions or comments you might have.

The provided artifacts belong to two publications (NDSS '16 and ACM CCS '20).
If you use, adapt, or re-distribute any portion of our work or the provided artifacts, please cite the corresponding publication.

=== Publications ===

* Andriy Panchenko, Fabian Lanze, Andreas Zinnen, Martin Henze, Jan Pennekamp, Klaus Wehrle and Thomas Engel
  Website Fingerprinting at Internet Scale
  Proceedings of the 23rd Annual Network and Distributed System Security Symposium (NDSS '16), February 21-24, 2016, San Diego, CA, USA
  Publisher: Internet Society, February 2016
  DOI: 10.14722/ndss.2016.23477
  ISBN: 978-1-891562-41-9
  https://www.informatik.tu-cottbus.de/~andriy/zwiebelfreunde/

"""
@inproceedings{Panchenkoetal2016Website,
    author = {Panchenko, Andriy and Lanze, Fabian and Zinnen, Andreas and Henze, Martin and Pennekamp, Jan and Wehrle, Klaus and Engel, Thomas},
    title = {{Website Fingerprinting at Internet Scale}},
    booktitle = {Proceedings of the 23rd Annual Network and Distributed System Security Symposium (NDSS '16)},
    year = {2016},
    publisher = {Internet Society},
    doi = {10.14722/ndss.2016.23477},
    isbn = {978-1-891562-41-9},
}
"""

> Abstract: The website fingerprinting attack aims to identify the content (i.e., a webpage accessed by a client) of encrypted and anonymized connections by observing patterns of data flows such as packet size and direction. This attack can be performed by a local passive eavesdropper – one of the weakest adversaries in the attacker model of anonymization networks such as Tor. In this paper, we present a novel website fingerprinting attack. Based on a simple and comprehensible idea, our approach outperforms all state-of-the-art methods in terms of classification accuracy while being computationally dramatically more efficient. In order to evaluate the severity of the website fingerprinting attack in reality, we collected the most representative dataset that has ever been built, where we avoid simplified assumptions made in the related work regarding selection and type of webpages and the size of the universe. Using this data, we explore the practical limits of website fingerprinting at Internet scale. Although our novel approach is by orders of magnitude computationally more efficient and superior in terms of detection accuracy, for the first time we show that no existing method – including our own – scales when applied in realistic settings. With our analysis, we explore neglected aspects of the attack and investigate the realistic probability of success for different strategies a real-world adversary may follow.

* Wladimir De la Cadena, Asya Mitseva, Jens Hiller, Jan Pennekamp, Sebastian Reuter, Julian Filter, Klaus Wehrle, Thomas Engel and Andriy Panchenko
  TrafficSliver: Fighting Website Fingerprinting Attacks with Traffic Splitting
  Proceedings of the 27th ACM SIGSAC Conference on Computer and Communications Security (CCS '20), November 9-13, 2020, Orlando, FL, USA, pages 1971-1985
  Publisher: ACM, November 2020
  DOI: 10.1145/3372297.3423351
  ISBN: 978-1-4503-7089-9/20/11
  https://github.com/TrafficSliver

"""
@inproceedings{De-la-Cadenaetal2020TrafficSliver:,
    author = {De la Cadena, Wladimir and Mitseva, Asya and Hiller, Jens and Pennekamp, Jan and Reuter, Sebastian and Filter, Julian and Wehrle, Klaus and Engel, Thomas and Panchenko, Andriy},
    title = {{TrafficSliver: Fighting Website Fingerprinting Attacks with Traffic Splitting}},
    booktitle = {Proceedings of the 27th ACM SIGSAC Conference on Computer and Communications Security (CCS '20)},
    year = {2020},
    pages = {1971--1985},
    publisher = {ACM},
    doi = {10.1145/3372297.3423351},
    isbn = {978-1-4503-7089-9},
}
"""

> Abstract: Website fingerprinting (WFP) aims to infer information about the content of encrypted and anonymized connections by observing patterns of data flows based on the size and direction of packets. By collecting traffic traces at a malicious Tor entry node — one of the weakest adversaries in the attacker model of Tor — a passive eavesdropper can leverage the captured meta-data to reveal the websites visited by a Tor user. As recently shown, WFP is significantly more effective and realistic than assumed. Concurrently, former WFP defenses are either infeasible for deployment in real-world settings or defend against specific WFP attacks only. To limit the exposure of Tor users to WFP, we propose novel lightweight WFP defenses, TrafficSliver, which successfully counter today's WFP classifiers with reasonable bandwidth and latency overheads and, thus, make them attractive candidates for adoption in Tor. Through user-controlled splitting of traffic over multiple Tor entry nodes, TrafficSliver limits the data a single entry node can observe and distorts repeatable traffic patterns exploited by WFP attacks. We first propose a network-layer defense, in which we apply the concept of multipathing entirely within the Tor network. We show that our network-layer defense reduces the accuracy from more than 98% to less than 16% for all state-of-the-art WFP attacks without adding any artificial delays or dummy traffic. We further suggest an elegant client-side application-layer defense, which is independent of the underlying anonymization network. By sending single HTTP requests for different web objects over distinct Tor entry nodes, our application-layer defense reduces the detection rate of WFP classifiers by almost 50 percentage points. Although it offers lower protection than our network-layer defense, it provides a security boost at the cost of a very low implementation overhead and is fully compatible with today's Tor network.
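The network-layer variant of TrafficSliver splits the Tor cell stream of a page load over several entry circuits; the TrafficSliver-Net dataset listed under Details below was collected with the batched-weighted-random strategy over five circuits (BWR-5). The following Python sketch only illustrates this splitting idea; the circuit count, weight draw, and batch-size range used here are illustrative assumptions, not the parameters from the paper, and the actual defense is implemented inside a modified Tor client (see https://github.com/TrafficSliver).

"""
import random

def split_cells_bwr(num_cells, num_circuits=5, batch_range=(50, 70), seed=None):
    # Toy batched-weighted-random dispatcher: assign a sequence of Tor cells
    # to one of `num_circuits` entry circuits in randomly sized batches.
    # Illustrative sketch only; not the TrafficSliver-Net implementation.
    rng = random.Random(seed)

    # Per-page splitting weights: normalized gamma draws yield a random
    # point on the probability simplex (an assumption made for this sketch).
    raw = [rng.gammavariate(1.0, 1.0) for _ in range(num_circuits)]
    weights = [w / sum(raw) for w in raw]

    assignment = []  # chosen circuit index for every cell, in send order
    while len(assignment) < num_cells:
        batch = rng.randint(*batch_range)  # cells per batch (assumed range)
        circuit = rng.choices(range(num_circuits), weights=weights)[0]
        assignment.extend([circuit] * min(batch, num_cells - len(assignment)))
    return weights, assignment

if __name__ == "__main__":
    weights, cells = split_cells_bwr(num_cells=1000, seed=42)
    print("weights:", [round(w, 2) for w in weights])
    print("cells per circuit:", [cells.count(i) for i in range(5)])
"""

Under such a scheme, a single malicious entry node only observes the subsequence of cells routed over its circuit, which is what distorts the repeatable traffic patterns that WFP classifiers exploit.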
=== License ===

The artifacts are provided as-is without any warranty. You can use, modify, and/or re-distribute them in any way you like. The authors kindly ask you to cite the corresponding publication.

=== Project Status ===

The artifacts are actively maintained, and the authors are open to inquiries via email. Please refer to the original publications for information on how to contact the authors.

=== Acknowledgment ===

The authors would like to thank Norbert Landa and Robert Echelmeyer for their support in performing some of the experiments. Moreover, the authors thank Daniel Forster for the initial prototype of TrafficSliver-Net.
Parts of this work have been funded by the EU H2020 Project "Privacy Flag", the Luxembourg National Research Fund (FNR) (partly within the CORE Junior Track project PETIT), the EU and state Brandenburg EFRE StaF project INSPIRE, the German Federal Ministry of Education and Research (BMBF) under the projects KISS_KI and WAIKIKI, and the Excellence Initiative of the German federal and state governments.

=== Details ===

The following list gives specifics on the individual artifacts the authors provide for their work.

=> Non-defended-dataset.tar.xz [TrafficSliver - ACM CCS '20]
   Alexa Top 100 webpages in Tor cell format (100 instances each), collected without an enabled splitting strategy (cf. Section 7)

=> TrafficSliver-Net.tar.xz [TrafficSliver - ACM CCS '20]
   Alexa Top 100 webpages in Tor cell format (100 instances each), collected with the splitting strategy BWR-5 (cf. Section 7)

=> classifier.tar.gz [CUMUL - NDSS '16]
   The evaluation framework used, including a copy of the classifier (LibSVM); partially covers the experiments presented in Section VII

=> data-dump.tar.gz [CUMUL - NDSS '16]
   Exemplary input data for the TCP and TLS data extraction and parsing (cf. Section V-B), conforming to the data collection described in Section V-A

=> data-rnd-www.tar.gz [CUMUL - NDSS '16]
   Datasets (cf. Section IV-B) containing: (1) 1125 RND-WWW webpages in TLS format (40 instances each) and (2) 111884 RND-WWW webpages in TLS format (1 instance each)

=> feature-generation.tar.gz [CUMUL - NDSS '16]
   Scripts for: (1) TCP and TLS data extraction and parsing (cf. Section V-B), (2) outlier removal (cf. Section V-B), and (3) the CUMUL feature generation (cf. Section VI); an illustrative sketch of the feature construction follows this list

=> libsvm-src.tar.gz [CUMUL - NDSS '16]
   Implementation of the used SVM classifier (a slightly modified LibSVM) (cf. Section VI)

=> rnd-www-tcp-format/background-data.tar.gz [CUMUL - NDSS '16]
   111884 RND-WWW webpages in TCP format (1 instance each) (cf. Section IV-B)

=> rnd-www-tcp-format/foreground-data.tar.gz [CUMUL - NDSS '16]
   1125 RND-WWW webpages in TCP format (40 instances each) (cf. Section IV-B)

=> rnd-www-tcp-format/readme.txt [CUMUL - NDSS '16]
   A README for this (sub)directory

=> urls.tar.gz [CUMUL - NDSS '16]
   List of URLs contained in the RND-WWW datasets (both foreground and background) (cf. Section IV-B)
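As a rough illustration of how the CUMUL artifacts fit together, the following Python sketch computes simplified cumulative-trace features from a parsed trace of signed packet sizes (positive for outgoing, negative for incoming packets) and evaluates an RBF-SVM with cross-validation, here via scikit-learn rather than the provided LibSVM copy. The simplified feature definition, the number of interpolation points, and the SVM parameters shown are illustrative assumptions; the authoritative pipeline and parameters are those described in Sections V-VI of the NDSS '16 paper and shipped in feature-generation.tar.gz, libsvm-src.tar.gz, and classifier.tar.gz.

"""
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def cumulative_features(packet_sizes, n_points=100):
    # Cumulative sum of signed packet sizes, sampled at n_points equidistant
    # positions via linear interpolation, plus simple per-direction aggregates.
    # Simplified for illustration; see feature-generation.tar.gz for the
    # authors' scripts.
    sizes = np.asarray(packet_sizes, dtype=float)
    cum = np.cumsum(sizes)
    xs = np.linspace(0, len(cum) - 1, n_points)
    sampled = np.interp(xs, np.arange(len(cum)), cum)
    aggregates = [
        np.sum(sizes > 0), np.sum(sizes < 0),             # packets per direction
        sizes[sizes > 0].sum(), -sizes[sizes < 0].sum(),  # bytes per direction
    ]
    return np.concatenate([aggregates, sampled])

def evaluate(traces, n_points=100):
    # `traces` is a hypothetical list of (packet_size_sequence, label) pairs,
    # e.g., obtained by parsing instances from one of the provided datasets.
    X = np.array([cumulative_features(t, n_points) for t, _ in traces])
    y = np.array([label for _, label in traces])
    X = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)  # scale features before the SVM
    clf = SVC(kernel="rbf", C=2**11, gamma=2**-3)             # placeholder parameters; the paper tunes them via grid search
    return cross_val_score(clf, X, y, cv=10).mean()
"""

In a closed-world style evaluation, each foreground webpage forms one class with its instances as samples; an open-world setting would additionally draw from the background data as unmonitored traffic, roughly mirroring the foreground/background split of the RND-WWW datasets above.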