Website Fingerprinting: Predicting User Behavior Based on Encrypted Metadata Using Machine Learning
OVERVIEW
In an ongoing project, student researchers at Loyola University Chicago are investigating how machine learning can identify a user's web browsing behavior from the metadata of encrypted network traffic alone, without any need to decrypt the data. To build a training dataset, the researchers wrote a Python program that repeatedly visits a list of websites and collects the resulting network traffic. The size and direction of the encrypted HTTPS packets were extracted to create a sample for each website, and a Random Forest classifier was trained and evaluated on these samples. The researchers demonstrated that the trained model could predict, with reasonable accuracy, which website a user was visiting using only the metadata of the encrypted traffic; the encryption itself was never broken.

The attack in this threat model is easy for a lone attacker to mount: its computational requirements are modest, and the network visibility it needs is trivial to obtain. Entities such as Internet Service Providers, corporate network administrators, and government agencies already have enough visibility to carry out the attack described here.
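The feature-extraction step, turning the size and direction of each captured packet into a sample a classifier can consume, can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `(size, direction)` tuple format, the `+1`/`-1` direction encoding, the `extract_features` helper, the `max_len` cutoff, and the four aggregate counts are all hypothetical choices made for this example, and each captured page visit is assumed to become one fixed-length vector.

```python
from typing import List, Tuple

# Hypothetical packet record: (size_in_bytes, direction), where direction is
# +1 for client-to-server and -1 for server-to-client. This encoding is an
# assumption for illustration, not the project's actual data format.
Packet = Tuple[int, int]


def extract_features(packets: List[Packet], max_len: int = 20) -> List[int]:
    """Turn one page visit's packet metadata into a fixed-length vector.

    The first `max_len` entries are signed packet sizes (size * direction),
    truncated or zero-padded to exactly `max_len`. Four aggregates computed
    over the whole trace are appended: outgoing packet count, incoming packet
    count, outgoing byte total, incoming byte total.
    """
    for size, direction in packets:
        if direction not in (1, -1):
            raise ValueError(f"direction must be +1 or -1, got {direction}")
        if size < 0:
            raise ValueError(f"packet size must be non-negative, got {size}")

    # Signed sizes keep both size and direction in a single number.
    signed = [size * direction for size, direction in packets[:max_len]]
    signed += [0] * (max_len - len(signed))  # pad short traces with zeros

    out_sizes = [size for size, direction in packets if direction > 0]
    in_sizes = [size for size, direction in packets if direction < 0]
    aggregates = [len(out_sizes), len(in_sizes), sum(out_sizes), sum(in_sizes)]
    return signed + aggregates


if __name__ == "__main__":
    # Made-up trace: a client request, two full-size responses, a small ack.
    trace = [(517, 1), (1460, -1), (1460, -1), (93, 1)]
    print(extract_features(trace, max_len=6))
```

Every vector has the same length (`max_len + 4`), which is what a Random Forest expects as input. In a Python setting, one plausible next step (again an assumption, since the source does not name a library) is to collect one vector per visit into `X`, the visited website's name into `y`, and pass them to scikit-learn's `RandomForestClassifier` via `fit(X, y)` and `predict`.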