Website Fingerprinting using Deep Learning

• The TF approach: We use the same experimental settings for training and testing as described in EXP1 and EXP2. Table 4.7 shows the performance of WF attacks, regardless of website labels, using transfer learning techniques. The results show that the TF approach performs significantly better than the traditional approach when the number of available examples N is small (N = 1 or N = 5). We observe that as N grows, both approaches perform similarly well, with over 93% accuracy. It can be inferred that both the TF and traditional approaches meet the improvement goals of flexibility, transferability, and bootstrap time. If the attacker has only a small dataset with which to re-train the classifier, e.g. an attacker with limited computing resources, the TF approach is the better choice. The results also shed light on a WF attack scenario in which a group of highly resourceful attackers periodically trains powerful pre-trained models from large datasets that other attackers can then adopt. On top of that, the transfer learning property can broaden the landscape of WF attacks.
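The transfer-learning idea above can be sketched minimally: a pre-trained model is reused as a frozen feature extractor, and only a lightweight classifier is re-trained on N examples per website. Everything here is an illustrative assumption — `extract_features` stands in for a real pre-trained embedding, and the nearest-centroid classifier stands in for the re-trained head.

```python
# Hypothetical sketch of the TF approach: a frozen feature extractor
# plus a cheap classifier re-trained on N labelled traces per website.

def extract_features(trace):
    """Stand-in for a frozen pre-trained embedding: simple summary
    statistics of a packet-direction trace (+1 outgoing, -1 incoming)."""
    n = len(trace)
    outgoing = sum(1 for p in trace if p > 0)
    return (outgoing / n, n)          # (fraction outgoing, trace length)

def train_centroids(examples_per_site):
    """examples_per_site: {label: [trace, ...]} with N traces per label."""
    centroids = {}
    for label, traces in examples_per_site.items():
        feats = [extract_features(t) for t in traces]
        dims = len(feats[0])
        centroids[label] = tuple(sum(f[d] for f in feats) / len(feats)
                                 for d in range(dims))
    return centroids

def classify(trace, centroids):
    """Assign the label whose centroid is nearest in feature space."""
    f = extract_features(trace)
    return min(centroids, key=lambda lab: sum((a - b) ** 2
               for a, b in zip(f, centroids[lab])))

# N = 2 examples per website (toy data)
train = {"siteA": [[1, 1, -1, 1], [1, 1, 1, -1]],
         "siteB": [[-1, -1, -1, 1], [-1, -1, 1, -1]]}
model = train_centroids(train)
print(classify([1, 1, 1, 1], model))   # → siteA
```

With a strong pre-trained extractor, this kind of cheap head is exactly what makes the small-N regime (N = 1 or 5) workable.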

Using Packet Timing Information in Website Fingerprinting


3.2 Deep Learning in WF Attacks

3.2.1 WF Attack by SDAE

Abe and Goto were the first to explore the effectiveness of deep learning (DL) in traffic analysis [3]. They used a Stacked Denoising Autoencoder (SDAE) as their deep learning model, with a simple input data representation based on incoming and outgoing packet traces. They achieved 88% accuracy using deep learning without any manual selection of packet features. They used small datasets, which is one reason for their lower attack accuracy. Moreover, they considered only one type of data representation, which completely omits the timing of the packets.
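The direction-only input representation described above — incoming/outgoing packets with timing discarded — can be encoded as a fixed-length sequence of +1/-1 values. The padding length and zero-padding convention here are illustrative assumptions, not the paper's exact preprocessing.

```python
# Encode signed packet sizes as a direction-only sequence for an
# autoencoder input: +1 outgoing, -1 incoming, zero-padded to a
# fixed length. Timing information is deliberately dropped.

def encode_directions(packet_sizes, length=10):
    """Map signed packet sizes to a direction sequence and pad/truncate
    to a fixed length (the autoencoder needs fixed-size inputs)."""
    seq = [1 if s > 0 else -1 for s in packet_sizes]
    return seq[:length] + [0] * max(0, length - len(seq))

print(encode_directions([1500, -512, -1500, 300]))
# → [1, -1, -1, 1, 0, 0, 0, 0, 0, 0]
```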

Fingerprinting Attack on Tor Anonymity using Deep Learning


This paper proposes a new method for launching a fingerprinting attack to analyze Tor traffic in order to detect users who access illegal websites. Using a fingerprinting attack, we can identify a website that a user accesses on the basis of traffic features such as packet length, number of packets, and time. We can analyze this information from captured packets regardless of encryption. Our new method for fingerprinting attacks is based on Stacked Denoising Autoencoder (SDAE), a deep-learning technology. Our evaluation results show an accuracy of 0.88 in a closed-world test. In an open-world test, the true positive rate (TPR) and false positive rate (FPR) are 0.86 and 0.02, respectively.

Fingerprinting Mobile Applications with Deep Learning


Finally, if no sign of improvement is observed within a specified number of epochs, early stopping halts further model training.

3.2.2 CNN Hyper-parameter Tuning

When implementing a CNN model, two key factors need to be taken into account to achieve the best classification performance while enhancing the ability to classify unknown traffic: the choice of the deep neural network's hyper-parameters and the depth of the CNN. Hyper-parameter selection depends on expert experience; there are no general rules that can be applied directly. The strategy followed was therefore to choose a representative sub-sample of the full dataset and, by testing different parameters guided by similar studies such as (LeCun et al., 1989) or (Krizhevsky et al., 2012), select the best results:
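The early-stopping rule described above can be sketched as a small loop over validation losses: training halts once `patience` epochs pass without a new best loss. The loss values and the patience of 3 are synthetic illustrations; in practice they would come from the CNN's validation runs.

```python
# Minimal early-stopping sketch: stop when `patience` consecutive
# epochs show no improvement over the best validation loss so far.

def early_stop_index(val_losses, patience=3):
    """Return the epoch index at which training would stop, or the
    last epoch if the patience budget is never exhausted."""
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return len(val_losses) - 1

losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
print(early_stop_index(losses, patience=3))   # → 5 (three epochs w/o improvement)
```

Note how the late improvement at epoch 6 is never reached — exactly the trade-off that choosing the patience hyper-parameter controls.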

Identification of Phishing Website using Deep Learning Algorithm


Key Words: Deep learning; Machine learning; Phishing website; Random forest; URL.

1. INTRODUCTION

Internet technology has grown extensively over the last few decades, from online social networking to e-commerce and banking technologies, making people's lives more comfortable. This uncontrolled growth has resulted in many security threats to network systems, the most frequently encountered of which is "phishing". Phishing is a web-based attack in which attackers attempt to trick users into revealing sensitive information such as user IDs/passwords or account information by sending an email that appears to come from a reputable person or entity. Phishing attacks can occur over many different forms of communication, such as SMS, VoIP, and e-mail. Every internet user has many accounts across social networks, banks, and more, and these users are targets for phishing attacks. Still, most web users are unaware of phishing. A phishing attack typically takes advantage of social engineering to lure the victim by sending a spoofed link to a fake web page.

Deep Learning for RF Fingerprinting: A Massive Experimental Study


We conduct additional experiments on the ADS-B dataset (task 1D) to demonstrate that we are not learning the device ID. The ADS-B signal format has a sync pulse that lasts 8 µs (800 samples) followed by either 56 or 112 µs of data. The ID appears in the data portion of the packet. We crop transmissions according to a crop size ranging from 64 to 1024, effectively looking only at the first part of the packet and ignoring the rest. Then we apply our entire pipeline, including slicing, to the cropped transmissions in both the training and test sets. This allows us to assess classification accuracy w.r.t. the crop size and, in turn, to determine whether the model relies on the ID to classify the device. This would be the case if the accuracy remained low for crop sizes below 800 samples and increased significantly beyond 800 samples (indicated in Fig.
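The crop-then-slice procedure above can be sketched as two list operations. The sample values, the slice length of 64, and the drop-short-tail convention are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch of the cropping experiment: keep only the first crop_size
# samples of each transmission, so for small crops the device-ID
# region (which begins after the ~800-sample preamble) is excluded,
# then cut the cropped signal into fixed-length slices.

def crop_transmission(samples, crop_size):
    """Keep only the first crop_size samples of a transmission."""
    return samples[:crop_size]

def slice_transmission(samples, slice_len):
    """Cut a (cropped) transmission into fixed-length slices;
    a short tail is dropped."""
    return [samples[i:i + slice_len]
            for i in range(0, len(samples) - slice_len + 1, slice_len)]

tx = list(range(1024))                 # stand-in for 1024 IQ samples
cropped = crop_transmission(tx, 256)   # well below the 800-sample mark
print(len(cropped), len(slice_transmission(cropped, 64)))   # → 256 4
```

If accuracy stays flat as `crop_size` crosses 800, the classifier cannot be reading the ID field — that is the whole point of the experiment.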

Deep learning based pipeline for fingerprinting using brain functional MRI connectivity data


b ICVS/3B’s – PT Government Associate Laboratory, 4710–057 Braga, Portugal c Centre Algoritmi, University of Minho, 4710–057 Braga, Portugal

Abstract

In this work we describe a pipeline for using deep learning to improve the brain functional connectivity-based fingerprinting process, which is based on functional Magnetic Resonance Imaging (fMRI) data-processing results. This pipeline approach is mostly intended for neuroscientists, biomedical engineers, and physicists looking for an easy way to use fMRI-based deep learning to identify people, drastic brain alterations in those people, and/or pathological consequences to people's brains. Computer scientists and engineers can also benefit from the data-processing improvements obtained with the proposed pipeline. With our best approach, we obtained an average accuracy of 0.3132 ± 0.0129 and an average validation cost of 3.1422 ± 0.0668, which clearly outperformed the published Pearson correlation approach with a 50-node parcellation, which had an accuracy of 0.237.

Improved Website Fingerprinting on Tor


{t55wang, iang}@cs.uwaterloo.ca

ABSTRACT

In this paper, we propose new website fingerprinting techniques that achieve a higher classification accuracy on Tor than previous works. We describe our novel methodology for gathering data on Tor; this methodology is essential for accurate classifier comparison and analysis. We offer new ways to interpret the data by using the more fundamental Tor cells as a unit of data rather than TCP/IP packets. We demonstrate an experimental method to remove Tor SENDMEs, which are control cells that provide no useful data, in order to improve accuracy. We also propose a new set of metrics to describe the similarity between two traffic instances; they are derived from observations on how a site is loaded. Using our new metrics we achieve a higher success rate than previous authors. We conduct a thorough analysis and comparison between our new algorithms and the previous best algorithm. To identify the potential power of website fingerprinting on Tor, we perform open-world experiments; we achieve a recall rate over 95% and a false positive rate under 0.2% for several potentially monitored sites, which far exceeds previously reported recall rates. In the closed-world experiments, our accuracy is 91%, as compared to 86–87% from the best previous classifier on the same data.
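The SENDME-removal step mentioned above can be sketched heuristically: Tor clients emit a flow-control SENDME cell after a fixed number of incoming data cells, so a filter can drop outgoing cells that coincide with that schedule. The threshold of 50 and the exact drop rule here are illustrative assumptions, not the paper's method.

```python
# Hedged sketch of SENDME removal: drop the first outgoing cell seen
# after every `threshold` incoming cells, treating it as a likely
# flow-control SENDME rather than page data.

def strip_sendmes(cells, threshold=50):
    """cells: +1 outgoing, -1 incoming. Returns the trace with
    suspected SENDME cells removed."""
    out, incoming_count, pending_drop = [], 0, 0
    for c in cells:
        if c < 0:
            incoming_count += 1
            if incoming_count % threshold == 0:
                pending_drop += 1      # expect one SENDME soon
            out.append(c)
        elif pending_drop:
            pending_drop -= 1          # assumed SENDME: skip it
        else:
            out.append(c)
    return out

trace = [-1] * 50 + [1, 1] + [-1] * 50 + [1]
print(len(strip_sendmes(trace)))       # drops 2 suspected SENDMEs → 101
```

Removing these content-free control cells leaves a trace that reflects only page structure, which is why it can help classification accuracy.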

Improved Website Fingerprinting on Tor


Using our improved techniques, we demonstrated a marked improvement in accuracy with open-world experiments on Alexa’s top 1000 sites to emulate an attacker with a limited set of modelled sites. With our techniques, the recall was above 95% on four sensitive sites and the false positive rate was less than 0.2% for those sites. It is possible that these results can be improved further with more sophisticated multi-class training approaches or some fine-tuning of the parameters used in the experiments. To compare our results with those of previous authors, we also performed closed-world experiments on Alexa’s top 100 sites with 40 instances each, and we showed that our new metrics and data-processing techniques yielded up to 35% fewer mistakes than previous work. We then performed a number of experiments to justify our use of Alexa’s top 100 sites for our closed-world setting and Alexa’s top 1000 sites for our open-world setting. We showed that a five-fold increase of the testing space in this range does not produce a noticeable effect on closed-world and open-world accuracy. Our results warn us that even with TLS encryption, padding, and packet relaying, Tor may not be able to protect web-browsing clients from deanonymization by a passive observer with limited resources.

Website Fingerprinting: Attacks and Defenses


We can also use a similar methodology to examine website fingerprinting defenses rather than attacks. To do so, we first construct two classes of packet sequences that differ only by one feature category, using one of the above generators. Then, we apply the WF defense to the two classes. Finally, we apply a feature-based classifier (such as SVM or k-NN) to try to distinguish between the two classes. If the classifier fails, we can argue that the WF defense covers this feature. However, this approach has some problems. If the WF defense adds some small random perturbation to the packet sequence, the two classes may be distinguishable by the classifier, yet the defense may be provably effective against website fingerprinting in general. The choice of the feature-based classifier would also be largely arbitrary. The conclusions drawn from this approach are therefore rather limited. In our previously published paper [CNW+14] we presented results using this approach, but we will not include those results in this work. Instead, we will examine another approach that proves the effectiveness of a defense in general, with respect to any feature set, in Section 5.1.4.
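The evaluation loop above can be sketched end to end: generate two trace classes that differ only in one feature, apply a candidate defense, then check whether a simple classifier still separates them. The generator, the constant-rate-padding defense, and the one-dimensional threshold classifier are all illustrative stand-ins, not the chapter's actual generators or classifiers.

```python
# Sketch of the defense-evaluation methodology: two classes differing
# only in trace length, a toy padding defense, and a separability test.

import random

def generate(feature_on, n=50):
    """Traces differing only in total length (the chosen feature)."""
    base = 100 if feature_on else 80
    return [[1] * (base + random.randint(-3, 3)) for _ in range(n)]

def defend(trace, pad_to=120):
    """Toy defense: pad every trace to a constant length."""
    return trace + [1] * (pad_to - len(trace))

def separable(a_traces, b_traces):
    """Can a length-threshold classifier split the two classes?"""
    a_len = {len(t) for t in a_traces}
    b_len = {len(t) for t in b_traces}
    return max(b_len) < min(a_len) or max(a_len) < min(b_len)

random.seed(0)
A, B = generate(True), generate(False)
print(separable(A, B))                          # undefended: True
print(separable([defend(t) for t in A],
                [defend(t) for t in B]))        # defended: False
```

The classifier failing after the defense is applied is the evidence that the defense "covers" the length feature — with the caveats about random perturbations noted in the excerpt.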

A Critical Evaluation of Website Fingerprinting Attacks


Recent studies on Website Fingerprinting (WF) claim to have found highly effective attacks on Tor. However, these studies make assumptions about user settings, adversary capabilities, and the nature of the Web that do not necessarily hold in practical scenarios. The following study critically evaluates these assumptions by conducting the attack where the assumptions do not hold. We show that certain variables (for example, the user's browsing habits and differences in location and Tor Browser Bundle version) that are usually omitted from the current WF model have a significant impact on the efficacy of the attack. We also empirically show how prior work succumbs to the base rate fallacy in the open-world scenario. We address this problem by augmenting our classification method with a verification step. We conclude that even though this approach reduces the number of false positives by over 63%, it does not completely solve the problem, which remains an open issue for WF attacks.

On Realistically Attacking Tor with Website Fingerprinting


Waterloo, ON, Canada iang@cs.uwaterloo.ca

ABSTRACT

Website fingerprinting allows a local, passive observer monitoring a web-browsing client’s encrypted channel to determine her web activity. Previous attacks have shown that website fingerprinting could be a threat to anonymity networks such as Tor under laboratory conditions. However, there are significant differences between laboratory conditions and realistic conditions. First, the training data set is very similar to the testing data set under laboratory conditions, but the attacker may not be able to guarantee similarity realistically. Second, laboratory packet sequences correspond to a single page each, but for realistic packet sequences the split between pages is not obvious. Third, packet sequences may include noise, which may adversely affect website fingerprinting, but this effect has not been studied.

Effective Attacks and Provable Defenses for Website Fingerprinting


9 Conclusion

In this work, we have shown that using an attack which exploits the multi-modal property of web pages with the k-Nearest Neighbour classifier gives us a much higher accuracy than previous work. We use a large feature set and learn feature weights by adjusting them based on shortening the distance towards points in the same class, and we show that our procedure is robust. The k-NN costs only seconds to train on a large database, compared to hundreds of hours for previous state-of-the-art attacks. The attack further performs well in the open-world experiments if the attacker chooses k and the bias towards non-monitored pages properly. Furthermore, as the attack is designed to automatically converge on unprotected features, we have shown that our attack is powerful against all known defenses.

Efficient, Effective, and Realistic Website Fingerprinting Mitigation


Figure 6 shows the classification accuracy for varying amounts of noise added to the original traces, and Figure 7 shows the bandwidth overhead as a percentage of the extra network traffic generated. The two basic cover-traffic algorithms are denoted k = 1, which adds the same one website as noise each time, and k = 10, which randomly adds one of ten websites as noise. The x-axis indicates the amount of noise s added: when s = 1.0, the whole packet trace is added as noise; when s = 0.5, only half of the packet trace is added, that is, every other packet is added as noise to preserve the time intervals. For the basic cover-traffic cases (k = 1 and k = 10), we are "simulating" the generated noise; in a real-world setting, this would be hard to achieve without controlling the server, in which case the browser could send random packets. We show different values of s to compare with our algorithm. As more noise is added (s increases), the accuracy decreases, as expected; similarly, the bandwidth overhead increases. Our proposed noise-generation algorithm achieves the same accuracy regardless of the amount of noise, because it generates realistic noise that more effectively hides a user's real traffic than random noise does. Our algorithm's bandwidth overhead is the same as in the basic cases; even with s = 0.25, the overhead is 20% and the accuracy is 14%. Since our proposed algorithm generates random packet traces based on real recorded network traffic, we ran our experiments five times; the graphs show the average of the five runs. For these experiments, the training dataset used in the Random Forest classification algorithm is the
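The cover-traffic mechanism above can be sketched as two small steps: thin a noise trace by a factor s (s = 0.5 keeps every other packet, roughly preserving time intervals), then merge it with the real trace by timestamp. The `(timestamp, size)` tuples and the merge rule are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of fractional cover traffic: keep a fraction s of a noise
# trace, evenly spaced in sequence, then interleave it with the real
# trace by timestamp.

def thin_noise(noise_trace, s):
    """Keep a fraction s of the noise packets, evenly spaced so the
    original time intervals are roughly preserved."""
    if s >= 1.0:
        return list(noise_trace)
    step = int(round(1.0 / s))
    return noise_trace[::step]

def add_noise(real_trace, noise_trace, s):
    """Merge real and thinned noise packets in timestamp order."""
    merged = real_trace + thin_noise(noise_trace, s)
    return sorted(merged)              # tuples sort by timestamp first

real  = [(0.00, 600), (0.20, -1500), (0.40, 300)]
noise = [(0.05, 800), (0.15, -800), (0.25, 800), (0.35, -800)]
print(len(add_noise(real, noise, 0.5)))   # half the noise trace → 5 packets
```

Varying s directly trades bandwidth overhead against classifier accuracy, which is exactly the trade-off Figures 6 and 7 chart.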

Efficient, Effective, and Realistic Website Fingerprinting Mitigation


Figure 5. The accuracy using the Random Forest algorithm when varying the sample size. Note the y-axis does not start at 0.

Learning Abstract Classes using Deep Learning


Since CNNs are very popular at the moment and perceived (in parts of the computer vision community) as obtaining human-like performance, we wanted to test their applicability on visual tasks slightly outside the mainstream which are still trivially solvable by humans. We chose to learn simple abstract classes using a standard CNN not because we assume that they will perform better on the tasks than other, possibly much simpler, methods, but because we want to gain insights into CNNs and how they perform on tasks which can be solved trivially by humans. We will mainly try to give insight into the amount of training images needed and how well the classifier generalizes to previously unseen shapes representing the same abstract concepts. Until now, most of the classes used for training and testing convolutional neural networks were concrete (e.g. detecting classes of objects, animal species in an image, etc.). One notable exception is the work by Gülçehre et al. [6], who trained a CNN to recognize whether multiple presented shapes are the same. This is in essence training on two abstract classes.

Deep learning evaluation using deep linguistic processing


The advantage of artificial data in this paper is not seen in its capacity to improve existing models by augmenting training data, although this would be conceivable. Instead we are interested in its capacity to provide data for targeted investigations of specific model capabilities. We argue that it constitutes a necessary, though not in itself sufficient, benchmark for genuine language understanding abilities. The aforementioned models exhibit clearly superior understanding of the types of questions CLEVR contains. This paper proposes a principled way of continuing the incremental progress in multimodal language understanding initiated by CLEVR and its template-based generation approach, based on deep linguistic processing tools. Our initial experiments show that we can provide data that is challenging for state-of-the-art models, like the quantification examples presented in section 3.3. Note that while success on such narrower datasets may not directly translate to improved performance on broader datasets like the VQA Dataset, the underlying mechanisms are important for progress in the longer run.

Applicability of Website Fingerprinting Attack on Tor Encrypted Traffic


Abstract: Tor is a famous anonymity tool that provides Internet users with the capability of being anonymous on the Internet. By using the Tor network, a user can browse without anyone knowing the true communication information. Numerous studies worldwide have addressed deanonymizing Tor users. One popular line of study is the Website Fingerprinting (WF) attack, a subset of passive traffic-analysis attacks. WF consists of a complex traffic-analysis process with several limitations and assumptions about the Tor network. In this paper, we discuss the fundamental principles of WF on the Tor network and its assumptions, and we consider whether WF is applicable for attacking Tor user anonymity, especially in real-world scenarios. As a result, the applicability discussion and its establishment are presented. This study found that, with the advancement of WF attacks, they are applicable to Tor-encrypted traffic and might become a serious threat to Tor users' anonymity if no proper defense is proposed to prevent the improved WF attacks.

Website Fingerprinting in Onion Routing Based Anonymization Networks


University of Luxembourg {firstname.lastname}@uni.lu

ABSTRACT

Low-latency anonymization networks such as Tor and JAP claim to hide the recipient and the content of communications from a local observer, i.e., an entity that can eavesdrop the traffic between the user and the first anonymization node. Especially users in totalitarian regimes strongly depend on such networks to freely communicate. For these people, anonymity is particularly important, and an analysis of the anonymization methods against various attacks is necessary to ensure adequate protection. In this paper we show that anonymity in Tor and JAP is not as strong as expected so far and cannot resist website fingerprinting attacks under certain circumstances. We first define features for website fingerprinting solely based on volume, time, and direction of the traffic. As a result, the subsequent classification becomes much easier. We apply support vector machines with the introduced features. We are able to improve recognition results of existing works on a given state-of-the-art dataset in Tor from 3% to 55% and in JAP from 20% to 80%. The datasets assume a closed world with 775 websites only. In a next step, we transfer our findings to a more complex and realistic open-world scenario, i.e., recognition of several websites in a set of thousands of random unknown websites. To the best of our knowledge, this work is the first successful attack in the open-world scenario. We achieve a surprisingly high true positive rate of up to 73% for a false positive rate of 0.05%. Finally, we show preliminary results of a proof-of-concept implementation that applies camouflage as a countermeasure to hamper the fingerprinting attack. For JAP, the detection rate decreases from 80% to 4%, and for Tor it drops from 55% to about 3%.
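Feature construction from volume, time, and direction alone, as described above, can be sketched per trace before handing the vectors to an SVM. The `(timestamp, signed_size)` packet tuples and this particular feature set are illustrative assumptions, not the paper's exact features.

```python
# Sketch of volume/time/direction features for a WF classifier:
# no payload inspection, only traffic shape.

def wf_features(trace):
    """trace: list of (timestamp, signed_size); positive = outgoing."""
    sizes = [s for _, s in trace]
    out = [s for s in sizes if s > 0]
    inc = [-s for s in sizes if s < 0]
    return {
        "n_packets": len(trace),                    # volume
        "bytes_out": sum(out),                      # volume, per direction
        "bytes_in": sum(inc),
        "frac_out": len(out) / len(trace),          # direction
        "duration": trace[-1][0] - trace[0][0],     # time
    }

trace = [(0.0, 600), (0.1, -1500), (0.2, -1500), (0.5, 300)]
print(wf_features(trace)["bytes_in"])   # → 3000
```

Each trace becomes one fixed-length vector, so a standard SVM can be trained directly on a labelled set of such vectors.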

Touching from a Distance: Website Fingerprinting Attacks and Defenses


Users may also move between pages using their browser’s “Back” and “Forward” buttons and by typing a URL directly into the location bar. The attacker can model page loads via the location bar by simply adding edges between states of the HMM; the probability assigned to these transitions can be derived from user behavior. Unfortunately, it is not possible to precisely model the Back and Forward buttons using an HMM, since that would require augmenting the HMM with a stack. In most browsers, clicking the Back button generates the same traffic trace as clicking a link to the previous page, so the attacker can model the Back button by adding reverse edges for every edge in the original HMM. Note that, since clicking Back is necessarily a “warm cache” load of the previous page, the HMM back edge should go to the HMM state representing a warm-cache load of the page, even if its corresponding forward edge is from a cold-cache state. The probability assigned to each back edge can be derived from observing real users.
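The back-edge construction above can be sketched over an edge map. Representing states as `(page, cache)` pairs and using the forward probability as a placeholder for the back-edge probability are illustrative assumptions; the text says real probabilities would come from observing users.

```python
# Sketch of the HMM extension: for every forward edge between page
# states, add a reverse "Back button" edge that targets the
# warm-cache state of the source page.

def add_back_edges(edges):
    """edges: {(src_state, dst_state): prob}; a state is a (page, cache)
    pair with cache in {"cold", "warm"}."""
    back = {}
    for (src, dst), prob in edges.items():
        src_page, _cache = src
        # Back is always a warm-cache load of the previous page,
        # even if the forward edge left a cold-cache state.
        back[(dst, (src_page, "warm"))] = prob   # placeholder probability
    merged = dict(edges)
    merged.update(back)
    return merged

edges = {(("A", "cold"), ("B", "cold")): 1.0}
with_back = add_back_edges(edges)
print((("B", "cold"), ("A", "warm")) in with_back)   # → True
```

This keeps the model a plain HMM (no stack), at the cost of the imprecision the excerpt acknowledges.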
