File Detection on Network Traffic Using Approximate Matching

Combining the areas of DLP and approximate matching, in this work we present a novel technique that uses approximate matching to detect files in network traffic. Compared to existing techniques, our proposed approach is straightforward and does not need comprehensive configuration. It can be easily deployed and maintained, since only fingerprints (a.k.a. similarity digests) are required. Our approach requires neither machine learning nor rule generation. The main contribution is to demonstrate that it is possible to use approximate matching on network traffic by changing the algorithms slightly, although these algorithms were never designed to handle such small pieces of data. To the best of our knowledge, this is the first paper describing a technique for file identification in network traffic using approximate matching.
File Detection in Network Traffic Using Approximate Matching

The problem of data loss has become an important one, and a robust solution is the need of the hour. Possible routes of data loss have become complicated and numerous, making countermeasures difficult to develop and deploy. The increased incidence of insider involvement in data leakage has raised serious questions about the confidentiality of an organisation's internal information, such as intellectual property. In this work, the problem of identifying files in network traffic is considered. The shortcomings of existing technology are highlighted, and the need for open-source tools and techniques to solve this problem is emphasised. To solve this problem, bitwise content analysis of data in motion using approximate matching is proposed. Each packet is analysed for the presence of a 'known file'. It is successfully established that it is possible to detect files using this approach. To validate the technique and implementation, several scenarios are considered and tested. In a first step, random data is used to explore feasibility and establish a benchmark for what to expect from such a methodology. Tests with real-world data showed promising results as well. Both binary and text-based files can be easily detected using this approach. However, with real-world data, the problem of 'common substrings' persists. Therefore, an easy extension using stream-based analysis is proposed.
Performance testing of GPU-based approximate matching algorithm on network traffic

c. Sdhash: Sdhash stands for similarity digest hash and was developed by Vassil Roussev in 2010 [26]. It is an algorithm that allows two arbitrary blobs of data to be compared for similarity based on common strings of binary data [29]. Sdhash's approach is to identify statistically improbable features, i.e., features that are least likely to occur in other data objects by chance, and use them to generate similarity digests. Each feature is hashed using the cryptographic hash function SHA-1, and the resulting hashes are placed into a series of Bloom filters, a space-efficient set representation. To compare two digital artifacts, their digests can be compared. Applications of sdhash include identification of embedded objects, identification of code versions, identification of related documents, and correlation of network fragments.
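As an illustration, here is a heavily simplified sdhash-style sketch in Python: features (here, fixed overlapping windows rather than sdhash's statistically improbable features) are hashed with SHA-1 into a single toy Bloom filter, and two digests are compared by their bit overlap. All sizes and parameters are illustrative assumptions, not those of the real tool.

```python
import hashlib

FILTER_BITS = 2048          # toy Bloom filter size (real sdhash uses many 256-byte filters)
HASHES_PER_FEATURE = 5      # bit positions set per feature

def bloom_insert(bloom: int, feature: bytes) -> int:
    """Set HASHES_PER_FEATURE bits derived from the SHA-1 of the feature."""
    digest_bytes = hashlib.sha1(feature).digest()
    for i in range(HASHES_PER_FEATURE):
        # take 2 bytes of the SHA-1 output per index, as a toy stand-in
        idx = int.from_bytes(digest_bytes[2 * i:2 * i + 2], "big") % FILTER_BITS
        bloom |= 1 << idx
    return bloom

def digest(data: bytes, feature_len: int = 64, step: int = 32) -> int:
    """Build a single toy Bloom-filter digest from overlapping windows."""
    bloom = 0
    for off in range(0, max(len(data) - feature_len, 0) + 1, step):
        bloom = bloom_insert(bloom, data[off:off + feature_len])
    return bloom

def similarity(b1: int, b2: int) -> float:
    """Fraction of the smaller digest's set bits shared with the other (0..1)."""
    common = bin(b1 & b2).count("1")
    smaller = min(bin(b1).count("1"), bin(b2).count("1"))
    return common / smaller if smaller else 0.0
```

Identical inputs yield a similarity of 1.0, while unrelated inputs share almost no feature hashes and score near 0.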
Android Malware Detection using Decision Trees and Network Traffic

[4] is a static technique that looks for risky APIs and risky keywords within the Java code of an application. The authors collected malware samples, created a database of risky APIs found in the malware, and then searched for the presence of such APIs in public-sector apps such as banking and flight booking. [5] is another such mechanism, which performs a two-order risk analysis of applications collected from the official Play Store. First-order analysis consists of analysing permissions within the manifest file, i.e., which dangerous permissions are present in the application; second-order analysis consists of heuristic-based filtering, in which behaviours such as run-time download of a component by the application are considered malicious. [6] found a set of permissions that can distinguish between normal and malicious apps. The authors used a hierarchical bi-clustering technique to cluster the permissions into two groups, normal and malware, and then filtered out those permission sets that are clearly distinguishable, i.e., present in malicious apps but missing in normal apps. [7] evaluated potential risks hidden within the ad libraries of applications. They extracted the ad libraries within an application and looked for dangerous permissions present within them; such permissions could lead to leakage of the user's private information. [8] is a tool available on the market for performing static analysis.
Hardware Pattern Matching for Network Traffic Analysis in Gigabit Environments

A match is found at text position j if C_{m,j} < k, where k is the maximum number of errors allowed. Figure 2.2 shows the matrix when searching for the pattern "halfway" in the text "hallways". Italic entries are the positions where a match with fewer than 2 errors was found. Over the years, various improvements to this algorithm have been developed. Most of these exploit properties of the dynamic programming matrix. Another approach to solving the approximate pattern matching problem uses automata, where each possible configuration of a column represents a separate state of the automaton. In this case, state transitions occur on every character of the text, i.e., whenever a new column is computed. All these improvements achieve increased performance by trading memory for runtime. Navarro [Nav01] summarizes these improvements.
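The column-by-column dynamic-programming search described above (Sellers' algorithm) can be sketched in a few lines of Python; the update mirrors how the matrix in the figure is filled, one text character at a time:

```python
def approx_search(pattern: str, text: str, k: int):
    """Sellers' dynamic-programming search: return the 1-based end positions j
    where the pattern matches a substring of the text with at most k errors
    (insertions, deletions, substitutions)."""
    m = len(pattern)
    # col[i] = minimal errors to match pattern[:i] against a substring
    # ending at the current text position; C[i][0] = i, C[0][j] = 0
    col = list(range(m + 1))
    matches = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]      # C[i-1][j-1] for the i = 1 case
        col[0] = 0              # a match may start anywhere in the text
        for i in range(1, m + 1):
            cur = min(col[i] + 1,                          # insertion in the text
                      col[i - 1] + 1,                      # deletion from the pattern
                      prev_diag + (pattern[i - 1] != ch))  # match or substitution
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:         # at most k errors allowed at this end position
            matches.append(j)
    return matches
```

For the example above, `approx_search("halfway", "hallways", 2)` reports end positions 6, 7 and 8, with the best match ("hallway", one substitution) ending at position 7.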
Online Detection of Network Traffic Anomalies Using Degree Distributions

There are two working procedures in our anomaly detection scheme: deployment and measurement. First, our scheme must be deployed properly, such that it receives NetFlow records on the available measurement network. Internal NetFlow sources that handle traffic between corporate hosts and the Internet, such as routers, switches and firewalls, should be configured to export NetFlows to the processing-engine server. For best results and more visibility, make sure those sources deal with clear, not NATed, traffic. Second, we assume that the training traffic is devoid of any attack, so the characterization of its traffic features acts as a normal profile. The normal profile is used to calculate the predefined thresholds. The scheme then enters fully operational mode. In this mode, the thresholds are constantly compared with the current entropy values of the degree distributions derived from incoming NetFlows. Alarms are generated if the entropy values differ beyond the allowed tolerances. Note that the associated thresholds are self-adjusting, as they are calculated from the processed data itself (NetFlows) over a particular time span and updated periodically, without requiring a dedicated periodic training interval.
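As a sketch of the measurement side, the following Python fragment computes the Shannon entropy of an out-degree distribution from (source, destination) flow pairs and compares it against a baseline with a tolerance. The flow representation and the alarm rule are simplified assumptions, not the paper's exact procedure:

```python
from collections import Counter
from math import log2

def degree_entropy(flows):
    """Shannon entropy of the out-degree distribution: count distinct
    destinations per source, then take the entropy over those degrees."""
    peers = {}
    for src, dst in flows:
        peers.setdefault(src, set()).add(dst)
    degree_counts = Counter(len(dsts) for dsts in peers.values())
    n = sum(degree_counts.values())
    return -sum((c / n) * log2(c / n) for c in degree_counts.values())

def is_anomalous(entropy, baseline, tolerance):
    """Raise an alarm when the entropy drifts beyond the allowed tolerance."""
    return abs(entropy - baseline) > tolerance
```

When every host talks to the same number of peers the entropy is 0; a worm scan or DDoS skews the degree distribution and pushes the entropy away from the baseline.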
ASSESSMENT OF TRAFFIC DETECTION IN A HIGHWAY NETWORK

In a first step of our analysis, we detect whether the given network has a special structure, such as a grid, radial or star network, which allows algorithms to be adjusted for better performance during the subsequent network simplification. To detect the network structure, pattern matching algorithms search for grid and radial patterns. The network is then simplified to its essential nodes. Essential nodes include on-ramp start points, off-ramp end points, merging nodes and diverging nodes. To determine the essential nodes of a network, we follow the links of the network from all entry points until arriving at a node that has more than one exit link or another entry link. Having reached this node, we delete all visited links and replace them with an equivalent link from the starting position to the current position. Afterwards, we repeat the process until all nodes of the network have been visited. To give an indication of the node and link reduction: for the network description of the Tokyo Metropolitan Expressway, we could eliminate about 48% of the nodes. All further structural analysis is based on the simplified network, and includes:
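The node-reduction procedure described above can be sketched roughly as follows in Python. The edge-list representation and the `essential` test are simplifying assumptions (a node is treated as essential unless it has exactly one outgoing and at most one incoming link), and the sketch assumes no cycles of pure pass-through nodes:

```python
from collections import defaultdict

def simplify(edges, entry_points):
    """Collapse chains of pass-through nodes into single equivalent links,
    starting from the network entry points."""
    out_links = defaultdict(list)
    in_deg = defaultdict(int)
    for a, b in edges:
        out_links[a].append(b)
        in_deg[b] += 1

    def essential(node):
        # merging node, diverging node, or dead end
        return len(out_links[node]) != 1 or in_deg[node] > 1

    simplified = []
    seen = set()
    stack = list(entry_points)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for nxt in out_links[node]:
            # follow the chain of pass-through nodes to the next essential one,
            # replacing the visited links with one equivalent link
            while not essential(nxt):
                nxt = out_links[nxt][0]
            simplified.append((node, nxt))
            stack.append(nxt)
    return simplified
```

Two on-ramps feeding a merge node through chains of intermediate nodes, for example, collapse to two direct links into the merge node plus one link onward.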
Pattern Matching Algorithm for Network Intrusion Detection Systems

Zargari and Voorhis [32] examine significant features in anomaly detection systems with the aim of applying them to data mining techniques. They identify some current challenges in obtaining a comprehensive feature set and establish a system that eradicates redundant and recurring data from the KDD 99 dataset while keeping the feature set to a minimal size. Rough set theory dependency was used to identify the most discriminating features of each class. Features 21 and 22 in the KDD dataset (FTP session and hot login) were found to have no significance in intrusion detection. A further five features were identified as having only small significance; these included su attempted, number of file creation operations, is guest login and dst host rerror rate. The Corrected KDD-dataset was used in order to discover the features and characteristics of the intrusions, and to establish whether anomaly detection can be improved, from a statistical point of view, by using this dataset. It is important to mention that, unlike other studies, the Corrected KDD-dataset was analysed here instead of the KDD-dataset; it contains more attacks, and the distribution of attacks differs from that in the KDD-dataset. A subset of features was later proposed to help reduce the dimensionality of KDD and was compared to subsets of features obtained through data mining techniques. The proposed features were later tested on NSL-KDD and demonstrated higher detection rates. The work may require live analysis before we can be sure that it would function correctly.
Tool for Secure File Transfer and Intrusion Detection in a Network

1. INTRODUCTION
1.1. Intrusion Detection System (IDS)
An intrusion detection system (IDS) is a tool or application used to detect an attack mounted on a system or network, in order to compromise or break it, by an anomalous user outside the network. This is done by keeping track of all suspicious patterns and activities in both incoming and outgoing traffic within the network. Generally, an IDS maintains the details of all events examined on the system and later generates reports, which are sent to the management station for further action. After the details of the malicious user are obtained from the records, actions such as blocking the user are performed. It is important to note that an IDS also includes a feature for monitoring suspicious users within the network.
Analysis of Malware Impact on Network Traffic using Behavior-based Detection Technique

Second, the most commonly used API network groups are groups 1 and 3, with 10 malware samples each. The 10 malware samples using API network group 1 access a URL or IP address and then provide file-transfer services between client and server, so that, without the user realizing it, the server can send and retrieve data on the client computer. The 10 malware samples using API network group 3 send and receive data through a predetermined socket. API network group 3 affects the throughput of network traffic because of this ongoing sending and receiving of data.
Network Intrusion Detection Evading System using Frequent Pattern Matching

Figure 2 shows the architecture of the NIDS. In the first step, the KDD-99 dataset, which contains attack and normal traffic, is given to the C4.5 algorithm through Weka [5]. C4.5 outputs a decision tree built from attribute values: at each internal node an attribute is tested and the tree branches accordingly, and each leaf node gives the actual attack class. Each branch is assigned a weight according to its classification attribute. In the second step, this tree and the dataset are given to the AdaBoost algorithm, which has four phases: labeling, data mining, training and testing. In labeling, normal packets are given the value -1 and attack packets +1. Through data mining, features are extracted. The training phase is performed by taking different field combinations and varying the folds. The resulting NIDS is then tested for accuracy. The AdaBoost classifier assigns traffic to four attack types (DoS, U2R, R2L and probe) or to the normal class, and the detection rate and false alarm rate are determined.
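As an illustration of the boosting step, here is a minimal AdaBoost with decision stumps on -1/+1 labels, written from scratch in Python. This is a generic textbook sketch of the algorithm, not the Weka-based pipeline used in the paper:

```python
import math

def train_stump(X, y, w):
    """Pick the (feature, threshold, polarity) stump with the lowest
    weighted error on labels in {-1, +1}."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted(set(row[f] for row in X)):
            for pol in (1, -1):
                pred = [pol if row[f] >= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def adaboost(X, y, rounds=5):
    """Train an ensemble of weighted stumps; y uses -1 (normal) / +1 (attack)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, thr, pol = train_stump(X, y, w)
        err = max(err, 1e-10)                      # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # stump weight
        ensemble.append((alpha, f, thr, pol))
        # re-weight: boost the weight of misclassified samples
        for i, row in enumerate(X):
            pred = pol if row[f] >= thr else -pol
            w[i] *= math.exp(-alpha * y[i] * pred)
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def classify(ensemble, row):
    """Weighted vote of all stumps; sign gives the predicted label."""
    score = sum(alpha * (pol if row[f] >= thr else -pol)
                for alpha, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

In the real system, the per-packet feature vectors come from the extracted KDD fields and the labels from the labeling phase described above.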
Ransomware early detection by the analysis of file sharing traffic

administrator privileges, and they take CPU and hard-disk resources. The above-mentioned methods try to detect ransomware while it is encrypting files on the user's computer. However, in most enterprise productivity deployments, user documents are located on central network shared volumes (Eurostat Statistics Explained). These can be documents shared by groups of users, or even a user's whole document set, shared to allow mobility among hosts. Centralization offers better storage utilization with higher-quality disks, group-sharing capabilities, easier maintenance and simpler periodic backups. In fact, most enterprises hit by ransomware recover their documents thanks to nightly backups (Osterman Research and Inc., 2016). However, the same centralization and sharing opens the door to a single infected computer encrypting large numbers of documents, with effects on many company departments. Locally installed malware detectors could prevent ransomware from encrypting network shared volumes; however, they require installation and updates on the whole set of company computers. As far as we know, no previous work has tried to detect ransomware action based on the traffic to a NAS system. In this paper we show how a single network probe can detect and stop ransomware by analysing the traffic to a network file server. Tens of gigabits per second of sustained traffic are supported, and file-recovery capabilities are added in order to reduce the impact of ransomware to a minimum.
3. Network scenario
A Methodology for P2P File-Sharing Traffic Detection

1.1 Main Contributions and Road-map
In this paper, we provide a methodology to identify P2P traffic. The methodology is based on the following steps: analysis of the protocol of interest; identification of patterns specific to the P2P protocol that can be revealed by IP-packet-level analysis; coding of these patterns into rules that can be fed to an IDS; and network monitoring of the identified patterns with an effective IDS fed with the devised rules. Note that this IDS-like approach does not introduce any delay in the network, while requiring only little overhead at the checkpoint where it is installed. Further, the proposed methodology is shown to be extensible to the analysis of P2P protocols that encrypt their generated traffic, and to efficiently leverage characteristics introduced by decentralized P2P file-sharing applications. Our P2P traffic detection tool has been successfully deployed and is currently running in a corporate LAN.
Network Traffic Monitoring, Analysis and Anomaly Detection

The Time Sliding Window Three Conformance Level meter (TSWTCL) [6][9] meters a traffic stream and determines the conformance level of its packets. Packets are deemed to belong to one of three levels, Red, Yellow or Green, depending on the committed and peak rates. The meter provides an estimate of the running average bandwidth. It takes burstiness into account and smooths its estimate to approximate the longer-term measured sending rate of the traffic stream. The estimated bandwidth approximates the running average bandwidth of the traffic stream over a specific window (time interval), using a time-based estimator. When a packet arrives for a class, TSWTCL re-computes the average rate using the rate in the last window and the size of the arriving packet. The window is then slid to start at the current time (the packet arrival time). If the computed rate is less than the committed rate parameter, the packet is deemed Green; if the rate is less than the peak rate, Yellow; otherwise Red. To avoid dropping multiple packets within a TCP window, TSWTCL probabilistically assigns one of the three conformance levels to the packet. The basic working principle of NTM is pictorially represented below:
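A deterministic sketch of the TSW rate estimator in Python (the real TSWTCL assigns Green and Yellow probabilistically between the committed and peak rates; this simplified version colors packets by the estimated rate alone, and all parameter values are illustrative):

```python
class TSWTCL:
    """Toy Time Sliding Window Three Conformance Level meter: maintains a
    running average rate over a sliding window and colors each arriving
    packet Green, Yellow or Red against the committed and peak rates."""

    def __init__(self, committed_rate, peak_rate, window):
        self.committed = committed_rate   # bytes per second
        self.peak = peak_rate             # bytes per second
        self.window = window              # averaging window, seconds
        self.avg_rate = 0.0               # running average bandwidth estimate
        self.front = 0.0                  # time the window currently starts at

    def meter(self, arrival_time, size):
        # re-compute the average rate from the rate in the last window
        # plus the newly arrived bytes, then slide the window forward
        bytes_in_win = self.avg_rate * self.window + size
        self.avg_rate = bytes_in_win / (arrival_time - self.front + self.window)
        self.front = arrival_time
        if self.avg_rate <= self.committed:
            return "Green"
        if self.avg_rate <= self.peak:
            return "Yellow"
        return "Red"
```

A steady stream below the committed rate stays Green, while a burst first drifts into Yellow and then Red as the smoothed estimate crosses the peak rate.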
Anomaly detection using network traffic characterization

The traffic analysis process starts with a tcpdump data file, which is used for extracting flow data and for gathering per-packet information. Two types of tcpdump records were used in this thesis work. The first is the 1999 DARPA Intrusion Detection Evaluation Data Set, which was used throughout the development phase. The DARPA Intrusion Detection Evaluation Data Set (DARPA 2009) includes weekly pre-recorded tcpdump files for evaluation. The data set was built for intrusion detection, so it provides separated clean traffic. Attack-free (clean) traffic was important during development for determining the metric values for each flow attribute, so the first week of the dataset was used for development purposes. The second record type includes manually produced anomalies: both abnormal traffic and a manually scheduled attack. These records are used for checking the accuracy of the work.
Network intrusion detection using string matching

b) Controlling file access: Functions for controlling file access are generally delegated to specialized systems, such as Secret Net, which are intended specifically for protecting network information from unauthorized access. However, some critically important files, such as database files and password files, cannot be protected by such systems. Moreover, these systems are mainly developed for the Windows and NetWare platforms, so they fail in the UNIX environments used for network applications in many organizations. In such cases, a network intrusion detection system comes to the rescue of network administrators. Host-based network intrusion detection systems are mainly used here, based both on log-file analysis (Real Secure Server Sensor) and on IDSs analyzing system calls (Cisco IDS Host Server).
On the database lookup problem of approximate matching

respectively. In this paper we present and evaluate a concept for extending existing approximate matching algorithms that reduces the lookup complexity from O(x) to O(1). Instead of using multiple small Bloom filters (the common procedure), we demonstrate that a single, huge Bloom filter performs far better. Our evaluation demonstrates that current approximate matching algorithms are too slow (e.g., over 21 min to compare 4457 digests of a common file corpus against each other), while the improved version solves this challenge within seconds. Studying the precision and recall rates shows that our approach works as reliably as the original implementations. This benefit comes at the cost of accuracy: the comparison is now a file-against-set comparison, and thus it is not possible to see which file in the database is matched.
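The idea can be illustrated with a toy Bloom filter in Python: looking a feature up in x per-file filters costs x membership tests, while one large filter covering all files needs only a single test. The filter sizes and hash scheme here are illustrative assumptions, not those of the evaluated algorithms:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter backed by a Python int as the bit field."""

    def __init__(self, bits, num_hashes):
        self.bits, self.k, self.field = bits, num_hashes, 0

    def _positions(self, item: bytes):
        # derive k bit positions from one SHA-256 digest (toy hashing scheme)
        d = hashlib.sha256(item).digest()
        for i in range(self.k):
            yield int.from_bytes(d[4 * i:4 * i + 4], "big") % self.bits

    def add(self, item: bytes):
        for p in self._positions(item):
            self.field |= 1 << p

    def __contains__(self, item: bytes):
        return all(self.field >> p & 1 for p in self._positions(item))

def lookup_many(filters, feature):
    """O(x) lookup: scan every per-file filter in the database."""
    return any(feature in f for f in filters)

def lookup_single(big_filter, feature):
    """O(1) lookup: one membership test against a single large filter."""
    return feature in big_filter
```

As the paper notes, the single-filter lookup only answers "is this feature in the database at all"; the information about which file contributed it is lost.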
On the Database Lookup Problem of Approximate Matching

Basically, approximate matching consists of two separate functions. First, tools run a feature extraction function that extracts features or attributes from the input, allowing a compressed representation of the original object (the exact procedure depends on the implementation itself). Second, to compare two similarity digests, a similarity function is used that normally outputs a score s scaled to 0 ≤ s ≤ 100. Despite its range, this value is not necessarily an estimate of the percentage of commonality between the compared objects, but rather a level of confidence; it is meant to serve as a means to sort and filter the results. ssdeep and the F2S2 software
Approximate Image Matching using Strings of Bag-of-Visual Words Representation

Abstract: The Spatial Pyramid Matching approach has become very popular for modelling images as sets of local bags-of-words. Image comparison is then done region-by-region with an intersection kernel. Despite its success, this model has some limitations: the grid partitioning is predefined and identical for all images, and the matching is sensitive to intra- and inter-class variations. In this paper, we propose a novel approach based on approximate string matching to overcome these limitations and improve the results. First, we introduce a new image representation as strings of ordered bags-of-words. Second, we present a new edit distance specifically adapted to strings of histograms in the context of image comparison. This distance identifies local alignments between subregions and allows sequences of similar subregions to be removed to better match two images. Experiments on 15 Scenes and Caltech 101 show that the proposed approach outperforms the classical spatial pyramid representation and most concurrent classification methods presented in recent years.
Approximate Multiple Pattern String Matching using Bit Parallelism: A Review

Bit-parallelism is a technique that takes advantage of the intrinsic parallelism of bit operations inside a computer word, allowing the number of operations an algorithm performs to be cut down by a factor of up to the number of bits in the computer word [8]. Bit-parallelism is particularly suitable for the efficient simulation of non-deterministic automata. In other words, it is the technique of packing several values into a single computer word and updating them all in a single operation. This technique has yielded the fastest approximate string-matching algorithms when filtering algorithms are excluded [8].
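A standard example is the Wu-Manber extension of Shift-And: it keeps k+1 bit masks, one per allowed error count, and updates all pattern states in a handful of word operations per text character. A compact Python sketch (Python ints stand in for machine words, so the pattern-length limit of a real implementation does not apply):

```python
def shift_and_approx(pattern: str, text: str, k: int):
    """Wu-Manber bit-parallel approximate search (Shift-And with k errors).
    Returns the 1-based end positions of matches with at most k errors."""
    m = len(pattern)
    # B[c]: bitmask with bit i set iff pattern[i] == c
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << i)
    # R[d]: set of pattern states reachable with exactly <= d errors;
    # initially the first d characters may be deleted
    R = [(1 << d) - 1 for d in range(k + 1)]
    accept = 1 << (m - 1)
    matches = []
    for j, c in enumerate(text, start=1):
        Bc = B.get(c, 0)
        prev = R[0]
        R[0] = ((R[0] << 1) | 1) & Bc          # exact Shift-And update
        for d in range(1, k + 1):
            old = R[d]
            R[d] = (((R[d] << 1) & Bc)         # match on this character
                    | prev                      # insertion in the text
                    | (prev << 1)               # substitution
                    | (R[d - 1] << 1)           # deletion (already-updated row)
                    | 1)                        # a match may start anywhere
            prev = old
        if R[k] & accept:
            matches.append(j)
    return matches
```

For the earlier example, `shift_and_approx("halfway", "hallways", 2)` finds matches ending at text positions 6, 7 and 8, in agreement with the dynamic-programming matrix.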