Machine Learning Approach for Classifying Malicious URLs

(1)

IJEDR1604091

International Journal of Engineering Development and Research (www.ijedr.org)

622

Machine Learning Approach for Classifying

Malicious URLs

Saurabh Mittal, Preeti Associate Professor, Research Scholar

Department of Computer Science and Engineering, Galaxy Global Group of Institutions, Dinarpur, Ambala, Haryana, India

________________________________________________________________________________________________________

Abstract- The discrimination of ordinary and malicious activity of program by monitoring HTTP visitors is fitting more difficult when sophisticated malware generate authorized HTTP traffic and having the identical habits with ordinary application. In this paper, a brand new strategy is proposed to support administrator in detection of malicious clients by using clustering customers into businesses founded on HTTP-pastime similarity. Size of the auto-application activity in the network can be can also be noticeable from the influence of the system. Nevertheless, the approach wishes to be increased for mechanically discover malicious agencies without using blacklist or outcomes of alternative malicious detection methods. One more terrible point of this system is that with a malware has simply been infected in just one client, that malware's conduct are not able to be detected. Also the identical time, the normal supervised learning algorithms are recognized to generalize well over the certain patterns determined in coaching knowledge, which makes them a greater replacement towards hacking campaigns. Nevertheless, the particularly dynamic environment of those campaigns requires updating the items often, and this poses new challenges in view that many of the traditional learning algorithms are also computationally high-priced to retrain. This paper compares desktop learning techniques (OneR, ZeroR, and Random Forest) for detecting malicious webpages.

Index Terms - Computer Security, Adware Classification, Machine Learning, Random Forest.

________________________________________________________________________________________________________

I. INTRODUCTION

Advert ware, brief for Malicious Multimedia publicizing [1] is a series of orders that reward malicious movements on a pc community. The previous of malware began alongside "laptop virus", a word gave by Cohen. This can be a piece of software that replicates by means of relating itself to supplementary executable in the approach. Today, the malware involves viruses, worms, Trojans, foundation kits, backdoors, bots, spyware and adware, spy ware, jolt ware and every supplementary plan that has malicious habits.

Spyware is a speedy producing threat to reward computer networks. Construction spyware and adware has end up a

Multi-billion. The progress of the web, the arrival of communal webs and the rapid proliferation of botnets has provoked an exponential rise within the quantity of spyware. In 2010, there was once a sharp rise in the quantity of spyware range throughout junk mail emails dispatched mechanisms that have been portion of botnets. McAfee Labs described that there were 6 million new infections each single month [2].Two significant retailers altering mobile use are privacy and affirmative user experience. The marketplace for mobile requests is headquartered on believe. Cell publicizing is questionable undertaking, comparable to requests that use misleading habits adware, a negative encounter on the understanding of privacy of the conclude consumer and the user experience. Do things like arrest exclusive knowledge equivalent to e mail addresses, Mechanism identification, IMEI, and so forth. Missing thoroughly notifying users and cellphone settings and alter computer missing accord, it can be exasperating and unacceptable for mobile customers. As most cell advertisements usually are not malicious, although, they're undesirable for most folks.

Figure 1 Dangers of Malicious URLS and Adwares

(2)

IJEDR1604091

623

one of the competent mechanisms for cyber convicts to allocate malicious code. In designated, attackers in general use force-with the aid of-down load exploits to compromise a giant quantity of users. To make a force with the aid of down load assault, the attacker crafts early software malicious purchaser-facet script (typically composed in JavaScript) that target vulnerabilities in an online browser or in one of the browser plugins. This program is inoculated into compromised internet sites or it appears that evidently hosted on a server beneath the manipulation of criminals. After a sufferer visits a malicious website, the malicious software is gave, and if the browser of the sufferer is inclined, the browser is compromised. As a consequence, the laptop of the victim is customarily infected alongside malware. Given the manufacturing menace of malicious web sites, it is not wonderful that researchers have begun investigating methods to safeguard web users. Currently, the most vast safeguard is established on the URL blacklists. [4] These blacklists (reminiscent of Google innocent searching) retailer the URLs which have been learned to be malicious. The catalogs are interviewed via a browser in the past visiting a web page. After the URL is on the black catalog, the connection is broken or warning. Of sequence, to be capable to craft and uphold this sort of blacklist, automated detection mechanisms are demanded that may to find on the web webpage encompassing malicious content

In analogy Virus completed [5] is a free online capability that analyzes malicious records and URLs. It facilitates the speedy detection of viruses, worms, Trojans, and all varieties of malware. Services stores all analyzes grasped out, that allows for users to find for a given describing MD5, SHA1, SHA256 or URL. Replies researches revisit the final study on the resource of curiosity. The capacity moreover allows for you to seek out the comments customers put up on records and URLs, examine our passive DNS information and reclaim data features of the threat on areas and IP addresses. Elucidate extra involving study alongside the service. Virus finished aftermath tagged alongside adware as much as fifty three%.

Search engines mostly index huge numbers of information, and as such may also be believed as particularly complete repositories of skills. Later the heuristic delineated above, we assistance to use the in finding aftermath themselves to larger appreciate supplementary enchantment for interpretation. Using relevance suggestions nickname paradigm, and take delivery of the in finding aftermath to be significant to the question. Surely, no longer all aftermath are as primary, so hackers can use complex electing preparations to acquire trustworthy imaginative and prescient concerning the question to sort out public user.

II. RELATED WORK

Large scrutiny has been finished in the globe of computer security for the detection of understood and unfamiliar malware maintaining disparate contraption discovering and data excavating systems. The authors utilized two static features eliminated from malware and benign multimedia, purpose size Frequency (FLF) and Printable Thread information (PSI) [6]. This work was once instituted on the hypothesis that “though aim calls and strings are self-governing of each single solitary supplementary authors underpin every single solitary supplementary in categorizing malware”. Disassembly of the entire examples was completed retaining IDA professional and FLF, PSI points were eliminated keeping Ida2DB.

In the work, purpose length is the number of bytes of design within the perform. Frequencies of all purpose lengths for all malware was computed and allotted within the exponential interval scopes after completed it came as much as be 50 intervals. Printable Thread information in every single solitary unpacked malware used to be removed and all the strings for all malware have been joined to craft a database. A dataset was once crafted alongside these strings as 5 facets that accord binary valued at of whether or not a unique malware encompassed this thread or now not. All strings alongside minimum size of three have been selected. With the selected points thirteen disparate datasets were crafted for 13 disparate malware households and benign applications. And the authors utilized 5 classifiers; Naive Bayes, SVM, Random forest, IB1 and decision table. Great aftermath was once acquired by AdaBoostM1 alongside selection desk alongside an accuracy fee of ninety eight.86%. It used to be additionally famous that the aftermath received with the aid of joining each aspects have been supplementary satisfactory than keeping each single solitary type of aspects individually. Schultz et al. [7] utilized disparate information excavating ways to detect unfamiliar malware. The authors amassed four,226 methods of that 2.365 have been malicious and 1,001 were benign.

In the selected data there have been 206 benign executables and 38 malicious executables had been in PE structure. Static points from each single solitary solitary design were eliminated retaining three procedures; Binary profiling, Strings and Byte sequences. Binary profiling was once commanded to in simple terms PE files and supplementary approaches were utilized for all applications. Binary profiling used to be utilized to put off three sorts of aspects;

[1] Catalog of shiny hyperlink Libraries utilized by way of the PE,

[2] intention calls created from every single solitary bright hyperlink Library and three) first-class purpose calls in each single solitary DLL.

(3)

IJEDR1604091

624

With RIPPER they came to be accuracies of 83.Sixty two%, 89.36%, and 89.07% suitably for datasets alongside facets DLLs utilized, DLL purpose calls and high-quality Calls in DLLs. The accuracies obtained alongside Naïve Bayes and Multi-Naïve Bayes have been ninety seven.11% and ninety six.88%. The end result using Signature process got here to be 49.28% accurate. Multi-Naïve Bayes produced greater aftermath contrasted to the supplementary methods.

The info in PE headers was utilized for the detection of malware[11]. The work was instituted on the belief that there must to be change within the traits of PE headers for malware and benign multimedia as both were industrialized for disparate purposes. 1908 benign and 7863 malicious executables have been amassed. The malware examples encompassed viruses, e-mail worms, trojans and backdoors. PE headers of all of the records had been dumped conserving a design yelled DUMPBIN. Each single solitary header (MS DOS header, file header, discretional header and assisting headers) within the PE used to be relied on as a probable attribute.

For each single solitary malware and benign design locale and entry advantages of every single solitary attribute had been calculated. For the reloc aiding in the PE, the decision of whether a malware encompassed that aiding or no longer was famous down. Every single solitary earth in the dataset used to be adjusted to binary valued at in the attribute linearization system. Unimportant and redundant characteristics had been eradicated in the consecutive step. Unimportant qualities are those that were gift in in basic terms one executable. Redundant features have been those gift in all executables.

In parallel, attribute decision was once endowed keeping Prop Vector Machines. The manufacturing dataset used to be confirmed alongside SVM classifier keeping 5-fold go validation. Accuracies of ninety eight.19%, 93.96%, 84.11% and 89.Fifty four% had been bought for virus, email worm, trojans and backdoors respectively. The detection premiums of viruses and e mail worms had been accelerated contrasted to the detection rates of Trojans and backdoors. In Kolter et al. [10], numerous byte sequences from the executables had been used. T he authors accrued 1971 clean and 1651 malicious executables. All of them had been in PE format. Hexadecimal design for every single solitary solitary executable was once acquired through conserving hex dump [10]. From that design numerous bytes in sequence have been joined to produce n-grams. Coaching data was once synchronized alongside the removed n-grams as binary features. Most vital features had been selected by way of computing the information gain for every single solitary function. As a outcome 500 facets have been chosen. Numerous information excavating methods like IBk, TFIDF, naive Bayes, Prop Vector Mechanisms (SVM) and selection timber commanded to supply laws for categorizing malware. The authors additionally utilized “Boosted” naïve bayes, SVM and determination tree newbies. Three examinations had been grasped on the info. In the most important examination, size of the words, dimension of n-grams and quantity of points correct for the examinations have been assessed. From the subset of executables, n-grams had been eliminated alongside n=4. Numerous information excavating examinations had been grasped to find the premiere size of n-grams with the aid of fluctuating the subset dimension (10, 20, a hundred, 1000 and so on).

First-class aftermath was once acquired alongside the dimensions of 500. By means of fixing the scale to 500, n in n-grams used to be different and the aftermath were supplementary targeted alongside n=four. Within the consecutive examination, out of sixty eight,774,909 n-grams, 500 first-class n-grams have been chosen and commanded 10-fold go validation in every single solitary organization procedure. In the third examination, 255 million n-grams had been removed from the all the executables and t the 8 related techniques had been pursued as in consecutive test. The boosted classifiers, SVM and IBk produced good aftermath contrasted to the supplementary ways. The presentation of classifiers was once better with the aid of boosting and the finished presentation of the entire classifiers used to be greater alongside the large dataset contrasted alongside the puny dataset.

Dmitry and Igor [11], utilized positionally reliant points in the Early Entry factor (OEP) of a file for noticing unfamiliar malware. In the work, authors utilized 5854 malicious and 1656 benign executable in WIN 32 PE format. Varied data excavating algorithms like choice table, C4.5, Random woodland, and Naive Bayes have been commanded on the synchronized dataset. Three assumptions have been made for the work.

[1] Studying the entry point of the design understood as Early Entry factor (OEP) reveals supplementary precise know-how. [2] The locale of the byte valued at of OEP address was set to zero. And the offsets for all of the bytes in OEP was trusted to

be in the scope

[3] Simply a solitary byte can also be elucidating for every single solitary locale worth. So the scope for Byte in locale worth is from 0 to 255.

And in the conclude the probable quantity of facets that possibly utilized for organization used to be 65536. The dataset encompassed three elements; function id, Locale and Byte in position. Function decision used to be endowed to put off supplementary momentous features. The facets removed in this % have been instituted on the dependencies amid points data reap and the core parts of the elements. The producing data used to be established opposite all classifiers and the aftermath have been contrasted instituted on ROC-discipline.

(4)

IJEDR1604091

625

malicious deeds from the dependency graphs and commanded this algorithm to the e-mail worm Bagle.J, a variant of Bagle malware.

Easy and malicious documents have been endowed within the manipulated nature, traces of arrangement calls have been eliminated for each single solitary instance as execution. Dependency graph was once crafted holding arrangement calls and the argument dependencies. In the graph, each single solitary node denotes a arrangement call and its arguments; every single solitary frontier denotes dependency amid arguments of the 2 association calls. A sub graph was eliminated from the malware dependence graph by contrasting alongside benign multimedia dependence graph such that it above all specifies the malware conduct.

A brand new file alongside these requisites has got to be categorized as malware. Virus prevention Immaculate (VPM) to detect unfamiliar malware conserving DLLs used to be commanded by way of Wang et al. [13] within the work. 846 malicious and 1,758 benign records in handy executable structure were gathered. All records have been parsed by using a design “dependency walker” that displays the entire DLLs utilized in a tree structure. Three types of characteristics T1, T2 and T3 have been derived from the producing tree. T1 is the catalog of APIs utilized with the aid of primary design undeviatingly, T2 suggests the DLLs implored by way of supplementary DLLs supplementary than essential design and T3 is the connections amid DLLs that consists of dependency tracks down the tree. On the conclude, ninety three,116 qualities had been got. The features alongside low information achieve were eliminated. Supplementary characteristic reduction was finished via maintaining L-SVM. As a consequence, 10 429 important traits have been selected and verified the dataset alongside RBF-SVM classifier holding 5-fold go validation. The detection price alongside RBF-SVM classifier was once ninety nine.00% alongside actual Affirmative expense of 98.35% and false Affirmative cost of 0.Sixty eight%. A similarity compute procedure for the detection of malware used to be counseled by way of Chanted et al. [14] instituted on the hypothesis that, editions of a malware have the similar core signature that may be a blend of elements of the variants of malware. To provide versions for disparate traces of malware, instituted obfuscation ways have been used. Generated editions had been proven opposite eight disparate antivirus products. Four virus strains W32.Mydoom, W32.Blaster, W32.Beagle and Win32.Wika were utilized on this method.

The brand new malware traces got from obfuscation have been labeled into 5 forms; null approach and dead design insertion, knowledge change, manipulation float amendment, knowledge and manipulation glide change, and pointer aliasing. The basis design of each single solitary PE was once parsed to supply API yelling sequence and the sequence used to be trusted as signature for that file. Every single solitary API call used to be given an integer identity. The sequence of API calls was once embodied via corresponding sequence of IDs. The consequence in sequence was once contrasted alongside the predominant malware sequence to produce similarity measure. The similarity measures had been computing protecting Euclidian Distance, sequence alignment and disparate similarity objectives encompassing cosine compute, scope Jacquard compute and Pearson correlation measure.

An average valued at of all the measures used to be computed for each single solitary solitary signature. The biggest index within the similarity desk denotes to that most important malware the unique variant belongs. By contrasting that worth alongside a threshold the nature of the file, benign or malicious was made up our minds. The detection fee of keep was once considerably bigger than antivirus scanners. A stress of Nugache worm used to be reversed with a view to notice its underlying design, deeds and to have an understanding of attacker’s process for discovering vulnerabilities in a association [15]. In supplement to that, the authors moreover reverse engineered 49 malware executables in a remote nature, removed different points like MD5 hash, printable strings, quantity of API calls made, DLLs accessed and URL referenced. Keeping these points they synchronized a dataset. Because of the multi dimensional nature of the dataset, a contraption discovering instrument, BLEM2 [16] instituted on hard set concept was utilized to provide brilliant outlines that have got to assistance in categorizing an unfamiliar malware.

As the size of the dataset was puny, a tremendously insufficient number of selection laws had been generated and the aftermaths weren't enough. Instituted on bright scrutiny [17], spatial -temporal knowledge in API calls used to be utilized to discover unfamiliar malware. The recommended system contains two modules; an offline module that develops a coaching immaculate maintaining out there knowledge and a web-based module that generates a assessing set through getting rid of spatial-temporal information across the educational immaculate to categorize run interval system as whichever benign or malicious. Association logs for a hundred benign and 416 malicious systems have been amassed and 237 innate windows API calls of disparate clusters like socket, recollection organization, threads and many others had been sketched and utilized as base. In the brilliant research, spatial knowledge used to be obtained from goal call arguments, revisit advantages and have been rip into seven subsets socket, recollection 12 organization, methods and threads, file, DLLs, registry and internet association instituted on the performance.

(5)

IJEDR1604091

626

The Markov shackle had okay states and the transition possibilities amid these states have been embodied in state transition matrix T. Each single solitary transition chance was once depended on as a probable attribute. Function resolution was endowed to pick traits alongside most information reap. Within the finish they chose 500 transitions and synchronized a set alongside Boolean values. Three datasets had been crafted with the aid of becoming a member of benign techniques API draft alongside every single solitary malware variety. The three datasets have been combinations of Trojan, virus and benign-worm.

Authors grasped two experiments. Predominant one used to be to observe the joined presentation of spatio-temporal points contrasted to standalone spatial or temporal features. Consecutive examination used to be grasped to get rid of a negligible subset of API clusters 13 that offers related accuracy as from the principal experiment. For this, the authors joined API call clusters in all probable approaches to seek out the negligible subset of clusters that have got to provide comparable organization fee as obtained in major scan. From the major examination, the authors got ninety eight% accuracy alongside naive bayes and ninety four.7% accuracy alongside J48 determination tree and they got here to be bigger aftermath alongside joined features contrasted standalone elements. The detection price of Trojans used to be much less contrasted to viruses and worms. In the consecutive examination, mixture of API calls related to recollection association and file I/O produced quality aftermath alongside an accuracy of ninety six.6%.

The work was instituted on an assumption that, deeds of a malware will also be uncovered completely by bestowing it and discerning its aftermath on the working nature. For this venture, the association that used to be industrialized grasped encompassing registry concentration, files arrangement awareness, net attention, API Calls made, DLLs accessed for each single solitary solitary executable via jogging them in a far flung environment. Conserving the eliminated elements from the reverse engineering procedure, we synchronized three datasets. To those datasets, we commanded information excavating algorithms C4.5, Naïve Inlets and a hard set instituted instrument BLEM2 to provide organization rules and contrasted the outcome. In a tiny of the above remarked works [16] only static aspects like byte sequences, printable strings and API call sequences were used. Nevertheless able in noticing malware, they have to be ineffective if the attackers use obfuscation approaches to encompass malware. To become aware of this setback, a tiny supplementary works.

Malicious URLs Detection and Classification

The previous work has contemplated on the setbacks of behavioral malware detection [3], [4], this research makes a specialty of the setback of classification. That is, since the realize have seen a brand new malware illustration, is there continue a approach that can rapidly and precisely categorize the malware with the intention to detect its cluster (e.G., Trojan, origin kit, worm) or relations (e.G., ZBot, Fynloski, Kelihos)? The work will reward the design and experimental evaluation of a web based, host-established malware association arrangement, making the pursuing contributions

III. PROPOSED WORK

Detection of Malicious content on-line has end up more and more complex due to the evolution of phishing campaigns and their efforts to preclude mitigation by using blacklists. The current state of cybercrime has made it viable for Hackers to host campaigns with shorter lifecycles, diminishing blacklist effectiveness. Also the identical time, the usual supervised studying algorithms are known to generalize good over the detailed patterns observed in training knowledge, which makes them a better substitute towards hacking campaigns. Nonetheless, the particularly dynamic atmosphere of these campaigns requires updating the units most of the time, and this poses new challenges on account that many of the typical studying algorithms are additionally computationally luxurious to retrain.

(6)

IJEDR1604091

627

1. URL Classification WithZeroR Algorithm

The first Classification Algorithm taken for classifying the Malicious URLs is ZeroR, as expected the ZeroR produces results equaling to class numbers in the data set.

Correct: 52.17391304347826% In Correct: 47.82608695652174% Confusion Matrix for ZeroR

Benign Malign

240 0

220 0

2. URL Classification With One R Algorithm

The first Classification Algorithm taken for classifying the Malicious URLs is OneR, Correct: 53.369565217391305

In Correct: 46.630434782608695 Confusion Matrix for OneR

228 12

197 23

3. URL Classification With Random Forest

The first Classification Algorithm taken for classifying the Malicious URLs is Random Forest, Random Forest Algorithm works much better than both OneR and ZeroR algorithm. Random forest of 10 trees, each constructed while considering 10 random features.

Out of bag error: 0.4835 Correct: 66.08695652173913 In Correct: 33.91304347826087 Confusion Matrix for OneR

240 0

39 181

(7)

IJEDR1604091

628

IV. CONCLUSION

URL association is an vital data retrieval task. Precise association of find queries benefits a number of higher-level tasks such as Web find and ad matching. As find queries are normally short, they normally hold insufficient data for adequate association accuracy. To address this setback, the research counseled a methodology for employing find aftermath as a basis of external knowledge.

In upcoming, for the intention of the discover in malicious URL detection and association a query will be dispatched to a finished web find engine, and amass a number of the highest-scoring URLs. The arrangement scuttle the Web pages pointed by these URLs, and categorize these pages into Benign or malign.

V. FUTURE SCOPE

Given that the URL may be a present feature of sites, we tend to study however they will be maximally leveraged for classification tasks. we've extended previous work and further features to model URL element length, content, writing system, token sequence and precedence. we tend to value the utilization of those features over an outsized set of tasks together with connection, categorization and prediction. Our Results indicate URL features perform well on classification tasks, on par with or surpassing full-text and anchor text approaches in sure cases. Our technique additionally outperforms earlier results mistreatment URL features that utilized specialised learning algorithms; in distinction, we tend to use generic most entropy modeling because the supervised machine learning framework. In future we tend to introduce and take a look at a multiclass classification technique aimed toward finding malicious URL on the idea of some data each on the URL syntax and its domain properties. We might also on the category of unsupervised Machine Learning models, Although utilizing all feature categories introduced during this work is best in most cases, our analysis indicates sure feature categories correlate higher to some tasks than others. Several of our fresh introduced features perform well on long URLs, generally found in Associate in computer network setting. These features don't perform furthermore with typical electronic computer entry points (i.e., simply the domain name), as they conceive to leverage the inner path structure of the URL. Future work must be done to additional improve these features and to explore their correlations to seek out best sets for specific tasks. Classification by URL features has alternative blessings apart from period of time efficiency: all sites have URLs, notwithstanding whether or not they exist, ar accessible, have incoming links or have any text (some sites are comprised only of image maps). On the opposite extreme, several pages have too several words that contribute noise. using text summarization to sites has already shown to assist during this method and may be a promising avenue for future work.

REFERENCES

[1] Apte, Jitendra, and Marina Lima Roesler. "Interactive multimedia advertising and electronic commerce on a hypertext network." U.S. Patent No. 7,225,142. 29 May 2007.

[2] Ravula, Ravindar Reddy. Classification of Malware using Reverse Engineering and Data Mining Techniques. Diss. University of Akron, 2011.

[3] Ghosh, Anup K., and Tara M. Swaminatha. "Software security and privacy risks in mobile e-commerce." Communications of the ACM 44.2, 2001

[4] Ma, Justin, et al. "Beyond blacklists: learning to detect malicious web sites from suspicious URLs." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.

[5] "Gap between Google Play and AV vendors on adware classification", HispasecSistemas, S. L. "Virustotal malware intelligence service." 2011.

[6] Tian, Ronghua, Lynn Margaret Batten, and S. C. Versteeg. "Function length as a tool for malware classification." Malicious and Unwanted Software, 2008. MALWARE 2008. 3rd International Conference on. IEEE, 2008.

[7] Ye, Yanfang, et al. "An intelligent PE-malware detection system based on association mining." Journal in computer virology 4.4 (2008): 323-334.

[8] William Cohen. Learning Trees and Rules with Set-Valued Features. American Association for ArtiJicial Intelligence (AMI), 1996.

[9] Tzu-Yen Wang, Chin-Hsiung Wu, Chu-Cheng Hsieh, "Detecting Unknown Malicious Executables Using Portable Executable Headers," ncm, pp.278-284, 2009 Fifth International Joint Conference on INC, IMS and IDC, 2009.

(8)

IJEDR1604091

629

[11] Malware Detection by Data Mining Techniques Based on Positionally Dependent Features., DmitriyKomashinskiy, Igor

[12] M. Christodorescu, S. Jha, and C Kruegel, “Mining specifications of malicious behavior,” in Proc. ESEC/FSE-2007, pp. 5–14.

[13] A Virus Prevention Model Based on Static Analysis and Data Mining Methods TzuYen Wang; Chin-Hsiung Wu; Chu-Cheng Hsieh; CITWORKSHOPS '08 Proceedings of the 2008 IEEE 8th International Conference on Computer and Information Technology Workshops., Publication Year: 2008 , Page(s): 288 - 293

[14] A. Sung, J. Xu, P. Chavez, and S. Mukkamala, “Static analyzer of vicious executables (save),” in Proc. 20th Annu. Comput. Security Appl. Conf., 2004, pp. 326– 334.

[15] Malware Analysis Using Reverse Engineering and Data Mining Tools, Burji, S., Liszka, K. J., and Chan, C.-C., The 2010 International Conference on System Science and Engineering (ICSSE 2010), July 2010, pp. 619-624.

[16] Chan, c.-c. and S. Santhosh, "BLEM2: Leaming Bayes' rules from examples using rough sets," Proc. NAFIPS 2003, 22nd Int. Conf. of the North American Fuzzy Information Processing Society, July 24 - 26, 2003, Chicago, Illinois, pp. 187-190.