International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 3, Issue 4, April 2013)
Use of Web Mining in Network Security
Rimmy Chuchra 1, Bharti Mehta 2, Sumandeep Kaur 3
1 Asst. Prof. (CSE), Sri Sai Institute of Engg. and Technology, Mannawala Campus (Amritsar); 2,3 M.Tech (CSE), Yadwindra College of Engg., Talwandi Sabo
Abstract--Web mining is knowledge discovery from the World Wide Web (WWW). This practical application of data mining helps integrate data gathered by traditional data mining techniques with data gathered from the WWW. The term web mining is used in three distinct ways: web content mining, web usage mining and web structure mining. In this paper we use web structure mining, which applies graph theory to analyze the node and connection structure of a website and extracts patterns from hyperlinks, where a hyperlink connects a web page to another location on the same or a different web page. In web structure mining, each hyperlink acts as a structural component. We merge web mining with network security so that online attacks on the network can be detected by web agents (i.e., web robots) rather than by manual effort. The main objective is to reduce the cost and time of identifying an online attack. We use the “rule induction” data mining technique to maximize the accuracy of the results. The focus is on detecting online active attacks: web agents first identify the type of active attack and then provide security using various mechanisms and techniques. In this way, web agents help protect users from attackers during online data transfer. We call this combination of web mining and network security a “hybrid approach”; its main benefit is saving time and cost, which are major objectives of data mining.
Keywords--Rule Induction, Web mining, Electronic reconnaissance attack, Web agents (web robots), active attacks, denial of service.
I. INTRODUCTION
Web mining helps to extract useful information from web pages. Various web mining techniques are used to extract knowledge from web data, web documents and the hyperlinks between documents. The web is a universal information space that can be accessed by companies, universities, businesspeople and others. It holds numerous sources of information, which can be grouped into internal sources and external sources.
Internal sources contain an organization's own information, and external sources contain information about clients, vendors, suppliers, intranets, extranets and so on. The main significance of web mining is that it improves the efficiency and effectiveness of decision making. In this paper, we divide web mining into three categories:
a) Web Structure mining.
b) Web Usage mining.
c) Web content mining.
Web Structure mining: - It models web pages as nodes and hyperlinks as edges connecting related pages. It describes the structural layout of the web and uses the connectivity among websites, which is provided by “hyperlinks”. Hyperlinks are divided into two categories:
Internal hyperlinks, which lead to locations within the same web page.
External hyperlinks, which lead to other web pages.
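The node-and-edge view described above can be illustrated with a minimal Python sketch. It follows the definition given here (a link is internal when it points inside the same page, external otherwise); the function names `classify_links` and `build_graph` and the sample URLs are hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin, urldefrag

def classify_links(page_url, hrefs):
    """Split a page's hyperlinks into (internal, external) lists.

    Assumption taken from the text: a link is internal when it points to a
    location inside the same page, and external when it leads to another page.
    """
    base, _ = urldefrag(page_url)
    internal, external = [], []
    for href in hrefs:
        # Resolve relative links, then drop the "#fragment" part.
        target, _fragment = urldefrag(urljoin(page_url, href))
        if target == base:
            internal.append(href)
        else:
            external.append(href)
    return internal, external

def build_graph(pages):
    """Build an adjacency list: page URL (node) -> linked page URLs (edges)."""
    graph = {}
    for page_url, hrefs in pages.items():
        _, external = classify_links(page_url, hrefs)
        targets = {urldefrag(urljoin(page_url, h))[0] for h in external}
        graph[page_url] = sorted(targets)
    return graph

# Hypothetical two-page site used only as sample input.
pages = {
    "http://example.com/a.html": ["#top", "b.html", "http://other.example/x.html"],
    "http://example.com/b.html": ["a.html#intro"],
}
graph = build_graph(pages)
```

Only external hyperlinks become edges in this sketch, because an internal hyperlink stays on the same node.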
Document structure is described by a schema language for XML, which defines what counts as a valid XML document.
Web Usage mining: - It captures the knowledge discovered from users navigating through websites. It maintains a repository of all records of their requests in log files. The log data is divided into two categories:
Application Server Data: It holds business transactions and records them in the application server log.
Web Server Data: These logs are produced by the web server. They include fields such as the IP address, the web pages accessed and the access times.
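As a rough illustration of web server log data, the sketch below groups the pages requested by each IP address together with their access times. It assumes the logs are in the widely used Common Log Format, which the paper does not specify; the sample log lines and the name `pages_per_ip` are invented for the example.

```python
import re
from collections import defaultdict

# Assumed layout (Common Log Format):
# ip ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def pages_per_ip(log_lines):
    """Map each client IP to the list of (access time, page) it requested."""
    visits = defaultdict(list)
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        visits[match["ip"]].append((match["time"], match["path"]))
    return dict(visits)

# Invented sample entries, including one malformed line.
sample_log = [
    '192.0.2.10 - - [10/Apr/2013:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.10 - - [10/Apr/2013:13:56:01 +0000] "GET /about.html HTTP/1.0" 200 1045',
    '198.51.100.7 - - [10/Apr/2013:13:57:12 +0000] "GET /index.html HTTP/1.0" 200 2326',
    'malformed line',
]
usage = pages_per_ip(sample_log)
```

A usage mining step would then look for patterns in `usage`, for example the number of pages each address visited.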
The main effort of web content mining is to organize semi-structured web data into a structured collection of resources and to obtain effective results. It uses various approaches, such as the agent-based approach and the database approach.
Figure 1: Classification of Web Mining
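Turning semi-structured web data into a structured record can be sketched with Python's standard `html.parser` module. This is only an illustration of the idea, not the agent-based or database approach itself; the record fields (`title`, `headings`, `links`) and the class name `PageRecordParser` are assumptions made for the example.

```python
from html.parser import HTMLParser

class PageRecordParser(HTMLParser):
    """Collect a page's title, top-level headings and link targets."""

    def __init__(self):
        super().__init__()
        self.record = {"title": "", "headings": [], "links": []}
        self._current = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2"):
            self._current = tag
            if tag != "title":
                self.record["headings"].append("")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.record["links"].append(href)

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.record["title"] += data.strip()
        elif self._current in ("h1", "h2"):
            self.record["headings"][-1] += data.strip()

def extract_record(html):
    """Return a structured dictionary describing one HTML page."""
    parser = PageRecordParser()
    parser.feed(html)
    parser.close()
    return parser.record

# Invented sample page.
html = (
    "<html><head><title>Web Mining</title></head><body>"
    "<h1>Overview</h1><p>See <a href='usage.html'>usage</a>.</p>"
    "</body></html>"
)
record = extract_record(html)
```

Records of this shape could then be stored in a database and queried like any other structured data.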
Network security measures are needed to protect data during transmission. Organizations typically interconnect their data processing equipment through a collection of interconnected networks. Such a collection is often referred to as an internet, hence the term “Internet security”. Our main objective is to protect data from attackers during online data transfer. Attacks are either active or passive. Active attacks are further categorized as replay, masquerade, modification of messages and denial of service. Passive attacks are categorized as traffic analysis and release of message contents. In this paper, we discuss active attacks, which are detected by web agents (i.e., web robots).
An attacker can easily gain entry when a user clicks on an attractive hyperlink. Here, we discuss electronic reconnaissance attacks.
To identify which systems and resources are on the network, an attacker must perform an “Electronic reconnaissance attack” (ERA). In some cases the attacker already holds complete information about the target network and can easily locate the organization's resources. Once the IP (Internet Protocol) address is known, the attacker can start scanning and probing the network. Scanning uses a “ping sweep utility”, which pings a range of IP addresses to find out which hosts are currently live on the network. Probing gathers additional information about those hosts, such as the operating system or the applications running on them. It is accomplished by looking for open ports on the available host computers. When a port is open, the attacker can find out which services are running on the computer and can use that information to discover the operating system and the application servicing the port. Web agents (i.e., web robots) can identify an attack from symptoms such as the unavailability of a particular website, the inability to access any website, unusually slow network performance, or a dramatic increase in the amount of spam received in an account.
In this paper, we merge two broad areas: network security and web mining. Using web mining, web agents (i.e., web robots) can discover knowledge about the attacker from the World Wide Web (WWW) during online data transfer. The main benefit of this combined approach is that it saves time as well as cost: when web agents detect the attacker, no human effort is needed. Web agents first identify the type of attack that is occurring and then provide security using various mechanisms and techniques.
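One way a web agent might recognize the probing step is to watch for a single source that touches many different ports in a short time. The sketch below is a hypothetical detector, not the mechanism used in the paper; the event layout `(timestamp, source_ip, port)`, the thresholds and the function name `find_probing_sources` are all assumptions.

```python
from collections import defaultdict

def find_probing_sources(events, port_threshold=10, window_seconds=60):
    """Return source IPs that contact many distinct ports in a short window.

    events: iterable of (timestamp_seconds, source_ip, destination_port).
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    by_source = defaultdict(list)
    for timestamp, source_ip, port in sorted(events):
        by_source[source_ip].append((timestamp, port))

    suspects = set()
    for source_ip, hits in by_source.items():
        start = 0
        for end in range(len(hits)):
            # Shrink the window until it spans at most window_seconds.
            while hits[end][0] - hits[start][0] > window_seconds:
                start += 1
            distinct_ports = {port for _, port in hits[start:end + 1]}
            if len(distinct_ports) >= port_threshold:
                suspects.add(source_ip)
                break
    return suspects

# Invented traffic: one fast scanner, one ordinary client, one slow scanner.
events = (
    [(t, "203.0.113.5", 20 + t) for t in range(15)]
    + [(t * 5, "192.0.2.10", 80) for t in range(20)]
    + [(t * 100, "198.51.100.9", 1000 + t) for t in range(15)]
)
suspects = find_probing_sources(events)
```

With these thresholds only the fast scanner is flagged; a slow scan spread over a long period slips through, which shows why the window and threshold would need tuning in practice.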
II. OUR CONTRIBUTION
In this research paper, we proposed a “hybrid
We can also use this proposed concept in e-commerce applications; for example, in the banking sector, web agents can detect an attacker during online money transfer by using suitable methods and techniques. Here, we use the “Rule Induction” technique of data mining, whose syntax is given below:
IF Condition THEN Class.
i.e., IF Attack Status=Enable THEN Call=Web Agents.
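The IF-THEN form above maps directly onto a small data structure. The following Python sketch is illustrative only: the `Rule` class, its field names and the `apply_rule` helper are hypothetical, while the attribute, value and outcome strings are taken from the example rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """A single IF attribute=value THEN outcome rule."""
    attribute: str
    value: str
    outcome: str

    def fires(self, record):
        # The rule fires when the record has the required attribute value.
        return record.get(self.attribute) == self.value

def apply_rule(rule, record):
    """Return the rule's outcome if it fires on the record, else None."""
    return rule.outcome if rule.fires(record) else None

# The example rule from the text.
rule = Rule(attribute="Attack Status", value="Enable", outcome="Call=Web Agents")

called = apply_rule(rule, {"Attack Status": "Enable"})
ignored = apply_rule(rule, {"Attack Status": "Disable"})
```

Here `called` holds the outcome string and `ignored` is `None`, since the condition is not met for a disabled status.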
The main purpose of the “Rule Induction” technique is to achieve maximum accuracy and thus better results. It can be implemented as follows:
Table 1: Naming conventions used in the rule induction method.
Abbreviation    Meaning
WA              Web Agent
AA              Active Attack
OnM             Online Mode
S               Status
E               Enable (value is 1)
D               Disable (value is 0)
R               Rule
For each class WA:
    Initialize AA (the set of active-attack examples) to the set of all examples.
    While AA contains examples in class WA:
        Create a rule R with an empty L.H.S. that predicts class WA.
        Until R is 100% accurate (or there is no more status to use) do:
            For each status S not in R and each mode M (online mode, OnM):
                Consider adding the condition S=M (a Status_Mode pair) to the L.H.S. of R.
            Select the S and M for which the attack status is Disable and which maximize the rule accuracy and the coverage of the Status_Mode pair.
            Add S=M to R.
        Remove the examples covered by R from AA.
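The procedure above resembles the PRISM covering algorithm for rule induction. The following Python sketch shows that general covering loop under stated assumptions: each candidate condition is scored by accuracy with ties broken by the number of covered positive examples, examples are dictionaries, and the class labels `WA` and `NoAction` are invented for the sample data. It is a sketch of the general technique, not the authors' implementation.

```python
def covers(conditions, example):
    """True if the example satisfies every attribute=value condition."""
    return all(example[attr] == value for attr, value in conditions.items())

def induce_rules(examples, target_class, class_key="class"):
    """Covering-style rule induction for one target class.

    Returns a list of rules; each rule is a dict of attribute -> value
    conditions that together predict target_class.
    """
    remaining = list(examples)
    attributes = [k for k in examples[0] if k != class_key]
    rules = []
    while any(e[class_key] == target_class for e in remaining):
        conditions = {}       # empty left-hand side
        covered = remaining
        # Grow the rule until it is 100% accurate or no attribute is left.
        while (any(e[class_key] != target_class for e in covered)
               and len(conditions) < len(attributes)):
            best = None  # (accuracy, positives, attribute, value)
            for attr in attributes:
                if attr in conditions:
                    continue
                for value in sorted({e[attr] for e in covered}):
                    subset = [e for e in covered if e[attr] == value]
                    positives = sum(e[class_key] == target_class for e in subset)
                    candidate = (positives / len(subset), positives, attr, value)
                    if best is None or candidate[:2] > best[:2]:
                        best = candidate
            _, _, attr, value = best
            conditions[attr] = value
            covered = [e for e in covered if e[attr] == value]
        rules.append(conditions)
        # Remove the examples covered by the finished rule.
        remaining = [e for e in remaining if not covers(conditions, e)]
    return rules

# Invented sample data using the Status and Mode attributes from Table 1.
examples = [
    {"Status": "Enable", "Mode": "Online", "class": "WA"},
    {"Status": "Enable", "Mode": "Online", "class": "WA"},
    {"Status": "Disable", "Mode": "Online", "class": "NoAction"},
]
rules = induce_rules(examples, "WA")
```

On this sample the condition Status=Enable is already 100% accurate, so the single induced rule matches the form IF Attack Status=Enable THEN Call=Web Agents. Mode never enters the rule because every example shares the same value.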
Because the mode is always online, there are only two possible Status_Mode pairs:
Case 1: Status=Enable, Mode=Online.
Case 2: Status=Disable, Mode=Online.
Description: When the status is Enable and the mode is Online, data is being transferred from the source to the destination. When the status is Disable and the mode is again Online, no data is being transferred between the source and the destination.
Research Design
III. CONCLUSIONS
In this manner, manpower is reduced and nobody needs to be paid, which saves cost. Ultimately, this also achieves the objective of data mining.
IV. FUTURE SCOPE
In future, this work will be extended by implementing the concept with an OLAP (on-line analytical processing) tool. We can also look for mechanisms or techniques to identify passive attacks that occur on the web. For example, when a user wants to visit a web page, he or she must first sign up on that page by submitting a username and password; later, an attacker may try to break that password. We therefore have to design mechanisms to handle such attacks. Two cases of passive attack in particular, traffic analysis and release of message contents, will be discussed, with detection also performed by web agents.
Acknowledgement