A Model For Analysis Most Visited Web Page For
Web Usage Mining
Jayanti Mehra , Dr. R S Thakur
Abstract: Weblog analysis takes raw data from the access log and studies it to extract statistical information. This information covers a variety of data on website activity, such as the average number of hits, the total number of user visits, failed and successful cached hits, average viewing time, and average path length over a website; analytical information such as page-not-found errors and server errors; server information, which includes exit and entry pages, single-access pages, and top visited pages; and requester information, such as which search engines are used, keywords, and top referring sites. In general, the website administrator uses this kind of knowledge to improve system performance, to help in modifying the site, and to support marketing decisions. Most advanced Web mining systems use this kind of information to derive more complex interpretations through data mining procedures such as association rules, clustering, and classification.
Keywords: Web Usage Mining, Data Preprocessing, Site Modification, Weblog Expert tool, Top Visited Web Pages.
————————————————————
1. INTRODUCTION
Nowadays the internet has become a convenient source of information in everyone's daily activity. The World Wide Web has gone through enormous development in the last two decades, but its rate of change and its extent have increased the difficulties faced by websites [1]. To fulfill the demands of their users, e-commerce websites are progressing quickly, hence their importance is obvious [2]. Because of its several benefits, web research is of considerable interest to organizations [3]. It has helped to improve market profitability and market intelligence, and it also helps in marketing and comparative analysis for understanding customer relationships. Web data are organized, assembled, and structured through the clients' profiles. This helps organizations retain current clients by offering more customized services, and it also contributes to finding potential clients [4].
A. Process of finding most desired web pages
Once the raw data is cleaned, the frequency of each page is counted. The frequency of a page can be defined as the number of times the page appears in a particular log file [5]. The block diagram of the frequency count process for each web page in web log analysis is shown in Fig. 1.
Fig. 1: Process for frequency count of web pages for web log analysis
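The frequency count step described above can be sketched in a few lines. This is only a minimal illustration, assuming the cleaned log has already been reduced to one page path per record; the function name `page_frequencies` and the sample paths are hypothetical and do not come from the paper.

```python
from collections import Counter

def page_frequencies(requests):
    """Count how many times each page appears in a cleaned request list."""
    return Counter(requests)

# Hypothetical cleaned requests: one page path per log record.
cleaned = [
    "/index.html", "/5.html", "/index.html",
    "/167.html", "/index.html", "/5.html",
]

freq = page_frequencies(cleaned)

# The most desired pages are the ones with the highest counts.
top_pages = freq.most_common(2)
print(top_pages)  # [('/index.html', 3), ('/5.html', 2)]
```

A counter keyed by page path keeps the step linear in the number of log records, which matters for logs with hundreds of thousands of hits.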
B. Roles of most desired web pages
Statistics are critical for any website administrator who wants useful information about the site's contents, such as the number of hits by each client on a particular host, page, or item of content [6]. The number of hits can be added up from the valid lines in the weblog records; these lines can cover browsing, downloading, and posting activities carried out by various clients.
The output of this stage can be turned into suitable tasks such as:
1. Restructure or re-create the website.
2. Improve access through comparative practice.
3. Investigate and check items such as the most accessed URLs, pages, and content.
4. Monitor clients' activity on a specific page.
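Monitoring clients' activity on a specific page amounts to counting hits per client and page. The sketch below is one possible way to do this, assuming each valid log line has already been reduced to a (host, page) pair; the helper names and the sample hosts are hypothetical.

```python
from collections import Counter

def hits_by_host_and_page(records):
    """Count hits for each (host, page) pair."""
    return Counter((host, page) for host, page in records)

def hits_for_page(counts, page):
    """Return a {host: hits} mapping restricted to one page."""
    return {host: n for (host, p), n in counts.items() if p == page}

# Hypothetical (host, page) pairs taken from valid log lines.
records = [
    ("a.example", "/index.html"),
    ("a.example", "/index.html"),
    ("b.example", "/index.html"),
    ("a.example", "/5.html"),
]

counts = hits_by_host_and_page(records)
index_hits = hits_for_page(counts, "/index.html")
print(index_hits)  # {'a.example': 2, 'b.example': 1}
```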
[Fig. 1 block diagram labels: Pages; Raw Web Log Data in Different Sources; Data Cleaning; User Identification; Session Identification; Preprocessed Data; Weblog Server; Data Preprocessing; Frequency Count Analysis Method; Most Desired Web Pages]
Jayanti Mehra, M.C.A., Maulana Azad National Institute of Technology, Bhopal, India, mehra.jayanti109@gmail.com
The statistics are constructed here to extract useful information such as the hit count of each page of the site, which depends on the content of the page, the number of valid hits on the page, and some other information. Fig. 2 shows the structure of weblog data [7]. As the information on the WWW is growing exponentially, finding the relevant information according to the user's interest and need is a challenging issue [8]. The user is presented with a number of URLs in which to locate what he needs, so more time and effort are required to obtain the required information. Finding the most desired web pages is the solution to this problem [18]. In many commercial applications, website attractiveness is a crucial feature from the business perspective, so the website structure, i.e. the organization of the web pages, needs to be improved [9].
Web usage mining extracts knowledge from users' behavior and helps the website designer to modify the website design. Earlier work presented an approach for adaptive websites that automatically improves web structure organization by mining web usage logs from the web server; its authors presented a cluster mining algorithm known as Page Gather for the mining step [10] [19]. Web usage mining for web page access frequency consists of mainly three steps: preprocessing, pattern analysis, and total hits calculation [14] [15]. The preprocessing step mainly consists of data cleaning, user identification, and session identification. The pattern discovery step is used to identify interesting patterns in the web usage data [11] [12] [13]. When users browse a website, they search for the desired information, which is placed in particular pages [16] [20]. If the information is of common interest to different users, those pages will be accessed with a high frequency. Thus, to identify those pages, the concept of high-frequency accessed pages has been considered [12].
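The preprocessing steps named above (data cleaning, user identification, session identification) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used in the paper: users are identified by host alone, a session is closed after 30 minutes of inactivity, and requests for embedded resources or with non-2xx status codes are discarded. All names and sample records are hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")  # assumed

def clean(records):
    """Data cleaning: keep successful page requests only."""
    return [
        r for r in records
        if 200 <= r["status"] < 300
        and not r["page"].lower().endswith(IGNORED_SUFFIXES)
    ]

def identify_sessions(records):
    """User and session identification: group by host, split on long gaps."""
    sessions = []
    last_seen = {}  # host -> (time of last request, index of open session)
    for r in sorted(records, key=lambda r: r["time"]):
        host = r["host"]
        prev = last_seen.get(host)
        if prev is None or r["time"] - prev[0] > SESSION_TIMEOUT:
            sessions.append({"host": host, "pages": []})
            idx = len(sessions) - 1
        else:
            idx = prev[1]
        sessions[idx]["pages"].append(r["page"])
        last_seen[host] = (r["time"], idx)
    return sessions

# Hypothetical parsed log records.
t0 = datetime(1995, 7, 1, 0, 0, 0)
raw = [
    {"host": "a.example", "time": t0, "page": "/index.html", "status": 200},
    {"host": "a.example", "time": t0 + timedelta(seconds=5), "page": "/logo.gif", "status": 200},
    {"host": "a.example", "time": t0 + timedelta(minutes=10), "page": "/5.html", "status": 200},
    {"host": "b.example", "time": t0 + timedelta(minutes=12), "page": "/missing.html", "status": 404},
    {"host": "a.example", "time": t0 + timedelta(minutes=50), "page": "/index.html", "status": 200},
]

sessions = identify_sessions(clean(raw))
for s in sessions:
    print(s["host"], s["pages"])
```

In this sample the image request and the failed request are dropped, and the 40-minute gap before the last request opens a second session for the same host.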
2. DATA FROM WEBLOGS
Fig. 2 Snapshot of raw data from weblog
The standard dataset was taken from NASA-HTTP [21] for the performance evaluation of the proposed models. A snapshot of the raw weblog data is shown in Fig. 2. Counting the hits for every website makes it possible to recognize the website field, the most browsed web pages, the users with the most accesses, and the desired pages requested [17]. In the case of clustered websites, we calculate the experiential cooperative probability of the hit counts for each site, in order to find the correlation of sites and the class to which each belongs, by applying the page access frequency count algorithm to the preprocessed input data.
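Before hits can be counted, each raw log line has to be split into fields. The sketch below assumes the log uses the Common Log Format (host, identity, user, timestamp, request line, status, bytes), which is the usual layout for web server access logs; the sample line, the regular expression, and the function name `parse_line` are illustrative assumptions rather than details taken from the paper.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Parse one log line into a dict, or return None if it is malformed."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    parts = m.group("request").split()
    # The request line is "METHOD PATH PROTOCOL"; keep the path if present.
    page = parts[1] if len(parts) >= 2 else None
    size = m.group("size")
    return {
        "host": m.group("host"),
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "page": page,
        "status": int(m.group("status")),
        "size": 0 if size == "-" else int(size),
    }

# Hypothetical line in Common Log Format.
sample = ('host1.example.com - - [01/Jul/1995:00:00:01 -0400] '
          '"GET /history/apollo/ HTTP/1.0" 200 6245')

record = parse_line(sample)
print(record["host"], record["page"], record["status"], record["size"])
```

Returning `None` for malformed lines lets the cleaning step drop them without interrupting the scan of a large log file.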
3. EXPERIMENTAL RESULTS
Table 1 Highest Hits Against Number of Record
S.No | Page | Hits | Incomplete Requests | Visitors | Bandwidth (KB)
1 | http://www.weblog-expert.com/index.html | 134,203 | 0 | 4,575 | 326,534
2 | http://www.weblog-expert.com/5.html | 5,010 | 0 | 1,510 | 3,588
3 | http://www.weblog-expert.com/167.html | 2,768 | 0 | 1,091 | 3,257
4 | http://www.weblog-expert.com/990.html | 2,289 | 0 | 965 | 9,945
5 | http://www.weblog-expert.com/39.html | 1,641 | 0 | 875 | 1,026
6 | http://www.weblog-expert.com/991.html | 2,008 | 0 | 856 | 4,142
7 | http://www.weblog-expert.com/28.html | 1,958 | 0 | 770 | 2,196
8 | http://www.weblog-expert.com/989.html | 1,598 | 0 | 759 | 1,484
9 | http://www.weblog-expert.com/41.html | 1,533 | 0 | 733 | 49,191
10 | http://www.weblog-expert.com/2142.html | 1,567 | 0 | 660 | 1,252
11 | http://www.weblog-expert.com/303.html | 1,491 | 0 | 634 | 3,766
12 | http://www.weblog-expert.com/220.html | 1,555 | 0 | 585 | 12,494
13 | http://www.weblog-expert.com/34.html | 1,252 | 0 | 582 | 3,511
14 | http://www.weblog-expert.com/165.html | 1,003 | 0 | 578 | 537
15 | http://www.weblog-expert.com/627.html | 1,575 | 0 | 567 | 1,980
16 | http://www.weblog-expert.com/36.html | 1,340 | 0 | 563 | 2,545
17 | http://www.weblog-expert.com/306.html | 1,337 | 0 | 561 | 2,511
18 | http://www.weblog-expert.com/992.html | 992 | 0 | 560 | 827
19 | http://www.weblog-expert.com/153.html | 914 | 0 | 535 | 1,715
20 | http://www.weblog-expert.com/332.html | 741 | 0 | 476 | 762
21 | http://www.weblog-expert.com/216.html | 809 | 0 | 468 | 2,313
22 | http://www.weblog-expert.com/219.html | 963 | 0 | 465 | 524
23 | http://www.weblog-expert.com/974.html | 1,551 | 0 | 465 | 8,061
24 | http://www.weblog-expert.com/45.html | 994 | 0 | 462 | 10,709
25 | http://www.weblog-expert.com/706.html | 1,817 | 0 | 459 | 13,209
26 | http://www.weblog-expert.com/323.html | 777 | 0 | 458 | 845
27 | http://www.weblog-expert.com/44.html | 803 | 0 | 442 | 500
28 | http://www.weblog-expert.com/1082.html | 1,064 | 0 | 441 | 4,212
29 | http://www.weblog-expert.com/161.html | 987 | 0 | 422 | 10,313
30 | http://www.weblog-expert.com/302.html | 786 | 0 | 417 | 339
31 | http://www.weblog-expert.com/277.html | 1,285 | 0 | 415 | 912
32 | http://www.weblog-expert.com/202.html | 617 | 0 | 403 | 1,046
33 | http://www.weblog-expert.com/245.html | 1,074 | 0 | 402 | 215
34 | http://www.weblog-expert.com/170.html | 547 | 0 | 389 | 764
35 | http://www.weblog-expert.com/113.html | 894 | 0 | 388 | 7,575
36 | http://www.weblog-expert.com/267.html | 1,256 | 0 | 383 | 2,026
37 | http://www.weblog-expert.com/268.html | 1,082 | 0 | 381 | 1,768
38 | http://www.weblog-expert.com/318.html | 595 | 0 | 372 | 375
39 | http://www.weblog-expert.com/246.html | 1,019 | 0 | 368 | 4,548
40 | http://www.weblog-expert.com/276.html | 1,026 | 0 | 366 | 1,162
41 | http://www.weblog-expert.com/172.html | 549 | 0 | 365 | 1,076
42 | http://www.weblog-expert.com/275.html | 572 | 0 | 338 | 1,123
43 | http://www.weblog-expert.com/3664.html | 812 | 0 | 333 | 3,072
44 | http://www.weblog-expert.com/224.html | 758 | 0 | 332 | 2,108
45 | http://www.weblog-expert.com/1611.html | 678 | 0 | 327 | 309
46 | http://www.weblog-expert.com/1029.html | 738 | 0 | 327 | 834
47 | http://www.weblog-expert.com/29.html | 559 | 0 | 323 | 407
48 | http://www.weblog-expert.com/566.html | 695 | 0 | 323 | 4,134
49 | http://www.weblog-expert.com/188.html | 461 | 0 | 317 | 1,120
50 | http://www.weblog-expert.com/309.html | 597 | 0 | 313 | 1,528
Subtotal | | 193,140 | 0 | N/A | 520,415
Total | | 326,836 | 0 | N/A | 1,027
Fig. 3 Visitors Information According to Date
In this paper, the process of computing page access frequency, including the required tasks, has been described. Table 1 shows the highest hits against the number of records, and its graphical representation is shown in Fig. 3. Users either browse a web page directly or use the website to reach a particular page. The page access frequency can be used to improve a particular web page, to re-create websites so that they contain the corresponding pages, and to improve user search through clusters of users with similar behavior. Website data is most useful in e-marketing, e-business, and e-commerce, and for this purpose it should be updated from time to time. If the page access frequency algorithm is used, it counts the number of hits per page and helps to find which pages need to be modified. This is an effective and useful technique in e-commerce.
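To make the hit counts easier to compare, each count can be expressed as a share of the total. The short calculation below uses the hit and visitor figures reported in Table 1 (index.html, the top-50 subtotal, and the overall total); the variable names and the `share` helper are hypothetical, and the derived ratios are an illustration rather than results stated in the paper.

```python
def share(part, total):
    """Return part as a fraction of total."""
    return part / total

# Figures read from Table 1.
TOTAL_HITS = 326_836      # "Total" row
TOP50_HITS = 193_140      # "Subtotal" row (top 50 pages)
INDEX_HITS = 134_203      # index.html
INDEX_VISITORS = 4_575    # index.html

index_share = share(INDEX_HITS, TOTAL_HITS)
top50_share = share(TOP50_HITS, TOTAL_HITS)
hits_per_visitor = INDEX_HITS / INDEX_VISITORS

print(f"index.html share of all hits: {index_share:.1%}")   # 41.1%
print(f"top-50 share of all hits:     {top50_share:.1%}")   # 59.1%
print(f"index.html hits per visitor:  {hits_per_visitor:.1f}")
```

Read this way, the entry page alone accounts for roughly two fifths of all hits, which suggests it is the first candidate when deciding which page to modify.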
4. CONCLUSION
As the information on the WWW is growing exponentially, finding the relevant information according to the user's interest is a challenging issue. The user is presented with a number of URLs in which to locate what he needs, so more time and effort are required to obtain relevant information. Web page access frequency is the solution to this problem. In many commercial applications, website attractiveness is a crucial feature from the business perspective, so the website structure, i.e. the organization of the web pages, needs to be improved. Web usage mining extracts knowledge from users' behavior and helps the website designer to modify the website. This work has presented an approach for adaptive websites which automatically improves web structure organization by mining web usage logs from the web server.
REFERENCES
[1] A. Kumar, V. Ahirwar, R K Singh, “A Study on Prediction of User Behavior Based on Web Server Log Files in Web Usage Mining”, International Journal of Engineering and Computer Science, Vol. 6 (2), pp. 20233-20236, 2015.
[2] A. Talakokkula, “A Survey on Web Usage Mining, Applications and Tools”, Computer Engineering and Intelligent Systems, Vol.6 (2), pp.22-29, 2015.
[3] B. K. Malviya and J Agrawal, "A Study on Web Usage Mining Theory and Applications" In Proc. of Fifth International Conference on Communication Systems and Network Technologies (CSNT) IEEE, pp. 935-939, 2015.
[4] D. Resul and I. Turkoglu, "Creating meaningful data from web logs for improving the impressiveness of a website by using path analysis method" Expert Systems with Applications, Vol.36(3), pp.6635-6644, 2009.
[5] F. M. Facca and P. L. Lanzi, "Recent developments in web usage mining research", In International Conference on Data Warehousing and Knowledge Discovery, Springer, pp. 140-150, 2003.
[6] F. M. Facca and P. L. Lanzi, “Mining interesting knowledge from weblogs: a survey”, Data & Knowledge Engineering, Vol.53(3), pp.225-241,2005.
[7] F. Yu, L. J. Wang and G. Yu, "Study on data preprocessing algorithm in weblog mining," In Proc of International Conference on Machine Learning and Cybernetics, Vol .1(1), pp. 28-32, 2003.
[8] Grace, “Analysis of Weblogs and Web User in Web Mining”, International Journal of Network Security & Its Applications, Vol.3(1), pp. 99-110, 2011.
[9] J. Han and M. Kamber, “Data Mining: Concepts and Techniques”, 3rd edition, Morgan Kaufmann, 2011.
[10] J. M. P. Jeba, M. S. Bhuvaneswari and K. Muneeswaran, "Extracting usage patterns from web server log", In Proc. of 2nd International Conference on Green High Performance Computing, IEEE, pp.1-7, 2016.
[11] K. S. Reddy, M. K. Reddy and V. Sitaramulu, "An effective data preprocessing method for Web Usage Mining", In Proc. of International Conference on Information Communication and Embedded Systems, IEEE, pp. 7-10, 2013.
[12] K. Xie, H. Yu, and R. Cen, "Using log mining to analyze user behavior on search engine," Frontiers of Electrical and Electronic Engineering, vol. 7(2), pp. 254-260, 2012.
[13] L. Liu, J. Zhou and Hongyan, “Data preprocessing of web usage mining”, Computer Science, Vol. 34, pp. 200-204, 2007.
[14] M. Munk, J. Kapusta and P. Švec, "Data preprocessing evaluation for weblog mining: reconstruction of activities of a web visitor," Procedia Computer Science, Vol. 1(1), pp.2273-2280, 2010.
[15] M. Srivastava, R. Garg, and P. K. Mishra, "Analysis of data extraction and data cleaning in Web usage mining", In Proc. of International Conference on Advanced Research in Computer Science Engineering & Technology ACM, pp. 13-13, 2015.
[16] N. Agrawal and A. Jawdekar,"User-based approach for finding various results in web usage mining", In Proc. of Symposium on Colossal Data Analysis and Networking, IEEE, pp. 1-6, 2016.
[17] N. Kaur and H. Aggarwal, "Weblog analysis for identifying the number of visitors and their behavior to enhance the accessibility and usability of website", International Journal of Computer Applications, Vol.110(4), pp. 25-30,2015.
[18] P. Suhasini and B. Joshi, "Analysis of user behavior through web usage mining ", In Proc. of International Conference on Advances in Science and Technology, pp.65-70, 2014.
[19] P. Sukumar, L. Robert and S. Yuvaraj, "Review on modern Data Preprocessing techniques in Web usage mining (WUM)", In Proc. of International Conference on Computation System and Information Technology for Sustainable Solutions, IEEE, pp. 64-69, 2016.
[20] ... patterns, Knowledge and Information Systems, Vol.1(1), pp.5-32, 1999.
[21] Web log data downloaded from