A Model For Analysis Most Visited Web Page For Web Usage Mining


Jayanti Mehra, Dr. R S Thakur

Abstract: Weblog analysis takes raw data from the access log and studies it to extract statistical information. This information covers a variety of data about website activity, such as the average number of hits, the total number of user visits, failed and successful cached hits, the average view time, and the average path length through a website; analytical information such as page-not-found errors and server errors; server information, which includes entry and exit pages, single-access pages, and top visited pages; and requester information, such as which search engines are used, keywords, and top referring sites. In general, the website administrator uses this kind of knowledge to improve system performance, to help in modifying the site, and to support marketing decisions. Most advanced Web mining systems use this kind of information to extract more complex interpretations through learning, using data mining procedures such as association rules, clustering, and classification.

Keywords: Web Usage Mining, Data Preprocessing, Site Modification, Weblog Expert tool, Top Visited Web Pages.

————————————————————

1. INTRODUCTION

Nowadays the internet has become a convenient source of information in everyone's daily activity. The World Wide Web has gone through enormous development in the last two decades, but the growth in its traffic and size has increased the difficulties faced by individual websites [1]. E-commerce websites are progressing quickly to fulfill the demands of their users, hence their importance is obvious [2]. Because web research offers several substantial benefits, it is of considerable interest to organizations [3]. It has helped to improve market profitability and market intelligence, and it also supports marketing and comparative analysis for understanding customer relationships. Web data is organized, assembled, and structured through the clients' profiles. This helps organizations retain current clients by offering more customized services, and it also contributes to finding potential clients [4].

A. Process of finding most desired web pages

Once the raw data is cleaned, the frequency of each page is counted. The frequency of a page can be defined as the number of times the page appears in a particular log file [5]. The block diagram of the frequency count process for each web page is shown in Fig. 1.

Fig 1: Process for frequency count of web page for web log analysis
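The counting step described above can be illustrated with a minimal sketch. It assumes the log has already been cleaned down to one requested page per remaining line; the function name `page_frequencies` and the sample page list are hypothetical and are not taken from the paper.

```python
from collections import Counter


def page_frequencies(cleaned_requests):
    """Return how many times each page appears in the cleaned request list."""
    return Counter(cleaned_requests)


# Hypothetical cleaned log: one requested page per remaining log line.
cleaned_requests = [
    "/index.html", "/5.html", "/index.html",
    "/167.html", "/index.html", "/5.html",
]

freq = page_frequencies(cleaned_requests)

# List pages from most to least frequently requested.
for page, hits in freq.most_common():
    print(page, hits)
```

A `Counter` keeps the sketch short; any mapping from page to count would serve the same purpose.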

B. Roles of most desired web pages

Statistics are critical for any website administrator who wants useful information about the site's own contents, such as the number of hits by each client on a particular host, page, or piece of content [6]. The number of hits can be added up from the valid lines in the weblog records; these lines can cover browsing, downloading, and posting activities carried out by various clients.
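As a rough illustration of adding up hits from valid log lines, the sketch below parses lines in the Common Log Format and counts hits per page and per (client host, page) pair. Treating only 2xx responses as valid hits is an assumption made here, as are the pattern name `LOG_PATTERN`, the function `count_valid_hits`, and the sample lines with their hosts.

```python
import re
from collections import Counter

# Common Log Format: host ident user [timestamp] "METHOD path PROTO" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)$'
)


def count_valid_hits(lines):
    """Count hits per page and per (host, page), skipping invalid lines."""
    page_hits = Counter()
    host_page_hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # malformed line: not a valid record
        if not match.group("status").startswith("2"):
            continue  # assumption: only 2xx responses count as valid hits
        page = match.group("path")
        page_hits[page] += 1
        host_page_hits[(match.group("host"), page)] += 1
    return page_hits, host_page_hits


# Hypothetical sample lines (browsing via GET, posting via POST, one error).
sample_lines = [
    'host-a.example.com - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 6245',
    'host-b.example.com - - [01/Jul/1995:00:00:06 -0400] "GET /index.html HTTP/1.0" 200 6245',
    'host-a.example.com - - [01/Jul/1995:00:00:09 -0400] "POST /form.html HTTP/1.0" 200 120',
    'host-a.example.com - - [01/Jul/1995:00:00:11 -0400] "GET /missing.html HTTP/1.0" 404 -',
    'this line is not a log record',
]

page_hits, host_page_hits = count_valid_hits(sample_lines)
print(page_hits)
print(host_page_hits)
```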

The output of this stage is turned into suitable tasks such as:

1. Re-structure or re-create the website.

2. Improve access through comparative practice.

3. Investigate and check different items such as the most accessed URLs, pages, and content.

4. Monitor clients' activities on a specific page.

The statistics are constructed here to extract useful information such as the hit count of each page of the site, which depends on the content of the page, the number of valid hits on the page, and some other information. Figure 2 shows the structure of weblog data [7]. As the information on the WWW is growing exponentially, finding relevant information according to the user's interests and needs is a challenging issue [8]. The user is presented with a number of URLs from which to locate what he needs, so more time and effort are required to obtain the required information. Finding the most desired web pages is the solution to this problem [18]. In many commercial applications, website attractiveness is a crucial feature from the business perspective, so the website structure, i.e. the organization of the web pages, needs to be improved [9].

Web usage mining extracts knowledge from users' behavior and helps the website designer to modify the website design. Prior work presented an approach for adaptive websites that automatically improves the organization of the web structure by mining web usage logs from the web server; its authors presented a cluster mining algorithm known as Page Gather for this purpose [10] [19]. Web usage mining for web page access frequency consists of mainly three steps: preprocessing, pattern analysis, and total hits calculation [14] [15]. The preprocessing step mainly consists of data cleaning, user identification, and session identification. The pattern discovery step is used to identify interesting patterns in the web usage data [11] [12] [13].

When users browse any website, they search for the desired information, which is placed on particular pages [16] [20]. If the information is of common interest to different users, those pages will be accessed with a high frequency. Thus, to identify those pages, the concept of high-frequency accessed pages has been considered [12].

[Fig. 1 block labels: raw web log data in different sources, data cleaning, user identification, session identification, preprocessed data; weblog server data, preprocessing, frequency count analysis method, most desired web pages]

Jayanti Mehra, M.C.A, Maulana Azad National Institute of Technology, Bhopal, India, mehra.jayanti109@gmail.com

2. DATA FROM WEBLOGS

Fig. 2 Snapshot of raw data from weblog

The standard dataset is taken from NASA-HTTP [21] for the performance evaluation of the proposed models. A snapshot of the raw weblog data is shown in Fig. 2. Counting hits for every website makes it possible to recognize the website field, the most browsed web page, the most frequent users, and the most requested page [17]. In the case of clustered websites, we calculate the empirical joint probability of the hit counts for each site, in order to find the correlation of sites and the class to which each belongs, by applying the page access frequency count algorithm to the preprocessed input data.
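The preprocessed input mentioned above comes from data cleaning, user identification, and session identification. The following is only a sketch of those three steps under stated assumptions: records are already parsed into dictionaries, cleaning drops failed requests and embedded resources, a user is identified by host alone, and a session ends after 30 minutes of inactivity. The paper does not specify any of these choices, and all names and sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed noise
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def clean(records):
    """Data cleaning: keep successful requests for pages, not resources."""
    return [
        r for r in records
        if 200 <= r["status"] < 300
        and not r["path"].lower().endswith(IGNORED_SUFFIXES)
    ]


def identify_users(records):
    """User identification: group records by host (an assumption)."""
    users = defaultdict(list)
    for r in records:
        users[r["host"]].append(r)
    return users


def identify_sessions(user_records):
    """Session identification: split one user's requests at long gaps."""
    sessions = []
    for r in sorted(user_records, key=lambda rec: rec["time"]):
        if sessions and r["time"] - sessions[-1][-1]["time"] <= SESSION_TIMEOUT:
            sessions[-1].append(r)
        else:
            sessions.append([r])
    return sessions


def rec(host, hhmm, path, status=200):
    """Build a hypothetical parsed log record."""
    when = datetime.strptime("1995-07-01 " + hhmm, "%Y-%m-%d %H:%M")
    return {"host": host, "time": when, "path": path, "status": status}


records = [
    rec("host-a", "00:00", "/index.html"),
    rec("host-a", "00:01", "/logo.gif"),
    rec("host-a", "00:05", "/5.html"),
    rec("host-a", "01:30", "/index.html"),
    rec("host-b", "00:02", "/index.html"),
    rec("host-b", "00:03", "/missing.html", 404),
]

cleaned = clean(records)
sessions = {
    host: identify_sessions(recs)
    for host, recs in identify_users(cleaned).items()
}
for host, host_sessions in sessions.items():
    print(host, [[r["path"] for r in s] for s in host_sessions])
```

The cleaned records are what a frequency count would then be applied to.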

3. EXPERIMENTAL RESULTS

Table 1 Highest Hits Against Number of Record

S.No      Page                                     Hits     Incomplete Requests  Visitors  Bandwidth (KB)
1         http://www.weblog-expert.com/index.html  134,203  0                    4,575     326,534
2         http://www.weblog-expert.com/5.html      5,010    0                    1,510     3,588
3         http://www.weblog-expert.com/167.html    2,768    0                    1,091     3,257
4         http://www.weblog-expert.com/990.html    2,289    0                    965       9,945
5         http://www.weblog-expert.com/39.html     1,641    0                    875       1,026
6         http://www.weblog-expert.com/991.html    2,008    0                    856       4,142
7         http://www.weblog-expert.com/28.html     1,958    0                    770       2,196
8         http://www.weblog-expert.com/989.html    1,598    0                    759       1,484
9         http://www.weblog-expert.com/41.html     1,533    0                    733       49,191
10        http://www.weblog-expert.com/2142.html   1,567    0                    660       1,252
11        http://www.weblog-expert.com/303.html    1,491    0                    634       3,766
12        http://www.weblog-expert.com/220.html    1,555    0                    585       12,494
13        http://www.weblog-expert.com/34.html     1,252    0                    582       3,511
14        http://www.weblog-expert.com/165.html    1,003    0                    578       537
15        http://www.weblog-expert.com/627.html    1,575    0                    567       1,980
16        http://www.weblog-expert.com/36.html     1,340    0                    563       2,545
17        http://www.weblog-expert.com/306.html    1,337    0                    561       2,511
18        http://www.weblog-expert.com/992.html    992      0                    560       827
19        http://www.weblog-expert.com/153.html    914      0                    535       1,715
20        http://www.weblog-expert.com/332.html    741      0                    476       762
21        http://www.weblog-expert.com/216.html    809      0                    468       2,313
22        http://www.weblog-expert.com/219.html    963      0                    465       524
23        http://www.weblog-expert.com/974.html    1,551    0                    465       8,061
24        http://www.weblog-expert.com/45.html     994      0                    462       10,709
25        http://www.weblog-expert.com/706.html    1,817    0                    459       13,209
26        http://www.weblog-expert.com/323.html    777      0                    458       845
27        http://www.weblog-expert.com/44.html     803      0                    442       500
28        http://www.weblog-expert.com/1082.html   1,064    0                    441       4,212
29        http://www.weblog-expert.com/161.html    987      0                    422       10,313
30        http://www.weblog-expert.com/302.html    786      0                    417       339
31        http://www.weblog-expert.com/277.html    1,285    0                    415       912
32        http://www.weblog-expert.com/202.html    617      0                    403       1,046
33        http://www.weblog-expert.com/245.html    1,074    0                    402       215
34        http://www.weblog-expert.com/170.html    547      0                    389       764
35        http://www.weblog-expert.com/113.html    894      0                    388       7,575
36        http://www.weblog-expert.com/267.html    1,256    0                    383       2,026
37        http://www.weblog-expert.com/268.html    1,082    0                    381       1,768
38        http://www.weblog-expert.com/318.html    595      0                    372       375
39        http://www.weblog-expert.com/246.html    1,019    0                    368       4,548
40        http://www.weblog-expert.com/276.html    1,026    0                    366       1,162
41        http://www.weblog-expert.com/172.html    549      0                    365       1,076
42        http://www.weblog-expert.com/275.html    572      0                    338       1,123
43        http://www.weblog-expert.com/3664.html   812      0                    333       3,072
44        http://www.weblog-expert.com/224.html    758      0                    332       2,108
45        http://www.weblog-expert.com/1611.html   678      0                    327       309
46        http://www.weblog-expert.com/1029.html   738      0                    327       834
47        http://www.weblog-expert.com/29.html     559      0                    323       407
48        http://www.weblog-expert.com/566.html    695      0                    323       4,134
49        http://www.weblog-expert.com/188.html    461      0                    317       1,120
50        http://www.weblog-expert.com/309.html    597      0                    313       1,528
Subtotal                                           193,140  0                    N/A       520,415
Total                                              326,836  0                    N/A       1,027

Fig. 3 Visitors Information According to Date

This paper has described the process of computing page access frequency, including the required tasks. Table 1 shows the highest hits against the number of records, and its graphical representation is shown in Fig. 3. Users either browse to a web page directly or use the website to navigate to a particular page. Page access frequency can be used to improve a particular web page, to re-create websites so that they contain the corresponding pages, and to improve user search through clusters of users with similar behavior. Website data is especially useful in e-marketing, e-business, and e-commerce, and for this purpose it should be updated from time to time. If the page access frequency algorithm is used, it counts the number of hits per page and helps to find which pages need to be modified. This is an effective and useful technique for e-commerce.
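To make the use of per-page hit counts concrete, the sketch below ranks pages by hits and reports each page's share of all hits. The five rows and the total of 326,836 hits are copied from Table 1; the function name `rank_by_hits` and the idea of reporting a share of total hits are illustrative assumptions, not part of the paper's method.

```python
TOTAL_HITS = 326_836  # "Total" row of Table 1

# (page, hits, visitors): the first five rows of Table 1.
table_rows = [
    ("index.html", 134_203, 4_575),
    ("5.html", 5_010, 1_510),
    ("167.html", 2_768, 1_091),
    ("990.html", 2_289, 965),
    ("39.html", 1_641, 875),
]


def rank_by_hits(rows, total_hits):
    """Sort pages by hit count and attach each page's share of all hits."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(page, hits, hits / total_hits) for page, hits, _ in ranked]


ranking = rank_by_hits(table_rows, TOTAL_HITS)
for page, hits, share in ranking:
    print(f"{page:<12} {hits:>8} {share:6.1%}")
```

Pages near the top of such a ranking are candidates for closer attention when the site is restructured, while pages with very few hits may need to be made easier to reach.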

4. CONCLUSION

As the information on the WWW is growing exponentially, finding relevant information according to the user's interests is a challenging issue. The user is presented with a number of URLs from which to locate what he needs, so more time and effort are required to obtain relevant information. Web page access frequency is the solution to this problem. In many commercial applications, website attractiveness is a crucial feature from the business perspective, so the website structure, i.e. the organization of the web pages, needs to be improved. Web usage mining extracts knowledge from users' behavior and helps the website designer to modify the website. This work has presented an approach for adaptive websites that automatically improves the organization of the web structure by mining web usage logs from the web server.

REFERENCES

[1] A. Kumar, V. Ahirwar, R K Singh, "A Study on Prediction of User Behavior Based on Web Server Log Files in Web Usage Mining", International Journal of Engineering and Computer Science, Vol. 6(2), pp. 20233-20236, 2015.

[2] A. Talakokkula, "A Survey on Web Usage Mining, Applications and Tools", Computer Engineering and Intelligent Systems, Vol. 6(2), pp. 22-29, 2015.

[3] B. K. Malviya and J. Agrawal, "A Study on Web Usage Mining Theory and Applications", In Proc. of Fifth International Conference on Communication Systems and Network Technologies (CSNT), IEEE, pp. 935-939, 2015.

[4] D. Resul and I. Turkoglu, "Creating meaningful data from web logs for improving the impressiveness of a website by using path analysis method", Expert Systems with Applications, Vol. 36(3), pp. 6635-6644, 2009.

[5] F. M. Facca and P. L. Lanzi, "Recent developments in web usage mining research", In International Conference on Data Warehousing and Knowledge Discovery, Springer, pp. 140-150, 2003.

[6] F. M. Facca and P. L. Lanzi, "Mining interesting knowledge from weblogs: a survey", Data & Knowledge Engineering, Vol. 53(3), pp. 225-241, 2005.

[7] F. Yu, L. J. Wang and G. Yu, "Study on data preprocessing algorithm in weblog mining", In Proc. of International Conference on Machine Learning and Cybernetics, Vol. 1(1), pp. 28-32, 2003.

[8] Grace, "Analysis of Weblogs and Web User in Web Mining", International Journal of Network Security & Its Applications, Vol. 3(1), pp. 99-110, 2011.

[9] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", 3rd edition, Morgan Kaufmann, 2011.

[10] J. M. P. Jeba, M. S. Bhuvaneswari and K. Muneeswaran, "Extracting usage patterns from web server log", In Proc. of 2nd International Conference on Green High Performance Computing, IEEE, pp. 1-7, 2016.

[11] K. S. Reddy, M. K. Reddy and V. Sitaramulu, "An effective data preprocessing method for Web Usage Mining", In Proc. of International Conference on Information Communication and Embedded Systems, IEEE, pp. 7-10, 2013.

[12] K. Xie, H. Yu, and R. Cen, "Using log mining to analyze user behavior on search engine", Frontiers of Electrical and Electronic Engineering, Vol. 7(2), pp. 254-260, 2012.

[13] L. Liu, J. Zhou and Hongyan, "Data preprocessing of web usage mining", Computer Science, Vol. 34, pp. 200-204, 2007.

[14] M. Munk, J. Kapusta and P. Švec, "Data preprocessing evaluation for weblog mining: reconstruction of activities of a web visitor", Procedia Computer Science, Vol. 1(1), pp. 2273-2280, 2010.

[15] M. Srivastava, R. Garg, and P. K. Mishra, "Analysis of data extraction and data cleaning in Web usage mining", In Proc. of International Conference on Advanced Research in Computer Science Engineering & Technology, ACM, pp. 13-13, 2015.

[16] N. Agrawal and A. Jawdekar, "User-based approach for finding various results in web usage mining", In Proc. of Symposium on Colossal Data Analysis and Networking, IEEE, pp. 1-6, 2016.

[17] N. Kaur and H. Aggarwal, "Weblog analysis for identifying the number of visitors and their behavior to enhance the accessibility and usability of website", International Journal of Computer Applications, Vol. 110(4), pp. 25-30, 2015.

[18] P. Suhasini and B. Joshi, "Analysis of user behavior through web usage mining", In Proc. of International Conference on Advances in Science and Technology, pp. 65-70, 2014.

[19] P. Sukumar, L. Robert and S. Yuvaraj, "Review on modern Data Preprocessing techniques in Web usage mining (WUM)", In Proc. of International Conference on Computation System and Information Technology for Sustainable Solutions, IEEE, pp. 64-69, 2016.

[20] ... patterns, Knowledge and Information Systems, Vol. 1(1), pp. 5-32, 1999.

[21] Web log data downloaded from
