International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 6, Issue 3, March 2016)
Predicting User Access Pattern Using Markov Model and
Association Rule.
Priyanka Makkar¹, Meenal Shingare², Dipali Kadam³
¹,²,³Assistant Professor, PICT, Pune, INDIA
Abstract—Predicting Web users' behaviour from logs has been recognised and discussed by many researchers. It requires mining the log files for useful information. Many models, each with advantages and disadvantages, have been proposed to predict the next pages a user will access based on previous history. In this paper, we propose a framework that can predict user behaviour accurately. A Markov model can be used to determine the next state from the previous state, but low order Markov models do not consider the history in detail and therefore have low accuracy, whereas high order Markov models have high complexity. Association rules can also be used for prediction, but they generate many rules, which results in contradictory predictions for a user session. We propose an improved approach, based on a combination of Markov models and association rules, that results in better prediction accuracy. We use low order Markov models to predict multiple pages that a user may visit, and then apply association rules, which draw on a longer history, to predict the next page the user will access.
Keywords— Association rules, Prediction, Weblogs, Markov Model
I. INTRODUCTION
As the World Wide Web grows at a tremendous rate, it provides users with a large amount of data. This data can be processed to provide useful information to the user and can also be used to improve Web performance. In this paper we focus on web logs. A web log consists of the URLs accessed by users with different IP addresses. Web logs can be processed to improve web performance and can also be used for personalization [10]. We study web logs to predict the next page a user will access, so that the page can be pre-fetched into the client's cache and web performance thereby improved [9]. Predicting the next page to be accessed by a Web user can be achieved using various techniques. Two of the most common approaches are Markov models and association rules [6].
Each of the approaches used for this purpose has its own weaknesses when it comes to accuracy, coverage and performance. Lower order Markov models lack accuracy because they cover too little of the browsing history, whereas higher order Markov models usually result in higher space complexity [2,11]. Association rules, on the other hand, have the problem of identifying the one correct prediction among the many rules, which lead to a large number of predictions. This paper uses a combination of a Markov model and association rules to provide better prediction accuracy. The paper is organized as follows. In Section 2, we discuss the related work in this area. In Section 3, we present the framework for prediction. In Section 4, we discuss the experimental results. We conclude our work in Section 5.
II. RELATED WORK
Web Mining
Web mining is the application of data mining
Association Rule
Association rules capture the co-occurrence of related items, for example items bought together while shopping. Given a set of transactions, where each transaction is a set of items, an association rule is an expression X=>Y, where X and Y are sets of items. The intuitive meaning of such a rule is that transactions in the database which contain the items in X tend to also contain the items in Y. For instance, if 98% of the customers who purchase bread also buy butter, then 98% is called the confidence of the rule. The support of the rule X=>Y is the percentage of transactions that contain both X and Y. Association rule generation can be used to relate pages that are most often referenced together in a single server session [3]. In the context of Web usage mining, association rules refer to sets of pages that are accessed together with a support value exceeding some specified threshold. Association rules may also serve as a heuristic for pre-fetching documents in order to reduce user-perceived latency when loading a page from a remote site. They reveal correlations between pages accessed together during a server session. Such rules indicate possible relationships between pages that are often viewed together even if they are not directly connected, and can reveal associations between groups of users with specific interests.
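The definitions of support and confidence can be made concrete with a minimal Python sketch. The transactions below are hypothetical toy data, and the function names `support` and `confidence` are illustrative choices rather than part of any cited method.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               x: Iterable[str], y: Iterable[str]) -> float:
    """Confidence of the rule X => Y, i.e. support(X and Y) / support(X)."""
    x_set, y_set = frozenset(x), frozenset(y)
    support_x = support(transactions, x_set)
    if support_x == 0:
        return 0.0  # the rule never fires, so it carries no confidence
    return support(transactions, x_set | y_set) / support_x


# Hypothetical toy transactions (each one is a set of items).
transactions = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "jam"}),
    frozenset({"milk"}),
]

rule_support = support(transactions, {"bread", "butter"})
rule_confidence = confidence(transactions, {"bread"}, {"butter"})
print(rule_support)     # 0.5: two of the four transactions hold both items
print(rule_confidence)  # about 0.667: two of the three bread transactions
```

In Web usage mining the same functions apply unchanged if each transaction is taken to be the set of pages visited in one session.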
Markov Model
Markov models can be used to find the next page to be accessed by a web site user based on previous history [1]. Let P = {p1, p2, ..., pm} be the set of pages in a Web site. Let W be a user session, that is, the sequence of pages visited by the user in one visit. Assuming that the user has visited l pages, prob(pi|W) is the probability that the user visits page pi next. The page pl+1 that the user will visit next is estimated by [11]:
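Using only the symbols defined above, and offered as a reconstruction from those definitions rather than a quotation of [11], the estimate can be written as:

$$p_{l+1} = \arg\max_{p_i,\; 1 \le i \le m} \; \mathrm{prob}(p_i \mid W)$$

That is, the predicted page is the one with the highest conditional probability given the session W observed so far.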
This probability, prob(pi|W), is estimated from the session sequences of all users in the history (the training data), collectively denoted by W. Naturally, the longer l and the larger the training data, the more accurate prob(pi|W). However, a very long l and a very large training set are infeasible because of the complexity they introduce. To overcome this problem, a more feasible probability is estimated by assuming that the sequence of Web pages visited by users follows a Markov process. The Markov process imposes a limit k on the number of previously accessed pages that are considered. In other words, the probability of visiting a page pi does not depend on all the pages in the Web session, but only on a small set of k preceding pages, where k << l. The equation becomes [11]:
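Under this assumption, and again as a reconstruction from the surrounding definitions rather than a quotation of [11], the estimate takes the form below, where $w_1, \ldots, w_l$ is assumed notation for the pages of the session W in visiting order:

$$p_{l+1} = \arg\max_{p_i,\; 1 \le i \le m} \; \mathrm{prob}(p_i \mid w_l, w_{l-1}, \ldots, w_{l-(k-1)})$$

Only the k most recent pages appear in the condition, so a 1st order model conditions on $w_l$ alone and a 2nd order model on $w_l, w_{l-1}$.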
The fundamental assumption of predictions based on Markov models is that the next state depends on the previous k states. The larger k is, the more accurate the predictions are. However, a larger k causes two problems: the coverage of the model is limited, leaving many states uncovered, and the complexity of the model becomes unmanageable [1]. Therefore, the following modified Markov model for predicting Web page access is proposed.
III. FRAMEWORK TO PREDICT FUTURE ACCESS OF THE USER
Figure 1: Framework for predicting user behaviour
IV. EXPERIMENTAL DESIGN
In this paper we first apply a Markov model, then prune the result against a support threshold, and finally pre-fetch the page given by the association rule with the highest confidence. We propose to use low order Markov models to keep complexity low and coverage high. The accuracy of low order Markov models is normally not satisfactory, so for those Markov states that provide ambiguous predictions we use association rules to predict pages for pre-fetching more accurately [11]. We use an example to show the idea of this integration and how it improves performance by pre-fetching the appropriate pages.
We use dummy logs to study the user access pattern. For simplicity, each URL is represented by a letter.
With reference to Figure 1:
Pre-processing
Web logs are pre-processed, or cleaned, to remove irrelevant data. Real-world data is generally incomplete, and noise and discrepancies are very common [4]. To get accurate results the data needs to be pre-processed, e.g. by removing pages which no longer exist, bad links, unauthorized accesses, and links which the user has not accessed directly [4].
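A cleaning pass of this kind might look like the following Python sketch. The `LogEntry` record, the choice of HTTP status codes (404 for missing pages and bad links, 401 and 403 for unauthorized access) and the list of embedded-resource suffixes are all assumptions made for illustration; the paper does not specify them.

```python
from typing import List, NamedTuple


class LogEntry(NamedTuple):
    """Hypothetical parsed log record; a real log has more fields."""
    ip: str
    url: str
    status: int


# Assumed filters: status codes treated as irrelevant, and suffixes of
# resources that a browser fetches indirectly while loading a page.
REJECTED_STATUS = (401, 403, 404)
RESOURCE_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def clean(entries: List[LogEntry]) -> List[LogEntry]:
    """Drop failed or unauthorized requests and indirectly loaded resources."""
    kept = []
    for entry in entries:
        if entry.status in REJECTED_STATUS:
            continue  # page no longer exists, bad link, or unauthorized
        if entry.url.lower().endswith(RESOURCE_SUFFIXES):
            continue  # not requested directly by the user
        kept.append(entry)
    return kept


# Hypothetical raw entries.
raw = [
    LogEntry("10.0.0.1", "/index.html", 200),
    LogEntry("10.0.0.1", "/logo.png", 200),
    LogEntry("10.0.0.1", "/old.html", 404),
    LogEntry("10.0.0.2", "/admin", 403),
    LogEntry("10.0.0.2", "/products.html", 200),
]
cleaned_urls = [entry.url for entry in clean(raw)]
print(cleaned_urls)  # ['/index.html', '/products.html']
```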
Session Identification
After the web logs are cleaned, sessions are identified based on IP address and time stamp. Sessions are created for every user and show the access pattern of the user in that particular time window [7]. We use a threshold of 25 minutes for session identification; once that threshold is exceeded, a new session is created. Appropriate prediction requires transactions of a certain length, so we remove transactions of length less than 2 [8]. Table I shows the sessions created for a particular user.
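The session identification step can be sketched in Python as follows. The text does not say whether the 25 minute threshold bounds the gap between consecutive requests or the total session duration; this sketch assumes the former. The record layout `(ip, timestamp, url)` and the function name `identify_sessions` are likewise illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

SESSION_GAP = timedelta(minutes=25)  # threshold from the text
MIN_SESSION_LENGTH = 2               # shorter sessions are discarded

Record = Tuple[str, datetime, str]   # assumed layout: (ip, timestamp, url)


def identify_sessions(records: Iterable[Record],
                      gap: timedelta = SESSION_GAP,
                      min_len: int = MIN_SESSION_LENGTH) -> Dict[str, List[List[str]]]:
    """Group requests by IP and split them wherever the idle time exceeds `gap`."""
    hits_by_ip = defaultdict(list)
    for ip, timestamp, url in records:
        hits_by_ip[ip].append((timestamp, url))

    sessions: Dict[str, List[List[str]]] = {}
    for ip, hits in hits_by_ip.items():
        hits.sort(key=lambda hit: hit[0])  # chronological order
        found: List[List[str]] = []
        current: List[str] = []
        previous = None
        for timestamp, url in hits:
            if previous is not None and timestamp - previous > gap:
                found.append(current)  # idle too long: close the session
                current = []
            current.append(url)
            previous = timestamp
        found.append(current)
        sessions[ip] = [s for s in found if len(s) >= min_len]
    return sessions


# Hypothetical cleaned records for two IP addresses.
t0 = datetime(2016, 3, 1, 9, 0)
records = [
    ("10.0.0.1", t0, "E"),
    ("10.0.0.1", t0 + timedelta(minutes=5), "F"),
    ("10.0.0.1", t0 + timedelta(minutes=40), "C"),
    ("10.0.0.1", t0 + timedelta(minutes=45), "D"),
    ("10.0.0.2", t0, "A"),
]
sessions = identify_sessions(records)
print(sessions["10.0.0.1"])  # [['E', 'F'], ['C', 'D']]
print(sessions["10.0.0.2"])  # []: the single-page session is removed
```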
TABLE I
USER SESSIONS
Session 1:
E->F->C->D->G->F->C->B->D->G->H
Session 2:
A->C->D->G->C->B->F->G->H
Session 3:
E->F->C->D->H->C->D
Session 4:
E->C->D->H->F->A->D->F->G->H
With reference to Table I, we calculate the frequency of each accessed page for further use in the Markov model.
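Page frequencies and k-th order support counts can be derived from the sessions with a short Python sketch. The function names and the encoding of each session as a string of page letters are illustrative assumptions. The counts are computed from the four sessions exactly as listed, so they need not coincide with every entry of the published tables.

```python
from collections import Counter
from typing import List, Tuple

# The four sessions of Table I, one letter per page.
SESSIONS: List[str] = [
    "EFCDGFCBDGH",
    "ACDGCBFGH",
    "EFCDHCD",
    "ECDHFADFGH",
]


def page_frequency(sessions: List[str]) -> Counter:
    """Number of times each page occurs across all sessions."""
    return Counter(page for session in sessions for page in session)


def transition_counts(sessions: List[str], k: int) -> Counter:
    """Support counts of a k-th order model.

    Keys are (state, next_page), where state is the tuple of the k pages
    visited immediately before next_page.
    """
    counts: Counter = Counter()
    for session in sessions:
        for i in range(len(session) - k):
            state: Tuple[str, ...] = tuple(session[i:i + k])
            counts[(state, session[i + k])] += 1
    return counts


frequency = page_frequency(SESSIONS)
first_order = transition_counts(SESSIONS, 1)
second_order = transition_counts(SESSIONS, 2)

print(frequency["F"])                    # 6
print(first_order[(("F",), "C")])        # 3
print(second_order[(("F", "C"), "D")])   # 2
```

Setting `k` to 1 or 2 gives the 1st and 2nd order counts; the state tuple grows with `k`, which is the source of the complexity that motivates keeping the order low.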
TABLE II
PAGE VIEW FREQUENCY
TABLE III
SUPPORT COUNT AFTER APPLYING 1ST ORDER MARKOV ANALYSIS
With a support threshold of 3, pruning is applied to Table III. Applying the 2nd order Markov model to the above training user sessions gives the support counts shown in Table IV.
TABLE IV
SUPPORT COUNT AFTER APPLYING 2ND ORDER MARKOV ANALYSIS
        A   B   C   D   E   F   G   H
C->B    0   0   1   1   0   1   0   0
C->D    0   0   0   0   0   0   2   3
D->G    0   0   1   0   0   1   0   1
D->H    0   0   2   0   0   1   0   0
F->C    0   1   0   2   0   0   0   0
G->H    0   0   0   0   0   0   0   0
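The pruning and confidence steps can be sketched in Python using counts transcribed from Tables III and IV. The function names and dictionary layout are illustrative assumptions. The confidence definition used here (rule support divided by the support of its antecedent) is one reading that reproduces the 50% confidence reported for the rule from C, D to H; the paper does not state the formula explicitly.

```python
from typing import Dict, Optional, Tuple

SUPPORT_THRESHOLD = 3  # support threshold stated in the text

# Row D of Table III: how often each page directly followed D.
TABLE_III_ROW_D: Dict[str, int] = {"C": 1, "F": 1, "G": 3, "H": 3}
# Entry C->D of Table III: how often D directly followed C.
SUPPORT_C_D = 6
# Row C->D of Table IV: how often each page followed the sequence C, D.
TABLE_IV_ROW_CD: Dict[str, int] = {"G": 2, "H": 3}


def prune(row: Dict[str, int], threshold: int) -> Dict[str, int]:
    """Keep only the successors whose support count reaches the threshold."""
    return {page: count for page, count in row.items() if count >= threshold}


def confidence(rule_support: int, antecedent_support: int) -> float:
    """Assumed definition: rule support divided by antecedent support."""
    if antecedent_support == 0:
        return 0.0
    return rule_support / antecedent_support


def best_rule(row: Dict[str, int],
              antecedent_support: int) -> Optional[Tuple[str, float]]:
    """Successor page with the highest confidence, or None for an empty row."""
    if not row:
        return None
    page = max(row, key=lambda p: row[p])
    return page, confidence(row[page], antecedent_support)


# The 1st order model is ambiguous after D: G and H tie at the threshold.
pruned_d = prune(TABLE_III_ROW_D, SUPPORT_THRESHOLD)
# The longer history C, D resolves the choice.
prediction = best_rule(TABLE_IV_ROW_CD, SUPPORT_C_D)

print(pruned_d)    # {'G': 3, 'H': 3}
print(prediction)  # ('H', 0.5)
```

The tie between G and H after pruning is the kind of ambiguous Markov state for which the framework falls back on rules built from a longer history.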
V. CONCLUSION
In this paper we have analysed web logs. From the log file, we have built a model to predict the user's next page access based on their previous history. This paper used a Markov model with association rules to predict the future page. The predicted page can be pre-fetched to improve performance.
With reference to Table V, we can conclude that, using Markov models, there is a 50% confidence that the next page accessed by the user after accessing C and D could be H.
Page view frequency (Table II):
Page    Frequency
A       2
B       3
C       10
D       9
E       3
F       6
G       5
H       6
First-order support counts (Table III):
        A   B   C   D   E   F   G   H
A       0   0   1   1   0   0   0   0
B       0   0   1   1   0   1   0   0
C       0   3   0   6   0   0   0   0
D       0   0   1   0   0   1   3   3
E       0   0   1   0   0   2   0   0
F       1   0   3   0   0   0   2   0
G       0   0   1   0   0   1   0   3
TABLE V
CONFIDENCE OF SELECTED RULES
References
[1] Deshpande, M. and Karypis, G., "Models for Predicting Web Page Accesses", ACM Transactions on Internet Technology, Vol. 4, No. 2, May 2004, pp. 163-184.
[2] Faten Khalil, Jiuyong Li, Hua Wang, "A Framework for Combining Markov Model with Association Rules for Predicting Web Page Accesses", Fifth Australasian Data Mining Conference (AusDM2006).
[3] Ramakrishnan Srikant, Rakesh Agrawal, "Mining Generalized Association Rules", Proceedings of the 21st VLDB Conference, Zurich, Switzerland, 1995.
[4] Siriporn Chimphlee, Naomie Salim, Mohd Salihin Bin Ngadiman, Witcha Chimphlee, "Using Markov Model and Association Rules for Web Access Prediction", Advances in System, Computing Science and Software Engineering, Proceedings of SCSS, 2005, Springer.
[5] Madria, S.K., Bhowmick, S.S., Ng, W.K. and Lim, E., "Research Issues in Web Data Mining", Proc. First International Conference on Data Warehousing and Knowledge Discovery, Florence, Italy, 1999, pp. 303-312.
[6] Pitkow, J. and Pirolli, P., "Mining Longest Repeating Subsequences to Predict World Wide Web Surfing", Proc. USENIX Symp. on Internet Technologies and Systems, 1999.
[7] Cooley, R., Tan, P.-N. and Srivastava, J., "Discovery of Interesting Usage Patterns from Web Data", Springer-Verlag LNCS/LNAI series, 2000.
[8] Fu, Y., Sandhu, K. and Shih, M.Y., "Clustering of Web Users Based on Access Patterns", Proc. of the 5th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, San Diego: Springer, 1999.
[9] Yang, Q., Li, T. and Wang, K., "Building Association-Rules Based Sequential Classifiers for Web-Document Prediction", Journal of Data Mining and Knowledge Discovery, Netherlands: Kluwer Academic Publisher, Vol. 8, 2004, pp. 253-273.
[10] Mobasher, B., Dai, H., Luo, T. and Nakagawa, M., "Discovery of Aggregate Usage Profiles for Web Personalization", in Web KDD Workshop 2000, USA.
[11] Faten Khalil, Jiuyong Li, Hua Wang, "A Framework of Combining Markov Model with Association Rules for Predicting Web Page Access", Data Mining and Analytics 2006, Proceedings of the Fifth Australasian Data Mining Conference (AusDM2006), Sydney, NSW, Australia.