International Journal of Research in Information Technology (IJRIT)
www.ijrit.com ISSN 2001-5569
An Improved Markov Model Approach for Web Usage Mining
1Er. Manisha Dhull, 2Dr. Meenakshi Sharma
1 Student of M.Tech, HCTM (Kaithal), [email protected] 2 Assistant Professor in CSE Deptt., HCTM (Kaithal), [email protected]
ABSTRACT A web page is the collection of all the files that are seen on the user's screen at any point; a single page may comprise several files, including frames, graphics and media. The click stream of a user is the series of pages that the user visits on a site.
When a client requests a web page, the proposed framework preprocesses the user's requests and makes the pages likely to be requested next available locally to the user through web pre-fetching. The framework uses the web page usage information recorded at the server side. The main contribution of this work is to evaluate the accuracy of a web prediction system based on the Markov Model approach and to reduce web latency time.
Key words: Markov Model
INTRODUCTION
REFERENCE MODEL
A system takes some input and, after processing, produces output that is affected by its external or internal environment. The reference system is assumed to be constituted by different parameters or variables, and it relies on some assumptions. A reference system is developed on a corresponding reference model and architecture on which it depends. A proposed system that meets every criterion or requirement specified by the user is called a complete reference system. This
Basic Model
The web page prediction process is illustrated in Figure 1.1. First, a client requests a specific web page from a server. The server sends the URL of that page to the predictor. The predictor checks for that page; if it exists, the predictor sends the page to the server, and the server immediately sends it to the client to fulfill the request. The predictor also sends the page to the update engine, which updates the data structure that the predictor uses for storing web pages [1].
FIGURE 1.1: Web Page Prediction Model
2 PROPOSED SYSTEM ARCHITECTURE
A web page prediction system anticipates the next page to be accessed by the user, or the link the Web user will click next, when browsing a Web site. For example, what is the chance that a Web user visiting a site that sells computers will buy an extra battery when buying a laptop? Or is there a greater chance that the user will buy an external floppy drive instead? The user's past browsing experience is fundamental to extracting such information [2].
FIGURE 1.2: Prediction Architecture
Figure 1.2 shows that when a client makes a request to the server, the server stores that request as a server log entry. The system reads the requests from the server log, identifies sessions, and uses those sessions in the predictive system. The proposed web page prediction system combines the Markov model with a clustering approach to predict the next web page. The predictive system first creates a graph from the identified sessions. Each node is a web page together with some other attributes, and each edge represents a link from one page to another. The system keeps updating the graph by calculating probabilities and generating new edges and nodes [3].
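The graph construction described above can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the class name NavigationGraph, its methods, and the sample sessions are hypothetical, and the extra node attributes mentioned in the text are omitted because the paper does not list them.

```python
from collections import defaultdict


class NavigationGraph:
    """Directed graph of page-to-page transitions built from sessions (hypothetical helper)."""

    def __init__(self):
        # edge_counts[src][dst] = how often dst was requested right after src
        self.edge_counts = defaultdict(lambda: defaultdict(int))
        self.nodes = set()

    def add_session(self, session):
        """Add one session (ordered list of page ids), creating nodes and edges as needed."""
        self.nodes.update(session)
        for src, dst in zip(session, session[1:]):
            self.edge_counts[src][dst] += 1

    def edge_probability(self, src, dst):
        """Estimated probability of moving from src to dst (0.0 if src has no outgoing edge)."""
        outgoing = self.edge_counts.get(src, {})
        total = sum(outgoing.values())
        return outgoing.get(dst, 0) / total if total else 0.0


# Illustrative sessions only; these are not data from the paper.
graph = NavigationGraph()
for session in [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]:
    graph.add_session(session)
print(graph.edge_probability("A", "B"))
```

Keeping raw counts rather than probabilities on the edges means a new session only increments counters, so the graph can be updated incrementally and the probabilities recomputed on demand.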
2.1 FLOW CHART OF PROPOSED METHOD
The flow chart of the proposed method, shown in Figure 1.3, defines the steps of the proposed system. First, the input is obtained by preprocessing the web server log files so that similar web sessions are allocated to appropriate categories. Then the number of clusters is decided and the web sessions are partitioned into clusters. The clustered data are used in the Markov Model approach. Next, the prediction algorithm is designed, using a hidden Markov Model. Finally, the next web page for user access is produced as output [6].
FIGURE 1.3: Flow Chart of Proposed Method
2.2 EXPERIMENTAL EVALUATION
This work improves Web page access prediction accuracy by combining Markov model and clustering techniques [4][5]. It is based on dividing Web sessions into groups according to Web services and performing Markov model analysis on each cluster of sessions instead of on the whole data set. The process involves the following steps:
• Preprocess the Web server log files so that similar Web sessions are allocated to appropriate categories.
• Decide on the number of clusters and partition the Web sessions into clusters according to the chosen distance measure.
• Perform Markov Model analysis on each of the clusters.
• Design the Prediction algorithm using the hidden Markov Model approach.
• Predict the Next Web Page for user access.
Step i: Preprocess the Web server log files so that similar Web sessions are allocated to appropriate categories.
Step i(i): Convert these web pages into numeric form and store them in the web.dat file.
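Step i(i) can be sketched in Python as below. The paper does not specify the layout of web.dat, so the one-session-per-line, whitespace-separated format is an assumption, as are the function names encode_sessions and write_dat and the sample URLs.

```python
def encode_sessions(sessions):
    """Map each distinct page URL to an integer id (1-based, in order of first appearance)."""
    page_ids = {}
    encoded = []
    for session in sessions:
        row = []
        for url in session:
            if url not in page_ids:
                page_ids[url] = len(page_ids) + 1
            row.append(page_ids[url])
        encoded.append(row)
    return encoded, page_ids


def write_dat(encoded, path):
    """Write one session per line as whitespace-separated integers (assumed file layout)."""
    with open(path, "w") as handle:
        for row in encoded:
            handle.write(" ".join(str(value) for value in row) + "\n")


# Hypothetical sessions; the URLs are placeholders, not entries from the paper's log.
sessions = [["/home", "/laptops", "/cart"], ["/home", "/cart"]]
encoded, page_ids = encode_sessions(sessions)
print(encoded, page_ids)
```

Calling write_dat(encoded, "web.dat") would then produce the numeric file. Note that a tool expecting a rectangular matrix would need sessions of equal length, for example by padding or by using fixed-length feature vectors.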
Step ii: Decide on the number of clusters and partition the Web sessions into clusters according to the chosen distance measure.
Step ii(i): Run the ‘findcluster’ command at the MATLAB command prompt. This opens a window named ‘clustering’ in MATLAB.
Step ii(ii): Load the web.dat file for clustering.
Step ii(iii): Save the centers, which partition the whole data set into the user-defined number of clusters, to the clustered.dat file (for example, k=4 means 4 clusters).
Step ii(iv): Open the clustered.dat file, which has 4 clusters.
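For readers without the MATLAB tool, the partitioning in Step ii can be approximated with a plain k-means sketch in Python. It assumes each session has already been turned into a fixed-length numeric vector and that squared Euclidean distance is the chosen distance measure; neither detail is given in the paper, and the sample points are illustrative only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means; returns (centers, labels) for equal-length numeric vectors."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move either
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Illustrative vectors forming two obvious groups; not data from the paper.
points = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
centers, labels = kmeans(points, k=2)
print(labels)
```

The returned centers play the role of the centers saved to clustered.dat, and labels gives the cluster of each session for the per-cluster Markov analysis in Step iii.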
Step iii: Perform Markov Model analysis on each of the clusters.
Step iii(i): Build the square Transition Probability Matrix (Rows = Columns = total number of individual web pages = 9). TRANSITIONS(I, J) is the probability of transition from state I to state J.
Step iii(ii): Build the square Emission Probability Matrix (Rows = Columns = total number of individual web pages = 9). EMISSIONS(K, L) is the probability that symbol L is emitted from state K.
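One way to estimate the two matrices of Step iii from the sessions of a single cluster is sketched below in Python. The paper does not say how the probabilities are estimated, so two assumptions are made: transitions are estimated by counting consecutive page pairs, and each state emits only its own page, which makes the emission matrix an identity matrix. The function names and sample sessions are hypothetical.

```python
def transition_matrix(sessions, num_pages):
    """Row-normalised first-order transition matrix from sessions of 1-based page ids."""
    counts = [[0] * num_pages for _ in range(num_pages)]
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src - 1][dst - 1] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Pages never seen as a source keep an all-zero row in this sketch.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def identity_emissions(num_pages):
    """Emission matrix for the assumed case where state K emits only symbol K."""
    return [[1.0 if i == j else 0.0 for j in range(num_pages)]
            for i in range(num_pages)]


# Illustrative cluster of sessions over 9 pages; not data from the paper.
cluster_sessions = [[1, 2, 3], [1, 2, 4], [1, 3]]
TR = transition_matrix(cluster_sessions, num_pages=9)
E = identity_emissions(9)
print(TR[0])
```

Both matrices are 9 x 9, matching the Rows = Columns = 9 layout stated in the steps.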
Step iv: Design the Prediction algorithm using the hidden Markov Model approach.
The Prediction algorithm uses the hidden Markov model approach. It gives the next web page for the user's currently accessed web page.
Algorithm
1. Set TR = Transitions square matrix, TR(I,J) = probability of transition from state I to state J [initializes TR].
2. Set E = Emissions square matrix, E(K,L) = probability that symbol L is emitted from state K [initializes E].
3. Set Seq = sequence of user's accessed web pages [initializes Seq].
4. Set numStates = Size, where Size is the size of any column of TR [number of states].
5. Set L = length of Seq.
6. Repeat for count = 1 to L
7.   Repeat for state = 1 to numStates
8.     Set bestVal = 0 [initializes bestVal]
9.     Set bestPTR = 0 [initializes bestPTR]
10.    Repeat for inner = 1 to numStates
         val = TR[inner,state]
         If val > bestVal
           bestVal = val
           bestPTR = inner
         [End of if structure]
       [End of step 10 inner loop]
11.    pTR[state,count] = bestPTR
12.    v[state] = E[state,Seq[count]] + bestVal
     [End of step 7 inner loop]
13.  vOld = v
14.  P = max[v] [maximum value of v]
15.  finalState = index of max[v] [state that attains the maximum value]
16.  currentState[count] = finalState
   [End of step 6 outer loop]
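The steps above resemble the Viterbi recursion for hidden Markov models. For comparison, a standard log-space Viterbi decoder is sketched below in Python. It is not the authors' Prediction.m: unlike the listed steps, it adds the previous score vOld[inner] inside the inner loop, works with logarithms of the probabilities, takes an explicit initial-state distribution, and traces the pointer table back at the end. All of these are assumptions taken from the textbook algorithm, and the two-state model in the example is illustrative only.

```python
import math


def safe_log(p):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(transitions, emissions, seq, initial):
    """Most likely state path for the observation indices in seq (all indices 0-based)."""
    num_states = len(transitions)
    # v_old[s]: best log-probability of any path that ends in state s so far.
    v_old = [safe_log(initial[s]) + safe_log(emissions[s][seq[0]])
             for s in range(num_states)]
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for obs in seq[1:]:
        v, ptr = [], []
        for state in range(num_states):
            best_val, best_ptr = float("-inf"), 0
            for inner in range(num_states):
                val = v_old[inner] + safe_log(transitions[inner][state])
                if val > best_val:
                    best_val, best_ptr = val, inner
            ptr.append(best_ptr)
            v.append(safe_log(emissions[state][obs]) + best_val)
        back.append(ptr)
        v_old = v
    # Trace back from the best final state.
    final_state = max(range(num_states), key=lambda s: v_old[s])
    path = [final_state]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v_old[final_state]


# Illustrative two-state model; the numbers are not from the paper.
TR = [[0.7, 0.3], [0.4, 0.6]]
E = [[0.9, 0.1], [0.2, 0.8]]
path, log_prob = viterbi(TR, E, seq=[0, 0, 1, 1], initial=[0.5, 0.5])
print(path, log_prob)
```

Working in log space turns the products of probabilities into sums, which is consistent with the addition in step 12 and avoids numerical underflow on long sequences.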
Step v: Predict the next Web page for user access.
Step v(i): Open the Prediction algorithm, named Prediction.m, in MATLAB and give the input in the seq variable. The input is the web page currently accessed by the user, for which the next web page is to be predicted (for example, seq = 4;).
Step v(ii): Run the algorithm; the output is shown at the command prompt as ans = 5.
Step v(iii): With seq = 7 as input, the output is ans = 9.
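For a single current page, the prediction in Step v reduces to picking the most probable successor in the transition matrix. The Python sketch below illustrates this under that simplifying assumption; the function name predict_next_page and the three-page matrix are hypothetical, and the matrix does not reproduce the paper's data or its reported outputs.

```python
def predict_next_page(transitions, current_page):
    """Return the 1-based id of the most probable next page, or None if none is known."""
    row = transitions[current_page - 1]
    best = max(range(len(row)), key=lambda j: row[j])
    return best + 1 if row[best] > 0 else None


# Hypothetical 3-page transition matrix for illustration only.
TR = [
    [0.0, 0.8, 0.2],
    [0.1, 0.0, 0.9],
    [0.0, 0.0, 0.0],
]
print(predict_next_page(TR, 1))
```

The page returned here is the candidate for pre-fetching; returning None for a page with no recorded successors avoids pre-fetching on no evidence.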
4 CONCLUSION
This paper improves the overall prediction accuracy by grouping the data set sessions into clusters, and reduces web latency time. The Web pages in the user sessions are first allocated to categories according to Web services that are functionally meaningful. Then the k-means clustering algorithm is applied using the most appropriate number of clusters, and prediction techniques are applied to each cluster. The experimental results reveal that higher accuracy of the next web page access prediction system can be achieved by combining the Markov model and clustering approach. Higher accuracy means that the page predicted by the system more often matches the page the user wants next. The user then does not need to request that page, because it has already been made available. This pre-fetching reduces the requests made by the user and reduces request time; thus this work reduces web latency time. The prediction accuracy achieved is an improvement over previous research papers that addressed mainly recall and coverage.
3 FUTURE WORK
This paper introduced the Prediction algorithm for automatically transferring Web pages. It demonstrated that the algorithm can learn to closely reproduce human mappings. This work takes a first step towards a powerful new paradigm for example-based Web design and opens up exciting areas for future research. At present, the algorithm employs only about thirty simple visual and semantic features. Expanding this set to include more complex and sophisticated properties, such as those based on computer vision, will likely improve the robustness of the machine learning.
REFERENCES
[1] Cooley R., Mobasher B., and Srivastava J., “Grouping web page references into transactions for mining world wide web browsing patterns”, Technical Report TR 97-021, Dept. of Computer Science, Univ. of Minnesota, Minneapolis, USA, 1997.
[2] Larry Page, PageRank: Bringing Order to the Web, Technical report, Stanford digital libraries, 1997.
[3] The China Internet Network Information Center. http://www.cnnic.com.cn/, April 2007.
[4] T. Haveliwala, “Topic-Sensitive PageRank”. In Proceedings of WWW 2002 Conference, Hawaii USA, 2002.
[5] The Department of Computer Science and Software Engineering, the University of Melbourne, Australia, http://www.cs.mu.oz.au, 2007.