Fig. 1 above shows the types and sources of Web mining. Web content mining is the process of extracting useful information from the contents of Web documents. Content data corresponds to the collection of facts a Web page was designed to convey to its users. It may consist of text, images, audio, video, or structured records such as lists and tables. Research in Web content mining encompasses resource discovery from the Web, document categorization and clustering, and information extraction from Web pages. Web structure mining studies the Web's hyperlink structure. It usually involves analysis of the in-links and out-links of a Web page, and it has been used for search engine result ranking. Web structure mining can be regarded as the process of discovering structure information from the Web, that is, of inferring knowledge from the organization of the World Wide Web and from the links between references and referents in the Web. This type of mining can be performed either at the (intra-page) document level or at the (inter-page) hyperlink level. Web usage mining is the application of data mining techniques to discover interesting usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. It is also called Web log mining. Typical usage data collected at a Web site includes IP addresses, page references, and access times of the users.
The World Wide Web (WWW) is continuously growing in both the volume of information transactions from web servers and the number of requests from web users. Providing web administrators with meaningful information about users’ access behaviour and usage patterns has become a necessity for improving the quality of web information services. Existing models do not make use of fully detailed web log data covering a long period. There is a need for a model that analyses the usage patterns of different aspects of log files collectively and over a longer duration. In this paper, web log data was collected from the Information and Telecommunications Unit of Obafemi Awolowo University, Ile Ife. The web log data was comprehensively studied to identify the most important input variables for the web usage mining model and for web traffic analysis. An improved web mining model was designed using the Unified Modelling Language. The developed model was simulated in the Waikato Environment for Knowledge Analysis (WEKA) software using the Naïve Bayes classifier. The performance of the simulated model was validated using the following metrics: accuracy, recall, precision, true positive rate, false positive rate, and ROC area. The model had a precision value of 0.810, which means that 81% of the predictions made by the Naïve Bayes classifier were assigned to their correct class. The area under the ROC curve had a minimum value of 0.993, indicating the level of bias attributed to the predictions made by the Naïve Bayes classifier, which in this case is 0.7% of all predictions.
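As a rough illustration of how the listed metrics relate to one another, the sketch below derives accuracy, precision, recall (true positive rate), and false positive rate from the four cells of a confusion matrix for a single class. The function name `binary_metrics` and the counts are hypothetical and are not taken from the study; they were chosen only so that the precision comes out at 0.81.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return standard metrics from the four confusion-matrix counts of one class."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,           # share of all predictions that are right
        "precision": tp / (tp + fp),             # share of positive predictions that are right
        "recall": tp / (tp + fn),                # true positive rate
        "false_positive_rate": fp / (fp + tn),   # share of negatives wrongly flagged
    }


# Hypothetical counts, used only to illustrate the arithmetic.
metrics = binary_metrics(tp=81, fp=19, tn=90, fn=10)
print(metrics)
```

For a multi-class problem, a tool such as WEKA reports these values per class and as a weighted average; the sketch covers only the per-class case.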
Cristóbal Romero et al. proposed an architecture for a recommender system that uses Web usage mining to recommend the links to visit next in an adaptive Web-based educational system. A specific mining tool and a recommender engine were developed to help the instructor carry out the Web mining process. Although both the Web mining tool and the recommender engine were integrated into the AHA! system, they can, in principle, also be used in other Web-based educational systems. The AHA! system offers adaptive hypermedia methods and techniques that are especially useful for educational applications, namely a user model based on concepts and adaptive link hiding or link annotation. The authors tested the proposed architecture and algorithms in several experiments. The data used in this study were collected from the online AHA! tutorial (http://aha.win.tue.nl/tutorial/), which consists of 43 Web pages. For the experiments, the authors used a total of 78 students with 118 sessions and 684 records. These students are mainly TU/e (Eindhoven University of Technology) students taking a traditional course in adaptive hypermedia, together with some other Internet users interested in the AHA! system and taking the tutorial online.
Web usage mining has been an important part of management strategy for improving organizational analysis and decision making. The literature on Web usage mining that deals with strategies and technologies for using it effectively is very large. Recently, e-government has received much attention from researchers and practitioners. Huge amounts of user access data are produced on e-government Web sites every day. The role of this data in the success of government management cannot be overstated, because it affects government analysis, prediction, strategies, and tactical and operational planning and control. Web usage mining in e-government has an important part to play in setting government objectives, discovering citizen behavior, and determining future courses of action. Nevertheless, Web usage mining in e-government has not received adequate attention from researchers or practitioners. We developed a framework to promote a better understanding of the importance of Web usage mining in e-government. Using the current literature, we developed the framework presented here, in the hope that it will stimulate more interest in this important area.
Web usage mining is one of the emerging areas of research and an important sub-domain of data mining. To take full advantage of web usage mining and its techniques, it is important to carry out the preprocessing stage efficiently and effectively. This paper covers the main areas of preprocessing, including data cleansing, session identification, and user identification. Once the preprocessing stage has been performed well, data mining techniques such as clustering, association, and classification can be applied for web usage mining applications such as business intelligence, e-commerce, e-learning, and personalization.
Web usage mining is used to discover interesting usage patterns from Web log data, in order to understand and better serve the needs of Web-based applications. Usage data captures the identity or origin of Web users along with their browsing behavior at a Web site. Web usage mining can be classified further depending on the kind of usage data considered: Web server data, application server data, and application level data. Web server data corresponds to the user logs collected at the Web server. Typical data collected at a Web server includes IP addresses, page references, and access times of the users, and this data is the main input to the present research. Assessing Web site usability from server log files is a difficult task, and various methods and techniques for assessing Web usability have been presented. This research work concentrates on applying Web usage mining techniques to improve Web navigation. In particular, it focuses on discovering the anticipated usage patterns of websites from the server log files and on saving Web developers time by updating the links of a website in an automated manner.
These models alone, however, are not enough to recommend Web pages correctly, so a third model, Conceptual Prediction, is introduced. This model captures Web usage knowledge in a semantic network, which we refer to as TermNavNet. The model is motivated by Markov models, a well-known kind of probabilistic model with a remarkable ability to predict events by learning patterns in sequential data. The idea of Markov models, applied to FWAP, is therefore very useful for the Web-page recommender system.
Web usage mining is performed to extract sessions from web page requests. The sequence of pages requested by a user during a particular visit to the publisher's web site is considered one user session. The page requests are recorded in the web server log, but a mechanism is needed to group these requests into sessions. A request is assigned to a particular session using a unique identifier given to the client browser. The ID is assigned when the user makes the first request and can be carried in a cookie or in dynamically generated hyperlinks. On subsequent page requests, the client returns this ID to the server. In this way, once the visit has finished, the entire user session can be reconstructed with the details of the user's activities. A fixed period of inactivity, for example 30 minutes, is used to close the session when the user is idle.
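The grouping described above can be sketched as follows. This is a minimal illustration, assuming each log record has already been reduced to a (client ID, timestamp, URL) tuple; the function name `sessionize`, the record layout, and the sample requests are hypothetical, while the 30-minute idle timeout comes from the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # idle period that closes a session


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (client_id, timestamp, url) requests into per-client sessions."""
    # Collect the interleaved log records by client identifier.
    by_client = defaultdict(list)
    for client_id, timestamp, url in requests:
        by_client[client_id].append((timestamp, url))

    sessions = []
    for client_id, hits in by_client.items():
        hits.sort(key=lambda hit: hit[0])  # order each client's requests by time
        current = [hits[0][1]]
        last_seen = hits[0][0]
        for timestamp, url in hits[1:]:
            # A gap longer than the timeout closes the running session.
            if timestamp - last_seen > timeout:
                sessions.append((client_id, current))
                current = []
            current.append(url)
            last_seen = timestamp
        sessions.append((client_id, current))
    return sessions


# Hypothetical requests: client "c1" returns after a 45-minute pause.
t0 = datetime(2024, 1, 1, 9, 0)
requests = [
    ("c1", t0, "/home"),
    ("c2", t0 + timedelta(minutes=1), "/home"),
    ("c1", t0 + timedelta(minutes=5), "/news"),
    ("c1", t0 + timedelta(minutes=50), "/home"),
]
sessions = sessionize(requests)
print(sessions)
```

The timeout is measured from the previous request rather than from the start of the session, which matches the "user is idle" wording; a cap on total session duration would need a separate check.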
Website usage easier and faster. Web log file formats are usually designed for debugging purposes; therefore, web accesses are recorded in the order in which they arrive. Due to the stateless nature of HTTP (i.e., each request is handled independently of the others), web log records for a single user do not necessarily appear contiguously, since they may be interleaved with records from other users. Moreover, a separate record is written to the web log file for each page component, such as an image, a cascading style sheet file, an HTML file, or a JavaScript file. For web usage mining purposes, only the elements of interest are extracted from the web log file.
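One common way to keep only the records of interest is to drop requests for embedded page components before mining. The sketch below assumes that components can be recognized by file extension; the extension list and the names `is_page_request` and `clean_log` are illustrative choices, not something prescribed by the text.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed set of extensions that mark embedded components rather than pages.
RESOURCE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".ico", ".css", ".js"}


def is_page_request(url):
    """Return True when the URL does not look like an embedded resource."""
    path = urlsplit(url).path  # drop any query string or fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension not in RESOURCE_EXTENSIONS


def clean_log(urls):
    """Keep only the requests that correspond to page views."""
    return [url for url in urls if is_page_request(url)]


# Hypothetical requested URLs taken from one page load.
raw = ["/index.html", "/static/site.css", "/img/logo.PNG?v=2", "/app.js", "/courses"]
cleaned = clean_log(raw)
print(cleaned)
```

Real cleaning steps often also filter on status code, request method, and known crawler user agents; those are left out here to keep the sketch short.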
Web usage mining is one of the most frequently applied areas of web mining. Its focus lies in analyzing users' behaviour on the web by exploring access logs, and its popularity is increasing at a fast pace, especially in e-services. Applications in these areas and in semantic web search have added to its acceptance and made it an indispensable part of computer and information sciences. Data such as user log files and resource requests, which web servers maintain, form the core input of web usage mining. The analysis yields users' browsing patterns, which are used for targeted advertising, web design improvement, user satisfaction, and market analysis. Most web service providers have recognized its value for retaining their users.
Given a set of frequently viewed term patterns, namely FVTP, the WebNav is generated by populating the CPM schema with FVTP. The CPM schema is designed using the formal ontology web language, RDF. An algorithm is also developed to perform this task. The transition probabilities are updated using the first-order or second-order probability formula, depending on the order of the applied CPM. Thus, a 1st- or 2nd-order WebNav is obtained by using the 1st- or 2nd-order CPM, respectively. For a given current Web-page, or a combination of the current and previous Web-pages, the next Web-pages are recommended differently depending on which knowledge representation model and which order of CPM are used, as mentioned earlier. These recommendation methods make use of the domain knowledge and the prediction model, through two of the three models, to predict the next pages, with probabilities, for a given Web user based on his or her current Web-page navigation state. Overall, the new system is fully automated. The knowledge base implementation has alleviated the new-page problem described previously. This technique yields better performance than the existing Web usage based Web-page recommendation systems.
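To make the first-order case concrete, the sketch below estimates transition probabilities as the count of observed transitions from one page to the next divided by the count of all transitions leaving that page, and then ranks candidate next pages. It works on plain page identifiers, not on term patterns, and does not reproduce the CPM schema or the ontology; the function names and the sample sessions are hypothetical.

```python
from collections import Counter, defaultdict


def first_order_transitions(sessions):
    """Estimate P(next page | current page) from observed page sequences."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Count every consecutive (current, following) pair in the session.
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1

    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probabilities[current] = {page: n / total for page, n in followers.items()}
    return probabilities


def recommend(probabilities, current, top_n=2):
    """Return up to top_n (page, probability) pairs, most probable first."""
    followers = probabilities.get(current, {})
    ranked = sorted(followers.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical navigation sessions.
sessions = [
    ["home", "courses", "enrol"],
    ["home", "courses", "faq"],
    ["home", "news"],
    ["courses", "enrol"],
]
model = first_order_transitions(sessions)
suggestions = recommend(model, "courses")
print(suggestions)
```

A second-order variant would key the counts on the pair (previous page, current page) instead of the current page alone, at the cost of sparser statistics.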
Abstract - The internet now plays a very important role in day-to-day life and has become a vital part of it. As the internet grows day by day, the number of users is expanding at an even greater rate. Users spend a lot of time on the internet, and how they spend it depends on their individual behavior. The internet provides a huge amount of information, from which knowledge is extracted for the users. This extraction demands new logic and methods. Data mining techniques and applications can be used in web-based applications to perform this job, which is known as web mining. Web-based mining, or web usage mining, is currently one of the trending topics. When a user uses the internet or visits web pages, the associated information is stored in the server log files. Using these server log files, human behavior can be predicted. This paper focuses on web-based mining and on how it can be used to predict human behavior from server log files. The paper presents some of the techniques and methods associated with web mining.
IJEDR1702278 International Journal of Engineering Development and Research (www.ijedr.org) 1770
A biclustering framework using a genetic algorithm for web usage mining is described in [4-7]. Researchers have proposed a biclustering approach with a genetic algorithm for optimal web page category. Three different fitness functions based on the mean squared residue score are used to study the performance of the proposed biclustering method. An improved fuzzy C-means clustering of web usage data with a genetic-algorithm-based approach has also been reported. This method is scalable and can be coupled with a scalable clustering algorithm to address the large-scale clustering problems in web data mining. Researchers have also proposed a recommender system using a GA K-means clustering algorithm for the online shopping market. The GA K-means clustering approach improves segmentation performance in comparison to other typical clustering algorithms.
This research paper focuses on web usage mining, specifically on web data and the use of a clustering approach. During the training phase, clustering converts the nonlinear statistical relationships among high-dimensional data into simple geometrical relationships on a low-dimensional display.
Web usage mining is the application of data mining techniques for discovering interesting usage patterns from web usage data, in order to understand and better serve the needs of web-based applications. Usage data captures the identity and origin of web users along with their browsing behavior at a web site. Web usage mining tries to make sense of the data generated by the web surfer's sessions or behaviour. Web usage mining can be classified further depending on the kind of usage data considered. The first kind is web server data, in which user logs are collected by the web server and typically include the IP address, page reference, and access time. The second is application server data, in which various kinds of business events are tracked and logged in application server logs. The third is application level data, in which new kinds of events can be defined in an application and logging can be turned on for them, generating histories of these specially defined events.
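As a small illustration of the web server data described above, the sketch below pulls the IP address, page reference, and access time out of a single log line. It assumes the server writes the widely used Common Log Format, which the text does not specify; the pattern, the function name `parse_line`, and the sample line are illustrative.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "method url protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the fields of interest from one log line, or None if it is malformed."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match.group("ip"),
        # Timestamps look like 10/Oct/2023:13:55:36 +0000
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "url": match.group("url"),
        "status": int(match.group("status")),
    }


# Hypothetical log line using a documentation-reserved IP address.
sample = '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326'
record = parse_line(sample)
print(record)
```

Servers configured with a combined or custom format add fields such as referrer and user agent, so the pattern would need to be extended to match the actual configuration.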
Statistics are constructed here to extract useful information such as the hit count of each page of the site, which depends on the content of the page, the number of valid hits on the page, and other information. Figure 2 shows the structure of weblog data. As the information on the WWW is growing exponentially, finding relevant information according to the user’s interests and needs is a challenging issue. The user is presented with a number of URLs in which to locate what he or she requires, so more time and effort are needed to obtain the required information. Finding the most desired web page is the solution to this problem. In many commercial applications, website attractiveness is a crucial feature from the business perspective, so the website structure, i.e. the organization of the web pages, needs to be improved. Web usage mining extracts knowledge from users’ behavior and helps the website designer modify the website design. An approach has been presented for adaptive websites that automatically improves the organization of the web structure by mining web usage logs from the web server; its authors presented a cluster mining algorithm known as PageGather for this purpose. Web usage mining for web page access frequency consists mainly of three steps: preprocessing, pattern analysis, and total hits calculation. The preprocessing step mainly consists of data cleaning, user identification, and session identification. The pattern discovery step is used to identify the interesting patterns in the web usage data. When users browse a website, they search for the desired information, which is placed on particular pages. If the information is of common interest to different users, those pages will be accessed with a high frequency. Thus, to identify those pages, Pa
Web usage mining attempts to discover useful knowledge from the secondary data obtained from the interactions of users with the Web. Web usage mining has become critical for effective Web site management, the creation of adaptive Web sites, business and support services, personalization, network traffic flow analysis, and so on. The Web site under study is part of a nonprofit organization that does not sell any products. It was crucial to understand who the users were, what they looked at, and how their interests changed with time. One of the promising approaches for achieving this is web usage mining, which mines web logs for user models and recommendations. Web usage mining algorithms have been widely used for modeling users' web navigation behavior. In this study we advance a model for mining users' navigation patterns.
This is the final stage of Web usage analysis. Pattern analysis extracts knowledge from the discovered patterns by retaining the interesting ones and eliminating the irrelevant ones. It involves the validation and interpretation of the mined patterns. Validation is used to remove the irrelevant patterns and to extract the interesting patterns from the output of the pattern discovery process. The output is in mathematical form, which is not suitable for direct human interpretation, so visualization techniques are used to interpret the results. The most general ways of analyzing user access patterns are to use a knowledge query mechanism such as SQL on a database, or to use data cubes to perform OLAP operations. Visualization techniques, such as graphing the patterns, are used for easier interpretation of the results.
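The SQL-style query mechanism mentioned above can be sketched with an in-memory SQLite database. The table name `hits`, its two columns, the function name `top_pages`, and the sample rows are all hypothetical; the query simply ranks pages by hit count and reports how many distinct sessions reached each one.

```python
import sqlite3


def top_pages(rows, limit=3):
    """Rank pages by hit count using a SQL query over (session_id, url) rows."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE hits (session_id TEXT, url TEXT)")
    connection.executemany("INSERT INTO hits VALUES (?, ?)", rows)
    query = """
        SELECT url,
               COUNT(*) AS hit_count,
               COUNT(DISTINCT session_id) AS session_count
        FROM hits
        GROUP BY url
        ORDER BY hit_count DESC, url ASC
        LIMIT ?
    """
    result = connection.execute(query, (limit,)).fetchall()
    connection.close()
    return result


# Hypothetical preprocessed page views, one row per request.
rows = [
    ("s1", "/home"),
    ("s1", "/courses"),
    ("s2", "/home"),
    ("s2", "/home"),
    ("s3", "/faq"),
]
summary = top_pages(rows)
print(summary)
```

The resulting rows are the kind of tabular output that can then be passed to a charting tool for the visualization step.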
Web log files are nowadays the greatest source of knowledge, since they keep all the information about users' interaction with the web. Through web usage mining, this interaction provides us with the usage patterns of the users. These files contain all the information about visitors to the web, which is used as the input for analysis. After preprocessing, the files are converted to the required formats so that Web Usage Mining (WUM) techniques can be applied to the logs. Web usage mining gives us the details of user patterns. In this study we discover different behavior patterns from the web proxy server log file of an educational organization using web usage mining techniques. The results are based on the interest of users in educational websites.
In the last few decades the web has become an information hub for users. Analysis of users' behavior is therefore becoming more and more important for e-commerce companies that want to provide better services to customers and visitors. Web usage mining is a field of study in which users' activity is analyzed and processed to generate useful patterns. Because log files contain irrelevant data, preprocessing is considered an essential step in web usage mining. This paper discusses the different steps of preprocessing: data cleaning, user identification, session identification, and path completion. Web usage mining poses various challenging problems for the preprocessing of log data. The high dimensionality and large volume of the data result in high computational complexity of the mining process, so there is a need to compress the data without losing essential information about users' behavior. In addition, preprocessing techniques and the proposed heuristics face relevancy issues, and no robust techniques exist to solve them. For