Top PDF Preprocessing Techniques in Web Usage Mining: A Survey

Preprocessing Techniques in Web Usage Mining: A Survey

With the enormous growth of the web, a huge volume of structured, unstructured, semi-structured, heterogeneous, dynamic, distributed and high-dimensional data is available on web pages, so accessing relevant information quickly is a challenging task today. Several issues, such as multimedia data, scalability and temporal issues, arise from the dynamic and diverse nature of this data. Interacting with the web also raises problems such as finding useful information, personalizing information, learning about consumers or individual users, and creating new knowledge from the information available on the web [1,2]. To solve these problems, many techniques from Information Retrieval (IR), databases, Natural Language Processing (NLP) and web mining are used directly or indirectly [4, 5]. Among them, web mining has emerged over the last few decades as the most popular and effective technique for overcoming these problems. Web mining is an application of data mining that uncovers relevant, hidden information on the web. It can be categorized into three classes based on the content, structure and usage of web pages, as shown in Figure 1 [1, 27].

A Survey of Issues and Techniques of Web Usage Mining

role in the field of computer science. The World Wide Web is an interactive and popular platform for transferring information. Web Usage Mining (WUM) is a type of web mining and an application of data mining techniques. It has become helpful for website management, personalisation, etc. Usage data captures the origin of web users along with their browsing behaviour at a website; WUM mines weblog records to discover user access patterns from web pages. The weblog contains the information about users that is needed to find access patterns. Web mining helps to gather information from customers who are visiting the site. Nowadays there are various issues related to log files, e.g. data cleaning, session identification and user identification. In this survey paper we discuss the phases of WUM, the architecture of WUM, issues related to WUM, and future directions.

A Survey on Web Structure and Web Usage Mining Algorithms for Web Applications

After surveying web structure mining and web usage mining, we identify HITS as the main algorithm to follow for the further development of web applications. This paper describes several proposed web structure mining algorithms, such as the PageRank algorithm, the weighted content PageRank algorithm (WCPR) and HITS. We analyze their strengths and limitations and provide a comparison among them, so the paper may be used as a reference by researchers when deciding which algorithm is suitable. We also try to overcome the problems that particular algorithms have. This paper gives an insight into the possibility of merging data mining techniques with Web application analysis to achieve a synergetic effect between Web usage mining and its utilization in Web application evaluation. The paper first describes the data preprocessing and pattern discovery steps, with pages ranked based upon visits using weighted page content ranking and HITS. User clustering tries to discover groups of users having similar browsing patterns. Such knowledge is especially useful in e-commerce applications for inferring user demographics in order to perform market segmentation, while in the evaluation of Web site quality and the development of web applications it is valuable for providing personalized Web content to users. The authors conclude that HITS is the best choice for further research on web applications.
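
Since the excerpt names HITS without showing it, the following is a minimal Python sketch of the standard hub/authority iteration. The function name `hits`, the dictionary-based link graph and the fixed iteration count are illustrative assumptions and are not taken from the surveyed paper.

```python
import math


def hits(graph, iterations=50):
    """Return (hub, authority) scores.

    graph is a hypothetical adjacency mapping: page -> list of pages it links to.
    """
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking to a page.
        auth = {p: 0.0 for p in pages}
        for p, targets in graph.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub update: sum the authority scores of all pages a page links to.
        hub = {p: sum(auth[q] for q in graph.get(p, [])) for p in pages}
        # Normalize both score vectors to unit length so values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth
```

In a graph where pages `a` and `b` both link to `c`, `c` ends up as the only authority and `a` and `b` share the hub score equally.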

Web Usage Mining: A Survey

From the literature survey, we conclude that Web usage mining plays a very important role for web site owners. Websites are the most important means of advertisement. Web Usage Mining helps in extracting user access patterns, which can help website owners in a number of ways, such as customization of web data, design support and caching. The results of Web Usage Mining depend greatly on the pre-processing stage, so much care should be taken while performing this step, and more efficient pre-processing methods need to be developed. Further, to protect web log file data from various attacks, heuristic techniques should be used, such as a combination of genetic algorithms with neural networks.

A Survey on Preprocessing of Web Log File in Web Usage Mining to Improve the Quality of Data

information. In the internet era, web applications are increasing at enormous speed and the number of web users is growing exponentially. As the number of users grows, web site publishers are increasing the information they offer to attract and satisfy users. It is possible to trace users’ behaviour and interactions with web applications through the web server log file, which is stored as a plain-text (.txt) file. The data stored in the web log file contains a large amount of erroneous, incomplete and unnecessary information. Because of this large amount of irrelevant data, an original log file cannot be used directly in web usage mining. Preprocessing is therefore applied to improve the quality and efficiency of the web log file. Different techniques are applied in preprocessing, namely data cleaning, data fusion and data integration. In this paper we survey different preprocessing techniques to identify the issues in web log files and to improve web usage mining preprocessing for pattern mining and analysis.

A Survey on Web Usage Mining Preprocessing

ABSTRACT: Web mining aims to discover and extract useful information. In the internet era, web applications are increasing at enormous speed and the number of web users is growing exponentially. As the number of users grows, web site publishers are increasing the information they offer to attract and satisfy users. It is possible to trace users’ behaviour and interactions with web applications through the web server log file, which is stored as a plain-text (.txt) file. The data stored in the web log file contains a large amount of erroneous, incomplete and unnecessary information. Because of this large amount of irrelevant data, an original log file cannot be used directly in web usage mining. Preprocessing is therefore applied to improve the quality and efficiency of the web log file. Different techniques are applied in preprocessing, namely data cleaning, data fusion and data integration. In this paper we survey different preprocessing techniques to identify the issues in web log files and to improve web usage mining preprocessing for pattern mining and analysis.

Model Survey on Web Usage Mining and Web Log Mining

Data preprocessing takes web log data as input, processes it and outputs reliable data. The goal of preprocessing is to choose the primary features, remove unwanted information and finally transform the raw data into sessions. To do this, data preprocessing is divided into sub-processes known as data cleaning, user identification and session identification [12] [2].
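
The user identification and session identification steps named above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record fields `ip`, `agent` and `time`, the (IP, user agent) identity heuristic and the 30-minute inactivity timeout are common conventions chosen here, not details given in the excerpt.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def identify_users(records):
    """Group log records by (ip, agent); an assumed heuristic for 'same user'."""
    users = defaultdict(list)
    for rec in records:
        users[(rec["ip"], rec["agent"])].append(rec)
    return users


def sessionize(user_records, timeout=timedelta(minutes=30)):
    """Split one user's records into sessions at gaps longer than `timeout`."""
    sessions = []
    for rec in sorted(user_records, key=lambda r: r["time"]):
        # Continue the current session only if the gap to the previous
        # request of this user does not exceed the timeout.
        if sessions and rec["time"] - sessions[-1][-1]["time"] <= timeout:
            sessions[-1].append(rec)
        else:
            sessions.append([rec])
    return sessions
```

Three requests from the same user at 10:00, 10:05 and 10:50 would yield two sessions, because the 45-minute gap before the last request exceeds the assumed timeout.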

A Survey on Methods used in Web Usage Mining

Path Completion: Another critical step in data preprocessing is path completion. There are several reasons for path incompleteness: for instance, the local cache, the agent cache, the “post” technique and the browser’s “back” button can result in some important accesses not being recorded in the access log file, so the number of URLs recorded in the log may be less than the real number. Local caching and proxy servers also complicate path completion, since users can access pages from the local cache or the proxy server’s cache without leaving any record in the server’s access log; as a result, the user access paths are incompletely
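
One common way to approximate path completion is a referrer-based backtracking heuristic, sketched below in Python. It assumes each logged request carries a referrer and that a referrer appearing earlier in the session means the user pressed “back” through cached pages; the function name `complete_path` and the input shape are hypothetical, and the excerpt does not prescribe this particular method.

```python
def complete_path(requests):
    """Rebuild a session path from (url, referrer) pairs.

    When a request's referrer is not the most recent page but does occur
    earlier in the path, the cached pages the user presumably stepped back
    through are re-inserted before the new request.
    """
    path = []
    for url, referrer in requests:
        if path and referrer and referrer != path[-1] and referrer in path:
            # Position of the last occurrence of the referrer in the path so far.
            idx = len(path) - 1 - path[::-1].index(referrer)
            # Pages revisited from cache while backing up to the referrer.
            path.extend(reversed(path[idx:-1]))
        path.append(url)
    return path


# A -> B -> C, then D requested with referrer A: the user went back via B to A.
print(complete_path([("A", None), ("B", "A"), ("C", "B"), ("D", "A")]))
# ['A', 'B', 'C', 'B', 'A', 'D']
```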

A Survey on Web Personalization of Web Usage Mining

The main purpose of Web Usage Mining is the discovery of user access patterns from user access logs, referrer logs, user registration logs, etc. Pattern discovery is performed only after cleaning the data and identifying user transactions and sessions from the access logs. The analysis of the pre-processed data is very beneficial to organizations doing business over the web [8]. The tools used for this process use techniques based on AI, data mining algorithms, psychology and information theory. The different systems developed for Web Usage Mining have introduced different algorithms for finding maximal forward references and large reference sequences, which can be used to analyze the traversal path of a user. The mining algorithms that can be applied to the preprocessed data include path analysis, association rules, sequential patterns, clustering and classification. Which mining techniques to use depends entirely on the requirements of the analyst. Association Rules: This technique is generally applied to a database of transactions, each consisting of a set of items. A rule implies some kind of association between the items in the database, and it is important to discover the associations and correlations between these sets of items. In a web data set, a transaction consists of the URLs visited by a client on the web site. It is very important to define the support parameter when applying the association rule technique to the transactions. This helps in reducing the unnecessary
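
To make the support parameter concrete, here is a small Python sketch that computes the support of a set of URLs and the confidence of a rule over sessions treated as transactions. The function names and the example URLs are illustrative assumptions, not part of the excerpt.

```python
def support(sessions, itemset):
    """Fraction of sessions that contain every URL in `itemset`."""
    itemset = set(itemset)
    if not sessions:
        return 0.0
    return sum(itemset <= set(s) for s in sessions) / len(sessions)


def confidence(sessions, antecedent, consequent):
    """Estimate P(consequent | antecedent) over the sessions."""
    base = support(sessions, antecedent)
    if base == 0:
        return 0.0
    return support(sessions, set(antecedent) | set(consequent)) / base


# Hypothetical sessions, each reduced to the set of URLs visited.
sessions = [
    {"/home", "/products"},
    {"/home", "/products", "/cart"},
    {"/home"},
    {"/products", "/cart"},
]
print(support(sessions, {"/home", "/products"}))   # 0.5
print(confidence(sessions, {"/products"}, {"/cart"}))
```

A minimum support threshold then discards itemsets that occur in too few sessions before any rules are generated from them.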

Advanced Preprocessing Techniques used in Web Mining: A Study

Web-based applications are becoming increasingly popular among users across the world. User interactions with these applications are tracked in web log files maintained by the web server, and web usage mining (WUM) is used to analyze them. Web usage mining is the process of extracting user patterns from web usage. In web usage mining, preprocessing plays a key role, since a large amount of irrelevant information is present in the web. It is used to improve the quality and efficiency of the data. A number of techniques are available at the preprocessing level of WUM, such as data cleaning, data filtering and data integration. In this paper, we present a survey of the various preprocessing techniques that have been used to improve efficiency.

Web Log Analyzer for Semantic Web Mining

In this study, researchers presented a survey of the use of Web mining for Web personalization. More specifically, they introduce the modules that comprise a Web personalization system, emphasizing the Web usage mining module. A review of the most common methods and of the technical issues that occur is given, along with a brief overview of the most popular tools and applications available from software vendors. Moreover, the most important research initiatives in the Web usage mining and personalization area are presented. The researchers define Web personalization as the process of customizing the content and structure of a Web site to the specific, individual needs of each user, without requiring them to ask for it explicitly. This can be achieved by taking advantage of the user’s navigational behavior, as revealed through the processing of the Web usage logs, as well as the user’s characteristics and interests. They also state that the overall process of Web personalization consists of five modules, namely: user profiling, log analysis and Web usage mining, information acquisition, content management and Web site publishing. The main component of a Web personalization system is the usage miner. Log analysis and Web usage mining is the procedure in which the information stored in the Web server logs is processed by applying statistical and data mining techniques, such as clustering, association rule discovery, classification and sequential pattern discovery, in order to reveal useful patterns that can be further analyzed. Such patterns differ according to the method and the input data used, and can be user and page clusters, usage patterns, and correlations between user groups and Web pages. Those patterns can then be stored in a database or a data cube, and query mechanisms or OLAP operations can be performed in combination with visualization techniques. The most important phase of Web

A Comprehensive Survey on Data Preprocessing Methods in Web Usage Minning

Abstract: Web usage mining is the application of data mining techniques to extract information about users’ interests from web server log files. Web usage mining is widely used by companies to analyze customers’ interests and predict the future of their business. It is used in various fields such as E-Business, E-Commerce and E-learning. Web usage mining consists of three phases: Data Preprocessing, Pattern Discovery and Pattern Analysis. Data Preprocessing is an essential, preliminary step in web mining that enforces quality in the input data. The raw data from the web server log file is preprocessed to eliminate noisy, vague and redundant data for efficient mining. It involves different phases, namely Field Extraction and Data Cleaning, User Identification, Session Identification, Path Completion and Transaction Identification. In this paper, we discuss various research carried out on Data Cleaning and the various attributes considered in the cleaning process.
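
As an illustration of the field extraction and data cleaning phase, the Python sketch below parses lines in the Common Log Format and keeps only successful page requests. The regular expression, the list of ignored file suffixes and the choice to keep only GET requests with 2xx status codes are assumptions made for this example; the excerpt does not specify which attributes are used for cleaning.

```python
import re

# Assumed input: Common Log Format, e.g.
# 10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Embedded resources that usually do not reflect an explicit user request.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def parse_line(line):
    """Field extraction: return a dict of named fields, or None if malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_relevant(rec):
    """Data cleaning rule: successful GET requests for non-resource URLs."""
    path = rec["url"].split("?", 1)[0].lower()
    return (
        rec["method"] == "GET"
        and rec["status"].startswith("2")
        and not path.endswith(IGNORED_SUFFIXES)
    )


def clean(lines):
    """Parse all lines and drop malformed or irrelevant entries."""
    parsed = (parse_line(line) for line in lines)
    return [rec for rec in parsed if rec and is_relevant(rec)]
```

Robot requests are another frequent cleaning target; filtering them would need an additional rule, for example on the user agent field of an extended log format.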

Review of Web Recommendation System and Its Techniques: Future Road Map

Web usage mining plays a main role in identifying the web pages that end users require from the web server. End users generally want to find the right web pages within a short time, so methods are needed to predict the correct web pages for them. Many techniques have been applied to the analysis of web log data, but researchers have been particularly attracted by ARM. Preprocessing is the basis on which Web Usage Mining works. The importance of preprocessing methods is discussed in this work, and various techniques are compared and identified. Complete preprocessing techniques for preparing web log files for the extraction of user patterns are proposed in [1]. A data cleaning algorithm removes irrelevant entries from the web log file, and a filtering algorithm discards uninteresting attributes. Users and sessions are then identified. Sanjay Gandhi et al. also proposed a complete set of data preprocessing techniques. In the preprocessing stage, log data collected from different data sources is prepared before meaningful patterns are searched for. Web usage mining derives valuable information from secondary data obtained from user access logs. It is important for web site organization, improving business services, personalization, web traffic analysis and web recommendation. Web usage mining is divided into three different phases. Large volumes of web traffic data are processed and web mining techniques are applied to discover interesting, useful patterns from traffic analysis.

AN EFFICIENT STRATEGY OF PREPROCESSING FOR OBTAINING KNOWLEDGE FROM WEB USAGE DATA

The World Wide Web (WWW) is a collection of a huge amount of Web Usage Data. The process of extracting relevant data from Web Usage Data is known as Web usage mining. This data must be assembled into a consistent and comprehensive view in order to be used in the further steps of Web Usage Mining. However, most of this data is often not of much interest to most users. Because of this abundance, it has become essential to find ways of extracting relevant data from this ocean of data; hence much research has been done, and researchers have proposed a significant and unifying area of research known as Web mining. As in most data mining techniques, data preprocessing involves removing irrelevant and inconsistent data, but good data cannot be obtained without proper preprocessing techniques. In this paper we focus on complete preprocessing techniques, such as data fusion, data cleaning, user identification, session identification, data formatting and summarization. These activities improve the quality of the data while reducing its quantity. This methodology reduces the size of the data by 75% to 85% of its original size in Web Usage Mining.

Web Usage Mining for Predicting Users’ Browsing Behaviors by using FPCM Clustering

In this internet era, web sites are a useful source of information in day-to-day activities, and the World Wide Web is developing rapidly in its volume of traffic and in the size and complexity of web sites. According to the August 2010 Web Server Survey by Netcraft, there are 213,458,815 active sites. Web mining is the application of data mining, artificial intelligence, chart technology and so on to web data; it traces users’ visiting behaviors and extracts their interests using patterns. Because of its direct application in e-commerce, Web analytics, e-learning, information retrieval, etc., web mining has become one of the important areas in computer and information science. Web Usage Mining applies mining methods to log data to extract the behavior of users, which is used in various applications such as personalized services, adaptive web sites, customer profiling, prefetching and creating attractive web sites.

A Novel Preprocessing Technique for Session Construction using Propositional DAGs

This paper continues the line of research on Web access log analysis, which analyzes the patterns of web site usage and the features of users’ behavior. Normal log data is very noisy and unclear, and it is vital to preprocess it for an efficient web usage mining process. Preprocessing comprises three phases: data cleaning, user identification and session construction. Session construction is vital; numerous real-world problems can be modeled as traversals on a graph, and mining these traversals provides what the preprocessing phase requires. However, existing works consider only traversals on unweighted graphs. This paper generalizes this to the case where the vertices of the graph are given weights to reflect their significance. The proposed method constructs sessions as Propositional Directed Acyclic Graphs (PDAGs) that contain pages with calculated weights. We identify a new property called simple-negation, which is an implicit restriction of all Negation Normal Forms (NNFs) and Binary Decision Diagrams (BDDs). The removal of this restriction leads to Propositional Directed Acyclic Graphs (PDAGs), a more general family of graph-based languages for representing Boolean functions or propositional theories. This will help site administrators to find the pages that interest users and to redesign their web pages. After weighting each page according to browsing time, a PDAG structure is constructed for each user session. Existing systems have a problem in learning with Boolean functions, which the proposed method can overcome.

Web Log Based Analysis of User’s Browsing Behavior Ashwini ladekar Pooja Pawar

Web usage mining has three stages: preprocessing, pattern discovery and pattern analysis. One algorithm that is very simple to use and easy to implement is the Apriori algorithm. Web usage mining refers to the automatic discovery and analysis of patterns in user access streams and associated data collected or generated as a result of user interactions with Web resources on one or more Web sites. The goal is to capture, model and analyze the behavioral patterns and profiles of users interacting with a Web site.
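
For readers unfamiliar with Apriori, the following Python sketch shows its level-wise search for frequent itemsets, with each session treated as a transaction of visited pages. The function name, the input shape and the `min_support` value used in the example are assumptions for illustration; the excerpt does not describe how the authors applied the algorithm.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate and keep those meeting the support threshold.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        result.update(level)
        k += 1
    return result


# Hypothetical page sets per session.
pages = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
frequent_sets = apriori(pages, min_support=0.5)
```

With these five sessions and a threshold of 0.5, all single pages and all pairs are frequent, while the triple {A, B, C} (support 0.4) is not.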

A Survey of Web Mining and Various Web Mining Techniques

So far we have reviewed the research topic of web mining, which focuses on the content, usage and structure of the web. Web structure mining (WSM) deals mainly with HITS and PageRank. Web mining is a mining technique that automatically extracts information from web documents. The PageRank algorithm is used in WSM to rank related pages. In general, web mining retrieves data from websites for users in an efficient manner, but we found some problems in the HITS and PageRank algorithms; solving them is our proposed future work.
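
As a reference point for the ranking step mentioned above, here is a minimal Python sketch of PageRank computed by power iteration. The damping factor of 0.85, the fixed iteration count and the even redistribution of rank from pages without outgoing links are conventional choices assumed here, not details from the excerpt.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Return a rank per page.

    graph is a hypothetical adjacency mapping: page -> list of pages it links to.
    """
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not graph.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank in equal shares along its outgoing links.
        for p, targets in graph.items():
            for q in targets:
                new[q] += damping * rank[p] / len(targets)
        rank = new
    return rank
```

For a graph in which `a` and `b` link to `c` and `c` links back to `a`, the ranks sum to 1 and order as `c`, then `a`, then `b`.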

Enhancing Web Navigation Usability Using Web Usage Mining Techniques

Delivering efficient service through a web site makes it essential, in the redesign stage, to take into account the behavior of users, which can be studied by means of a web log file that partially records information about user visits. The reconstruction of all the sequences of pages visited by users who browse a web site is known as the web sessionization problem, and it has been formulated by means of an integer programming model. However, because a web log can accumulate a large amount of information and sessions must be reconstructed over a period of weeks or months, solving this problem requires a long computational processing time. A heuristic approach based on simulated annealing is useful for the sessionization problem. Using this approach [11], [12], it has been possible to reduce the processing time by a factor of up to 166 compared to the time required by the integer programming model. Furthermore, the metaheuristic solution finds new optimum values, which achieve increases on the order of 17% in the best cases.

A Novel Preprocessing Method for Web Usage Mining based on Hierarchical Clustering

Web Usage Mining is the application of data mining procedures to extract usage patterns from Web log file data. There are three phases in Web usage mining: preprocessing, pattern discovery and pattern analysis. Several preprocessing tasks must be performed on the data collected from server logs before data mining algorithms can be applied. This serves to determine the value of specific clients, cross-marketing strategies across products, the effectiveness of promotional efforts, and so on. Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. It is important for ensuring the effectiveness of web log mining, and its result has a direct influence on the choice of mining algorithm. In this research, data preprocessing algorithms are discussed for database-driven applications such as customer relationship management and for rule-based applications. The preprocessed Web log file is suitable for the discovery and analysis of useful information, referred to as web mining. This research summarizes efficient and complete preprocessing results obtained before actual mining is performed.