Enhancing e-business Through Web Data Mining


Amy Shi1, Allen Long2, and David Newcomb3

1 Accurate Business Solutions, Courtyard, Denmark Street, Wokingham, RG 40 2AZ, U.K.


2 School of SCISM, South Bank University, London, SE1, 0AA, U.K.


3 BigSoft Limited, 40 Belmond Road, Reading, RG30 2UU, U.K.


Abstract. Today, there is more interest than ever in “e.” The Internet, e-commerce and e-business undoubtedly hold an important key to every organisation’s future. This paper introduces a web data mining solution that helps e-businesses discover hidden patterns in their customer and web data and derive business strategies from them. A three-layer virtual e-business framework is proposed, together with web mining techniques to personalise e-services, increase cross-selling, and improve customer relationship management. The specific characteristics of web data, compared with the data that general data mining algorithms work on, are discussed too.

1. Introduction

Today, there is more interest, more discussion and more hype than ever about “e.” The Internet, e-commerce and e-business undoubtedly hold an important key to every organisation’s future and success, offering tremendous opportunities and worldwide markets. Nobody can afford to let the competition pass by, yet an e-business started in haste is bound to fail: e-business projects and dot-com companies unfortunately have a very high rate of failure, owing to a poor understanding of the new rules of the e-economy environment.

To differentiate themselves in the Internet economy, winning enterprises are realizing that e-business is much more than a simple buy/sell transaction; the right e-strategies are the key to increasing competitiveness in the marketplace. However, even though the principles that made organisations successful yesterday are still the best foundation from which to start today, implementing an e-strategy is not as easy as simply adding an “e” in front of the current business strategy.

This paper aims to introduce a data mining solution that lets e-businesses discover the hidden insights in their business and web data. This will help e-organisations to make intelligent business strategies and improve their customer relationship management. A three-layer virtual e-business framework is proposed in Section 2. Section 3 discusses how to enhance e-business through web data mining.


2. A Virtual Framework of e-Business

2.1 Data Involved in e-business

Generally, the business data involved in e-business is massive. It mostly contains customer information, purchase information, product/service information, supplier information, security and priority information, and management reports (including standing and statistical analyses of production, sales, finance, etc.), as well as online web access data. Fig. 1 shows an example of the basic kinds of data involved in e-business; the content may vary with the type of e-business. In the figure, tables circled together, e.g. Customer, Contact, Product, Purchase, Payment and Web_log, are connected to each other. The management database contains all the information, reports and knowledge generated by an organisation for business management.

Fig. 1 An example of e-business data

This online e-business data is growing constantly. Effectively organising and managing the data is a fundamental task for all e-businesses. Advanced database/warehouse technology is required to handle data that is likely to be in different formats and distributed environments, while providing reasonably quick responses to customer queries and internal processing. Additionally, the data must be shared across the whole organisation under specific priority control policies, leading to a common data resource and processing platform.

2.2 Three-Layer Architecture

A new virtual e-business framework is structured as a three-layer architecture, i.e. customer service, data manager and business intelligence (BI) support, as shown in Fig. 2.


Fig. 2 A virtual e-business framework: the three-layer architecture

• Customer Service - External and Internal Navigation Platforms:

This layer is essential to any kind of e-business. The external web platform provides the major part of the communications and services offered to visitors and customers. Well-designed web pages should be easy to use, respond quickly, offer good-quality information in the right amount, give convenient access to customer-related data without returning, and guarantee security.

Back offices of the e-organisation, like customers, also work on the net via the intra-navigation platform. This is a common platform on which all functional departments share the same data resource and deliver processed results. For example, when a customer order is received and entered into the database by the sales department, it also involves the financial department to deal with the payment, the delivery department to arrange the shipping, and the customer service department to confirm the order. Every department delivers relevant records and modifies the data once a process is completed. Business reports and internal documents can also be shared through the platform, so that a marketing manager can quote figures from the finance manager’s reports to support the performance of a promotion campaign. The intra-platform improves the overall performance of the e-organisation by providing up-to-date and accurate information to every element.

• Data Manager

This layer is very important to the effectiveness of e-business, as it is in charge of managing all the e-business data discussed in Section 2.1. In fact, this layer acts as a bridge that links visitors/customers and the organisation together via data exchange. It requires advanced database/warehouse technology and a well-designed data structure.

This layer can significantly strengthen an e-organisation’s intra-process by using automatic function-oriented agents. For instance, the standard order-processing example discussed above can be handled by a sales agent that looks after all relevant data records. Fig. 3 shows how the sales agent (the black cartoon) can handle a standard sales procedure and modify related data records when a sub-task, such as credit confirmation or inventory check, is completed. In addition, the sales agent can also deal with some special cases. For example, if the publication of a new book is delayed, the agent is able to find the customers who have already ordered the book in the Purchase table and forward the information (customer’s name, IP address, order items, etc.) to every customer touch point, e.g. back offices and the call center. Therefore, when the customer visits the website or telephones the call center, the specific information is presented immediately (through the IP address) to explain the situation. This strengthens the intra-processing of the e-organisation, speeds up the response time and improves customer relationships.
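The delayed-book case can be sketched as a simple lookup, given here as a minimal illustration rather than the agent described in the paper. The column names (customer_id, name, ip, product_id, item) and the sample rows are assumptions made for the example; only the table names Customer and Purchase come from the text.

```python
import sqlite3

def customers_to_notify(conn, product_id):
    """Return (name, ip, item) for every customer who ordered the given product."""
    query = """
        SELECT c.name, c.ip, p.item
        FROM Purchase AS p
        JOIN Customer AS c ON c.customer_id = p.customer_id
        WHERE p.product_id = ?
        ORDER BY c.name
    """
    return conn.execute(query, (product_id,)).fetchall()

# Hypothetical in-memory data; the schema is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT, ip TEXT);
    CREATE TABLE Purchase (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                           product_id INTEGER, item TEXT);
    INSERT INTO Customer VALUES (1, 'Alice', '192.0.2.10'), (2, 'Bob', '192.0.2.20');
    INSERT INTO Purchase VALUES (100, 1, 7, 'New Book'), (101, 2, 8, 'Other Book');
""")

# Product 7 stands for the delayed book; the result would be forwarded
# to each customer touch point.
notices = customers_to_notify(conn, 7)
```

The returned rows carry exactly the fields the text mentions (name, IP address, order item), so a touch point could match an incoming visit by IP address.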

Fig. 3 An example of the sales agent

• Business Intelligence (BI) Support:

This layer doesn’t have to exist, but it may be just what separates the winners from the losers. Integrating BI tools, e.g. OLAP and data mining, into e-businesses to reveal hidden facts in business data has become widely accepted by e-organisations. Knowledge of customers’ behavior helps to improve customer relationships and shape business strategies. These techniques are discussed in Section 3.

3. Enhance e-business Through Web Data Mining

It has been a challenge for e-organisations to uncover patterns that reveal the hidden insights of their massive e-business data effectively, as the data is constantly growing. Data mining techniques are becoming more popular as a powerful BI tool to fill the increasing gap between data collection and exploration, helping e-businesses of all sizes to sift through the data in search of useful patterns.

3.1 A Glance at Web Data

Web data is the information recorded by the web server when a user visits the site. As an example, a file named access_log, as shown in Fig. 4, contains all the website “hit” information, such as the visitor’s IP address, the date and time (GMT + time difference), the requested pages, and a status code indicating whether the request completed or failed. Similarly, a file called error_log records error details, web server problems and possible intrusions.

Fig. 4 Example access_log and error_log files

The web data in access_log and error_log must be converted into database format so that data mining algorithms can be applied to it. Two tables, i.e. Web_log and Error_log, are used to collect the relevant data from the two files. Fig. 5 shows the table “Web_log”, which contains part of the data in the file access_log.
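As a rough sketch of this conversion step, the snippet below splits one access_log line into the kinds of fields listed for Web_log. It assumes the file follows the widely used Common Log Format; the paper does not state the exact format, and the sample line, field names and pattern are illustrative assumptions.

```python
import re

# Assumed Common Log Format:
# host ident user [day/month/year:time offset] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<date>[^:]+):(?P<time>\S+) (?P<offset>[+-]\d{4})\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_access_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    return row

# Hypothetical sample line; the IP is from a documentation-only address range.
sample = '192.0.2.1 - - [11/Aug/2000:09:24:25 +0100] "GET /vnvi/ HTTP/1.1" 304 -'
row = parse_access_line(sample)
```

Each parsed dict could then be inserted as one row of a Web_log table; lines that do not match are returned as None so that they can be logged or skipped during pre-processing.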

ID  IP  Date       Time     Adaptation  Page                             Status
1       11-Aug-00  9:24:25  +0100       GET /vnvi/ HTTP/1.1              304
2       11-Aug-00  9:24:25  +0100       GET /vnvi/md5.js HTTP/1.1        304
3       11-Aug-00  9:24:25  +0100       GET /image/coimage.gif HTTP/1.1  304
4       11-Aug-00  9:24:25  +0100       GET /vnvi/md5.js HTTP/1.1        404 298
5       11-Aug-00  9:24:25  +0100       GET /vnvi/image/coimage.gif      404 303
..  ….  …          …        ..          …                                …

Fig. 5 The table Web_log contains part of the web data in the file access_log.

The web data is massive, since every click a visitor makes on the website leaves several records in the tables. This also allows the website owner to track visitors’ behavior in detail and discover valuable patterns. For example, the visitor’s IP address and time difference indicate the organisation and geographic location; the requested pages indicate the visitor’s interests and search topics; frequency statistics and visit duration separate regular users from casual visitors; and the status codes show the stability of the website, i.e. success or failure in getting the requested pages, and quick or slow response depending on whether a page is loaded from cache or hard disk. These useful patterns can be uncovered by web data mining techniques, helping e-organisations to enhance their e-services.
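One of the simplest of these indicators, the share of failed requests, can be computed directly from the Status column. The following is a minimal sketch; treating every code of 400 or above as a failure is an assumption made for the example, and the input list is illustrative.

```python
from collections import Counter

def status_summary(statuses):
    """Count each status code and return (counts, share of codes >= 400)."""
    counts = Counter(statuses)
    total = sum(counts.values())
    failed = sum(n for code, n in counts.items() if code >= 400)
    failure_rate = failed / total if total else 0.0
    return counts, failure_rate

# Illustrative status codes: three "not modified" hits and two "not found" hits.
counts, failure_rate = status_summary([304, 304, 304, 404, 404])
```

A rising failure rate for a particular page would point to broken links or server problems on that page.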


3.2 Personalised e-Services

Organisations are seeking effective, low-cost personalised services for their customers. For example, an e-banking organisation may want to give frequent customers quick access to the functions they are most likely to want, e.g. statement enquiry, payment, or stock market quotation, instead of the bank’s general front page.

This can be achieved by applying data mining to the web data. Two attributes of Web_log, i.e. “IP” and “Page” in Fig. 5, are relevant. Since every web page is designed for a specific function, a concept hierarchy for web pages can be built up based on the assigned function. Fig. 6 gives an example hierarchy of the pages shown in Fig. 4.

Fig. 6 Concept hierarchy of web pages

Based on this page hierarchy, every web page recorded in Fig. 5 is replaced by the higher-level concept in the hierarchy, resulting in the generalised table in Fig. 7, which holds more abstract concepts describing the purpose of each visit, together with the visiting frequency given by the counts.

IP Login Statement Transfer Stock Currency
231 38 2 200 12 …
123 100 45 0 0 …
94 90 12 0 0 …
140 112 1 100 67 …
… … … …

Fig. 7 The generalised table of IP and Pages with statistical information
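The generalisation step can be sketched as a lookup followed by a count. This is a minimal illustration, not the procedure used in the paper: the page paths, the mapping to concepts and the IP addresses are all hypothetical, since the hierarchy of Fig. 6 is not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical one-level concept hierarchy: page path -> service concept.
PAGE_TO_CONCEPT = {
    "/login.html": "Login",
    "/statement/view.html": "Statement",
    "/statement/download.html": "Statement",
    "/transfer/new.html": "Transfer",
    "/stock/quote.html": "Stock",
}

def generalise(web_log, page_to_concept):
    """Replace each page by its concept and count visits per IP and concept."""
    table = defaultdict(Counter)
    for ip, page in web_log:
        concept = page_to_concept.get(page)
        if concept is not None:  # pages outside the hierarchy (images etc.) are dropped
            table[ip][concept] += 1
    return {ip: dict(counts) for ip, counts in table.items()}

# Hypothetical (IP, Page) pairs taken from a Web_log-like table.
web_log = [
    ("192.0.2.1", "/login.html"),
    ("192.0.2.1", "/stock/quote.html"),
    ("192.0.2.1", "/stock/quote.html"),
    ("192.0.2.2", "/login.html"),
    ("192.0.2.2", "/statement/view.html"),
    ("192.0.2.2", "/statement/download.html"),
    ("192.0.2.2", "/image/logo.gif"),
]
generalised = generalise(web_log, PAGE_TO_CONCEPT)
```

Dropping pages that have no entry in the hierarchy also acts as a crude noise filter, since image and script requests rarely say anything about the purpose of a visit.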

Customers’ visiting purposes are revealed clearly in the resulting table by items such as “transfer” or “statement” instead of web pages. With this knowledge, the organisation can build up a hot mapping from IP addresses to services. When a customer who checks share prices every day logs in to the e-bank, the website jumps to the stock market quotation page first, possibly already showing the relevant stock numbers. Without the data generalisation process, it is almost impossible to understand and analyse the huge amount of web data at its original level of detail.
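A mapping of this kind could be derived by picking, for each IP address, the most frequently used service other than the login page itself. The sketch below assumes a per-IP table of counts; the function names, the choice to ignore "Login" and the sample counts are assumptions for illustration.

```python
def preferred_service(counts, ignore=("Login",)):
    """Return the most frequently used service, skipping ignored concepts."""
    candidates = {name: n for name, n in counts.items()
                  if name not in ignore and n > 0}
    if not candidates:
        return None  # no evidence yet: fall back to the general front page
    return max(candidates, key=candidates.get)

def build_landing_map(table):
    """Map each IP address to the service page it should land on."""
    return {ip: preferred_service(counts) for ip, counts in table.items()}

# Hypothetical generalised counts per IP address.
table = {
    "192.0.2.1": {"Login": 30, "Statement": 2, "Stock": 25},
    "192.0.2.2": {"Login": 10, "Statement": 8, "Transfer": 3},
    "192.0.2.3": {"Login": 4},
}
landing = build_landing_map(table)
```

A production system would probably also require a minimum number of visits before redirecting, so that a single unusual session does not change the landing page.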


Applying other data mining algorithms to the generalised table above can uncover more complicated patterns. For example, classification algorithms can segment visitors based on the interests shown by their frequently visited pages. Another example is to reveal casual visitors’ browsing patterns, such as the typical time visitors spend on the website, the most visited pages, the pages at which visitors are likely to leave, and the reasons, from the attributes Date, Time, Page and Status in Web_log. These patterns help the website owner to improve the attractiveness and stability of the website.
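As one small example of such a browsing pattern, the time a visitor spends on the site can be approximated from the Date and Time attributes. The sketch below makes a strong simplifying assumption: all hits from one IP address are treated as a single visit, whereas a real analysis would split them into sessions. The records are hypothetical.

```python
from datetime import datetime

def visit_durations(records):
    """Return seconds between the first and last hit for each IP address."""
    first_last = {}
    for ip, stamp in records:
        # Date and Time as in Web_log, e.g. "11-Aug-00 09:24:25"
        t = datetime.strptime(stamp, "%d-%b-%y %H:%M:%S")
        earliest, latest = first_last.get(ip, (t, t))
        first_last[ip] = (min(earliest, t), max(latest, t))
    return {ip: (latest - earliest).total_seconds()
            for ip, (earliest, latest) in first_last.items()}

# Hypothetical (IP, "Date Time") pairs.
records = [
    ("192.0.2.1", "11-Aug-00 09:24:25"),
    ("192.0.2.1", "11-Aug-00 09:30:25"),
    ("192.0.2.2", "11-Aug-00 10:00:00"),
]
durations = visit_durations(records)
```

An IP address with a single hit gets a duration of zero, which is one simple signal for separating casual visitors from regular users.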

It should be noted that the visitor segmentations and browsing patterns discovered from web data differ from the customer segmentations and shopping behaviors that can be discovered from a conventional business such as a supermarket. This is because web visitors are not necessarily customers if they have no previous purchase record, or simply because the website only provides a service and does not sell any product. Therefore, web visitors’ personal details or spending potential cannot be collected directly as in conventional businesses, unless visitors are required to fill in specific forms, which unfortunately is a major reason why visitors abandon a website. However, e-business data has other advantages. As every click on the website creates many web data records, significant patterns can still be found in the massive data that records every movement on the website. This characteristic of web data requires more pre-processing and specific procedures based on web techniques to filter out irrelevant or noisy records before a standard mining algorithm is applied. Therefore, compared with patterns discovered from conventional business data, patterns generated from e-business and web data might have slightly lower accuracy but more natural meanings.

3.3 Increasing Cross-Selling by Basket Analysis

All businesses constantly try to increase cross-selling opportunities. The secret of successful cross-selling is to understand customers’ interests and recommend the right products to the right customer. The knowledge that reveals customers’ purchasing behavior is hidden in the large volume of historical purchase records maintained by the data manager. The relevant tables, including Customer, Purchase, Payment, Contact and Web_log, are shown in Fig. 8 together with their relationships. Based on this set of data, interesting patterns can be discovered to increase cross-selling.


An association rule algorithm, also known as market basket analysis, can be applied to the dataset of all items that have been bought together, resulting in patterns like “this percentage of customers who bought XX also bought YY”. These associations can be used to recommend YY to customers who are about to buy XX, or to launch a package promotion for goods such as XX and YY that are likely to be bought together.
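The “percentage of customers who bought XX also bought YY” is the confidence of the rule XX → YY, usually reported together with its support (the share of all baskets containing both items). The sketch below computes both for item pairs only; it is a minimal illustration with made-up baskets, not a full algorithm such as Apriori, and the function name is an assumption.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.0):
    """Return {(a, b): (support, confidence)} for every rule a -> b over item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore repeated items within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n               # share of baskets containing both items
        if support < min_support:
            continue
        rules[(a, b)] = (support, together / item_counts[a])  # confidence of a -> b
        rules[(b, a)] = (support, together / item_counts[b])  # confidence of b -> a
    return rules

# Hypothetical purchase baskets.
baskets = [["XX", "YY"], ["XX", "YY", "ZZ"], ["XX"], ["YY", "ZZ"]]
rules = pair_rules(baskets)
```

Here two of the three baskets containing XX also contain YY, so the rule XX → YY has a confidence of about 67% and a support of 50%; a rule would normally be used for recommendations only if both values exceed chosen thresholds.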

These e-association patterns should be as good as those generated from conventional business data, provided the e-company has a reasonable number of customers. In fact, for e-businesses, not only the purchase records but also the customers’ potential interests, indicated by their frequently visited pages, can be taken into account to increase the accuracy of the association rules. At the same time, the Web_log shows whether a new order is made as the result of a specific recommendation, which allows the e-organisation to monitor the performance of the rules and adapt its marketing strategies accordingly. Knowledge and strategies with measurable results can better target potential markets, maximise the chance of success and minimise marketing cost.

4. Conclusion

Successful e-business needs cutting-edge BI and CRM technology. This paper touches upon how web data mining can help e-businesses to improve their customer relationships, make intelligent business strategies, and sharpen their competitive edge; yet it reveals only the tip of the iceberg. The experience of many e-business winners has shown the tremendous benefits of applying even a single piece of mining technology, which has set them apart from their peers.






