
Step 1 — Defining the business issue

Chapter 3. A generic data mining method

3.4 The generic data mining method

3.4.1 Step 1 — Defining the business issue

All too often, organizations approach data mining from the perspective that "there must be some value in the data we have collected, so we will just use data mining and discover what's there." Using the mining analogy, this is rather like choosing a spot at random and starting to dig for gold. This may be a good strategy if the


Step 1: The business issue

Step 2: Using a common data model or how to define it when it does not exist

Step 3: Sourcing and preparing the data

Step 4: Evaluating the data

Step 5: Mining the data

Step 6: Interpreting the results

Step 7: Deploying the results

[Figure: the generic data mining method. Box labels: Define the business issue; Define the data; Data model to use; Source the data; Evaluate the data model; Choose the mining technique; Interpret the results; Deploy the results]

Data mining is about choosing the right tools for the job and then using them skillfully to discover the information in your data. We have already seen there are a number of tools that can be used, and that very often we have to use a combination of the tools at our disposal, if we are to make real discoveries and extract the value from our data.

The first step in our data mining method is therefore to identify the business issue that you want to address and then determine how the business issue can be translated into a question, or set of questions, that data mining can address. By business issue we mean that there is an identified problem to which you need an answer, where you suspect, or know, that the answer is buried somewhere in the data, but you are not sure where it is.

A business issue should fulfill the requirements of having:

• A clear description of the problem to be addressed

• An understanding of the data that may be relevant

• A vision for how you are going to use the mining results in your business

Describing the problem

If you are not sure what questions data mining can address, then the best approach is to look at examples of where it has been successfully used, either in your own industry or in related industries. Many business and research fields have proven to be excellent candidates for data mining. Most applications are in banking, insurance, retail and telecommunications (telecoms), but there are many others, such as manufacturing, pharmaceuticals, biotechnology and so on, where significant benefits have also been derived. Well-known approaches are: customer profiling and cross-selling in retail, loan delinquency and fraud detection in banking and finance, customer retention (attrition and churn) in telecoms, patient profiling and weight rating for Diagnosis Related Groups in health care and so on. Some of these are depicted in Figure 3-5.

Figure 3-5 Business and research applications

The objective behind this book and others in the series is to describe some of these different issues and show how data mining can be used to address them. Even where the specific business issue you are trying to address has not been addressed elsewhere, understanding how data mining can be applied will help to define your issue in terms that data mining can answer. You need to remember that data mining is about the discovery of patterns and relationships in your data. All of the different applications are using the same data mining concepts and applying them in subtly different ways.

With this in mind, when you come to define the business issue, you should think about it in terms of patterns and relationships. Take fraud as an example. Rather than ask the question "can we detect fraudulent customers?", you could ask the question "can we detect a small group of customers who exhibit unusual characteristics that may be indicative of fraud?" Alternatively, if you have identified some customers who are behaving fraudulently, the question is, can you identify some unique characteristics of these customers that would enable you to identify

[Figure content: Data Mining Applications. Labels: Target Marketing; Loan Delinquency; Customer Attrition; Opinion Surveys; Cross Selling; Product Analysis; Credit Analysis; Fraud Detection; Demand Forecasting; Customer Retention; Market Basket Analysis; Physics; Chemistry; Maintenance; Manufacturing; Healthcare; Customer Acquisition]
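The reframed fraud question, finding a small group of customers with unusual characteristics, can be illustrated with a simple outlier check. The following is only a minimal sketch under stated assumptions, not a method prescribed by this book: the customer records, the field name claims_per_year, and the threshold of 3.5 are all hypothetical, and the modified z-score (based on the median absolute deviation) is just one of many ways to flag unusual values.

```python
from statistics import median


def robust_z_scores(values):
    """Return modified z-scores based on the median and the
    median absolute deviation (MAD). All zeros if MAD is zero."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    return [0.6745 * (v - med) / mad for v in values]


def unusual_customers(records, field, threshold=3.5):
    """Return the ids of customers whose value for `field` is unusual.
    The threshold is an illustrative assumption, not a recommended setting."""
    scores = robust_z_scores([r[field] for r in records])
    return [r["customer_id"] for r, s in zip(records, scores) if abs(s) > threshold]


# Hypothetical data, invented purely for illustration.
customers = [
    {"customer_id": "C1", "claims_per_year": 1},
    {"customer_id": "C2", "claims_per_year": 2},
    {"customer_id": "C3", "claims_per_year": 1},
    {"customer_id": "C4", "claims_per_year": 0},
    {"customer_id": "C5", "claims_per_year": 2},
    {"customer_id": "C6", "claims_per_year": 14},
]

flagged = unusual_customers(customers, "claims_per_year")
print(flagged)
```

A flagged customer is only a candidate for investigation; whether the unusual behavior actually indicates fraud is a question for the interpretation step.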

Understanding your data

As you formulate the business question, you also need to think about whether the data that you have available is going to be sufficient to answer it. It is important to recognize that the data you hold may not contain the information required to answer the question you are posing. For example, suppose you are trying to determine why you are losing customers, and the reason is that your competitors are undercutting you on price. If you do not have competitor pricing data in your database, then clearly data mining is not going to provide the answer. Although this is a trivial example, sometimes it is not so obvious that the data cannot provide the answer you are looking for. The amazing thing is how many people still believe that data mining should be able to perform the impossible.

Where the specific business issue has been addressed elsewhere, knowing what data was used to address it will help you to decide which of your own data should be used and how it may need to be transformed before it can be effectively mined. This process is termed the construction of a common data model. The use of common data models is a very powerful aid to performing data mining, as we will show when we address specific business issues.

Using the results

When defining the business issue that you want to address with data mining, it is important that you think carefully about how you are going to use the information that you discover. Very often, considering how you are going to deploy the results of your data mining into your business will help to clarify the business issue you are trying to address and determine what data you are going to use. Suppose, for example, that you want to use data mining to identify which types of existing customers will respond to new offers or services and then use the results to target new customers. Clearly the variables you use when doing the data mining on your existing customers must be the same variables that you can derive about your new customers. In this case you cannot use the 6-month aggregated expenditure (aggregated spend) on particular products if all you have available for new customers is the expenditure from a single transaction.

Thinking about how you are going to use the information you derive places constraints on the selection of data that you can use to perform the data mining and is therefore a key element in the overall process of translating the business issue into a data mining question.
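The constraint described above, that the variables mined on existing customers must also be derivable for new customers, can be checked before any mining starts. The sketch below is a hedged illustration only: the function usable_variables and every variable name in it are hypothetical and are not taken from any particular data model.

```python
def usable_variables(training_variables, deployment_variables):
    """Split the variables available for existing customers into those
    that can also be derived for new customers and those that cannot."""
    usable = [v for v in training_variables if v in deployment_variables]
    unusable = [v for v in training_variables if v not in deployment_variables]
    return usable, unusable


# Hypothetical variable names, invented for illustration.
existing_customer_variables = [
    "age_band",
    "region",
    "spend_6_month_total",    # needs six months of history
    "first_purchase_amount",  # available from a single transaction
]
new_customer_variables = {"age_band", "region", "first_purchase_amount"}

usable, unusable = usable_variables(existing_customer_variables, new_customer_variables)
print("Can be used for mining:", usable)
print("Not available at deployment:", unusable)
```

Running a check like this while defining the business issue makes the deployment constraint explicit: any variable in the second list would have to be dropped or replaced before the results could be applied to new customers.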
