Chapter 2. Getting Started with Data Mining

2.3 Organizational Factors

Now that you know why to begin using data mining, it is time to pay some attention to where to start. In other words, what organizational environment is needed? There are several prerequisites for successfully starting and completing a data mining project. We cover them in the following sections.

2.3.1 The Organizational Culture

First of all, the culture in your organization must support the flow of data and information that is needed. It must also accommodate the flow of results from the data mining effort. This means you need an open, communicative culture, where people actively cooperate in the exchange of information. This is especially needed in the interaction between business departments and technical departments. People must be willing to accept new information and, based on this, change the way they work.

If people protect their data and are not willing to share both the needed and resulting information, your organization will need an internal or external consulting effort to change this. It will not be an easy task, but it is essential to the success of the data mining effort.

2.3.2 The Business Environment

As noted earlier, the business must lead your data mining activities. Executives at a high level in your organization must be willing to sponsor the project. There must be a clear understanding of the business issue you want to tackle, with a clear statement of the objectives, so that you can fix the scope and expectations.

Knowledge about the general business environment and the metrics used to measure its various aspects must be readily available. Keep in mind that implementing a tool in your technical environment always means integrating the use of the tool into your business environment.

2.3.3 The People

The people factor is all about the availability of the right specialists at the right time. The people who are involved in data mining activities fall into three general roles, as shown in Figure 6. The three roles can be filled by fewer than three people, but normally that is not the case.

Figure 6. Roles in Data Mining

Following is a description of these roles:

Domain experts: People who know the business environment, the processes, the customers, and the competitors. They are typically people in higher business management functions.

Database administrators: People who know where and how the company’s data is stored, how to access it, and how to relate it to other data stores.

Mining specialists: People with a background in data analysis who have at least basic statistical knowledge. They are able to apply data mining techniques and interpret the results in a technical way. They must be able to establish relationships with the domain experts for business guidance on their results, and with the database administrators for access to the data required for their activities.

Generally, you will find the first two roles already present in your company. The third role might require an external advisor the first time your company goes through the data mining process. People in your organization who might fill this role after the knowledge transfer has taken place include, for example, your marketing analysts.

One of the main difficulties in getting the right people, either inside or outside of your organization, is the variety of domains that have to be combined in

2.3.4 The I/T Architecture

While your organizational culture will support the information flow, the I/T architecture must support the data flow. You need fast, scalable, and open access to the available data and the flexibility to extract and update subsets of that data in the environment that you will use for data mining. An environment that supports BI and the easy creation of data marts from a central warehouse is a good example of such an environment.

Besides accessing and transmitting data, your I/T architecture must also allow for either the processing capacity needed for data mining, or the easy addition of extra capacity. This could mean adding an extra system dedicated to mining, or running it in addition to the existing processes on one of your systems. In 3.4.2, “Performance” on page 46, we describe in more detail the factors that influence processing performance.

2.3.5 The Data

Of course, the data must be available. The amount of raw data usually is not the problem, but the amount of clean, usable, relevant, and integrated data may be less than you think. One of the first steps in the data mining process, as we describe in 2.4, “The Data Mining Process” on page 20, is analyzing your data with this in mind. However, you should have a good idea of what is or is not available, before you start thinking about data mining.

There is no fixed rule about the amount of data needed to start mining. As a rule of thumb, several thousand records, and ten or more attributes, are a good starting point. These numbers further depend on the data mining technique you will employ.
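To make the rule of thumb concrete, the following is a minimal sketch of a readiness check you might run before a project starts. It assumes the data has already been loaded as a list of Python dictionaries, one per record. The function name readiness_report and the default thresholds of 2000 records and 10 attributes are illustrative assumptions, not fixed values; adjust them to the mining technique you plan to use.

```python
def readiness_report(records, min_records=2000, min_attributes=10):
    """Summarize whether a list of dict records meets rule-of-thumb sizes.

    The thresholds are hypothetical defaults chosen for illustration only.
    """
    # Collect every attribute name that appears in at least one record.
    attributes = set()
    for row in records:
        attributes.update(row.keys())

    # A record counts as complete only if every attribute is present
    # and is neither None nor an empty string.
    complete = sum(
        1
        for row in records
        if all(row.get(name) not in (None, "") for name in attributes)
    )

    return {
        "records": len(records),
        "attributes": len(attributes),
        "complete_records": complete,
        "enough_records": complete >= min_records,
        "enough_attributes": len(attributes) >= min_attributes,
    }
```

Comparing "records" with "complete_records" gives a first impression of how much of the raw data is actually usable. Completeness is only one aspect of clean data, so treat the result as a starting point rather than a verdict.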

2.3.6 The Data Mining Tools

The tool, or tools, that you use for data mining must be able to support data access, preprocessing, mining, visualization, storage, and maintenance of the results. A single package might provide all of this, or you might need several tools. In any case, tight integration between the tools is essential. You must also pay attention to the scalability of the tools that you plan to use. The well-known credo “start small, but think big” is essential in this case. You will always want to add extra data, explore more history, or achieve results faster. The processing time should not grow much more than linearly with the amount of data, whether measured in the number of attributes or the number of records.
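One way to check the scalability requirement is to time a mining run on samples of increasing size and estimate how the run time grows. The sketch below fits a straight line to the logarithm of the run time against the logarithm of the data size; the slope approximates the growth exponent. The function name scaling_exponent and the idea of collecting the timings by hand are assumptions made for illustration, not features of any particular tool.

```python
import math


def scaling_exponent(sizes, seconds):
    """Estimate k in time ~ size**k by a least-squares fit on log-log data.

    sizes   -- record (or attribute) counts of the timed samples
    seconds -- measured run times for those samples, all greater than zero
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope of the regression line = covariance(x, y) / variance(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

An exponent close to 1 indicates roughly linear growth. A value approaching 2 means that doubling the data roughly quadruples the run time, which is a warning sign for a tool you intend to use on larger volumes later.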

2.3.7 The Project

Finally, your data mining efforts must be managed as you would manage any other complex project. That entails a clear understanding of the measures of success in order to be able to gauge progress. There must be a commitment of the skills and resources needed to sustain the project, and it must be managed following a proven methodology.

You will find more on running data mining projects in 2.5, “The IBM BI Methodology” on page 29.
