Using Data Mining Methods to Build Customer Profiles

Gediminas Adomavicius and Alexander Tuzhilin, New York University

The 1:1Pro system constructs personal profiles based on customers' transactional histories. The system uses data mining techniques to discover a set of rules describing customers' behavior and supports human experts in validating the rules.

Personalization—the ability to provide content and services tailored to individuals on the basis of knowledge about their preferences and behavior [1]—has become an important marketing tool (see the "Personalization" sidebar). Personalization applications range from personalized Web content presentations to book, CD, and stock purchase recommendations. Among the issues the personalization community must deal with, the following are of special importance: how to provide personal recommendations based on a comprehensive knowledge of who customers are, how they behave, and how similar they are to other customers; and how to extract this knowledge from the available data and store it in customer profiles.

Various recommender systems address the recommendation problem [2]. Most use either the collaborative-filtering [3-5] or the content-based [6] approach (see sidebar). Some systems integrate the two methods [2,6].

To address the second issue, we have developed an approach that uses information learned from customers' transactional histories to construct accurate, comprehensive individual profiles [7]. One part of the profile contains facts about a customer, and the other part contains rules describing that customer's behavior. We use data mining methods to derive the behavioral rules from the data. We have also developed a method for validating customer profiles with the help of a human domain expert who uses validation operators to separate "good" rules from "bad." We have implemented the profile construction and validation methods in a system called 1:1Pro. Our approach differs from other profiling methods in that we include personal behavioral rules in customer profiles [7].

We can judge the quality of rules stored in customer profiles in several ways. We might call rules "good" because they are statistically valid, acceptable to a human expert in a given application, or effective in that they result in specific benefits such as better decision-making and recommendation capabilities. Here, we focus on the first two aspects: statistical validity and acceptability to an expert.

BUILDING CUSTOMER PROFILES

As Figure 1 illustrates, the two main phases of the profile-building process are rule discovery and validation. Our method of building personalized customer profiles begins with collecting the data.

Data model

Applications use various kinds of data about individual customers. Many applications classify the data into two basic types: factual—who the customer is—and transactional—what the customer does.

For example, in a marketing application based on customers' purchasing histories, the factual data includes demographic information such as name, gender, birth date, address, salary, and social security number. Transactional data consists of records of the customer's purchases during a specific period. A purchase record might include the purchase date, product purchased, amount paid, coupon use, coupon value, and discount applied. Figure 2 shows examples of factual and transactional data.
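The paper does not prescribe a concrete schema, but the two data types are easy to picture in code. The following minimal Python sketch uses hypothetical record layouts whose field names are taken from Figure 2; it is an illustration, not 1:1Pro's actual data model.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical record layouts mirroring the factual and transactional
# fragments in Figure 2; the paper does not prescribe a concrete schema.

@dataclass
class FactualRecord:          # who the customer is
    customer_id: str
    last_name: str
    first_name: str
    birth_date: date
    gender: str

@dataclass
class TransactionRecord:      # what the customer does
    customer_id: str
    timestamp: datetime
    store: str
    product: str
    coupon_used: bool

jd = FactualRecord("0721134", "Doe", "John", date(1945, 11, 17), "Male")
t1 = TransactionRecord("0721134", datetime(1993, 7, 10, 19, 2), "RiteAid", "LemonJuice", False)
```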

Profile model

A complete customer profile has two parts: factual and behavioral. The factual profile contains information, such as name, gender, and date of birth, that the personalization system obtained from the customer's factual data. The factual profile also can contain information derived from the transactional data, such as "The customer's favorite beer is Heineken" or "The customer's biggest purchase last month was for $237."

A behavioral profile models the customer's actions and is usually derived from transactional data. Examples of behaviors are "When purchasing cereal, John Doe usually buys milk" and "On weekends, John Doe usually spends more than $100 on groceries."
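To make the two-part profile concrete, here is a small illustrative sketch of one possible in-memory representation, with a rule as a conjunctive body, a head, and its support and confidence. The class and field names are assumptions for illustration; the paper does not specify 1:1Pro's internal structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A rule is a conjunction of attribute=value conditions in the body and an
# attribute=value condition in the head, plus (support, confidence).
Condition = Tuple[str, str]                     # e.g. ("Product", "LemonJuice")

@dataclass
class Rule:
    body: List[Condition]
    head: Condition
    support: float
    confidence: float

@dataclass
class CustomerProfile:
    customer_id: str
    factual: Dict[str, str] = field(default_factory=dict)   # facts, possibly derived
    behavioral: List[Rule] = field(default_factory=list)    # validated rules

profile = CustomerProfile(
    "0721134",
    factual={"FavoriteBeer": "Heineken", "BiggestPurchaseLastMonth": "$237"},
    behavioral=[Rule([("Product", "LemonJuice")], ("Store", "RiteAid"), 0.024, 0.95)],
)
```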

RULE DISCOVERY

We model individual customer behavior with various types of conjunctive rules, including association [8] and classification [9] rules. Figure 3 shows an example of association rules discovered for a particular customer.

For instance, Rule 1 specifies that John Doe usually buys lemon juice at RiteAid. More specifically, in 95 percent of the cases when he buys lemon juice, he buys it at RiteAid. In addition, 2.4 percent of all John Doe’s shopping transactions include purchasing lemon juice at RiteAid.
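As a quick check of how those two numbers arise, support is the fraction of all of the customer's transactions that match both body and head, and confidence is the fraction of body-matching transactions that also match the head. The transaction counts below are made up solely to reproduce the figures quoted for Rule 1.

```python
# Illustrative only: hypothetical counts, not the paper's actual data.
transactions = (
    [{"Product": "LemonJuice", "Store": "RiteAid"}] * 19       # lemon juice at RiteAid
    + [{"Product": "LemonJuice", "Store": "GrandUnion"}] * 1    # lemon juice elsewhere
    + [{"Product": "WheatBread", "Store": "GrandUnion"}] * 780  # everything else
)

matching_body = [t for t in transactions if t["Product"] == "LemonJuice"]
matching_rule = [t for t in matching_body if t["Store"] == "RiteAid"]

support = len(matching_rule) / len(transactions)      # fraction of all transactions
confidence = len(matching_rule) / len(matching_body)  # fraction of body matches
print(f"support={support:.3f}, confidence={confidence:.2f}")  # 0.024, 0.95
```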

Using rules to describe customer behavior has certain advantages. Besides being an intuitive and descriptive way to represent behaviors, a conjunctive rule is a well-studied concept used extensively in data mining, expert systems, logic programming, and many other areas. Moreover, researchers have proposed many rule discovery algorithms in the literature, especially for association and classification rules.

For personalization applications, we apply rule discovery methods individually to every customer's data. To discover rules that describe the behavior of individual customers, we can use various data mining algorithms, such as Apriori [8] for association rules and CART (Classification and Regression Trees) [9] for classification rules. Moreover, our profiling approach is not limited to any specific representation of data mining rules or discovery method.
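As a rough illustration of per-customer discovery, the following sketch brute-forces single-condition association rules of the form attribute = value => attribute = value from one customer's transactions, keeping rules above minimum support and confidence thresholds. It is not Apriori and not 1:1Pro's implementation; real miners also handle multi-condition bodies such as rules 5 and 7 in Figure 3. The thresholds and data are illustrative.

```python
from collections import Counter
from itertools import combinations

def discover_rules(transactions, min_support=0.02, min_confidence=0.6):
    """Brute-force discovery of rules 'A=a => B=b' for one customer.

    transactions: list of dicts mapping attribute -> value.
    Returns a list of (body, head, support, confidence) tuples.
    """
    n = len(transactions)
    pair_counts = Counter()   # co-occurrence of ((attr, val), (attr, val)) pairs
    item_counts = Counter()   # occurrence of individual (attr, val) conditions
    for t in transactions:
        items = sorted(t.items())
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (x, y), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        for body, head in ((x, y), (y, x)):
            confidence = both / item_counts[body]
            if confidence >= min_confidence:
                rules.append((body, head, support, confidence))
    return rules

# Toy usage: one customer's transactions as attribute=value records.
history = [
    {"Product": "LemonJuice", "Store": "RiteAid"},
    {"Product": "WheatBread", "Store": "GrandUnion"},
    {"Product": "LemonJuice", "Store": "RiteAid"},
]
for body, head, s, c in discover_rules(history):
    print(f"{body[0]}={body[1]} => {head[0]}={head[1]} ({s:.0%}, {c:.0%})")
```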

Figure 1. A simplified view of the profile-building process: Phase 1, data mining (data to rules), and Phase 2, validation (rules to profiles).

Figure 2. Fragments of data in a marketing application showing demographic information (factual data) and records of the customers' purchases (transactional data).

Factual data:

CustomerId  LastName  FirstName  BirthDate   Gender
0721134     Doe       John       11/17/1945  Male
0721168     Brown     Jane       05/20/1963  Female
0730021     Adams     Robert     06/02/1959  Male

Transactional data:

CustomerId  Date        Time     Store       Product      CouponUsed
0721134     07/09/1993  10:18am  GrandUnion  WheatBread   No
0721134     07/09/1993  10:18am  GrandUnion  AppleJuice   Yes
0721168     07/10/1993  10:29am  Edwards     SourCream    No
0721134     07/10/1993  07:02pm  RiteAid     LemonJuice   No
0730021     07/10/1993  08:34pm  Edwards     SkimMilk     No
0730021     07/10/1993  08:34pm  Edwards     AppleJuice   No
0721168     07/12/1993  01:13pm  GrandUnion  BabyDiapers  Yes
0730021     07/12/1993  01:13pm  GrandUnion  WheatBread   No

Figure 3. Association rules discovered in a marketing application help to describe the customer's behavior.

Discovered rules (for John Doe):

(1) Product = LemonJuice => Store = RiteAid (2.4%, 95%)
(2) Product = WheatBread => Store = GrandUnion (3%, 88%)
(3) Product = AppleJuice => CouponUsed = YES (2%, 60%)
(4) TimeOfDay = Morning => DayOfWeek = Saturday (4%, 77%)
(5) TimeOfWeek = Weekend & Product = OrangeJuice => Quantity = Big (2%, 75%)
(6) Product = BabyDiapers => DayOfWeek = Monday (0.8%, 61%)
(7) Product = BabyDiapers & CouponUsed = YES => Quantity = Big (2.5%, 67%)


Personalization is a relatively new field, and different authors provide various definitions of the concept [1]. We define personalization as an iterative process consisting of the steps shown in Figure A.

Collecting Customer Data

Personalization begins with collecting customer data from various sources. This data might include histories of customers' Web purchasing and browsing activities, as well as demographic and psychographic information. After the data is collected, it must be prepared, cleaned, and stored in a data warehouse [2].

Customer Profiling

A key issue in developing personalization applications is constructing accurate and comprehensive customer profiles based on the collected data.

Matchmaking

Personalization systems must match appropriate content and services to indi-

vidual customers. This matchmaking takes various approaches. For example, BroadVision (http://www.broadvision.

com) and Art Technology Group (http://

www.atg.com) use business rules to spec- ify what content to deliver in particular situations. To provide personalized rec- ommendations, recommender systems use technologies such as content-based and collaborative filtering. Content-based sys- tems recommend items similar to items the customer preferred in the past. Col- laborative-filtering systems recommend items that other customers with similar tastes and preferences liked in the past.

Delivery and Presentation

E-companies deliver personalized information to customers in several ways. One classification of delivery methods is pull, push, and passive [3]. Push methods reach a customer who is not currently interacting with the system—for example, by sending an e-mail message. Pull methods notify customers that personalized information is available but display this information only when the customer explicitly requests it. Passive delivery displays personalized information in the context of the e-commerce application. For example, while looking at a product on a Web site, a customer also sees recommendations for related products. The system can present personalized information in various forms: narrative, a list ordered by relevance, an unordered list of alternatives, or various types of visualization [3].

Measuring Customer Response

Companies can use various "e-metrics" to evaluate the effectiveness of personalization technologies [4]. For example, an e-commerce company can determine whether customers start spending more time (and money) on its Web site, whether personalized services attract new customers, and whether customer loyalty increases.

As Figure A shows, measuring customer response serves as feedback for possible improvements of each of the other four steps of the personalization process. In particular, system designers should use customer response to decide whether to collect additional data, build better user profiles, develop better matchmaking algorithms, and improve information presentation. If organized properly, this iterative process improves relationships with customers over time by providing a better understanding of customers, more accurately targeted content, and better recommendations and services.

Implementation Issues

To implement the personalization process properly, companies must deal with several issues. Privacy is a big concern to customers, and companies must reconcile their personalization systems with this concern. Progress in privacy preservation will be critical to personalization. Scalability of the personalization process is another important issue—many growing companies must be ready to handle millions of customers and tens of thousands of products. Moreover, providing personalized services in real time requires efficient profiling and matchmaking methods.

Personalization is a broad field, and many companies focus only on certain steps of the process. Overviews of most of the personalization companies are available at http://www.personalization.com and http://www.personalization.org.

References

1. Comm. ACM, Special Issue on Personalization, vol. 43, no. 8, 2000.

2. D. Pyle, Data Preparation for Data Mining, Morgan Kaufmann, San Francisco, 1999.

3. J.B. Schafer, J.A. Konstan, and J. Riedl, "E-Commerce Recommendation Applications," J. Data Mining and Knowledge Discovery, Jan. 2001.

4. M. Cutler and J. Sterne, "E-Metrics," NetGenesis Corp., 2000; http://www.netgen.com/emetrics/.

Figure A. Stages of the personalization process: collecting customer data, building customer profiles, matchmaking, delivery and presentation of personalized information, and measuring customer response. Customer response measurements provide feedback to each of the previous stages.

Because data mining methods discover rules for each customer individually, these methods work well for applications containing many transactions for each customer, such as credit card, grocery shopping, online browsing, and stock trading applications. In applications such as a car purchase or vacation planning, individual rules tend to be statistically less reliable because they are generated from relatively few transactions.

RULE VALIDATION

Data mining methods often generate large numbers of rules, many of which, although statistically acceptable, are trivial, spurious, or just not relevant to the application at hand [10,11]. Therefore, validating the discovered rules is an important requirement. For example, assume that a data mining method discovers the rule that whenever John Doe takes a business trip to Los Angeles, he stays in an expensive hotel. Assume that John went to Los Angeles seven times over the past two years, and five of those times he stayed in expensive hotels. We must validate this rule—that is, make sure that it captures John's behavior rather than a spurious correlation and that it is not simply irrelevant to the application.

One way to validate discovered rules is to let a domain expert inspect them and decide how well they represent customers’ actual behaviors. The expert accepts some rules and rejects the others, and the accepted rules form the behavioral profiles.

An important issue in validating rules is scalability. In many personalization applications, the number of customers is very large. For example, in a credit-card application, the number of customers can be measured in millions. If we discover 100 rules per customer on average, the total number of rules in that application would be hundreds of millions. It is simply impossible for a human expert to validate all these rules one by one.

To address this problem, our system uses validation operators that let a human expert validate large numbers of rules at a time with relatively little input from the expert. As Figure 4 shows, rule validation is an iterative process that lets the expert apply various operators successively and validate many rules each time an operator is applied.

The profile-building process in Figure 4 consists of two phases. Phase 1, data mining, generates rules for each customer from the customer's transactional data. Phase 2 constitutes the rule validation process performed by the domain expert.

Rule validation, unlike rule discovery, is not a separate process for each customer, but takes place for all customers at once. As a result, the expert usually validates many similar or even identical rules for different customers. For example, the rule "When buying cereal, John Doe also buys milk" might be common to many customers. Similarly, the rule "When shopping on weekends, John Doe usually spends more than $100 on groceries" might be common to many customers with big families. Collective rule validation lets the expert deal with such common rules just once. On the other hand, separate rule validation for each customer would force the expert to work on many identical or similar rules repeatedly. Therefore, at the beginning of Phase 2, the system collects rules from all the customers into one set and tags each rule with the ID of the customer to which it belongs. After validation, the system places each accepted rule in that customer's profile.

Figure 4. An expanded view of the profile-building process. In Phase 1 (data mining), each customer's individual data yields individual rules; in Phase 2 (validation), the expert applies validation operators successively to the combined set of all rules, splitting it into accepted, rejected, and undecided rules. Accepted rules form the individual profiles; rejected rules are discarded.

Figure 5 describes the rule validation process. All rules discovered during Phase 1 (denoted R_all in the figure) are considered unvalidated. The expert chooses various validation operators and applies them successively to the set of unvalidated rules. Applying each validation operator results in the acceptance of some rules (set O_acc) and the rejection of others (O_rej). The expert then applies the next validation operator to the set of remaining unvalidated rules (R_unv). The process stops when the TerminateValidationProcess condition is met.

After the validation process, the set of all discovered rules (R_all) is split into three mutually disjoint sets: accepted rules (R_acc), rejected rules (R_rej), and possibly some remaining unvalidated rules (R_unv). At the end of Phase 2, all accepted rules are placed in the behavioral profiles of their respective customers.

VALIDATION OPERATORS

We have developed several validation operators that a human expert can use to validate large numbers of rules [7].

Similarity-based rule grouping

This operator puts similar rules into groups according to expert-specified similarity criteria. As a result, the expert can inspect groups of rules instead of individual rules one by one, and can accept or reject all rules in the group at once. We developed a method the expert can use to specify different levels of rule similarity. We also developed an efficient (linear running time) rule-grouping algorithm [7].

For example, according to the attribute structure similarity condition, all rules that have the same attribute structure (ignoring attribute values and statistical parameters) are similar. Consider the rules in Figure 3. Under the attribute structure similarity condition, the grouping operator would place rules 1 and 2 in the same group because they both have the attribute structure Product => Store. Consequently, any rule with such an attribute structure would be placed in the same group as rules 1 and 2. Rule 3, however, would not be grouped with rules 1 and 2 because it has a different attribute structure: Product => CouponUsed. Attribute structure is only one of many possible similarity conditions that the similarity-based rule-grouping operator can use.
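A minimal sketch of this particular similarity condition, assuming rules are stored as dictionaries with body and head lists of attribute=value pairs (an assumed representation, not necessarily 1:1Pro's):

```python
from collections import defaultdict

def attribute_structure(rule):
    """Grouping key that ignores values and statistics: attribute names only."""
    return (frozenset(attr for attr, _ in rule["body"]),
            frozenset(attr for attr, _ in rule["head"]))

def group_by_structure(rules):
    """Single pass (linear time) grouping of rules with identical structure."""
    groups = defaultdict(list)
    for rule in rules:
        groups[attribute_structure(rule)].append(rule)
    return groups

rules = [
    {"body": [("Product", "LemonJuice")], "head": [("Store", "RiteAid")]},     # rule 1
    {"body": [("Product", "WheatBread")], "head": [("Store", "GrandUnion")]},  # rule 2
    {"body": [("Product", "AppleJuice")], "head": [("CouponUsed", "YES")]},    # rule 3
]
for structure, members in group_by_structure(rules).items():
    print(structure, len(members))   # rules 1 and 2 share the structure Product => Store
```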

Template-based rule filtering

This operator filters rules that match expert-specified rule templates. The expert specifies accepting and rejecting templates. Naturally, rules that match an accepting template are accepted; rules that match a rejecting template are rejected. Rules that do not match a template remain unvalidated. For this operator, we developed a rule-template specification language and an efficient (linear running time) matching algorithm.

Consider the following rule template: REJECT HEAD = {Store = RiteAid}. This template means "Reject all rules that have Store = RiteAid in their heads." Of the seven rules in Figure 3, only rule 1 matches this template and, as a result, would be rejected. A more complicated rule template is ACCEPT BODY {Product} AND HEAD {DayOfWeek, Quantity}. This template means "Accept all rules that have the attribute Product (possibly among other attributes) in their bodies and whose heads are restricted to the attributes DayOfWeek or Quantity." In Figure 3, rules 5 and 7 match this template and would be accepted and placed in the profile.

Redundant-rule elimination

This operator eliminates rules that can be derived from other, usually more general, rules and facts. In other words, it eliminates rules that by themselves carry no new information about a customer's behavior. The operator incorporates an algorithm that checks rules for certain redundancy conditions [7].

Figure 5. Basic algorithm for the rule validation process. Applying each validation operator results in the acceptance of some rules and the rejection of others until the TerminateValidationProcess condition is met.

Input: Set of all discovered rules R_all.
Output: Mutually disjoint sets of rules R_acc, R_rej, R_unv, such that R_all = R_acc ∪ R_rej ∪ R_unv.

(1) R_unv := R_all, R_acc := ∅, R_rej := ∅.
(2) while (not TerminateValidationProcess()) begin
(3)   Expert picks a validation operator (say, O) from the set of available validation operators.
(4)   O is applied to R_unv. Result: disjoint sets O_acc and O_rej.
(5)   R_unv := R_unv − O_acc − O_rej, R_acc := R_acc ∪ O_acc, R_rej := R_rej ∪ O_rej.
(6) end
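The same loop can be written as a short Python sketch in which a validation operator is any function that splits the unvalidated set into accepted and rejected subsets; operator selection and the stopping test are left to the expert and are only stubbed here.

```python
def validate(all_rules, pick_operator, terminate):
    """Figure 5 as code: returns (accepted, rejected, unvalidated) rule sets."""
    unvalidated, accepted, rejected = set(all_rules), set(), set()
    while unvalidated and not terminate(unvalidated, accepted, rejected):
        operator = pick_operator(unvalidated)        # expert chooses an operator
        op_acc, op_rej = operator(unvalidated)       # operator splits some rules
        unvalidated -= op_acc | op_rej
        accepted |= op_acc
        rejected |= op_rej
    return accepted, rejected, unvalidated

# Toy usage: one rejecting operator and a trivial stopping test that is
# satisfied as soon as any rule has been validated.
rules = {"r1", "r2", "r3", "r4"}
reject_r1 = lambda unv: (set(), {"r1"} & unv)        # accepts nothing, rejects r1
pick = lambda unv: reject_r1                         # the expert's (trivial) choice
done = lambda unv, acc, rej: len(acc) + len(rej) >= 1
print(validate(rules, pick, done))                   # accepted=∅, rejected={r1}, rest unvalidated
```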

Consider the association rule Product = AppleJuice => Store = GrandUnion (2%, 100%), which was discovered in the purchasing history of John Doe. This rule by itself might seem to show the specifics of Doe's behavior (he buys apple juice only at Grand Union), so putting this rule in his behavioral profile might seem logical. Assume, however, that we also determined from the data that this customer shops exclusively at Grand Union. The AppleJuice rule constitutes a special case of this finding. Therefore, keeping the fact "The customer shops only at Grand Union" in John Doe's factual profile eliminates the need to store the AppleJuice rule in the profile.

In addition to these three operators, we have also introduced other validation operators, such as visualization, statistical analysis, and browsing [7]. The visualization operator lets the expert view subsets of unvalidated rules in visual representations such as histograms and pie charts. The statistical-analysis operator computes various statistical characteristics of unvalidated rules, thus providing the expert with important information to use during validation. The browsing operator allows the expert to inspect individual rules or groups of rules directly by viewing them on the screen.

As Figure 5 shows, the expert successively applies validation operators to the set of unvalidated rules until the process reaches the stopping criterion TerminateValidationProcess. The expert can specify this criterion in several ways. The following are two examples of stopping criteria:

• The validation process continues until some predetermined percentage of rules (such as 95 percent) is validated.

• The validation process terminates when validation operators validate only a few rules at a time—that is, when the costs of selecting and applying each additional validation operator exceed the benefits of validating a few more rules (the law of diminishing returns).

THE 1:1PRO SYSTEM

The 1:1Pro (short for One-to-One Profiling) system implements our profiling and validation methods. The system takes as input the factual and transactional data stored in a database or flat files and generates a set of validated rules capturing individual customers' behavior. It can use any relational database management system (DBMS) to store customer data and various data mining tools to discover rules. In addition, it can incorporate other tools useful in the rule validation process, such as visualization and statistical-analysis tools that the validation operators may require.

The 1:1Pro system architecture, shown in Figure 6, follows the client-server model. The server component consists of the following modules:

• Coordination module—coordinates profile construction, including the rule generation process and the subsequent validation process.

• Validation module—validates the rules discovered by data mining tools. The system's current implementation supports similarity-based grouping, template-based filtering, redundant-rule elimination, and browsing operators.

• Communications module—handles all communications with the 1:1Pro client.

• Interfaces to external modules such as a DBMS, data mining tools, and visualization tools. Each module requires a separate interface (Figure 6).

Figure 6. The 1:1Pro system architecture. 1:1Pro is an open system that incorporates a broad range of data sources as well as data mining, visualization, and statistical-analysis tools.

The client component contains the graphical user interface and the communications modules. The expert uses the GUI to specify validation operations and view the results of the iterative validation process. Figure 7 shows an example of the GUI window for the template-based filtering operator.

The client communication module sends the expert-specified validation request to the server. The server receives validation operators and passes them through the coordination module to the validation component for subsequent processing. Some validation operators, such as the statistical-analysis operator, generate outputs. The communications modules send these outputs from the validation module to the GUI module.

We wanted 1:1Pro to be an open system that easily incorporates a broad range of data sources as well as data mining, visualization, and statistical-analysis tools. In particular, we designed a database interface that supports relational databases (such as Oracle and SQL Server), flat files, and Web logs, as well as other data sources.

In 1:1Pro's current implementation, we used association rule discovery methods to build customer profiles. However, our methods are not restricted to a particular rule structure or discovery algorithm. We can plug in commercial and experimental data mining tools that use methods other than association rule discovery, such as decision tree methods [9]. One problem with using external data mining tools is that their rule representation formats differ from ours. We overcame this problem by developing rule converters for the external data mining tools that interface with 1:1Pro.
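The paper does not describe the converters' formats. Purely as an illustration, a converter might parse a textual rule printed by an external tool into the internal body/head representation used in the sketches above; the input format here is invented.

```python
import re

def convert_external_rule(text):
    """Hypothetical converter: parse a rule printed by an external tool, e.g.
    'IF Product=AppleJuice AND Season=Winter THEN CouponUsed=YES (2%, 60%)',
    into the dictionary representation used in the earlier sketches."""
    match = re.match(
        r"IF (.+) THEN (\S+=\S+) \((\d+(?:\.\d+)?)%, (\d+(?:\.\d+)?)%\)", text)
    if match is None:
        raise ValueError(f"unrecognized rule format: {text!r}")
    body_text, head_text, support, confidence = match.groups()
    parse = lambda cond: tuple(cond.split("=", 1))
    return {
        "body": [parse(c.strip()) for c in body_text.split("AND")],
        "head": [parse(head_text)],
        "support": float(support) / 100,
        "confidence": float(confidence) / 100,
    }

print(convert_external_rule(
    "IF Product=AppleJuice AND Season=Winter THEN CouponUsed=YES (2%, 60%)"))
```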

The expert specifies validation operators through the GUI module, and a log file records the operators as they are applied. Figure 8 shows the log file structure. The log file captures the entire validation process, allowing the expert to keep track of all validation activities. The expert can also retrace validation steps that were not useful by selecting a validation operator in the log file and running the process forward from that operator on. The major fields in the log file record are

• ResultId—the instance of the validation operator used in the current validation step,

• Operator—this operator's type (grouping, browsing, filtering),

• SourceId—the instance of the previously applied validation operator,

• Date/Time—the time stamp when the validation operator was created, and

• Notes—the domain expert's comments.
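A minimal sketch of such a log record, with SourceId linking each step to the previously applied operator so that the chain of steps behind any result can be reconstructed (field names follow Figure 8; the code itself is an assumption, not 1:1Pro's implementation):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional

@dataclass
class LogRecord:
    result_id: int                 # this validation step
    operator: str                  # grouping, browsing, filtering, ...
    source_id: Optional[int]       # the previously applied operator, if any
    timestamp: datetime
    notes: str

def retrace(log: Dict[int, LogRecord], result_id: int):
    """Follow SourceId links back to the start, returning the chain of steps
    that led to a given result (oldest first)."""
    chain = []
    current: Optional[int] = result_id
    while current is not None:
        record = log[current]
        chain.append(record)
        current = record.source_id
    return list(reversed(chain))

# Hypothetical entries loosely modeled on Figure 8.
log = {
    3: LogRecord(3, "Filter", None, datetime(1998, 11, 23, 17, 10), "initial filter"),
    7: LogRecord(7, "Group", 3, datetime(1998, 11, 23, 17, 37), "attribute-level setting"),
    8: LogRecord(8, "Browse", 7, datetime(1998, 11, 23, 17, 51), "accepted 7 groups"),
}
for step in retrace(log, 8):
    print(step.result_id, step.operator, step.notes)
```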

The 1:1Pro system's server component, implemented in C++ and Perl, runs on a Linux or Unix platform. The client component, implemented in Java, can run on the same machine as the server or on a different machine. It can also run from an applet viewer or as a stand-alone application.

Figure 8. A 1:1Pro system log file fragment. The log file captures the entire validation process, allowing the expert to keep track of all validation activities.

ResultId  Operator  SourceId  Date/Time           Notes
...       ...       ...       ...                 ...
6         Filter    5         11/23/1998 5:26pm   Rejecting: demogr. in the body
7         Group     3         11/23/1998 5:37pm   Used attribute-level setting here
8         Browse    7         11/23/1998 5:51pm   Accepted: 7 groups, rejected: 11
9         Filter    3         11/23/1998 6:28pm   Rejecting: 'age' in the head
...       ...       ...       ...                 ...

Figure 7. Graphical user interface window for a filtering operator. The expert uses the GUI to specify validation operations and view the results of the iterative validation process.


EXPERIMENTS

We tested 1:1Pro on a real-world marketing application that included data on 1,903 households purchasing various nonalcoholic beverages over a one-year period. The data set contained 21 fields characterizing the purchase transactions and 353,421 records (on average, 186 records per household).

In our case study, we performed a seasonality analysis. That is, we constructed customer profiles containing individual rules describing season-related customer behaviors. For example, such rules might describe the types of products a customer buys under specific temporal circumstances (only in winter, only on weekends) or the temporal circumstances under which a customer buys specific products.

The system's data mining module generated 1,022,812 association rules—on average, about 537 rules per household. We observed that most rules pertain to a very small number of households. For example, nearly 40 percent of the discovered rules (407,716 rules) pertain to five or fewer of the 1,903 households. Of that 40 percent, nearly half (196,384 rules) apply to only one household. This demonstrates that many discovered rules capture truly idiosyncratic behavior of individual households. Because the traditional segmentation-based approaches to building customer profiles do not capture idiosyncratic behavior, they would not identify many of the rules discovered in our application. On the other extreme, several discovered rules were applicable to a significant portion of the households. For example, nine rules pertained to more than 800 households. In particular, DayOfWeek=Monday => Shopper=Female was applicable to 859 households.

Because we were familiar with this application, we performed the expert's role and validated the discovered rules ourselves. Table 1 summarizes the validation process. We started by eliminating redundant rules and then applied several filtering operators, most of which were rule elimination filters. Eliminating redundant rules and repeatedly applying rule filters helped us validate 93.4 percent of the rules.

After validating a large number of rules with relatively few filtering operators, we decided to switch to a different validation approach—applying the grouping operator to the remaining unvalidated rules. The grouping operator generated 652 groups. Then, we examined several of the largest groups and validated their rules. As a result, we managed to validate all but 37,370 rules. We then applied a grouping operator based on a different rule structure to these remaining rules and validated a set of additional 8,714 rules. At this point, we encountered the law of diminishing returns—each subsequent application of a validation operator validated a smaller number of rules—and we stopped the validation process. It took us one hour to perform the whole process (including the software running time and the expert validation time). We validated 97.2 percent of the rules—4.0 percent were accepted and 93.2 percent rejected. Our results demonstrate that our system can validate a significant number of rules in medium-size personalization applications.

As a result of the validation process, we reduced the average customer profile size from 537 unvalidated rules to 21 accepted rules. An example of an accepted rule for one household is Product=FrozenYogurt & Season=Winter => CouponUsed=YES—in other words, during the winter, this household buys frozen yogurt, mostly using coupons. We accepted this rule because it reflects an interesting seasonal coupon-use pattern. A rejected rule for one household is Product=Beer => Shopper=Male—the predominant beer buyer in this household is male. We rejected this rule because it is unrelated to the seasonality analysis.

Because we had performed the validation process ourselves in the previous case study, we decided to use the help of a marketing expert in another seasonality analysis of the same data. The expert started the analysis by applying redundant-rule elimination and several template-based filtering rejection operators (for example, operators that reject all rules not referring to the Season or DayOfWeek attributes). Then, she grouped the remaining unvalidated rules, examined several resulting groups, and stopped the validation process. At that point, she found nothing more to reject and decided to accept all the remaining unvalidated rules. As a result, she accepted 42,496 rules (4.2 percent of all discovered rules) and spent about 40 minutes on the entire process.

The marketing expert noted that our rule evaluation process is inherently subjective because different experts have different experiences and understandings of the application. She believes that different experts can arrive at different evaluation results using the same validation process.

Table 1. A validation process for the seasonality analysis of market research data.

Validation operator        Accepted rules   Rejected rules   Unvalidated rules
Redundancy elimination                  0          186,727             836,085
Filtering                               0          285,528             550,557
Filtering                               0          424,214             126,343
Filtering                               0           48,682              77,661
Filtering                          10,052                0              67,609
Grouping (652 groups)              23,417            6,822              37,370
Grouping (4,765 groups)             7,181            1,533              28,656
Total                              40,650          953,506           1,724,281

We must point out that the accepted rules, although valid and relevant to the expert, may not be effective. That is, they might not guarantee actionable results, such as decisions, recommendations, and other user-related actions. We are currently working on incorporating the concept of effectiveness in 1:1Pro.

We are also dealing with the problem of generating many irrelevant rules that are subsequently rejected during the validation process. One way to handle the problem is for the domain expert to specify constraints on the types of rules of interest before the rule discovery stage. Ramakrishnan Srikant, Quoc Vu, and Rakesh Agrawal have presented methods for specifying such constraints [12], and thus reducing the number of discovered rules that are irrelevant to the expert.

Since it is difficult for an expert to anticipate all the relevant and irrelevant rules in advance, validating the discovered rules is still necessary. Thus, the ideal solution is to combine the constraint specification, data mining, and rule validation stages in one system. We are currently working on integrating the three stages in 1:1Pro. ✸

References

1. P. Hagen, "Smart Personalization," The Forrester Report, Forrester Research, Cambridge, Mass., July 1999.

2. Comm. ACM, Special Issue on Recommender Systems, vol. 40, no. 3, 1997.

3. P. Resnick et al., "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," Proc. 1994 Computer-Supported Cooperative Work Conf., ACM Press, New York, 1994, pp. 175-186.

4. U. Shardanand and P. Maes, "Social Information Filtering: Algorithms for Automating 'Word of Mouth,'" Proc. Conf. Human Factors in Computing Systems (CHI 95), ACM Press, New York, 1995, pp. 210-217.

5. W. Hill et al., "Recommending and Evaluating Choices in a Virtual Community of Use," Proc. Conf. Human Factors in Computing Systems (CHI 95), ACM Press, New York, 1995, pp. 194-201.

6. M. Pazzani, "A Framework for Collaborative, Content-Based and Demographic Filtering," Artificial Intelligence Review, Dec. 1999, pp. 393-408.

7. G. Adomavicius and A. Tuzhilin, "Expert-Driven Validation of Rule-Based User Models in Personalization Applications," J. Data Mining and Knowledge Discovery, Jan. 2001, pp. 33-58.

8. R. Agrawal et al., "Fast Discovery of Association Rules," Advances in Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, Calif., 1996, chap. 12.

9. L. Breiman et al., Classification and Regression Trees, Wadsworth, Belmont, Calif., 1984.

10. G. Piatetsky-Shapiro and C.J. Matheus, "The Interestingness of Deviations," Proc. AAAI-94 Workshop Knowledge Discovery in Databases, AAAI Press, Menlo Park, Calif., 1994, pp. 25-36.

11. A. Silberschatz and A. Tuzhilin, "What Makes Patterns Interesting in Knowledge Discovery Systems," IEEE Trans. Knowledge and Data Engineering, Dec. 1996, pp. 970-974.

12. R. Srikant, Q. Vu, and R. Agrawal, "Mining Association Rules with Item Constraints," Proc. Third Int'l Conf. Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, Calif., 1997, pp. 67-73.

Gediminas Adomavicius is a PhD candidate in computer science at the Courant Institute of Mathematical Sciences, New York University, where he received an MS. He received a BS in mathematics from Vilnius University, Lithuania. He is a recipient of a Fulbright Fellowship. His research interests include data mining, personalization, and scientific computing. Contact him at adomavic@cs.nyu.edu.

Alexander Tuzhilin is an associate professor of information systems at the Stern School of Business, New York University. His research interests include knowledge discovery in databases, personalization, and temporal databases. Tuzhilin received a PhD in computer science from the Courant Institute of Mathematical Sciences, NYU. Contact him at atuzhili@stern.nyu.edu.

