Using Data Mining Methods to Build Customer Profiles

Gediminas Adomavicius and Alexander Tuzhilin, New York University

The 1:1Pro system constructs personal profiles based on customers' transactional histories. The system uses data mining techniques to discover a set of rules describing customers' behavior and supports human experts in validating the rules.

Personalization—the ability to provide content and services tailored to individuals on the basis of knowledge about their preferences and behavior [1]—has become an important marketing tool (see the "Personalization" sidebar). Personalization applications range from personalized Web content presentations to book, CD, and stock purchase recommendations. Among the issues the personalization community must deal with, the following are of special importance: how to provide personal recommendations based on a comprehensive knowledge of who customers are, how they behave, and how similar they are to other customers; and how to extract this knowledge from the available data and store it in customer profiles.

Various recommender systems address the recommendation problem [2]. Most use either the collaborative-filtering [3-5] or the content-based [6] approach (see sidebar). Some systems integrate the two methods [2,6].

To address the second issue, we have developed an approach that uses information learned from customers' transactional histories to construct accurate, comprehensive individual profiles [7]. One part of the profile contains facts about a customer, and the other part contains rules describing that customer's behavior. We use data mining methods to derive the behavioral rules from the data. We have also developed a method for validating customer profiles with the help of a human domain expert who uses validation operators to separate "good" rules from "bad." We have implemented the profile construction and validation methods in a system called 1:1Pro. Our approach differs from other profiling methods in that we include personal behavioral rules in customer profiles [7].

We can judge the quality of rules stored in customer profiles in several ways. We might call rules "good" because they are statistically valid, acceptable to a human expert in a given application, or effective in that they result in specific benefits such as better decision-making and recommendation capabilities. Here, we focus on the first two aspects: statistical validity and acceptability to an expert.

BUILDING CUSTOMER PROFILES

As Figure 1 illustrates, the two main phases of the profile-building process are rule discovery and validation. Our method of building personalized customer profiles begins with collecting the data.

Data model

Applications use various kinds of data about individual customers. Many applications classify the data into two basic types: factual—who the customer is—and transactional—what the customer does.

For example, in a marketing application based on customers' purchasing histories, the factual data includes demographic information such as name, gender, birth date, address, salary, and social security number. Transactional data consists of records of the customer's purchases during a specific period. A purchase record might include the purchase date, product purchased, amount paid, coupon use, coupon value, and discount applied. Figure 2 shows examples of factual and transactional data.
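The paper does not prescribe a concrete schema, but the two data types are easy to picture in code. The following minimal Python sketch uses hypothetical record layouts whose field names are taken from Figure 2; it is an illustration, not 1:1Pro's actual data model.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical record layouts mirroring the factual and transactional
# fragments in Figure 2; the paper does not prescribe a concrete schema.

@dataclass
class FactualRecord:          # who the customer is
    customer_id: str
    last_name: str
    first_name: str
    birth_date: date
    gender: str

@dataclass
class TransactionRecord:      # what the customer does
    customer_id: str
    timestamp: datetime
    store: str
    product: str
    coupon_used: bool

jd = FactualRecord("0721134", "Doe", "John", date(1945, 11, 17), "Male")
t1 = TransactionRecord("0721134", datetime(1993, 7, 10, 19, 2), "RiteAid", "LemonJuice", False)
```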

Profile model

A complete customer profile has two parts: factual and behavioral. The factual profile contains information, such as name, gender, and date of birth, that the personalization system obtained from the customer's factual data. The factual profile also can contain information derived from the transactional data, such as "The customer's favorite beer is Heineken" or "The customer's biggest purchase last month was for $237."

A behavioral profile models the customer's actions and is usually derived from transactional data. Examples of behaviors are "When purchasing cereal, John Doe usually buys milk" and "On weekends, John Doe usually spends more than $100 on groceries."
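To make the two-part profile concrete, here is a small illustrative sketch of one possible in-memory representation, with a rule as a conjunctive body, a head, and its support and confidence. The class and field names are assumptions for illustration; the paper does not specify 1:1Pro's internal structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A rule is a conjunction of attribute=value conditions in the body and an
# attribute=value condition in the head, plus (support, confidence).
Condition = Tuple[str, str]                     # e.g. ("Product", "LemonJuice")

@dataclass
class Rule:
    body: List[Condition]
    head: Condition
    support: float
    confidence: float

@dataclass
class CustomerProfile:
    customer_id: str
    factual: Dict[str, str] = field(default_factory=dict)   # facts, possibly derived
    behavioral: List[Rule] = field(default_factory=list)    # validated rules

profile = CustomerProfile(
    "0721134",
    factual={"FavoriteBeer": "Heineken", "BiggestPurchaseLastMonth": "$237"},
    behavioral=[Rule([("Product", "LemonJuice")], ("Store", "RiteAid"), 0.024, 0.95)],
)
```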

RULE DISCOVERY

We model individual customer behavior with various types of conjunctive rules, including association [8] and classification [9] rules. Figure 3 shows an example of association rules discovered for a particular customer.

For instance, Rule 1 specifies that John Doe usually buys lemon juice at RiteAid. More specifically, in 95 percent of the cases when he buys lemon juice, he buys it at RiteAid. In addition, 2.4 percent of all John Doe’s shopping transactions include purchasing lemon juice at RiteAid.
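As a quick check of how those two numbers arise, support is the fraction of all of the customer's transactions that match both body and head, and confidence is the fraction of body-matching transactions that also match the head. The transaction counts below are made up solely to reproduce the figures quoted for Rule 1.

```python
# Illustrative only: hypothetical counts, not the paper's actual data.
transactions = (
    [{"Product": "LemonJuice", "Store": "RiteAid"}] * 19       # lemon juice at RiteAid
    + [{"Product": "LemonJuice", "Store": "GrandUnion"}] * 1    # lemon juice elsewhere
    + [{"Product": "WheatBread", "Store": "GrandUnion"}] * 780  # everything else
)

matching_body = [t for t in transactions if t["Product"] == "LemonJuice"]
matching_rule = [t for t in matching_body if t["Store"] == "RiteAid"]

support = len(matching_rule) / len(transactions)      # fraction of all transactions
confidence = len(matching_rule) / len(matching_body)  # fraction of body matches
print(f"support={support:.3f}, confidence={confidence:.2f}")  # 0.024, 0.95
```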

Using rules to describe customer behavior has certain advantages. Besides being an intuitive and descriptive way to represent behaviors, a conjunctive rule is a well-studied concept used extensively in data mining, expert systems, logic programming, and many other areas. Moreover, researchers have proposed many rule discovery algorithms in the literature, especially for association and classification rules.

For personalization applications, we apply rule discovery methods individually to every customer's data. To discover rules that describe the behavior of individual customers, we can use various data mining algorithms, such as Apriori [8] for association rules and CART (Classification and Regression Trees) [9] for classification rules. Moreover, our profiling approach is not limited to any specific representation of data mining rules or discovery method.
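As a rough illustration of per-customer discovery, the following sketch brute-forces single-condition association rules of the form attribute = value => attribute = value from one customer's transactions, keeping rules above minimum support and confidence thresholds. It is not Apriori and not 1:1Pro's implementation; real miners also handle multi-condition bodies such as rules 5 and 7 in Figure 3. The thresholds and data are illustrative.

```python
from collections import Counter
from itertools import combinations

def discover_rules(transactions, min_support=0.02, min_confidence=0.6):
    """Brute-force discovery of rules 'A=a => B=b' for one customer.

    transactions: list of dicts mapping attribute -> value.
    Returns a list of (body, head, support, confidence) tuples.
    """
    n = len(transactions)
    pair_counts = Counter()   # co-occurrence of ((attr, val), (attr, val)) pairs
    item_counts = Counter()   # occurrence of individual (attr, val) conditions
    for t in transactions:
        items = sorted(t.items())
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (x, y), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        for body, head in ((x, y), (y, x)):
            confidence = both / item_counts[body]
            if confidence >= min_confidence:
                rules.append((body, head, support, confidence))
    return rules

# Toy usage: one customer's transactions as attribute=value records.
history = [
    {"Product": "LemonJuice", "Store": "RiteAid"},
    {"Product": "WheatBread", "Store": "GrandUnion"},
    {"Product": "LemonJuice", "Store": "RiteAid"},
]
for body, head, s, c in discover_rules(history):
    print(f"{body[0]}={body[1]} => {head[0]}={head[1]} ({s:.0%}, {c:.0%})")
```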

Figure 1. A simplified view of the profile-building process: Phase 1, data mining (data to rules), and Phase 2, validation (rules to profiles).

Figure 2. Fragments of data in a marketing application showing demographic information (factual data) and records of the customers' purchases (transactional data).

Factual data:

CustomerId  LastName  FirstName  BirthDate   Gender
0721134     Doe       John       11/17/1945  Male
0721168     Brown     Jane       05/20/1963  Female
0730021     Adams     Robert     06/02/1959  Male

Transactional data:

CustomerId  Date        Time     Store       Product      CouponUsed
0721134     07/09/1993  10:18am  GrandUnion  WheatBread   No
0721134     07/09/1993  10:18am  GrandUnion  AppleJuice   Yes
0721168     07/10/1993  10:29am  Edwards     SourCream    No
0721134     07/10/1993  07:02pm  RiteAid     LemonJuice   No
0730021     07/10/1993  08:34pm  Edwards     SkimMilk     No
0730021     07/10/1993  08:34pm  Edwards     AppleJuice   No
0721168     07/12/1993  01:13pm  GrandUnion  BabyDiapers  Yes
0730021     07/12/1993  01:13pm  GrandUnion  WheatBread   No

Figure 3. Association rules discovered in a marketing application help to describe the customer's behavior.

Discovered rules (for John Doe):

(1) Product = LemonJuice => Store = RiteAid (2.4%, 95%)
(2) Product = WheatBread => Store = GrandUnion (3%, 88%)
(3) Product = AppleJuice => CouponUsed = YES (2%, 60%)
(4) TimeOfDay = Morning => DayOfWeek = Saturday (4%, 77%)
(5) TimeOfWeek = Weekend & Product = OrangeJuice => Quantity = Big (2%, 75%)
(6) Product = BabyDiapers => DayOfWeek = Monday (0.8%, 61%)
(7) Product = BabyDiapers & CouponUsed = YES => Quantity = Big (2.5%, 67%)


Personalization is a relatively new field, and different authors provide various definitions of the concept [1]. We define personalization as an iterative process consisting of the steps shown in Figure A.

Collecting Customer Data

Personalization begins with collecting customer data from various sources. This data might include histories of customers' Web purchasing and browsing activities, as well as demographic and psychographic information. After the data is collected, it must be prepared, cleaned, and stored in a data warehouse [2].

Customer Profiling

A key issue in developing personalization applications is constructing accurate and comprehensive customer profiles based on the collected data.

Matchmaking

Personalization systems must match appropriate content and services to indi-

vidual customers. This matchmaking takes various approaches. For example, BroadVision (http://www.broadvision.

com) and Art Technology Group (http://

www.atg.com) use business rules to spec- ify what content to deliver in particular situations. To provide personalized rec- ommendations, recommender systems use technologies such as content-based and collaborative filtering. Content-based sys- tems recommend items similar to items the customer preferred in the past. Col- laborative-filtering systems recommend items that other customers with similar tastes and preferences liked in the past.

Delivery and Presentation

E-companies deliver personalized information to customers in several ways. One classification of delivery methods is pull, push, and passive [3]. Push methods reach a customer who is not currently interacting with the system—for example, by sending an e-mail message. Pull methods notify customers that personalized information is available but display this information only when the customer explicitly requests it. Passive delivery displays personalized information in the context of the e-commerce application. For example, while looking at a product on a Web site, a customer also sees recommendations for related products. The system can present personalized information in various forms: narrative, a list ordered by relevance, an unordered list of alternatives, or various types of visualization [3].

Measuring Customer Response

Companies can use various "e-metrics" to evaluate the effectiveness of personalization technologies [4]. For example, an e-commerce company can determine whether customers start spending more time (and money) on its Web site, whether personalized services attract new customers, and whether customer loyalty increases.

As Figure A shows, measuring customer response serves as feedback for possible improvements of each of the other four steps of the personalization process. In particular, system designers should use customer response to decide whether to collect additional data, build better user profiles, develop better matchmaking algorithms, and improve information presentation. If organized properly, this iterative process improves relationships with customers over time by providing a better understanding of customers, more accurately targeted content, and better recommendations and services.

Implementation Issues

To implement the personalization process properly, companies must deal with several issues. Privacy is a big concern to customers, and companies must reconcile their personalization systems with this concern. Progress in privacy preservation will be critical to personalization. Scalability of the personalization process is another important issue—many growing companies must be ready to handle millions of customers and tens of thousands of products. Moreover, providing personalized services in real time requires efficient profiling and matchmaking methods.

Personalization is a broad field, and many companies focus only on certain steps of the process. Overviews of most of the personalization companies are available at http://www.personalization.com and http://www.personalization.org.

References

1. Comm. ACM, Special Issue on Personalization, vol. 43, no. 8, 2000.

2. D. Pyle, Data Preparation for Data Mining, Morgan Kaufmann, San Francisco, 1999.

3. J.B. Schafer, J.A. Konstan, and J. Riedl, "E-Commerce Recommendation Applications," J. Data Mining and Knowledge Discovery, Jan. 2001.

4. M. Cutler and J. Sterne, "E-Metrics," NetGenesis Corp., 2000; http://www.netgen.com/emetrics/.

Figure A. Stages of the personalization process: collecting customer data, building customer profiles, matchmaking, delivery and presentation of personalized information, and measuring customer response. Customer response measurements provide feedback to each of the previous stages.

Because data mining methods discover rules for each customer individually, these methods work well for applications containing many transactions for each customer, such as credit card, grocery shopping, online browsing, and stock trading applications. In applications such as a car purchase or vacation planning, individual rules tend to be statistically less reliable because they are generated from relatively few transactions.

RULE VALIDATION

Data mining methods often generate large numbers of rules, many of which, although statistically acceptable, are trivial, spurious, or just not relevant to the application at hand [10,11]. Therefore, validating the discovered rules is an important requirement. For example, assume that a data mining method discovers the rule that whenever John Doe takes a business trip to Los Angeles, he stays in an expensive hotel. Assume that John went to Los Angeles seven times over the past two years, and five of those times he stayed in expensive hotels. We must validate this rule—that is, make sure that it captures John's behavior rather than a spurious correlation and that it is not simply irrelevant to the application.

One way to validate discovered rules is to let a domain expert inspect them and decide how well they represent customers’ actual behaviors. The expert accepts some rules and rejects the others, and the accepted rules form the behavioral profiles.

An important issue in validating rules is scalability. In many personalization applications, the number of customers is very large. For example, in a credit-card application, the number of customers can be measured in millions. If we discover 100 rules per customer on average, the total number of rules in that application would be hundreds of millions. It is simply impossible for a human expert to validate all these rules one by one.

To address this problem, our system uses validation operators that let a human expert validate large numbers of rules at a time with relatively little input from the expert. As Figure 4 shows, rule validation is an iterative process that lets the expert apply various operators successively and validate many rules each time an operator is applied.

The profile-building process in Figure 4 consists of two phases. Phase 1, data mining, generates rules for each customer from the customer's transactional data. Phase 2 constitutes the rule validation process performed by the domain expert.

Rule validation, unlike rule discovery, is not a separate process for each customer, but takes place for all customers at once. As a result, the expert usually validates many similar or even identical rules for different customers. For example, the rule "When buying cereal, John Doe also buys milk" might be common to many customers. Similarly, the rule "When shopping on weekends, John Doe usually spends more than $100 on groceries" might be common to many customers with big families. Collective rule validation lets the expert deal with such common rules just once. On the other hand, separate rule validation for each customer would force the expert to work on many identical or similar rules repeatedly. Therefore, at the beginning of Phase 2, the system collects rules from all the customers into one set and tags each rule with the ID of the customer to which it belongs. After validation, the system places each accepted rule in that customer's profile.

Figure 4. An expanded view of the profile-building process. In Phase 1 (data mining), each customer's individual data yields individual rules; in Phase 2 (validation), the expert applies validation operators successively to the combined set of all rules, splitting it into accepted, rejected, and undecided rules. Accepted rules form the individual profiles; rejected rules are discarded.

Figure 5 describes the rule validation process. All rules discovered during Phase 1 (denoted R_all in the figure) are considered unvalidated. The expert chooses various validation operators and applies them successively to the set of unvalidated rules. Applying each validation operator results in the acceptance of some rules (set O_acc) and the rejection of others (O_rej). The expert then applies the next validation operator to the set of remaining unvalidated rules (R_unv). The process stops when the TerminateValidationProcess condition is met.

After the validation process, the set of all discovered rules (R_all) is split into three mutually disjoint sets: accepted rules (R_acc), rejected rules (R_rej), and possibly some remaining unvalidated rules (R_unv). At the end of Phase 2, all accepted rules are placed in the behavioral profiles of their respective customers.

VALIDATION OPERATORS

We have developed several validation operators that a human expert can use to validate large numbers of rules [7].

Similarity-based rule grouping

This operator puts similar rules into groups according to expert-specified similarity criteria. As a result, the expert can inspect groups of rules instead of individual rules one by one, and can accept or reject all rules in the group at once. We developed a method the expert can use to specify different levels of rule similarity. We also developed an efficient (linear running time) rule-grouping algorithm [7].

For example, according to the attribute structure similarity condition, all rules that have the same attribute structure (ignoring attribute values and statistical parameters) are similar. Consider the rules in Figure 3. Under the attribute structure similarity condition, the grouping operator would place rules 1 and 2 in the same group because they both have the attribute structure Product => Store. Consequently, any rule with such an attribute structure would be placed in the same group as rules 1 and 2. Rule 3, however, would not be grouped with rules 1 and 2 because it has a different attribute structure: Product => CouponUsed. Attribute structure is only one of many possible similarity conditions that the similarity-based rule-grouping operator can use.
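A minimal sketch of this particular similarity condition, assuming rules are stored as dictionaries with body and head lists of attribute=value pairs (an assumed representation, not necessarily 1:1Pro's):

```python
from collections import defaultdict

def attribute_structure(rule):
    """Grouping key that ignores values and statistics: attribute names only."""
    return (frozenset(attr for attr, _ in rule["body"]),
            frozenset(attr for attr, _ in rule["head"]))

def group_by_structure(rules):
    """Single pass (linear time) grouping of rules with identical structure."""
    groups = defaultdict(list)
    for rule in rules:
        groups[attribute_structure(rule)].append(rule)
    return groups

rules = [
    {"body": [("Product", "LemonJuice")], "head": [("Store", "RiteAid")]},     # rule 1
    {"body": [("Product", "WheatBread")], "head": [("Store", "GrandUnion")]},  # rule 2
    {"body": [("Product", "AppleJuice")], "head": [("CouponUsed", "YES")]},    # rule 3
]
for structure, members in group_by_structure(rules).items():
    print(structure, len(members))   # rules 1 and 2 share the structure Product => Store
```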

Template-based rule filtering

This operator filters rules that match expert-specified rule templates. The expert specifies accepting and rejecting templates. Naturally, rules that match an accepting template are accepted; rules that match a rejecting template are rejected. Rules that do not match a template remain unvalidated. For this operator, we developed a rule-template specification language and an efficient (linear running time) matching algorithm.

Consider the following rule template: REJECT HEAD = {Store = RiteAid}. This template means "Reject all rules that have Store = RiteAid in their heads." Of the seven rules in Figure 3, only rule 1 matches this template and, as a result, would be rejected. A more complicated rule template is ACCEPT BODY {Product} AND HEAD {DayOfWeek, Quantity}. This template means "Accept all rules that have the attribute Product (possibly among other attributes) in their bodies and whose heads are restricted to the attributes DayOfWeek or Quantity." In Figure 3, rules 5 and 7 match this template and would be accepted and placed in the profile.

Redundant-rule elimination

This operator eliminates rules that can be derived from other, usually more general, rules and facts. In other words, it eliminates rules that by themselves carry no new information about a customer's behavior. The operator incorporates an algorithm that checks rules for certain redundancy conditions [7].

Figure 5. Basic algorithm for the rule validation process. Applying each validation operator results in the acceptance of some rules and the rejection of others until the TerminateValidationProcess condition is met.

Input: Set of all discovered rules R_all.
Output: Mutually disjoint sets of rules R_acc, R_rej, R_unv, such that R_all = R_acc ∪ R_rej ∪ R_unv.

(1) R_unv := R_all, R_acc := ∅, R_rej := ∅.
(2) while (not TerminateValidationProcess()) begin
(3)   Expert picks a validation operator (say, O) from the set of available validation operators.
(4)   O is applied to R_unv. Result: disjoint sets O_acc and O_rej.
(5)   R_unv := R_unv − O_acc − O_rej, R_acc := R_acc ∪ O_acc, R_rej := R_rej ∪ O_rej.
(6) end
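The same loop can be written as a short Python sketch in which a validation operator is any function that splits the unvalidated set into accepted and rejected subsets; operator selection and the stopping test are left to the expert and are only stubbed here.

```python
def validate(all_rules, pick_operator, terminate):
    """Figure 5 as code: returns (accepted, rejected, unvalidated) rule sets."""
    unvalidated, accepted, rejected = set(all_rules), set(), set()
    while unvalidated and not terminate(unvalidated, accepted, rejected):
        operator = pick_operator(unvalidated)        # expert chooses an operator
        op_acc, op_rej = operator(unvalidated)       # operator splits some rules
        unvalidated -= op_acc | op_rej
        accepted |= op_acc
        rejected |= op_rej
    return accepted, rejected, unvalidated

# Toy usage: one rejecting operator and a trivial stopping test that is
# satisfied as soon as any rule has been validated.
rules = {"r1", "r2", "r3", "r4"}
reject_r1 = lambda unv: (set(), {"r1"} & unv)        # accepts nothing, rejects r1
pick = lambda unv: reject_r1                         # the expert's (trivial) choice
done = lambda unv, acc, rej: len(acc) + len(rej) >= 1
print(validate(rules, pick, done))                   # accepted=∅, rejected={r1}, rest unvalidated
```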

Consider the association rule Product = AppleJuice => Store = GrandUnion (2%, 100%), which was discovered in the purchasing history of John Doe. This rule by itself might seem to show the specifics of Doe's behavior (he buys apple juice only at Grand Union), so putting this rule in his behavioral profile might seem logical. Assume, however, that we also determined from the data that this customer shops exclusively at Grand Union. The AppleJuice rule constitutes a special case of this finding. Therefore, keeping the fact "The customer shops only at Grand Union" in John Doe's factual profile eliminates the need to store the AppleJuice rule in the profile.

In addition to these three operators, we have also introduced other validation operators, such as visualization, statistical analysis, and browsing [7]. The visualization operator lets the expert view subsets of unvalidated rules in visual representations such as histograms and pie charts. The statistical-analysis operator computes various statistical characteristics of unvalidated rules, thus providing the expert with important information to use during validation. The browsing operator allows the expert to inspect individual rules or groups of rules directly by viewing them on the screen.

As Figure 5 shows, the expert successively applies validation operators to the set of unvalidated rules until the process reaches the stopping criterion TerminateValidationProcess. The expert can specify this criterion in several ways. The following are two examples of stopping criteria:

• The validation process continues until some predetermined percentage of rules (such as 95 percent) is validated.

• The validation process terminates when validation operators validate only a few rules at a time—that is, when the costs of selecting and applying each additional validation operator exceed the benefits of validating a few more rules (the law of diminishing returns).

THE 1:1PRO SYSTEM

The 1:1Pro (short for One-to-One Profiling) system implements our profiling and validation methods. The system takes as input the factual and transactional data stored in a database or flat files and generates a set of validated rules capturing individual customers' behavior. It can use any relational database management system (DBMS) to store customer data and various data mining tools to discover rules. In addition, it can incorporate other tools useful in the rule validation process, such as visualization and statistical-analysis tools that the validation operators may require.

The 1:1Pro system architecture, shown in Figure 6, follows the client-server model. The server component consists of the following modules:

• Coordination module—coordinates profile construction, including the rule generation process and the subsequent validation process.

• Validation module—validates the rules discovered by data mining tools. The system's current implementation supports similarity-based grouping, template-based filtering, redundant-rule elimination, and browsing operators.

• Communications module—handles all communications with the 1:1Pro client.

• Interfaces to external modules such as a DBMS, data mining tools, and visualization tools. Each module requires a separate interface (Figure 6).

Figure 6. The 1:1Pro system architecture. 1:1Pro is an open system that incorporates a broad range of data sources as well as data mining, visualization, and statistical-analysis tools.

The client component contains the graphical user interface and the communications modules. The expert uses the GUI to specify validation operations and view the results of the iterative validation process. Figure 7 shows an example of the GUI window for the template-based filtering operator.

The client communication module sends the expert-specified validation request to the server. The server receives validation operators and passes them through the coordination module to the validation component for subsequent processing. Some validation operators, such as the statistical-analysis operator, generate outputs. The communications modules send these outputs from the validation module to the GUI module.

We wanted 1:1Pro to be an open system that easily incorporates a broad range of data sources as well as data mining, visualization, and statistical-analysis tools. In particular, we designed a database interface that supports relational databases (such as Oracle and SQL Server), flat files, and Web logs, as well as other data sources.

In 1:1Pro's current implementation, we used association rule discovery methods to build customer profiles. However, our methods are not restricted to a particular rule structure or discovery algorithm. We can plug in commercial and experimental data mining tools that use methods other than association rule discovery, such as decision tree methods [9]. One problem with using external data mining tools is that their rule representation formats differ from ours. We overcame this problem by developing rule converters for the external data mining tools that interface with 1:1Pro.
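The paper does not describe the converters' formats. Purely as an illustration, a converter might parse a textual rule printed by an external tool into the internal body/head representation used in the sketches above; the input format here is invented.

```python
import re

def convert_external_rule(text):
    """Hypothetical converter: parse a rule printed by an external tool, e.g.
    'IF Product=AppleJuice AND Season=Winter THEN CouponUsed=YES (2%, 60%)',
    into the dictionary representation used in the earlier sketches."""
    match = re.match(
        r"IF (.+) THEN (\S+=\S+) \((\d+(?:\.\d+)?)%, (\d+(?:\.\d+)?)%\)", text)
    if match is None:
        raise ValueError(f"unrecognized rule format: {text!r}")
    body_text, head_text, support, confidence = match.groups()
    parse = lambda cond: tuple(cond.split("=", 1))
    return {
        "body": [parse(c.strip()) for c in body_text.split("AND")],
        "head": [parse(head_text)],
        "support": float(support) / 100,
        "confidence": float(confidence) / 100,
    }

print(convert_external_rule(
    "IF Product=AppleJuice AND Season=Winter THEN CouponUsed=YES (2%, 60%)"))
```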

The expert specifies validation operators through the GUI module, and a log file records the operators as they are applied. Figure 8 shows the log file structure. The log file captures the entire validation process, allowing the expert to keep track of all validation activities. The expert can also retrace validation steps that were not useful by selecting a validation operator in the log file and running the process forward from that operator on. The major fields in the log file record are

• ResultId—the instance of the validation operator used in the current validation step,

• Operator—this operator's type (grouping, browsing, filtering),

• SourceId—the instance of the previously applied validation operator,

• Date/Time—the time stamp when the validation operator was created, and

• Notes—the domain expert's comments.
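A minimal sketch of such a log record, with SourceId linking each step to the previously applied operator so that the chain of steps behind any result can be reconstructed (field names follow Figure 8; the code itself is an assumption, not 1:1Pro's implementation):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional

@dataclass
class LogRecord:
    result_id: int                 # this validation step
    operator: str                  # grouping, browsing, filtering, ...
    source_id: Optional[int]       # the previously applied operator, if any
    timestamp: datetime
    notes: str

def retrace(log: Dict[int, LogRecord], result_id: int):
    """Follow SourceId links back to the start, returning the chain of steps
    that led to a given result (oldest first)."""
    chain = []
    current: Optional[int] = result_id
    while current is not None:
        record = log[current]
        chain.append(record)
        current = record.source_id
    return list(reversed(chain))

# Hypothetical entries loosely modeled on Figure 8.
log = {
    3: LogRecord(3, "Filter", None, datetime(1998, 11, 23, 17, 10), "initial filter"),
    7: LogRecord(7, "Group", 3, datetime(1998, 11, 23, 17, 37), "attribute-level setting"),
    8: LogRecord(8, "Browse", 7, datetime(1998, 11, 23, 17, 51), "accepted 7 groups"),
}
for step in retrace(log, 8):
    print(step.result_id, step.operator, step.notes)
```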

The 1:1Pro system's server component, implemented in C++ and Perl, runs on a Linux or Unix platform. The client component, implemented in Java, can run on the same machine as the server or on a different machine. It can also run from an applet viewer or as a stand-alone application.

Figure 8. A 1:1Pro system log file fragment. The log file captures the entire validation process, allowing the expert to keep track of all validation activities.

ResultId  Operator  SourceId  Date/Time           Notes
...       ...       ...       ...                 ...
6         Filter    5         11/23/1998 5:26pm   Rejecting: demogr. in the body
7         Group     3         11/23/1998 5:37pm   Used attribute-level setting here
8         Browse    7         11/23/1998 5:51pm   Accepted: 7 groups, rejected: 11
9         Filter    3         11/23/1998 6:28pm   Rejecting: 'age' in the head
...       ...       ...       ...                 ...

Figure 7. Graphical user interface window for a filtering operator. The expert uses the GUI to specify validation operations and view the results of the iterative validation process.


EXPERIMENTS

We tested 1:1Pro on a real-world marketing application that included data on 1,903 households purchasing various nonalcoholic beverages over a one-year period. The data set contained 21 fields characterizing the purchase transactions and 353,421 records (on average, 186 records per household).

In our case study, we performed a seasonality analysis. That is, we constructed customer profiles containing individual rules describing season-related customer behaviors. For example, such rules might describe the types of products a customer buys under specific temporal circumstances (only in winter, only on weekends) or the temporal circumstances under which a customer buys specific products.

The system's data mining module generated 1,022,812 association rules—on average, about 537 rules per household. We observed that most rules pertain to a very small number of households. For example, nearly 40 percent of the discovered rules (407,716 rules) pertain to five or fewer of the 1,903 households. Of that 40 percent, nearly half (196,384 rules) apply to only one household. This demonstrates that many discovered rules capture truly idiosyncratic behavior of individual households. Because the traditional segmentation-based approaches to building customer profiles do not capture idiosyncratic behavior, they would not identify many of the rules discovered in our application. On the other extreme, several discovered rules were applicable to a significant portion of the households. For example, nine rules pertained to more than 800 households. In particular, DayOfWeek=Monday => Shopper=Female was applicable to 859 households.

Because we were familiar with this application, we performed the expert's role and validated the discovered rules ourselves. Table 1 summarizes the validation process. We started by eliminating redundant rules and then applied several filtering operators, most of which were rule elimination filters. Eliminating redundant rules and repeatedly applying rule filters helped us validate 93.4 percent of the rules.

After validating a large number of rules with relatively few filtering operators, we decided to switch to a different validation approach—applying the grouping operator to the remaining unvalidated rules. The grouping operator generated 652 groups. Then, we examined several of the largest groups and validated their rules. As a result, we managed to validate all but 37,370 rules. We then applied a grouping operator based on a different rule structure to these remaining rules and validated a set of additional 8,714 rules. At this point, we encountered the law of diminishing returns—each subsequent application of a validation operator validated a smaller number of rules—and we stopped the validation process. It took us one hour to perform the whole process (including the software running time and the expert validation time). We validated 97.2 percent of the rules—4.0 percent were accepted and 93.2 percent rejected. Our results demonstrate that our system can validate a significant number of rules in medium-size personalization applications.

As a result of the validation process, we reduced the average customer profile size from 537 unvalidated rules to 21 accepted rules. An example of an accepted rule for one household is Product=FrozenYogurt & Season=Winter => CouponUsed=YES—in other words, during the winter, this household buys frozen yogurt, mostly using coupons. We accepted this rule because it reflects an interesting seasonal coupon-use pattern. A rejected rule for one household is Product=Beer => Shopper=Male—the predominant beer buyer in this household is male. We rejected this rule because it is unrelated to the seasonality analysis.

Because we had performed the validation process ourselves in the previous case study, we decided to use the help of a marketing expert in another seasonality analysis of the same data. The expert started the analysis by applying redundant-rule elimination and several template-based filtering rejection operators (for example, operators that reject all rules not referring to the Season or DayOfWeek attributes). Then, she grouped the remaining unvalidated rules, examined several resulting groups, and stopped the validation process. At that point, she found nothing more to reject and decided to accept all the remaining unvalidated rules. As a result, she accepted 42,496 rules (4.2 percent of all discovered rules) and spent about 40 minutes on the entire process.

The marketing expert noted that our rule evaluation process is inherently subjective because different experts have different experiences and understandings of the application. She believes that different experts can arrive at different evaluation results using the same validation process.

Table 1. A validation process for the seasonality analysis of market research data.

Validation operator        Accepted rules   Rejected rules   Unvalidated rules
Redundancy elimination                  0          186,727             836,085
Filtering                               0          285,528             550,557
Filtering                               0          424,214             126,343
Filtering                               0           48,682              77,661
Filtering                          10,052                0              67,609
Grouping (652 groups)              23,417            6,822              37,370
Grouping (4,765 groups)             7,181            1,533              28,656
Total                              40,650          953,506           1,724,281

We must point out that the accepted rules, although valid and relevant to the expert, may not be effective. That is, they might not guarantee actionable results, such as decisions, recommendations, and other user-related actions. We are currently working on incorporating the concept of effectiveness in 1:1Pro.

We are also dealing with the problem of generating many irrelevant rules that are subsequently rejected during the validation process. One way to handle the problem is for the domain expert to specify constraints on the types of rules of interest before the rule discovery stage. Ramakrishnan Srikant, Quoc Vu, and Rakesh Agrawal have presented methods for specifying such constraints [12], and thus reducing the number of discovered rules that are irrelevant to the expert.

Since it is difficult for an expert to anticipate all the relevant and irrelevant rules in advance, validating the discovered rules is still necessary. Thus, the ideal solution is to combine the constraint specification, data mining, and rule validation stages in one system. We are currently working on integrating the three stages in 1:1Pro. ✸

References

1. P. Hagen, "Smart Personalization," The Forrester Report, Forrester Research, Cambridge, Mass., July 1999.

2. Comm. ACM, Special Issue on Recommender Systems, vol. 40, no. 3, 1997.

3. P. Resnick et al., "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," Proc. 1994 Computer-Supported Cooperative Work Conf., ACM Press, New York, 1994, pp. 175-186.

4. U. Shardanand and P. Maes, "Social Information Filtering: Algorithms for Automating 'Word of Mouth,'" Proc. Conf. Human Factors in Computing Systems (CHI 95), ACM Press, New York, 1995, pp. 210-217.

5. W. Hill et al., "Recommending and Evaluating Choices in a Virtual Community of Use," Proc. Conf. Human Factors in Computing Systems (CHI 95), ACM Press, New York, 1995, pp. 194-201.

6. M. Pazzani, "A Framework for Collaborative, Content-Based and Demographic Filtering," Artificial Intelligence Review, Dec. 1999, pp. 393-408.

7. G. Adomavicius and A. Tuzhilin, "Expert-Driven Validation of Rule-Based User Models in Personalization Applications," J. Data Mining and Knowledge Discovery, Jan. 2001, pp. 33-58.

8. R. Agrawal et al., "Fast Discovery of Association Rules," Advances in Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, Calif., 1996, chap. 12.

9. L. Breiman et al., Classification and Regression Trees, Wadsworth, Belmont, Calif., 1984.

10. G. Piatetsky-Shapiro and C.J. Matheus, "The Interestingness of Deviations," Proc. AAAI-94 Workshop Knowledge Discovery in Databases, AAAI Press, Menlo Park, Calif., 1994, pp. 25-36.

11. A. Silberschatz and A. Tuzhilin, "What Makes Patterns Interesting in Knowledge Discovery Systems," IEEE Trans. Knowledge and Data Engineering, Dec. 1996, pp. 970-974.

12. R. Srikant, Q. Vu, and R. Agrawal, "Mining Association Rules with Item Constraints," Proc. Third Int'l Conf. Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, Calif., 1997, pp. 67-73.

Gediminas Adomavicius is a PhD candidate in computer science at the Courant Institute of Mathematical Sciences, New York University, where he received an MS. He received a BS in mathematics from Vilnius University, Lithuania. He is a recipient of a Fulbright Fellowship. His research interests include data mining, personalization, and scientific computing. Contact him at adomavic@cs.nyu.edu.

Alexander Tuzhilin is an associate professor of information systems at the Stern School of Business, New York University. His research interests include knowledge discovery in databases, personalization, and temporal databases. Tuzhilin received a PhD in computer science from the Courant Institute of Mathematical Sciences, NYU. Contact him at atuzhili@stern.nyu.edu.

