Data Model - Towards Recommender Engineering: tools and experiments for identifying recommender

for sanitizing scores to be interpretable as ratings in one place (the default RatingPredictor implementation), reducing code duplication. Third, it allows alternative strategies for mapping scores to predicted ratings, such as OrdRec [KS11], to be easily swapped in and used on top of LensKit’s existing item scoring capabilities.
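To make "sanitizing scores" concrete, here is a minimal sketch of a rating predictor that wraps an arbitrary scorer and clamps its output to the rating scale. The `Scorer` interface, the `ClampingRatingPredictor` class, and the 0.5 to 5.0 range in the usage example are hypothetical stand-ins chosen for illustration. They are not LensKit's actual classes or defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified scorer: maps item IDs to raw scores for one user.
interface Scorer {
    Map<Long, Double> score(long user, Iterable<Long> items);
}

// Sketch of a rating predictor layered on top of any scorer. Its only job is
// to clamp scores into the valid rating range, so every scorer gets the same
// treatment in one place instead of duplicating the logic.
final class ClampingRatingPredictor {
    private final Scorer scorer;
    private final double min;
    private final double max;

    ClampingRatingPredictor(Scorer scorer, double min, double max) {
        if (min > max) throw new IllegalArgumentException("min > max");
        this.scorer = scorer;
        this.min = min;
        this.max = max;
    }

    Map<Long, Double> predict(long user, Iterable<Long> items) {
        Map<Long, Double> predictions = new HashMap<>();
        for (Map.Entry<Long, Double> e : scorer.score(user, items).entrySet()) {
            double clamped = Math.max(min, Math.min(max, e.getValue()));
            predictions.put(e.getKey(), clamped);
        }
        return predictions;
    }
}
```

A different mapping strategy (such as an ordinal model) would replace `predict` while reusing the same scorer, which is the flexibility the paragraph describes.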

The item recommender interface provides lists of recommendations for a particular user in the system. The application using it provides a user ID, the desired number of recommendations 𝑛, and optionally a candidate set 𝐶 and/or an exclude set 𝐸 of item IDs to constrain the recommendations. The recommender will return up to 𝑛 recommendations from 𝐶 \ 𝐸. If unspecified, 𝐶 defaults to all recommendable items and 𝐸 defaults to the items the user has rated or purchased (although individual item recommender implementations may change these defaults). These sets allow the application to use LensKit in situations such as recommending from among the items in one particular category or matching some search query.
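The candidate and exclude semantics amount to a top-𝑛 selection over 𝐶 \ 𝐸. The sketch below assumes scores are already available as a map from item ID to score. The `TopN.recommend` helper is hypothetical and illustrates only the set logic, not LensKit's implementation. Passing `null` for the candidate set stands in for "all recommendable items", here approximated as every item that has a score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TopN {
    /**
     * Return up to n item IDs from candidates \ excludes, highest score first.
     * A null candidate set means "every scored item"; a null exclude set means
     * "exclude nothing". Items without a score are skipped.
     */
    static List<Long> recommend(Map<Long, Double> scores, int n,
                                Set<Long> candidates, Set<Long> excludes) {
        Iterable<Long> universe = candidates != null ? candidates : scores.keySet();
        List<Long> pool = new ArrayList<>();
        for (Long item : universe) {
            if (excludes != null && excludes.contains(item)) continue;
            if (!scores.containsKey(item)) continue;
            pool.add(item);
        }
        // Descending score; ties broken by ascending item ID for determinism.
        pool.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Long.compare(a, b);
        });
        return new ArrayList<>(pool.subList(0, Math.min(n, pool.size())));
    }
}
```

Recommending within a category is then a matter of passing that category's item IDs as `candidates`, while the user's rated items go in `excludes`.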

LensKit also exposes an interface GlobalItemRecommender (and an associated GlobalItemScorer) for ‘global’ (non-personalized) recommendation that does not take the user into account, but instead operates with respect to zero or more reference items. Applications can use it to implement a ‘similar items’ feature or to provide recommendations based on the contents of a shopping basket.
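A basket-based feature can be sketched as a global scorer that never consults the user. As an assumption for illustration, the sketch scores each candidate by summing its similarity to the items in the reference set, using a precomputed similarity table. The `BasketScorer` class and the similarity values are hypothetical. They are not how LensKit's GlobalItemScorer implementations compute scores.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class BasketScorer {
    // Hypothetical precomputed item-item similarities: sims.get(i).get(j) = sim(i, j).
    private final Map<Long, Map<Long, Double>> sims;

    BasketScorer(Map<Long, Map<Long, Double>> sims) {
        this.sims = sims;
    }

    /** Score each candidate by its total similarity to the reference items; no user involved. */
    Map<Long, Double> score(Set<Long> basket, Set<Long> candidates) {
        Map<Long, Double> scores = new HashMap<>();
        for (long candidate : candidates) {
            if (basket.contains(candidate)) continue; // skip items already in the basket
            Map<Long, Double> row = sims.getOrDefault(candidate, Map.of());
            double total = 0.0;
            for (long ref : basket) {
                total += row.getOrDefault(ref, 0.0); // unknown pairs count as 0
            }
            scores.put(candidate, total);
        }
        return scores;
    }
}
```

With a single reference item this behaves as a "similar items" list; with an empty basket every candidate scores 0, which matches the "zero or more items" case but gives no useful ordering.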

Listing 3.3 lists the core methods exposed by several of the interfaces in the LensKit API. Section 3.7 describes many of the implementations LensKit provides of these interfaces.

3.5 Data Model

LensKit recommenders need a means of accessing and representing the data (ratings, purchases, item metadata, etc.) from which they are to compute recommendations. To support this in a general fashion, extensible to many types of data, LensKit defines the concepts of users, items, and events. This design is flexible enough to allow LensKit to work with explicit ratings, implicit preferences extracted from behavioral data, and other types of information in a unified fashion.

public interface ItemScorer {
    /**
     * Compute scores for several items for a user.
     */
    SparseVector score(long user, Collection<Long> items);
}

public interface ItemRecommender {
    /**
     * Recommend up to `count' items for a user. Only items
     * in `candidates' but not in `excludes' are considered.
     */
    List<ScoredId> recommend(long user, int count,
                             Set<Long> candidates, Set<Long> excludes);
}

public interface GlobalItemRecommender {
    /**
     * Recommend up to `count' items related to a selected
     * set of items. Only items in `candidates' but not in
     * `excludes' are considered.
     */
    List<ScoredId> recommend(Set<Long> items, int count,
                             Set<Long> candidates, Set<Long> excludes);
}

Listing 3.3: Simplified LensKit interfaces.


assumptions about the range or distribution of user and item identifiers, nor does it require users and items to have disjoint sets of identifiers. The only constraint it places on the users and items in the data it interacts with is that they can be represented with numeric IDs. An event is some type of interaction between a user and an item, optionally with a timestamp. Each type of event is represented by a different Java interface extending Event. Since ratings are such a common type of data for recommender input, we provide a Rating event type that represents a user articulating a preference for an item.⁵ A rating can also have a null preference, representing the user removing their rating for an item. Multiple ratings can appear for the same user-item pair, as in the case of a system that keeps a user’s rating history; in this case, the system must associate timestamps with rating events so that the most recent rating can be identified.
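The event model, including rating removal and rating histories, can be sketched with plain Java types. `Event` and `Rating` below are simplified stand-ins assumed for illustration; LensKit's real interfaces carry more methods. The hypothetical `latestRatings` helper shows one way to resolve a history: keep each user-item pair's most recent event, and treat a null preference as removal.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for an event: a user-item interaction with a timestamp.
interface Event {
    long getUserId();
    long getItemId();
    long getTimestamp();
}

// A rating event; a null preference means the user removed their rating.
final class Rating implements Event {
    private final long user, item, timestamp;
    private final Double preference;

    Rating(long user, long item, Double preference, long timestamp) {
        this.user = user;
        this.item = item;
        this.preference = preference;
        this.timestamp = timestamp;
    }

    public long getUserId() { return user; }
    public long getItemId() { return item; }
    public long getTimestamp() { return timestamp; }
    Double getPreference() { return preference; }
}

final class RatingHistory {
    /**
     * Current ratings as user -> (item -> value). For each pair the event with the
     * strictly latest timestamp wins (ties keep the first seen); if that event is
     * a removal, the pair is omitted.
     */
    static Map<Long, Map<Long, Double>> latestRatings(List<Rating> history) {
        Map<Long, Map<Long, Rating>> latest = new HashMap<>();
        for (Rating r : history) {
            Map<Long, Rating> byItem = latest.computeIfAbsent(r.getUserId(), u -> new HashMap<>());
            Rating seen = byItem.get(r.getItemId());
            if (seen == null || r.getTimestamp() > seen.getTimestamp()) {
                byItem.put(r.getItemId(), r);
            }
        }
        Map<Long, Map<Long, Double>> current = new HashMap<>();
        latest.forEach((user, byItem) -> byItem.forEach((item, r) -> {
            if (r.getPreference() != null) {
                current.computeIfAbsent(user, u -> new HashMap<>()).put(item, r.getPreference());
            }
        }));
        return current;
    }
}
```

Without timestamps the loop above could not tell a re-rating from an older value, which is why the text requires them when histories are kept.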

Recommender components access the user, item, and event data through data access objects (DAOs). Applications embedding LensKit can implement the DAO interfaces in terms of their underlying data store using whatever technology they wish: raw files, JDBC, Hibernate, MongoDB, or any other data access technology. LensKit also provides basic implementations of these interfaces that read from delimited text files or generic databases via JDBC, and that implement more sophisticated functionality by caching the events in in-memory data structures.
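As an illustration of backing a DAO with a delimited text file and caching its events in memory, here is a minimal sketch. It parses lines of the form `user<delim>item<delim>rating[<delim>timestamp]` and indexes them by user and by item. The column layout, the `DelimitedRatingStore` class, and its method names are assumptions for illustration. It is not LensKit's bundled text-file DAO.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical in-memory DAO backed by a delimited text source.
final class DelimitedRatingStore {
    // One parsed line: user, item, rating value, timestamp (0 if absent).
    static final class Row {
        final long user, item, timestamp;
        final double value;

        Row(long user, long item, double value, long timestamp) {
            this.user = user;
            this.item = item;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<Long, List<Row>> byUser = new HashMap<>();
    private final Map<Long, List<Row>> byItem = new HashMap<>();

    /** Read every line once, caching rows indexed by user and by item. */
    DelimitedRatingStore(Reader source, String delimiter) {
        Pattern split = Pattern.compile(Pattern.quote(delimiter)); // delimiter taken literally
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isBlank()) continue;
                String[] f = split.split(line); // assumes at least user, item, rating
                Row row = new Row(Long.parseLong(f[0].trim()), Long.parseLong(f[1].trim()),
                        Double.parseDouble(f[2].trim()),
                        f.length > 3 ? Long.parseLong(f[3].trim()) : 0L);
                byUser.computeIfAbsent(row.user, k -> new ArrayList<>()).add(row);
                byItem.computeIfAbsent(row.item, k -> new ArrayList<>()).add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    List<Row> getEventsForUser(long user) {
        return Collections.unmodifiableList(byUser.getOrDefault(user, List.of()));
    }

    List<Row> getEventsForItem(long item) {
        return Collections.unmodifiableList(byItem.getOrDefault(item, List.of()));
    }
}
```

Reading eagerly trades memory for fast repeated lookups; a JDBC-backed variant could instead issue a query per request and cache nothing.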

The methods these interfaces define come in two flavors. Basic data access methods, prefixed with get (such as getEventsForItem(long)), retrieve data and return it in a standard Java data structure such as a list (or a LensKit-specific extension of such a structure). Streaming methods, prefixed with stream, return a cursor; cursors allow client code to process objects (usually events) one at a time without reading them all into memory, and to release any underlying database or file resource once processing is completed or abandoned.

⁵ LensKit does not yet provide implementations of other event types, but doing so is one of our high-priority tasks.

The standard LensKit DAO interfaces are:

EventDAO The base DAO interface, providing access to a stream of events. Its only methods stream all events in the database, optionally sorting them or filtering them by type.

ItemEventDAO An interface providing access to events organized by item. With this interface, a component can retrieve the events associated with a particular item, optionally filtering them by type. It can also stream all events in the database grouped by item.

UserEventDAO Like ItemEventDAO, but organized by user.

ItemDAO An interface providing access to items. The base interface provides access to the set of all item IDs in the system.

UserDAO An interface providing access to users. Like ItemDAO, it provides access to the set of all user IDs in the system.
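The two method flavors can be sketched with plain Java types. Here a cursor is modeled, as an assumption, as an `Iterator` that is also `AutoCloseable`, so try-with-resources releases the underlying resource even when processing is abandoned early. The interface names echo the list above, but `Cursor`, `SimpleEvent`, `ListEventDAO`, and all signatures are simplified hypothetical stand-ins, not LensKit's exact API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// A cursor: yields objects one at a time, then releases resources in close().
interface Cursor<T> extends Iterator<T>, AutoCloseable {
    @Override
    void close(); // narrowed so callers need not catch a checked exception
}

// Simplified stand-in for an event with user and item IDs.
final class SimpleEvent {
    final long user, item;
    SimpleEvent(long user, long item) { this.user = user; this.item = item; }
}

interface EventDAO {
    Cursor<SimpleEvent> streamEvents(); // streaming flavor
}

interface UserEventDAO {
    List<SimpleEvent> getEventsForUser(long user); // basic "get" flavor
}

// Toy DAO over a list. A file- or JDBC-backed version would open its resource
// in streamEvents() and release it in close().
final class ListEventDAO implements EventDAO, UserEventDAO {
    private final List<SimpleEvent> events;
    int openCursors = 0; // tracked only to show that cursors get closed

    ListEventDAO(List<SimpleEvent> events) { this.events = List.copyOf(events); }

    public Cursor<SimpleEvent> streamEvents() {
        openCursors++;
        Iterator<SimpleEvent> it = events.iterator();
        return new Cursor<SimpleEvent>() {
            private boolean closed = false;
            public boolean hasNext() { return !closed && it.hasNext(); }
            public SimpleEvent next() { return it.next(); }
            public void close() {
                if (!closed) { closed = true; openCursors--; }
            }
        };
    }

    // The "get" method is built on the stream: scan once, collect matches.
    public List<SimpleEvent> getEventsForUser(long user) {
        List<SimpleEvent> result = new ArrayList<>();
        try (Cursor<SimpleEvent> cursor = streamEvents()) {
            while (cursor.hasNext()) {
                SimpleEvent e = cursor.next();
                if (e.user == user) result.add(e);
            }
        }
        return result;
    }
}
```

A real UserEventDAO would typically answer `getEventsForUser` from an index rather than a full scan; the scan here keeps the sketch short.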

An application that augments LensKit with components needing additional information, such as user or item metadata for a content-based recommender, will define additional DAO interfaces (possibly extending the LensKit-provided ones) to provide access to the relevant data. We have done this ourselves when embedding LensKit in applications and experiments; for example, for our recommender systems MOOC, we extended ItemDAO with methods to get the tags for a movie so that students could build a tag-based recommender in LensKit.
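Extending a DAO for a content-based component might look like the following sketch. `ItemDAO` here is a simplified stand-in. `ItemTagDAO` and its `getItemTags` method are a hypothetical extension in the spirit of the MOOC example, not the interface the course actually used. The tag-overlap scorer is likewise only an illustration of a component that depends on the extended interface.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for LensKit's ItemDAO: the set of all item IDs.
interface ItemDAO {
    Set<Long> getItemIds();
}

// Hypothetical extension adding the metadata a tag-based recommender needs.
interface ItemTagDAO extends ItemDAO {
    List<String> getItemTags(long item);
}

// In-memory implementation an application might supply.
final class MapItemTagDAO implements ItemTagDAO {
    private final Map<Long, List<String>> tags;

    MapItemTagDAO(Map<Long, List<String>> tags) { this.tags = Map.copyOf(tags); }

    public Set<Long> getItemIds() { return tags.keySet(); }

    public List<String> getItemTags(long item) { return tags.getOrDefault(item, List.of()); }
}

// A component that depends on the extended interface: score items by the
// number of distinct tags they share with a seed item.
final class TagOverlapScorer {
    private final ItemTagDAO dao;

    TagOverlapScorer(ItemTagDAO dao) { this.dao = dao; }

    Map<Long, Integer> scoreAgainst(long seedItem) {
        Set<String> seedTags = new HashSet<>(dao.getItemTags(seedItem));
        Map<Long, Integer> scores = new HashMap<>();
        for (long item : dao.getItemIds()) {
            if (item == seedItem) continue;
            int shared = 0;
            for (String tag : new HashSet<>(dao.getItemTags(item))) {
                if (seedTags.contains(tag)) shared++;
            }
            scores.put(item, shared);
        }
        return scores;
    }
}
```

Because `ItemTagDAO` extends `ItemDAO`, components that need only item IDs keep working unchanged, while the tag-aware component declares the richer dependency.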

Early versions of LensKit had a single DataAccessObject interface that was handled specially by the configuration infrastructure; it was possible to extend this interface to pro-

In document Towards Recommender Engineering: tools and experiments for identifying recommender differences (Page 47-51)