Comparison with Other Systems - Towards Recommender Engineering: tools and experiments for iden

To designate a component for pre-instantiation and sharing, the algorithm developer annotates it with the @Shareable annotation. Components with this annotation must be thread-safe and should generally be Serializable. LensKit will pre-instantiate and reuse such a component if and only if all of its dependencies are also shareable. As a result of this analysis, if a shareable component is configured so that one of its normally shareable dependencies no longer is, the component is automatically downgraded to a non-shared component without the developer needing to do any checking or enforcement.

LensKit also provides a @Transient annotation for dependencies to indicate that a particular dependency should not be considered when determining a component’s shareability. If a component marks one of its dependencies as transient, it is promising that the dependency will only be used to build the object, and that the built object will not retain a reference to it. For example, the item-item model builder’s dependency on the data source is marked as transient, since it uses the data source to build the model but the final model is independent of it.
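The shareability rule described above can be sketched with plain Java reflection. This is only an illustration under stated assumptions: the `Shareable` and `Transient` annotations below are hypothetical stand-ins declared locally (not LensKit's own classes), every component is assumed to have exactly one constructor, and the builder is collapsed into the model's constructor for brevity. The real analysis operates on a configured dependency graph rather than on raw classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Parameter;

// Hypothetical stand-ins for the annotations described in the text.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Shareable {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface Transient {}

// A per-request data source: deliberately NOT shareable.
class SketchDataSource {
    SketchDataSource() {}
}

@Shareable
class SketchSimilarity {
    SketchSimilarity() {}
}

// Uses the data source only while being built, so the dependency is transient.
@Shareable
class SketchItemItemModel {
    SketchItemItemModel(SketchSimilarity sim, @Transient SketchDataSource data) {}
}

// Annotated as shareable, but keeps a non-transient, non-shareable dependency.
@Shareable
class SketchLiveStatistics {
    SketchLiveStatistics(SketchDataSource data) {}
}

class ShareabilitySketch {
    // A type is effectively shareable iff it is annotated @Shareable and
    // every non-transient constructor dependency is effectively shareable.
    static boolean isEffectivelyShareable(Class<?> type) {
        if (!type.isAnnotationPresent(Shareable.class)) {
            return false;
        }
        // Simplifying assumption: each component has exactly one constructor.
        Constructor<?> ctor = type.getDeclaredConstructors()[0];
        for (Parameter p : ctor.getParameters()) {
            if (p.isAnnotationPresent(Transient.class)) {
                continue; // ignored when determining shareability
            }
            if (!isEffectivelyShareable(p.getType())) {
                return false; // downgraded to non-shared
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isEffectivelyShareable(SketchItemItemModel.class));
        System.out.println(isEffectivelyShareable(SketchLiveStatistics.class));
    }
}
```

In this sketch the model stays shareable because its only non-shareable dependency is transient, while the statistics component is downgraded even though it carries the annotation.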

The final result of these manipulations is that each web request instantiates a set of lightweight objects that combine the current connection with heavyweight recommender components to provide recommendation services to the rest of the application. We have successfully integrated this architecture with multiple web applications that are currently used in production.
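The per-request arrangement might look roughly like the following. All class names here are hypothetical and the "connection" is simulated with an in-memory map; the point is only that the heavyweight model is built once and shared, while each request gets a cheap wrapper that pairs it with request-scoped data access.

```java
import java.util.List;
import java.util.Map;

// Heavyweight, immutable, thread-safe component: built once and shared.
final class SharedNeighborModel {
    private final Map<Long, List<Long>> neighbors;

    SharedNeighborModel(Map<Long, List<Long>> neighbors) {
        this.neighbors = Map.copyOf(neighbors);
    }

    List<Long> neighborsOf(long item) {
        return neighbors.getOrDefault(item, List.of());
    }
}

// Stand-in for a per-request database connection (an assumption of this sketch).
final class RequestConnection {
    private final Map<Long, Long> lastItemByUser;

    RequestConnection(Map<Long, Long> lastItemByUser) {
        this.lastItemByUser = lastItemByUser;
    }

    // Returns null when the user has no recorded item.
    Long lastItemFor(long user) {
        return lastItemByUser.get(user);
    }
}

// Lightweight object created for each request; holds references only.
final class RequestRecommender {
    private final SharedNeighborModel model;
    private final RequestConnection connection;

    RequestRecommender(SharedNeighborModel model, RequestConnection connection) {
        this.model = model;
        this.connection = connection;
    }

    List<Long> recommend(long user) {
        Long last = connection.lastItemFor(user);
        return last == null ? List.of() : model.neighborsOf(last);
    }
}
```

Constructing a `RequestRecommender` costs two field assignments, so creating one per request is cheap regardless of how large the shared model is.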

3.10 Comparison with Other Systems

There are many other recommendation toolkits available, commercial, freeware, and open-source; section 2.4 listed some of them. Several of the open-source offerings now seem to be inactive (COFI, jCOLIBRI), some are focused on particular recommendation techniques (myCBR), others are focused more on providing recommendation services in applications (EasyREC, PredictionIO) than on supporting research and cutting-edge recommender system development, and still others target particular integrations (e.g. RecDB [SAM13], which provides recommender services within a database).

Feature                                 LensKit       Apache Mahout   MyMediaLite
Platform                                Java          Java            C#/.NET
User-user CF                            Yes           Yes             Yes
Item-item CF                            Yes           Yes             Yes
Matrix factorization CF                 FunkSVD       Yes             Many
Distributed algorithms                  No            Yes             No
Visualization of configurations         Yes           No              No
Algo-independent lifecycle separation   Yes           No              No
Rating data support                     Yes           Yes             Yes
Implicit feedback support               Partial (a)   Yes             Yes
Distinct data normalizations            Yes           No              No
Offline evaluation                      Yes           Yes             Yes
Reuses shared components in eval        Yes           No (b)          No

(a) Finishing this is a high-priority project.
(b) Common component reuse may be achievable manually.

Table 3.1: Comparison of recommender toolkits

LensKit’s most direct competitors are Apache Mahout and MyMediaLite. Apache Mahout is a machine learning library with support for many different algorithms, including several recommendation algorithms; it has extensive support for distributed computing [SBM12; Sch+13]. MyMediaLite [Gan+11] is a recommendation toolkit for the .NET platform (with good support for non-Windows systems via Mono) that has a particular focus on providing state-of-the-art rating prediction and item recommendation algorithms.

LensKit sets itself apart with its extensive support for research activities and its support infrastructure for connecting algorithms to evaluators and running applications. While we are playing catch-up in some areas, particularly advanced matrix factorization algorithms and implicit feedback support, the algorithms and evaluations that LensKit does provide are significantly more flexible.

As discussed in section 3.7, LensKit algorithms are built from many discrete pieces that can be replaced and recombined. This allows for extensive experimentation with distinct choices for similarity functions, data normalization methods, neighbor selection algorithms, etc., with very few limits on how they can be combined. Apache Mahout provides some configurability of its algorithms (the item similarity function can be replaced, for instance) but has relatively few configuration points; as near as we can tell, data normalization needs to be built either into the data model (so the algorithm sees normalized data) or into each algorithm component itself. MyMediaLite supports reconfiguring algorithms via subclassing.

LensKit’s evaluator is more flexible than either Mahout’s or MyMediaLite’s. Both Mahout and MyMediaLite support measuring an algorithm’s performance on prediction accuracy or top-𝑁 metrics, but provide either a command line or a Java programmatic interface. With Mahout, the programmer must provide recommender builders that build testable recommenders. LensKit’s ability to represent and analyze algorithms as entities allows it to train and evaluate algorithms using the same mechanisms used to load algorithm models for running applications, and basic evaluations read in a declarative fashion (evaluate X algorithms on Y data sets with Z metrics). LensKit’s evaluator will also analyze the tested configurations to automatically determine components that can be trained once and shared between multiple configurations, providing a dramatic decrease in the cost of operations such as finding the best neighborhood size without requiring any additional effort from the programmer or researcher. LensKit also provides minimal entry points for new evaluation components such as metrics, and can run arbitrarily many metrics in a single evaluation pass; Mahout provides base classes to simplify writing new metrics, but the class embodying a metric (or suite of metrics) drives the evaluation.
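The idea of training a component once and sharing it across configurations can be illustrated with a small cache keyed on the settings the trained component actually depends on. This is a hedged sketch, not LensKit's evaluator: `EvalConfig`, `SharingEvaluatorSketch`, and the string-valued "model" are all hypothetical, and the key is chosen by hand here, whereas the text describes the evaluator deriving sharing opportunities automatically from the configurations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One algorithm configuration in a neighborhood-size sweep (hypothetical).
final class EvalConfig {
    final String similarity;
    final int neighborhoodSize;

    EvalConfig(String similarity, int neighborhoodSize) {
        this.similarity = similarity;
        this.neighborhoodSize = neighborhoodSize;
    }

    // Covers only the settings the trained model depends on;
    // neighborhood size is applied at scoring time, so it is excluded.
    String modelKey() {
        return similarity;
    }
}

final class SharingEvaluatorSketch {
    private final Map<String, String> trainedModels = new HashMap<>();
    int trainCount = 0; // counts how often the expensive step actually runs

    // Stand-in for an expensive model build.
    private String train(String similarity) {
        trainCount++;
        return "model(" + similarity + ")";
    }

    // Evaluates every configuration, reusing a trained model whenever
    // two configurations share the same model key.
    List<String> run(List<EvalConfig> configs) {
        List<String> runs = new ArrayList<>();
        for (EvalConfig c : configs) {
            String model = trainedModels.computeIfAbsent(c.modelKey(), this::train);
            runs.add(model + " with k=" + c.neighborhoodSize);
        }
        return runs;
    }

    public static void main(String[] args) {
        SharingEvaluatorSketch evaluator = new SharingEvaluatorSketch();
        List<String> runs = evaluator.run(List.of(
                new EvalConfig("cosine", 10),
                new EvalConfig("cosine", 20),
                new EvalConfig("pearson", 20)));
        runs.forEach(System.out::println);
        System.out.println("models trained: " + evaluator.trainCount);
    }
}
```

Sweeping the neighborhood size over a fixed similarity function then costs one model build rather than one per configuration.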

LensKit also has advantages for building applications around the recommender. Its support for separating pre-built and runtime components means that the recommender integrator does not need to worry about which components can be precomputed and shared between requests and which cannot (unless they need to debug a configuration that is not precomputing enough data): given an algorithm configuration, LensKit can instantiate the pre-computable portion, save it to disk, and instantiate the needed runtime components. All of this is done in a configuration-independent fashion, so the recommender for an application can be changed simply by replacing its algorithm configuration file.
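A minimal sketch of the build, save, and reload cycle, using standard Java serialization, is shown below. The class names and the item-mean "model" are assumptions made for illustration only; they are not LensKit's classes or its on-disk format.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Pre-computable portion: serializable and independent of any live connection.
final class PrebuiltModel implements Serializable {
    private static final long serialVersionUID = 1L;
    private final HashMap<Long, Double> itemMeans;

    PrebuiltModel(Map<Long, Double> itemMeans) {
        this.itemMeans = new HashMap<>(itemMeans);
    }

    double meanFor(long item) {
        return itemMeans.getOrDefault(item, 0.0);
    }
}

// Runtime portion: created per use from the loaded model plus request data.
final class RuntimePredictor {
    private final PrebuiltModel model;
    private final double userOffset;

    RuntimePredictor(PrebuiltModel model, double userOffset) {
        this.model = model;
        this.userOffset = userOffset;
    }

    double predict(long item) {
        return model.meanFor(item) + userOffset;
    }
}

final class LifecycleSketch {
    static void save(PrebuiltModel model, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static PrebuiltModel load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (PrebuiltModel) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a model, writes it to a temporary file, and reads it back.
    static PrebuiltModel buildSaveAndReload(Map<Long, Double> means) {
        try {
            Path file = Files.createTempFile("model", ".bin");
            try {
                save(new PrebuiltModel(means), file);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An application would typically run the build-and-save step offline and only the load step plus `RuntimePredictor` construction at serving time.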

A key enabler of LensKit’s lifecycle separation, as well as some of its evaluation optimizations, is that it treats algorithm specifications as objects that can be manipulated and analyzed. It can perform operations on a recommender algorithm or configuration itself, not just the models and components that comprise it.

Finally, LensKit is built from a somewhat different philosophy. As we see it, MyMediaLite and Mahout’s APIs are structured around the idea that ‘here is a recommender algorithm, connect it to your data and use it’, with some options for configuration. LensKit is structured around a large collection of pieces that can be wired together to make a recommender, and a set of defaults and example configurations to put them together into common types of recommenders. In addition to affecting the design of algorithms, this also manifests in the public API: different recommendation services are provided by different components, and not all configurations will necessarily provide all services.
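One way to picture an API in which services come from separate components, and a given configuration may lack some of them, is sketched here. The interfaces and accessor names are hypothetical and chosen for illustration; they are not LensKit's actual public API.

```java
import java.util.List;
import java.util.Optional;

// Each recommendation service is its own component interface (hypothetical names).
interface RatingPredictorService {
    double predict(long user, long item);
}

interface ItemRecommenderService {
    List<Long> recommend(long user, int n);
}

// A configured recommender exposes whichever services its configuration wired up.
final class ConfiguredRecommender {
    private final RatingPredictorService predictor;   // may be null
    private final ItemRecommenderService recommender; // may be null

    ConfiguredRecommender(RatingPredictorService predictor,
                          ItemRecommenderService recommender) {
        this.predictor = predictor;
        this.recommender = recommender;
    }

    Optional<RatingPredictorService> ratingPredictor() {
        return Optional.ofNullable(predictor);
    }

    Optional<ItemRecommenderService> itemRecommender() {
        return Optional.ofNullable(recommender);
    }
}
```

A caller then checks for the service it needs, for example `rec.itemRecommender().map(r -> r.recommend(user, 10))`, rather than assuming every configuration can both predict ratings and recommend items.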

Both philosophies have advantages and disadvantages. It is currently easier to take Mahout or MyMediaLite off the shelf and get recommendations from it than it is with LensKit, but once LensKit is running it provides more built-in reconfigurability and flexibility in its algorithm components. It is possible to adapt Mahout or MyMediaLite

In document Towards Recommender Engineering: tools and experiments for identifying recommender differences (Page 85-89)