
4.2 Partial Order Based Active Caching Solution

4.2.3 Partial-Order Based Approach

The proposed method for active caching of recommender system results is described in detail below. The method does not use the actual distance values; instead, it relies only on the rank information in query result lists. It uses the properties of partially ordered lists to compute answers for neighboring queries. To assess the impact of using rank information instead of distance information for active caching, this study also describes and implements a distance-based variant of the method.

Let S be a dataset drawn from some domain D, and let l be a ranked list of objects similar to the query object, returned by a recommender system for S. Let rank(l, v) denote the rank of the object v in the list l. A preference (or dominance) relation over l is a binary relation ≻ on the objects of l, where v1 ≻ v2 is read as: v1 is preferable to v2, or v1 dominates v2. When extended to the full domain D, the preference relation constitutes a partial order on D. If for v1, v2 ∈ D neither v1 ≻ v2 nor v2 ≻ v1 holds, then v1 and v2 are incomparable, written v1 ∼ v2.

Given a query object v for which no result is cached, the active cache method first generates two partial orderings from each cached query result list l containing v, as shown in Figure 4.2:

• the suffix list suff(l), defined as the sublist of l consisting of items with ranks strictly higher than rank(l, v); and

• the prefix list pref(l), defined as the sublist of l consisting of items with ranks strictly less than rank(l, v), taken in reverse order.

With respect to these two partial orderings, define the rank rank_v(l, u) of an object u ∈ l with respect to v to be the rank that u holds in either pref(l) or suff(l); note that u cannot be contained in both. More precisely, this rank is defined as the absolute difference

rank_v(l, u) ≜ |rank(l, u) − rank(l, v)|.

[Figure 4.2: a top-K forward list (positions 1 to 10) returned by the recommender system for object V, with the query object marked. Below it, the relative ranked list for object V5, generated from forward list O1, is shown as 4 3 2 1 | 6 7 8 9 10, split into a prefix list and a suffix list.]

Figure 4.2 Structure showing the example extraction of prefix and suffix lists for object
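The prefix list, suffix list and relative rank defined above can be sketched in a few lines of Python. This is only an illustration: the function names `split_around` and `relative_rank` and the object labels `o1` to `o10` are hypothetical, and a ranked list is assumed to be a Python list in which each object appears once.

```python
from typing import Hashable, List, Sequence, Tuple


def split_around(l: Sequence[Hashable], v: Hashable) -> Tuple[List[Hashable], List[Hashable]]:
    """Return (pref(l), suff(l)) for query object v in the ranked list l."""
    items = list(l)
    i = items.index(v)                 # position of v; raises ValueError if v is not in l
    pref = list(reversed(items[:i]))   # items ranked before v, taken in reverse order
    suff = items[i + 1:]               # items ranked after v, in their original order
    return pref, suff


def relative_rank(l: Sequence[Hashable], v: Hashable, u: Hashable) -> int:
    """rank_v(l, u) = |rank(l, u) - rank(l, v)|."""
    items = list(l)
    return abs(items.index(u) - items.index(v))


# Hypothetical forward list of ten objects, with the fifth one as the query object.
forward_list = [f"o{i}" for i in range(1, 11)]
pref, suff = split_around(forward_list, "o5")
print(pref)  # ['o4', 'o3', 'o2', 'o1']
print(suff)  # ['o6', 'o7', 'o8', 'o9', 'o10']
```

In this sketch the relative rank of an object equals its 1-based position in whichever of the two sublists contains it, which matches the definition given in the text.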

The objective of active caching is to synthesize an answer for a given query from other query result lists available in the cache. Assume l is a cached list with rank_v(l, v1) > rank_v(l, v2) for some pair of objects v1, v2 ∈ D. If, for a top-k query-by-example Q(v, k), an active cache method generates a ranked result list l′ containing both v1 and v2, then ideally the ranks of these objects in the synthesized result should satisfy rank(l′, v1) > rank(l′, v2). In this limited sense, the active cache function would be ‘monotonic’ with respect to l, in that the ordering of the objects v1 and v2 would be preserved. Note, however, that it is in general impossible for any aggregation function to be monotonic with respect to all cached lists, since different lists may order the same pair of objects differently. Monotonicity applies only to the lists selected from the cache to contribute to the answer, and is used in the aggregation algorithm only to determine the ranks of the objects v1 and v2. For recommendation systems, monotonicity makes sense because most objects in a list will appear in each other's lists.

The active caching method proposed in this dissertation resolves conflicts in the partial order information by aggregating the rank information across all suffix lists and prefix lists available from the cache. The aggregation can be performed with respect to standard operations such as min, max, avg and skyline.
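A small worked example may help show why no aggregation can be monotonic with respect to every cached list. The two lists below are made up for illustration: relative to the query object `v`, the first places `x` ahead of `y` and the second places `y` ahead of `x`. The helper name `relative_rank` is hypothetical, and the skyline operation is not covered.

```python
def relative_rank(l, v, u):
    """|rank(l, u) - rank(l, v)| for a ranked list l given as a Python list."""
    return abs(l.index(u) - l.index(v))


# Two hypothetical cached lists that both contain the query object "v".
l1 = ["a", "v", "x", "y"]        # relative to v: x is at 1, y is at 2
l2 = ["y", "v", "b", "c", "x"]   # relative to v: y is at 1, x is at 3

# Collect the relative ranks of x and y across both lists.
ranks = {u: [relative_rank(l, "v", u) for l in (l1, l2)] for u in ("x", "y")}

# Aggregate the collected ranks with three of the standard operations.
aggregators = {"min": min, "max": max, "avg": lambda r: sum(r) / len(r)}
scores = {name: {u: agg(r) for u, r in ranks.items()} for name, agg in aggregators.items()}

for name, per_object in scores.items():
    print(name, per_object)
```

Here max and avg give `y` the smaller score, so the synthesized order agrees with `l2` but not with `l1`, while min leaves the pair tied. Whatever order is finally chosen, it contradicts at least one of the two lists.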

Algorithm Query

Input: query object v, result size k;

Output: ranked top-k query-by-example result list Result.

1. Initialization:

(a) Assign list L ← Q⁻¹_C(v). L refers to the cached forward lists containing object v.

(b) Initialize the result object candidate set candset ← ∅, and the final result object set Result ← ∅.

2. For all lists l ∈ L do:

(a) For all objects u ∈ l do:

i. If u ∉ candset then insert u into candset, and initialize the rank value multiset rankset(u) ← ∅.

ii. Insert an instance of the rank value rank_v(l, u) into the multiset rankset(u).

3. For all objects u ∈ candset do:

(a) Generate a score s for u by aggregating the rank values stored in rankset(u) using the chosen aggregation function (for example, max, min, avg, etc.).

(b) Insert the object-score pair (u, s) into the result list Result.

4. Sort the entries in the list Result according to their score values, and return the objects of the top k object-score pairs. Ties can be broken arbitrarily, with the exception that v is given priority over any other object w ≠ v in D.
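The steps of Algorithm Query can be sketched in Python as follows. This is a minimal illustration rather than the dissertation's implementation. It makes several assumptions: the cache is a plain list of ranked lists, the inverse lookup of step 1 is done by a linear scan instead of stored inverted lists, the loop over objects u ∈ l is nested inside a loop over the cached lists l ∈ L, and smaller scores are treated as better because the scores are relative ranks. The function name `query` and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Hashable, Iterable, List, Sequence


def query(v: Hashable, k: int, cache: Iterable[Sequence[Hashable]],
          aggregate: Callable[[List[int]], float] = min) -> List[Hashable]:
    """Synthesize a top-k result for v from the cached ranked lists that contain v."""
    # Step 1: L <- cached lists containing v (assumption: linear scan of the cache).
    L = [list(l) for l in cache if v in l]
    rankset = defaultdict(list)  # the keys of rankset play the role of candset

    # Step 2: collect the relative rank of every object in every selected list.
    for l in L:
        pos_v = l.index(v)
        for pos_u, u in enumerate(l):
            rankset[u].append(abs(pos_u - pos_v))

    # Step 3: aggregate the collected ranks into one score per candidate.
    result = [(u, aggregate(r)) for u, r in rankset.items()]

    # Step 4: sort ascending by score (assumption); v wins ties, and the remaining
    # ties keep insertion order because Python's sort is stable.
    result.sort(key=lambda pair: (pair[1], pair[0] != v))
    return [u for u, _ in result[:k]]


# Hypothetical cache: the third list does not contain "v" and is ignored.
cache = [["a", "v", "x", "y"], ["y", "v", "b", "c", "x"], ["p", "q", "r"]]
print(query("v", 3, cache, aggregate=max))   # ['v', 'a', 'b']
print(query("v", 3, cache, aggregate=min))   # ['v', 'a', 'x']
print(query("v", 4, cache, aggregate=mean))  # ['v', 'a', 'b', 'y']
```

Since rank_v(l, v) is 0 in every list, the query object itself always receives the best score in this sketch. The skyline aggregation mentioned in the text is not shown.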

If the object domain D is embeddable in a metric space M with distance metric d, and these distance values are readily computable, a distance-based variant of the proposed query algorithm is possible: each instance of the rank rank_v(l, u) can simply be replaced by the distance value d(v, u). The use of distance values in place of rank values can reasonably be expected to lead to better performance in practice; however, as has been previously noted, a distance-based formulation may not always be possible.
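The distance-based variant can be sketched in the same style. The sketch below assumes that every object has a coordinate vector and uses Euclidean distance (`math.dist`, available from Python 3.8) as the metric d; the function name `query_by_distance`, the `vectors` mapping and the sample coordinates are hypothetical. Because d(v, u) does not depend on the list l, every value inserted for a given u is identical, so the choice of min, max or avg no longer affects the score and the candidates can be scored directly.

```python
import math
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def query_by_distance(v: Hashable, k: int, cache: Iterable[Sequence[Hashable]],
                      vectors: Dict[Hashable, Vector],
                      dist: Callable[[Vector, Vector], float] = math.dist) -> List[Hashable]:
    """Top-k candidates for v drawn from cached lists containing v, scored by d(v, u)."""
    # Candidate set: every object that shares a cached list with v (insertion-ordered).
    candidates = dict.fromkeys(u for l in cache if v in l for u in l)
    # Sort by distance to v; v itself wins any tie.
    ordered = sorted(candidates, key=lambda u: (dist(vectors[v], vectors[u]), u != v))
    return ordered[:k]


# Hypothetical cache and 2-D coordinates.
cache = [["a", "v", "x", "y"], ["y", "v", "b", "c", "x"]]
vectors = {"v": (0.0, 0.0), "a": (1.0, 0.0), "x": (0.0, 3.0),
           "y": (2.0, 2.0), "b": (0.5, 0.0), "c": (5.0, 5.0)}
print(query_by_distance("v", 3, cache, vectors))  # ['v', 'b', 'a']
```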

4.2.4 Cache Replacement Policy

In the traditional setting, a cache stores and serves queries that are requested repeatedly. On a cache hit, when the result for a requested query is available in the cache, the cost of answering the query is very low. Conversely, if the cache does not contain the result of a requested query, the result is fetched from disk at a much higher cost. The answer for this new query can then be kept in the cache to answer future requests, but since the cache size is limited, the answer for another query must be removed from the cache. A cache replacement policy is a decision process that helps retain the right objects in the cache, with the goal of answering as many requests as possible from it. Zipf’s law and the principle of temporal locality guide these decisions: least recently used, least frequently used and other replacement policies apply these principles to select items for eviction. Over time, this process automatically loads popular content into the cache. The choice of replacement method depends on the type of dataset and the traffic characteristics.
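As an illustration of the traditional setting described above, the following is a minimal sketch of a passive cache with a least-recently-used replacement policy. The class name `LRUCache`, the `fetch` callback standing in for the expensive disk lookup, and the request sequence are all hypothetical; real systems would also account for entry sizes.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUCache:
    """Passive cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int, fetch: Callable[[Hashable], object]):
        self.capacity = capacity
        self._fetch = fetch            # stands in for the costly lookup on a miss
        self._store = OrderedDict()    # oldest entry first, most recent last
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> object:
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)     # mark as most recently used
            return self._store[key]
        self.misses += 1
        value = self._fetch(key)             # cache miss: pay the full cost
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return value


cache = LRUCache(capacity=2, fetch=lambda q: f"result:{q}")
for q in ["a", "b", "a", "c", "b"]:
    cache.get(q)
print(cache.hits, cache.misses)  # 1 4
```

Only the repeated request for `a` is a hit in this run, which shows how little a replacement policy helps when requests rarely repeat.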

Traditional caching may not be effective in applications such as recommender systems, where requests are unlikely to be repeated. For example, it is highly unlikely that a user asks for recommendations for the same object again and again, so almost none of the requests would hit an item already in the cache. In such cases, cache replacement policies could be expensive and of little help in reducing the server load. A static cache, that is, a cache whose content is never replaced, that can nevertheless answer most queries could be more appropriate for these applications. The active caching approach in this dissertation follows this design: data is loaded into the cache only once, and this data helps answer both cached and non-cached queries posed by users, so that high cache hit rates can be achieved without any replacement of the content.

4.3 Evaluation

As discussed earlier, existing active caching methods developed for Boolean queries cannot in general be applied to recommender system queries, and thus a direct comparison between these methods and the proposed technique is not possible. For this reason, to evaluate the active caching approach for recommender system queries, this study instead compares its performance against that of traditional passive caching strategies in terms of four measures: the hit rate, the byte hit rate, the recall and the execution cost. Both hit rate and byte hit rate are used because the proposed solution requires more space than the traditional caching approach, owing to the storage of inverted lists alongside standard result lists.
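The text above names the measures without defining them. The sketch below uses their conventional definitions, which are an assumption here and may differ in detail from the ones used in the evaluation: hit rate as the fraction of requests answered from the cache, byte hit rate as the fraction of requested bytes served from the cache, and recall as the fraction of the true top-k result that appears in the synthesized result. Execution cost is omitted because it depends on the system being measured. All function names are hypothetical.

```python
from typing import Hashable, Sequence


def hit_rate(hits: int, requests: int) -> float:
    """Fraction of requests answered from the cache (conventional definition)."""
    return hits / requests if requests else 0.0


def byte_hit_rate(bytes_from_cache: int, bytes_requested: int) -> float:
    """Fraction of requested bytes served from the cache (conventional definition)."""
    return bytes_from_cache / bytes_requested if bytes_requested else 0.0


def recall(synthesized: Sequence[Hashable], true_result: Sequence[Hashable]) -> float:
    """Fraction of the true top-k objects that appear in the synthesized list."""
    truth = set(true_result)
    return len(truth & set(synthesized)) / len(truth) if truth else 0.0


print(hit_rate(3, 4))                                   # 0.75
print(byte_hit_rate(600, 1000))                         # 0.6
print(recall(["a", "b", "x"], ["a", "b", "c", "d"]))    # 0.5
```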

The partial-order based active caching approach can in general be used with any nearest neighbor application. To test the performance of the proposed approach, different types of datasets have been selected. First, this study tests the approach with the two major categories of recommender systems, i.e., content-based and collaborative filtering. Second, it also considers other nearest neighbor applications, e.g., image retrieval. Third, the similarity functions used in nearest neighbor applications are either metric distances, e.g., Euclidean distance, or non-metric measures such as cosine similarity. The datasets selected for evaluation in this study cover both content-based and collaborative filtering recommendation techniques. Furthermore, the similarity measure is varied across these datasets, so that both metric and non-metric similarity measures are used. Finally, to test nearest neighbor applications from various domains, datasets for digital libraries, images, network, intrusions, geographical data and rating data are used. Each of these datasets is detailed below.

In document Active caching for recommender systems (Page 103-108)