Contributions and Implications - Active caching for recommender systems

This study contributes to both the recommender system and caching domains. It provides a caching strategy designed specifically for recommender system queries; however, it works with any application that uses top-k similarity queries (also known as k-nearest-neighbor, or k-NN, queries). This strategy, the active cache mechanism, not only answers queries that exactly match those in the cache but also acts as a limited query processor, computing results for non-cached query items from information cached for other items. The main contributions of this study are:

The main contribution of the active caching approach is a mechanism that not only answers queries that exactly match those in the cache but also estimates answers for non-cached queries, thus using the cache in a limited query processor role. This approach does not assume any knowledge of the methods or similarity measures used, and as such can be applied even when non-metric and probabilistic approaches are used to produce query results. Whereas the conventional approach is to fill the cache with those items most likely to be requested in future queries, active caching can instead support a form of data interpolation, in which the cache contents are selected to provide uniform coverage of the data set, so that most if not all query results can be generated actively. For some applications, it may even suffice to answer all similarity queries actively, without ever referring to the original data. Active caching could thus serve as a scalability technique, as it provides the basis for space- and time-efficient approximation of large databases [105].
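The idea of answering a non-cached query from lists cached for other items can be illustrated with a minimal sketch. The class name `ActiveCache`, its methods, and the simple vote-counting rule are assumptions made for illustration only; they are not the ranking procedure developed in this study.

```python
from collections import Counter, defaultdict


class ActiveCache:
    """Toy active cache holding ranked top-k neighbor lists for some items.

    Hypothetical illustration: the voting rule below is an assumption,
    not the ranking function proposed in the thesis.
    """

    def __init__(self, neighbor_lists):
        # neighbor_lists: dict mapping a cached item to its ranked neighbor list.
        self.lists = {item: list(nbrs) for item, nbrs in neighbor_lists.items()}
        # Inverted lists: for each object, the cached items whose lists contain it.
        self.inverted = defaultdict(set)
        for item, nbrs in self.lists.items():
            for n in nbrs:
                self.inverted[n].add(item)

    def query(self, q, k):
        # Conventional cache behavior: an exact match is returned directly.
        if q in self.lists:
            return self.lists[q][:k], "hit"
        # Active behavior: every cached item that lists q as a neighbor votes
        # for itself and for the other members of its own list.
        votes = Counter()
        for owner in self.inverted.get(q, ()):
            votes[owner] += 1
            for n in self.lists[owner]:
                if n != q:
                    votes[n] += 1
        if not votes:
            return [], "miss"
        # Highest vote count first; ties broken by name for determinism.
        ranked = sorted(votes, key=lambda x: (-votes[x], str(x)))
        return ranked[:k], "active"


cache = ActiveCache({"a": ["b", "c", "d"], "e": ["b", "d", "f"]})
print(cache.query("a", 2))  # exact match served from the cache
print(cache.query("b", 3))  # "b" is not cached; answer estimated actively
```

Note that the sketch uses only list membership, never similarity values, which is consistent with the claim that the approach needs no knowledge of the underlying similarity measure.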

The main contribution of the shared neighbor approach is to facilitate the design of shared-neighbor ranking formulae for active caching that allow for variation of parameters. The ranking function can correct for bias relating to variations in such quantities as the size of the cache, the length of ranked lists stored in the cache, and the number of items requested by the query, all without any knowledge of the actual similarity values [49].
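As a rough illustration of ranking by shared-neighbor information, the sketch below scores two objects by the overlap of their neighbor sets, normalized by the set sizes so that lists of different lengths remain comparable. The normalization (overlap divided by the geometric mean of the set sizes) and the function names are assumptions chosen for simplicity; the formulae in [49] may differ.

```python
import math


def shared_neighbor_score(list_a, list_b):
    """Overlap of two neighbor sets, normalized by their sizes.

    Assumed normalization (cosine of the membership vectors); it only
    illustrates how a score can compensate for differing list lengths.
    """
    a, b = set(list_a), set(list_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank_by_shared_neighbors(query_list, candidate_lists, k):
    """Return the k candidates whose neighbor sets best match query_list."""
    scored = [
        (shared_neighbor_score(query_list, nbrs), item)
        for item, nbrs in candidate_lists.items()
    ]
    # Highest score first; ties broken by name for determinism.
    scored.sort(key=lambda pair: (-pair[0], str(pair[1])))
    return [item for _, item in scored[:k]]


candidates = {"x": ["a", "b", "c"], "y": ["a", "z"], "w": ["q"]}
print(rank_by_shared_neighbors(["a", "b", "c"], candidates, 2))
```

The score depends only on which objects appear in the lists, not on any similarity values.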

The main contribution of the greedy balancing cache selection policy is that it balances the sizes of the inverted cache lists by reducing the variance of their lengths, thereby balancing the frequency with which objects appear in the cached top-k neighbor lists. By achieving a better inverted list balance, it provides more uniform coverage of the query range and increases the spatial locality from which most if not all query results can be actively generated. CES-GB provides significant improvement in the hit rate and average recall for small caches. Since the size of cache memory is usually much smaller than the total dataset size, this approach can have a great practical impact. Even for small caches, CES-GB may be sufficient to answer all queries actively, without ever referring to the original dataset. This form of active caching therefore has the potential to serve as a scalability technique. With the explosive growth of data repositories and the popularity of similarity-based applications, the CES-GB approach opens doors for new forms of indices based on data sampling [45].
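The balancing idea can be sketched as a greedy loop that, at each step, caches the neighbor list whose addition leaves the inverted list lengths with the smallest variance. The function name, the exhaustive search over candidates, and the tie-breaking rule are assumptions for illustration; the actual CES-GB selection criterion may differ in its details.

```python
from statistics import pvariance


def greedy_balanced_selection(neighbor_lists, cache_size):
    """Greedily choose lists to cache so inverted list lengths stay balanced.

    Hypothetical sketch: at each step, pick the candidate that minimizes the
    population variance of how often each object appears in cached lists.
    """
    # Every object that is a list owner or appears in some list.
    universe = set(neighbor_lists)
    for nbrs in neighbor_lists.values():
        universe.update(nbrs)
    counts = {obj: 0 for obj in universe}  # current inverted list lengths

    selected = []
    remaining = set(neighbor_lists)
    while remaining and len(selected) < cache_size:
        best, best_var = None, None
        for item in sorted(remaining):  # sorted for deterministic ties
            trial = dict(counts)
            for n in neighbor_lists[item]:
                trial[n] += 1
            var = pvariance(list(trial.values()))
            if best_var is None or var < best_var:
                best, best_var = item, var
        selected.append(best)
        remaining.remove(best)
        for n in neighbor_lists[best]:
            counts[n] += 1
    return selected, counts


lists = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["b", "d"],
    "d": ["a", "e"],
    "e": ["a", "d"],
}
selected, counts = greedy_balanced_selection(lists, 2)
print(selected, counts)
```

In this toy data set, after caching the list of "a", the second pick avoids lists that would repeat "b" or "c", so no object appears in more than one cached list.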

7.2.2 Implications

This study has implications for several research fields. These implications arise from applying the proposed caching solution to other query types. Furthermore, the proposed methods can also be adopted in other areas of research.

The active caching approach presented in this work is designed primarily for recommender systems, where it showed very strong overall performance. It can also work with any application that uses top-k similarity queries (also known as k-nearest-neighbor, or k-NN, queries). As such, it can be applied readily to similar applications that rely on top-k similarity queries, such as contextual advertising and image retrieval. Another possible implication is that the solution could be modified to work with other types of queries. Adapting it to keyword queries could improve the performance of applications such as search engines and digital libraries. Applying it to boolean queries could likewise make database caching an effective means of achieving high scalability and performance.

The main contribution of the shared neighbor approach is to facilitate the design of shared-neighbor ranking formulae for active caching. Shared-neighbor similarity measures assess the statistical significance of the relationship between objects based on their shared neighborhood. This concept provides new directions in various domains and can help to introduce new approaches based on shared-neighbor information. A possible implication of the shared-neighbor approach is the development of a new type of recommender system based on shared-neighbor information. Such a recommender system could draw on rich sources such as relationships, text, images, and media, using shared-neighbor information to provide effective cross-genre recommendations.

The greedy balancing approach introduced in this work provides more uniform coverage of the dataset. This has important implications for data summarization and data sampling. The greedy balancing approach can help in computing summaries of very large multi-dimensional datasets such as data warehouses, which would otherwise require powerful and time-consuming operations. The GB approach also opens doors for new forms of indices based on data sampling, where more uniform coverage of the dataset can make these indices much more effective.

7.3 Limitations and Future Work
