4.3 RecStore Extensibility
4.4.4 Update + Query Workload
This experiment uses our real recommender system workload trace (described in Sec- tion 4.4.1) to test comprehensive update and query processing performance. Fig- ures 4.7(a) and 4.8(a) give the results of both query and updates for the cosine and
44 0 10 20 30 40 50 60 70 80 90 query updates Time (sec) matall
viewmat ionlypm-m pm-miviewreg
(a) Cosine 0 1 2 3 4 5 6 7 query updates Time (sec) matall viewmat ionlypm-m (b) Cosine Zoomed
Fig. 4.7: Real Workload
0 10 20 30 40 50 60 70 query updates Time (sec) matall
viewmat ionlypm-m pm-miviewreg
(a) Probabilistic 0 1 2 3 4 5 6 7 8 query updates Time (sec) matall viewmat ionlypm-m (b) Probabilistic Zoomed
Fig. 4.8: Real Workload
probabilistic algorithms, respectively. Both viewreg and pm-mi exhibit poor query
processing performance for the workload, with viewreg performing almost an order magnitude worse than other approaches. While the viewreg performance is expected, the pm-mi performance is more surprising. Both viewreg and pm-mi exhibit the best update performance out of all approaches, as confirmed by our previous experiments (Section 4.4.2). However, the query processing performance of viewreg makes it an unattractive alternative, while the update/query processing tradeoff for pm-mi is a borderline choice due to its high query processing penalty. Figures 4.7(b) and 4.8(b) remove the viewreg and pm-mi numbers to zoom in on the other approaches for the co-
exhibit the same query processing performance that is superior to ionly and pm-m . As for updates, we note again that matall (RecStore ) provides more efficient update performance over viewmat (materialized views). Meanwhile, pm-m and ionly show superior update performance to matall and viewmat, with ionly providing the best performance.
In this experiment, we can observe the update/query processing trade-off discussed in Section 4.2.2 for high values of α and β (matall ) compared to lower values of α and β (ionly and pm-mi ). Thus, for slightly more update-heavy recommender systems, the ionly or pm-mi is preferable due to efficient updates with little query processing penalty. Meanwhile, for more query-heavy systems, the matall approach is preferable with tolerable update penalty.
Chapter 5
Recommendation Query
Processing and Optimization
A main challenge is how to efficiently execute queries over an initialized recommender. One solution is to implement a Stored Procedure that performs the recommendation query functionality over the initialized recommender. However, this approach is very limiting since it does not provide much flexibility for the database engine to optimize incoming queries. This section discusses RecQuEx a RecDB module built inside the database query execution engine for internal processing of recommender queries. RecQuEx encapsulates the recommendation functionality into a new family of query operators, termed Recommend. Being a query operator allows the recommendation functionality to be a part of a larger query plan that includes other query operators, e.g., selection, projection, and join. It also means that the recommendation functionality will be treated as a first class citizen operation, allowing a myriad of query optimization techniques that can be integrated to speed up recommendation queries.
To further reduce the recommendation application latency, RecQuEx pre-computes the predicted rating scores and caches the corresponding {user,item,rating} entries inside the database system. Storing and Maintaining the predicted rating for all {user,item,rating} entries may preclude system scalability. Hence, RecQuEx adap- tively decides, based on the query/update workload, which entries to maintain with the main goal to reduce the overall recommender storage and maintenance overheard
(a) Item-Item Similarity Matrix
Item ItemNeighbors
‘Spartacus’ {h‘Spartacus’,0.5i;...}
‘Inception’ {h‘Spartacus’,0.5i;h‘The Matrix’,1i;...} ‘The Matrix’ {h‘Inception’,1i;h‘Spartacus’,0.2i;...}
(b) User-User Similarity Matrix
User UserNeighbors
‘Alice’ {hBob,0.5i;hCarol,0.5i;...} ‘Bob’ {hAlice,0.5i;hEve,1i;...} ‘Carol’ {hAlice,1i;hEve,0.2i;...} ‘Eve’ {hBob,0.2i;hCarol,0.2i;...}
Fig. 5.1: Item-Item (User-User) Collaborative Filtering Model
without compromising the recommendation generation performance.
Experiments, based on actual system implementation inside PostgreSQL, using real data extracted from MovieLens [54] (Movie Recommendation Application) and Foursquare [55] (Point-of-Interest Recommendation Application), show that RecQuEx exhibits high performance for large-scale recommendation scenarios.
In summary, this chapter introduces the following: (1) RecQuEx – a unified ap- proach for processing recommendation requests inside the database engine. (2) A family of novel query operators, called Recommend, that realize the recommendation algo- rithms inside the database query processor (Section 5.1). (3) Recommendation-aware operators that further optimize the recommendation operation with other operators (se-
lection, join, and ranking) (Sections 5.3.1 and 5.3.2). (4) A materialization scheme that
caches the pre-computed predicted rating scores to further reduce the recommendation latency (Section 5.3.3).
5.1
Recommendation Operators
RecQuEx employs three main versions of the Recommend operator; one for each recommendation algorithm. Each operator take as input a ratings table and return a set of tuples S such that each tuple s ∈ S; s =huid,iid,ratingvali represents a predicted rating score ratingval for each item iid unseen by user uid. This section first describes
48 Algorithm 1 ItemCF-Recommend
1: /* load User Vector Table block by block in Memory */
2: for each user u ∈ UserVector do
3: UserItems ← List of User u rated items in ItemNeighborhood 4: /* load Item Neighborhood Table block by block in Memory */
5: for each item i ∈ ItemNeighborhood do
6: ItemNeighbors ← List of item similar to item i in ItemNeighborhood 7: if item i ∈ UserItems then
8: ru,i ← Rating that u gave to i
9: else
10: CandItems ← ItemNeighbors ∩ UserItems
11: if CandItems not equal φ then
12: ru,i ← Predict(u,i, UserItems, ItemNeighbors)
13: else
14: ru,i ← 0
15: EMIT hu, i, ru,ii
the recommendation operators and then explains how they are integrated in the SQL query pipeline.