In this section, we consider an approach to collaborative learning based on MIRA. For each user, we can augment his or her feedback (expressed as linear constraints fed into the MIRA algorithm) with feedback from others. From this we will learn a more general weight assignment for the user. The benefit of this approach is that the basic setup of the learning problem remains essentially the same, so the strategy is likely to remain effective at learning scores.
To start, we can consider how to combine the constraints expressed by all users and find an optimal assignment of weight values. This idea runs into difficulties when different users’ feedback is inconsistent. We can resolve this with the following intuition. For each given user u, feedback from other users should have different degrees of influence, based on how consistent their standards are with those of u. Feedback from
“similar” users sharing consistent standards should be weighted more than feedback from highly incompatible users.
This requires that we develop a measure of how compatible one user’s feedback is with that of another user. For a given user u1, we take each unit of feedback (linear constraint)
from some other user u2 and assign it a “penalty”. This penalty indicates the significance
of violation of the constraint, and is based on a similarity measure between u1 and u2’s
overall query and feedback patterns.
We use such penalty values (Section 4.2.1) to adapt the MIRA algorithm so that it directly encodes the penalty of each constraint in its loss function (Section 4.2.2). Since MIRA is an online approximation approach with a provable upper bound on cumulative loss, this adapted version reduces the cumulative weighted loss overall in an online setting. In the remainder of this section, we explain the details and describe how the approaches are combined in the Q System.
4.2.1 Weighting Constraints by User Similarity
As described above, given constraints from the full user community, it will often be impossible to find a set of compatible feature weights. In contrast to the “hard” constraints considered so far, weighted constraints [64] allow us to violate some requirements at the cost of a penalty. In our setting, the more “similar” two users are, the more impact each user’s feedback should have on the other’s. We can map the problem of computing edge costs for a given user to a weighted constraint satisfaction problem as follows.
For a given user u1, consider each linear constraint C provided by any user u2. Associate
this constraint with a “penalty” or weight, indicating the significance of violating it, obtained from a similarity measure between u1 and u2:
\[ \mathrm{Pen}_{u_1}(C) = \mathrm{Sim}(u_1, u_2), \tag{4.1} \]
where Sim(u1, u2) is a similarity measure between two users. In the Q System, we use the cosine
similarity between the two users’ weight vectors, except that each user’s own constraints are given an infinite penalty (making them hard constraints) rather than the self-similarity value of 1.
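The penalty of Equation 4.1 can be sketched in a few lines of Python. This is a minimal illustration, not the Q System implementation: the names `cosine_similarity`, `penalty`, and the `weights` dictionary are hypothetical, and returning 0 for a zero-length weight vector is an assumption the text does not address.

```python
import math

def cosine_similarity(w1, w2):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Assumption: a user with an all-zero weight vector is treated
        # as having no similarity to anyone.
        return 0.0
    return dot / (norm1 * norm2)

def penalty(user, author, weights):
    """Pen_user(C) for a constraint C provided by `author` (Equation 4.1).

    `weights` maps a (hypothetical) user id to that user's weight vector.
    """
    if user == author:
        # A user's own constraints are hard constraints.
        return math.inf
    return cosine_similarity(weights[user], weights[author])

# Illustrative weight vectors, invented for this example.
weights = {"u1": [1.0, 0.0], "u2": [1.0, 1.0], "u3": [0.0, 2.0]}
pen_from_u2 = penalty("u1", "u2", weights)  # partially aligned users
pen_from_u3 = penalty("u1", "u3", weights)  # orthogonal weight vectors
```

Here feedback from u2 carries more weight for u1 than feedback from u3, whose weight vector is orthogonal to that of u1.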
4.2.2 Weighting the MIRA Loss Function
One naive method to find optimal weight assignments for all users, given a network of weighted constraints, is to aggressively minimize the total weight of the constraints that are violated. However, this approach has several shortcomings. First, the weighted constraint satisfaction problem is NP-hard, so the computation becomes intractable when a large amount of joint feedback is available. Second, solutions may change dramatically when constraints are added, and there is no guarantee that these changes are beneficial. By contrast, the original MIRA formulation in Algorithm 1 enables “incremental” (and thus efficient) weight updates that ensure stability: each new solution needs to be close to its previous version. It also has a provable bound on the cumulative loss incurred [20].
We wish to use this notion of weighted constraints while preserving the benefits of MIRA. To achieve this, we directly incorporate the constraint penalty into the loss function. Given a linear constraint C involving two trees T and T′, we reformulate the loss function for user u as follows:
\[ L_u(T, T') = \mathrm{Pen}_u(C)\,\bigl(|E(T) \setminus E(T')| + |E(T') \setminus E(T)|\bigr), \tag{4.2} \]
where Pen_u(C) denotes the penalty value of a constraint as defined in Equation 4.1. The user’s own constraint receives penalty value 1, which is consistent with the original loss function of Section 2.3.3.
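As a rough sketch of Equation 4.2, the loss is the size of the symmetric difference between the two trees’ edge sets, scaled by the constraint penalty. Representing each tree as a collection of hashable edges and the function name `weighted_loss` are assumptions made for illustration only.

```python
def weighted_loss(edges_t, edges_t_prime, pen):
    """L_u(T, T') = Pen_u(C) * (|E(T) \\ E(T')| + |E(T') \\ E(T)|)."""
    e_t, e_tp = set(edges_t), set(edges_t_prime)
    # Count edges present in exactly one of the two trees.
    differing_edges = len(e_t - e_tp) + len(e_tp - e_t)
    return pen * differing_edges

# Two small example trees (invented) that share one edge and differ in one each.
tree_t = [("a", "b"), ("b", "c")]
tree_t_prime = [("a", "b"), ("b", "d")]

own_loss = weighted_loss(tree_t, tree_t_prime, pen=1.0)    # user's own constraint
other_loss = weighted_loss(tree_t, tree_t_prime, pen=0.5)  # less similar user
```

With a penalty of 1 the function reduces to the unweighted edge-difference count, which matches the statement that the user’s own constraint reproduces the original loss.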
The benefit of this loss function becomes clear once we look at the formula for incrementally updating the weight vector in MIRA:
\[ \vec{w}_{t+1} = \vec{w}_t + L_u \times \vec{v}, \tag{4.3} \]
where the new weight vector is obtained by adding a “delta” vector v (derived from the feedback) multiplied by a scalar. The length of the resulting update is proportional to the loss L_u of Equation 4.2, so a large constraint penalty forces the learning algorithm to update the weight vector aggressively.
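The effect of the penalty on the step size can be seen in a small sketch of Equation 4.3. It shows only the scaling of the update and is not the full MIRA step: the direction v is taken as given here, whereas MIRA derives it from the feedback. The function name `apply_update` and the numeric values are hypothetical.

```python
def apply_update(w, v, loss):
    """Return w_{t+1} = w_t + loss * v, computed component-wise (Equation 4.3)."""
    return [w_i + loss * v_i for w_i, v_i in zip(w, v)]

w_t = [1.0, 2.0]     # current weight vector (invented values)
delta = [0.5, -1.0]  # "delta" direction, assumed to be given

# The same feedback moves the weights further when the loss is larger,
# for example because the constraint carries a higher penalty.
w_high = apply_update(w_t, delta, loss=2.0)
w_low = apply_update(w_t, delta, loss=0.5)
```

The update under the larger loss moves the weight vector four times as far along the same direction, which is the sense in which a high-penalty constraint is applied more aggressively.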
We outline below how we incorporate this method in the Q System.
1. The online learner (MIRA) is triggered as usual: it is applied immediately whenever feedback is received. The user’s own weight vector will be updated with new feature