3.6 Active learning with soft-label information
3.6.1 A query strategy for active learning with soft-label information
For active learning, the most important question is the query strategy, i.e. how to find the most informative examples that could help us learn a model faster. In practical applications and research studies, the single most commonly used strategy is uncertainty sampling ([Lewis and Gale, 1994], [Settles, 2010]). Briefly, the uncertainty strategy selects the examples that the current learner is most uncertain about, i.e. examples with predictive scores closest to 0.5, or equivalently examples closest to the decision boundary. The intuition for this strategy is that the examples closest to the decision boundary would most likely influence the decision if we knew their labels, and are therefore the most informative. Other strategies have been proposed in the machine learning community (Section 2.2.1); however, they were designed for either the classification setting (querying class labels) or the regression setting (querying continuous values), but not both. In our problem, we have both class labels and auxiliary soft-labels, and no existing active learning work has specifically addressed this setting.
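For reference, the uncertainty sampling baseline described above can be sketched in a few lines. This is a minimal illustration, assuming the model's predictive scores are already available as a list of probabilities; the function name `uncertainty_sampling` and the parameter `k` are hypothetical and do not come from the cited works.

```python
def uncertainty_sampling(scores, k=1):
    """Return the indices of the k examples whose scores are closest to 0.5."""
    # Rank all examples by their distance from the decision threshold 0.5.
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return ranked[:k]


# Hypothetical predictive scores for five unlabeled examples.
scores = [0.05, 0.48, 0.93, 0.61, 0.30]
print(uncertainty_sampling(scores, k=2))  # -> [1, 3]
```

Note that this baseline looks only at the model's scores; it makes no use of soft-labels.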
The query strategy we propose for utilizing soft-label information divides examples into different regions (intervals) based on the predictive scores given by the currently learned model, and then selects examples from the regions that show the largest discrepancy between the predictive scores and the soft-labels given by the human annotator. The intuition for this strategy is that if, in some region, the model and the human annotator do not agree on the labeling of examples, then acquiring more labels in that region would help us resolve the discrepancy between the model and the human and obtain more useful information for learning. In contrast, if the current model and the human annotator already agree on the labeling of examples in a region, then acquiring more labels from that region would not provide much additional information for learning.
In the following, we give the description and explanation of our strategy.
3.6.1.1 Description of the query strategy
Let us consider the following notation:
• Let $U$ denote the set of unlabeled examples, $L$ the set of labeled examples, which is initially empty, and $D$ the complete set of examples, such that $D = U \cup L$.
• $X$ - the feature space.
• $X_D$, $X_L$, $X_U$ - feature vectors of all, labeled and unlabeled examples, respectively.
• $f : X \to [0,1]$ - the current model, which maps an input from $X$ to a probability.
• $P_h$ - the probabilistic soft-label given by the human annotator $h$.
• $\mathrm{MSE}$ - the mean square error estimate.
• $\Delta E$ - the confidence interval for the error estimate.
• $\mathrm{Std}$ - the standard deviation.
Given this notation, the query strategy with soft-label information is described as follows:
• Step 1. Randomly select a small initial subset of examples from $U$; query a human annotator for their class and soft labels, and add the labeled examples to $L$.
• Step 2. Train a model $f$ on the current labeled set $L$.
Model $f$ can be trained using any learning algorithm, e.g. a standard binary classifier such as SVM, or a learning method that uses soft-label information, such as SvmAuxOrd (Section 3.2.4.2).
• Step 3. Compute $f(X_L)$ and $f(X_U)$; based on $f(X_L)$, divide the $(0,1)$ interval into $m$ contiguous regions so that each region contains $n$ labeled examples; distribute all examples into these regions according to their predictive scores.
In Figure 23, the values of $f(X_L)$ and $f(X_U)$ are projected onto the horizontal axis and the values of the soft-labels $P_h(X_D)$ are projected onto the vertical axis. The black points are labeled examples ($X_L$).
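• Step 4. For each region $i = 1, \dots, m$, compute $\mathrm{MSE}_i = \frac{1}{n} \sum \left( f(X_L^i) - P_h(X_L^i) \right)^2$ and $\Delta E_i = 1.96 \cdot \mathrm{Std}\left[ \left( f(X_L^i) - P_h(X_L^i) \right)^2 \right] / \sqrt{n}$, where the sum runs over the $n$ labeled examples $X_L^i$ in region $i$.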
The mean square error term $\mathrm{MSE}_i$ measures the discrepancy between the predictive scores $f(X_L^i)$ given by the model $f$ and the soft-label estimates $P_h(X_L^i)$ given by the human annotator $h$ in region $i$. $\Delta E_i$ is the half-width of the 95% confidence interval of the $\mathrm{MSE}_i$ estimate.
• Step 5. Select interval $i$ with a probability proportional to $\mathrm{MSE}_i + \Delta E_i$; randomly sample an example from the selected interval and query the human annotator for its class and soft labels; add the labeled example to the labeled set $L$.
In this step we prefer the intervals that show a larger discrepancy between the predictive scores $f(X_L)$ and the soft-labels $P_h(X_L)$. Note that in Figure 23 the more preferred intervals, e.g. regions 2 and 3, have labeled examples spread further from the line connecting the points {0,0} and {1,1}. The intuition for this selection strategy is that if, in some region, there is a large discrepancy between the learned model $f$ and the human annotator $h$, then acquiring more labeled examples in that region would give us more information to resolve this discrepancy, and therefore lead to a better revision of the model $f$ in the next learning iteration. In contrast, if in some region the learned model $f$ and the human $h$ already agree on the labeling (e.g. region 4 in Figure 23), then we probably do not want to spend more labeling effort on exploring that region further.
• Repeat Steps 2-5 until a stopping criterion is met.
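Steps 3-5 above can be sketched as follows. This is only an illustrative sketch under several assumptions that the text does not specify: the function names are hypothetical, region boundaries are placed halfway between neighbouring labeled scores, Std is taken as the sample standard deviation, a region with no labeled examples gets zero weight, and only regions that still contain unlabeled examples can be selected. Model training (Step 2) and the annotator query are left out.

```python
import math
import random
import statistics


def split_into_regions(labeled_scores, m):
    """Step 3: boundaries of m contiguous regions over [0, 1], each holding
    about the same number of labeled examples (the last takes any remainder)."""
    ordered = sorted(labeled_scores)
    n = len(ordered) // m  # labeled examples per region
    # Assumption: an inner boundary lies halfway between neighbouring scores.
    inner = [(ordered[i * n - 1] + ordered[i * n]) / 2 for i in range(1, m)]
    return [0.0] + inner + [1.0]


def region_of(score, bounds):
    """Index of the region whose interval contains `score`."""
    for i in range(len(bounds) - 2):
        if score < bounds[i + 1]:
            return i
    return len(bounds) - 2


def region_weights(labeled_scores, soft_labels, bounds):
    """Step 4: MSE_i + Delta E_i for every region."""
    sq_errors = [[] for _ in range(len(bounds) - 1)]
    for f_x, p_h in zip(labeled_scores, soft_labels):
        sq_errors[region_of(f_x, bounds)].append((f_x - p_h) ** 2)
    weights = []
    for errs in sq_errors:
        n = len(errs)
        if n == 0:
            weights.append(0.0)  # assumption: no labeled data, no weight
            continue
        mse = sum(errs) / n
        std = statistics.stdev(errs) if n > 1 else 0.0
        weights.append(mse + 1.96 * std / math.sqrt(n))
    return weights


def select_query(labeled_scores, soft_labels, unlabeled_scores, m, rng=random):
    """Step 5: draw a region with probability proportional to its weight,
    then a random unlabeled example inside it (index into unlabeled_scores)."""
    bounds = split_into_regions(labeled_scores, m)
    weights = region_weights(labeled_scores, soft_labels, bounds)
    members = [[] for _ in range(m)]
    for j, f_x in enumerate(unlabeled_scores):
        members[region_of(f_x, bounds)].append(j)
    # Assumption: a region without unlabeled examples cannot be queried.
    weights = [w if members[i] else 0.0 for i, w in enumerate(weights)]
    if sum(weights) == 0:
        return rng.randrange(len(unlabeled_scores))  # fallback: uniform draw
    region = rng.choices(range(m), weights=weights, k=1)[0]
    return rng.choice(members[region])


# Hypothetical scores: model and annotator agree in the outer regions only.
labeled = [0.1, 0.2, 0.4, 0.45, 0.6, 0.7, 0.9, 0.95]
soft = [0.1, 0.2, 0.8, 0.9, 0.1, 0.2, 0.9, 0.95]
unlabeled = [0.05, 0.15, 0.42, 0.5, 0.65, 0.85, 0.92]
print(select_query(labeled, soft, unlabeled, m=4))
```

In this toy data the first and last regions have zero weight because the model scores and the soft-labels coincide there, so every query is drawn from the two middle regions, which mirrors the preference for regions 2 and 3 described in Step 5.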
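[Figure 23 plot: horizontal axis $f(X_D)$, vertical axis $P_h(X_D)$, with corner points {0,0}, {1,0}, {0,1} and {1,1}; the plot area is divided into Region 1, Region 2, Region 3 and Region 4.]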
Figure 23: The query strategy (SLDiscr) that uses soft-label information. The black points are labeled examples. The x-coordinate represents the probability of the example as estimated by the current model, while the y-coordinate shows the probability that is assigned to it by a human.