The ideal assistance bioassays for the target bioassay Bi are expected to provide
auxiliary information, carried out by consequential assistance compounds, with which Bi’s ranking model RMican be improved. A key question here is what such auxiliary
information could be in the context of ranking. In active learning for classification, auxiliary information could be additional strong positive/negative signals that help bias the classification boundary toward the right direction. In the context of regres- sion, auxiliary information could be additional data samples that help better reveal the underlying data distribution. Unfortunately, such options from classification and regression do not directly apply for ranking as ranking focuses on the ordinal relations across multiple instances. Thus, we expect that auxiliary information from assistance bioassays could be the information that helps strengthen, remedy or reconstruct the desired reference/ordinal structures among the compounds in the target bioassay. Furthermore, assistance bioassays should be the ones that sufficiently exhibit such information.
In order to identify sensible assistance bioassays, we need quantitative measure- ments to evaluate how much auxiliary information each candidate bioassay carries. However, it is non-trivial to quantify such information content and volume. Instead, we surrogate them by the similarity between the target bioassay and a candidate as- sistance bioassay in terms of their ordinal structures, under the hypothesis that if two bioassays are significantly similar in their ordinal structures, one of them carries aux- iliary information for the other. In specific, we develop the similarities by comparing the ordinal structures from the following two aspects:
• How the target bioassay Bi’s model RMiperforms on the candidate assistance bioas-
say. This method represents an indirect comparison of the ordinal structures; and • How the target bioassay and the candidate assistance bioassay are similar in their compounds and compound rankings. This is a direct comparison of the ordinal structures.
These two similarities lead to the following two assistance bioassay selection schemes: cross-ranking based assistance bioassay selection, and profiling based assistance bioas- say selection, respectively. From all the candidate bioassays, we select the assistance bioassays into a set denoted as Bi+, where all the assistance bioassays have the re- spective bioassay similarities that fall in 98 percentile of all the bioassay similarities.
2.4.1 Cross-Ranking based Bioassay Similarities
The first bioassay similarity measurement is inspired by our previous work that has been applied for target fishing [52]. The idea is, for the target bioassay Bi, if its ranking
model RMi performs well on another bioassay Bj, then Biand Bj are similar in terms
of their ranking structures. The underlying assumption is that model RMi captures
and models the signals from Bi’s compound ranking, and the good performance of
RMion Bj indicates that such signals align well with those from Bj’s ranking. Under
this assumption, the problem further boils down to measuring the performance of RMi on Bj. Such cross-ranking based assistance bioassay selection scheme is denoted
as bSimx.
To measure the performance of RMi on Bj, we devise the follow two approaches:
1). the first approach, denoted as bSimci
x , relies on a standard ranking evaluation
metric; and 2). the second approach, denoted as bSimal
x , utilizes sequence-alignment
based ranking comparison. Note that in bSimxwe only use RMion Bjin order to select
assistance bioassays to improve RMi. We don’t use RMj on Bi for RMi improvement
purposes because in addition to RMi, it requires the availability of Bj’s model RMj,
Concordance Indexing for bSimx (bSimcix )
We first use concordance index (CI; will be discussed later in Section 2.7.2) to evaluate the ranking performance of RMi on Bj. In this case, RMi ranks Bj into a
ranking list ˜ri→j, and CI is then calculated on ˜ri→j with respect to Bj’s true ranking
list rj. The higher the CI is, the better the ranking model RMi can predict the
ranking relations in Bj, and thus the more similar Bi and Bj are. Please note that
the similarities calculated from bSimci
x are not necessarily symmetric because the CI
calculated from ˜ri→j (i.e., ranking that RMi produces for Bj) and ri is not necessarily
the same as the CI calculated from ˜rj→i (i.e., ranking that RMj produces for Bi) and
rj.
Ranking Alignment for bSimx (bSimalx )
The concordance index CI measures the entirety of the ranking structures. How- ever, it is possible that only a certain portion of the ranking structures in Bj will
help, while CI cannot indicate such scenarios. Thus, we develop an alignment based ranking performance measurement bSimal
x (details on ranking list alignment will be
discussed later in Section 2.7.3). The key idea of bSimal
x is to identify locally con-
served ranking structures among ˜ri→j and rj. If the alignment reveals strong block
structures between ˜ri→j and rj, it indicates that RMi is able to reproduce a certain
chunk of orderings in rj, which would be considered for auxiliary information.
2.4.2 Profiling based Bioassay Similarity
The second bioassay similarity measurement is based on the comparison of com- pound profiles of two bioassays without modeling any of them. If two bioassays have similar rankings over similar compounds, we consider them as similar and hypothesize that they carry useful information that can be utilized to assist each other. Under this hypothesis, the problem can be casted to that, for bioassay Bi and Bj, we compare
the two ranking lists ri and rj. We develop the following two approaches for ranking
list comparison: 1). the first approach, denoted as bSimal
p , compares two ranking
lists riand rj using alignment; and 2). the second approach, denoted as bSimcsp , com-
pares two sets of compounds Ciand Cjregardless of ranking structures. The approach
bSimcsp is for approach comparison purposes and to make the study complete.
Ranking Alignment for bSimp (bSimalp )
The key idea in profiling-based ranking alignment approach bSimal
p is very similar
to that of bSimal
x , that is, to measure how similar two rankings are. In specific, we
look at to what extent similar compounds are ranked in similar orders. However, in bSimalp , instead of aligning ˜ri→j and rj as in bSimalx , we align ri and rj and use the
alignment to measure the similarity between Bi and Bj.
Compound Similarities for bSimp (bSimcsp )
In bSimcs
p , we compare Bi and Bj by looking at how similar their compounds are,
and thus, the similarity between Biand Bjis calculated as the average compound sim-
ilarities (Compound similarity will be discussed later in Section 2.7.1). This approach ignores the ranking ordering among the compounds.