Impact of each transporter label on the global predictive performance

5. Using Multi-label Classification to Explore the Link among the Solute Carriers (SLCs)

5.3.3. Impact of each transporter label on the global predictive performance

To assess the impact of each label on the BR model and on the best CC model, the global Hamming Loss (HL) was recalculated under a scheme suited to each architecture. For the BR model, each label was removed in turn and the HL of the remaining multi-label model was calculated. The BR architecture allows the full classifier to be simulated without any one of its labels, but this kind of analysis is not possible for the CC model. The CC model is built incrementally, so the only way to test a label’s impact on the model is to measure the HL of the chain as each label is added to it. Note that labels that are not yet in the chain are still used in the model, but as non-linked single-label components (as in the BR model).
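The label-removal analysis for the BR model can be illustrated with a minimal sketch. It assumes that HL is the fraction of wrongly predicted (compound, label) cells, and it uses invented toy predictions with placeholder label names; the helper names `hamming_loss` and `loss_without_label` are hypothetical and are not taken from the study.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (compound, label) cells that are predicted wrongly."""
    n_cells = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / n_cells


def drop_column(rows, index):
    """Return a copy of the label matrix without one label column."""
    return [[v for j, v in enumerate(row) if j != index] for row in rows]


def loss_without_label(y_true, y_pred, labels, dropped):
    """Hamming Loss of the remaining BR model once one label is removed."""
    j = labels.index(dropped)
    return hamming_loss(drop_column(y_true, j), drop_column(y_pred, j))


# Toy data, invented for illustration only (rows = compounds, columns = labels).
labels = ["A", "B", "C"]
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 1, 1], [0, 0, 0]]

full = hamming_loss(y_true, y_pred)
impact = {name: loss_without_label(y_true, y_pred, labels, name)
          for name in labels}
```

In this toy case label C carries all of the error, so removing it lowers the HL, whereas removing a well-predicted label such as A raises it. Because BR components are independent, removing a label amounts to dropping one column, which is why this analysis is straightforward for BR but not for a chain.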

Figure 5.1(1) shows that, while most labels appear to have a similar impact on the global performance of the BR model, removing OCT1 is associated with a larger-than-usual penalty (i.e., an increase in Hamming Loss), which indicates that the respective single-label model contributes a high predictive performance. PEPT1, on the other hand, contributes the largest amount of error, as removing it from the multi-label BR model leads to a marked decrease in the HL value (recall that this is an error measure). Given these observations alone, PEPT1 would be expected to be one of the labels that benefit most from a multi-label setting that utilises label interaction (i.e. the CC scheme), while OCT1 would be expected to support the modelling of other labels. Both expectations were borne out: the predictive performance of PEPT1 improved markedly in the CC model compared with the BR model (Tables 5.2 and 5.3), and OCT1 occupies the first position in the best CC model (Table 5.2).

The analysis of the label impact on predictive performance in the CC model shows an overall decrease in Hamming Loss as the chain grows (see Figure 5.1(2)). This shows that each new label is modelled with an added level of accuracy when compared with the BR single-label models. Recall that the set of labels shown at each point of Figure 5.1(2) is completed with the remaining BR single labels, so each newly added label replaces its BR equivalent from the prior iteration. Taking this into account, the multi-label scheme proved robust to noise across the 6-label CC model. The data point corresponding to the “no interaction chain” refers to a setting where there is no link (or chain) connecting the labels, and they are modelled independently of each other. This setting was poorer than, or equivalent in performance to, any stage of the construction of the CC model.
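The incremental evaluation described above can be sketched as follows. The sketch assumes that per-label predictions from the BR model and from the CC model are already available, and that a chain of length one is equivalent to BR because its first label receives no input from other labels. The function name `chain_growth_curve`, the placeholder labels and all toy values are hypothetical.

```python
def hamming_loss(truth, pred, labels):
    """Fraction of wrong (compound, label) cells over the given labels."""
    cells = [(t, p)
             for name in labels
             for t, p in zip(truth[name], pred[name])]
    return sum(t != p for t, p in cells) / len(cells)


def chain_growth_curve(truth, br_pred, cc_pred, chain_order):
    """HL of the full label set as BR components are replaced by CC ones."""
    curve = [("no chain", hamming_loss(truth, br_pred, chain_order))]
    for k in range(2, len(chain_order) + 1):
        in_chain = chain_order[:k]
        # Labels already in the chain use CC predictions; the rest stay BR.
        mixed = {name: cc_pred[name] if name in in_chain else br_pred[name]
                 for name in chain_order}
        curve.append(("-".join(in_chain),
                      hamming_loss(truth, mixed, chain_order)))
    return curve


# Toy data, invented for illustration only (one list of compounds per label).
truth = {"A": [1, 0, 1, 0], "B": [1, 1, 0, 0], "C": [0, 1, 1, 0]}
br_pred = {"A": [1, 0, 1, 0], "B": [1, 0, 0, 1], "C": [0, 1, 0, 0]}
cc_pred = {"A": [1, 0, 1, 0], "B": [1, 1, 0, 1], "C": [0, 1, 1, 0]}

curve = chain_growth_curve(truth, br_pred, cc_pred, ["A", "B", "C"])
```

Each point of the curve is computed over the same full label set, so differences between consecutive points reflect only the label whose BR component was swapped for its chained counterpart.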

Figure 5.1. (1) The impact of each label over the global Hamming Loss of the BR model, computed on the test set. The impact is measured by calculating the HL of the full multi-label model upon removal of each label. Recall that HL is meant to be minimized. (2) The impact of replacing the single labels from the BR model with the single label components of the CC model with increasing chain length. The impact is measured by calculating the HL of the full multi-label model upon addition of a new label to the chain (rather than using the BR equivalent of the single labels). The order of labels in the chain follows that of the selected CC model. *The term “no interac. chain” refers to the scenario where there are no links between labels (i.e. the BR model). The dashed line connecting the first and second data points in the CC plot conveys the discontinuous nature between the two.

It is worth noting that the significance of previous labels is seen throughout all the models generated (in the exhaustive combinations of chain sequences) and not just in the best model discussed above. An exhaustive analysis of label contribution in a multi-label modelling context showed that previous label predictions were very frequently selected as descriptors for the modelling of any given transporter, which demonstrates that the value of using transporters as predictors of each other was not an exception found only in the best model. This is further evidence supporting the correlation between transporters with respect to their substrate profiles.

Two-label chains, in which the predicted label #1 is used as the only predictor of label #2, can also provide useful information about how labels relate to each other. To this end, the first transporter label in each possible 2-label chain was modelled using the optimal conditions (as done for all of the other CC models in this study), and its output was used as the only descriptor to model the following label in the chain using the C4.5 (decision tree) algorithm. This exercise has only two possible outcomes: either a one-descriptor tree is produced (with pLabel being the descriptor), or no tree is produced because pLabel is not statistically significant for the partitioning of label #2. This process is more appropriate for testing the relationships between the different transporters than running a statistical test of each correlation, given that the latter implies a symmetric (bidirectional) correlation whereas the former only assumes unidirectional correlations (while still allowing bidirectional correlations to be identified). Given the complex nature of the problem under study, it is possible that some of the relationships between transporters are in fact asymmetric (unidirectional), where transporter A offers information relevant to B, but B does not do the same for A (Gibaja and Ventura, 2014).

[Figure 5.1, panel 1: Hamming Loss (y-axis, 0 to 0.35) upon removal of each label (x-axis: OCT1, 2B1, 1A2, PEPT1, 1B1, 1B3). Panel 2: Hamming Loss (y-axis, 0.19 to 0.29) against chain composition (x-axis: no interac. chain*, OCT1-2B1, OCT1-2B1-1A2, OCT1-2B1-1A2-PEPT1, OCT1-2B1-1A2-PEPT1-1B1, OCT1-2B1-1A2-PEPT1-1B1-1B3); data labels in extraction order: 0.264, 0.268, 0.268, 0.229, 0.229, 0.228.]

The summary of results in Table 5.5 (criterion B) identifies three links: 1) OATP1B1 and OATP1B3 are mutually correlated, as the pLabel from each transporter is selected into a decision tree to model the other; 2) pOATP1A2 was selected as a predictor of OATP1B3; and 3) pOCT1 was identified as a predictor of PEPT1. Surprisingly, pOATP2B1, which appears in second position in the best CC model, was not selected as a predictor for any of the transporters. However, assessing labels in a two-by-two fashion is perhaps a harsh method for ascertaining the significance of relationships between transporters. For example, pOATP2B1 may be a predictor for a sub-group of compounds already partitioned by a molecular descriptor, rather than for all compounds. The input that a label receives can significantly alter the patterns learned for it, especially if different sources of input complement each other’s information. As a result, the binding patterns of the OATP2B1 label, for example, might be learned very differently, which changes what this label outputs to the remainder of the classifier chain and can make it an advantageous predictor for the transporter models that follow.

Additionally, two factors might explain why other relationships present in the selected CC model were not found in the two-label chain analyses above. Firstly, C4.5 is likely to be a suboptimal learning algorithm here, as in many cases it was not the optimal training algorithm (Table 5.5). It was used for its straightforward and transparent output, and it may have overlooked weaker correlations. Secondly, some of the correlations might not be global (and may occur only in a specific region of chemical space), and hence are not observed without additional chemical information.

To assess whether there is any link between a single label’s predictive performance and its position in the CC model, the top 10 performances of a given label, at each of the six possible positions, were averaged. Figure 5.2 shows that all labels benefit, though to different extents, from being located somewhere between the second and the last position of the classifier chain as opposed to being in the first position (where no information from other labels is available). This is further evidence in support of the hypothesis of intercorrelation between the transporters’ binding profiles: all transporters benefit, to some degree, from previous label information.

Table 5.5. Summary of proposed links between SLC transporters, determined from four different approaches. Criterion C is a summary of the results presented in Appendix II, Table A2.2, and criterion D is derived from the results presented in Appendix II, Table A2.3.

| Endpoint | Criterion A: top 5 predictor in the best model | Criterion B: statistically significant predictor in a two-label chain | Criterion C: statistically significant in Obs x Obs Chi-Square correlation | Criterion D: statistically significant in pLabel x Obs Chi-Square correlation# |
|---|---|---|---|---|
| OCT1 | n.a. | none | none | n.a. |
| OATP2B1 | pOCT1 | none | none | none |
| OATP1A2 | pOCT1 | none | OATP1B1, OATP1B3 | none |
| PEPT1 | pOCT1, pOATP1A2, pOATP2B1 | pOCT1 | OATP1B1 | pOCT1 |
| OATP1B1 | none | pOATP1B3 | OATP1A2, OATP1B3, PEPT1 | none |
| OATP1B3 | pOATP1B1 | pOATP1B1, pOATP1A2 | OATP1B1, OATP1A2 | pOATP1B1 |

# Each observed endpoint was only tested with the eligible pLabel variables (i.e. the pLabels from the transporter models in the rows above it, which were made available during its training). OATP1A2, for example, is tested against the two pLabels that precede it (pOCT1 and pOATP2B1).
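As an illustration of the kind of test behind criteria C and D, the sketch below computes a Pearson Chi-Square statistic for two binary vectors from their 2x2 contingency table. The source does not state whether a continuity correction was applied, so none is assumed here. The p-value uses the identity for one degree of freedom, p = erfc(sqrt(chi2 / 2)). The function names and the toy vectors are hypothetical.

```python
import math


def contingency_2x2(x, y):
    """Counts of (1,1), (1,0), (0,1) and (0,0) pairs for two binary vectors."""
    pairs = list(zip(x, y))
    a = sum(1 for xi, yi in pairs if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in pairs if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in pairs if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in pairs if xi == 0 and yi == 0)
    return a, b, c, d


def chi_square_2x2(x, y):
    """Pearson Chi-Square (no continuity correction) and p-value, 1 d.o.f."""
    a, b, c, d = contingency_2x2(x, y)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # a constant vector carries no association information
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy vectors, invented for illustration only (1 = substrate, 0 = non-substrate).
label_x = [1] * 10 + [0] * 10
label_y = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8

chi2_xy, p_xy = chi_square_2x2(label_x, label_y)
chi2_yx, p_yx = chi_square_2x2(label_y, label_x)
```

Swapping the two vectors gives the same statistic, which reflects the symmetric nature of this test. For criterion D, the first vector would be a predicted label (pLabel) and the second an observed endpoint.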

In agreement with other observations discussed earlier in this chapter, OATP1B3 was the transporter that benefitted most from being pushed towards the end of the chain, showing an overall trend (except when in the second position) of increasing predictive performance as its position approaches the end of the chain. On the other hand, the observations for OCT1 and OATP1B1 indicate that, although these transporters do benefit from being trained with information from other labels (i.e. from being trained later in the chain’s order), they benefit the least. This is evidenced by their having the smallest improvement in predictive performance between the first position and any of the following positions. This aligns with the fact that OCT1 occupies the first position in the best multi-label model.

Figure 5.2. Average over the top 10 G-mean of each class label at every position in the 6-label chain. The highest G-mean points are marked with a black outline.
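The averaging behind Figure 5.2 can be sketched as below. The sketch assumes the usual binary-classification definition of G-mean, the geometric mean of sensitivity and specificity, which this section does not spell out. The function names, the tuple layout of `results` and the toy scores are hypothetical.

```python
import math
from collections import defaultdict


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def top_k_mean_by_position(results, k=10):
    """Average the k best scores for every (label, chain position) pair.

    `results` is an iterable of (label, position, score) tuples, one per
    evaluated chain in which the label appeared at that position.
    """
    grouped = defaultdict(list)
    for label, position, score in results:
        grouped[(label, position)].append(score)
    averages = {}
    for key, scores in grouped.items():
        best = sorted(scores, reverse=True)[:k]
        averages[key] = sum(best) / len(best)
    return averages


# Toy scores, invented for illustration only.
results = [
    ("X", 1, 0.60), ("X", 1, 0.70),
    ("X", 2, 0.80), ("X", 2, 0.90), ("X", 2, 0.50),
]
averages = top_k_mean_by_position(results, k=2)
example_score = g_mean(tp=8, fn=2, tn=9, fp=1)
```

Averaging only the best k chains per position keeps poorly configured chains from masking the effect of position itself, at the cost of showing an optimistic view of each position.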

In document Machine Learning for Modelling Tissue Distribution of Drugs and the Impact of Transporters (Page 122-126)