Chapter 4: Identification of the core structural features of genotoxic and non-genotoxic
4.2. Materials and Methods:
4.2.1 Leadscope SAR carcinogenicity and genotoxicity databases:
The Leadscope SAR carcinogenicity and genotoxicity databases are high-quality data resources that can be used to build predictive models and to support the 'read-across' of data to other chemicals in order to determine their potential to be carcinogenic or mutagenic. Both databases contain summarised data for in vitro and in vivo cancer and mutagenicity endpoints along with chemical structures, and both are used by the Leadscope program to build predictive models. To ensure high-quality data for SAR analysis, the various salt forms of chemical compounds and their respective toxicity data have also been analysed to derive an overall endpoint for the active portion of each chemical. Experimental test results from several sources have been included in these databases, such as the FDA, NTP, CCRIS, CPDB and other primary sources. All chemical structures are provided in the neutral SAR form and, if appropriate, the tested form, and have been confirmed for accuracy.
The SAR carcinogenicity database contains 1,948 SAR structures. The database includes 3,598 compounds with 11,538 test results and provides carcinogenicity study endpoints for male and female rats (1,774 and 1,725 compounds respectively) and male and female mice (1,640 and 1,675 compounds respectively). The SAR genotoxicity database provides compound-level calls for 46 genetic toxicity endpoints. These include 32 bacterial mutagenicity endpoints, 4 in vitro mammalian, 5 in vitro chromosomal aberration and 6 in vivo micronucleus results. An overview of the datasets is presented in Table 4.1.
Table 4.1. Overview of the number of compounds and the total endpoint results for the SAR genotoxicity and SAR carcinogenicity databases. * = number of tests.

SAR genotoxicity database (10543)*       Mutagenic (positive)       Non-mutagenic (negative)
Bacterial mutation                       4530                       4173
Salmonella                               4235                       4180
In vivo micronucleus                     274                        912

SAR carcinogenicity database (2870)*     Carcinogenic (positive)    Non-carcinogenic (negative)
Male rat                                 745                        869
Female rat                               686                        892
4.2.2 Scaffold generation feature in Leadscope:
The scaffold analysis feature for large datasets of bioactivity values in the Leadscope Personal version 4.4 program was used in this study to generate a hierarchical structuring and visualisation of the main carcinogenic scaffolds covering both genotoxic and non-genotoxic mechanisms. This feature was also used to navigate and explore the chemical space of the different complex structures in both the SAR carcinogenicity and SAR genotoxicity databases. The carcinogenic scaffolds were extracted by removing all side chains except the linking double bonds and exocyclic groups, which generates chemically meaningful compound scaffolds. Using the scaffold-generating parameters, the quality level of the scaffolds extracted from the database was set by choosing the minimum number of compounds per scaffold and the minimum number of atoms per scaffold. Choosing higher values yields fewer scaffolds, but of higher quality. In this study, a carcinogenic scaffold was required to cover at least 10 compounds; the minimum number of compounds per scaffold was therefore set to 10, and the minimum number of atoms per scaffold was also set to 10.
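The effect of the two quality parameters can be illustrated with a minimal Python sketch. It is not Leadscope's implementation: the `Scaffold` record, the `filter_scaffolds` helper and the example scaffold names and counts are hypothetical, and the thresholds are assumed to be inclusive ("at least 10").

```python
from dataclasses import dataclass


# Hypothetical record for illustration; Leadscope's internal
# representation of a scaffold is not described in the text.
@dataclass(frozen=True)
class Scaffold:
    name: str
    n_atoms: int      # number of atoms in the scaffold
    n_compounds: int  # number of database compounds containing the scaffold


def filter_scaffolds(scaffolds, min_compounds=10, min_atoms=10):
    """Keep only scaffolds that meet both minimum thresholds (inclusive)."""
    return [
        s for s in scaffolds
        if s.n_compounds >= min_compounds and s.n_atoms >= min_atoms
    ]


# Made-up example scaffolds, not taken from the databases.
candidates = [
    Scaffold("scaffold_A", n_atoms=14, n_compounds=32),
    Scaffold("scaffold_B", n_atoms=6, n_compounds=120),  # too few atoms
    Scaffold("scaffold_C", n_atoms=18, n_compounds=7),   # too few compounds
    Scaffold("scaffold_D", n_atoms=10, n_compounds=10),  # exactly at both thresholds
]

kept = filter_scaffolds(candidates)
stricter = filter_scaffolds(candidates, min_compounds=20, min_atoms=12)

print([s.name for s in kept])
print([s.name for s in stricter])
```

Raising either threshold can only shrink the list of retained scaffolds, which mirrors the trade-off between the number of scaffolds and their quality.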
The scaffold generation feature in Leadscope has the further advantage that it arranges scaffolds into a tree of “virtual scaffolds” that are constructed in silico. This tree is built as a hierarchical arrangement of parent and child scaffolds (see Figure 4.1). A smaller scaffold (the parent), which covers a larger number of compounds, gives rise to larger (child) scaffolds that cover fewer compounds but are more specific in terms of activity. Child scaffolds that share the substructure of the same parent scaffold are termed sibling scaffolds, since they are all linked to one parent scaffold.
Figure 4.1. A screenshot of the Leadscope program showing the virtual scaffold tree, with parent and child carcinogenic scaffolds in a hierarchical arrangement.
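The parent, child and sibling relationships described above can be sketched as a simple tree structure. This is an illustrative data structure only, not the one used by Leadscope; the `ScaffoldNode` class, the `walk` helper and the scaffold names and compound counts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps identity-based comparison, which avoids recursing
# through parent/child links when nodes are compared.
@dataclass(eq=False)
class ScaffoldNode:
    name: str
    n_compounds: int  # compounds in the database that contain this scaffold
    parent: Optional["ScaffoldNode"] = field(default=None, repr=False)
    children: List["ScaffoldNode"] = field(default_factory=list, repr=False)

    def add_child(self, child):
        """Attach a larger (child) scaffold that contains this scaffold."""
        child.parent = self
        self.children.append(child)
        return child

    def siblings(self):
        """Other child scaffolds linked to the same parent."""
        if self.parent is None:
            return []
        return [c for c in self.parent.children if c is not self]


def walk(node, depth=0):
    """Yield (depth, node) pairs in parent-before-child order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


# Made-up example: one parent scaffold with two child scaffolds.
root = ScaffoldNode("parent_scaffold", n_compounds=60)
child_1 = root.add_child(ScaffoldNode("child_scaffold_1", n_compounds=25))
child_2 = root.add_child(ScaffoldNode("child_scaffold_2", n_compounds=18))

for depth, node in walk(root):
    print("  " * depth + f"{node.name} ({node.n_compounds} compounds)")
```

In this toy tree each child covers fewer compounds than its parent, which is the pattern the hierarchical arrangement is meant to make visible.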
In this study, the scaffold generation feature in Leadscope was used to construct a scaffold tree of both carcinogenic and mutagenic scaffolds extracted from the SAR carcinogenicity and genotoxicity databases. This approach also helped to illustrate any relationship(s) between carcinogenicity and mutagenicity scaffolds since visual analysis of the structural relationship between parent and child scaffolds was easier using the hierarchical arrangement tree.
4.2.3 Cut-offs for Selecting Carcinogenic and Mutagenic Scaffolds:
Carcinogenicity and mutagenicity values were assigned to each scaffold within a scaffold tree. For the SAR carcinogenicity database, this value was defined as the ratio of carcinogenic compounds to the total number of compounds containing that scaffold; for the SAR genotoxicity database, it was defined as the ratio of mutagenic compounds to the total number of compounds containing that scaffold. Cut-off values were then specified in order to select the representative carcinogenicity or mutagenicity scaffolds. If the carcinogenicity or mutagenicity value of a scaffold was equal to or greater than the cut-off value, it was considered a representative active scaffold (carcinogenic or mutagenic), whereas scaffolds whose values fell below the lower cut-off given below were defined as non-active (non-carcinogenic or non-mutagenic). In addition, each scaffold had to cover at least 10 compounds to be selected for the scaffold group. The main goal of adjusting the cut-off value was to select the minimum number of carcinogenicity and mutagenicity scaffolds that would cover the largest possible number of carcinogenic and mutagenic compounds. The cut-off for the ratio of activity (C1/S), where C1 represents the number of active compounds (carcinogenic or mutagenic) that contain the scaffold and S represents the total number of compounds that contain the scaffold, was set to 0.7 based on the selection criteria discussed above. Values between 0.3 and 0.7 were considered equivocal, while values below 0.3 were considered to indicate non-active scaffolds (non-carcinogenic or non-mutagenic).
Similarly, the non-activity ratio was adjusted to select the minimum number of non-active (non-carcinogenic or non-mutagenic) scaffolds that would cover the maximum number of non-active compounds. The cut-off for the ratio of non-activity (C2/S), where C2 represents the number of non-active compounds (non-carcinogenic or non-mutagenic) that contain the scaffold and S represents the total number of compounds that contain the scaffold, was set to 0.7 based on the selection criteria discussed above, which corresponds to an activity ratio of 0.3.
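The activity and non-activity ratios can be illustrated with a short Python sketch. It assumes that S is the total number of compounds containing the scaffold and that every such compound is either active or non-active, so that C2 = S - C1. The function names and the example counts are hypothetical, and a ratio of exactly 0.3 is treated here as non-active, in line with the 0.7 non-activity cut-off. Exact fractions are used so that boundary values such as 7/10 are compared without floating-point rounding.

```python
from fractions import Fraction

ACTIVE_CUTOFF = Fraction(7, 10)      # activity ratio C1/S >= 0.7
NON_ACTIVE_CUTOFF = Fraction(3, 10)  # equivalent to non-activity ratio C2/S >= 0.7
MIN_COMPOUNDS = 10                   # minimum compounds covered by a scaffold


def activity_ratio(n_active, n_total):
    """C1/S: active compounds over all compounds containing the scaffold."""
    if n_total <= 0:
        raise ValueError("a scaffold must cover at least one compound")
    return Fraction(n_active, n_total)


def non_activity_ratio(n_active, n_total):
    """C2/S, assuming C2 = S - C1 (each compound is active or non-active)."""
    return 1 - activity_ratio(n_active, n_total)


def label_scaffold(n_active, n_total):
    """Label a scaffold within one database (carcinogenicity or genotoxicity)."""
    if n_total < MIN_COMPOUNDS:
        return "not selected"
    ratio = activity_ratio(n_active, n_total)
    if ratio >= ACTIVE_CUTOFF:
        return "active"
    if ratio <= NON_ACTIVE_CUTOFF:  # assumption: the boundary counts as non-active
        return "non-active"
    return "equivocal"


# Made-up counts (active compounds, total compounds) for illustration.
for n_active, n_total in [(7, 10), (5, 10), (3, 10), (9, 9)]:
    print(n_active, n_total, label_scaffold(n_active, n_total))
```

Because the two ratios sum to 1 under this assumption, a non-activity ratio of at least 0.7 is the same condition as an activity ratio of at most 0.3.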
All scaffolds with an activity ratio equal to or greater than 0.7 for both carcinogenicity and mutagenicity, and covering at least 10 structures, were classified as genotoxic carcinogenicity scaffolds. In contrast, all scaffolds with an activity ratio equal to or greater than 0.7 for carcinogenicity and equal to or less than 0.3 for mutagenicity (that is, a non-activity ratio of at least 0.7) were classified as non-genotoxic carcinogenicity scaffolds.
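The classification rule above can be written as a small decision function. This is a minimal sketch under stated assumptions rather than the procedure as run in Leadscope: the `classify_scaffold` function and the example counts are hypothetical, and the minimum coverage of 10 compounds is assumed to apply in both databases.

```python
from fractions import Fraction

ACTIVE_CUTOFF = Fraction(7, 10)      # activity ratio >= 0.7
NON_ACTIVE_CUTOFF = Fraction(3, 10)  # activity ratio <= 0.3
MIN_COMPOUNDS = 10                   # assumed to apply to both databases


def classify_scaffold(carc_active, carc_total, mut_active, mut_total):
    """Combine carcinogenicity and mutagenicity ratios for one scaffold.

    carc_* are counts from the SAR carcinogenicity database and mut_* are
    counts from the SAR genotoxicity database for the same scaffold.
    """
    if carc_total < MIN_COMPOUNDS or mut_total < MIN_COMPOUNDS:
        return "unclassified"
    carc = Fraction(carc_active, carc_total)
    mut = Fraction(mut_active, mut_total)
    if carc >= ACTIVE_CUTOFF and mut >= ACTIVE_CUTOFF:
        return "genotoxic carcinogenicity scaffold"
    if carc >= ACTIVE_CUTOFF and mut <= NON_ACTIVE_CUTOFF:
        return "non-genotoxic carcinogenicity scaffold"
    return "unclassified"


# Made-up counts for illustration only.
print(classify_scaffold(8, 10, 9, 12))   # carcinogenicity 0.80, mutagenicity 0.75
print(classify_scaffold(15, 20, 2, 10))  # carcinogenicity 0.75, mutagenicity 0.20
print(classify_scaffold(15, 20, 5, 10))  # mutagenicity 0.50 falls between the cut-offs
```

Scaffolds that are carcinogenic but equivocal for mutagenicity fall into neither class in this sketch, which follows from the two conditions as stated in the text.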