
3. Materials and Methods

3.3.5 Classification SAR Modelling Methods

The limited number of data samples is an unavoidable problem with metal oxide NM databases. This size restriction makes it challenging to build a reliable genotoxicity model with high prediction accuracy. Even if a large number of molecular descriptors is calculated for the small NM data set, we still face the issue of “under sampling induced collinearity”, that is, a high degree of collinearity among the descriptors [146,147]. Collinearity will be present in the model because the number of samples is very small compared with the number of descriptors. Additionally, other problems such as over-fitting and noise in the data will arise and degrade the model. Given these complications, it is better to focus on a limited set of hypotheses when searching for the model that best fits the data. In other words, with small data it is better to start from a small set of possible hypotheses, e.g. a set of decision trees with a depth of at most four. We therefore opted for a simple tree classification analysis for (Q)SAR modelling of our data set; in particular, the Recursive Partitioning and Regression Trees (rpart) model was used to classify the data set.
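The collinearity argument can be illustrated with a short sketch. It is not part of the original analysis: the language (Python rather than R), the helper `matrix_rank`, and the sizes (5 samples, 10 descriptors) are assumptions chosen for illustration, and the descriptor values are random numbers, not NM data. With fewer samples than descriptors, the rank of the descriptor matrix cannot exceed the number of samples, so some descriptors are necessarily linear combinations of the others.

```python
import random


def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of row lists) via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the row (at or below `rank`) with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


random.seed(0)
n_samples, n_descriptors = 5, 10  # hypothetical sizes, not the thesis data set

# Random stand-in for a descriptor matrix: one row per sample, one column per descriptor.
X = [[random.gauss(0, 1) for _ in range(n_descriptors)] for _ in range(n_samples)]

rank = matrix_rank(X)
redundant = n_descriptors - rank  # descriptors that carry no independent information
print(f"rank = {rank}, linearly dependent descriptors = {redundant}")
```

Whatever values the descriptors take, at least `n_descriptors - n_samples` of them are redundant in this setting, which is why adding more descriptors to a small data set does not add independent information.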

3.3.5.1 Recursive Partitioning and Regression Trees

The rpart programs build classification and regression models in two phases, and the result is a binary tree. In the first phase, the variable that contributes most to splitting the data into two groups is identified. After the data are divided into two groups, the algorithm continues the splitting separately for each group. The procedure continues recursively until each group contains a minimum number of samples or no further improvement can be achieved. In the second phase, cross-validation is performed on the data to trim the full tree and make it simpler [148].
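The first (growing) phase described above can be sketched as follows. This is a minimal illustration in Python, not the rpart implementation: the Gini impurity criterion, the function names (`gini`, `best_split`, `grow`, `predict`), the stopping parameters, and the toy data are all assumptions, and the cross-validated pruning of the second phase is not covered.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Return (feature index, threshold, weighted impurity) of the best binary split, or None."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


def grow(X, y, depth=0, max_depth=4, min_samples=2):
    """Recursively partition the data; a leaf is a class label, a node is a 4-tuple."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_samples or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None or split[2] >= gini(y):
        return majority  # no split improves on the current group
    j, t, _ = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return (
        j,
        t,
        grow([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_samples),
        grow([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_samples),
    )


def predict(tree, row):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


# Hypothetical toy data: two standardized descriptors per sample, "+"/"-" class labels.
X = [[-1.2, 0.3], [-0.8, -0.5], [-0.1, 1.1], [0.4, -0.9], [0.9, 0.2], [1.5, -0.1]]
y = ["-", "-", "-", "+", "+", "+"]

tree = grow(X, y)
print(tree)
```

In this toy case a single split on the first descriptor separates the two classes, so the recursion stops after one level because both child groups are pure.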

Considering the small data set of metal oxide NMs, the associated set of quantum-mechanical descriptors and the classification endpoint to be modelled, “randomness” is likely to play a role in the built model. To address this, we decided to develop a model that analyses the importance of each variable in relation to the genotoxicity of the NMs, rather than a model that estimates the genotoxicity of the metal oxide NPs. In addition to their predictive ability, (Q)SAR models help to identify the physico-chemical attributes of a chemical that most strongly affect the toxicological and biological properties of the substance. In the present study, (Q)SAR models are employed to study the effect of each quantum-chemical descriptor in amplifying or reducing the genotoxicity of the NMs. Given the limitations mentioned above, we decided to use all the data as the training set and to study the importance of each descriptor in amplifying the genotoxicity of the metal oxide NPs. All quantum-chemical descriptors in the data set were standardized prior to the modelling process. All analyses were performed in R version 3.2.3 (R Foundation for Statistical Computing, Vienna, Austria), using the ‘rpart’ package.
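The standardization step can be sketched as below. The text does not state which scaling was used, so this assumes the common z-score (subtract the mean, divide by the sample standard deviation); the function name `standardize` and the example values are hypothetical and are not taken from the data set.

```python
from statistics import mean, stdev


def standardize(column):
    """Z-score a descriptor column: zero mean and unit sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(value - m) / s for value in column]


# Hypothetical HOMO energies (eV) for four oxide clusters; illustrative values only.
homo = [-9.1, -8.4, -10.2, -7.9]

z = standardize(homo)
print([round(value, 3) for value in z])
```

After this step every descriptor is expressed on the same dimensionless scale, regardless of whether it was originally reported in eV, kcal/mol or another unit.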

Table 4. Criteria for the usefulness and quality assessment of the data set for (Q)SAR modelling: extent of Comet assay conditions checklist. General parameters have been used to assess each data point and the results are reported in Table S1.A (Appendices), where all questions are answered in a yes or no fashion.

General parameters (further details to assess are indented beneath each parameter)

Comet protocol type:
    I) The pH of unwinding: alkaline, neutral, very alkaline.
    II) Incubation with the enzymes: FPG, 8oxodG, Endo III.

Concentrations expressed in at least one of the units:
    I) Mass per volume, per area, per cell (µg/ml, µg/cm2, µg/cell)
    II) Number of NMs per ml, per cm2, per cell (ENMs/ml or ENMs/cm2 or ENMs/cell)
    III) Surface area per ml, per cm2, per cell (cm2/ml or cm2/cm2 or cm2/cell)

Cytotoxicity tests performed?

Performed trend test for dose-response relationship?

Microscopic analysis in the Comet assay: Analyzed at least 50 comets per gel divided on two different slides (parallel gels per sample)? Comet count performed by at least one of the methods:
    I) % DNA in the tail
    II) Tail length
    III) Tail moment
    IV) Tail intensity (classified as belonging to one of five classes depending on their tail intensity?)

At least 3 hours for treatment time was respected?

Performed comparison between treated samples and controls?
    I) Positive control
    II) Negative control
    III) Both negative and positive controls

Information on uptake (demonstrated cellular

Table 5. Comet assay experimental results for all selected metal oxide nanomaterials used for (Q)SAR modelling*.

No  Metal oxide  Number of genotoxic reports  Number of nongenotoxic reports  Overall assessment**
1   Al2O3        1                            1                               +
2   NiO          1                                                            +
3   Co3O4        2                                                            +
4   CuO          6                            2                               +
5   Fe2O3        1                            5                               -
6   Fe3O4        6                            3                               +
7   TiO2         32                           6                               +
8   ZnO          16                           1                               +
9   SiO2         3                            9                               -
10  V2O3         1                                                            +
11  V2O5                                      1                               -
12  MgO                                       1                               -
13  ZrO2                                      1                               -
14  CeO2         5                            1                               +
15  Bi2O3        1                                                            +
16  SnO2                                      1                               -

* Data were extracted from [128].

** The “positive” and “negative” signs are assigned according to the number of genotoxic and nongenotoxic “reports” for each NM. The assessment column represents the variable used in modelling, based upon the global evaluation (weight of evidence) of all the reports related to a single NM (i.e. row): “+” means positive, i.e. genotoxic, whereas “-” means negative, i.e. not genotoxic.
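As a rough check on how the overall assessment relates to the report counts, the sketch below applies a simple majority rule to the rows of Table 5 that list both counts. The rule itself (positive when genotoxic reports are at least as numerous as nongenotoxic ones) and the function name `majority_sign` are assumptions made for illustration; the assessment in the table is described as a weight-of-evidence evaluation, which need not reduce to such a rule.

```python
# Rows of Table 5 that report both counts:
# (genotoxic reports, nongenotoxic reports, overall assessment as given in the table)
reports = {
    "Al2O3": (1, 1, "+"),
    "CuO": (6, 2, "+"),
    "Fe2O3": (1, 5, "-"),
    "Fe3O4": (6, 3, "+"),
    "TiO2": (32, 6, "+"),
    "ZnO": (16, 1, "+"),
    "SiO2": (3, 9, "-"),
    "CeO2": (5, 1, "+"),
}


def majority_sign(genotoxic, nongenotoxic):
    """Assumed rule: "+" when genotoxic reports are at least as numerous, otherwise "-"."""
    return "+" if genotoxic >= nongenotoxic else "-"


# Oxides for which the assumed rule disagrees with the tabulated assessment.
mismatches = [
    oxide
    for oxide, (genotoxic, nongenotoxic, sign) in reports.items()
    if majority_sign(genotoxic, nongenotoxic) != sign
]
print(mismatches)
```

For these eight rows the assumed rule agrees with the tabulated signs, with the tie for Al2O3 counted as positive; this shows consistency with the table, not that the rule was the one actually applied.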

Table 6. Acronyms, short definitions and units of the molecular descriptors calculated by MOPAC2012.

Symbol    Descriptor                                                              Unit
HF        Heat of formation                                                       kcal/mol
TE        Total energy of the oxide cluster                                       eV
EE        Electronic energy of the oxide cluster                                  eV
Core      Core-core repulsion energy of the oxide cluster                         eV
COSMO     Surface charge distribution based on Conductor-like Screening Model     Cubic angstroms
COSMO-SA  Area of the oxide cluster calculated based on COSMO                     Square angstroms
IP        Ionization potential                                                    eV
HOMO      Energy of the highest occupied molecular orbital of the oxide cluster   eV
LUMO      Energy of the lowest unoccupied molecular orbital of the oxide cluster  eV
No.Fl     Number of filled levels                                                 dimensionless

3.4 Weight of Evidence Approach in the Analysis of Results of Different In Silico Methods
