2018 International Conference on Modeling, Simulation and Analysis (ICMSA 2018) ISBN: 978-1-60595-544-5
Development of Prediction Models for the Mutagenicity of Nitrated-PAHs
Based on Multiple Linear Regression
Wen-jing ZHANG, Li-jiao ZHAO* and Ru-gang ZHONG
Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing, China
*Corresponding author
Keywords: Nitrated polycyclic aromatic hydrocarbons, Mutagenicity, Quantitative structure-activity relationship, Multiple linear regression.
Abstract. Nitrated polycyclic aromatic hydrocarbons (NPAHs) are a family of toxicants that are widespread in the environment. In this study, quantitative structure-activity relationship (QSAR) models were developed to predict the mutagenicity of NPAHs. Structural descriptors were screened and multiple linear regression (MLR) was performed to develop the QSAR models. A total of 1706 descriptors were obtained from structures optimized using density functional theory (DFT) at the CPCM-B3LYP/6-311+G(d,p) level of theory in water. External validation and leave-one-out cross-validation were performed to confirm the predictive ability and the robustness of the models, respectively. In total, 33 QSAR models were generated using one to eight descriptors, among which the model consisting of four descriptors (Eelec, SIC2, RDF040v and GATS4v) has the highest correlation coefficient (R2=0.8755). This study will contribute not only to the prediction of the mutagenicity of NPAHs, but also to the development of QSAR modeling methods for toxicants.
Introduction
Nitrated polycyclic aromatic hydrocarbons (NPAHs) are derivatives of polycyclic aromatic hydrocarbons (PAHs) that carry at least one nitro group on an aromatic ring and are widespread in the environment. NPAHs can be generated by incomplete combustion and pyrolysis of fossil fuels and biomass, for example in exhaust gases from gasoline and diesel engines, in food processing that involves roasting or fumigation, in household stoves or heaters, in solid waste incineration and in natural fires. [1] NPAHs can also be produced in the atmosphere by photochemical reactions of PAHs with gaseous oxidants, including NO2, N2O5, O3, OH radicals and NO3 radicals. Because of their close association with particulate matter and their ubiquity in polluted air, NPAHs have aroused great concern and have been analyzed in the atmosphere of different regions of the world.
Materials and Methods
Experimental Data
In the present study, the dataset used for modeling comprises 48 NPAHs with reported experimental mutagenicity in the TA100 strain of Salmonella typhimurium, obtained from the bacterial Ames test. [6,7] The information on all NPAHs, including names, CAS numbers and experimental data, is listed in Table 1. The dataset was randomly divided into a training set (32 compounds) for model construction and a test set (16 compounds) for model validation by external prediction.
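The random division into 32 training and 16 test compounds can be sketched as follows. This is a minimal illustration only: the paper does not state which tool or random seed was used, so the function name `split_dataset` and the seed value are assumptions, and the resulting assignment will not reproduce the one in Table 1.

```python
import random

def split_dataset(compound_ids, n_train, seed=0):
    """Randomly partition compound identifiers into training and test sets."""
    rng = random.Random(seed)        # the seed is an assumption, used for reproducibility
    shuffled = list(compound_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])

# 48 compounds numbered as in Table 1: 32 for training, 16 for external testing
train_ids, test_ids = split_dataset(range(1, 49), n_train=32)
print(len(train_ids), len(test_ids))
```

Fixing the seed makes the split repeatable, which matters because the external validation statistics depend on which compounds fall into the test set.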
Table 1. Information of the 48 NPAHs used for QSAR modeling.
47 9-Nitroanthracene 602-60-8 C14H9NO2 0.26 Training set
48 6-Nitrochrysene 7496-02-8 C18H11NO2 2.93 Training set
Geometry Optimization and Molecular Descriptors
The three-dimensional molecular structures were built using GaussView software. Full geometry optimizations were performed by density functional theory (DFT) using Becke's three-parameter hybrid exchange functional with the Lee, Yang and Parr correlation functional (B3LYP) and the 6-311+G(d,p) basis set. To simulate the intracellular aqueous environment, the geometries were optimized with the conductor-like polarizable continuum model (CPCM) in water. Vibrational frequency calculations were performed to verify that the optimized structures were true minima on the potential energy surface. All calculations were performed with the Gaussian 09 program package. Based on the optimized geometries, 14 quantum-chemical structural descriptors were obtained, including energy of the highest occupied molecular orbital (EHOMO), energy of the lowest unoccupied molecular orbital (ELUMO), energy of the HOMO-LUMO gap (EGAP), heat of formation (H), total potential energy (Eelec), dipole moment (d), molecular diameter (a0), molecular volume (V), electronegativity (χ), chemical hardness (η), RMS gradient norm (RMS), chemical softness (σ), dipole moment (μ) and electrophilicity index (ω). Based on the optimized structures, 1666 molecular descriptors were generated by E-Dragon software and were subdivided into 20 logical blocks, including constitutional descriptors, functional group counts, atom-centered fragments, topological descriptors, walk and path counts, connectivity indices, information indices, 2D autocorrelations, edge adjacency indices, BCUT descriptors, topological charge indices, eigenvalue-based indices, geometrical descriptors, 3D matrix-based descriptors, RDF descriptors, WHIM descriptors, GETAWAY descriptors, Randic molecular profiles, molecular properties, and charge descriptors. Another 26 descriptors were obtained from the ChemBioOffice package and the Molinspiration Property Calculator. In total, 1706 molecular descriptors were obtained to construct a quantitative structure-activity relationship (QSAR) model for predicting the mutagenicity of NPAHs.
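The paper does not give the formulas used for χ, η, σ and ω. The sketch below assumes the commonly used frontier-orbital approximations from conceptual DFT: χ = −(EHOMO + ELUMO)/2, η = (ELUMO − EHOMO)/2, σ = 1/η and ω = χ²/(2η). Other conventions exist (for example η defined as the full gap, or softness as 1/(2η)), so the values in the original work may have been computed differently. The function name and the example energies are hypothetical.

```python
def reactivity_descriptors(e_homo, e_lumo):
    """Global reactivity descriptors from frontier orbital energies (same energy unit)."""
    gap = e_lumo - e_homo             # EGAP, HOMO-LUMO gap
    chi = -(e_homo + e_lumo) / 2.0    # electronegativity (assumed convention)
    eta = gap / 2.0                   # chemical hardness (assumed convention)
    sigma = 1.0 / eta                 # chemical softness (assumed convention)
    omega = chi ** 2 / (2.0 * eta)    # electrophilicity index
    return {"EGAP": gap, "chi": chi, "eta": eta, "sigma": sigma, "omega": omega}

# Hypothetical orbital energies in eV, not taken from the paper
example = reactivity_descriptors(e_homo=-7.0, e_lumo=-3.0)
print(example)
```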
Variable Reduction
Before QSAR modeling, variable reduction was performed to eliminate redundant descriptors by removing those that were missing for at least one compound or that showed little or no discrimination among the compounds. As a result, the number of descriptors was reduced to 1306. To ensure the quality of the developed models, descriptors with an absolute Pearson correlation coefficient to the biological activity (|r|) below 0.7 were excluded, which is a stricter criterion than the |r| ≤ 0.3 cutoff reported by de Campos and de Melo [8]. The descriptors were then further screened to remove those that were highly intercorrelated with another descriptor but relatively weakly correlated with the mutagenicity. Following the suggestion of Topliss and Edwards [9,10], only one descriptor was retained among descriptors with intercorrelation values of r2 ≥ 0.8, to reduce the probability of spurious correlations. Finally, as listed in Table 2, three quantum-chemical descriptors (Eelec, a0 and ω) and eight Dragon descriptors, which have correlation coefficients to the logTA100 values higher than 0.7 and are not highly intercorrelated, were selected for the QSAR modeling.
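The three screening steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' procedure: "little or no discrimination" is approximated here as a descriptor that is constant across all compounds, and among intercorrelated descriptors the one with the highest |r| to the activity is kept by a greedy pass. The function names and the toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def reduce_descriptors(descriptors, activity, r_min=0.7, r2_max=0.8):
    """Return the names of descriptors that survive the three screening steps."""
    # Step 1: drop descriptors with missing values or no variation (assumed criterion)
    usable = {name: vals for name, vals in descriptors.items()
              if None not in vals and len(set(vals)) > 1}
    # Step 2: keep descriptors with |r| >= r_min to the activity
    r_act = {name: pearson(vals, activity) for name, vals in usable.items()}
    candidates = [name for name in usable if abs(r_act[name]) >= r_min]
    # Step 3: visit candidates from strongest to weakest |r|; keep one only if
    # its r^2 with every descriptor already kept is below r2_max
    candidates.sort(key=lambda name: abs(r_act[name]), reverse=True)
    kept = []
    for name in candidates:
        if all(pearson(usable[name], usable[k]) ** 2 < r2_max for k in kept):
            kept.append(name)
    return kept

# Toy data for six hypothetical compounds (not from the paper)
activity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy = {
    "d_a": [3.0, 5.0, 7.0, 9.0, 11.0, 13.0],      # strongly correlated with activity
    "d_twin": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],     # nearly redundant with d_a
    "d_other": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],    # correlated, but not redundant
    "d_weak": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],     # weak correlation with activity
    "d_const": [7.0, 7.0, 7.0, 7.0, 7.0, 7.0],    # no discrimination
    "d_missing": [1.0, None, 3.0, 4.0, 5.0, 6.0], # missing for one compound
}
selected = reduce_descriptors(toy, activity)
print(selected)
```

Visiting candidates in order of decreasing |r| means that, within a group of intercorrelated descriptors, the one most strongly related to the activity is the one retained.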
Table 2. Definition of the descriptors screened out for QSAR modeling.
Descriptors Definition
SIC2 Structural information content (neighborhood symmetry of 2-order)
RDF040v Radial Distribution Function - 4.0 / weighted by atomic van der Waals volumes
GATS4v Geary autocorrelation - lag 4 / weighted by atomic van der Waals volumes
RDF080u Radial Distribution Function - 8.0 / unweighted
HIC Mean information content on the leverage magnitude
R1p R maximal autocorrelation of lag 1 / weighted by atomic polarizabilities
L2u 2nd component size directional WHIM index / unweighted
Mor21p 3D-MoRSE - signal 21 / weighted by atomic polarizabilities
Eelec Total energy at the B3LYP level / a.u.
a0 Molecular radius / angstrom
ω Electrophilicity index
Model Development and Validation
In this study, the QSAR models were built by multiple linear regression (MLR), which has proved to be a widely applicable approach for establishing linear predictive models. [11,12] The dataset was split into a training set and a test set, and the test set was used to validate the models developed on the training set. To ensure the quality of the obtained QSAR models, internal and external validations were performed to confirm the reliability, robustness and stability of the models by examining the correlation coefficient (R2), the root mean square error (RMSE) and the Q2LOO value obtained from leave-one-out cross-validation (LOO-CV). The values of R2 and Q2 were calculated according to equation (1), where 𝑦̂𝑖, 𝑦𝑖 and 𝑦̅ are the predicted logTA100 value, the experimental logTA100 value and the mean of the experimental logTA100 values, respectively. The RMSE values, which indicate the errors in internal and external validation, were calculated by equation (2), where 𝑦̂𝑖 is the predicted value, 𝑦𝑖 is the experimental value and 𝑛 is the number of samples.
\[ R^2,\ Q^2 = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (1) \]
\[ \mathrm{RMSE} = \left( \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n} \right)^{1/2} \qquad (2) \]
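Equations (1) and (2) translate directly into code. The following is a minimal sketch with hypothetical function names and made-up example values. The same `r_squared` function gives R2 when the predictions come from the fitted model and Q2 when they come from cross-validation.

```python
from math import sqrt

def r_squared(y_obs, y_pred):
    """Equation (1): 1 - sum((y_hat - y)^2) / sum((y - y_mean)^2)."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((p - o) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot

def rmse(y_obs, y_pred):
    """Equation (2): square root of the mean squared prediction error."""
    n = len(y_obs)
    return sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)

# Made-up values for illustration, not data from the paper
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
r2_value = r_squared(y_obs, y_pred)
rmse_value = rmse(y_obs, y_pred)
print(r2_value, rmse_value)
```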
Table 3. Different numbers and combinations of the descriptors used for QSAR modeling.
Model No.   Number of descriptors   Descriptors
1           1                       Eelec
2           1                       a0
3           1                       ω
4           1                       SIC2
5           2                       Eelec, SIC2
6           2                       a0, SIC2
7           2                       ω, SIC2
8           2                       SIC2, RDF040v
9           2                       ω, a0
10          3                       Eelec, a0, ω
11          3                       Eelec, SIC2, RDF040v
12          3                       a0, SIC2, RDF040v
13          3                       ω, SIC2, RDF040v
14          3                       SIC2, RDF040v, GATS4v
15          4                       Eelec, SIC2, RDF040v, GATS4v
16          4                       a0, SIC2, RDF040v, GATS4v
17          4                       ω, SIC2, RDF040v, GATS4v
18          4                       SIC2, RDF040v, RDF080u, HIC
19          5                       Eelec, SIC2, RDF040v, RDF080u, HIC
20          5                       a0, SIC2, RDF040v, RDF080u, HIC
21          5                       ω, SIC2, RDF040v, RDF080u, HIC
22          5                       SIC2, RDF040v, RDF080u, HIC, R1p
23          6                       Eelec, SIC2, RDF040v, RDF080u, HIC, R1p
24          6                       a0, SIC2, RDF040v, RDF080u, HIC, R1p
25          6                       ω, SIC2, RDF040v, RDF080u, HIC, R1p
26          6                       SIC2, RDF040v, RDF080e, HIC, R1p, L2u
27          7                       Eelec, SIC2, RDF040v, RDF080e, HIC, R1p, L2u
28          7                       a0, SIC2, RDF040v, RDF080e, HIC, R1p, L2u
29          7                       ω, SIC2, RDF040v, RDF080e, HIC, R1p, L2u
30          7                       SIC2, RDF040v, RDF080e, HIC, R1p, L2u, Mor21p
31          8                       Eelec, SIC2, RDF040v, RDF080e, HIC, R1p, L2u, Mor21p
32          8                       a0, SIC2, RDF040v, RDF080e, HIC, R1p, L2u, Mor21p
33          8                       ω, SIC2, RDF040v, RDF080e, HIC, R1p, L2u, Mor21p
Results and Discussion
As listed in Table 3, a total of 33 QSAR models were constructed by MLR from different numbers and combinations of the 11 descriptors selected by correlation analysis. For the four models consisting of only one descriptor, relatively low correlation (R2 < 0.6) and high dispersion (RMSE > 1.1) were observed, which means that the one-descriptor models are not suitable for predicting the mutagenicity of NPAHs. Therefore, more descriptors are required for the construction of the QSAR models. LOO-CV is frequently used to evaluate the predictive ability of a statistical model, and the cross-validated squared correlation coefficient Q2LOO is an important parameter reflecting the quality of the model. [13]
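To make the LOO-CV procedure concrete, the sketch below fits an MLR model by ordinary least squares and computes Q2LOO by refitting the model n times, each time leaving out one compound and predicting it. It is an illustration only: the paper does not say which software performed the regression, the normal-equation solver is one of several possible ways to fit MLR, and the function names and toy data are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares via the normal equations; returns [b0, b1, ..., bp]."""
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix of the normal equations (A^T A) b = A^T y
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(aug[k][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for k in range(c + 1, p):
            f = aug[k][c] / aug[c][c]
            for j in range(c, p + 1):
                aug[k][j] -= f * aug[c][j]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        s = sum(aug[i][j] * b[j] for j in range(i + 1, p))
        b[i] = (aug[i][p] - s) / aug[i][i]
    return b

def predict(b, row):
    """Predicted activity for one compound from the fitted coefficients."""
    return b[0] + sum(c * v for c, v in zip(b[1:], row))

def q2_loo(X, y):
    """Leave-one-out Q2: each compound is predicted by a model fitted without it."""
    n = len(y)
    preds = []
    for i in range(n):
        coeffs = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coeffs, X[i]))
    mean_y = sum(y) / n
    press = sum((p - o) ** 2 for p, o in zip(preds, y))
    ss_tot = sum((o - mean_y) ** 2 for o in y)
    return 1.0 - press / ss_tot

# Toy data generated from y = 2 + 3*x1 - x2 (hypothetical, not from the paper)
X_toy = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
y_toy = [3.0, 7.0, 7.0, 11.0, 11.0, 15.0]
coeffs_toy = fit_mlr(X_toy, y_toy)
q2_toy = q2_loo(X_toy, y_toy)
print(coeffs_toy, q2_toy)
```

Because the toy activities are an exact linear function of the two descriptors, the fit recovers the generating coefficients and Q2LOO is essentially 1. With real data Q2LOO is normally lower than R2, as for model 15 in this study.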
According to previously reported studies, QSAR models are acceptable when the values of R2 and Q2 are higher than 0.6. [14] In this study, the values of R2 and Q2 are higher than 0.7 in the models consisting of more than three descriptors, which suggests that the number of descriptors has a significant effect on the quality of the models. As shown in Figure 1, the four models consisting of 4, 5, 6 and 7 descriptors (models 15, 19, 23 and 27, respectively) have higher R2 and Q2 values (R2 > 0.85 and Q2 > 0.75) than the other 29 models. The model consisting of four descriptors (model 15 in Figure 1A), namely RDF040v, SIC2, Eelec and GATS4v, has the highest values of R2 (0.8775) and Q2 (0.8418) and the lowest value of RMSE (0.6131), which indicates that this QSAR model is the most robust of the obtained models for predicting the mutagenicity of NPAHs. Moreover, the results indicate that the five descriptors ω, a0, RDF040v, RDF080e and HIC are positively correlated with the logTA100 values, while the five descriptors Eelec, L2u, R1p, SIC2 and GATS4v are negatively correlated.
Figure 1. Scatter plots for model 15 (A), model 19 (B), model 23 (C) and model 27 (D) with R2 > 0.85 and Q2 > 0.75.
Conclusion
QSAR models were established, and internal and external validations were performed to evaluate their quality. The QSAR model consisting of RDF040v, SIC2, Eelec and GATS4v was considered reasonable for predicting the mutagenicity of NPAHs. This study will not only contribute to predicting the environmental exposure risk of toxicants, but will also assist in revealing the mutagenic or carcinogenic mechanisms of PAHs and related environmental pollutants.
Acknowledgement
This research was financially supported by the National Natural Science Foundation of China (No. 21778011) and the Natural Science Foundation of Beijing Municipality (No. 7162015).
References
[1] B.A.M. Bandowe, H. Meusel, Nitrated polycyclic aromatic hydrocarbons (nitro-PAHs) in the environment - A review, Sci. Total Environ. 581 (2017) 237-257.
[2] K. Misaki, T. Takamura-Enya, H. Ogawa, et al., Tumour-promoting activity of polycyclic aromatic hydrocarbons and their oxygenated or nitrated derivatives, Mutagenesis 31 (2016) 205-213.
[3] A. Feilberg, T. Nielsen, M.L. Binderup, et al., Observations of the effect of atmospheric processes on the genotoxic potency of airborne particulate matter, Atmos. Environ. 36 (2002) 4617-4625.
[4] P.P. Fu, Metabolism of nitro-polycyclic aromatic hydrocarbons, Drug Metab. Rev. 22 (1990) 209-268.
[5] T. Watanabe, M. Takashima, T. Kasai, et al., Comparison of the mutational specificity induced by environmental genotoxin nitrated polycyclic aromatic hydrocarbons in Salmonella typhimurium his genes, Mutat. Res. Genet. Toxicol. Environ. Mutagen. 394 (1997) 103-112.
[6] P. Gramatica, P. Pilutti, E. Papa, Approaches for externally validated QSAR modeling of nitrated polycyclic aromatic hydrocarbon mutagenicity, SAR QSAR Environ. Res. 18 (2007) 169-178.
[7] Reenu, Vikas, Role of exchange and correlation in the real external prediction of mutagenicity: performance of hybrid and meta-hybrid exchange-correlation functionals, RSC Adv. 5 (2015) 29238-29251.
[8] L.J. de Campos, E.B. de Melo, Modeling structure–activity relationships of prodiginines with antimalarial activity using GA/MLR and OPS/PLS, J. Mol. Graph. Model. 54 (2014) 19-31.
[9] J.G. Topliss, R.P. Edwards, Chance factors in studies of quantitative structure-activity relationships, J. Med. Chem. 22 (1979) 1238-1244.
[10] Y.L. Ding, Y.C. Lyu, M.K. Leong, In silico prediction of the mutagenicity of nitroaromatic compounds using a novel two-QSAR approach, Toxicol. in Vitro 40 (2017) 102-114.
[11] P.R. Duchowicz, A. Talevi, L.E. Bruno-Blanch, et al., New QSPR study for the prediction of aqueous solubility of drug-like compounds, Bioorg. Med. Chem. 16 (2008) 7944-7955.
[12] L.M. Saavedra, G.P. Romanelli, C.E. Rozo, et al., The quantitative structure-insecticidal activity relationships from plant derived compounds against chikungunya and zika Aedes aegypti (Diptera: Culicidae) vector, Sci. Total Environ. 610 (2017) 937-943.
[13] J. Singh, S. Singh, B. Shaik, et al., Mutagenicity of nitrated polycyclic aromatic hydrocarbons: a QSAR investigation, Chem. Biol. Drug Des. 71 (2008) 230-243.