
2018 International Conference on Modeling, Simulation and Analysis (ICMSA 2018) ISBN: 978-1-60595-544-5

Development of Prediction Models for the Mutagenicity of Nitrated-PAHs Based on Multiple Linear Regression

Wen-jing ZHANG, Li-jiao ZHAO* and Ru-gang ZHONG

Beijing Key Laboratory of Environmental and Viral Oncology, College of Life Science and Bioengineering, Beijing University of Technology, Beijing, China

*Corresponding author

Keywords: Nitrated polycyclic aromatic hydrocarbons, Mutagenicity, Quantitative structure-activity relationship, Multiple linear regression.

Abstract. Nitrated polycyclic aromatic hydrocarbons (NPAHs) are a family of toxicants that are widespread in the environment. In this study, quantitative structure-activity relationship (QSAR) models were developed for predicting the mutagenicity of NPAHs. Structural descriptors were screened, and multiple linear regression (MLR) was performed to develop the QSAR models. A total of 1706 descriptors were obtained from structures optimized using density functional theory (DFT) at the CPCM-B3LYP/6-311+G(d,p) theoretical level in water. External validation and leave-one-out cross validation were performed to confirm the predictive ability and the robustness of the models, respectively. A total of 33 QSAR models were generated using one to eight descriptors, among which the model consisting of four descriptors (Eelec, SIC2, RDF040v and GATS4v) has the highest correlation coefficient (R2=0.8755). This study will contribute not only to the prediction of the mutagenicity of NPAHs, but also to the development of QSAR modeling methods for toxicants.

Introduction

Nitrated polycyclic aromatic hydrocarbons (NPAHs) are derivatives of polycyclic aromatic hydrocarbons (PAHs) that carry at least one nitro group on the aromatic ring and are widespread in the environment. NPAHs can be generated by incomplete combustion and pyrolysis of fossil fuels and biomass; sources include exhaust gases from gasoline and diesel engines, food processing involving roasting or fumigation, household stoves or heaters, solid waste incineration and natural fires. [1] NPAHs can also be produced by photochemical reactions of PAHs in the atmosphere with gaseous oxidants, including NO2, N2O5, O3, OH radicals and NO3 radicals. Because of their close association with particulate matter and their ubiquity in polluted air, NPAHs have aroused great concern and have been analyzed in the atmosphere of different areas of the world.

Materials and Methods

Experimental Data

In the present study, the dataset used for modeling comprises 48 NPAHs with reported experimental mutagenicity in the TA100 strain of Salmonella typhimurium, obtained from the bacterial Ames test. [6,7] Information on all NPAHs, including names, CAS numbers and experimental data, is listed in Table 1. The dataset was randomly divided into a training set (32 compounds) for model construction and a test set (16 compounds) for model validation by external prediction.
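The random 32/16 partition described above can be illustrated with a minimal sketch. The seed, the function name `split_dataset` and the use of integer compound numbers are assumptions made for illustration; the paper does not report how the random division was carried out.

```python
import random

def split_dataset(compound_ids, n_train=32, seed=0):
    """Randomly partition compound IDs into a training set and a test set."""
    rng = random.Random(seed)      # fixed seed (assumed) so the split is reproducible
    shuffled = list(compound_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])

compound_ids = list(range(1, 49))  # compounds numbered 1..48, as in Table 1
train_ids, test_ids = split_dataset(compound_ids)
print(len(train_ids), len(test_ids))
```

Fixing the seed makes the partition repeatable, which matters because with only 48 compounds the external validation statistics can depend noticeably on which 16 compounds land in the test set.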

Table 1. Information of the 48 NPAHs used for QSAR modeling.

No.  Name               CAS No.    Formula    logTA100  Set
47   9-Nitroanthracene  602-60-8   C14H9NO2   0.26      Training set
48   6-Nitrochrysene    7496-02-8  C18H11NO2  2.93      Training set

Geometry Optimization and Molecular Descriptors

The three-dimensional molecular structures were built using GaussView software. Full geometry optimizations were performed by density functional theory (DFT) using Becke’s three-parameter hybrid exchange functional with the Lee, Yang, and Parr correlation functional (B3LYP) at the 6-311+G(d,p) theoretical level. To simulate the intracellular aqueous environment, the geometries were optimized with the conductor-like polarizable continuum model (CPCM) in water. Vibrational frequency calculations were performed to verify that the optimized structures were true energy minima. All calculations were performed with the Gaussian 09 program package. Based on the optimized geometries, 14 quantum-chemical structural descriptors were obtained, including the energy of the highest occupied molecular orbital (EHOMO), the energy of the lowest unoccupied molecular orbital (ELUMO), the HOMO-LUMO energy gap (EGAP), heat of formation (H), total potential energy (Eelec), dipole moment (d), molecular diameter (a0), molecular volume (V), electronegativity (χ), chemical hardness (η), RMS gradient norm (RMS), chemical softness (σ), dipole moment (μ) and electrophilicity index (ω). Based on the optimized structures, 1666 molecular descriptors were generated by E-Dragon software and subdivided into 20 logical blocks: constitutional descriptors, functional group counts, atom-centered fragments, topological descriptors, walk and path counts, connectivity indices, information indices, 2D autocorrelations, edge adjacency indices, BCUT descriptors, topological charge indices, eigenvalue-based indices, geometrical descriptors, 3D matrix-based descriptors, RDF descriptors, WHIM descriptors, GETAWAY descriptors, Randic molecular profiles, molecular properties, and charge descriptors. Another 26 descriptors were obtained from the ChemBioOffice package and the Molinspiration Property Calculator. In total, 1706 molecular descriptors were obtained for constructing a quantitative structure-activity relationship (QSAR) model to predict the mutagenicity of NPAHs.
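Several of the quantum-chemical descriptors listed above (EGAP, χ, η, σ and ω) are commonly derived from the frontier orbital energies. The paper does not state which formulas were used, so the sketch below assumes the widely used Koopmans-type expressions from conceptual DFT; conventions differ between authors (for example, hardness is sometimes defined without the factor 1/2), and the orbital energies in the example are hypothetical values, not data from this study.

```python
def frontier_descriptors(e_homo, e_lumo):
    """Derive reactivity descriptors from HOMO/LUMO energies (same energy unit).

    Assumed definitions (one common convention, not confirmed by the paper):
      gap   = E_LUMO - E_HOMO
      chi   = -(E_HOMO + E_LUMO) / 2      electronegativity
      eta   = (E_LUMO - E_HOMO) / 2       chemical hardness
      sigma = 1 / eta                     chemical softness
      omega = chi**2 / (2 * eta)          electrophilicity index
    """
    gap = e_lumo - e_homo
    chi = -(e_homo + e_lumo) / 2.0
    eta = gap / 2.0
    sigma = 1.0 / eta
    omega = chi ** 2 / (2.0 * eta)
    return {"gap": gap, "chi": chi, "eta": eta, "sigma": sigma, "omega": omega}

# Hypothetical orbital energies in hartree, for illustration only.
example = frontier_descriptors(e_homo=-0.25, e_lumo=-0.10)
for name, value in example.items():
    print(f"{name:6s} {value:.4f}")
```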

Variable Reduction

Before QSAR modeling, variable reduction was performed to eliminate redundant descriptors by removing those that were missing for at least one compound or that showed little or no discrimination among the compounds. As a result, the number of descriptors was reduced to 1306. To ensure the quality of the developed models, descriptors with an absolute Pearson correlation coefficient to the biological activity (|r|) below 0.7 were excluded, a stricter criterion than the |r| ≤ 0.3 exclusion level reported by de Campos and de Melo [8]. The descriptors were then further screened to remove those with high intercorrelation but relatively low correlation with the mutagenicity. Following the suggestion of Topliss and Edwards [9,10], only one descriptor was retained among descriptors with intercorrelation values of r2 ≥ 0.8, to reduce the probability of spurious correlations. Finally, as listed in Table 2, three quantum-chemical descriptors (Eelec, a0 and ω) and eight Dragon descriptors, which have correlation coefficients to the logTA100 values higher than 0.7 and are not intercorrelated, were selected for the QSAR modeling.
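The three screening steps described above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' implementation: "little or no discrimination" is simplified here to zero variance, and among intercorrelated descriptors the one with the highest |r| to the activity is kept. The function name, the thresholds as parameters and the toy data are all assumptions.

```python
import numpy as np

def reduce_descriptors(X, y, names, r_min=0.7, r2_max=0.8):
    """Return the names of descriptors that survive the three screening steps."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: drop descriptors with missing values or no variation.
    keep = [j for j in range(X.shape[1])
            if not np.isnan(X[:, j]).any() and np.std(X[:, j]) > 0]
    # Step 2: keep descriptors whose |r| with the activity is at least r_min.
    r_y = {j: np.corrcoef(X[:, j], y)[0, 1] for j in keep}
    keep = [j for j in keep if abs(r_y[j]) >= r_min]
    # Step 3: visit descriptors from strongest to weakest |r| and retain one
    # only if its r^2 with every already retained descriptor is below r2_max.
    keep.sort(key=lambda j: -abs(r_y[j]))
    selected = []
    for j in keep:
        if all(np.corrcoef(X[:, j], X[:, k])[0, 1] ** 2 < r2_max for k in selected):
            selected.append(j)
    return [names[j] for j in selected]

# Toy data: 10 "compounds", 6 candidate descriptors (all values hypothetical).
y = np.arange(10.0)
w = np.array([1.0, -1.0] * 5)                   # alternating pattern, weakly related to y
X = np.column_stack([
    y,                                          # a: perfectly correlated with y
    2 * y + 1,                                  # b: duplicate information of a
    np.ones(10),                                # c: constant, no discrimination
    np.where(np.arange(10) == 3, np.nan, y),    # d: missing for one compound
    w,                                          # e: |r| with y far below 0.7
    y + 2 * w,                                  # f: |r| about 0.79, r^2 with a below 0.8
])
print(reduce_descriptors(X, y, ["a", "b", "c", "d", "e", "f"]))
```

Visiting the descriptors in order of decreasing |r| is one simple way to make sure that, within each intercorrelated group, the descriptor most strongly related to the activity is the one that remains.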

Table 2. Definition of the descriptors screened out for QSAR modeling.

Descriptor  Definition
SIC2        Structural information content (neighborhood symmetry of 2-order)
RDF040v     Radial Distribution Function - 4.0 / weighted by atomic van der Waals volumes
GATS4v      Geary autocorrelation - lag 4 / weighted by atomic van der Waals volumes
RDF080u     Radial Distribution Function - 8.0 / unweighted
HIC         Mean information content on the leverage magnitude
R1p         R maximal autocorrelation of lag 1 / weighted by atomic polarizabilities
L2u         2nd component size directional WHIM index / unweighted
Mor21p      3D-MoRSE - signal 21 / weighted by atomic polarizabilities
Eelec       The total energy at the B3LYP level / a.u.
a0          Molecular radius / angstrom

Model Development and Validation

In this study, the QSAR models were built by multiple linear regression (MLR), which has proved applicable across disciplines for establishing linear predictive models. [11,12] The dataset was split into a training set and a test set, and the test set was used to validate the models developed on the training set. To ensure the quality of the obtained QSAR models, internal and external validations were performed to confirm the reliability, robustness and stability of the models by examining the correlation coefficient (R2), the root mean square error (RMSE) and Q2LOO after leave-one-out cross validation (LOO-CV). The values of R2 and Q2 were calculated according to equation (1), where $\hat{y}_i$, $y_i$ and $\bar{y}$ are the predicted, experimental and mean experimental logTA100 values, respectively. The RMSE values, which indicate the errors in internal and external validation, were calculated by equation (2). In both equations, $n$ is the number of samples.

$$R^2,\ Q^2 = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (1)$$

$$\mathrm{RMSE} = \left( \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n} \right)^{1/2} \qquad (2)$$
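Equations (1) and (2) can be checked numerically with a short sketch. It assumes that the mean in the denominator of equation (1) is taken over the experimental values, which is the usual convention; the four data points are made up for the arithmetic check and are not from the study.

```python
import numpy as np

def r_squared(y_obs, y_pred):
    """Equation (1): 1 - SS_res / SS_tot, with the mean taken over observed values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_pred - y_obs) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_obs, y_pred):
    """Equation (2): square root of the mean squared error."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

# Hypothetical values: SS_res = 0.5, SS_tot = 5.0, so R2 = 0.9.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]
print(r_squared(y_obs, y_pred), rmse(y_obs, y_pred))
```

The same `r_squared` function yields Q2 when `y_pred` holds cross-validated predictions instead of fitted values.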

Table 3. Different numbers and combinations of the descriptors used for QSAR modeling.

Model No.  Number of descriptors  Descriptors
1          1                      E
2          1                      a0
3          1                      ω
4          1                      SIC2
5          2                      E SIC2
6          2                      a0 SIC2
7          2                      ω SIC2
8          2                      SIC2 RDF040v
9          2                      ω a0
10         3                      E a0 ω
11         3                      E SIC2 RDF040v
12         3                      a0 SIC2 RDF040v
13         3                      ω SIC2 RDF040v
14         3                      SIC2 RDF040v GATS4v
15         4                      E SIC2 RDF040v GATS4v
16         4                      a0 SIC2 RDF040v GATS4v
17         4                      ω SIC2 RDF040v GATS4v
18         4                      SIC2 RDF040v RDF080u HIC
19         5                      E SIC2 RDF040v RDF080u HIC
20         5                      a0 SIC2 RDF040v RDF080u HIC
21         5                      ω SIC2 RDF040v RDF080u HIC
22         5                      SIC2 RDF040v RDF080u HIC R1p
23         6                      E SIC2 RDF040v RDF080u HIC R1p
24         6                      a0 SIC2 RDF040v RDF080u HIC R1p
25         6                      ω SIC2 RDF040v RDF080u HIC R1p
26         6                      SIC2 RDF040v RDF080e HIC R1p L2u
27         7                      E SIC2 RDF040v RDF080e HIC R1p L2u
28         7                      a0 SIC2 RDF040v RDF080e HIC R1p L2u
29         7                      ω SIC2 RDF040v RDF080e HIC R1p L2u
30         7                      SIC2 RDF040v RDF080e HIC R1p L2u Mor21p
32         8                      a0 SIC2 RDF040v RDF080e HIC R1p L2u Mor21p
33         8                      ω SIC2 RDF040v RDF080e HIC R1p L2u Mor21p

Results and Discussion

As listed in Table 3, a total of 33 QSAR models were constructed by MLR based on different numbers and combinations of the 11 descriptors screened by correlation analysis. For the four models consisting of only one descriptor, relatively low correlation (R2 < 0.6) and high dispersion (RMSE > 1.1) were observed, which means that the one-descriptor models are not suitable for predicting the mutagenicity of NPAHs. Therefore, more descriptors are required for constructing the QSAR models. LOO-CV is frequently used for evaluating the predictive ability of a statistical model, and the cross-validated squared correlation coefficient Q2LOO is an important parameter reflecting the quality of the model. [13]

According to previously reported studies, QSAR models are acceptable when the values of R2 and Q2 are higher than 0.6. [14] In this study, the values of R2 and Q2 are higher than 0.7 in the models consisting of more than 3 descriptors, which suggests that the number of descriptors has a significant effect on the quality of the models. As shown in Figure 1, the four proposed models consisting of 4, 5, 6 and 7 descriptors (models 15, 19, 23 and 27, respectively) have higher R2 and Q2 values (R2 > 0.85 and Q2 > 0.75) than the other 29 models. The model consisting of 4 descriptors (model 15 in Figure 1A), including RDF040v, SIC2, Eelec and GATS4v, has the highest values of R2 (0.8775) and Q2 (0.8418) and the lowest value of RMSE (0.6131), which means that this QSAR model is the most robust of the obtained models for predicting the mutagenicity of NPAHs. Moreover, the results indicate that the five descriptors ω, a0, RDF040v, RDF080e and HIC show positive correlation with the logTA100 values, while the five descriptors Eelec, L2u, R1p, SIC2 and GATS4v show negative correlation.

Figure 1. Scatter plots for model 15 (A), model 19 (B), model 23(C) and model 27 (D) with R2 > 0.85 and Q2 > 0.75.
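The MLR fitting and leave-one-out procedure discussed in this section can be sketched with ordinary least squares. The data below are synthetic stand-ins (32 samples, 4 descriptors, arbitrary coefficients and noise level), not the NPAH dataset, and the helper names are hypothetical. Q2LOO is computed here with the mean of all experimental values in the denominator; some authors use the mean of each training fold instead.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit; returns [intercept, slope_1, ..., slope_k]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply a fitted linear model to a 2D descriptor matrix."""
    return coef[0] + X @ coef[1:]

def r2_score(y_obs, y_pred):
    """1 - SS_res / SS_tot, as in equation (1)."""
    return float(1.0 - np.sum((y_pred - y_obs) ** 2)
                 / np.sum((y_obs - y_obs.mean()) ** 2))

def loo_predictions(X, y):
    """Predict each sample from a model fitted on all the other samples."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                 # leave sample i out
        coef = fit_mlr(X[mask], y[mask])
        preds[i] = predict_mlr(coef, X[i:i + 1])[0]
    return preds

# Synthetic stand-in for a 32-compound training set with 4 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = X @ np.array([1.0, -0.5, 0.8, -0.3]) + 0.2 * rng.normal(size=32)

coef = fit_mlr(X, y)
r2 = r2_score(y, predict_mlr(coef, X))
q2 = r2_score(y, loo_predictions(X, y))
acceptable = r2 > 0.6 and q2 > 0.6               # acceptance thresholds quoted in the text
print(f"R2 = {r2:.3f}, Q2_LOO = {q2:.3f}, acceptable = {acceptable}")
```

For a least-squares model, Q2LOO computed this way can never exceed R2, so a large drop from R2 to Q2LOO signals that the fit depends heavily on individual compounds.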

Conclusion

QSAR models were established, and then internal and external validations were performed to evaluate the quality of the obtained models. Finally, the QSAR model consisting of RDF040v, SIC2, Eelec and GATS4v was considered reasonable for predicting the mutagenicity of NPAHs. This study will not only contribute to the prediction of the environmental exposure risk of toxicants, but will also assist in revealing the mutagenic or carcinogenic mechanisms of PAHs and related environmental pollutants.

Acknowledgement

This research was financially supported by the National Natural Science Foundation of China (No. 21778011) and the Natural Science Foundation of Beijing Municipality (No. 7162015).

References

[1] B.A.M. Bandowe, H. Meusel, Nitrated polycyclic aromatic hydrocarbons (nitro-PAHs) in the environment - A review, Sci. Total Environ. 581 (2017) 237-257.

[2] K. Misaki, T. Takamura-Enya, H. Ogawa, et al., Tumour-promoting activity of polycyclic aromatic hydrocarbons and their oxygenated or nitrated derivatives, Mutagenesis 31 (2016) 205-213.

[3] A. Feilberg, T. Nielsen, M.L. Binderup, et al., Observations of the effect of atmospheric processes on the genotoxic potency of airborne particulate matter, Atmos. Environ. 36 (2002) 4617-4625.

[4] P.P. Fu, Metabolism of nitro-polycyclic aromatic hydrocarbons, Drug Metab. Rev. 22 (1990) 209-268.

[5] T. Watanabe, M. Takashima, T. Kasai, et al., Comparison of the mutational specificity induced by environmental genotoxin nitrated polycyclic aromatic hydrocarbons in Salmonella typhimurium his genes, Mutat. Res. Genet. Toxicol. Environ. Mutagen. 394 (1997) 103-112.

[6] P. Gramatica, P. Pilutti, E. Papa, Approaches for externally validated QSAR modeling of nitrated polycyclic aromatic hydrocarbon mutagenicity, SAR QSAR Environ. Res. 18 (2007) 169-178.

[7] Reenu, Vikas, Role of exchange and correlation in the real external prediction of mutagenicity: performance of hybrid and meta-hybrid exchange-correlation functionals, RSC Adv. 5 (2015) 29238-29251.

[8] L.J. de Campos, E.B. de Melo, Modeling structure–activity relationships of prodiginines with antimalarial activity using GA/MLR and OPS/PLS, J. Mol. Graph. 54 (2014) 19-31.

[9] J.G. Topliss, R.P. Edwards, Chance factors in studies of quantitative structure-activity relationships, J. Med. Chem. 22 (1979) 1238-1244.

[10] Y.L. Ding, Y.C. Lyu, M.K. Leong, In silico prediction of the mutagenicity of nitroaromatic compounds using a novel two-QSAR approach, Toxicol. in Vitro 40 (2017) 102-114.

[11] P.R. Duchowicz, A. Talevi, L.E. Bruno-Blanch, et al., New QSPR study for the prediction of aqueous solubility of drug-like compounds, Bioorg. Med. Chem. 16 (2008) 7944-7955.

[12] L.M. Saavedra, G.P. Romanelli, C.E. Rozo, et al., The quantitative structure-insecticidal activity relationships from plant derived compounds against chikungunya and zika Aedes aegypti (Diptera: Culicidae) vector, Sci. Total Environ. 610 (2017) 937-943.

[13] J. Singh, S. Singh, B. Shaik, et al., Mutagenicity of nitrated polycyclic aromatic hydrocarbons: a QSAR investigation, Chem. Biol. Drug Des. 71 (2008) 230-243.
