Chapter 4 Predicting with confidence the PCE of new dyes in DSSC
4.5 Construction of predictive model for PCE
4.5.2 A rigorous model
A more rigorous procedure was based on the construction of a generalized linear model where the ηexp was initially expressed as
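$$\eta_{\mathrm{exp}} = \beta_0 + g_{\Delta G_n}(\Delta G_n; \boldsymbol{\beta}_1) + g_{\lambda_n}(\lambda_n; \boldsymbol{\beta}_2) + \beta_3 S + \beta_4\,\mathrm{NDD} + \beta_5\,\mathrm{OA} \qquad \text{(Eq. 4.7)}$$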
where gΔGn(ΔGn; β1) was the link function for ΔGn with parameter vector β1 to be estimated, gλn(λn; β2) was the link function of λn with parameter vector β2 to be estimated, and β3, β4 and β5 were the parameters to be estimated for S, NDD and OA respectively. Eq. 4.7 was linear in S, NDD and OA and contained linear and non-linear components in ΔGn and λn, although the overall function would still be linear in all the parameters. In particular, gΔGn(ΔGn; β1) and gλn(λn; β2) expanded ΔGn and λn into restricted cubic splines with parameter vectors β1 and β2, respectively.207 The spline expansions were defined uniquely from the data for ΔGn and λn; this was a well-established methodology to include non-linear terms in regression procedures where the analytical form of the non-linearity could not be derived from a physical basis.
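As an illustration of the spline expansion described above, the following minimal sketch builds a restricted cubic spline basis for a single predictor using a common truncated-power formulation. The knot positions, the normalisation by the squared knot range and the function name `rcs_basis` are assumptions made for illustration; the knots and software actually used for Eq. 4.7 are not stated in this section.

```python
def rcs_basis(x, knots):
    """Return restricted cubic spline basis values for a scalar x.

    With k knots the basis has k - 1 columns: the linear term x plus
    k - 2 non-linear terms constructed so that the fitted curve is
    linear below the first knot and above the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least three knots")

    def pos3(u):
        # Truncated cubic (u)_+^3: zero for u <= 0, u**3 otherwise.
        return u ** 3 if u > 0 else 0.0

    span = t[-1] - t[-2]          # distance between the last two knots
    scale = (t[-1] - t[0]) ** 2   # assumed normalisation, keeps columns on the scale of x
    basis = [x]
    for j in range(k - 2):
        term = (
            pos3(x - t[j])
            - pos3(x - t[-2]) * (t[-1] - t[j]) / span
            + pos3(x - t[-1]) * (t[-2] - t[j]) / span
        )
        basis.append(term / scale)
    return basis


# Hypothetical knot positions, not taken from the fitted model.
knots = [-1.0, -0.5, 0.0, 0.5]
for x in (-1.5, -0.25, 0.75, 1.0):
    print(x, rcs_basis(x, knots))
```

Each column of the basis receives its own coefficient, so a model built on it remains linear in the parameters even though it is non-linear in the predictor.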
Eq. 4.7 contained too many fitting parameters for the 52 data points available, violating the empirical rule of thumb of allowing at most one parameter per 10 data points (about five parameters for this data set). The initial fitting was therefore performed using a statistical penalization procedure,207 based on using the corrected AIC (Eq. 2.26) to penalize the likelihood ratio (LR) of the model. The analysis of variance (ANOVA) (Table A4.3) of the fitting confirmed that there was no evidence of correlation between the predictors S, NDD, OA and η. A reduced model was therefore built from the total model (Eq. 4.7) by “simplification by approximation”,207 as introduced in section 2.3.2, which produced the fitting as
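$$\eta_{\mathrm{exp}} = \beta_0 + g_{\Delta G_n}(\Delta G_n; \boldsymbol{\beta}_1) + g_{\lambda_n}(\lambda_n; \boldsymbol{\beta}_2) \qquad \text{(Eq. 4.8)}$$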
The standard deviation of the residuals for this more advanced model was 1.71% and, as before, it was possible to predict the probability that η was higher than a given threshold for any values of computed ΔGn and λn. Fig. 4.5 shows a map of the probability that η exceeds 7% according to this more accurate model. The differences between the intuitive and the rigorous procedures were not large, but the rigorous procedure guaranteed that the effect of potentially more complex non-linearities was not neglected. In addition, the functional form in Eq. 4.8 produced more conservative estimates outside the region where data points were present, while the polynomial fit of Eq. 4.5 produced unphysical estimates in these regions. The calibration graph of the model in Eq. 4.8, obtained by bootstrap re-sampling (Fig. A4.4), shows that this model is much more appropriate than the model based on Eq. 4.5 (cf. Fig. A4.3).207
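The step from a predicted η to a probability of exceeding a threshold can be sketched as follows. This is a simplified illustration that assumes normally distributed residuals with the quoted standard deviation of 1.71% and ignores the uncertainty of the fitted coefficients; the procedure actually used to build the map may differ. The function name `prob_exceeds` and the example predictions are hypothetical.

```python
import math


def prob_exceeds(eta_pred, threshold, sigma):
    """P(eta > threshold), assuming eta ~ Normal(eta_pred, sigma)."""
    z = (threshold - eta_pred) / sigma
    # Upper-tail probability of the standard normal via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


SIGMA = 1.71      # residual standard deviation (% PCE) quoted for Eq. 4.8
THRESHOLD = 7.0   # PCE threshold (%) used for the map in Fig. 4.5

# Hypothetical model predictions (% PCE), for illustration only.
for eta_pred in (4.0, 7.0, 9.0):
    p = prob_exceeds(eta_pred, THRESHOLD, SIGMA)
    print(f"predicted eta = {eta_pred:.1f}%  ->  P(eta > 7%) = {100 * p:.1f}%")
```

Evaluating such a function on a grid of (ΔGn, λn) values, with the model prediction at each grid point in place of `eta_pred`, would give a map of the kind described.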
The proposed map could be used either to direct the synthesis of new dyes where the maximum η was predicted, or to prepare dyes in the region of the map where there were few or no data points, to learn more about the system in these conditions. The advantage of this statistical approach was that the confidence intervals of the prediction included both the existence of effects and parameters that were not included in the predictors, as well as the inaccuracies of both the computational chemistry and the experimental measurements: all inaccuracies and missing effects would simply decrease the level of confidence of the prediction. Considering that new families of DSSC were being used, for instance with different electrolytes, it was believed that the construction of a similar map, with perhaps a larger set of data and predictors, should constitute a priority in the rational exploration of the chemical space.
Figure 4.5 Map of the probability (%) that η exceeds 7% as a function of the computed parameters ΔGn, λn based on Eq. 4.8.
It is also important to stress the difference between our approach, where correlations were searched for between computables and a target experimental property, and the alternative computational tools for material discovery that generated a large set of “theoretical” materials and directly computed the property of interest, such as the band gap or other electronic properties. The latter
approach was particularly suitable when the underlying physics was relatively well-understood and the direct computation of the property of interest was possible. For n-type DSSCs it was not possible to compute the η from first principles and a closer alliance between theory and experiment was therefore necessary.
Finally, such analysis in larger and unbiased data sets offered the best opportunity to validate some hypotheses put forward to describe the physics of DSSC. After considering the results, it was not too surprising that the overlap with the solar radiation did not correlate with the PCE, possibly because cells with small absorptance were not even reported and, beyond a threshold of absorptance, the PCE did not change. On the other hand, it was quite surprising to see that there was no effect of having the HOMO and LUMO localized in different regions of the dye, considering the enormous effort put into the preparation of large families of D-π-A dyes. The effect of D-π-A character on the PCE of dyes was further investigated in Chapter 5.
4.6 Conclusions
A general method was proposed to predict the PCE of n-type DSSCs with new dyes from easily computable quantities, including, for the first time, the degree of confidence of such predictions. Carboxylated organic dyes studied with iodide/tri-iodide electrolyte were considered, but the method could be applied to a different family of DSSCs and the accuracy of its prediction could be improved over time by expanding the set of data and/or the set of predictors.
4.7 Appendix
A4.1 Correlations between PCE and Jsc, and PCE and Voc
Fig. A4.1 shows the correlations between PCE and Jsc, and PCE and Voc of the 52 dyes in the dataset. The Pearson’s r indicated that the PCE was more strongly influenced by the Jsc, suggesting that it was more suitable to use predictors that were relevant to the dye’s properties, rather than other components that influenced the Voc of the device, such as the electrolyte’s properties.
Figure A4.1 (a) Correlation between PCE and Jsc of the 52 dyes in the dataset. (b)
Correlation between PCE and Voc of the 52 dyes in the dataset.
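The comparison of the two correlations can be illustrated with a minimal sketch of the Pearson correlation coefficient. The numerical values below are invented for illustration and are not the 52-dye dataset; the function name `pearson_r` is likewise hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only (Jsc in mA/cm^2, Voc in V, PCE in %).
jsc = [8.1, 10.4, 12.9, 14.2, 15.8]
voc = [0.62, 0.70, 0.66, 0.71, 0.68]
pce = [3.6, 5.0, 5.9, 7.1, 7.4]

print("r(PCE, Jsc) =", round(pearson_r(jsc, pce), 3))
print("r(PCE, Voc) =", round(pearson_r(voc, pce), 3))
```

A larger r for Jsc than for Voc in such a comparison is the kind of evidence used above to favour predictors tied to the dye's properties.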