Chapter 2 – Sample Preparation for Long Wavelength M
2.6. Appendix
2.6.1.1. XSCALE.INP
!MAXIMUM_NUMBER_OF_PROCESSORS=16
!RESOLUTION_SHELLS= 10 6 4 3 2.5 2.0 1.8 1.7 1.6
!SPACE_GROUP_NUMBER=19
!UNIT_CELL_CONSTANTS=65.46 108.41 113.15 90.000 90.000 90.000
!REIDX=-1 0 0 0 0 -1 0 0 0 0 -1 0
!REFERENCE_DATA_SET= fae-rm.ahkl
!MINIMUM_I/SIGMA=3.0
!REFLECTIONS/CORRECTION_FACTOR=50   !minimum #reflections/correction_factor
!0-DOSE_SIGNIFICANCE_LEVEL=0.10
!WFAC1=1.5   ! factor applied to e.s.d.'s before testing equivalent reflections
!SAVE_CORRECTION_IMAGES= FALSE   ! TRUE is default
OUTPUT_FILE=datasetX_Y.HKL   !at minimum of f'
FRIEDEL'S_LAW=FALSE   !TRUE
MERGE=FALSE   !TRUE
STRICT_ABSORPTION_CORRECTION=TRUE   !FALSE is default
INPUT_FILE= .../datasetX_Y/XDS_ASCII.HKL
! INCLUDE_RESOLUTION_RANGE= 20 1.6
! CORRECTIONS= DECAY MODULATION ABSORPTION
! CRYSTAL_NAME=Seleno1   !Remove first "!" to switch on 0-dose extrapolation
! STARTING_DOSE=0.0 DOSE_RATE=1.0   !Use defaults for 0-dose extrapolation
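The template uses datasetX_Y as a placeholder, so each dataset needs its own copy. When many datasets are processed, the file can be generated rather than edited by hand. This is a minimal sketch assuming one directory per dataset, named after the dataset; `write_xscale_inp` is a hypothetical helper, the INPUT_FILE path is passed in because the template's path is elided, and only the uncommented keywords are written (commented-out keywords such as UNIT_CELL_CONSTANTS, which the data-extraction script in 2.6.1.4 greps for, would need to be added back if used).

```bash
#!/usr/bin/env bash
# Hypothetical helper: write_xscale_inp DATASET INPUT_HKL
# Writes DATASET/XSCALE.INP with the active keywords of the template,
# replacing the datasetX_Y placeholder with DATASET.
# Commented-out template keywords are deliberately not written.
write_xscale_inp() {
  local name="$1" input="$2"
  mkdir -p "$name"
  # Unquoted EOF so ${name} and ${input} are expanded
  cat > "$name/XSCALE.INP" <<EOF
OUTPUT_FILE=${name}.HKL
FRIEDEL'S_LAW=FALSE
MERGE=FALSE
STRICT_ABSORPTION_CORRECTION=TRUE
INPUT_FILE= ${input}
EOF
}
```

For example, `write_xscale_inp dataset1_1 "$PWD/dataset1_1/XDS_ASCII.HKL"` creates dataset1_1/XSCALE.INP with OUTPUT_FILE=dataset1_1.HKL, matching the file name that Pointless reads in the next script.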
2.6.1.2. XSCALE and Pointless
module load ccp4
module load XDS
module load global/cluster
cd dataset1_1
xscale
pointless -copy xdsin dataset1_1.HKL hklout pointless.mtz
echo "done"
cd ..
cd dataset1_2
xscale
pointless -copy xdsin dataset1_2.HKL hklout pointless.mtz
echo "done"
cd ..
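The same two steps are repeated verbatim for every dataset directory, so a loop can replace the copy-and-paste. This is a sketch assuming each directory already contains an XSCALE.INP whose OUTPUT_FILE is `<directory name>.HKL` and that the environment-modules `module` command is available; `process_datasets` is a hypothetical name. Each dataset runs in a subshell so that a failed `cd` cannot leave later commands in the wrong directory. Success is judged from exit status only, which may not catch every XSCALE problem, so the logs should still be checked.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run XSCALE, then Pointless, in each dataset directory.
# Usage: process_datasets dataset1_1 dataset1_2 ...
process_datasets() {
  module load ccp4
  module load XDS
  module load global/cluster
  local ds name
  for ds in "$@"; do
    name=$(basename "$ds")   # XSCALE output is assumed to be <name>.HKL
    # Subshell: the cd only affects this dataset, never the caller
    if ( cd "$ds" && xscale &&
         pointless -copy xdsin "${name}.HKL" hklout pointless.mtz ); then
      echo "done: $ds"
    else
      echo "failed: $ds" >&2
    fi
  done
}
```

Calling `process_datasets dataset1_1 dataset1_2` reproduces the script above; further datasets are simply appended to the argument list.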
2.6.1.3. Aimless
module load ccp4
module load XDS
module load global/cluster
cd dataset1_1
aimless hklin pointless.mtz hklout aimless.mtz << eof > aimless.log
run 1 all
scales constant
ANALYSIS isigminimum 2
resolution 60.0 0.00
anomalous on
eof
cd ..
cd dataset1_2
aimless hklin pointless.mtz hklout aimless.mtz << eof > aimless.log
run 1 all
scales constant
ANALYSIS isigminimum 2
resolution 60.0 0.00
anomalous on
eof
cd ..
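Because the Aimless keywords are identical for every dataset, they can be defined once and fed to each run. A sketch under the same assumptions as the Pointless loop (one directory per dataset, `module` available); `aimless_keywords` and `run_aimless` are hypothetical names, and the keywords are copied unchanged from the script above.

```bash
#!/usr/bin/env bash
# Aimless keywords applied to every dataset (unchanged from the script above)
aimless_keywords() {
  cat <<'EOF'
run 1 all
scales constant
ANALYSIS isigminimum 2
resolution 60.0 0.00
anomalous on
EOF
}

# Hypothetical helper: scale each dataset's pointless.mtz with Aimless.
# Usage: run_aimless dataset1_1 dataset1_2 ...
run_aimless() {
  module load ccp4
  module load XDS
  module load global/cluster
  local ds
  for ds in "$@"; do
    # Keywords reach Aimless on stdin; aimless.log is written in the dataset directory
    if aimless_keywords | ( cd "$ds" && aimless hklin pointless.mtz hklout aimless.mtz > aimless.log ); then
      echo "done: $ds"
    else
      echo "failed: $ds" >&2
    fi
  done
}
```

Keeping the keywords in one function means a change such as a new resolution cut-off is made once rather than in every copy of the heredoc.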
2.6.1.4. Data Extraction
echo dataset1_1 hc1 > Protein_Wavelength_Results
grep " Overall InnerShell OuterShell" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Low resolution limit" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "High resolution limit" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Rmerge (within I+/I-)" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Rmerge (all I+ and I-)" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Rmeas (within I+/I-)" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Rmeas (all I+ & I-)" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Rpim (within I+/I-)" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Rpim (all I+ & I-)" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Rmerge in top intensity bin" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Total number of observations" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Total number unique" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Mean((I)/sd(I))" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Mn(I) half-set correlation CC(1/2)" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Completeness " dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Multiplicity " dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Anomalous completeness " dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Anomalous multiplicity " dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "DelAnom correlation between half-sets" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "Mid-Slope of Anom Normal Probability" dataset1_1/aimless.log >> Protein_Wavelength_Results
grep "REFLECTING_RANGE_E.S.D.=" dataset1_1/XDS_ASCII.HKL >> Protein_Wavelength_Results
grep "UNIT_CELL_CONSTANTS=" dataset1_1/XSCALE.INP >> Protein_Wavelength_Results
echo >> Protein_Wavelength_Results
echo >> Protein_Wavelength_Results
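The extraction is a fixed list of search strings applied to each dataset's files, so it can be written as one loop over that list. This is a sketch assuming the directory layout used above; `extract_stats` is a hypothetical name, the label argument stands in for the `hc1` tag whose meaning is not defined here, and `grep -F` is used so that characters such as `(`, `+` and `.` in the search strings are matched literally.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the summary lines for one dataset to RESULTS.
# Usage: extract_stats DATASET_DIR LABEL RESULTS_FILE
extract_stats() {
  local ds="$1" label="$2" results="$3" p
  # Fixed strings copied from the per-line greps above
  local patterns=(
    " Overall InnerShell OuterShell"
    "Low resolution limit"
    "High resolution limit"
    "Rmerge (within I+/I-)"
    "Rmerge (all I+ and I-)"
    "Rmeas (within I+/I-)"
    "Rmeas (all I+ & I-)"
    "Rpim (within I+/I-)"
    "Rpim (all I+ & I-)"
    "Rmerge in top intensity bin"
    "Total number of observations"
    "Total number unique"
    "Mean((I)/sd(I))"
    "Mn(I) half-set correlation CC(1/2)"
    "Completeness "
    "Multiplicity "
    "Anomalous completeness "
    "Anomalous multiplicity "
    "DelAnom correlation between half-sets"
    "Mid-Slope of Anom Normal Probability"
  )
  echo "$ds $label" >> "$results"
  for p in "${patterns[@]}"; do
    # "|| true": a statistic missing from the log should not abort the run
    grep -F -- "$p" "$ds/aimless.log" >> "$results" || true
  done
  grep -F "REFLECTING_RANGE_E.S.D.=" "$ds/XDS_ASCII.HKL" >> "$results" || true
  grep -F "UNIT_CELL_CONSTANTS=" "$ds/XSCALE.INP" >> "$results" || true
  printf '\n\n' >> "$results"   # two blank lines between datasets
}
```

Starting from an empty file, for example `: > Protein_Wavelength_Results` followed by `for ds in dataset1_1 dataset1_2; do extract_stats "$ds" hc1 Protein_Wavelength_Results; done`, collects every dataset into one results file.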
2.6.2. Statistical Tests
2.6.2.1. Mann-Whitney U-test
All data from both populations were pooled, sorted in ascending order and assigned ranks from 1 to $n$, where $n$ is the total number of crystals in the combined populations. Summing the ranks of population one gives the rank sum $R_1$, which was used to calculate $U_1$ using equation (2.1),
$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 \tag{2.1}$$
where $n_1$ and $n_2$ are the numbers of samples in populations one and two, respectively. $U_1$ is the number of times a data point from population one is smaller than a data point from population two when all possible pairs are compared, each pair taking one sample from each population. $U_1$ was used to calculate $U_2$ using equation (2.2),
$$U_2 = n_1 n_2 - U_1 \tag{2.2}$$
where $U_2$ is the equivalent of $U_1$ for population two. For larger samples ($n_1$ and $n_2$ > 10), the sampling distribution of U is approximately normal. Using equation (2.3),
$$Z = \frac{2U - n_1 n_2}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{3}}} \tag{2.3}$$
we can transform the U statistic (the larger of $U_1$ and $U_2$) into a Z-value that can be compared with the standard normal distribution. Observed Z-values were compared with the critical Z-value of 1.96, which bounds the central 95% of the standard normal distribution (two-tailed α = 0.05). The null hypothesis was rejected if the absolute value of the calculated Z exceeded this critical value.
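The ranking and equations (2.1) to (2.3) can be checked with a short script. This is a sketch in bash and awk, in keeping with the processing scripts above; `mann_whitney` is a hypothetical name, tied values receive the average of the ranks they span (an assumption, since the text does not say how ties were handled), and no tie correction is applied to the variance term of equation (2.3).

```bash
#!/usr/bin/env bash
# Sketch: Mann-Whitney U and normal-approximation Z, following eqs (2.1)-(2.3).
# Usage: mann_whitney "values of population 1" "values of population 2"
# Assumption: ties get the average of the ranks they span; no tie correction in Z.
mann_whitney() {
  local v
  {
    for v in $1; do printf '%s 1\n' "$v"; done   # tag values from population one
    for v in $2; do printf '%s 2\n' "$v"; done   # tag values from population two
  } | sort -g -k1,1 | awk '
    { val[NR] = $1 + 0; grp[NR] = $2 }
    END {
      n = NR
      # Ranks 1..n in ascending order, averaging over runs of tied values
      i = 1
      while (i <= n) {
        j = i
        while (j < n && val[j + 1] == val[i]) j++
        for (k = i; k <= j; k++) rank[k] = (i + j) / 2
        i = j + 1
      }
      n1 = 0; n2 = 0; R1 = 0
      for (k = 1; k <= n; k++) {
        if (grp[k] == 1) { n1++; R1 += rank[k] } else { n2++ }
      }
      U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1                       # eq (2.1)
      U2 = n1 * n2 - U1                                           # eq (2.2)
      U  = (U1 > U2) ? U1 : U2                                    # larger of U1, U2
      Z  = (2 * U - n1 * n2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 3)  # eq (2.3)
      printf "U1=%g U2=%g Z=%.4f\n", U1, U2, Z
    }'
}
```

For example, `mann_whitney "1 2 3" "4 5 6"` prints `U1=9 U2=0 Z=1.9640`. These sample sizes are far below the $n_1$, $n_2$ > 10 needed for the normal approximation; they only make the arithmetic easy to follow.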
2.6.2.2. Student’s t-test
The t-values were calculated using equation (2.4),
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \tag{2.4}$$
where $\bar{X}_1$ is the mean of population one, $\bar{X}_2$ is the mean of population two, and $n_1$ and $n_2$ are the numbers of samples in populations one and two, respectively. $S_p^2$ is the pooled sample variance, defined by equation (2.5),
$$S_p^2 = \frac{df_1 S_1^2 + df_2 S_2^2}{df_1 + df_2} \tag{2.5}$$
where $df_1$ and $df_2$ are the degrees of freedom of populations one and two (equations (2.6) and (2.7), respectively), and $S_1^2$ and $S_2^2$ are the sample variances of populations one and two, respectively. Calculated t-values were compared with critical t-values from Student's t-distribution tables for α(2) = 0.05 (Student, 1908), with the degrees of freedom calculated using equation (2.8),
$$df_1 = n_1 - 1 \tag{2.6}$$
$$df_2 = n_2 - 1 \tag{2.7}$$
$$DF = df_1 + df_2 \tag{2.8}$$
If the absolute value of the observed t exceeded the critical t-value, the null hypothesis was rejected. In addition to t-values, P-values were calculated using standard statistical analysis software. The P-value is the probability of obtaining data at least as extreme as those observed if the null hypothesis were true. A critical P-value of 0.05 was used to judge significance, with the null hypothesis rejected if the observed P-value was less than or equal to 0.05.
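Equations (2.4) to (2.8) combine into a few lines of awk. This is a sketch assuming each population is passed as a space-separated list of values; `pooled_t` is a hypothetical name, and it returns only t and DF, since the P-value still requires the t-distribution (for example from statistics software, as in the text).

```bash
#!/usr/bin/env bash
# Sketch: pooled-variance Student's t, following eqs (2.4)-(2.8).
# Usage: pooled_t "values of population 1" "values of population 2"
pooled_t() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n1 = split(a, x, " "); n2 = split(b, y, " ")
    for (i = 1; i <= n1; i++) s1 += x[i]
    for (i = 1; i <= n2; i++) s2 += y[i]
    m1 = s1 / n1; m2 = s2 / n2                      # population means
    for (i = 1; i <= n1; i++) ss1 += (x[i] - m1) ^ 2
    for (i = 1; i <= n2; i++) ss2 += (y[i] - m2) ^ 2
    df1 = n1 - 1; df2 = n2 - 1                      # eqs (2.6), (2.7)
    v1 = ss1 / df1; v2 = ss2 / df2                  # sample variances S1^2, S2^2
    sp2 = (df1 * v1 + df2 * v2) / (df1 + df2)       # pooled variance, eq (2.5)
    t = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))   # eq (2.4)
    printf "t=%.4f DF=%d\n", t, df1 + df2           # DF from eq (2.8)
  }'
}
```

For example, `pooled_t "1 2 3" "4 5 6"` prints `t=-3.6742 DF=4`. The two-tailed critical value for DF = 4 at α = 0.05 is 2.776, so |t| = 3.67 would lead to rejection of the null hypothesis.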
2.6.3. Correlation Coefficients
Table 2.A1: Correlation coefficients of overall data quality indicators in relation to mosaicity.
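| Protein | Treatment | Rmerge (%) vs mosaicity (°) | Rp.i.m. (%) vs mosaicity (°) | I/σ(I) vs mosaicity (°) | High resolution limit (Å) vs mosaicity (°) |
|---|---|---|---|---|---|
| Lysozyme 1 | Ctrl | 0.29 | 0.32 | -0.65 | 0.85 |
| Lysozyme 1 | HC | 0.14 | 0.15 | -0.58 | 0.82 |
| Lysozyme 2 | Ctrl | 0.32 | 0.30 | -0.41 | 0.58 |
| Lysozyme 2 | HC | 0.11 | -0.16 | -0.56 | 0.77 |
| Insulin | Ctrl | 0.17 | 0.21 | -0.60 | 0.95 |
| Insulin | HC | 0.44 | 0.47 | -0.29 | 0.78 |
| Thaumatin | Ctrl | 0.42 | 0.37 | 0.51 | 0.15 |
| Thaumatin | HC | -0.36 | -0.39 | 0.17 | -0.08 |
| Ferritin 1 | Ctrl | 0.33 | 0.36 | -0.27 | 0.51 |
| Ferritin 1 | HC | 0.86 | 0.86 | -0.84 | 0.88 |
| Ferritin 2 | Ctrl | 0.60 | 0.68 | -0.67 | 0.80 |
| Ferritin 2 | HC | 0.67 | 0.34 | 0.54 | 0.77 |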