
Chapter 2 – Sample Preparation for Long Wavelength M

2.6. Appendix

2.6.1.1. XSCALE.INP

    !MAXIMUM_NUMBER_OF_PROCESSORS=16
    !RESOLUTION_SHELLS= 10 6 4 3 2.5 2.0 1.8 1.7 1.6
    !SPACE_GROUP_NUMBER=19
    !UNIT_CELL_CONSTANTS=65.46 108.41 113.15 90.000 90.000 90.000
    !REIDX=-1 0 0 0 0 -1 0 0 0 0 -1 0
    !REFERENCE_DATA_SET= fae-rm.ahkl
    !MINIMUM_I/SIGMA=3.0
    !REFLECTIONS/CORRECTION_FACTOR=50 !minimum #reflections/correction_factor
    !0-DOSE_SIGNIFICANCE_LEVEL=0.10
    !WFAC1=1.5 ! factor applied to e.s.d.'s before testing equivalent reflections
    !SAVE_CORRECTION_IMAGES= FALSE ! TRUE is default
    OUTPUT_FILE=datasetX_Y.HKL !at minimum of f'
    FRIEDEL'S_LAW=FALSE !TRUE
    MERGE=FALSE !TRUE
    STRICT_ABSORPTION_CORRECTION=TRUE !FALSE is default
    INPUT_FILE= .../datasetX_Y/XDS_ASCII.HKL
    ! INCLUDE_RESOLUTION_RANGE= 20 1.6
    ! CORRECTIONS= DECAY MODULATION ABSORPTION
    ! CRYSTAL_NAME=Seleno1 !Remove first "!" to switch on 0-dose extrapolation
    ! STARTING_DOSE=0.0 DOSE_RATE=1.0 !Use defaults for 0-dose extrapolation

2.6.1.2. XSCALE and Pointless

    module load ccp4
    module load XDS
    module load global/cluster

    cd dataset1_1
    xscale
    pointless -copy xdsin dataset1_1.HKL hklout pointless.mtz
    echo "done"
    cd ..

    cd dataset1_2
    xscale
    pointless -copy xdsin dataset1_2.HKL hklout pointless.mtz
    echo "done"
    cd ..
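The per-dataset commands above repeat for every directory, so they could be wrapped in a loop. The following is only a minimal sketch under stated assumptions: the function names `run` and `process_datasets` and the `DRY_RUN` switch are hypothetical additions for illustration, the output file of XSCALE is assumed to be named after its directory (as in `dataset1_1/dataset1_1.HKL`), and the required modules are assumed to be loaded already.

```shell
#!/bin/bash
# Sketch: run XSCALE and Pointless in each dataset directory passed as an argument.
# DRY_RUN=1 (hypothetical switch) prints the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"          # show the command only
    else
        "$@"               # execute the command
    fi
}

process_datasets() {
    for dir in "$@"; do
        name=$(basename "$dir")
        # Work in a subshell so the caller's working directory is unchanged.
        (
            cd "$dir" &&
            run xscale &&
            run pointless -copy xdsin "${name}.HKL" hklout pointless.mtz
        ) || { echo "failed: $dir" >&2; return 1; }
        echo "done: $name"
    done
}

# Example call (assumes these directories exist and contain XSCALE.INP):
# process_datasets dataset1_1 dataset1_2
```

Running with `DRY_RUN=1` first is a cheap way to check that the directory and file names are the ones intended before any data are rescaled.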

2.6.1.3. Aimless

    module load ccp4
    module load XDS
    module load global/cluster

    cd dataset1_1
    aimless hklin pointless.mtz hklout aimless.mtz << eof > aimless.log
    run 1 all
    scales constant
    ANALYSIS isigminimum 2
    resolution 60.0 0.00
    anomalous on
    eof
    cd ..

    cd dataset1_2
    aimless hklin pointless.mtz hklout aimless.mtz << eof > aimless.log
    run 1 all
    scales constant
    ANALYSIS isigminimum 2
    resolution 60.0 0.00
    anomalous on
    eof
    cd ..

2.6.1.4. Data Extraction

    echo dataset1_1 hc1 > Protein_Wavelength_Results
    grep " Overall InnerShell OuterShell" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Low resolution limit" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "High resolution limit" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Rmerge (within I+/I-)" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Rmerge (all I+ and I-)" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Rmeas (within I+/I-)" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Rmeas (all I+ & I-)" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Rpim (within I+/I-)" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Rpim (all I+ & I-)" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Rmerge in top intensity bin" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Total number of observations" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Total number unique" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Mean((I)/sd(I))" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Mn(I) half-set correlation CC(1/2)" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Completeness " dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Multiplicity " dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Anomalous completeness " dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Anomalous multiplicity " dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "DelAnom correlation between half-sets" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "Mid-Slope of Anom Normal Probability" dataset1_1/aimless.log >> Protein_Wavelength_Results
    grep "REFLECTING_RANGE_E.S.D.=" dataset1_1/XDS_ASCII.HKL >> Protein_Wavelength_Results
    grep "UNIT_CELL_CONSTANTS=" dataset1_1/XSCALE.INP >> Protein_Wavelength_Results
    echo >> Protein_Wavelength_Results
    echo >> Protein_Wavelength_Results
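The extraction step applies the same pattern to every dataset, so the labels can be kept in one list and looped over. This is a minimal sketch, not the script used for the thesis: the function name `summarise_datasets` is hypothetical, only a subset of the labels listed above is included for brevity, and `grep -F` is used so that labels containing parentheses are matched as fixed strings.

```shell
#!/bin/bash
# Sketch: collect selected Aimless summary lines for several datasets.
# Usage: summarise_datasets RESULTS_FILE DATASET_DIR...

summarise_datasets() {
    results=$1
    shift
    : > "$results"                         # start with an empty results file
    for dir in "$@"; do
        basename "$dir" >> "$results"      # dataset name as a block header
        # One fixed-string label per line; extend the list as required.
        while IFS= read -r label; do
            grep -F -- "$label" "$dir/aimless.log" >> "$results"
        done <<'LABELS'
Low resolution limit
High resolution limit
Rmerge in top intensity bin
Total number of observations
Total number unique
Mean((I)/sd(I))
LABELS
        grep -F "UNIT_CELL_CONSTANTS=" "$dir/XSCALE.INP" >> "$results"
        echo >> "$results"                 # blank line between datasets
    done
}

# Example call (assumes both directories hold aimless.log and XSCALE.INP):
# summarise_datasets Protein_Wavelength_Results dataset1_1 dataset1_2
```

Keeping the labels in a single here-document means a new statistic is added once rather than once per dataset.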

2.6.2. Statistical Tests

2.6.2.1. Mann-Whitney U-test

Data from both populations were pooled, sorted in ascending order and assigned ranks from 1 to n, where n is the total number of crystals in the combined populations. Summing the ranks belonging to population one gives the rank sum $R_1$, which was used to calculate $U_1$ using equation (2.1),

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 \tag{2.1}$$

where $n_1$ and $n_2$ represent the number of samples in populations one and two, respectively. $U_1$ is the number of times a data point from population one is smaller than a data point from population two when all possible pairs of samples, one taken from each population, are compared. $U_1$ was used to calculate $U_2$ using equation (2.2),

$$U_2 = n_1 n_2 - U_1 \tag{2.2}$$

where $U_2$ is the equivalent of $U_1$ for population two. For larger samples ($n_1$ and $n_2$ > 10), the sampling distribution of the U statistic is approximately normal. Using equation (2.3),

$$Z = \frac{2U - n_1 n_2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/3}} \tag{2.3}$$

the U statistic (the larger of $U_1$ and $U_2$) can be transformed into a Z-value that is comparable to the standard normal (Z) distribution. Observed Z-values were compared with the critical Z-value of ±1.96, which bounds the central 95% of the standard normal distribution (two-tailed α = 0.05). The null hypothesis was rejected if the calculated Z-value exceeded the critical Z-value in either direction.
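As an illustration of the arithmetic only, the steps can be followed on a small set of hypothetical values (not data from this work). The samples are deliberately smaller than the n > 10 needed for the normal approximation, and Z is written in the usual large-sample form, with mean $n_1 n_2 / 2$ and variance $n_1 n_2 (n_1 + n_2 + 1)/12$.

```latex
% Hypothetical values, chosen only to keep the arithmetic short.
% Population one: 0.10, 0.15, 0.30          (n_1 = 3)
% Population two: 0.20, 0.25, 0.35, 0.40    (n_2 = 4)
% Pooled ranks:   0.10(1) 0.15(2) 0.20(3) 0.25(4) 0.30(5) 0.35(6) 0.40(7)
\begin{align*}
R_1 &= 1 + 2 + 5 = 8 \\
U_1 &= n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 = 12 + 6 - 8 = 10 \\
U_2 &= n_1 n_2 - U_1 = 12 - 10 = 2 \\
U   &= \max(U_1, U_2) = 10 \\
Z   &= \frac{U - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}
     = \frac{10 - 6}{\sqrt{8}} \approx 1.41
\end{align*}
```

Here $U_1 = 10$ agrees with a direct count of the pairs in which the population-one value is the smaller (4 + 4 + 2), and |Z| ≈ 1.41 is below 1.96, so the null hypothesis would not be rejected for these values.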

2.6.2.2. Student’s t-test

The t-values were calculated using equation (2.4),

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \tag{2.4}$$

where $\bar{X}_1$ is the mean of population one, $\bar{X}_2$ is the mean of population two, and $n_1$ and $n_2$ are the number of samples in populations one and two, respectively. $S_p^2$ represents the pooled sample variance, as defined by equation (2.5),

$$S_p^2 = \frac{df_1 S_1^2 + df_2 S_2^2}{df_1 + df_2} \tag{2.5}$$

where $df_1$ and $df_2$ represent the degrees of freedom of populations one and two (equations (2.6) and (2.7), respectively) and $S_1^2$ and $S_2^2$ represent the sample variances of populations one and two, respectively. Calculated t-values were compared against critical t-values from Student's t-distribution tables for α(2) = 0.05 (Student, 1908), with the total degrees of freedom calculated using equation (2.8),

$$df_1 = n_1 - 1 \tag{2.6}$$

$$df_2 = n_2 - 1 \tag{2.7}$$

$$DF = df_1 + df_2 \tag{2.8}$$

If the absolute value of an observed t-value was greater than the critical t-value, the null hypothesis was rejected. In addition to t-values, P-values were calculated using standard statistical analysis software. The P-value is the probability of obtaining data at least as extreme as those observed if the null hypothesis were true. A critical P-value of 0.05 was used to judge significance, with the null hypothesis rejected if the observed P-value was less than or equal to the critical P-value.
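The calculation can be traced on a small set of hypothetical values (not data from this work), chosen so that the intermediate quantities stay simple. The critical value quoted afterwards is the standard two-tailed table entry for 5 degrees of freedom.

```latex
% Hypothetical values, for illustration only.
% Population one: 2, 4, 6        (n_1 = 3)
% Population two: 5, 7, 9, 11    (n_2 = 4)
\begin{align*}
\bar{X}_1 &= 4, \qquad S_1^2 = \frac{(-2)^2 + 0^2 + 2^2}{3 - 1} = 4 \\
\bar{X}_2 &= 8, \qquad S_2^2 = \frac{(-3)^2 + (-1)^2 + 1^2 + 3^2}{4 - 1} = \frac{20}{3} \\
df_1 &= 2, \qquad df_2 = 3, \qquad DF = 5 \\
S_p^2 &= \frac{df_1 S_1^2 + df_2 S_2^2}{df_1 + df_2} = \frac{8 + 20}{5} = 5.6 \\
t &= \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}
   = \frac{4 - 8}{\sqrt{5.6 \times \frac{7}{12}}} \approx -2.21
\end{align*}
```

With DF = 5, the critical t-value for α(2) = 0.05 is 2.571; since |t| ≈ 2.21 is smaller, the null hypothesis would not be rejected for these values.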

2.6.3. Correlation Coefficients

Table 2.A1: Correlation coefficients of overall data quality indicators in relation to mosaicity.

| Indicator vs mosaicity (°) | Lysozyme 1 Ctrl | Lysozyme 1 HC | Lysozyme 2 Ctrl | Lysozyme 2 HC | Insulin Ctrl | Insulin HC | Thaumatin Ctrl | Thaumatin HC | Ferritin 1 Ctrl | Ferritin 1 HC | Ferritin 2 Ctrl | Ferritin 2 HC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rmerge (%) | 0.29 | 0.14 | 0.32 | 0.11 | 0.17 | 0.44 | 0.42 | -0.36 | 0.33 | 0.86 | 0.60 | 0.67 |
| Rp.i.m. (%) | 0.32 | 0.15 | 0.30 | -0.16 | 0.21 | 0.47 | 0.37 | -0.39 | 0.36 | 0.86 | 0.68 | 0.34 |
| I/σ(I) | -0.65 | -0.58 | -0.41 | -0.56 | -0.60 | -0.29 | 0.51 | 0.17 | -0.27 | -0.84 | -0.67 | 0.54 |
| High resolution limit (Å) | 0.85 | 0.82 | 0.58 | 0.77 | 0.95 | 0.78 | 0.15 | -0.08 | 0.51 | 0.88 | 0.80 | 0.77 |

Chapter 3 – Excess Solvent and Data

In document Development of crystallographic techniques and their application to several protein targets (Page 84-89)