Analysis of Genetic Data: Probability and the Chi-Square Test
PROBABILITY AND GENETIC EVENTS
Probability Theory
$$P = \frac{\text{number of defined outcome(s)}}{\text{total number of possible outcomes}}$$
where P is the probability of occurrence.
For example, the probability of getting a head from a toss of a coin is 1/2.
Basic Terms: Sample Space
In probability theory, the sample space of an experiment or random trial is the set of all possible outcomes. For example, if the experiment is tossing a coin, then the sample space is the set {head, tail}.
For tossing a single six-sided die, the sample space is {1, 2, 3, 4, 5, 6}.
Event
Again in probability, any subset of the sample space is usually called an event.
Ordered event (e.g. tossing 2 coins once and obtaining a head for the first coin and a tail for the second coin).
Unordered event (e.g. tossing 2 coins once and obtaining exactly one tail).
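The two kinds of event can be checked by listing the sample space and counting outcomes. The following is a minimal sketch, assuming fair coins (all four outcomes equally likely); the helper name `probability` is illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two coins once: every ordered pair of faces.
sample_space = list(product(["head", "tail"], repeat=2))

def probability(event):
    """P = number of defined outcomes / total number of possible outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

# Ordered event: head on the first coin AND tail on the second coin.
p_ordered = probability(lambda o: o == ("head", "tail"))

# Unordered event: exactly one tail, on either coin.
p_unordered = probability(lambda o: o.count("tail") == 1)

print(p_ordered)    # 1/4
print(p_unordered)  # 1/2
```

The unordered event is twice as likely because two outcomes of the sample space, (head, tail) and (tail, head), satisfy it.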
Probability of Multiple Events
The rule of Independent events:
This states that the occurrence of past events has no influence on the occurrence of future events.
The Product rule:
This rule states that the probability of independent events occurring together is equal to the product of their individual probabilities.
E.g., if p(A) = 0.7, then the probability of A occurring twice in succession is p(AA) = 0.7 × 0.7 = 0.49.
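As a rough illustration of the product rule, the sketch below multiplies the probabilities of independent events using exact fractions. The function name `joint_probability` and the fair-coin example are assumptions made for illustration.

```python
from fractions import Fraction
from math import prod

def joint_probability(probabilities):
    """Product rule: multiply the probabilities of independent events."""
    return prod(probabilities)

# Assumed example: two independent tosses of a fair coin both give heads.
p_two_heads = joint_probability([Fraction(1, 2), Fraction(1, 2)])
print(p_two_heads)  # 1/4

# The example from the text: p(A) = 0.7 occurring twice in a row.
p_aa = joint_probability([Fraction(7, 10), Fraction(7, 10)])
print(p_aa)  # 49/100
```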
Questions
What is the probability of a couple having 5 boys in a row?
What is the probability of tossing a coin twice and getting one head and one tail?
Sum Rule
It states that the probability of either of 2 or more mutually exclusive events occurring is equal to the sum of their individual probabilities.
Example:
What is the probability of a couple having (i) a boy and a girl? (ii) either a boy or a girl?
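One way to work this example is sketched below. It assumes that a boy and a girl are equally likely at each birth, that (i) refers to two children in either birth order, and that (ii) refers to a single birth; these readings are assumptions, not stated in the question.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # assumed probability of a boy (or a girl) at each birth

# (i) A boy and a girl among two children, in either birth order.
p_boy_then_girl = HALF * HALF      # product rule
p_girl_then_boy = HALF * HALF      # product rule
p_one_of_each = p_boy_then_girl + p_girl_then_boy  # sum rule

# (ii) Either a boy or a girl at a single birth.
p_boy_or_girl = HALF + HALF        # sum rule

print(p_one_of_each)   # 1/2
print(p_boy_or_girl)   # 1
```

The sum rule applies in both parts because the outcomes being added cannot happen together: the two birth orders exclude each other, and a single child cannot be both a boy and a girl.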
Binomial expansion/distribution
In probability theory, events E1, E2, ..., En are said to be mutually exclusive if the occurrence of any one of them automatically implies the non-occurrence of the remaining n − 1 events.
Therefore, two mutually exclusive events cannot both occur.
The probability of some combination of two mutually exclusive outcomes over a series of trials, where the final order is not specified, is given by the binomial theorem:
$$P = \frac{n!}{s!\,t!}\, p^{s} q^{t}$$
where:
n = number of trials
p = probability of an event occurring on any given trial
q = probability of the event not occurring
s = number of times an event occurs
t = number of times an opposite event occurs
Example:
A would-be couple plan to have five children when they marry. Determine the probability of the couple having 3 girls and 2 boys.
Solution:
n = 5, s = 3, t = 2, p = 1/2 and q = 1/2
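$$P = \frac{5!}{3!\,2!}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{2} = 10\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{2} = \frac{10}{32} = \frac{5}{16}$$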
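The same calculation can be sketched in code. The function name `binomial_probability` is illustrative; it takes n to be s + t, as in the definitions above.

```python
from fractions import Fraction
from math import factorial

def binomial_probability(s, t, p, q):
    """P = n!/(s! t!) * p^s * q^t, with n = s + t trials."""
    n = s + t
    coefficient = factorial(n) // (factorial(s) * factorial(t))
    return coefficient * p**s * q**t

# Three girls and two boys among five children, p = q = 1/2.
p_3_girls_2_boys = binomial_probability(3, 2, Fraction(1, 2), Fraction(1, 2))
print(p_3_girls_2_boys)  # 5/16, i.e. 10/32
```

The coefficient 10 counts the different birth orders that give three girls and two boys; each single order has probability 1/32.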
Question
If four babies are born at a given hospital on the same day:
(a) What is the chance that two will be boys and two girls?
(b) What is the chance that all four will be girls?
Questions
A man and his wife who are both heterozygous for albinism plan to have four children. Use the information to answer the following questions.
(a) What is the probability that any given child will be normal?
(b) What is the probability that all of them would be normal?
(c) What is the probability that all of them are normal except the 2nd child?
(d) What is the probability of having an albino child among the four children?
Evaluating Genetic Data: Chi-Square Analysis
The Chi-Square (χ²) Test
Mendel's 3:1 monohybrid and 9:3:3:1 dihybrid ratios are hypothetical predictions based on the following assumptions:
1. Dominance/Recessiveness
2. Segregation
3. Independent assortment
4. Random fertilization
Of these factors, segregation, independent assortment and random fertilization are subject to chance and are therefore influenced by normal deviation (chance deviation).
Thus, chance deviations in any one of the above can alter observed Mendelian ratios.
Chance deviation is affected by sample size. The larger the sample, the smaller the impact of chance deviation on the observed ratio; the smaller the sample, the greater the impact.
Chi-Square (χ²) Distribution
It allows one to determine whether or not a deviation from the expected Mendelian ratio can be attributed solely to chance.
It also compares the observed distribution with the expected distribution (based on a genetic hypothesis) and mathematically assesses whether the difference between them is due to chance or reflects a real difference between the two distributions.
It is dependent upon the sample size.
CALCULATION OF CHI-SQUARE STATISTIC
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where "O" is the observed value for a given category and "E" is the expected value for that category.
Since (O − E) is the deviation (d) in each case, the equation can be reduced to:
$$\chi^2 = \sum \frac{d^2}{E}$$
Problem Solving
Christabel, a would-be food technologist and genetics student, decided to test the 3:1 Mendelian ratio. She obtained 1000 seeds in the following proportions:
Tall: 740
Dwarf: 260
Calculate the p-value and infer whether her results closely fit the 3:1 ratio.
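The χ² statistic for these counts, the sum of (O − E)²/E over the categories, can be sketched as follows. The helper names `expected_counts` and `chi_square` are illustrative, not part of any library.

```python
def expected_counts(total, ratio):
    """Split a total into expected counts according to a hypothesised ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [740, 260]                              # tall, dwarf
expected = expected_counts(sum(observed), [3, 1])  # 750 tall, 250 dwarf
chi2 = chi_square(observed, expected)
print(round(chi2, 3))  # 0.533
```

Each category deviates from its expected count by 10, so the statistic is 100/750 + 100/250 ≈ 0.533.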
Step-by-step procedure for the χ² calculation for the F2 results of hypothetical monohybrid and dihybrid crosses
The final step is the interpretation of the χ² value
First, we determine the value of the degrees of freedom (d/f), which is equal to n-1,
where n is the number of different categories into which each datum point may fall.
For the 3:1 ratio, n = 2, so d/f = 2 – 1 = 1
The d/f for the 9:3:3:1 ratio is 3
D/f must always be taken into account because the greater the number of categories, the more deviation is expected due to chance.
The next step is to convert the χ² value to the corresponding probability value (p), using a prepared chart or graph.
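Where a chart is not at hand, the conversion can also be sketched in code. The version below is a simplification that assumes only the two cases used in this section (d/f = 1 and d/f = 3), for which the chi-square tail probability has a closed form in terms of the complementary error function. The function names are illustrative.

```python
from math import erfc, exp, pi, sqrt

def degrees_of_freedom(categories):
    """d/f = n - 1, where n is the number of categories."""
    return categories - 1

def chi_square_p_value(chi2, df):
    """Probability of a chi-square value at least this large under chance alone.

    Closed forms are used for the two cases in this section only.
    """
    if df == 1:
        return erfc(sqrt(chi2 / 2))
    if df == 3:
        return erfc(sqrt(chi2 / 2)) + sqrt(2 * chi2 / pi) * exp(-chi2 / 2)
    raise ValueError("only d/f = 1 and d/f = 3 are handled in this sketch")

# The 5% critical values found in standard tables map back to p = 0.05.
print(round(chi_square_p_value(3.841, degrees_of_freedom(2)), 3))  # 0.05
print(round(chi_square_p_value(7.815, degrees_of_freedom(4)), 3))  # 0.05
```

Larger χ² values give smaller p values, which is why a calculated χ² above the table value at 5% corresponds to p < 0.05.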
INTERPRETATION OF THE VALUE
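Compare your calculated χ² value to the χ² value in the table at 5%, for the appropriate degrees of freedom.
If your calculated χ² is smaller than the χ² from the table at 5% (i.e. p > 0.05), then the difference can be attributed to CHANCE, and therefore the observed numbers fit the hypothesized ratio. If it is larger (i.e. p < 0.05), the deviation is unlikely to be due to chance alone.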
In the F2 generation of a certain tomato experiment, Michael decided to test the 9:3:3:1 Mendelian ratio and obtained 1000 seeds in the following proportions:
Round yellow: 583
Round green: 195
Wrinkled yellow: 166
Wrinkled green: 56
Are the discrepancies between the observed and expected ratios acceptable?
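A possible way to check these counts is sketched below. It assumes the standard table value of 7.815 for d/f = 3 at the 5% level; the function name `chi_square_for_ratio` is illustrative.

```python
CRITICAL_5_PERCENT_DF3 = 7.815  # standard table value for d/f = 3 at 5%

def chi_square_for_ratio(observed, ratio):
    """Chi-square of observed counts against a hypothesised ratio."""
    total, parts = sum(observed), sum(ratio)
    expected = [total * r / parts for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# round yellow, round green, wrinkled yellow, wrinkled green
observed = [583, 195, 166, 56]
chi2 = chi_square_for_ratio(observed, [9, 3, 3, 1])

print(round(chi2, 2))                 # 4.19
print(chi2 < CRITICAL_5_PERCENT_DF3)  # True
```

A statistic below the table value corresponds to p > 0.05, so under this sketch the discrepancies would be treated as acceptable for a 9:3:3:1 ratio.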
Graph of Chi-Square
Using the dihybrid cross above, where p = 0.26, as an example:
The first interpretation is that the probability is 26%, or about 1 in 4, that the deviation was due to chance.
The second interpretation is that, were the same experiment repeated many times, 26% of the trials would be expected to exhibit a chance deviation as great as or greater than the one observed.
Interpretation
Is 0.26 an acceptable or unacceptable p value?
The decision is relative and depends on the certainty of the investigator
By convention, 0.05 (5%) has been chosen as an arbitrary standard.
All p values between 0.05 and 1.0 are
considered acceptable in chi-square analysis.
All values below 0.05 are unacceptable with respect to goodness of fit.
0.26 is much above 0.05 (5%) and therefore acceptable.
In other words, our data are consistent with the hypothesis of a 9:3:3:1 ratio of phenotypes,
which is indicative of a two-locus genetic model with dominance at each locus.
Were the p value below this standard, we would have rejected the hypothesis for the experiment. The data would then be
interpreted as unacceptable in fitting a 9:3:3:1 ratio.
NOTE:
When the χ² test shows that there is no significant difference between the observed and expected values, then "we fail to reject the hypothesis", i.e. we accept it.
If there is a significant difference between them, "we reject the hypothesis".
Homework
A heterozygous genetic condition called
“creeper” in chickens produces shortened and deformed legs and wings, giving the bird a
squatty appearance. Matings between creepers produced
775 creeper : 388 normal progeny.
(a) Is the hypothesis of a 3:1 ratio acceptable?
(b) Does a 2:1 ratio fit the data better?
CELLULAR BASIS OF INHERITANCE.
MITOSIS AND MEIOSIS