© 2015 Ellement
Topics to be covered:
1. Statistics and Data Management
Parametric Tests
- for anything you can measure; a value can fall between any 2 points
- comparison of the means
- e.g. length, time, weight, temperature
Non-Parametric Tests
- cannot find a value in between; deals with ranks
- comparison of the medians
- e.g. number of males in the classroom
Comparison Test - used to know whether 2 or more groups are the same or equal; if not, which is greater and which is smaller?
Correlation Test - used to know the relationship between 2 or sometimes 3 groups: whether they are directly or inversely related, and by how much?
NOTICE that your value is either between -1 and 0 or between 0 and 1
VALUE: the closer your value to zero, the weaker the relationship
SIGN: positive sign means it is directly related; negative means it is inversely related
0.90-1.00 very strong correlation
0.70-0.89 strong correlation
0.40-0.69 modest correlation
0.20-0.39 weak correlation
0.00-0.19 very weak correlation
Association Test - used to know which group or groups show an affinity to a set of conditions
Unmatched - uses 2 different populations
Matched - uses the same population
Mann-Whitney
- nonparametric, comparison, unmatched
Example: Problem Set D
A herpetologist studying the effect of a deadly fungal disease on frogs wanted to find out if the altitude of the frog’s habitat makes a difference in the prevalence of the disease among resident animals. She delineated two study sites (A and B) found in different altitudinal areas (A = 20 masl, B = 350 masl), and set up eight traps in each of the sites (total of 16 traps). She left the traps in the sites for a few days, and went back to collect the captured frogs and count how many tested positive for the fungal disease in each trap. Upon her return, she found out that one trap in site B was missing, so the data for this trap was not counted. Tabulating her results, she arrived at the following values:
SITE A  8 12 15 21 25 44 44 60   n = 8
SITE B  2  4  5  9 12 17 19      n = 7
Hypotheses:
H0: There is no significant difference between the two samples.
H1: There is a significant difference between the two samples.
1. Rank the data. Data items that have equal values are given the average rank of those items.
SITE A   rank     SITE B   rank
   8      4          2      1
  12      6.5        4      2
  15      8          5      3
  21     11          9      5
  25     12         12      6.5
  44     13.5       17      9
  44     13.5       19     10
  60     15
$n_1 = 8$         $n_2 = 7$
Total of ranks of SITE B = 36.5
2. Use the following formulae to solve for $U_1$ and $U_2$:
$$U_1 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$
$$U_2 = n_1 n_2 - U_1$$
where
$n_1$ = number of observations in first column
$n_2$ = number of observations in second column
$R_2$ = sum of the ranks in the second column
$$U_1 = (8)(7) + \frac{7(7 + 1)}{2} - 36.5 = 47.5$$
$$U_2 = (8)(7) - 47.5 = 8.5$$
3. Reject H0 if the computed lower U value ≤ critical U value.
$n_1 = 8$; $n_2 = 7$; level of significance = 0.05
critical U value = 10
computed lower U value = 8.5
8.5 < 10
Reject H0.
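The ranking and U computation above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library; the helper names `average_ranks` and `mann_whitney_u` are made up for this example and are not part of any package.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # mean of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (U1, U2) with U1 = n1*n2 + n2*(n2+1)/2 - R2 and U2 = n1*n2 - U1."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r2 = sum(ranks[n1:])  # rank sum of the second sample
    u1 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    u2 = n1 * n2 - u1
    return u1, u2


site_a = [8, 12, 15, 21, 25, 44, 44, 60]
site_b = [2, 4, 5, 9, 12, 17, 19]
u1, u2 = mann_whitney_u(site_a, site_b)
print(u1, u2, min(u1, u2))  # 47.5 8.5 8.5
```

The smaller of the two U values is the one compared against the tabulated critical value.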
                 Parametric                     Non-Parametric
Comparison
  2-Group        T-test, Z-test                 Mann-Whitney, Wilcoxon
  3,4,5-Group    Analysis of Variance (ANOVA)   Kruskal-Wallis
Correlation      Pearson                        Spearman
Association      Chi-Square                     Chi-Square
Analysis of Variance One-Way (1-Way ANOVA)
- parametric, comparison, 3,4,5-group
Example: Problem Set A
A marine biologist in charge of four marine reserves located on a small island noticed that one of the marine reserves (Area ‘A’) was twice the size of the other areas (‘B’, ‘C’, and ‘D’). Considering that all other aspects of the marine reserves were equal except for size, the biologist wanted to find out if the size of the marine reserve had an effect on the overall size of fish species living within them. To test this, he designated a single fish species, Acanthurus olivaceous, as the test species, and collected 10 specimens of this fish in each of the four marine reserves. He measured each fish (in cm) and tabulated the data below.
AREA A  78 88 87 88 83 82 81 80 80 89
AREA B  78 78 83 81 78 81 81 82 76 76
AREA C  79 73 79 75 77 78 80 78 83 84
AREA D  77 69 75 70 74 83 80 75 76 75
Hypotheses:
H0: There is no significant difference among the means.
H1: At least one of the means is different from the others.
1. Find the size ($n_i$), mean ($\bar{X}_j$), and the grand mean ($\bar{\bar{X}}$).
A    (Xj-X̄)²     B    (Xj-X̄)²     C    (Xj-X̄)²     D    (Xj-X̄)²
78    31.36      78     1.96      79     0.16      77     2.56
88    19.36      78     1.96      73    31.36      69    40.96
87    11.56      83    12.96      79     0.16      75     0.16
88    19.36      81     2.56      75    12.96      70    29.16
83     0.36      78     1.96      77     2.56      74     1.96
82     2.56      81     2.56      78     0.36      83    57.76
81     6.76      81     2.56      80     1.96      80    21.16
80    12.96      82     6.76      78     0.36      75     0.16
80    12.96      76    11.56      83    19.36      76     0.36
89    29.16      76    11.56      84    29.16      75     0.16
$n_i$:        10       10       10       10
$\bar{X}_j$:  83.6     79.4     78.6     75.4
$g_i$:        146.4    56.4     98.4     154.4
$$\bar{\bar{X}} = \frac{\sum \bar{X}_j}{J} = 79.25$$
2. Complete the ANOVA Table:
Sources      SS              df            MS               F        Fc
Treatments   SSTR = 341.9    df_tr = 3     MSTR = 113.97    9.0024   2.87
Errors       SSE  = 455.6    df_e  = 36    MSE  = 12.66
Total        SST  = 797.5    df_t  = 39    MST  = 20.45
$$SSTR = \sum n_i (\bar{X}_j - \bar{\bar{X}})^2$$
$$SSE = \sum g_i$$
$$SST = SSTR + SSE$$
where
$SSTR$ = Treatment Sum of Squares
$SSE$ = Error Sum of Squares
$SST$ = Total Sum of Squares
$n_i$ = sample size
$\bar{X}_j$ = mean
$\bar{\bar{X}}$ = grand mean
$g_i$ = sum of the $(X_j - \bar{X})^2$ values within each group
$$df_{tr} = k - 1$$
$$df_e = n - k$$
$$df_t = df_{tr} + df_e$$
where
$k$ = number of populations
$n$ = number of observations
$$MSTR = \frac{SSTR}{df_{tr}}$$
$$MSE = \frac{SSE}{df_e}$$
$$MST = \frac{SST}{df_t}$$
$$F = \frac{MSTR}{MSE}$$
3. Reject H0 if the computed F value > critical F value.
$df_{tr} = 3$; $df_e = 36$; level of significance = 0.05
critical F value = 2.87
computed F value = 9.0024
9.0024 > 2.87
Reject H0.
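As a cross-check on the ANOVA table, the sums of squares and the F ratio can be recomputed with plain Python. This is a sketch only; `one_way_anova` is a hypothetical helper name, and the grand mean is taken over all observations, which equals the mean of the group means here because every group has 10 fish.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of samples."""
    k = len(groups)                       # number of populations
    n = sum(len(g) for g in groups)       # number of observations
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Treatment sum of squares: group size times squared distance of each group mean
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Error sum of squares: squared deviations of each value from its own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_tr, df_e = k - 1, n - k
    mstr, mse = sstr / df_tr, sse / df_e
    return {"SSTR": sstr, "SSE": sse, "SST": sstr + sse,
            "df_tr": df_tr, "df_e": df_e,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse}


areas = [
    [78, 88, 87, 88, 83, 82, 81, 80, 80, 89],  # AREA A
    [78, 78, 83, 81, 78, 81, 81, 82, 76, 76],  # AREA B
    [79, 73, 79, 75, 77, 78, 80, 78, 83, 84],  # AREA C
    [77, 69, 75, 70, 74, 83, 80, 75, 76, 75],  # AREA D
]
result = one_way_anova(areas)
print(f"SSTR={result['SSTR']:.2f} SSE={result['SSE']:.2f} F={result['F']:.4f}")
# SSTR=341.90 SSE=455.60 F=9.0053
```

The unrounded F is about 9.0053; the 9.0024 in the worked example comes from dividing the already rounded mean squares (113.97 / 12.66).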
Kruskal-Wallis
- nonparametric, comparison, 3,4,5-group
Example: Problem Set B
A marine biologist in charge of four marine reserves located on a small island noticed that one of the marine reserves (Area ‘A’) was twice the size of the other areas (‘B’, ‘C’ and ‘D’). Considering that all other aspects of the marine reserves were equal except for size, the biologist wanted to find out if the size of the marine reserve had an effect on the overall number of fishes living within them. To test this, he designated a single species, Acanthurus olivaceous, as the test species, established ten counting stations, and noted the number of A. olivaceous at each station in the data sheet. He did this for all areas and listed his data below.
AREA A  78 88 87 88 83 82 81 80 80 89
AREA B  78 78 83 81 78 81 81 82 76 76
AREA C  79 73 79 75 77 78 80 78 83 84
AREA D  77 69 75 70 74 83 80 75 76 75
Hypotheses:
H0: There is no significant difference in the distribution of fishes from four marine reserves.
H1: There is a significant difference in the distribution of fishes from four marine reserves.
1. Rank the data. Data items that have equal values are given the average rank of those items.
A     rank     B     rank     C     rank     D     rank
78    16.5     78    16.5     79    20.5     77    12.5
88    38.5     78    16.5     73     3       69     1
87    37       83    33.5     79    20.5     75     6.5
88    38.5     81    27.5     75     6.5     70     2
83    33.5     78    16.5     77    12.5     74     4
82    30.5     81    27.5     78    16.5     83    33.5
81    27.5     81    27.5     80    23.5     80    23.5
80    23.5     82    30.5     78    16.5     75     6.5
80    23.5     76    10       83    33.5     76    10
89    40       76    10       84    36       75     6.5
TOTAL 309      TOTAL 216      TOTAL 189      TOTAL 106
2. Compute the Kruskal-Wallis H value:
$$H = \frac{12}{N(N + 1)} \left( \sum_{i=1}^{k} \frac{R_i^2}{n_i} \right) - 3(N + 1)$$
$$df = k - 1$$
where
$H$ = Kruskal-Wallis value
$N$ = number of total scores
$k$ = number of samples
$R_i$ = ranked total per sample
$n_i$ = number of scores per sample
$$H = \frac{12}{40(40 + 1)} \left( \frac{309^2}{10} + \frac{216^2}{10} + \frac{189^2}{10} + \frac{106^2}{10} \right) - 3(40 + 1)$$
$$H = 15.36$$
$$df = 4 - 1 = 3$$
3. Reject H0 if the computed H value > critical $X^2$ value.
$df = 3$; level of significance = 0.05
critical $X^2$ value = 7.8147
computed H value = 15.36
15.36 > 7.8147
Reject H0.
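The ranking and H computation can be reproduced with a short Python sketch (standard library only; `average_ranks` and `kruskal_wallis_h` are hypothetical helper names). It applies the H formula given above without any tie correction. The value 78 occurs six times and occupies ranks 14 to 19, so each of those observations receives the average rank 16.5; the four rank totals must add up to N(N + 1)/2 = 820.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (rank sums per group, H) using H = 12/(N(N+1)) * sum(R_i^2/n_i) - 3(N+1)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * sum(
        r ** 2 / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    return rank_sums, h


areas = [
    [78, 88, 87, 88, 83, 82, 81, 80, 80, 89],  # AREA A
    [78, 78, 83, 81, 78, 81, 81, 82, 76, 76],  # AREA B
    [79, 73, 79, 75, 77, 78, 80, 78, 83, 84],  # AREA C
    [77, 69, 75, 70, 74, 83, 80, 75, 76, 75],  # AREA D
]
rank_sums, h = kruskal_wallis_h(areas)
print(rank_sums, round(h, 2))  # [309.0, 216.0, 189.0, 106.0] 15.36
```

With df = 3 the computed H remains well above the critical value of 7.8147, so the decision to reject H0 is unchanged.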
Pearson Product-Moment Coefficient
- parametric, correlation
Example: Problem Set J
The Jackson’s chameleon is a very popular animal among reptile keepers owing to the horns possessed by the males. The larger the horns, the more expensive the price. An exotic animal breeder wanted to find out if the length of the horns of males is related to the mass (weight) of the animal rather than size (length). He collected data from his captive stock males and got the following data:
HORN LENGTH (cm)  6.6  6.9  7.3  8.2  8.3  11   12   12   9.4  10.2
MASS (g)          86   92   71   74   185  185  201  283  255  222
Hypotheses:
H0: There is no correlation between the 2 groups.
H1: There is either a positive or negative correlation between the 2 groups.
1. Compute xy, x², and y² for each pair. Calculate their summation.
horn length (cm)   mass (g)   xy        x²       y²
 6.6                86         567.6     43.56     7396
 6.9                92         634.8     47.61     8464
 7.3                71         518.3     53.29     5041
 8.2                74         606.8     67.24     5476
 8.3               185        1535.5     68.89    34225
11                 185        2035      121       34225
12                 201        2412      144       40401
12                 283        3396      144       80089
 9.4               255        2397       88.36    65025
10.2               222        2264.4    104.04    49284
$\sum x = 91.9$   $\sum y = 1654$   $\sum xy = 16367.4$   $\sum x^2 = 881.99$   $\sum y^2 = 329626$
2. Using the formula below, get the value of r.
$$r = \frac{N \sum xy - \sum x \sum y}{\sqrt{\left[N \sum x^2 - (\sum x)^2\right]\left[N \sum y^2 - (\sum y)^2\right]}}$$
$$r = \frac{10(16367.4) - (91.9)(1654)}{\sqrt{\left[10(881.99) - (91.9)^2\right]\left[10(329626) - (1654)^2\right]}}$$
$$r = 0.8058$$
3. Based on the following, determine the correlation between the two groups.
VALUE: the closer your value to zero, the weaker the relationship
SIGN: positive sign means it is directly related; negative means it is inversely related
Since 𝑟 = 0.8058, the relationship between the horn length and mass shows a strong positive correlation.
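The computation of r and its reading in terms of value and sign can be sketched as follows. This is a minimal standard-library example; `pearson_r` and `describe` are hypothetical names, and the strength bands assume that values closer to 1 indicate a stronger relationship (0.70 to 0.89 strong, 0.90 to 1.00 very strong).

```python
import math


def pearson_r(x, y):
    """Pearson r from the raw-score formula used in the worked example."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def describe(r):
    """Label the strength (by absolute value) and direction (by sign) of r."""
    size = abs(r)
    if size >= 0.90:
        strength = "very strong"
    elif size >= 0.70:
        strength = "strong"
    elif size >= 0.40:
        strength = "modest"
    elif size >= 0.20:
        strength = "weak"
    else:
        strength = "very weak"
    direction = "positive" if r > 0 else "negative" if r < 0 else "no"
    return f"{strength} {direction} correlation"


horn_cm = [6.6, 6.9, 7.3, 8.2, 8.3, 11, 12, 12, 9.4, 10.2]
mass_g = [86, 92, 71, 74, 185, 185, 201, 283, 255, 222]
r = pearson_r(horn_cm, mass_g)
print(round(r, 4), describe(r))  # 0.8058 strong positive correlation
```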
Chi-Square
- non/parametric, association
Example: Problem Set I
A reforested area consists of three tree species A, B, and C, and four species of endemic bird species 1, 2, 3, and 4. The timber concession that owns the area is preparing to cut down trees for use as wood pulp for paper manufacturing. As part of the deal with the WWF, the timber concession can only cut down one species of tree. To help them decide what species of tree to cut, the company hired an ornithologist who did a survey of each tree species, and what bird species was found utilizing each tree species. The results of the survey are listed as:
BIRD 1 BIRD 2 BIRD 3 BIRD 4
TREE A 12 7 5 17
TREE B 14 6 22 9
TREE C 35 12 7 11
Hypotheses:
H0: The number of bird inhabitants does not depend on the species of the trees.
H1: The number of bird inhabitants depends on the species of the trees.
1. Get the total of the rows and columns.
BIRD 1 BIRD 2 BIRD 3 BIRD 4 TOTAL
TREE A 12 7 5 17 41
TREE B 14 6 22 9 51
TREE C 35 12 7 11 65
TOTAL 61 25 34 37 157
2. In an ideal world, the birds are expected to be equally distributed. To get the expected value, divide the grand total by the number of cells.
$$E = \frac{\sum w + \sum x + \sum y + \sum z}{N}$$
$$E = \frac{157}{12} = 13.08 \approx 13$$
3. Using the formula below, get the value of $X^2$ and $df$.
$$X^2 = \sum \left[ \frac{(observed - expected)^2}{expected} \right]$$
     1    (O-E)²/E     2    (O-E)²/E     3    (O-E)²/E     4    (O-E)²/E
A   12     0.0769      7     2.7692      5     4.9230     17     1.2307
B   14     0.0769      6     3.7692     22     6.2307      9     1.2307
C   35    37.230      12     0.0769      7     2.7692     11     0.3076
$$X^2 = 60.6923$$
$$df = (r - 1)(c - 1)$$
$$df = (3 - 1)(4 - 1) = 6$$
4. Reject H0 if the computed $X^2$ value > critical $X^2$ value.
$df = 6$; level of significance = 0.05
critical $X^2$ value = 12.592
computed $X^2$ value = 60.6923
60.6923 > 12.592
Reject H0.
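The totals, expected value, and statistic from this example can be recomputed with the short Python sketch below. It mirrors the handout's simplification of one equal expected count for every cell (grand total divided by the number of cells, rounded to 13); other formulations of the test derive a separate expected count for each cell from the row and column totals, which is not what is shown here.

```python
observed = [
    [12, 7, 5, 17],    # TREE A: BIRD 1..4
    [14, 6, 22, 9],    # TREE B
    [35, 12, 7, 11],   # TREE C
]

row_totals = [sum(row) for row in observed]        # [41, 51, 65]
col_totals = [sum(col) for col in zip(*observed)]  # [61, 25, 34, 37]
grand_total = sum(row_totals)                      # 157

n_cells = len(observed) * len(observed[0])         # 12
# Assumption taken from the handout: a single expected value, rounded to a whole number.
expected = round(grand_total / n_cells)            # 157 / 12 = 13.08 -> 13

chi_square = sum((o - expected) ** 2 / expected for row in observed for o in row)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)

print(row_totals, col_totals, grand_total)
print(round(chi_square, 4), df)  # 60.6923 6
```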
2. Global Positioning System
- relies on a constellation of 24 NAVSTAR satellites launched and maintained by the U.S. Department of Defense
- uses at least 5 satellites/space vehicles (SVs)
Satellites are used to transmit the signal by letting the signal bounce off them, since sound and light travel in a straight line.
SVs orbit at an altitude of about 21,000 km
SVs keep time using an atomic clock that loses or gains one second every 30,000 years
Unlike other devices like communication gadgets, GPS doesn’t need an importance signal.
It only needs the signal to be bounced back to the recipient.
1st Live Telecast via satellite: 1964 Summer Olympics in Tokyo
REMINDERS in using the GPS:
- Use it in an open area.
- Move slowly, do not run.
- Do not cover the transmitter.
Other information GPS can give you:
- Direction
- Distance
- Depth
- Elevation
- Speed
- Temperature
3. Terrestrial Sampling Techniques
a. Quadrat
- applies to a square sample unit or plot
- may be a single sample unit or be divided into subplots
- the richer the flora, the larger or more numerous the quadrats must be
b. Transect Line
- a cross section of an area
- used to relate changes in vegetation within it to changes in the environment
c. Point-Quarter
- most useful in sampling communities in which individuals are widely spaced or in which the dominant plants are large shrubs or trees
The classic distance method is the point-quarter method, which was developed by the first land surveyors in the U.S.A. in the nineteenth century. The four trees nearest to the corner of each section of land (1 sq. mile) were recorded in the first land surveys, and they form a valuable database on the composition of the forests in the eastern U.S. before much land had been converted to agriculture. The point-quarter technique has been a commonly used distance method in forestry. It was first used in plant ecology by Cottam et al. (1953) and Cottam and Curtis (1956). Figure 5.10 illustrates the technique. A series of random points is
selected often along a transect line with the constraint that points should not be so close that the same individual is measured at two successive points. The area around each random point is divided into four 90° quadrants and the distance to the nearest tree is measured in each of the four quadrants. Thus, 4 point-to-organism distances are generated at each random point, and this method is similar to measuring the distances from a random point to the 1st, 2nd, 3rd and 4th nearest neighbors.
Figure 5.10 Point-quarter method of density estimation. The area around each random point is subdivided into four 90° quadrants and the nearest organism to the random point is located in each quadrant. Thus four point-to-organism distances (blue arrows) are obtained at each random point. This method is commonly used on forest trees. Trees illustrate individual organisms.
Abundance/Species Richness ($S$)
- count of number of species occurring within the community
Relative Abundance/Species Evenness
$$RD_i = \frac{n_i}{N}$$
where
$RD_i$ = abundance of species $i$
$n_i$ = number of individuals of species $i$
$N$ = total number of individuals of all species
Rank-Abundance
Whittaker plot/Rank-Abundance Curve
- a 2D chart with the abundance rank (species ranked from most to least abundant) on the x-axis and relative abundance on the y-axis, expressed on a log10 scale
on Species Richness
- reflected by the greater length of the curve
on Species Evenness
- equitable distribution of individuals among species
- indicated by the more gradual slope of the curve
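To make the rank-abundance idea concrete, the sketch below computes species richness, relative abundance, and the abundance rank for a small set of counts. The species names and counts are invented purely for illustration, and `rank_abundance` is a hypothetical helper name; plotting is left out so the example needs only the standard library.

```python
import math


def rank_abundance(counts):
    """Return (rank, species, relative abundance) rows, most abundant species first."""
    total = sum(counts.values())  # N, total individuals of all species
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(rank, species, n / total)
            for rank, (species, n) in enumerate(ordered, start=1)]


# Hypothetical counts, not taken from any survey in these notes.
counts = {"sp_a": 50, "sp_b": 30, "sp_c": 15, "sp_d": 5}

richness = len(counts)          # S, the number of species
table = rank_abundance(counts)

for rank, species, rel in table:
    # log10 of the relative abundance is what the y-axis of a Whittaker plot shows
    print(rank, species, rel, round(math.log10(rel), 3))
```

Plotting rank against the log10 values gives the curve: more species lengthen it, and a more even community flattens its slope.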
Density
$$D_i = \frac{A_i}{area}$$
where $A_i$ = total number of individuals of species $i$
Relative Density
$$RD_i = \frac{D_i}{\text{total plant density}}$$
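A small numeric sketch of the two density formulas follows. It assumes the denominator of relative density is the summed density of all species in the sampled area; the species names, counts, and the 100 m² plot are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def densities(counts, area):
    """Density of each species: individuals counted divided by the sampled area."""
    return {species: n / area for species, n in counts.items()}


def relative_densities(density_by_species):
    """Each species' density as a fraction of the total density of all species."""
    total = sum(density_by_species.values())
    return {species: d / total for species, d in density_by_species.items()}


# Hypothetical tree counts from a 100 m^2 plot.
counts = {"species_1": 12, "species_2": 6, "species_3": 2}
area_m2 = 100

density = densities(counts, area_m2)        # roughly 0.12, 0.06, 0.02 per m^2
rel_density = relative_densities(density)   # roughly 0.6, 0.3, 0.1

for species in counts:
    print(species, round(density[species], 3), round(rel_density[species], 3))
```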
Diversity & Dominance
Simpson’s Index
$$D = \sum \left( \frac{n_i}{N} \right)^2$$
Simpson’s Index of Diversity $= 1 - D$
Simpson’s Diversity Index $= \frac{1}{D}$
where
$D$ = Simpson’s index
$n_i$ = number of individuals of species $i$
$N$ = total number of individuals of all species
The greater the value of D, the lower the diversity.
The greater the Simpson’s Index of Diversity, the greater the diversity.
A D value of 1 represents complete dominance, meaning only one species is present in the community.
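The three Simpson quantities can be checked on toy data with the sketch below. The counts are hypothetical and `simpson` is an illustrative helper name; the two cases contrast a perfectly even community with one that contains a single species.

```python
def simpson(counts):
    """Return (D, 1 - D, 1 / D) for a list of per-species individual counts."""
    total = sum(counts)                          # N
    d = sum((n / total) ** 2 for n in counts)    # D = sum of (n_i / N)^2
    return d, 1 - d, 1 / d


even_community = [10, 10, 10, 10]   # four species in equal numbers (hypothetical)
single_species = [40]               # only one species present (hypothetical)

print(simpson(even_community))   # (0.25, 0.75, 4.0)
print(simpson(single_species))   # (1.0, 0.0, 1.0)
```

The single-species case gives D = 1, matching the statement that a D of 1 represents complete dominance.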
Shannon-Wiener Index
$$p_i = \frac{n_i}{N}$$
$$H' = -\sum p_i \ln p_i$$
where
$p_i$ = proportion of individuals found in species $i$
$n_i$ = number of individuals in species $i$
$N$ = total number of individuals of all species
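A brief Python sketch of the index is given below, written with the conventional leading minus sign so that the result is never negative. The counts are hypothetical and `shannon_wiener` is an illustrative name; species with zero individuals are skipped because they contribute nothing to the sum.

```python
import math


def shannon_wiener(counts):
    """Return H' = -sum(p_i * ln p_i) for a list of per-species individual counts."""
    total = sum(counts)  # N
    proportions = [n / total for n in counts if n > 0]
    return -sum(p * math.log(p) for p in proportions)


even_community = [10, 10, 10, 10]   # four species in equal numbers (hypothetical)
single_species = [40]               # only one species present (hypothetical)

h_even = shannon_wiener(even_community)
h_single = shannon_wiener(single_species)

print(round(h_even, 4), round(math.log(4), 4))  # 1.3863 1.3863
print(h_single == 0)                            # True
```

The even community reaches ln S with S = 4, and the single-species community gives 0, the two limiting cases of the index.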
When only one species is present, the value of H' is 0.
When all species are present in equal numbers, the index reaches its maximum value, $H_{max} = \ln S$, where $S$ = total number of species.
*Relative Dominance
- absolute dominance of species i divided by the sum of dominance for all species
- usually done with trees
**Rank Dominance Frequency