You have data! What’s next?
Data Analysis, Your Research Questions, and
Proposal Writing
Zoo 511 Spring 2014
Part 1: Research Questions
Write down > 2 things you thought were interesting or engaging during the field trip (can be a species, a habitat feature, a relationship, etc). You can phrase these as questions, but you don’t have to yet.
Your questions should be specific and answerable
Does sculpin CPUE differ among geomorphic units?
Is brown trout density related to flow velocity?
In what kind of stream are brown trout most likely to be found?
What habitat do fish prefer?
[Figures: brown trout per m² vs. current velocity (m/s); sculpin CPUE (sculpin per minute) in pool, run, and riffle]
…and statistically testable
Does sculpin CPUE differ among geomorphic units?
Is brown trout density related to flow velocity?
Part 2: Statistics
How do we find the answer to our question?
Why use statistics?
Are there more green sunfish in pools or runs?
Run: 5, 4, 1 (total 10)
Pool: 2, 7, 3 (total 12)
• Statistics help us find patterns in the face of variation, and draw inferences beyond our sample sites
• Statistics help us tell our story; they are not the story in themselves!
Statistics Vocab
(take notes on your worksheet)
Categorical Variable: Discrete groups, such as Type of Reach (Riffle, Run, Pool)
Continuous Variable: Measurements along a continuum, such as Flow Velocity
What type of variable is “Mottled Sculpin/meter²”?
Explanatory/Predictor Variable: Independent variable. On x-axis. The variable you use to predict another variable.
Response Variable: Dependent variable. On y-axis. The variable that is hypothesized to depend on/be predicted by the explanatory variable.
Mean: The average of a set of observations. If data are normally distributed, it is also the most likely value.
Variance: A measure of how far the observed values differ from the expected value (the mean). Standard deviation is the square root of variance.
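To make these definitions concrete, here is a minimal Python sketch that computes the mean, sample variance, and standard deviation by hand and checks them against the standard library. The counts are made-up illustrative values, not field data.

```python
import math
import statistics

# Hypothetical sculpin counts at five sites (illustrative only, not field data).
counts = [2, 4, 6, 8, 10]

n = len(counts)
mean = sum(counts) / n

# Sample variance: sum of squared distances from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in counts) / (n - 1)

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

# The standard library gives the same answers.
assert math.isclose(mean, statistics.mean(counts))
assert math.isclose(variance, statistics.variance(counts))
assert math.isclose(std_dev, statistics.stdev(counts))

print(mean, variance, round(std_dev, 3))
```

Dividing by n - 1 rather than n gives the sample variance, which is what most statistical tests use.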
Normal distribution: a symmetrical probability distribution described by a mean and variance. An assumption of many standard statistical tests.
[Figure: normal curves N(µ1, σ1), N(µ1, σ2), and N(µ2, σ2)]
Hypothesis Testing: In statistics, we are always testing a Null Hypothesis (Ho) against an alternative hypothesis (Ha).
p-value: The probability of observing our data or more extreme data assuming the null hypothesis is correct
Statistical Significance: We reject the null hypothesis if the p-value is below a set value (α), usually 0.05.
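One way to see what a p-value means is to compute one directly. The sketch below uses an exact permutation test, which is not one of the tests covered in this course but follows the p-value definition literally. It assumes the green sunfish counts from earlier can be read as three samples per habitat (run: 5, 4, 1; pool: 2, 7, 3); that reading is an assumption.

```python
from fractions import Fraction
from itertools import combinations

# Assumed reading of the green sunfish counts: three samples per habitat.
run = [5, 4, 1]
pool = [2, 7, 3]


def mean(values):
    # Exact arithmetic avoids floating-point ties when comparing differences.
    return Fraction(sum(values), len(values))


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = group_a + group_b
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Under the null hypothesis the habitat labels are interchangeable,
    # so every way of relabelling the samples is equally likely.
    for chosen in combinations(indices, len(group_a)):
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed:
            extreme += 1
    return Fraction(extreme, total)


p_value = permutation_p_value(pool, run)
alpha = Fraction(5, 100)
print(float(p_value), "reject Ho" if p_value < alpha else "fail to reject Ho")
```

The p-value is the fraction of relabellings that give a difference at least as large as the one observed. If it is not below α, we fail to reject the null hypothesis.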
What test do you need?
For our data, the response variable will probably be continuous.
T-test: A categorical explanatory variable with only 2 options.
ANOVA: A categorical explanatory variable with >2 options.
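The choice of test can be written as a small decision rule. This is a sketch only: `choose_test` is a hypothetical helper, and the "continuous" branch draws on the third test covered in this course (linear regression, for a continuous explanatory variable).

```python
def choose_test(explanatory_type, n_groups=None):
    """Suggest a test for a continuous response variable.

    Hypothetical helper: explanatory_type is "categorical" or "continuous";
    n_groups is the number of categories when the variable is categorical.
    """
    if explanatory_type == "continuous":
        return "linear regression"
    if explanatory_type == "categorical":
        if n_groups is None or n_groups < 2:
            raise ValueError("a categorical variable needs at least 2 groups")
        return "t-test" if n_groups == 2 else "ANOVA"
    raise ValueError(f"unknown variable type: {explanatory_type!r}")


# Reach type has 3 categories (riffle, run, pool); flow velocity is continuous.
print(choose_test("categorical", 3))
print(choose_test("categorical", 2))
print(choose_test("continuous"))
```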
Student’s T-Test
Tests the statistical significance of the difference between means from two independent samples
[Figure: Mottled Sculpin/m² at Cross Plains vs. Salmo Pond]
Compares the mean of the response variable between the 2 groups of a categorical variable
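As a rough illustration of what the t-test computes, the sketch below calculates Student's t statistic with a pooled (equal) variance using only the Python standard library. The densities are made-up values, `pooled_t_statistic` is a hypothetical helper name, and the critical value is the standard table value for 4 degrees of freedom. In practice a statistics package would report the p-value directly.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Student's t statistic for two independent samples, assuming equal variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df


# Hypothetical sculpin densities (per m2) at two sites; not field data.
site_a = [2.0, 4.0, 6.0]
site_b = [5.0, 7.0, 9.0]

t_stat, df = pooled_t_statistic(site_a, site_b)

# Two-sided critical value for alpha = 0.05 with 4 degrees of freedom,
# taken from a standard t table.
T_CRITICAL_DF4 = 2.776

print(round(t_stat, 3), df)
print("reject Ho" if abs(t_stat) > T_CRITICAL_DF4 else "fail to reject Ho")
```

With only three samples per site, even a sizeable difference in means may not be statistically significant.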
Analysis of Variance (ANOVA)
Tests the statistical significance of the difference between means from two or more independent groups
[Figure: Mottled Sculpin/m² in riffle, pool, and run]
Null hypothesis: No difference between means
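The sketch below shows, under made-up data, how a one-way ANOVA compares variation between group means with variation within groups to get an F statistic. `one_way_anova_f` is a hypothetical helper, and the critical value is the standard table value for F(2, 6) at α = 0.05.

```python
import statistics


def one_way_anova_f(groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Within-group sum of squares: scatter of observations around their own group mean.
    ss_within = sum(
        (x - statistics.mean(group)) ** 2 for group in groups for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical sculpin densities (per m2) by geomorphic unit; not field data.
riffle = [4.0, 5.0, 6.0]
pool = [1.0, 2.0, 3.0]
run = [2.0, 3.0, 4.0]

f_stat, df_between, df_within = one_way_anova_f([riffle, pool, run])

# Critical value of F(2, 6) at alpha = 0.05, taken from a standard F table.
F_CRITICAL_2_6 = 5.14

print(round(f_stat, 2), df_between, df_within)
print("reject Ho" if f_stat > F_CRITICAL_2_6 else "fail to reject Ho")
```

A large F means the group means differ by more than the scatter within groups would explain. It does not say which groups differ from each other.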
Precautions and Limitations
• Meet Assumptions
• Samples are independent
• Assumed equal variance (this assumption can be relaxed)
[Figure: example where variance is not equal]
• Observations from data with a normal distribution (test with histogram)
Simple Linear Regression
• Analyzes relationship between two continuous variables: predictor and response
• Null hypothesis: there is no relationship (slope=0)
Least-squares line (regression line: y = mx + b)
Residuals
Residuals are the vertical distances from observed points to the best-fit line
Residuals from the least-squares line always sum to zero
Regression chooses the best-fit line to minimize the sum of squared residuals. It is called the Least Squares Line.
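A minimal sketch of how the least-squares line is found, using the standard formulas for slope and intercept. The data are made up, and `fit_least_squares` is a hypothetical helper name. The residuals are then computed to check that they sum to zero.

```python
def fit_least_squares(x, y):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical predictor and response values; not field data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 7.0, 8.0, 9.0]

m, b = fit_least_squares(x, y)

# Residual = observed y minus the y predicted by the line.
residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
sum_of_squares = sum(r ** 2 for r in residuals)

print(round(m, 3), round(b, 3))
print(round(sum(residuals), 10), round(sum_of_squares, 3))
```

Any other slope or intercept would give a larger sum of squared residuals for these points.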
Precautions and Limitations
• Meet Assumptions
• Relationship is linear (not exponential, quadratic, etc)
• X is measured without error
• Y values are measured independently
• Normal distribution of residuals
Residual Plots Can Help Test Assumptions
[Figure: three residual plots around zero: “Normal” scatter; fan shape (unequal variance); curve (non-linearity)]
If assumptions are violated
• Try transforming data (log transformation, square root transformation)
• Most of these tests are robust to violations of assumptions of normality and equal variance (only be concerned if obvious problems exist)
• Diagnostics (residual plots, histograms) should NOT be reported in your paper. Stating that assumptions were tested is sufficient.
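When the spread of the data grows with the mean, a log transformation is one common remedy for unequal variance. The sketch below uses made-up counts and a hypothetical `variance_ratio` helper to compare the variances of two groups before and after taking logs.

```python
import math
import statistics

# Hypothetical counts from two habitats where spread grows with the mean;
# not field data.
low_density = [1.0, 2.0, 4.0]
high_density = [10.0, 20.0, 40.0]


def variance_ratio(group_a, group_b):
    """Larger sample variance divided by the smaller one (1.0 means equal)."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    return max(var_a, var_b) / min(var_a, var_b)


raw_ratio = variance_ratio(low_density, high_density)

# Log-transform every observation, then compare the variances again.
log_ratio = variance_ratio(
    [math.log(v) for v in low_density],
    [math.log(v) for v in high_density],
)

print(round(raw_ratio, 2), round(log_ratio, 2))
```

The logarithm is undefined at zero, so data containing zero counts need a different approach (for example, adding a constant before taking logs).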
P-value: probability of observing your data (or more extreme data) if no relationship existed
- Indicates the strength of the evidence that your slope (i.e. relationship) is non-zero (i.e. real); it does not measure how strong the relationship is
R-Squared: indicates the proportion of the variance in the response variable that is explained by the explanatory variable
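For a regression with a single predictor, R-squared equals the squared correlation between the two variables, so it can be computed without fitting the line first. The sketch below uses made-up data and a hypothetical `r_squared` helper to compare a tight relationship with a noisy one.

```python
def r_squared(x, y):
    """R-squared of a simple linear regression of y on x.

    With one predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy ** 2 / (s_xx * s_yy)


# Hypothetical data: the same predictor with a tight and a noisy response.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
tight_y = [2.0, 4.0, 7.0, 8.0, 9.0]
noisy_y = [2.0, 4.0, 5.0, 4.0, 5.0]

print(round(r_squared(x, tight_y), 3))
print(round(r_squared(x, noisy_y), 3))
```

The tight response gives an R-squared close to 1. The noisy one leaves much more of the variance unexplained.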
R-Squared and P-value
[Figures: example regressions with high R-squared and with low R-squared]
We just talked about:
• Types of variables
• 3 statistical tests: t-test, ANOVA, linear regression
• When to use these tests
• How to interpret the test statistics
• How to be sure you’re meeting assumptions of the tests
Part 3: Proposal
Writing a Proposal
• What is the function of a proposal?
• What information should go in a proposal?
– Research goals/objectives/hypotheses/questions
– Why does this matter? (Rationale)
– Procedure / Methods
– Future directions / implications
– Budget/cost analysis
Other data you can use
Previous years’ data on website: all of the same information was collected from the same place,
around the same time of year. Replication!
USGS: http://waterdata.usgs.gov/nwis/uv?05435943
Background info: from the Upper Sugar River Watershed Association
Think about these data sources as you generate your questions.