
(1)

You have data! What’s next?

Data Analysis, Your Research Questions, and

Proposal Writing

Zoo 511 Spring 2014

(2)

Part 1: Research Questions

(3)


Write down > 2 things you thought were interesting or engaging during the field trip (can be a species, a habitat feature, a relationship, etc.). You can phrase these as questions, but you don’t have to yet.


(5)

Your questions should be specific and answerable

Does sculpin CPUE differ among geomorphic units?

Is brown trout density related to flow velocity?

In what kind of stream are brown trout most likely to be found?

What habitat do fish prefer?

(6)

[Figures: brown trout per m² versus current velocity (m/s); sculpin CPUE (sculpin per minute) by geomorphic unit (pool, run, riffle)]

…and statistically testable

Does sculpin CPUE differ among geomorphic units?

Is brown trout density related to flow velocity?

(7)

Part 2: Statistics

How do we find the answer to our question?

(8)

Why use statistics?

Are there more green sunfish in pools or runs?

Example counts (in the order extracted from the slide): Run 5, 4, 1; Pool 2, 7, 3, 12, 10

•  Statistics help us find patterns in the face of variation, and draw inferences beyond our sample sites

•  Statistics help us tell our story; they are not the story in themselves!

(9)

Statistics Vocab

(take notes on your worksheet)

Categorical Variable: Discrete groups, such as Type of Reach (Riffle, Run, Pool)

Continuous Variable: Measurements along a continuum, such as Flow Velocity

What type of variable is “Mottled Sculpin/meter²”?

(10)

Explanatory/Predictor Variable: Independent variable. On x-axis. The variable you use to predict another variable.

Response Variable: Dependent variable. On y-axis. The variable that is hypothesized to depend on/be predicted by the explanatory variable.

(11)

Mean: The most likely value of a random variable or set of observations if data are normally distributed (the average).

Variance: A measure of how far the observed values differ from the expected value (standard deviation is the square root of variance).

Normal distribution: A symmetrical probability distribution described by a mean and variance. An assumption of many standard statistical tests.

[Figure: normal curves labeled N(µ1, σ1), N(µ1, σ2), N(µ2, σ2)]
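The three vocabulary terms above can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the velocity values are hypothetical and chosen only for illustration. Note that `statistics.variance` returns the sample variance (dividing by n - 1).

```python
from statistics import mean, stdev, variance

# Hypothetical flow velocity measurements (m/s); not real course data.
velocities = [0.2, 0.4, 0.6, 0.8]

avg = mean(velocities)      # the average
var = variance(velocities)  # sample variance (divides by n - 1)
sd = stdev(velocities)      # standard deviation = square root of variance

print(f"mean = {avg:.3f}, variance = {var:.4f}, sd = {sd:.4f}")
```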

(12)

Hypothesis Testing: In statistics, we are always testing a Null Hypothesis (H0) against an alternate hypothesis (Ha).

p-value: The probability of observing our data or more extreme data assuming the null hypothesis is correct.

Statistical Significance: We reject the null hypothesis if the p-value is below a set value (α), usually 0.05.
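One way to make the p-value definition concrete is a permutation test: if the null hypothesis is correct, the group labels are interchangeable, so we can count how many relabelings give a difference at least as extreme as the one observed. This is a sketch of that idea, not one of the tests covered in these slides; the counts and the function name `permutation_p_value` are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Try every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical fish counts per site; not real course data.
pools = [8, 9, 10, 11]
runs = [1, 2, 3, 4]
alpha = 0.05

p = permutation_p_value(pools, runs)
reject_null = p < alpha
print(f"p = {p:.4f}, reject H0: {reject_null}")
```

With these made-up counts only 2 of the 70 possible relabelings are as extreme as the observed split, so the p-value falls below α = 0.05.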

(13)

What test do you need?

For our data, the response variable will probably be continuous.

T-test: A categorical explanatory variable with only 2 options.

ANOVA: A categorical explanatory variable with >2 options.
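The decision rule above, together with the regression case covered later in these slides (a continuous explanatory variable), can be written as a small lookup. This is only an illustrative sketch; the function name `choose_test` and its arguments are hypothetical, and it assumes a continuous response variable.

```python
def choose_test(explanatory_type, n_groups=None):
    """Suggest a test for a continuous response variable."""
    if explanatory_type == "continuous":
        return "simple linear regression"
    if explanatory_type == "categorical":
        if n_groups is None or n_groups < 2:
            raise ValueError("a categorical variable needs at least 2 groups")
        return "t-test" if n_groups == 2 else "ANOVA"
    raise ValueError("explanatory_type must be 'categorical' or 'continuous'")

print(choose_test("categorical", 2))   # two sites
print(choose_test("categorical", 3))   # riffle, run, pool
print(choose_test("continuous"))       # flow velocity
```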

(14)

Student’s T-Test

Tests the statistical significance of the difference between means from two independent samples

(15)

[Figure: Mottled Sculpin/m² at Cross Plains and Salmo Pond]

Compares the mean response between the 2 groups of a categorical variable
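As a rough sketch of what the t-test computes, the code below calculates Student's t statistic with a pooled variance (the equal-variance form) and its degrees of freedom. The densities are hypothetical. Turning t into a p-value requires the t distribution, which is not in Python's standard library, so in practice you would use a statistics package or a t table for that step.

```python
from math import sqrt
from statistics import mean, variance

def student_t(sample_a, sample_b):
    """Return (t, df) for Student's two-sample t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df

# Hypothetical sculpin densities (fish per m^2) at two sites.
site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0, 6.0]

t_stat, df = student_t(site_a, site_b)
print(f"t = {t_stat:.3f}, df = {df}")
```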

(16)

Analysis of Variance (ANOVA)

Tests the statistical significance of the difference between means from two or more independent groups

[Figure: Mottled Sculpin/m² by reach type (Riffle, Pool, Run)]

Null hypothesis: No difference between means
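The idea behind ANOVA can be sketched as a ratio: variation between group means divided by variation within groups. The code below computes the one-way ANOVA F statistic by hand under that definition. The densities and the function name `anova_f` are hypothetical; converting F to a p-value needs the F distribution from a statistics package.

```python
from statistics import mean

def anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    grand_mean = mean(all_values)
    # Between-group sum of squares: group means versus the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations versus their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical sculpin densities (fish per m^2) by reach type.
riffle = [1.0, 2.0, 3.0]
pool = [2.0, 3.0, 4.0]
run = [5.0, 6.0, 7.0]

f_stat, df_b, df_w = anova_f([riffle, pool, run])
print(f"F = {f_stat:.2f} on {df_b} and {df_w} degrees of freedom")
```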

(17)

Precautions and Limitations

•  Meet Assumptions

•  Samples are independent

•  Assumed equal variance (this assumption can be relaxed)

[Figure: variance not equal]


(19)

Precautions and Limitations

•  Meet Assumptions

•  Samples are independent

•  Assumed equal variance (this assumption can be relaxed)

•  Observations from data with a normal distribution (test with histogram)
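A quick way to look at the shape of your data is a text histogram. The sketch below bins values into equal-width bins and prints one row of marks per bin; a roughly symmetric, single-peaked shape is consistent with a normal distribution. The data values, bin count, and function name `histogram_counts` are assumptions for illustration.

```python
def histogram_counts(values, n_bins=5):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values identical: everything lands in the first bin.
        return [len(values)] + [0] * (n_bins - 1)
    counts = [0] * n_bins
    for v in values:
        # The maximum value belongs in the last bin, not a new one.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts

# Hypothetical measurements; not real course data.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
counts = histogram_counts(data, n_bins=5)
for c in counts:
    print("#" * c)
```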

(20)

Simple Linear Regression

•  Analyzes relationship between two continuous variables: predictor and response

•  Null hypothesis: there is no relationship (slope=0)

(21)

Residuals

Least squares line (regression line: y = mx + b)

(22)

Residuals

Residuals are the vertical distances from observed points to the best-fit line

Residuals always sum to zero

Regression chooses the best-fit line to minimize the sum of squared residuals. It is called the Least Squares Line.
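The least squares line y = mx + b has a closed-form solution, sketched below with hypothetical data: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the line passes through the point of means. The code also checks that the residuals sum to zero (up to floating-point rounding).

```python
from statistics import mean

def least_squares_line(x, y):
    """Return (slope, intercept) of the least squares line y = m*x + b."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # line passes through (x_bar, y_bar)
    return slope, intercept

# Hypothetical data: predictor x and response y.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

m, b = least_squares_line(x, y)
# Residual = observed y minus the y predicted by the line.
residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
print(f"y = {m:.2f}x + {b:.2f}; sum of residuals = {sum(residuals):.6f}")
```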

(23)

Precautions and Limitations

•  Meet Assumptions

•  Relationship is linear (not exponential, quadratic, etc.)

•  X is measured without error

•  Y values are measured independently

(24)
(25)

Residual Plots Can Help Test Assumptions

[Figure: three residual plots centered on 0. “Normal” scatter; fan shape: unequal variance; curve: violates linearity]

(26)

If assumptions are violated

•  Try transforming data (log transformation, square root transformation)

•  Most of these tests are robust to violations of assumptions of normality and equal variance (only be concerned if obvious problems exist)

•  Diagnostics (residual plots, histograms) should NOT be reported in your paper. Stating that assumptions were tested is sufficient.
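The two transformations named above are one-liners. This sketch uses a base-10 log; the `offset` parameter is an assumption added here because the log of zero is undefined, and adding a small constant before taking the log is one common workaround for count data that include zeros. The data values are hypothetical.

```python
from math import log10, sqrt

def log_transform(values, offset=0.0):
    """Base-10 log of each value; offset is added first (use > 0 if zeros occur)."""
    return [log10(v + offset) for v in values]

def sqrt_transform(values):
    """Square root of each value (values must be non-negative)."""
    return [sqrt(v) for v in values]

# Hypothetical right-skewed data spanning several orders of magnitude.
raw = [1.0, 10.0, 100.0]
logged = log_transform(raw)
rooted = sqrt_transform([4.0, 9.0])
print(logged, rooted)
```

After transforming, rerun the test and the diagnostics on the transformed values.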

(27)

Precautions and Limitations

•  Meet Assumptions

•  Relationship is linear (not exponential, quadratic, etc.)

•  X is measured without error

•  Y values are measured independently

•  Normal distribution of residuals

(28)
(29)

P-value: probability of observing your data (or more extreme data) if no relationship existed

- Indicates the strength of the evidence for a relationship; tells you whether your slope (i.e. relationship) is distinguishable from zero

R-Squared: indicates the proportion of variance in the response variable that is explained by the explanatory variable
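R-squared can be computed as 1 minus the ratio of unexplained variation (sum of squared residuals) to total variation in the response. The sketch below fits a least squares line to hypothetical data and applies that formula; the function name `r_squared` is illustrative.

```python
from statistics import mean

def r_squared(x, y):
    """Proportion of variance in y explained by a least squares line on x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    # Unexplained variation: squared distances from points to the line.
    ss_residual = sum((yi - (slope * xi + intercept)) ** 2
                      for xi, yi in zip(x, y))
    # Total variation: squared distances from points to the mean of y.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_residual / ss_total

# Hypothetical data: predictor x and response y.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

r2 = r_squared(x, y)
print(f"R-squared = {r2:.3f}")
```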

(30)

R-Squared and P-value

High R-Squared

(31)

R-Squared and P-value

Low R-Squared


(34)

We just talked about:

•  Types of variables

•  3 statistical tests: t-test, ANOVA, linear regression

•  When to use these tests

•  How to interpret the test statistics

•  How to be sure you’re meeting assumptions of the tests

(35)

Part 3: Proposal

(36)

Writing a Proposal

•  What is the function of a proposal?

(37)

Writing a Proposal

•  What is the function of a proposal?

•  What information should go in a proposal?

– Research goals/objectives/hypotheses/questions

– Why does this matter? (Rationale)

– Procedure / Methods

– Future directions / implications

– Budget/cost analysis

(38)

Other data you can use

Previous years’ data on website: all of the same information was collected from the same place, around the same time of year. Replication!

USGS: http://waterdata.usgs.gov/nwis/uv?05435943

Background info: from the Upper Sugar River Watershed Association

Think about these data sources as you generate your questions.
