Unconscious Selection on Beans and an Introduction to Hypothesis Testing BIOE 140 Fall 2011


As you read in your lab reading (Heiser 1988), unconscious selection can be defined as “selection resulting from human activities not involving a deliberate attempt to change the organism.” Specifically, Heiser argues that such unconscious selection, rather than intentional artificial (a.k.a. methodical) selection, may be responsible for many of the differences between domesticated plants and their wild ancestors. In this kind of selection, humans act as an “agent of selection” on other organisms, in much the same way that other aspects of those organisms’ biotic or abiotic environments do. The purpose of this lab is for you to act as an agent of unconscious selection on beans, in order to directly observe how selection sorts existing variation and can thus result in gradual change in the mean value of traits within a population. You’ll also develop null and alternative hypotheses and learn about the experimental and statistical methods scientists use to differentiate among them.

Lab Procedure

1. Form groups of 4-5 students. Each group should include at least one computer equipped with a spreadsheet or statistical analysis program (Microsoft Excel, JMP, Open Office, etc.).

2. Obtain a plastic baggie containing 200 beans, as well as 20 plastic cups. Count the beans to make sure you have 200. If you’re missing any, get a few extra beans from the front of the class.

3. Designate one person in the group as the “bean picker” (a.k.a. “agent of selection”). The bean picker will choose 20 lots of 10 beans each, one lot at a time, and place each lot into its own plastic cup. Keep track of the order of the cups! Other members of the group will weigh the beans on the electronic balance at the front of the room and record the weight of each lot of beans.

First, however, you should develop a null hypothesis and two alternative hypotheses about how the weights of the lots of beans might be related to lot number. See the “Stats Primer” included with the materials for this lab for a more detailed explanation of null and alternative hypotheses, or ask your TA.

Null hypothesis (the pattern you would predict if the bean picker chose beans randomly with respect to weight): ______________________________________

__________________________________________________________________

Alternative hypothesis #1: ____________________________________________

__________________________________________________________________

Alternative hypothesis #2: ____________________________________________

__________________________________________________________________


Data Analysis

Once you’ve collected your data, you’ll perform a linear regression of lot weight on lot number. This section of the handout explains how to graph your results, perform the regression, and test your regression for significance in Microsoft Excel. If you have previous experience with statistics and would prefer to use another program, such as JMP, Systat, or R, you may do so.

1. Format your data into two columns, one containing lot numbers 1-20, and the other containing the total weight of the beans in each lot. See the example below:

Lot #    Lot Weight
1        5.4
2        5.3
3        5.5

2. Highlight your columns of data and choose “Insert Chart.” Depending on which version of Excel you have, this may be a menu option or a button.

3. From the various chart types available, choose “XY (Scatter),” (the one without lines connecting the points).

4. The next few steps likely depend on which version of Excel you have, but the end result should be a graph similar to the one below, with the axes labeled. Ask your TA or another group member for help if you’re not familiar with the process of making a scatter plot on your computer.


5. Perform the regression. Once you have your graph, right-click (control-click on a Mac) on any of the data points. Click “Add Trendline” on the menu that pops up. On the “Type” page, choose “Linear.” Click the “Options” button, and put check marks in the boxes next to “Display equation on chart” and “Display r-squared value on chart.” You should get something that looks like the example below (but with more data points, of course):

6. The equation you see on the chart is the result of the linear regression you’ve performed. A linear regression analyzes how a dependent variable (y) varies as a function of an independent variable (x) and provides an equation of the form y = mx + b that relates the two (here, m represents the slope of the line, and b represents the y-intercept). The R² value is a measure of how “tight” the relationship between the variables is. It tells you what proportion of the variation in the values of y is explained by variation in the values of x (multiply by 100 to express it as a percentage).
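If you would like to see what the trendline is doing behind the scenes, here is a minimal Python sketch of an ordinary least-squares fit. It is only an illustration, not a description of how Excel computes its trendline internally. The function name `linear_regression` is made up for this example, and the lot weights are invented values, not real measurements.

```python
# Illustrative data only: these lot weights are invented, not real measurements.
lot_numbers = [1, 2, 3, 4, 5]
lot_weights = [5.4, 5.3, 5.5, 5.6, 5.8]


def linear_regression(x, y):
    """Return slope m, intercept b, and R^2 for the least-squares line y = m*x + b."""
    n = len(x)
    x_avg = sum(x) / n
    y_avg = sum(y) / n

    # Slope: how x and y vary together, divided by how much x varies on its own.
    sxy = sum((xi - x_avg) * (yi - y_avg) for xi, yi in zip(x, y))
    sxx = sum((xi - x_avg) ** 2 for xi in x)
    m = sxy / sxx

    # The fitted line always passes through the point (x_avg, y_avg).
    b = y_avg - m * x_avg

    # R^2 = 1 - (unexplained variation in y) / (total variation in y).
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_avg) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return m, b, r_squared


m, b, r_squared = linear_regression(lot_numbers, lot_weights)
print(f"y = {m:.4f}x + {b:.4f}, R^2 = {r_squared:.4f}")
```

With your own data, replace the two lists with your 20 lot numbers and 20 lot weights. The slope and intercept should agree with the equation Excel displays on your chart.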

9. Test your regression for significance. This handout will explain how to do the test by hand, but in reality, almost all researchers use statistics programs for these sorts of tests.

Even a simple test like this is rather lengthy to do by hand. If you have and know how to use a statistics program, you’re welcome to do so, and the process will probably be much faster. If not, never fear! Just pay close attention to the equations below and ask your TA for help if necessary.

In order to calculate a P-value for your data (see the Stats Primer for an explanation), you first need to calculate a test statistic called t (if you want to understand more of the theory behind this analysis, ask Beth or your TA in office hours). The formula for t is as follows:

t = m / SEm

where m = the slope of the linear regression equation and SEm = the standard error of the slope.

You can get m from the equation on the chart you made in step 5. SEm is calculated as follows:


SEm = √[∑(yobs-ypred)² / (n-2)] / √[∑(xobs-xavg)²]

Here, yobs = your observed values of y (i.e., the lot weights), ypred = the value of y predicted by the linear regression equation, n = the sample size (20, in this case), xobs = your observed values of x (i.e., the lot numbers), and xavg = the average value of x (you can calculate this using the spreadsheet, a calculator, or just by hand, as you prefer).

10. You can use the same Excel spreadsheet you used to enter your original data to calculate SEm and t. The easiest way is probably to create one column for the predicted values of y, one column for the numerator of the equation, and one column for the denominator. In the ypred column, enter the following Excel formula, using the values of m and b from the equation on your chart: =m*(cell with first xobs) + b. You should be able to drag this formula down to the rest of the cells in the column. Similarly, enter the following formula in the numerator column: =((cell with first yobs - cell with first ypred)^2)/18. (Use parentheses, not square brackets; Excel does not accept square brackets for grouping.) Drag the formula down to the rest of the cells in the column. In the denominator column, enter this formula: =(cell with first xobs - xavg)^2. Drag the formula down to the rest of the cells in the column.

11. Calculate the sums of the numerator and denominator columns with the following formulae: =SUM(first numerator column cell:last numerator column cell) and =SUM(first denominator column cell:last denominator column cell).

12. Take the square roots of the summation cells you created above with the following formulae: =(numerator sum cell)^0.5 and =(denominator sum cell)^0.5. Divide the numerator square root cell by the denominator square root cell. This value is SEm. Congratulations!
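For comparison, the spreadsheet columns described in steps 10-12 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a required part of the lab: the function names `fit_line` and `slope_standard_error` are hypothetical, the data are invented, and only 10 lots are used to keep the example short (your analysis has n = 20, so n - 2 = 18).

```python
import math

# Illustrative data only: invented weights for 10 lots (the lab itself uses 20).
lot_numbers = list(range(1, 11))
lot_weights = [5.4, 5.3, 5.5, 5.2, 5.6, 5.5, 5.7, 5.6, 5.9, 5.8]


def fit_line(x, y):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(x)
    x_avg = sum(x) / n
    y_avg = sum(y) / n
    m = (sum((xi - x_avg) * (yi - y_avg) for xi, yi in zip(x, y))
         / sum((xi - x_avg) ** 2 for xi in x))
    return m, y_avg - m * x_avg


def slope_standard_error(x, y, m, b):
    """SEm = sqrt(sum((yobs - ypred)^2) / (n - 2)) / sqrt(sum((xobs - xavg)^2))."""
    n = len(x)
    x_avg = sum(x) / n
    # "Numerator column": squared residuals, each divided by n - 2, then summed.
    numerator_sum = sum((yi - (m * xi + b)) ** 2 / (n - 2) for xi, yi in zip(x, y))
    # "Denominator column": squared deviations of x from its average, then summed.
    denominator_sum = sum((xi - x_avg) ** 2 for xi in x)
    return math.sqrt(numerator_sum) / math.sqrt(denominator_sum)


m, b = fit_line(lot_numbers, lot_weights)
se_m = slope_standard_error(lot_numbers, lot_weights, m, b)
t = m / se_m
print(f"m = {m:.4f}, SEm = {se_m:.4f}, t = {t:.4f}")
```

Each generator expression plays the role of one spreadsheet column, and each `sum(...)` plays the role of the =SUM cell in step 11.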

13. Calculate t using the formula in step 9. To calculate P in Excel, enter the formula =TDIST(ABS(value of t),18,2). The ABS function is needed because TDIST returns an error for negative values of t, which you will get if your slope is negative. The 18 is the number of degrees of freedom you have in your analysis (it’s equal to n-2 for this type of test), and the 2 is the number of “tails” on the test. There are two tails because we originally considered two alternative hypotheses.

Again, if you want a more detailed explanation of the theory behind this test, ask Beth or your TA in office hours.
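If you are curious where the P-value comes from, the sketch below computes a two-tailed P from t and the degrees of freedom using only Python's standard library. It numerically integrates the Student's t probability density with Simpson's rule, which is an approach chosen here for illustration and not necessarily how Excel evaluates TDIST. The function names `t_pdf` and `two_tailed_p` and the example value of t are hypothetical.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_value, df, steps=2000):
    """Two-tailed P: the probability of a t at least as extreme as t_value, in either direction."""
    upper = abs(t_value)  # the sign of t does not matter for a two-tailed test
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area under the curve between 0 and |t|
    # The curve is symmetric, so the two tails together hold 1 - 2 * central_area.
    return max(0.0, 1 - 2 * central_area)


t_value = 2.5  # hypothetical value of t, for illustration only
p = two_tailed_p(t_value, df=18)
print(f"t = {t_value}, df = 18, two-tailed P = {p:.4f}")
```

Taking the absolute value of t inside the function is what makes the result the same for positive and negative slopes.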

14. Write your R² and P values on the board, so they can be compared with those of the rest of the class.


Lab Report

Your lab reports should be shorter than 3 pages (including graphs) and will be due in next week’s section. Include the following sections in your lab report:

Introduction: Where you discuss your hypotheses and the broader context for your experiment. You should begin your intro with the big concepts (selection in general, artificial selection, unconscious selection, etc.) and work down to the small concepts (e.g., your specific hypotheses). Generally your intro should explain how this simple experiment fits into the context of research on selection.

Methods: Where you briefly explain what you did. Only include enough detail so that another researcher could repeat this experiment. Methods sections should NOT be written like an instruction manual (“Do this”), but rather as a description of the experiment you performed (“We did this,” or “This was done”).

Results: Should describe the patterns you saw. Refer to your figures (graphs) and report the results of your statistical analyses. Describe any statistically significant patterns, but DO NOT interpret your results (i.e., speculate about their meaning) here. Save your interpretations for the Discussion section.

Discussion: Includes a consideration of your hypotheses and proposed mechanisms. Which of your hypotheses are your results most consistent with? What sort of implications does this simple experiment have for the origins of agriculture and cultivated plants? How does it illustrate the mechanism by which selection works? Refer to the paper you read before lab and cite it in the text using parenthetical citations (e.g., Heiser 1988).

Literature Cited: At minimum, you should cite the paper you read before the lab. Use the following format:

Author. Year. Title. Journal Name Volume(Issue): pages.

For example:

Heiser, C. 1988. Aspects of unconscious selection and the evolution of domesticated plants. Euphytica 37(1): 77-81.

If you decide to incorporate other sources and cite them in the text of your lab report, you should also include them in the literature cited. It’s not necessary to cite any of Beth’s or the TAs’ lectures for this lab report, however.