Modeling Count Data from Hawk Migrations

M.S. Plan B Project Report

January 12, 2011

Fengying Miao

M.S. Applied and Computational Mathematics Candidate

Dr. Ronald Regal

Advisor

University of Minnesota Duluth

Table of Contents 

i. ACKNOWLEDGEMENTS
ii. ABSTRACT
1 INTRODUCTION
2 HAWK EXAMPLE
3 THE GENERAL AND GENERALIZED LINEAR MODELS
  3.1 General Linear Model
  3.2 Exponential Family of Distributions
  3.3 Generalized Linear Models
4 DO NOT LOG-TRANSFORM COUNT DATA
5 COMPARISON OF ESTIMATION METHODS USING THE DELTA METHOD
  5.1 Expected Values and Variances of Nonlinear Functions of Random Variables
  5.2 Single Mean
    5.2.1 Poisson for single mean
    5.2.2 Log-normal for Single Mean
  5.3 Two Means
    5.3.1 Poisson for two means
    5.3.2 Log-normal for two means
  5.4 Alternative Nonlinear Models
6 FURTHER COMPARISONS OF MODELS
  6.1 Single Mean
    6.1.1 Exact calculation for a single mean
    6.1.2 Specific example of comparing exact calculation and
  6.2 Two Means
    6.2.1 Exact calculation for difference between means
    6.2.2 Comparisons of Models for Two Means by Doing Simulation
    6.2.3 Regression
7 FITTING MODELS TO HAWK DATA
  7.1 Simple introduction to some potential variables
  7.2 Fitting Models
    7.2.1 Fit Mixed Model to Data
  7.3 Summary of Findings
8 CONCLUSION
9 REFERENCES
10 APPENDICES
  10.1 SAS Code
  10.2 R code

i. ACKNOWLEDGEMENTS

I would like to take this opportunity to give my sincere thanks to my advisor, Dr. Ronald Regal,
for his great support and guidance, which made it possible for me to finish the project. Dr.
Ronald Regal is the best advisor I have ever had. I will never forget his great help in my study
and life, or his advice that finding the limits of our knowledge and understanding is
always important.

I also want to give my thanks to Dr. Richard Green and Dr. Gerald Niemi for being on my

degree committee, reviewing my report and providing useful suggestions.

I also thank Heidi Seeland for providing the datasets in this project.

Thanks to Dr. Zhuangyi Liu for accepting me into this good program and letting me have the

ii. ABSTRACT

The General Linear Model (LM), with its assumptions of independence, linearity and equal variance,
underlies most statistical analyses. Because of its generality, many kinds of data are transformed
to satisfy its assumptions. Count data are often log-transformed using Ln(Y+1) to more
nearly match the assumptions. However, adding a value of one to counts might generate biases,
so we need to choose a proper model for count data. In addition, E(Ln(X)) is not the same as
Ln(E(X)), so even if the relationship is linear for Ln(E(X)), the same will not be exactly true for
E(Ln(X)). To avoid or reduce the bias from transforming data, the Generalized Linear Model
(GLM) and the nonlinear mixed (NLMIXED) model could be considered instead. This report
investigates how LM regression models, Generalized Linear Models based on Poisson and
negative binomial distributions, and approximate nonlinear models fit with NLMIXED
compare when estimating the slope of a linear trend in count data. The model comparisons
are implemented in the statistical software SAS using the procedures PROC
REG, PROC GENMOD, PROC MIXED and PROC NLMIXED. A real data set from a hawk

migration is analyzed and fitted with the mixed model. The NLMIXED model is used to analyze

the variances and means of .

1 INTRODUCTION 

A statistical model is used to predict the probabilistic future behavior of a system from data. The
main purpose of model building is to obtain estimates with small bias and little variability.
The traditional linear model (LM) has been widely used, since many data can be modeled this way and
a large body of theory is available. Transformations such as the square-root
and log transformations are often applied to data, usually the response variable,
to meet the assumptions of LM. These methods might work well for continuous response
variables and certain discrete variables, such as count data with few zero observations, but zero
observations rule out direct log transformation. For example, in a study where migrating hawks are counted
hourly, the numbers counted are often zero.

More and more methods and models have been explored to relax these
assumptions. The Generalized Linear Model (GLM), an extension of LM, allows the analyst to
specify the distribution of the data, which addresses the problem of transforming data to be normally
distributed. The NLMIXED model, in which both fixed and random effects are allowed to have
nonlinear relationships with the response variable, has become increasingly popular and allows
flexible nonlinear functions as well as user-specified likelihood functions. These newer
models can be applied to a wider range of real problems. Current statistical
software has kept pace with these numerical methods, making them more widely applicable.

To get the best estimates of a response variable for a particular system, it is important to fit

a model that describes the data well. In this report I describe my investigations into finding

appropriate models to analyze count data such as hawk migration counts. Model selection from

negative binomial distributions. Relative bias and relative RMSE are used to evaluate how well

the models work. The relationship between the variance and the mean of the real data is modeled with

the NLMIXED model, which, however, cannot handle the complicated random effects in the real data studied

here. A mixed model fit with SAS PROC MIXED is used for the real hawk data to account for

2 HAWK EXAMPLE 

A data set from monitoring of migrating hawks is used to illustrate the issues and conclusions in
this report. The data were collected in the fall of 2008 by Heidi Seeland and Anna Peterson, graduate
students at UMD with Dr. Gerald Niemi as their advisor. In this section, the structure of the hawk
data is described first. Further details on fitting models to the data are discussed in later
chapters. The data set contains counts of hawks and eagles at three distances from the shore of
Lake Superior, over seven hours on certain days between August 29, 2008, and November 11,
2008. The sampling plan had eight sets of three sampling locations spread out along the north
shore of Lake Superior. The eight sets of three sampling sites were called transects and were
numbered from 1 to 8 up the shore starting from Duluth.

One general category of hawks is accipiters, which fly lower and closer to tree cover. To make
the data set more understandable, we first introduce buteos, larger hawks with broad wings that
soar higher on wind currents. Figure 2 shows the average number of buteos counted over the 7 hours
of each day plotted against date.

Fig. 2 Average Buteos Counts per day VS Dates in Original Scale

In this original scale, the daily buteo counts are widely dispersed over time. The huge

variation makes the form of the time trend unclear. As shown in Figure 3, log transformation of

Fig. 3 Average Buteos Counts per day VS Dates in Log-Scale

The relationship between buteo counts and dates follows a generally linear trend, except for the last two
points. For simplicity in applying the models in this project, which focuses on estimating the slope of a
linear trend, we will leave out points after November 1 when demonstrating the fitting of some models
to these data.

3 THE GENERAL AND GENERALIZED LINEAR MODELS 

Recent papers including O'Hara and Kotze (2010) have advocated generalized linear models with

Poisson or negative binomial distributions rather than using normal linear models in the log scale.

Before comparing these models in a wider range of situations than considered by O'Hara and

Consider a situation where we are interested, for example, in describing the number of violation
tickets people get annually for violating traffic regulations as a function of their age. The average
number of violation tickets is predicted by the following equation

$$ y = \beta_0 + \beta_1 x + \varepsilon \qquad (3.1) $$

where $y$ is the response variable, Violation Tickets, $x$ is the explanatory variable, Age, and $\varepsilon$
measures the deviation of the measured $y$ from its expected value. It may now be asked whether,
after allowing for the effect of age, a person's sex has any influence on the frequency of violation
tickets people get. Under this assumption, the appropriate model might be described as

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \qquad (3.2) $$

where $x_1$ and $x_2$ represent Age and Sex, respectively.

Each time a new variable is introduced into the model, an additional parameter is
added. This process is an approach by which we find a mathematical description of the
structure in the values of the response variable. The two models discussed above involve a linear combination of the parameters $\beta_0$, $\beta_1$ and $\beta_2$, and are consequently known as linear models. For example, a polynomial regression model belongs to this category despite the fact
that $y$ is a non-linear function of the explanatory variable $x$. The general form of linear models is
described as

$$ y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i \qquad (3.3) $$

where $i = 1, \ldots, n$ and $\varepsilon_i$ represents the error that the explanatory variables cannot explain. By

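$$ \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}, \qquad (3.4) $$

or in the following compact form

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}. \qquad (3.5) $$

Besides linearity, the usual general linear model also assumes normality, independence and equal
variance of observations, which can be written as

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \text{where } \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I}). \qquad (3.6) $$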
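As an illustration of fitting the simplest linear model of the form (3.1), the following minimal Python sketch computes the ordinary least-squares intercept and slope in closed form. The function name and the age and ticket values are hypothetical and are not taken from this report.

```python
def fit_simple_linear_model(x, y):
    """Ordinary least-squares estimates (b0, b1) for y = b0 + b1*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Hypothetical data: ages and annual ticket counts lying exactly on a line.
ages = [20, 30, 40, 50, 60]
tickets = [3.0, 2.5, 2.0, 1.5, 1.0]
b0, b1 = fit_simple_linear_model(ages, tickets)
print(b0, b1)
```

Because the hypothetical points lie exactly on a line, the fit recovers that line; with real data the residuals play the role of the error term.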

3.2 Exponential Family of Distributions 

Linear models are postulated more often than non-linear ones because they are mathematically

easier to manipulate and usually easier to interpret. They appear to provide an adequate

description of many data sets. A wider class of distributions that includes the normal distribution is called the
exponential family of distributions. Consider a single random variable Y whose probability
distribution depends on a single parameter θ. The distribution belongs to the exponential family
if it can be written in the form

$$ f(y;\theta) = \exp\left[a(y)\,b(\theta) + c(\theta) + d(y)\right] \qquad (3.7) $$

If $a(y) = y$, the distribution is said to be in canonical form, and $b(\theta)$ is sometimes called the

exponential family includes such useful distributions as binomial, Poisson, negative binomial,

and gamma distributions, in addition to the normal distribution.

3.3 Generalized Linear Models 

There are many types of data which might not be normally distributed in original scale. To

address this problem, a transformation may be used to normalize the data. Often, people deal

with the log-transformation first, before evaluating other transformation techniques. But discrete

response variables, such as bird count data, often contain many ‘zero’ observations and are

unlikely to have a normally distributed error structure. Maindonald & Braun (2007) argued that

generalized linear models (GLMs) have largely removed the need for transforming count data.

GLMs are now well developed and commonly used.

A GLM is an extension of the well-known linear models to include response variables
that follow any probability distribution in the exponential family of distributions. The key idea is
that the relationship between $E(Y_i)$ and a linear predictor is specified by a link function:

$$ g(\mu_i) = \mathbf{x}_i^{T}\boldsymbol{\beta} \qquad (3.8) $$

where $\mu_i = E(Y_i)$ and $g$ is a link function that links the random component, $E(Y_i)$, to the
systematic component $\mathbf{x}_i^{T}\boldsymbol{\beta}$. Equation (3.8) can be written as

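For example, count data could be appropriately analyzed as a Poisson random variable within the
context of the Generalized Linear Model. So, for the observed bird count $Y_i$, we have
$Y_i \sim \mathrm{Poisson}(\mu_i)$. The probability function for $Y_i$ is described as

$$ P(Y_i = y) = \frac{\mu_i^{y} e^{-\mu_i}}{y!}, \quad y = 0, 1, 2, \ldots \qquad (3.10) $$

If we had a covariate $x$ for the predictor days, then

$$ \ln(\mu_i) = \beta_0 + \beta_1 x_i \qquad (3.11) $$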
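To make the Poisson model with a log link concrete, here is a minimal Python sketch of maximum-likelihood fitting by Newton-Raphson, assuming the log-linear mean ln(mu) = b0 + b1*x. The report's own fits use SAS PROC GENMOD; the function name, starting values and iteration count below are illustrative assumptions only.

```python
import math


def fit_poisson_loglinear(x, y, iterations=25):
    """Maximum-likelihood fit of ln(mu_i) = b0 + b1*x_i for Poisson counts y_i,
    using Newton-Raphson on the two score equations."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: derivative of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2).
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2 x 2 linear system by hand.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1


# Noiseless check: responses set equal to their means exp(0 + 0.02*days).
days = list(range(0, 51, 5))
means = [math.exp(0.02 * d) for d in days]
b0_hat, b1_hat = fit_poisson_loglinear(days, means)
print(b0_hat, b1_hat)
```

With responses equal to their means the score equations are solved exactly by the generating parameters, which makes this a convenient sanity check before using real counts.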

For the Poisson distribution, the mean and variance are equal. Real data do not always follow
this, and the variance ($\sigma^2$) is often much larger than the mean µ. This so-called overdispersion
can be incorporated into a model in several ways. These all estimate the amount of extra
variation but make different assumptions about how this extra variation scales with the mean.
The negative binomial distribution, for example, assumes $\sigma^2 = \mu + \mu^2/\theta$ with an overdispersion
parameter $\theta$ and the mean $\mu$. The negative binomial distribution approaches the Poisson
distribution when $\theta$ is much bigger than $\mu$, i.e. as $\theta$ approaches infinity. To introduce the

negative binomial distribution in a simple way, we only use one variable here. Suppose

where . Then we can describe the probability function of negative

binomial distribution as follows:

The negative binomial is also a Gamma-Poisson Mixture. Suppose and .Then we can have the following procedures:

(3.13)
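The gamma-Poisson mixture can be checked by simulation. The sketch below assumes the common parameterization in which the Poisson mean is drawn from a Gamma distribution with shape theta and scale mu/theta, so that the mixture has mean mu and variance mu + mu^2/theta. The symbol theta, the helper names and the parameter values are assumptions made for illustration.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method for one Poisson(lam) draw (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def negbin_draw(rng, mu, theta):
    """Gamma-Poisson mixture: lam ~ Gamma(shape=theta, scale=mu/theta),
    then Y | lam ~ Poisson(lam)."""
    lam = rng.gammavariate(theta, mu / theta)
    return poisson_draw(rng, lam)


rng = random.Random(2008)
mu, theta = 5.0, 2.0
sample = [negbin_draw(rng, mu, theta) for _ in range(20000)]
mean = sum(sample) / len(sample)
var = sum((y - mean) ** 2 for y in sample) / (len(sample) - 1)
# Compare the sample moments with mu and mu + mu**2 / theta.
print(mean, var, mu + mu ** 2 / theta)
```

Increasing theta in this sketch moves the sample variance toward the mean, which is the Poisson limit described above.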

4 DO NOT LOG-TRANSFORM COUNT DATA

O'Hara and Kotze (2010) provide a detailed discussion in their paper Do Not Log-transform Count Data. In that paper, they point out that log-transformation of counts has the additional quandary of how to deal with zero observations. With just one zero observation (if this
observation represents a sampling unit), the whole data set is usually adjusted by adding a value
(usually 1, the lowest possible nonzero count) before transformation, so they advocate GLMs
to deal with count data.

They simulated data sets from a negative binomial distribution with different values of

that negative binomial distribution can be viewed as a gamma mixture of Poisson distributions. A low
overdispersion parameter shrinks the graph of the Gamma probability function of the Poisson mean, which pulls values to a smaller domain, thus
generating more clumped data. For each simulation, n=100 data points were simulated at each
of 20 mean values, µ = 1, 2, ..., 20. Five hundred replicate simulations were carried out for each
value of the overdispersion parameter. Then they compared the outcome of fitting models to data transformed in various
ways (log, square root) with the results from fitting overdispersed quasi-Poisson
and negative binomial models to untransformed count data. The simulations were
compared by calculating the mean bias and root mean-squared error in estimating log (µ).

In their results, the quasi-Poisson and negative binomial models behave similarly, having

negligible bias, whereas the models based on a normal distribution are all biased, particularly at

low means and high variances. The square-root transformation has a lower bias than any of the

log-transformations, unless the mean is low. Thus, they recommend that count data not be

transformed to be used in parametric tests. For such data, GLMs and their derivatives are more

appropriate.

However, their simulations were from negative binomial distributions. Poisson models

with extra-Poisson variation still model the variation as proportional to the mean, whereas

negative binomial models include a term in the variance proportional to the mean squared. In

many data sets, Ln(Y+1) is fairly normal. From here on, Log and Ln are

interchangeable notations; in statistics, Log generally means Ln. For example, in SAS,

Log(y) means Ln(y). Fitting a linear relationship to Ln(Y+1) of the daily counts of buteos

This normal plot is reasonably straight, at least close enough for normal methods to work well

enough. In later sections where we fit models to hourly counts, the normal plot is even straighter.

The results from O'Hara and Kotze are limited to 1) negative binomial data, 2) estimating a single

mean and 3) very large replication, n=100. The generalized linear models worked in their

simulations, but how will they work in estimating slopes of trends if the data are normal with

variances not like Poisson or negative binomial data?

5 COMPARISON OF ESTIMATION METHODS USING THE DELTA METHOD 

5.1 Expected Values and Variances of Nonlinear Functions of Random Variables

In discussions below, I will use Taylor series approximations for approximating expected values

and variances of the nonlinear functions. First, I describe these methods, commonly known as

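Suppose we have a random variable $X$, and we know $E(X)$ and $\mathrm{Var}(X)$, but we are
interested in the mean and variance of $g(X)$ for some function $g$. For example, we might be
able to measure $X$ and determine its mean and variance, but we are really interested in $g(X)$, which
is related to $X$ in a known way. If $g$ is linear, then this is pretty straightforward:

$$ g(X) = a + bX \qquad (5.1.1) $$

$$ E[g(X)] = a + b\,E(X) \qquad (5.1.2) $$

$$ \mathrm{Var}[g(X)] = b^2\,\mathrm{Var}(X) \qquad (5.1.3) $$

However, in many cases $g$ is not linear. In many areas of mathematics we find approximations by linearizing a nonlinear problem we cannot solve exactly. In probability and statistics, this
method is called propagation of error or the delta method.

Denote by $\mu$ the mean of $X$. We use a first-order Taylor series approximation around $\mu$:

$$ g(X) \approx g(\mu) + (X - \mu)\,g'(\mu) \qquad (5.1.4) $$

Since $E(X - \mu) = 0$,

$$ E[g(X)] \approx g(\mu) \qquad (5.1.5) $$

$$ \mathrm{Var}[g(X)] \approx [g'(\mu)]^2\,\mathrm{Var}(X) \qquad (5.1.6) $$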

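We have $E[g(X)] \approx g(E[X])$, but we know that in general $E[g(X)] \neq g(E[X])$ from Jensen's
Inequality. Thus, we can carry out the Taylor series expansion to the second order to get an
improved approximation of $E[g(X)]$.

$$ g(X) \approx g(\mu) + (X - \mu)\,g'(\mu) + \tfrac{1}{2}(X - \mu)^2\,g''(\mu) \qquad (5.1.7) $$

Taking the expectation of the right-hand side, we have

$$ E[g(X)] \approx g(\mu) + \tfrac{1}{2}\,g''(\mu)\,E[(X - \mu)^2] \qquad (5.1.8) $$

$$ = g(\mu) + \tfrac{1}{2}\,g''(\mu)\,\mathrm{Var}(X) \qquad (5.1.9) $$

How good such approximations are depends on how nonlinear $g$ is in the neighborhood of $\mu$
defined by the size of $\sigma$, where $\sigma$ is the standard deviation of $X$.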
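To see how the quality of these approximations depends on the mean, the following Python sketch compares the first-order and second-order approximations of E[ln(Y+1)] with a direct summation for Poisson counts. It assumes g(y) = ln(y + 1) and Var(Y) = mu; the function names and the truncation point of the sum are illustrative choices.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability computed on the log scale for numerical stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def exact_mean_log1p(mu, upper=400):
    """E[ln(Y+1)] for Y ~ Poisson(mu), summing the series directly."""
    return sum(math.log(k + 1) * poisson_pmf(k, mu) for k in range(upper))


def delta_first_order(mu):
    """First-order approximation: E[g(Y)] ~ g(mu) with g(y) = ln(y + 1)."""
    return math.log(mu + 1)


def delta_second_order(mu, var):
    """Second-order approximation: add 0.5 * g''(mu) * Var(Y),
    where g''(y) = -1 / (y + 1)**2."""
    return math.log(mu + 1) - 0.5 * var / (mu + 1) ** 2


for mu in (1.0, 2.0, 10.0):
    print(mu, exact_mean_log1p(mu), delta_first_order(mu), delta_second_order(mu, mu))
```

In this sketch the second-order value is closer to the summed value than the first-order one at each of the three means, and the gap between the first-order value and the sum illustrates Jensen's inequality.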

In assessing the disadvantages of using Log(Y+1), the case where Y comes from a Poisson

distribution should be studied, since there log-normal estimation is at a great disadvantage. Using the

delta method, we can start by comparing Poisson and log-normal estimation for the simple case

of a single mean, the case considered by O'Hara and Kotze, and then compare two means. For the

observation from the hawk counts , we have . To make the notation

consistent through the discussions of one mean, two means, and regression, throughout I will use

or .

Most of this report focuses on estimating changes or slopes across time, for example estimating

how bird populations are changing across several years of monitoring. But first I will discuss

briefly the case of estimating a single mean. In the one mean case, we consider the average

number of hawks at where is considered as predictor day. Let .

5.2.1 Poisson for single mean

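From (3.10), we have $\ln\mu = \beta_0$ at $x = 0$, so for the Poisson likelihood $\hat\beta_0 = \ln\bar{Y}$. Using the
delta method, the expectation and variance of $\hat\beta_0$ can be obtained as follows:

$$ E(\hat\beta_0) \approx \ln\mu - \frac{1}{2n\mu} \qquad (5.2.1) $$

$$ \mathrm{Var}(\hat\beta_0) \approx \frac{1}{n\mu} \qquad (5.2.2) $$

Note that the degree of bias depends on the number of replicates, $n$. O'Hara and Kotze used
n=100, which results in little bias if Poisson data are modeled.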

5.2.2 Log-normal for Single Mean

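If we use a normal distribution as an approximation to the distribution of $\ln(Y+1)$, then
$\hat\beta_0 = \frac{1}{n}\sum_{i=1}^{n}\ln(Y_i+1)$. For the Poisson model above we use the log of the average, whereas in the
normal model we use the average of log values.

$$ E(\hat\beta_0) \approx \ln(\mu+1) - \frac{\mu}{2(\mu+1)^2} \qquad (5.2.3) $$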

Note that the Poisson model has smaller bias, with an expected value closer to log (µ). The smaller bias

is more pronounced for larger n such as n=100 for O'Hara and Kotze. Unlike Poisson estimation,

the bias in using the mean of log values does not disappear with increasing n.

5.3 Two Means  

Suppose that $\mu_1$ and $\mu_2$ correspond to the average number of hawks at $x = 0$ and $x = 1$,

respectively. In this case a regression of Y on X will give a slope that is the same as the

difference between the means. Considering the difference between means, I will use

for _and for _.

5.3.1 Poisson for two means 

From (3.10), we know the true and . Then we get

and ,

where and . By applying the delta method, we have

(5.3.1)

(5.3.2)

5.3.2 Log-normal for two means

For Log-transformation to ,

We are more concerned about the slope $\beta_1$, so we would like to obtain the following by using

the delta method:

(5.3.3)

(5.3.4)

Again, the primary disadvantage of using normal likelihood methods is the larger bias. The

results given above are based on approximations, but based on simulations and exact calculation

given below, the general trends are accurate.
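For reference, the two-means approximations can be sketched as follows under stated assumptions: $n$ independent Poisson counts at each of $x = 0$ and $x = 1$, with means $\mu_1$ and $\mu_2$, and true slope $\beta_1 = \ln\mu_2 - \ln\mu_1$. The notation is an assumption made for this sketch and may differ from the numbered equations of this section; the steps apply the second-order mean and first-order variance formulas of Section 5.1 to each group separately.

```latex
% Poisson likelihood: slope estimate is the difference of log sample means.
\begin{align*}
\hat\beta_1^{P} &= \ln\bar{Y}_2 - \ln\bar{Y}_1, \\
E\big(\hat\beta_1^{P}\big) &\approx \beta_1 - \frac{1}{2n\mu_2} + \frac{1}{2n\mu_1}, \\
\mathrm{Var}\big(\hat\beta_1^{P}\big) &\approx \frac{1}{n\mu_1} + \frac{1}{n\mu_2}.
\end{align*}

% Normal likelihood for Ln(Y+1): slope estimate is the difference of mean logs.
\begin{align*}
\hat\beta_1^{N} &= \frac{1}{n}\sum_{i=1}^{n}\ln(Y_{2i}+1) - \frac{1}{n}\sum_{i=1}^{n}\ln(Y_{1i}+1), \\
E\big(\hat\beta_1^{N}\big) &\approx \ln\frac{\mu_2+1}{\mu_1+1}
  - \frac{\mu_2}{2(\mu_2+1)^2} + \frac{\mu_1}{2(\mu_1+1)^2}, \\
\mathrm{Var}\big(\hat\beta_1^{N}\big) &\approx \frac{1}{n}\left[\frac{\mu_1}{(\mu_1+1)^2} + \frac{\mu_2}{(\mu_2+1)^2}\right].
\end{align*}
```

Under these assumptions the bias of the Poisson estimator shrinks as $n$ grows, while the leading bias of the Ln(Y+1) estimator does not depend on $n$.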

5.4 Alternative Nonlinear Models 

In the previous sections we used the delta method to find approximations for the mean and
variance of those parameter estimates, and we saw that using $\ln(Y+1)$ results in more biased
estimates. Alternatively, we could use these approximations to derive less biased estimators.
Since the means are no longer linear functions, we will need to use nonlinear models to
accomplish the estimation. Nonlinear mixed models, in which fixed and random effects have
nonlinear relationships to the response variable, are becoming more and more popular.

For $\ln(Y+1)$, using Taylor series expansions with $\mu = E(Y)$:

$$ E[\ln(Y+1)] \approx \ln(\mu+1) - \frac{\mathrm{Var}(Y)}{2(\mu+1)^2} \qquad (5.4.1) $$

If we assume that the variance is equal to the mean as in a Poisson distribution then a normal

approximation will use

(5.4.3)

More generally, we can assume overdispersion models such as or

. Since both the mean and variance are nonlinear functions of the parameters,

procedures such as SAS NLMIXED are used to fit these nonlinear models, as I discuss further later.
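The idea of fitting a normal likelihood whose mean and variance are both nonlinear in the parameters can be sketched in Python as below. The sketch assumes ln E(Y) = b0 + b1*days and Var(Y) = c*E(Y)^r, and uses the same Taylor approximations as the NLMIXED code given later in this report; the function names and the small data set are hypothetical, and no optimizer is included.

```python
import math


def ln_y1_moments(days, b0, b1, c, r):
    """Approximate mean and variance of Ln(Y+1) when ln E(Y) = b0 + b1*days
    and Var(Y) = c * E(Y)**r."""
    mu_y = math.exp(b0 + b1 * days)
    var_y = c * mu_y ** r
    mean_ln = math.log(mu_y + 1) - 0.5 * var_y / (mu_y + 1) ** 2  # second-order mean
    var_ln = var_y / (mu_y + 1) ** 2                              # first-order variance
    return mean_ln, var_ln


def negative_log_likelihood(params, days, ln_y1):
    """Normal negative log-likelihood of observed Ln(Y+1) values."""
    b0, b1, c, r = params
    total = 0.0
    for d, z in zip(days, ln_y1):
        m, v = ln_y1_moments(d, b0, b1, c, r)
        total += 0.5 * math.log(2 * math.pi * v) + (z - m) ** 2 / (2 * v)
    return total


# Hypothetical check: responses placed exactly at their approximate means.
days = [0, 10, 20]
true_params = (1.0, 0.02, 1.0, 1.0)
ln_y1 = [ln_y1_moments(d, *true_params)[0] for d in days]
print(negative_log_likelihood(true_params, days, ln_y1))
print(negative_log_likelihood((2.0, 0.02, 1.0, 1.0), days, ln_y1))
```

A numerical optimizer would minimize this function over (b0, b1, c, r); here the second call simply shows that a shifted intercept gives a larger value on this hypothetical data.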

6 FURTHER COMPARISONS OF MODELS 

The ultimate goal is to fit a good model for data on hawk migration by modeling effects such as
date, time of day, weather and distance from shore. To compare alternative models,
I did simulations and exact calculations to investigate how Poisson, negative binomial and
log-normal models compare when the data are Poisson, negative binomial and log-normal for
log(Y+1). I also investigated methods for bias corrections using approximate propagation of
error methods for log-normal for log(Y+1). A simple way to check different models is to
look only at how hawk counts change with time.

6.1 Single Mean 

6.1.1 Exact calculation for a single mean 

In Sections 5.2 and 5.3, we discussed the application of the delta method to the single-mean and

two-means cases. For the single-mean case, we only compare the exact calculation with the delta method. Let

(6.1.1) (6.1.2) Using (6.1.3) (6.1.4)

The two methods aren't on equal footing above, since the Poisson calculations don't use S=0

cases, but these are not common in the models considered, and comparisons of exact and

delta-method results, for the same model, are completely comparable.
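An exact calculation of this kind can be sketched in Python as follows. The sketch assumes the Poisson estimator is ln(S/n), where S is the sum of n Poisson(mu) counts and therefore Poisson(n*mu), and that the S = 0 case is excluded by conditioning on S > 0. The function name and the truncation rule for the sum are illustrative choices.

```python
import math


def exact_log_mean_moments(mu, n, upper=None):
    """Exact mean and variance of ln(S/n), S ~ Poisson(n*mu), conditional on S > 0."""
    lam = n * mu
    if upper is None:
        upper = int(lam + 40 * math.sqrt(lam) + 50)  # far into the upper tail
    norm = 1.0 - math.exp(-lam)                       # P(S > 0)
    m1 = m2 = 0.0
    for s in range(1, upper + 1):
        p = math.exp(-lam + s * math.log(lam) - math.lgamma(s + 1)) / norm
        z = math.log(s / n)
        m1 += z * p
        m2 += z * z * p
    return m1, m2 - m1 ** 2


mean, var = exact_log_mean_moments(mu=10.0, n=10)
bias = mean - math.log(10.0)
rmse = math.sqrt(var + bias ** 2)
print(mean, bias, var, rmse)
```

The same summation approach applies to any estimator that is a function of a discrete statistic with a known distribution.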

6.1.2 Specific example of comparing exact calculation and   

I use one simple case to illustrate the differences between exact calculation and the delta

method approximation. Assume that we have $n = 10$ observations and the observed hawk counts

have $Y_i \sim \mathrm{Poisson}(\mu)$ with $\mu = 10$. Then the true value of $\ln\mu$ is $\ln 10 = 2.302585$. Applying

equations in sections 5.2 and 6.1, results for bias and root mean squared error (RMSE) about

                 Poisson regression                 Log-normal of Ln(Y+1)
                 Exact calculation  Delta method    Exact calculation  Delta method
True value       2.302585           2.302585        2.302585           2.302585
Expected value   2.297543           2.297585        2.356573           2.356573
Bias             -0.005042438       -0.005          0.05398787         0.05398787
Variance         0.01015371         0.01            0.00944442         0.008264463
RMSE             0.1008917          0.1001249       0.1097244          0.1057315

Table 6.1.2.1 for $n = 10$ and $\mu = 10$

(6.1.5)

(6.1.6)

Conclusions from these results are as follows. 1) Comparing the first and second or third

and fourth columns, we see that the delta method approximations are quite good for means of

this size. For smaller means the approximations will be less precise. The delta method

approximations could be used to develop more efficient models as done later in this report. 2)

Comparing the first and third columns, the normal approximation has larger bias, smaller

variance, and a bit larger RMSE. Developing approximately bias corrected estimators could be

competitive with generalized linear models.

For comparing models with discrete distributions and closed form solutions such as

log(Y+1) or Poisson estimation, the simulations of O'Hara and Kotze can be replaced with exact

calculations. In addition, simple delta method approximations can be used for initial

comparisons of alternative modeling methods before using more lengthy exact calculations for

final results on promising methods.
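The delta-method columns of Table 6.1.2.1 can be reproduced with a few lines of Python, assuming n = 10 Poisson counts with mean 10 and the second-order mean and first-order variance approximations of Section 5. The function names are illustrative and the printed values are rounded.

```python
import math


def delta_poisson(mu, n):
    """Delta-method mean and variance of ln(Ybar) for n Poisson(mu) counts."""
    return math.log(mu) - 1.0 / (2 * n * mu), 1.0 / (n * mu)


def delta_log_normal(mu, n):
    """Delta-method mean and variance of the average of ln(Y+1) for n Poisson(mu) counts."""
    return (math.log(mu + 1) - mu / (2 * (mu + 1) ** 2),
            mu / (n * (mu + 1) ** 2))


def summarize(mean, var, truth):
    """Bias and root mean squared error relative to the true value."""
    bias = mean - truth
    return bias, math.sqrt(var + bias ** 2)


mu, n = 10.0, 10
truth = math.log(mu)
for name, (mean, var) in (("Poisson", delta_poisson(mu, n)),
                          ("log-normal", delta_log_normal(mu, n))):
    bias, rmse = summarize(mean, var, truth)
    print(name, round(mean, 6), round(bias, 6), round(var, 6), round(rmse, 6))
```

Changing `mu` and `n` gives a quick first comparison of the two estimators before running a longer exact calculation.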

The next step up in complexity is comparing two means. The difference between means is the

same as the regression slope with only two x values. In this section let’s look at this simpler case

before moving on to a more usual regression case.

6.2.1 Exact calculation for difference between means 

In the two-means case, I compare exact calculation, the delta

method and simulation for the Poisson and log-normal models. We again assume the data are Poisson

distributed with and . Based on (3.10), for the method of

estimation , where and ,  and 

in exact calculation can be obtained by the following:

               (6.2.1)        (6.2.2)  Where  , k = 1, 2.  For the method that , we have the followings:                (6.2.3)                (6.2.4) 

6.2.2 Comparisons of Models for Two Means by Doing Simulation 

Basically, I would like to compare different methods of estimation, such as ,

and non-linear mixed model, of biases and RMSEs for . Data

sets were simulated from a Poisson distribution. To check if the mean and number of data points

in each simulation are factors, I simulated data sets with different values of two-means and data

points[( , ,n=10), ( , ,n=20), ( , ,n=10), ( ,

,n=20)]. The data were analyzed assuming that time is a factor. Models were fitted making the

following assumptions about the response, y:

1. y follows a Poisson distribution

2. y follows a negative binomial distribution

3. log(y+1) transformation follows a normal distribution

a. A standard regression with mean linearly related to x and constant variance.

b. Nonlinear approximations to the mean and variance with nlmixed.

The simulations were also compared by using the mean bias and root mean-squared error

(RMSE). Simulations and analyses were carried out in the SAS statistical program using proc reg, proc genmod and proc nlmixed.
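The structure of such a simulation can be sketched in Python for two of the models listed above. The sketch assumes Poisson data with means 5 and 10 and n = 20 counts per group. It uses the closed forms that a Poisson fit and a regression on log(y+1) reduce to when there are only two groups: the difference of log means and the difference of mean logs. The function names, seed and number of replicates are illustrative, and the negative binomial and nlmixed fits are not reproduced.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method for one Poisson(lam) draw."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_two_means(mu1, mu2, n, nsim, seed=1895):
    """Bias and RMSE of two estimators of ln(mu2) - ln(mu1)."""
    rng = random.Random(seed)
    truth = math.log(mu2) - math.log(mu1)
    errors = {"poisson": [], "log(y+1)": []}
    for _ in range(nsim):
        y1 = [poisson_draw(rng, mu1) for _ in range(n)]
        y2 = [poisson_draw(rng, mu2) for _ in range(n)]
        if sum(y1) == 0 or sum(y2) == 0:
            continue  # log of a zero mean is undefined; skip such replicates
        pois = math.log(sum(y2) / n) - math.log(sum(y1) / n)
        reg = (sum(math.log(y + 1) for y in y2)
               - sum(math.log(y + 1) for y in y1)) / n
        errors["poisson"].append(pois - truth)
        errors["log(y+1)"].append(reg - truth)
    return {name: (sum(e) / len(e), math.sqrt(sum(d * d for d in e) / len(e)))
            for name, e in errors.items()}


results = simulate_two_means(mu1=5.0, mu2=10.0, n=20, nsim=2000)
for name, (bias, rmse) in results.items():
    print(name, round(bias, 4), round(rmse, 4))
```

In this sketch the log(y+1) estimator underestimates the difference, consistent with the delta-method results of Section 5.3.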

Fig. 4 and Fig. 5 show the bias and RMSE of against different

models for the data generated from different two means and data points. For example,

‘5_10with20’ means that the data are generated from $\mu_1 = 5$, $\mu_2 = 10$ and $n = 20$. The amount of

the regression model for log(Y+1) depends a little on the two means. But basically, the

non-linear mixed model gives the best estimate of the slope, that is, the difference of means. The data

set with the higher mean generates lower bias than the one with the lower mean.

Fig. 4 Estimated mean bias from four different models, applied to data simulated from a Poisson distribution. A low bias means that the model will basically return the “true” value.

The root mean-squared error shows a similar pattern, with the non-linear mixed model having a

low RMSE. A combination of higher mean and more data points gives lower RMSE. From these

plots in Fig. 4 and Fig. 5, Poisson, negative binomial and nlmixed models perform well for

Poisson data no matter what values are chosen for and . In short, we don’t have to worry

Fig. 5 Estimated root mean-squared error from four different models, applied to data simulated from a Poisson distribution.

6.2.3 Regression 

From the previous sections, using nonlinear approximations to the means and variances is a

viable alternative to fitting the correct model when data are Poisson. If data were always Poisson,

these methods would not be necessary. But since data often follow fairly lognormal patterns

with much larger variances relative to the mean than a Poisson distribution or even a negative

binomial distribution, these methods could work well over a large range of models. For the

hawk migration data, our primary interests are usually regression type analyses such as whether

the populations are decreasing over a span of years.

To decide on what methods I should use to fit models to the hawk data, I will compare

and log normal distributed. We are most interested in estimating the slope, $\beta_1$, to monitor
changes in bird populations over time. The results will be shown as the relative bias and relative
RMSE of $\hat\beta_1$, which makes it easier to see how large the bias and RMSE are without referring
to the actual parameter values.

$$ \text{relative bias} = \frac{E(\hat\beta_1) - \beta_1}{\beta_1} \qquad (6.2.5) $$

$$ \text{relative RMSE} = \frac{\sqrt{E[(\hat\beta_1 - \beta_1)^2]}}{\beta_1} \qquad (6.2.6) $$

For example, a value of 0.2 means that the error is 20% of the true value. For simplicity here, we consider the number of days past September 1 as a trend factor for
the hourly hawk counts and generate data corresponding to the number of days from 0 to 50.
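The two summary measures can be computed from a set of simulated slope estimates as in the following Python sketch. It assumes that relative bias is the mean error divided by the true slope and that relative RMSE is the root mean squared error divided by the true slope. The function names and the four slope values are hypothetical.

```python
import math


def relative_bias(estimates, truth):
    """Mean estimate minus the true value, as a fraction of the true value."""
    return (sum(estimates) / len(estimates) - truth) / truth


def relative_rmse(estimates, truth):
    """Root mean squared error as a fraction of the true value."""
    mse = sum((b - truth) ** 2 for b in estimates) / len(estimates)
    return math.sqrt(mse) / truth


# Hypothetical slope estimates from four simulated data sets; true slope 0.02.
slopes = [0.018, 0.022, 0.025, 0.015]
print(relative_bias(slopes, 0.02), relative_rmse(slopes, 0.02))
```

Dividing by the true slope puts results for different parameter values on a common scale.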

6.2.3.1 Regression with Poisson data 

The simplest model for count data is a Poisson distribution, so as in previous sections, at the

beginning of this Regression section, data are also simulated from a Poisson distribution. The

SAS statistical program is the main one used for the analyses. The main procedures are proc reg, proc genmod (for Poisson and negative binomial regression) and proc nlmixed. The following code is used to generate Poisson data with mean $\exp(\beta_0 + \beta_1 \cdot \text{days})$:

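%let b0=0;
%let b1=0.02;
%let nsim=1000;
%let n=10;

data sim3.mydata;
  call streaminit(1895);             /* fix the seed so runs are reproducible */
  do isim=1 to &nsim;                /* simulation replicate */
    do days=0 to 50 by 5;            /* days past September 1 */
      do rep=1 to &n;                /* replicate counts on each day */
        mu=&b0+(&b1)*days;           /* linear predictor on the log scale */
        y=rand('poisson', exp(mu));  /* assumed: statement lost in extraction; Poisson count with mean exp(mu) */
        ln_y_1=log(y+1);             /* assumed: response name used by the nlmixed code */
        output;
      end;
    end;
  end;
run;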

The values for β0 and β1 are chosen to represent relatively small expected counts corresponding

to hourly observation of hawk counts, Y; the mean counts increase from an average of 1 bird per

hour to 2.7 birds per hour.

For the nlmixed model for Ln(Y+1) the approximations to the mean and variance of Ln(Y+1)

play big roles in estimates of parameters. From section 6.1, will be a good way to do

approximations. The nlmixed code is as follows:

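proc nlmixed data=sim3.mydata;
  ods output ParameterEstimates=parm_nlmix;
  by isim;
  title 'nlmixed';
  parms b0=0 b1=0.031 r=1 c=0.45;
  bounds c > 0;
  mu = b0 + b1*days;                               /* linear predictor */
  mu_y = exp(mu);                                  /* mean of Y */
  var_y = c*mu_y**r;                               /* variance of Y as a power of the mean */
  mu_ln = log(mu_y+1) - 0.5*var_y/((mu_y+1)**2);   /* second-order approximation to the mean of Ln(Y+1) */
  var_ln = abs(var_y/(mu_y+1)**2);                 /* first-order approximation to the variance of Ln(Y+1) */
  model ln_y_1 ~ normal(mu_ln, var_ln);
run;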

The comparison results can be seen in the following Table 6.2.3.1(1). The nlmixed model doesn't
work quite as well as the Poisson and negative binomial models, both of which work very well for
Poisson data, so we sacrifice little in
fitting the more general negative binomial model to the data. The nlmixed approximation does not do a bad job
either. Perhaps the extra complexity of the nlmixed models for more complex data will be worth
the small sacrifice in efficiency when the data are Poisson. From this simulation result, it is
obvious that the regression model does not fit well for Poisson data. We see the comparison

Obs  method   MEAN      rel_bias  rel_RMSE
1    Poisson  0.020128   0.00638  0.23568
2    Reg      0.012683  -0.36584  0.39503
3    Negbin   0.020120   0.00598  0.23612
4    nlmixed  0.020610   0.03050  0.24795

Table 6.2.3.1(1)

Fig. 7 The estimated relative RMSE for different models

6.2.3.2 Regression with lognormal data 

As discussed earlier, in many cases Ln(Y+1) is fairly normally distributed. When data are of this sort, generalized linear models such as Poisson regression or negative binomial regression might not be efficient compared to methods assuming normal errors. In this section, I simulate hawk data with Y following a discrete version of a lognormal distribution where Ln(Y) is normal. Because the discrete version will have zero counts, the analysis will be performed with Ln(Y+1). I will take the variance of Y to be proportional to a power of the expected value of Y. Since $Y$ has a log normal distribution, $\ln(Y) \sim N(\mu, \sigma^2)$. The following equations show the way to generate random variables. Then the mean and variance of log normal variables are as follows:

(6.2.7)

(6.2.9)

(6.2.10)

where $c$ and $r$ are both constants. Solving for $\sigma^2$ we find

(6.2.11)

Then, knowing $\sigma^2$ from (6.2.11), it is easy to get $\mu$ from (6.2.9)

(6.2.12)

For a Poisson distribution Var(Y) = E(Y), which corresponds to r=1 and c=1. Meanwhile, Y has constant variance in Ln-scale when r=2. Based on the above, the SAS code for generating the data is as follows:

%let b0=0;
%let b1=0.02;
%let nsim=1000;
%let n=10;
%let c=1;

data one;
  title 'Run Simulation';
  call streaminit(1895733);
  do isim=1 to &nsim;
    do days=0 to 50 by 5;
      do r=1 to 3.0 by 0.2;
        do rep=1 to &n;
          ln_mu_y=&b0+(&b1)*days;                     /* log of E(Y) */
          mu_y=exp(ln_mu_y);                          /* E(Y) */
          sig_2=log(1+(&c)*exp((r-2)*ln_mu_y));       /* variance of Ln(Y) that gives Var(Y) = c*E(Y)**r */
          std=sqrt(sig_2);
          mu=ln_mu_y-0.5*sig_2;                       /* mean of Ln(Y) that gives E(Y) = mu_y */
          x=rand('normal',mu,std);
          y1=exp(x);                                  /* continuous lognormal value */
          rem = y1 - floor(y1);                       /* fractional part */
          y = floor(y1) + 1*(rand('uniform') < rem);  /* round up with probability rem */
          ln_y_1 = log(y+1);
          var_y = (exp(sig_2) - 1)*mu_y**2;           /* variance of the continuous value y1 */
          mu_y_r = mu_y**r;
          output;
        end;
      end;
    end;
  end;
run;


The statement y = floor(y1) + 1*(rand('uniform') < rem); keeps the expected value of the rounded count Y the same as the expected value of the continuous value y1. For example, if y1 = 1.75, then floor(y1) = 1, the largest integer less than or equal to y1, and Y=1 with probability 0.25 and Y=2 with probability 0.75, so that E(Y) = 1(0.25) + 2(0.75) = 1.75. For these simulated values the mean of Y increases from 1.0 at days=0 to 2.7 at days=50. These values were chosen to represent fairly small counts, corresponding to the smaller hourly counts in the data of Seeland (2010).
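The random rounding step can be illustrated outside SAS. The following Python sketch (the function name and the number of draws are choices made here, not part of the original code) applies the same rule as y = floor(y1) + 1*(rand('uniform') < rem) and checks empirically that the average of the rounded values stays close to the unrounded value 1.75.

```python
import math
import random

def stochastic_round(y1, rng):
    """Round y1 down or up at random so that the expected value remains y1."""
    base = math.floor(y1)
    rem = y1 - base                      # fractional part, used as P(round up)
    return base + (1 if rng.random() < rem else 0)

rng = random.Random(1895733)             # fixed seed so the run is repeatable
draws = [stochastic_round(1.75, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))           # close to 1.75
```

Ordinary rounding to the nearest integer would always map 1.75 to 2 and would therefore shift the mean of the simulated counts.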

Hawk data are count data, which possibly include many zeros, so it is more meaningful to compare models with dependent variable Ln(Y+1), where Y represents buteo counts. In our simulation, it is easy to relate the response to the independent variable days with the regression, Poisson, and negative binomial models. Here I mainly introduce the nlmixed model. Before finalizing the nlmixed model, the first key step is to find good approximations to the mean and variance of Ln(Y+1). Three approximation methods were compared, and the "best" one was chosen as the one whose approximate mean and variance of Ln(Y+1) were closest to the real mean and variance. To do a meaningful simulation, simulating the data and choosing an appropriate approximation method in the nlmixed model are the two key steps. The main step in our simulation is selecting and checking the approximation method. Different expansions generate different approximations of the variance and mean of Ln(Y+1). For example, we can apply a Taylor series expansion of Ln(Y+1) about the mean of Y. In our simulation, we used a different approach, constructing a log-likelihood function, which can be seen in the following nlmixed code:

proc nlmixed data=one;
  ods output ParameterEstimates=parm_nlmix2;
  by isim r;
  bounds c > 0;
  ln_mu_y = b0 + b1*days;                   /* Ln of E(Y) */
  sig_2 = log(c*exp((r-2)*ln_mu_y)+1);      /* variance of Ln(Y) */
  mu = ln_mu_y - 0.5*sig_2;                 /* mean of Ln(Y) */
  /* probability that a lognormal value rounds to the observed count y */
  if y = 0 then LogLike = log(probnorm((log(0.5)-mu)/sqrt(sig_2)));
  else LogLike = log(probnorm((log(y+0.5)-mu)/sqrt(sig_2))
                   - probnorm((log(y-0.5)-mu)/sqrt(sig_2)));
  model y ~ general(LogLike);
run;

This likelihood treats the observed counts as rounded lognormal data. Since this was the way the

simulation data were generated, this maximum likelihood method should be optimal at least

asymptotically for large sample sizes. The results of comparing this nlmixed model with the regression, Poisson, and negative binomial models are shown in Fig. 8 and Fig. 9. The regression model performs poorly, with large relative biases and RMSEs. The Poisson and negative

binomial models perform well with good estimates. We can say nlmixed model works very well


Fig. 9 Relative RMSE of the estimate of b1 against different r
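The rounded-lognormal likelihood used in the nlmixed code of this section assigns to each count y the probability that a lognormal value falls between y-0.5 and y+0.5 (or below 0.5 when y = 0). A minimal Python sketch of the same calculation is given below. The function names are hypothetical, and the normal distribution function is built from math.erf in place of SAS probnorm.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function (the role played by probnorm in SAS)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rounded_lognormal_pmf(y, mu, sigma2):
    """P(Y = y) when Y is a lognormal(mu, sigma2) value rounded to the nearest integer."""
    sd = math.sqrt(sigma2)
    upper = norm_cdf((math.log(y + 0.5) - mu) / sd)
    if y == 0:
        return upper
    lower = norm_cdf((math.log(y - 0.5) - mu) / sd)
    return upper - lower

def log_likelihood(counts, mu, sigma2):
    """Sum of log probabilities over independent observed counts."""
    return sum(math.log(rounded_lognormal_pmf(y, mu, sigma2)) for y in counts)

# Illustrative counts and parameter values, not taken from the hawk data.
print(log_likelihood([0, 1, 2, 5], mu=0.0, sigma2=0.5))
```

Because the interval probabilities telescope, they sum to one over y = 0, 1, 2, ..., so this is a proper probability mass function for counts.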

6.2.3.3 Summaries for Model Comparison 

Comparing relative bias and RMSE across models, the nlmixed model generally does a good job whether the data follow a Poisson or a lognormal distribution. Surprisingly, the regression model does not work well for lognormal data, while the Poisson and negative binomial models still perform well. When the variance is proportional to a large power of the mean, say 3 or more, the nlmixed nonlinear approximation works better, but for data between Poisson, r = 1, and lognormal with constant variance, r = 2, the generalized linear models, particularly the negative binomial model, work well even for lognormal data. The negative binomial variance allows both Poisson variance with θ large and variance


data, we will note that the variance is estimated to be proportional to $\mu^{1.8}$, which is within the range where either negative binomial or nonlinear approximations work well.

7 FITTING MODELS TO HAWK DATA 

In this section, we look further at the hawk example introduced in section 2. From Fig. 3 in section 2, which plots the logarithm of average buteo counts each day against date, there appears to be a linearly increasing trend with date. Does the wind during the observation hour affect buteo migration? Could the distance to dry land be a factor in buteo counts?

To make valid forecasts of buteo counts, model selection is important. Buteo counts are discrete variables and might include zeros. Models for such data include the Poisson and negative binomial distributions, but it is possible that there are too many zeros for either distribution. Another option is to use Ln(Y+1) and apply methods for normally distributed data. The Central Limit Theorem (CLT) helps such models work under the assumption of normality. This motivated the simulations in section 6, which try to find an appropriate model for hawk data.

7.1 Simple introduction to some potential variables 

Hawk counts were recorded under certain weather, geographic and geological conditions. The list below describes how we use these conditions.

1. Wind is considered as one possible factor. The best wind direction is nearly zero (north), so north is chosen as the reference wind direction. Wind direction was recorded in degrees clockwise from north. The variable Wind_north_sp is wind speed times the cosine of the wind angle relative to north and can be understood as the strength of the northerly wind vector. The variable Wind_Prev records the number of days, before the observation day, that winds did not have a westerly component.

2. We wonder if the time of day, that is, the specific hour when observations began, could be a factor in counting migrating buteos, so the variable Time represents the starting time of observations on a day.

3. We have noticed that buteo counts increase slightly with date, so we use the variable day to represent the number of days since Sept. 1, 2008.

4. Precipitation is also considered as a potential predictor. The variable Precip_Pre records the number of days with 50% or more hours of precipitation prior to the observation day.

5. Likewise, we wonder if the distance to water affects buteo counts. Distance to the shore of Lake Superior is used to see if buteo migration is related to this geographical location.
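Two of the derived variables in the list above are simple transformations of the recorded data. As an illustration only, the Python sketch below computes the northerly wind component (speed times the cosine of the direction in degrees clockwise from north) and the number of days since Sept. 1, 2008. The function and argument names are assumptions for this sketch and are not the names used in the original data preparation.

```python
import math
from datetime import date

def wind_north_sp(wind_sp, wind_dir_deg):
    """Northerly wind component: speed times cosine of the angle from north."""
    return wind_sp * math.cos(math.radians(wind_dir_deg))

def days_since_start(obs_date, start=date(2008, 9, 1)):
    """Number of days between the start of the season and the observation date."""
    return (obs_date - start).days

print(wind_north_sp(10.0, 0.0))              # wind straight from the north
print(wind_north_sp(10.0, 180.0))            # wind straight from the south
print(days_since_start(date(2008, 9, 15)))
```

A wind from due east or due west contributes essentially nothing to the northerly component, which is why a separate wind_east term appears in the mixed model.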

7.2 Fitting Models 

From section 6.2.3, it seems that the negative binomial model would be a reasonable choice

given that NLMIXED cannot handle the random effects that we need in the model. These mixed

models with negative binomial data can be fit with SAS procedure NLMIXED. However, fitting

these types of models with these random effects turns out to be tricky. Nonlinear optimization and numerical integration are needed, and for all models we fit, the gradient vector at the reported "solution" was not close to zero, whereas a gradient near zero is what we expect at a local maximum of the log-likelihood function. So we are back to using Ln(Y+1) as an initial analysis of these data.

From the simulations, using Ln(Y+1) should be less efficient than using the better methods, so at


efficient models. Further work will need to be done beyond this project to figure out how to fit

more complicated models.

At the first step, we try many independent variables, e.g. 19 and use as a selection

criterion to obtain potential variables. Generally, several models might be highly similar in the

quality of the fit based on selection. Based on the values, we only can choose a shorter list

of independent variables to start studying. The runs were done without random effects in the

models, since software is readily available to do this. The p-values will not be correct, but the

relative importance of the potential independent variables should be fine. We then fit a model

including variables included in the top models based on . Then the p-value is used as one of

the criteria to cut down variables based on former runs. Here is an example of how we eliminate variables. In a regression model that included the independent variable temp_chg, the p-value of temp_chg was about 0.9236, which is very large and indicates that temp_chg is not needed if the other variables are included in the model. Temperature change is therefore not an important predictor of buteo counts, and temp_chg need not be considered in the model. Applying the same p-value criterion to the other variables left only 14 independent variables.

7.2.1 Fit Mixed Model to Data 

Mixed models are widely used to model a linear relationship when the dependent data have

known structure. The commonly used mixed model involves repeated measurement. Repeated

measures are encountered in hawk data, so a mixed model is applied in analyzing the relationship


Transects are numbered by ordering the distances from Duluth up to the North Shore of Lake Superior. Drawing conclusions about places in general is more meaningful than finding the effect of these specific transects, so transect is considered a random effect here. Date, the day on which hawk migration was observed, is treated as a random effect too. The sites on a given transect were at different distances from shore, recorded as the variable shore (a, b, or c), where shore = a is closest to Lake Superior and shore = c is farthest from Lake Superior. To account for dependence of hourly measurements at the same site, a shore*date random effect is also included.

Even though nlmixed might fit the hawk data well, nlmixed cannot handle both the date and shore*date random effects. This is the reason for using mixed rather than nlmixed to include those random effects in the model. Proc Mixed in the SAS system provides a very flexible platform for dealing with repeated measures problems. The mixed model can provide better p-values than the regression model for cutting down variables. One of the mixed model codes is as follows:

proc mixed data=fengying.buteos_before_nov plots=residualpanel(unpack);
  class transect date shore;
  model Ln_buteos_plus_1 = day shore Wind_Prev Precip_Pre wind_east
        wind_north_sp time time*time / residual outpm=outpm solution;
  random date shore*date;
  ods select solutionf covparms tests3 ResidualQQplot;
run;

The estimated transect variance comes out as 0. To make the convergence of the estimation

simpler and more likely to find the right MLE, transect is taken out of the random effects for the

mixed models in our study.

7.2.2 Fit Nlmixed Model to Data 

The procedure nlmixed model cannot handle the model with both date and shore*date random


and also to check for the best wind direction. This model should be fixed to include all the

variables from previous runs. No random effects were used in our nlmixed model, to make the estimation easier. Applying the approximation method we finalized in section 6.2.3 to the hawk data, we came up with the following nlmixed code:

proc nlmixed data=fengying.buteos_before_nov;
  parms b0=-8.8 b_day=0.02 b_shore_a=0.4 b_shore_b=0.3
        b_wind_prev=0.2 b_time=1.5 b_time_2=-0.06 b_precip_pre=-0.75 k=0.2
        r=2 c=0.5 theta=0;
  bounds c > 0;
  /* wind effect: speed times cosine of the angle between wind_dir and theta (degrees) */
  wind = k*wind_sp*cos((wind_dir-theta)*3.1415927/180);
  ln_mu_y = b0 + b_day*day + b_shore_a*shore_a + b_shore_b*shore_b + wind
            + b_wind_prev*wind_prev + b_time*time + b_time_2*time*time
            + b_precip_pre*precip_pre;
  sig_2 = log(c*exp((r-2)*ln_mu_y)+1);
  mu = ln_mu_y - 0.5*sig_2;
  y = buteos;
  if y = 0 then ll = log(probnorm((log(0.5)-mu)/sqrt(sig_2)));
  else ll = log(probnorm((log(y+0.5)-mu)/sqrt(sig_2))
              - probnorm((log(y-0.5)-mu)/sqrt(sig_2)));
  model y ~ general(ll);
run;

Using this code, the maximum likelihood estimate of the clockwise angle relative to north is -0.03 with a standard error of 10°, very nearly true north. The estimate of r is 1.8 with a standard error of 0.19, corresponding to a variance proportional to $\mu^{1.8}$, indicating Y is approximately log normal. We can say that the data would not be modeled well as Poisson data.
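One way to read the estimate of r is through an approximate interval. The short Python sketch below forms a Wald-type 95% interval from the reported estimate 1.8 and standard error 0.19. The normal approximation and the multiplier 1.96 are assumptions added here for illustration and were not part of the original analysis.

```python
def wald_interval(estimate, std_err, z=1.96):
    """Approximate confidence interval: estimate plus or minus z standard errors."""
    half_width = z * std_err
    return estimate - half_width, estimate + half_width

low, high = wald_interval(1.8, 0.19)
print(round(low, 2), round(high, 2))
print("contains r = 1 (Poisson-like):", low <= 1.0 <= high)
print("contains r = 2 (constant variance in log scale):", low <= 2.0 <= high)
```

Under this approximation the interval excludes r = 1 and includes r = 2, which is consistent with the statement that the data would not be modeled well as Poisson data.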

7.3 Summary of Findings 

One of the mixed models was introduced in section 7.2.1. First, a newly built model prompts us to look at how the errors of the model are distributed. In Fig. 10, we can see that the residuals fall close to the straight line except for the last two points, which indicates the data


Fig. 10 QQ-plot for residuals from a mixed model

Intuitively, a good model should have predicted values as close to the true values as possible. The R² value of a model is the square of the correlation between the fitted and observed values. It is interesting to see how the predicted values from the mixed model compare with the real values of Ln(buteos+1). In Fig. 11 the basic trend can be described by the equation y=x, except for two outliers. The R² value for this model is about 0.4. Based on the simulations, better models could potentially be fit, but generally speaking, this model works fairly well for these


Fig. 11 Ln(buteos+1) VS. Predicted Mean of Ln(buteos+1)
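The R² value quoted in this section is the squared correlation between observed and fitted values. A minimal Python sketch of that definition follows. The function name and the small data vectors are made up for illustration and are not hawk data.

```python
def r_squared(observed, fitted):
    """Square of the Pearson correlation between observed and fitted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(fitted) / n
    s_of = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, fitted))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_ff = sum((f - mean_f) ** 2 for f in fitted)
    return s_of ** 2 / (s_oo * s_ff)

# Hypothetical observed and fitted values on the Ln(Y+1) scale.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.0, 1.0, 2.0, 2.0]
print(r_squared(observed, fitted))
```

A value near 1 means the points in a plot such as Fig. 11 lie close to a straight line, while a value near 0.4 leaves considerable scatter around the line.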

The following Table 7.3(1) of p-values shows that the effects day, wind_north_sp and time are significant predictors of buteo counts. Again, better models could potentially be fit, but the significant p-values from this model are reliable. This is like using non-parametric methods when data are normal or follow some other distribution: the statistics are not as efficient as they could be, but significant effects can still be considered significant.

Type 3 Tests of Fixed Effects

Effect         Num DF  Den DF  F Value  Pr > F
day                 1     425    26.73  <.0001
shore               2      45     3.69  0.0328
Wind_Prev           1     425     2.63  0.1056
Precip_Pre          1     425     5.30  0.0218
wind_east           1     425     4.31  0.0384
wind_north_sp       1     425    37.56  <.0001
time                1     425    78.66  <.0001
time*time           1     425    71.68  <.0001


The day effect has been shown in Fig. 3: more and more buteos migrate as the date gets closer to winter. In this part, we are more concerned with the wind_north_sp and time effects. To check their effects, LSMEANS statements were added, one at a time, to the previous mixed model, for example, lsmeans wind_north_sp /obsmargins;. For the mixed model with this LSMEANS statement, the variable wind_east, with a p-value of 0.7624, is not useful. Fig. 12 plots the estimates of Ln(buteos+1) against the least squares means of wind_north_sp, using each unique value of wind_north_sp as its own effect, that is, using wind_north_sp as a "class" variable rather than a linear effect. There is an increasing trend in wind_north_sp, which further illustrates that north is the best wind direction for buteo migration.

Fig. 12 Estimates of Ln(buteos+1) VS. LS-means of wind_north_sp

For another mixed model with LS-means variable time, wind_east also came up to be an


is shown in Fig. 13 using each hour of the day as its own effect, that is, time as a "class" variable rather than a quadratic effect. Basically, the peak of buteo migration in a day is in the early afternoon.

Fig. 13 Estimates of Ln(buteos+1) VS. LS-means of time

In the nlmixed model, we want to see if there exists a relationship between the variance of Y and the mean of Y, where Y is the buteo count. We used the idea that $\mathrm{Var}(Y) = c\,[E(Y)]^{r}$ in the nlmixed model. After running the nlmixed code in section 7.2.2, the estimate of r is 1.8 with a standard error of 0.19. In other words, the variance of the buteo count is approximately proportional to the square of the mean buteo count, so the mixed model, which assumes equal variances in the log scale, is reasonable.

8 CONCLUSION 

Count data are commonly studied nowadays. The LM, GLM, MIXED and NLMIXED models


some of the results from O'Hara and Kotze (2010) that log-transformation of count data performs poorly while negative binomial and Poisson models work well, so we do not recommend log-transforming count data with many zero observations. The negative binomial model might perform better than the Poisson model for these kinds of data. The mixed model, rather than the NLMIXED model, provides an effective way to analyze count data with complicated random effects. When applying the NLMIXED model, the main focus should be put on choosing a good approximation method. SAS is good statistical software for fitting these models. In our simulations the negative binomial model did well even for lognormal data. However, in 2008 the large bursts of buteos during the migration were not as pronounced. With more days of very large counts, the variance may be a higher power of the mean, where the nonlinear models would have an advantage over the negative binomial models.

To answer the research questions for the hawk data set in this report, the mixed model is used. The effects of day, wind_north_sp and time play central roles in estimating buteo counts during a given period of time. It is understandable that an increasing number of buteos migrate as the date gets closer to winter. Buteos fly from north to south with the benefit of a north wind when winter is coming, so it makes sense that wind_north_sp is a significant factor with an increasing trend. It is possible that buteos prefer to migrate during a slightly warmer time, in the early afternoon, which can also be seen in Fig. 13 as a downward 'parabola'.

For future studies, we can apply the mixed model to other bird data, such as accipiters, which fly closer to the ground. By analyzing other bird data, we can see whether day, wind_north_sp and time are still significant in predicting counts of other birds. Although the transformation of hawk data, to some extent, supports the mixed model, the generalized linear mixed model (GLMM) can be studied to fit hawk data, since nonnegative counts are more likely to follow a Poisson or negative binomial distribution than a normal distribution. More extensive data from the Hawk Ridge observatory station in Duluth could be used for further investigations, including years with very large counts of buteos on some days. But before fitting a GLMM, we have to address the concern that the GLMM has less flexibility in selecting a covariance structure than the linear mixed model (LMM). How to balance the benefits and disadvantages of the GLMM could become a future subject of study. For count data, many types of Poisson mixed model have been put forward. It is often the case that there are more zero counts than there should be for a Poisson distribution. For cases such as the hawk data, zero-inflated Poisson (ZIP) mixed models, which include not only the Poisson regression for zero and nonzero counts but also a logistic regression for the probability of a nonzero response, have been proposed and developed. For future work, we are also interested in fitting zero-inflated models, ZIP or zero-inflated negative binomial (ZINB), to data such as counts of migrating hawks. Likewise, we will compare power for different models and consider adding other variables, such as atmospheric pressure, to the model. In addition to estimating the slope trend over time, we would also like to investigate how well the models predict the number of hawks at any given point in time.
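To make the zero-inflation idea concrete, the following Python sketch writes out the probability mass function of a simple zero-inflated Poisson distribution without covariates or random effects. The parameter name pi_zero, the probability of an excess zero, and the numerical values are assumptions for illustration. In a ZIP regression this probability would be modeled through a logistic regression and the Poisson mean through a log-linear model.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability of the count y with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def zip_pmf(y, lam, pi_zero):
    """Zero-inflated Poisson: an excess zero with probability pi_zero, else Poisson(lam)."""
    p = (1.0 - pi_zero) * poisson_pmf(y, lam)
    return pi_zero + p if y == 0 else p

lam, pi_zero = 2.0, 0.3                  # hypothetical values
print(poisson_pmf(0, lam), zip_pmf(0, lam, pi_zero))
```

The zero-inflated model places more probability on zero than the Poisson model with the same mean parameter, which is the feature needed when there are more zero counts than a Poisson distribution allows.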

The primary take-home message from the simulations is that even if the data are

generated by normal models with variances proportional to no more than the mean squared, the

GLM negative binomial models are still quite good even for lognormal data and that methods

other than LMM's for Ln(Y+1) should be investigated, at least for the small counts that we used

in our simulations. The nlmixed models are similar to Generalized Estimating Equation (GEE)

models for correlated generalized linear models in that they apply normal theory using


need to be able to fit these reliably with more general random effects than nlmixed can handle.

The next step in fitting better models to these data is to figure out how to fit GLM negative

binomial models or nlmixed type models with multiple random effects. SAS procedure

GLIMMIX has the potential for mixed effects GLM models, but getting reliable convergence has

been a problem for us. Other software such as the nlme library in R or Bayesian methods should


9 REFERENCES 

[1] Robert B. O'Hara and D. Johan Kotze, 2010. Do Not Log-transform Count Data. Methods in Ecology and Evolution, 1, 118-122.

[2] John A. Rice, 1987. Mathematical Statistics and Data Analysis, University of California, San Diego.

[3] Changming Xia, 2010. Modeling Data Correlation with Structured Covariance in Mixed Model, University of Minnesota Duluth.

[4] Annette J. Dobson and Adrian G. Barnett, 2002. An Introduction to Generalized Linear Models. Boca Raton: CRC Press.

[5] Sheldon M. Ross, 2002. Introduction to Probability Models. Academic Press.

[6] Norman L. Johnson and Samuel Kotz, 1969. Discrete Distributions. Boston: Houghton Mifflin Company.

[7] Brian S. Everitt and Graham Dunn, 1991. Applied Multivariate Data Analysis. New York & Toronto: Halsted Press.

[8] David Shen and Zaizai Lu. Statistical Application of SAS in Method Comparison Analysis.

[9] Mike Zdeb and Robert Allison. SAS/GRAPH® 101, SUGI 131.


10 APPENDICES 

10.1 SAS Code

Two-means simulation code:

libname sim3 "F:\Fengying Miao\simulation\Poisson vs Ln_normal"; run;

%macro choose(u1,u2,n,nsim,dataset);
%let b0=log(&u1);
%let b1=log(&u2)-log(&u1);

data sim3.mydata;
  do isim=1 to &nsim;
    call streaminit(1895);
    do time=0 to 1;
      do rep=1 to &n;
        mu=&b0+(&b1)*time;             /* Ln of the Poisson mean */
        y=rand('poisson',exp(mu));
        ln_y_1=log(y+1);
        output;
      end;
    end;
  end;
run;

proc sort data=sim3.mydata; by isim;

run;

ods listing close;

proc printto log="F:\Fengying Miao\simulation\Poisson vs Ln_normal\junk.log";; run;

proc reg data=sim3.mydata;

ods output ParameterEstimates=parm_reg3; by isim; model ln_y_1=time; run; data reg3; set parm_reg3(drop=model); method="Reg"; if variable="time"; run;

proc means data=reg3; by method;

output out=outs_reg3; run;

data outs_rega1(drop=_stat_ estimate); set outs_reg3(keep=method _stat_ estimate); if _stat_="STD";

STD=estimate; run;

data outs_rega2(drop=_stat_ estimate); set outs_reg3(keep=method _stat_ estimate); if _stat_="MEAN";

MEAN=estimate; run;

data outs_rega3(drop=STD); merge outs_rega1 outs_rega2; by method; bias=MEAN-(&b1); var=STD**2; MSE=var+bias**2; RMSE=sqrt(MSE); run;

proc genmod data=sim3.mydata;

ods output ParameterEstimates=parm_gen3; by isim;

model y=time/link=log dist=poisson; run; data genmod3(drop=parameter); set parm_gen3; method="Poisson"; variable=parameter; dependent="y"; if variable="time"; run;

data a0(drop=s_include s_above s_below lowerwaldcl upperwaldcl); set genmod3(keep=isim method lowerwaldcl upperwaldcl);

retain s_include s_above s_below;

if lowerwaldcl<(&b1) and (&b1)<upperwaldcl then s_include+1; else if lowerwaldcl>(&b1) then s_above+1;

else s_below+1;

p_b1=s_include/&nsim; p_ab1=s_above/&nsim; p_bb1=s_below/&nsim;

label p_b1="prob(include b1)"; label p_ab1=" prob(above b1)"; label p_bb1="prob(below b1)"; if isim=&nsim;

run;

proc means data=genmod3; by method;

output out=outs_all3; run;

data outs_a1(drop=_stat_ estimate);

set outs_all3(keep=method _stat_ estimate); if _stat_="STD";

STD=estimate; run;

data outs_a2(drop=_stat_ estimate);

set outs_all3(keep=method _stat_ estimate lowerwaldcl upperwaldcl); if _stat_="MEAN";

MEAN=estimate; run;

data outs_a3(drop=isim STD lowerwaldcl upperwaldcl); merge a0 outs_a1 outs_a2;

by method; bias=MEAN-(&b1); var=STD**2; MSE=var+bias**2; RMSE=sqrt(MSE); run;

proc genmod data=sim3.mydata;

ods output ParameterEstimates=parm_neg3; by isim;

model y=time/link=log dist=negbin; run; data neg3(drop=parameter); set parm_neg3; method="Negbin"; variable=parameter; dependent="y"; if variable="time"; run;

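data neg_a(drop=s_include s_above s_below lowerwaldcl upperwaldcl); set neg3(keep=isim method lowerwaldcl upperwaldcl);

/* coverage counters, assumed to mirror the Poisson step (data a0) above */
retain s_include s_above s_below;

if lowerwaldcl<(&b1) and (&b1)<upperwaldcl then s_include+1; else if lowerwaldcl>(&b1) then s_above+1;

else s_below+1;

p_b1=s_include/&nsim; p_ab1=s_above/&nsim; p_bb1=s_below/&nsim;

label p_b1="prob(include b1)"; label p_ab1=" prob(above b1)"; label p_bb1="prob(below b1)"; if isim=&nsim;

run;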

proc means data=neg3; by method;

output out=outs_neg3; run;

data outs_neg_b(drop=_stat_ estimate); set outs_neg3(keep=method _stat_ estimate); if _stat_="STD";

STD=estimate; run;

data outs_neg_c(drop=_stat_ estimate);

set outs_neg3(keep=method _stat_ estimate lowerwaldcl upperwaldcl); if _stat_="MEAN";

MEAN=estimate; run;

data outs_neg_d(drop=isim STD lowerwaldcl upperwaldcl); merge neg_a outs_neg_b outs_neg_c;

by method; bias=MEAN-(&b1); var=STD**2; MSE=var+bias**2; RMSE=sqrt(MSE); run;

proc nlmixed data=sim3.mydata;

ods output ParameterEstimates=parm_nlmix; by isim; title 'nlmixed'; parms b0=1.6 b1=0.7 r=1 c=1; bounds c > 0; mu = b0 + b1*time; mu_y = exp(mu); var_y = c*mu_y**r;

mu_ln = log(mu_y+1) - 0.5*var_y/((mu_y+1)**2); var_ln = abs(var_y/(mu_y+1)**2);

model ln_y_1 ~ normal(mu_ln, var_ln); run;

data nlmix(keep=isim variable estimate method) ; set parm_nlmix;

method="nlmixed"; variable=parameter; if variable="b1"; run;

proc means data=nlmix; by method;

output out=outs_nlmix; run;

data outs_nlmix1(drop=_stat_ estimate); set outs_nlmix(keep=method _stat_ estimate); if _stat_="STD";

STD=estimate; run;

data outs_nlmix2(drop=_stat_ estimate); set outs_nlmix(keep=method _stat_ estimate); if _stat_="MEAN";

MEAN=estimate; run;

data outs_nlmix3(drop=STD var MSE); merge outs_nlmix1 outs_nlmix2;


bias=MEAN-(&b1); var=STD**2; MSE=var+bias**2; RMSE=sqrt(MSE); run; data results_&dataset;

set outs_a3 outs_rega3 outs_neg_d outs_nlmix3; twomean="&u1._&u2.with&n"; run; %mend choose; %choose(u1=5,u2=10,n=10,nsim=1000,dataset=1) %choose(u1=2,u2=5, n=10,nsim=1000,dataset=2) %choose(u1=5,u2=10,n=20,nsim=1000,dataset=3) %choose(u1=2,u2=5,n=20,nsim=1000,dataset=4) data sim3.result_all;

set results_1 results_2 results_3 results_4;

run;

ods rtf file="F:\Master Project\Exact calculation\exact_simulation\two_mean.rtf";

goptions reset=all;

symbol1 value=dot c=green height=0.25in; symbol2 value=star c=red height=0.3in;

symbol3 font=marker value=U c=brown height=0.15in; symbol4 value=circle c=red height=0.25in;

axis1 label=("Bias of b1"); axis2 label=("RMSE of b1");

proc gplot data=sim3.result_all;

plot bias*method=twomean;

plot RMSE*method=twomean;

run;

ods rtf close;

Regression code for Poisson data:

libname sim3 "F:\Master Project\Exact calculation\exact_simulation\pois_sim"; run;

%let b0=0; %let b1=0.02; %let nsim=1000; %let n=10;

data sim3.mydata;
  call streaminit(1895);
  do isim=1 to &nsim;
    do time=0 to 50 by 5;
      do rep=1 to &n;
        mu=&b0+(&b1)*time;             /* Ln of the Poisson mean */
        y=rand('poisson',exp(mu));
        ln_y_1=log(y+1);
        output;
      end;
    end;
  end;
run;

proc sort data=sim3.mydata;

by isim;

run;

ods listing close;

proc printto log='F:\Master Project\Exact calculation\exact_simulation\pois_sim\junk.log'; run;

proc reg data=sim3.mydata;

ods output ParameterEstimates=parm_reg3;

by isim; model ln_y_1=time; run; data reg3; set parm_reg3(drop=model); method="Reg"; if variable="time"; run;

proc means data=reg3;

by method;

output out=outs_reg3; run;

data outs_rega1(drop=_stat_ estimate);

set outs_reg3(keep=method _stat_ estimate);

if _stat_="STD"; STD=estimate; run;

data outs_rega2(drop=_stat_ estimate);

set outs_reg3(keep=method _stat_ estimate);

if _stat_="MEAN"; MEAN=estimate; run;

data outs_rega3(drop=STD var MSE);

merge outs_rega1 outs_rega2;

by method; bias=MEAN-(&b1); var=STD**2; MSE=var+bias**2; RMSE=sqrt(MSE); rel_bias=bias/(&b1); rel_RMSE=RMSE/(&b1); run;

proc genmod data=sim3.mydata;

ods output ParameterEstimates=parm_gen3;

by isim;

model y=time/link=log dist=poisson; run;


data genmod3(drop=parameter); set parm_gen3; method="Poisson"; variable=parameter; dependent="y"; if variable="time"; run;

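data a0(drop=s_include s_above s_below lowerwaldcl upperwaldcl);

set genmod3(keep=isim method lowerwaldcl upperwaldcl);

/* coverage counters, assumed to mirror the two-means code above */
retain s_include s_above s_below;

if lowerwaldcl<(&b1) and (&b1)<upperwaldcl then s_include+1; else if lowerwaldcl>(&b1) then s_above+1;

else s_below+1;

p_b1=s_include/&nsim; p_ab1=s_above/&nsim; p_bb1=s_below/&nsim;

label p_b1="prob(include b1)"; label p_ab1=" prob(above b1)"; label p_bb1="prob(below b1)"; if isim=&nsim;

run;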

proc means data=genmod3;

by method;

output out=outs_all3; run;

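/* summary steps assumed to follow the same pattern as the regression step above */
data outs_a1(drop=_stat_ estimate);

set outs_all3(keep=method _stat_ estimate);

if _stat_="STD"; STD=estimate; run;

data outs_a2(drop=_stat_ estimate);

set outs_all3(keep=method _stat_ estimate lowerwaldcl upperwaldcl);

if _stat_="MEAN"; MEAN=estimate; run;

data outs_a3(drop=isim STD lowerwaldcl upperwaldcl var MSE);

merge a0 outs_a1 outs_a2;

by method; bias=MEAN-(&b1); var=STD**2; MSE=var+bias**2; RMSE=sqrt(MSE); rel_bias=bias/(&b1); rel_RMSE=RMSE/(&b1); run;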

proc genmod data=sim3.mydata;

ods output ParameterEstimates=parm_neg3;

by isim;

model y=time/link=log dist=negbin; run;


data neg3(drop=parameter); set parm_neg3; method="Negbin"; variable=parameter; dependent="y"; if variable="time"; run;

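data neg_a(drop=s_include s_above s_below lowerwaldcl upperwaldcl);

set neg3(keep=isim method lowerwaldcl upperwaldcl);

/* coverage counters, assumed to mirror the two-means code above */
retain s_include s_above s_below;

if lowerwaldcl<(&b1) and (&b1)<upperwaldcl then s_include+1; else if lowerwaldcl>(&b1) then s_above+1;

else s_below+1;

p_b1=s_include/&nsim; p_ab1=s_above/&nsim; p_bb1=s_below/&nsim;

label p_b1="prob(include b1)"; label p_ab1=" prob(above b1)"; label p_bb1="prob(below b1)"; if isim=&nsim;

run;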

proc means data=neg3;

by method;

output out=outs_neg3; run;

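/* summary steps assumed to follow the same pattern as the regression step above */
data outs_neg_b(drop=_stat_ estimate);

set outs_neg3(keep=method _stat_ estimate);

if _stat_="STD"; STD=estimate; run;

data outs_neg_c(drop=_stat_ estimate);

set outs_neg3(keep=method _stat_ estimate lowerwaldcl upperwaldcl);

if _stat_="MEAN"; MEAN=estimate; run;

data outs_neg_d(drop=isim STD lowerwaldcl upperwaldcl var MSE);

merge neg_a outs_neg_b outs_neg_c;

by method; bias=MEAN-(&b1); var=STD**2; MSE=var+bias**2; RMSE=sqrt(MSE); rel_bias=bias/(&b1); rel_RMSE=RMSE/(&b1); run;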

proc nlmixed data=sim3.mydata;

ods output ParameterEstimates=parm_nlmix;

by isim;

title 'nlmixed';

parms b0=0 b1=0.031 r=1 c=0.45; bounds c > 0;
