Linear Mixed-Effects Models to Measure the Effect of Age on Marathon Performance

By Spencer Siegel

Senior Honors Thesis Statistics and Operations Research University of North Carolina at Chapel Hill

April 21, 2020

Approved:

____________________________________ Dr. Richard L. Smith, Thesis Advisor

**ABSTRACT**

Spencer Siegel: Linear Mixed-Effects Models to Measure the Effect of Age on Marathon Performance

(Research supervised by Dr. Richard L. Smith)

Age is widely accepted as a critical factor affecting marathon performance, yet there is currently no widely accepted standard that measures this effect. Although age-graded standards exist, they are computed from world-record times, which may not be indicative of the aging process for a typical runner. Prestigious marathons, such as the Boston Marathon, use qualification times to restrict the number of participants and respect space limitations. The fairness of these qualification times has been a hot topic in the running community. The qualification times for the Boston Marathon currently depend on one’s gender and age-group, but the exact effect of these variables is unknown. The Boston Marathon does not currently use a model to specify qualification standards. This paper takes a quantitative approach to measuring the effect of age and gender on marathon performance. We have identified gender, age, and the conditions of the race as three critical factors which influence marathon performance. We have used mixed-effects modeling to model the effect of these variables on marathon finish times. We have shown the age-performance curves derived from these models. Additionally, we have developed methodology to construct qualifying standards based on the results of the model. Most recently, we have modeled dropout probability to address the issue of survivor bias in the marathon data. We hope this research can serve as a foundation to address the fairness of current marathon qualification standards and, more broadly, to

**ACKNOWLEDGEMENTS**

I wish to express my deepest gratitude to Dr. Richard L. Smith for his dedication and assistance in my research. Not only has he been a phenomenal research advisor, but he has also served as a mentor. His enthusiasm for statistics and his generosity have inspired my desire to pursue further education in statistics.

I would also like to thank my professors and fellow classmates who have had a profound impact on my experience in Chapel Hill. I really enjoyed my time as an undergrad and will forever cherish the memories.

**TABLE OF CONTENTS**

**1. Introduction**

**2. Data**

**3. Methods**

**4. Results**

**A. Boston Marathon (2001-2017 except 2015)**

**B. Chicago Marathon (2000-2017 except 2015)**

**C. New York Marathon (2000-2019 except 2012)**

**D. Los Angeles Marathon (2000-2019)**

**E. Marine Corps Marathon (2000-2018)**

**F. Twin Cities Marathon (2000-2019 except 2003-2005)**

**G. Philadelphia Marathon (2000-2019)**

**H. Houston Marathon (2000-2018 except 2011)**

**I. Grandma’s Marathon (2000-2019 except 2006)**

**J. California International Marathon (2000-2019)**

**K. Results Summary**

**5. Discussion**

**LIST OF TABLES**

Table 1.1 - 2020 Boston Marathon Qualifying Standards

Table 2.1 - Overview of Datasets

Table 4.1 - Model 1 Summary Output on Female Participants in the Boston Marathon

Table 4.2 - Model 1 Summary Output on Male Participants in the Boston Marathon

Table 4.3 - Historical Temperatures of Boston Marathon

Table 4.4 - Qualifying Standards produced from Model 1 on Boston Marathon data

Table 4.5 - Qualifying Standards produced from Model 1 on top 75% of Boston Marathon data

Table 4.6 - Historical Temperatures of Chicago Marathon

Table 4.7 - Qualifying Standards produced from Model 1 on Chicago data

Table 4.8 - Qualifying Standards produced from Model 1 on top 50% of Chicago data

Table 4.9 - Qualifying Standards produced from Model 1 on New York data

Table 4.10 - Qualifying Standards produced from Model 1 on top 50% of New York data

Table 4.11 - Qualifying Standards produced from Model 1 on Los Angeles data

Table 4.12 - Qualifying Standards produced from Model 1 on Marine Corps data

Table 4.13 - Qualifying Standards produced from Model 1 on Twin Cities data

Table 4.14 - Qualifying Standards produced from Model 1 on Philadelphia data

Table 4.15 - Qualifying Standards produced from Model 1 on Houston data

Table 4.16 - Qualifying Standards produced from Model 1 on Grandma’s data

Table 4.17 - Qualifying Standards produced from Model 1 on California International data

**LIST OF FIGURES**

Figure 2.1 - Boston Marathon Data # of Races per Participant

Figure 3.1 - Cross-Validation plot of Natural Splines for Boston Female data

Figure 3.2 - Cross-Validation plot of Natural Splines for Boston Male data

Figure 3.3 - Cross-Validation plot of Orthogonal Polynomials for Boston Female data

Figure 3.4 - Cross-Validation plot of Orthogonal Polynomials for Boston Female data

Figure 3.5 - Cross-Validation plot of Natural Splines for Chicago Female data

Figure 3.6 - Cross-Validation plot of Natural Splines for Boston Male data

Figure 3.7 - Cross-Validation plot of Orthogonal Polynomials for Chicago Female data

Figure 3.8 - Cross-Validation plot of Orthogonal Polynomials for Chicago Female data

Figure 3.9 - Histogram of Boston Marathon Finish Times

Figure 4.1 - Lattice Plot for Year Random Effect on Boston Female Data

Figure 4.2 - Lattice Plot for Year Random Effect on Boston Male Data

Figure 4.3 - Age-Performance curve for Model 1 on Boston data

Figure 4.4 - Age-Performance curve for Boston Female data split by Quartiles

Figure 4.5 - Age-Performance curve for Boston Male data split by Quartiles

Figure 4.6 - Age-Performance curve for top 75% of Boston data

Figure 4.7 - Age-Performance Curve from Model 2 on Boston data

Figure 4.8 - Age-Performance Curve from Model 3 on Boston data

Figure 4.9 - Age-Performance Curve from Model 4 on Boston data

Figure 4.10 - Age-Performance Curve from Model 1 on Chicago data

Figure 4.11 - Lattice Plot for Year Random Effect on Chicago Female Data

Figure 4.12 - Lattice Plot for Year Random Effect on Chicago Female Data

Figure 4.13 - Age-Performance Curve from Model 1 (df = 3) on Chicago data

Figure 4.14 - Age-Performance Curve from Model 1 (df = 4) on Chicago data

Figure 4.15 - Age-Performance curve for Chicago Female data split by Quartiles

Figure 4.16 - Age-Performance curve for Chicago Male data split by Quartiles

Figure 4.17 - Age-Performance Curve from Model 1 on top 50% of Chicago data

Figure 4.18 - Age-Performance Curve from Model 4 on Chicago data

Figure 4.19 - Age-Performance Curve from Model 1 on New York data

Figure 4.20 - Age-Performance curve for New York Female data split by Quartiles

Figure 4.21 - Age-Performance curve for Chicago Male data split by Quartiles

Figure 4.22 - Age-Performance curve from Model 1 on Los Angeles data

Figure 4.23 - Age-Performance curve for Los Angeles Female data split by Quartiles

Figure 4.24 - Age-Performance curve for Los Angeles Male data split by Quartiles

Figure 4.25 - Age-Performance curve from Model 1 on Marine Corps data

Figure 4.26 - Age-Performance curve for Marine Corps Female data split by Quartiles

Figure 4.27 - Age-Performance curve for Marine Corps Male data split by Quartiles

Figure 4.28 - Age-Performance curve from Model 1 on Twin Cities data

Figure 4.29 - Age-Performance curve from Model 1 on Philadelphia data

Figure 4.30 - Age-Performance curve from Model 1 on Houston data

Figure 4.31 - Age-Performance curve from Model 1 on Grandma’s data

**1. Introduction**

Age has been widely accepted as a critical factor which affects performance in long-distance races such as marathons. However, the degree to which age affects runners can vary. Some runners may improve into their thirties before seeing a decrease in performance. Others may see a dip in performance much earlier. Nonetheless, measuring the effect of age on running times is essential for the running community. Runners desire an objective metric to determine performance relative to their age. Races such as the Boston Marathon also need an objective performance measure (which depends on age and gender) to define qualifying times. Age-graded performances have been a common standard to measure running performance with respect to age. However, this standard is based on world-record times, which may not be indicative of the “average” runner. By simply using world records, these age-graded standards are not tailored to any particular individual. The world-record holder for a certain age-group is most likely not the same runner as the world-record holder for a different age-group. Therefore, it seems illogical to assume that this is how a typical runner’s performance changes with age. Lastly, it seems unreasonable that a new world-record time should negatively impact the age-graded scores of all runners in the age-group of the new world-record holder. Because the standard relies on world records, it does not define constant performance grades.

manually, which was quite a tedious process. Thankfully, Dorit Hammerling, Associate Professor of Statistics at Colorado School of Mines, and two of her students, Laura Albrecht and Ross Ring-Jarvi, have spent countless hours scraping large marathon datasets from marathonguide.com. This was an essential component, which has allowed me to analyze ten different marathons across the United States. These races range from years 2000 to 2020.

With the larger datasets, we first replicated Professor Smith’s process and compared the results. We have also come up with a few unique ways to extend the analysis such as splitting the runners into quartiles and manipulating the fixed and random effect components in the model. We split the runners into quartiles to test whether or not the relationship between performance and age is dependent on the ability of the runner. This idea could prove to be an attractive alternative if a relationship exists between a runner’s aging curve and their ability. We were also intrigued to manipulate the components of the model, such that distinct age-performance curves would be fitted for each runner.

Additionally, we were particularly interested in assisting the Boston Marathon in their efforts to determine the most effective qualifying times. The Boston Marathon aims to accept a certain number of qualified runners each year. The 2020 Boston Marathon qualifying standards are shown below:

| Age-Group | Males (Hrs:Mins) | Females (Hrs:Mins) |
|---|---|---|
| 18-34 | 3:00 | 3:30 |
| 35-39 | 3:05 | 3:35 |
| 40-44 | 3:10 | 3:40 |
| 45-49 | 3:20 | 3:50 |
| 60-64 | 3:50 | 4:20 |
| 65-69 | 4:05 | 4:35 |
| 70-74 | 4:20 | 4:50 |
| 75-79 | 4:35 | 5:05 |
| 80+ | 4:50 | 5:20 |

Table 1.1 - 2020 Boston Marathon Qualifying Standards

These strict qualifying times have been tightened twice in the last decade in response to the increasing number of applications from qualified runners. However, the Boston Marathon remains oversubscribed (i.e., more qualified runners apply for the marathon than the Boston Athletic Association (BAA) permits). In 2019, the actual qualifying time necessary to gain acceptance to the Boston Marathon was one minute and thirty-nine seconds faster than each of the times listed above. For instance, a 30-year-old female needed a 3:28:21 marathon time to be accepted and a 50-year-old male needed a 3:23:21 marathon time to be accepted. This certainly frustrated runners who thought they had qualified, especially those who have been turned away in the past for the same reason. The BAA would like potential participants to know the precise marathon time needed for acceptance to the Boston Marathon. However, they cannot simply accept more runners due to space limitations.
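The cutoff arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names (`to_seconds`, `to_hms`, `effective_time`) are hypothetical, and the only inputs taken from the text are the published standards and the 1:39 margin.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Format a number of seconds as 'H:MM:SS'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Margin reported in the text: 1 minute 39 seconds under the standard.
CUTOFF_SECONDS = 1 * 60 + 39


def effective_time(standard_hms, cutoff_seconds=CUTOFF_SECONDS):
    """Time actually needed for acceptance, given a published standard."""
    return to_hms(to_seconds(standard_hms) - cutoff_seconds)


print(effective_time("3:30:00"))  # female 18-34 standard -> 3:28:21
print(effective_time("3:00:00"))  # male 18-34 standard   -> 2:58:21
```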

**2. Data**

The data used in this analysis was scraped from marathonguide.com by Dr. Hammerling and her students. This data consists of all individual marathon performances from the Boston Marathon, Chicago Marathon, New York Marathon, Los Angeles Marathon, Marine Corps Marathon, Twin Cities Marathon, Philadelphia Marathon, Houston Marathon, Grandma’s Marathon, and the California International Marathon (the years included for each of these races can be seen in the Overview of Datasets table at the end of this section). Each dataset represents a race (e.g. Boston Marathon) and each observation represents a specific runner in a particular year of that race (e.g. John Smith in the 2009 Boston Marathon). At a minimum, each observation contains the name, gender, age, nationality, and finish time. However, the datasets typically include other information such as hometown, place within a certain group (e.g. 5th male between ages 18-34 to finish the race), qualifying time, and time splits. While many races such as the Chicago and Boston marathons have qualifying standards, one important note is that the observations are not exclusive to qualified runners. Runners can also participate if they make a charity donation or are granted admission through a lottery (the Boston Marathon does not have a lottery). These runners are not encoded distinctly in the data. Wheelchair participants, however, were encoded distinctly. We have removed them from the data because wheelchair times are generally much faster than running times and are therefore not comparable.

The marathon datasets were in a very raw format, which required preprocessing in R. For instance, we converted all finishing times from ‘hours:minutes:seconds’ to minutes (as a

Additionally, we only considered races that were finished in six hours or less. This is a common marathon standard which we utilized to exclude outliers and non-running participants.
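The preprocessing in this thesis was done in R; purely as an illustration, the conversion and the six-hour filter described above could look like the following Python sketch. The function names and the sample finish times are hypothetical.

```python
MAX_MINUTES = 6 * 60  # six-hour limit described in the text


def finish_minutes(hms):
    """Convert an 'hours:minutes:seconds' string to minutes as a float."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60


def keep_result(hms, max_minutes=MAX_MINUTES):
    """True when the finish time is six hours or less."""
    return finish_minutes(hms) <= max_minutes


# Made-up finish times, used only to exercise the two helpers.
results = ["2:58:21", "6:00:00", "6:00:01"]
kept = [r for r in results if keep_result(r)]
print(kept)
```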

One of the largest challenges in handling the data was re-identifying runners. We wanted to create a data-frame for each marathon in which an observation would define a particular race outcome. The observations would include the runner’s ID (which is unique for each runner), age, gender, finish time, and year of the race. This unique runner ID would allow for re-identification of a runner in multiple different years of the same race. For example, the ID would recognize that a particular runner participated in the Boston Marathon in 2004, 2005, and 2006.

Unfortunately, the data lacked a distinct ID which referenced each runner and was consistent from year to year. Identifying runners solely based on the runner name is not sufficient because many runners can have the same name. Common names such as “John Smith” could be

year, the age of a runner may not increase by one each year (if the runner had a birthday near the date of the race). We could not account for this phenomenon. If a runner’s age increased by 2 or did not increase at all between two years, the runner would not be re-identified. Also, we did not account for runners who were encoded with different names in different years. For instance, if a runner was listed as “Bob Smith” from 2010-2012 and “Robert Smith” from 2013-2014, these records were treated as different runners.
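One way to picture a matching rule with the limitations just described is the Python sketch below. It is an assumption, not the procedure actually used: it treats two records as the same runner only when name and gender agree and the age advances by exactly one per calendar year (equivalently, when `year - age` is constant). The function name and record fields are hypothetical.

```python
def assign_runner_ids(records):
    """Attach a 'runner_id' to each record.

    Assumed matching key: (name, gender, year - age). Records whose age
    does not advance in step with the calendar year get a different key,
    so they are not re-identified (a deliberately conservative rule).
    """
    ids = {}
    labelled = []
    for rec in records:
        key = (rec["name"], rec["gender"], rec["year"] - rec["age"])
        runner_id = ids.setdefault(key, len(ids))
        labelled.append({**rec, "runner_id": runner_id})
    return labelled


# Made-up records that mirror the cases discussed in the text.
sample = [
    {"name": "Bob Smith", "gender": "M", "age": 40, "year": 2010},
    {"name": "Bob Smith", "gender": "M", "age": 41, "year": 2011},
    {"name": "Bob Smith", "gender": "M", "age": 43, "year": 2012},  # age jumped by 2
    {"name": "Robert Smith", "gender": "M", "age": 43, "year": 2013},  # name changed
]
labelled_sample = assign_runner_ids(sample)
print([rec["runner_id"] for rec in labelled_sample])
```

Under this assumed rule the first two records share an ID, while the age jump and the name change each produce a new ID, which is the conservative behaviour the text describes.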

Given the potential drawbacks of our methodology for re-identifying runners, we cannot be sure that all runners were re-identified correctly. The identification procedure we used was intended to be overly conservative. While this may exclude some runners from our analysis, it reduces the likelihood of assigning the same ID to different runners. We proceeded with the analysis on this basis, as we could not think of a plausible alternative that would not significantly reduce the dataset. After re-identifying runners, we filtered the data to only include runners who ran in at least K years of a given race, for some integer K greater than or equal to 2. For most analyses, we let K=2 in an effort to include the most observations possible. After preprocessing the data and re-identifying runners, the number of observations and unique runners for each race is shown below (a runner is included if he/she participated in at least 2 years of a given race):

| Race | Years | # of Unique Runners | # of Observations |
|---|---|---|---|
| Boston Marathon | 2001-2017 except 2015 | 51,119 | 146,570 |
| Chicago Marathon | 2000-2017 except 2015 | 72,588 | 194,370 |
| New York Marathon | 2000-2019 except 2012 | 84,515 | 236,526 |
| Marine Corps Marathon | 2000-2018 | 44,630 | 128,535 |
| Twin Cities Marathon | 2000-2019 except 2003-2005 | 21,262 | 65,617 |
| Philadelphia Marathon | 2000-2019 | 19,784 | 53,741 |
| Houston Marathon | 2000-2018 except 2011 | 17,288 | 60,889 |
| Grandma’s Marathon | 2000-2019 except 2006 | 15,382 | 46,530 |
| California International Marathon | 2000-2019 | 14,667 | 44,933 |

*There is no explicit reason why certain years were omitted from the data.

Table 2.1 - Overview of Datasets

As seen by the table above, the average runner participated in about 2-3 races for each marathon. However, some participants ran in far more than that. The histogram below shows the distribution of the number of races for the participants in the Boston Marathon data.

The distribution of the number of races per participant is relatively consistent across all marathons. The Boston Marathon showed slightly higher numbers than other races, but not by a large margin. It should also be noted that each marathon dataset tended to have more observations in recent years. This is likely due to increased participation and better-kept data. Also, the constraint that we placed on finish time (less than or equal to 6 hours) removed about 5-8% of the data for most races. However, it had a large impact on the Los Angeles data, omitting just over 25% of the observations. Perhaps this race attracts many casual participants.

**3. Methods**

To begin analyzing the large datasets, we performed Dr. Richard Smith’s analysis to model marathon finish times. The approach uses a mixed-effects model in which the year of the race and the overall ability of a runner serve as random effects. The age of the runner at the time of the race is treated as a fixed effect in the model. The model used is given below:

$$\log(t_{ij}) = \alpha_i + \beta_{y_{ij}} + S(a_{ij}; df) + \epsilon_{ij} \qquad \text{(Model 1)}$$

● $t_{ij}$ is the $j$th finish time for runner $i$ (in minutes)

● $y_{ij}$ is the year of the $j$th finish time for runner $i$

● $a_{ij}$ is the age of runner $i$ when running his/her $j$th race

● $\alpha_i$ is the overall ability of runner $i$ (a smaller number indicates a faster runner)

● $\beta_{y_{ij}}$ is the year effect

It should be noted that men and women have been modelled separately throughout this analysis. This was done because men and women have very different finish times and perhaps aging patterns.

The intuition behind this model is that there are three factors that contribute to one’s finish time in a given race: the ability of the runner, the age of the runner, and the year of the race. In order to determine the effect of age on finish times, the model separates out the two other factors which also contribute to finish times. The individual ability of the runner clearly contributes to finish times, but the reasoning behind including a year effect may not be as intuitive. The conditions of the race in a given year can significantly affect finish times depending on the temperature, precipitation, or other environmental factors. For instance, the 2004 and 2012 Boston Marathons were especially hot, which led to slower finish times. By contrast, the 2018 Boston Marathon was very cold and had a strong headwind.

Conditions which led to slower finish times would correspond to a greater year effect ($\beta_{y_{ij}}$). This year effect can be valuable for comparing one year to another or for judging the quality of a finish time given the conditions of the year. This model can produce age-performance curves for the typical runner by zeroing the random effect coefficients ($\beta_{y_{ij}}$, $\alpha_i$). Zeroing the random effects for the year and individual runner ability isolates the age variable so that its effect on marathon performance can be measured.
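To make the zeroing step concrete, here is a small Python sketch of how a prediction on the log scale is turned into minutes with and without the random effects. The age function `age_effect` is an invented quadratic that also absorbs the intercept; its coefficients are placeholders and are not estimates from any of the marathon datasets.

```python
import math


def age_effect(age):
    """Hypothetical stand-in for S(a): invented coefficients, for illustration only."""
    return 5.2 + 0.0004 * (age - 27) ** 2


def predicted_minutes(age, runner_effect=0.0, year_effect=0.0):
    """Finish time in minutes implied by log(t) = alpha + beta + S(a).

    Leaving runner_effect and year_effect at zero gives the curve for the
    'typical' runner in a 'typical' year.
    """
    return math.exp(runner_effect + year_effect + age_effect(age))


typical = predicted_minutes(40)                     # random effects zeroed
hot_year = predicted_minutes(40, year_effect=0.05)  # slower conditions
print(round(typical, 1), round(hot_year, 1))
```

Because the model is fitted on the log scale, a positive year effect multiplies the typical finish time by `exp(year_effect)` rather than adding a fixed number of minutes.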

One important note about this model is the use of a nonlinear function to incorporate the age of runners. $S(a; K)$ is given by:

$$S(a; K) = \sum_{k=1}^{K} \gamma_k s_k(a)$$

● $s_1, \ldots, s_K$ are fixed basis functions

The fixed basis functions ($s_k$) could come from orthogonal polynomials or from a spline representation (such as B-splines or natural splines). The fixed basis functions we used in this analysis were natural splines (the ns function in R) and orthogonal polynomials (the poly function in R). Both are common ways to model the effect of age on finish time non-linearly while keeping the model linear in its coefficients. Natural splines use piecewise cubic polynomials, joined smoothly at a set of knots and constrained to be linear beyond the boundary knots, to produce a smooth curve. Orthogonal polynomials represent the age effect as a polynomial of a chosen degree whose basis terms are mutually orthogonal over the observed ages.
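As a rough illustration of what an orthogonal polynomial basis is, the Python sketch below builds one by Gram-Schmidt orthogonalization of the powers of centered age. It is not the implementation behind R's poly function (although the resulting columns play the same role), and the function name is hypothetical. It assumes there are more distinct ages than the requested degree.

```python
def orthogonal_poly_basis(ages, degree):
    """Return `degree` basis columns evaluated at `ages`.

    Each column has unit length, is orthogonal to the other columns, and
    is orthogonal to a constant column (so it sums to roughly zero).
    """
    n = len(ages)
    mean_age = sum(ages) / n
    centered = [a - mean_age for a in ages]
    basis = [[1.0 / n ** 0.5] * n]  # normalized constant column
    for power in range(1, degree + 1):
        column = [c ** power for c in centered]
        for previous in basis:  # remove the part explained by earlier columns
            dot = sum(x * y for x, y in zip(column, previous))
            column = [x - dot * y for x, y in zip(column, previous)]
        norm = sum(x * x for x in column) ** 0.5
        basis.append([x / norm for x in column])
    return basis[1:]  # drop the constant column


columns = orthogonal_poly_basis(list(range(18, 66)), degree=2)
inner = sum(x * y for x, y in zip(columns[0], columns[1]))
print(abs(inner) < 1e-9)  # the linear and quadratic columns are orthogonal
```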

In order to determine the optimal fixed basis function ($s_k$) and its corresponding degrees of freedom, we conducted k-fold cross-validation to analyze the model’s predictive accuracy on out-of-sample data. We tried both 5-fold and 10-fold cross-validation, but found that using 10 folds did not significantly change the results. The process of 5-fold cross-validation begins with splitting the data into 5 equally sized folds. Then, we trained Model 1 using 4 of the 5 folds of data and tested its predictive accuracy on the remaining fold by mean squared error:

$$MSE = \frac{1}{N} \sum_{(i,j) \in \text{test set}} \left(\hat{t}_{ij} - t_{ij}\right)^2$$

● $N$ is the number of observations in the test set

● $\hat{t}_{ij}$ is the predicted $j$th finish time for runner $i$ (in minutes)

● $t_{ij}$ is the $j$th finish time for runner $i$ (in minutes)
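The cross-validation loop can be sketched in Python as follows. To keep the example self-contained it swaps in a much simpler model than Model 1: an ordinary least-squares line for log finish time against age, with no random effects. The fold assignment by shuffled observation index and the averaging of per-fold errors are also assumptions; all function names are hypothetical.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[fold::k] for fold in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cv_mse(ages, minutes, k=5, seed=0):
    """Average held-out MSE, measured on the original (minutes) scale."""
    fold_errors = []
    for test in k_fold_indices(len(ages), k, seed):
        held_out = set(test)
        train = [i for i in range(len(ages)) if i not in held_out]
        intercept, slope = fit_line(
            [ages[i] for i in train],
            [math.log(minutes[i]) for i in train],  # fit on the log scale
        )
        squared = [
            (math.exp(intercept + slope * ages[i]) - minutes[i]) ** 2
            for i in test
        ]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k
```

Predictions are exponentiated before the error is computed, so the MSE is in squared minutes regardless of whether the model is fitted to the raw or the log-transformed finish times.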

Figure 3.2 – Cross-Validation plot of Natural Splines for Boston Female data

Figure 3.4 - Cross-Validation plot of Orthogonal Polynomials for Boston Female data

Figure 3.6 - Cross-Validation plot of Natural Splines for Chicago Female data

Figure 3.8 - Cross-Validation plot of Orthogonal Polynomials for Chicago Female data

Figure 3.9 - Cross-Validation plot of Orthogonal Polynomials for Chicago Female data

degrees of freedom for both males and females. Cross-validation error changes very little as the degrees of freedom increase. The results from the Chicago Marathon are quite similar, except that there appears to be a bit more non-linearity in the women’s curve. This pattern in the Chicago data was quite similar across other races as well. In an effort to minimize the complexity of the model and to be consistent when modeling males and females, we chose 2 degrees of freedom for the natural spline in our analysis. We tested higher values and noticed only minor differences in the age curves, which we show in the results section. We were also curious whether using a specific subset of runners (such as elite runners, based on the random effect for running ability) had an effect on the non-linearity of the curves. However, the cross-validation results were nearly the same. Therefore, throughout our analysis we used a natural spline with 2 degrees of freedom to represent the age of a runner (unless otherwise specified).

Figure 3.10 – Histogram of Boston Marathon Finish Times

Additionally, we compared the cross-validation error of the model with and without the log transformation. We defined the cross-validation error (as measured by MSE) on the original scale so all units were in (minutes)². Cross-validation errors were lower when utilizing the log transformation (I tested both natural splines and orthogonal polynomials to represent the non-linearity of age, but this did not change the results). In the Boston Marathon data, cross-validation error (measured by MSE) dropped by 1.63% for males and 1.04% for females when utilizing the log transformation. Similarly, in the Chicago data, the model with the log transformation resulted in cross-validation errors that were reduced by 0.25% for males and 0.16% for females. It makes sense that the log transformation has a more significant effect on the Boston dataset because the distribution of finish times is more skewed for this race. This race attracts the most talented runners out of the marathons we have analyzed, but it also includes charity runners who finish at much slower times. Although the improvement from the log transformation is small, this pattern was consistent across all datasets. Therefore, we decided to utilize the log

As mentioned previously, the model utilizes random effects, $\alpha_i$ and $\beta_{y_{ij}}$, to represent the individual runner effect and the year effect respectively. We have used the lmer function from the lme4 package in R (developed by Dr. Doug Bates) to fit this linear mixed-effects model. In this model, $\alpha_i$ is assumed to follow a normal distribution with an unknown mean and standard deviation: $\alpha_i \sim N(\mu_{\alpha}, \sigma_{\alpha})$. Similarly, $\beta_{y_{ij}}$ is assumed to follow a normal distribution with an unknown mean and standard deviation: $\beta_{y_{ij}} \sim N(\mu_{\beta_{y_{ij}}}, \sigma_{\beta_{y_{ij}}})$. At a high level, including $\alpha_i$ as a random effect accounts for the difference in the ability of runners. Certain runners will tend to have faster finish times than others for a variety of reasons such as genetics, training regimen, and experience. Additionally, treating $\alpha_i$ as a random effect helps to reduce the number of coefficients in the model. If we treated each runner’s ability as a fixed effect, the number of parameters in the model would grow with the size of the dataset. This is an instance of the Neyman-Scott problem, which can lead to inconsistent estimators *[2]*. Estimating too many $\alpha_i$’s could lead to inconsistent and unreliable coefficients. $\beta_{y_{ij}}$ was included as a random effect to properly adjust for year-to-year variability in race conditions without adapting the model to a specific year. These random effects are assumed to be independent according to the specification of the model.

time for runners between 18 and 34 was 3:00:00 for males and 3:30:00 for females. Then, we assessed the expected times for runners in each age-group, using the age-groups defined by the Boston Marathon qualifying standards. This discretizes the results of Model 1. In order to accomplish this, we first calculated $t_a$, the time (in minutes) predicted by Model 1 for a runner of age $a$ when zeroing the random effects for year and individual runner ability:

$$t_a = \exp\left(S(a; df=2)\right)$$

Next, we applied these calculations to the age-groups used by the Boston Marathon qualifying times. We used the following formula to calculate the suggested qualifying time for a given age-group ($T_g$):

$$T_g = \frac{B}{\frac{1}{17}\sum_{i=18}^{34} t_i} \cdot \frac{1}{n}\sum_{j=a_1}^{a_n} t_j$$

● $B$ is the baseline time (in minutes) for the 18-34 age-group ($B=180$ for males, $210$ for females)

● $a_1, \ldots, a_n$ are the $n$ ages that compose an age-group $g$

● $t_a$ is the predicted finish time (in minutes) for a runner of age $a$ (using Model 1)

● $T_g$ is the recommended qualifying time for age-group $g$ (in minutes)

appropriate age-group (as shown by the above formulas). We carried out this process on every age-group as defined by the Boston Marathon qualifying standards.
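The scaling formula for $T_g$ can be sketched in Python as below. The age curve `predicted_time` is a made-up placeholder for $t_a$ (its coefficients are not fitted values), and the function names are hypothetical; only the structure of the calculation follows the formula above.

```python
import math


def predicted_time(age):
    """Hypothetical stand-in for t_a = exp(S(a; df=2)); coefficients are invented."""
    return math.exp(5.2 + 0.0004 * (age - 27) ** 2)


def qualifying_time(group_ages, baseline_minutes, curve=predicted_time):
    """Suggested qualifying time T_g (in minutes) for one age-group.

    The mean predicted time of the group is divided by the mean predicted
    time over ages 18-34 and scaled by the baseline B.
    """
    ages = list(group_ages)
    baseline_mean = sum(curve(a) for a in range(18, 35)) / 17
    group_mean = sum(curve(a) for a in ages) / len(ages)
    return baseline_minutes * group_mean / baseline_mean


# Male baseline B = 180 minutes, applied to the 40-44 age-group.
print(round(qualifying_time(range(40, 45), 180), 1))
```

By construction the 18-34 group reproduces the baseline exactly, so every other group's suggestion is a ratio relative to that anchor.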

After performing this analysis on all of the marathon datasets with which we were provided, we further extended Dr. Smith’s analysis in a number of ways. We were particularly interested in testing whether runners age differently based on their skill level. While one might simply subset runners by their finish times, this would ignore the age component: a 3:30 marathon by a 60-year-old is much more impressive than the same time run by a 30-year-old. We wanted an objective and thorough way to distinguish runners by skill in order to model their age-performance curves separately. We decided that using the $\alpha_i$ coefficients would be reasonable because these coefficients represent the overall ability of each runner (while adjusting for the year of the race and age of the runner). For men and women respectively, we split the $\alpha_i$ estimates into quartiles and plotted age-performance curves separately. This analysis can serve as a basis for comparing the aging of runners with varying abilities. Additionally, based on the age-performance curves, we noticed that the quality of runners in the Boston Marathon was superior to other races despite its inclusion of charity runners. We attempted to adjust for this by comparing the Boston Marathon age-performance curve to the age-performance curve generated when filtering for the top 50% of runners in the other races (I determined the top 50% of runners through the $\alpha_i$ coefficients).
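A quartile split on the runner effects might look like the following Python sketch. The runner IDs and $\alpha_i$ values are invented, the function name is hypothetical, and the quantile convention of Python's `statistics.quantiles` may differ slightly from the default used in R.

```python
import statistics
from bisect import bisect_left


def ability_quartiles(alpha_by_runner):
    """Map each runner to a quartile 1-4 of the estimated alpha_i.

    Quartile 1 holds the smallest alpha_i values, i.e. the fastest runners.
    """
    cuts = statistics.quantiles(alpha_by_runner.values(), n=4)
    return {
        runner: bisect_left(cuts, alpha) + 1
        for runner, alpha in alpha_by_runner.items()
    }


# Invented runner effects, for illustration only.
alphas = {f"runner_{i}": i / 10 for i in range(8)}
quartiles = ability_quartiles(alphas)
top_half = [runner for runner, q in quartiles.items() if q <= 2]
print(top_half)
```

Keeping quartiles 1 and 2 corresponds to the "top 50% of runners" filter described above.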

runner will not run a race in any subsequent year. This is certainly not a perfect measure, given that participants in recent years will have higher dropout rates. However, this definition can still give an idea of how dropout varies by age for each dataset. We modeled dropout probability using a logistic regression, with the non-linear function of age as the predictor. This equation can be seen below:

$$\log\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + S(a_{ij}; df=2)$$

● $p_{ij}$ is the probability of dropout for runner $i$ after his/her $j$th race

● $\beta_0$ is the intercept

● $a_{ij}$ is the age of runner $i$ when running his/her $j$th race

● $S(a_{ij}; df)$ is the nonlinear function (natural spline) on age with 2 d.f.
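The two ingredients of this model, the dropout indicator and the inverse of the logit link, can be sketched in Python as follows. The sample results are invented and the function names are hypothetical; fitting the regression itself is not shown.

```python
import math


def dropout_flags(results):
    """Label each (runner_id, year) result with 1 if the runner never
    appears in a later year of the same race, and 0 otherwise."""
    last_year = {}
    for runner, year in results:
        last_year[runner] = max(year, last_year.get(runner, year))
    return {
        (runner, year): int(year == last_year[runner])
        for runner, year in results
    }


def dropout_probability(linear_predictor):
    """Inverse logit: turns beta_0 + S(a) into a probability."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Invented results: runner 1 ran in 2004 and 2005, runner 2 only in 2005.
flags = dropout_flags([(1, 2004), (1, 2005), (2, 2005)])
print(flags)
```

Every runner's final recorded race is flagged as a dropout, which is why results from the most recent years are flagged more often, as noted above.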

This model gave us an understanding of how dropout probability is impacted by age. Next, we utilized these probabilities in a variable-quantile analysis as outlined below:

1. We found the top p percentile of runners aged 18 (the minimum age in the datasets) by using the $\alpha_i$ coefficients from Model 1.

2. For each age i > 18, we found the top x percentile of runners aged i, where x = p * (1 + P(dropout for runner aged i) - P(dropout for runner aged 18)).

*The probability of dropout was determined by utilizing the logistic regression above.
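The two steps above translate into a short Python sketch. The dropout probabilities and runner effects below are invented inputs, the function names are hypothetical, and rounding the number of selected runners to the nearest integer is an assumption.

```python
def adjusted_percentile(p, dropout_at_age, dropout_at_18):
    """x = p * (1 + P(dropout at age i) - P(dropout at age 18))."""
    return p * (1 + dropout_at_age - dropout_at_18)


def select_top(alphas, fraction):
    """Keep the given fraction of runners with the smallest (fastest) alpha_i."""
    keep = round(fraction * len(alphas))
    return sorted(alphas)[:keep]


# Invented example: p = 40%, dropout probability 0.25 at age 18 and 0.50 at age 60.
x = adjusted_percentile(0.4, dropout_at_age=0.50, dropout_at_18=0.25)
selected = select_top([i / 100 for i in range(20)], x)
print(x, len(selected))
```

When dropout is more likely at a given age than at age 18, x exceeds p, so a larger share of the remaining runners at that age is retained.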

runners at age 70. For a given value of p, we aggregated the subset of runners for each age as determined by our methodology. We utilized this data to train Model 1 and determine qualifying times using the methodology explained above. We have only utilized this procedure on the Chicago and New York datasets so far, but would like to carry this out on more datasets in the near future.

After performing analyses with Model 1, we were curious to alter the specifications of the model. Perhaps each individual runner ages differently for a variety of reasons (health, body structure, genetics, etc.). We wanted to test a model in which the age curves are also modeled as random effects. The simplest way in which to do this is by omitting the random effect for the year of the race. This model is given below:

$$\log(t_{ij}) = \alpha_i + S(a_{ij}; df) \ast \alpha_i + \epsilon_{ij} \qquad \text{(Model 2)}$$

This linear mixed-effects model was also fitted using the lmer function from the lme4 package in R. As with the previous analyses, we modeled men and women separately. After training this model on a few of the larger marathon datasets (Boston, Chicago, and New York), we attempted to create this same model while including the random effect for the year of the race. This model is shown below:

$$\log(t_{ij}) = \alpha_i + \beta_{y_{ij}} + S(a_{ij}; df) \ast \alpha_i + \epsilon_{ij} \qquad \text{(Model 3)}$$

Lastly, we wanted to tailor the analysis to more specifically address the Boston Marathon qualifying times. This inspired an analysis that treated the age variable as a categorical variable rather than a numerical variable. The categories for this variable are given by the age-groups defined by the Boston Marathon (as seen in Table 1.1). This gives 11 distinct age-groups. We made a slight change to the original model:

$$\log(t_{ij}) = \alpha_i + a_{ij} + \epsilon_{ij} \qquad \text{(Model 4)}$$

Here, *a*_{ij} represents the age-group of runner i during his/her jth race. This approach can be seen as a bit more naive because it treats runners of different ages as the same. For example, runners aged 18 and 33 are taken to have the same ‘age’ (since they fall in the same age-group). Essentially, it simplifies a variable in the model (age) and results in a less specific predictor. This fosters less accurate results and creates discrete age-performance plots (instead of smooth curves) when setting *α*_{i} equal to zero. Nonetheless, we wanted to incorporate this approach to directly model the Boston Marathon age-groups. Comparing the results across different approaches could suggest the creation of better age-groups. Likewise, the results could indicate the merit in having different qualifying times for each age (essentially one-year age-groups).
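To make the categorical coding used by Model 4 concrete, the sketch below maps a numeric age to an age-group label. The group boundaries are an assumption here (18-34, then five-year bands through 75-79, then 80+), chosen because they give the 11 groups mentioned above; the names `LOWER_BOUNDS`, `LABELS`, and `age_group` are hypothetical.

```python
from bisect import bisect_right

# Assumed lower bounds of the 11 age-groups (18-34, 35-39, ..., 75-79, 80+).
LOWER_BOUNDS = [18, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80]
LABELS = ["18-34", "35-39", "40-44", "45-49", "50-54", "55-59",
          "60-64", "65-69", "70-74", "75-79", "80+"]

def age_group(age):
    """Return the age-group label for a runner's age on race day."""
    if age < LOWER_BOUNDS[0]:
        raise ValueError("age is below the youngest age-group")
    # bisect_right counts how many lower bounds are <= age.
    return LABELS[bisect_right(LOWER_BOUNDS, age) - 1]

# Runners aged 18 and 33 receive the same categorical 'age'.
same_group = age_group(18) == age_group(33)
```

This shows the loss of granularity directly: every age within a band collapses to a single level of the factor.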

**4. Results**

less than 2% of the observations in each of the datasets (these runners represented less than 1% of the data in all races except for the Boston Marathon). Due to a lack of observations, the models were untrustworthy for runners in this age-group. One may think that the constraint to only include marathons run in under 6 hours contributed to this problem as well. However, the age-performance curves still showed the same behavior when omitting this constraint. Therefore, we have only shown ages 18-65 in the age-performance curves below. When implementing methodology to discretize the results of Model 1, we decided to include older age-groups to be consistent with the Boston qualifying standards. However, we did not produce suggestions for the 80+ age-group due to extreme lack of data (sometimes no data at all).

**A. Boston Marathon (2001-2017 except 2015)**

The first results we obtained on the Boston dataset utilized the original model (Model 1). We modeled males (30,887 runners and 92,088 total races) and females (20,232 runners and 54,482 total races) separately. The summary output from the linear mixed-effects models is shown below:
**Females:**

| **Random Effects** | **# Groups** | **Variance** | **Standard Deviation** |
|---|---|---|---|
| *α _{i}* | 20,232 | 0.012 | 0.110 |
| *β _{y}_{ij}* | 16 | 0.004 | 0.050 |

| **Fixed Effects** | **Estimate** | **Standard Error** | **t-value** |
|---|---|---|---|
| Intercept | 5.462 | 0.013 | 433.02 |
| *γ*_{1} | 0.314 | 0.006 | 49.01 |
| *γ*_{2} | 0.506 | 0.010 | 52.80 |

Table 4.3 - Model 1 Summary Output on Female Participants in the Boston Marathon

**Males:**

| **Random Effects** | **# Groups** | **Variance** | **Standard Deviation** |
|---|---|---|---|
| *α _{i}* | 30,887 | 0.018 | 0.135 |

| **Fixed Effects** | **Estimate** | **Standard Error** | **t-value** |
|---|---|---|---|
| Intercept | 5.315 | 0.012 | 431.52 |
| *γ*_{1} | 0.412 | 0.006 | 67.85 |
| *γ*_{2} | 0.577 | 0.006 | 92.65 |

Table 4.4 - Model 1 Summary Output on Male Participants in the Boston Marathon

In the output above, *γ*_{1} and *γ*_{2} refer to the fixed-effect coefficients for the natural spline on age (2 degrees of freedom). It is logical that these fixed-effect coefficients are positive, since marathon finish times generally increase with age. One important note is that the intercepts for the males and females differ quite a bit: the intercept for the male model is about 0.15 lower than that of the female model (5.315 versus 5.462). Although the men have higher coefficients on the fixed effect for age, the lower intercept in the model for males outweighs this discrepancy, so the model predicts faster finishing times for males than females when holding other variables constant. Also, *γ*_{1} and *γ*_{2} are both highly significant predictors of finish time, as seen by the high t-values associated with these effects. Intuitively, this would suggest that adding further variables (increasing the degrees of freedom on the natural spline for age) would improve the fit. Although increasing the degrees of freedom results in a larger number of significant variables, the actual fit does not improve significantly, as shown by the cross-validation results. Increasing the degrees of freedom to 3 results in 3 significant variables (*γ*_{1}, *γ*_{2}, and *γ*_{3}), which have lower t-values than *γ*_{1} and *γ*_{2} in the current model. Adding more degrees of freedom seems to spread the significance of *γ*_{1} and *γ*_{2} across more variables without adding unique insights that improve the fit.
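Because the response is the logarithm of finish time, a difference in intercepts translates into a multiplicative difference in baseline finish time. The short calculation below illustrates this using the two intercepts from Tables 4.3 and 4.4. It is a rough sketch: it assumes the spline terms and random effects are held at zero, and since the male and female models were fitted separately, their spline bases need not be identical, so the ratio is only indicative.

```python
import math

# Intercepts on the log scale, taken from Tables 4.3 and 4.4.
female_intercept = 5.462
male_intercept = 5.315

# Difference on the log scale.
log_gap = female_intercept - male_intercept

# Ratio of male to female baseline finish time (spline and random effects at zero).
male_to_female_ratio = math.exp(-log_gap)

print(f"log-scale gap: {log_gap:.3f}")
print(f"male/female baseline ratio: {male_to_female_ratio:.3f}")
```

Under these assumptions the male baseline is roughly 14% lower than the female baseline, regardless of the unit in which finish time is measured.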

‘year’ is relatively the same for men and women, signifying that the variance of the effect of the race conditions is relatively the same for men and women. The race conditions seem to affect men and women equally, as seen by the coefficients for the ‘year’ random effect below. We have shown the temperature each year when the first male winner finished, along with lattice plots for the random effect *β*_{y_{ij}} for females and males (the lines through each dot indicate the 95% confidence interval for each random effect):

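| Year | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 |
|---|---|---|---|---|---|---|---|---|
| Temp (°F) | 54 | 56 | 59 | 86 | 66 | 53 | 50 | 53 |

| Year | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2016 | 2017 |
|---|---|---|---|---|---|---|---|---|
| Temp (°F) | 47 | 55 | 55 | 87 | 54 | 62 | 61 | 73 |

*Temperatures taken from https://www.baa.org/races/boston-marathon/history

Table 4.5 - Historical Temperatures of Boston Marathon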

Figure 4.12 - Lattice Plot for Year Random Effect on Boston Male Data

The coefficient on the year effect is relatively consistent across males and females. Of the years shown in Table 4.5, 2003, 2004, 2005, and 2012 were the only years with positive coefficients. This signifies that finish times from these years are typically greater than in the average year. Therefore, we would expect these years to have had worse running conditions than the other years of the Boston Marathon included in this analysis. While temperature is only one factor that affects race conditions, high temperatures typically indicate difficult race conditions. This explains the positive coefficients for the 2004, 2005, and 2012 Boston Marathons and the near-zero coefficient in 2017. Factors other than temperature which may impact the *β*_{y_{ij}} coefficients include wind and precipitation.
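The comparison between temperature and the sign of the year effect can be checked mechanically. The sketch below uses the temperatures from Table 4.5 and the positive-coefficient years named in the text; the 65°F cutoff for a "hot" race is an assumption made purely for illustration, and the variable names are hypothetical.

```python
# Race-day temperatures (°F) copied from Table 4.5.
boston_temp_f = {
    2001: 54, 2002: 56, 2003: 59, 2004: 86, 2005: 66, 2006: 53,
    2007: 50, 2008: 53, 2009: 47, 2010: 55, 2011: 55, 2012: 87,
    2013: 54, 2014: 62, 2016: 61, 2017: 73,
}

# Years reported in the text as having positive year effects.
positive_effect_years = {2003, 2004, 2005, 2012}

HOT_THRESHOLD_F = 65  # assumed cutoff, not taken from the thesis

hot_years = {year for year, temp in boston_temp_f.items() if temp >= HOT_THRESHOLD_F}

# Years where temperature alone does not line up with the sign of the effect.
hot_but_not_slow = hot_years - positive_effect_years
slow_but_not_hot = positive_effect_years - hot_years

print(sorted(hot_years), sorted(hot_but_not_slow), sorted(slow_but_not_hot))
```

With this cutoff, the mismatches are 2017 (warm, but a near-zero coefficient) and 2003 (a positive coefficient at a mild temperature), which is consistent with wind and precipitation also playing a role.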

Figure 4.13 – Age-Performance curve for Model 1 on Boston data

The red and blue dotted lines represent confidence bands (2 standard errors above and below the curve) for the males and females respectively. We used the standard deviation of the fixed effect (*S*(*a*_{ij}; *df*), the natural spline on age) to compute this. One particularly noteworthy takeaway from the plot is that the gap between males and females narrows as runners age: the finish times for male runners increase with age a bit faster than those of females. Nonetheless, the finish times for both males and females increase at an increasing rate as these runners age. Finish times change much more drastically between ages 40 and 65 than between ages 18 and 40. Another important feature of this plot is the range of y-axis values. The finish times for the average runner in this dataset are significantly lower than in the other datasets. The Boston Marathon attracts fast runners, in large part due to its strict qualifying times.
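One plausible way to draw bands of this kind is to add and subtract 2 standard errors on the log scale and then exponentiate. The sketch below assumes that reading; the function name `confidence_band` and the numeric inputs are hypothetical and are not values from the fitted models.

```python
import math

def confidence_band(log_fit, std_err, k=2.0):
    """Return (lower, centre, upper) finish times from a log-scale fit and its standard error."""
    lower = math.exp(log_fit - k * std_err)
    centre = math.exp(log_fit)
    upper = math.exp(log_fit + k * std_err)
    return lower, centre, upper

# Hypothetical fitted value and standard error at a single age.
lower, centre, upper = confidence_band(log_fit=5.40, std_err=0.01)
```

A band built this way is symmetric on the log scale but slightly wider above the curve than below it once converted back to finish times.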

suggested qualifying times for the age-groups defined by the Boston Marathon qualifying standards. We were able to accomplish this task by implementing the process we described in the methods section. This methodology produced the following qualifying standards:

Table 4.6 - Qualifying Standards produced from Model 1 on Boston Marathon data

The qualifying standards shown above are quite similar to the standards used for the 2020 Boston Marathon (Table 1.1). The times for each demographic do not differ by more than a few minutes (except for the older age-groups which is to be expected due to the lack of observations of these ages in the dataset). The 2020 Boston Marathon qualifying standards seem to be a bit more lenient for most age-groups. Most qualifying standards are a few minutes slower than the ones shown above. Perhaps this suggests that the current standards are too harsh on runners aged 18-34 with respect to other age-groups. Nonetheless, the minor difference between these

which can no longer participate are obviously omitted from the data and therefore, the analysis cannot comprehensively assess the effect of age on performance. This phenomenon is even more noticeable when analyzing less prestigious marathons.

After implementing Model 1 on all of the Boston Marathon data, we were curious to see whether runners age differently depending on their underlying ability. As mentioned in the Methods section, we implemented Model 1 on quartiles of men and women. We split the men and women into ability quartiles based on the random effect for individual runners (*α*_{i}). Higher *α*_{i} coefficients signify slower runners. The results from this analysis are shown below:

Figure 4.15 - Age-Performance curve for Boston Male data split by Quartiles
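The quartile split on the runner-level random effect can be sketched as follows. The estimated *α*_{i} values here are hypothetical, the helper name `ability_quartiles` is an assumption, and the cut points use the default method of Python's `statistics.quantiles`, which may differ slightly from whatever quantile rule was applied in R.

```python
from statistics import quantiles

def ability_quartiles(alpha):
    """Map runner id -> quartile (1 = fastest, 4 = slowest) from random intercepts."""
    cuts = quantiles(list(alpha.values()), n=4)  # three cut points
    # A runner's quartile is 1 plus the number of cut points below their intercept.
    return {runner: 1 + sum(a > c for c in cuts) for runner, a in alpha.items()}

# Hypothetical random intercepts; lower values correspond to faster runners.
alpha = {"a": -0.2, "b": -0.1, "c": 0.0, "d": 0.1,
         "e": 0.2, "f": 0.3, "g": 0.4, "h": 0.5}
quartile = ability_quartiles(alpha)

# Subset of runners in quartiles 1-3 (i.e. excluding the slowest quarter).
top_three_quartiles = sorted(r for r, q in quartile.items() if q <= 3)
```

Model 1 would then be refitted separately on the races belonging to the runners in each quartile.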

of the curve for this quartile is noticeably different as it does not exhibit a smooth increase in finish time like the other three curves.

These results prompted us to subset the data to only include runners in performance quartiles 1-3. We reproduced the analysis of Model 1 and arrived at the following age-performance curves and new recommended qualifying times:

Table 4.7 - Qualifying Standards produced from Model 1 on top 75% of Boston Marathon data

The age-performance curves that exclude quartile 4 have a slightly different shape than the previous result (which included all runners). These new curves tend to increase a bit faster, which makes sense given the plots with the runners split by quartiles. However, the new qualifying standards from this analysis are very similar to the previous analysis (as well as the 2020 Boston qualifying standards). This is another positive sign that the numbers are reliable and logical even when filtering based on the runners’ ability.

Model 3 because it omits the random effect on the year of the race (*β*_{y_{ij}}). The age-performance curves that result from these models are shown below (first is from Model 2, second is from Model 3):

Figure 4.17 – Age-Performance Curve from Model 2 on Boston data

Clearly, the results from Model 2 and Model 3 are unreasonable. The aging curves in both plots are extremely flat for both men and women. The finish times do not vary much at all as age increases. The separation of the male and female finish time converges in Model 2, but it diverges in Model 3. Perhaps, we have made a mistake in specifying the model. Fitting these models using the lmer function in R gave a ‘singular fit’ warning which suggests that we may have specified the model incorrectly. Another argument could be that there are only a few observations per runner (for most runners). These observations typically span a small age range. It could be difficult to estimate an age-performance curve given this lack of information about each runner. We still think this idea of fitting individual age curves for each runner has merit because runners can age very differently for a variety of reasons. Unfortunately, we were unable to attain reasonable results when applying Model 2 and Model 3 on other races (Chicago, New York, Los Angeles, and Marine Corps). We have left these results out of the results section, but the main takeaway is that the curves resemble the curves above from the Boston Marathon dataset.

Figure 4.19 - Age-Performance Curve from Model 4 on Boston data

The age-performance curves above are rather intuitive. For the most part, the times increase with age at an increasing rate. This is consistent with our previous approaches. The data points for the eldest two age-groups can be largely ignored due to lack of observations. It should be noted that the confidence bands are much wider for the older age-groups due to the lack of observations. The finish times increase similarly for men and women as the age-group increases. As mentioned before, this approach was a bit naive because it splits a more precise variable (age represented by natural splines in Model 1) into discrete bins. Therefore, it is not as granular of a model as Model 1. This result provides support that discretizing Model 1 to determine qualifying times is a stronger method than fitting a separate fixed effect for each age-group as in Model 4.

**B. Chicago Marathon (2000-2017 except 2015)**

male runners which account for 121,229 observations and 28,678 female runners which account for 73,141 observations. This produced the age-performance curves seen below:

Figure 4.20 - Age-Performance Curve from Model 1 on Chicago data

These curves are quite similar in shape to the curves generated from the Boston data. The curves are shifted upwards which suggests that the typical Chicago Marathon runner is a bit slower than the typical Boston Marathon runner. Both the Boston and Chicago curves illustrate the pattern that finish times tend to increase at an increasing rate as a runner ages. However, the Chicago dataset seems to show that runners do not age much from ages 18-40. The curves for both males and females are relatively flat across this age range.

The summary output for Model 1 on the Chicago dataset was quite similar to that of Boston. The variance of *α*_{i} (random effect on the individual runner) was greater for males than for females. This suggests that there is more variance in the ability of male runners in this dataset. Additionally, the values of *β*_{y_{ij}} (random effect on the year of the race) reasonably

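| Year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 |
|---|---|---|---|---|---|---|---|---|---|
| Temp (°F) | 69 | 53 | 50 | 69 | 62 | 59 | 48 | 88 | 86 |

| Year | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2016 | 2017 |
|---|---|---|---|---|---|---|---|---|
| Temp (°F) | 45 | 84 | 80 | 51 | 65 | 64 | 63 | 73 |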

*Temperatures taken from https://www.nbcchicago.com/news/local/chicago-marathon-weather-history/1963352/ and https://findmymarathon.com/weather-detail.php?zname=Chicago%20Marathon&year=

Table 4.8 - Historical Temperatures of Chicago Marathon

Figure 4.22 - Lattice Plot for Year Random Effect on Chicago Female Data

The *β*_{y_{ij}} coefficients are relatively consistent between males and females, which suggests that the race conditions impact males and females similarly. Additionally, as expected, years with hotter temperatures resulted in positive coefficients for *β*_{y_{ij}}. For instance, 2007, 2008, 2010, 2011, and 2017 had positive coefficients. This indicates that these years had difficult running conditions, as we can see from the high temperatures in Table 4.8. Other factors such as wind and precipitation can also impact race conditions, which may explain some of the variability in other years. The lines in the lattice plot indicate the 95% confidence interval for the random effects. It is interesting to note that 2004 and 2005 had particularly wide confidence intervals. This could suggest that these years had more of an uncertain effect on the finish times of runners. Perhaps these race conditions impacted runners differently.

the same methodology as we did with the Boston data to develop these new qualifying standards. The result of this analysis is shown below:

Table 4.9 - Qualifying Standards produced from Model 1 on Chicago data

These suggestions still do not deviate too much from the current standards or the standards we generated from the Boston dataset. However, the standards do not increase as quickly with age. The age performance curves are a bit flatter for the Chicago dataset than the Boston dataset. Additionally, the older age-groups seem to be unreasonable which is largely due to the lack of observations with runners above 65 (make up less than 1% of the data). Most of the standards generated from this analysis are at least 10 minutes less than the 2020 Boston

As discussed in the methods section, one of the specifications of the model that we
attempted to modify was the natural spline on age *S*(*aij;df*). Since the Chicago dataset and other
marathon datasets showed that females have a bit more non-linearity in their age performance
curves (based on the k-fold cross validation method we discussed in the methods section), we
were curious to see the impact on the age-performance curves when altering the degrees of
freedom. We have shown the fit of implementing Model 1 on the Chicago dataset when using 3
and 4 degrees of freedom on the natural spline:

Figure 4.24 - Age-Performance Curve from Model 1 (df = 4) on Chicago data

The curves do not change much when altering the degrees of freedom between 2-4. The only noticeable change is towards the end of the curve where the male and female finish times start to converge. Overall, it does not appear that using 2 degrees of freedom is significantly different from these fits. This pattern was persistent when analyzing different datasets as well. As shown in the methods section, the change in cross-validation error was very minor when altering the degrees of freedom from 2-4. Using a larger value for the degrees of freedom did not always reduce cross-validation error. Therefore, we chose to utilize 2 degrees of freedom throughout the remainder of the analysis.

Figure 4.25 - Age-Performance curve for Chicago Female data split by Quartiles

Figure 4.26 - Age-Performance curve for Chicago Male data split by Quartiles

Due to the disparity between the Chicago and Boston datasets in terms of the performance of runners, we were curious to run Model 1 with a subset of the best runners in the Chicago dataset. We decided to utilize the top 50% of runners (as determined by the *α*_{i} coefficients) to see if the age-performance curves and new qualifying standards more closely resemble the Boston results. The age-performance curve and new qualifying standards are shown below:

Table 4.10 - Qualifying Standards produced from Model 1 on top 50% of Chicago data

The age-performance curves appear to have the same shape as they did when including all of the Chicago data. Obviously the curves are shifted down due to the quality of runners, but the shape of the curves remained similar. This is also consistent with the fact that the qualifying standards did not change much when filtering the data. These qualifying standards are a bit closer to the 2020 Boston qualifying standards than the original analysis on the entire Chicago dataset. While the standards are not vastly different than those generated from the Boston dataset, the Chicago results definitely suggest that runners age a bit more slowly.

2 degrees of freedom. This model returned intuitive results which suggest that dropout probability increases with age. These results are seen below:

Figure 4.28 - Predicted Dropout Probability by age for Chicago Marathon Dataset

Figure 4.29 – Age-Performance curves when using p = 25% in the variable-quantile analysis with Chicago data

Clearly, the age-performance curves are not nearly as flat as the other analyses with Model 1 on the Chicago dataset. Also, the suggested qualifying times seem a bit more reasonable and consistent with current Boston qualifying standards. This shows the merit of this variable-quantile analysis. Since we have just recently implemented this analysis, it still needs to be refined and performed on more datasets. Additionally, it needs to be justified as a reasonable way to account for the survivor bias in the marathon datasets.

Lastly, we implemented Model 4 on the Chicago Marathon dataset. This resulted in the following age-performance curves which appear quite problematic:

Figure 4.30 - Age-Performance Curve from Model 4 on Chicago data

produces more reliable results than Model 4. Model 4 does not provide any other additional insights, so we decided to omit this plot for the other marathons.

**C. New York Marathon (2000-2019 except 2012)**

The next marathon we analyzed was the New York Marathon. We began by analyzing the entire dataset using Model 1 in the same manner in which we handled the Boston and Chicago datasets. The New York data includes 55,608 unique male runners which account for 160,127 observations and 28,907 female runners which account for 76,339 observations. This is the largest dataset that we analyzed. Implementing Model 1 on this data yielded the age-performance curves seen below:

Figure 4.31 - Age-Performance Curve from Model 1 on New York data

this age range for both males and females. This contrasts with the slowly increasing pattern in the Boston Marathon curves. Perhaps this is a result of less experienced runners competing in New York and Chicago. This leaves more room for improvement, which can offset the aging factor that can occur between 18 and 40. The Boston Marathon typically attracts higher-quality, more experienced runners which may be performing closer to their peak. This results in an evident aging component in this age range.

The summary output of Model 1 mirrors the previous two analyses. The *α*_{i} coefficients have more variance for male runners than female runners, which suggests that there is greater variance in the ability of male runners in the dataset. The *β*_{y_{ij}} coefficients align reasonably with the race conditions in each year. In years with extreme race conditions, this value was positive, which indicates slower times. This pattern was persistent across all of the marathons that we analyzed.

Table 4.12 - Qualifying Standards produced from Model 1 on New York data

younger runners may be locals who are not as committed or experienced. The faster young runners may also be deterred by the complicated entry procedures for the New York Marathon. These entry procedures reward runners who participate in a large number of races organized by New York Road Runners. Fast, young runners may not be interested in or capable of gaining points in this system. Therefore, these runners could be unable to participate in the New York Marathon.

I was also curious to look at fitting Model 1 on different performance quartiles of New York Marathon runners. We split the male and female runners into quartiles based on the runner ability (*α*_{i}). Then we fit Model 1 on each of these quartiles separately. This resulted in the following age-performance curves:

Figure 4.33 - Age-Performance curve for New York Male data split by Quartiles

Table 4.13 - Qualifying Standards produced from Model 1 on top 50% of New York data

This analysis resulted in age-performance curves which were similar to those generated when using all of the New York data. Therefore, it makes sense that the qualifying standards (shown above) did not change much. Overall, the results of the analyses performed on the New York dataset closely resembled the results from the Chicago analyses.

Figure 4.34 - Age-Performance curves when using p = 25% in the variable-quantile analysis with New York data

As seen in the results above, the shape of the curve and qualifying times only change slightly from the original analysis with Model 1. The curves are a bit less flat and the qualifying times are a bit more logical, but it still seems to suffer from the same problem. The percent increase in suggested finish times from the 18-34 age-group to the 50-54 age-group is 8.10% and 6.10% for males and females respectively. While this is greater than the results from Model 1 on the entire dataset, it still is significantly less than the Boston qualifying times (or the results from our analysis on the Boston and Chicago datasets). As mentioned, this variable-quantile analysis is still a work in progress. The results promote further research and analysis to address the issue of dropout probability.

**D. Los Angeles Marathon (2000-2019)**

Figure 4.35 - Age-Performance curve from Model 1 on Los Angeles data

These age-performance curves indicate that the typical runner in this dataset is significantly slower than the typical runner in the Chicago and New York datasets. The Los Angeles curves appear to be about 30 minutes higher across most ages. The age-performance curves are flat from ages 18-45, which is consistent with the Chicago and New York results. However, the shape of these curves is a bit less steep than in the other races we have analyzed (particularly towards the older age range). This suggests that aging is not quite as evident in this dataset, which is likely due to the slower finish times for younger runners. Perhaps younger runners are less committed and experienced than older runners in this marathon. This marathon is not as prestigious as Boston or Chicago, which may lead to less participation by elite young marathon runners. These runners may not be able to afford the travel and expenses associated with competing in the Los Angeles Marathon. Experienced and talented older runners are more likely to be able to afford these expenses and take the time to compete in this marathon.

Table 4.15 - Qualifying Standards produced from Model 1 on Los Angeles data

The fairly flat age-performance curves of this analysis corresponded with very steady qualifying times (they did not increase much with age). These qualifying times are much more extreme than the standards we have generated from other races. They are especially unreasonable in the older age-groups, which is in part due to the phenomenon we mentioned earlier. The slower younger runners appear to have distorted the shape of the age-performance curve.

Figure 4.36 - Age-Performance curve for Los Angeles Female data split by Quartiles

Figure 4.37 - Age-Performance curve for Los Angeles Male data split by Quartiles

to the prestige of the races and the stricter qualifying times. We filtered the Los Angeles dataset to only include runners in quartiles 1 and 2 (top 50% of runners). However, as anticipated, this still resulted in a flat age-performance curve and unreasonable qualifying standards.

**E. Marine Corps Marathon (2000-2018)**

The last large dataset that we analyzed was the Marine Corps Marathon. After preprocessing, this data included 15,952 unique female runners (which constitute 43,146 races) and 28,678 unique male runners (which constitute 85,389 races). Implementing Model 1 on this dataset resulted in the following age-performance curves:

Figure 4.38 - Age-Performance curve from Model 1 on Marine Corps data

in the older age-groups. As with the other races which we have analyzed, the finish times in these curves increase at an increasing rate with age.

When applying the same methodology to convert the above age-performance curves to qualifying standards, we obtained the following results:

Table 4.16 - Qualifying Standards produced from Model 1 on Marine Corps data

These new qualifying standards are very similar to the results obtained from the analysis of the Chicago and New York datasets. This makes sense because the shape of the

be due to the fact that running these races is costly and therefore, the younger runners may simply be locals (who may not be as committed or experienced). This can also relate to the previously mentioned idea regarding survivor bias.

I proceeded to fit Model 1 on performance quartiles (determined by the *α*_{i} coefficients) of the Marine Corps dataset. This produced the following age-performance curves:

Figure 4.39 - Age-Performance curve for Marine Corps Female data split by Quartiles

These age-performance curves resemble the analysis done on the performance quartiles with the Chicago and New York datasets. The shape of the curves is consistent for each quartile (except for quartile 3 in the females curve). Additionally, quartile 2 and 3 appear to be closest together. This suggests that there are elite runners in this race and also very casual runners. We filtered the Marine Corps data by only looking at quartiles 1 and 2 (top 50%). We implemented Model 1 on this subset, but the shape of the curves was still similar. The curves were still relatively flat from ages 18-45, which resulted in qualifying standards that differed from the 2020 Boston qualifying standards.

**F. Twin Cities Marathon (2000-2019 except 2003-2005)**

The Twin Cities Marathon is the first ‘smaller’ dataset which we will discuss. The data consists of 8,490 unique female runners (which participated in 24,082 total races) and 12,772 unique male runners (which participated in 41,535 total races). Implementing Model 1 on this dataset produced the following age-performance curves:

The finish times tend to increase at an increasing rate with age. However, the curves show very minor changes in finish times from ages 18-45 (which we have seen in all of the analyses besides Boston). The quality of runners in this race resembles that of the Chicago and Marine Corps datasets (the curves have similar finish times throughout the age range). As seen with these other races (besides Boston), producing qualifying times from these curves leads to unreasonable results. We have shown these results below:

Table 4.17 - Qualifying Standards produced from Model 1 on Twin Cities data

Model 1 on performance quartiles from this dataset, but we have not included them to avoid repetition. This analysis produced very similar results to the analysis performed on the Chicago, New York, and Marine Corps data. The same reasoning explains why we have not included the analysis of Model 1 on performance quartiles for the remaining marathon datasets (Philadelphia, Houston, Grandma’s, and California International). The results were very similar to the analysis of the Chicago, New York, and Marine Corps data. Since it did not provide new insights, we have not included it in the results sections.

**G. Philadelphia Marathon (2000-2019)**

The Philadelphia Marathon dataset consists of 7,134 female runners (18,343 total races) and 12,650 male runners (35,398 total races). We began analyzing this dataset by implementing Model 1 on male and female runners separately. The age-performance curves and new qualifying standards that resulted are shown below:

Table 4.18 - Qualifying Standards produced from Model 1 on Philadelphia data

The results from this dataset are very similar to the analyses of the Chicago, Marine Corps, and Twin Cities datasets. The ability of runners and the shape of the age-performance curves in this analysis are very similar to those races. The qualifying standards are at least 10 minutes faster than the current Boston qualifying standards for most age-groups. The times for the first four age-groups are separated by less than 4 minutes for the males and less than 6 minutes for the females. This is largely due to the lack of elite runners in the lower age-groups, which led to the flat shape of the curve from ages 18-50. These results confirm the pattern which we have seen previously. It seems to be a result of the typical ability of the participants in marathons of similar prestige.

**H. Houston Marathon (2000-2018 except 2011)**

and qualifying standards that resulted from implementing Model 1 on the Houston data are shown below:

Figure 4.43 - Age-Performance curve from Model 1 on Houston data

The age-performance curves are very similar to previous analyses. The male and female curves are fairly flat through age 50 which suggests that runners do not age much until after 50. This can be seen with the qualifying standards as well. The standards for the first four age-groups differ by less than 2 minutes for males and less than 6 minutes for females. This dataset likely suffers from the problem in which most of the talented runners are in the older age-groups. This can explain the lack of separation between runners in the first four age-groups. Overall, the quality of runners in the Houston dataset resembles the runners in the New York and Twin Cities datasets. The times are fairly similar throughout the age curve.

**I. Grandma’s Marathon (2000-2019 except 2006)**

After preprocessing, the Grandma’s Marathon dataset includes 6,290 female participants (20,461 total races) and 10,998 male participants (40,428 total races). Implementing Model 1 on this dataset produced results similar to our previous analyses. The age-performance curves and new qualifying standards are shown below:

Table 4.20 - Qualifying Standards produced from Model 1 on Grandma’s data

This analysis suggests that runners age more slowly than the 2020 Boston qualifying standards propose. However, this analysis likely has bias based on the participants. As seen with previous analyses, the younger participants do not seem as talented as the older participants relative to their age groups. This explains the flatness of the age curves through age 50. The qualifying times produced from this analysis show that the first four age groups are separated by under 2 minutes for males and under 6 minutes for females. This seems quite unrealistic and distorts the standards for the older age groups as well.

**J. California International Marathon (2000-2019)**

Figure 4.45 - Age-Performance curve from Model 1 on California International data

Although this dataset is fairly small, the results seem reasonable. The age-performance curves show the typical pattern in which times increase at an increasing rate with age. The qualifying standards are faster than the 2020 Boston qualifying standards, but the times do not seem completely unreasonable, except for the older age-groups, whose estimates are likely unreliable due to a lack of observations. The results from this analysis still suggest that aging occurs more slowly than the 2020 Boston qualifying standards propose. This may be due to the flatness of the age-performance curves, which results from a disproportionately talented participant group.

**K. Results Summary**

Most of the results illustrate the same pattern, in which the age-performance curves are relatively flat. To illustrate this pattern, the summary table below reports the percent increase in the curves from the 18-34 age-group through the 50-54 age-group:

The Boston Marathon results seem the most reasonable in terms of aging over this span of ages. They align most closely with the current Boston Marathon qualifying standards, which suggest an increase of 13.89% and 11.90% for male and female finish times, respectively, over this span of age-groups. The Boston Marathon data likely produced the most logical results because Boston is the most prestigious U.S. race and attracts top runners from all age-groups. Therefore, the talent is more evenly distributed across age-groups than in other races, which may have more casual runners in certain age-groups (especially the younger age-groups).
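The percent increases quoted for the qualifying standards can be checked with a short calculation. The sketch below assumes the published 2020 Boston qualifying times of 3:00:00 and 3:25:00 for males and 3:30:00 and 3:55:00 for females in the 18-34 and 50-54 age-groups, respectively. The function names are illustrative and are not taken from the analysis code used in this paper.

```python
def to_minutes(hms):
    """Convert an 'H:MM:SS' finish time to minutes."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60.0


def percent_increase(young_time, old_time):
    """Percent increase in finish time from the younger to the older age-group."""
    young = to_minutes(young_time)
    old = to_minutes(old_time)
    return 100.0 * (old - young) / young


# Assumed 2020 Boston qualifying standards (18-34 and 50-54 age-groups).
male_increase = percent_increase("3:00:00", "3:25:00")
female_increase = percent_increase("3:30:00", "3:55:00")

print(f"Male:   {male_increase:.2f}%")    # 13.89%
print(f"Female: {female_increase:.2f}%")  # 11.90%
```

The same function can be applied to the fitted age-performance curve of any race by passing the predicted finish times for the two age-groups.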

Another takeaway from Table 4.18 is the discrepancy between males and females. The female age-performance curves tend to increase less than the male curves over this age interval. While we are not sure exactly why this is the case, it seems that these curves are not indicative of how marathon performance changes with age. Perhaps this relates to the survivor bias issue. The older runners in these datasets are those who still manage to run marathons at older ages, and they are not representative of all older runners. Since the data and our analysis have not accounted for this dropout, the age-performance curves from the models likely understate the effect of age on marathon performance.
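The mechanism by which survivor bias flattens an age-performance curve can be illustrated with a small simulation. The sketch below is purely hypothetical and is not the dropout model used in this paper: it assumes a quadratic aging penalty, normally distributed runner ability, and a simple rule under which a runner stops appearing in the data once their finish time would exceed a fixed cutoff. All parameter values are invented for illustration.

```python
import random


def simulate_survivor_bias(n_runners=5000, cutoff_minutes=300.0, seed=0):
    """Compare the true mean finish time by age with the mean among survivors.

    Assumptions (all hypothetical): a baseline time of 240 minutes at age 30,
    an aging penalty of 0.03 * (age - 30)^2 minutes, runner-specific offsets
    drawn from N(0, 30^2), and dropout whenever a runner's time would exceed
    cutoff_minutes.
    """
    rng = random.Random(seed)
    offsets = [rng.gauss(0.0, 30.0) for _ in range(n_runners)]
    true_mean = {}
    observed_mean = {}
    for age in (30, 40, 50, 60, 70):
        penalty = 0.03 * (age - 30) ** 2
        times = [240.0 + offset + penalty for offset in offsets]
        true_mean[age] = sum(times) / len(times)
        # Only runners under the cutoff remain in the observed data.
        survivors = [t for t in times if t <= cutoff_minutes]
        observed_mean[age] = sum(survivors) / len(survivors)
    return true_mean, observed_mean


true_mean, observed_mean = simulate_survivor_bias()
true_rise = true_mean[70] - true_mean[30]
observed_rise = observed_mean[70] - observed_mean[30]
print(f"True rise from age 30 to 70:     {true_rise:.1f} minutes")
print(f"Observed rise from age 30 to 70: {observed_rise:.1f} minutes")
```

Under these assumptions, the rise in mean finish time among the surviving runners is smaller than the true rise, because slower runners are progressively removed from the older ages. This is the direction of bias described above.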

**5. Discussion**