ABSTRACT

TURNBULL, BRADLEY CHRISTOPHER. Non-parametric Regression Models for Ordinal Predictors and Unimodal Density Estimation featuring Bernstein Polynomials. (Under the direction of Sujit K. Ghosh.)

This dissertation comprises two primary topics. The first chapter introduces a novel method for unimodal density estimation, and all subsequent chapters present and develop statistical methodology motivated by the United States' liver transplant allocation system.

The estimation of probability density functions is one of the fundamental aspects of statistical inference. Many data analyses are based on an assumed family of parametric models that are known to be unimodal (e.g., the exponential family). Often a histogram suggests the unimodality of the underlying density function, yet parametric assumptions may not be adequate for many inferential problems. A flexible class of mixtures of beta densities constrained to be unimodal is presented. It is shown that the estimation of the mixing weights, and of the number of mixing components, can be accomplished using a weighted least squares criterion subject to a set of linear inequality constraints. The mixing weights of the beta mixture are efficiently computed using quadratic programming techniques. Three criteria for selecting the number of mixing weights are presented and compared in extensive simulation studies.
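The constrained estimation described above can be sketched numerically. The following is a minimal illustration, not the dissertation's estimator (which uses a weighted criterion, a dedicated quadratic program, and data-driven selection of the number of components): it fits unit-sum, nonnegative mixture weights over beta CDFs by least squares against the empirical CDF, with unimodality of the weights imposed as linear inequality constraints for a fixed component count `m` and mode index `k`. The function name and the use of SciPy's SLSQP solver in place of a QP routine are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta as beta_dist

def fit_unimodal_beta_mixture(x, m, k):
    """Sketch: least squares fit of beta-mixture weights against the
    empirical CDF, with the weights constrained to be nonnegative, sum
    to one, and be unimodal (rising up to index k, falling after).
    Data x are assumed rescaled to [0, 1]; m and k are fixed here,
    unlike the data-driven selection used in the dissertation."""
    t = np.sort(np.asarray(x, dtype=float))
    ecdf = np.arange(1, t.size + 1) / t.size
    # design matrix: CDFs of the Beta(j, m - j + 1) mixture components
    B = np.column_stack([beta_dist.cdf(t, j, m - j + 1)
                         for j in range(1, m + 1)])

    def sse(w):  # unweighted least squares objective
        r = ecdf - B @ w
        return r @ r

    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    for j in range(m - 1):  # unimodal ordering of the weights
        if j < k - 1:
            cons.append({"type": "ineq", "fun": lambda w, j=j: w[j + 1] - w[j]})
        else:
            cons.append({"type": "ineq", "fun": lambda w, j=j: w[j] - w[j + 1]})

    res = minimize(sse, np.full(m, 1.0 / m), method="SLSQP",
                   bounds=[(0.0, None)] * m, constraints=cons)
    return res.x
```

Because both the objective and the constraints are linear-quadratic in the weights, this is exactly the kind of problem a quadratic programming solver handles directly; SLSQP is used here only to keep the sketch dependency-light.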

Over the last 17 years, there have been major concerns that the current system of organ allocation and transplantation may not ensure that available organs unbiasedly reach the patients most in need of a transplant. In 1998, these concerns prompted Congress to ask the Institute of Medicine (IOM) of the National Academy of Sciences (NAS) to conduct a review of the liver allocation system. The liver was the organ of focus because the time a liver can survive outside the body is much longer than for other organs, making alterations to the allocation system more feasible, and because the urgency of a liver transplant is greater than for some other organs, such as kidneys, where treatment alternatives such as dialysis are possible. In 2002, motivated by the results of the IOM's evaluation, the liver allocation system was changed from a subjective status-based algorithm to one using priority groups determined by an integer Model for End-stage Liver Disease (MELD) severity score.

In Chapter 2, a non-parametric regression model for ordinal predictors based on a continuous latent variable is proposed. A flexible non-linear relationship between the continuous latent variable and waiting time is represented using Bernstein polynomials. Estimates of the regression parameters, as well as of the cut-points partitioning the latent variable into the ordinal levels, are obtained using popular Bayesian MCMC methods. Second and first order approximation models are also introduced, which enable the use of a two-stage maximum likelihood parameter estimation procedure. Proofs of the consistency and convergence rates of model parameters are provided, along with an analytical comparison of the full and approximation models. The proposed method is numerically compared to popular alternative methods in three simulated data scenarios. Finally, the method is applied to the motivating data set of liver transplant candidate waiting times.
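The Bernstein polynomial representation of the latent regression function amounts to a simple basis expansion, g(u) = Σₖ βₖ bₖ,N(u), where bₖ,N(u) = C(N, k) uᵏ(1 − u)^(N−k) on [0, 1]. A minimal sketch of constructing this basis follows; the function name is illustrative, and the dissertation's model additionally estimates the coefficients, cut-points, and degree N.

```python
import numpy as np
from math import comb

def bernstein_basis(u, N):
    """Degree-N Bernstein polynomial basis evaluated at points u in [0, 1].

    Returns an array with one row per point and N + 1 columns, where
    column k holds b_{k,N}(u) = C(N, k) * u**k * (1 - u)**(N - k).
    """
    u = np.asarray(u, dtype=float)
    return np.column_stack([comb(N, k) * u**k * (1.0 - u)**(N - k)
                            for k in range(N + 1)])
```

The basis forms a partition of unity (each row sums to one), so a latent regression function of the form `bernstein_basis(u, N) @ beta` is a smooth, bounded curve controlled entirely by the coefficient vector.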

Given the presence of a variety of predictor variable types in the UNOS liver transplant data, Chapter 3 introduces a variable selection method designed for data scenarios featuring ordinal, categorical, and continuous covariates. The new method combines the latent continuous variable model for ordinal predictors from Chapter 2 with the popular stochastic search variable selection (SSVS) method. Selection inconsistencies and poor mixing issues involving categorical variables with varying numbers of levels are handled using a parameter expansion of the regression coefficients. Three simulated data scenarios demonstrate the performance of the proposed method relative to popular alternative mixed data variable selection methods. Finally, the new method is used to help identify a sparse set of predictors to include in a waiting time regression model for the motivating UNOS liver transplant data set.
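The spike-and-slab machinery behind SSVS can be illustrated compactly. The sketch below is a generic George–McCulloch style Gibbs sampler for linear regression, not the O-SSVS or peO-SSVS samplers of Chapter 3 (which add the ordinal latent variable model and parameter expansion); the function name and the hyperparameters `tau` and `c` are illustrative assumptions.

```python
import numpy as np

def ssvs_gibbs(X, y, n_iter=2000, tau=0.1, c=10.0, seed=0):
    """Sketch of a spike-and-slab (SSVS-style) Gibbs sampler for linear
    regression: beta_k | gamma_k is N(0, tau^2) (spike, excluded) or
    N(0, (c*tau)^2) (slab, included), with gamma_k ~ Bernoulli(1/2).
    Returns posterior inclusion probabilities from the second half of
    the chain. Not the dissertation's O-SSVS/peO-SSVS samplers."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta, gamma, sigma2 = np.zeros(p), np.ones(p, dtype=int), 1.0
    XtX, Xty = X.T @ X, X.T @ y
    incl, burn = np.zeros(p), n_iter // 2
    for it in range(n_iter):
        # beta | gamma, sigma2: Gaussian full conditional
        prior_var = np.where(gamma == 1, (c * tau) ** 2, tau ** 2)
        cov = np.linalg.inv(XtX / sigma2 + np.diag(1.0 / prior_var))
        cov = (cov + cov.T) / 2.0          # symmetrize for the sampler
        beta = rng.multivariate_normal(cov @ Xty / sigma2, cov)
        # gamma_k | beta_k: Bernoulli comparing spike vs slab densities
        log_slab = -0.5 * beta**2 / (c * tau) ** 2 - np.log(c * tau)
        log_spike = -0.5 * beta**2 / tau**2 - np.log(tau)
        prob = 1.0 / (1.0 + np.exp(log_spike - log_slab))
        gamma = (rng.uniform(size=p) < prob).astype(int)
        # sigma2 | beta: inverse-gamma full conditional
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(n / 2.0, 2.0 / (resid @ resid))
        if it >= burn:
            incl += gamma
    return incl / (n_iter - burn)
```

A common selection rule, matching the criterion used in the figures of Chapter 3, is to keep predictor k when its posterior inclusion probability is at least 0.5.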

© Copyright 2015 by Bradley Christopher Turnbull

Non-parametric Regression Models for Ordinal Predictors and Unimodal Density Estimation featuring Bernstein Polynomials

by

Bradley Christopher Turnbull

A dissertation submitted to the Graduate Faculty of North Carolina State University

in partial fulfillment of the requirements for the Degree of

Doctor of Philosophy

Statistics

Raleigh, North Carolina

2015

APPROVED BY:

Subhashis Ghoshal

Alyson Wilson

Wenbin Lu

Sujit K. Ghosh

DEDICATION

BIOGRAPHY

Bradley Christopher Turnbull was born December 29, 1987 in Bay Minette, Alabama to parents Roberto and Barbara Turnbull. At the age of four his family moved from Alabama to the Western Pennsylvania town of Indiana. He graduated from Indiana Area Senior High School in 2006 and went on to attend the Rochester Institute of Technology (RIT) in Rochester, New York majoring in Applied Mathematics.

After completing two semesters of Abstract Algebra his junior year of college, he began to realize his true passion was not to one day have the job title of “mathematician”. His first exposure to Statistics was in a design of experiments course his junior year. From there his interest in the field of Statistics grew thanks in large part to three faculty mentors: Drs. Carol Marchetti, James Halavin, and David Farnsworth. Outside of academics, Bradley was a dedicated member of the all-male a cappella group Eight Beat Measure. He served as the musical director for three of his four years in the group. Stemming from his a cappella experience in college he has since gone on to own and operate a small vocal arranging company, BT Arranging, in his spare time.

Bradley graduated from RIT in 2010 with a Bachelor’s degree in Applied Mathematics and a minor in Statistics. With a desire to one day become a teacher but refusing to do so at the high school level, he enrolled in the Statistics PhD program at North Carolina State University in Raleigh, North Carolina in the fall of 2010. En route to his PhD he received his Master’s degree in Statistics in 2012.

In the summer of 2013, Bradley began an internship with an Advanced Analytics group at GlaxoSmithKline led by Alan Menius. This positive experience changed his opinion on industry versus academia, leading him to pursue a career in industry after graduation. Under the direction of Dr. Sujit K. Ghosh, he will earn his PhD in Statistics in 2015.

ACKNOWLEDGEMENTS

I would first like to thank my advisor, Dr. Sujit Ghosh. It has been a pleasure working with you, and I am extremely grateful for your consistent patience, flexibility, and guidance. You were wonderfully accommodating and generous with your time the year we were working remotely during your appointment at the NSF in Washington, DC. I greatly admire your broad knowledge of statistics and expertise in diverse areas of the field, which are both traits I hope I can one day emulate. Without your wealth of ideas and words of encouragement this dissertation would not have been possible. I also appreciate your honesty in letting me know when my work was not up to the quality you knew I was capable of. Your “consistent reminders” have helped me develop a stronger commitment to excellence that will serve me well the rest of my career and in life in general. I could not have chosen a better mentor to guide me through my graduate studies and research.

I would also like to thank Dr. Subhashis Ghoshal and Dr. Helen Zhang who I have had the pleasure of working with on a few research projects and papers outside of my dissertation research. I greatly appreciate both of your patience as our projects often took a back seat to my dissertation work. Thank you for allowing me to join your research projects, your confidence in my abilities, and your willingness to provide me funding during the summer on two separate occasions. This extra research experience has been invaluable and definitely helped broaden my knowledge of statistics. Dr. Ghoshal you have always been so gracious with your time, often willing to set aside your current task and talk when I would stop by your office unannounced. Without hesitation you would graciously explain things two or three times when I was slow to catch up to your thought process. I also thank you for agreeing to serve on my dissertation committee.

I am grateful to Dr. Alyson Wilson and Dr. Wenbin Lu for serving on my dissertation committee. Dr. Wilson was willing to take time out of her busy schedule to share her experiences and deep knowledge of the National Labs. She was also always genuinely interested in hearing about the progress of my job search. Both Dr. Wilson and Dr. Lu have provided insightful comments and critiques of my research that have helped strengthen and expand the work presented in this dissertation. I also thank Dr. Katherine Jennings for serving on my committee as the graduate school representative.

There are countless faculty members at NC State who deserve my gratitude. DGPs Dr. Sujit Ghosh, Dr. John Monahan, Dr. Kim Weems, and Dr. Howard Bondell have always been kind and helpful in answering questions about funding, health insurance, and degree requirements. I have also always enjoyed participating in random conversations in the hallways with Dr. John Monahan, Dr. Len Stefanski, and Dr. Dennis Boos.

The many wonderful staff members of the Department of Statistics at NC State have greatly assisted me throughout my graduate school journey. Alison McCoy has always been so kind and a joy to talk to. Lanakila Alexander and Adrian Blue have been invaluable resources regarding funding. A special thanks goes to Chris Waddell, who has always been willing to take time out of his day to answer my many computing questions. He has also installed packages on the cluster and answered my emails at times well past usual business hours and even on the weekends. Without his help some of the computational aspects of this dissertation would not have been possible.

I am thankful for the wonderful education I received at the Rochester Institute of Technology. Dr. Carol Marchetti, Dr. James Halavin, and Dr. David Farnsworth were instrumental in my decision to pursue a PhD in Statistics.

I am grateful to my fellow members of the Fifth Moment: Kyle White, Joe Usset, Sidd Roy, Kristin Linn, and Jason Osborne. Having the opportunity to write and play music with all of you has been one of the highlights of my time in graduate school. I especially thank Kyle White who has been a wonderful roommate and friend for the past five years. He was a great resource when I was first acclimating to the rigor of graduate school and always happy to answer my homework related questions. I also thank Justin Post and his wife Mallorie who have been very supportive friends and always open to having long conversations about Pittsburgh sports teams and our pet cats. In addition, Matt Austin, Chris Krut, Sam Morris, Andrew Wilcox, and many others have made graduate school enjoyable and memorable.

Most of all, I would like to thank my parents, brother, and two sisters, for their constant love and support. Mom and Dad, since my days in elementary school you always ensured I was given every opportunity to succeed. It was your guidance which led me to pursue a degree in Applied Mathematics and that has led to this exciting career in statistics. You also taught me the value of hard work and persistence which are two attributes that have been greatly tested throughout the process of writing this dissertation. Your wisdom and support have guided me through many difficult times, most notably my first year of graduate school. I am eternally grateful for all you have done for me.

TABLE OF CONTENTS

LIST OF TABLES . . . ix

LIST OF FIGURES . . . x

Chapter 1 Unimodal Density Estimation Using Bernstein Polynomials . . . 1

1.1 Introduction . . . 1

1.1.1 Unimodal Density Estimation . . . 1

1.1.2 Density Estimation with Bernstein Polynomials . . . 3

1.2 Methodology . . . 4

1.2.1 Selection of the Number of Weights . . . 7

1.2.2 Comparison of Selection Criterion . . . 8

1.3 Simulation Study . . . 9

1.3.1 Density Estimation with Support (−∞,∞) . . . 10

1.3.2 Density Estimation with Support [0,∞) . . . 10

1.3.3 Density Estimation with Support [−1, 1] . . . 10

1.3.4 Evaluation Method . . . 11

1.3.5 Results . . . 11

1.3.6 Outliers . . . 12

1.4 Real Data Examples . . . 13

1.4.1 Suicide Data . . . 13

1.4.2 S&P 500 Log Returns Data Set . . . 15

1.5 Discussion and Future Work . . . 17

Chapter 2 Non-parametric Regression Model for Ordinal Predictors . . . 22

2.1 Introduction . . . 22

2.1.1 Liver Transplant Allocation . . . 22

2.1.2 Regression Models with Ordinal Predictors . . . 24

2.1.3 Proposed Non-parametric Model Based on Latent Variable . . . 26

2.2 Methodology . . . 27

2.2.1 Bayesian Inference . . . 29

2.2.2 Approximate Likelihood Approaches . . . 30

2.2.3 Selection of Dimension of Basis Functions . . . 37

2.3 Asymptotic Properties . . . 38

2.4 Analytical Comparison of Approximation Models . . . 42

2.5 Numerical Results . . . 43

2.5.1 Scenario 1: Bernstein Polynomial as True Regression Curve . . . 44

2.5.2 Scenario 2: Non-Polynomial True Regression Curve . . . 51

2.5.3 Scenario 3: Simulated Data Similar to Transplant Data . . . 54

2.6 Analysis of UNOS Liver Transplant Data . . . 58

2.6.1 Incorporation of Censored Transplant Candidates . . . 61

2.6.2 Transplant Waiting Times and Blood-Type . . . 62

Chapter 3 Variable Selection for Ordinal and Mixed Predictors . . . 66

3.1 Introduction . . . 66

3.2 Methodology . . . 68

3.2.1 Traditional SSVS Regression Priors . . . 69

3.2.2 Parameter Expansion SSVS Regression Priors . . . 71

3.3 MCMC Sampler . . . 73

3.3.1 Gibbs Sampler for O-SSVS . . . 74

3.3.2 Gibbs Sampler for peO-SSVS . . . 75

3.3.3 Parameter Transformations . . . 77

3.4 Numerical Analysis . . . 78

3.4.1 Scenario 1 . . . 81

3.4.2 Scenario 2 . . . 84

3.4.3 Scenario 3 . . . 88

3.5 Application to UNOS Liver Transplant Data . . . 93

3.6 Discussion and Future Work . . . 101

Chapter 4 A Bayesian Competing Risks Analysis of UNOS Liver Transplant Data . . . 104

4.1 Introduction . . . 104

4.2 The Competing Risks Model . . . 106

4.2.1 Incorporation of Ordinal Predictor Model . . . 107

4.2.2 Prior Distribution Specifications . . . 108

4.3 Fitting the Model . . . 109

4.3.1 Data Augmentation . . . 109

4.3.2 Full Conditional Distributions . . . 110

4.4 Pilot Simulation . . . 111

4.5 Analysis of UNOS Liver Transplant Data . . . 114

4.6 Discussion . . . 127

BIBLIOGRAPHY . . . 129

APPENDICES . . . 140

Appendix A Supplementary Materials for Unimodal Density Estimation . . . 141

A.1 Optimization and Linear Constraints Details . . . 141

A.2 Proof of Bounds of Density Support . . . 142

A.3 R function: umd . . . 142

A.4 Additional Tables and Figures . . . 144

Appendix B Supplementary Materials for Non-parametric Regression Model for Ordinal Predictors . . . 146

B.1 Proof of Lemma 1 (Consistency of MLE for fixed N) . . . 146

B.2 Proof of Theorem 1 (Consistency of MLE) . . . 146

B.3 Proof of Theorem 2 (Convergence Rate of MLE) . . . 151

B.4 Sampling Algorithm for c_{m−1} . . . 153

B.5 Metropolis Hastings Random Walk Block Sampling Algorithm . . . 154

B.7 Calculation of Hellinger Distance for Numerical Results . . . 156

B.8 Additional Figures . . . 158

B.8.1 Scenario 1 . . . 158

B.8.2 Scenario 2 . . . 162

B.8.3 Scenario 3 . . . 168

Appendix C Supplementary Materials for Variable Selection for Ordinal and Mixed Predictors . . . 172

C.1 Standardization of Bernstein Polynomial Design Matrix . . . 172

C.2 Parameter Transformation Details . . . 173

C.3 Additional Figures . . . 174

LIST OF TABLES

Table 1.1 RMISE values ×10 and average estimate of m for 1,000 MC samples with MC standard errors displayed in parentheses . . . 9

Table 1.2 Support (−∞,∞): estimated expected value of functional norms ×10 for the Bernstein polynomial method using CN criterion (UMD-CN) and Wolters (2012) method (Wolters) across 1,000 MC samples for left-skewed, symmetric, and right-skewed distributions with samples of size 15, 30, 100, and 500. MC standard errors are displayed in parentheses. . . 12

Table 1.3 Support [0, ∞): estimated expected value of functional norms ×10 for the Bernstein polynomial method using CN criterion (UMD-CN) and Wolters (2012) method (Wolters) across 1,000 MC samples for left-skewed and right-skewed distributions with samples of size 15, 30, 100, and 500. MC standard errors are displayed in parentheses. . . 13

Table 1.4 Support [−1, 1]: estimated expected value of functional norms ×10 for the Bernstein polynomial method using CN criterion (UMD-CN) and Wolters (2012) method (Wolters) across 1,000 MC samples for left-skewed, symmetric, and right-skewed distributions with samples of size 15, 30, 100, and 500. MC standard errors are displayed in parentheses. . . 14

Table 1.5 Average computation times (in seconds) for the Bernstein polynomial method using the CN criterion (UMD-CN) and Wolters (2012) method (Wolters) across 1000 MC samples for left-skewed, symmetric, and right-skewed distributions on supports of (−∞,∞), [0,∞), and [−1,1] with samples of size 15, 30, 100 and 500. MC standard errors are displayed in parentheses. . . 15

Table 2.1 Average computation time in minutes for each method across 50 MC samples for simulation scenario 3. . . 58

Table 3.1 Summary of Methods for Simulation Studies . . . 80

Table 3.2 Scenario 1: R² values of generated data for true regression parameters (Oracle) along with average R² across 100 MC samples for each method. . . 82

Table 3.3 Scenario 2: R² values of generated data for true regression parameters (Oracle) along with the average R² values across 100 MC samples for each method. . . 87

Table 3.4 Scenario 3: Summary of categorical and continuous predictors. . . 91

Table 3.5 Summary of categorical predictors in UNOS data set. . . 96

Table 3.6 Summary of continuous predictors in UNOS data set. . . 97

LIST OF FIGURES

Figure 1.1 Support (−∞, ∞): plots of average estimated densities for the Bernstein polynomial method using the CN criterion (dashed line) and Wolters method (dotted line) across 1,000 MC samples for samples of size 15, 30, 100, and 500 along with the true density curve (solid line) for each type of skewness. . . 16

Figure 1.2 Support [0, ∞): plots of average estimated densities for the Bernstein polynomial method using the CN criterion (dashed line) and Wolters method (dotted line) across 1,000 MC samples for samples of size 15, 30, 100, and 500 along with the true density curve (solid line) for each type of skewness. . . 18

Figure 1.3 Support [−1, 1]: plots of average estimated densities for the Bernstein polynomial method using the CN criterion (dashed line) and Wolters method (dotted line) across 1,000 MC samples for samples of size 15, 30, 100, and 500 along with the true density curve (solid line) for each type of skewness. . . 19

Figure 1.4 Plots of the average estimated densities using the CN criterion for 1,000 MC samples of size 15 (long dashed line), 30 (dot-dash line), and 100 (dotted line) along with the true mixture distribution (short dashed line) and standard normal distribution (solid line) . . . 20

Figure 1.5 Histogram of duration in days of psychiatric treatments for 86 control group patients, CN criterion unimodal Bernstein polynomial density estimate (solid line) and 95% confidence interval for the density estimate (gray shading), and exponential distribution with sample mean (dashed line). . . 20

Figure 1.6 Histogram of 756 S&P 500 daily log return values from 2008 to 2010 along with the Bernstein polynomial density estimate with m̂ = 117 weights (solid line), AIC Bernstein polynomial density estimate (short dashed line), CN criterion Bernstein polynomial density estimate (dotted line), and normal distribution with sample mean and sample standard deviation (long dashed line). The 95% confidence interval bands from 100 bootstrap samples for each density estimate are displayed in the shaded region around the density estimate . . . 21

Figure 2.1 Hellinger distance between second and first order approximation models and the Exact model for a range of values of ψ₀²/σ₀². . . 43

Figure 2.2 Scenario 1: Plot of g₀(·) with red dashed vertical lines denoting the true cut-point values. . . 44

Figure 2.3 Scenario 1: Boxplots of the estimates of the latent regression function

Figure 2.4 Scenario 1: Boxplots of the estimates of the latent regression function for n = 250 for the 2nd Order-TSE and 1st Order-TSE methods for 50 MC samples over a uniform grid of u values ranging from 0 to 1 for both the DIC and WAIC selection criterion. Vertical red dashed lines symbolize estimated cut-points for the respective methods. The true latent regression curve is displayed as a solid red line. . . 48

Figure 2.5 Scenario 1: Scatter plots comparing the estimated value of N for the DIC and WAIC criterion for Bayesian methods, and AIC and BIC criterion for the likelihood based methods. The red dotted lines denote the value of N for the Bernstein polynomial used to generate the data. . . 49

Figure 2.6 Scenario 1: Boxplots of estimated main effects using the DIC and AIC selection criterion of the ordinal levels across 50 MC samples. . . 50

Figure 2.7 Scenario 1: Boxplots of the bias of the estimates of σ² and the components of δ for all methods across 50 MC samples. . . 50

Figure 2.8 Scenario 2: Plot of true regression function and red dashed vertical lines denoting the true cut-point values. . . 51

Figure 2.9 Scenario 2: Boxplots of the estimates of the latent regression function for n = 200 for 50 MC samples over a uniform grid of u values ranging from 0 to 1. Vertical thin red lines symbolize estimated cut-points for the respective methods. The true latent regression curve is displayed as a solid red line. . . 52

Figure 2.10 Scenario 2: Boxplots of the estimates of the latent regression function for n = 500 for 50 MC samples over a uniform grid of u values ranging from 0 to 1. Vertical thin red lines symbolize estimated cut-points for the respective methods. The true latent regression curve is displayed as a solid red line. . . 53

Figure 2.11 Scenario 2: Boxplots of the estimated main effects of the ordinal levels for n = 200 across 50 MC samples. . . 54

Figure 2.12 Scenario 2: Boxplots of the estimated main effects of the ordinal levels for n = 500 across 50 MC samples. . . 54

Figure 2.13 Scenario 2: Boxplots of the bias of the estimates of σ² and the components of δ for all methods for sample size n = 200 across 50 MC samples. . . 55

Figure 2.14 Scenario 2: Boxplots of the bias of the estimates of σ² and the components of δ for all methods for sample size n = 500 across 50 MC samples. . . 55

Figure 2.15 Scenario 3: Plot of true regression function and red dashed vertical lines denoting the cut-point values. . . 56

Figure 2.16 Scenario 3: Boxplots of the estimated main effects of the ordinal levels across 50 MC samples. . . 57

Figure 2.17 Exact Model posterior mean and 95% credible interval estimates of the

Figure 2.18 Estimates of ordinal level main effects for the Ord Pens and LM methods of days on waiting list for training set of only candidates which received a transplant. . . 61

Figure 2.19 Posterior mean and 95% credible interval estimates of the latent regression function for the Exact Model and estimates of ordinal level main effects for the Ord Pens and LM methods of days on waiting list for training set of only candidates which received a transplant. Red vertical lines correspond to the posterior mean estimates of the cut-points. . . 62

Figure 2.20 Absolute prediction error in log days for transplant only candidates in the test set for all three methods. . . 63

Figure 2.21 Exact Model posterior mean and 95% credible interval estimates of the regression functions for training set with and without censored transplant candidates. . . 64

Figure 2.22 Exact Model posterior mean and 95% credible interval estimates of the regression functions for data partitions based on blood-type including censored candidates. . . 65

Figure 3.1 Scenario 1: R² values of generated data for true regression parameters (Oracle) along with average R² across 100 MC samples for each method. . . 83

Figure 3.2 Scenario 1: Proportion of 100 MC samples each predictor is selected. Bayesian methods use the selection criterion: posterior mean of the selection parameter, γₖ, greater than or equal to 0.5. . . 84

Figure 3.3 Scenario 1: Posterior mean of each component of γ across 100 MC samples. . . 85

Figure 3.4 Scenario 1: Estimates of the true regression function at the first level of W1 for the O-SSVS and peO-SSVS methods. The light gray region represents the 95% credible interval with the dark black line in the center of the region representing the posterior mean. The dark gray regions around the posterior mean and 95% credible interval bounds represent MC confidence intervals for those respective estimates across the 100 MC samples. The true regression function is denoted by the red line. . . 86

Figure 3.5 Scenario 2: R² values of generated data for true regression parameters (Oracle) along with the average R² values across 100 MC samples for each method. . . 88

Figure 3.6 Scenario 2: Proportion of 100 MC samples each predictor is selected. Bayesian methods use the selection criterion: posterior mean of the selection parameter, γₖ, greater than or equal to 0.5. . . 89

Figure 3.7 Scenario 2: Posterior mean of each component of γ across 100 MC samples. . . 90

Figure 3.8 Scenario 2: Estimates of the true regression function at the first level of W1

Figure 3.9 Scenario 3: Distribution of observations across levels of categorical predictors. Circle size indicates probability mass associated with corresponding level. . . 91

Figure 3.10 Scenario 3: R² values of generated data for true regression parameters (Oracle) along with average R² across 50 MC samples for each method. . . 92

Figure 3.11 Scenario 3: Proportion of 50 MC samples each predictor is selected. Bayesian methods use the selection criterion: posterior mean of the selection parameter, γₖ, greater than or equal to 0.5. . . 94

Figure 3.12 Scenario 3: Posterior mean of each component of γ across 50 MC samples. . . 95

Figure 3.13 Heat map of absolute values of Spearman rank correlation matrix of predictors in UNOS data set. Only displays predictors with an absolute correlation of at least 0.25 with another predictor. . . 98

Figure 3.14 Posterior mean of selection parameter for each predictor in UNOS data set sorted largest to smallest. . . 99

Figure 3.15 Boxplots of the posterior draws of the multiplicative effects of the levels of the categorical variables. The percent of observations within each level is in parentheses next to variable names. . . 102

Figure 3.16 Boxplots of the posterior draws of the inflation factor of the continuous variables for a one standard deviation increase from the mean. . . 103

Figure 3.17 Posterior mean and 95% credible intervals of latent regression function for initial status group variable. . . 103

Figure 4.1 Plots of the true regression functions. . . 113

Figure 4.2 Posterior mean estimate of regression functions and 95% credible interval (light gray) for the 30 MC samples (dark gray). The true regression functions are displayed in red. . . 114

Figure 4.3 Posterior mean and 95% credible interval estimates of latent regression functions for ordinal status group predictor for each outcome: transplant (blue), death (red), other (green). . . 116

Figure 4.4 Maximum likelihood estimates of latent regression functions of each outcome for ordinal status group predictor for naive analysis, i.e. data partitioning observed outcomes. . . 117

Figure 4.5 Posterior mean estimate of probability of transplant over a range of time values and latent status group continuous variable u. The color gradient corresponds to probabilities ranging from 0 to 1. . . 119

Figure 4.6 Posterior mean estimate of probability of transplant over a range of time values for four specific values of u. . . 119

Figure 4.7 Posterior mean estimate of probability of death over a range of time values and latent status group continuous variable u. The color gradient corresponds to probabilities ranging from 0 to 1. . . 120

Figure 4.8 Posterior mean estimate of probability of transplant before death, Pr(T₁ ≤ t, T₂ > t | u), over a range of time values and latent status group

Figure 4.9 Posterior mean estimate and 95% credible interval bands of probability of transplant before death, Pr(T₁ < T₂ | u), over the latent status group continuous variable u. . . 122

Figure 4.10 Boxplots of the posterior draws of the multiplicative effects of the levels of the categorical variables for both the transplant and death outcomes. The proportion of observations within each level is displayed in parentheses next to variable names. . . 123

Figure 4.11 Boxplots of the posterior draws of the inflation factor of the continuous variables for a one standard deviation increase from the mean for both the transplant and death outcomes. . . 124

Figure 4.12 Posterior mean estimate and 95% credible interval bands of probability of transplant before death, Pr(T₁ < T₂ | u, blood-type), over the latent status group continuous variable u for each of the four blood-types. . . 125

Figure 4.13 Posterior mean estimate and 95% credible interval bands of probability of transplant before death, Pr(T₁ < T₂ | u, life-support), over the latent status group continuous variable u for patients on life-support at time of listing versus those who are not. . . 126

Figure 4.14 Posterior mean estimate and 95% credible interval bands of probability of transplant before death, Pr(T₁ < T₂ | u, multiple listing), over the latent status group continuous variable u for patients who are simultaneously listed for another organ other than the liver versus those who are only on the liver transplant list. . . 126

Figure 4.15 Posterior mean estimate and 95% credible interval bands of probability of transplant before death, Pr(T₁ < T₂ | u, region), over the latent status group continuous variable u for patients listed in region 3 versus those listed in region 5. . . 127

Figure 4.16 Posterior mean estimate and 95% credible interval bands of probability of transplant before death, Pr(T₁ < T₂ | u, age), over the latent status group continuous variable u for three ages: one standard deviation below the mean age (41.6 years old), mean age (52.6 years old), and one standard deviation above the mean age (63.6 years old). . . 128

Figure A.1 Plots of average estimated densities using the AIC criterion across 1,000 MC samples for samples of size 15 (dashed line), 30 (dot-dash line), and 100 (dotted line) along with the true density curve (solid line) for each support and type of skewness. . . 145

Figure B.1 Scenario 1: Trace plots of 20,000 MCMC samples of the β and σ² parameters for the Exact Model method. . . 158

Figure B.2 Scenario 1: Trace plots of 20,000 MCMC samples of the components of δ for the Exact Model method. . . 159

Figure B.3 Scenario 1: Trace plots of 20,000 MCMC samples of the β and σ²

Figure B.5 Scenario 1: Trace plots of 20,000 MCMC samples of the β and σ² parameters for the 1st Order-Bayes method. . . 160

Figure B.6 Scenario 1: Trace plots of 20,000 MCMC samples of the components of δ for the 1st Order-Bayes method. . . 161

Figure B.7 Scenario 2: Trace plots of 20,000 MCMC samples of the β and σ² parameters for the 1st Order-Bayes method. . . 162

Figure B.8 Scenario 2: Trace plots of 20,000 MCMC samples of the components of δ for the 1st Order-Bayes method. . . 162

Figure B.9 Scenario 2: Trace plots of 20,000 MCMC samples of the β and σ² parameters for the Exact Model method. . . 163

Figure B.10 Scenario 2: Trace plots of 20,000 MCMC samples of the components of δ for the Exact Model method. . . 163

Figure B.11 Scenario 2: Trace plots of 20,000 MCMC samples of the β and σ² parameters for the 2nd Order-Bayes method. . . 164

Figure B.12 Scenario 2: Trace plots of 20,000 MCMC samples of the components of δ for the 2nd Order-Bayes method. . . 164

Figure B.13 Scenario 2: Boxplots of the estimates of the latent regression function for WAIC and BIC selection criterion for n = 200 for 50 MC samples over a uniform grid of u values ranging from 0 to 1. Vertical thin red lines symbolize estimated cut-points for the respective methods. The true latent regression curve is displayed as a solid red line. . . 165

Figure B.14 Scenario 2: Boxplots of the estimates of the latent regression function for WAIC and BIC selection criterion for n = 500 for 50 MC samples over a uniform grid of u values ranging from 0 to 1. Vertical thin red lines symbolize estimated cut-points for the respective methods. The true latent regression curve is displayed as a solid red line. . . 166

Figure B.15 Scenario 2: Scatter plots comparing the estimated value of N for the DIC and WAIC criterion for Bayesian methods, and AIC and BIC criterion for the likelihood based methods for n = 200. . . 167

Figure B.16 Scenario 2: Scatter plots comparing the estimated value of N for the DIC and WAIC criterion for Bayesian methods, and AIC and BIC criterion for the likelihood based methods for n = 200. . . 167

Figure B.17 Scenario 3: Boxplots of the estimated main effects of the ordinal levels for the WAIC and BIC selection criterion across 50 MC samples. . . 168

Figure B.18 Scenario 3: Boxplots of the estimates of the latent regression function for DIC and AIC selection criterion for 50 MC samples over a uniform grid of u values ranging from 0 to 1. Vertical thin red lines symbolize estimated cut-points for the respective methods. The true latent regression curve is displayed as a solid red line. . . 169

Figure B.19 Scenario 3: Boxplots of the estimates of the latent regression function for

Figure B.20 Scenario 3: Scatter plots comparing the estimated value ofN for the DIC and WAIC criterion for Bayesian methods, and AIC and BIC criterion for the likelihood based methods. . . 171 Figure B.21 Scenario 3: Boxplots of the bias of the estimates ofσ2 and the components

of δ for all methods across 50 MC samples. . . 171

Figure C.1 Scenario 1: MAE of predictions for thentest= 200 test set across 100 MC

samples. . . 174 Figure C.2 Scenario 2: MAE of predictions for thentest= 300 test set across 100 MC

samples. . . 175 Figure C.3 Scenario 3: MAE of predictions for thentest= 1,000 test set across 50 MC

samples. . . 175

Figure D.1 Posterior mean estimate of probability of transplant before death, Pr (T1 ≤t, T2> t|u, blood-type ), for each blood-type over a range of

time values and latent status group continuous variable u. The color gradient corresponds to probabilities ranging from 0 to 1. . . 177 Figure D.2 Posterior mean estimate of probability of transplant before death,

Pr (T1 ≤t, T2> t|u, life-support ), for patients on life support at the

time of listing versus those not on life support over a range of time values and latent status group continuous variableu. The color gradient corresponds to probabilities ranging from 0 to 1. . . 178 Figure D.3 Posterior mean estimate of probability of transplant before death,

Pr (T1 ≤t, T2> t|u, multiple-listing ), for patients who are

simultane-ously listed for another organ other than the liver versus those who are only on the liver transplant list over a range of time values and latent status group continuous variable u. The color gradient corresponds to probabilities ranging from 0 to 1. . . 178 Figure D.4 Posterior mean estimate of probability of transplant before death,

Pr (T1 ≤t, T2> t|u, region ), for patients patients listed in region 3

ver-sus those listed in region 5 over a range of time values and latent status group continuous variable u. The color gradient corresponds to probabili-ties ranging from 0 to 1. . . 179 Figure D.5 Posterior mean estimate of probability of transplant before death,

Pr (T1 ≤t, T2> t|u, age ), for patients one standard deviation below the

### Chapter 1

## Unimodal Density Estimation Using Bernstein Polynomials

### 1.1 Introduction

Statistical inference is typically based on an assumed family of unimodal parametric models. Nonparametric density estimation is a popular alternative when that parametric assumption is not appropriate for modeling the density of the underlying population. The kernel method, developed by Parzen (1962), is one of the most popular methods of nonparametric density estimation. It is defined as the weighted average of kernel functions centered at the observed values. This average is taken with respect to the empirical cumulative distribution function (ECDF), Fn(·), and is dependent on a smoothing or bandwidth parameter.
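For concreteness, the kernel estimator described above can be sketched in a few lines. The sampling distribution, grid, and scipy's default (Scott's rule) bandwidth below are illustrative choices, not those used in the dissertation:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=2.0, size=200)   # sample from N(0, 4)

kde = gaussian_kde(x)            # Gaussian kernel; bandwidth by Scott's rule
grid = np.linspace(-8.0, 8.0, 161)
fhat = kde(grid)

# up to grid discretization, the estimate integrates to one
mass = np.sum(fhat) * (grid[1] - grid[0])
```

Note that nothing in this construction prevents spurious extra modes in `fhat`, which is exactly the issue the unimodality constraint below addresses.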

If one believes the underlying population’s density is unimodal, there are two major advan-tages to including a unimodality constraint in the density estimate. First, incorporating extra information about the shape of the density should improve the overall accuracy of the estimate. Second, extraneous modes, which may hinder the usefulness of the density estimate as a visual aid and exploratory tool, will be eliminated (Wolters, 2012).

1.1.1 Unimodal Density Estimation

the distribution and therefore requires a large bandwidth value in order to produce a unimodal density estimate. As the sample size increases, the bandwidth may even diverge to infinity if the data are sampled from a heavy-tailed density, such as the Student's t distribution with small degrees of freedom.

Other approaches to unimodal density estimation extend from the estimation of monotone densities, which are simply special cases of unimodal densities with the mode located on a boundary of the density's support. Grenander (1956) proposed the nonparametric maximum likelihood estimator, which is the derivative of the least concave majorant of the ECDF for nonincreasing densities, and the derivative of the greatest convex minorant of the ECDF for nondecreasing densities. Later research focused on extending the Grenander estimator to any unimodal density. The typical approach is to combine a nondecreasing Grenander estimate to the left of the mode with a nonincreasing estimate to the right. The key element of this approach is determining the location of the mode. Wegman (1972) proposed specifying a modal interval, while Bickel and Fan (1996) used a consistent point estimate of the mode location, and Birgé (1997) selected the mode that minimized the distance between the distribution estimate and the ECDF. Since all these estimates are based on the nonsmooth Grenander estimate, they each produce a step-function density estimate. Bickel and Fan (1996), however, did present methods for smoothing the estimated density.
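The Grenander construction for a nonincreasing density can be sketched directly: the estimator is the vector of slopes of the least concave majorant of the ECDF, which can be computed with a single upper-convex-hull pass over the ECDF points. The helper name and exponential test sample below are illustrative:

```python
import numpy as np

def grenander_decreasing(x):
    """Grenander estimator for a nonincreasing density on [0, inf):
    slopes of the least concave majorant (LCM) of the ECDF.
    Returns the LCM knots and the step-function density value on each
    interval between consecutive knots."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    pts = np.column_stack([np.concatenate([[0.0], xs]),
                           np.arange(n + 1) / n])     # ECDF points (x, F_n(x))
    hull = []                                         # upper convex hull = LCM
    for p in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop the middle point if it lies on or below the chord,
            # keeping the hull concave
            if (y2 - y1) * (p[0] - x1) <= (p[1] - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append((p[0], p[1]))
    hx = np.array([h[0] for h in hull])
    hy = np.array([h[1] for h in hull])
    dens = np.diff(hy) / np.diff(hx)                  # nonincreasing step heights
    return hx, dens

rng = np.random.default_rng(1)
knots, dens = grenander_decreasing(rng.exponential(scale=1.0, size=300))
```

The returned step heights are nonincreasing by construction and integrate to one, matching the description above of a nonsmooth, step-function estimate.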

Other novel approaches include Fougères (1997), who used a monotone rearrangement, as suggested by Hardy et al. (1952), to transform a multimodal density estimate into a unimodal form. This, however, requires the assumption that the location of the mode is known. Cheng et al. (1999) developed a unique method which treats a general unimodal density as a transformation of some known, but subjective, unimodal template. They presented a recursive algorithm for estimating the transformation and showed how to adjust the technique for density estimation under the monotonicity constraint. The algorithm produces a sequence of successive step-function approximations of the true density, which require some form of smoothing in order to make the method an appealing estimate of a smooth density.

data and original data set using an Lα distance, for 1 ≤ α ≤ 2. They obtained the sharpened data vector that minimized the Lα norm using sequential quadratic programming (SQP) techniques. Hall and Huang (2002) also used SQP methods to perform unimodal density estimation by reweighting, or tilting, the empirical distribution. There are numerous issues with using SQP for unimodal density estimation, including the requirement that the location of the mode must be explicitly defined, and, when α = 1, the L1 norm is not strictly convex, so solutions may not be unique. The biggest issue, however, is that the constraint functions may not always be convex functions of the sharpened data set, so the SQP could improperly converge to local optima, or in some cases may not converge at all (Wolters, 2012). Wolters (2012) attempted to remedy these issues by proposing a greedy algorithm which always converges to a sensible solution, does not require the location of the mode to be pre-specified, and requires less computing time than SQP; but, like SQP, the algorithm is sensitive to its starting values.

1.1.2 Density Estimation with Bernstein Polynomials

Bernstein polynomials were first studied by Bernstein in 1912, who introduced them to give a probabilistic proof of the Weierstrass Approximation Theorem. He showed that any continuous function f(x) on a closed interval [a, b] can be uniformly approximated using Bernstein polynomials by,

$$B_m(x, f) = \sum_{k=1}^{m} f\left(a + \frac{k-1}{m-1}(b-a)\right) \binom{m-1}{k-1} \left(\frac{x-a}{b-a}\right)^{k-1} \left(\frac{b-x}{b-a}\right)^{m-k}, \qquad (1.1)$$

for a ≤ x ≤ b. The Bernstein–Weierstrass Approximation Theorem assures that, as the degree of the polynomial increases to infinity, the Bernstein polynomial approximation converges uniformly to the true function, i.e., $\|B_m(\cdot, f) - f(\cdot)\|_\infty \equiv \sup_{a \le x \le b} |B_m(x, f) - f(x)| \to 0$ as $m \to \infty$ (Lorentz, 1986).
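As a quick numerical check of (1.1), the approximation can be evaluated directly. The function name and the choice of sin on [0, 1] as the test function are illustrative:

```python
import numpy as np
from scipy.special import comb

def bernstein_approx(f, m, x, a=0.0, b=1.0):
    """Evaluate the m-term Bernstein approximation (1.1) of f on [a, b],
    using the nodes a + (k-1)/(m-1)*(b-a), k = 1..m."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, m + 1)
    nodes = a + (k - 1) / (m - 1) * (b - a)
    t = (x[:, None] - a) / (b - a)                 # rescale to [0, 1]
    basis = comb(m - 1, k - 1) * t ** (k - 1) * (1 - t) ** (m - k)
    return basis @ f(nodes)                        # weighted sum over k

xs = np.linspace(0.0, 1.0, 101)
approx = bernstein_approx(np.sin, 200, xs)
err = np.max(np.abs(approx - np.sin(xs)))          # sup-norm error on the grid
```

Consistent with the uniform convergence result, the sup-norm error shrinks as m grows (roughly at rate O(1/m) for twice-differentiable f).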

Bernstein polynomials are an attractive approach to density estimation as they are the simplest example of a polynomial approximation with a probabilistic interpretation. They also naturally lead to estimators with acceptable behavior near the boundaries (Leblanc, 2010). Vitale (1975) was the first to propose using Bernstein polynomials to produce smooth density estimates.

Petrone (1999) to probability distributions with bounded supports.

This paper presents a method of unimodal density estimation using Bernstein polynomials. A density estimate is obtained using quadratic programming techniques to minimize a scaled squared distance between the Bernstein distribution function estimate, constrained to unimodality, and the ECDF of the data. Multiple approaches for selecting the degree of the polynomial are presented as well, along with a small simulation comparing the effectiveness of each approach. The performance of the proposed method when estimating densities of various supports and levels of skewness is assessed in a Monte Carlo simulation. We also show how the method handles data that are contaminated with outliers. A small section in Appendix A is dedicated to describing R code we provide for easy implementation of our method. Finally, we apply our method to two real data sets and discuss its performance compared to the traditional approach of assuming the data follow some unimodal parametric distribution.

### 1.2 Methodology

We begin by assuming Xi ~iid f(·), for i = 1, 2, …, n, and it is known that f(·) is a unimodal continuous density function. In other words, we assume that there exists an x* ∈ ℝ such that f(x) is non-decreasing on (−∞, x*) and f(x) is non-increasing on (x*, ∞). Our goal is to construct an estimate f̂(·) of f(·) which satisfies the following conditions:

(i) f̂(x) ≥ 0, for all x ∈ ℝ,

(ii) ∫ f̂(x) dx = 1,

(iii) f̂(x) is unimodal, i.e., there exists an x̂* ∈ ℝ such that f̂(x) is non-decreasing on (−∞, x̂*) and f̂(x) is non-increasing on (x̂*, ∞).

It is also desirable that the proposed density estimate, f̂(·), satisfies a set of asymptotic properties, e.g., consistency. Without loss of generality, we first consider densities with support [0, 1], and then extend our methodology to more general supports.

We begin by considering a Bernstein polynomial of order m − 1 to estimate f(x),

$$B_m(x, f) = \sum_{k=1}^{m} f\left(\frac{k-1}{m-1}\right) \binom{m-1}{k-1} x^{k-1} (1-x)^{m-k}. \qquad (1.2)$$

However, it is clear that B_m(x, f), defined in (1.2), is not a proper density function. We therefore consider a re-scaled version of B_m(x, f),

$$\tilde{B}_m(x, f) = \frac{B_m(x, f)}{\int_0^1 B_m(u, f)\,du} = \sum_{k=1}^{m} \frac{f\left(\frac{k-1}{m-1}\right)}{\sum_{j=1}^{m} f\left(\frac{j-1}{m-1}\right)}\, f_b(x; k, m-k+1), \qquad (1.3)$$

which is a proper density function for any value of m. This motivates us to consider the following class of density estimates,

$$f_m(x, \boldsymbol{\omega}) = \sum_{k=1}^{m} \omega_k f_b(x; k, m-k+1), \qquad (1.4)$$

where ω = (ω₁, ω₂, …, ω_m)ᵀ is a vector of weights of size m, and f_b(·; k, m−k+1) is the beta density function with shape parameters k and m−k+1. Notice that f_m(x, ω) will obtain all the desired properties of f̂(·) if the vector of weights, ω, satisfies the following constraints:

(a) ω_k ≥ 0, for k = 1, …, m,

(b) $\sum_{k=1}^{m} \omega_k = 1$, and

(c) ω₁ ≤ ω₂ ≤ ⋯ ≤ ω_{k*} ≥ ω_{k*+1} ≥ ⋯ ≥ ω_m, for some k* ∈ {1, 2, …, m}, i.e., the ω_k's are non-decreasing for k ≤ k* and non-increasing for k ≥ k*.

Constraint (a) implies property (i), as $\binom{m-1}{k-1} x^{k-1}(1-x)^{m-k}$ is non-negative for all values of x in [0, 1]. Integrating f_m(x, ω) over the interval [0, 1] shows that property (ii) holds when constraint (b) is satisfied. Finally, Carnicer and Peña (1993) showed that a Bernstein polynomial basis is "optimally" shape-preserving, such that the unimodal shape constraint on the weights, (c), will result in a unimodal density estimate, f_m(x, ω), satisfying property (iii).
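A small numerical illustration of (1.4) and constraints (a)–(c): with a weight vector that is nonnegative, sums to one, and rises then falls (here with k* = 4), the resulting beta mixture integrates to one and is unimodal on [0, 1]. The particular weight vector is illustrative:

```python
import numpy as np
from scipy.stats import beta

m = 7
w = np.array([0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05])   # satisfies (a)-(c), k* = 4
x = np.linspace(0.0, 1.0, 1001)

# f_m(x, w) = sum_k w_k * Beta(k, m - k + 1) density, as in (1.4)
f = sum(w[k - 1] * beta.pdf(x, k, m - k + 1) for k in range(1, m + 1))

mass = np.sum(f) * (x[1] - x[0])   # ~1 by constraint (b)
mode_idx = int(np.argmax(f))       # single interior mode expected
```

By the Carnicer and Peña (1993) shape-preservation result cited above, the density values rise monotonically up to the mode and fall monotonically after it.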

Our next goal is to estimate ω and k* for a given m, and then select m by some information-theoretic and related criteria. Let F_n(x) = (1/n) Σ_{i=1}^{n} I(x_i ≤ x) be the ECDF for some observed independent and identically distributed (i.i.d.) data {x₁, …, x_n} from f(·). One obvious estimation method is to simply set ω̃_k = F_n(k/(m−1)) − F_n((k−1)/(m−1)) (Leblanc, 2010) and let k̂* = argmax_{1≤k≤m−1} ω̃_k. Using the results established by Babu et al. (2002), it follows that f_m(x, ω̃) is consistent, with respect to the sup-norm L_∞, for f(x) as m → ∞ and n → ∞, such that 2 ≤ m ≤ (n/log n).

It can also be shown that x̂*_m = argmax_{x∈[0,1]} f_m(x, ω̃) is consistent for x* as m → ∞ and n → ∞, satisfying the above conditions, using the continuity and monotonicity properties of f(x) on [0, x*) and (x*, 1]. Although the above estimate, (ω̃, k̂*), is consistent for estimating f_m(x, ω), given by (1.4), ω̃ is not guaranteed to satisfy condition (c), and hence f_m(x, ω̃) is not necessarily a unimodal density satisfying property (iii).

Motivated by this result, we propose the following criterion to estimate ω: find ω̂ that minimizes

$$\sum_{i=1}^{n} \frac{n\left[F_n(x_i) - \tilde{F}_m(x_i, \boldsymbol{\omega})\right]^2}{(F_n(x_i) + \varepsilon_n)(1 + \varepsilon_n - F_n(x_i))}, \qquad (1.5)$$

subject to the restrictions (a)–(c) with k* = k̂*, and

$$\tilde{F}_m(x, \boldsymbol{\omega}) = \int_0^x f_m(u, \boldsymbol{\omega})\,du = \sum_{k=1}^{m} \omega_k F_b(x; k, m-k+1),$$

where F_b(·; k, m−k+1) is the cdf of a beta distribution with shape parameters k and m−k+1. The small nudge factor, ε_n = 3/(8n), is added to avoid numerical instabilities, following the second-order corrections suggested by Anscombe and Aumann (1963).

It can be easily shown that the optimization problem in (1.5) can be solved by quadratic programming techniques subject to linear inequality constraints. More specifically, (1.5) can be written as: minimize

$$-\mathbf{b}^{T}\boldsymbol{\omega} + \frac{1}{2}\,\boldsymbol{\omega}^{T} A\, \boldsymbol{\omega} + c, \qquad (1.6)$$

subject to Rω ≥ d, where A, b, c, R, and d are given in Appendix Section A.1.
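A sketch of fitting the constrained weights is below. The dissertation solves (1.6) with a dedicated quadratic-programming routine and the matrices of Appendix A.1; here scipy's general-purpose SLSQP solver, a k* fixed in advance, and the function name `fit_unimodal_weights` are stand-ins for illustration:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

def fit_unimodal_weights(x, m, k_star):
    """Minimize the weighted criterion (1.5) over the simplex with the
    unimodal ordering (c) fixed at k_star (1-based peak index)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    Fn = np.arange(1, n + 1) / n                    # ECDF at the order statistics
    eps = 3.0 / (8.0 * n)                           # nudge factor from (1.5)
    wts = n / ((Fn + eps) * (1.0 + eps - Fn))       # per-observation weights
    X = np.column_stack([beta.cdf(x, k, m - k + 1) for k in range(1, m + 1)])

    def obj(w):
        r = Fn - X @ w
        return np.sum(wts * r * r)

    cons = [{'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0}]
    # ordering (c): w_1 <= ... <= w_{k*} >= ... >= w_m
    for k in range(m - 1):
        sgn = 1.0 if k + 1 < k_star else -1.0
        cons.append({'type': 'ineq',
                     'fun': lambda w, k=k, s=sgn: s * (w[k + 1] - w[k])})

    res = minimize(obj, np.full(m, 1.0 / m), method='SLSQP',
                   constraints=cons, bounds=[(0.0, 1.0)] * m)
    return res.x

rng = np.random.default_rng(2)
w_hat = fit_unimodal_weights(rng.beta(3, 3, size=200), m=8, k_star=4)
```

Since the objective is a convex quadratic and all constraints are linear, a general solver like SLSQP reaches the same optimum as a dedicated QP routine, only more slowly.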

Our methodology can be extended to more general supports [a, b] using the simple linear transformation u = (x − a)/(b − a), resulting in the following expression for the density estimate,

$$f_m(x, \boldsymbol{\omega}) = \frac{1}{b-a}\sum_{k=1}^{m} \omega_k\, f_b\!\left(\frac{x-a}{b-a};\, k,\, m-k+1\right). \qquad (1.7)$$

When no information is known about the support of f(·), a = x_(1) − s/√n and b = x_(n) + s/√n provide reasonable bounds for the estimated density, where x_(1) and x_(n) are the first and last order statistics of the data, and s is the sample standard deviation. These bounds are motivated by the fact that

$$\Pr\!\left(X_{(1)} - \frac{s}{\sqrt{n}} \le X_{n+1} \le X_{(n)} + \frac{s}{\sqrt{n}}\right) \ge \Pr\!\left(X_{(1)} \le X_{n+1} \le X_{(n)}\right) = \frac{n-1}{n+1},$$

with justification provided in Appendix Section A.2.

An alternative method for estimating the support of the density uses ideas presented in de Carvalho (2011, 2012), who provides methodology for creating confidence intervals for the extreme values of functions. Using his results we create intervals for the minimum and maximum bounds of the support of the density, defined as,

$$\left(X_{(1)} - \frac{X_{(2)} - X_{(1)}}{(1-p)^{-2} - 1},\; X_{(1)}\right) \quad\text{and}\quad \left(X_{(n)},\; X_{(n)} + \frac{X_{(n)} - X_{(n-1)}}{(1-p)^{-2} - 1}\right),$$

respectively, where p denotes the level of significance of the confidence interval. We use the most extreme values of each of the intervals to create an estimate of the support, defined as,

$$\left(X_{(1)} - \frac{X_{(2)} - X_{(1)}}{(1-p)^{-2} - 1},\; X_{(n)} + \frac{X_{(n)} - X_{(n-1)}}{(1-p)^{-2} - 1}\right),$$

with p set to the reasonable value of 0.05. A small simulation study, not reported, shows that the de Carvalho (2011, 2012) method tends to select wider and more conservative supports than the bounds of (X_(1) − s/√n, X_(n) + s/√n).
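Both support rules can be computed in a few lines; the function name and the gamma-distributed test sample below are illustrative:

```python
import numpy as np

def support_bounds(x, p=0.05):
    """Two support estimates from Section 1.2: the s/sqrt(n) rule and the
    de Carvalho (2011, 2012) spacing-based rule at significance level p."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    s = x.std(ddof=1)
    rule_sd = (x[0] - s / np.sqrt(n), x[-1] + s / np.sqrt(n))
    c = (1.0 - p) ** (-2) - 1.0                     # denominator (1-p)^(-2) - 1
    rule_dc = (x[0] - (x[1] - x[0]) / c,
               x[-1] + (x[-1] - x[-2]) / c)
    return rule_sd, rule_dc

rng = np.random.default_rng(3)
xs = rng.gamma(shape=16.0, scale=0.5, size=100)
(sd_lo, sd_hi), (dc_lo, dc_hi) = support_bounds(xs)
```

At p = 0.05 the spacing denominator is about 0.108, so the de Carvalho bounds extend each extreme spacing by roughly a factor of nine, consistent with the conservative behavior noted above.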

We are able to obtain standard errors for the density estimate along a fixed grid of values by implementing the common bootstrap sampling technique (Efron, 1979). This technique can also be used to obtain standard errors for the estimate of the mode of the density as well as the number of weights. We illustrate the use of the bootstrap method to create 95% confidence interval bands for the density estimates in the data analysis portion of this chapter in Section 1.4.
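A sketch of the bootstrap band construction: resample the data with replacement, refit the density on a fixed grid, and take pointwise standard deviations and percentiles. A Gaussian KDE stands in here for the Bernstein polynomial fit, purely to keep the sketch self-contained; the function name is illustrative:

```python
import numpy as np
from scipy.stats import gaussian_kde

def bootstrap_density_bands(x, grid, B=200, seed=0):
    """Pointwise standard errors and 95% bootstrap bands for a density
    estimate on a grid (Efron, 1979).  Any estimator can be plugged in;
    a Gaussian KDE is used as a stand-in here."""
    rng = np.random.default_rng(seed)
    n = len(x)
    boot = np.empty((B, len(grid)))
    for b in range(B):
        xb = rng.choice(x, size=n, replace=True)    # resample with replacement
        boot[b] = gaussian_kde(xb)(grid)            # refit on the bootstrap sample
    lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
    return boot.std(axis=0, ddof=1), lo, hi

rng = np.random.default_rng(4)
x = rng.normal(size=150)
grid = np.linspace(-3.0, 3.0, 61)
se, lo, hi = bootstrap_density_bands(x, grid)
```

The same resampling loop applies unchanged to the mode location or the selected number of weights: record the quantity of interest for each bootstrap fit and summarize.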

1.2.1 Selection of the Number of Weights

1.2.1.1 Condition Number (CN)

Before solving the minimization problem, the optimal number of weights, m, must be selected. The square matrix A from expression (1.6) must be positive definite, and the size of the matrix depends on m. If too many weights are selected, the matrix no longer remains positive definite due to numerical instabilities, and the minimization procedure cannot be completed, as quadratic programming techniques depend on the numerical invertibility of the A matrix. Ideally we would like to include a sufficient number of weights to properly estimate the density while keeping A numerically positive definite.

We present a novel procedure for selecting the number of weights by examining the condition number of A. Since A is a normal matrix, the condition number is evaluated by

$$\mathrm{CN}(m) = \left|\frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}\right|,$$

where λ_max(A) and λ_min(A) are the maximum and minimum eigenvalues of A, respectively. We select m by including the largest number of weights possible while still bounding log₁₀ CN(m) by √n.
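The selection rule can be sketched as follows. Since the exact A is defined in Appendix A.1, the Gram matrix XᵀWX of the weighted least-squares problem (1.5) is used below as an assumed stand-in with the same conditioning behavior; the function name is illustrative:

```python
import numpy as np
from scipy.stats import beta

def select_m_by_cn(x, m_max=40):
    """Largest m (up to m_max) with log10(cond(A)) <= sqrt(n), where A is
    taken as the Gram matrix X' W X of the weighted criterion (1.5)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    Fn = np.arange(1, n + 1) / n
    eps = 3.0 / (8.0 * n)
    W = n / ((Fn + eps) * (1.0 + eps - Fn))         # weights from (1.5)
    best = 2
    for m in range(2, m_max + 1):
        X = np.column_stack([beta.cdf(x, k, m - k + 1)
                             for k in range(1, m + 1)])
        A = X.T @ (W[:, None] * X)
        if np.log10(np.linalg.cond(A)) <= np.sqrt(n):
            best = m                                # keep the largest admissible m
    return best

rng = np.random.default_rng(5)
m_cn = select_m_by_cn(rng.beta(4, 4, size=100))
```

The beta-cdf columns become increasingly collinear as m grows, so the condition number rises steeply and the √n bound eventually rules out further weights.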

1.2.1.2 AIC and BIC

In the context of density estimation, these criteria are given by,

$$\mathrm{AIC}(m) = -2\sum_{i=1}^{n} \log\!\left[\hat{f}_m(x_i, \hat{\boldsymbol{\omega}}_m)\right] + 2(m-1), \qquad (1.8)$$

$$\mathrm{BIC}(m) = -2\sum_{i=1}^{n} \log\!\left[\hat{f}_m(x_i, \hat{\boldsymbol{\omega}}_m)\right] + \log(n)(m-1), \qquad (1.9)$$

where m is the number of weights, f̂_m is the estimated density using m weights, ω̂_m is the vector of estimated weights, and x_i for i = 1, …, n are the observations. The effective dimension of the weight vector ω̂_m is m − 1 since the weights are constrained to sum to one.

A variation of Theorem 3.1 in Babu et al. (2002) shows that f̂_m will converge uniformly to f for 2 ≤ m ≤ (n/log n) as n → ∞, so we implement both the AIC and BIC methods by estimating the density with m weights ranging from 2 to ⌊n/log n⌋. The AIC or BIC is calculated for each fit, and the density estimate with the lowest AIC or BIC is selected as the best estimate.
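The selection loop can be sketched as follows. To keep the sketch self-contained, the simple ECDF-increment weights ω̃ (Leblanc, 2010) described earlier stand in for the constrained least-squares fit that the method actually uses; the function name is illustrative:

```python
import numpy as np
from scipy.stats import beta

def select_m_ic(x):
    """AIC/BIC selection of m over 2 <= m <= floor(n/log n), eqs. (1.8)-(1.9),
    with ECDF-increment weights standing in for the constrained fit."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    best = {'AIC': (np.inf, 2), 'BIC': (np.inf, 2)}
    for m in range(2, int(n / np.log(n)) + 1):
        edges = np.arange(m + 1) / (m - 1)          # 0, 1/(m-1), ..., m/(m-1)
        Fn = np.searchsorted(xs, edges, side='right') / n
        w = np.diff(Fn)                             # w_k = Fn(k/(m-1)) - Fn((k-1)/(m-1))
        dens = np.zeros(n)
        for k in range(1, m + 1):
            dens += w[k - 1] * beta.pdf(xs, k, m - k + 1)
        ll = np.sum(np.log(dens))
        aic = -2.0 * ll + 2.0 * (m - 1)             # eq. (1.8)
        bic = -2.0 * ll + np.log(n) * (m - 1)       # eq. (1.9)
        if aic < best['AIC'][0]:
            best['AIC'] = (aic, m)
        if bic < best['BIC'][0]:
            best['BIC'] = (bic, m)
    return best['AIC'][1], best['BIC'][1]

rng = np.random.default_rng(6)
m_aic, m_bic = select_m_ic(rng.beta(3, 3, size=100))
```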

1.2.2 Comparison of Selection Criteria

We compare these three methods of selecting m by performing a small simulation study. We generate data from a mixture of beta densities with a fixed vector of weights, which follows the required format of our density estimation model. We then generate density estimates using the AIC, BIC, and CN criteria to select m. Our goal is to find the method which provides the best density estimate in terms of a low root mean integrated squared error (RMISE) and selects the true number of weights of the beta mixture. The RMISE is approximated as follows,

$$\mathrm{RMISE} = E\!\left[\left\|\hat{f}_{\hat{m}} - f\right\|_2\right] \approx \frac{1}{N}\sum_{\ell=1}^{N}\sqrt{\,d\sum_{j=1}^{J}\left(\hat{f}^{(\ell)}_{m_\ell}(x_j) - f(x_j)\right)^2}, \qquad (1.10)$$

where J = 100, Pr[x₁ ≤ X ≤ x_J] = 0.999, and d = (x_j − x_{j−1}) = (x_J − x₁)/(J − 1) for all j ≥ 2. In the above expression, f̂^{(ℓ)}_{m_ℓ} denotes the estimated density at the ℓth sample, for ℓ = 1, …, N, with m_ℓ chosen by one of the three criteria.

We examine a symmetric and a right-skewed beta mixture, each with 7 weights, i.e., m = 7. The weight vectors are fixed to be ω₁ = {0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05} for the symmetric distribution, and ω₂ = {0.1, 0.4, 0.25, 0.15, 0.05, 0.03, 0.02} for the right-skewed distribution. We generate N = 1,000 Monte Carlo (MC) samples of size n = 15, 30, 100, and 500 for each beta mixture. Table 1.1 displays the MC estimated RMISE values as well as the average estimate of m for each criterion.

Table 1.1: RMISE values ×10 and average estimate of m for 1,000 MC samples, with MC standard errors displayed in parentheses.

RMISE:

|     | Symmetric, n=15 | n=30 | n=100 | n=500 | Right-Skewed, n=15 | n=30 | n=100 | n=500 |
|-----|------|------|------|------|------|------|------|------|
| CN  | 3.631 (0.049) | 2.718 (0.029) | 1.772 (0.018) | 0.999 (0.008) | 3.943 (0.049) | 2.807 (0.034) | 1.914 (0.020) | 1.051 (0.009) |
| AIC | 3.673 (0.047) | 2.542 (0.033) | 1.347 (0.017) | 0.749 (0.007) | 3.944 (0.046) | 3.080 (0.028) | 1.871 (0.025) | 0.712 (0.010) |
| BIC | 3.680 (0.047) | 2.548 (0.033) | 1.293 (0.017) | 0.752 (0.006) | 3.943 (0.045) | 3.102 (0.026) | 2.302 (0.023) | 0.773 (0.014) |

Average number of weights:

|     | Symmetric, n=15 | n=30 | n=100 | n=500 | Right-Skewed, n=15 | n=30 | n=100 | n=500 |
|-----|------|------|------|------|------|------|------|------|
| CN  | 3.882 (0.010) | 5.934 (0.008) | 12.235 (0.014) | 24.223 (0.046) | 3.799 (0.013) | 5.744 (0.014) | 11.867 (0.016) | 24.082 (0.043) |
| AIC | 3.030 (0.006) | 3.125 (0.012) | 3.411 (0.026) | 3.977 (0.034) | 3.031 (0.007) | 3.223 (0.022) | 4.429 (0.037) | 5.226 (0.022) |
| BIC | 3.010 (0.003) | 3.046 (0.007) | 3.085 (0.011) | 3.203 (0.018) | 3.019 (0.005) | 3.034 (0.008) | 3.571 (0.027) | 4.883 (0.018) |

The AIC method has the lowest average RMISE in most situations. Unfortunately, none of the methods produce average estimates of the number of weights that are close to 7. The CN method appears to select the number of weights at around √n, which is much larger than 7 for the cases of n = 100 and 500. Both the AIC and BIC underestimate m for all sample sizes, with the AIC having slightly larger estimates. Overall, it appears the CN criterion is a better option for samples with fewer observations, while the AIC performs best for larger samples of size 100 and 500.

In an attempt to enable users to easily implement our method, we provide R code which is available from the author upon request. Our code features two functions: umd, which is the main function that executes our method, and umd.se, which calculates the standard errors for the density estimate using the bootstrap method as described earlier. We outline the inputs and returned values for each of the functions in Appendix Section A.3.

### 1.3 Simulation Study

We assess each method's performance estimating densities with three common supports: (−∞, ∞), [0, ∞), and [−1, 1]. Densities with left-skewed, right-skewed, and symmetric shapes are chosen for each support, with the exception of a symmetric density for the [0, ∞) case. We also compare the computation times for each method. In this section we present the results using the CN criterion to select m; we include the results for the AIC method in Appendix Section A.4, excluding the n = 500 case.

1.3.1 Density Estimation with Support (−∞,∞)

On the (−∞, ∞) support we select a normal density with μ = 0 and σ² = 4. We skew this normal distribution by multiplying the density function by the scaled cdf, F*(cx). In other words, the true density is given by,

$$f(x) = \frac{1}{4\pi} \exp\!\left(-\frac{x^2}{8}\right) \int_{-\infty}^{cx} \exp\!\left(-\frac{t^2}{8}\right) dt. \qquad (1.11)$$

We set c = 0, 2.2, and −2.2, resulting in symmetric, left-skewed, and right-skewed distributions with skewness factors of 0, 0.5, and −0.5, respectively.

1.3.2 Density Estimation with Support [0, ∞)

A gamma distribution with α = 16, β = 0.5, mean αβ = 8, and skewness 0.5 is used for a right-skewed distribution on the [0, ∞) support. A beta distribution, with α = 10 and β = 5.5, scaled by 25, is selected to simulate a left-skewed density on that same support. The resulting expression for the beta density function is,

$$f(x) = \frac{1}{25}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{x}{25}\right)^{\alpha-1}\left(1-\frac{x}{25}\right)^{\beta-1}, \qquad x \in (0, 25).$$

This distribution has a skewness factor of approximately −0.5 as well. When estimating each of these distributions we assume it is known that the data cannot have values less than 0, so the fix.lower parameter is set to 0.

1.3.3 Density Estimation with Support [−1, 1]

We select the less traditional half-circle density on the [−1, 1] support. We create this proper density by rescaling the half-circle function. This density is skewed by multiplying by the scaled cdf, resulting in the following density function,

$$f(x) = \frac{8}{\pi^2}\sqrt{1-x^2}\left[\frac{1}{2}\arcsin(cx) + \frac{cx}{2}\sqrt{1-(cx)^2} + \frac{\pi}{4}\right], \qquad -1 \le x \le 1. \qquad (1.12)$$

Symmetric and skewed distributions are obtained by setting c = 0, 1, and −1. The skewed distributions have skewness factors of approximately 0.4 and −0.4. The probability integral transformation is used to generate random samples from the symmetric distribution. We use rejection sampling with a proposal density proportional to 2√(1 − x²) to sample from the skewed distributions. We assume the true densities' support of −1 to 1 is known when applying our density estimation method, so the fix.lower and fix.upper parameters are set to −1 and 1, respectively.
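The rejection sampler for the skewed half-circle density can be sketched directly: the symmetric half-circle proposal is drawn as 2·Beta(3/2, 3/2) − 1, and the acceptance probability follows from the density ratio with envelope constant M = 2, which holds for |c| ≤ 1. The function name is illustrative:

```python
import numpy as np

def sample_skewed_halfcircle(n, c, rng):
    """Rejection sampler for the skewed half-circle density of Section 1.3.3,
    using the symmetric half-circle g(x) = (2/pi) sqrt(1 - x^2) as the
    proposal (drawn as 2*Beta(3/2, 3/2) - 1) with envelope constant M = 2."""
    out = np.empty(n)
    filled = 0
    while filled < n:
        x = 2.0 * rng.beta(1.5, 1.5, size=2 * (n - filled)) - 1.0
        # acceptance probability f(x) / (M g(x))
        acc = (2.0 / np.pi) * (0.5 * np.arcsin(c * x)
                               + 0.5 * c * x * np.sqrt(1.0 - (c * x) ** 2)
                               + np.pi / 4.0)
        keep = x[rng.random(x.size) < acc]
        take = min(keep.size, n - filled)
        out[filled:filled + take] = keep[:take]
        filled += take
    return out

rng = np.random.default_rng(7)
xs = sample_skewed_halfcircle(5000, c=1.0, rng=rng)
```

With M = 2 the expected acceptance rate is 1/M = 50%, and setting c = 1 shifts mass to the right, giving a positive sample mean.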

1.3.4 Evaluation Method

We evaluate the performance of the density estimates using the RMISE, outlined in equation (1.10), and the estimated expected value of two other popular functional norms:

$$\mathrm{ML_1E} = E\!\left[\left\|\hat{f}_{\hat{m}} - f\right\|_1\right] \approx \frac{1}{N}\sum_{\ell=1}^{N} d\sum_{j=1}^{J}\left|\hat{f}^{(\ell)}_{m_\ell}(x_j) - f(x_j)\right|,$$

$$\mathrm{ML_\infty E} = E\!\left[\left\|\hat{f}_{\hat{m}} - f\right\|_\infty\right] \approx \frac{1}{N}\sum_{\ell=1}^{N} \max_{x_1 \le x_j \le x_J}\left|\hat{f}^{(\ell)}_{m_\ell}(x_j) - f(x_j)\right|,$$

where J = 100, x₁ and x_J are chosen such that, based on the true density of X, Pr[x₁ ≤ X ≤ x_J] = 0.999, and d = (x_j − x_{j−1}) = (x_J − x₁)/(J − 1) for all j ≥ 2.
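The three discretized distances can be computed with a short helper; the function name and the N(0, 1.1²)-versus-N(0, 1) toy comparison below are illustrative:

```python
import numpy as np
from scipy.stats import norm

def error_norms(fhat_vals, f_vals, d):
    """Discretized L1, L2 (the RMISE integrand), and sup-norm distances
    between an estimated and a true density evaluated on a common grid."""
    diff = fhat_vals - f_vals
    l1 = d * np.sum(np.abs(diff))
    l2 = np.sqrt(d * np.sum(diff ** 2))
    linf = np.max(np.abs(diff))
    return l1, l2, linf

# grid covering the central 99.9% of the true N(0, 1) density, J = 100
x1, xJ = norm.ppf(0.0005), norm.ppf(0.9995)
grid = np.linspace(x1, xJ, 100)
d = (xJ - x1) / 99
l1, l2, linf = error_norms(norm.pdf(grid, scale=1.1), norm.pdf(grid), d)
```

In a Monte Carlo study these three values would be computed per sample and averaged over the N replicates, as in the expressions above.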
1.3.5 Results

We generate N = 1,000 MC samples of size n= 15,30,100 and 500. Tables 1.2, 1.3, and 1.4 display the results of the average expected norm values for each method across the MC samples for the supports of (−∞,∞), [0,∞), and [−1,1] respectively. The MC standard errors are displayed in parentheses, and all values are multiplied by 10. Table 1.5 displays the average computation time, in seconds, required by each method for all the scenarios presented in Tables 1.2, 1.3, and 1.4. The MC standard errors for the mean computation times are displayed in parentheses as well. The Wolters method is implemented using MATLAB and the online supplemental code and instructions provided by the publishing journal.

Table 1.2: Support (−∞, ∞): estimated expected value of functional norms ×10 for the Bernstein polynomial method using the CN criterion (UMD-CN) and the Wolters (2012) method (Wolters) across 1,000 MC samples for left-skewed, symmetric, and right-skewed distributions with samples of size 15, 30, 100, and 500. MC standard errors are displayed in parentheses.

| n | Method | Left ML₁E | Left RMISE | Left ML∞E | Sym ML₁E | Sym RMISE | Sym ML∞E | Right ML₁E | Right RMISE | Right ML∞E |
|---|---|---|---|---|---|---|---|---|---|---|
| 15 | UMD-CN | 3.683 (0.038) | 1.622 (0.017) | 1.296 (0.017) | 3.532 (0.037) | 1.299 (0.014) | 0.878 (0.012) | 3.671 (0.038) | 1.715 (0.019) | 1.591 (0.023) |
| 15 | Wolters | 3.028 (0.037) | 1.293 (0.016) | 0.950 (0.013) | 2.972 (0.039) | 1.046 (0.014) | 0.626 (0.010) | 2.983 (0.036) | 1.283 (0.016) | 0.958 (0.014) |
| 30 | UMD-CN | 2.501 (0.029) | 1.085 (0.013) | 0.846 (0.011) | 2.437 (0.028) | 0.870 (0.010) | 0.540 (0.007) | 2.509 (0.028) | 1.119 (0.013) | 0.934 (0.014) |
| 30 | Wolters | 2.328 (0.028) | 0.996 (0.012) | 0.732 (0.009) | 2.333 (0.028) | 0.819 (0.01) | 0.490 (0.007) | 2.413 (0.028) | 1.035 (0.012) | 0.770 (0.010) |
| 100 | UMD-CN | 1.425 (0.016) | 0.606 (0.007) | 0.450 (0.006) | 1.412 (0.016) | 0.494 (0.006) | 0.297 (0.004) | 1.451 (0.017) | 0.622 (0.008) | 0.472 (0.006) |
| 100 | Wolters | 1.502 (0.016) | 0.653 (0.007) | 0.507 (0.006) | 1.458 (0.016) | 0.519 (0.006) | 0.323 (0.004) | 1.521 (0.017) | 0.660 (0.007) | 0.508 (0.006) |
| 500 | UMD-CN | 0.741 (0.008) | 0.321 (0.004) | 0.245 (0.003) | 0.739 (0.008) | 0.263 (0.003) | 0.161 (0.002) | 0.731 (0.008) | 0.318 (0.004) | 0.243 (0.003) |
| 500 | Wolters | 0.817 (0.008) | 0.363 (0.004) | 0.303 (0.003) | 0.785 (0.008) | 0.286 (0.003) | 0.192 (0.002) | 0.816 (0.008) | 0.363 (0.004) | 0.302 (0.003) |

It can be seen in Table 1.4 that the Wolters method outperforms our method for almost all cases on the [−1, 1] support, with the exception of the ML∞E for the n = 500 case. This result is not surprising, since the half-circle density has a less traditional shape, and the Bernstein polynomials are not as flexible as a kernel-density-based method such as Wolters, and therefore less equipped to accommodate the density's odd shape. Our method, however, does provide competitive results for sample sizes of 100 and 500, and Table 1.5 shows that our method is about 14 times faster for the n = 100 case and about 17 times faster for the n = 500 case.

Figures 1.1, 1.2, and 1.3 display plots of the average density estimates over the 1,000 MC samples for each sample size for both methods. The plots are consistent with the numeric results, such that the Wolters method provides more accurate density estimates for sample sizes of 15 and 30, but our method is superior for the larger samples of 100 and 500, with the exception of the densities on the [−1, 1] support. Both methods adeptly estimated the location of the mode for all proposed cases.

1.3.6 Outliers

Table 1.3: Support [0, ∞): estimated expected value of functional norms ×10 for the Bernstein polynomial method using the CN criterion (UMD-CN) and the Wolters (2012) method (Wolters) across 1,000 MC samples for left-skewed and right-skewed distributions with samples of size 15, 30, 100, and 500. MC standard errors are displayed in parentheses.

| n | Method | Left ML₁E | Left RMISE | Left ML∞E | Right ML₁E | Right RMISE | Right ML∞E |
|---|---|---|---|---|---|---|---|
| 15 | UMD-CN | 3.553 (0.041) | 1.316 (0.015) | 0.882 (0.012) | 3.508 (0.036) | 1.347 (0.015) | 1.020 (0.014) |
| 15 | Wolters | 2.965 (0.036) | 1.053 (0.013) | 0.628 (0.009) | 3.005 (0.038) | 1.06 (0.014) | 0.647 (0.010) |
| 30 | UMD-CN | 2.335 (0.028) | 0.848 (0.010) | 0.545 (0.007) | 2.466 (0.028) | 0.912 (0.011) | 0.625 (0.009) |
| 30 | Wolters | 2.313 (0.028) | 0.825 (0.010) | 0.500 (0.007) | 2.345 (0.027) | 0.831 (0.010) | 0.511 (0.007) |
| 100 | UMD-CN | 1.468 (0.016) | 0.528 (0.006) | 0.325 (0.004) | 1.451 (0.017) | 0.520 (0.006) | 0.332 (0.004) |
| 100 | Wolters | 1.481 (0.016) | 0.533 (0.006) | 0.333 (0.004) | 1.522 (0.016) | 0.546 (0.006) | 0.345 (0.004) |
| 500 | UMD-CN | 0.767 (0.008) | 0.281 (0.003) | 0.182 (0.002) | 0.731 (0.008) | 0.265 (0.003) | 0.168 (0.002) |
| 500 | Wolters | 0.817 (0.007) | 0.298 (0.003) | 0.199 (0.002) | 0.808 (0.007) | 0.295 (0.003) | 0.200 (0.002) |

the mixture proportion to 95%, resulting in the following normal mixture,

$$p\,N(0,1) + (1-p)\,N(4, 0.5), \qquad (1.13)$$

where p ∼ Bernoulli(0.95).

We generate 1,000 MC samples from the mixture distribution with sizes n = 15, 30, and 100. Figure 1.4 displays the average density estimates across the 1,000 MC samples. Assuming the data are truly from a N(0,1) distribution, the 5% contamination does not completely compromise the density estimate. For the n = 100 sample, the density estimate still properly locates the mode at 0, and there is only a slightly heavier tail on the side of the contaminating distribution.

### 1.4 Real Data Examples

1.4.1 Suicide Data

Table 1.4: Support [−1, 1]: estimated expected value of functional norms ×10 for the Bernstein polynomial method using the CN criterion (UMD-CN) and the Wolters (2012) method (Wolters) across 1,000 MC samples for left-skewed, symmetric, and right-skewed distributions with samples of size 15, 30, 100, and 500. MC standard errors are displayed in parentheses.

| n | Method | Left ML₁E | Left RMISE | Left ML∞E | Sym ML₁E | Sym RMISE | Sym ML∞E | Right ML₁E | Right RMISE | Right ML∞E |
|---|---|---|---|---|---|---|---|---|---|---|
| 15 | UMD-CN | 3.168 (0.042) | 2.826 (0.035) | 5.508 (0.072) | 2.961 (0.046) | 2.554 (0.035) | 4.690 (0.058) | 2.911 (0.041) | 2.820 (0.038) | 7.029 (0.102) |
| 15 | Wolters | 2.674 (0.036) | 2.321 (0.030) | 3.925 (0.041) | 2.378 (0.034) | 1.944 (0.027) | 2.877 (0.033) | 2.671 (0.036) | 2.328 (0.03) | 3.963 (0.040) |
| 30 | UMD-CN | 2.249 (0.027) | 2.038 (0.023) | 4.098 (0.060) | 2.253 (0.030) | 1.939 (0.023) | 3.641 (0.041) | 2.260 (0.029) | 2.127 (0.026) | 5.274 (0.080) |
| 30 | Wolters | 2.023 (0.026) | 1.787 (0.021) | 3.36 (0.026) | 1.869 (0.024) | 1.546 (0.018) | 2.409 (0.019) | 2.079 (0.027) | 1.832 (0.022) | 3.378 (0.027) |
| 100 | UMD-CN | 1.502 (0.016) | 1.353 (0.014) | 2.793 (0.038) | 1.393 (0.017) | 1.209 (0.013) | 2.466 (0.027) | 1.481 (0.016) | 1.374 (0.014) | 3.425 (0.049) |
| 100 | Wolters | 1.350 (0.014) | 1.224 (0.012) | 2.748 (0.016) | 1.232 (0.013) | 1.040 (0.010) | 1.882 (0.011) | 1.359 (0.014) | 1.230 (0.011) | 2.762 (0.016) |
| 500 | UMD-CN | 0.783 (0.007) | 0.719 (0.007) | 1.576 (0.022) | 0.771 (0.007) | 0.673 (0.006) | 1.414 (0.017) | 0.804 (0.007) | 0.755 (0.007) | 1.985 (0.032) |
| 500 | Wolters | 0.783 (0.007) | 0.753 (0.005) | 2.287 (0.009) | 0.731 (0.006) | 0.640 (0.004) | 1.450 (0.006) | 0.788 (0.006) | 0.757 (0.005) | 2.309 (0.009) |

distributions. Since these data are measured in numbers of days, we estimate the density with a fixed minimum of 0 for the support, i.e., fix.lower = 0, as negative values of days are not practical. The AIC method selects 7 as the optimal number of weights, whereas the CN criterion selects 10. Despite this difference in the number of weights, the two estimates are almost identical, so only the CN criterion estimate is reported.
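The fitting step behind these estimates is a least-squares fit of the mixing weights subject to linear inequality (unimodality) constraints, solved here for each candidate mode index. The following is a minimal Python sketch, not the dissertation's R implementation: it substitutes scipy's SLSQP solver for a dedicated quadratic-programming routine, assumes the data have already been rescaled to [0, 1], and uses an unweighted least-squares criterion for simplicity.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta

def fit_unimodal_beta_mixture(x, N=8):
    """Fit weights w for the mixture sum_k w_k * Beta(k, N-k+1) by least
    squares against the empirical CDF, with w >= 0, sum(w) = 1, and the
    weight sequence constrained to be unimodal (each mode index m is tried)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    Fn = np.arange(1, n + 1) / n                      # empirical CDF at the data
    B = np.column_stack([beta.cdf(x, k, N - k + 1)    # Beta(k, N-k+1) CDF basis
                         for k in range(1, N + 1)])

    def sse(w):
        r = Fn - B @ w
        return r @ r

    best = None
    for m in range(N):                                # candidate mode of the weights
        cons = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
        # w_1 <= ... <= w_{m+1} >= ... >= w_N
        for k in range(N - 1):
            s = 1.0 if k < m else -1.0
            cons.append({"type": "ineq",
                         "fun": lambda w, k=k, s=s: s * (w[k + 1] - w[k])})
        res = minimize(sse, np.full(N, 1.0 / N), method="SLSQP",
                       bounds=[(0.0, 1.0)] * N, constraints=cons)
        if best is None or res.fun < best.fun:
            best = res
    return best.x

def mixture_pdf(w, t):
    """Density of the fitted beta mixture on [0, 1]."""
    N = len(w)
    return sum(wk * beta.pdf(t, k, N - k + 1) for k, wk in enumerate(w, 1))
```

Because each fixed mode index yields a convex problem, looping over the N candidate modes and keeping the best fit recovers the globally optimal unimodal weight vector.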

Table 1.5: Average computation times (in seconds) for the Bernstein polynomial method using the CN criterion (UMD-CN) and the Wolters (2012) method (Wolters) across 1,000 MC samples for left-skewed, symmetric, and right-skewed distributions on supports of (−∞, ∞), [0, ∞), and [−1, 1] with samples of size 15, 30, 100, and 500. MC standard errors are displayed in parentheses.

           Left-Skewed              Symmetric                Right-Skewed
           UMD-CN    Wolters        UMD-CN    Wolters        UMD-CN    Wolters
(−∞, ∞)
n=15       0.00186   0.01598        0.00184   0.01489        0.00178   0.01540
           (0.00005) (0.00067)      (0.00002) (0.00111)      (0.00002) (0.00023)
n=30       0.00274   0.15428        0.00268   0.02809        0.00267   0.03083
           (0.00003) (0.00079)      (0.00002) (0.00032)      (0.00003) (0.00033)
n=100      0.01310   0.14897        0.01305   0.14298        0.01274   0.15540
           (0.00005) (0.00075)      (0.00005) (0.00063)      (0.00006) (0.00075)
n=500      0.26028   3.58696        0.26245   3.61415        0.25799   3.70024
           (0.00132) (0.00942)      (0.00129) (0.00878)      (0.00135) (0.01239)
[0, ∞)
n=15       0.00174   0.01475        -         -              0.00176   0.01430
           (0.00001) (0.00025)      -         -              (0.00002) (0.00021)
n=30       0.00258   0.03059        -         -              0.00253   0.03240
           (0.00002) (0.00044)      -         -              (0.00002) (0.00041)
n=100      0.01306   0.15315        -         -              0.01275   0.15451
           (0.00005) (0.00089)      -         -              (0.00010) (0.00085)
n=500      0.26328   3.79077        -         -              0.25254   3.73938
           (0.00136) (0.01108)      -         -              (0.00137) (0.01200)
[−1, 1]
n=15       0.00190   0.01545        0.00186   0.01578        0.00180   0.01590
           (0.00004) (0.00025)      (0.00002) (0.00028)      (0.00002) (0.00027)
n=30       0.00252   0.03223        0.00257   0.03503        0.00253   0.03285
           (0.00002) (0.00038)      (0.00002) (0.00054)      (0.00002) (0.00044)
n=100      0.01176   0.16727        0.01240   0.17889        0.01143   0.16518
           (0.00002) (0.00139)      (0.00005) (0.00213)      (0.00003) (0.00155)
n=500      0.26018   4.29922        0.27321   4.63595        0.26156   4.36435
           (0.00097) (0.01981)      (0.00097) (0.03912)      (0.00105) (0.02676)

1.4.2 S&P 500 Log Returns Data Set

We also explore a data set of the Standard and Poor's 500 daily log returns for the years 2008 to 2010. These data were acquired from the Yahoo Finance webpage (http://finance.yahoo.com/q?s=%5EGSPC, accessed: 11/15/2012). The S&P 500 is one of the most commonly followed

[Figure 1.1 appears here: a 4 × 3 grid of density plots with rows n = 15, 30, 100, 500 and columns Left-Skewed, Symmetric, Right-Skewed; each panel has an x-axis from −6 to 6 and a y-axis from 0.00 to 0.30.]

Figure 1.1: Support (−∞, ∞): plots of average estimated densities for the Bernstein polynomial method using the CN criterion (dashed line) and the Wolters method (dotted line) across 1,000 MC samples for samples of size 15, 30, 100, and 500, along with the true density curve (solid line) for each type of skewness.