
Prediction of Death for Extremely Low Birth Weight Neonates

Namasivayam Ambalavanan, MD*; Waldemar A. Carlo, MD*; Georgiy Bobashev, PhD‡; Erin Mathias, BS‡; Bing Liu, MS‡; Kenneth Poole, PhD‡; Avroy A. Fanaroff, MB, BCh§; Barbara J. Stoll, MD‖; Richard Ehrenkranz, MD¶; and Linda L. Wright, MD#, for the National Institute of Child Health and Human Development Neonatal Research Network

ABSTRACT. Objective. To compare multiple logistic regression and neural network models in predicting death for extremely low birth weight neonates at 5 time points with cumulative data sets, as follows: scenario A, limited prenatal data; scenario B, scenario A plus additional prenatal data; scenario C, scenario B plus data from the first 5 minutes after birth; scenario D, scenario C plus data from the first 24 hours after birth; scenario E, scenario D plus data from the first 1 week after birth.

Methods. Data for all infants with birth weights of 401 to 1000 g who were born between January 1998 and April 2003 in 19 National Institute of Child Health and Human Development Neonatal Research Network centers were used (n = 8608). Twenty-eight variables were selected for analysis (3 for scenario A, 15 for scenario B, 20 for scenario C, 25 for scenario D, and 28 for scenario E) from those collected routinely. Data sets censored for prior death or missing data were created for each scenario and divided randomly into training (70%) and test (30%) data sets. Logistic regression and neural network models for predicting subsequent death were created with training data sets and evaluated with test data sets. The predictive abilities of the models were evaluated with the area under the curve of the receiver operating characteristic curves.

Results. The data sets for scenarios A, B, and C were similar, and prediction was best with scenario C (area under the curve: 0.85 for regression; 0.84 for neural networks), compared with scenarios A and B. The logistic regression and neural network models performed similarly well for scenarios A, B, D, and E, but the regression model was superior for scenario C.

Conclusions. Prediction of death is limited even with sophisticated statistical methods such as logistic regression and nonlinear modeling techniques such as neural networks. The difficulty of predicting death should be acknowledged in discussions with families and caregivers about decisions regarding initiation or continuation of care. Pediatrics 2005;116:1367–1373; logistic models, neural networks (computer), predictive value, receiver operating characteristic curve.

ABBREVIATIONS. ELBW, extremely low birth weight; ROC, receiver operating characteristic; AUC, area under the curve; SNAP, Score for Neonatal Acute Physiology; PPV, positive predictive value; NPV, negative predictive value.

Extremely low birth weight (ELBW) infants continue to have a disproportionately high mortality rate, compared with larger, more mature infants, despite advances in perinatal and neonatal care.1–3 Decisions regarding initiation or continuation of support, as well as decisions regarding aggressiveness of management options, are difficult for many of these infants, and guidelines have been developed to assist with clinical management4 and counseling of families5 around the time of birth. These guidelines are dependent on the best estimate of gestational age. However, many prenatal and postnatal factors associated with outcomes (eg, multiple gestation, Apgar scores, birth weight, gender, and prenatal steroid use)3,6 modify the risk of death for individual neonates. The risk of death also changes with postnatal age, because ELBW neonates who survive beyond the first days of life have a higher likelihood of survival.7 Therefore, during consideration of management options and counseling at different times, such as just before birth, just after birth, and in the days after birth, it is necessary to take these additional factors and the preceding clinical course into account. Prediction of death is also useful for auditing or benchmarking, comparison of outcomes among NICUs, controlling for population differences during clinical trials, and evaluation of resource utilization.3,8,9

Clinical intuition, scoring systems such as the Score for Neonatal Acute Physiology (SNAP),10 regression analyses,3,11 and nonlinear statistical models such as neural networks11,12 have been evaluated previously for the prediction of death but have not been shown to have sufficiently high sensitivity and specificity for clinical purposes. Neural networks, more properly called “artificial neural networks,” are nonparametric, pattern-recognition techniques that can recognize complex nonlinear relationships or “hidden patterns” between independent and dependent variables, as well as possible interactions between independent variables.13,14 It is possible that, with a sufficiently large sample size and high-quality data, novel and clinically important models using either regression models or neural networks for the prediction of death among extremely premature neonates can be developed.

From the *Department of Pediatrics, University of Alabama at Birmingham, Birmingham, Alabama; ‡Research Triangle Institute, Research Triangle Park, North Carolina; §Department of Pediatrics, Case Western Reserve University, Cleveland, Ohio; ‖Department of Pediatrics, Emory University, Atlanta, Georgia; ¶Department of Pediatrics, Yale University, New Haven, Connecticut; and #National Institute of Child Health and Human Development Neonatal Research Network, Bethesda, Maryland.

Accepted for publication Feb 28, 2005. doi:10.1542/peds.2004-2099

No conflict of interest declared.

The aim of this study was to develop and to compare multiple logistic regression and neural network models for the prediction of ELBW death in multiple scenarios at different time points, using only prenatal data (with either a limited or expanded set of variables) or adding data available soon after birth, data available after completion of the first 24 hours of life, or data available at the end of the first 1 week of life. It was hypothesized that the best prediction models would be those using data from just after birth, because most of the deaths occur in the first days of life and are associated with variables known soon after birth. It was also hypothesized that nonparametric, pattern-recognition techniques such as neural networks would prove superior to standard logistic regression models.

METHODS

Study Centers and Population

Data for all live-born infants with birth weights of 401 to 1000 g who were born between January 1, 1998, and April 9, 2003, and admitted to the 19 centers of the National Institute of Child Health and Human Development Neonatal Research Network were included in this study. The data analyzed are collected routinely and systematically, stored in a database, and used for surveillance of the care and outcomes for high-risk infants in NICUs. The identity of the patients is kept highly confidential. The collection of data for the Neonatal Research Network had been approved by the institutional review boards of the participating institutions. All network centers are tertiary care centers.

Data Collection and Analysis

All statistical analyses were performed at the Research Triangle Institute (Research Triangle Park, NC). Twenty-eight variables were selected from the database for analysis on the basis of the existing literature, which indicated that these variables were associated with death among premature infants (Table 1). All continuous (eg, birth weight) and logical (eg, gender) data variables were used unaltered, whereas ordinal data (eg, Apgar scores) were converted to categorical variables (eg, Apgar score at 5 minutes of >6: yes or no).

Five data sets were created to reflect 5 time points (scenarios), as follows: scenario A, limited prenatal data using only 3 variables; scenario B, scenario A plus additional prenatal data (to determine whether additional variables improved predictive ability); scenario C, scenario B plus data obtained 5 minutes after birth; scenario D, scenario C plus data obtained at 24 hours of life; scenario E, scenario D plus data obtained at 7 days of age. Scenario A had 3 variables, whereas scenario B had 15 variables (the 3 variables of scenario A plus 12 additional variables) (Table 1). Scenario C had 20 variables (the 15 of scenario B plus 5 additional variables), whereas scenario D had 25 (20 of scenario C plus 5 additional variables) and scenario E had 28 (25 of scenario D plus 3 additional variables) (Table 1). The data sets were censored for prior death and missing data, so that only infants who survived to 24 hours were included in scenario D and those who survived to 1 week were included in scenario E. The 5 data sets (1 for each scenario) were each divided into 10 pairs of training (70%) and test (30%) data sets, by assigning observations randomly to the training and test data sets (Table 2). Logistic regression and neural network models were created in S-PLUS (Insightful Corp, Seattle, WA) with training data sets, and mortality probabilities were calculated for test data. The neural network models for all 5 scenarios were back-propagation models with sigmoid transformation using 1 hidden layer with 6 nodes. This process of development and testing of the models was repeated with each of the 10 replicate data sets, and the results were averaged. The predictive abilities of the regression and neural network models were compared by using the area under the curve (AUC) of the receiver operating characteristic (ROC) curves, calculated with the method described by Hanley and McNeil.15 ROC curves plot sensitivity versus 1 − specificity; the more the AUC approaches 1, the greater is the predictive value. The matched-pair t test (SAS, Cary, NC) was used to compare the AUC for the logistic regression analysis with that for the neural network.
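The AUC comparison described above can be illustrated with a minimal Python sketch. It is not the S-PLUS or SAS code used in the study. It assumes the AUC is computed as the Mann-Whitney probability that an infant who died received a higher predicted risk than an infant who survived, and it uses the standard error approximation published by Hanley and McNeil. The function names and the predicted probabilities are hypothetical.

```python
import math


def auc_mann_whitney(scores_died, scores_survived):
    """AUC as the fraction of (died, survived) pairs in which the infant
    who died has the higher predicted risk; ties count as one half."""
    wins = 0.0
    for d in scores_died:
        for s in scores_survived:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(scores_died) * len(scores_survived))


def hanley_mcneil_se(auc, n_died, n_survived):
    """Standard error of the AUC with the Hanley and McNeil approximation."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_died - 1) * (q1 - auc * auc)
        + (n_survived - 1) * (q2 - auc * auc)
    ) / (n_died * n_survived)
    return math.sqrt(variance)


# Hypothetical predicted probabilities of death for a tiny test set.
died = [0.9, 0.8, 0.6, 0.4]
survived = [0.7, 0.3, 0.2, 0.1]

auc = auc_mann_whitney(died, survived)
se = hanley_mcneil_se(auc, len(died), len(survived))
print(f"AUC = {auc:.3f}, SE = {se:.3f}")
```

In the study design, this calculation would be repeated for each of the 10 replicate test sets and for both model types, and the paired AUC values would then be compared with the matched-pair t test.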

RESULTS

The total observations for each scenario ranged from 8608 for scenario A to 5973 for scenario E, because of censoring for prior death or missing data (Table 2). The median birth weight for the study population for scenario A was 735 g (mean: 735 g; SD: 158 g), the median gestation was 25 weeks (mean: 25.5 weeks; SD: 2.3 weeks), 43% of patients were non-Hispanic black (range: 5–84% by center), 50% of patients were male, and 82% of patients received mechanical ventilation (range: 72–96% by center). When the total study population was considered, 14.3% of patients had died by 24 hours, 22.4% by 7 days, and 35% by discharge. Although similar numbers of infants were analyzed for scenarios A, B, and C, death between birth and 24 hours and death between 24 hours and 7 days of age reduced the sample sizes significantly for scenarios D and E, respectively. The infants with missing data (mostly because of nonrecording of ≥1 variable) were comparable to those with recorded data.

TABLE 1. List of Variables and the Scenarios in Which They Were Introduced

Variable Description | Scenario Introduced
Prenatal steroids given (any) | A
Non-Hispanic black | A
Gestational age | A
Mother had hypertension/eclampsia | B
Mother had prepartum hemorrhage | B
Mother’s age | B
Center mortality rate | B
Pregnancy history, gravida | B
Pregnancy history, parity | B
Prenatal care (≥1 prenatal care visit) | B
Mother’s marital status | B
Presence of labor | B
Tocolytic agents used | B
Multiple birth | B
Antibiotics used | B
Infant birth weight | C
5-min Apgar score of <3 | C
5-min Apgar score of 3–6 | C
5-min Apgar score of >6 | C
Male | C
Highest oxygen concentration at 24 h | D
Clinical features of respiratory distress syndrome | D
Abnormal chest radiograph within 24 h | D
Respiratory support to age 24 h | D
Indomethacin given within 24 h | D
Highest oxygen concentration at 7 d | E
Number of days with conventional ventilation at day 7 | E
Number of days with high-frequency ventilation at day 7 | E

The variables are carried through each scenario (eg, scenario B contains all variables introduced in scenarios A and B). Scenario A includes only limited prenatal data, whereas scenario B includes additional prenatal data. Scenario C includes additional data obtained in the first 5 minutes of life, whereas scenario D includes data from the first 24 hours of life and scenario E includes data from the first 1 week of life.

TABLE 2. Number of Eligible Infants for Each Scenario

Scenario | Total No. of Observations | No. of Training Observations | No. of Testing Observations

To calibrate the models, AUCs and Hosmer-Lemeshow statistics were calculated for the training and test sets. We noticed little discrepancy between the training and test sets. For most models, the AUC was slightly higher for the training set. For scenarios A and D, however, the AUC was higher for the test set, although the values were within the confidence interval of the training set AUC (data not shown); this might be expected because of the large sample size, which makes the models quite robust. The Hosmer-Lemeshow statistic was good only for scenarios D (statistic = .12) and E (statistic = .2) and was poor for scenarios A, B, and C (statistic < .01), indicating significant differences between model-predicted and observed values.
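As a rough illustration of the calibration check, the following Python sketch computes a Hosmer-Lemeshow chi-square statistic by grouping observations on predicted risk. It is a sketch under stated assumptions, not the study's procedure: the number of groups, the equal-size grouping rule, and the example data are all hypothetical, and the conversion to a P value (a chi-square distribution with the number of groups minus 2 degrees of freedom) is left out because it needs a statistics library.

```python
def hosmer_lemeshow_statistic(probs, outcomes, groups=10):
    """Sum over risk groups of (observed - expected)^2 / (n * p * (1 - p)),
    where p is the mean predicted probability within the group."""
    pairs = sorted(zip(probs, outcomes))  # order observations by predicted risk
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)   # expected number of deaths
        observed = sum(y for _, y in chunk)   # observed number of deaths
        denominator = expected * (1.0 - expected / len(chunk))
        if denominator > 0:
            statistic += (observed - expected) ** 2 / denominator
    return statistic


# Hypothetical data: two risk groups of 4 infants each (1 = died, 0 = survived).
probs = [0.25] * 4 + [0.75] * 4
well_calibrated = [1, 0, 0, 0] + [1, 1, 1, 0]
poorly_calibrated = [1, 0, 0, 0] + [0, 0, 0, 0]

print(hosmer_lemeshow_statistic(probs, well_calibrated, groups=2))
print(hosmer_lemeshow_statistic(probs, poorly_calibrated, groups=2))
```

A larger statistic means larger gaps between predicted and observed deaths within the risk groups, which is the pattern the text reports for scenarios A, B, and C.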

The models for scenarios A, B, and C could be compared with each other because the data sets were similar, but a direct comparison of these models with scenarios D and E was not possible because the data sets were dissimilar. Model C had a larger AUC, compared with models A and B (Table 3). The multiple logistic regression model had a larger AUC than the neural network, indicating better predictive ability, for scenario C, but the models had similar AUC values for other scenarios (Table 3). Although the AUC was statistically greater in the regression model for scenario C, the magnitude of the difference (difference: 0.01) is unlikely to be clinically relevant. Larger magnitudes of differences in AUC values (difference: 0.07–0.09) between the regression and neural network models for scenarios D and E were not statistically significant because the variation was greater. Although neural networks produced better predictions and had excellent Hosmer-Lemeshow goodness-of-fit statistics with the training sets, they failed to produce better predictions with the test data.

The models were compared at 50% and 90% sensitivity; these levels of sensitivity were chosen arbitrarily so that the models could be compared when a higher specificity (lower sensitivity) and a higher sensitivity are required. At 50% sensitivity (infants predicted to die of those who died), the regression models for scenarios A and B had a specificity (infants predicted to survive of those who survived) of 93%, a positive predictive value (PPV) (infants predicted to die who actually died) of 80%, and a negative predictive value (NPV) (infants predicted to survive who survived) of 78%, whereas scenario C had a specificity of 95%, a PPV of 84%, and a NPV of 79%. The model for scenario D had 89% specificity, 58% PPV, and 85% NPV, whereas that for scenario E had 90% specificity, 49% PPV, and 90% NPV.

At a higher sensitivity, the specificity and PPV naturally declined. At 90% sensitivity, the regression models for scenarios A and B had a specificity of 35%, a PPV of 43%, and a NPV of 87%. At the same sensitivity, the model for scenario C had a specificity of 55%, a PPV of 51%, and a NPV of 93%, whereas that for scenario D had 49% specificity, 35% PPV, and 94% NPV and that for scenario E had 43% specificity, 25% PPV, and 96% NPV.
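The four measures defined above can be computed directly from the counts of a 2 × 2 classification table. The Python sketch below uses hypothetical counts for 1000 infants (350 deaths), chosen only so that the results land near the values reported for scenarios A and B at 50% sensitivity; they are not study data, and the function name is illustrative.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """tp: predicted to die and died; fp: predicted to die but survived;
    tn: predicted to survive and survived; fn: predicted to survive but died."""
    return {
        "sensitivity": tp / (tp + fn),  # infants predicted to die of those who died
        "specificity": tn / (tn + fp),  # infants predicted to survive of those who survived
        "ppv": tp / (tp + fp),          # infants predicted to die who actually died
        "npv": tn / (tn + fn),          # infants predicted to survive who survived
    }


# Hypothetical counts: 350 deaths and 650 survivors.
m = diagnostic_metrics(tp=175, fp=45, tn=605, fn=175)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```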

The regression equation coefficients and odds ratios showed that the contributions of different variables to the outcome varied with the scenario (Table 4). It can also be seen that some of the variables used were not associated significantly with the outcome in the models (Table 4). For example, in scenario C, for which the logistic regression model performed best, the variables associated with a significantly lower risk of death were use of prenatal steroids, black race, older gestational age, presence of pregnancy-induced hypertension, higher birth weight, and higher 5-minute Apgar score (≥3) and the variables associated with a higher risk of death were higher center mortality rate, presence of prepartum hemorrhage, multiple births, and male gender (Table 4). Center mortality rate was considered an aggregated measure, and a multilevel modeling approach was not implemented for the sake of simplicity and consistency. Exploratory analyses were also performed (data not shown) with varying numbers of hidden layers for neural networks and stepwise selection of variables for regression models, but the increase in complexity of the models did not improve performance, indicating that the models were quite robust.
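The link between a regression coefficient and its odds ratio, and the way a fitted equation yields a predicted probability, can be sketched in Python with the scenario A coefficients printed in Table 4. The variable coding is an assumption that the table does not state: binary variables are taken as 1 for yes and 0 for no, and gestational age is entered in weeks. The function and dictionary names are hypothetical.

```python
import math

# Scenario A coefficients as printed in Table 4.
SCENARIO_A = {
    "intercept": 13.9233,
    "prenatal_steroids": -0.9376,   # assumed coding: 1 = given, 0 = not given
    "gestational_age_wk": -0.5514,  # per additional week of gestation
    "black": -0.1116,               # assumed coding: 1 = yes, 0 = no
}


def odds_ratio(coefficient):
    """The odds ratio is the exponential of the logistic regression coefficient."""
    return math.exp(coefficient)


def predicted_death_probability(steroids, gestational_age_wk, black):
    """Inverse logit of the linear predictor."""
    logit = (
        SCENARIO_A["intercept"]
        + SCENARIO_A["prenatal_steroids"] * steroids
        + SCENARIO_A["gestational_age_wk"] * gestational_age_wk
        + SCENARIO_A["black"] * black
    )
    return 1.0 / (1.0 + math.exp(-logit))


# exp(-0.9376) rounds to the odds ratio of 0.392 listed for prenatal steroids.
print(round(odds_ratio(SCENARIO_A["prenatal_steroids"]), 3))

# Hypothetical infant at 25 weeks of gestation, with and without prenatal steroids.
with_steroids = predicted_death_probability(1, 25, 0)
without_steroids = predicted_death_probability(0, 25, 0)
print(round(with_steroids, 2), round(without_steroids, 2))
```

Under these coding assumptions, the sketch reproduces the tabulated odds ratios from the coefficients and shows how prenatal steroid exposure lowers the predicted probability of death for an otherwise identical infant.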

DISCUSSION

The identification of ELBW infants at high risk of death is of increasing importance, particularly because many of these infants are at high risk for neurodevelopmental impairment.10 The current study demonstrates that the ability to predict death is significantly better (both statistically and of a clinically relevant magnitude) at 5 minutes of age, rather than at or before birth with only prenatal data. However, the ability to predict death does not improve with increasing age among infants who avoid early death, because early variables do not have lingering effects. Also, the contribution of the different variables to subsequent death varies with the time period. Prediction with multiple logistic regression proved comparable to that with neural networks for most of the time periods.

TABLE 3. Average AUC (of 10 Replicates) Under the ROC Curve and SE of the AUC for Each of the 5 Scenarios

Scenario | Logistic Model, Average AUC | Logistic Model, SE of AUC | Neural Network, Average AUC | Neural Network, SE of AUC
A | 0.814 | 0.008 | 0.816 | 0.007
B | 0.802 | 0.007 | 0.804 | 0.006
C | 0.854 | 0.004 | 0.844 | 0.005
D | 0.804 | 0.014 | 0.714 | 0.11
E | 0.791 | 0.010 | 0.729 | 0.044

There are important strengths to this study. The data sets evaluated in this study included many thousands of ELBW infants, making this the largest of any such prediction study to date. Infants from multiple level III centers in the United States were evaluated during a recent period in which mortality rates did not change significantly, making the results comparable to current clinical practice. In addition, the statistical models were developed with one data set and tested with another data set, which ensured that the model was truly tested. Developing and testing a model with the same set may lead to excellent performance with a high AUC for an overtrained model that may not be able to predict outcomes in a different data set.

TABLE 4. Regression Coefficients, Odds Ratios, and 95% Confidence Intervals for the Multiple Logistic Regression Equations Developed for Each Scenario

Variable | Regression Coefficient | Odds Ratio | 95% Confidence Interval

Scenario A
Intercept | 13.9233 | |
Prenatal steroids | −0.9376 | 0.392* | 0.342–0.449
Gestational age (each 1 wk) | −0.5514 | 0.576* | 0.555–0.598
Black | −0.1116 | 0.894 | 0.789–1.014

Scenario B
Intercept | 13.5542 | |
Prenatal steroids | −0.7110 | 0.491* | 0.416–0.580
Gestational age (each 1 wk) | −0.5962 | 0.551* | 0.528–0.575
Tocolytic agents used | −0.2706 | 0.763* | 0.659–0.884
Black | −0.2127 | 0.808* | 0.701–0.933
Antibiotics used | −0.1735 | 0.841* | 0.715–0.988
Marital status | −0.1005 | 0.904 | 0.775–1.055
Presence of labor | −0.0868 | 0.917 | 0.770–1.092
Gravida | −0.0276 | 0.973 | 0.924–1.024
Parity | −0.0232 | 0.977 | 0.908–1.051
Maternal age | −0.0037 | 0.996 | 0.985–1.008
Center mortality rate | 0.0485 | 1.050† | 1.038–1.062
Hypertension | 0.1328 | 1.142 | 0.931–1.401
Prenatal care | 0.1770 | 1.194 | 0.927–1.538
Prepartum hemorrhage | 0.1901 | 1.209† | 1.025–1.427
Multiple birth | 0.4069 | 1.502† | 1.258–1.794

Scenario C
Intercept | 8.1044 | |
5-min Apgar score of >6 | −1.5601 | 0.210* | 0.140–0.315
5-min Apgar score of 3–6 | −1.0263 | 0.358* | 0.240–0.536
Prenatal steroids | −0.5969 | 0.551* | 0.458–0.662
Birth weight (per 100 g) | −0.5688 | 0.566* | 0.538–0.596
Black | −0.3210 | 0.725* | 0.621–0.848
Hypertension | −0.2228 | 0.800* | 0.642–0.997
Gestational age | −0.2037 | 0.816* | 0.776–0.858
Tocolytic agents used | −0.1460 | 0.864 | 0.736–1.015
Marital status | −0.0946 | 0.910 | 0.177–1.074
Gravida | −0.0304 | 0.970 | 0.917–1.026
Maternal age | −0.0047 | 0.995 | 0.983–1.008
Antibiotics used | −0.0014 | 0.999 | 0.837–1.191
Presence of labor | 0.0050 | 1.005 | 0.832–1.214
Parity | 0.0054 | 1.005 | 0.929–1.088
Center mortality rate | 0.0538 | 1.055† | 1.042–1.069
Prenatal care | 0.0720 | 1.075 | 0.811–1.424
Prepartum hemorrhage | 0.2343 | 1.264† | 1.054–1.515
5-min Apgar score of <3 | 0.3225 | 1.381 | 0.888–2.146
Multiple birth | 0.3290 | 1.390† | 1.145–1.687
Male | 0.5459 | 1.726† | 1.500–1.987

Scenario D
Intercept | 3.3691 | |
5-min Apgar score of >6 | −0.7744 | 0.461* | 0.278–0.765
5-min Apgar score of 3–6 | −0.6243 | 0.536* | 0.323–0.888
Birth weight (per 100 g) | −0.5287 | 0.589* | 0.556–0.625
Clinical RDS | −0.5234 | 0.593* | 0.372–0.945
5-min Apgar score of <3 | −0.5046 | 0.604 | 0.342–1.065
Prenatal steroids | −0.2283 | 0.796* | 0.649–0.975
Marital status | −0.2187 | 0.804 | 0.674–0.958
Black | −0.1162 | 0.890 | 0.756–1.048
Antibiotics used | −0.1098 | 0.896 | 0.742–1.082
Gestational age | −0.1092 | 0.897* | 0.847–0.949
Hypertension | −0.0833 | 0.920 | 0.736–1.15
Tocolytic agents used | −0.0312 | 0.969 | 0.817–1.15
Gravida | −0.0181 | 0.982 | 0.926–1.042
Maternal age | −0.0108 | 0.989 | 0.976–1.003
Parity | 0.00682 | 1.007 | 0.926–1.095
Respiratory support to 24 h | 0.0238 | 1.024 | 0.687–1.527
Indomethacin within 24 h | 0.0566 | 1.058 | 0.9–1.244
Center mortality rate | 0.0595 | 1.061† | 1.047–1.076
Prenatal care | 0.1552 | 1.168 | 0.869–1.57
Multiple birth | 0.1573 | 1.170 | 0.951–1.441
Abnormal chest radiograph within 24 h | 0.184 | 1.202 | 0.88–1.642
Presence of labor | 0.1839 | 1.202 | 0.983–1.469
Prepartum hemorrhage | 0.3232 | 1.382† | 1.144–1.668
Male | 0.5797 | 1.785† | 1.538–2.073
Highest oxygen concentration at 24 h | 1.4984 | 4.475† | 3.226–6.207

Scenario E
Intercept | 1.2703 | |
Clinical RDS | −0.7289 | 0.482* | 0.298–0.781
5-min Apgar score of >6 | −0.6915 | 0.501* | 0.295–0.85
5-min Apgar score of 3–6 | −0.5934 | 0.552* | 0.326–0.936
5-min Apgar score of <3 | −0.492 | 0.611 | 0.34–1.1
Birth weight (per 100 g) | −0.4316 | 0.649* | 0.606–0.696
Respiratory support to 24 h | −0.4093 | 0.664 | 0.427–1.033
Marital status | −0.2229 | 0.800* | 0.667–0.959
Prenatal steroids | −0.1386 | 0.871 | 0.705–1.074
Indomethacin within 24 h | −0.1064 | 0.899 | 0.759–1.065
Tocolytic agents used | −0.0825 | 0.921 | 0.772–1.098
Antibiotics used | −0.0794 | 0.924 | 0.76–1.123
Black | −0.0567 | 0.945 | 0.797–1.12
Gestational age | −0.0541 | 0.947 | 0.894–1.004
Hypertension | −0.0366 | 0.964 | 0.765–1.215
Gravida | −0.0184 | 0.982 | 0.924–1.043
Maternal age | −0.0112 | 0.989 | 0.975–1.003
Abnormal chest radiograph within 24 h | 0.00748 | 1.008 | 0.725–1.4
Parity | 0.0249 | 1.025 | 0.94–1.118
Center mortality rate | 0.0475 | 1.049† | 1.034–1.064
Number of days with conventional ventilation at d 7 | 0.0982 | 1.103† | 1.048–1.162
Multiple birth | 0.1193 | 1.127 | 0.91–1.395
Prenatal care | 0.1203 | 1.128 | 0.833–1.526
Number of days with high-frequency ventilation at d 7 | 0.1964 | 1.217† | 1.147–1.291
Presence of labor | 0.202 | 1.224 | 0.995–1.505
Prepartum hemorrhage | 0.3235 | 1.382† | 1.139–1.677
Male | 0.4917 | 1.635† | 1.402–1.907
Highest oxygen concentration at 24 h | 0.8734 | 2.395† | 1.686–3.403
Highest oxygen concentration at 7 d | 1.9891 | 7.309† | 5.245–10.184

Variables are arranged in order of increasing odds ratios. RDS indicates respiratory distress syndrome.

* Odds ratios of <1 with confidence intervals that do not overlap 1 indicate variables that improve survival rates.

There are also some limitations to this study. Only variables that already existed in the database could be used for analysis. Other variables that may be associated with death (eg, chorioamnionitis, timing of prenatal steroid therapy, fetal biophysical profile, and resuscitation variables such as parental or physician wishes regarding resuscitation) could not be evaluated because they were not part of the data collected. It must also be noted that the models for the different scenarios used different data sets, because infants who died before the scenario could not be considered for the prediction of subsequent death. Scenario A was approximately comparable to scenarios B and C (which were identical), and these scenarios included almost all live-born infants. However, scenario D included only infants who had survived to 24 hours of age, and scenario E included only infants who had survived to 1 week of age; therefore, scenarios D and E must be considered in isolation and not in comparison with scenario A, B, or C. Another limitation is that, in addition to prior death, a few infants were excluded because of missing data, which resulted in smaller data sets (mostly for scenarios D and E) than were accounted for by earlier death alone. It is known that ELBW infants are at highest risk of death in the first 3 days.7 Therefore, the overall likelihood of death was higher in scenarios A, B, and C and decreased with scenario D and additionally with scenario E. Because the PPV of a test also depends on the prevalence of the outcome in the population (PPVs are low for rare outcomes and higher for common outcomes, with the same sensitivity), the models are less likely to be accurate in the later scenarios, because mortality rates are lower after the immediate postnatal period. The PPVs of these models therefore diminish over time, because mortality rates are lower among older infants. Another limitation of this study is that statistical methods such as regression analysis or neural networks are not easy to use in the clinical setting. It would be possible to optimize the regression models by evaluating nonlinear relationships and interactions and incorporating them into the logistic regression models, but this would increase model complexity and might decrease clinical utility. Neural networks especially are considered a “black box,” the inner workings of which are difficult to determine.
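The dependence of PPV on prevalence noted above follows from Bayes' theorem and can be shown with a short Python sketch. The sensitivity and specificity are held fixed at illustrative values, and the prevalence values are hypothetical, chosen only to mimic mortality falling in later scenarios; none of these numbers is a study result.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Same test characteristics, decreasing mortality (hypothetical prevalences).
for prevalence in (0.35, 0.20, 0.10):
    value = ppv(sensitivity=0.90, specificity=0.50, prevalence=prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {value:.2f}")
```

With the test characteristics unchanged, the PPV falls as the outcome becomes rarer, which is why the later scenarios would be expected to show lower PPVs even if the models discriminated equally well.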

A limited number of prenatal variables (gestational age, race, and prenatal steroid use) performed as well as a larger collection of prenatal variables, which indicates that a parsimonious model is often preferable to a large model, although some of the additional included variables (tocolysis, antibiotic use, and singleton birth) were associated significantly with a lower probability of death in this scenario. At 5 minutes of age, the addition of birth weight, gender, and 5-minute Apgar score improved the model. As expected, there was a proportional increase in survival rates with increasing Apgar scores. The odds ratio for gestational age was less significant in scenario C, compared with scenario B, because part of its contribution to outcome was taken over by the inclusion of birth weight (with which gestational age is correlated strongly). The odds ratio per 100-g increase in birth weight is ~0.5 to 0.6, which is highly statistically significant and clinically relevant. It is possible that, if the birth weight and gender of the fetus were determined prenatally with good accuracy, those factors could also be used as predictors in the prenatal period and could be used in discussions of mortality risk with the parents before birth, when discussions about resuscitation are held. Some variables (such as tocolysis and prenatal antibiotic use) that were significant in scenario B were no longer significant after birth. For infants who survived to 24 hours of age, the effects of race were less significant, whereas the effect of prenatal steroid use was diminished by the seventh day. The maximal oxygen concentrations at 24 hours and at 7 days were strong predictors of death, probably because they were good indicators of the underlying severity of the respiratory illness. Clinicians do not normally use mathematical equations and ROC curves to predict outcomes for individual neonates, but knowledge of these predictors and how their contributions vary over time may assist clinical judgment and influence decision-making.


being a determinant of ELBW death in the current study.

Meadow et al10 showed that the predictive ability of serial SNAP scores and clinical intuition for neonatal death declines with time. Even with the advantage of a large data set and sophisticated statistical techniques, predictive ability declined with time in our study, possibly because of a decrease in the number of rational or logical predictors. Variables that are major risk factors for death (eg, birth weight, gestational age, gender, race, and Apgar scores) are mostly determinants of early death and can be identified or measured easily, whereas the risk factors for later death (eg, sepsis, necrotizing enterocolitis, and bronchopulmonary dysplasia) are not well defined or cannot be determined sufficiently in advance, leading to an attenuation in predictive ability for death. Meadow et al10 also demonstrated the importance of prediction; infants who were predicted to die but who actually survived were at high risk (82%) of neurodevelopmental impairment.10 It is likely that infants with a higher probability of death would also have a higher probability of morbidity in the current study, and this will be investigated when follow-up data are available for these infants. Pollack et al9 compared Clinical Risk Index for Babies, SNAP, SNAP-Perinatal Extension, and other models with prenatal, birth, and first 12- and 24-hour data. The discriminatory ability of these models was very good. However, the same data set was used for development and testing of the models and larger infants (up to 1500 g) were evaluated, both of which would increase the apparent predictive ability. Another issue to consider with prediction models is the “self-fulfilling prophecy,” ie, if an infant is considered at high risk of death, then there may be a bias against provision of aggressive resuscitative measures. It is difficult to determine in these studies the extent to which death was attributable directly to the magnitude of the variable (eg, birth weight of 500 g) or to clinicians’ perceptions of that variable as a predictor of death (eg, less aggressive resuscitation of infants who were ≤500 g at birth).

Regression analysis has limitations in some clinical situations, because the relationships between independent and dependent variables may be nonlinear. Neural networks are nonparametric, pattern-recognition techniques capable of identifying hidden patterns and interactions.13,14 Cross et al13 provided an introduction to neural networks for clinicians, and Tu14 reviewed the advantages and disadvantages of neural networks versus regression models for predicting medical outcomes. Neural networks have been found to be suitable and superior to logistic regression for the prediction of outcomes for critically ill adult patients.17 There have been 2 single-center studies comparing multiple regression models with neural networks for the prediction of death among premature neonates.11,12 In 1 of those studies,12 which used admission data for prediction of death among very low birth weight infants, the neural network performed significantly better than the logistic regression; in the other,11 which used data from admission and the first 6 hours of life for prediction of ELBW death, the performance was equivalent. It is difficult to compare those single-center studies with our multicenter study, because variations in clinical practices might induce greater variance in the relationship of a variable (dependent on those clinical practices) to death. For example, infants who might have been removed from support within hours after birth at one institution might be more often resuscitated aggressively at another institution, which might lead to postponement of death beyond the first day or even survival to discharge with impairment. Therefore, it may be necessary for each center to develop and to test its own model for the prediction of outcomes.

One major implication of this study is that it may be better to postpone decisions about initiation of support or withdrawal of care until 5 minutes after birth, rather than making decisions before birth with only prenatal data, because immediate postnatal variables such as Apgar scores (reflecting status at birth), birth weight, and gender add significantly (20% higher specificity and 8% higher PPV at 90% sensitivity) to the ability to predict death. The other major implication is that it is difficult to predict death (or survival) for individual neonates with certainty. Clinicians and parents need to be aware of inherent biases and uncertainty in trying to foretell the future, especially when such predictions are used for clinical decision-making. It has been demonstrated that obstetricians and pediatricians who underestimate the possibility of survival of a neonate are less likely to resuscitate the neonate or to use mechanical ventilation, inotropes, or other standard therapies.18 Despite these limitations in prognostication, these predictive models indicate the contribution of the known major risk factors to death and are useful for the generation of hypotheses that can be tested in controlled trials (eg, the benefits of tocolysis and maternal antibiotic therapy in preterm labor and the effects on long-term outcomes of care practices responsible for variations in center mortality rates).
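The comparison at a fixed sensitivity quoted above (specificity and PPV at 90% sensitivity) can be illustrated with a minimal sketch. It assumes that an infant is classified as "predicted to die" when the predicted risk is at or above a threshold, and it picks the highest threshold that still identifies at least 90% of deaths. The function name `metrics_at_sensitivity` and all outcomes and risk scores are hypothetical; none of the numbers come from the study.

```python
def metrics_at_sensitivity(labels, scores, target=0.90):
    """Specificity and PPV at the highest threshold with sensitivity >= target.

    labels: 1 = died, 0 = survived; scores: predicted risk of death.
    An infant is predicted to die when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one death and one survivor")
    # Lower the threshold step by step until enough deaths are captured.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        sensitivity = tp / n_pos
        if sensitivity >= target:
            return {
                "threshold": threshold,
                "sensitivity": sensitivity,
                "specificity": (n_neg - fp) / n_neg,
                "ppv": tp / (tp + fp),
            }
    raise ValueError("target sensitivity must be at most 1.0")


# Hypothetical toy data (assumption): 10 deaths followed by 10 survivors.
died = [1] * 10 + [0] * 10
risk = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.2,
        0.7, 0.6, 0.45, 0.4, 0.35, 0.3, 0.25, 0.15, 0.1, 0.05]

result = metrics_at_sensitivity(died, risk)
print(result)
```

In this toy example, capturing 9 of the 10 deaths requires a threshold low enough that 2 of the 10 survivors are also flagged, which shows why predictions for individual neonates remain uncertain even when a model discriminates well overall.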

ACKNOWLEDGMENTS

Financial support was provided by National Institutes of Health grants U10 HD27851, U01 HD36790, U10 HD21364, U10 HD34216, U10 HD27871, M01 RR06022, U10 HD27856, M01 RR00750, U10 HD27853, M01 RR08084, U10 HD34167, M01 RR02635, M01 RR02172, M01 RR01032, U10 HD21373, U10 HD27904, U10 HD21397, U10 HD21415, U10 HD21385, U10 HD40689, U10 HD27880, M01 RR00070, U10 HD27881, U10 HD40461, and M01 RR00997.


REFERENCES

1. Lemons JA, Bauer CR, Oh W, et al. Very low birth weight outcomes of the National Institute of Child Health and Human Development Neonatal Research Network, January 1995 through December 1996. Pediatrics. 2001;107(1). Available at: www.pediatrics.org/cgi/content/full/107/1/e1

2. Victorian Infant Collaborative Study Group. Improved outcome into the 1990s for infants weighing 500–999 g at birth. Arch Dis Child Fetal Neonatal Ed. 1997;77:F91–F94

3. Tyson JE, Younes N, Verter J, Wright LL. Viability, morbidity, and resource use among newborns of 501- to 800-g birth weight: National Institute of Child Health and Human Development Neonatal Research Network. JAMA. 1996;276:1645–1651

4. American College of Obstetricians and Gynecologists. ACOG Practice Bulletin: Clinical Management Guidelines for Obstetrician-Gynecologists: number 38, September 2002: perinatal care at the threshold of viability. Obstet Gynecol. 2002;100:617–624

5. American Academy of Pediatrics, Committee on Fetus and Newborn. Perinatal care at the threshold of viability. Pediatrics. 2002;110:1024–1027

6. Shankaran S, Fanaroff AA, Wright LL, et al. Risk factors for early death among extremely low-birth-weight infants. Am J Obstet Gynecol. 2002;186:796–802

7. Meadow W, Reimshisel T, Lantos J. Birth weight-specific mortality for extremely low birth weight infants vanishes by four days of life: epidemiology and ethics in the neonatal intensive care unit. Pediatrics. 1996;97:636–643

8. Kaaresen PI, Dohlen G, Fundingsrud HP, Dahl LB. The use of CRIB (Clinical Risk Index for Babies) score in auditing the performance of one neonatal intensive care unit. Acta Paediatr. 1998;87:195–200

9. Pollack MM, Koch MA, Bartel DA, et al. A comparison of neonatal mortality risk prediction models in very low birth weight infants. Pediatrics. 2000;105:1051–1057

10. Meadow W, Frain L, Ren Y, Lee G, Soneji S, Lantos J. Serial assessment of mortality in the neonatal intensive care unit by algorithm and intuition: certainty, uncertainty, and informed consent. Pediatrics. 2002;109:878–886

11. Ambalavanan N, Carlo WA. Comparison of the prediction of extremely low birth weight neonatal mortality by regression analysis and by neural networks. Early Hum Dev. 2001;65:123–137

12. Zernikow B, Holtmannspoetter K, Michel E, et al. Artificial neural network for risk assessment in preterm neonates. Arch Dis Child Fetal Neonatal Ed. 1998;79:F129–F134

13. Cross SS, Harrison RF, Kennedy RL. Introduction to neural networks. Lancet. 1995;346:1075–1079

14. Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol. 1996;49:1225–1231

15. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic curve. Radiology. 1982;143:29–36

16. Horbar JD, Onstad L, Wright E. Predicting mortality risk for infants weighing 501 to 1500 grams at birth: a National Institutes of Health Neonatal Research Network report. Crit Care Med. 1993;21:12–18

17. Dybowski R, Weller P, Chang R, Gant V. Prediction of outcome in critically ill patients using artificial neural network synthesized by genetic algorithm. Lancet. 1996;347:1146–1150

18. Morse SB, Haywood JL, Goldenberg RL, Bronstein J, Nelson KG, Carlo WA. Estimation of neonatal outcome and perinatal therapy use. Pediatrics. 2000;105:1046–1050

APPENDIX. National Institute of Child Health and Human Development Neonatal Research Network (1996–2006)

Center; Principal Investigator; Follow-up Principal Investigator; Network Coordinator; Follow-up Coordinator

Brown University; William Oh, MD; Betty Vohr, MD; Angelita Hensman, RNC; Lucey Noel, RNC
Case Western Reserve University; Avroy A. Fanaroff, MB, BCh; Dee Wilson, MD; Nancy Newman, RN; Bonnie Siner, RN
Duke University; Ronald N. Goldberg, MD; Ricki Goldstein, MD; Kathy Auten, RN; Melody Lohmeyer, RN
Emory University; Barbara J. Stoll, MD; Barbara J. Stoll, MD; Ellen Hale, RNC, BS; Ellen Hale, RNC, BS
Harvard University; Ann R. Stark, MD; Ann R. Stark, MD; Kerri Fournier, RN
Indiana University; James A. Lemons, MD; Anna Dusick, MD; DeeDee Appel, RN; Leslie Richards, RN
Stanford University; David K. Stevenson, MD; Susan Hintz, MD; Bethany Ball, RN; Bethany Ball, RN
University of Alabama; Waldemar A. Carlo, MD; Myriam Peralta, MD; Monica Collins, RN; Vivien Phillips
University of California, San Diego; Neil N. Finer, MD; Yvonne Vaucher, MD; Wade Rich, RN; Martha Fuller, RN
University of Cincinnati; Edward F. Donovan, MD; Jean Steichen, MD; Cathy Grisby, RN; Tari Gratton, RN
University of Miami; Shahnaz Duara, MD; Charles Bauer, MD; Ruth Everett, RN; Mary Allison, RN
University of New Mexico; Lu-Ann Papile, MD; Lu-Ann Papile, MD; Conra Backstrom, RN
University of Rochester; Dale L. Phelps, MD; Gary Myers, MD; Linda Reubens, RN; Diane Hust, RN
University of Tennessee; Sheldon B. Korones, MD; Kimberly Yolton, PhD; Tina Hudson, RN
University of Texas-Dallas; Abbot R. Laptook, MD; Roy Heyne, MD; Susie Madison, RN; Jackie Hickman, RN
University of Texas-Houston; Jon E. Tyson, MD, MPH; Pamela Bradt, MD; Georgia McDavid, RN; Shannon Rossi
Wake Forest University; T. Michael O’Shea, MD; Robert Dillard, MD; Nancy Peters, RN; Barbara Jackson, RN
Wayne State University; Seetha Shankaran, MD; Yvette Johnson, MD; Gerry Muran, BSN; Debbie Kennedy, RN
Yale University; Richard A. Ehrenkranz, MD; Richard A. Ehrenkranz, MD; Pat Gettner, RN; Elaine Romano, MSN
National Institute of Child Health and Human Development; Linda L. Wright, MD; Beth B. McClure, MS; Rose Higgins, MD; Carolyn Petrie, MS


Namasivayam Ambalavanan, Waldemar A. Carlo, Georgiy Bobashev, Erin Mathias, Bing Liu, Kenneth Poole, Avroy A. Fanaroff, Barbara J. Stoll, Richard Ehrenkranz and Linda L. Wright

Prediction of Death for Extremely Low Birth Weight Neonates

Pediatrics 2005;116;1367

DOI: 10.1542/peds.2004-2099

Updated Information & Services, including high resolution figures, can be found at: http://pediatrics.aappublications.org/content/116/6/1367



by the American Academy of Pediatrics. All rights reserved. Print ISSN: 1073-0397.

TABLE 2. Number of Eligible Infants for Each Scenario
TABLE 3. Average AUC (of 10 Replicates) Under the ROC Curve and SE of the AUC for Each of the 5 Scenarios
TABLE 4. Regression Coefficients, Odds Ratios, and 95% Confidence Intervals for the Multiple Logistic Regression Equations Developed for Each Scenario
