In this paper, survival data analysis is carried out by applying Generalized Entropy Optimization Methods (GEOM). It is known that all statistical distributions can be obtained as MaxEnt distributions by choosing the corresponding moment functions. However, Generalized Entropy Optimization Distributions (GEOD) in the form of MinMaxEnt and MaxMaxEnt distributions, which are obtained on the basis of the Shannon measure and supplementary optimization with respect to characterizing moment functions, represent the given statistical data more exactly. For this reason, survival data analysis by GEOD acquires a new significance. In this research, the data of the life table for engine failure data (1980) are examined. The performances of GEOD are established by the Chi-Square criterion, the Root Mean Square Error (RMSE) criterion, the Shannon entropy measure, and the Kullback-Leibler measure. Comparison of the GEOD with each other in these different senses shows that among these distributions…
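The goodness-of-fit criteria named above can all be computed directly from an observed and a fitted frequency distribution; a minimal sketch (function and variable names are illustrative, not taken from the paper):

```python
import numpy as np

def compare_fit(observed, expected):
    """Compare an observed frequency distribution with a fitted one
    using the chi-square, RMSE and Kullback-Leibler criteria."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    # Chi-square statistic: sum of (O - E)^2 / E over the cells.
    chi_square = np.sum((observed - expected) ** 2 / expected)
    # Root mean square error between the two frequency vectors.
    rmse = np.sqrt(np.mean((observed - expected) ** 2))
    # Kullback-Leibler divergence of the normalized distributions.
    p = observed / observed.sum()
    q = expected / expected.sum()
    kl = np.sum(p * np.log(p / q))
    return chi_square, rmse, kl
```

All three criteria are zero when the fitted distribution reproduces the data exactly, which is what makes them usable for ranking candidate GEOD fits.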


In this section, we illustrate the applications of the proposed generalized transformation models and compare them with the Cox proportional hazards model and the Zeng et al. [1] transformation cure model by analyzing data from the First National Health and Nutrition Examination Survey Epidemiologic Follow-up Study (NHANES1). The NHANES1 data set is from the Diverse Populations Collaboration (DPC), a pooled database contributed by a group of investigators to examine issues of heterogeneity of results in epidemiological studies. The database includes 21 observational cohort studies, 3 clinical trials, and 3 national samples. In the NHANES1 dataset, information on 14,407 individuals was collected in four cohorts from 1971 to 1992. In this analysis, we use data from two of the four cohorts, the black female cohort and the black male cohort. After dropping all observations with missing values, a total of 2027 patients remain in these two cohorts, including 1265 black females and 762 black males. The survival times of the 2027 patients are used as the response variable. The endpoint is the overall survival time collected in 1992. In the two cohorts, 848 patients (about 40% of the total) died by the end of follow-up, with a maximum observed survival time of 7691 days. The survival times of 1179 patients were right censored; among them, 115 patients had survival times longer than 7691 days. We consider these 115 patients as cured subjects.


The local similarity matrices of nominal attributes and the weights of attributes are often defined manually by domain experts such as medical specialists [8, 16–18]. This works well for straightforward domains with low complexity. However, in complex domains even clinical experts in the same field may have different views on the impact of an attribute on the disease of interest. A more objective approach is to derive the similarity from the data in the case base. A number of CBR algorithms are able to learn local similarities from the case base itself. However, many of them rely on the dependency of one or more solution-describing attributes; in clinical contexts, this is often the attribute “applied therapy”. A special case where such CBR-based systems struggle is the analysis of data from randomized clinical trials. Here, this dependency would introduce a huge bias, because therapy arms (novel therapy against gold standard or placebo therapy) are usually randomized. As an alternative, a similarity measure depending on “overall survival time” may make more sense, as overall survival is considered an authentic measure of the success of clinical trials. However, the authors of this article are not aware of an existing similarity measure with an explicit focus on “overall survival time”.
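To make the idea concrete, one simple way such a measure could look is a local similarity that depends only on how close the overall survival times of two cases are; a hypothetical sketch (this is not the authors' measure, and the normalization by the largest survival time in the case base is an assumption for illustration):

```python
def survival_similarity(t_a, t_b, t_max):
    """Hypothetical local similarity of two cases based solely on the
    difference of their overall survival times.

    t_a, t_b : overall survival times of the two cases
    t_max    : largest survival time observed in the case base,
               used to scale the difference into [0, 1]
    """
    # Identical survival times give similarity 1; the maximal possible
    # difference gives similarity 0.
    return 1.0 - abs(t_a - t_b) / t_max
```

Because the measure depends only on the outcome and not on the "applied therapy" attribute, it avoids the randomization bias discussed above.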


(ii) types of censoring: in many data sets, all one really knows about the timing of the event is that it happened between time a and time b; this is interval censoring and is somewhat more complicated than standard right censoring (some people have not had the event when the analysis is performed). If all data are either interval- or right-censored, then this can easily be treated as discrete-time data. However, if the failure time is known exactly for some cases but only within an interval for others, then there are no clearly superior methods of analysis known, at least when covariates are present. Several variants, with a comparative assessment, are discussed in [3]. 8. whether the software is special purpose or is part…
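The three observation types described above can be captured in a single encoding, which is how interval-censored data are commonly represented; a minimal sketch (the encoding convention is an assumption for illustration, not from the cited text):

```python
import math

# Each observation is an interval (lower, upper) for the event time:
#   exact failure     -> lower == upper
#   interval censored -> lower < upper < inf (event in (lower, upper])
#   right censored    -> upper == inf (still event-free at `lower`)
observations = [
    (5.0, 5.0),        # failure observed exactly at t = 5
    (3.0, 7.0),        # failure known only to lie in (3, 7]
    (10.0, math.inf),  # event-free when last seen at t = 10
]

def classify(obs):
    """Return the censoring type of an (lower, upper) observation."""
    lower, upper = obs
    if math.isinf(upper):
        return "right-censored"
    if lower == upper:
        return "exact"
    return "interval-censored"
```

The mixed case flagged as difficult in the text is exactly a data set where `classify` returns both "exact" and "interval-censored" for different observations.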

In 1897, a Paris-born engineer named Vilfredo Pareto showed that the distribution of wealth in Europe followed a simple power-law pattern, which essentially meant that the extremely rich hogged most of a nation's wealth (New Scientist, 19 August 2000, p 22). Economists later realised that this law applied only to the very rich, and not necessarily to how wealth was distributed among the rest. But mathematically, a Pareto distribution has often been used to quantify the distribution of wealth in the world. It is called an “ABC analysis” or a power law (a polynomial relationship: an expression constructed from one or more variables and constants, using only the operations of addition, subtraction, multiplication, and constant positive whole-number exponents). It describes a probability distribution that coincides with social, scientific, geophysical, and many other types of observable phenomena.
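The concentration the power law describes can be checked empirically by sampling from a Pareto distribution and measuring how much of the total the top fraction holds; a sketch using only the standard library (the shape value 1.16 is the one often quoted as producing the classic 80/20 split, used here purely as an illustration):

```python
import random

def pareto_top_share(alpha, n, top_frac, seed=0):
    """Draw n samples from a Pareto(alpha) distribution and return the
    share of the total held by the top `top_frac` fraction of samples."""
    rng = random.Random(seed)
    xs = sorted((rng.paretovariate(alpha) for _ in range(n)), reverse=True)
    k = int(n * top_frac)
    return sum(xs[:k]) / sum(xs)
```

A smaller shape parameter means a heavier tail, so the top share grows as `alpha` shrinks toward 1.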


Peterson (1977) expressed the finite-sample censored survivorship function as an explicit function of two empirical sub-survival functions, which has the strong consistency property. The Kaplan-Meier estimator of the survival function in the censored data problem can likewise be expressed, for finite samples, as an explicit function of two empirical sub-survival functions. Jan et al. (2005) attached the non-censored rate as a weight for censored observations in the case of a high proportion of censoring, which makes the survival estimates less biased. Shafiq et al. (2007) proposed a new weight that gives non-zero weight to the last censored observation, in order to avoid a zero probability for it. Many of the popular nonparametric two-sample test statistics for censored survival data, such as the log-rank (Mantel, 1966), generalized Wilcoxon (Gehan, 1965), and Peto-Peto (1972) test statistics, have been shown to be special cases of a general two-sample statistic, differing only in the choice of weight function (Tarone and Ware, 1977; Gill, 1980). This work has been extended to a general s-sample statistic (Tarone and Ware, 1977; Andersen et al., 1982) which includes the s-sample log-rank (Breslow, 1970) and generalized Wilcoxon (Prentice, 1978).
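The Kaplan-Meier (product-limit) estimator referred to above can be written in a few lines; a naive sketch for illustration (this is the plain estimator, not the weighted variants proposed by Jan et al. or Shafiq et al.):

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  : observed times (event or censoring)
    events : 1 if the event occurred at that time, 0 if censored
    Returns a list of (t, S(t)) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0        # events observed at time t
        removed = 0  # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            removed += 1
            i += 1
        if d > 0:
            # Multiply in the conditional survival factor at time t.
            s *= 1 - d / n_at_risk
            curve.append((t, s))
        n_at_risk -= removed
    return curve
```

Note how a censored observation (such as the third one in the test below) produces no step in the curve but still shrinks the risk set, which is exactly why the weighting schemes above were proposed for heavily censored data.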

– Where random censoring is an inevitable feature of a study, it is important to include explanatory variables that are probably related to both censoring and survival time — e.g., seriousness of illness in the first instance, grade-point average in the second.




Some association between severity of disability and missing information may be due to the child having died before the assessment could be made, or to lack of follow-up (for covariate data rather than death information). The Kaplan-Meier estimates of survival by region and by severity or missingness of the impairment variables indicate that there are many likely causes of missing covariate information (Figure 1). For example, for both the manual and ambulatory variables in Mersey, missing data show a strong association with early deaths, although the absolute number of cases with missing information on these two variables in the Mersey region is small. In the North of England, Oxford and Scotland, those with missing covariate information on the ambulatory and manual variables constitute a mixture of early deaths and late censored observations, in contrast with Northern Ireland, where almost all those with missing information on these two variables are late censored observations. For those with missing cognitive impairment, Mersey again consists mainly of deaths, and the North of England of a mixture of early deaths and late censored observations; for Northern Ireland and Oxford, the missing covariate observations consist mainly of censored survival times. In Scotland, where the proportion of cases with missing information on severity of cognitive impairment is high, a significant proportion of those with missing cognitive data have died. For those with missing visual impairment, a large number of early deaths give a survival pattern that is not too dissimilar to that of the severely impaired, for all five regions.


Marenco et al. developed a Query Integrator System (QIS) to address robust data integration from heterogeneous data sources in the biosciences in 2004 [26]. An ontology server was used in QIS to map data sources' metadata to the concepts in standard vocabularies [26]. Cheung et al. developed a prototype web application called YeastHub, based on a Resource Description Framework (RDF) database, to support the integration of different types of yeast genome data from different sources in 2005 [27]. Lam et al. used the Web Ontology Language (OWL) to integrate two heterogeneous neuroscience databases [28] in 2005. In a follow-up study, Lam et al. designed AlzPharm, which used RDF and its extension vocabulary, RDF Schema (RDFS), to facilitate both data representation and integration [29]. Smith et al. built the LinkHub system, leveraging Semantic Web technologies (i.e., RDF and RDF queries) to facilitate cross-database queries and information retrieval in proteomics in 2007 [30]. In 2008, Shironoshita et al. introduced a query formulation method, named Semantic caBIG Data Integration (semCDI), to execute semantic queries across multiple data services in the cancer Biomedical Informatics Grid (caBIG). Mercadé et al. developed an ontology-based application called Orymold for dynamic gene expression data annotation, integration and exploration in 2009. Based on the QIS [26], Luis et al. designed an automated approach for integrating federated databases using ontological metadata mappings in 2009 [31]. Chisham et al. created the Comparative Data Analysis Ontology (CDAO) and developed the CDAO-Store system to support data integration for phylogenetic analysis in 2011 [32]. Kama et al. built a Data Definition Ontology (DDO) using D2RQ (a platform that provides RDF-based access over relational databases) for accessing heterogeneous clinical data sources [33]. Pang et al. developed BiobankConnect to speed up the integration of comparable data from different biobanks into a pooled data set, using ontological and lexical indexing, in 2014 [34]. Ethier et al. designed the Clinical Data Integration Model (CDIM), based on the Basic Formal Ontology (BFO) [35], to support biomedical data integration in 2015 [36]. Mate et al.


Most analyses of survival data rely primarily on Kaplan-Meier plots, log-rank tests and Cox models. We have described the rationale and interpretation of each method in previous papers of this series, but here we have sought to highlight some of their limitations. We have also suggested alternative methods that can be applied when either the data or a given model is deficient, or when more difficult or specific problems are to be addressed. For example, analysis of recurrent events can make an important contribution to the understanding of the survival process, and so investigating repeat cancer relapses may be more informative than concentrating only on the time until the first. More fundamentally, missing data are a common issue in data collection that in some cases can seriously flaw a proposed analysis. Such considerations may be highly relevant to the analysis of a data set, but are rarely mentioned in the analysis of survival data. One possible reason for this is a perceived lack of computer software, but many of the approaches discussed here are already incorporated into existing commercial statistical packages (e.g. SAS, S-Plus, Stata) and freeware (e.g. R). On the other hand, the desire may be to ‘keep things simple for the readership’. This view is reasonable, but is valid only where a simple analysis adequately represents the survival experience of patients in the study. Ensuring that the analyses are appropriate is therefore crucial. More advanced survival methods can derive more information from the collected data; their use may admittedly convey a less straightforward message, but at the same time could allow a better understanding of the survival process.

When used inappropriately, statistical models may give rise to misleading conclusions. Checking that a given model is an appropriate representation of the data is therefore an important step. Unfortunately, this is a complicated exercise, one that has formed the subject of entire books. Here, we aim to present an overview of some of the major issues involved, and to provide general guidance for developing and applying a statistical model. We start by presenting approaches that can be used to ensure that the correct factors have been chosen. Following this, we describe some approaches that will help decide whether the statistical model adequately reflects the survivor patterns observed. Lastly, we describe methods to establish the validity of any assumptions the modelling process makes. We illustrate each using the two example datasets (a lung cancer trial and an ovarian cancer dataset) that were introduced in the previous papers (Bradburn et al, 2003; Clark et al, 2003).

De Angelis et al. (1999) incorporated background mortality into the mixture cure proportional hazards (MCPH) model under exponential and Weibull distributions for the uncured patients. Phillips, Coldman, and McBride (2002) extended De Angelis's model to estimate the prevalence of cancer. Sposto (2002) described three link functions for estimating the cure fraction. Lambert et al. (2006) incorporated background mortality into the parametric Weibull cure model via a Newton-Raphson algorithm, which is implemented in STATA (Lambert, 2007). Lambert et al. (2010) also proposed a finite mixture of Weibull distributions to add flexibility, which, however, adds the complexity of estimating more Weibull parameters. Royston and Lambert (2011) discussed topics such as time-dependent and continuous covariates in relative survival. Andersson et al. (2011) proposed a flexible parametric survival model with cubic splines to estimate the cure fraction in population-based studies, which, however, does not allow covariates to be included in the cure fraction function. All these previous studies on cure models with background mortality used the maximum likelihood function to estimate the parameters and the cure fraction.
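The structure these cure models share can be made concrete: the population survival is the background survival times a mixture of a cured fraction and the survival of the uncured. A minimal sketch, assuming an exponential background hazard and a Weibull distribution for the uncured (all parameter names are illustrative, not taken from the cited papers):

```python
import math

def mixture_cure_survival(t, pi, shape, scale, bg_rate):
    """Population survival under a mixture cure model with background
    mortality:

        S_pop(t) = S_bg(t) * [ pi + (1 - pi) * S_u(t) ]

    pi      : cure fraction
    S_bg    : exponential background survival, exp(-bg_rate * t)
    S_u     : Weibull survival of the uncured, exp(-(t/scale)**shape)
    """
    s_bg = math.exp(-bg_rate * t)
    s_u = math.exp(-((t / scale) ** shape))
    return s_bg * (pi + (1 - pi) * s_u)
```

With no background mortality the curve plateaus at the cure fraction `pi` as t grows, which is the behaviour that identifies the cured subgroup in these models.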


Hypertension is a major long-term health condition and a leading modifiable risk factor for cardiovascular disease and death. The aim of this study was to examine the major factors that affect the survival time of hypertension patients under follow-up. We considered a total of 430 random samples of hypertension patients who had been under follow-up at Yekatit-12 Hospital in Ethiopia from January 2013 to January 2019. Four parametric accelerated failure time distributions (exponential, Weibull, lognormal and log-logistic) were used to analyse the survival probabilities of the patients. The Kaplan-Meier estimation method and log-rank tests were used to compare the survival experience of patients with respect to different covariates. The Weibull model was selected as the best fit to the data. The results indicate that the baseline age of the patient, place of residence, family history of hypertension, khat intake, blood cholesterol level, hypertension disease stage, adherence to treatment and related disease were significantly associated with the survival time of hypertension patients. Factors such as gender, tobacco use, alcohol use, diabetes mellitus status and fasting blood sugar were not significantly associated. Society and all stakeholders should be aware of the consequences of the factors that can influence the survival time of hypertension patients.
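Fitting a parametric Weibull survival model to right-censored follow-up data amounts to maximizing a likelihood in which deaths contribute the density and censored patients contribute the survival function; a sketch with SciPy (not the authors' code; the data and starting values in the test are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def weibull_negloglik(params, t, event):
    """Negative log-likelihood for right-censored Weibull data:
    events contribute log f(t), censored observations log S(t)."""
    log_shape, log_scale = params            # optimize on the log scale
    k, lam = np.exp(log_shape), np.exp(log_scale)
    z = (t / lam) ** k
    log_f = np.log(k / lam) + (k - 1) * np.log(t / lam) - z
    log_s = -z
    return -np.sum(np.where(event == 1, log_f, log_s))

def fit_weibull(t, event):
    """Maximum likelihood fit; returns (shape, scale)."""
    res = minimize(weibull_negloglik,
                   x0=[0.0, np.log(np.mean(t))],
                   args=(np.asarray(t, float), np.asarray(event)),
                   method="Nelder-Mead")
    return np.exp(res.x)
```

Comparing the maximized log-likelihoods (or AIC) of the four candidate distributions is the usual way a "best fit" such as the Weibull here is selected.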


In general, a characteristic of correlated clustered time-to-event data is that the individuals (subjects, rats, etc.) within each cluster share common genetic or environmental factors, so that the failure times within each cluster might be correlated. Dependence induced by clustering must be taken into account appropriately in order to obtain valid inferences on the questions of interest. If the failure times represent the same type of time-to-event, we refer to them as “parallel” event data. Recurrent event times, such as depressive episodes of the same subject, present another case of clustered correlated survival times. Many multivariate survival analysis models (Lin et al., 2000; Pena, Strawderman, and Hollander, 2001; Huang and Wang, 2005) have been proposed to deal with recurrent failure data. Here, the “cluster” is the subject on whom recurrent events are observed. The first part of this dissertation focuses on new methods for regression analysis of clustered correlated time-to-event data.


We examined patients with HGGs who underwent preoperative conventional and advanced MR imaging (perfusion, DTI). LGGs were not analyzed due to the small number of cases. The patient demographics are shown in Table 1. Exclusion criteria were any surgery, radiation therapy, or chemotherapy of a brain tumor before inclusion in the study, as well as lack of histopathologic diagnosis, missing imaging data (pre- or postoperative), or presence of artifacts. Treatment was not used to exclude patients. All patients were treated under the same protocol. After resection, they received chemotherapy/radiation, including bevacizumab (Temodar). All patients underwent biopsy or surgical resection of the tumor with histopathologic diagnosis based on WHO criteria. Postoperative scans were analyzed to determine the extent of resection. The study was approved by the institutional review board and was compliant with the Health Insurance Portability and Accountability Act.

The main advantage of this approach is that it is simple and can be implemented using existing statistical packages. Tsiatis (1997) argued that using estimated values from a repeated-measures random effects model for the covariate process is superior to naive methods in which one maximizes the partial likelihood of the Cox model using the observed covariate values. Ye, Lin, and Taylor (2008) and Albert and Shih (2009) have argued that there are two main disadvantages of a simple TSA: (1) it may provide biased estimates, especially when the longitudinal process and the survival process are strongly associated; and (2) it does not incorporate the uncertainty of estimation in the first stage into the second stage, possibly leading to under-estimation of the standard errors. We evaluate a number of scenarios to assess the validity of these assumptions in simulation studies.


Deep learning techniques have recently drawn attention in bioinformatics because of their automatic capture of nonlinear relationships from their input and their flexible model design. Several deep learning models that incorporate a standard Cox-PH model as an output layer have been proposed for predicting patient survival. DeepSurv combines a standard Cox-PH regression with a deep feed-forward neural network to improve survival prediction and, eventually, build a recommendation system for personalized treatment [16]. DeepSurv has achieved competitive performance compared to the standard Cox-PH model alone and random survival forests (RSFs). However, the limitation of DeepSurv is that only very low-dimensional clinical data were examined, where the number of variables was less than 20. Cox-nnet, an artificial neural network for a regularized Cox-PH regression problem, was proposed for high-throughput RNA sequencing data [17]. Overall, Cox-nnet outperformed a regularized Cox-PH regression (alone), RSF, and CoxBoost. In Cox-nnet, the top-ranked hidden nodes, which are the latent representations of the gene expression data, are associated with patient survival, and each hidden node may implicitly represent a biological process. In a similar fashion, SurvivalNet adopted a Bayesian optimization technique to automatically optimize the structure of a deep neural network [18]. SurvivalNet produced slightly better performance than Cox elastic net (Cox-EN) and RSF. Intriguingly, a well-trained SurvivalNet can generate a risk score for each node by a risk backpropagation analysis.
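The Cox-PH output layer these models share is trained with the negative log partial likelihood of the predicted risk scores; a NumPy sketch of that loss (a simplified version assuming distinct event times, with illustrative names, not the DeepSurv or Cox-nnet code):

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative log partial likelihood of the Cox model, the kind of
    loss placed on top of the network's output in DeepSurv-style models.

    risk_scores : predicted log-hazard per patient (network output)
    times       : observed event/censoring times (assumed distinct)
    events      : 1 if the event occurred, 0 if censored
    """
    order = np.argsort(-times)   # sort patients by descending time
    risk = risk_scores[order]
    ev = events[order]
    # After sorting, the risk set of patient i is the prefix 0..i, so a
    # cumulative log-sum-exp gives log sum_{j in riskset(i)} exp(risk_j).
    log_risk_set = np.logaddexp.accumulate(risk)
    # Only uncensored patients contribute a term to the partial likelihood.
    return -np.sum((risk - log_risk_set)[ev == 1])
```

Minimizing this loss with respect to the network weights (via backpropagation through `risk_scores`) is what lets a neural network replace the linear predictor of the classical Cox model.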
