In this paper, survival data analysis is performed by applying Generalized Entropy Optimization Methods (GEOM). It is known that all statistical distributions can be obtained as MaxEnt distributions by choosing the corresponding moment functions. However, Generalized Entropy Optimization Distributions (GEOD) in the form of MinMaxEnt and MaxMaxEnt distributions, which are obtained on the basis of the Shannon measure and supplementary optimization with respect to characterizing moment functions, represent the given statistical data more exactly. For this reason, survival data analysis by GEOD acquires a new significance. In this research, the life table data for engine failure (1980) are examined. The performance of the GEOD is assessed by the Chi-Square criterion, the Root Mean Square Error (RMSE) criterion, the Shannon entropy measure, and the Kullback-Leibler measure. Comparisons of the GEOD with each other in these different senses are then carried out.
In this section, we illustrate the applications of the proposed generalized transformation models and compare them with the Cox proportional hazards model and the Zeng et al. transformation cure model by analyzing data from the First National Health and Nutrition Examination Survey Epidemiologic Follow-up Study (NHANES1). The NHANES1 data set is from the Diverse Populations Collaboration (DPC), a pooled database contributed by a group of investigators to examine issues of heterogeneity of results in epidemiological studies. The database includes 21 observational cohort studies, 3 clinical trials, and 3 national samples. In the NHANES1 dataset, information on 14,407 individuals was collected in four cohorts from 1971 to 1992. In this analysis, we use data from two of the four cohorts: the black female cohort and the black male cohort. After dropping all missing observations, a total of 2027 patients remained in these two cohorts, including 1265 black females and 762 black males. Survival times of the 2027 patients are used as the response variable. The endpoint is the overall survival time collected in 1992. In the two cohorts, 848 patients, about 40% of the total, had died by the end of follow-up, with a maximum survival time of 7691 days. The survival times of 1179 patients were right censored; among them, 115 patients had survival times longer than 7691 days. We consider these 115 patients as cured subjects.
The local similarity matrices of nominal attributes and the weights of attributes are often defined manually by domain experts such as medical specialists [8, 16–18]. This works well for straightforward domains with low complexity. However, in complex domains even clinical experts in the same field may have different views on the impact of an attribute on the disease of interest. A more objective approach is to derive the similarity from the data in the case base. A number of CBR algorithms are able to learn local similarities from the case base itself. However, many of them depend on one or more solution-describing attributes. In clinical contexts, this is often the case for the attribute “applied therapy”. A special case where such CBR-based systems struggle is the analysis of data from randomized clinical trials. Here, this dependency would cause a huge bias, because therapy arms (novel therapy against gold standard or placebo therapy) are usually randomized. As an alternative, a similarity measure depending on “overall survival time” may make more sense, as it is considered authentic in assessing the success of clinical trials. However, the authors of this article are not aware of an existing similarity measure with an explicit focus on “overall survival time”.
(ii) types of censoring: in many data sets, all one really knows about the timing of the event is that it happened between time a and time b; this is interval censoring and is somewhat more complicated than standard right censoring (where some people have not yet had the event when the analysis is performed). If all data are either interval- or right-censored, then this can easily be treated as discrete-time data. However, if the failure time is known exactly for some cases but only within an interval for others, then no clearly superior method of analysis is known, at least when covariates are present. Several variants, with a comparative assessment, are discussed in . 8. whether the software is special purpose or is part
In 1897, a Paris-born engineer named Vilfredo Pareto showed that the distribution of wealth in Europe followed a simple power-law pattern, which essentially meant that the extremely rich hogged most of a nation's wealth (New Scientist, 19th August 2000, p 22). Economists later realised that this law applied only to the very rich, and not necessarily to how wealth was distributed among the rest. Mathematically, however, a Pareto distribution has often been used to quantify the distribution of wealth in the world. It is also called an “ABC analysis” or a power law (a polynomial relationship, i.e. an expression constructed from one or more variables and constants, using only the operations of addition, subtraction, multiplication, and constant positive whole-number exponents). It exhibits a probability distribution that coincides with social, scientific, geophysical, and many other types of observable phenomena.
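As a concrete sketch, the heavy tail behind Pareto's observation can be expressed through the distribution's survival function; the parameter values below are illustrative, not taken from the text:

```python
def pareto_tail(x, x_m=1.0, alpha=1.16):
    """Complementary CDF P(X > x) of a Pareto distribution with
    minimum value x_m and tail index alpha (illustrative values)."""
    if x < x_m:
        return 1.0
    return (x_m / x) ** alpha

# A tail index near 1.16 reproduces the classic "80/20" reading of
# Pareto's result: roughly the top 20% hold about 80% of the wealth.
share_above_10 = pareto_tail(10.0)  # fraction with wealth above 10 * x_m
```

The slow polynomial decay of `pareto_tail`, compared with the exponential decay of, say, a normal tail, is what makes extreme wealth far more probable under a power law.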
Peterson (1977) expressed the finite-sample censored survivorship function as an explicit function of two empirical sub-survival functions, which has the strong consistency property. The Kaplan-Meier estimator for the survival function in the censored data problem can thus be expressed for finite samples as an explicit function of two empirical sub-survival functions. Jan et al. (2005) attached the non-censored rate as a weight for censored observations in the case of a high proportion of censoring, which makes the survival estimates less biased. Shafiq et al. (2007) proposed a new weight that gives non-zero weight to the last censored observation, in order to avoid a zero probability for it. Many of the popular nonparametric two-sample test statistics for censored survival data, such as the log-rank (Mantel, 1966), generalized Wilcoxon (Gehan, 1965), and Peto-Peto (1972) test statistics, have been shown to be special cases of a general two-sample statistic, differing only in the choice of weight function (Tarone and Ware, 1977; Gill, 1980). This work has been extended to a general s-sample statistic (Tarone and Ware, 1977; Andersen et al., 1982) which includes the s-sample log-rank (Breslow, 1970) and generalized Wilcoxon (Prentice, 1978) tests.
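The Kaplan-Meier estimator discussed above can be sketched directly from its product-limit definition; the data below are invented purely for illustration:

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimator sketch.
    times: observed times; events: 1 = event observed, 0 = right-censored.
    Returns (distinct event times, survival estimates at those times)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    ts = [times[i] for i in order]
    es = [events[i] for i in order]
    n = len(ts)
    at_risk, surv = n, 1.0
    out_t, out_s = [], []
    i = 0
    while i < n:
        t, d, leaving = ts[i], 0, 0
        while i < n and ts[i] == t:   # group ties at time t
            d += es[i]                # events at t
            leaving += 1              # events + censorings at t
            i += 1
        if d > 0:
            surv *= 1.0 - d / at_risk
            out_t.append(t)
            out_s.append(surv)
        at_risk -= leaving
    return out_t, out_s

t, s = kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0])
```

Censored observations never trigger a drop in the curve themselves; they only shrink the risk set, which is the mechanism the weighting schemes of Jan et al. and Shafiq et al. modify.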
– Where random censoring is an inevitable feature of a study, it is important to include explanatory variables that are probably related to both censoring and survival time — e.g., seriousness of illness in the first instance, grade-point average in the second.
Some association between severity of disability and missing information may be due to the child having died before the assessment could be made, or through lack of follow-up (for covariate data rather than death information). The Kaplan-Meier estimates of survival by region and by severity or missingness of impairment variables indicate there are many likely causes of missing covariate information (Figure 1). For example, for both manual and ambulatory variables in Mersey, missing data show a strong association with early deaths, although the absolute number of cases with missing information on these two variables for the Mersey region is small. In the North of England, Oxford and Scotland, those with missing covariate information on ambulatory and manual variables constitute a mixture of early deaths and late censored observations, contrasting with Northern Ireland, in which almost all those with missing information on these two variables are late censored observations. For those with missing cognitive impairment, Mersey again constitutes mainly deaths, and the North of England a mixture of early deaths and late censored observations; for Northern Ireland and Oxford, the missing covariate observations consist mainly of censored survival times. In Scotland, where the proportion of cases with missing information on severity of cognitive impairment is high, a significant proportion of those with missing cognitive data have died. For those with missing visual impairment, a large number of early deaths give a survival pattern not too dissimilar to those who are severely impaired, for all five regions.
Marenco et al. developed a Query Integrator System (QIS) to address robust data integration from heterogeneous data sources in the biosciences in 2004 . An ontology server was used in QIS to map data sources' metadata to the concepts in standard vocabularies . Cheung et al. developed a prototype web application called YeastHub based on a Resource Description Framework (RDF) database to support the integration of different types of yeast genome data from different sources in 2005 . Lam et al. used the Web Ontology Language (OWL) to integrate two heterogeneous neuroscience databases  in 2005. In a follow-up study, Lam et al. designed AlzPharm, which used RDF and its extension vocabulary, RDF Schema (RDFS), to facilitate both data representation and integration . Smith et al. built the LinkHub system leveraging Semantic Web technologies (i.e., RDF and RDF queries) to facilitate cross-database queries and information retrieval in proteomics in 2007 . In 2008, Shironoshita et al. introduced a query formulation method, named Semantic caBIG Data Integration (semCDI), to execute semantic queries across multiple data services in the cancer Biomedical Informatics Grid (caBIG). Mercadé et al. developed an ontology-based application called Orymold for dynamic gene expression data annotation, integration and exploration in 2009. Based on the QIS , Luis et al. designed an automated approach for integrating federated databases using ontological metadata mappings in 2009 . Chisham et al. created the Comparative Data Analysis Ontology (CDAO) and developed the CDAO-Store system to support data integration for phylogenetic analysis in 2011 . Kama et al. built a Data Definition Ontology (DDO) using D2RQ (i.e., a platform that provides RDF-based access over relational databases) for accessing heterogeneous clinical data sources . Pang et al. developed BiobankConnect to speed up the process of integrating comparable data from different biobanks into a pooled data set, using ontological and lexical indexing, in 2014 . Ethier et al. designed the Clinical Data Integration Model (CDIM), based on the Basic Formal Ontology (BFO) , to support biomedical data integration in 2015 . Mate et al.
Most analyses of survival data use primarily Kaplan–Meier plots, log-rank tests and Cox models. We have described the rationale and interpretation of each method in previous papers of this series, but here we have sought to highlight some of their limitations. We have also suggested alternative methods that can be applied when either the data or a given model is deficient, or when more difficult or specific problems are to be addressed. For example, analysis of recurrent events can make an important contribution to the understanding of the survival process, and so investigating repeat cancer relapses may be more informative than concentrating only on the time until the first. More fundamentally, missing data are a common issue in data collection that in some cases can seriously flaw a proposed analysis. Such considerations may be highly relevant to the analysis of a data set, but are rarely mentioned in analyses of survival data. One possible reason for this is a perceived lack of computer software, but many of the approaches discussed here are currently incorporated into existing commercial statistical packages (e.g. SAS, S-Plus, Stata) and freeware (e.g. R). On the other hand, the desire may be to ‘keep things simple for the readership’. This view is reasonable, but is valid only where a simple analysis adequately represents the survival experience of patients in the study. Ensuring the analyses are appropriate is therefore crucial. More advanced survival methods can derive more information from the collected data; their use may admittedly convey a less straightforward message, but at the same time could allow a better understanding of the survival process.
When used inappropriately, statistical models may give rise to misleading conclusions. Checking that a given model is an appropriate representation of the data is therefore an important step. Unfortunately, this is a complicated exercise, and one that has formed the subject of entire books. Here, we aim to present an overview of some of the major issues involved, and to provide general guidance when developing and applying a statistical model. We start by presenting approaches that can be used to ensure that the correct factors have been chosen. Following this, we describe some approaches that will help decide whether the statistical model adequately reflects the survivor patterns observed. Lastly, we describe methods to establish the validity of any assumptions the modelling process makes. We will illustrate each using the two example datasets (a lung cancer trial and an ovarian cancer dataset) that were introduced in the previous papers (Bradburn et al, 2003; Clark et al, 2003).
De Angelis et al. (1999) incorporated background mortality into the mixture cure proportional hazards (MCPH) model under the exponential and Weibull distributions for uncured patients. Phillips, Coldman, and McBride (2002) extended De Angelis’s model to estimate the prevalence of cancer. Sposto (2002) described three link functions for cure fraction estimation. Lambert et al. (2006) incorporated background mortality into the parametric Weibull cure model via the Newton-Raphson algorithm, which is implemented in Stata (Lambert, 2007). Lambert et al. (2010) also proposed a finite mixture of Weibull distributions to add flexibility, at the cost of estimating more Weibull parameters. Royston and Lambert (2011) discussed topics such as time-dependent and continuous covariates in relative survival. Andersson et al. (2011) proposed a flexible parametric survival model with cubic splines to estimate the cure fraction in population-based studies, which, however, does not allow covariates to be included in the cure fraction function. All these previous studies on cure models with background mortality used maximum likelihood to estimate the parameters and the cure fraction.
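A minimal numerical sketch of the mixture cure survival function with background mortality discussed above, assuming an exponential background hazard and a Weibull distribution for the uncured; all parameter values are hypothetical:

```python
import math

def cure_survival(t, pi=0.3, lam=0.05, k=1.2, bg_rate=0.01):
    """All-cause survival under a mixture cure model with background mortality:
        S(t) = S_bg(t) * (pi + (1 - pi) * S_u(t))
    where pi is the cure fraction, S_u is Weibull survival for the uncured,
    and S_bg is exponential background survival (hypothetical parameters)."""
    s_bg = math.exp(-bg_rate * t)
    s_u = math.exp(-((lam * t) ** k))
    return s_bg * (pi + (1.0 - pi) * s_u)
```

As t grows, the disease-specific factor levels off at the cure fraction pi while the background factor keeps decreasing, which is exactly why population-based cure models must separate the two sources of mortality.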
survival from AD symptom onset can have a great impact on society. In this paper, we assessed the effect of education on survival in subjects with autopsy-confirmed AD. Our method is critical for analyzing data of this sort, since autopsy confirmation leads to doubly truncated survival times, which can result in biased hazard ratio estimators. While AD studies that do not use autopsy confirmation avoid double truncation, conclusions based on such studies may be unreliable due to the inaccuracy of clinical diagnosis. This may explain the inconclusive findings of the two meta-analyses conducted by Paradise et al. (2009) and Meng and D’Arcy (2012), who used studies with clinically diagnosed AD subjects to examine the effect of education on survival. Applying our proposed method to an autopsy-confirmed AD study, we found that higher education was associated with increased survival. However, these effects were not statistically significant. This may be due to our small sample size and the fact that our sample was highly educated (range = 12-20 years). When double truncation was ignored, we found no effect of education on survival.
Hypertension is a major long-term health condition and a leading modifiable risk factor for cardiovascular disease and death. The aim of this study was to examine the major factors that affect the survival time of hypertension patients under follow-up. We considered a total of 430 randomly sampled hypertension patients who had been under follow-up at Yekatit-12 Hospital in Ethiopia from January 2013 to January 2019. Four parametric accelerated failure time (AFT) distributions (exponential, Weibull, log-normal and log-logistic) were used to analyse the survival probabilities of the patients. The Kaplan-Meier estimation method and log-rank tests were used to compare the survival experience of patients with respect to different covariates. The Weibull model was selected as the best fit to the data. The results indicate that baseline age, place of residence, family history of hypertension, khat intake, blood cholesterol level, hypertension disease stage, adherence to treatment and related disease were significantly associated with the survival time of hypertension patients, whereas gender, tobacco use, alcohol use, diabetes mellitus status and fasting blood sugar were not. Society and all stakeholders should be aware of the consequences of these factors, which can influence the survival time of hypertension patients.
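The Weibull accelerated failure time model of the kind used in this study can be sketched as follows; the shape, scale and coefficients here are hypothetical illustrations, not the paper's estimates:

```python
import math

def weibull_aft_survival(t, x, beta, shape=1.4, scale0=60.0):
    """Weibull AFT survival sketch: covariates act by rescaling time,
        S(t | x) = exp(-(t / scale(x)) ** shape),
        scale(x) = scale0 * exp(sum_j beta_j * x_j).
    A positive coefficient stretches survival time (protective effect).
    All numerical values are hypothetical."""
    scale = scale0 * math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return math.exp(-((t / scale) ** shape))
```

Under this parameterization, exp(beta_j) is the time ratio associated with a one-unit increase in covariate j, which is how AFT coefficients (unlike Cox hazard ratios) are usually read.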
In general, a characteristic of correlated clustered time-to-event data is that the individuals (subjects, rats, etc.) within each cluster share common genetic or environmental factors, such that the failure times within each cluster might be correlated. Dependence induced by clustering must be taken into appropriate account in order to obtain valid inferences on questions of interest. If the failure times represent the same type of time-to-event, we refer to them as “parallel” event data. Recurrent event times, such as depressive episodes from the same subject, present another case of clustered correlated survival times. Many multivariate survival analysis models (Lin et al., 2000; Pena, Strawderman, and Hollander, 2001; Huang and Wang, 2005) have been proposed to deal with recurrent failure data. Here, the “cluster” is the subject on whom recurrent events are observed. The first part of this dissertation focuses on new methods for regression analysis of clustered correlated time-to-event data.
Survival analysis is a hotspot in statistical research for modeling time-to-event information while handling data censorship, and it has been widely used in many applications such as clinical research, information systems and other fields with survivorship bias. Many works have been proposed for survival analysis, ranging from traditional statistical methods to machine learning models. However, the existing methodologies either rely on counting-based statistics over segmented data, or presuppose a particular event probability distribution w.r.t. time. Moreover, few works consider sequential patterns within the feature space. In this paper, we propose a Deep Recurrent Survival Analysis model which combines deep learning for conditional probability prediction at a fine-grained level of the data with survival analysis for tackling the censorship. By capturing the time dependency through modeling the conditional probability of the event for each sample, our method predicts the likelihood of the true event occurrence and estimates the survival rate over time, i.e., the probability of the non-occurrence of the event, for the censored data. Meanwhile, without assuming any specific form of the event probability distribution, our model shows great advantages over previous works in fitting various sophisticated data distributions. In experiments on three real-world tasks from different fields, our model significantly outperforms the state-of-the-art solutions under various metrics.
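The core bookkeeping of such a discrete-time formulation, turning per-interval conditional event probabilities into a survival curve, can be sketched in a few lines; the hazard values below are invented for illustration:

```python
def survival_curve(cond_hazards):
    """Given h_l = P(event in interval l | survived to interval l),
    the survival rate is the running product S_l = prod_{j <= l} (1 - h_j).
    This chain rule is what lets a recurrent model predict one conditional
    probability per step and still recover the full survival curve."""
    s, curve = 1.0, []
    for h in cond_hazards:
        s *= 1.0 - h
        curve.append(s)
    return curve

curve = survival_curve([0.1, 0.2, 0.05])  # ≈ [0.9, 0.72, 0.684]
```

For a right-censored sample, only the terms up to the censoring interval enter the likelihood, which is how censorship is handled without discarding the observation.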
We examined patients with HGGs who underwent preoperative conventional and advanced MR imaging (perfusion, DTI). LGGs were not analyzed due to the small number of cases. The patient demographics are shown in Table 1. Exclusion criteria were any surgery, radiation therapy, or chemotherapy of a brain tumor before inclusion in the study, as well as lack of histopathologic diagnosis, missing imaging data (pre- or postoperative), or presence of artifacts. Treatment was not used to exclude patients. All patients were treated under the same protocol. After resection, they received chemotherapy/radiation, including bevacizumab (Temodar). All patients underwent biopsy or surgical resection of the tumor with histopathologic diagnosis based on WHO criteria. Postoperative scans were analyzed to determine the extent of resection. The study was approved by the institutional review board and was compliant with the Health Insurance Portability and Accountability Act.
The main advantage of this approach is that it is simple and can be implemented using existing statistical packages. Tsiatis (1997) argued that using estimated values from a repeated measures random effects model for the covariate process is superior to naive methods in which one maximizes the Cox partial likelihood using the observed covariate values. Ye, Lin, and Taylor (2008) and Albert and Shih (2009) have argued that there are two main disadvantages of a simple TSA: 1) it may provide biased estimates, especially when the longitudinal process and the survival process are strongly associated; and 2) it does not incorporate the uncertainty of estimation in the first stage into the second stage, possibly leading to under-estimation of the standard errors. We evaluate a number of scenarios to assess the validity of these assumptions in simulation studies.
Deep learning techniques have recently drawn attention in bioinformatics because of their automatic capture of nonlinear relationships from the input and their flexible model design. Several deep learning models that incorporate a standard Cox-PH model as an output layer have been proposed for predicting patient survival. DeepSurv combines a standard Cox-PH regression with a deep feed-forward neural network in order to improve survival prediction, and ultimately to build a recommendation system for personalized treatment . DeepSurv achieved competitive performance compared to standard Cox-PH alone and random survival forests (RSFs). However, its limitation is that only very low-dimensional clinical data were examined, where the number of variables was less than 20. Cox-nnet, an artificial neural network for a regularized Cox-PH regression problem, was proposed for high-throughput RNA sequencing data . Overall, Cox-nnet outperformed a regularized Cox-PH regression (alone), RSF, and CoxBoost. In Cox-nnet, the top-ranked hidden nodes, which are latent representations of the gene expression data, are associated with patient survival, and each hidden node may implicitly represent a biological process. In a similar fashion, SurvivalNet adopted a Bayesian optimization technique to automatically optimize the structure of a deep neural network . SurvivalNet produced slightly better performance than Cox elastic net (Cox-EN) and RSF. Intriguingly, a well-trained SurvivalNet can generate a risk score for each node by a risk backpropagation analysis.
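All three systems place a Cox-type loss on the network's scalar output; a minimal sketch of the negative log partial likelihood that such models typically minimize (assuming distinct event times; data and scores are invented):

```python
import math

def neg_log_partial_likelihood(risk_scores, times, events):
    """Negative log Cox partial likelihood, the loss a DeepSurv/Cox-nnet-style
    network minimizes over its outputs (risk scores, one per subject).
    Assumes distinct event times; all inputs here are illustrative."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    nll = 0.0
    for rank, i in enumerate(order):
        if events[i]:
            # Risk set: everyone still event-free just before times[i].
            log_denom = math.log(sum(math.exp(risk_scores[j])
                                     for j in order[rank:]))
            nll -= risk_scores[i] - log_denom
    return nll
```

Assigning a higher risk score to a subject who failed earlier lowers this loss, and that gradient signal is what trains the network's weights in place of classical coefficient estimation.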