variable importance

Top PDF variable importance:

Learning temporal weights of clinical events using variable importance

account their importance, in terms of informativeness, when building the predictive models. To mitigate this problem, this study focuses on how to learn the weights automatically when constructing the predictive models in a way that reflects both temporality and importance of the clinical events. To take into account the temporality, each patient’s medical history is segmented into a set of time windows and features are extracted from each of them; for the importance, a random forest model is built using these extracted features to produce variable importance, which is then used as the weight for each feature. To apply these weights, two strategies are proposed: weighted aggregation and weighted sampling. The former aggregates the weighted clinical events from different time windows to form new features, while the latter keeps the original features but samples them with their weights as probabilities when building each tree in the forest. We conclude here that learning weights is significantly beneficial in terms of predictive performance in the weighted sampling strategy. Moreover, weighted aggregation generally diminishes the impact of temporal weighting of the clinical events, irrespective of whether the weights are pre-assigned or learned.
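
The weighted sampling strategy described above can be illustrated with a minimal sketch. The excerpt does not give the exact sampling scheme, so this assumes features are drawn without replacement with probability proportional to their importance; the feature names, scores, and helper functions are hypothetical.

```python
import random

def normalize(importances):
    """Scale non-negative importance scores so they sum to 1."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    return {name: score / total for name, score in importances.items()}

def sample_features(weights, k, rng):
    """Draw k distinct features; each draw is proportional to the remaining weights."""
    remaining = dict(weights)
    chosen = []
    for _ in range(min(k, len(remaining))):
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[n] for n in names], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # without replacement: a feature is used once per tree
    return chosen

# Hypothetical importance scores for features taken from three time windows.
importances = {"lab_w1": 0.05, "lab_w2": 0.15, "lab_w3": 0.40, "drug_w3": 0.40}
weights = normalize(importances)
rng = random.Random(0)
subset = sample_features(weights, k=2, rng=rng)  # candidate features for one tree
```

Repeating `sample_features` once per tree would give features with higher learned weights a larger chance of being considered, which is the intuition behind the strategy rather than a reproduction of the study's implementation.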

Early Prediction and Variable Importance of Certificate Accomplishment in a MOOC

We can find in the literature different studies that aim at the prediction of learning outcomes of students after interacting with ITS environments [3,12]. MOOC environments are different from ITSs, e.g. the latter usually contain more complex and specific exercise players that can provide different features (e.g. about hints) that MOOC environments usually do not have. Additionally, the interaction with videos and the context is different, thus we can expect that the variables that are used to predict learning outcomes in each educational environment differ a bit. We can also find different studies in this direction using MOOC environments, for example towards the prediction of learning gains after the interaction with a Khan Academy instance [18] and also to predict student knowledge status in MOOCs using Open edX [9]. Other works in the literature have approached the prediction of certificate accomplishment using different methods such as LDA [6]. In our work we focus on early prediction of certificate accomplishment, performance of different models and variable importance.

Class Based Variable Importance for Medical Decision Making

The fact that CVI measures the relative effect of a variable between classes and is not weighted by the proportion of nodes in the tree allows for the detection of more nuanced relationships. However, if the goal is to find variables with high relationships and a large portion of classes, a holistic look at the feature importance can be employed. Variables that have both high variable importance and high ratio importance can be identified as affecting many examples in the same way. It may not be easy to infer the direction of the relationship, but by looking at patients on a case by case basis and applying domain expertise, variables can be identified that are influential to a given example and its class prediction. While it is difficult for a single measure to convey a complete picture of a data set, creating a variety of measures to represent different nuances is key to better understanding and insight. In this regard, further exploration of variable importance in regard to inference is essential. Exploring different approaches for calculating the variable effect within the trees may result in more useful measures. For example, employing the Gini index instead of an indicator function or incorporating the actual splitting rule on the nodes into the class importance calculation may present the variables differently. Devising a weighting scheme to give more credence to importance ratios with a larger proportion of nodes in the tree may make it easier to detect variables influencing a larger portion of the population. In future work, we plan to explore these nuances further.

Hillslope characteristics as controls of subsurface flow variability

Both analyses exposed that the water table response at the event scale is less explainable: partial correlation between influential predictors and water table response and explained variance of ensemble tree models was mostly lower than at the seasonal scale and for the entire time. Furthermore, trends of predictor variable importance were more variable at the event scale. Short-term variability of water table dynamics is thus less explainable than longer-term variability. A closer look at the differences among seasons and events revealed that there is a tendency of lower explainability for summer seasons and summer events (by degree of explainability we mean strength of partial correlation between water table response and influential predictors, and degree of ensemble tree model performance). Interestingly, we found a negative correlation between explainability of water table response and rainfall intensity. We also found a moderate relationship between explainability of water table response and AWI; nevertheless, due to the small sample size (AWI only available at event scale) this should be interpreted cautiously. These findings indicate that the mapped predictors explain the observed water table response for time periods with high rainfall intensity (and low AWI) to a lesser degree. We reason that during such periods additional drivers of SSF dynamics play a more important role, e.g. stronger hydrophobicity and thus bypass-flow, exploitation of preferential pathways initiated by high rainfall intensity, etc.

Subgroup Analysis via Recursive Partitioning

Subgroup analysis is an integral part of comparative analysis where assessing the treatment effect on a response is of central interest. Its goal is to determine the heterogeneity of the treatment effect across subpopulations. In this paper, we adapt the idea of recursive partitioning and introduce an interaction tree (IT) procedure to conduct subgroup analysis. The IT procedure automatically facilitates a number of objectively defined subgroups, in some of which the treatment effect is found prominent while in others the treatment has a negligible or even negative effect. The standard CART (Breiman et al., 1984) methodology is inherited to construct the tree structure. Also, in order to extract factors that contribute to the heterogeneity of the treatment effect, variable importance measure is made available via random forests of the interaction trees. Both simulated experiments and analysis of census wage data are presented for illustration.

Aging Stormwater Infrastructure: An Exploration of the Evolution of Aging Bioretention Media, Hydrologic Mitigation, and Water Quality Performance.

Analysis proved that the default number of trees, 500, was sufficient to stabilize the models constructed herein. RF models seek to reduce error by adjusting the number of predictor variables considered for prediction in each tree (represented as "mtry" in the model); therefore, RF models were tuned by adjusting mtry from 2 to 6 and an additional tuning parameter, the minimal node size (min.node.size) for a split to occur, from 1 to 3. The combination of mtry and min.node.size resulting in the lowest OOB Root Mean Square Error (RMSE) was deemed optimal. Due to the random nature of the model and selection of calibration and validation data, variation will occur in each model’s results. Following methods by Tirpak et al. [2018] and Stillwell [2019], an effort was made to minimize the impact of variation by executing each model 15 times. Variable importance was estimated by the permutation procedure for each predictor variable during each model execution and averaged.
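
The permutation procedure averaged over repeated runs can be sketched as follows. This is a simplified illustration, not the study's code: it permutes columns of a supplied data set rather than out-of-bag samples, and the `predict` function is a toy stand-in for a fitted random forest. All names and data are hypothetical.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def permutation_importance(predict, X, y, n_runs=15, seed=0):
    """Mean increase in RMSE when each column is shuffled, averaged over n_runs."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    n_features = len(X[0])
    totals = [0.0] * n_features
    for _ in range(n_runs):
        for j in range(n_features):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            totals[j] += rmse(y, [predict(row) for row in permuted]) - baseline
    return [t / n_runs for t in totals]

# Toy stand-in for a fitted model: the prediction depends on feature 0 only.
def predict(row):
    return 3.0 * row[0]

X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [predict(row) for row in X]
scores = permutation_importance(predict, X, y)
```

Because the toy model ignores feature 1, shuffling it leaves the error unchanged and its score is zero, while shuffling feature 0 raises the error.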

Applying Multi-Output Random Forest Models to Electricity Price Forecast

Removing variables that have a negative effect on the joint function error (as identified in the variable importance analysis explained in subsection 4.1) results in a reduction in both price and demand forecasting errors (row “Multivariate RF-CI NA”, in bold in Table 3). This means that when the multivariate analysis is carried out adequately, selecting input variables can favor forecasting for both responses. In our case, the variables related to renewable energy production, day type and fuel gas energy production have been removed, allowing other variables whose influence on the joint function is minor to appear more often and thus improving the algorithm’s forecasting accuracy.

Random Forest for Scale and Item Level Prediction Analysis in the Social Sciences: An Application Using Organizational Deviance Data.

organization’s best interests (Spector, Fox, Penney, Bruursema, Goh, & Kessler, 2006). However, predictive models for deviant behavior are a topic of interdisciplinary scientific concern. Findings from this study are likely of particular interest to disciplines other than I/O, such as Forensic Psychology and Criminology. Consideration over the validity and utility of both measures and prediction models is a topic of growing attention in these fields, culminating in a recent court case (Loomis v. Wisconsin, 2017) over the ethical nature of such algorithms. Findings from the current study provide insight to researchers regarding when, how, and why ML algorithms can outperform standard linear regression techniques while also providing equal or better information about variable importance in predicting social science outcomes. While general fondness for ML algorithms, such as RF, has grown in recent years, few studies seem to address the pragmatic application of ML techniques, which can overcome many of the

Profiling alumni of a Brazilian public dental school

The two-step algorithm analysis allows subjects to be divided into an optimal number of clusters according to continuous and categorical variables. The variable importance for cluster segmentation was ranked by a Chi-square test in which each cluster group was tested against the overall group. Since multiple tests were performed, Bonferroni adjustments were applied to control the false-positive error rate. An alternative importance measure, which has the advantage of placing both types of variables on the same scale, is based on statistical significance values using -log10 of the statistical significance (-log10 P-value). This transformation stretches the original scale from 0 to infinity (instead of a small band from 0 to 1), so that larger values of -log10 of P-value equate to greater significance.
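
A short sketch of the two steps named above, Bonferroni adjustment followed by the -log10 transformation. The excerpt does not state whether the transformation is applied to raw or adjusted P-values; this sketch assumes the adjusted values, and the variable names and P-values are hypothetical.

```python
import math

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def neg_log10(p):
    """Map a p-value in (0, 1] onto [0, infinity); larger means more significant."""
    return -math.log10(p)

# Hypothetical raw p-values for four variables tested against the overall group.
raw = {"age": 0.0001, "sex": 0.02, "income": 0.5, "region": 0.04}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))
importance = {name: neg_log10(p) for name, p in adjusted.items()}
ranking = sorted(importance, key=importance.get, reverse=True)
```

With four tests, a raw P-value of 0.5 is capped at 1 after adjustment and therefore maps to an importance of 0, the bottom of the scale.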

A global model of avian influenza prediction in wild birds: the importance of northern regions

precipitation and temperature. A number of time-dependent variables were included (i.e. mean temperature in January-December) and were manipulated in order to maintain their relevance to collection locations in the Southern Hemisphere. For points with negative latitude values, time-dependent variables were shifted by 6 months, such that months were correctly associated with the austral seasons. Geographic variables included elevation, which has been identified as an important factor in other AIV models [19], and lakes, rivers, and wetlands, which are important to waterfowl. We calculated some layers from existing predictor variables using the Spatial Analyst Tool in ArcMap. The distances from fresh water features and coastline were calculated using the Euclidean Distance Tool. Slope was calculated from elevation and aspect was in turn calculated from slope. Anthropogenic variables included indices of human manipulation, infrastructure, and population density. Due to the importance of chickens and pigs in the transmission of AIV to humans, we included predicted poultry and pig densities [26,27]. Not all layers included Antarctica, so the entire continent was excluded from the study area (layers trimmed at −57° latitude) to prevent biases in calculation. We then used the Geospatial Modeling Environment (GME; [28]) to intersect, or extract the values of the predictor variables at the same geographic coordinates as the sample data points. GME adds the values of each predictor variable to the database as an additional column. The intersected database is then imported into ArcMap for visualization. Layers and metadata are stored at and can be obtained from the Ecological Wildlife Habitat Analysis of the Land- and Seascape (EWHALE) Lab at the University of Alaska Fairbanks (UAF).
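
The six-month shift for Southern Hemisphere points can be sketched as a rotation of the twelve monthly values. This is an illustrative reading of the description above, not the authors' GIS workflow; the function name and temperature values are hypothetical.

```python
def align_to_seasons(monthly_values, latitude):
    """Rotate a 12-element Jan..Dec list by 6 months for southern latitudes."""
    if len(monthly_values) != 12:
        raise ValueError("expected 12 monthly values")
    if latitude < 0:
        # The July value moves into the January slot, so each slot
        # refers to the same season in both hemispheres.
        return monthly_values[6:] + monthly_values[:6]
    return list(monthly_values)

# Hypothetical mean monthly temperatures (Jan..Dec) for a southern site.
temps = [25, 24, 22, 18, 14, 11, 10, 12, 15, 18, 21, 24]
aligned = align_to_seasons(temps, latitude=-34.0)
```

After the rotation, the first slot holds the mid-winter value for the southern site, matching what January represents at northern latitudes.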

The Relationship Between GHR Recruitment And Employee Engagement In Jordanian Public Universities

study showed a positive relationship between GHR recruitment and employee engagement. These results demonstrated the importance of the independent variable of GHR recruitment practices such as recruitment resources, recruitment strategy, recruitment ethics, and recruitment sources quality evaluation. This study also showed the degree of importance of the independent variable dimensions, based on the mean response of the study sample regarding the GHR recruitment process and employee engagement in public Jordanian universities, as follows: recruitment strategy (3.13), recruitment resources (4.67), recruitment ethics (2.99) and recruitment sources quality evaluation (3.36).

A SUCCESFULL WAY TO SELL HANDICRAFTS IN ALBANIA: EMPIRICAL EXPLORATION OF THE ROLE OF WEB MARKETING

This article intends to explore the role of web marketing in the sale of handicrafts in Albania. The connection between these two concepts is based on analysis of data collected from craft units in Albania. This research supplies us with information on the positive or negative relationship between the dependent variable “web marketing” and the independent variable “selling handicraft products”. This study also focuses on one of the problems of web marketing practices, its value, or more precisely the contribution of web marketing in selling craft products in Albania. Many artisans pay a lot about web marketing. Web marketing professionals try to prove how web marketing influences this area (handicraft), for example how web marketing affects the growth of profits, contributes to the spread of the market and supports consumer satisfaction. The research question in this paper is: "Is there a web marketing impact on the sale of handicrafts in Albania?" Research regarding the measurement of web marketing and the sale of craft products, and the connectivity between them, is reflected in this paper.

Wildlife and Livestock Grazing Effects on Some Physical and Chemical Soil Properties (Case Study: Kalmand-Bahadoran Arid Rangelands of Yazd Province)

particles become closer together, undergoing a reduction in porosity and an increase in density (Sanadgol, 2002; Huang et al., 2007; Afrah et al., 2010). Another effect of grazing is a reduction in the amount of water penetration in the soil, thus decreasing its moisture content. Sanadgol (2006) found there to be little hydrological difference between various pastures under balanced and continuous grazing, but in arid and semi-arid rangelands there is no significant difference in the water permeability of the soil when under light, medium or no grazing. Owing to the fact that soil is more stable than vegetation and is usually affected after it, by preventing the development of this process in the early stages of degradation it can be hoped to restore vegetation easily with the lowest costs and over the shortest possible time (Moghaddam, 2009). Identifying and assessing the value and type of grazing effect will help scientific and systematic range management, which requires adequate knowledge (Jalilvand et al., 2007; Cuevas et al., 2012). Different results have been reported as to the effects of grazing on soil characteristics, which may be due to different climate, soil, vegetation, range management and animal types. Some studies suggest that a reduction in grazing intensity causes a significant decrease in bulk density and increases moisture content (Eteraf & Telvari, 2005; Yong-Zhong et al., 2005; Moradi et al., 2008). With increasing grazing intensity, acidity and calcium carbonate will increase, whilst the amount of organic matter and electrical conductivity will significantly decrease (Potter et al., 2001; Moosavi et al., 2001; Aghasi et al., 2006; Pei et al., 2008; Shifang et al., 2008). McDowell et al. (2004) investigated the effects of deer grazing on soil quality in southern New Zealand. The results showed that bulk density in the paddock was 1.06 g/cm³ and 1.10 g/cm³, one day and six weeks after grazing, respectively.
In general, it can be said that grazing causes changes in soil properties (Kohandel et al., 2008). To manage a rangeland ecosystem, these changes should be identified to avoid unwanted and harmful conversions. Evaluating the effects of grazing is essential towards finding a way for effective management and the adoption of a strategy for stocking in rangelands. Owing to the importance of wildlife in range management and their role in desert rangelands, the problems of livestock grazing and the limited information about their grazing effects in desert areas, these studies are necessary in arid ecosystems. Because of the importance and necessity of identifying soil properties in arid rangeland management and

OUTLIER DETECTION FOR DYNAMIC DATA STREAMS USING WEIGHTED K-MEANS

This paper presents a new k-means type clustering algorithm that can calculate weights for the variables. This method is efficient for dynamic data streams in order to overcome the global optimum problems. The variable weights produced by the algorithm measure the importance of each variable in clustering and can be used in variable selection, in which the data items with similar properties are grouped into clusters. The new approach of applying this weighted k-means on dynamic data streams is carried out in order to have efficient outlier detection within the user-specified threshold value.

Flux weakening control of Permanent Magnet Machine based aircraft electric starter generator

the current limiter (3). Fig. 11 shows the resultant stabilised result when using the proposed current limiting trajectory (19). It is clearly seen that the proposed current limiting technique has a crucial stabilising effect on system: if the unstable area of operation in Fig. 10 increases with the speed, in Fig. 11 it is completely eliminated when the proposed variable current limiting mechanism is employed.

The role of the most recent prior period's price in value relevance studies : a thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Finance at Massey University, Palmerston North, New Zealand

that without past price as an additional explanatory variable, value relevance models can be misspecified due to a missing variable problem, since current trailing earnings can act as a proxy for the strong forward-looking information provided by the most recent prior period’s price. Earnings and equity prices are both non-stationary, so they move together over time, thus potentially creating a spuriously significant statistical relationship between earnings and next period’s price when a non-autoregressive empirical model is used to explain prices. It is not surprising that the most recent prior period’s price is important for explaining subsequent prices, since it is well-known that the level of equity prices follows an autoregressive, non-stationary process (e.g., Aggarwal and Kyaw, 2004). The first difference in equity price appears to follow a stationary, non-persistent process, however, as noted by Jeon and Jang (2004). We therefore subsequently use change in equity price as the dependent variable, for econometric reasons, to explore the value relevance of earnings, thus further improving the model specification. When the model specification is improved by utilising change in price as the dependent variable, the results reveal a random walk process, and earnings play only a weak role in predicting or explaining changes in price.

Personal Indebtedness, Spatial Effects and Crime

Just as the importance of personal indebtedness, noted a moment ago, varies across crime type, the same is true of other explanatory variables. A good example of this is our finding that income is directly and negatively associated with thefts of cars, but is not significantly associated with thefts from cars. A plausible explanation for this is that in areas with higher incomes, cars have more sophisticated alarm and immobiliser systems and are therefore harder to steal. Stealing from a car, meanwhile, is perhaps as easy for a more expensive car as for an inexpensive car.

Evaluating the Node Importance in Lifeline Systems Based on Variable Fuzzy Clustering

After selecting and calculating the node importance evaluation indices, an effective classification and sorting method is needed. In recent years, a large number of practical results have shown that the variable fuzzy clustering method is a good option. It made a breakthrough over the traditional fuzzy clustering iterative model. The clustering results are gained in a more balanced way by dynamic iteration of index weight and relative membership degree with cluster center [8-9]. Based on this, the variable fuzzy clustering method is adopted for clustering analysis of lifeline system nodes.

Analyzing Determinants of Tax Morale based on Social Psychology Theory: Case study of Iran

Religious practice can prohibit illegal behaviors because it has a sanctioning system in itself that legitimizes and supports social values (Hirschi & Stark, 1969). Torgler (2006) also elaborates on the idea of religiosity as a factor that affects tax morale. It is based on survey questions that measure church attendance, religious education, church participation, importance of religion, religious guidance related to good and evil, and trust in the church as an organization. All variables are found to have a significant and positive effect on tax morale (Torgler, 2006). Criminology literature has also reported a negative correlation between religious membership and crime (see for example Torgler & Schneider, 2007:449). Thus, as religiosity decreases the levels of criminal actions, it can increase tax morale. According to the equity theory, when paying taxes to government is perceived as a patriotic duty, fairness can be seen in a taxpayer’s loyalty to his country (Torgler, 2003). This national pride indicates an individual’s behavior within organizations, groups and society (Torgler & Valev, 2010). An increase in national pride would lead to increased tax compliance in a country. The importance-of-politics variable also has an effect on tax morale. This could be better explained with equity theories. The relationship between taxpayers and government would involve not only the provision of public goods, but also a psychological relation including the way both parts treat each other and the fairness of the procedures leading to political outcomes (Lago-Penas & S. Lago-Penas, 2010). According to Schnellenbach (2006), tax evasion might be considered as an instrument to punish Leviathan governments that are going to increase tax revenues rather than regarding the preferred policies of the taxpayers.

Variable selection strategies and its importance in clinical prediction modelling

hypertension risk in the Chinese population. A prospective cohort of 2506 ethnic Chinese community individuals in Taiwan was used to develop the model. Two different models, a clinical model with five variables and a biochemical model with eight variables, were developed. The objective was to identify high-risk Chinese community individuals with hypertension risk using the newly developed model. The variables for the model were selected using the stepwise selection method, the most common method for variable selection that permits using both forward and backward procedures iteratively in model building. Generally, to apply a stepwise selection procedure, a set of candidate variables need to be identified first. However, information about candidate variables and the number of variables considered in stepwise selection was absent in this study. Although it was indicated that the selected variables were statistically associated with the risk of hypertension, without a discussion about the potential candidate variables, how variables were selected and how many were included in the model, the reader is left uninformed about the variable selection process, which raises concern about the reliability of the finally selected variables. Moreover, setting a higher significance level is strongly recommended in stepwise selection to allow more variables to be included in the model. A significance level of only 0.05 was used in this study, and that cut-off value can sometimes miss important variables in the model. This likely happened in this study, as an important variable termed ‘gender’ was forcefully entered into the biochemical model even though it did not appear significant at the 0.05 level. Alternatively, the study could use Akaike information criterion (AIC) or Bayesian information criterion (BIC) (discussed later), which often provide the most parsimonious model.
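
As a rough illustration of the AIC and BIC alternative mentioned above, the sketch below compares two candidate models using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The sample size of 2506 comes from the excerpt; the log-likelihoods and the parameter counts (variables plus an assumed intercept) are hypothetical, not results from the study.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion; the penalty grows with sample size n."""
    return k * math.log(n) - 2 * log_likelihood

n = 2506  # cohort size stated in the excerpt
# Hypothetical (log-likelihood, parameter count) pairs for the two models.
candidates = {"clinical": (-1210.0, 6), "biochemical": (-1205.0, 9)}

aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
bic_scores = {name: bic(ll, k, n) for name, (ll, k) in candidates.items()}
best_aic = min(aic_scores, key=aic_scores.get)  # lower is better
best_bic = min(bic_scores, key=bic_scores.get)
```

With these made-up numbers the two criteria disagree: AIC prefers the larger model, while BIC, whose per-parameter penalty is ln(2506) rather than 2, prefers the smaller one.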
