A fast, convenient and well-known approach to **regression** is to induce and prune a binary tree. However, there has been little attempt to improve the performance of an induced **regression** tree. This paper presents a meta-algorithm capable of minimizing the **regression** loss function, thus improving the accuracy of any given hierarchical model, such as k-ary **regression** trees. Our proposed method minimizes the loss function of each node one by one. At split nodes, this leads to solving an instance-based cost-sensitive classification problem over the node’s data points. At the leaf nodes, the method leads to a simple **regression** problem. In the case of binary univariate and multivariate **regression** trees, the computational complexity of training is **linear** in the number of samples. Hence, our method is scalable to large trees and datasets. We also briefly explore possibilities of applying the proposed method to classification tasks. We show that our algorithm has significantly better test error compared to other state-of-the-art tree algorithms. Finally, the accuracy, memory usage and query time of our method are compared to recently introduced forest models. We show that, most of the time, our proposed method achieves better or similar accuracy while having tangibly faster query time and a smaller number of nonzero weights.

Some interesting problems for future research include extending the current formulation of the **regression** coefficient matrix in (2) to the case where the singular values can be repeated, such that the left singular vectors (which correspond to latent factors) are not identifiable. We would then need to estimate the eigenspaces spanned by the important singular vectors and characterize the estimation accuracy by some new criterion, such as the one in Cai et al. (2013) and Ma (2013). Another research direction is to explore the theory of random design matrices; this can be addressed by using an extended version of perturbation theory (Lemma 6), where the perturbation in P is also included in the analysis. Moreover, it is computationally straightforward to extend SEED to generalized **linear** models by adapting the sequential quadratic programming framework. For this extension, we first approximate the loss function by a quadratic loss function and find the optimal unit rank matrix. Then we add the unit rank matrix to the solution and re-approximate the loss function with another quadratic function around this new solution. By performing these three steps sequentially, we can efficiently estimate the low-rank coefficient matrix. Statistical properties of such an estimator can be analyzed by extending the results in Lozano et al. (2011) for greedy sparse procedures to reduced-rank **regression**.

Site investigation and estimation of physical soil characteristics are essential parts of a geotechnical design process. Evaluating soil properties beneath and adjacent to structures in a specific region is important for geotechnical design, since the behavior of structures is strongly influenced by the **response** of soils to loading. Because high-quality undisturbed soil samples are difficult to obtain, and because of the cost and time involved, the software-based modeling proposed here may help in assessing the factor of safety for location-based assessment of soil liquefaction.

The current downscaling technique was applied to three selected areas within Greece that present different climate characteristics and complex topography due to their location. The study areas under investigation are the Ardas River basin in north-eastern Greece, the Sperchios River basin in central Greece and the Geropotamos River basin on the island of Crete in southern Greece. More information about the conditions prevailing in the study areas can be found in [17-21]. Figure 2 and Table 1 depict the general location and the characteristics of the stations that were used for the application of the described downscaling technique. The variables that were spatially interpolated were precipitation, potential evapotranspiration (PET) and mean air temperature. As described above, all the factors which affect a given climatological variable need to be included in the **multi**-**linear** **regression** procedure. These factors can be separated into physical factors that affect the type, occurrence, and amount of the variable and environmental factors that affect their composition. In this case, the following available factors were taken into account: latitude, longitude, presence of mountains and their elevation, slope, prevalent wind speed, distance from a body of water, air temperature (for PET), etc.

The idea of using ROUGE during training is also present in the work of Berg-Kirkpatrick et al. (Section 2). The SVM that Berg-Kirkpatrick et al. use, however, in effect attempts to separate (prefer) the gold summaries from the other possible summaries; ROUGE (more precisely, a modified version of ROUGE-2) is included in the SVM as a loss function to force the SVM to place more emphasis on separating gold summaries from other possible summaries with high ROUGE scores. By contrast, the SVR that we use attempts to directly output the ROUGE score of each sentence. Furthermore, the RBF kernel that we use in the SVR allows the SVR to learn non-**linear** functions, whereas the **linear** SVM of Berg-Kirkpatrick et al. can learn only **linear** functions. We also note that the two SVMs used by Woodsend and Lapata (Section 2) perform binary classification (not **regression**), attempting to separate sentences that a human would include in a summary from sentences that would not be included. The (unsigned) distance from the learnt separating hyperplane of the first SVM is included in the objective function of the ILP model, in effect treating the distance as a confidence score. We believe that our use of a **regression** model (SVR) is a better choice, because the distance from an SVM’s separating hyperplane is often a poor confidence estimate. We also note that the second SVM of Woodsend and Lapata contributes only hard constraints to the ILP model, without taking into account the SVM’s confidence.

India being an agricultural country, its economy depends predominantly on agricultural yield growth and allied agro-industry products. In India, agriculture is largely influenced by rainfall, which is highly unpredictable. Agricultural growth also depends on diverse soil parameters, namely nitrogen, phosphorus, potassium, crop rotation, soil moisture, pH and surface temperature, and on weather aspects such as temperature and rainfall. India is now progressing rapidly in technical development, and technology can benefit agriculture by increasing crop productivity, resulting in better yields for the farmer. The proposed project provides a solution for Smart Agriculture by monitoring the agricultural field, which can assist farmers in increasing productivity to a great extent. Weather forecast data obtained from the IMD (India Meteorological Department), such as temperature and rainfall, together with a repository of soil parameters, give insight into which crops are suitable for cultivation in a particular area. This work presents a system, in the form of an Android-based application and a website, which uses machine learning techniques to predict the most profitable crop under the current weather and soil conditions. The proposed system integrates the data obtained from the repository and the weather department and, by applying a machine learning algorithm, Multiple **Linear** **Regression**, predicts the most suitable crops for the current environmental conditions. This provides the farmer with a variety of crop options that can be cultivated. Thus, the project develops a system that integrates data from various sources, data analytics and prediction analysis, which can improve crop yield and increase farmers' profit margins, helping them over the longer run.

tree growth, such as nonlinear growth models, multiple **linear** regression analysis and robust M-regression analysis. Oil palm yield data, foliar nutrient content data and fertilizer trial data, collected from seven stations in inland areas and seven stations in coastal alluvial soil areas, were provided by the Malaysian Palm Oil Board (MPOB). Twelve nonlinear growth models were considered. A preliminary study showed that the logistic nonlinear growth model is the best for modelling oil palm yield growth. The study then explored the relationship between oil palm yield and both foliar nutrient content and nutrient balance ratios. To improve the model's capability, this study

The first argument of the lm() function is a formula object, with the outcome specified first, followed by the ~ operator and then the predictors. More information about the **linear** model summary() command can be found using help(summary.lm). By default, stars are used to annotate the output of the summary() functions regarding significance levels: these can be turned off using the command options(show.signif.stars=FALSE).

to **linear** **regression**. As expected, the experiments with regularization produced lower variance among the different experiments in terms of the BLEU score, and the resulting set of parameters had a smaller norm. However, because of the small number of features used in our experiments, regularization was not necessary to control overfitting.

Now we consider a test we will call “Test A” that is partly chance and partly skill: Instead of predicting the outcomes of 12 coin flips, each subject predicts the outcomes of 6 coin flips and answers 6 true/false questions about world history. Assume that the mean score on the 6 history questions is 4. A subject's score on Test A has a large chance component but also depends on history knowledge. If a subject scored very high on this test (such as a score of 10/12), it is likely that they did well on both the history questions and the coin flips. For example, if they only got four of the history questions correct, they would have had to have gotten all six of the coin predictions correct, and this would have required exceptionally good luck. If given a second test (Test B) that also included coin predictions and history questions, their knowledge of history would be helpful and they would again be expected to score above the mean. However, since their high performance on the coin portion of Test A would not be predictive of their coin performance on Test B, they would not be expected to fare as well on Test B as on Test A. Therefore, the best prediction of their score on Test B would be somewhere between their score on Test A and the mean of Test B. This tendency of subjects with high values on a measure that includes chance and skill to score closer to the mean on a retest is called “**regression** toward the mean.”
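
The argument above can be checked with a small simulation. The sketch below is illustrative only: the skill model (each subject's per-question probability of a correct history answer drawn uniformly from [1/3, 1], which gives a mean history score of 4 out of 6), the number of subjects and the seed are assumptions, not part of the original text.

```python
import random


def take_test(rng, p_history):
    """Score out of 12: 6 coin-flip predictions (chance) plus 6 history questions (skill)."""
    coins = sum(rng.random() < 0.5 for _ in range(6))
    history = sum(rng.random() < p_history for _ in range(6))
    return coins + history


def simulate(n_subjects=20000, seed=0):
    """Return (Test A score, Test B score) for each simulated subject."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_subjects):
        # Assumed skill model: uniform on [1/3, 1], so the mean history score is 6 * 2/3 = 4.
        p_history = rng.uniform(1 / 3, 1)
        pairs.append((take_test(rng, p_history), take_test(rng, p_history)))
    return pairs


def mean(values):
    values = list(values)
    return sum(values) / len(values)


scores = simulate()
high_scorers = [(a, b) for a, b in scores if a >= 10]  # very high on Test A

mean_b_all = mean(b for _, b in scores)         # overall mean of Test B
mean_a_high = mean(a for a, _ in high_scorers)  # how the high scorers did on Test A
mean_b_high = mean(b for _, b in high_scorers)  # how the same subjects did on Test B

print("mean of Test B (all subjects):", round(mean_b_all, 2))
print("high scorers on Test A:", round(mean_a_high, 2))
print("same subjects on Test B:", round(mean_b_high, 2))
```

Under these assumptions, the high scorers' Test B average falls between the overall mean and their Test A average: the skill component carries over to the retest, while the luck component does not.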

With Table 4.16, we can therefore arrive at an estimated model: socio-economic dividends (2.118) = 0.273 infrastructural + 0.2 superstructure. Consequently, we can see from Figs. 4.1 and 4.8 that the infrastructural aspects of ecotourism uniquely contribute more to the explanation of the **regression** model, with a beta coefficient of 0.273 and a standard error of 0.06, compared to superstructure, which has a beta coefficient of 0.2 and a large standard error of 0.112 (Table 4.16). The table also shows that the t-value for the infrastructural aspect of ecotourism is well above 2, so it meets the guideline for being a useful predictor, whereas the t-value of 1.78 for the superstructure aspect of ecotourism does not. The implication of this result is that even though some social amenities and infrastructure exist, they are grossly inadequate to make a significant impact on the resident indigenes. Hence, the null hypothesis, which states that there is no significant relationship between ecotourism development and the provision of social amenities and infrastructure in the study area, is retained.
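
The t-values quoted above follow from the usual ratio of a coefficient to its standard error. The short check below is a sketch under that assumption; it takes the superstructure coefficient of 0.2 from the estimated model equation, and the function name is illustrative.

```python
def t_value(coefficient, standard_error):
    """t statistic for a regression coefficient: estimate divided by its standard error."""
    return coefficient / standard_error


t_infrastructural = t_value(0.273, 0.06)
t_superstructure = t_value(0.2, 0.112)

print(round(t_infrastructural, 2))  # 4.55, well above the guideline value of 2
print(round(t_superstructure, 2))   # 1.79 (reported as 1.78 in the text), below 2
```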

In practice, when applying a statistical method, it often occurs that some observations deviate from the usual model assumptions. Least-squares (LS) estimators are very sensitive to outliers. Even a single atypical value may have a large effect on the **regression** parameter estimates. The goal of robust **regression** is to develop methods that are resistant to the possibility that one or several unknown outliers may occur anywhere in the data. In this paper, we review various robust **regression** methods, including the M-estimate, LMS estimate, LTS estimate, S-estimate, τ-estimate, MM-estimate, GM-estimate, and REWLS estimate. Finally, we compare these robust estimates based on their robustness and efficiency through a simulation study. A real data set application is also provided to compare the robust estimates with the traditional least-squares estimator.
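
To illustrate the sensitivity of least squares to a single outlier, the sketch below fits a straight line to synthetic data in two ways: ordinary least squares, and a Huber-type M-estimate computed by iteratively reweighted least squares. It is a minimal illustration, not the procedure of the reviewed paper; the data, the tuning constant 1.345, the MAD-based scale estimate and the fixed iteration count are all assumptions.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line_fit(x, y, k=1.345, iterations=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        intercept, slope = weighted_line_fit(x, y, w)
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale from the median absolute residual (assumed choice).
        scale = statistics.median(abs(r) for r in residuals) / 0.6745
        scale = max(scale, 1e-12)  # guard against a zero scale
        # Huber weights: 1 inside the threshold, shrinking as |residual| grows.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in residuals]
    return intercept, slope


# Synthetic data: y is close to 1 + 2x, except the last point, which is an outlier.
x = [float(i) for i in range(10)]
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2, 0.1]
y = [1 + 2 * xi + e for xi, e in zip(x, noise)] + [60.0]

ls_intercept, ls_slope = weighted_line_fit(x, y, [1.0] * len(x))
huber_intercept, huber_slope = huber_line_fit(x, y)

print("least-squares slope:", round(ls_slope, 2))
print("Huber M-estimate slope:", round(huber_slope, 2))
```

On this data the single outlier roughly doubles the least-squares slope, while the M-estimate stays close to the slope of 2 used to generate the clean points.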

A Radial Basis Function (RBF) SVM kernel is used in the project to train on the input features and achieve a better predictive estimate of the Bitcoin market price. The kernel hyperparameters C, gamma and epsilon are set to tune the algorithm's performance. As with the **Linear** SVM model, grid search is implemented with 5-fold cross-validation in order to find the optimal hyperparameters. The selected hyperparameters are then used to train the model and predict the results. The kernel SVM (RBF) metrics and model accuracy are shown in Table 7.

by the entropy-logit model, which correctly identifies the 0 and 1 outputs 167 and 143 times, so the total number of correct forecasts is 310, or 76.9%. It is interesting to note that both the **linear** and entropy-logit models better identify the level y=1 of the satisfied customers. The other sections D, E, and F of Table 3 compare the predictions of each pair of the three constructed models, where again the **linear** and entropy-logit models yield very close counts of 204 and 195 for the 0 and 1 binary outputs, so the total rate of coinciding results equals 99%.
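
The percentages above can be cross-checked with simple arithmetic. The sample size is not stated in this excerpt; the sketch below infers it from the reported 76.9% rate, so the value 403 is an assumption, not a figure from the source.

```python
correct_zero, correct_one = 167, 143
total_correct = correct_zero + correct_one      # 310 correct forecasts

# Assumption: sample size implied by "310, or 76.9%".
implied_n = round(total_correct / 0.769)        # 403

coinciding = 204 + 195                          # predictions on which the two models agree
correct_rate = 100 * total_correct / implied_n
agreement_rate = 100 * coinciding / implied_n

print(implied_n, round(correct_rate, 1), round(agreement_rate, 1))  # 403 76.9 99.0
```

With the inferred sample size, both reported rates (76.9% and 99%) are mutually consistent.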

The use of Neural Networks (NNs) in the time series forecasting domain is now well established, with a number of recent review and methodology studies (e.g. [1], [2], [3]). The main attribute which differentiates NN time series modelling from traditional econometric methods is their ability to generate non-**linear** relationships between a vector of time series input variables and a dependent series, with little or no a priori knowledge of the form that this non-linearity should take. This is opposed to the rigid structural form of most econometric time series forecasting methods (e.g. Auto-Regressive (AR) models, Exponential Smoothing models, (Generalised) Auto-Regressive Conditional Heteroskedasticity models, and Auto-Regressive Integrated Moving Average models) [4], [5], [6]. Apart from this important difference, the underlying approach to time series forecasting itself has remained relatively unchanged during its progression from explicit **regression** modelling to the non-**linear** generalisation approach of NNs. Both of these approaches are typically based on the concept that the most accurate forecast, if not the actual realised (target) value, is the one with the smallest Euclidean distance from the actual. When measuring financial predictor performance, however, practitioners often use a whole range of different error measures (15 commonly used time series forecasting error measures alone are reported in [7]). These error measures tend to reflect the preferences of potential end users of the forecast model. For instance, in the area of financial time series forecasting, correctly predicting the directional movement of a time series (for instance of a stock price or exchange rate) is arguably more important than just minimising the forecast Euclidean error.
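
The distinction between Euclidean error and directional accuracy can be made concrete with a toy example. In the sketch below, the price series and both forecasts are made up for illustration, and the two function names and the definition of direction (relative to the previous actual value) are assumptions rather than measures taken from the cited studies.

```python
def mean_squared_error(actual, forecast):
    """Mean squared error over steps 1..n (forecast[i] predicts actual[i + 1])."""
    errors = [(f - a) ** 2 for a, f in zip(actual[1:], forecast)]
    return sum(errors) / len(errors)


def directional_accuracy(actual, forecast):
    """Share of steps where the forecast moves the same way as the series,
    both measured relative to the previous actual value."""
    hits = 0
    for t, f in enumerate(forecast, start=1):
        actual_move = actual[t] - actual[t - 1]
        forecast_move = f - actual[t - 1]
        if actual_move * forecast_move > 0:
            hits += 1
    return hits / len(forecast)


# Made-up price series and two made-up one-step-ahead forecasts.
actual = [100.0, 101.0, 100.0, 102.0, 101.0, 103.0]
forecast_close = [99.8, 101.2, 99.8, 102.2, 100.8]       # near the last value, wrong side
forecast_trend = [103.0, 98.0, 104.0, 99.0, 105.0]       # right direction, overshoots

for name, forecast in [("close", forecast_close), ("trend", forecast_trend)]:
    print(name,
          round(mean_squared_error(actual, forecast), 2),
          directional_accuracy(actual, forecast))
```

Here the first forecast has the smaller squared error yet never predicts the direction of the move, while the second has the larger squared error and always does, which is why the choice of error measure depends on what the end user needs.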

By now, inference in the basic **regression** problem is well-understood from both frequentist and Bayesian perspectives. However, for the variable selection problem, a fully satisfactory theory/method has yet to emerge. It is not our goal to review the extensive literature on variable selection, but it can be insightful to see where the fundamental difficulty arises. The most popular strategies are stepwise selection procedures and the lasso (Tibshirani 1996) and its many variants; see Hastie et al. (2009) for a thorough review of these strategies. These methods have a common drawback, which is that they cannot assign any meaningful measures of uncertainty (probabilistic or otherwise) to the set of variables selected. From a Bayesian perspective, probabilistic summaries of various models can be obtained by introducing a prior probability over the model space and a conditional prior on the model parameters, and performing a Markov chain Monte Carlo scan of the model space. For relatively small p this scheme is feasible (e.g., Clyde and George 2004), but it typically requires a convenient choice of prior for parameters given the model, which may overly influence the posterior calculations. Furthermore, as p increases, estimates of posterior model probabilities become less reliable (Heaton and Scott 2009), making it questionable whether the "most likely" model has been identified. Since there seems to be no fully satisfactory approach among the existing methods, it makes sense to consider something new and different.

Example: Consider the example of the scrambled wavelets. In dimension 1, using a wavelet dyadic-tree of depth $H$ (i.e., $F = 2^{H+1}$), the numerical cost for computing $\Psi$ is $O(HPN)$ (using one tree per random feature). Now, in dimension $d$ the classical extension of one-dimensional wavelets uses a family of $2^d - 1$ wavelets, and thus requires $2^d - 1$ trees, each having $2^{dH}$ nodes. While the resulting number of initial features $F$ is of order $2^{d(H+1)}$, thanks to the lazy evaluation (notice that one never computes all the initial features), one needs to expand at most one path of length $H$ per training point, and the resulting complexity to compute $\Psi$ is $O(2^d HPN)$. Thus the method is **linear**

The relationship between **multi**ple independent variables and one dependent variable can be estimated by multiple **linear** **regression** [32]. The effect of gender, age, and nature of the task on participants' level of confusion was analyzed using multiple **linear** **regression** analysis, as shown in Fig. 1.
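
As a rough illustration of how such a model is fitted, the sketch below solves the least-squares normal equations for three predictors using only the Python standard library. The data are synthetic and generated from assumed coefficients (they are not the study's data), and the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_linear_regression(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic predictors: (gender coded 0/1, age in years, task type coded 0/1).
rows = [(0, 20, 0), (1, 25, 0), (0, 30, 1), (1, 35, 1),
        (0, 40, 0), (1, 45, 1), (0, 50, 1), (1, 22, 0)]
# Synthetic response generated from assumed coefficients, with no noise.
true_coefficients = [1.0, 0.5, 0.1, 2.0]
y = [true_coefficients[0] + sum(b * v for b, v in zip(true_coefficients[1:], r))
     for r in rows]

coefficients = fit_multiple_linear_regression(rows, y)
print([round(c, 3) for c in coefficients])
```

Because the synthetic response contains no noise, the fit recovers the assumed coefficients; with real data the estimates would carry sampling error, and a numerically more stable solver (for example a QR decomposition) would normally be preferred over the normal equations.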

high, resulting in higher computational complexity. An increase in the number of features leads not only to greater computational complexity but also to poorer recognition performance because of redundant information. Principal Component Analysis (PCA) [5] is a commonly used tool for dimension reduction in analyzing high-dimensional data. The original PCA method first flattens each training sample into a high-dimensional row or column vector, then computes the eigenvectors of the training-sample covariance matrix, which form the projection matrix. Finally, the projection matrix projects the high-dimensional vectors of the original training samples into a low-dimensional space. However, the vectorization process may destroy the object's underlying spatial structure, and PCA runs into serious difficulties in analyzing functional data because of the “curse of dimensionality” (Bellman 1961). Hu proposed, on the basis of **Multi**-**linear** PCA (MPCA) [8], that MPCA is a general extension of traditional **linear** methods such as PCA or matrix SVD. MPCA can carry out dimensionality reduction directly on a matrix or higher-order tensor, without first transforming it into a vector for PCA. Therefore, the above-mentioned problems can be overcome.
