Informed choices of the lifting dimension q. In practice, a suitable choice of the lifting dimension q is usually not known in advance. Smaller values of q yield more concisely described reconstructions, although such estimates may not fit the given data well; larger values of q provide better fidelity to the data but yield more complex reconstructions, and one runs the risk of overfitting. A practically relevant question in our context is how to design methods akin to cross-validation to choose q in a data-driven manner. We illustrate our ideas with the following stylized experiment. We consider reconstructing the ℓ₁-ball in ℝ³ from 100 measurements corrupted by additive Gaussian noise with standard deviation σ = 0.1. We partition the measurements into two subsets of equal size, apply our method with the choice C = Δ_q as our lifting set on the first partition, and evaluate the mean squared error (MSE) of the computed estimator on the second partition. We repeat this process across 50 different random partitions and over values of q in {3, …, 10}. The left sub-plot of Figure 3.15 shows the MSE averaged over all partitions. We observe that the error initially decreases as q increases, since more expressive models allow us to fit the data better. The error subsequently remains approximately constant (instead of increasing, as one might expect); this occurs because our regression restricts to convex sets, which prevents the MSE from growing unboundedly.
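
A minimal sketch of this data-driven selection of q, under stated assumptions: `fit` and `score` below are generic stand-ins (ordinary polynomial least squares), not the convex-set regression with lifting set C = Δ_q described above, while the equal-split/averaging structure mirrors the experiment.

```python
import numpy as np

def select_q(x, y, q_values, fit, score, n_splits=50, seed=0):
    """Repeated random-split validation: for each candidate q, fit on one
    half of the data, evaluate MSE on the other half, and average the
    errors over n_splits random partitions."""
    rng = np.random.default_rng(seed)
    avg_mse = {}
    for q in q_values:
        errs = []
        for _ in range(n_splits):
            idx = rng.permutation(len(x))
            half = len(x) // 2
            tr, te = idx[:half], idx[half:]
            model = fit(x[tr], y[tr], q)             # estimate on first partition
            errs.append(score(model, x[te], y[te]))  # MSE on second partition
        avg_mse[q] = float(np.mean(errs))
    return min(avg_mse, key=avg_mse.get), avg_mse

# Stand-in fit/score pair (polynomial least squares, NOT the paper's method):
fit = lambda x, y, q: np.polyfit(x, y, q)
score = lambda c, x, y: float(np.mean((np.polyval(c, x) - y) ** 2))

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 100)
y = np.abs(x) + 0.1 * rng.standard_normal(100)       # noise with sigma = 0.1
best_q, mse_by_q = select_q(x, y, range(3, 11), fit, score)
```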


Our framework for spatial data mining is based on spatial neighbourhood relations between objects and on the neighbourhood graphs and neighbourhood paths induced by these relations. We introduce a set of database primitives, or basic operations, for spatial data mining which are sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. First, similar to the relational standard language SQL, the use of standard primitives will speed up the development of new data mining algorithms and make them more portable. Second, we can develop techniques to efficiently support the proposed database primitives (e.g., by specialized index structures), thus speeding up all data mining algorithms based on them. Moreover, our basic operations for spatial data mining can be integrated into commercial database management systems. This offers additional benefits for data mining applications, such as efficient storage management, prevention of inconsistencies, and index structures to support the different types of database queries that may be part of the data mining algorithms.
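
The primitives can be sketched generically; the names `neighbours` and `extend_paths` below are illustrative assumptions, not the exact operator signatures proposed in the paper.

```python
from typing import Callable, Hashable, Iterable, List

SpatialObject = Hashable
Relation = Callable[[SpatialObject, SpatialObject], bool]

def neighbours(db: Iterable[SpatialObject], o: SpatialObject,
               rel: Relation) -> List[SpatialObject]:
    """Basic operation: all objects of db standing in relation rel to o."""
    return [p for p in db if p != o and rel(o, p)]

def extend_paths(db: Iterable[SpatialObject],
                 paths: List[List[SpatialObject]],
                 rel: Relation) -> List[List[SpatialObject]]:
    """Extend each neighbourhood path by one neighbour, avoiding cycles."""
    result = []
    for path in paths:
        for p in neighbours(db, path[-1], rel):
            if p not in path:
                result.append(path + [p])
    return result

# Example: neighbourhood relation 'within distance 1' on 2-D points.
points = [(0, 0), (1, 0), (2, 0), (5, 5)]
close = lambda a, b: (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 <= 1
paths = extend_paths(points, [[(0, 0)]], close)   # -> [[(0, 0), (1, 0)]]
```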


Two studies proposed a sequential pattern mining technique that incorporated alert information.


Figure 4.1 shows an example of this analysis. Initially, a mesh with a scalar function (see Figure 4.1(a)) is converted into a Reeb graph (see Figure 4.1(b)). The critical points are then paired, and the data are displayed in a persistence diagram, as seen in Figures 4.1(c) and 4.1(d). This final step can still be challenging, particularly for essential critical points, i.e., those critical points associated with cycles in the Reeb graph. These require an expensive search to be performed for each essential critical point. While many prior works have provided efficient algorithms for computing Reeb graph structures themselves, to our knowledge none has provided a detailed description of an algorithm for pairing critical points.
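
For orientation, a minimal sketch of the standard union-find pairing of non-essential critical points by the elder rule (sweep vertices in increasing function value; when two components merge, the younger one dies). This is a generic 0-dimensional persistence computation under an assumed vertex/edge representation, not the expensive essential-cycle search discussed above.

```python
def pair_critical_points(values, edges):
    """values[v]: scalar value at vertex v; edges: list of (u, v) pairs.
    Returns (birth_vertex, death_vertex) pairs by the elder rule."""
    n = len(values)
    adj = {v: [] for v in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    birth = list(range(n))   # lowest-value vertex of each component
    pairs, seen = [], set()
    for v in sorted(range(n), key=lambda u: values[u]):
        seen.add(v)
        for u in adj[v]:
            if u in seen:
                ru, rv = find(u), find(v)
                if ru != rv:
                    young, old = (ru, rv) if values[birth[ru]] > values[birth[rv]] else (rv, ru)
                    if values[birth[young]] < values[v]:   # skip zero-persistence pairs
                        pairs.append((birth[young], v))
                    parent[young] = old
    return pairs

# Tiny example: a path graph 0-1-2-3 with a local minimum at vertex 2.
print(pair_critical_points([0.0, 2.0, 1.0, 3.0], [(0, 1), (1, 2), (2, 3)]))  # [(2, 1)]
```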


MapWindow is an open-source GIS application familiar to Windows users; it is under active development, with new versions released continuously. MapWindow supports plug-ins in the form of dynamic link libraries (*.dll), and a development environment such as Visual Studio Community Edition is available for free download. The tool supports C# and the .NET Framework. Since our implementation of the proposed algorithms for the experimental evaluation is written in C/C++, the Visual Studio development environment is the most suitable choice for integrating our source code.

The BFGS method has proven successful in many real-world applications in areas such as systems biology, chemistry, and nanotechnology (Kim et al. 2007; Pankratov and Uchaeva 2000; Zhao et al. 2002). We chose a version of BFGS with limited memory usage and box constraints, namely the L-BFGS-B method, already implemented in the Scipy library. The limited-memory aspect of the implementation means that the whole gradient history is not considered when the Hessian is calculated, thereby saving memory. L-BFGS-B combines the well-known computational effectiveness of BFGS (Nocedal 1980; Fiore et al. 2003) with box constraints on the parameters. Such constraints are necessary in our model since some parameters must be positive. To further assess the computational cost and convergence properties of BFGS, we compare it with a truncated Newton code (TNC) (Nash 1984; Nash and Sofer 1996; Schlick and Fogelson 1992) with parameter constraints (also implemented in Scipy), which is known to be robust in convergence but computationally costly. We use the SSQ error as the quantity to be minimised by the GF algorithms, in line with common practice.
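
A minimal Scipy sketch of the described setup; the toy model (an exponential decay) and its data stand in for the actual model, which is an assumption here.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
data = 2.0 * np.exp(-0.5 * t) + 0.05 * rng.standard_normal(t.size)

def ssq(params):
    """Sum-of-squares (SSQ) error between model and data."""
    a, k = params
    return float(np.sum((a * np.exp(-k * t) - data) ** 2))

bounds = [(1e-9, None), (1e-9, None)]   # box constraints: parameters > 0
res_lbfgsb = minimize(ssq, x0=[1.0, 1.0], method="L-BFGS-B", bounds=bounds)
res_tnc = minimize(ssq, x0=[1.0, 1.0], method="TNC", bounds=bounds)
print(res_lbfgsb.x, res_lbfgsb.nfev)    # compare solutions and cost
print(res_tnc.x, res_tnc.nfev)
```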


In order to compare the overall performance of the resulting classifier for different choices of the loss function, we also performed a nested cross-validation (cf. [72, 59]). In this way one obtains an unbiased estimate of the true classification error for each model. More precisely, we implemented a two-level nested 10-fold cross-validation: in the outer loop, the whole set of images was split into ten disjoint sets used as test sets to obtain the classification error. For each test set, the remaining data were again split into ten disjoint sets used in the inner loop. On the basis of these ten sets, the 10-fold cross-validation described above was performed to determine both the optimal kernel parameter and the regularization parameter. Once these parameters are determined, they are used to train the classifier on the remaining data, which is then evaluated on the corresponding test set.
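
A sketch of the same two-level scheme using scikit-learn; the SVC classifier, data set, and parameter grid below are illustrative assumptions, not the paper's setup.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)   # stand-in for the image data

inner = KFold(n_splits=10, shuffle=True, random_state=0)   # parameter tuning
outer = KFold(n_splits=10, shuffle=True, random_state=1)   # error estimation
param_grid = {"C": [0.1, 1, 10], "gamma": [1e-3, 1e-2, 1e-1]}

tuned = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=inner)
scores = cross_val_score(tuned, X, y, cv=outer)   # unbiased error estimate
print(1.0 - scores.mean())                        # classification error
```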


Background: Genome-sequencing projects are currently producing an enormous number of new sequences, causing protein sequence databases to grow rapidly. The unsupervised classification of these data into functional groups or families, i.e., clustering, has become one of the principal research objectives in structural and functional genomics. Computer programs that automatically and accurately classify sequences into families have become a necessity. A significant number of methods have addressed the clustering of protein sequences, and most of them can be categorized into three major groups: hierarchical, graph-based, and partitioning methods. Among the various sequence clustering methods in the literature, hierarchical and graph-based approaches have been widely used. Although partitioning clustering techniques are widely used in other fields, few applications have been found in the field of protein sequence clustering. It has not been fully demonstrated whether partitioning methods can be applied to protein sequence data, or whether they can be efficient compared to the published clustering methods.


Abstract—Decision-tree induction algorithms are widely used in machine learning applications in which the goal is to extract knowledge from data and present it in a graphically intuitive way. The most successful strategy for inducing decision trees is the greedy top-down recursive approach, which has been continuously improved by researchers over the past 40 years. In this paper, we propose a paradigm shift in the research of decision trees: instead of proposing a new manually designed method for inducing decision trees, we propose automatically designing decision-tree induction algorithms tailored to a specific type of classification data set (or application domain). Following recent breakthroughs in the automatic design of machine learning algorithms, we propose HEAD-DT, a hyper-heuristic evolutionary algorithm for designing decision-tree algorithms, which evolves design components of top-down decision-tree induction algorithms. By the end of the evolution, we expect HEAD-DT to generate a new and possibly better decision-tree algorithm for a given application domain. We perform extensive experiments on 35 real-world microarray gene expression data sets to assess the performance of HEAD-DT and compare it with the well-known decision-tree algorithms C4.5, CART, and REPTree. Results show that HEAD-DT is capable of generating algorithms that significantly outperform the baseline manually designed decision-tree algorithms with respect to predictive accuracy and F-measure.
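
For reference, a generic sketch of the greedy top-down induction strategy whose components (split criterion, stopping rule, and so on) HEAD-DT is designed to evolve; the Gini criterion and stopping parameters here are fixed, illustrative choices, not HEAD-DT itself.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector (one possible split criterion)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def induce(X, y, depth=0, max_depth=5, min_leaf=2):
    """Greedy top-down recursive induction with axis-aligned splits."""
    if depth == max_depth or len(set(y)) == 1 or len(y) < 2 * min_leaf:
        return {"leaf": int(np.bincount(y).argmax())}
    best = None
    for j in range(X.shape[1]):                 # try every feature
        for t in np.unique(X[:, j]):            # and every threshold
            mask = X[:, j] <= t
            if mask.sum() < min_leaf or (~mask).sum() < min_leaf:
                continue
            score = mask.mean() * gini(y[mask]) + (~mask).mean() * gini(y[~mask])
            if best is None or score < best[0]:
                best = (score, j, t, mask)
    if best is None:
        return {"leaf": int(np.bincount(y).argmax())}
    _, j, t, mask = best
    return {"feature": j, "threshold": float(t),
            "left": induce(X[mask], y[mask], depth + 1, max_depth, min_leaf),
            "right": induce(X[~mask], y[~mask], depth + 1, max_depth, min_leaf)}

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
tree = induce(X, y)   # splits at threshold 1.0, yielding two pure leaves
```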


Figure 3-8 shows the plot for machine 4, which is our bottleneck machine. On this plot we observe a good linear relation until the machine reaches its capacity value. Once capacity is reached, increasing the resource load does not increase the amount of output, causing the curve to level off. Figure 3-9, depicting the situation for the bottleneck machine, shows that the data are more scattered in the region where the machine approaches its capacity limit. Note that in Figure 3-9 there are no observations in the upper right corner of the plot, since for high release and WIP levels we do not observe any low output values. From the plots for both functional forms, we see that the observations appear to follow a concave functional form as the resource load, or the release for a given initial WIP, increases. This is crucial for implementing these functional forms in optimization models since, as discussed in Asmundsson et al. (2009), the formulation depends on the concavity assumption of the CF in order to obtain a convex set of capacity constraints.
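
A small numeric illustration of why concavity matters, under an assumed saturating clearing-function form: a concave CF lies below all of its tangent lines, so the capacity constraint "output ≤ CF(load)" can be outer-approximated by a finite set of linear, hence convex, constraints.

```python
import numpy as np

K, c = 100.0, 20.0                       # capacity and curvature (assumed)
cf = lambda x: K * x / (x + c)           # concave CF that levels off at K
cf_grad = lambda x: K * c / (x + c) ** 2

grid = np.linspace(1.0, 200.0, 8)        # linearisation points
lines = [(cf_grad(x0), cf(x0) - cf_grad(x0) * x0) for x0 in grid]

def outer_cf(x):
    """Piecewise-linear outer approximation: min over tangent lines."""
    return min(a * x + b for a, b in lines)

# The linear envelope always sits on or above the concave CF.
assert all(outer_cf(x) >= cf(x) - 1e-9 for x in np.linspace(1.0, 200.0, 50))
```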


A very common problem in diverse areas of mathematics and the physical sciences consists of trying to find a point in the intersection of convex sets. This problem is referred to as the convex feasibility problem; its precise mathematical formulation is as follows: given closed convex sets C_1, …, C_N, find an x ∈ ⋂_{i=1}^{N} C_i.
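
A minimal sketch of the classical alternating-projections (POCS) approach to this problem, with two illustrative sets (a ball and a halfspace); these set choices are assumptions for the demo.

```python
import numpy as np

def project_ball(x, center, r):
    """Orthogonal projection onto the ball {y : ||y - center|| <= r}."""
    d = x - center
    n = np.linalg.norm(d)
    return x if n <= r else center + r * d / n

def project_halfspace(x, a, b):
    """Orthogonal projection onto the halfspace {y : a.y <= b}."""
    viol = a @ x - b
    return x if viol <= 0 else x - viol * a / (a @ a)

# Cyclically project onto each set; iterates converge to a feasible point.
x = np.array([5.0, 5.0])
for _ in range(100):
    x = project_ball(x, np.zeros(2), 1.0)
    x = project_halfspace(x, np.array([1.0, 1.0]), 1.0)
```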


If we fit a linear model to the variable x, we obtain the straight line depicted in purple in Fig. 2.2. This model is clearly mis-specified, since the true model used to generate the data also contains x². This behavior is known as underfitting. On the other hand, we could keep adding higher-order powers of the variable x until we achieve a perfect fit, since the model remains linear in the coefficients w. In general, if we have n data points we can achieve zero training error using a polynomial of degree n − 1. This is represented as the red line in the figure, and it will clearly exhibit poor generalization because it is also learning the noise of the data. We simulate the test data as the two green points, also drawn from the same model but not used to compute any of the fits. It is clear that they are much closer to the blue line than to the red line, even though the latter achieves zero error on the training points. Thus we can consider the quadratic model to be a better estimate of the true model. As mentioned before, this is known as overfitting, and it is a very important problem when fitting statistical models, since in practice the true model is unknown.
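
A small numeric illustration of this discussion (the coefficients and noise level are assumed): degree 1 underfits quadratic data, degree n − 1 interpolates the noise, and degree 2 generalizes best on fresh test points.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 10)
true = lambda x: 1.0 + 2.0 * x + 3.0 * x ** 2     # quadratic ground truth
y = true(x) + 0.3 * rng.standard_normal(x.size)
x_test = np.array([-0.5, 0.7])                    # held-out test points
y_test = true(x_test) + 0.3 * rng.standard_normal(2)

for deg in (1, 2, len(x) - 1):
    c = np.polyfit(x, y, deg)
    train_err = np.mean((np.polyval(c, x) - y) ** 2)
    test_err = np.mean((np.polyval(c, x_test) - y_test) ** 2)
    print(f"degree {deg}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```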


Veljko Petrović et al. [2] outlined how the temporal complexity of Graham's scan can be linearized provided it operates on a finite, countable subset of the reals that can be represented on a digital computer; the future scope of this work was to analyze the time complexity of different algorithms for the convex hull problem. Jingfan Fan et al. [3] proposed a novel convex hull aided registration method (CHARM) to match two point sets subject to a non-rigid transformation. The proposed algorithm outperforms several state-of-the-art methods with respect to sampling, rotational angle, and data noise on both synthetic and real data, and also shows higher computational efficiency. GAO Yang et al. [7] proposed a quick convex hull building algorithm using a grid and a binary tree for the minimum convex hull construction of a planar point set. Comparison with the currently representative minimum convex hull algorithms shows that the proposed algorithm better describes the profile of irregular objects and has relatively low time complexity. Artem Potebnia et al. [8] proposed an innovative algorithm for forming graph minimum convex hulls using the GPU; its high speed and linear complexity are achieved by distributing the graph's vertices into separate units and filtering them. Xujun Zhou et al. [9] considered the time complexity of newly added samples and proposed an incremental convex hull algorithm based on online Support Vector Regression (ICH-OSVR), which significantly reduces time consumption and enables fast online learning when a new sample is added; the method was found to save considerable time as well as memory. L. Cinque et al. [15] proposed a parallel version of Jarvis's march, realized using the BSP model, which takes O(nh/p) time (where p is the number of processors and n is the problem size) against the O(nh) complexity of the sequential algorithm. The purpose of this work was to present a very efficient parallel algorithm for computing the convex hull in the plane; the theoretical performance of the algorithm, predicted using the BSP cost formula, closely matches the actual running times of the implementation.
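
As a common baseline for the surveyed methods, a compact implementation of Andrew's monotone chain algorithm (O(n log n) after sorting); note this is a standard reference algorithm, not any of the cited methods.

```python
def convex_hull(points):
    """Andrew's monotone chain: returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                       # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # endpoints shared, drop duplicates

print(convex_hull([(0, 0), (1, 1), (2, 0), (1, 0.5), (0, 2)]))
```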


In this paper we present a model for locating multiple new facilities in convex sets with respect to multiple existing facilities and demand points, and then present a linear programming model for this problem with block norms. We use these results for the problem with fuzzy data. We also do this for the rectilinear and infinity norms as special cases of block norms. Rectilinear distances have been chosen since the scenario may be thought of in an urban setting. The study and modeling of this problem have many applications in industry, such as locating machines in a workshop.
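
A minimal sketch of the rectilinear special case for a single new facility: the L1 objective is linearized with auxiliary variables and solved as an LP. The facility coordinates below are illustrative, and this is a simplification of the multi-facility model above.

```python
import numpy as np
from scipy.optimize import linprog

pts = np.array([[0.0, 0.0], [4.0, 1.0], [2.0, 5.0]])  # existing facilities
m = len(pts)
# Variables z = [x, y, u_1..u_m, v_1..v_m]; minimise sum(u_i + v_i),
# where u_i >= |x - a_i| and v_i >= |y - b_i| via two inequalities each.
c = np.concatenate([[0.0, 0.0], np.ones(2 * m)])
A, b = [], []
for i, (ax, ay) in enumerate(pts):
    for s in (1.0, -1.0):               # s*(x - ax) <= u_i, s*(y - ay) <= v_i
        row = np.zeros(2 + 2 * m); row[0] = s; row[2 + i] = -1.0
        A.append(row); b.append(s * ax)
        row = np.zeros(2 + 2 * m); row[1] = s; row[2 + m + i] = -1.0
        A.append(row); b.append(s * ay)
bounds = [(None, None)] * 2 + [(0, None)] * (2 * m)
res = linprog(c, A_ub=A, b_ub=b, bounds=bounds)
print(res.x[:2])   # optimal location: coordinate-wise median (2, 1)
```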

We have proposed a quadratic programming calculation that gives the best least squares fit to data values contaminated by random errors, subject to nonnegative third divided differences. The method is suitable when the data exhibit a sigmoid trend, where a concave region is followed by a convex one. The method is also suitable when it would be better to employ non-positive instead of nonnegative divided differences, in which case a convex region precedes a concave one. The fit consists of a certain number of overlapping parabolae, which not only provides flexibility in data fitting but also helps in managing further operations with the fit, such as interpolation, extrapolation, differentiation, and integration. Moreover, the interval containing the inflection point of the fit is provided automatically by the calculation.
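
A hedged stand-in for the calculation described above, using Scipy's generic constrained optimizer rather than the authors' specialized quadratic programming method; for equally spaced abscissae, third divided differences carry the same sign as the plain third differences used here, and the test data are assumed.

```python
import numpy as np
from scipy.optimize import LinearConstraint, minimize

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = x ** 3 + 0.05 * rng.standard_normal(x.size)   # concave, then convex

n = x.size
D3 = np.zeros((n - 3, n))
for i in range(n - 3):
    D3[i, i:i + 4] = [-1.0, 3.0, -3.0, 1.0]       # forward third difference

# Closest values to y (least squares) with nonnegative third differences.
res = minimize(lambda f: float(np.sum((f - y) ** 2)), y,
               jac=lambda f: 2.0 * (f - y),
               constraints=[LinearConstraint(D3, 0.0, np.inf)],
               method="trust-constr")
fit = res.x
```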

In this paper, decision criteria have been characterized with respect to the choice of separating or not separating disjoint twin convex sets in small areas of an isolated neighborhood, where twins impose a potential negative externality on each other and where a positive externality comes from the centre of any neighborhood. Wherever there are tradeoffs between blocking the negative externality using separation technology and receiving the positive externality, the key issue is the distance of the ray travelled from the centre of the neighborhood. When the distance is low relative to other parameters, the positive externality is relatively strong and non-separation is optimal. On the other hand, when the distance is high, the positive externality is low and separation becomes optimal. Outliers (lying far away from the centre of a neighborhood) are better candidates for separation. The results are qualitatively the same for independent and interdependent pair-wise contests.


E = {v ⊗ α : v ∈ V, α ∈ W}, and let Y = sp(E). Thus the σ(X, Y) topology on X = B(V) is the weak operator topology. Suppose K is a compact Hausdorff space and let C(K) denote the continuous functions from K to ℂ. Suppose π : C(K) → B(W) is a bounded unital algebra homomorphism. It was proved in [17] that A = π(C(K))^{−σ(X,Y)} is reflexive, which in our terminology is E-reflexive. In addition, it was shown in [1] that Y is E-elementary on A. Thus, by our Theorem 8, A is hereditarily ac-E-reflexive. Hence every absolutely convex subset of A that is closed in the weak operator topology is ac-E-reflexive. By Example 48, this means that if T ∈ B(V) and for every v ∈ V we have Tv ∈ (Bv)^{−‖·‖}, then T ∈ B.


functions that are generalized s-convex in the first sense and in the second sense, respectively. It is well known that there are many important established inequalities for the class of generalized convex functions; one of the most famous is the generalized Hermite–Hadamard inequality, or 'generalized Hadamard inequality', stated as follows (see []): let f be a generalized convex function on [a₁, a₂] ⊆ ℝ, a₁ < a₂; then
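
Since the statement itself is truncated in this excerpt, for orientation the classical (non-generalized) Hermite–Hadamard inequality for a convex f on [a₁, a₂] reads

$$ f\!\left(\frac{a_1 + a_2}{2}\right) \;\le\; \frac{1}{a_2 - a_1}\int_{a_1}^{a_2} f(x)\,dx \;\le\; \frac{f(a_1) + f(a_2)}{2}, $$

and the generalized version referred to above is its analogue for generalized convex functions.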


Proof. We proceed by induction. Clearly, mpcc_1(X) is homeomorphic to X and therefore is an AR-space. Assume that we have already shown that mpcc_{n−1}(X) is an AR-space. Without loss of generality, one may assume that X is a max-plus convex subset of a cube J^k, where J is a closed segment in ℝ. Let α : (J^k)^n → mpcc_n(J^k) be the map defined by the formula