Fitting Convex Sets to Data: Algorithms and Applications

Informed choices of the lifting dimension q. In many practical settings, a suitable choice of the lifting dimension q is not known in advance. Smaller values of q yield more concisely described reconstructions, although such estimates may not fit the given data well; larger values of q provide better fidelity to the data and yield more complex reconstructions, but one runs the risk of over-fitting. A practically relevant question in our context is to design methods akin to cross-validation that choose q in a data-driven manner. We illustrate our ideas with the following stylized experiment. We consider reconstructing the ℓ₁-ball in ℝ³ from 100 measurements corrupted by additive Gaussian noise with standard deviation σ = 0.1. We partition the measurements into two subsets of equal size. Next, we apply our method with the choice C = Δ^q as our lifting set on the first partition, and we evaluate the mean squared error (MSE) of the computed estimator on the second partition. We repeat the process across 50 different random partitions and over values of q in {3, …, 10}. The left sub-plot of Figure 3.15 shows the MSE averaged over all partitions. The error initially decreases as q increases, since more expressive models fit the data better. The error subsequently remains approximately constant (instead of increasing, as one might expect); this occurs because our regression is restricted to convex sets, which prevents the MSE from growing unboundedly.
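
A minimal sketch of the repeated random-split protocol described above, with a polynomial degree standing in for the lifting dimension q (the convex-set estimator itself is not shown, so the model, data, and function names here are illustrative assumptions):

```python
# Repeated random-split selection of a complexity parameter q:
# fit on one half of the data, score held-out MSE on the other half,
# average over many random partitions.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = np.abs(x) + rng.normal(0, 0.1, 100)       # noisy measurements, sigma = 0.1

def split_mse(q, n_splits=50):
    errs = []
    for _ in range(n_splits):
        idx = rng.permutation(len(x))
        tr, te = idx[:50], idx[50:]           # two equal halves
        coef = np.polyfit(x[tr], y[tr], q)    # fit on partition 1
        pred = np.polyval(coef, x[te])        # evaluate on partition 2
        errs.append(np.mean((pred - y[te]) ** 2))
    return np.mean(errs)

for q in range(3, 11):                        # q in {3, ..., 10}
    print(q, split_mse(q))
```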

Algorithms and Applications for Spatial Data Mining

Our framework for spatial data mining is based on spatial neighbourhood relations between objects and on the neighbourhood graphs and neighbourhood paths induced by these relations. On this basis, we introduce a set of database primitives, or basic operations, for spatial data mining that is sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. First, similar to the standard relational language SQL, the use of standard primitives will speed up the development of new data mining algorithms and make them more portable. Second, we can develop techniques to efficiently support the proposed database primitives (e.g. by specialized index structures), thereby speeding up all data mining algorithms based on them. Moreover, our basic operations for spatial data mining can be integrated into commercial database management systems. This offers additional benefits for data mining applications, such as efficient storage management, prevention of inconsistencies, and index structures to support the different types of database queries that may be part of the data mining algorithms.
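
A minimal sketch of the two primitives named above, a neighbourhood relation and the neighbourhood paths it induces; the object set, function names, and distance threshold are illustrative assumptions, not the paper's actual interface:

```python
# Neighbourhood-graph primitives: neighbors() is the basic relation,
# neighborhood_paths() enumerates simple paths in the induced graph.
objects = {"a": (0, 0), "b": (1, 0), "c": (5, 5), "d": (1, 1)}

def neighbors(obj, eps=1.5):
    """Basic primitive: all objects within distance eps of obj."""
    ox, oy = objects[obj]
    return [o for o, (x, y) in objects.items()
            if o != obj and (x - ox) ** 2 + (y - oy) ** 2 <= eps ** 2]

def neighborhood_paths(start, length):
    """All simple paths of the given length in the neighbourhood graph."""
    paths = [[start]]
    for _ in range(length):
        paths = [p + [n] for p in paths for n in neighbors(p[-1])
                 if n not in p]
    return paths

print(neighbors("a"))              # -> ['b', 'd']
print(neighborhood_paths("a", 2))  # -> [['a', 'b', 'd'], ['a', 'd', 'b']]
```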

Applications of Data Mining Algorithms for Network

Two studies proposed a sequential pattern mining technique that incorporated alert information.


Efficient Algorithms and Applications in Topological Data Analysis

Figure 4.1 shows an example of this analysis. First, a mesh with a scalar function (see Figure 4.1(a)) is converted into a Reeb graph (see Figure 4.1(b)). The critical points are then paired, and the result is displayed as a persistence diagram, as seen in Figures 4.1(c) and 4.1(d). This final step can still be challenging, particularly for essential critical points, i.e. those associated with cycles in the Reeb graph, which require an expensive search performed for each essential critical point. While many prior works have provided efficient algorithms for computing Reeb graph structures themselves, to our knowledge none has provided a detailed description of an algorithm for pairing critical points.
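
Not the essential-cycle pairing discussed above, but as context, a sketch of the standard union-find pairing for 0-dimensional sublevel-set persistence of a scalar function on a graph (the elder rule: when two components merge, the one with the younger minimum dies); all names and the toy input are assumptions:

```python
def persistence_pairs(values, edges):
    """values: {vertex: f(vertex)}; edges: list of (u, v) pairs."""
    parent = {v: v for v in values}

    def find(v):                          # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    pairs = []
    # sweep edges in the order they enter the sublevel-set filtration
    for u, w in sorted(edges, key=lambda e: max(values[e[0]], values[e[1]])):
        death = max(values[u], values[w])
        ru, rw = find(u), find(w)
        if ru == rw:
            continue                      # the edge closes a cycle, no merge
        # elder rule: the component whose minimum is younger (larger) dies
        young, old = (ru, rw) if values[ru] > values[rw] else (rw, ru)
        if values[young] < death:         # skip zero-persistence pairs
            pairs.append((values[young], death))
        parent[young] = old               # root stays at the elder minimum
    return pairs

# path graph a(0) - b(2) - c(1): the minimum at c dies at b -> (1, 2)
print(persistence_pairs({"a": 0, "b": 2, "c": 1}, [("a", "b"), ("b", "c")]))
```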

Some Improvements of Fuzzy Clustering Algorithms Using Picture Fuzzy Sets and Applications For Geographic Data Clustering

MapWindow is open-source GIS software familiar to Windows users; it is under active development, with new versions released continuously. MapWindow supports plug-ins in the form of dynamic link libraries (*.dll), and development environments such as Visual Studio Community Edition are available for free download. Plug-ins can be written in C# on the .NET Framework. Our implementation of the proposed algorithms for the experimental evaluation is written in C/C++, so the Visual Studio development environment is the most suitable choice for hosting our source code.

Using evolutionary algorithms for fitting high dimensional models to neuronal data

The BFGS method has proven successful in many real-world applications in areas such as systems biology, chemistry, and nanotechnology (Kim et al. 2007; Pankratov and Uchaeva 2000; Zhao et al. 2002). We chose a version of BFGS with limited memory usage and box constraints, namely the L-BFGS-B method, already implemented in the SciPy library. The limited-memory aspect of the implementation means that the whole gradient history is not considered when the Hessian is calculated, thereby saving memory. L-BFGS-B combines the well-known computational effectiveness of BFGS (Nocedal 1980; Fiore et al. 2003) with box constraints on the parameters. Such constraints are necessary in our model since some parameters must be positive. To further assess the computational cost and convergence properties of BFGS, we compare it with a truncated Newton code (TNC) (Nash 1984; Nash and Sofer 1996; Schlick and Fogelson 1992) with parameter constraints (also implemented in SciPy), which is known to be robust in convergence but computationally costly. In line with common practice, we use the SSQ error as the objective minimised by the GF algorithms.
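
A minimal sketch of this setup with SciPy: minimise an SSQ error under box constraints with L-BFGS-B, then compare against TNC. The two-parameter exponential model and data are illustrative assumptions, not the neuronal model:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
t = np.linspace(0, 5, 50)
data = 2.0 * np.exp(-0.8 * t) + rng.normal(0, 0.05, t.size)

def ssq(p):
    """Sum-of-squares error of the toy model a * exp(-k * t)."""
    a, k = p
    return np.sum((a * np.exp(-k * t) - data) ** 2)

x0 = [1.0, 1.0]
bounds = [(0, None), (0, None)]        # both parameters must be positive

for method in ("L-BFGS-B", "TNC"):     # same interface, different solvers
    res = minimize(ssq, x0, method=method, bounds=bounds)
    print(method, res.x, res.fun, res.nfev)
```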

Fenchel duality-based algorithms for convex optimization problems with applications in machine learning and image restoration

In order to compare the overall performance of the resulting classifier for different choices of the loss function, we also performed a nested cross-validation (cf. [72, 59]). In this way one obtains an unbiased estimate of the true classification error for each model. More precisely, we implemented a two-level nested 10-fold cross-validation: in the outer loop, the whole set of images was split into ten disjoint sets used as test sets to obtain the classification error. For each test set, the remaining data was again split into ten disjoint sets used in the inner loop. On the basis of these ten sets, the 10-fold cross-validation described above was performed to determine both the optimal kernel parameter and the regularization parameter. Once these parameters are determined, they …
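
A sketch of two-level nested 10-fold cross-validation with scikit-learn: an inner grid search picks the kernel and regularization parameters, and the outer loop estimates the unbiased error. The dataset and parameter grid are placeholder assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

inner = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)  # inner 10-fold
outer_scores = cross_val_score(inner, X, y, cv=10)          # outer 10-fold
print("unbiased error estimate:", 1 - outer_scores.mean())
```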

QUALITY AND ACCURACY OF CLUSTERING ALGORITHMS ON BIG DATA SETS USING HADOOP

Big Data is more than a data warehouse that stores and analyzes large volumes of data. Big Data is the term for a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture and gathering, curation, database maintenance, search, sharing, transfer, analysis, and visualization. The trend toward methods and technology for very large data sets is driven by the additional information derivable from analysis of a single large set of related data. Common claims about Big Data include the following: 1. Big Data is about massive data volume. 2. Big Data runs on Hadoop. 3. Big Data means unstructured and structured data. 4. Big Data is for social media feeds and sentiment analysis. 5. NoSQL means "No SQL". Big Data is the opportunity to extract insight from an immense volume, variety, and velocity of data.

Partitioning clustering algorithms for protein sequence data sets

Background: Genome-sequencing projects are currently producing an enormous amount of new sequences, causing protein sequence databases to grow rapidly. The unsupervised classification of these data into functional groups or families, i.e. clustering, has become one of the principal research objectives in structural and functional genomics. Computer programs that automatically and accurately classify sequences into families have become a necessity. A significant number of methods have addressed the clustering of protein sequences, and most of them can be categorized into three major groups: hierarchical, graph-based, and partitioning methods. Among the various sequence clustering methods in the literature, hierarchical and graph-based approaches have been widely used. Although partitioning clustering techniques are widely used in other fields, few applications have been found in the field of protein sequence clustering. It has not been fully demonstrated whether partitioning methods can be applied to protein sequence data and whether these methods can be efficient compared with the published clustering methods.
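
A sketch of one partitioning method in the natural form for sequence data: a bare-bones k-medoids (PAM-style) clustering over a precomputed distance matrix, as would arise from pairwise alignment scores. The toy matrix and function names are assumptions:

```python
import numpy as np

def k_medoids(D, k, n_iter=100, seed=0):
    """D: symmetric (n, n) distance matrix; returns medoid indices, labels."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign to nearest medoid
        new = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                # new medoid = member minimising total distance to its cluster
                new[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break                                   # converged
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)

# toy distance matrix for 4 "sequences" forming two obvious groups
D = np.array([[0, 1, 9, 9], [1, 0, 9, 9], [9, 9, 0, 1], [9, 9, 1, 0]], float)
print(k_medoids(D, 2))
```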

Evolutionary design of decision-tree algorithms tailored to microarray gene expression data sets

Abstract — Decision-tree induction algorithms are widely used in machine learning applications in which the goal is to extract knowledge from data and present it in a graphically intuitive way. The most successful strategy for inducing decision trees is the greedy top-down recursive approach, which has been continuously improved by researchers over the past 40 years. In this paper, we propose a paradigm shift in the research of decision trees: instead of proposing yet another manually designed method for inducing decision trees, we propose automatically designing decision-tree induction algorithms tailored to a specific type of classification data set (or application domain). Following recent breakthroughs in the automatic design of machine learning algorithms, we propose HEAD-DT, a hyper-heuristic evolutionary algorithm that evolves design components of top-down decision-tree induction algorithms. By the end of the evolution, we expect HEAD-DT to generate a new, and possibly better, decision-tree algorithm for the given application domain. We perform extensive experiments on 35 real-world microarray gene expression data sets to assess the performance of HEAD-DT, and compare it with well-known decision-tree algorithms such as C4.5, CART, and REPTree. Results show that HEAD-DT is capable of generating algorithms that significantly outperform the baseline manually designed decision-tree algorithms with regard to predictive accuracy and F-measure.
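
A toy sketch of the hyper-heuristic idea only: evolve design choices of a tree inducer (treated as genes) against a cross-validation fitness. HEAD-DT evolves a much richer set of algorithmic components; here scikit-learn hyperparameters stand in as assumed placeholders:

```python
import random
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
GENES = {"criterion": ["gini", "entropy"],
         "max_depth": [2, 3, 5, 8, None],
         "min_samples_leaf": [1, 5, 10]}

def random_individual():
    return {g: random.choice(v) for g, v in GENES.items()}

def fitness(ind):
    """Cross-validated accuracy of the tree built from this design."""
    return cross_val_score(DecisionTreeClassifier(random_state=0, **ind),
                           X, y, cv=5).mean()

def mutate(ind):
    child = dict(ind)
    g = random.choice(list(GENES))
    child[g] = random.choice(GENES[g])
    return child

random.seed(0)
pop = [random_individual() for _ in range(8)]
for gen in range(5):                 # a few (mu + lambda) generations
    pop.sort(key=fitness, reverse=True)
    pop = pop[:4] + [mutate(random.choice(pop[:4])) for _ in range(4)]
best = max(pop, key=fitness)
print(best, fitness(best))
```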

Fitting Clearing Functions to Empirical Data: Simulation Optimization and Heuristic Algorithms.

Figure 3-8 shows the plot for machine 4, which is our bottleneck machine. On this plot we observe a good linear relation until the machine reaches its capacity value. Once capacity is reached, increasing the resource load does not increase the amount of output, causing the curve to level off. Figure 3-9, depicting the situation for the bottleneck machine, shows that the data become more scattered as the machine approaches its capacity limit. Note that in Figure 3-9 there are no observations in the upper-right corner of the plot, since for high release and WIP levels we do not observe any low-output points. From the plots for both functional forms, we see that the observations follow a concave functional form as the resource load, or the release for a given initial WIP, increases. This is crucial for implementing these functional forms in optimization models since, as discussed in Asmundsson et al. (2009), the formulation depends on the concavity of the clearing function (CF) in order to obtain a convex set of constraints for capacity.
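
A sketch of fitting a concave clearing function to (load, output) data. The saturating-exponential form CF(x) = K1 · (1 − exp(−K2 · x)) is one common concave choice; the specific form, parameters, and data here are assumptions, not the paper's fitted model:

```python
import numpy as np
from scipy.optimize import curve_fit

def clearing_fn(x, k1, k2):
    # concave and increasing; levels off at the capacity value k1
    return k1 * (1.0 - np.exp(-k2 * x))

rng = np.random.default_rng(2)
load = np.linspace(0, 10, 40)
output = clearing_fn(load, 5.0, 0.6) + rng.normal(0, 0.2, load.size)

(k1, k2), _ = curve_fit(clearing_fn, load, output, p0=[1.0, 1.0])
print(f"capacity ~= {k1:.2f}, curvature ~= {k2:.2f}")
```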

Iterative algorithms for common elements in fixed point sets and zero point sets with applications

A very common problem in diverse areas of mathematics and the physical sciences consists of trying to find a point in the intersection of convex sets. This problem is referred to as the convex feasibility problem; its precise mathematical formulation is as follows: find an x ∈ C₁ ∩ ⋯ ∩ C_N, where C₁, …, C_N are nonempty closed convex sets.
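
A sketch of the classical alternating-projection method for the convex feasibility problem with N = 2 sets, here a Euclidean ball and a halfspace (the concrete sets and starting point are assumptions):

```python
import numpy as np

def project_ball(x, c, r):                 # ball {y : ||y - c|| <= r}
    d = x - c
    n = np.linalg.norm(d)
    return x if n <= r else c + r * d / n

def project_halfspace(x, a, b):            # halfspace {y : a.y <= b}
    viol = a @ x - b
    return x if viol <= 0 else x - viol * a / (a @ a)

x = np.array([5.0, 5.0])
a, b = np.array([1.0, 1.0]), 1.0           # x1 + x2 <= 1
c, r = np.zeros(2), 2.0                    # ||x|| <= 2
for _ in range(100):                       # alternate the two projections
    x = project_halfspace(project_ball(x, c, r), a, b)
print(x)                                   # a point in (or near) the intersection
```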


Acceleration Methods for Classic Convex Optimization Algorithms

If we fit a linear model to the variable x, we obtain the straight line depicted in purple in Fig. 2.2. This model is clearly misspecified, since the true model used to generate the data also contains x². This behavior is known as underfitting. On the other hand, we could keep adding higher-order powers of the variable x until we achieve a perfect fit, since the model remains linear in the coefficients w. In general, with n data points we can achieve zero training error using a polynomial of degree n − 1. This is represented by the red line in the figure, and it will clearly exhibit poor generalization because it is also learning the noise in the data. We simulate the test data as the two green points, also drawn from the same model but not used to compute any of the fits. They are clearly much closer to the blue line (the quadratic fit) than to the red line, even though the latter achieves zero error on the training points. Thus we may consider the quadratic model a better estimate of the true model. As mentioned before, this is known as overfitting, and it is a very important problem when fitting statistical models since in practice …
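
A minimal numeric sketch of the behaviour described above, with assumed quadratic ground truth: degree 1 underfits, degree 2 matches the true model, and degree n − 1 interpolates the training points while generalizing poorly:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10
x = np.linspace(-1, 1, n)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.3, n)   # true model in x, x^2
x_test = rng.uniform(-1, 1, 2)                            # two held-out points
y_test = 1.0 + 2.0 * x_test + 3.0 * x_test**2 + rng.normal(0, 0.3, 2)

for deg in (1, 2, n - 1):
    coef = np.polyfit(x, y, deg)
    train = np.mean((np.polyval(coef, x) - y) ** 2)       # zero for deg n-1
    test = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    print(f"degree {deg}: train MSE {train:.4f}, test MSE {test:.4f}")
```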

Analysis of Convex Hull's Algorithms and It's Application

Veljko Petrović et al. [2] outlined how the time complexity of Graham's scan can be linearized, provided it operates on a finite, countable subset of the reals representable on a digital computer; the future scope of this work was to analyse the time complexity of different convex hull algorithms. Jingfan Fan et al. [3] proposed a novel convex hull aided registration method (CHARM) to match two point sets subject to a non-rigid transformation. The proposed algorithm outperforms several state-of-the-art methods with respect to sampling, rotation angle, and data noise on both synthetic and real data, and also shows higher computational efficiency. GAO Yang et al. [7] proposed a quick convex hull construction algorithm using a grid and a binary tree for the minimum convex hull of a planar point set; comparison with the currently representative minimum convex hull algorithms shows that it better describes the profile of irregular objects and that its time complexity is relatively low. Artem Potebnia et al. [8] proposed an innovative algorithm for forming minimum convex hulls of graphs using the GPU; its high speed and linear complexity are achieved by distributing the graph's vertices into separate units and filtering them. Xujun Zhou et al. [9] considered the time complexity of handling newly added samples and proposed an incremental convex hull algorithm based on online Support Vector Regression (ICH-OSVR), which significantly reduces time consumption and enables fast online learning when a new sample is added; the proposed method was found to save considerable time as well as memory. L. Cinque et al. [15] proposed a parallel version of Jarvis's march, realized using the BSP model, which takes O(nh/p) time (where p is the number of processors and n is the problem size) against the O(nh) complexity of the sequential algorithm. The purpose of this work was to present a very efficient parallel algorithm for computing the convex hull in the plane; the theoretical performance of the algorithm, predicted using the BSP cost formula, closely matches the actual running times of the implementation.
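
For reference, a compact O(n log n) planar convex hull: Andrew's monotone chain, a close relative of the Graham scan discussed above (the sample points are arbitrary):

```python
def convex_hull(points):
    """Return the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):   # z of (a - o) x (b - o); > 0 means a left turn
        return (a[0]-o[0]) * (b[1]-o[1]) - (a[1]-o[1]) * (b[0]-o[0])

    def half(pts):        # build one monotone chain, popping right turns
        h = []
        for p in pts:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h

    lower, upper = half(pts), half(pts[::-1])
    return lower[:-1] + upper[:-1]   # drop duplicated endpoints

print(convex_hull([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2), (1, 0.5)]))
```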

Locating Multiple Facilities in Convex Sets with Fuzzy Data and Block Norms

In this paper we present a model for locating multiple new facilities in convex sets with respect to multiple existing facilities and demand points, and then present a linear programming model for this problem with block norms. We use these results for the problem with fuzzy data. We also treat the rectilinear and infinity norms as special cases of block norms. Rectilinear distances are used because the scenario may be thought of as an urban setting. The study and modeling of this problem has many applications in industry, such as locating machines in a workshop.
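
A sketch of the standard linear programming reformulation for the simplest case, one new facility under the rectilinear norm: each |x_j − a_ij| is replaced by a variable t_ij with two linear constraints. The demand points are made up, and this is not the paper's multi-facility fuzzy model:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0], [1.0, 1.0]])
n, d = A.shape
# variables z = [x_1, x_2, t_11, t_12, ..., t_n1, t_n2]
c = np.concatenate([np.zeros(d), np.ones(n * d)])   # minimise sum of t_ij
A_ub, b_ub = [], []
for i in range(n):
    for j in range(d):
        row = np.zeros(d + n * d)
        row[j], row[d + i * d + j] = 1.0, -1.0      #  x_j - t_ij <= a_ij
        A_ub.append(row.copy()); b_ub.append(A[i, j])
        row[j] = -1.0                               # -x_j - t_ij <= -a_ij
        A_ub.append(row); b_ub.append(-A[i, j])
bounds = [(None, None)] * d + [(0, None)] * (n * d)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x[:d])   # optimum is the coordinate-wise median, here (1, 1)
```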

Applications of the Discrete Least Squares 3-Convex Fit To Sigmoid Data

We have proposed a quadratic programming calculation that gives the best least squares fit to data values contaminated by random errors, subject to nonnegative third divided differences. The method is suitable when the data exhibit a sigmoid trend, where a concave region is followed by a convex one. The method is equally suitable when it is better to employ non-positive instead of nonnegative divided differences, in which case a convex region precedes a concave one. The fit consists of a certain number of overlapping parabolae, which not only provides flexibility in data fitting but also facilitates further operations on the fit, such as interpolation, extrapolation, differentiation, and integration. Moreover, the interval containing the inflection point of the fit is provided automatically by the calculation.
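
A sketch of the constrained least-squares idea: on equally spaced data, the sign condition on third divided differences reduces to a sign condition on third forward differences, giving a quadratic program. A general-purpose SLSQP solve is used here purely for illustration; the paper develops a dedicated QP calculation, and the sigmoid data below are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 25)
y = 1 / (1 + np.exp(-2 * x)) + rng.normal(0, 0.05, x.size)  # noisy sigmoid

n = x.size
D3 = np.zeros((n - 3, n))              # third forward-difference operator
for i in range(n - 3):
    D3[i, i:i + 4] = [-1, 3, -3, 1]

# a logistic sigmoid has a convex region followed by a concave one, so
# the non-positive variant of the constraint applies here: D3 @ f <= 0
res = minimize(lambda f: np.sum((f - y) ** 2), y, method="SLSQP",
               constraints={"type": "ineq", "fun": lambda f: -(D3 @ f)})
fit = res.x                            # best constrained fit to the data
print(np.all(D3 @ fit <= 1e-8))        # constraints hold up to tolerance
```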

Optimal Separation of Twin Convex Sets under Externalities

In this paper, decision criteria are characterized with respect to the choice of separating or not separating disjoint twin convex sets in small areas of an isolated neighborhood, where twins impose a potential negative externality on each other and positive externality comes from the centre of any neighborhood. Wherever there are trade-offs between blocking the negative externality using separation technology and receiving the positive externality, the key quantity is the distance of the ray travelled from the centre of the neighborhood. When this distance is low relative to other parameters, the positive externality is relatively strong and non-separation is optimal. On the other hand, when the distance is high, the positive externality is low and separation becomes optimal. Outliers (lying far from the centre of a neighborhood) are better candidates for separation. The results are qualitatively the same for independent and interdependent pair-wise contests.

General Reflexivity For Absolutely Convex Sets, Mahtab Lak

Let E = {v ⊗ α : v ∈ V, α ∈ W}, and let Y = sp(E). Thus the σ(X, Y) topology on X = B(V) is the weak operator topology. Suppose K is a compact Hausdorff space and let C(K) denote the continuous functions from K to ℂ, and suppose π : C(K) → B(W) is a bounded unital algebra homomorphism. It was proved in [17] that A = π(C(K))^{−σ(X,Y)} is reflexive, which in our terminology is E-reflexive. In addition, it was shown in [1] that Y is E-elementary on A. Thus, by our Theorem 8, A is hereditarily ac-E-reflexive. Hence every absolutely convex subset of A that is closed in the weak operator topology is ac-E-reflexive. By Example 48, this means that if T ∈ B(V) and for every v ∈ V we have Tv ∈ (Bv)^{−‖·‖}, then T ∈ B.

Notions of generalized s-convex functions on fractal sets

…functions that are generalized s-convex in the first sense and in the second sense, respectively. It is well known that there are many important established inequalities for the class of generalized convex functions; one of the most famous is the generalized Hermite-Hadamard inequality, also called the 'generalized Hadamard inequality', stated as follows (see []): let f be a generalized convex function on [a₁, a₂] ⊆ ℝ, a₁ < a₂; then …
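
For reference, the classical (real-line) Hermite-Hadamard inequality for a convex function f on [a₁, a₂] reads as follows; the paper's generalized version restates this for generalized convex functions on fractal sets.

$$ f\!\left(\frac{a_1+a_2}{2}\right) \;\le\; \frac{1}{a_2-a_1}\int_{a_1}^{a_2} f(x)\,dx \;\le\; \frac{f(a_1)+f(a_2)}{2}. $$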


On hyperspaces of max-plus and max-min convex sets

Proof. We proceed by induction. Clearly, mpcc₁(X) is homeomorphic to X and is therefore an AR-space. Assume that we have already shown that mpcc_{n−1}(X) is an AR-space. Without loss of generality, one may assume that X is a max-plus convex subset of a cube J^k, where J is a closed segment in ℝ. Let α : (J^k)^n → mpcc_n(J^k) be the map defined by the formula …

