Fitting Convex Sets to Data: Algorithms and Applications

guarantees for reliable change-point estimation that require the underlying signal to remain unchanged for long portions of the sequence. In this chapter, we propose a new approach for estimating change-points in high-dimensional signals by integrating ideas from atomic norm regularization with the filtered derivative framework. The atomic norm regularization step is based on solving a convex optimization instance, and it exploits latent low-dimensional structure that is frequently found in signals encountered in practice. The specific choice of regularization may be selected based on prior knowledge about the data, or it may be learned from data using the ideas from Chapter 2. Our algorithm is well-suited to the high-dimensional setting in terms of both computational scalability and statistical efficiency. More precisely, our main result shows that our method performs change-point estimation reliably as long as the product of the smallest-sized change (measured in terms of the squared Euclidean norm of the difference between signals at a change-point) and the smallest distance between change-points (measured in the number of time instances) is larger than a Gaussian width parameter that characterizes the low-dimensional complexity of the underlying signal sequence. Lastly, our method is applicable in online settings, as it operates on small portions of the sequence of observations at a time.
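
The chapter's precise procedure is not reproduced in this excerpt. As a rough, minimal sketch of the filtered-derivative idea, the code below compares denoised window averages on either side of each time index; soft-thresholding stands in for a generic atomic-norm proximal step, and the window size w, denoising level lam, and detection threshold tau are illustrative assumptions rather than the chapter's tuned choices.

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of the l1 norm; a stand-in for a generic
    # atomic-norm denoising step (here the atoms are signed basis vectors).
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def filtered_derivative_changepoints(Y, w, lam, tau):
    """Y: (T, p) array of noisy high-dimensional observations.

    For each time t, average the w observations on each side, denoise both
    averages, and flag t as a change-point candidate when the squared
    Euclidean distance between the denoised averages exceeds tau.
    """
    T = Y.shape[0]
    stats = np.zeros(T)
    for t in range(w, T - w):
        left = soft_threshold(Y[t - w:t].mean(axis=0), lam)
        right = soft_threshold(Y[t:t + w].mean(axis=0), lam)
        stats[t] = np.sum((right - left) ** 2)
    return np.where(stats > tau)[0], stats

# Toy usage: a sparse mean shift at t = 100 in 200 dimensions.
rng = np.random.default_rng(0)
Y = rng.normal(0.0, 1.0, (200, 200))
Y[100:, :5] += 3.0
candidates, _ = filtered_derivative_changepoints(Y, w=20, lam=0.5, tau=10.0)
```

Because each statistic uses only the 2w observations around t, the scan can be run online over a sliding buffer, which matches the excerpt's remark about online applicability.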

Algorithms and Applications for Spatial Data Mining

framework. Furthermore, example applications are discussed for these algorithms. The last section gives a short summary and points out some directions for future research.

2 A Database-Oriented Framework for Spatial Data Mining

Our framework for spatial data mining is based on spatial neighbourhood relations between objects and on the induced neighbourhood graphs and neighbourhood paths which can be defined with respect to these relations. Thus, we introduce a set of database primitives, or basic operations, for spatial data mining which are sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. First, similar to the relational standard language SQL, the use of standard primitives will speed up the development of new data mining algorithms and will also make them more portable. Second, we can develop techniques to efficiently support the proposed database primitives (e.g. by specialized index structures), thus speeding up all data mining algorithms which are based on them. Moreover, our basic operations for spatial data mining can be integrated into commercial database management systems. This will offer additional benefits for data mining applications such as efficient storage management, prevention of inconsistencies, and index structures to support the different types of database queries which may be part of the data mining algorithms.
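
The excerpt names neighbourhood graphs and neighbourhood paths as the core primitives without defining them operationally. A minimal sketch of what such primitives might look like, using a hypothetical distance-based relation; the actual primitives in the paper are richer, covering topological and direction relations as well.

```python
def neighborhood_graph(objects, relation):
    # Graph induced by a binary spatial neighbourhood relation:
    # each object is mapped to the list of its neighbours.
    return {a: [b for b in objects if a != b and relation(a, b)]
            for a in objects}

def neighborhood_paths(graph, start, max_len):
    # Enumerate simple neighbourhood paths of bounded length from start.
    paths, frontier = [], [[start]]
    while frontier:
        path = frontier.pop()
        paths.append(path)
        if len(path) < max_len:
            frontier.extend(path + [n] for n in graph[path[-1]]
                            if n not in path)
    return paths

# Hypothetical usage with points and a distance-based relation.
pts = [(0, 0), (1, 0), (2, 0), (5, 5)]
close = lambda a, b: (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 <= 2.25
g = neighborhood_graph(pts, close)
paths_from_origin = neighborhood_paths(g, (0, 0), max_len=3)
```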

Some Improvements of Fuzzy Clustering Algorithms Using Picture Fuzzy Sets and Applications For Geographic Data Clustering

This paper summarizes the major findings of the research project under the code name QG.14.60. The research aims to enhance several fuzzy clustering methods by means of more generalized fuzzy sets. The main results are: (1) Improvement of a distributed fuzzy clustering method for big data using picture fuzzy sets; a novel method called DPFCM is designed to reduce communication cost by using the facilitator model (instead of the peer-to-peer model) together with picture fuzzy sets. The experimental evaluations show that the clustering quality of DPFCM is better than that of the original algorithm while keeping computational time reasonable. (2) Application of picture fuzzy clustering to weather nowcasting problems in a novel method called PFS-STAR, which integrates the STAR technique with picture fuzzy clustering to improve forecast accuracy. Experimental results on satellite image sequences show that the proposed method outperforms related works, especially in rain prediction. (3) Development of a GIS plug-in that implements some of the improved fuzzy clustering algorithms. The tool supports access to spatial databases and visualization of clustering results in thematic map layers.
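
The excerpt does not define picture fuzzy sets. As a point of reference, a picture fuzzy value (in Cuong's formulation, which this line of work builds on) assigns degrees of positive, neutral, and negative membership whose sum is at most 1; a minimal sketch of that constraint:

```python
from dataclasses import dataclass

@dataclass
class PictureFuzzyValue:
    positive: float  # degree of positive membership
    neutral: float   # degree of neutral membership
    negative: float  # degree of negative membership

    def __post_init__(self):
        # The defining constraint of a picture fuzzy value.
        if min(self.positive, self.neutral, self.negative) < 0.0 or \
                self.positive + self.neutral + self.negative > 1.0:
            raise ValueError("degrees must be non-negative and sum to at most 1")

    @property
    def refusal(self) -> float:
        # Whatever degree remains is the refusal degree.
        return 1.0 - (self.positive + self.neutral + self.negative)
```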

Data Clustering: Algorithms and Applications

By integrating density-based, grid-based, and subspace clustering, CLIQUE discovers clusters embedded in subspaces of high-dimensional data without requiring users to select subspaces of interest. The DNF expressions for the clusters give a clear representation of the clustering results. The time complexity of CLIQUE is O(c^p + pN), where p is the highest subspace dimension selected, N is the number of input points, and c is a constant; this grows exponentially with respect to p. The algorithm offers an effective, efficient method of pruning the space of dense units in order to counter the inherent exponential nature of the problem. However, there is a trade-off in pruning the dense units in subspaces with low coverage: while the algorithm is faster, there is an increased likelihood of missing clusters. In addition, while CLIQUE does not require users to select subspaces of interest, its susceptibility to noise and its ability to identify relevant attributes are highly dependent on the user's choice of the number of unit intervals, m, and the sensitivity threshold, τ.
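
To make the pruning idea concrete, here is a rough bottom-up sketch of CLIQUE-style dense-unit search: only dense k-dimensional units are joined into (k+1)-dimensional candidates, which is what keeps the exponential search space manageable in practice. The grid and threshold follow the excerpt's m and τ; the cluster-assembly and DNF steps of the real algorithm are omitted.

```python
from collections import Counter
from itertools import combinations

def dense_units(points, m, tau):
    """points: tuples in [0, 1)^d; m: intervals per dimension;
    tau: minimum fraction of points a unit must cover to be dense.
    A unit in a k-dim subspace is a dict {dimension: interval_index}."""
    min_count = tau * len(points)
    d = len(points[0])
    # Dense 1-dimensional units.
    counts = Counter((dim, int(p[dim] * m)) for p in points for dim in range(d))
    level = [{dim: cell} for (dim, cell), c in counts.items() if c >= min_count]
    dense = []
    while level:
        dense.extend(level)
        # Candidate generation: join dense k-dim units that agree on k-1
        # dimensions.  Non-dense units are never extended (the pruning).
        candidates = set()
        for u, v in combinations(level, 2):
            shared = set(u) & set(v)
            if len(shared) == len(u) - 1 and all(u[s] == v[s] for s in shared):
                candidates.add(tuple(sorted({**u, **v}.items())))
        level = []
        for cand in candidates:
            unit = dict(cand)
            cover = sum(all(int(p[dim] * m) == cell
                            for dim, cell in unit.items()) for p in points)
            if cover >= min_count:
                level.append(unit)
    return dense
```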

Applications of Data Mining Algorithms for Network

two studies proposed a sequential pattern mining technique that incorporated alert information.

Compaction Algorithms for Non-Convex Polygons and Their Applications

Remark 4.1 Actually, manufacturers would not mind if the polygons "leap-frogged" over each other on the way to a more compact layout. The mixed integer programming generalization of this optimization model (see Chapter 8) does exactly that. Remark 4.2 When applying the locality heuristic, we assume that the two polygons involved are both star-shaped (or at least that their Minkowski sum is star-shaped). We have encountered data containing non-star-shaped polygons. Our studies have shown that all of these polygons can be expressed as a union of two, and very rarely three, star-shaped polygons. For those non-star-shaped polygons, we use a decomposition algorithm to decompose the polygon into a small number of star-shaped components. The locality heuristic is then applied to each pair of components of the two polygons.
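
Star-shapedness can be checked via the polygon's kernel: the set of points that see every boundary point, which is the intersection of the half-planes supporting the edges. Below is a minimal sketch of that test (not the thesis's decomposition algorithm) that clips the polygon against each edge's half-plane; the polygon is assumed simple with counter-clockwise vertices.

```python
def cross(o, a, b):
    # Twice the signed area of triangle (o, a, b); > 0 when b is left of o->a.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def clip_halfplane(poly, a, b):
    # One Sutherland-Hodgman pass: keep the part of poly left of line a->b.
    out = []
    for i in range(len(poly)):
        p, q = poly[i], poly[(i + 1) % len(poly)]
        sp, sq = cross(a, b, p), cross(a, b, q)
        if sp >= 0:
            out.append(p)
        if (sp >= 0) != (sq >= 0):  # the edge crosses the clipping line
            t = sp / (sp - sq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def kernel(poly):
    # Kernel of a CCW simple polygon: intersect the left half-planes of
    # its edges.  Non-empty exactly when the polygon is star-shaped.
    ker = list(poly)
    for i in range(len(poly)):
        ker = clip_halfplane(ker, poly[i], poly[(i + 1) % len(poly)])
        if not ker:
            return []
    return ker

# A convex square is trivially star-shaped: its kernel is the square itself.
print(bool(kernel([(0, 0), (1, 0), (1, 1), (0, 1)])))
```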

Efficient Algorithms and Data Structures for Massive Data Sets

4. Delete(S, x): Delete element x from S
5. Delete(S, k): Delete the element with key k from S

Algorithmic applications of priority queues abound [4, 25]. The soft heap is an approximate meldable priority queue devised by Chazelle [19], and supports the Insert, Findmin, Deletemin, Delete, and Meld operations. A soft heap may, at its discretion, corrupt the keys of some elements in it by revising them upwards. A Findmin returns the element with the smallest current key, which may or may not be corrupt. A soft heap guarantees that the number of corrupt elements in it is never more than εN, where N is the total number of items inserted into it and ε is a parameter called the error rate. A Meld operation merges two soft heaps into one new soft heap.
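
To fix the interface the excerpt lists, here is a minimal stand-in built on Python's heapq with lazy deletion. It is exact, i.e. it behaves like a soft heap with ε = 0 and never corrupts keys; a genuine soft heap deliberately corrupts up to εN keys to achieve its amortized bounds, and its internals are far more intricate than this sketch.

```python
import heapq
import itertools

_HANDLES = itertools.count()  # globally unique handles across all heaps

class ExactMeldableHeap:
    """Interface stand-in for a soft heap: Insert, Findmin, Deletemin,
    Delete, and Meld.  Acts like a soft heap with error rate 0."""

    def __init__(self):
        self._heap = []        # entries: (key, handle, item)
        self._deleted = set()  # handles removed via delete()

    def insert(self, key, item):
        handle = next(_HANDLES)
        heapq.heappush(self._heap, (key, handle, item))
        return handle          # keep this to delete the item later

    def _prune(self):
        # Discard entries that were lazily deleted.
        while self._heap and self._heap[0][1] in self._deleted:
            self._deleted.discard(heapq.heappop(self._heap)[1])

    def findmin(self):
        self._prune()
        return self._heap[0][2] if self._heap else None

    def deletemin(self):
        self._prune()
        return heapq.heappop(self._heap)[2] if self._heap else None

    def delete(self, handle):
        self._deleted.add(handle)  # lazy deletion

    def meld(self, other):
        merged = ExactMeldableHeap()
        merged._heap = self._heap + other._heap
        heapq.heapify(merged._heap)
        merged._deleted = self._deleted | other._deleted
        return merged
```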

Partitioning clustering algorithms for protein sequence data sets

Abstract. Background: Genome-sequencing projects are currently producing an enormous number of new sequences, causing protein sequence databases to grow rapidly. The unsupervised classification of these data into functional groups or families, clustering, has become one of the principal research objectives in structural and functional genomics. Computer programs that automatically and accurately classify sequences into families have become a necessity. A significant number of methods have addressed the clustering of protein sequences, and most of them can be categorized into three major groups: hierarchical, graph-based, and partitioning methods. Among the various sequence clustering methods in the literature, hierarchical and graph-based approaches have been widely used. Although partitioning clustering techniques are extensively used in other fields, few applications have been found in the field of protein sequence clustering. It has not been fully demonstrated whether partitioning methods can be applied to protein sequence data and whether these methods can be efficient compared to the published clustering methods.

Extracting Algorithms by Indexing and Mining Large Data Sets

VI. FUTURE WORK

Future work would include the semantic analysis of algorithms, their trends, and how algorithms influence each other over time. Such analyses would give rise to multiple applications that could improve algorithm search. Support for additional document file formats could also be added, which would increase the robustness of the system.

Similarity Search: Algorithms for Sets and other High Dimensional Data

A classic problem in high-dimensional geometry has been whether data structures exist for (c, r)-Approximate Near Neighbour with Las Vegas guarantees and performance matching that of Locality Sensitive Hashing. That is, can we guarantee that a query always returns an approximate near neighbour if a near neighbour exists; or, put simply, can we rule out false negatives? The problem has practical importance as well as theoretical. There is in general no way of verifying that an LSH algorithm is correct when it says 'no near neighbours', other than iterating over every point in the set, in which case the data structure is entirely pointless. This means LSH algorithms cannot be used for many critical applications, such as fingerprint databases. On the practical side, it has been observed that the error probability parameter is hard to tune well when implementing LSH [84, 34]. A Las Vegas data structure entirely removes this problem. Different authors have described the problem under different names, such as 'Las Vegas' [91], 'have no false negatives' [86, 148], 'have total recall' [156], 'are exact' [32] and 'are explicit' [108].
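
For intuition about where the false negatives come from, here is a minimal bit-sampling LSH sketch for binary vectors under Hamming distance (the parameters k and L are illustrative). A near neighbour collides with the query in some table only with high probability, never with certainty, which is exactly the gap the Las Vegas constructions close.

```python
import random

def build_lsh(points, k, L, seed=0):
    # L hash tables; each hashes a binary vector by k random coordinates.
    rng = random.Random(seed)
    dims = len(points[0])
    tables = []
    for _ in range(L):
        coords = rng.sample(range(dims), k)
        buckets = {}
        for i, p in enumerate(points):
            buckets.setdefault(tuple(p[c] for c in coords), []).append(i)
        tables.append((coords, buckets))
    return tables

def query(tables, q):
    # Union of the query's buckets.  May be empty even when a near
    # neighbour exists; that is a false negative the caller cannot detect.
    candidates = set()
    for coords, buckets in tables:
        candidates.update(buckets.get(tuple(q[c] for c in coords), []))
    return candidates
```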

Fitting Tractable Convex Sets to Support Function Evaluations

The geometric problem of estimating an unknown compact convex set from evaluations of its support function arises in a range of scientific and engineering applications. Traditional approaches typically rely on estimators that minimize the error over all possible compact convex sets; in particular, these methods do not allow for the incorporation of prior structural information about the underlying set, and the resulting estimates become increasingly complicated to describe as the number of available measurements grows. We address both of these shortcomings by describing a framework for estimating tractably specified convex sets from support function evaluations. Building on the literature in convex optimization, our approach is based on estimators that minimize the error over structured families of convex sets that are specified as linear images of concisely described sets – such as the simplex or the free spectrahedron – in a higher-dimensional space that is not much larger than the ambient space. Convex sets parametrized in this manner are significant from a computational perspective, as one can optimize linear functionals over such sets efficiently; they serve a different purpose in the inferential context of the present paper, namely that of incorporating regularization in the reconstruction while still offering considerable expressive power. We provide a geometric characterization of the asymptotic behavior of our estimators, and our analysis relies on the property that certain sets which admit semialgebraic descriptions are Vapnik-Chervonenkis (VC) classes. Our numerical experiments highlight the utility of our framework over previous approaches in settings in which the available measurements are noisy or few in number, as well as those in which the underlying set to be reconstructed is non-polyhedral.
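
To make the parametrization concrete in the simplest case: a linear image of the simplex is a polytope with at most q vertices (the columns of the linear map), and its support function is h_A(u) = max_j <u, a_j>. The sketch below fits such vertices to support-function measurements by plain subgradient descent on the squared error; this is only an illustrative stand-in for the structured estimators analysed in the paper, and the step size and initialization are arbitrary assumptions.

```python
import numpy as np

def fit_polytope_support(U, y, q, steps=2000, lr=0.05, seed=0):
    """U: (m, n) support directions; y: (m,) noisy support values.
    Fits a polytope conv{a_1, ..., a_q}, a linear image of the simplex,
    by (sub)gradient descent on sum_i (max_j <u_i, a_j> - y_i)^2."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(q, U.shape[1]))   # rows are candidate vertices
    for _ in range(steps):
        scores = U @ A.T                   # <u_i, a_j> for all i, j
        active = scores.argmax(axis=1)     # vertex attaining the max
        resid = scores.max(axis=1) - y     # h_A(u_i) - y_i
        grad = np.zeros_like(A)
        np.add.at(grad, active, 2.0 * resid[:, None] * U)
        A -= (lr / len(y)) * grad          # only active vertices move
    return A

# Toy usage: noisy support evaluations of the unit square in the plane,
# whose support function is (|u_1| + |u_2|) / 2.
rng = np.random.default_rng(1)
U = rng.normal(size=(200, 2))
U /= np.linalg.norm(U, axis=1, keepdims=True)
y = np.abs(U).sum(axis=1) / 2 + 0.01 * rng.normal(size=200)
A_hat = fit_polytope_support(U, y, q=4)
```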

Iterative algorithms for common elements in fixed point sets and zero point sets with applications

The class of strictly pseudocontractive mappings was introduced by Browder and Petryshyn [16]. If κ = 0, the class of κ-strictly pseudocontractive mappings reduces to the class of nonexpansive mappings. In the case that κ = 1, we call S a pseudocontractive mapping. Marino and Xu [17] proved that the fixed point sets of strictly pseudocontractive mappings are closed and convex. They also proved that I - S is demiclosed at zero; to be more precise, if {x_n} is a sequence in C with x_n ⇀ x and x_n - Sx_n → 0, then x ∈ F(S).
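
For reference, here is the standard Browder–Petryshyn definition behind the constant κ above (for S a self-mapping of a subset C of a real Hilbert space), supplied because the excerpt's symbols were garbled:

```latex
% S : C -> C is kappa-strictly pseudocontractive, kappa in [0, 1), if
\|Sx - Sy\|^{2} \;\le\; \|x - y\|^{2}
  \;+\; \kappa\,\|(I - S)x - (I - S)y\|^{2}
  \qquad \text{for all } x, y \in C.
```

Taking κ = 0 recovers the nonexpansive case, and extending the inequality to κ = 1 gives the pseudocontractive case mentioned in the excerpt.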

Locating Multiple Facilities in Convex Sets with Fuzzy Data and Block Norms

We use these results for the problem with fuzzy data. We also do this for the rectilinear and infinity norms as special cases of block norms. Rectilinear distances have been adopted since the scenario may be thought of as set in an urban setting. The study of this problem and its modeling has many applications in industry, such as locating machines in a workshop.
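
As a small illustration of why rectilinear distances are computationally pleasant in location problems: under the l1 norm the single-facility objective separates by coordinate, so each coordinate of an optimal location is a weighted median of the demand points' coordinates. A minimal sketch with crisp data only; the fuzzy and multi-facility aspects of the paper are not modelled here.

```python
import numpy as np

def rectilinear_1median(points, weights):
    # The coordinate-wise weighted median minimizes the weighted sum of
    # rectilinear (l1) distances to the given demand points.
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    location = []
    for d in range(points.shape[1]):
        order = np.argsort(points[:, d])
        cumulative = np.cumsum(weights[order])
        idx = np.searchsorted(cumulative, 0.5 * cumulative[-1])
        location.append(points[order[idx], d])
    return np.array(location)

# Hypothetical usage: four demand points with unequal weights.
print(rectilinear_1median([(0, 0), (4, 1), (5, 5), (1, 6)], [3, 1, 1, 1]))
```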

Using evolutionary algorithms for fitting high dimensional models to neuronal data

show sets of tuning curves for four sample neurons to illustrate some of the characteristic behaviours that can be seen. Our aim was to fit a multi-stage nonlinear model to the data in an attempt to capture all of these tuning characteristics in individual neurons. The dataset consisted of tuning curves for 107 cells, collected from 13 macaques. The data are somewhat noisy for various reasons, both physiological and technical. Neurons themselves can adapt, changing their responses to the same stimuli after prolonged stimulation. The nature of in vivo extracellular recordings means that spikes can be missed as the target neuron moves closer to or further from the electrode during recording sessions. Lastly, in the particular recordings made here, the experimenter primarily wanted to establish, for example, the preferred orientation, without necessarily needing a full high-quality characterisation of the entire tuning curve; as a result, there are numerous cells for which the confidence interval for some data points is rather large.

Fitting Clearing Functions to Empirical Data: Simulation Optimization and Heuristic Algorithms.

machine 3, the data is spread over the area. Figure 3-6 shows the plot for machine 7, another unreliable machine, which is very similar to that for machine 3. Figure 3-8 shows the plot for machine 4, which is our bottleneck machine. On this plot we observe a good linear relation until the machine reaches its capacity value. When this machine reaches its capacity, increasing the resource load does not increase the amount of output, causing the curve to level off. Figure 3-9, depicting the situation for the bottleneck machine, shows that when the machine approaches its capacity limit, the data is more scattered in that region. Note that in Figure 3-9 there are no observations in the upper right corner of the plot, since for given high release and WIP levels we do not observe any low output values. From the plots for both functional forms, we see that the observations appear to follow a concave functional form as the resource load, or the release for a given initial WIP, increases. This is crucial for implementing those functional forms in optimization models since, as discussed in Asmundsson et al. (2009), the formulation depends on the concavity assumption of the CF in order to obtain a convex set of constraints for the capacity.
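
To illustrate the kind of concave, saturating fit being described, here is a minimal sketch that fits one common clearing-function form, f(w) = C·w / (β + w), to load/output observations with scipy; the synthetic data and parameter values are purely illustrative, not the thesis's machines or estimates.

```python
import numpy as np
from scipy.optimize import curve_fit

def clearing_function(w, cap, beta):
    # Saturating concave form: near-linear at low load, levelling off
    # at the capacity 'cap' as the resource load w grows.
    return cap * w / (beta + w)

# Illustrative synthetic load/output observations.
rng = np.random.default_rng(2)
load = rng.uniform(0.0, 50.0, 200)
output = clearing_function(load, 10.0, 8.0) + rng.normal(0.0, 0.4, 200)

(cap_hat, beta_hat), _ = curve_fit(clearing_function, load, output, p0=(5.0, 5.0))
# Concavity of this form is what yields a convex feasible region when the
# fitted CF is used as a capacity constraint in the planning model.
```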

Tractable fitting with convex polynomials via sum-of-squares

We consider the problem of fitting given data (u_1, y_1), ..., (u_m, y_m), where u_i ∈ R^n and y_i ∈ R, with a convex polynomial f. A technique to solve this problem using sum of squares polynomials is presented. This technique is extended to enforce convexity of f only on a specified region. Also, an algorithm to fit the convex hull of a set of points with a convex sub-level set of a polynomial is presented. This problem is a natural extension of the problem of finding the minimum volume ellipsoid covering a set. The algorithm, like that for the minimum volume ellipsoid problem, has the property of being invariant to affine coordinate transformations. We generalize this technique to fit arbitrary unions and intersections of polynomial sub-level sets.
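
A minimal univariate instance of the idea, sketched with cvxpy: fit a quartic f to data while forcing f'' to be a sum of squares, which for one variable is equivalent to a 2x2 positive semidefinite Gram matrix. The paper treats the general multivariate case; an SDP-capable solver such as SCS is assumed to be installed.

```python
import numpy as np
import cvxpy as cp

# Toy data from a convex quartic plus noise.
rng = np.random.default_rng(0)
u = rng.uniform(-2.0, 2.0, 60)
y = u**4 + u**2 + 0.1 * rng.normal(size=60)

# f(u) = c0 + c1 u + c2 u^2 + c3 u^3 + c4 u^4, with f'' SOS, hence f convex.
c = cp.Variable(5)
Q = cp.Variable((2, 2), symmetric=True)       # Gram matrix of f'' in [1, u]
V = np.vander(u, 5, increasing=True)          # rows [1, u_i, ..., u_i^4]
constraints = [
    Q >> 0,
    Q[0, 0] == 2 * c[2],      # f''(u) = 2 c2 + 6 c3 u + 12 c4 u^2
    2 * Q[0, 1] == 6 * c[3],  # matches the cross term of [1, u] Q [1, u]^T
    Q[1, 1] == 12 * c[4],
]
cp.Problem(cp.Minimize(cp.sum_squares(V @ c - y)), constraints).solve()
coeffs = c.value  # coefficients of the fitted convex quartic
```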

Accurate and efficient clustering algorithms for very large data sets

In this chapter we present and discuss computational results using small size data sets. All data sets contain only numeric attributes and they do not contain missing values. First, we give a brief description of data sets, then present results. These results include optimal values of the cluster function obtained by each algorithm and CPU time required by them. The following algorithms are used for comparison: the global k-means algorithm (GKM), the multi-start modified global k-means algorithm (MS-MGKM), the multi-start k-means algorithm (MS-KM), the difference of convex clustering algorithm (DCA), the clustering algorithm based on the difference of convex representation of the cluster function and nonsmooth optimization (DCClust), and two algorithms proposed in this thesis: the fast multi-start modified global k-means algorithm without weights (FMS-MGKM2) and with weights (FMS-MGKM). The description of these algorithms can be found in Chapter 4.

Enhancement of Sandwich Algorithms for Approximating Higher Dimensional Convex Pareto Sets

properties is that calculating an upper bound for this measure based on IPS and OPS can be done by solving a number of LP problems when using dummy points. To explain the general idea behind these dummy points and their advantages, we use the bi-criteria example in Figure 1. Although the problem with 'undesirable' normals does not occur for bi-criteria problems, we use a bi-criteria example to make the explanation of the general idea behind the dummy points easier. In Figure 1, the shaded area represents Z and the points A, B, C, D, and E are the current extreme points of IPS. As the arrows indicate, facets AB and DE both have 'undesirable' normals. Using these normals in the weighted sum method means that we search in the direction of the arrows for the point furthest away from the facet; this would in both cases result in a non-Pareto solution. In Figure 2, two dummy points are added for each extreme point z ∈ IPS_E. The dummy points are created by replacing one of the two coordinates of z by a large value (an exact definition of the dummy points is given in Section 4.2). Once the dummy points are created, the set IPS is replaced by the convex hull of IPS and all dummy points. All facets containing at least one IPS point now have a normal with only non-negative elements. All other facets, containing only dummy points, do not satisfy the upper bound constraint on the objectives and are thus not relevant.
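
A minimal sketch of the dummy-point construction for the bi-criteria case described above, using scipy's convex hull; the large value big, the sample points, and the sign convention for filtering facets (for a minimization problem, weighted-sum weights are the negated outward normals) are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import ConvexHull

def add_dummy_points(extreme_pts, big=1e3):
    # For each extreme point z of IPS, create one dummy point per
    # objective by replacing that coordinate of z with a large value.
    extreme_pts = np.asarray(extreme_pts, dtype=float)
    dummies = []
    for z in extreme_pts:
        for j in range(len(z)):
            d = z.copy()
            d[j] = big
            dummies.append(d)
    return np.vstack([extreme_pts, dummies])

# Extreme points of a toy inner approximation IPS (both objectives minimized).
ips = np.array([[1.0, 9.0], [2.0, 5.0], [4.0, 3.0], [7.0, 2.0], [9.0, 1.5]])
hull = ConvexHull(add_dummy_points(ips))

# hull.equations rows are [outward normal | offset].  Facets usable in the
# weighted sum method have outward normals with all components <= 0; the
# remaining facets touch only dummy points and are discarded.
usable = hull.equations[np.all(hull.equations[:, :-1] <= 1e-9, axis=1)]
```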

Convex sets and inequalities

1. Concept and fundamental result

Given a natural correspondence between a family of inequalities and a closed convex set in a topological linear space, one might expect that an inequality corresponding to a special point (e.g., an extreme point) would be of special interest in view of convex analysis theory. In this paper, we realize this concept.

Inscribing an axially symmetric polygon and other approximation algorithms for planar convex sets

There are a number of papers that study the best inner approximation of a convex set by a symmetric set; the distance to a symmetric set can be considered a measure of its symmetry [11]. Lower bounds for this distance are given by the Löwner–John ellipsoid [13]: any planar convex body C lies between two homothetic ellipses E ⊂ C ⊂ 2E with homothety ratio at most 2. Since any ellipse is axially symmetric, and area(E) = (1/4)·area(2E) ≥ (1/4)·area(C) (a homothety with ratio 2 multiplies planar areas by 4), any convex planar set C contains an axially symmetric subset with at least 1/4 of the area of C. The same lower bound of 1/4 follows from the fact that any planar convex body lies between two homothetic rectangles with homothety ratio at most two [16,19]. The lower bound can be raised to 2/3 [15], a bound that is not known to be tight. Bounds are also known for specific axis-symmetric inscribed figures, such as isosceles triangles or kites [21].