guarantees for reliable change-point estimation that require the underlying signal to remain unchanged for long portions of the sequence.
In this chapter, we propose a new approach for estimating change-points in high-dimensional signals by integrating ideas from atomic norm regularization with the filtered derivative framework. The atomic norm regularization step is based on solving a convex optimization instance, and it exploits latent low-dimensional structure that is frequently found in signals encountered in practice. The specific choice of regularization may be selected based on prior knowledge about the data, or it may be learned from data using the ideas from Chapter 2. Our algorithm is well-suited to the high-dimensional setting in terms of both computational scalability and statistical efficiency. More precisely, our main result shows that our method performs change-point estimation reliably as long as the product of the smallest-sized change (measured as the squared Euclidean norm of the difference between signals at a change-point) and the smallest distance between change-points (measured in the number of time instances) is larger than a Gaussian width parameter that characterizes the low-dimensional complexity of the underlying signal sequence. Finally, our method is applicable in online settings, as it operates on small portions of the sequence of observations at a time.
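The filtered-derivative step underlying this approach can be sketched in isolation for a scalar signal (the atomic-norm regularization step requires a convex solver and is omitted; the function and parameter names below are illustrative, not the chapter's notation):

```python
import numpy as np

def filtered_derivative(y, window, threshold):
    """Flag indices where the means of adjacent length-`window` blocks
    differ by more than `threshold`, then keep one index per contiguous
    band of flags. A sketch of the classical filtered-derivative test;
    the chapter's method additionally handles high-dimensional
    observations via atomic norm regularization."""
    flags = []
    for t in range(window, len(y) - window + 1):
        left = y[t - window:t].mean()
        right = y[t:t + window].mean()
        if abs(right - left) > threshold:
            flags.append(t)
    changes = []
    for t in flags:
        if not changes or t - changes[-1] > window:
            changes.append(t)
    return changes

# piecewise-constant signal with a single change-point at index 50
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(50), 3.0 * np.ones(50)])
y += 0.3 * rng.standard_normal(100)
est = filtered_derivative(y, window=10, threshold=1.5)
```

Because the statistic is computed from short windows only, the detector can run online, consuming each new observation as it arrives.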


framework. Furthermore, example applications are discussed for these algorithms. The last section gives a short summary and outlines some directions for future research.
2 A Database-Oriented Framework for Spatial Data Mining
Our framework for spatial data mining is based on spatial neighbourhood relations between objects and on the induced neighbourhood graphs and neighbourhood paths which can be defined with respect to these neighbourhood relations. Thus, we introduce a set of database primitives, or basic operations, for spatial data mining which are sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. First, similar to the relational standard language SQL, the use of standard primitives will speed up the development of new data mining algorithms and will also make them more portable. Second, we can develop techniques to efficiently support the proposed database primitives (e.g., by specialized index structures), thus speeding up all data mining algorithms which are based on them. Moreover, our basic operations for spatial data mining can be integrated into commercial database management systems. This offers additional benefits for data mining applications such as efficient storage management, prevention of inconsistencies, and index structures to support the different types of database queries that may be part of the data mining algorithms.
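As a concrete illustration, the induced neighbourhood graph and neighbourhood paths can be sketched as follows (the function names are ours, not the paper's operator signatures):

```python
def neighbourhood_graph(objects, relation):
    """Neighbourhood graph induced by a binary neighbourhood relation:
    maps each object to the list of objects related to it."""
    return {o: [p for p in objects if p != o and relation(o, p)]
            for o in objects}

def neighbours(graph, obj, predicate=lambda p: True):
    """`neighbours` primitive: adjacent objects passing an extra filter."""
    return [p for p in graph[obj] if predicate(p)]

def extend_paths(graph, paths):
    """Extend each neighbourhood path by one edge, avoiding revisits."""
    return [path + [q] for path in paths for q in graph[path[-1]]
            if q not in path]

# Example: 2-D points under a distance-based neighbourhood relation.
points = [(0, 0), (0, 1), (1, 1), (5, 5)]
close = lambda a, b: (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 <= 2
g = neighbourhood_graph(points, close)
paths = extend_paths(g, [[(0, 0)]])   # length-2 neighbourhood paths from (0, 0)
```

Spatial data mining algorithms expressed over these primitives stay portable: only `relation` (and any index structure accelerating it) is specific to the underlying database.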


This paper summarizes the major findings of the research project under the code name QG.14.60. The research aims to enhance some fuzzy clustering methods by means of more generalized fuzzy sets. The main results are: (1) We improve a distributed fuzzy clustering method for big data using picture fuzzy sets, designing a novel method called DPFCM that reduces communication cost by using the facilitator model (instead of the peer-to-peer model) together with picture fuzzy sets. The experimental evaluations show that the clustering quality of DPFCM is better than that of the original algorithm while ensuring reasonable computational time. (2) We apply picture fuzzy clustering to weather nowcasting problems in a novel method called PFS-STAR that integrates the STAR technique and picture fuzzy clustering to enhance forecast accuracy. Experimental results on satellite image sequences show that the proposed method outperforms related works, especially in rain prediction. (3) We develop a GIS plug-in that implements several improved fuzzy clustering algorithms. The tool supports access to spatial databases and visualization of clustering results in thematic map layers.

By integrating density-based, grid-based, and subspace clustering, CLIQUE discovers clusters embedded in subspaces of high dimensional data without requiring users to select subspaces of interest. The DNF expressions for the clusters give a clear representation of clustering results. The time complexity of CLIQUE is O(c^p + pN), where p is the highest subspace dimension selected, N is the number of input points, and c is a constant; this grows exponentially with respect to p. The algorithm offers an effective, efficient method of pruning the space of dense units in order to counter the inherent exponential nature of the problem. However, there is a trade-off for the pruning of dense units in the subspaces with low coverage: while the algorithm is faster, there is an increased likelihood of missing clusters. In addition, while CLIQUE does not require users to select subspaces of interest, its susceptibility to noise and its ability to identify relevant attributes are highly dependent on the user's choice of unit intervals, m, and sensitivity threshold, τ.
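CLIQUE's first pass and its monotonicity-based pruning can be sketched as follows (a simplified illustration; `xi` and `tau` stand in for the unit-interval and sensitivity-threshold parameters above, and data is assumed scaled to [0, 1)):

```python
from collections import Counter
from itertools import combinations

def dense_units_1d(points, xi, tau):
    """Partition each dimension into `xi` equal intervals and keep the
    units whose coverage (fraction of points) exceeds `tau`."""
    n, d = len(points), len(points[0])
    counts = Counter()
    for p in points:
        for dim in range(d):
            counts[(dim, min(int(p[dim] * xi), xi - 1))] += 1
    return {u for u, c in counts.items() if c / n > tau}

def candidate_units_2d(dense_1d):
    """Monotonicity-based pruning: a 2-D unit can only be dense if both
    of its 1-D projections are dense, so candidates are joins of dense
    units from two different dimensions."""
    return {(u, v) for u, v in combinations(sorted(dense_1d), 2)
            if u[0] != v[0]}

pts = [(0.1, 0.1), (0.12, 0.15), (0.11, 0.9), (0.9, 0.9)]
d1 = dense_units_1d(pts, xi=10, tau=0.3)
```

The candidate-generation step is where the exponential blow-up is fought: only joins of already-dense units are ever counted against the data.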


Two studies proposed a sequential pattern mining technique that incorporated alert information.


Remark 4.1 Actually, manufacturers would not mind if the polygons "leap-frogged" over each other on the way to a more compact layout. The mixed integer programming generalization of this optimization model (see Chapter 8) does exactly that.
Remark 4.2 When applying the locality heuristic, we assume that the two polygons involved are both star-shaped (or at least that their Minkowski sum is star-shaped). We have encountered data containing non-star-shaped polygons. Our studies have shown that all these polygons can be expressed as a union of two, and very rarely three, star-shaped polygons. For those non-star-shaped polygons, we use a decomposition algorithm to decompose the polygon into a small number of star-shaped components. The locality heuristic is then applied to each pair of components of the two polygons.


4. Delete(S, x): Delete element x from S.
5. Delete(S, k): Delete the element with key k from S.
Algorithmic applications of priority queues abound [4, 25].
Soft heap is an approximate meldable priority queue devised by Chazelle [19], and it supports the Insert, Findmin, Deletemin, Delete, and Meld operations. A soft heap may, at its discretion, corrupt the keys of some elements in it by revising them upwards. A Findmin returns the element with the smallest current key, which may or may not be corrupt. A soft heap guarantees that the number of corrupt elements in it is never more than εN, where N is the total number of items inserted into it and ε is a parameter of it called the error-rate. A Meld operation merges two soft heaps into one new soft heap.
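For contrast with the soft heap, here is a sketch of an exact meldable priority queue with the same interface, built on Python's `heapq` with lazy deletion (illustrative only; a real soft heap achieves its faster amortized bounds precisely by allowing up to εN corrupted keys, which this structure does not do):

```python
import heapq
import itertools

class MeldablePQ:
    """Exact meldable priority queue (binary heap + lazy deletion)."""
    _ids = itertools.count()

    def __init__(self):
        self.heap, self.dead = [], set()

    def insert(self, key, item):
        tag = next(MeldablePQ._ids)
        heapq.heappush(self.heap, (key, tag, item))
        return tag                       # handle for later Delete

    def _prune(self):
        while self.heap and self.heap[0][1] in self.dead:
            heapq.heappop(self.heap)

    def findmin(self):
        self._prune()
        return self.heap[0] if self.heap else None

    def deletemin(self):
        self._prune()
        return heapq.heappop(self.heap) if self.heap else None

    def delete(self, tag):
        self.dead.add(tag)               # lazy: dropped when it surfaces

    def meld(self, other):
        self.heap += other.heap
        heapq.heapify(self.heap)         # rebuild heap order after merge
        self.dead |= other.dead
        return self

pq = MeldablePQ()
h = pq.insert(5, "a")
pq.insert(2, "b")
pq.delete(h)                             # Delete(S, x) via its handle
```

A Findmin here always returns an uncorrupted key; the soft heap trades exactly this guarantee for speed.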


Abstract
Background: Genome-sequencing projects are currently producing an enormous amount of new sequences, causing rapid growth of protein sequence databases. The unsupervised classification of these data into functional groups or families, clustering, has become one of the principal research objectives in structural and functional genomics. Computer programs that automatically and accurately classify sequences into families have become a necessity. A significant number of methods have addressed the clustering of protein sequences, and most of them can be categorized into three major groups: hierarchical, graph-based, and partitioning methods. Among the various sequence clustering methods in the literature, hierarchical and graph-based approaches have been widely used. Although partitioning clustering techniques are heavily used in other fields, few applications have been found in the field of protein sequence clustering. It has not been fully demonstrated whether partitioning methods can be applied to protein sequence data and whether they can be efficient compared to the published clustering methods.


VI. FUTURE WORK
Future work would include the semantic analysis of algorithms, their trends, and how algorithms influence each other over time. Such analyses would give rise to multiple applications that could improve algorithm search. Adding support for document files would also increase the robustness of the system.

A classic problem in high dimensional geometry has been whether data structures exist for (c, r)-Approximate Near Neighbour with Las Vegas guarantees and performance matching that of Locality Sensitive Hashing. That is, can we guarantee that a query will always return an approximate near neighbour if a near neighbour exists; or, simply, can we rule out false negatives? The problem is of practical as well as theoretical importance. There is in general no way of verifying that an LSH algorithm is correct when it says 'no near neighbours', other than iterating over every point in the set, in which case the data structure is entirely pointless. This means LSH algorithms cannot be used for many critical applications, such as fingerprint databases. On the more applied side, it has been observed that tuning the error probability parameter is hard to do well when implementing LSH [84, 34]. A Las Vegas data structure entirely removes this problem. Different authors have described the problem with different names, such as 'Las Vegas' [91], 'have no false negatives' [86, 148], 'have total recall' [156], 'are exact' [32] and 'are explicit' [108].
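The false-negative phenomenon is easy to reproduce: with sign-random-projection hashing, even a genuinely near pair lands in different buckets with positive probability (a self-contained numpy illustration, not one of the cited constructions):

```python
import numpy as np

# Sign-random-projection LSH: two vectors agree on one hash bit with
# probability 1 - theta/pi, where theta is the angle between them. Even
# for a very near pair, a k-bit hash disagrees with positive probability,
# so a plain LSH table can answer "no near neighbours" incorrectly --
# exactly the false negatives a Las Vegas structure rules out.
rng = np.random.default_rng(1)
d, k, trials = 32, 8, 2000
x = rng.standard_normal(d)
y = x + 0.1 * rng.standard_normal(d)   # a genuinely near neighbour of x

misses = 0
for _ in range(trials):
    H = rng.standard_normal((k, d))    # one k-bit hash function
    if not np.array_equal(np.sign(H @ x), np.sign(H @ y)):
        misses += 1                    # near pair fell in different buckets
miss_rate = misses / trials
```

The miss rate is small but clearly nonzero, and verifying a 'no' answer for any single query would require a linear scan.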


The geometric problem of estimating an unknown compact convex set from evaluations of its support function arises in a range of scientific and engineering applications. Traditional approaches typically rely on estimators that minimize the error over all possible compact convex sets; in particular, these methods do not allow for the incorporation of prior structural information about the underlying set, and the resulting estimates become increasingly complicated to describe as the number of available measurements grows. We address both of these shortcomings by describing a framework for estimating tractably specified convex sets from support function evaluations. Building on the literature in convex optimization, our approach is based on estimators that minimize the error over structured families of convex sets that are specified as linear images of concisely described sets, such as the simplex or the free spectrahedron, in a higher-dimensional space that is not much larger than the ambient space. Convex sets parametrized in this manner are significant from a computational perspective, as one can optimize linear functionals over such sets efficiently; they serve a different purpose in the inferential context of the present paper, namely, that of incorporating regularization in the reconstruction while still offering considerable expressive power. We provide a geometric characterization of the asymptotic behavior of our estimators, and our analysis relies on the property that certain sets which admit semialgebraic descriptions are Vapnik-Chervonenkis (VC) classes. Our numerical experiments highlight the utility of our framework over previous approaches in settings in which the available measurements are noisy or few in number, as well as those in which the underlying set to be reconstructed is non-polyhedral.
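For a polytope given by its vertices, the support function h_K(u) = max over x in K of ⟨u, x⟩ is attained at a vertex, so generating the (u_i, h_K(u_i)) measurements that such estimators consume reduces to a matrix product (an illustrative setup, not the paper's estimator):

```python
import numpy as np

def support_function(vertices, directions):
    """h_K(u) = max over vertices x of <u, x>; rows of `directions`
    are the query directions u_i, rows of `vertices` the extreme
    points of the polytope K."""
    return (directions @ vertices.T).max(axis=1)

# K = the square [-1, 1]^2, queried in three directions
square = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
u = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
h = support_function(square, u)   # -> [1., 1., 2.]
```

The estimation problem runs this map in reverse: recover a tractably parametrized K from noisy pairs (u_i, h_K(u_i) + noise).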


The class of strictly pseudocontractive mappings was introduced by Browder and Petryshyn [16]. If the defining constant κ = 0, the class of strictly pseudocontractive mappings reduces to the class of nonexpansive mappings. In the case κ = 1, we call S a pseudocontractive mapping. Marino and Xu [17] proved that fixed point sets of strictly pseudocontractive mappings are closed and convex. They also proved that I − S is demi-closed at zero. To be more precise, if {x_n} is a sequence in C with x_n ⇀ x and x_n − Sx_n → 0, then x ∈ F(S), the fixed point set of S.


We use these results for the problem with fuzzy data. We also do this for rectilinear and infinity norms as special cases of block norms. Rectilinear distances have been chosen as the scenario may be thought of in an urban setting. The study of this problem and its modeling has many applications in industry, such as locating machines in a workshop.

show sets of tuning curves for four sample neurons to illustrate some of the characteristic behaviours that can be seen.
Our aim was to fit a multi-stage nonlinear model to the data in an attempt to capture all of these tuning characteristics in individual neurons. The dataset consisted of tuning curves for 107 cells, collected from 13 macaques. The data are somewhat noisy for various reasons, both physiological and technical. Neurons themselves can adapt, changing their responses to the same stimuli after prolonged stimulation. The nature of in vivo extracellular recordings means that spikes can be missed as the target neuron moves closer to or further from the electrode during recording sessions. Lastly, in the particular recordings made here, the experimenter primarily wanted to establish, for example, the preferred orientation, without necessarily needing a full high-quality characterisation of the entire tuning curve; as a result, there are numerous cells for which the confidence interval for some datapoints is rather large.


machine 3, the data is spread over the area. Figure 3-6 shows the plot for machine 7, another unreliable machine, which is very similar to that for machine 3.
Figure 3-8 shows the plot for machine 4, which is our bottleneck machine. On this plot we observe a good linear relation until the machine reaches its capacity value. When this machine reaches its capacity, increasing the resource load does not increase the amount of output, causing the curve to level off. Figure 3-9, depicting the situation for the bottleneck machine, shows that when the machine approaches its capacity limit, the data is more scattered in that region. Note that in Figure 3-9 there are no observations in the upper right corner of the plot, since for given high release and WIP levels we do not observe any low output observations. From the plots for both functional forms, we see that the observations appear to follow a concave functional form as the resource load or the release for a given initial WIP increases. This is crucial in terms of implementing those functional forms in optimization models, since, as discussed in Asmundsson et al. (2009), the formulation depends on the concavity assumption of the CF in order to obtain a convex set of constraints for the capacity.


Stanford, CA 94305-9510 Email: boyd@stanford.edu
Abstract— We consider the problem of fitting given data (u_1, y_1), ..., (u_m, y_m), where u_i ∈ R^n and y_i ∈ R, with a convex polynomial f. A technique to solve this problem using sum of squares polynomials is presented. This technique is extended to enforce convexity of f only on a specified region. Also, an algorithm to fit the convex hull of a set of points with a convex sub-level set of a polynomial is presented. This problem is a natural extension of the problem of finding the minimum volume ellipsoid covering a set. The algorithm, like that for the minimum volume ellipsoid problem, has the property of being invariant to affine coordinate transformations. We generalize this technique to fit arbitrary unions and intersections of polynomial sub-level sets.
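In the simplest univariate case the sum-of-squares machinery is unnecessary: f(u) = au² + bu + c is convex if and only if a ≥ 0, and the constrained least-squares fit follows from one clamp-and-refit step (a minimal sketch of the fitting problem, not the paper's general algorithm):

```python
import numpy as np

def fit_convex_quadratic(u, y):
    """Least-squares fit of y ~ a*u^2 + b*u + c subject to a >= 0.
    With a single bound constraint, the active-set logic is trivial:
    if the unconstrained fit has a < 0, the optimum lies on a = 0."""
    A = np.column_stack([u ** 2, u, np.ones_like(u)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    if coef[0] < 0:
        line, *_ = np.linalg.lstsq(A[:, 1:], y, rcond=None)
        coef = np.concatenate([[0.0], line])
    return coef                       # (a, b, c) with a >= 0

u = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
a, b, c = fit_convex_quadratic(u, u ** 2 + 1)   # convex data: exact fit
a2, b2, c2 = fit_convex_quadratic(u, -u ** 2)   # concave data: clamped to a line
```

For multivariate polynomials of higher degree, convexity is no longer a single scalar constraint, which is where the SOS relaxation of the paper comes in.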

In this chapter we present and discuss computational results using small size data sets. All data sets contain only numeric attributes and they do not contain missing values. First, we give a brief description of the data sets, then present results. These results include the optimal values of the cluster function obtained by each algorithm and the CPU time required by them. The following algorithms are used for comparison: the global k-means algorithm (GKM), the multi-start modified global k-means algorithm (MS-MGKM), the multi-start k-means algorithm (MS-KM), the difference of convex clustering algorithm (DCA), the clustering algorithm based on the difference of convex representation of the cluster function and nonsmooth optimization (DCClust), and two algorithms proposed in this thesis: the fast multi-start modified global k-means algorithm without weights (FMS-MGKM2) and with weights (FMS-MGKM). The description of these algorithms can be found in Chapter 4.
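For reference, the MS-KM baseline in this comparison amounts to running Lloyd's algorithm from several random starts and keeping the best cluster-function value (a sketch under our own naming; the thesis algorithms refine the seeding, not this core loop):

```python
import numpy as np

def kmeans(X, k, rng, iters=100):
    """One run of Lloyd's algorithm from k randomly sampled seeds."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        new = np.array([X[labels == j].mean(0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return centers, d.min(1).sum()    # cluster-function (SSE) value

def multi_start_kmeans(X, k, starts=10, seed=0):
    rng = np.random.default_rng(seed)
    return min((kmeans(X, k, rng) for _ in range(starts)),
               key=lambda run: run[1])

# two well-separated Gaussian blobs
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(10, 0.5, (20, 2))])
centers, sse = multi_start_kmeans(X, 2)
```

The global and modified-global k-means variants compared in this chapter replace the random restarts with incremental, deterministic seeding of one cluster at a time.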


properties is that calculating an upper bound for this measure based on IPS and OPS can be done by solving a number of LP problems when using dummy points.
To explain the general idea behind these dummy points and their advantages, we use the bi-criteria example in Figure 1. Although the problem with 'undesirable' normals does not occur for bi-criteria problems, we use a bi-criteria example to make the explanation of the general idea behind the dummy points easier. In Figure 1, the shaded area represents Z and the points A, B, C, D, and E are the current extreme points of IPS. As the arrows indicate, facets AB and DE both have 'undesirable' normals. Using these normals in the weighted sum method means that we search in the direction of the arrows for the point furthest away from the facet. This would in both cases result in a non-Pareto solution. In Figure 2, two dummy points are added for each extreme point z ∈ IPS_E. The dummy points are created by replacing one of the two coordinates of z by a large value (an exact definition of the dummy points is given in Section 4.2). Once the dummy points are created, the set IPS is replaced by the convex hull of IPS and all dummy points. All facets containing at least one IPS-point now have a normal with only non-negative elements. All other facets, containing only dummy points, do not satisfy the upper bound constraint on the objectives and are thus not relevant.
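The dummy-point construction itself is simple to state in code (illustrative; the exact definition in Section 4.2 may differ in detail, e.g. in the choice of the large value M):

```python
def dummy_points(extreme_points, M=1e6):
    """For each extreme point z of IPS, create one dummy point per
    coordinate by replacing that coordinate with the large value M."""
    dummies = []
    for z in extreme_points:
        for i in range(len(z)):
            d = list(z)
            d[i] = M
            dummies.append(tuple(d))
    return dummies

# two extreme points of a bi-criteria IPS -> four dummy points
ips_extreme = [(1.0, 4.0), (3.0, 2.0)]
dums = dummy_points(ips_extreme)
```

Taking the convex hull of IPS together with these dummies is what forces every facet touching an IPS-point to have a non-negative normal.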


1. Concept and fundamental result
Given a natural correspondence between a family of inequalities and a closed convex set in a topological linear space, one might expect that an inequality corresponding to a special point (e.g., an extreme point) would be of special interest in view of convex analysis theory. In this paper, we realize this concept.


There are a number of papers that study the best inner approximation of any convex set by a symmetric set; the distance to a symmetric set can be considered a measure of its symmetry [11]. Lower bounds for this distance are given by the Löwner–John ellipsoid [13]: any planar convex body C lies between two homothetic ellipses E ⊂ C ⊂ 2E with homothety ratio at most 2. Since any ellipse is axially symmetric, and area(E) = (1/4) area(2E) ≥ (1/4) area(C), any convex planar set C contains an axially symmetric subset with at least 1/4 of the area of C. The same lower bound of 1/4 follows from the fact that any planar convex body lies between two homothetic rectangles with homothety ratio at most two [16,19]. The lower bound can be raised to 2/3 [15], a bound that is not known to be tight. Bounds are also known for specific axis-symmetric inscribed figures, such as isosceles triangles or kites [21].
