Finding Skyline For Uncertain Contexts


Volume 3, Special Issue 1, ICSTSD 2016


Miss R. R. Meshram, SGBAU, Amravati, India, [email protected]

Dr. Mohd. Atique, SGBAU, Amravati, India, [email protected]

Dr. V. M. Thakare, SGBAU, Amravati, India, [email protected]

Abstract


A skyline is the set of tuples that are not dominated by any other tuple, where one tuple dominates another if it is at least as good in every dimension and strictly better in at least one. Query processing over uncertain data has gained growing attention because many real-life applications must deal with uncertain data. Skyline queries are associated with the user's current context, which involves uncertainty with respect to the user's preferences or the object context. This paper presents the notion of finding the skyline for uncertain contexts and an efficient algorithm for computing it. The algorithm involves two pruning steps: object pruning and instance pruning. Experimental results on the real NBA dataset and on synthetic datasets show that the Skyline Set is interesting and useful, and that the algorithms are efficient and scalable.

Keywords—Skyline Query; locational dominance; uncertain context

I. INTRODUCTION

Skyline queries are among the most important tools in multi-criteria decision making. With the growth of database services, there is a great need for efficient tools that help users make decisions quickly. Skyline tuples should satisfy the following properties.

o Nondominance.

Skyline tuples are not dominated by any tuple outside the skyline set.

o Incomparability.

Skyline tuples do not dominate each other, i.e., each holds its own ground (importance) in the skyline against the others.

o Coverage.

All together, the skyline tuples dominate all the nonskyline tuples, i.e., each nonskyline tuple is dominated by at least one skyline tuple.
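The three properties above can be checked directly on a small deterministic example. The following is a minimal sketch, assuming that smaller values are better in every dimension; the `hotels` data (price, distance) and the function names are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one.

    Assumption: smaller values are better in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (price, distance) pairs.
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 4)]
sky = skyline(hotels)
rest = [p for p in hotels if p not in sky]

# Nondominance and incomparability: no skyline tuple is dominated by any tuple.
nondominated = all(not dominates(q, s) for s in sky for q in hotels)
# Coverage: every nonskyline tuple is dominated by at least one skyline tuple.
covered = all(any(dominates(s, r) for s in sky) for r in rest)
print(sky, nondominated, covered)
```

Here `(70, 5)` and `(90, 4)` are both dominated by `(60, 3)`, so the skyline consists of the remaining three tuples.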

Many real-life applications have inherent uncertainty, including environmental surveillance, market analysis and quantitative economics research. Consider, for example, the measurement of SO2 and NO2 emissions by monitoring equipment to evaluate the level of pollution in a particular area. The readings may be uncertain because of machine faults or environmental conditions. Handling this type of uncertain data is a challenging research problem. This paper studies the problem of dealing with uncertain data and computing the skyline over it. Another good example is a player's data over his career. The dataset involves uncertainty because one player may be good on one criterion while, at the same time, another player is good on another criterion. In this case, skyline computation over individual players' records is inappropriate; hence the skyline is computed over game records.

II. BACKGROUND


development of this research direction is to support skyline queries over uncertain databases. This is a vital research topic with many potential real-life applications in the near future. The problem of the exponential increase in the number of preferences is addressed in [2] by modelling uncertain data with the possible-worlds semantic model. [3] introduced basic principles for efficiently checking P-domination (domination among probabilistic tuples) and exemplified them for the expected rank and expected score semantics. [4] proposed an effective parallel algorithm using MapReduce to process probabilistic skyline queries over uncertain data. [5] proposed a new skyline query for uncertain data, the U-Skyline.

This paper is organized as follows. Section I gives the introduction. Section II discusses the background. Section III discusses previous work. Section IV discusses existing methodologies. Section V discusses attributes and parameters and compares the different schemes. Section VI presents the proposed method and its possible outcome. Finally, Section VII concludes this review paper.

III. PREVIOUS WORK DONE

Many real-life applications have inherent uncertainty, and dealing with this uncertainty is an active research area. Many algorithms have been proposed to handle it. Jinfei Liu et al. [1] (2015) proposed a new notion of probabilistic k-skyline sets on uncertain data, called Pk-SkylineSets. It generalizes existing work on choosing individual skyline points to choosing sets of skyline points on uncertain data, and is useful for finding Pareto solutions over subsets in practical applications. The work studies how to choose skyline sets of fixed size on uncertain data. [1] discusses an efficient algorithm for computing probabilistic k-skyline sets. It includes two heuristic pruning strategies, object pruning and instance pruning, which efficiently reduce the search space by reducing the number of candidate object sets and their instances. The disadvantage is that it is time consuming to check whether an instance is dominated by all possible instances of each object.
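The cost of the instance-level dominance check mentioned above is easier to see on a small example. The sketch below computes the skyline probability of single uncertain objects (not of k-sized sets as in [1]) under a discrete model, assuming that every instance of an object is equally likely, that objects are independent, and that smaller values are better. The object names, data and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Instance = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # Assumption: smaller is better in every dimension.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def instance_skyline_prob(u: Instance, owner: str,
                          objects: Dict[str, List[Instance]]) -> float:
    """Probability that instance u is not dominated by any other object."""
    prob = 1.0
    for name, instances in objects.items():
        if name == owner:
            continue
        # Every instance of every other object must be compared with u.
        dominating = sum(1 for v in instances if dominates(v, u))
        prob *= 1.0 - dominating / len(instances)
    return prob


def object_skyline_prob(owner: str, objects: Dict[str, List[Instance]]) -> float:
    """Mean of the instance probabilities (instances assumed equally likely)."""
    instances = objects[owner]
    return sum(instance_skyline_prob(u, owner, objects) for u in instances) / len(instances)


def p_skyline(objects: Dict[str, List[Instance]], p: float) -> List[str]:
    """Objects whose skyline probability is at least the threshold p."""
    return [name for name in objects if object_skyline_prob(name, objects) >= p]


objects = {
    "A": [(1, 4), (2, 2)],
    "B": [(3, 3), (1, 5)],
    "C": [(4, 1), (5, 5)],
}
probs = {name: object_skyline_prob(name, objects) for name in objects}
print(probs, p_skyline(objects, 0.6))
```

Each instance is compared with every instance of every other object, which is the quadratic cost that object pruning and instance pruning aim to reduce.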

Jiping Zheng et al. [2] (2015) addressed the problem of the exponential increase in the number of preferences by modelling uncertain data with the possible-worlds semantic model. However, the enormous number of possible worlds that must be generated makes the model inefficient. To address this, Jiping Zheng et al. [2] proposed asymptotically efficient algorithms and an approximation algorithm to evaluate the skyline result. These include an exact skyline query processing algorithm and a heuristic skyline algorithm. Although the heuristic skyline algorithm can discard possible worlds that do not contribute to the final skyline query result, the number of possible worlds remains large. This is the disadvantage of the heuristic skyline algorithm.
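The blow-up in the number of possible worlds can be illustrated with a small sketch. It is not the algorithm of [2], whose uncertainty lies in preferences and context; as a stand-in it assumes tuple-level uncertainty, where each tuple exists independently with a given probability, and it enumerates all 2^n worlds. The tuple names and probabilities are hypothetical, and smaller values are assumed to be better.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

# name -> (point, existence probability); illustrative data only.
UncertainTuples = Dict[str, Tuple[Tuple[float, ...], float]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # Assumption: smaller is better in every dimension.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probs_exact(tuples: UncertainTuples) -> Dict[str, float]:
    """Sum, over all 2^n possible worlds, the probability of the worlds
    in which each tuple exists and belongs to the skyline."""
    names = list(tuples)
    result = {name: 0.0 for name in names}
    for mask in product([False, True], repeat=len(names)):
        world_prob = 1.0
        for name, present in zip(names, mask):
            p = tuples[name][1]
            world_prob *= p if present else 1.0 - p
        alive = [name for name, present in zip(names, mask) if present]
        for name in alive:
            point = tuples[name][0]
            if not any(dominates(tuples[other][0], point) for other in alive):
                result[name] += world_prob
    return result


data = {
    "t1": ((1, 3), 0.5),
    "t2": ((2, 4), 0.8),
    "t3": ((3, 1), 0.6),
}
exact = skyline_probs_exact(data)
print(exact)
```

In this data `t1` dominates `t2`, so `t2` is a skyline tuple only in worlds where `t1` is absent. With n tuples the outer loop runs 2^n times, which is why reducing or sampling the possible worlds matters.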

Ilaria Bartolini et al. [3] (2013) introduced a novel definition of domination among probabilistic tuples (P-domination) that is valid for any ranking semantics and any model of tuple correlation. They proved that the skyline resulting from the proposed definition satisfies all the properties that hold in the deterministic case. They introduced basic principles for efficiently checking P-domination and exemplified them for the expected rank and expected score semantics. They also provided a family of algorithms that efficiently compute the skyline of a probabilistic relation under these semantics, and demonstrated their applicability to a range of large data sets.

Yoonjae Park et al. [4] (2015) proposed an effective parallel algorithm using MapReduce to process probabilistic skyline queries over uncertain data modeled by both discrete and continuous models. Three filtering methods are proposed to identify probabilistic non-skyline objects. The algorithm involves quadtree partitioning and filtering.

Xingjie Liu et al. [5] (2013) proposed a new skyline query (U-Skyline) for uncertain data. It focuses on meeting the nondominance, incomparability and coverage properties simultaneously for uncertain skyline queries. [5] investigates the interplay among different data tuples during the computation and transforms U-Skyline query processing into an integer programming problem. A search algorithm based on dynamic programming (DP) is also designed to find the U-Skyline, and it is further improved with pruning and early termination (P&ET) techniques.

IV. EXISTING METHODOLOGIES


uncertain data. A number of algorithms exist to handle such large volumes of uncertain data. These existing methods are explained in detail below.

Jinfei Liu et al. [1] proposed an algorithm to solve the probabilistic k-skyline set problem. The baseline algorithm includes two heuristic pruning strategies, object pruning and instance pruning, which efficiently reduce the search space of candidate object sets. A layered range tree is used to find the probability of an instance set. The main tree is a binary search tree built on the x-coordinates of the points, and the associated structures are implemented with the fractional cascading technique. Based on the layered range tree, a cumulative layered range tree is built. The algorithm based on this tree works by finding the nodes in the main tree corresponding to the range [x, x'] and then reporting the accumulated information in the associated structures corresponding to the range [0, y'].
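The query pattern described above can be sketched with a simplified range tree. This is not the structure of [1]: fractional cascading is omitted and replaced by an independent binary search in each visited node, and the class and method names are hypothetical. Each node of the main tree covers a contiguous run of points sorted by x and stores those points sorted by y together with prefix sums of their weights (for example, instance probabilities).

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple


class CumulativeRangeTree:
    """Answers: total weight of points with x in [x_lo, x_hi] and y <= y_hi."""

    def __init__(self, points: List[Tuple[float, float, float]]) -> None:
        pts = sorted(points)                      # (x, y, weight), sorted by x
        self.xs = [p[0] for p in pts]
        self.n = len(pts)
        self.ys: Dict[int, List[float]] = {}      # node -> y values, sorted
        self.cum: Dict[int, List[float]] = {}     # node -> prefix sums of weights
        if self.n:
            self._build(1, 0, self.n - 1, pts)

    def _build(self, node: int, lo: int, hi: int, pts) -> None:
        by_y = sorted((p[1], p[2]) for p in pts[lo:hi + 1])
        self.ys[node] = [y for y, _ in by_y]
        prefix = [0.0]
        for _, w in by_y:
            prefix.append(prefix[-1] + w)
        self.cum[node] = prefix
        if lo < hi:
            mid = (lo + hi) // 2
            self._build(2 * node, lo, mid, pts)
            self._build(2 * node + 1, mid + 1, hi, pts)

    def query(self, x_lo: float, x_hi: float, y_hi: float) -> float:
        left = bisect_left(self.xs, x_lo)
        right = bisect_right(self.xs, x_hi) - 1
        if left > right:
            return 0.0
        return self._query(1, 0, self.n - 1, left, right, y_hi)

    def _query(self, node, lo, hi, left, right, y_hi) -> float:
        if right < lo or hi < left:
            return 0.0
        if left <= lo and hi <= right:
            # Node fully inside the x-range: read the accumulated weight.
            return self.cum[node][bisect_right(self.ys[node], y_hi)]
        mid = (lo + hi) // 2
        return (self._query(2 * node, lo, mid, left, right, y_hi)
                + self._query(2 * node + 1, mid + 1, hi, left, right, y_hi))


points = [(1, 5, 0.2), (2, 1, 0.3), (3, 4, 0.1), (4, 2, 0.25), (5, 3, 0.15)]
tree = CumulativeRangeTree(points)
print(tree.query(2, 4, 3))
```

A query touches O(log n) nodes of the main tree and performs one binary search in each; fractional cascading would remove the repeated binary searches.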

Asymptotically efficient skyline query processing algorithms over uncertain context are proposed in [2]. First, exact skyline processing algorithms are provided. Then heuristic methods are proposed for reducing the number of possible worlds. Finally, to improve the efficiency of skyline query processing, Monte Carlo sampling methods are used to calculate an approximate skyline query result.
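The idea of replacing exhaustive enumeration with sampling can be sketched as follows. This is a generic Monte Carlo estimator and not the specific sampling algorithms of [2]; it assumes tuple-level uncertainty with independent existence probabilities and smaller-is-better dimensions. The data, the sample count and the seed are illustrative choices.

```python
import random
from typing import Dict, Sequence, Tuple

# name -> (point, existence probability); illustrative data only.
UncertainTuples = Dict[str, Tuple[Tuple[float, ...], float]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # Assumption: smaller is better in every dimension.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probs_monte_carlo(tuples: UncertainTuples,
                              samples: int = 20000,
                              seed: int = 0) -> Dict[str, float]:
    """Estimate skyline probabilities by sampling possible worlds."""
    rng = random.Random(seed)
    names = list(tuples)
    hits = {name: 0 for name in names}
    for _ in range(samples):
        # Draw one possible world: each tuple is kept with its probability.
        alive = [name for name in names if rng.random() < tuples[name][1]]
        for name in alive:
            point = tuples[name][0]
            if not any(dominates(tuples[other][0], point) for other in alive):
                hits[name] += 1
    return {name: hits[name] / samples for name in names}


data = {
    "t1": ((1, 3), 0.5),
    "t2": ((2, 4), 0.8),
    "t3": ((3, 1), 0.6),
}
estimate = skyline_probs_monte_carlo(data)
print(estimate)
```

For this data the exact values are 0.5, 0.4 and 0.6, since `t1` dominates `t2`. The work grows with the number of samples rather than with 2^n, at the price of a sampling error that shrinks roughly with the square root of the sample count.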

The skyline of a probabilistic relation [3] is parametric in the semantics used for linearly ranking probabilistic tuples and is based on order-theoretic principles. A two-phase algorithm is proposed that takes the probabilistic relation Rp as input and returns the skyline of Rp. Phase I determines whether tuples have to be sorted, computes basic group information, and precomputes bounds for each tuple u, possibly applying some P-domination rules. Phase II performs the actual skyline computation: it first determines whether a spatial index has to be built and then determines how to apply the remaining P-domination rules.

The PSMR algorithm proposed in [4] is state of the art and consists of two phases. The first phase computes the local candidate sets, i.e. possible probabilistic skyline objects, and the affect sets, i.e. non-skyline objects required to compute the skyline probabilities of the objects in the candidate sets. The second phase divides the union of the candidate and affect sets into several partitions, each of which is allocated to a different machine.
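The partition-then-merge structure can be illustrated with a deliberately simplified, deterministic sketch. It is not PSMR: it ignores probabilities and affect sets, runs in a single process, and uses one level of a quadtree-style split around an assumed centre point. It only shows why per-partition candidates can be computed independently and merged afterwards. All names and data are hypothetical, and smaller values are assumed to be better.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # Assumption: smaller is better in every dimension.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    return [p for p in points if not any(dominates(q, p) for q in points)]


def quadrant(p: Point, cx: float, cy: float) -> Tuple[bool, bool]:
    """One level of a quadtree split around the centre (cx, cy)."""
    return (p[0] >= cx, p[1] >= cy)


def partitioned_skyline(points: List[Point], cx: float, cy: float) -> List[Point]:
    parts: Dict[Tuple[bool, bool], List[Point]] = {}
    for p in points:
        parts.setdefault(quadrant(p, cx, cy), []).append(p)
    # "Map" step: each partition yields its local candidates independently.
    local_candidates = [skyline(part) for part in parts.values()]
    # "Reduce" step: only the local candidates are compared globally.
    merged = [p for part in local_candidates for p in part]
    return skyline(merged)


rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(200)]
result = partitioned_skyline(pts, 0.5, 0.5)
print(len(result))
```

The merge is correct because a point dominated inside its own partition is also dominated globally, so discarding it locally never removes a true skyline point.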

A new skyline query (U-Skyline) for uncertain data is developed in [5]. An intuitive (brute-force) idea for processing U-Skyline is to enumerate all possible candidate skylines and evaluate their U-Skyline probabilities. A number of optimization techniques for query processing are proposed, including 1) computational simplification of the U-Skyline probability, 2) pruning of unqualified candidate skylines and early termination of query processing, 3) reduction of the input data set, and 4) partition and conquest of the reduced data set.
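The brute-force idea can be sketched under an assumed model of independent tuples, each with an existence probability, and smaller-is-better dimensions. Under that assumption, a candidate set S is the skyline of a possible world exactly when the tuples of S are mutually incomparable, all of them exist, and every tuple that S does not dominate is absent. The sketch below applies this reasoning directly; it is not the optimized algorithm of [5], and the names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

# name -> (point, existence probability); illustrative data only.
UncertainTuples = Dict[str, Tuple[Tuple[float, ...], float]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # Assumption: smaller is better in every dimension.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def candidate_probability(candidate: Sequence[str], tuples: UncertainTuples) -> float:
    """Probability that `candidate` is exactly the skyline (independent tuples)."""
    for a in candidate:
        for b in candidate:
            if dominates(tuples[a][0], tuples[b][0]):
                return 0.0            # violates incomparability
    prob = 1.0
    for name, (point, p) in tuples.items():
        if name in candidate:
            prob *= p                 # skyline tuples must exist
        elif not any(dominates(tuples[s][0], point) for s in candidate):
            prob *= 1.0 - p           # uncovered tuples must be absent
    return prob


def u_skyline_brute_force(tuples: UncertainTuples) -> Tuple[Set[str], float]:
    """Enumerate all 2^n candidate sets and keep the most probable one."""
    names = list(tuples)
    best: Tuple[str, ...] = ()
    best_prob = -1.0
    for k in range(len(names) + 1):
        for candidate in combinations(names, k):
            prob = candidate_probability(candidate, tuples)
            if prob > best_prob:
                best, best_prob = candidate, prob
    return set(best), best_prob


data = {
    "t1": ((1, 3), 0.5),
    "t2": ((2, 4), 0.8),
    "t3": ((3, 1), 0.6),
}
best_set, best_prob = u_skyline_brute_force(data)
print(best_set, best_prob)
```

Since every possible world has exactly one skyline, the candidate probabilities sum to 1, which gives a convenient sanity check. The enumeration is exponential in the number of tuples, which motivates the pruning, early termination and data reduction techniques listed above.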

V. ANALYSIS AND DISCUSSION

[1] analyses two significant heuristic pruning strategies, object pruning and instance pruning, which are simple and efficient for computing the probabilistic skyline. Pk-SkylineSets are interesting and useful, and the algorithms are efficient and scalable.

The LHSA and CHSA algorithms [2] are preferable to ESA, and the Monte Carlo sampling based algorithms give the best performance while maintaining a small absolute error. U-Skyline [5] is 10-100 times faster than the parallel integer programming solver.

The following table gives the brief analysis of existing methods.

| Algorithm | Advantages | Disadvantages |
|---|---|---|
| Computation of Pk-SkylineSets (object pruning and instance pruning) | By the use of object pruning and instance pruning, it efficiently reduces the search space. | It is hard to prune instances by heuristic pruning strategies on some datasets. |
| LHSA, CHSA, PMA, ESA, SPMA | PMA and SPMA are better than the others. | There is the burden of a huge number of possible worlds. |
| 2-phase algorithm | Robust and scalable. | The problem of precisely ranking semantics and correlation model. |
| PS-QP-MR (space partitioning based variant of Quadtree) | Much faster and more scalable than the other algorithms. Performance gain is good. | Applying one of the filtering techniques is slow. |
| U-Skyline Query | It is faster than the previous ones. | It is not suitable for navigational behaviour. |

VI. PROPOSED METHODOLOGY

This section presents a top-down method for probabilistic skyline computation. The method starts with the whole set of instances of an uncertain object. The skyline probability of the object can be bounded using the maximum and minimum corners of the minimum bounding box (MBB) of its instances. To improve the bounds, the instances can be recursively partitioned into subsets. The skyline probability of each subset is bounded using its MBB in the same way, and the skyline probability of the uncertain object is then bounded by the weighted mean of the subset bounds. Once the p-skyline membership of the uncertain object is determined, the recursive bounding process stops.
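The bounding step described above can be sketched as follows. The sketch assumes a discrete model with equally likely instances, independent objects, and smaller-is-better dimensions. Under these assumptions, any instance inside a box is dominated by at least the instances that dominate the box's minimum corner and by at most those that dominate its maximum corner, so the two corners give an upper and a lower bound on the skyline probability. The splitting rule (median on the widest dimension) and all names are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Instance = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # Assumption: smaller is better in every dimension.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def point_skyline_prob(u: Instance, others: List[List[Instance]]) -> float:
    """Probability that point u is not dominated by any other object."""
    prob = 1.0
    for instances in others:
        prob *= 1.0 - sum(dominates(v, u) for v in instances) / len(instances)
    return prob


def mbb_bounds(subset: List[Instance], others: List[List[Instance]]) -> Tuple[float, float]:
    """(lower, upper) bounds from the max and min corners of the subset's MBB."""
    dims = range(len(subset[0]))
    min_corner = tuple(min(u[d] for u in subset) for d in dims)
    max_corner = tuple(max(u[d] for u in subset) for d in dims)
    return point_skyline_prob(max_corner, others), point_skyline_prob(min_corner, others)


def split(subset: List[Instance]) -> Tuple[List[Instance], List[Instance]]:
    """Split at the median along the widest dimension (illustrative rule)."""
    dims = range(len(subset[0]))
    widest = max(dims, key=lambda d: max(u[d] for u in subset) - min(u[d] for u in subset))
    ordered = sorted(subset, key=lambda u: u[widest])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def is_p_skyline(instances: List[Instance], others: List[List[Instance]],
                 p: float) -> Tuple[bool, float, float]:
    """Refine the bounds top-down until membership in the p-skyline is decided."""
    parts = [list(instances)]
    total = len(instances)
    while True:
        bounds = [mbb_bounds(s, others) for s in parts]
        lower = sum(len(s) * b[0] for s, b in zip(parts, bounds)) / total
        upper = sum(len(s) * b[1] for s, b in zip(parts, bounds)) / total
        if lower >= p:
            return True, lower, upper
        if upper < p or all(len(s) == 1 for s in parts):
            return False, lower, upper
        # Undecided: split every subset that still holds several instances.
        parts = [half for s in parts
                 for half in (split(s) if len(s) > 1 else (s,))]


A = [(1, 4), (2, 2)]
B = [(3, 3), (1, 5)]
C = [(4, 1), (5, 5)]
print(mbb_bounds(B, [A, C]))
print(is_p_skyline(B, [A, C], 0.4), is_p_skyline(B, [A, C], 0.6))
```

For object `B` the initial box gives the trivial bounds 0 and 1; after one split both bounds become 0.5, which is enough to decide membership for either threshold. When the first bounds already settle the question, no partitioning is needed at all, which is where the method saves work.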

VII. CONCLUSION

This paper concludes that the skyline can be computed efficiently for uncertain contexts. To this end, an efficient algorithm is developed to tackle the problem of computing skylines on uncertain data. Along with the algorithm, the possible outcome, which is expected to give a better performance gain in terms of efficiency and scalability, is discussed.

VIII. FUTURE SCOPE

In future, this work can be extended with ranking semantics and made more efficient by reducing the possible-worlds model. It can also be extended to make real-life applications more efficient.

References

[1] Jinfei Liu, Haoyu Zhang and Li Xiong, "Finding Probabilistic k-Skyline Sets on Uncertain Data," ACM, Vol. 19, No. 23, Page No. 1511, October 2015.

[2] Jiping Zheng, Yongge Wang, Wei Yu, and Nanjing, "Asymptotic-Efficient Algorithms for Skyline Query Processing over Uncertain Context," ACM, Vol. 25, No. 11, Page No. 4503, July 2013.


Engineering, Vol. 25, No. 7, Page No. 1656, July 2013.

[4] Yoonjae Park, Jun-Ki Min and Kyuseok Shim, "Processing of Probabilistic Skyline Queries Using MapReduce," IEEE Transactions on Knowledge and Data Engineering, Vol. 08, No. 15, Page No. 1406, September 2015.