An exact and an approximate version of the algorithm.

(1)

An exact and an approximate version of the algorithm.

[1]

J. C.Nandini Bai

M.Sc (Computer Science) Besant theosophical college,madanapalli.

[2]

D.Raja Reddy Assistant professor

Besant theosophical college,madanapalli

ABSTRACT — Keyword-based search in

rich, multidimensional datasets helps with

many new applications and tools. In this

search, we consider objects marked with

keywords to be included in the vector space.

For these data sets, we look at queries that

require small groups of points to satisfy a

certain set of keywords. We propose a new

method called ProMiSH (Multi-Purpose

Projection and Diagnosis) that uses random

index structures and random fragmentation,

and achieves high scalability and

acceleration. We provide an accurate and

approximate version of the algorithm. Our

experimental results show on real and

synthetic data sets that ProMiSH has up to

60 times the acceleration on modern

tree-based techniques.

Keywords - query, multidimensional data,

indexing, segmentation

INTRODUCTION

Objects (such as images, chemical

compounds, documents, or experts in

collaborative networks) are often

characterized by a set of related features,

usually represented as points in a

multi-dimensional space. For example, images are

represented by vectors with a color property,

and usually contain descriptive text

information (such as tags or keywords)

associated with them. In this paper, we look

at multidimensional data sets where each

data point contains a set of keywords. The

presence of keywords in the space allows

the feature to develop new tools for

querying and exploring these

multidimensional datasets. In this research,

we examine the queries for the closest set of

keywords (referred to as NKS) on rich

multidimensional data sets. A NKS query is

a set of keywords provided by the user. The

result of the query may include sets of data

points, each containing all the query

keywords and one of the most important

groupings in the multidimensional space.

Figure 1 illustrates an NKS query across a

set of two-dimensional data points. Each

(2)

query q fafa; b; cg, the set of points f7; 8; 9g

contain all the keywords queryfa; b; cgand

constitute the minimum of groups compared

to any other set of points covering all the

query words. Therefore, the f7; 8; 9g group

is the top 1 result for query Q. NKS queries

are useful for many applications, such as

image sharing in social networks, graph

pattern search, geo-search in GIS1 systems,

[2] ], Etc. Here are some examples.

1) Consider a social network for sharing

photos (e.g., Facebook), where photos are

tagged with people and location names.

These images can be embedded in a high

dimensional space of texture, color, or shape

[3], [4]. Here, a NKS query can create a set

of similar images that contain a group of

people.

2) NKS queries are useful for searching in

the graph, where graphs are included in a

high-dimensional space (for example, by

including Lipschitz [5]) for scalability. In

this case, you can answer the search for a

subform that contains a set of tags specified

by an NKS query in the built-in space [6].

3) NKS queries can also reveal geographic

patterns. The GIS can distinguish the area

through a range of high-dimensional

features, such as pressure, humidity and soil

types. At the same time, these areas can also

be distinguished by information such as

diseases. Epidemiology can formulate NKS

queries to detect patterns by identifying a set

of similar areas with all diseases that

concern them.

Related Works A variety of related queries

have been studied in literature on text-rich

spatial data sets. [11], [12], [13], [14] with a

combination of R-Tree [15] and the inverted

index were answered earlier. Felipe et al.

[16] IR2-Tree was developed to classify

objects from spatial data sets based on a set

of their travels to the query sites and the

relevance of their text descriptions to query

words. Kong et al. [17] The inverted R and

inverted R tree to answer a query similar to

Felipe et al. [16] using a different

classification function. Martins et al. [18]

link the computerized text and proximity to

the site independently, and then merge the

ranking grades. Cao et al. [7] and Long et al.

[8] Suggested algorithms for retrieving a set

of Spatial Web objects so that the keywords

of the group cover query words for the query

and the objects in the group are closer to the

query site and have the lowest spaces

between the objects. Other relevant queries

include searching for the closest keywords

in spatial databases [19], top-to-top

(3)

based on their impact on feature points [21]

and optimal site queries [22] ], [23]

Existing System:

The website and GIS keyword queries were

earlier answered in GIS using a combination

of R-Tree and reverse index.

Felipeet al. IR2-Tree was developed to

arrange objects from Spatial Data Sets based

on a set of spaces to the query sites and their

suitability for descriptions of the text of the

query words.

Kong et al. Integrated R-tree and inverse file

to answer a query similar to Felipeet al.

Using a different classification function.

Disadvantages of existing system:

guidance on how to effectively enable the

type of queries in which query coordinates

are missing.

it is difficult

for users to provide meaningful coordinates,

and our work deals with another type of

query where users can only provide

keywords as inputs.

difficult to adapt the current techniques with

our problem.

the elements from each data point is possible

to coordinate the surveys is not scalable.

Proposed System:

data sets where each data point contains a

set of keywords. The presence of keywords

in the space allows the feature to develop

new tools for querying and exploring these

multidimensional datasets.

of the closest set of keywords (referred to as

"NKS") in rich multidimensional data sets.

AnNKS query is a set of keywords provided

by the user, and theresult of the query may

contain k sets of data points each of which

contains all the query words and one forms

of the most narrow kest block in the

multidimensional space.

is paper, we suggest ProMiSH

(Projection shortcut and Multi-Scale

Hashing) to enable quick processing of NKS

queries. In particular, ProMiSH

(ProMiSH-E), which consistently retrieves optimal

results, ProMiSH (ProMiSH-A), which is

more efficient in terms of time and space, is

able to obtain near perfect results In

(4)

Pro ProMiSH-E uses a set of inverted and

inverted indexes to perform a localized

search.

Advantages of the proposed system:

-metric novel index to handle almost

exact search query.

work with multi-metric indexes to handle

quick queries.

studies to demonstrate the performance of

the proposed techniques.

SYSTEM ARCHITECTURE:

CONCLUSIONS AND FUTURE Working

in this paper, we have suggested solutions to

the problem of searching for the keywords

closest to the top in multidimensional

datasets. We proposed a new index called

ProMiSH based on random projections and

fragmentation. Based on this indicator, we

have developed the ProMiSH-E program,

which contains an ideal subset of points and

ProMiSH-A, which looks for semi-ideal

results more effectively. Our experimental

results show that ProMiSH is faster than

advanced tree-based technologies, with

multiple performance improvements.

Moreover, our technologies are well aligned

with real and industrial data sets. Jobs

ranking. In the future, we plan to explore

other recording schemes to arrange result

sets. In one system, we may assign weights

to the keywords at a point using techniques

such as tf-idf. After that, each set of points

can be scored based on the distance between

points and keyword weights. In addition, the

result criteria that contain all keywords can

be mitigated to generate results that contain

only a subset of the search terms. Extended

Disk. We plan to explore the ProMiSH

extension to disk. ProMiSH-E reads only the

required buckets from Ikp to points that

contain at least one query word. Therefore,

Ikp can be stored on disk by using the

directory structure. We can create a

directory for Ikp. Each bucket of ikp will be

stored in a separate file named after its key

(5)

checks serially the HI data structures from

the smallest scale to generate candidate

point identifiers to search for the subgroup,

and only reads the required buckets from the

recursive index and the inverse index of the

HI structure. Therefore, all nested rules and

inverse indexes of HI can be stored again

using a similar structure such as Ikp, and all

points in the data set can be indexed to B +

-Tree [36] using their identifiers and stored

on disk. In this way, you can search the

subset to retrieve points from the disk using

B + -Tree to explore the final result set

Author Details

C.Nandini Bai

Guide Details