An exact and an approximate version of the algorithm.
[1]
J. C.Nandini Bai
M.Sc (Computer Science) Besant theosophical college,madanapalli.
[2]
D.Raja Reddy Assistant professor
Besant theosophical college,madanapalli
ABSTRACT — Keyword-based search in
rich, multidimensional datasets helps with
many new applications and tools. In this
search, we consider objects marked with
keywords to be included in the vector space.
For these data sets, we look at queries that
require small groups of points to satisfy a
certain set of keywords. We propose a new
method called ProMiSH (Multi-Purpose
Projection and Diagnosis) that uses random
index structures and random fragmentation,
and achieves high scalability and
acceleration. We provide an accurate and
approximate version of the algorithm. Our
experimental results show on real and
synthetic data sets that ProMiSH has up to
60 times the acceleration on modern
tree-based techniques.
Keywords - query, multidimensional data,
indexing, segmentation
INTRODUCTION
Objects (such as images, chemical
compounds, documents, or experts in
collaborative networks) are often
characterized by a set of related features,
usually represented as points in a
multi-dimensional space. For example, images are
represented by vectors with a color property,
and usually contain descriptive text
information (such as tags or keywords)
associated with them. In this paper, we look
at multidimensional data sets where each
data point contains a set of keywords. The
presence of keywords in the space allows
the feature to develop new tools for
querying and exploring these
multidimensional datasets. In this research,
we examine the queries for the closest set of
keywords (referred to as NKS) on rich
multidimensional data sets. A NKS query is
a set of keywords provided by the user. The
result of the query may include sets of data
points, each containing all the query
keywords and one of the most important
groupings in the multidimensional space.
Figure 1 illustrates an NKS query across a
set of two-dimensional data points. Each
query q fafa; b; cg, the set of points f7; 8; 9g
contain all the keywords queryfa; b; cgand
constitute the minimum of groups compared
to any other set of points covering all the
query words. Therefore, the f7; 8; 9g group
is the top 1 result for query Q. NKS queries
are useful for many applications, such as
image sharing in social networks, graph
pattern search, geo-search in GIS1 systems,
[2] ], Etc. Here are some examples.
1) Consider a social network for sharing
photos (e.g., Facebook), where photos are
tagged with people and location names.
These images can be embedded in a high
dimensional space of texture, color, or shape
[3], [4]. Here, a NKS query can create a set
of similar images that contain a group of
people.
2) NKS queries are useful for searching in
the graph, where graphs are included in a
high-dimensional space (for example, by
including Lipschitz [5]) for scalability. In
this case, you can answer the search for a
subform that contains a set of tags specified
by an NKS query in the built-in space [6].
3) NKS queries can also reveal geographic
patterns. The GIS can distinguish the area
through a range of high-dimensional
features, such as pressure, humidity and soil
types. At the same time, these areas can also
be distinguished by information such as
diseases. Epidemiology can formulate NKS
queries to detect patterns by identifying a set
of similar areas with all diseases that
concern them.
Related Works A variety of related queries
have been studied in literature on text-rich
spatial data sets. [11], [12], [13], [14] with a
combination of R-Tree [15] and the inverted
index were answered earlier. Felipe et al.
[16] IR2-Tree was developed to classify
objects from spatial data sets based on a set
of their travels to the query sites and the
relevance of their text descriptions to query
words. Kong et al. [17] The inverted R and
inverted R tree to answer a query similar to
Felipe et al. [16] using a different
classification function. Martins et al. [18]
link the computerized text and proximity to
the site independently, and then merge the
ranking grades. Cao et al. [7] and Long et al.
[8] Suggested algorithms for retrieving a set
of Spatial Web objects so that the keywords
of the group cover query words for the query
and the objects in the group are closer to the
query site and have the lowest spaces
between the objects. Other relevant queries
include searching for the closest keywords
in spatial databases [19], top-to-top
based on their impact on feature points [21]
and optimal site queries [22] ], [23]
Existing System:
The website and GIS keyword queries were
earlier answered in GIS using a combination
of R-Tree and reverse index.
Felipeet al. IR2-Tree was developed to
arrange objects from Spatial Data Sets based
on a set of spaces to the query sites and their
suitability for descriptions of the text of the
query words.
Kong et al. Integrated R-tree and inverse file
to answer a query similar to Felipeet al.
Using a different classification function.
Disadvantages of existing system:
guidance on how to effectively enable the
type of queries in which query coordinates
are missing.
it is difficult
for users to provide meaningful coordinates,
and our work deals with another type of
query where users can only provide
keywords as inputs.
difficult to adapt the current techniques with
our problem.
the elements from each data point is possible
to coordinate the surveys is not scalable.
Proposed System:
data sets where each data point contains a
set of keywords. The presence of keywords
in the space allows the feature to develop
new tools for querying and exploring these
multidimensional datasets.
of the closest set of keywords (referred to as
"NKS") in rich multidimensional data sets.
AnNKS query is a set of keywords provided
by the user, and theresult of the query may
contain k sets of data points each of which
contains all the query words and one forms
of the most narrow kest block in the
multidimensional space.
is paper, we suggest ProMiSH
(Projection shortcut and Multi-Scale
Hashing) to enable quick processing of NKS
queries. In particular, ProMiSH
(ProMiSH-E), which consistently retrieves optimal
results, ProMiSH (ProMiSH-A), which is
more efficient in terms of time and space, is
able to obtain near perfect results In
Pro ProMiSH-E uses a set of inverted and
inverted indexes to perform a localized
search.
Advantages of the proposed system:
-metric novel index to handle almost
exact search query.
work with multi-metric indexes to handle
quick queries.
studies to demonstrate the performance of
the proposed techniques.
SYSTEM ARCHITECTURE:
CONCLUSIONS AND FUTURE Working
in this paper, we have suggested solutions to
the problem of searching for the keywords
closest to the top in multidimensional
datasets. We proposed a new index called
ProMiSH based on random projections and
fragmentation. Based on this indicator, we
have developed the ProMiSH-E program,
which contains an ideal subset of points and
ProMiSH-A, which looks for semi-ideal
results more effectively. Our experimental
results show that ProMiSH is faster than
advanced tree-based technologies, with
multiple performance improvements.
Moreover, our technologies are well aligned
with real and industrial data sets. Jobs
ranking. In the future, we plan to explore
other recording schemes to arrange result
sets. In one system, we may assign weights
to the keywords at a point using techniques
such as tf-idf. After that, each set of points
can be scored based on the distance between
points and keyword weights. In addition, the
result criteria that contain all keywords can
be mitigated to generate results that contain
only a subset of the search terms. Extended
Disk. We plan to explore the ProMiSH
extension to disk. ProMiSH-E reads only the
required buckets from Ikp to points that
contain at least one query word. Therefore,
Ikp can be stored on disk by using the
directory structure. We can create a
directory for Ikp. Each bucket of ikp will be
stored in a separate file named after its key
checks serially the HI data structures from
the smallest scale to generate candidate
point identifiers to search for the subgroup,
and only reads the required buckets from the
recursive index and the inverse index of the
HI structure. Therefore, all nested rules and
inverse indexes of HI can be stored again
using a similar structure such as Ikp, and all
points in the data set can be indexed to B +
-Tree [36] using their identifiers and stored
on disk. In this way, you can search the
subset to retrieve points from the disk using
B + -Tree to explore the final result set
Author Details
C.Nandini Bai
Guide Details