Available Online at www.ijpret.com 299
INTERNATIONAL JOURNAL OF PURE AND
APPLIED RESEARCH IN ENGINEERING AND
TECHNOLOGY
A PATH FOR HORIZING YOUR INNOVATIVE WORK
A REVIEW ON SECURE ENTERPRISE SEARCH ENGINE USING CONTENT FILTERING
MR. SAGAR TAYADE 1, DR. H. R. DESHMUKH 2, MR. N. S. BAND 3
1. M.E. Scholar, Department of Computer Science, IBSS COE Amravati, India.
2. Professor & HOD, Department of Computer Science, IBSS COE Amravati, India.
3. Assistant Professor, Department of Computer Science, IBSS COE Amravati, India.
Accepted Date: 05/03/2015; Published Date: 01/05/2015
Abstract: Security is a particularly important problem for enterprise search engines. We propose a bloom filter based index to solve it. Our approach maintains a single system-wide index, considers the access privilege in the index creation algorithm, and applies the bloom filter algorithm to compress the index. The application is an enterprise search engine in which company employees upload data and other employees search the content, but the result set for each query is filtered according to the employee’s access privilege using content filtering techniques. We present a bloom filter based security index creation algorithm and the corresponding query processing and rank algorithms. Experimental results show that our index saves disk space and guarantees both meanings of security for the system at the same time.
Keywords: Bloom Filter; Security Model; Enterprise Search Engine; Access Privilege; Encryption;
Corresponding Author: MR. SAGAR TAYADE
1. INTRODUCTION:
Nowadays organizations are spread all over the nation, and communication between their branches is necessary to increase performance. An enterprise search engine plays an important role in accessing documents across the organization, but it may provide highly confidential documents to the wrong employee. A content filtering based search engine can therefore be used to increase security while keeping document access easy.
The security problem is particularly important to enterprise search engines [1, 2]. It has two meanings. The first is that a user without access privilege to a set of documents cannot get search results from the enterprise search engine that contain any document in the set. The second is that a user cannot infer, from the search results, any information that he cannot access.
By using content filtering, an employee cannot abuse the search engine to leak documents. This is the approach taken by the filtering based algorithms [3, 4, 5, 6]. They maintain a single system-wide index without considering the access privilege. After getting the search results, the filtering based algorithms filter them based on the user’s access control rights and return only the subset of results that the user can access. The advantage of the filtering based algorithms is that they save disk space.
We propose a bloom filter based security index creation algorithm and the corresponding query processing and rank algorithms. Experimental results show that our index saves disk space and guarantees both meanings of security for the system at the same time.
2. LITERATURE REVIEW & RELATED WORK:
When all the statistics are calculated at runtime, performance is poor. [3] analyzes the factors affecting the performance of the filtering based approaches and gives some methods to optimize these factors. [4] and [5] propose security indexes, which incorporate the document access information into the index. A bloom filter (BF) [11, 12] is a bit vector with l bits, all of which are initially set to 0.
The ordinary, insecure index is the smallest of all the indexes, the ACB based security index is the largest, and our index lies in between. The ordinary index only stores the index of all documents, with no additional information to guarantee security, so it is the smallest. In the ACB based approach, documents are stored in different ACBs, which are organized into a complex directed acyclic graph, and the algorithm creates an index for each ACB [7].
Many algorithms have been proposed to improve the performance of the two kinds of approaches. [2] proposes a security model that computes all statistics necessary to rank the search results on the fly. The model can guarantee both meanings of the security problem for enterprise search engines, but because all the statistics are calculated at runtime, its performance is poor. With a security index, a user’s query is expanded with the access information of the user. By using the index to match the access information of the two parts, the performance of the filtering based approaches can be improved.
But these security indexes cannot guarantee the second meaning of the security problem for the enterprise search engines.
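The query expansion idea described above can be sketched as follows. This is only an illustrative sketch, not the method of [4] or [5]: the function names `build_security_index` and `expanded_search`, the group-based access model, and the AND semantics over query terms are all assumptions made for the example.

```python
from collections import defaultdict


def build_security_index(documents, doc_groups):
    """Build term -> {doc_id: groups allowed to read the document}.

    Storing the access information next to each posting is the assumed
    way of incorporating document access information into the index.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in set(text.lower().split()):
            index[term][doc_id] = frozenset(doc_groups.get(doc_id, ()))
    return index


def expanded_search(index, query, user_groups):
    """Expand the query with the user's groups and match them during lookup."""
    user_groups = set(user_groups)
    result = None
    for term in query.lower().split():
        # Keep a posting only if the document shares a group with the user.
        allowed = {
            doc_id
            for doc_id, groups in index.get(term, {}).items()
            if groups & user_groups
        }
        result = allowed if result is None else result & allowed
    return sorted(result or [])
```

Because the access check happens while the postings are read, inaccessible documents never enter the candidate set. The ranking statistics are still global in this sketch, so it does not address the second meaning of the security problem.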
3. Proposed Work & Objective:
3.1 There are two kinds of approaches to solve the security problem:
3.1.1 Filtering based algorithms:
They maintain a single system-wide index without considering the access privilege. After getting the search results, the filtering based algorithms filter them based on the user’s access control rights and return only the subset of results that the user can access. The advantage of the filtering based algorithms is that they save disk space, because they store the documents and the index for the documents only once.
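The two steps of a filtering based algorithm (unrestricted lookup, then access filtering) can be sketched as below. This is a minimal sketch under stated assumptions: the names `build_index` and `filtered_search`, the per-document access control list `acl`, and the sample data are hypothetical and are not taken from [3, 4, 5, 6].

```python
from collections import defaultdict


def build_index(documents):
    """Build one system-wide inverted index: term -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def filtered_search(index, acl, user, query):
    """Look up the query in the shared index, then drop inaccessible documents."""
    terms = query.lower().split()
    if not terms:
        return []
    # Step 1: unrestricted lookup (AND semantics over all query terms).
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    # Step 2: post-filter the hits by the user's access rights.
    return sorted(doc_id for doc_id in hits if user in acl.get(doc_id, set()))


# Hypothetical sample data, for illustration only.
documents = {
    "d1": "quarterly salary report",
    "d2": "public holiday calendar",
    "d3": "salary revision policy",
}
acl = {
    "d1": {"hr_manager"},
    "d2": {"hr_manager", "engineer"},
    "d3": {"hr_manager", "engineer"},
}
index = build_index(documents)
```

With this data, `filtered_search(index, acl, "engineer", "salary")` returns only `d3`, while the same query by `hr_manager` returns `d1` and `d3`. The index itself is shared by all users, which is where the disk space saving comes from.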
We propose a bloom filter based index and related algorithms to improve the security level of the index of the traditional filtering based algorithms. Compared with the index of the traditional filtering based algorithms, our bloom filter based index adds a list to each term in the index. Each node in the list contains the access control information compressed by bloom filter and some statistics about the term. Our rank algorithm utilizes this information to rank the query results to prevent information leaking.
Desktop search systems create a unique index for each user, which contains all files accessible to that user. They can guarantee both meanings of the security problem.
3.2 Secure Enterprise search Engine has the following phases:
3.2.1 Bloom Filter:
A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not. In other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed. The more elements that are added to the set, the larger the probability of false positives.
A bloom filter (BF) [11, 12] is a bit vector with $l$ bits, all of which are initially set to 0. The BF supports the membership test of whether an element $x$ belongs to a finite set $S = \{x_1, x_2, \ldots, x_n\}$. The BF uses a set of $k$ uniform and independent hash functions $h_1, h_2, \ldots, h_k$ to map the set $S$ to the bit vector. To check whether an item $x \in S$, we check whether all $h_i(x)$ bits are set to 1. If not, $x$ is definitely not a member of $S$. Otherwise, $x$ is probably a member of $S$. We call it a false positive when an element $x \notin S$ has all $h_i(x)$ bits set to 1.
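The definition above can be sketched in a few lines. This is a minimal sketch, not the implementation used in the paper: deriving the k hash functions by salting SHA-256 with the hash index is an assumption made for the example, as are the names `BloomFilter` and `expected_false_positive_rate`. The rate function uses the standard approximation (1 - e^(-kn/l))^k, which assumes uniform and independent hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Bit vector of num_bits bits (l) probed by num_hashes hash functions (k)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Assumption: k hash functions simulated by salting SHA-256 with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Set all k bits for the item to 1."""
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        """False means definitely absent; True means probably present."""
        return all(self.bits[pos] for pos in self._positions(item))


def expected_false_positive_rate(num_bits, num_hashes, num_items):
    """Standard approximation (1 - e^(-kn/l))^k of the false positive rate."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes
```

An element that was added is always reported as present, so there are no false negatives. The approximation grows with the number of inserted items, which matches the observation that more elements mean more false positives.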
3.2.2 Foundation of the bloom filter based index
The main idea of our index creation algorithm is to maintain a list for each term in the index. Each node in the list stores a user’s ID, the IDF value of the term for the user and some other information. Our index can overcome the shortcoming of the filtering based algorithms and save more disk space than the desktop systems.
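One way to picture the per-term list is sketched below. It is only an illustration under stated assumptions: the text does not give the IDF formula, so the common form log(N / df) is assumed, with N and df counted only over documents the user may access. The name `build_user_term_nodes` is hypothetical, and user IDs are stored in plain form here instead of being compressed with a bloom filter.

```python
import math
from collections import defaultdict


def build_user_term_nodes(documents, acl, users):
    """Return term -> {user_id: IDF of the term as seen by that user}.

    Each (user_id, idf) pair plays the role of one node in the term's list.
    """
    nodes = defaultdict(dict)
    for user in users:
        # Only documents the user can access contribute to the statistics.
        visible = [d for d in documents if user in acl.get(d, set())]
        doc_freq = defaultdict(int)
        for doc_id in visible:
            for term in set(documents[doc_id].lower().split()):
                doc_freq[term] += 1
        for term, df in doc_freq.items():
            # Assumed IDF form: log(N_user / df_user).
            nodes[term][user] = math.log(len(visible) / df)
    return nodes
```

Because each user's IDF is computed from accessible documents only, a term that appears solely in documents hidden from a user gets no node for that user, and the ranking scores do not reflect the hidden documents.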
3.3 Algorithms:
3.3.1 AES Algorithm:
3.3.2 Filtering Algorithms:
They maintain a single system-wide index without considering the access privilege. After getting the search results, the filtering based algorithms filter them based on the user’s access control rights and return only the subset of results that the user can access.
Our rank algorithm utilizes this information to rank the query results to prevent information leaking. Association Rule Mining [1] is a popular and well researched method for discovering interesting relations between variables in large databases.
Figure: Secure Enterprise Search Engine Using Content Filtering.
4. CONCLUSION:
A search engine may provide highly confidential documents to the wrong employee; therefore, a content filtering based search engine can be used to increase security while keeping document access easy. The result set for each query is filtered according to the employee’s access privilege. Company branch administrators and employees upload documents to the search engine with designation-wise access permissions, and employees can download the documents that their access permissions allow. Company employees can also communicate with each other: incoming messages are stored in an inbox and outgoing messages in an outbox. Document leakage is prevented by using content filtering, so document security is maintained.
[Figure steps: Document storage for search engine; Employee searches documents on search engine; Search engine fetches result set as per keywords; Content filtering]
5. REFERENCES:
1. D. Hawking, Challenges in enterprise search, in: Proc. ADC 2004.
2. D. Zhang, Y. M. Chee, A. Mondal, A. K. H. Tung, M. Kitsuregawa, Keyword search in spatial databases: towards searching by document, in: Proc. ICDE 2009, pp. 688-699.
3. P. Bailey, D. Hawking, B. Matson, Secure search in enterprise webs: tradeoffs in efficient implementation for document level security, in: Proc. CIKM 2006.
4. T. C. Chieu, T. Nguyen, L. Z. Zeng, Secure search of private documents in an enterprise content management system, in: Proc. ICEBE 2007.
5. J. Kasprzak, M. Brandejs, M. Cuhel, Access rights in enterprise full-text search, in: Proc. ICEIS 2010.
6. A. Singh, M. Srivatsa, Efficient and secure search of enterprise file systems, in: Proc. ICWS 2007.