The issue of Internet Content Filtering (ICF) is an increasing concern for industrial and educational organisations that wish to limit their volume of non-work-related Internet traffic (Burke 2002). Although a large share of this traffic is generated by third parties such as SPAM and virus creators, the tendency of both employees and students to use the Internet for non-work-related activities exacerbates the situation. The traffic generated imposes a real cost on organisations, not only in paying for the traffic itself but also through virus attacks, time wasted dealing with SPAM email, productivity lost to non-work-related browsing, and increased download times for legitimate work. Organisations are also concerned to provide a safe working and study environment that does not subject staff or students to 'passive' viewing of objectionable material, which portrays the organisation as unprofessional.
Given the massive explosion of content on the World Wide Web from diverse sources, content filtering tools have become essential. Filtering the contents of web pages is especially significant when the pages are accessed by minors. Traditional web page blocking systems follow a Boolean methodology: either the full page is displayed or it is blocked completely. With the increasing dynamism of web pages, it has become common for different portions of a page to hold different types of content at different times. This paper proposes a model that blocks content at a fine-grained level: instead of blocking the page completely, only those segments that hold the content to be blocked are suppressed. The advantages of this method over traditional approaches are the fine-grained level of blocking and the automatic identification of the page portions to be blocked. Experiments conducted on the proposed model indicate 88% accuracy in filtering out the offending segments.
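The paper's actual segmentation and classification model is not specified here; as a minimal sketch of the idea, the following replaces only the offending segments of a page with a placeholder, using a trivial keyword test as a hypothetical stand-in for the real segment classifier (`BLOCKED_TERMS` and `filter_page` are illustrative names, not from the paper):

```python
import re

BLOCKED_TERMS = {"gambling", "violence"}  # hypothetical blocklist

def is_objectionable(segment: str) -> bool:
    # Trivial stand-in for the paper's (unspecified) segment classifier.
    words = set(re.findall(r"[a-z]+", segment.lower()))
    return bool(words & BLOCKED_TERMS)

def filter_page(segments: list[str]) -> list[str]:
    # Keep clean segments; replace objectionable ones with a placeholder,
    # instead of blocking the whole page Boolean-style.
    return [s if not is_objectionable(s) else "[segment blocked]"
            for s in segments]

page = ["Local news and weather.",
        "Online gambling offers huge wins!",
        "Recipe of the day: lentil soup."]
print(filter_page(page))
```

The point of the sketch is the granularity: the page object survives with most of its content intact, and only the matching segments are suppressed.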
Online safety is the practice of maximising a user's personal safety and minimising security risks to private information in cyberspace. As the number of Internet users continues to grow worldwide, Internet safety is a growing concern for both children and adults. Common safety concerns include malicious users (spam, phishing, cyberbullying, cyberstalking, etc.), malicious websites and software (malware, computer viruses, etc.), and various types of obscene or offensive content. Several crimes, such as stalking and identity theft, can be committed over the Internet. It is therefore very important to understand the causes of these risks, the relevant security tips, and proper content filtering techniques. The following are some key points that everyone should know:
• Category Blocking is the latest web content filtering technology, and it greatly simplifies the management of web inspection and content filtering. Category Blocking utilises external services to keep information about suspect web sites up to date, relying on Web Category Servers that contain the latest web URL ratings to perform web filtering. With Category Blocking devices there are no manual lists to install or maintain. Web traffic is inspected against rating databases installed on the Category Servers, and the results (good or bad sites) are cached to increase performance. The advantage is up-to-date web URL and category information at all times, eliminating the need to manually manage and update local blacklists. This method ensures accuracy and real-time compliance with the company's Internet Usage Policy.
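The lookup-then-cache flow described above can be sketched in a few lines. This is a hypothetical illustration, not any vendor's API: `RATING_DATABASE` stands in for a remote Web Category Server, and `lru_cache` plays the role of the local rating cache.

```python
from functools import lru_cache

# Hypothetical stand-in for a remote Web Category Server.
RATING_DATABASE = {
    "news.example.com": "news",
    "bets.example.net": "gambling",
}
BLOCKED_CATEGORIES = {"gambling", "adult"}  # from the usage policy

@lru_cache(maxsize=4096)          # cache ratings to cut lookup latency
def lookup_category(host: str) -> str:
    # In a real deployment this would be a network query, not a dict read.
    return RATING_DATABASE.get(host, "uncategorised")

def is_allowed(host: str) -> bool:
    # Policy check: block any host whose category is on the blocked list.
    return lookup_category(host) not in BLOCKED_CATEGORIES

print(is_allowed("news.example.com"))   # → True
print(is_allowed("bets.example.net"))   # → False
```

Because the policy is expressed as categories rather than URLs, updating the rating database on the server side immediately changes what the filter blocks, with no local list maintenance.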
This paper focuses on the development of a maintainable information filtering system. A simple and efficient solution to this problem is to block Web sites by URL, including by IP address. However, this is not effective for unknown Web sites, and it is difficult to obtain a complete block list. Content-based filtering is suggested to overcome this problem as a strategy additional to URL filtering. The manual rule-based method is widely applied in current content filtering systems, but it overlooks the knowledge acquisition bottleneck problem. To solve this problem, we employed the Multiple Classification Ripple-Down Rules (MCRDR) knowledge acquisition method, which allows the domain expert to maintain the knowledge base without the help of knowledge engineers. Throughout this study, we show that the MCRDR-based information filtering system can easily prevent unknown Web information from being delivered, and that the knowledge base for the filtering system can easily be maintained.
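The core idea of ripple-down rules is that the expert never edits old rules; instead, exceptions are attached to the rule that produced a wrong conclusion. The following is a minimal single-classification sketch of that structure (full MCRDR allows multiple conclusions and is more involved); all names and the keyword conditions are illustrative, not from the paper:

```python
# Minimal ripple-down-rules sketch (single classification, not full MCRDR):
# each rule has a condition, a conclusion, and a list of exception rules
# that refine it; the expert adds exceptions instead of editing old rules.

class Rule:
    def __init__(self, cond, conclusion):
        self.cond, self.conclusion = cond, conclusion
        self.exceptions = []  # refinements added later by the expert

    def classify(self, text):
        if not self.cond(text):
            return None
        for ex in self.exceptions:            # a firing exception overrides
            verdict = ex.classify(text)
            if verdict is not None:
                return verdict
        return self.conclusion

root = Rule(lambda t: True, "allow")          # default conclusion
block_casino = Rule(lambda t: "casino" in t, "block")
root.exceptions.append(block_casino)
# Later, the expert refines: pages merely reviewing casinos are fine.
block_casino.exceptions.append(Rule(lambda t: "review" in t, "allow"))

print(root.classify("online casino bonus"))   # → block
print(root.classify("casino review site"))    # → allow
print(root.classify("cooking recipes"))       # → allow
```

The maintenance property the paper emphasises falls out of this structure: each correction is a local exception in the context where the mistake occurred, so the expert never risks breaking previously correct behaviour.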
While such a goal-driven content-filtering system is an ambitious target, we propose a simple approach to separate useless (irrelevant) sentences in a document from useful (relevant) sentences. For clarity, we focus on documents that contain highly noisy problem descriptions and solutions for IT infrastructure support (ITIS) tickets. The useful sentences are those that contain actions that solve the user's problem; we consider all other types of sentences as useless. The approach can easily be generalised to remove useless sentences (from any perspective) from other types of documents.
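As a toy illustration of the useful/useless split (the paper's actual classifier is not described here), the sketch below marks a ticket sentence "useful" when it names a resolving action; `ACTION_VERBS` is a hypothetical hand-picked list, not the paper's feature set:

```python
import re

# Hypothetical heuristic stand-in for the paper's classifier: a sentence
# is "useful" if it mentions a resolving action, otherwise "useless".
ACTION_VERBS = {"restarted", "reset", "replaced", "reinstalled", "cleared"}

def label_sentences(ticket_text: str):
    # Naive sentence split on terminal punctuation, then keyword test.
    sentences = re.split(r"(?<=[.!?])\s+", ticket_text.strip())
    return [(s, "useful" if set(re.findall(r"[a-z]+", s.lower())) & ACTION_VERBS
                 else "useless") for s in sentences]

ticket = "User reports VPN failure. Restarted the VPN service. Issue resolved."
for sentence, label in label_sentences(ticket):
    print(label, "|", sentence)
```

A real system would replace the keyword test with a trained classifier, but the pipeline shape (split into sentences, label each, drop the useless ones) is the same.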
In certain domains such as books, e-learning, and academic research content, content-based filtering can be more useful than collaborative filtering. This is because such domains offer many features that can easily be used for matching users and items. We have therefore focused on content-based filtering and added a feature that can combine updated as well as diverse information into the system with the passage of time. There is no need to communicate with other users in such systems. The major strength of content filtering is that recommendations can be made even if the system has few ratings or none at all, which is usually the case for new users. The only requirement is to have some information about each item, which is easily available in the case of a book recommender system.
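The cold-start strength claimed above can be made concrete: with no ratings at all, a single stated interest plus per-item feature bags is enough to rank items. The catalogue and feature names below are invented for illustration; this is a generic content-based sketch, not the paper's system.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse feature bags.
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical catalogue: book -> feature bag (genre/topic terms).
books = {
    "Dune":       Counter({"scifi": 2, "politics": 1}),
    "Foundation": Counter({"scifi": 2, "empire": 1}),
    "Emma":       Counter({"romance": 2, "classic": 1}),
}

def recommend(profile: Counter, k: int = 2) -> list[str]:
    # Rank items by similarity between their features and the user profile;
    # no other users' ratings are needed.
    ranked = sorted(books, key=lambda t: cosine(profile, books[t]), reverse=True)
    return ranked[:k]

new_user = Counter({"scifi": 1})   # one known interest is enough
print(recommend(new_user))         # → ['Dune', 'Foundation']
```

Note that `Emma` is never recommended to this user even though no one has rated anything, which is exactly the behaviour a collaborative filter cannot provide for a brand-new user.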
Bandwidth has now made it possible to harness "human computation" in near-real time from a vast, ever-growing, distributed population of online Internet users. Among the many concepts that have arisen in the process of distributing and managing knowledge online, crowdsourcing is an example that cannot be overlooked. Crowdsourcing depends on human workers, but human workers are prone to errors. To leverage the power of crowdsourcing, this paper presents a framework called the Crowdsourcing Content Filtering (CrowCFil) System. CrowCFil is designed to exploit conventional crowdsourcing techniques in order to improve the reliability and integrity of the information given by contributors to requesters on a crowdsourcing platform. It consists of three interdependent functional modules: the Task Initiator Module, the Contributor Module, and the CrowCFil Engine Module. The core of the system is the CrowCFil Engine Module, which gives the system the power to check the reliability and integrity of the responses submitted by contributors, with the aid of a well-defined algorithm embedded in a set of interrelated functions. The framework is suitable for implementation on a relatively large distributed crowdsourcing platform while keeping the cost of operating the crowdsourcing low.
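The engine's reliability-checking algorithm is not detailed in this excerpt; a common baseline for this kind of check is majority agreement across independent contributors, sketched below purely as an illustration (the function names are not from the CrowCFil paper):

```python
from collections import Counter

# Hypothetical sketch in the spirit of the CrowCFil Engine: accept a
# contributor's answer only when it agrees with the majority of
# independent answers collected for the same task.
def majority_answer(answers: list[str]) -> str:
    return Counter(answers).most_common(1)[0][0]

def is_reliable(candidate: str, answers: list[str]) -> bool:
    return candidate == majority_answer(answers)

answers = ["Paris", "Paris", "Lyon", "Paris"]
print(is_reliable("Paris", answers))   # → True
print(is_reliable("Lyon", answers))    # → False
```

Majority voting is cheap, which matches the framework's stated goal of keeping the cost of operating the crowdsourcing platform low; more elaborate schemes weight contributors by past accuracy.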
Abstract: The security problem is particularly important for enterprise search engines. We propose a Bloom filter based index to solve this security problem. Our approach maintains a single system-wide index, considering access privileges in the index creation algorithm and applying the Bloom filter algorithm to compress the index. The application is an enterprise search engine in which company employees upload data and other employees search it; the result set for a query is filtered according to the employee's access privileges, and the search engine content is filtered using content filtering techniques. We present a Bloom filter based security index creation algorithm together with the corresponding query processing and ranking algorithms. Experimental results show that our index saves disk space and simultaneously guarantees both meanings of security for the system. Keywords: Bloom Filter; Security Model; Enterprise Search Engine; Access Privilege; Encryption.
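The paper's exact index layout is not given in this excerpt, but the Bloom filter it builds on is standard: a bit array plus several hash functions giving a compact set-membership test with no false negatives. The sketch below is a generic implementation, with the access-privilege use (one filter listing the groups allowed to see a document) only as a hypothetical example:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a compact, probabilistic set-membership test."""
    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size, self.k = size_bits, hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # No false negatives; false positives possible as the filter fills.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

# Hypothetical use: one filter per document holding the groups allowed to see it.
acl = BloomFilter()
acl.add("engineering")
print("engineering" in acl)  # → True
```

The space saving the paper reports comes from this compression: the filter stores a fixed number of bits regardless of how long the group names are, at the cost of a small, tunable false-positive rate.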
The prime objective of this proposal is to provide a secure platform for organisations and institutions that deal, directly or indirectly, with confidential information and operate heavily used resources. The project addresses the needs of professional institutions offering e-learning activities such as content generation, distribution, training, and the preparation and maintenance of e-content, in both academia and research. When users share their ideas with experts and software tool providers over the Internet, security becomes the prime concern. Intruders and hackers on high-speed connections may send malicious viruses and worms to blacken an institution's reputation. To ensure the security of resources such as software tools, hardware equipment, operating systems, and e-content, the institution must take the necessary precautions. In this context, the installation of firewalls, both hardware and software in nature, is of high importance for overcoming these threats. Even where such measures exist, hackers and intruders keep coming up with new techniques to hijack networked systems. Users increasingly depend on the Internet to extract and use the information and open-source software tools available in the global arena for their research and academic projects. It is therefore necessary to inspect incoming and outgoing information that may cause damage or loss of digital information, resources, and tools. In this context, I propose an intrusion detection system against vulnerabilities on the Internet, in which information is filtered by a solid hardware firewall with proper configuration and installation of software. This will help the institution stop intruders from accessing its systems. The provider can keep the Internet link to the outside world, but resources cannot be shared unless the user has been granted the privilege.
With a firewall in place, users will still have typical email access, but chat and other interactive programs will require users to take an extra step to grant access before use.
Recommender systems are proving to be a useful tool for providing suggestions to users according to their requirements. In the first recommender systems, filtering was used to improve recommendation accuracy. To achieve this accuracy, most memory-based methods and algorithms were formulated and optimised under particular circumstances (e.g., kNN metrics, singular value decomposition, etc.). At this stage, hybrid approaches (primarily combining collaborative filtering and content filtering) were used to improve the quality of the recommendations. In the second stage, algorithms that admitted social information were accommodated and developed alongside the earlier hybrid approaches (e.g., trust-aware algorithms, socially adaptive approaches, social network analysis, etc.). Currently, hybrid algorithms are used to integrate location information into existing recommendation algorithms. To improve the quality of recommender systems, future research is anticipated to concentrate on advancing the existing methods and algorithms. Novel lines of research will be formulated in fields such as: (1) combining, in good order, the existing recommendation methods that use different types of available information; (2) for recommender systems
Figure 2.1 explains the overall functionality of the system. In our proposed system, the input is the posted messages and the output is the filtering of unwanted messages. Filtering relies on a rule-based system and a machine-learning-based classifier in support of content filtering. Access control is also provided to multiple users in OSNs. Initially, users register their details, and authentication is done by verifying the username and password. User profile information such as name, age, gender, likes and dislikes, topics of interest, hobbies, graduation information, email id, and personal information can be stored, so that all of each user's information is maintained separately. After updating all the profile information, the user needs to add relationships with other users, such as sending friend requests to known users, viewing their profiles, and adding them to his or her relationship status.
This research work comprises an analytical study of various spam detection algorithms based on content filtering, such as the Fisher-Robinson Inverse Chi-Square function, Bayesian classifiers, the AdaBoost algorithm, and kNN algorithms. The algorithms have been implemented, and the results were studied to draw a comparison of the effectiveness of each technique and to identify the most accurate one. Each technique is demonstrated in the following sections with its implemented results. The paper concludes with a benchmarking of the techniques.
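Of the techniques listed, the Bayesian classifier is the simplest to sketch. The toy version below trains a Laplace-smoothed multinomial naive Bayes model on a two-message corpus per class; the corpus and function names are invented for illustration, and the real study would use a labelled email data set.

```python
from collections import Counter
from math import log

# Toy corpus; a real study would use a labelled email/SMS data set.
spam = ["win cash now", "free prize win"]
ham  = ["meeting at noon", "see you at lunch"]

def train(docs):
    counts = Counter(w for d in docs for w in d.split())
    return counts, sum(counts.values())

spam_counts, spam_total = train(spam)
ham_counts, ham_total = train(ham)
vocab = set(spam_counts) | set(ham_counts)

def log_prob(msg, counts, total):
    # Laplace-smoothed multinomial log-likelihood of the message.
    return sum(log((counts[w] + 1) / (total + len(vocab)))
               for w in msg.split())

def classify(msg: str) -> str:
    # Equal class priors assumed, so compare likelihoods directly.
    return ("spam" if log_prob(msg, spam_counts, spam_total)
                      > log_prob(msg, ham_counts, ham_total) else "ham")

print(classify("free cash prize"))   # → spam
print(classify("lunch meeting"))     # → ham
```

The other techniques in the study (Inverse Chi-Square, AdaBoost, kNN) plug into the same shape: a training step over labelled messages and a per-message scoring step.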
However, recommendation systems adopting the content-based filtering approach can only recommend data items in which the user has indicated an interest. Other data items potentially interesting to the user cannot be explored in such systems if he or she has never accessed them before. Instead of computing similarities between the features of data items and user profiles, the collaborative approach computes similarities between the user profiles themselves. Users with similar profiles are grouped together to share the information in their profiles. The main goal of the collaborative approach is to make recommendations among users in the same group.
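The profile-to-profile comparison described above can be sketched directly: measure similarity between users on the items both have rated, then suggest items the nearest peer has seen but the user has not. The ratings and names below are invented; this is a generic user-based collaborative filter, not any specific system.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 3},
    "bob":   {"A": 5, "B": 3, "C": 4},
    "carol": {"A": 1, "C": 5},
}

def similarity(u: str, v: str) -> float:
    # Cosine similarity computed over the items both users rated.
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    nu = sqrt(sum(r * r for r in ratings[u].values()))
    nv = sqrt(sum(r * r for r in ratings[v].values()))
    return dot / (nu * nv)

def recommend(user: str) -> list[str]:
    # Take the most similar other user and suggest their unseen items --
    # this is how item "C" reaches alice despite her never accessing it.
    peer = max((v for v in ratings if v != user),
               key=lambda v: similarity(user, v))
    return sorted(set(ratings[peer]) - set(ratings[user]))

print(recommend("alice"))   # → ['C']
```

This is exactly the exploration a content-based filter cannot do: `alice` gets `C` because `bob`'s profile resembles hers, not because `C` resembles anything she has rated.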
Many studies in the literature on content-based SMS spam filtering select features to represent SMS text messages, and these features are extracted from SMS data sets that suffer from the imbalanced class distribution problem. However, not much attention has been paid to handling this imbalance, which affects the characteristics and the size of the selected features and causes undesirable performance. Therefore, in order to select suitable features from imbalanced data sets, a suitable feature selection scheme is needed. The Gini Index (Shang et al., 2007) is a feature selection metric that can handle the class imbalance problem by selecting proper features (Ogura, Amano & Kondo, 2011), which improves filtering performance. Besides a suitable feature selection metric, a suitable technique for spam filtering is required. Soft computing techniques are present in almost every domain (e.g. spam filtering), and their ability has been proven (El-Alfy & Al-Qunaieer, 2008; Guzella & Caminhas, 2009).
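Several Gini-based feature selection formulas exist; one common variant after Shang et al. (2007) scores a word w as GI(w) = Σ_c P(w|c)² P(c|w)², so that words concentrated in one class score near 1 and words spread across classes score lower. The toy corpus below is invented to illustrate that behaviour on an imbalance-tolerant, per-word basis:

```python
from collections import Counter

# Toy labelled corpus; a real study would use an SMS spam data set.
docs = [("win a free prize", "spam"),
        ("free entry win",   "spam"),
        ("lunch tomorrow?",  "ham"),
        ("free for lunch?",  "ham")]

classes = ["spam", "ham"]
doc_freq = {c: Counter() for c in classes}   # word -> #docs containing it, per class
n_docs = Counter(c for _, c in docs)
for text, c in docs:
    for w in set(text.split()):
        doc_freq[c][w] += 1

def gini_index(w: str) -> float:
    # GI(w) = sum_c P(w|c)^2 * P(c|w)^2  (one variant after Shang et al., 2007)
    total_w = sum(doc_freq[c][w] for c in classes)
    if total_w == 0:
        return 0.0
    return sum((doc_freq[c][w] / n_docs[c]) ** 2 *
               (doc_freq[c][w] / total_w) ** 2 for c in classes)

# Class-discriminating words score highest; shared words score lower.
for w in ["win", "free", "lunch"]:
    print(w, round(gini_index(w), 3))
```

Here "win" and "lunch" (each exclusive to one class) outrank "free" (present in both), which is the selection behaviour that makes the metric useful on imbalanced data: the score depends on per-class proportions rather than raw counts.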
The concept of Recommender Systems, or Recommendation Engines, arises from the idea of information reuse and stable preferences. It is an idea that does not begin with computers and technology; we can find it in ants and all sorts of other creatures. For example, ants running around a house often follow the line left by the ants that went before them to find food, because ants can leave markers that help other ants find food, survive, and breed. In research this is known as social navigation: following the footprints of others to find what we want, just as we use social navigation when we are somewhere we do not know our way. This concept of social navigation is, at heart, the idea of social information reuse. Recommendation systems, or recommendation engines, can be defined as systems that select similar things whenever the user likes something, for example suggesting movies, songs, products, or friends in social media. These days many smartphone applications recommend other applications to the user, which is one of the recent developments in the field of recommender systems. Recommender systems are broadly classified into three types: collaborative filtering, content-based filtering, and hybrid filtering. This survey paper focuses mainly on content-based filtering algorithms.
2.2.4 IO Group, Inc. v. Veoh Networks, Inc. [586 F. Supp. 2d 1132 (N.D. Cal. 2008)]: Veoh could obtain safe-harbor protection for the media it was streaming, although several items openly violated copyright law. This was because Veoh complied with the take-down policy, removing all content for which it received notice of copyright violation. Since the necessary measures were undertaken under the appropriate circumstances, Veoh was not violating copyright law. 2.2.5 Perfect 10, Inc. v. CCBill LLC [488 F.3d 1102 (9th Cir. 2007)]: The plaintiff was the publisher of an adult entertainment magazine and owned the site perfect10.com. It was alleged that the defendants violated copyright, trademark, and state unfair competition, false advertising, and right of publicity laws by providing services to websites that posted images stolen from Perfect 10's magazine and website. Immunity was sought under both the safe-harbor clause of the DMCA and s.230 of the CDA. Partially reversing the district court's decision, the court remanded certain issues to the district court to ultimately decide whether the safe-harbor provision could indeed apply to the defendants. Insofar as the CDA was concerned, the defendants were held to be eligible for s.230 immunity.
is one of the most comprehensive empirical studies on spam-filtering systems to date. They cite six spam-filtering systems: SpamAssassin, CRM114, SpamBayes, BogoFilter, DSPAM and SpamProbe. Of the three systems we have not tested here, BogoFilter and SpamProbe are Bayesian filters inspired by , which has also inspired SpamBayes. DSPAM and CRM114 are found to perform substantially worse than the other filters (when training on and classifying mails in sequence); the latter finding is consistent with ours. They conclude that "the potential contribution of more sophisticated machine learning techniques to real spam filtering is as-of-yet unresolved", which is also our opinion: overall, there are some promising papers (e.g. ,,), but none of them has yet been translated into a state-of-the-art spam-filtering system of non-Bayesian origin. Even commercial systems fail to beat naive Bayesian learners such as SpamBayes; see .