In this paper, we discussed the implementation and efficiency details of an IR system with fuzzy-based similarity measures. Experiments performed on the TREC Ohsumed collection using Apache Lucene demonstrate the advantage of the proposed measure. This new technique improves on other information retrieval systems in that it handles vague and imprecise user queries well. The performance of the proposed technique is compared with a cosine-based similarity measure on the TREC dataset. Results indicate that the proposed fuzzy-logic similarity measure is better than the cosine-based measure at handling vague, uncertain, and imprecise queries. The insight provided by this model makes clear that fuzzy notions describe situations known through imprecise, uncertain, and vague information in a way that neither replaces nor is replaced by, but rather complements, the views produced by other approaches.
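For concreteness, here is a minimal sketch of the contrast, not the paper's exact measure: term weights are treated as fuzzy membership degrees and a document is scored against a query with a min/max (fuzzy Jaccard) ratio instead of the cosine. The vectors below are invented for illustration.

```python
import numpy as np

def cosine_sim(d, q):
    """Baseline: cosine of the angle between weight vectors."""
    return float(np.dot(d, q) / (np.linalg.norm(d) * np.linalg.norm(q)))

def fuzzy_sim(d, q):
    """Fuzzy-set similarity: treat term weights as membership degrees
    and compare them with the min/max (fuzzy Jaccard) ratio."""
    return float(np.minimum(d, q).sum() / np.maximum(d, q).sum())

# Toy tf-idf-style weights over a shared vocabulary (illustrative values)
doc   = np.array([0.8, 0.1, 0.0, 0.5])
query = np.array([0.6, 0.0, 0.2, 0.5])
print(cosine_sim(doc, query), fuzzy_sim(doc, query))
```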
For the premise parameter identification process (identification of premise and consequence parameters), the space of each input variable is taken in turn and partitioned into fuzzy subsets while the ranges of the other variables are left unpartitioned. Therefore, for the category module, when the ‘programming’ variable is partitioned, the variables ‘general’, ‘graphics’, ‘ai’, and ‘internet’ are not partitioned; likewise, when the ‘general’ variable is partitioned, the variables ‘programming’, ‘graphics’, ‘ai’, and ‘internet’ are not partitioned. At the end of the identification process for the consequence and premise parameters, a set of rules describing the behaviour of the fuzzy inference system is produced. Looking at the membership functions depicted in Figure 2, the input variable ‘programming’ has seven sets of premises, the variable ‘general’ has five, the variables ‘graphics’ and ‘ai’ have four each, and the variable ‘internet’ has two. Hence, there are 7*5*4*4*2 = 1120 rules for each input variable, and as there are five variables, the total number of rules amounts to 1120*5 = 5600. However, using a rule of thumb, or heuristic, concerning the relationship among the variables, it is possible to reduce the number of rules significantly (Zhang et al., 1997). It should be noted that removing a fuzzy subset from the clause of a rule reduces the number of rules by 25. After eliminating irrelevant rules, the total number of rules left in the category module is 540. Similar procedures were carried out for the feature and fsir modules: the feature module has 360 rules, while the fsir module has 720 rules.
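The counting above is simply the product of the fuzzy-subset counts per variable, as the sketch below reproduces. The heuristic pruning down to 540 rules depends on domain knowledge about the variables and is not reproduced here.

```python
from math import prod

# Fuzzy subsets (premise sets) per input variable of the category module
partitions = {"programming": 7, "general": 5, "graphics": 4, "ai": 4, "internet": 2}

grid = prod(partitions.values())      # 7*5*4*4*2 = 1120 rules over the full grid
print(grid)                           # 1120
print(grid * len(partitions))         # 5600, the paper's count over five variables
```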
The application of a fuzzy-techniques approach to model a flexible system for access to information on the WWW is also realized within the solved problem. The aim is to design a system that can represent and manage the vagueness and uncertainty characteristic of the process of information searching and retrieval. When specific information is searched for, the point-and-click access paradigm is impractical, and the effectiveness of the results depends strongly on the starting page. The definition of systems that help users automatically access information relevant to their needs therefore plays an important role. The research is aimed at defining systems that are tolerant of imprecision and uncertainty in the elicitation of users' preferences and able to learn them through interactive and adaptive behaviour. The fuzzy-technique approach is the basis for defining a flexible system for locating and accessing information on the Web.
Since Rocchio’s method is here being used for classification rather than query refinement, there is no “initial query” term, as noted above. Dumais et al. also elected to discard the negative examples, i.e., the documents that were not relevant to the given category for which the classifiers were being trained. Since the training set starts with relevance judgments for each category, there are no interactive relevance judgments. Hence, the Rocchio formula for a given category was reduced to computing the centroid (the average) of the documents labeled relevant to the given category. At test time, a new document was judged relevant to a given category if its similarity to the category’s centroid (as measured by the Jaccard similarity measure) exceeded a specified threshold. Lewis and Gale [SIGIR ‘94] use a variation on traditional relevance feedback which they call “uncertainty sampling.” In any situation where the volume of training data is too large for the user to rate all the documents, some sampling method is required. In traditional relevance feedback, the sample the user is asked to classify consists of those documents that the current classifier considers most relevant. Hence, Lewis and Gale call this approach “relevance sampling”. It has the notable virtue, especially if the relevance feedback is taking place while the system is operational, that the documents the user is asked to classify are the ones that (as far as the classifier can tell) he wants to see anyway. However, if the training is taking place before the system is operational (or in a very early stage of operation) and the primary objective is to perfect the classifier, then uncertainty sampling (derived from “results in computational learning theory”) may work better. The method assumes a “classifier that both predicts a class and provides a measurement of how certain that prediction is. Probabilistic, fuzzy, nearest neighbor, and neural classifiers, along with many others, satisfy this criterion or can be easily modified to do so.” The sample documents chosen for the user to rate are those about which the classifier is most uncertain, e.g., most uncertain whether to classify them as relevant or non-relevant. For a probabilistic classifier (such as the one they actually describe and test in their paper), the most uncertain documents are those that are classified with a probability of correct classification close to 0.5. Lewis and Gale obtained substantially better classification for a given sample size when the classifier was trained by uncertainty sampling of the training set than when it was trained by relevance sampling (and far better than with training on a random sample).
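A minimal sketch of the two selection rules, assuming a probabilistic classifier that outputs P(relevant) for each unlabeled document; the probabilities below are invented for illustration.

```python
import numpy as np

def uncertainty_sample(probs, k):
    """Pick the k documents whose predicted P(relevant) is closest to 0.5,
    i.e. the ones the classifier is least certain about."""
    return np.argsort(np.abs(np.asarray(probs) - 0.5))[:k]

def relevance_sample(probs, k):
    """Traditional relevance feedback: ask about the top-scoring documents."""
    return np.argsort(probs)[::-1][:k]

probs = np.array([0.95, 0.51, 0.10, 0.48, 0.80, 0.30])
print(uncertainty_sample(probs, 2))  # nearest to 0.5 -> [1 3]
print(relevance_sample(probs, 2))    # highest P(relevant) -> [0 4]
```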
Following this trend, researchers have paid more and more attention to issues concerning blogs. Todoroki et al. propose to utilize a blog as an electronic research notebook, since a blog system provides a user-friendly interface compatible with web browsers, easy-to-use authoring tools, and full-text retrieval. Chau and Xu present a semi-automated approach to facilitate the monitoring, study, and research on blogs of online hate groups. Lin and Huang indicate that blogs can significantly influence browsers and indirectly promote tourism. Du and Wagner seek to explore blogs’ success factors from a technology perspective. Asano investigates whether a ‘fiction novel’ on blogs describing a girl undergoing epilepsy surgery can potentially increase familiarity with epilepsy surgery among general Internet users in Japan. Most related studies have been conducted to measure the influence of the blogosphere. However, relatively few papers discuss extracting knowledge from blogs, such as usage mining and structure mining, and even fewer works discuss blog content mining.
The standard practice for IR is ad hoc retrieval: the user submits a query and the matching information is retrieved. Matching can be exact, based on Boolean logic; as a better alternative, matching can be ranked. At present, ranking is done by statistical analysis. However, the author opines that applying fuzzy logic might provide a better ranking system.
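A toy sketch of the contrast; the membership function (normalised term frequency) and the mean aggregation are illustrative choices, not a specific published measure.

```python
# Boolean matching: a document either contains all query terms or it does not.
def boolean_match(doc_terms, query_terms):
    return query_terms.issubset(set(doc_terms))

# Fuzzy ranking: grade each query term's membership in the document by
# normalised term frequency, then aggregate with a soft AND (the mean here;
# a t-norm such as min is another common choice).
def fuzzy_score(doc_terms, query_terms):
    memberships = [doc_terms.count(t) / len(doc_terms) for t in query_terms]
    return sum(memberships) / len(memberships)

doc = ["fuzzy", "logic", "ranking", "fuzzy", "retrieval"]
print(boolean_match(doc, {"fuzzy", "boolean"}))   # False: exact match fails
print(fuzzy_score(doc, ["fuzzy", "boolean"]))     # 0.2: graded, partial match
```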
Recently, the World Wide Web has become packed with huge quantities of information, and users find it increasingly difficult to locate relevant information as those quantities grow. This paper uses a multi-agent system with intelligent agents in order to retrieve documents from the World Wide Web. With this system the user can easily get the relevant documents they need. The multi-agent system is combined with a fuzzy inference system for ranking documents. Scoring document rank from cosine similarity with a fuzzy inference system is much simpler to develop and implement than the traditional method, which requires mathematical equations.
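A minimal sketch of such a ranking stage, assuming cosine similarity in [0, 1] as the sole input, three triangular membership functions, and weighted-average (zeroth-order Sugeno-style) defuzzification; all of these choices are illustrative.

```python
def tri(x, a, b, c):
    """Triangular membership function with corners a, b, c."""
    return max(min((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

def fuzzy_rank_score(cos_sim):
    """Map a cosine similarity in [0, 1] to a rank score via three rules:
    low similarity -> poor rank, medium -> fair, high -> good."""
    low    = tri(cos_sim, -0.5, 0.0, 0.5)   # rule firing strengths
    medium = tri(cos_sim,  0.0, 0.5, 1.0)
    high   = tri(cos_sim,  0.5, 1.0, 1.5)
    centres = {"poor": 0.1, "fair": 0.5, "good": 0.9}  # output set centres
    num = low * centres["poor"] + medium * centres["fair"] + high * centres["good"]
    return num / (low + medium + high)      # weighted-average defuzzification

for s in (0.2, 0.5, 0.8):
    print(s, round(fuzzy_rank_score(s), 3))  # 0.26, 0.5, 0.74
```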
Information retrieval is an emerging technology in the information technology field. Every application needs to store and retrieve application-specific data. The traditional way of doing this is through SQL queries, which requires considerable technical knowledge of the SQL tools and the structure of the relevant database schema, so it is hard for people without technical knowledge to use these kinds of tools. In this regard, the concept of Natural Language Processing (NLP) has been evolving rapidly. NLP makes human-computer interaction possible through natural human language: an application user can query the database in a human language such as English and get the relevant answer using NLP techniques.
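As a deliberately toy illustration of the idea: a single hand-written pattern mapped to a parameterised SQL template. The table and column names are assumptions, and real NLP-to-SQL systems rely on parsing and semantic mapping rather than one regular expression.

```python
import re

# One recognised question shape: "How many <entity> are in <place>?"
PATTERN = re.compile(r"how many (\w+) are in (\w+)\??", re.IGNORECASE)

def nl_to_sql(question):
    m = PATTERN.match(question.strip())
    if not m:
        raise ValueError("question not understood")
    entity, place = m.groups()
    # Parameterise the value to avoid injecting user text into the SQL
    return f"SELECT COUNT(*) FROM {entity} WHERE city = ?", (place,)

print(nl_to_sql("How many customers are in Boston?"))
# ('SELECT COUNT(*) FROM customers WHERE city = ?', ('Boston',))
```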
Even though the results of the pilot study show the advantage of the new design over the current search interface, the application of overview/preview in the prototype is only one possible way in which an online image collection can adopt these design guidelines. Design and testing of previews and overviews for other image retrieval systems with different user groups are needed in order to better answer the research questions. In addition, in the redesign, different approaches have been taken in order to deal with several new issues brought to our attention during the usability test, e.g.
understanding, computational semantics, WordNet, word sense disambiguation, semantic role labeling, RTE and paraphrase, MUC information extraction, and events/temporal. We then plotted p̂(z ∈ S | y), the sum of the proportions per year for these topics, as shown in Figure 3. The steep decrease in semantics is readily apparent. The last few years have shown a levelling off of the decline, and possibly a revival of this topic; this possibility will need to be confirmed as we add data from 2007 and 2008.
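A sketch of that computation and plot, with entirely synthetic per-year topic proportions and assumed topic indices standing in for the real model output.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in: per-year topic proportions p(z|y); each row sums to 1.
years = np.arange(1980, 2007)
rng = np.random.default_rng(0)
p_topic_given_year = rng.dirichlet(np.ones(40), size=len(years))

S = [3, 7, 11, 19]  # indices of the semantics-related topics (assumed)
p_S_given_year = p_topic_given_year[:, S].sum(axis=1)  # sum over the set S

plt.plot(years, p_S_given_year)
plt.xlabel("year")
plt.ylabel("p̂(z ∈ S | y)")
plt.title("Summed proportion of selected topics per year")
plt.show()
```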
As vast amounts of digital image data are archived by advanced libraries, efficient search methodologies are required to make them accessible according to clients' data requirements. For their retrieval, it is imperative to recognize their contents. Current technologies for optical character recognition (OCR) and document analysis do not handle such documents adequately because of recognition errors: these challenges leave the computer unable to recognize the characters while reading them. In this paper, we propose an effective word-image matching scheme that achieves high performance in the presence of image noise, degradation, and word-form variants. Initially, each image in the image database is pre-processed. In the next step, a contour-finding method is used to detect blobs, which are then passed to the Tesseract engine. Tesseract segments the characters from the image and stores them in a character database. Each word in the database is used to index a given set of images. During retrieval, the query word presented to the system is matched against the characters in the database, and all images containing instances of the query word are retrieved and presented to the user. Using this approach, our method is able to successfully handle images with different font styles and sizes and heavily touching characters. From the experimental results on a variety of image databases, it is observed that the extraction of text from the images is mostly accurate and that position-based indexing of words works as intended.
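A minimal sketch of the described pipeline using OpenCV and pytesseract; the input file name, the noise threshold, and the page-segmentation mode are assumptions.

```python
import cv2
import pytesseract

# Pre-process the page image, find contours (blobs), and let Tesseract
# read each blob; keep the recognised word with its position for indexing.
img = cv2.imread("page.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

index = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h < 100:   # skip specks of noise (threshold is an assumption)
        continue
    blob = gray[y:y + h, x:x + w]
    word = pytesseract.image_to_string(blob, config="--psm 8").strip()  # one word
    if word:
        index.append((word, (x, y, w, h)))

print(index)
```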
There are many factors that must be considered when designing the user interface of software, because the user must be able to interact with the system in a way the system will understand, whatever input the user gives. Therefore, the quality of the interface, and of the software in general, must pass the usability testing standard. Usability factors such as fitness for use, ease of learning, task efficiency, ease of remembering, subjective satisfaction, and understandability are all taken into consideration when designing the user interface (Figure 3).
Information retrieval (IR) is a widely studied problem, but there are still many areas in IR that need to be addressed. Since natural language is highly ambiguous, removing the intrinsic ambiguity of the query forms an inherent part of any information retrieval system. Ambiguity may lie in names (synonymy, polysemy, etc.) or in other parts of the sentence. Cross-lingual information retrieval is another promising area, where the task is to retrieve documents in languages other than the query language; to be precise, search engines have to retrieve documents in any language provided they are relevant to the query. Such search engines are generally regarded as semantic search engines, which retrieve documents that are semantically related to the query. An extension of the traditional information retrieval system is the web information retrieval system, which retrieves relevant web pages for an input query. Research areas in information retrieval include query expansion, index creation and maintenance, information retrieval models, etc.
It has been demonstrated that cluster-based information retrieval can be helpful for improving retrieval effectiveness (Kang, Na, Kim, & Lee, 2007), and that cluster-based document browsing is more effective than a single merged list (Crestani & Wu, 2006). Crestani and Wu's 2006 study demonstrates that the cluster hypothesis continues to be applicable in heterogeneous distributed information retrieval environments, and that creating hierarchical clusters is highly effective for presenting retrieved results in such environments. However, findings from the use of cluster-based IR systems are not always absolute. Voorhees (1985) reported that cluster-based retrieval does not produce a full ranking of the document collection and is thus not amenable to the creation of recall and precision graphs.
There was a loss of synchronization when opening and closing the valves in the air- and gas-supply channels. The concentration of the output flow was constantly changing in the chemical-purification channel, and this led to the activation of the blocking system. The PI-controller model is validated on the example of regulating the speed of a DC motor (valve drive).
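A minimal discrete-time sketch of such a PI speed loop against a first-order motor model; the gains, time constant, and setpoint are illustrative assumptions.

```python
# Discrete PI control of a first-order DC-motor speed model.
kp, ki = 2.0, 1.5                 # proportional and integral gains (assumed)
dt, tau, gain = 0.01, 0.5, 1.0    # time step, motor time constant, motor gain
setpoint, speed, integral = 100.0, 0.0, 0.0

for _ in range(2000):             # simulate 20 s
    error = setpoint - speed
    integral += error * dt
    u = kp * error + ki * integral            # PI control law
    # first-order motor response: tau * dspeed/dt = gain * u - speed
    speed += dt * (gain * u - speed) / tau

print(round(speed, 2))            # settles near the 100.0 setpoint
```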
Abstract: The Web is a huge source of information; a number of Internet users visit different web sites and extract the data they require. That is the direct source of information used by the end client. On the other hand, some additional data is generated on the parked-domain web server, which is used by the web site administrator for deciding future business trends and planning future services. That essential information is recovered from the web server log files, and knowledge extraction from these raw files is also called web usage mining. In the presented work, web usage mining is investigated and a new data model for web recommendation is reported. In order to develop the proposed recommender system, the user-session web access log data is accessed and classified in a time-based fashion. This kind of analysis demonstrates the users' web browsing behaviour in different time slots. Accordingly, based on the analysis of user behaviour in different time domains, a predictive model, namely a hidden Markov model, is applied to the recovered data; it uses probability estimation techniques to find new navigational web access trends. The proposed data model is implemented using the Visual Studio environment, and the performance of the predictive algorithm is computed. The performance of the implemented system is evaluated in terms of accuracy, memory consumption, error rate, and time consumption. According to the obtained results, the presented technique improves performance as the training data increases.
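As a hedged sketch of the predictive step, a first-order Markov chain over page visits stands in for the paper's full hidden Markov model; the sessions below are invented for illustration.

```python
from collections import Counter, defaultdict

sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "blog", "products", "cart"],
    ["home", "products", "products", "checkout"],
]

# Count page-to-page transitions across all user sessions
transitions = defaultdict(Counter)
for s in sessions:
    for prev, nxt in zip(s, s[1:]):
        transitions[prev][nxt] += 1

def next_page_probs(page):
    """Maximum-likelihood distribution over the next page given the current one."""
    counts = transitions[page]
    total = sum(counts.values())
    return {nxt: c / total for nxt, c in counts.most_common()}

print(next_page_probs("products"))
# {'cart': 0.5, 'products': 0.25, 'checkout': 0.25}
```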
an attempt has been made to write the programs in as much a structured form as possible, so that understanding of the basic philosophy behind the programs and the system is easy and clear and one can translate the programs into other languages without much difficulty. Or to increase the efficiency, one can write the programs in more than one language, and then have the load
The exponential increase in the number of nodes in a MANET calls for proper management; hence the MANET is organized into groups called clusters, each with its own leader called the cluster head (CH) [22,23]. The cluster head works as a certificate authority [2,3] for its own cluster and manages all operations related to communication, such as information about each cluster node, node mobility, etc. From a security point of view, clustering plays an important role in MANETs. Traditional information retrieval systems have several drawbacks in common, such as delays in information updating. The need to secure communication in a MANET is extremely challenging because of the dynamic nature of the network and the lack of centralized management. A distributed certificate authority intended for a cluster-based architecture is discussed in this paper. Certificates are used for node authentication, and session keys play an important role in secure communication.