information on the accuracy of the approximation operators, i.e., upper approximation operators that yield smaller values achieve higher accuracy and therefore induce more accurate approximations of a concept.
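
For reference, this claim can be read through the standard Pawlak accuracy measure (a textbook definition, not a formula quoted from this article):

$$\alpha_R(A)=\frac{|\underline{R}(A)|}{|\overline{R}(A)|},$$

so, with the lower approximation fixed, an upper approximation operator that yields a smaller set $\overline{R}(A)$ produces a larger accuracy $\alpha_R(A)$, i.e., a tighter approximation of the concept $A$.
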
The outline of this article is as follows. In Section 2, we present preliminary concepts. In Section 3, we introduce nine neighborhood operators and discuss the partial order relations of twenty-two neighborhood operators. In Section 4, the partial order relations of seven new approximation operators with existing rough set approximation operators are discussed. Finally, conclusions and future work are outlined in Section 5.

of information systems. This paper presents a new approach based on notions of the mathematical theory of rough sets to solve this problem. Using these concepts, a systematic approach has been developed to reduce the size of a decision database and extract a reduced rule set from vague and uncertain data. The method has been applied to an empirical medical database with a large data size, and the final reduced and core rules have been extracted using the concepts of this theory.

Medical data classification is a major element of many decision-making tasks. Decision-making tasks are instances of the classification problem and can easily be formulated as prediction, diagnosis and pattern recognition tasks. To avoid the risks of decision-making, we need computerized decision-making techniques. In this work, we propose neighborhood rough set based classification (NRSC) for medical diagnosis. In previous studies, neighborhood rough set theory was widely applied for feature selection, but none of the approaches have been adopted completely for medical diagnosis. The proposed method is applied to five medical data sets, and the efficiency of NRSC is evaluated against five different classification algorithms. The efficiency of the classification algorithm is validated using six performance measures. The performance results confidently demonstrate that the neighborhood rough set based classification method is very effective for medical data classification. Furthermore, NRSC delivered good results over Pawlak's rough set (neighborhood threshold 0), BPN, MLP, SVM and KNN. The result is intensely important in decision
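
To make the idea concrete, here is a minimal sketch of neighborhood rough set style classification: a sample's δ-neighborhood is gathered and the decision is taken by majority vote over that neighborhood. The radius parameter `delta`, the distance choice and the function names are illustrative assumptions, not the article's implementation; with `delta = 0` the neighborhood collapses to samples with identical feature values, i.e., the Pawlak case mentioned above.

```python
import numpy as np

def delta_neighborhood(X, i, delta):
    """Indices of samples whose Euclidean distance to sample i is <= delta.

    With delta == 0 this reduces to samples with identical feature values,
    i.e., Pawlak's equivalence-class view of the data.
    """
    dists = np.linalg.norm(X - X[i], axis=1)
    return np.where(dists <= delta)[0]

def nrs_classify(X_train, y_train, x_new, delta=0.2):
    """Assign x_new the majority label of its delta-neighborhood (illustrative sketch)."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    neighbors = y_train[dists <= delta]
    if neighbors.size == 0:          # empty neighborhood: fall back to the nearest sample
        return y_train[np.argmin(dists)]
    labels, counts = np.unique(neighbors, return_counts=True)
    return labels[np.argmax(counts)]
```
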

Keywords: rough sets; classification; data mining
1. Introduction
Data preprocessing is one of the first and most critical steps in data mining or data analysis. The results of data preprocessing are fed directly into the mining model and determine the final results. A good data source can not only increase the accuracy of mining but also raise the efficiency of the algorithm dramatically. In general, data preprocessing refers to data cleaning, data integration, data transformation, data reduction, etc., performed before a data mining algorithm is applied. The techniques of data mining draw on many areas such as mathematics, computer science, statistics, artificial intelligence and computer vision, and different application domains need different preprocessing functions. Luqing proposes an experience-based statistical method that can transform attributes expressed as nouns [1]. Wang Da-lin, Yu Ge and Bao Yu-bing propose a novel data preprocessing algorithm oriented to domain knowledge, which introduces domain knowledge into the algorithm to decrease the quantity of data resources [2]. Yang Yang, Liu Feng and Zhang Tian-ge developed a new method to extract features based on conflict analysis, which effectively eliminates attributes redundant for classification [3]. Targeting the characteristics of the data sets, Huang Rong-wei and Li Wen-jing discretize the data using rough set theory [4, 5]. Tang Jian-guo and Tan Ming-shu developed a rule extraction method for uncertain environments using rough sets, concept spaces and inclusion degrees [6, 7]. In summary, the methods mentioned above can increase the efficiency of classification modeling by preprocessing the original data.

between pairs of data samples (i.e., instances). Fuzzy rough sets are mainly used in classification to address the inconsistency between features and decision labels, i.e., some samples have similar feature values but different decision labels. The inconsistency can be measured by using the lower approximation in fuzzy rough sets to assign a membership to every sample with respect to the decision labels. By keeping the membership of every sample unchanged, fuzzy rough set based feature selection, usually called attribute reduction, can remove redundant or irrelevant features to find an informative feature subset [6]. Despite the extensive investigation in the literature, most existing methods of fuzzy rough set based feature selection are restricted to batch processing, which handles all samples of a data set at once. Quite often this is uneconomic, and even impractical for large data sets that easily exceed the memory capacity. This reveals one weakness of those batch algorithms in terms of runtime. New feature selection algorithms are thus needed that scale well with the increase of data size [7], [8]. Incremental feature selection has recently been explored to deal with the case in which data arrive sequentially (that is, dynamically) or a large data set has to be cut into small subsets, which are then presented sequentially. There are some state-of-the-art methods for incremental feature selection based on rough sets. Although these incremental methods are more efficient than batch feature selection methods based on rough sets, they do not provide an essential insight into the incremental mechanism of fuzzy rough set based feature selection from the viewpoint of the successive arrival of sample subsets.
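
As a minimal sketch of the membership computation mentioned above, assuming a Gaussian-kernel fuzzy similarity relation and the Kleene-Dienes implicator (neither the relation nor the implicator is specified in the excerpt):

```python
import numpy as np

def fuzzy_similarity(X, sigma=1.0):
    """Pairwise fuzzy similarity R(x, y) in [0, 1] (Gaussian kernel, an assumed choice)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.exp(-(d ** 2) / (2 * sigma ** 2))

def lower_approx_membership(R, y):
    """Membership of each sample in the lower approximation of its own decision class.

    Implicator-based definition mu(x) = min_y I(R(x, y), A(y)) with the
    Kleene-Dienes implicator I(a, b) = max(1 - a, b); A(y) = 1 iff y shares x's label.
    """
    same_class = (y[:, None] == y[None, :]).astype(float)
    return np.min(np.maximum(1.0 - R, same_class), axis=1)
```

Attribute reduction then searches for a smaller feature subset whose similarity relation leaves these membership values (and hence the fuzzy positive region) unchanged.
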

across a wide range of areas including Industrial Automation, Robotics, Logistics and Transport, Avionics, Arts and Media, and Virtualization, and it will also be prominent in the future. Intelligent techniques include ANNs, fuzzy logic and fuzzy sets, genetic algorithms and other soft computing methods. Rough set theory is an important concept that can also open a new horizon in this area. The theme of this paper is a discussion of rough set theory, and some potential industrial applications are drafted. Rough set theory deals with inadequate data and gives approximate output. The discussion covers the applications of AI, a conceptual overview of rough set theory, and its application to educational data classification.

Each attribute has a different effect on the decision results of an information system. As usual, we assign an attribute its importance depending on the increment observed after removing the attribute: the larger the increment, the greater the attribute importance. The attribute weighting method based on rough set theory works mainly through the attribute importance of the decision table. The relevant definitions follow.
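
One standard way to make this precise is the usual dependency-based significance from rough set theory (the paper's own definitions, which follow in the original text, may differ in detail):

$$\gamma_C(D)=\frac{|\operatorname{POS}_C(D)|}{|U|},\qquad \operatorname{sig}(a,C,D)=\gamma_C(D)-\gamma_{C\setminus\{a\}}(D),$$

so the weight assigned to attribute $a$ grows with the drop in the dependency degree $\gamma$ caused by removing $a$ from the condition attribute set $C$.
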

Feature subset selection is a data preprocessing step for pattern recognition, machine learning and data mining. In real-world applications, an excess of features in the training data may significantly slow down the learning process and may increase the risk that the learning classifier overfits redundant features. Fuzzy rough sets play a prominent role in dealing with imprecision and uncertainty. Some problem domains have motivated the hybridization of fuzzy rough sets with kernel methods. In this paper, the exponential kernel is integrated with the fuzzy rough set approach, and an exponential kernel approximation based fuzzy rough set method is presented for feature subset selection. Algorithms for feature ranking and reduction based on fuzzy dependency and exponential kernel functions are presented. The performance of the exponential kernel approximation based fuzzy rough set is compared with the Gaussian kernel approximation and neighborhood rough sets for feature subset selection. Experimental results demonstrate the effectiveness of the exponential kernel based fuzzy rough set approach for feature selection in improving the classification accuracy compared to the Gaussian kernel approximation and the neighborhood rough set approach.
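
A minimal sketch of the idea, assuming the exponential kernel takes the form k(x, y) = exp(-||x - y|| / σ) and is used directly as the fuzzy similarity relation, with features ranked by the resulting fuzzy dependency; the kernel form, σ, the Kleene-Dienes lower approximation and the single-feature ranking loop are illustrative assumptions rather than the paper's exact algorithm:

```python
import numpy as np

def exp_kernel_relation(X, sigma=1.0):
    """Fuzzy similarity relation from an exponential kernel (assumed form exp(-||x - y|| / sigma))."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.exp(-d / sigma)

def fuzzy_dependency(R, y):
    """Fuzzy dependency of the decision on the features inducing relation R.

    Lower approximation with the Kleene-Dienes implicator, averaged over samples.
    """
    same = (y[:, None] == y[None, :]).astype(float)
    lower = np.min(np.maximum(1.0 - R, same), axis=1)
    return lower.mean()

def rank_features(X, y, sigma=1.0):
    """Rank each feature by the fuzzy dependency it achieves on its own (illustrative forward ranking)."""
    scores = [fuzzy_dependency(exp_kernel_relation(X[:, [j]], sigma), y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1]
```
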

From Proposition 3.1 we easily observe that deleting a reducible element in a neighborhood system will not generate any new reducible elements or make other originally reducible elements become irreducible elements of the new neighborhood system. Thus we can obtain the reduction of a neighborhood system of a universe U by deleting all reducible elements at each point at the same time, or by deleting one reducible element at each point per step. The remainder is still a neighborhood system of the universe U, and it is irreducible. Thus we give the definition of neighborhood system reduction as follows:

In this paper, a new framework integrating Rough Sets (RS) and Bayesian Networks (BN) is proposed to analyze information from a traffic accident database. In the proposed framework, RS attribute reduction is first employed to generate the key set of attributes affecting accident outcomes, which are then fed into a BN structure as nodes for BN construction and accident outcome classification. This framework combines the advantages of RS in knowledge reduction and of BN in describing interrelationships among different attributes. The framework is demonstrated using the 100-car naturalistic driving data from the Virginia Tech Transportation Institute to predict the accident type. Comparative evaluation with the baseline BNs shows that the RS-based BNs generally have higher prediction accuracy and lower network complexity, with comparable prediction coverage and ROC curve area, which demonstrates that the proposed RS-based BN overall outperforms the BNs with and without traditional feature selection approaches. Also, the most significant attributes identified as affecting accident types include the pre-crash manoeuvre, the driver's attention from the forward roadway to the centre mirror, the number of secondary tasks undertaken, the traffic density, and the relation to a junction. Most of these attributes concern pre-crash driver states and driver behaviours that have rarely been studied in the existing BN-based literature [18-23], which could give further insight into the nature of traffic accidents.

According to the three evaluation criteria, we conclude that the BRSC and the BRSC-GDT are more accurate than the pruned BDT. This increased accuracy can be easily explained: our classification techniques based on rough sets try to reduce the UDT without affecting the classification task. Our experiments also show that the BRSC is more accurate than the BRSC-GDT. We further conclude that the post-pruned BDT gives more combined decision rules than the BRSC; this is due to the pruning, which can reduce the size of the BDT. However, the model for the BRSC-GDT is smaller than the pruned BDT and the BRSC. This gain is due to the fact that the BRSC-GDT selects only the best and non-contradictory decision rules. Finally, we also conclude that our new classification approaches based on rough sets are faster than the post-pruned BDT. This positive result is due to the heuristic method used in the construction of the BRSC and the BRSC-GDT, which produces only one reduct from our UDT. On the other hand, pruning increases the time required to build the BDT. Furthermore, the BRSC-GDT, which can avoid many iterations, is slightly faster than the BRSC. Hence, we can summarize our conclusions as follows:

) which presents the structure and describes relationships between samples. The kernel matrix plays an important role in kernel learning algorithms, as it contains all the information available to perform further learning: the learning algorithm relies on information about the training data available through the kernel matrix. On the other hand, there are also two modules in the rough set methodology: (a) granulation of data (samples) into a set of information granules according to the relation between objects and (b) approximate classification realized in the presence of such induced information granules. The rough set methodology helps extract a relation (relation matrix) between samples and subsequently granulates the set of objects into a set of information granules according to the relation between objects. The objects in a granule are indistinguishable in terms of this relation. Then the information granules induced by the relation are used to approximate the classification of the universe. Obviously, relations and relation matrices form the fundamentals of rough set models. They play the same conceptual role in rough sets as the kernel matrix does in kernel machines. The types of rough set models are determined by the algorithms used to extract the relationship between samples. For example, the generic rough set model uses an equivalence relation to partition the samples into disjoint equivalence classes [17]; neighborhood rough sets group the samples into different neighborhood information granules [18]; fuzzy rough sets segment the universe with a fuzzy relation into a set of fuzzy granules and approximate fuzzy sets with these fuzzy granules [19-23,55,57,58]. We can find a high level of similarity between kernel methods and rough set algorithms if we take the kernel matrix as a relation matrix, or consider the relation matrix as a kernel matrix. In fact, one can show that most relation matrices used in the existing rough set models satisfy the conditions of kernel functions: they are positive-semidefinite and symmetric. At the same time, kernel matrices are symmetric and some of them are reflexive [24,25]. This means that some kernel matrices could be used as fuzzy relation matrices in fuzzy rough sets. Taking this into account, we can form a bridge between rough sets and kernel methods through the relation matrices.
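
A small sketch of the correspondence described above, using a Gaussian kernel (chosen here purely for illustration) as the fuzzy relation matrix; the fuzzy "granule" of a sample is simply its row of the relation matrix:

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """Kernel matrix K with K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    K is symmetric, positive semidefinite, reflexive (K[i, i] = 1) and valued in [0, 1],
    so it can be read directly as a fuzzy similarity relation matrix.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.exp(-(d ** 2) / (2 * sigma ** 2))

X = np.random.rand(5, 3)
K = gaussian_kernel_matrix(X)

# Row i of K is the fuzzy information granule (fuzzy neighborhood) of sample i.
granule_0 = K[0]
print(np.allclose(K, K.T), np.allclose(np.diag(K), 1.0))  # symmetric and reflexive
```
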

In this paper, we built a connection between rough sets, fuzzy sets and lattices. First, we introduced a new congruence relation induced by a fuzzy ideal of a distributive lattice, and then we presented a definition of the lower and upper approximations of a subset of a distributive lattice with respect to a fuzzy ideal. Some properties of rough subsets in distributive lattices were investigated. Finally, we showed that the notions of rough sublattices (ideals, filters) and rough fuzzy sublattices (ideals, filters) are extensions of sublattices (ideals, filters) and fuzzy sublattices (ideals, filters), respectively.

Logistics outsourcing has rapidly become a new enterprise operation strategy, with advantages including reducing operating costs, strengthening core competitiveness, speeding up organizational reconstruction, and improving enterprises' reaction speed. However, enterprises' logistics outsourcing is exposed to a variety of risks because of factors such as the uncertainty of the external environment, changes in the market, and the enterprise's risk decision and risk management abilities. Therefore, how to make effective logistics outsourcing decisions for enterprises has become a research subject that needs to be solved urgently. To ensure the successful completion of enterprises' logistics outsourcing, logistics outsourcing risks must be correctly evaluated.

VI. CONCLUSION
We present a database operation based rough set approach for constructing an ensemble of classifiers. Most rough set based systems do not integrate with database systems; many computationally intensive operations such as generating the core, computing reducts and inducing rules are performed on flat files, which limits their applicability to large data sets in data mining applications. In this paper we present a database operation based rough set approach. We borrow the main ideas of rough set theory and redefine them on the basis of database theory to take advantage of very efficient set-oriented database operations. We propose a novel context-sensitive measure for feature ranking and present a new set of algorithms to calculate the core, reducts and induced rules based on our new database-based rough set model. Almost all the operations used in generating the core, reducts, etc. in our method can be performed using database set operations such as Count and Projection. Our rough set based approach is designed around database set operations; compared with the traditional rough set based data mining approach, our method is very efficient and scalable.
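
A minimal sketch of the set-oriented idea, using pandas `drop_duplicates` as the Projection operation and row counts as Count; the consistency test and the loop over attributes follow the common rough set definition of the core and are not necessarily the paper's exact algorithms:

```python
import pandas as pd

def is_consistent(df, cond_attrs, dec_attr):
    """True if the condition attributes functionally determine the decision:
    the projection onto cond_attrs has as many distinct rows as the projection
    onto cond_attrs + decision (Projection = drop_duplicates, Count = shape[0])."""
    cond_attrs = list(cond_attrs)
    if not cond_attrs:                       # no condition attributes left
        return df[dec_attr].nunique() == 1
    n_cond = df[cond_attrs].drop_duplicates().shape[0]
    n_full = df[cond_attrs + [dec_attr]].drop_duplicates().shape[0]
    return n_cond == n_full

def core_attributes(df, cond_attrs, dec_attr):
    """Core = attributes whose removal breaks consistency (assumes the full table is consistent)."""
    return [a for a in cond_attrs
            if not is_consistent(df, [b for b in cond_attrs if b != a], dec_attr)]
```
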

the process of reducing attributes and the process of selecting samples that are discernible by those attributes. In [38], it has been successfully applied to data streams, where data samples arrive consecutively. It may seem that the bireduct method resembles ASS-IAR, since both share the scheme of adding/removing samples and attributes. However, there are mainly the following differences between bireduct and ASS-IAR. The first is that they select samples in different fashions. Bireduct adds the newly joined sample while removing the oldest samples (i.e., samples of the current dataset) that cannot be discerned from the new sample using the attribute set of the temporal bireduct. ASS-IAR, which employs our active sample selection to evaluate each newly joined sample based on its usefulness, filters out useless incoming samples and selects useful incoming samples for the incremental computation. The second is that they reduce attributes in different modes. Bireduct selects a minimal attribute subset discerning the sample set of a temporal bireduct, i.e., a reduct for the sample set in the bireduct. ASS-IAR adds attributes when the current reduct is incapable of keeping the consistency of the new dataset (i.e., the dataset after adding the newly joined sample), and removes attributes of the current reduct made redundant by that addition. The third is that the attribute subset obtained by the bireduct method is a temporal reduct, while the reduct obtained by ASS-IAR is a reduct of the whole dataset.

Some information systems may have no core attributes, yet core attributes are the foundation of existing attribute reduction algorithms based on mutual information. To solve this problem, the author puts forward a new method to measure the importance degree of an attribute and constructs a corresponding heuristic reduction algorithm. The proposed algorithm takes into account not only the increment of mutual information after adding an attribute, but also the attribute's own information entropy, which can significantly decrease the rate at which important attributes are treated as redundant and removed. The experimental results show that the algorithm can not only perform attribute reduction for information systems without a core, but also obtain the reduct faster and with fewer attributes than existing algorithms.
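
To illustrate the kind of measure described (the exact weighting is not given in this excerpt, so the additive combination and the weight `lam` below are assumptions), one can score a candidate attribute by the mutual information gain it adds to the current reduct plus a term for its own entropy; attributes and decision are assumed discrete or discretized:

```python
import numpy as np

def entropy(cols):
    """Shannon entropy (bits) of the joint distribution of the given discrete columns (2-D array)."""
    _, counts = np.unique(cols, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def mutual_information(cols, y):
    """I(cols; y) = H(cols) + H(y) - H(cols, y) for discrete data."""
    y = np.asarray(y).reshape(-1, 1)
    return entropy(cols) + entropy(y) - entropy(np.hstack([cols, y]))

def significance(data, reduct, a, y, lam=0.5):
    """Score of candidate attribute a: mutual-information gain over the current reduct
    plus lam times the attribute's own entropy (illustrative combination, not the paper's exact measure)."""
    candidate = data[:, list(reduct) + [a]]
    base = mutual_information(data[:, list(reduct)], y) if reduct else 0.0
    gain = mutual_information(candidate, y) - base
    return gain + lam * entropy(data[:, [a]])
```
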

This paper establishes a type of application system model based on rough sets that so far achieves only a small number of targets for the analysis system of network behavior research, and reports some initial, periodical achievements. This provides a good foundation for the subsequent decision support system for the management of network behavior. More study and practice are still needed on algorithm realization and system optimization, such as improving the parallel processing of rough analysis. As an emerging soft computing technique, rough sets have been valued in many fields, for example computer science, mathematics, artificial intelligence and control. A great variety of research achievements have been produced and widely applied in many fields. With further research and development of rough sets, more new research problems and directions will come into being.

6 Conclusion and future work
In this paper, we have studied a general implicator-conjunctor based model for the lower and upper approximation of a fuzzy set under a binary fuzzy relation. We reviewed models from the literature that can be seen as special cases, and enriched the existing axiomatic approach with a new notion of T-coupled pairs of approximations, which characterize the operations satisfying all relevant properties of classical rough sets, i.e., left-continuous t-norms and their R-implicators. An important challenge is to extend the formal treatment to noise-tolerant fuzzy rough set models, such as those studied in [23-29]. Observing that the implicator-conjunctor based approximations are sensitive to small changes in the arguments (for instance, because of their reliance on inf and sup operations), many authors have proposed models that are more robust against data perturbation. However, this normally comes at the expense of the properties the corresponding fuzzy rough set model satisfies.
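
For reference, the implicator-conjunctor based approximations of a fuzzy set $A$ under a binary fuzzy relation $R$ on a universe $U$ take the standard form from the fuzzy rough set literature, consistent with the description above:

$$(\underline{R}_{\mathcal{I}}A)(x)=\inf_{y\in U}\mathcal{I}\big(R(x,y),A(y)\big),\qquad (\overline{R}_{\mathcal{C}}A)(x)=\sup_{y\in U}\mathcal{C}\big(R(x,y),A(y)\big),$$

where $\mathcal{I}$ is an implicator and $\mathcal{C}$ a conjunctor; the inf and sup are exactly the operations that make these approximations sensitive to small perturbations of $R$ or $A$.
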
