Cytochrome P450 2C9 (CYP2C9) is one of the most important phase I metabolising enzymes in humans for many therapeutically relevant pharmaceuticals. Any new chemical candidate inhibiting this membrane-associated heme protein would thus significantly affect the metabolism of physiologically important molecules and drugs, resulting in clinically significant drug-drug interactions. In the search for computational tools to identify potential CYP2C9 inhibitors early in drug discovery, we constructed a filter based on a collection of 1100 structurally diverse molecules tested for CYP2C9 inhibition under identical conditions. The chemical structures were encoded using several 2D descriptors, followed by the generation of different statistical models using support vector machines (SVM). This approach consistently leads to significant and predictive models for regression and classification of CYP2C9 inhibitors. Their predictive ability was underscored by successfully applying them to a test set of 238 compounds. Even more important for early drug discovery phases is the ability of these models to correctly discriminate CYP2C9 inhibitors from molecules inactive on this enzyme. This filter also allows extracting and visualising important ligand substructures and functional groups, which are essential for understanding protein-ligand interactions with CYP2C9. To validate the correct identification of essential functional groups connected to CYP2C9 affinity, predicted features from the SVM models for some local structure-activity series in the dataset were analysed in detail. Furthermore, applying these models to the substrate S-warfarin, which has recently been co-crystallised with CYP2C9, revealed that the identified substructures are involved in the interaction with the CYP2C9 inhibitor binding site. For example, the model correctly indicated the aromatic stacking interactions with Phe114 and Phe476 as well as a hydrogen bond with the backbone of Phe100.
Hence, these models consistently provide guidelines for reducing CYP2C9 inhibition in novel candidate molecules.
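As an illustration of the modelling setup described above, the sketch below trains an SVM classifier on synthetic stand-ins for 2D molecular descriptors. The data, descriptor count, and parameters are invented for illustration only; they are not the paper's actual dataset or settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for 2D molecular descriptors: "inactive" molecules
# (label 0) and "inhibitors" (label 1) drawn from shifted Gaussians.
X = np.vstack([rng.normal(0.0, 1.0, (200, 8)),
               rng.normal(1.5, 1.0, (200, 8))])
y = np.array([0] * 200 + [1] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

In practice the descriptor vectors would come from a cheminformatics toolkit rather than random draws, but the train/classify workflow is the same.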

An APC system depends on the physical measurement of significant properties of processed intermediate products (wafers), performed by appropriate and generally expensive metrology equipment [1]. Optimal monitoring of production quality through physical measurement of every single wafer after every process step is, from an economic point of view, far too expensive and time consuming. Current practice is therefore to sample only a statistically significant number of wafers for physical metrology. If a wafer is misprocessed at any piece of equipment, the impact on the final product can either be detected with a delay of some hours, if the wafer was measured directly after the process, or not until electrical measurement in the final wafer test. As the latter is more likely at typical wafer sampling rates, unnecessary waste of resources (e.g. materials, productive equipment, and employees' working time) is unavoidable. A Virtual Metrology (VM) system, which fills the gap left by missing physical measurements through prediction [2], [3], enables the measurement of every wafer at every process step on all capable equipment available in the fab, thus enabling significant improvement of process control as well as a reduction in operational cost. Moreover, wafer-fine metrology is a requirement for real-time quality monitoring and wafer-to-wafer process control, which is already within the scope of future developments [4]. Even for past processes, further analysis of the process result and of the prediction model can be carried out. To implement VM, Machine Learning (ML) algorithms are trained on the available historical data (i.e. process and equipment Fault Detection and Classification (FDC) parameters, context parameters, and physical metrology results) and then applied to input data from current production to predict the associated metrology outcome [5]. In contrast to physical metrology, VM thereby provides a calculated result which, due to the deterministic nature of the software algorithm used, is highly reproducible and repeatable at any time.
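The VM workflow above can be sketched as a regression problem: train on historical process parameters paired with physical metrology results, then predict the metrology outcome for new wafers. The feature names, coefficients, and data below are invented stand-ins, and SVR is chosen only as an example learner; the text does not specify a particular algorithm.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(1)
# Synthetic stand-in for FDC trace summaries (e.g. chamber pressure, RF power)
# and a physically measured target such as film thickness in nm; a real VM
# system would train on historical FDC and metrology data instead.
X = rng.normal(size=(300, 5))
thickness = 100 + X @ np.array([3.0, -2.0, 1.5, 0.5, 0.0]) + rng.normal(0, 0.5, 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, thickness, test_size=0.3,
                                          random_state=1)
vm_model = SVR(kernel="linear", C=10.0).fit(X_tr, y_tr)
print(f"R^2 on held-out wafers: {r2_score(y_te, vm_model.predict(X_te)):.3f}")
```

Because the trained model is deterministic software, repeating `vm_model.predict` on the same inputs always reproduces the same "measurement", as the text notes.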

The applications discussed in this paper demonstrate the effectiveness of machine learning approaches in virtual screening, which has some definite advantages over random selection. It is therefore evident that virtual screening can make important contributions to the drug discovery process. The application of machine learning is particularly beneficial when the objective is to reduce a large data set to a smaller chemical library. However, it must be noted that the efficiency of these methods depends entirely on the quality of the data set being used. Feature selection is also important for predictive model building. When feature selection and training of the model occur simultaneously, care should be taken that the statistical distribution of the data has been chosen appropriately.
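One common way to honour the caution above is to perform feature selection inside each cross-validation fold rather than on the full data set. The following sketch, on a synthetic data set, shows this with a scikit-learn pipeline; the data and parameter choices are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

# Placing feature selection inside the pipeline restricts it to each training
# fold, so the held-out fold never influences which features are chosen.
pipe = make_pipeline(SelectKBest(f_classif, k=5), SVC(kernel="rbf"))
scores = cross_val_score(pipe, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```

Selecting features on the full data set before splitting would instead leak information from the test folds and inflate the estimated accuracy.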

Gesture recognition has opened the gates for new methods of human-computer interaction even more powerful than graphical user interfaces based on mouse and keyboard. Gesture recognition enables humans to interact with machines without external mechanical devices. It is broadly classified into glove-based gesture recognition and vision-based gesture recognition. Glove-based gesture recognition has the drawback that it hides the naturalness of interaction, as it requires accessories in the form of devices to interact with machines [4]. Vision-based gesture recognition, in contrast, uses features from the visual image of a body part, such as the hand, and compares them with features extracted from a web camera [3].

The insensitivity parameter is adjusted to δ = 0.1 and the kernel is a standard Gaussian function with parameter σ = 1. Accuracy for the machine evaluated on the training vectors is 95% correct with 5% error, and all the training patterns are classified. The low insensitivity parameter δ = 0.1 causes all the data to be labelled, and as a result several errors can be observed. Accuracy results obtained on the test vectors are given in Table 3. Overall, the model makes a correct prediction on 247 patterns (90.15%), makes mistakes on 21 (7.66%), and assigns no label on 6 (2.19%). It can be concluded that SVMs are sensitive to the relative size of the classes, an inherent characteristic of any discriminant analysis.
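The reject option described above can be mimicked by thresholding the SVM decision function: patterns whose score falls inside a band of width δ around zero receive no label. The sketch below uses synthetic data and illustrative parameters (note that a Gaussian kernel with σ = 1 corresponds to gamma = 1/(2σ²) = 0.5 in scikit-learn's parameterisation); it does not reproduce the paper's actual data or exact rejection rule.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=4, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)

delta = 0.1                       # insensitivity band: no label inside it
clf = SVC(kernel="rbf", gamma=0.5).fit(X_tr, y_tr)  # gamma = 1/(2*sigma^2), sigma = 1
f = clf.decision_function(X_te)

labelled = np.abs(f) >= delta     # patterns outside the band receive a label
pred = (f >= 0).astype(int)
acc = (pred[labelled] == y_te[labelled]).mean()
print(f"labelled: {labelled.mean():.0%}, accuracy on labelled patterns: {acc:.1%}")
```

A larger δ rejects more borderline patterns and typically raises accuracy on the patterns that do get labelled, at the cost of coverage.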

Support Vector Machines (SVMs) are an efficient technique for data classification and prediction, working on the principle of supervised learning. As discussed, the kernel function plays an important role in classification by a support vector machine. Kernels are used to project the data points into higher dimensions for better classification of the data sets, as shown in Fig. 1. Some kernel functions present in the support vector machine algorithm are based on neural networks. A support vector machine is considered easier to use than a neural network, but it takes more time than a neural network [2]. The radial basis, polynomial, and sigmoid kernels of the support vector machine are used for non-linear separation and work on principles similar to those of neural networks.
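The effect of the kernel choice on non-linear separation can be seen on a small example: two concentric circles cannot be split by a line in the input space, but the implicit higher-dimensional projection of a non-linear kernel separates them. The data set and parameters here are illustrative assumptions.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the input space, but
# separable after the kernel's implicit projection into higher dimensions.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

results = {kernel: SVC(kernel=kernel).fit(X, y).score(X, y)
           for kernel in ("linear", "poly", "rbf", "sigmoid")}
for kernel, acc in results.items():
    print(f"{kernel:8s} training accuracy: {acc:.2f}")
```

The linear kernel stays near chance on this data while the radial basis kernel fits it almost perfectly, which is exactly the point of projecting into higher dimensions.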

The random subspace method [4], an example of a random sampling algorithm, incorporates the benefits of bootstrapping and aggregation. Multiple classifiers can be generated by training on multiple sets of features that are produced by bootstrapping, i.e., random sampling with replacement on the training features. Aggregation of the generated classifiers can then be implemented by the MVR or other multiple-classifier combination rules. For SVM-based RFs, overfitting is encountered when the training set is relatively small compared to the high dimensionality of the feature vectors. To avoid this overfitting issue, we sample a small subset of features to reduce the discrepancy between the size of the training data and the length of the feature vector. Exploiting this feature sampling step, we can make the kernel method operate satisfactorily. However, we cannot utilize the random subspace method directly, because the cotraining algorithm requires that the different subclassifiers be only weakly related. Consequently, in our new algorithm we randomly select the feature subsets without replacement to meet the requirements for multitraining.
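The sampling-without-replacement variant described above can be sketched as follows: each member SVM is trained on a small feature subset drawn without replacement, and predictions are combined by majority vote. The data set, subset size, and ensemble size are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=60, n_informative=15,
                           random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)

rng = np.random.default_rng(3)
members = []
for _ in range(11):
    # Sample a small feature subset WITHOUT replacement, so the member
    # classifiers stay only weakly related, as the cotraining step requires.
    idx = rng.choice(X.shape[1], size=15, replace=False)
    members.append((idx, SVC(kernel="rbf").fit(X_tr[:, idx], y_tr)))

# Majority-vote aggregation of the subclassifiers.
votes = np.stack([clf.predict(X_te[:, idx]) for idx, clf in members])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
print(f"ensemble accuracy: {(ensemble_pred == y_te).mean():.2f}")
```

Each member sees only 15 of the 60 features, which shrinks the gap between sample size and feature dimension that causes the overfitting discussed in the text.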

Experiments were conducted in MATLAB on a PC using artificial datasets. As shown in Fig. 1, the artificial experimental dataset contains 60 samples, with t = 0.7, R = 1, and C = 0.01. The positive class samples are shown with "+" and the negative class with "−", and the kernel is a linear function. The red line is for the interval-valued fuzzy support vector machine (IFSVM), the blue line for the fuzzy support vector machine (FSVM), and the pink line for the support vector machine (SVM). As can be seen from Fig. 1, although there are 4 negative samples scattered in the "+" class, they are more likely to be outliers because their distance from the other negative samples is larger. There are also 4 positive samples near the "−" class, but because they lie closer to each other, they are less likely to be outliers. The SVM obviously assigns the 4 positive samples to the negative class; FSVM corrects one more sample than SVM but still assigns the other 3 to the negative class, while IFSVM assigns all 4 samples to the positive class, which is more consistent with human judgment.

Research in the field of text categorisation involves classifying text documents into several categories predefined by the user. The objective of this project is to study the process of classifying emails by category using Support Vector Machine (SVM) software. Among the processes used is reading the email input data from the subject and body sections,

This paper presents a supervised approach for relation extraction. We apply Support Vector Machines to detect and classify the relations in the Automatic Content Extraction (ACE) corpus. We use a set of features including lexical tokens, syntactic structures, and semantic entity types for the relation detection and classification problem. Besides these linguistic features, we successfully utilize the distance between two entities to improve performance. In relation detection, we filter out negative relation candidates using an entity distance threshold. In relation classification, we use the entity distance as a feature for the Support Vector Classifier. The system is evaluated in terms of recall, precision, and F-measure, and errors of the system are analyzed with proposed solutions.
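The two uses of entity distance above, as a detection filter and as a classification feature, can be sketched on synthetic relation candidates. The features, threshold, and label rule below are invented stand-ins; a real system would extract lexical, syntactic, and entity-type features from the ACE corpus.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
n = 400
# Synthetic relation candidates: a few stand-in linguistic features plus the
# token distance between the two entities.
ling = rng.normal(size=(n, 6))
dist = rng.integers(1, 40, size=n)
label = ((ling[:, 0] + ling[:, 1] > 0) & (dist < 15)).astype(int)

# Detection: discard candidates whose entity distance exceeds a threshold.
keep = dist <= 20
# Classification: append the distance itself as an extra feature.
X = np.column_stack([ling[keep], dist[keep]])
clf = SVC(kernel="linear").fit(X, label[keep])
print(f"training accuracy after distance filtering: {clf.score(X, label[keep]):.2f}")
```

The threshold removes easy negatives before classification, while appending the distance lets the classifier weight nearby entity pairs more heavily.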

Classification techniques are widely used in data mining to classify data among various classes. They are used across different industries to easily identify the type and group to which a particular tuple belongs. Classification is a data mining (machine learning) technique used to predict group membership for data instances. It is a two-step process: the first step is model construction and the second step is model usage. There are many algorithms used for classification in data mining.
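The two steps can be sketched in a few lines, using an SVM as one of the many possible classification algorithms and a synthetic data set as a stand-in:

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [0, 5]],
                  cluster_std=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1 - model construction: learn a model from tuples with known classes.
model = SVC(kernel="linear").fit(X_tr, y_tr)

# Step 2 - model usage: predict group membership for previously unseen tuples.
print(f"held-out accuracy: {model.score(X_te, y_te):.2f}")
```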

A natural way of thinking about the multi-category classification problem is to decompose the problem into a series of binary classifications so that the traditional SVM can be applied; these are called the indirect methods (Weston and Watkins (1999)). One-versus-one and one-versus-all are two popular standard ensemble schemes. Thus, to solve a k-class problem, at least k support vector machines have to be created, with at least k rounds of optimization, each of which deals with a binary classification. A potential issue of the indirect methods is that each of the binary classification processes tends to become highly imbalanced as the number of categories increases; an imbalanced problem arises during classification when more sample points of one specific class than of the others exist. Thus, the standard SVM will be affected dramatically by the class with the larger sample size and will ignore the class with the smaller one. Consequently, the standard support vector machines become quite sensitive to highly imbalanced classification problems due to their mechanism of construction, and are prone to constructing classifiers with a potentially large bias toward majority classes over minority ones.
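The one-versus-all decomposition, and the 1:(k−1) imbalance it creates in each binary subproblem, can be made concrete with a small sketch; the data are synthetic, well-separated Gaussian clusters invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
k = 5
# k well-separated Gaussian clusters, one per class.
X = np.vstack([rng.normal(3 * c, 1.0, (60, 2)) for c in range(k)])
y = np.repeat(np.arange(k), 60)

# One-versus-all: k binary SVMs. Each binary problem pits 60 positives
# against 240 negatives -- the 1:(k-1) imbalance described in the text.
binary_svms = [SVC(kernel="rbf").fit(X, (y == c).astype(int)) for c in range(k)]
scores = np.stack([m.decision_function(X) for m in binary_svms], axis=1)
pred = scores.argmax(axis=1)
print(f"one-vs-all training accuracy: {(pred == y).mean():.2f}")
```

With well-separated classes the imbalance is harmless, but as k grows and classes overlap, each binary SVM increasingly favours its large negative side, which is the bias the paragraph warns about.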

Traditional extensions of the binary support vector machine (SVM) to multiclass problems are either heuristics or require solving a large dual optimization problem. Here, a generalized multiclass SVM is proposed, called GenSVM. In this method, classification boundaries for a K-class problem are constructed in a (K − 1)-dimensional space using a simplex encoding. Additionally, several different weightings of the misclassification errors are incorporated in the loss function, such that it generalizes three existing multiclass SVMs through a single optimization problem. An iterative majorization algorithm is derived that solves the optimization problem without the need for a dual formulation. This algorithm has the advantage that it can use warm starts during cross validation and during a grid search, which significantly speeds up the training phase. Rigorous numerical experiments compare linear GenSVM with seven existing multiclass SVMs on both small and large data sets. These comparisons show that the proposed method is competitive with existing methods in both predictive accuracy and training time, and that it significantly outperforms several existing methods on these criteria.
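The idea of a simplex encoding is to place the K class codes at the vertices of a regular simplex in the (K − 1)-dimensional space, so that all classes are mutually equidistant. The construction below is one standard way to obtain such coordinates; it illustrates the geometry only and is not claimed to be GenSVM's exact implementation.

```python
import numpy as np

def simplex_encoding(K):
    """K class-code vectors in R^(K-1) forming a regular simplex."""
    # Centre the K standard basis vectors of R^K; they then lie in the
    # (K-1)-dimensional mean-zero subspace, whose orthonormal basis we
    # take from the SVD of the centring matrix.
    E = np.eye(K) - np.full((K, K), 1.0 / K)
    _, _, Vt = np.linalg.svd(E)
    return E @ Vt[:K - 1].T          # shape (K, K-1)

U = simplex_encoding(4)
d = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
# All pairwise distances between distinct vertices are equal (sqrt(2) here).
print(np.round(d, 3))
```

Because every pair of class codes is the same distance apart, no binary subproblem is privileged, which is what lets a single optimization problem handle all K classes at once.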

In this paper, we study the problem of distributed inference for the linear support vector machine (SVM). The SVM, introduced by Cortes and Vapnik (1995), has been one of the most popular classifiers in statistical machine learning and finds a wide range of applications in image analysis, medicine, finance, and other domains. Due to the importance of the SVM, various parallel SVM algorithms have been proposed in the machine learning literature; see, e.g., Graf et al. (2005); Forero et al. (2010); Zhu et al. (2008); Hsieh et al. (2014) and an overview in Wang and Zhou (2012). However, these algorithms mainly focus on addressing the computational issue for the SVM, i.e., developing a parallel optimization procedure to minimize the objective function of the SVM defined on given finite samples. In contrast, our paper aims to address the statistical inference problem, which is fundamentally different. More precisely, the task of distributed inference is to construct an estimator for the population risk minimizer in a distributed setting and to characterize its asymptotic behavior (e.g., establishing its limiting distribution).

The support vector machine has been successful in a variety of applications. On the theoretical front, too, statistical properties of the support vector machine have been studied quite extensively, with particular attention to its Bayes risk consistency under some conditions. In this paper, we study somewhat basic statistical properties of the support vector machine yet to be investigated, namely the asymptotic behavior of the coefficients of the linear support vector machine. A Bahadur-type representation of the coefficients is established under appropriate conditions, and their asymptotic normality and statistical variability are derived on the basis of this representation. These asymptotic results not only help further our understanding of the support vector machine, but can also be useful for related statistical inferences.

The support vector machine (SVM) is a widely used tool for classification. Many efficient implementations exist for fitting a two-class SVM model. The user has to supply values for the tuning parameters: the regularization cost parameter and the kernel parameters. A common practice seems to be to use a default value for the cost parameter, often leading to the least restrictive model. In this paper we argue that the choice of the cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model. We illustrate our algorithm on some examples, and use our representation to give further insight into the range of SVM solutions.
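How strongly the cost parameter shapes the solution can be seen even with a coarse grid over C, which is the brute-force alternative to the path algorithm the paper proposes; the data set and grid below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, flip_y=0.1,
                           random_state=6)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=6)

# A coarse grid over the cost parameter C; the path algorithm in the text
# instead tracks the exact solution continuously in C at roughly the cost
# of a single fit.
Cs = (0.01, 0.1, 1.0, 10.0, 100.0)
n_sv, accs = [], []
for C in Cs:
    clf = SVC(kernel="linear", C=C).fit(X_tr, y_tr)
    n_sv.append(int(clf.n_support_.sum()))
    accs.append(clf.score(X_te, y_te))
    print(f"C={C:7.2f}  support vectors={n_sv[-1]:3d}  test accuracy={accs[-1]:.2f}")
```

Small C admits many margin violations (many support vectors, the "least restrictive" end), while large C tightens the margin, so leaving C at a default can land anywhere on this spectrum.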

For implementing the SVM, software called LIBSVM, by Chih-Chung Chang and Chih-Jen Lin, was used. LIBSVM is integrated software for support vector classification, regression, and distribution estimation (one-class SVM), and it supports multiclass classification [3, 4]. The goal of using LIBSVM is to identify positives so that the classifier can accurately predict the unknown data (i.e. testing data). The values from the testing file are fed into the LIBSVM tool for training and predicting the data set, and analysis is done [3].
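As a minimal sketch of this workflow, scikit-learn's `SVC`, which is itself built on LIBSVM, follows the same train-then-predict pattern as feeding separate training and testing files to the LIBSVM tools; the Iris data set is used here purely as a stand-in.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# The split mimics LIBSVM's separate training and testing files.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

model = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)  # cf. svm-train
pred = model.predict(X_test)                                           # cf. svm-predict
print(f"accuracy on the testing data: {(pred == y_test).mean():.2f}")
```

Multiclass problems like this one are handled automatically, matching LIBSVM's built-in multiclass support noted above.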