Classifying Aneugens Using Functional Data Analysis Techniques

In the field of genetic toxicology, the term aneugen denotes chemical or physical agents that cause chromosomes to malsegregate during division, resulting in altered DNA content in daughter cells. This form of chromosome damage can be detected in certain mammalian cell-based assays; however, the molecular mechanism(s) responsible for aneugenic effects are not apparent from these conventional tests. The responsible molecular initiating event (MIE) is nonetheless of interest to the pharmaceutical, chemical, and agro-chemical industries, because this knowledge can assist their efforts to design out such liabilities and/or avoid similar chemical structures altogether. This study evaluated the ability of several experimental biomarkers to identify the MIE of aneugens from the functional curves that originate from human TK6 cells exposed to fluorescent Taxol (Taxol 488) for four hours and co-treated with known aneugens over a range of concentrations. A large functional space of classifiers was evaluated using two stages of cross-validation. First, a wide space was searched using a variety of depth-based, area under the curve (AUC) summarized, and kernel methods to identify the top-performing models. The top models were then evaluated in a second stage of cross-validation to establish a mean error rate and log loss that approached their theoretical distributions.
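As a rough illustration of the two-stage cross-validation scheme described above, the sketch below nests a model search inside an outer error-estimation loop with scikit-learn; the data and candidate models are synthetic placeholders, not the study's Taxol-488 biomarker curves or its depth/AUC/kernel classifiers.

```python
# Minimal sketch of two-stage (nested) cross-validation for model screening.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Stage 1 (inner loop): search a space of candidate models/hyperparameters.
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
grid = GridSearchCV(
    pipe,
    param_grid={"clf__kernel": ["linear", "rbf"], "clf__C": [0.1, 1, 10]},
    cv=5,
)

# Stage 2 (outer loop): estimate generalization error of the selected
# model with an independent round of cross-validation.
outer_scores = cross_val_score(grid, X, y, cv=5)
print("mean error rate: %.3f" % (1 - outer_scores.mean()))
```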

Analysing functional genomics data using novel ensemble, consensus and data fusion techniques

Signal processing based methods: Although most feature selection methods apply association measures to the original, untransformed data, the normalised compression distance (see above) is not the only example of using transformed data to assess the similarity/distance between different variables. Subramani et al. showed that mathematical transforms from the field of signal processing, which are frequently used in image and video processing, can also provide benefits in the feature selection domain [172]. When analysing genes with the Haar wavelet power spectrum of the gene vectors, they observed significant differences between the spectra for different diagnostic outcome classes. Building upon these results, they developed a new class separation measure and computationally simple and fast methods for gene selection and clustering. Wavelet transforms have several advantages over other transforms in this context, as they provide a lossless signal transformation with results that have good localization properties in both time and frequency. In particular, the capacity of these transforms to spatially adapt to varying frequency behaviour can be exploited for class distinction tasks in data analysis. The authors use the Haar wavelet because it consists of very simple low- and high-pass filters and can be computed efficiently. A technical drawback of the Haar wavelet transform is that it only accepts a number of input points corresponding to an integer power of 2. To address this limitation, zeros can be appended at the right end of the input data, extending the lengths of the gene rows to the next largest number x = 2^n (a technique known as “zero-padding”). Using this approach, Subramani et al. modelled the values of each gene across multiple samples (i.e. the input matrix rows) as one-dimensional signals and applied the 1D Haar transform to them to compute a local wavelet power spectrum at different levels of detail of the signal decomposition (see [172] for details). Genes with a high difference in the average spectra across the subsets are selected as predictors for classification. When comparing the best-ranked genes on real-world data against those obtained using classical selection methods, the results revealed a high similarity. Therefore, the main benefit of this wavelet-based feature selection lies in the very efficient computation of the feature ranks.
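The following is a minimal reconstruction of the idea, not the authors' code from [172]: each gene signal is zero-padded to a power-of-two length, its Haar wavelet power is computed level by level, and genes are ranked by the difference between the class-wise spectra.

```python
# Sketch of wavelet-based gene ranking: zero-pad, 1D Haar transform,
# level-wise power spectrum, then score by class-spectrum difference.
import numpy as np

def haar_power_spectrum(signal):
    """Return the power (sum of squared detail coefficients) per level."""
    n = len(signal)
    padded_len = 1 << (n - 1).bit_length()      # next power of two
    s = np.zeros(padded_len)
    s[:n] = signal                               # zero-padding on the right
    powers = []
    while len(s) > 1:
        a = (s[0::2] + s[1::2]) / np.sqrt(2)     # low-pass (approximation)
        d = (s[0::2] - s[1::2]) / np.sqrt(2)     # high-pass (detail)
        powers.append(np.sum(d ** 2))
        s = a
    return np.array(powers)

def rank_genes(X, y):
    """X: genes-by-samples matrix, y: binary sample labels (0/1)."""
    scores = []
    for row in X:
        spec0 = haar_power_spectrum(row[y == 0])
        spec1 = haar_power_spectrum(row[y == 1])
        L = min(len(spec0), len(spec1))
        scores.append(np.abs(spec0[:L] - spec1[:L]).sum())
    return np.argsort(scores)[::-1]              # best-ranked genes first

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 24))                   # 100 genes, 24 samples
y = np.r_[np.zeros(12, int), np.ones(12, int)]
X[0, y == 1] += 2.0                              # make gene 0 discriminative
print(rank_genes(X, y)[:5])
```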

Classifying Parkinson's Disease patients based on resting state fMRI data, using penalised regression techniques

In EN with all features, an α value closer to zero was most often selected in the cross-validation process. Therefore, a penalty closer to ridge regression was applied within the groups, resulting in a high number of nonzero variables to use in the model. In SGL the α value was predetermined and very close to zero; the penalty within the groups therefore almost equalled ridge regression, and a relatively large number of predictors was selected. Elastic net shows more variation in the number of selected predictors, as it had the possibility of choosing different α values. Neither EN nor SGL left out whole groups of features from the analysis. Elastic net often made variable selections similar to those of sparse group lasso, without having the group information at its disposal, which indicates that the group information was not of much added value.
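A minimal sketch of the elastic-net selection step follows, with simulated data standing in for the fMRI features; scikit-learn's l1_ratio plays the role of the α above (values near zero approach ridge regression), and the nonzero coefficients are the selected predictors.

```python
# Cross-validation over the elastic-net mixing parameter, then count
# how many predictors survive with nonzero coefficients.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=100, n_features=500,
                           n_informative=20, random_state=0)

en = LogisticRegressionCV(
    penalty="elasticnet", solver="saga",
    l1_ratios=[0.05, 0.1, 0.5, 0.9],   # candidate mixing values
    Cs=5, cv=5, max_iter=5000,
).fit(X, y)

n_selected = int(np.sum(en.coef_ != 0))
print("chosen l1_ratio:", en.l1_ratio_[0], "| nonzero predictors:", n_selected)
```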

DIABETES DATA ANALYSIS USING MAPREDUCE AND CLASSIFICATION TECHNIQUES

KNN is a method used for classifying objects based on the closest training examples in the feature space. It is the most basic type of instance-based, or lazy, learning, and it assumes all instances are points in n-dimensional space. K-nearest neighbor is a supervised learning algorithm in which a new query instance is classified according to the majority category among its K nearest neighbors; the purpose of the algorithm is to classify a new object based on its attributes and the training samples, using the neighborhood as the prediction for the new query instance. A distance measure is needed to determine the “closeness” of instances: KNN classifies an instance by finding its nearest neighbors and picking the most popular class among them.
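A minimal illustration of these mechanics, Euclidean distance plus a majority vote on toy data:

```python
# Minimal k-nearest-neighbor classifier: distance in feature space,
# then a majority vote among the k closest training instances.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # "Closeness" via Euclidean distance to every training instance.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]            # indices of the k neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]          # most popular class

X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array(["low", "low", "high", "high"])
print(knn_predict(X_train, y_train, np.array([5.5, 5.0])))  # -> "high"
```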

Functional Analysis of Chemometric Data

In all the examples mentioned above, the data are curves derived from the observation of a functional variable. A branch of statistics known as Functional Data Analysis (FDA), which emerged as a generalization of the techniques of multivariate data analysis to the case of functional data, has been developed in recent years to analyze such data [4]. Relevant applications of FDA methodologies were first developed in fields such as economy (stock quotes), environment (temperatures and precipitations) and health sciences (degree of lupus and stress), among others [5]. Different nonparametric estimation approaches for FDA methodologies have also been developed in recent years [6]. Although the spectrum is clearly a functional variable, in spectroscopy it is common to use multivariate calibration techniques that consider it as a finite set of variables associated with the observed wavelengths. But these variables are affected by multicollinearity: because the spectrum is obtained as a sum of peaks of electromagnetic energy absorption by chemical substances (atoms, molecules...), the absorbances at two wavelengths close to each other are highly correlated. It is therefore more informative to consider the spectrum as a functional variable containing this dependence.
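As an illustration of the functional view, the sketch below fits a smooth B-spline to a simulated absorbance curve with SciPy, so the spectrum is handled as one function rather than hundreds of collinear wavelength variables; the peak positions and knot choices are arbitrary assumptions.

```python
# Represent a spectrum as a functional datum via a least-squares B-spline fit.
import numpy as np
from scipy.interpolate import make_lsq_spline

wavelengths = np.linspace(400.0, 700.0, 301)                    # nm grid
absorbance = (np.exp(-((wavelengths - 520) / 30) ** 2)          # two simulated
              + 0.6 * np.exp(-((wavelengths - 610) / 20) ** 2)  # absorption peaks
              + 0.02 * np.random.default_rng(0).standard_normal(301))

k = 3                                           # cubic splines
interior = np.linspace(420, 680, 25)            # interior knots
knots = np.r_[[wavelengths[0]] * (k + 1), interior, [wavelengths[-1]] * (k + 1)]
spectrum_fn = make_lsq_spline(wavelengths, absorbance, knots, k=k)

# The fitted function can be evaluated anywhere; the smooth representation
# keeps the dependence between neighbouring wavelengths instead of treating
# 301 collinear variables separately.
print(spectrum_fn(550.0))
```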

Functional Data Analysis for Sparse Longitudinal Data

We propose a nonparametric method to perform functional principal components analysis for the case of sparse longitudinal data. The method aims at irregularly spaced longitudinal data, where the number of repeated measurements available per subject is small. In contrast, classical functional data analysis requires a large number of regularly spaced measurements per subject. We assume that the repeated measurements are located randomly with a random number of repetitions for each subject and are determined by an underlying smooth random (subject-specific) trajectory plus measurement errors. Basic elements of our approach are the parsimonious estimation of the covariance structure and mean function of the trajectories, and the estimation of the variance of the measurement errors. The eigenfunction basis is estimated from the data, and functional principal components score estimates are obtained by a conditioning step. This conditional estimation method is conceptually simple and straightforward to implement. A key step is the derivation of asymptotic consistency and distribution results under mild conditions, using tools from functional analysis. Functional data analysis for sparse longitudinal data enables prediction of individual smooth trajectories even if only one or a few measurements are available for a subject. Asymptotic pointwise and simultaneous confidence bands are obtained for predicted individual trajectories, based on asymptotic distributions; the simultaneous bands assume a finite number of components. Model selection techniques, such as the Akaike information criterion, are used to choose the model dimension corresponding to the number of eigenfunctions in the model. The methods are illustrated with a simulation study, longitudinal CD4 data for a sample of AIDS patients, and time-course gene expression data for the yeast cell cycle.
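A minimal sketch of the conditioning step is shown below, assuming the mean function, eigenfunctions, eigenvalues and error variance have already been estimated; the numbers are invented for illustration.

```python
# Conditional expectation of FPC scores for a sparsely observed subject:
# scores = Lambda Phi' (Phi Lambda Phi' + sigma^2 I)^{-1} (y - mu).
import numpy as np

def conditional_scores(y_i, mu_i, Phi_i, lambdas, sigma2):
    """
    y_i     : (n_i,)   sparse observations of one subject
    mu_i    : (n_i,)   estimated mean function at the subject's time points
    Phi_i   : (n_i, K) estimated eigenfunctions at those time points
    lambdas : (K,)     estimated eigenvalues
    sigma2  : float    estimated measurement-error variance
    """
    Lam = np.diag(lambdas)
    # Covariance of the subject's observation vector.
    Sigma_y = Phi_i @ Lam @ Phi_i.T + sigma2 * np.eye(len(y_i))
    # Conditioning step: expected scores given the sparse observations.
    return Lam @ Phi_i.T @ np.linalg.solve(Sigma_y, y_i - mu_i)

# A subject observed at just three time points, two components assumed.
scores = conditional_scores(
    y_i=np.array([1.2, 0.7, -0.1]),
    mu_i=np.array([1.0, 0.8, 0.2]),
    Phi_i=np.array([[1.1, 0.2], [0.9, -0.4], [0.5, -1.0]]),
    lambdas=np.array([0.8, 0.2]),
    sigma2=0.05,
)
print(scores)   # predicted trajectory would be mu(t) + Phi(t) @ scores
```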

Functional data analysis in phonetics

In a way the current work tries to challenge one of the most famous aphorisms in Natural Language Processing, Fred Jelinek's phrase: “Anytime a linguist leaves the group the recognition rate goes up” [159]. While this phrase was famously associated with Speech Recognition (SR) and Text-to-Speech (TTS) research, it does echo the general feeling that theoretical and applied work are often incompatible. This is where the current work tries to make a contribution; it aims to offer a framework for phonetic analysis that is both linguistically and experimentally coherent, based on the general paradigms presented in Quantitative Linguistics, for example by Baayen [16] and Johnson [155]. It attempts to present a way to bridge the analysis of low-level phonetic information (e.g. speaker phonemes) with higher-level linguistic information (e.g. vowel types and positioning within a sentence). To achieve this we use techniques that can be broadly classified as part of Functional Data Analysis (FDA) methods [254]. FDA methods will be examined in detail in the next chapters; for now it is safe to consider them as direct generalizations of the usual multivariate techniques to the case where the fundamental unit of analysis is a function rather than an arbitrary collection of scalar points collected as a vector. As will be shown, the advantages of employing these techniques are two-fold: they are not only robust and statistically sound but also theoretically coherent in a conceptual manner, allowing an “expert-knowledge-first” approach to our problems. We show that FDA techniques are directly applicable in a number of situations. Applications of this generalized FDA phonetic analysis framework are further shown to apply in EEG signal analysis [118] and biological phylogenetic inference [119].

Comparative Analysis of Gestational Diabetes using Data Mining Techniques

Using data mining technology for disease prediction and diagnosis has become a focus of attention. Data mining provides an important means of extracting valuable medical rules hidden in medical data and plays an important role in disease prediction and clinical diagnosis. The current study demonstrates this using classification on a large sample of hospitalized patients. In this research work, the classification rule algorithms Decision Table, Multi-layer Perceptron and Naive Bayes are used for classifying datasets uploaded by the user. Analysis of the experimental results shows that the Naive Bayes classification technique yields better results than the other techniques. In future work we intend to improve performance by applying other data mining techniques and algorithms.
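A rough sketch of such a comparison protocol with scikit-learn follows; the library has no Decision Table learner, so a decision tree stands in for it here, and a bundled dataset replaces the hospital data.

```python
# Compare several classifiers on the same dataset via cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "Naive Bayes": GaussianNB(),
    "Multi-layer perceptron": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
    "Decision tree (Decision Table stand-in)": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```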

Classifying and Identifying of Threats in E-mails – Using Data Mining Techniques

Abstract— E-mail has become one of the most ubiquitous methods of communication, and e-mail accounts for a large percentage of total traffic over the internet. E-mail data is also growing rapidly, creating a need for automated analysis. To detect crime and to organize bundles of emails, a spectrum of techniques should be applied to discover and identify patterns and make predictions. Data mining has emerged to address the problem of understanding ever-growing volumes of structured information, finding patterns within data that can be used to develop useful knowledge. Earlier, statistical methods were used to characterize user behavior, classify spam and detect novel email viruses. However, previous techniques have not examined the contributions of these features to their classification, and they need some improvement. By applying data mining classification techniques we can achieve this very efficiently. In this paper we show that the Naïve Bayes classification approach is useful for predicting users' behavior and organizing emails according to users' constraints.
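A minimal sketch of this kind of Naïve Bayes e-mail classification, with an invented toy corpus, could look as follows:

```python
# Bag-of-words counts feeding a multinomial Naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now, click here",
    "meeting moved to 3pm, see agenda attached",
    "urgent: verify your account password",
    "lunch tomorrow? also sending the quarterly report",
]
labels = ["threat", "normal", "threat", "normal"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(emails, labels)
print(clf.predict(["please verify your password to claim the prize"]))
```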

An Efficient Approaches for Classifying and Predicting Heart Disease using Machine Learning Techniques

Sushama Nagpal, Sanchit Arora, et al. [2017] note that the analysis of medical data for disease prediction requires efficient feature selection (FS) techniques, as the data contain a large number of features. Researchers have used evolutionary computation (EC) techniques such as genetic algorithms and particle swarm optimization for FS and have found them to be faster than traditional techniques. The authors explored a relatively new EC technique called the gravitational search algorithm (GSA) for feature selection in medical datasets. The wrapper-based method they employed, using GSA and k-nearest neighbors, reduces the number of features by an average of 66% and considerably improves the accuracy of prediction.
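The sketch below illustrates the wrapper principle with the same k-NN fitness function; implementing GSA itself is beyond its scope, so a plain random search over binary feature masks stands in for the gravitational search.

```python
# Wrapper-based feature selection: score candidate feature subsets by
# the cross-validated accuracy of a k-NN classifier.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

def fitness(mask):
    if not mask.any():
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, mask], y, cv=5).mean()

best_mask, best_fit = None, -1.0
for _ in range(50):                           # GSA would guide this search
    mask = rng.random(X.shape[1]) < 0.5       # random candidate subset
    f = fitness(mask)
    if f > best_fit:
        best_mask, best_fit = mask, f

print(f"kept {best_mask.sum()}/{X.shape[1]} features, accuracy {best_fit:.3f}")
```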

Comparative Analysis Of Big Data Analytical Techniques Using Mapreduce

Bingwei Liu, Erick Blasch, Yu Chen, Dan Shen and Genshe Chen [12] proposed using machine learning technologies, such as the Naive Bayes Classifier (NBC), to achieve fine-grained control of the analysis procedure in a Hadoop implementation. Additional modules were implemented on Hadoop to run NBC. Parallelization was made possible with the help of MapReduce, which also improved fault tolerance, data distribution and load balance. The results showed the classification accuracy, the computation time and the throughput of the system. They demonstrated that increasing the input data size worked well for Hadoop, whose main strength lies in processing larger data sets, and that it performed better there than on smaller datasets.
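A schematic of how NBC training parallelizes under MapReduce, with plain Python functions standing in for the Hadoop machinery: mappers emit (class, word) count pairs from their data split, and the reducer sums them into the frequency tables Naive Bayes needs.

```python
# Map/reduce phases for Naive Bayes count aggregation, emulated locally.
from collections import defaultdict
from itertools import chain

def mapper(record):
    label, text = record
    for word in text.split():
        yield (label, word), 1            # one count per occurrence

def reducer(pairs):
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value              # aggregate across all splits
    return counts

split_1 = [("spam", "free offer free"), ("ham", "project meeting")]
split_2 = [("spam", "offer now"), ("ham", "meeting notes")]

# Hadoop would run the mappers on separate nodes; chain() emulates that.
counts = reducer(chain(*(mapper(r) for r in split_1 + split_2)))
print(counts[("spam", "free")])   # -> 2
```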

Analysis of Change Detection Techniques using Remotely Sensed Data

Abstract – Accurate information about the nature and extent of land cover changes, especially in rapidly growing areas, is essential. Change detection plays a very important role in applications such as video surveillance, medical imaging and remote sensing, as well as in land-use and land-cover analysis, forest and vegetation inspection and flood monitoring. Semarang City, located on the north coast of the island of Java, Indonesia, is very prone to tidal floods. The objective of this research is to assess, evaluate and monitor the nature and extent of land cover changes in Semarang City through the period from 2012 to 2014 using remotely sensed Landsat multispectral images. Four change detection techniques, namely post-classification, image differencing, image regression and principal component analysis, were applied. The objective extends to examining the effectiveness of each change detection technique regarding its ability to differentiate changed from unchanged areas, based on pixel-by-pixel analysis and calculation of the overall number of changed pixels. The results indicated that the post-classification change detection technique provided the highest accuracy while the principal component analysis technique gave the least accuracy.
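As an illustration, the image-differencing technique reduces to a subtraction and a threshold; the sketch below uses random arrays in place of the co-registered Landsat bands, and a common mean-plus-two-standard-deviations threshold rule is an assumption, not the paper's exact choice.

```python
# Image differencing for change detection: subtract the two dates and
# threshold the magnitude of the difference, pixel by pixel.
import numpy as np

rng = np.random.default_rng(0)
img_2012 = rng.random((100, 100))            # normalized band, date 1
img_2014 = img_2012.copy()
img_2014[40:60, 40:60] += 0.5                # simulate a changed area

diff = img_2014 - img_2012
threshold = diff.mean() + 2 * diff.std()     # a common rule of thumb
changed = np.abs(diff) > threshold           # changed vs unchanged mask

print("changed pixels:", int(changed.sum()))
```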

A review on the study and analysis of big data using data mining techniques

Capturing Big Data, together with its duration, storage, sharing, analysis and visualization, and above all the underlying technology, poses several challenges that enterprises and media need to face when handling Big Data. The technology requires new architectures, algorithms and techniques for its implementation, as well as technical skills, so experts are needed to deal with this new technology. The analysis of Big Data also faces challenges such as technical challenges, data management and sharing, privacy, security and trust, and misuse of Big Data.

BIG DATA ANALYSIS USING SVM AND K-NN DATA MINING TECHNIQUES

ABSTRACT: Extracting useful information from big data has always been a challenging task. Data mining is a powerful technology with great potential to extract knowledge-based information from such data, and predictions can be made from past and related records in different fields. Risk and safety have always been important considerations in the field of aircraft, and predicting aircraft accidents can save lives and cost. This paper proposes an accident prediction system built on a huge collection of past records, applying effective predictive data mining techniques such as Support Vector Machine (SVM) and K-Nearest Neighbor (K-NN), which have a greater capacity to handle huge and noisy data and are used to predict accidents with more accuracy. The methods used prove able to handle noisy, unrelated and missing data. The prediction results are tabulated and range between 85% and 90%. Keywords: Big data, SVM, K-NN, Accident Prediction.
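A minimal sketch of such a pipeline is given below: missing values are imputed, features scaled, and SVM and K-NN compared by cross-validation, with a synthetic dataset standing in for the accident records.

```python
# Impute -> scale -> classify, comparing SVM and K-NN by cross-validation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=15, random_state=0)
X[np.random.default_rng(0).random(X.shape) < 0.05] = np.nan   # missing data

for name, clf in [("SVM", SVC()), ("K-NN", KNeighborsClassifier())]:
    pipe = make_pipeline(SimpleImputer(strategy="mean"),
                         StandardScaler(), clf)
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}: {acc:.2%}")
```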

A Study on the Techniques of Sentiment Analysis for Unstructured Data using Big Data Analytics

Real-time unstructured data refers to information that does not follow the conventional row-column storage of a database; unlike structured data, it does not fit into relational databases. It is responsible for Variety, one of the four V's of Big Data. Sources such as satellite images, sensor readings, email messages, social media, web blogs, survey results, audio and video all produce unstructured data. Organizations go beyond “basic” analytics and dive deeper into unstructured data to do things such as predictive analytics, temporal and geospatial visualization, sentiment analysis, and much more. The objective of this paper is to present a model of sentiment analysis and its various techniques. Future research directions in this field are identified based on opportunities and several open issues in Big Data analytics.

Using Data Mining Techniques for Performance Analysis of Software Quality

A cluster is naturally a grouping of comparable objects. Every cluster is homogeneous, i.e., objects belonging to the same cluster are similar to one another; every cluster should also differ from the other clusters, i.e., objects belonging to one cluster should differ from the objects of the other clusters. Clustering is the process of grouping similar objects, and it may be hard or fuzzy. In a hard clustering algorithm, each element is assigned to a single cluster during its operation; in fuzzy clustering methods, however, a degree of membership is assigned to each element depending on its degree of association with several other clusters. The clustering problem for unsupervised data exploration and analysis has been investigated for decades in the statistics, image retrieval, bioinformatics, data mining and machine learning fields. Primarily, clustering algorithms aim to partition the data into such groups.
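The contrast between hard and fuzzy clustering can be made concrete with a minimal fuzzy c-means sketch, in which every point keeps a degree of membership in every cluster (the fuzzifier m controls the softness):

```python
# Minimal fuzzy c-means: alternate membership and center updates.
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), c, replace=False)]
    for _ in range(iters):
        # Distances from every point to every center (small eps avoids /0).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = 1.0 / (d ** (2 / (m - 1)))          # unnormalized memberships
        u /= u.sum(axis=1, keepdims=True)       # each row sums to 1
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return centers, u

X = np.array([[0.0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8], [4.5, 4.5]])
centers, u = fuzzy_c_means(X)
print(u[-1])   # the middle point belongs partly to both clusters
```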

Hybrid Decision Tree using K-Means for Classifying Continuous Data

Hybrid approaches of this kind have been applied in determining the acceptability of interactive cable television in a region [21] and in breast cancer diagnosis [6]. The authors of [22] propose a Layered Decision Tree (LDT) in which each cluster from K-Means clustering becomes a layer of the decision tree. In [23], at each node of the hybrid decision tree, the data is split according to decision tree analysis and the information gain is calculated; for the same data, cluster analysis is performed at the node and its information gain is calculated as well. Whichever of the two, decision tree or clustering, gives the higher information gain is selected as the splitting methodology at that node.
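A minimal sketch of this node rule, with toy data and a two-way K-Means standing in for the cluster analysis at the node:

```python
# At a node, compare the information gain of the best threshold split
# against that of a 2-cluster K-Means split, and keep the better one.
import numpy as np
from sklearn.cluster import KMeans

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain(y, parts):
    n = len(y)
    return entropy(y) - sum(len(p) / n * entropy(p) for p in parts)

def choose_split(X, y):
    # Best decision-tree-style threshold split over all features.
    tree_gain, best = -1.0, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:      # both sides stay nonempty
            mask = X[:, j] <= t
            g = gain(y, [y[mask], y[~mask]])
            if g > tree_gain:
                tree_gain, best = g, (j, t)
    # K-Means split of the same data at this node.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    km_gain = gain(y, [y[labels == 0], y[labels == 1]])
    return ("tree", best) if tree_gain >= km_gain else ("kmeans", None)

X = np.array([[1.0, 5], [2, 6], [3, 1], [4, 2], [5, 1.5]])
y = np.array([0, 0, 1, 1, 1])
print(choose_split(X, y))
```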

Classifying Human Walking Patterns using Accelerometer Data from Smartphone

Our experiments cover both indoor and outdoor scenarios. The use of walking style for different applications, such as healthcare or identification tasks based on computer-vision techniques or multiple wearable sensors, has been studied in recent years (for example, the works of Mannini and Sabatini [2] or Davis [6]). However, our approach differs from earlier recognition methods as follows. First, we use a very simple recognition process that can be adapted to general task identification by employing only an accelerometer built into the smartphone. Second, our powerful features allow us to reliably distinguish individuals based on differences in their walking patterns using very simple classifiers. Furthermore, considering multiple scenarios for the walking experiment decreases the effect of environmental factors on the identification process. Last but not least, we apply a simple multi-class classifier that can distinguish users based on their movement style with high accuracy and very low false acceptance and false rejection rates.
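A rough sketch of the windowed feature-extraction idea follows, with simulated accelerometer streams and generic statistics standing in for the paper's actual features:

```python
# Slice the accelerometer stream into fixed windows, compute simple
# statistics per window, and feed them to a simple classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def window_features(acc, window=50):
    """acc: (n_samples, 3) raw x/y/z accelerometer readings."""
    feats = []
    for start in range(0, len(acc) - window + 1, window):
        w = acc[start:start + window]
        mag = np.linalg.norm(w, axis=1)               # movement intensity
        feats.append(np.r_[w.mean(axis=0), w.std(axis=0),
                           mag.mean(), mag.std()])
    return np.array(feats)

# Two simulated walkers with slightly different gait dynamics.
walker_a = rng.normal(0.0, 1.0, (1000, 3))
walker_b = rng.normal(0.3, 1.4, (1000, 3))
X = np.vstack([window_features(walker_a), window_features(walker_b)])
y = np.r_[np.zeros(len(X) // 2), np.ones(len(X) // 2)]

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy: %.2f" % clf.score(X, y))
```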

Analysis of Data Mining Techniques

Large data collections contain a wealth of information which, however, needs to be discovered. Businesses can learn from their transaction data more about the behavior of their customers and can therefore improve their business by exploiting this knowledge. Science can obtain new insights on research questions from observational data (e.g. satellite data). Web usage information can be analyzed and exploited to optimize information access [3]. Data mining provides methods for extracting from large data collections unknown relationships among the data items that are useful for decision making. Thus data mining generates novel, unsuspected interpretations of data [1][2].
