The main difference between profile-csHMMs and traditional profile-HMMs is as follows. Unlike profile-HMMs, profile-csHMMs have three different types of match states: single-emission match states, pairwise-emission match states, and context-sensitive match states. Single-emission match states are identical to regular states in ordinary HMMs, and they are used to represent base positions in the consensus sequence that are not involved in base pairing. For two bases that form a base pair, on the other hand, we use a pairwise-emission match state and the corresponding context-sensitive match state to model the correlation between them. Each pair of pairwise-emission and context-sensitive match states has a distinct memory dedicated to it. This auxiliary memory can be either a stack or a queue, whichever is more convenient for modeling the correlations under consideration. A pairwise-emission match state stores the emitted symbol in this memory before making a transition to the next state. When a context-sensitive match state is entered, it first accesses the memory and reads the symbol that was previously emitted at the corresponding pairwise-emission match state. The emission probabilities are then adjusted according to this observation. For example, when a C was emitted at the pairwise-emission match state, the emission probabilities at the context-sensitive match state may be adjusted so that it emits a G with high probability (or simply, with probability one).
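The stack-based memory mechanism described above can be sketched as follows. The class names, the RNA complement table, and the probability-one adjustment are illustrative assumptions for this sketch, not notation from the csHMM literature.

```python
# Minimal sketch of the paired-state memory mechanism in a profile-csHMM.
# The extreme case is modeled: the context-sensitive state emits the
# Watson-Crick complement of the stored symbol with probability one.

class Stack:
    """The dedicated auxiliary memory shared by one state pair."""
    def __init__(self):
        self._items = []
    def push(self, x):
        self._items.append(x)
    def pop(self):
        return self._items.pop()

COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}

class PairwiseEmissionState:
    """Emits a symbol and stores it in the shared memory."""
    def __init__(self, memory):
        self.memory = memory
    def emit(self, symbol):
        self.memory.push(symbol)   # remember the emission for the paired state
        return symbol

class ContextSensitiveState:
    """Reads the stored symbol and adjusts its emission probabilities."""
    def __init__(self, memory):
        self.memory = memory
    def emission_probs(self):
        previous = self.memory.pop()       # symbol emitted earlier
        return {COMPLEMENT[previous]: 1.0}  # complement with probability one

memory = Stack()                  # one dedicated memory per state pair
p_state = PairwiseEmissionState(memory)
c_state = ContextSensitiveState(memory)
p_state.emit("C")
print(c_state.emission_probs())   # {'G': 1.0}
```

With a queue in place of the stack, the same mechanism would model parallel rather than nested (stem-like) correlations.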
The electrocardiogram (ECG) records the electrical activity of the heart on the surface of the chest. This signal can be acquired simply and inexpensively and is therefore used in a wide range of mobile and stationary applications. Over the last 100 years it has become the gold standard for diagnosing many cardiological diseases. Heart disease remains a relevant issue in our society, as it accounts for 30% of all deaths worldwide. Coronary heart disease alone is the most common cause of death overall. Furthermore, 2 to 3% of Europeans are affected by cardiac arrhythmias such as atrial fibrillation and atrial flutter. The associated costs in the European Union are estimated at 26 billion euros per year. In all of these cases, recording the ECG is the first indispensable step toward a reliable diagnosis and successful therapy.
Keywords and phrases: functional genomics, gene network, genomics, genomic signal processing, microarray.
Sequences and clones for over a million expressed sequence tagged sites (ESTs) are currently publicly available. Only a minority of these identified clusters contains genes associated with a known functionality. One way of gaining insight into a gene's role in cellular activity is to study its expression pattern in a variety of circumstances and contexts, as it responds to its environment and to the action of other genes. Recent methods facilitate large-scale surveys of gene expression in which transcript levels can be determined for thousands of genes simultaneously. In particular, expression microarrays result from a complex biochemical-optical system incorporating robotic spotting and computer image formation and analysis. Since transcription control is accomplished by a method that interprets a variety of inputs, we require analytical tools for expression profile data that can detect the types of multivariate influences on decision making produced by complex genetic networks. Put more generally, signals generated by the genome must be processed to characterize their regulatory effects and their relationship to changes at both the genotypic and phenotypic levels. Two salient goals of functional genomics are to screen for key genes and gene combinations that explain specific cellular
between the expression profiles of the two genes. However, the Chow-Liu algorithm loses validity if the underlying model is a cyclic graph. In addition, when the graph is densely connected, this scheme may miss too many edges.
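For reference, the Chow-Liu construction mentioned above amounts to a maximum spanning tree over pairwise mutual information. A minimal sketch, with made-up MI values and a simple Kruskal implementation (both illustrative assumptions):

```python
# Chow-Liu tree sketch: maximum spanning tree of the complete graph
# whose edge weights are pairwise mutual information values.

def chow_liu_tree(n_nodes, mi):
    """Kruskal's algorithm over edges sorted by decreasing MI."""
    parent = list(range(n_nodes))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i
    tree = []
    for (u, v), w in sorted(mi.items(), key=lambda kv: -kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:               # adding the edge creates no cycle
            parent[ru] = rv
            tree.append((u, v))
    return tree

# Toy MI values for a 4-gene chain (illustrative only).
mi = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.7,
      (0, 2): 0.3, (0, 3): 0.2, (1, 3): 0.1}
print(chow_liu_tree(4, mi))        # [(0, 1), (1, 2), (2, 3)]
```

The limitation noted above is visible here: the output is forced to be a tree, so any cyclic dependency structure is necessarily misrepresented.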
Margolin et al. proposed the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) based on the information provided by independent microarray samples. ARACNE infers the direct connectivity among genes using mutual information and the data processing inequality (DPI). ARACNE first assumes a fully connected graph and a pre-defined mutual information threshold. Whenever the mutual information between two genes X and Y, i.e., I(X; Y), is less than the pre-specified threshold, it disconnects the two genes. Next, if in the preliminary graph there exists another gene Z such that I(X; Y) < min(I(X; Z), I(Y; Z)), then ARACNE disconnects X and Y. ARACNE relies on the critical assumption that the gene interactions can be described by Markov chains. ARACNE was run on the synthetic networks generated by Mendes in 2003, and its performance was evaluated favorably in terms of precision and specificity. ARACNE was also applied to human B-cell data. The inferred B-cell network was compared with those previously identified through biochemical methods. The published targets of the hub gene c-MYC were found to be mostly c-MYC's direct neighbors in the reconstructed network.
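The two pruning steps can be sketched as follows, assuming the pairwise mutual information matrix has already been estimated from the samples. As a simplification, the DPI check below tests every third gene rather than only connected triplets, so it is a sketch of the idea rather than the published algorithm.

```python
import numpy as np

def aracne(mi, threshold):
    """mi: symmetric matrix of pairwise mutual information values.
    Returns the set of retained undirected edges (i, j) with i < j."""
    n = mi.shape[0]
    # Step 1: start fully connected, drop edges below the MI threshold.
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if mi[i, j] >= threshold}
    kept = set()
    for i, j in edges:
        # Step 2 (DPI): drop (i, j) if some third gene z "explains" it,
        # i.e., I(i;j) < min(I(i;z), I(j;z)).
        indirect = any(mi[i, j] < min(mi[i, z], mi[j, z])
                       for z in range(n) if z not in (i, j))
        if not indirect:
            kept.add((i, j))
    return kept

# Markov chain X - Z - Y (genes 0, 1, 2 = X, Y, Z): the direct X-Z and
# Y-Z links survive, while the indirect X-Y link is removed by the DPI.
mi = np.array([[0.0, 0.5, 0.8],
               [0.5, 0.0, 0.7],
               [0.8, 0.7, 0.0]])
print(sorted(aracne(mi, 0.1)))   # [(0, 2), (1, 2)]
```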
Most signals and processes in nature are continuous.
However, genomic information occurs in the form of discrete sequences. Genomic information is essentially the information contained in DNA and protein data. The application of digital signal processing (DSP) techniques to genomic data is termed genomic signal processing (GSP). The fundamental function of any living cell of an organism is the production of proteins, and these proteins are ultimately encoded by the DNA sequence. DNA is organized into chromosomes, which contain genes. The regions of a gene that code for proteins are called exons, while the intervening non-coding regions are called introns.
In this study, a wavelet-based multiresolution analysis (MRA) has been implemented in order to extract the FHS and MHS from the FPCG. In MRA, the given signal s(n), i.e., the SNR-enhanced FPCG, is decomposed into various levels of approximation and detail coefficients. The approximation and detail coefficients obtained after the MRA were hard-thresholded based on the biosignal of interest and reconstructed back into the time domain. The chosen mother wavelet should provide reasonably good frequency resolution for the FHS and MHS through compact support. The mother wavelet also needs to be able to detect the presence of hidden discontinuities. In addition, the wavelet should be orthogonal to avoid phase distortions from the transformation. All the requirements of the current study were satisfied by the Coiflet wavelet with five vanishing moments. Qualitatively, the correlation of the Coiflet-5 wavelet with the heart sounds (fetal and maternal) is also very high.
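The hard-thresholding step can be sketched as follows. The threshold value is illustrative; in practice the coefficient arrays would come from a DWT implementation such as PyWavelets (e.g., pywt.wavedec with the 'coif5' wavelet), which is not reproduced here.

```python
import numpy as np

def hard_threshold(coeffs, t):
    """Hard thresholding: zero every coefficient whose magnitude falls
    below t, and keep the rest unchanged (unlike soft thresholding,
    surviving coefficients are not shrunk)."""
    c = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(c) >= t, c, 0.0)

# Toy detail-coefficient vector: small entries are treated as noise.
detail = np.array([0.05, -0.9, 0.3, -0.02, 1.4])
print(hard_threshold(detail, 0.2))   # small entries zeroed, rest kept
```

The same operation would be applied level by level to the approximation and detail coefficients before reconstructing the FHS or MHS in the time domain.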
of unique protein folds exist in nature, and structure prediction of a target sequence can be performed by consulting a database of known folds and determining which fold model best fits the sequence. Both homology modeling and threading rely on the existence of known structures, and the disadvantage of such approaches is that accurate prediction depends on proteins of similar structure already having been solved. Another approach, namely the ab initio techniques, or prediction from first principles, bases structure prediction on known biochemical and biophysical facts related to the proteins. In general these are computationally very expensive methods. Machine learning methods, such as neural network and nearest neighbor techniques, utilize a localized prediction methodology in the sense that a window, typically of fewer than 20 amino acids, is presented to the prediction system with the aim of predicting secondary structure. However, local information accounts for only approximately 65% of secondary structure formation. Therefore, prediction can potentially be improved by incorporating a more global prediction scheme. Secondary structure prediction methods often employ neural networks (NNs), SVMs, and hidden Markov models (HMMs) [16, 17]. Neural networks and SVMs utilize an encoding scheme to represent the amino acid residues by numerical vectors. In HMM methods, on the other hand, hidden states generate segments of amino acids that correspond to the non-overlapping secondary structure segments. There are two types of protein secondary structure prediction algorithms. A single-sequence algorithm does not use information about other similar proteins; such an algorithm should be suitable for a nonhomologous sequence with no sequence similarity to any other protein sequence. Algorithms of the other type explicitly use sequences of homologous proteins, which often have similar structures.
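The windowed encoding scheme described above can be sketched as follows. The window size and the all-zero padding at the sequence ends are illustrative choices for this sketch, not those of any particular published method.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
AA_INDEX = {a: i for i, a in enumerate(AMINO_ACIDS)}

def window_encode(sequence, window=5):
    """One-hot encode a sliding window centered on each residue.
    Window positions falling outside the sequence stay all-zero."""
    half = window // 2
    n = len(sequence)
    features = np.zeros((n, window * len(AMINO_ACIDS)))
    for i in range(n):
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < n:
                offset = (k + half) * len(AMINO_ACIDS)
                features[i, offset + AA_INDEX[sequence[j]]] = 1.0
    return features

# Each residue is described by its 5-residue neighborhood: 5 x 20 features.
X = window_encode("MKVL", window=5)
print(X.shape)    # (4, 100)
```

A classifier (NN or SVM) would then predict one of the secondary structure classes from each row, which is exactly the "localized" methodology whose 65% ceiling motivates more global schemes.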
The accuracy (sensitivity) of the best current single-sequence prediction methods is below 70%. The prediction accuracy of the best methods that employ information from multiple alignments is close to 82%.
Abstract Genomic signal processing (GSP) is an engineering domain concerned with the analysis of genomic data using digital signal processing (DSP) approaches after transformation of the genome sequence into a numerical sequence. One challenge of GSP is how to minimize the error in detecting the protein coding regions in a specified deoxyribonucleic acid (DNA) sequence with minimum processing time. Since the type of numerical representation of a DNA sequence strongly affects the prediction accuracy and precision, this study aimed to compare different DNA numerical representations (genetic code context (GCC), atomic number, frequency of nucleotide occurrence in exons (FNO), 2-bit binary, and electron-ion interaction potential (EIIP)) by measuring the sensitivity, specificity, correlation coefficient (CC), and processing time for protein coding region detection. The proposed technique, based on digital filters, was used to read out the period-3 components and to eliminate the unwanted noise from the DNA sequence. Applied to 20 human genes, this method demonstrated that the maximum accuracy and minimum processing time are obtained with the 2-bit binary representation, compared with the other representation methods. The results suggest that the 2-bit binary representation significantly enhances the accuracy of detection and the efficiency of predicting coding regions using digital filters.
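As background to the period-3 property exploited here, the following is a minimal sketch using the binary-indicator (Voss) representation, a close relative of the representations compared in the study (the study's exact 2-bit mapping and digital filters are not reproduced). Coding-like period-3 structure produces a spectral peak at bin k = N/3.

```python
import numpy as np

def indicator_spectrum(seq):
    """Voss representation: one 0/1 indicator sequence per base, then the
    total DFT power summed over the four channels."""
    power = 0.0
    for base in "ACGT":
        u = np.array([1.0 if s == base else 0.0 for s in seq])
        power = power + np.abs(np.fft.fft(u)) ** 2
    return power

# A synthetic 'coding-like' sequence with a strong period-3 component.
seq = "ATG" * 100                       # N = 300
S = indicator_spectrum(seq)
N = len(seq)
peak = np.argmax(S[1:N // 2 + 1]) + 1   # ignore the DC bin
print(peak, N // 3)                     # 100 100
```

A digital-filter detector, as in the study, amounts to isolating this 2*pi/3 frequency component with a narrow bandpass filter instead of a full DFT.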
assessed in a time-varying environment, the algorithm can provide rapid and reliable classifications, such that transient changes can be estimated.
A major limitation of the classification algorithm developed in this research is the use of a linear regression selection approach to identify significant statistical and wavelet physiological feature inputs for non-linear classification models. Since the relationships between physiological measures and classes of emotional responses are not definite, this research attempted to explore linear models for preliminary feature reduction and initial relationship assessment. Although residual diagnostics indicated that the data conformed to the statistical model assumptions, relatively low R-squared values suggested there might have been interactions among the independent variables in predicting emotional states. A non-linear approach was then applied to achieve more accurate models. However, the linear assumptions of the regression modeling may have limited the selection of input variables with significant non-linear relationships with arousal and valence states. To avoid any constraining assumptions, it might be more appropriate to apply neural network technology directly for non-linear modeling and consider degrees of feature importance for input reduction. However, without preliminary feature reduction, the network construction could be very complex and therefore lead to model overfitting due to a large network relative to the number of data samples available for analysis.
Ognjen Rudovic, Mihalis A. Nicolaou and Vladimir Pavlovic
In this chapter we focus on the systematization, analysis, and discussion of recent trends in machine learning methods for social signal processing (SSP) (Pentland 2007). Because social signaling is often of central importance to subconscious decision making that affects everyday tasks (e.g., decisions about risks and rewards, resource utilization, or interpersonal relationships), the need for automated understanding of social signals by computers is a task of paramount importance. Machine learning has played a prominent role in the advancement of SSP over the past decade. This is, in part, due to the exponential increase in data availability, which served as a catalyst for the adoption of a new data-driven direction in affective computing. Given the difficulty of exactly modeling the latent and complex physical processes that underpin social signals, data has long emerged as the means to circumvent or supplement expert- or physics-based models, such as the deformable musculoskeletal models of the human body, face, or hands and their movement, neuro-dynamical models of cognitive perception, or models of human vocal production. This trend parallels the role and success of machine learning in related areas such as computer vision, c.f., (Poppe 2010, Wright et al. 2010, Grauman & Leibe 2011), or audio, speech, and language processing, c.f., (Deng & Li 2013), which serve as the core tools for analytic SSP tasks. Rather than emphasize exhaustive coverage of the many approaches to data-driven SSP, which can be found in excellent surveys (Vinciarelli et al. 2009, Vinciarelli et al. 2012), we seek to present the methods in the context of current modeling challenges. In particular, we identify and discuss two major modeling directions: • Simultaneous modeling of social signals and context, and
In the field of acoustics, the most common digital signal processing techniques are frequency based. Frequency-based filtering has been the chosen method of digital signal processing because there is an analog equivalent method that was used before digital signal processing was available. Furthermore, frequency-based filtering is easily understood using basic circuit analysis. While frequency-based filtering is effective in some cases, there are many cases where such a filter is not sufficient. The motivation for this paper stems from the desire to increase the signal-to-noise ratio (SNR) of an ultrasonic signal produced from a spray-on piezoelectric transducer
Recent machine learning methods tend to rely on having access to a vast number of correctly annotated examples to perform prediction and "extrapolation". This paradigm does not always hold, and this work focuses on reducing that dependency. We propose methods that allow accurate, complex image classification based on only a very limited number of training examples. Such methods can prove useful in domains where collecting examples is costly (medical studies, physics experiments, rare events, etc.). These methods also rely heavily on correct annotations, and we develop methods to alleviate that need. Such methods are valuable in situations where a perfect oracle, i.e., a person able to produce annotations, does not exist. This is the case, for example, in medical image analysis and in the analysis of spatial imagery. These two improvements reduce the cost of using machine learning by reducing the need for big, highly curated datasets.
Kashino et al. brought Bregman's ideas to music scene analysis and also proposed several other new ideas for music transcription [Kas93, 95]. The front end of their system used a "pinching plane method" to extract sinusoidal tracks from the input signal. These were clustered into note hypotheses by applying a subset of the above-mentioned perceptual cues. Harmonicity rules and onset timing rules were implemented. Other types of knowledge were integrated into the system, too. Timbre models were used to identify the source of each note, and pre-stored tone memories were used to resolve coinciding frequency components. Chordal analysis was performed based on the probabilities of notes occurring under a given chord. Chord transition probabilities were encoded into trigram models (Markov chains). For computations, a Bayesian probability network was used to integrate the knowledge and to perform simultaneous bottom-up analysis, temporal tying, and top-down processing (chords predict notes and notes predict components). The evaluation material comprised five different instruments and polyphonies of up to three simultaneous sounds. The work still stands among the most elegant and complete transcription systems. Later, Kashino et al. addressed the problem of source identification and source stream formation when the F0 information is given a priori [Kas99]. The PhD work of Sterian was more tightly focused on implementing the perceptual grouping principles for the purpose of music transcription [Ste99]. Sinusoidal partials were used as the mid-level representation. These were extracted by picking peaks in successive time frames using the modal distribution and then applying Kalman filtering to estimate temporally continuous sinusoidal tracks. Sterian represented the perceptual grouping rules as a set of likelihood functions, each of which evaluated the likelihood of the observed partials given a hypothesized grouping.
Distinct likelihood functions were defined to take into account onset and offset timing, harmonicity, low partial support, partial gap, and partial density (see [Ste99] for the definitions of the latter concepts). The product of all the likelihood functions was used as the criterion for optimal grouping. Since an exhaustive search over all possible groupings is not possible, a multiple-hypothesis tracking strategy was used to find a suboptimal solution. For each new partial, new competing hypotheses were formed, and the most promising hypotheses were tracked over time. Evaluation results were given for a small test set with 1–4 concurrent sounds.
Given a 1:1 label-free LC-FTMS dataset pair containing the same sample, because of experimental variation and random suppression, the measured peptide fold change is actually randomly distributed around 1:1; this distribution is defined as the null distribution. We need such null distributions to estimate the significance of measured fold changes in LC-FTMS experiments that compare two different samples. Note that since suppression characteristics change with intensity level, the null distributions also change. At a lower intensity level, due to significant random suppression, the null distribution generally has a large variance. At higher intensity levels, the random suppression effect is considerably smaller, and the null distribution is mainly caused by experimental variation. Generally, the null distribution at a given intensity level is not directly available in a regular differential LC-FTMS experiment, and we have to estimate it in order to provide significance P-values for all fold changes. Without estimating the appropriate null distributions, it would be hard to detect differentially expressed proteins reliably, especially in the low-intensity region. Currently no software provides such significance estimation or suppression correction. We developed a software tool, Gcorr, that performs correction/suppression characteristics analysis and fold-change significance estimation at different intensity levels.
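As a minimal sketch of how a null distribution at one intensity level yields empirical P-values: the toy values, the two-sided tail test, and the +1 correction below are illustrative assumptions, not Gcorr's actual procedure, and in practice a separate null distribution would be estimated per intensity bin.

```python
import numpy as np

def empirical_p(null_log_fc, observed_log_fc):
    """Two-sided empirical P-value of an observed log fold change against
    a null distribution measured from a 1:1 experiment; the usual +1
    correction keeps the estimate strictly positive."""
    null_log_fc = np.asarray(null_log_fc)
    count = np.sum(np.abs(null_log_fc) >= abs(observed_log_fc))
    return (count + 1) / (null_log_fc.size + 1)

# Toy null distribution of log2 fold changes at one intensity level.
null = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
print(empirical_p(null, 2.0))   # ~0.167: beyond the null's spread
print(empirical_p(null, 0.0))   # 1.0: fully consistent with 1:1
```

A wide null at low intensity (large random suppression) makes the same observed fold change far less significant than it would be at high intensity, which is exactly why the per-intensity estimation matters.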
This thesis explores some of the main approaches to the problem of speech signal enhancement. Traditional signal processing techniques, including spectral subtraction, Wiener filtering, and subspace methods, are very widely used and can produce very good results, especially in the cases of constant ambient noise or noise that is predictable over the course of the signal. We first study these methods and their results, and conclude with an analysis of the successes and failures of each. Comparisons are based on the effectiveness of the methods in removing disruptive noise, the speech quality and intelligibility of the enhanced signals, and whether or not they introduce new artifacts into the signal. These characteristics are analyzed using the perceptual evaluation of speech quality (PESQ) measure, the segmental signal-to-noise ratio (SNR), the log-likelihood ratio (LLR), and the weighted spectral slope distance.
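Of the methods listed, spectral subtraction is the simplest to sketch. The single-frame, magnitude-domain version below with a zero floor is the textbook form, not the thesis's exact implementation, and the "noise estimate equals the frame's spectrum" demonstration is the idealized best case only.

```python
import numpy as np

def spectral_subtraction(noisy, noise_mag):
    """Subtract an estimated noise magnitude spectrum from the noisy
    frame's magnitude, floor negative values at zero, and resynthesize
    using the noisy phase."""
    spectrum = np.fft.rfft(noisy)
    mag = np.maximum(np.abs(spectrum) - noise_mag, 0.0)
    phase = np.angle(spectrum)
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(noisy))

# Idealized check: if the noise magnitude estimate matches the frame's
# spectrum exactly, the frame is cancelled completely.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
noise_mag = np.abs(np.fft.rfft(frame))
print(np.allclose(spectral_subtraction(frame, noise_mag), 0.0))  # True
```

In practice the noise magnitude is averaged over speech-free frames, and the mismatch between that average and each frame's actual noise is what produces the characteristic "musical noise" artifact the thesis evaluates.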
Assuming perfect channel state information at the receiver, we simulate the performance of the proposed scheme over a Rayleigh fading channel. Fig. 9 gives the average BER performance of S(4,8)(348,696) and S(3,6)(78,156) with the system load β = 2. The performance of the NOMA scheme in , also with 200% overload, and of OMA with QPSK modulation are provided for comparison. It reveals that the proposed NOMA scheme outperforms the other two while achieving the same spectral efficiency. It can be observed that the proposed NOMA scheme is able to obtain a diversity gain in the fading channel by spreading the signal over several resources. The diversity gain the NOMA scheme can obtain is approximately the same as the designed variable sparsity. The theoretical BER of OMA-QPSK with diversity gains of 3 and 4 is also plotted to verify this observation.
Given that auscultation is a comparably inconclusive test, one can argue that increasing the reliability of the test by developing an objective auscultation analysis approach would allow physicians to avoid a significant amount of expense, providing relief for the healthcare budgets of less developed countries in particular. Such a system would increase confidence in the diagnosis and thus broaden the availability of cardiac diagnostics even where more complicated tests are not easily accessible. With electronic stethoscopes becoming more available by the day, it is now possible to discuss the possibility of a computer-aided, affordable analysis tool that will provide objective measures of heart disease risks. There have been numerous approaches to detecting heart diseases using a range of signal processing and machine learning techniques. The majority of these algorithms share a three-step approach involving (1) cardiac cycle segmentation, (2) feature extraction, and (3) classification. The last two steps are well studied in the literature; many different machine learning approaches, including but not limited to support vector machines, artificial neural networks, and even decision trees, have been applied to the problem of classifying heart sounds once they are segmented [6, 10–26]. However, the approaches to the segmentation step remain outdated. Bentley et al. acknowledge that once the segmentation challenge is solved, the following steps will be "considerably easier".
Abstract: A fast matched-filter reconstruction technique for ground penetrating radar (GPR) tomography is proposed for generating 2D images of buried objects, with signal processing techniques used to calibrate the GPR data.
Reconstruction of a 2D image from these data is achieved with numerical discretization and matched-filter techniques. This requires less computational power and is simpler to implement than matrix inversion or other inversion methods. The primary benefits, as compared to other GPR imaging methods, are improved resolution and 2D imaging for easy survey analysis. The 2D imaging benefits derive from the increased data collection (via multiple antenna looks) that supports state-of-the-art GPR tomography to generate high-resolution 2D images. In addition, background suppression and calibration methods are presented to further the technique by removing clutter. Experiments at the Mumma Radar Lab (MRL) at the University of Dayton were conducted to verify the proposed technique.
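A minimal sketch of the matched-filter (delay-and-sum backprojection) idea follows, under simplifying assumptions: unit propagation speed, ideal point responses in the A-scans, and a single depth slice. None of these reflect the actual MRL experimental setup; the sketch only shows why summing each antenna's signal at the hypothesized round-trip delay focuses energy at the true scatterer position.

```python
import numpy as np

def backproject(antenna_x, scans, dt, depth, candidates, c=1.0):
    """For each candidate lateral position, sum every antenna's A-scan
    sample at the round-trip delay to a hypothetical scatterer there."""
    image = np.zeros(len(candidates))
    for k, x in enumerate(candidates):
        for a, xa in enumerate(antenna_x):
            delay = 2.0 * np.hypot(xa - x, depth) / c
            idx = int(round(delay / dt))
            if idx < scans.shape[1]:
                image[k] += scans[a, idx]
    return image

# Synthetic data: one point scatterer at x = 2.0, depth = 2.0, observed
# from five antenna positions (the "multiple antenna looks").
antenna_x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
dt, depth, n_samples = 0.01, 2.0, 1000
scans = np.zeros((len(antenna_x), n_samples))
for a, xa in enumerate(antenna_x):
    scans[a, int(round(2.0 * np.hypot(xa - 2.0, depth) / dt))] = 1.0

candidates = np.arange(0.0, 4.01, 0.5)
image = backproject(antenna_x, scans, dt, depth, candidates)
print(candidates[np.argmax(image)])   # 2.0
```

Because each pixel only requires indexed sums rather than solving a linear system, this is the sense in which matched-filter reconstruction is cheaper than matrix inversion.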
reconstructed signals to be obtained. Again, results from two ECG databases indicate that the method can accurately control the quality in terms of the mean value with a low standard deviation. Some discussion of the impact of baseline wander and noise is included, along with a description of the clinical validation sought for these results. The second paper in this category, "Lossless compression schemes for ECG signals using neural network predictors," by K. Ramakrishnan and E. Chikkannan, presents lossless compression schemes for ECG signals based on neural network predictors and entropy encoders. Decorrelation is achieved by nonlinear prediction in the first stage, and encoding of the residues is done using lossless entropy encoders in the second stage. Different types of lossless encoders, such as Huffman, arithmetic, and run-length encoders, are used. The performances of the proposed neural network predictor-based compression schemes are evaluated using standard distortion and compression efficiency measures. Selected records from the MIT-BIH arrhythmia database are used for performance evaluation. The proposed compression schemes are compared with linear predictor-based compression schemes, and it is shown that about 11% improvement in compression efficiency can be achieved by the neural network predictor-based schemes with the same quality and a similar setup. They are also compared with other known ECG compression methods, and the experimental results show that superior performance in terms of the distortion parameters of the reconstructed signals can be achieved with the proposed schemes.
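The predictor/residue structure behind such schemes can be sketched with the simplest possible predictor, a first difference. The paper's schemes use neural network predictors and entropy coders, so this is only the generic lossless round trip, not their method; integer arithmetic is what makes the scheme exactly invertible.

```python
import numpy as np

def encode(x):
    """First-order integer predictor: residue[n] = x[n] - x[n-1]; the
    first sample is transmitted as-is. The residues are much smaller than
    the samples, so an entropy coder compresses them well."""
    r = np.empty_like(x)
    r[0] = x[0]
    r[1:] = x[1:] - x[:-1]
    return r

def decode(r):
    """Exact inverse of encode: running sum of the residues."""
    return np.cumsum(r)

# Toy ECG-like integer samples.
ecg = np.array([1012, 1015, 1020, 1018, 1005, 998], dtype=np.int32)
residues = encode(ecg)
print(residues)                                 # small first differences
print(np.array_equal(decode(residues), ecg))    # True -> lossless
```

A Huffman or arithmetic coder applied to the residues would complete the two-stage pipeline the paper describes; replacing the first difference with a learned nonlinear predictor is what yields the reported efficiency gain.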
It is more difficult to adjust for differences in gene expression and alternative splicing. While each cell contributes an equal amount of DNA, and therefore an equal amount of methylation signal, neurons are relatively large cells compared to glial cells and contribute more RNA. Also, many marker genes of neurons act at synapses, which can vary without the actual neuronal proportion changing. For now, we are limited to adjusting only the methylation data for proportions. Figure 30 gives the results from using the likelihood ratio test with the adjusted methylation residuals. The left panel gives a histogram of p-values, which is very similar to the one using the original β-values. The right panel plots the −log10(p-values) using mixture-adjusted residuals against the −log10(p-values) from the original likelihood ratio test. There does not appear to be a large change in significance, with the exception of one gene that becomes highly significant after adjustment: PRRC1, a gene that is highly expressed in the brain and has recently been implicated as having an effect on fluid intelligence over brain development (Rowe et al. 2013). We will elaborate more on this gene in the following section.