Deep Learning Architectures for Novel Problems

Table 2.7 lists document classification accuracies compared with recent approaches. Our Graph-CNN architecture (2×48F-7F) contains three Graph-CNN layers: the first two layers with 48 filters and the third layer with seven filters. The last Graph-CNN layer computes the prediction of each vertex. We then expand this network by adding 0-hop filters after each Graph-CNN layer. Dropout was also added before each 0-hop filter. We observed that with deeper architectures, the network quickly overfits the training set and performance degrades on the test set. We report these results in Table 2.6. For the model with Dropout and 0-hop filters, the highest accuracies we obtain are 89.14% and 91.51% on 3 and 10 folds, respectively. All models were trained using Adam optimization [22]. The BatchNorm layers were modified to no longer use running averages for mean and variance, since there is only a single large sample graph.
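The BatchNorm modification described above — normalizing with the statistics of the current batch only, since there is just one large sample graph — can be sketched in plain Python. The helper name and epsilon value are illustrative, not from the paper:

```python
import math

def batch_norm_current_stats(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize with the mean/variance of the current batch only,
    i.e., with running-average tracking disabled."""
    n = len(values)
    mean = sum(values) / n
    # Biased (population) variance, as standard in batch normalization.
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

# The output of any batch has approximately zero mean and unit variance.
normalized = batch_norm_current_stats([1.0, 2.0, 3.0, 4.0])
```

Disabling the running averages simply means inference uses the same per-batch statistics as training, which is consistent when every forward pass sees the same single graph.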

A Review of Deep Learning with Special Emphasis on Architectures, Applications and Recent Trends

Moreover, because renewable energy data is complicated in nature, shallow learning models may be insufficient to identify and learn the corresponding deep non-linear and non-stationary features and traits [178]. Among the various renewable energy sources, wind and solar energy have gained more popularity due to their potential and high availability [179]. As a result, in recent years research endeavours have focused on developing DL techniques for problems related to the deployment of the aforementioned renewable energy sources. Photovoltaic (PV) energy has received much attention due to its many advantages; it is abundant, inexhaustible and clean [180]. However, due to the chaotic and erratic nature of weather systems, the power output of PV energy systems is intermittent, volatile and random [181]. These uncertainties may potentially degrade the real-time control performance, reduce system economics, and thus pose a great challenge for the management and operation of electric power and energy systems [182]. For these reasons, the accuracy of PV power output forecasting plays a major role in ensuring optimum planning and modelling of PV plants. In [178] a deep neural network architecture is proposed for deterministic and probabilistic PV power forecasting. The deep architecture for deterministic forecasting comprises a Wavelet Transform and a deep CNN. Moreover, the probabilistic PV power forecasting model combines the deterministic model and a spline Quantile Regression (QR) technique. The method has been evaluated on historical PV power datasets obtained from two PV farms in Belgium, exhibiting high forecasting stability and robustness. In Gensler et al. [183], several deep network architectures, i.e. MLP, LSTM networks, DBN and Autoencoders, have been examined with respect to their forecasting accuracy of the PV power output. The performance of the methods is validated on actual data from PV facilities in Germany.
The architecture that exhibited the best performance is the Auto-LSTM network, which combines the feature extraction ability of the Autoencoder with the forecasting ability of the LSTM. In [184] an LSTM-RNN is proposed for forecasting the output power of solar PV systems. In particular, the authors examine five different LSTM network architectures in order to obtain the one with the highest forecasting accuracy on the examined datasets, which are retrieved from two cities in Egypt. The network that provided the highest accuracy is the LSTM with memory between batches.
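Quantile-regression models such as the QR component above are typically fitted by minimizing the pinball (quantile) loss. A minimal sketch — the function name and toy data are illustrative, and this is the generic loss, not the paper's particular spline formulation:

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss: under-prediction is penalized with
    weight q and over-prediction with weight 1 - q, so the minimizer is
    the q-th conditional quantile rather than the conditional mean."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)

# For q = 0.5 the pinball loss reduces to half the mean absolute error.
```

Fitting one model per quantile level q (e.g., 0.1, 0.5, 0.9) yields the prediction intervals that make the forecast probabilistic rather than deterministic.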

An Accurate Method Based on Deep Learning Architectures for Real world Object Detection

Abstract. Object detection and recognition play an important role in blind navigation. However, in real-world shooting, images are degraded and deviate from the statistical distribution of academic datasets, which has a negative impact on object detection. Here, using deep neural networks and door detection as an example, we simulate problems that the blind may encounter when shooting images, such as lack of illumination in the imaging environment, relative motion with the object at the moment of exposure, and rotation and jitter of the photographic apparatus. After establishing a mathematical model of image degradation, we compare and demonstrate the impact of various image degradations on door detection. By degrading the training set, we train a robust model that improves the average precision (AP) of door detection in real scenes and outperforms other training methods.
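Two of the degradations mentioned (poor illumination and motion blur) can be simulated very simply on a row of grayscale intensities in [0, 1]. The functions and parameter values below are illustrative, not the paper's actual degradation model:

```python
def darken(pixels, gamma=2.5):
    """Simulate under-illumination with a gamma curve: gamma > 1 pushes
    mid-tones toward black while leaving 0 and 1 fixed."""
    return [p ** gamma for p in pixels]

def motion_blur_1d(pixels, length=3):
    """Simulate horizontal motion blur: average each pixel with its
    `length - 1` predecessors along the motion direction (edge-clamped)."""
    out = []
    for i in range(len(pixels)):
        window = pixels[max(0, i - length + 1): i + 1]
        out.append(sum(window) / len(window))
    return out
```

Applying such transforms to clean training images ("degrading the training set") is a standard augmentation strategy for matching the degraded test distribution.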

An Empirical Evaluation of various Deep Learning Architectures for Bi Sequence Classification Tasks

Several tasks in argumentation mining and debating, question-answering, and natural language inference involve classifying a sequence in the context of another sequence (referred to as bi-sequence classification). For several single-sequence classification tasks, the current state-of-the-art approaches are based on recurrent and convolutional neural networks. On the other hand, for bi-sequence classification problems, there is not much understanding as to the best deep learning architecture. In this paper, we attempt to get an understanding of this category of problems by extensive empirical evaluation of 19 different deep learning architectures (specifically on different ways of handling context) for various problems originating in natural language processing like debating, textual entailment and question-answering. Following the empirical evaluation, we offer our insights and conclusions regarding the architectures we have considered. We also establish the first deep learning baselines for three argumentation mining tasks.

RootNav 2.0: Deep learning for automatic navigation of complex plant root architectures

Background: In recent years quantitative analysis of root growth has become increasingly important as a way to explore the influence of abiotic stress such as high temperature and drought on a plant’s ability to take up water and nutrients. Segmentation and feature extraction of plant roots from images presents a significant computer vision challenge. Root images contain complicated structures, variations in size, background, occlusion, clutter and variation in lighting conditions. We present a new image analysis approach that provides fully automatic extraction of complex root system architectures from a range of plant species in varied imaging set-ups. Driven by modern deep-learning approaches, RootNav 2.0 replaces previously manual and semi-automatic feature extraction with an extremely deep multi-task convolutional neural network architecture. The network also locates seeds, first order and second order root tips to drive a search algorithm seeking optimal paths throughout the image, extracting accurate architectures without user interaction. Results: We develop and train a novel deep network architecture to explicitly combine local pixel information with global scene information in order to accurately segment small root features across high-resolution images. The proposed method was evaluated on images of wheat (Triticum aestivum L.) from a seedling assay. Compared with semi-automatic analysis via the original RootNav tool, the proposed method demonstrated comparable accuracy, with a 10-fold increase in speed. The network was able to adapt to different plant species via transfer learning, offering similar accuracy when transferred to an Arabidopsis thaliana plate assay. A final instance of transfer learning, to images of Brassica napus from a hydroponic assay, still demonstrated good accuracy despite many fewer training images. Conclusions: We present RootNav 2.0, a new approach to root image analysis driven by a deep neural network. 
The tool can be adapted to new image domains with a reduced number of images, and offers substantial speed improvements over semi-automatic and manual approaches. The tool outputs root architectures in the widely accepted RSML standard, for which numerous analysis packages exist.

Deep Neural Network Architectures and Learning Methodologies for Classification and Application in 3D Reconstruction

We presented a complete framework for urban reconstruction based on semantic labeling. Our contribution is twofold. First, we have presented a novel network architecture which uniquely leverages the strengths of deep convolutional autoencoders with feed-forward links and cardinality-enabled ResNeXt blocks. In an important result, we have shown that the network is capable of producing smooth results without the need for CRF-based post-processing. The results on benchmark data indicate that the proposed technique can produce comparable and in some cases better classification without the need for excessive computational requirements or training time. Second, we have proposed a pipeline for the automatic reconstruction of urban areas based on semantic labeling. An agglomerative clustering is performed on the points based on their class. Each cluster is further processed according to its class, and generic objects such as trees and cars are removed and replaced by procedurally generated tree models and car CAD models, respectively. Buildings' boundaries are extracted, extruded and triangulated to generate 3D models. All other classes are triangulated and simplified to form a digital terrain model. Finally, we have extensively tested the proposed framework on all 17 test images and show the realistic virtual environments generated as a result.

Enhanced Deep Learning Architectures for Face Liveness Detection for Static and Video Sequences

Abstract: Face liveness detection is a critical preprocessing step in face recognition for avoiding face spoofing attacks, where an impostor can impersonate a valid user for authentication. While considerable research has recently been done on improving the accuracy of face liveness detection, the best current approaches use a two-step process of first applying nonlinear anisotropic diffusion to the incoming image and then using a deep network for the final liveness decision. Such an approach is not viable for real-time face liveness detection. We develop two end-to-end real-time solutions where nonlinear anisotropic diffusion based on an additive operator splitting scheme is first applied to an incoming static image, which enhances the edges and surface texture, and preserves the boundary locations in the real image. The diffused image is then forwarded to a pre-trained Specialized Convolutional Neural Network (SCNN) and the Inception network version 4, which identify the complex and deep features for face liveness classification. We evaluate the performance of our integrated approach using the SCNN and Inception v4 on the Replay-Attack dataset and Replay-Mobile dataset. The entire architecture is created in such a manner that, once trained, the face liveness detection can be accomplished in real time. We achieve promising results of 96.03% and 96.21% face liveness detection accuracy with the SCNN, and 94.77% and 95.53% accuracy with the Inception v4, on the Replay-Attack and Replay-Mobile datasets, respectively. We also develop a novel deep architecture for face liveness detection on video frames that uses the diffusion of images followed by a deep Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) to classify the video sequence as real or fake. Even though the use of a CNN followed by an LSTM is not new, combining it with diffusion (which has proven to be the best approach for single-image liveness detection) is novel.
Performance evaluation of our architecture on the REPLAY-ATTACK dataset gave 98.71% test accuracy and 2.77% Half Total Error Rate (HTER), and on the REPLAY-MOBILE dataset gave 95.41% accuracy and 5.28% HTER.
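The HTER reported above is conventionally defined as the mean of the false acceptance rate (FAR) and false rejection rate (FRR); a one-line helper (argument names assumed, not from the paper):

```python
def hter(far, frr):
    """Half Total Error Rate: the average of the False Acceptance Rate
    and the False Rejection Rate on a given evaluation set."""
    return (far + frr) / 2.0
```

Under this convention, the 2.77% HTER above means the FAR and FRR on REPLAY-ATTACK average to 0.0277.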

Deep Architectures for Speech Processing: Survey

Speech processing is the study of speech signals and the methodology of how to deal with these signals. For speech processing, signals are mostly represented in digital format. Speech processing gives practical and theoretical insights into how human speech can be processed by computers and machines. It involves speech recognition, speech synthesis, speech enhancement and spoken dialog systems. Speech signals have characteristics such as phonemes, prosody and IPA notation, and can carry a message, speaker-specific characteristics, emotions and language context. Basic parameters in speech processing include pitch, signal-to-noise (SN) ratio, voice intensity and quality. In the area of speech processing, these basic parameters play a vital role in the good performance of systems. Many algorithms have been developed to estimate these basic parameters from the speech signal, but their performance has not been explored sufficiently. Recently, deep learning, a machine learning methodology, has become the state of the art in the speech processing area for effectively representing information from speech signals. Deep learning has turned out to be successful in tackling many AI problems, including speech information processing.

Learning Deep Architectures for AI

Here, we argue that training an energy-based model can be achieved by solving a series of classification problems in which one tries to discriminate training examples from samples generated by the model. In the Boltzmann machine learning algorithms, as well as in Contrastive Divergence, an important element is the ability to sample from the model, maybe approximately. An elegant way to understand the value of these samples in improving the log-likelihood was introduced in [201], using a connection with boosting. We start by explaining the idea informally and then formalize it, justifying algorithms based on training the generative model with a classification criterion separating model samples from training examples. The maximum likelihood criterion wants the likelihood to be high on the training examples and low elsewhere. If we already have a model and we want to increase its likelihood, the contrast between where the model puts high probability (represented by samples) and where the training examples are indicates how to change the model. If we were able to approximately separate training examples from model samples with a decision surface, we could increase likelihood by reducing the value of the energy function on one side of the decision surface (the side where there are more training examples) and increasing it on the other side (the side where there are more samples from the model). Mathematically, consider the gradient of the log-likelihood with respect to the parameters of the FreeEnergy(x) (or Energy(x) if we do not introduce explicit hidden variables), given in Equation (5.10). Now consider a highly regularized two-class probabilistic classifier that will attempt to separate training samples of P̂(x) from model samples of P(x), and which is only able to produce an output probability q(x) = P(y = 1 | x) barely different from 1
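The gradient referred to as Equation (5.10) can be written out explicitly; this is the standard free-energy form of the log-likelihood gradient for energy-based models, reconstructed from context rather than quoted from the book:

```latex
\frac{\partial \log P(x)}{\partial \theta}
  \;=\; -\,\frac{\partial\, \mathrm{FreeEnergy}(x)}{\partial \theta}
  \;+\; \sum_{\tilde{x}} P(\tilde{x})\,
        \frac{\partial\, \mathrm{FreeEnergy}(\tilde{x})}{\partial \theta}
```

The first term lowers the energy at the training example, while the second term (an expectation over model samples) raises the energy wherever the model places probability mass — exactly the two sides of the decision surface described above.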

Deep neural architectures for mapping scalp to intracranial EEG

Data is often plagued by noise, which encumbers machine learning of clinically useful biomarkers, and EEG data is no exception. Intracranial EEG data enhances the training of deep learning models of the human brain, yet is often prohibitive to obtain due to the invasive recording process. A more convenient alternative is to record brain activity using scalp electrodes. However, the inherent noise associated with scalp EEG data often impedes the learning process of neural models, yielding substandard performance. Here, an ensemble deep learning architecture for non-linearly mapping scalp to intracranial EEG data is proposed. The proposed architecture exploits the information from a limited number of joint scalp-intracranial recordings to establish a novel methodology for detecting epileptic discharges (IEDs) from the scalp EEG of a general population of subjects. Statistical tests and qualitative analysis have revealed that the generated pseudo-intracranial data are highly correlated with the true intracranial data. This facilitated the detection of IEDs from the scalp recordings, where such waveforms are not often visible. As a real-world clinical application, these pseudo-intracranial EEG data are then used by a convolutional neural network for the automated classification of IED and non-IED trials in the context of epilepsy analysis. Although the aim of this work was to circumvent the unavailability of intracranial EEG and the limitations of scalp EEG, we have achieved a classification accuracy of 64%; an increase of 6% over the previously proposed linear regression mapping.

Deep affect prediction in-the-wild: Aff-wild database and challenge, deep architectures, and beyond

When it comes to dimensional emotion recognition, there exists great variability between different databases, especially those containing emotions in-the-wild. In particular, the annotators and the range of the annotations are different and the labels can be either discrete or continuous. To tackle the problems caused by this variability, we take advantage of the fact that the Aff-Wild is a powerful database that can be exploited for learning features, which may then be used as priors for dimensional emotion recognition. In the following, we show that it can be used as prior for the RECOLA and AFEW-VA databases that are annotated for valence and arousal, just like Aff-Wild. In addition to this, we use it as a prior for categorical emotion recognition, on the EmotiW dataset, which is annotated in terms of the seven basic emotions. Experiments have been conducted on these databases yielding state-of-the-art results and thus verifying the strength of Aff-Wild for affect recognition.

Improving ultrasound video classification: an evaluation of novel deep learning methods in echocardiography

Abstract: Echocardiography is the commonest medical ultrasound examination, but automated interpretation is challenging and hinges on correct recognition of the ‘view’ (imaging plane and orientation). Current state-of-the-art methods for identifying the view computationally involve 2-dimensional convolutional neural networks (CNNs), but these merely classify individual frames of a video in isolation, and ignore information describing the movement of structures throughout the cardiac cycle. Here we explore the efficacy of novel CNN architectures, including time-distributed networks and two-stream networks, which are inspired by advances in human action recognition. We demonstrate that these new architectures more than halve the error rate of traditional CNNs from 8.1% to 3.9%. These advances in accuracy may be due to these networks’ ability to track the movement of specific structures such as heart valves throughout the cardiac cycle. Finally, we show the accuracies of these new state-of-the-art networks are approaching expert agreement (3.6% discordance), with a similar pattern of discordance between views.

A Novel Deep Learning Model for the Detection and Identification of Rolling Element-Bearing Faults

Abstract: Real-time acquisition of large amounts of machine operating data is now increasingly common due to recent advances in Industry 4.0 technologies. A key benefit to factory operators of this large scale data acquisition is in the ability to perform real-time condition monitoring and early-stage fault detection and diagnosis on industrial machinery—with the potential to reduce machine down-time and thus operating costs. The main contribution of this work is the development of an intelligent fault diagnosis method capable of operating on these real-time data streams to provide early detection of developing problems under variable operating conditions. We propose a novel dual-path recurrent neural network with a wide first kernel and deep convolutional neural network pathway (RNN-WDCNN) capable of operating on raw temporal signals such as vibration data to diagnose rolling element bearing faults in data acquired from electromechanical drive systems. RNN-WDCNN combines elements of recurrent neural networks (RNNs) and convolutional neural networks (CNNs) to capture distant dependencies in time series data and suppress high-frequency noise in the input signals. Experimental results on the benchmark Case Western Reserve University (CWRU) bearing fault dataset show RNN-WDCNN outperforms current state-of-the-art methods in both domain adaptation and noise rejection tasks.
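The intuition behind RNN-WDCNN's wide first kernel — a wide convolution acts as a low-pass filter that averages out high-frequency noise while preserving slow trends — can be illustrated with a uniform kernel in plain Python. The kernel width and signals below are hypothetical, and the network's learned kernels are of course not uniform:

```python
def wide_kernel_filter(signal, width=16):
    """Convolve a 1D signal with a wide uniform kernel ('valid' mode).
    Averaging over a wide window strongly attenuates high-frequency
    components while passing slowly varying ones."""
    kernel = [1.0 / width] * width
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]
```

A maximally high-frequency signal (alternating +1/−1) is cancelled almost entirely by a width-16 window, whereas a constant trend passes through unchanged — the same qualitative noise-rejection effect the paper attributes to its wide first convolutional layer.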

A Novel Algorithm for Damaged Barcode Recognition Based on Deep Learning

In this paper, a barcode recognition algorithm based on deep learning is proposed. The algorithm is aimed at damaged barcode images and at solving their recognition problems. In our proposal, a convolutional neural network is designed for barcode recognition. A dataset containing one hundred thousand barcode images with simulated degradation is generated for model training, and a custom loss function tailored to the barcode task is used to train the model. With the incorporation of max pooling, dropout and batch normalization, the deep learning model shows better performance. For barcode images with various kinds of damage, the recognition success rate reaches up to 99.43%.

Biomedical Data Classification with Improvised Deep Learning Architectures

However, there are many other datapoints that are available alongside medical images, such as omics data, biomarker calculations, patient demographics and history. All these datapoints can enhance disease classification or prediction of progression with the help of machine learning/deep learning modules. However, it is very difficult to find a comprehensive dataset with all the different modalities and features in a healthcare setting due to privacy regulations. Hence, in this thesis, we explore both medical imaging data with clinical datapoints and genomics datasets separately for classification tasks using combinational deep learning architectures. We use deep neural networks with 3D volumetric structural magnetic resonance images of an Alzheimer's Disease dataset for classification of the disease. A separate study is implemented to understand classification based on clinical datapoints achieved by machine learning algorithms. For bioinformatics applications, sequence classification is a crucial step for many metagenomics applications; however, it requires extensive preprocessing such as sequence assembly or sequence alignment before raw whole-genome sequencing data can be used, which is time consuming, especially in bacterial taxonomy classification. There are only a few approaches to sequence classification tasks, mainly involving convolutions and deep neural networks. A novel method is developed using the intrinsic nature of recurrent neural networks for 16S rRNA sequence classification which can be adapted to use read sequences directly. For this classification task, the accuracy is improved using optimization techniques with a hybrid neural network.
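To feed raw 16S rRNA reads directly to a recurrent network, each base is typically one-hot encoded per timestep. A minimal sketch — the encoding scheme here is a common convention, not necessarily the thesis's exact preprocessing:

```python
BASES = "ACGT"

def one_hot_read(read):
    """Encode a DNA read as a list of one-hot vectors over (A, C, G, T);
    ambiguous bases such as 'N' become all-zero vectors. Each vector is
    one timestep of input to a recurrent network."""
    encoding = []
    for base in read.upper():
        vec = [0] * len(BASES)
        if base in BASES:
            vec[BASES.index(base)] = 1
        encoding.append(vec)
    return encoding
```

Because an RNN consumes one timestep at a time, reads of different lengths can be used directly without assembly or alignment, which is the preprocessing saving the paragraph describes.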

New architectures for very deep learning

An important consideration in the context of the generalization abilities of NNs is the role of architecture. For example, it is well known that CNNs are much more suitable for image analysis problems compared to simple fully-connected MLPs. Although MLPs are perfectly capable of matching the training set performance of CNNs, they do not generalize well to test data. The reason for this is that the inductive bias [Mitchell, 1980] of the CNN is more suitable for spatial signals such as images, i.e., the CNN is implicitly encouraged by design to implement the “correct” function for processing images. Of course, the implicitly “correct” functions are chosen by a scientist based on the properties of the domain, and these choices may not be theoretically optimal. Other architectural choices that together build up an NN, such as connectivity, activation function, etc., are also similarly connected to generalization performance. They act as priors over the nature of the functions/programs that are being modeled, and choosing the right prior implies that the underlying function can be learned using much less data. Thus, there is an interesting connection between the modeling capacity of deeper networks (as discussed in Section 2.3) and their generalization that is relevant to this thesis. It has been noted by Bengio et al. [2013] that increasing depth is not only a way to increase the modeling capacity, but can also lead to improved generalization, since deeper NNs can model certain function classes significantly more efficiently than shallower ones.

Comparing different deep learning architectures for classification of chest radiographs

Chest radiographs are among the most frequently acquired images in radiology and are often the subject of computer vision research. However, most of the models used to classify chest radiographs are derived from openly available deep neural networks, trained on large image datasets. These datasets differ from chest radiographs in that they are mostly color images and have substantially more labels. Therefore, very deep convolutional neural networks (CNN) designed for ImageNet and often representing more complex relationships might not be required for the comparably simpler task of classifying medical image data. Sixteen different CNN architectures were compared regarding classification performance on two openly available datasets, the CheXpert and COVID-19 Image Data Collection. Areas under the receiver operating characteristic curves (AUROC) between 0.83 and 0.89 could be achieved on the CheXpert dataset. On the COVID-19 Image Data Collection, all models showed an excellent ability to detect COVID-19 and non-COVID pneumonia, with AUROC values between 0.983 and 0.998. It could be observed that shallower networks may achieve results comparable to their deeper and more complex counterparts with shorter training times, enabling classification performance on medical image data close to state-of-the-art methods even when using limited hardware.

Optimizing Deep CNN Architectures for Face Liveness Detection

In this paper, we developed an optimal solution to the face liveness detection problem. We first applied nonlinear diffusion based on an additive operator splitting scheme and a block-solver tridiagonal matrix algorithm to the captured images. This generated diffused images, with the edge information and surface texture of the real images being more pronounced than fake ones. These diffused images were then fed to a CNN to extract the complex and deep features to classify the images as real or fake. Our implementation with the deep CNN architecture, Inception v4, on the NUAA dataset gave excellent results of 100% accuracy, showing that it can efficiently classify a two-dimensional diffused image as real or fake. Though the CNN-5 and ResNet50 did not give results as good as those of the Inception v4 network, they still returned promising results. A comparative analysis of the three architectures showed that the smoothness of a diffused image is an important factor in determining the liveness of a captured image. We determined that with a low value of this smoothness parameter, the Inception v4 network outperformed the 5-layer CNN and the 50-layer residual network due to its capability of recognizing features at different scales. With a higher value of this smoothness parameter, the 50-layer residual network and the 5-layer CNN performed slightly better than the Inception v4. However, for still higher values of the smoothness parameter, Inception v4 showed better performance. Not only did the Inception v4 outperform the 50-layer residual network and the 5-layer CNN, it also outperformed other state-of-the-art approaches used for face liveness detection. Compared with the Inception v4 network, faster performance was obtained with ResNet50 due to the shortcut paths incorporated in it, and CNN-5 performed still faster because it has fewer layers. 
Our future work will consist of using recurrent neural networks based on Long Short-Term Memory (LSTM) for face liveness detection on video streams.
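The role of the smoothness parameter discussed above can be illustrated with the simplest possible diffusion: explicit linear diffusion steps on a 1D signal. This is a deliberate simplification — the paper's method is nonlinear diffusion solved with an AOS scheme — and `lam`, the step count, and the reflecting boundary are illustrative choices:

```python
def diffuse(u, lam=0.2, steps=1):
    """Explicit linear diffusion on a 1D signal with reflecting boundaries.
    More `steps` (or a larger `lam`, kept below 0.5 for stability) yields a
    smoother signal, analogous to a larger smoothness parameter."""
    u = list(u)
    for _ in range(steps):
        padded = [u[0]] + u + [u[-1]]  # reflect edge values
        u = [ui + lam * (padded[i] - 2 * ui + padded[i + 2])
             for i, ui in enumerate(u)]
    return u
```

Each step spreads sharp peaks into their neighbours while preserving total intensity, which is the trade-off the comparative analysis above examines: mild smoothing makes texture detail more pronounced, while heavy smoothing removes the fine features some architectures rely on.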

Combating HIV with Novel Antibody Architectures

In this thesis, the first chapter provides a history of the discovery of HIV, the origins of the virus, and a description of the HIV genome, focusing primarily on the envelope glycoprotein […]


Learning a deep model for human action recognition from novel viewpoints

Deep Learning Models: Deep learning models [42]–[44] can learn a hierarchy of features by constructing high-level representations from low-level ones. Due to the impressive results of such deep learning on handwritten digit recognition [43], image classification [45] and object detection [46], several methods have been recently proposed to learn deep models for video-based action recognition. Ji et al. [47] extended the deep 2D convolutional neural network (CNN) to 3D, where convolutions are performed on 3D feature maps from spatial and temporal dimensions. Simonyan and Zisserman [48] trained two CNNs, one for RGB images and one for optical flow signals, to learn spatio-temporal features. Gkioxari and Malik [49] extended this approach for action localization. Donahue et al. [50] proposed an end-to-end trainable recurrent convolutional network which processes video frames with a CNN, whose outputs are passed through a recurrent neural network. None of these methods is designed for action recognition in videos acquired from unseen views. Moreover, learning deep models for the task of cross-view action recognition requires a large corpus of training data acquired from multiple views, which is unavailable and very expensive to acquire and label. These limitations motivate us to propose a pipeline for generating realistic synthetic training data and subsequently learn a Robust Non-linear Knowledge Transfer Model (R-NKTM) which can transfer action videos from any view to a high-level space where actions can be matched in a view-invariant way. Although learned from synthetic data, the proposed R-NKTM is able to generalize to real action videos and achieve state-of-the-art results.
