Hierarchical Decomposition of Large Deep Networks

Teaching computers to recognize people and objects from visual cues in images and videos is an interesting challenge. The computer vision and pattern recognition communities have already demonstrated the ability of intelligent algorithms to detect and classify objects under difficult conditions such as pose variation, occlusion and low image fidelity. Recent deep learning approaches in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) are built on very large and deep convolutional neural network architectures. In 2015, such architectures surpassed human performance (94.9% human vs. 95.06% machine) in top-5 validation accuracy on the ImageNet dataset, and earlier this year deep learning approaches demonstrated a remarkable 96.43% accuracy. These successes have been made possible by deep architectures such as VGG, GoogLeNet, and most recently by deep residual models with as many as 152 weight layers. Training these deep models is difficult because it requires compute-intensive learning of millions of parameters. To keep the parameter count of very deep networks manageable, very small 3x3 filters are used in the convolutional layers. Moreover, deep networks generalize well to other datasets and perform strongly even on complex datasets with fewer features or images.
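
To make the 3x3-filter argument concrete, the short Python sketch below compares the weight count of two stacked 3x3 convolutional layers against a single 5x5 layer covering the same receptive field; the channel width of 256 is an arbitrary illustrative choice, not a figure from the text.

```python
# Parameter count of stacked 3x3 convolutions versus a single larger
# filter, for C input and C output channels (bias terms omitted).
# The channel count C = 256 is an illustrative assumption.

def conv_params(k, c_in, c_out):
    """Weights in a single k x k convolutional layer."""
    return k * k * c_in * c_out

C = 256
two_3x3 = 2 * conv_params(3, C, C)   # two stacked 3x3 layers
one_5x5 = conv_params(5, C, C)       # one 5x5 layer, same receptive field

print(f"two 3x3 layers: {two_3x3:,} weights")   # 1,179,648
print(f"one 5x5 layer:  {one_5x5:,} weights")   # 1,638,400
```

The stacked 3x3 layers use roughly 28% fewer weights while also interleaving an extra nonlinearity, which is the usual rationale for this design.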

Hierarchical deep neural networks for MeSH subject prediction

Free assignment also has its limitations in the quality and relevance of assigned labels, as well as a general lack of standards and structure. For medical journals in Medline, trained annotators at the National Library of Medicine (NLM) assign an average of thirteen labels per article [43]. If free assignment were used instead of the MeSH vocabulary, the quality of these labels could easily become problematic when annotators are unfamiliar with the field of research of a given article, which would require authors, who are by default experts in their own work, to label their own articles. While it is certainly easier for authors to attach relevant labels to their own work (being intimately familiar with its nature and content), it becomes difficult when an author has to assign labels that are relevant relative to other documents and their labels in a database. This makes it difficult for information retrieval systems to effectively index and organise articles based on their tags. Automating the label assignment task can save time and, more importantly, ensure that the assigned labels make it easier for digital systems to characterise a document relative to other documents in the database, thereby enabling them to index, organise, and retrieve documents quickly and easily. Additionally, searching a large label space for the most relevant tags is time consuming and quickly becomes impossible for humans in the extreme setting. With such a large influx of papers being published annually, the amount of data that needs labelling will only increase. It is therefore evident that there is much to be gained from developing reliable and accurate techniques for assigning relevant labels to academic papers.

Explaining nonlinear classification decisions with deep Taylor decomposition

Nonlinear machine learning models have become standard tools in science and industry due to their excellent performance even on large, complex and high-dimensional problems. In practice, however, it becomes more and more important to understand the underlying nonlinear model, i.e., to achieve transparency about which aspects of the input make the model decide. To this end, we have contributed novel conceptual ideas for deconstructing nonlinear models. Specifically, we have proposed a novel approach to relevance propagation called deep Taylor decomposition, and used it to assess the importance of single pixels in image classification tasks. We were able to compute heatmaps that clearly and intuitively allow a better understanding of the role of input pixels when classifying an unseen data point. We have shed light on theoretical connections between the Taylor decomposition of a function and rule-based relevance propagation techniques, showing a clear relationship between the two approaches for a particular class of neural networks. We have introduced the concept of a relevance model as a means to scale the analysis to networks with many layers. Our method is stable across different architectures and datasets, and does not require hyperparameter tuning. We would like to stress that our framework can start either from our own trained and carefully tuned neural network model or from an existing pre-trained deep network (e.g., the BVLC CaffeNet [48]) that has already been shown to achieve excellent performance on benchmarks. In both cases, our method provides explanations. In other words, our approach is orthogonal to the quest for enhanced results on benchmarks; in fact, we can take any benchmark winner and then enhance its transparency to the user.
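
As a concrete illustration of propagating relevance layer by layer, the sketch below implements the z+ rule, one of the propagation rules associated with deep Taylor decomposition for ReLU layers. Array shapes, names and values are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

# One relevance-propagation step in the spirit of deep Taylor
# decomposition: the "z+ rule" for a ReLU layer with non-negative
# input activations. Relevance is redistributed from the layer's
# output neurons back onto its inputs, conserving the total.

def zplus_backprop(a, W, R_out, eps=1e-9):
    """Redistribute relevance R_out from a layer's output to its input.

    a     : (n_in,)        non-negative input activations
    W     : (n_in, n_out)  layer weights
    R_out : (n_out,)       relevance assigned to the output neurons
    """
    Wp = np.maximum(W, 0.0)          # keep only excitatory weights
    z = a @ Wp + eps                 # positive pre-activations, (n_out,)
    s = R_out / z                    # relevance per unit of input drive
    return a * (Wp @ s)              # input relevance, sums to sum(R_out)

rng = np.random.default_rng(0)
a = rng.random(4)                    # toy activations
W = rng.standard_normal((4, 3))      # toy weights
R = np.array([0.2, 0.5, 0.3])        # relevance from the layer above
R_in = zplus_backprop(a, W, R)
print(R_in, R_in.sum())              # the sum stays ~1.0, like R
```

Applying such a step recursively from the output down to the pixels yields the per-pixel heatmaps the paragraph describes.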

Phone recognition with hierarchical convolutional deep maxout networks

Similar to DNNs, the early attempts at applying CNNs to speech recognition used the Texas Instruments/Massachusetts Institute of Technology (TIMIT) dataset [7]. In contrast to image processing, in speech recognition the two axes of a spectro-temporal representation play different roles and should be handled differently. The earliest papers applied the convolution only along the frequency axis, arguing that small time-domain shifts are automatically handled by HMMs [7–9]. The supposed benefit of frequency-domain convolution is that it makes the acoustic models more robust to speaker and speaking-style variations. Indeed, all the studies that experimented with frequency-domain convolution found that CNNs consistently outperform fully connected DNNs on the same task [7–10]. Later studies experimented with various parameter settings, network structures, and pooling strategies, including time-domain convolution [8, 11–13]. The experimentation has also been extended to large vocabulary continuous speech recognition (LVCSR) tasks, and the latest results show that CNNs can bring a 12–14% relative improvement in word error rate over DNNs trained on the same LVCSR dataset [14].
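
A minimal PyTorch sketch of the frequency-only convolution described above: the kernel spans several frequency bins but a single time frame, so time shifts are left to the rest of the model. The 40 mel bands, kernel size and pooling are illustrative assumptions, not settings from the cited papers.

```python
import torch
import torch.nn as nn

# Frequency-domain convolution over a spectrogram: convolve and pool
# along the frequency axis only, leaving the time axis untouched.

spec = torch.randn(1, 1, 40, 100)        # (batch, channel, freq, time)

freq_conv = nn.Conv2d(
    in_channels=1, out_channels=32,
    kernel_size=(9, 1),                  # 9 frequency bins x 1 time frame
    padding=(4, 0),                      # preserve the frequency axis size
)
pool = nn.MaxPool2d(kernel_size=(3, 1))  # pool along frequency only

out = pool(torch.relu(freq_conv(spec)))
print(out.shape)                         # torch.Size([1, 32, 13, 100])
```

Note that the time dimension (100 frames) passes through unchanged, which is exactly the design argument in the paragraph.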

Hierarchical aesthetic quality assessment using deep convolutional neural networks

Aesthetic image analysis has attracted much attention in recent years. However, assessing aesthetic quality and assigning an aesthetic score are challenging problems. In this paper, we propose a novel framework for assessing the aesthetic quality of images. Firstly, we divide images into three categories: “scene”, “object” and “texture”. Each category has an associated convolutional neural network (CNN) which learns the aesthetic features for the category in question. The object CNN is trained using the whole images and a salient region of each image. The texture CNN is trained using small regions of the original images. Furthermore, an A&C CNN is developed to simultaneously assess the aesthetic quality and identify the category of images overall. For each CNN, classification and regression models are developed separately to predict the aesthetic class (high or low) and to assign an aesthetic score. Experimental results on a recently published large-scale dataset show that the proposed method outperforms the state-of-the-art methods for each category.
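
The sketch below shows the general pattern of a CNN with separate classification and regression heads, mirroring the per-category design above (aesthetic class plus aesthetic score). The backbone and all layer sizes are illustrative assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

# A CNN trunk shared by two task heads: one classifies aesthetic
# quality (high/low), the other regresses an aesthetic score.

class AestheticCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.cls_head = nn.Linear(32, n_classes)  # high / low quality
        self.reg_head = nn.Linear(32, 1)          # aesthetic score

    def forward(self, x):
        h = self.backbone(x)
        return self.cls_head(h), self.reg_head(h)

model = AestheticCNN()
logits, score = model(torch.randn(4, 3, 64, 64))
print(logits.shape, score.shape)  # torch.Size([4, 2]) torch.Size([4, 1])
```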

Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset

Guided by the success of ImageNet [8], a large-scale image dataset which has favored the recent development of computer vision and its related fields, Google introduced AudioSet [9] in 2017 as a large-scale dataset consisting of more than two million 10-second audio segments directly extracted from YouTube videos. Each audio segment in AudioSet is weakly labeled (i.e., the temporal location of each audio event within the 10-second clip is not available) with the different events contained in it, regardless of the sequential or simultaneous nature of the events. Every label refers to a specific acoustic event class defined in the AudioSet Ontology. This ontology was provided along with the dataset and defines a hierarchical structure of 632 audio event categories (of which 527 are used as labels for the segments in the dataset).
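
A minimal sketch of the weak-labeling scheme just described: each 10-second clip gets a binary multi-hot vector over the 527 usable classes, with no information about where in the clip each event occurs. The class indices used below are made up for illustration.

```python
import numpy as np

# Weak multi-label encoding: one binary vector per audio segment.

N_CLASSES = 527

def encode_labels(class_indices, n_classes=N_CLASSES):
    """Multi-hot vector for one weakly labeled audio segment."""
    y = np.zeros(n_classes, dtype=np.float32)
    y[list(class_indices)] = 1.0
    return y

# A clip labeled with two co-occurring events (hypothetical indices);
# nothing in the vector says when they occur or whether they overlap.
y = encode_labels([0, 74])
print(y.sum(), y.shape)  # 2.0 (527,)
```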

A provably secure cluster-based hybrid hierarchical group key agreement for large wireless ad hoc networks

In this paper, a new scalable NM-CHH-GKA protocol is proposed, based on parallel computing, for large dynamic groups with limited computational capabilities. The novel architectural design of our protocol provides flexibility and reduces the cryptographic workload. The two-level NM-CHH-GKA scheme allows an existing NM-GKA scheme to be deployed at the cluster level, achieving scalability and robustness without sacrificing efficiency. The advantages of hierarchical management include freeing the group controller from looking after every member, enhancing security, and improving scalability, with each cluster requiring minimal space to run the protocol. As a key management technique, the proposed protocol uses a cluster-based hybrid hierarchical scheme that reduces the rekeying workload of the network while confining the effect of a failure to the local cluster without affecting other clusters. Comparative analysis showed that the proposed protocol performs better in terms of both communication and computation costs. Further, we established a formal security model for the proposed NM-CHH-GKA under cryptographic assumptions.
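
To illustrate only the two-level structure (cluster keys first, then a group key among cluster heads, so rekeying stays local to one cluster), here is a toy Python sketch. It is emphatically NOT the paper's NM-CHH-GKA construction and not a secure implementation: the modulus is far too small, the key derivation is a bare hash, and all names are invented for illustration.

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime modulus (illustrative only, far too small)
G = 5            # toy generator

def keypair():
    x = secrets.randbelow(P - 2) + 2
    return x, pow(G, x, P)

def kdf(*values):
    data = b"|".join(str(v).encode() for v in values)
    return hashlib.sha256(data).hexdigest()[:16]

# Level 1: a two-member cluster agrees on a cluster key (DH-style).
a_priv, a_pub = keypair()
b_priv, b_pub = keypair()
cluster_key = kdf(pow(b_pub, a_priv, P))
assert cluster_key == kdf(pow(a_pub, b_priv, P))   # both sides agree

# Level 2: cluster heads repeat the exchange to derive the group key;
# members would receive it protected under their own cluster key, so a
# membership change in one cluster never forces other clusters to rekey.
h1_priv, h1_pub = keypair()
h2_priv, h2_pub = keypair()
group_key = kdf(pow(h2_pub, h1_priv, P))
print("cluster key:", cluster_key, " group key:", group_key)
```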

Deep Logic Networks: Inserting and Extracting Knowledge from Deep Belief Networks

Symmetrical systems such as Markov networks and recurrent temporal restricted Boltzmann machines have also been used for neural-symbolic integration. In [36], a method is presented for encoding background knowledge into a template Markov network (named a Markov Logic Network, MLN) which is used to create a ground Markov network representing the relationships between all the instances in the data. The idea of representing each formula as a clique of a Markov network is similar to that of Penalty Logic, which was proposed to integrate symbolic knowledge and Hopfield networks [32]. The difference is that in MLNs a feature is defined by the number of true groundings of the formulas corresponding to a clique in the template model, while in Penalty Logic [32] a feature is defined by the multiplication of the variables in the clique. In practice, MLNs work well in a variety of relational domains; however, the models learned are not as comprehensible as one would expect from a symbolic model, due to the size of the ground Markov network and the exponential nature of such grounding. A more recent development in neural-symbolic integration is the Neural-Symbolic Cognitive Agent (NSCA) introduced in [17], in which a model based on recurrent temporal restricted Boltzmann machines (RTRBMs) is proposed to represent temporal symbolic knowledge and applied to online learning and reasoning. The NSCA model contains algorithms for learning and extracting temporal logic rules by sampling the RTRBM. It has been applied successfully to driving assessment and training in simulators. For decades, neural networks have been used successfully as a learning model from which symbolic rules can be extracted through the use of knowledge extraction algorithms [29], [8], [9]. However, most extraction algorithms exist for discriminative models, which do not support modularity. We argue that the modularity found in deep networks may facilitate knowledge extraction, in particular improving the efficiency of extraction from large networks. Most such discriminative extraction approaches treat the class variables as a special type of variable. As a result, the extracted rules may help explain the relationships between the other variables and the class variables, but not the relationships that might exist among those other variables. For example, suppose a discriminative neural network was trained perfectly to learn the XOR function (denoted by ⊕ below) from its truth table. Discriminative knowledge extraction might produce the rule x3 ↔ x1 ⊕ x2, with x3 as the class variable,
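
A quick Python enumeration makes the XOR example concrete: the extracted rule holds on every row of the truth table, yet it only relates the class variable x3 to the inputs; it says nothing about relationships among the non-class variables themselves. The snippet is a plain check, not an extraction algorithm.

```python
from itertools import product

# Verify the extracted rule x3 <-> x1 XOR x2 over the full truth table.

for x1, x2 in product([0, 1], repeat=2):
    x3 = x1 ^ x2                          # class variable: XOR of inputs
    rule_holds = (x3 == 1) == ((x1 ^ x2) == 1)
    print(f"x1={x1} x2={x2} x3={x3}  rule holds: {rule_holds}")
```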

Hierarchical Variation in Cellulose Decomposition Across Southern Ontario Reference Streams

Contrasting effects of habitat on decomposition may be the result of regionally-scaled differences in habitat structure. Indeed, I observed inter-regional differences in the effect of habitat on decomposition rate in my study. For example, the finer sand and silt sediments of the Norfolk Sand Plains (NSP) did not produce the same distinction between habitat types as seen in the other two regions. Both riffle and pool habitats in the NSP consisted of similar substrates (i.e., sand), and similar amounts of substrate burial were observed between habitats on retrieval. In contrast, the Algonquin Highlands (AH) streams had distinct differences in substrate size and hydraulic condition with larger cobble and gravel in riffle habitats compared to predominantly fine sediments and organic debris in pools.

Detecting context abusiveness using hierarchical deep learning

We have tackled the problem of detecting abusiveness in text that contains no abusive words, using deep learning. We have designed a hierarchical deep learning model that extracts global features from long sentences. We have also proposed an ensemble model that combines two classifiers extracting local and global features. Finally, we have combined our model for context abusiveness with an abusive lexicon method. We have evaluated the proposed system on Wikipedia, Facebook and Twitter datasets. The experimental results confirm that our hierarchical model outperforms on implicitly abusive sentences of more than 100 words. The ensemble model outperforms the baselines as well as the state of the art in most cases. The combination of an abusive lexicon and a deep learning model shows the best performance in comparison to the individual methods.
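
A minimal sketch of the ensemble idea above: one classifier scores local features, another scores global, document-level features, and their probabilities are combined by soft voting. The probabilities below are hard-coded stand-ins for real model outputs, and the weighting is an illustrative assumption.

```python
# Soft-voting combination of a local-feature and a global-feature
# abusiveness classifier.

def ensemble_abusive(p_local, p_global, w=0.5, threshold=0.5):
    """Weighted average of two classifiers' abusiveness probabilities."""
    p = w * p_local + (1.0 - w) * p_global
    return p, p >= threshold

# A sentence with no abusive words: the local model is unsure, but the
# global, context-level model is confident.
p, is_abusive = ensemble_abusive(p_local=0.35, p_global=0.85)
print(f"combined p={p:.2f}, abusive={is_abusive}")  # p=0.60, abusive=True
```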

TDSNN: From Deep Neural Networks to Deep Spike Neural Networks with Temporal-Coding

Unlike SNNs, deep neural networks (DNNs) have achieved state-of-the-art results on many complex tasks such as image recognition (Krizhevsky, Sutskever, and Hinton 2012; Krizhevsky 2009; Simonyan and Zisserman 2014; He et al. 2015), speech recognition (Abdel-Hamid et al. 2012; Sainath et al. 2013; Hinton et al. 2012), natural language processing (Kim 2014; Severyn and Moschitti 2015), and so on. But their heavy computational load pushes researchers to find more efficient ways to deploy them on mobile or embedded systems. This inspires SNN researchers with the idea that a fully trained DNN might be slightly tuned and directly converted to an SNN without a complicated training procedure. Beginning with the work of (Perezcarrasco et al. 2013), where DNN units were translated into biologically inspired spiking units with leaks and refractory periods, continuous efforts have been made to realize this idea. After a series of successes in transferring deep networks like LeNet and VGG-16 (Cao, Chen, and Khosla 2015; Diehl et al. 2015; Rueckauer et al. 2017), rate-coding-based SNNs can now achieve state-of-the-art performance with minor accuracy loss, even in the conversion of complicated layers like Max-Pool, BatchNorm and SoftMax.
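
The conversion principle behind rate coding can be sketched in a few lines: a ReLU unit's analog activation is approximated by the firing rate of a spiking unit over a time window. The window length, scaling and generator below are illustrative assumptions, not a conversion pipeline from the cited works.

```python
import numpy as np

# Rate coding: approximate an analog activation by the mean firing
# rate of a Bernoulli spike train over t_steps simulation steps.

rng = np.random.default_rng(0)

def rate_code(activation, t_steps=1000, max_rate=1.0):
    """Spike train whose mean rate tracks the (normalized) activation."""
    p = np.clip(activation, 0.0, 1.0) * max_rate   # spike prob. per step
    return rng.random(t_steps) < p                 # boolean spike train

a = 0.37                                           # normalized ReLU output
spikes = rate_code(a)
print(f"target rate {a:.2f}, observed {spikes.mean():.2f}")
```

The longer the window, the closer the observed rate tracks the activation, which is why converted rate-coded SNNs trade latency for accuracy.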

Learning (from) Deep Hierarchical Structure among Features

When the edge weights along each path from the root to a leaf node sum to 1, the proposed objective function can be proved to be convex no matter what the height of the hierarchical structure is. Moreover, when all the exponents take the same value, we can show that the proposed objective function is equivalent to a problem with a hierarchical group lasso regularization term. To optimize the objective function of the DHS method, we adopt the FISTA algorithm (Beck and Teboulle 2009), each of whose subproblems has an efficient analytical solution. Moreover, in the proposed DHS method, the exponents of the edge weights need to be set based on a priori information. When this information is not available, by default we set them all to be identical. This strategy usually works, but it may be suboptimal. To alleviate this problem, we propose a variant of the DHS method that learns the exponents from data.
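
For reference, here is a compact FISTA sketch applied to a plain lasso problem rather than the paper's hierarchical regularizer; the point is the structure the text relies on, a gradient step followed by a cheap analytical proximal step. The data and regularization weight are illustrative assumptions.

```python
import numpy as np

# FISTA (Beck and Teboulle 2009) for min_w 0.5||Xw - y||^2 + lam||w||_1.
# The proximal operator of the l1 term is soft-thresholding, which is
# the kind of closed-form subproblem the DHS paragraph refers to.

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista_lasso(X, y, lam, n_iter=200):
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the grad
    w = z = np.zeros(X.shape[1])
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y)
        w_new = soft_threshold(z - grad / L, lam / L)  # proximal step
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = w_new + ((t - 1) / t_new) * (w_new - w)    # momentum step
        w, t = w_new, t_new
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10); w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(50)
print(np.round(fista_lasso(X, y, lam=1.0), 2))  # sparse, close to w_true
```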

Mobility Management for Hierarchical Wireless Networks

Mario Gerla received his graduate degree in electrical engineering from Politecnico di Milano, Italy, in 1966 and his M.S. and Ph.D. degrees in computer science from the University of California, Los Angeles, in 1970 and 1973, respectively. From 1973 to 1976, Dr. Gerla was a manager at Network Analysis Corporation, Glen Cove, NY, where he was involved in several computer network design projects for both government and industry, including performance analysis and topological updating of the ARPANET under a contract from DoD. From 1976 to 1977, he was with Tran Telecommunication, Los Angeles, CA, where he participated in the development of an integrated packet and circuit network. Since 1977, he has been on the faculty of the Computer Science Department at UCLA. His research interests include the design, performance evaluation, and control of distributed computer communication systems and networks. His current research projects cover the following areas: design and performance evaluation of protocols and control schemes for ad hoc wireless networks; routing, congestion control and bandwidth allocation in wide area networks; and traffic measurement and characterization.

On the Skewed Degree Distribution of Hierarchical Networks

Evolutionary considerations of real-world networks, however, show the emergence of scale-free behavior (i.e., networks exhibiting a power-law degree distribution) as a result of hierarchical attachment processes, which are not reflected in current preferential attachment models [3], [4]. Apart from these modeling restrictions, another problem inherent to existing binary models lies in their explanatory capabilities. For instance, they fail to capture connection strengths between individuals, a property that is at the core of behavioral emergence in real networks [17], [18], [19], [20], [21].
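
To ground the power-law terminology, the sketch below generates a classical preferential-attachment (Barabási–Albert) graph and crudely estimates the exponent of its degree distribution P(k) ~ k^(-gamma). The graph size, tail cutoff and fitting method are illustrative assumptions; the paragraph's point is that hierarchical attachment can produce similar tails that this baseline model family does not account for.

```python
import numpy as np
import networkx as nx

# Degree distribution of a Barabasi-Albert graph, with a rough
# log-log slope estimate of the power-law exponent.

G = nx.barabasi_albert_graph(n=20_000, m=3, seed=0)
deg = np.array([d for _, d in G.degree()])

ks, counts = np.unique(deg, return_counts=True)
mask = ks >= 5                                   # ignore the distribution head
gamma = -np.polyfit(np.log(ks[mask]), np.log(counts[mask]), 1)[0]
print(f"estimated power-law exponent: {gamma:.1f}")  # roughly 3 for BA
```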

Hierarchical Attention Networks for Document Classification

Figure 5 shows that our model can select words carrying strong sentiment, like delicious, amazing, terrible, and their corresponding sentences. Sentences containing many words like cocktails, pasta, entree are disregarded. Note that our model can not only select words carrying strong sentiment; it can also deal with complex cross-sentence context. For example, the first document of Figure 5 contains the sentence i don't even like scallops: looking purely at the single sentence, we might take it as a negative comment, but our model looks at the context of this sentence, figures out that this is a positive review, and chooses to ignore it. Our hierarchical attention mechanism also works well for topic classification on the Yahoo Answers dataset. For example, for the left document in Figure 6 with label 1, which denotes Science and Mathematics, our model accurately localizes the words zebra, strips, camouflage, predator and their corresponding sentences. For the right document with label 4, which denotes Computers and Internet, our model focuses on web, searches, browsers and their corresponding sentences. Note that this happens in a multiclass setting, that is, detection happens before the selection of the topic!
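
The two-level mechanism behind these visualizations can be sketched compactly: attention first weighs words within each sentence, then weighs the resulting sentence vectors within the document. The encoders are omitted and all dimensions below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

# One attention module, applied twice: over words within a sentence,
# then over sentence vectors within the document.

class Attention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Linear(dim, 1, bias=False)  # learned context vector

    def forward(self, h):                  # h: (..., seq_len, dim)
        scores = self.context(torch.tanh(self.proj(h)))  # (..., seq_len, 1)
        alpha = torch.softmax(scores, dim=-2)            # attention weights
        return (alpha * h).sum(dim=-2), alpha            # weighted sum

dim, n_sent, n_words = 64, 5, 12
word_h = torch.randn(n_sent, n_words, dim)     # word encodings per sentence

word_attn, sent_attn = Attention(dim), Attention(dim)
sent_vecs, word_weights = word_attn(word_h)    # (n_sent, dim)
doc_vec, sent_weights = sent_attn(sent_vecs)   # (dim,)
print(doc_vec.shape, sent_weights.squeeze(-1)) # 5 sentence weights sum to 1
```

The word-level weights are what highlight terms like delicious or terrible; the sentence-level weights are what let the model discount a misleading sentence such as i don't even like scallops.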

Progressive Retrieval and Hierarchical Visualization of Large Remote Data

GridFTP allows for server-side data processing, which we utilize for data filtering. The GridFTP protocol, as an extension to the standard FTP protocol, is well known and reliable.

Large-scale Structural Reranking for Hierarchical Text Categorization

Hierarchical SVM (HSVM): this method, suggested in [22], solves a series of max-cut problems: an undirected class graph with nonnegative edge weights is cut into two subgroups such that the cut between the two subgroups receives the maximum weight. A binary SVM is then applied to the resulting two-group problem. This approach is applied recursively to the two decomposed subgroups until pure leaf nodes, which have only one class label, are obtained. It has been shown that HSVM uses distance measures to weigh and exploit the natural class groupings. The hierarchical graph structure results in a fast and intuitive SVM training process that requires little running time and gives high classification accuracy and good generalization [22]; a minimal sketch of this recursive decomposition follows below. • Error correcting output codes (ECOC): proposed by [38]. It works by training
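
A minimal sketch of the recursive decomposition behind HSVM: classes are split into two groups at each node and a binary SVM decides which branch to descend. For brevity, the split below is an arbitrary halving of the class list, standing in for the max-cut weighting used in [22]; the data are synthetic.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

def build_tree(X, y, classes):
    """Recursively split the class set and train a binary SVM per node."""
    if len(classes) == 1:
        return classes[0]                          # pure leaf
    left = classes[: len(classes) // 2]
    right = classes[len(classes) // 2 :]
    is_left = np.isin(y, left)
    clf = LinearSVC(dual=False).fit(X, np.where(is_left, 0, 1))
    return (clf,
            build_tree(X[is_left], y[is_left], left),
            build_tree(X[~is_left], y[~is_left], right))

def predict_one(node, x):
    """Descend the SVM tree from the root to a pure leaf."""
    while isinstance(node, tuple):
        clf, left, right = node
        node = left if clf.predict(x[None, :])[0] == 0 else right
    return node

X, y = make_classification(n_samples=600, n_classes=4, n_informative=8,
                           random_state=0)
tree = build_tree(X, y, classes=[0, 1, 2, 3])
preds = np.array([predict_one(tree, x) for x in X])
print(f"training accuracy: {(preds == y).mean():.2f}")
```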

A Discriminative Hierarchical Model for Fast Coreference at Large Scale

The number of mentions in each entity cluster is also large. Current systems cope with this by either dividing the data into blocks to reduce the search space (Hernández and Stolfo, 1995; McCallum et al., 2000; Bilenko et al., 2006), using fixed heuristics to greedily compress the mentions (Ravin and Kazi, 1999; Rao et al., 2010), employing specialized Markov chain Monte Carlo procedures (Milch et al., 2006; Richardson and Domingos, 2006; Singh et al., 2010), or introducing shallow hierarchies of sub-entities for MCMC block moves and super-entities for adaptive distributed inference (Singh et al., 2011). However, while these methods help manage the search space for medium-scale data, evaluating each coreference decision in many of these systems still scales linearly with the number of mentions in an entity, resulting in prohibitive computational costs on large datasets. This scaling with the number of mentions per entity seems particularly wasteful because, although it is common for an entity to be referenced by a large number of mentions, many of these coreferent mentions are highly similar to each other. For example, in author coreference the two most common strings that refer to Richard Hamming might have the form “R. Hamming” and “Richard Hamming.” In newswire coreference, a prominent entity like Barack Obama may have millions of “Obama” mentions (many occurring in similar semantic contexts). Deciding whether
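
A tiny sketch of the observation the excerpt builds on: many coreferent mentions are near-duplicates, so grouping identical (normalized) strings collapses the workload before any pairwise entity decisions are made. The mention list and the lowercase normalization are fabricated for illustration, not the paper's model.

```python
from collections import Counter

# Compress raw mentions into distinct normalized strings with counts.

mentions = ["Richard Hamming", "R. Hamming", "R. Hamming",
            "Richard Hamming", "richard hamming", "R. Hamming"]

compressed = Counter(m.lower() for m in mentions)
print(compressed)
# Counter({'r. hamming': 3, 'richard hamming': 3})
# 6 raw mentions -> 2 representative strings, so downstream coreference
# decisions can scale with distinct strings rather than raw mentions.
```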

On Flat versus Hierarchical Classification in Large-Scale Taxonomies

The procedure for collecting training data is repeated for the MLR and SVM classifiers, resulting in three meta-datasets of 119 (19 positive, 100 negative), 89 (34 positive, 55 negative) and 94 (32 positive, 62 negative) examples respectively. For the binary classifiers, we used AdaBoost with a random forest as the base classifier, setting the number of trees to 20, 50 and 50 for the MLR and SVM classifiers respectively and leaving the other parameters at their default values. Several values were tested for the number of trees ({10, 20, 50, 100, 200}), the depth of the trees ({unrestricted, 5, 10, 15, 30, 60}), and the number of iterations in AdaBoost ({10, 20, 30}). The final values were selected by cross-validation on the training set (LSHTC2-1 and LSHTC2-2) as the ones that maximized accuracy and minimized the false-positive rate, in order to prevent degradation of accuracy. We compare the fully flat classifier (FL) with the fully hierarchical (FH) top-down Pachinko machine, a random pruning (RN) and the proposed pruning method (PR). For the random pruning we restrict the procedure to the first two levels and perform 4 random prunings (this is the average number of prunings performed by our approach). For each dataset we perform 5 independent runs of the random pruning and record the best performance. For MLR and SVM, we use the LibLinear library [8] and apply the L2-regularized versions, setting the penalty parameter C by cross-validation.
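
A minimal scikit-learn sketch of the meta-classifier setup described above: AdaBoost with a random forest as the base estimator, selecting the number of trees by cross-validation. The synthetic data, candidate grid and class imbalance are illustrative stand-ins for the meta-datasets, and the `estimator` keyword assumes scikit-learn 1.2+ (older versions used `base_estimator`).

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Imbalanced toy data roughly matching the meta-dataset sizes above.
X, y = make_classification(n_samples=120, weights=[0.8], random_state=0)

for n_trees in (10, 20, 50):
    base = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    clf = AdaBoostClassifier(estimator=base, n_estimators=20,
                             random_state=0)
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{n_trees:3d} trees per forest: CV accuracy {acc:.2f}")
```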

Progressive Retrieval and Hierarchical Visualization of Large Remote Data

In the future, these stages may also be executed close to the data source, on the supercomputer itself. This would be the most efficient way to handle large simulation data, since the amount of data to be transferred during the later stages of the visualization pipeline typically decreases significantly. Completely changed access patterns to remote data can significantly reduce the amount of data transferred. Visualization algorithms using such patterns [17], in particular for large data, are seen as use cases for the presented work. The best prospects for deploying such scenarios lie in environments containing PC-cluster-based supercomputers. Here, adding commodity graphics boards to all nodes does not increase the total cost significantly, but allows high-performance image rendering. These types of clusters are becoming increasingly common, but are still rare in the Top500 [18]. For the collaborative and highly interactive visualization scenario we envision, the feedback to the remote and distributed rendering system becomes important, and complex. Also, perhaps most importantly, the field currently lacks sufficiently flexible software solutions able to realize such scenarios. Promising approaches do exist in work such as [19, 20, 21], and we expect major progress in this field over the next decade.
