[PDF] Top 20 Cross-Modal Multistep Fusion Network with Co-Attention for Visual Question Answering

Cross-Modal Multistep Fusion Network with Co-Attention for Visual Question Answering

... the visual question answering ...three question types including yes/no, number and other. For each question, 10 free-response answers are ...

Dynamic Capsule Attention for Visual Question Answering

... In visual question answering (VQA), recent advances have well advocated the use of attention mechanism to precisely link the question to the potential answer ...the question ...

Segmentation Guided Attention Networks for Visual Question Answering

... of Visual Question Answering by using a novel segmentation guided attention based network which we call SegAttend- ...Neural Network to refine our attention maps and use ...

Multi-grained Attention with Object-level Grounding for Visual Question Answering

... sual Question Answering (VQA) to search for visual clues related to the ...train attention models from a coarse-grained association between sentences and images, which tends to fail on ...

Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions?

... the question just by looking at the sharpened re- ...collect attention maps is a verification task as opposed to actual question ...the question correctly, as opposed to showing them just ...

Faithful Multimodal Explanation for Visual Question Answering

... both visual and multimodal ...of attention values and source identifiers’ values in Eq. 2 over time (t) and assign the accumulated attention weight to each corresponding segmentation ...normalize ...

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

... the visual representation ...the visual feature with the language ...the attention weight for each grid ...the attention map to create the attended visual ...multiple attention ...

Visual TTR: Modelling Visual Question Answering in Type Theory with Records

... In this paper, I pay particular attention to the formal core of this system. A necessary aspect of such a model that I have glossed over is the parser. There is no off-the-shelf English to TTR parser, so the model ...

Analyzing the Behavior of Visual Question Answering Models

... Neural Network (CNN) to extract image features) and the other channel processes the question (using Long Short-Term Memory (LSTM) recurrent neural network to obtain question ...and ...

Generating Question Relevant Captions to Aid Visual Question Answering

... generated question-relevant captions help the model to focus on more relevant objects via attention adjustment, we compare the differences between the generated visual attention and human- ...

Structured Two-Stream Attention Network for Video Question Answering

... date, visual question answering (VQA) ...Two-stream Attention network, namely STA, to answer a free-form or open-ended natural language question about the content of a given ...

Differential Networks for Visual Question Answering

... During the data embedding phase, the image features are mapped to the size of 36 × 2048 and the text features are mapped to the size of 2400. In the differential fusion phase, the number of hidden layer in DF is ...

Stacking with Auxiliary Features for Visual Question Answering

... Most VQA systems have a single underlying method that optimizes a specific loss function and do not leverage the advantage of using multiple diverse models. One recent ensembling approach to VQA (Fukui et al., 2016) ...

Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering

... on visual question answering are based on recurrent neural networks (RNNs) with at- ...with Co-attention (PSAC), which does not require RNNs for video question ...of question ...

BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection

... Our fusion model is shown in Figure ...the question, words are preprocessed and then fed into a pretrained Skip-thought encoder (Kiros et ...whole question, as in (Yu et al. 2018). We use a BLOCK ...

Data Augmentation for Visual Question Answering

... Many algorithms have been proposed for VQA. Some notable formulations include attention based methods (Yang et al., 2016; Xiong et al., 2016; Lu et al., 2016; Fukui et al., 2016), Bayesian frameworks (Kafle and ...

Hand in Glove: Deep Feature Fusion Network Architectures for Answer Quality Prediction in Community Question Answering

... Feature Fusion Network (DFFN)” - a novel approach which combines HCF and DL based ...Neural Network (DNN) which takes the question, answer and their metadata as inputs and predicts the ...

Multi-Granularity Hierarchical Attention Fusion Networks for Reading Comprehension and Question Answering

... We use the AdaMax optimizer, with a mini-batch size of 32 and initial learning rate of 0.002. A dropout rate of 0.4 is used for all LSTM layers. To directly optimize our target against the evaluation metrics, we further ...

Utilizing Co-occurrence of Answers in Question Answering

... later questions in the same series, but can not use later questions to help answer earlier questions. This requirement models the dialogue discourse between the user and the QA system. However our experiments on ...

KVQA: Knowledge-Aware Visual Question Answering

... Question answering about image, also popularly known as Visual Question Answering (VQA), has gained huge interest in recent years (Goyal et ...scale Visual Question ...
