Grid-Based Collaborative Attention VQA Method and Device

Invention Publication

Patent title (original): 基于网格的协同注意力VQA方法和装置
Patent title (English): Collaborative attention VQA method and device based on grids
Application number: CN201910901463.8

Filing date: 2019-09-23
Publication number: CN110704668A

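Publication date: 2020-01-17
Inventor: Fu Ying (付莹)
Applicant: 北京影谱科技股份有限公司
Applicant address: Room 521, 5th Floor, No. 22 Chaowai Street, Chaoyang District, Beijing
Patentee: 北京影谱科技股份有限公司
Current patentee: 北京影谱科技股份有限公司
Current patentee address: Room 521, 5th Floor, No. 22 Chaowai Street, Chaoyang District, Beijing
Agency: 北京万思博知识产权代理有限公司
Agent: Gao Zhen (高镇)
Main classification: G06F16/583
IPC classifications: G06F16/583 ; G06K9/62 ; G06N3/04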

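Abstract (translated from the Chinese original):

The present application discloses a grid-based collaborative attention VQA method and device, belonging to the field of visual question answering. The method comprises: acquiring an image from a dataset and dividing it into a grid; inputting the grid-divided image into an RCNN and obtaining a feature map through convolution, pooling, and feature fusion; acquiring a question from the dataset and mapping it into a vector space to obtain word vectors; computing a correlation matrix from the feature map and the word vectors, and then computing the attention distribution of the feature map and the attention distribution of the word vectors; feeding these into a GRU to compute new word vectors and the corresponding encoding; and using an MLP to integrate the new word vectors and the corresponding encoding to obtain the answer to the question. The device comprises a division module, an RCNN module, a mapping module, a calculation module, a GRU module, and an MLP module. The application enables the image and the question text to attend to each other, improving prediction accuracy and model performance.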

Abstract (English):

The invention discloses a collaborative attention VQA method and device based on grids, and belongs to the field of visual question answering. The collaborative attention VQA method comprises the following steps: acquiring an image from a data set and dividing it into a grid; inputting the grid-divided image into an RCNN, and obtaining a feature map after convolution, pooling and feature fusion; obtaining a question from the data set and mapping the question into a vector space to obtain a word vector; calculating a correlation matrix according to the feature map and the word vector, and then calculating the attention distribution of the feature map and the attention distribution of the word vector; inputting these into a GRU to obtain a new word vector and a corresponding code; and integrating the new word vector and the corresponding code with an MLP to obtain the answer corresponding to the question. The collaborative attention VQA device comprises a division module, an RCNN module, a mapping module, a calculation module, a GRU module and an MLP module. The collaborative attention VQA method and device realize mutual attention between the image and the question text, improve the prediction accuracy, and improve the performance of the model.
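
Illustrative sketch (not part of the patent record): the grid division and the correlation-matrix and attention steps named in the abstract can be pictured with a few lines of NumPy. The abstract gives no formulas, so everything specific here is an assumption: the function names `split_into_grid` and `co_attention`, the bilinear weight matrix `W`, the `tanh` correlation, and the max-pooling used to turn the correlation matrix into two attention distributions are borrowed from common co-attention formulations. Flattened pixel values stand in for the RCNN feature map, and the GRU and MLP stages are omitted.

```python
import numpy as np

def split_into_grid(image, rows, cols):
    """Split an (H, W, C) image into rows*cols equal cells.

    Assumes H is divisible by rows and W by cols (hypothetical helper).
    Returns an array of shape (rows*cols, H//rows, W//cols, C), row-major.
    """
    h, w, c = image.shape
    ch, cw = h // rows, w // cols
    cells = image.reshape(rows, ch, cols, cw, c).swapaxes(1, 2)
    return cells.reshape(rows * cols, ch, cw, c)

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def co_attention(V, Q, W):
    """Assumed co-attention step, not the patent's exact formulation.

    V: (n_cells, d_v) per-cell image features
    Q: (n_words, d_q) word vectors of the question
    W: (d_q, d_v) bilinear weights (learned in a real model)
    """
    C = np.tanh(Q @ W @ V.T)        # (n_words, n_cells) correlation matrix
    a_v = softmax(C.max(axis=0))    # attention distribution over grid cells
    a_q = softmax(C.max(axis=1))    # attention distribution over words
    v_att = a_v @ V                 # attended image feature, shape (d_v,)
    q_att = a_q @ Q                 # attended question feature, shape (d_q,)
    return C, a_v, a_q, v_att, q_att

# Toy usage with random data standing in for real features.
rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
cells = split_into_grid(image, rows=2, cols=2)
V = cells.reshape(len(cells), -1)   # stand-in for RCNN features
Q = rng.normal(size=(5, 16))        # stand-in for 5 word vectors
W = rng.normal(size=(16, V.shape[1]))
C, a_v, a_q, v_att, q_att = co_attention(V, Q, W)
```

In a full pipeline of the kind the abstract describes, the attended features would then pass through a GRU and an MLP to produce the answer; those stages are left out because the record does not describe their configuration.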

Publication/Grant Documents

CN110704668B Grid-based collaborative attention VQA method and device. Publication/grant date: 2022-11-04

IPC classification:

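G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F16/00	Information retrieval; database structures therefor; file system structures therefor
G06F16/50	. of still image data
G06F16/58	. . Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
G06F16/583	. . . using metadata automatically derived from the content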