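Fine-grained visual question-answering method combined with multi-view attention mechanism

Invention publication

CN110717431A Fine-grained visual question-answering method combined with multi-view attention mechanism (legal status: in force)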

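Patent title: Fine-grained visual question-answering method combined with multi-view attention mechanism (original title: 一种结合多视角注意力机制的细粒度视觉问答方法)
Application number: CN201910927585.4
Filing date: 2019-09-27
Publication (announcement) number: CN110717431A
Publication (announcement) date: 2020-01-21
Inventors: Peng Shujuan (彭淑娟), Li Lei (李磊), Liu Xin (柳欣), Fan Wentao (范文涛), Zhong Bineng (钟必能), Du Jixiang (杜吉祥)
Applicant: Huaqiao University (华侨大学)
Applicant address: No. 269 Chenghua North Road, Chengdong, Fengze District, Quanzhou, Fujian Province
Patentee: Huaqiao University (华侨大学)
Current patentee: Huaqiao University (华侨大学)
Current patentee address: No. 269 Chenghua North Road, Chengdong, Fengze District, Quanzhou, Fujian Province
Agency: 厦门市首创君合专利事务所有限公司 (a patent agency in Xiamen)
Agents: Zhang Songting (张松亭); Yang Kai (杨锴)
Main classification: G06K9/00
IPC classifications: G06K9/00; G06K9/32; G06F16/332; G06F16/58; G06F16/583; G06N3/04; G06N3/08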

Abstract:

The invention relates to a fine-grained visual question-answering method combined with a multi-view attention mechanism. Taking full account of the guiding role of the question's specific semantics, it proposes a multi-view attention model that can effectively select multiple salient target regions relevant to the current task objective (the question). The model learns, from multiple views, the region information in the image and in the question text that is relevant to the answer, and extracts regional saliency features from the image under the guidance of the question semantics. This yields a finer-grained feature representation and a strong ability to characterize images that contain several important semantic regions, which makes the multi-view attention model more effective and more comprehensive. As a result, the semantic association between salient image-region features and question features is strengthened, improving the accuracy and comprehensiveness of semantic understanding in visual question answering. The method is described as having simple steps, high efficiency, and high accuracy, and as being suitable for commercial use with good market prospects.
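The abstract names the idea but gives no formulas, so the following is only a minimal sketch of question-guided attention computed under several views, not the patented method. Everything in it is an assumption made for illustration: the function names `softmax` and `multi_view_attention`, the element-wise scoring rule, the per-view weight vectors, the concatenation of the attended features, and the toy data.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_view_attention(regions, question, view_weights):
    """Question-guided attention over image regions, once per view.

    regions:      K region feature vectors, each of length d
    question:     question feature vector of length d
    view_weights: one hypothetical weight vector (length d) per view
    Returns (attention maps, concatenated attended features).
    """
    dim = len(question)
    attention_maps = []
    fused = []
    for weights in view_weights:
        # Score each region by a weighted element-wise match with the question.
        scores = [
            sum(w * r * q for w, r, q in zip(weights, region, question))
            for region in regions
        ]
        alphas = softmax(scores)
        attention_maps.append(alphas)
        # Attended feature: attention-weighted sum of the region vectors.
        attended = [
            sum(a * region[j] for a, region in zip(alphas, regions))
            for j in range(dim)
        ]
        fused.extend(attended)
    return attention_maps, fused


# Toy data (made up): three regions, 2-d features, two views.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question = [1.0, 1.0]
view_weights = [[4.0, 0.0], [0.0, 4.0]]

maps, fused = multi_view_attention(regions, question, view_weights)
for view_index, alphas in enumerate(maps):
    print(view_index, [round(a, 3) for a in alphas])
```

In this toy setup each view emphasizes a different feature dimension, so the two attention maps peak on different regions. That is the sense in which several views can pick out several salient regions for the same question. A real model would learn the weights and would use image and text encoders to produce the features.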

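Publication/grant documents

CN110717431B Fine-grained visual question-answering method combined with multi-view attention mechanism; publication/grant date: 2023-03-24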

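IPC classification:

G	Physics
G06	Computing; calculating or counting
G06K	Graphical data reading (image or video recognition or understanding G06V); presentation of data; record carriers; handling record carriers
G06K9/00	Methods or arrangements for recognising patterns (methods or arrangements for graph-reading or for converting the pattern of mechanical parameters, e.g. force or presence, into electrical signal G06K11/00) (image or video recognition or understanding G06V) (speech recognition G10L15/00)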