Visual question answering using visual knowledge bases

    公开(公告)号:US11663249B2

    公开(公告)日:2023-05-30

    申请号:US16650853

    申请日:2018-01-30

    CPC classification number: G06F16/3329 G06N3/045 G06N3/049 G06N5/025

    Abstract: An example apparatus for visual question answering includes a receiver to receive an input image and a question. The apparatus also includes an encoder to encode the input image and the question into a query representation including visual attention features. The apparatus includes a knowledge spotter to retrieve a knowledge entry from a visual knowledge base pre-built on a set of question-answer pairs. The apparatus further includes a joint embedder to jointly embed the visual attention features and the knowledge entry to generate visual-knowledge features. The apparatus also further includes an answer generator to generate an answer based on the query representation and the visual-knowledge features.

    Slimming of neural networks in machine learning environments

    公开(公告)号:US11537892B2

    公开(公告)日:2022-12-27

    申请号:US16632215

    申请日:2017-08-18

    Abstract: A mechanism is described for facilitating slimming of neural networks in machine learning environments. A method of embodiments, as described herein, includes learning a first neural network associated with machine learning processes to be performed by a processor of a computing device, where learning includes analyzing a plurality of channels associated with one or more layers of the first neural network. The method may further include computing a plurality of scaling factors to be associated with the plurality of channels such that each channel is assigned a scaling factor, wherein each scaling factor to indicate relevance of a corresponding channel within the first neural network. The method may further include pruning the first neural network into a second neural network by removing one or more channels of the plurality of channels having low relevance as indicated by one or more scaling factors of the plurality of scaling factors assigned to the one or more channels.

    Topic-guided model for image captioning system

    公开(公告)号:US11042782B2

    公开(公告)日:2021-06-22

    申请号:US16473898

    申请日:2017-03-20

    Abstract: Techniques are provided for training and operation of a topic-guided image captioning system. A methodology implementing the techniques according to an embodiment includes generating image feature vectors, for an image to be captioned, based on application of a convolutional neural network (CNN) to the image. The method further includes generating the caption based on application of a recurrent neural network (RNN) to the image feature vectors. The RNN is configured as a long short-term memory (LSTM) RNN. The method further includes training the LSTM RNN with training images and associated training captions. The training is based on a combination of: feature vectors of the training image; feature vectors of the associated training caption; and a multimodal compact bilinear (MCB) pooling of the training caption feature vectors and an estimated topic of the training image. The estimated topic is generated by an application of the CNN to the training image.

    LOW-COST FACE RECOGNITION USING GAUSSIAN RECEPTIVE FIELD FEATURES

    公开(公告)号:US20180082107A1

    公开(公告)日:2018-03-22

    申请号:US15562133

    申请日:2015-03-27

    Abstract: Methods and systems may provide for facial recognition of at least one input image utilizing hierarchical feature learning and pair-wise classification. Receptive field theory may be used on the input image to generate a pre-processed multi-channel image. Channels in the pre-processed image may be activated based on the amount of feature rich details within the channels. Similarly, local patches may be activated based on the discriminant features within the local patches. Features may be extracted from the local patches and the most discriminant features may be selected in order to perform feature matching on pair sets. The system may utilize patch feature pooling, pair-wise matching, and large-scale training in order to quickly and accurately perform facial recognition at a low cost for both system memory and computation.

Patent Agency Ranking