Topic-guided model for image captioning system

    Publication number: US11042782B2

    Publication date: 2021-06-22

    Application number: US16473898

    Application date: 2017-03-20

    Abstract: Techniques are provided for training and operation of a topic-guided image captioning system. A methodology implementing the techniques according to an embodiment includes generating image feature vectors, for an image to be captioned, based on application of a convolutional neural network (CNN) to the image. The method further includes generating the caption based on application of a recurrent neural network (RNN) to the image feature vectors. The RNN is configured as a long short-term memory (LSTM) RNN. The method further includes training the LSTM RNN with training images and associated training captions. The training is based on a combination of: feature vectors of the training image; feature vectors of the associated training caption; and a multimodal compact bilinear (MCB) pooling of the training caption feature vectors and an estimated topic of the training image. The estimated topic is generated by an application of the CNN to the training image.
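
The MCB pooling step named in the abstract can be illustrated with a minimal sketch. It assumes the commonly described form of multimodal compact bilinear pooling: each input vector is projected with a count sketch, and the two sketches are combined by circular convolution. The function names, the output dimension, and the seeded random hashes are hypothetical and are not taken from the patent; a practical implementation would typically compute the convolution with FFTs rather than the direct sum used here.

```python
import random


def make_sketch_params(in_dim, out_dim, rng):
    """Draw a random bucket index and a random sign for every input position."""
    hashes = [rng.randrange(out_dim) for _ in range(in_dim)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(in_dim)]
    return hashes, signs


def count_sketch(vec, hashes, signs, out_dim):
    """Project vec into out_dim dimensions by signed accumulation into buckets."""
    sketch = [0.0] * out_dim
    for i, value in enumerate(vec):
        sketch[hashes[i]] += signs[i] * value
    return sketch


def circular_convolution(a, b):
    """Direct O(n^2) circular convolution of two equal-length vectors."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def mcb_pool(x, y, out_dim, seed=0):
    """Approximate the outer product of x and y in out_dim dimensions.

    Assumption: x stands in for caption feature vectors and y for the
    estimated topic vector; the patent does not prescribe this code.
    """
    rng = random.Random(seed)
    hx, sx = make_sketch_params(len(x), out_dim, rng)
    hy, sy = make_sketch_params(len(y), out_dim, rng)
    return circular_convolution(
        count_sketch(x, hx, sx, out_dim),
        count_sketch(y, hy, sy, out_dim),
    )
```

For example, `mcb_pool(caption_vec, topic_vec, out_dim=16)` returns a 16-element vector regardless of the two input lengths, which is the property that makes the pooled representation compact.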

    FEATURE FUSION FOR MULTI-MODAL MACHINE LEARNING ANALYSIS

    Publication number: US20200279156A1

    Publication date: 2020-09-03

    Application number: US16645425

    Application date: 2017-10-09

    Abstract: A system to perform multi-modal analysis has at least three distinct characteristics: an early abstraction layer for each data modality integrating homogeneous feature cues coming from different deep learning architectures for that data modality, a late abstraction layer for further integrating heterogeneous features extracted from different models or data modalities and output from the early abstraction layer, and a propagation-down strategy for joint network training in an end-to-end manner. The system is thus able to consider correlations among homogeneous features and correlations among heterogeneous features at different levels of abstraction. The system further extracts and fuses discriminative information contained in these models and modalities for high performance emotion recognition.

    METHODS AND SYSTEMS USING IMPROVED TRAINING AND LEARNING FOR DEEP NEURAL NETWORKS

    Publication number: US20200026988A1

    Publication date: 2020-01-23

    Application number: US16475075

    Application date: 2017-04-07

    Abstract: Methods and systems are disclosed using improved training and learning for deep neural networks. In one example, a deep neural network includes a plurality of layers, and each layer has a plurality of nodes. For each L layer in the plurality of layers, the nodes of each L layer are randomly connected to nodes in an L+1 layer. For each L+1 layer in the plurality of layers, the nodes of each L+1 layer are connected to nodes in a subsequent L layer in a one-to-one manner. Parameters related to the nodes of each L layer are fixed. Parameters related to the nodes of each L+1 layer are updated, and L is an integer starting with 1. In another example, a deep neural network includes an input layer, an output layer, and a plurality of hidden layers. Inputs for the input layer and labels for the output layer are determined related to a first sample. Similarity between different pairs of inputs and labels of a second sample and the first sample is estimated using a Gaussian regression process.

    METHODS AND SYSTEMS FOR BUDGETED AND SIMPLIFIED TRAINING OF DEEP NEURAL NETWORKS

    Publication number: US20200026965A1

    Publication date: 2020-01-23

    Application number: US16475078

    Application date: 2017-04-07

    Abstract: Methods and systems for budgeted and simplified training of deep neural networks (DNNs) are disclosed. In one example, a trainer is to train a DNN using a plurality of training sub-images derived from a down-sampled training image. A tester is to test the trained DNN using a plurality of testing sub-images derived from a down-sampled testing image. In another example, in a recurrent deep Q-network (RDQN) having a local attention mechanism located between a convolutional neural network (CNN) and a long short-term memory (LSTM), a plurality of feature maps are generated by the CNN from an input image. Hard attention is applied by the local attention mechanism to the generated plurality of feature maps by selecting a subset of the generated feature maps. Soft attention is applied by the local attention mechanism to the selected subset of generated feature maps by providing weights to the selected subset of generated feature maps in obtaining weighted feature maps. The weighted feature maps are stored in the LSTM. A Q value is calculated for different actions based on the weighted feature maps stored in the LSTM.
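
The two attention stages in the second example can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how feature maps are scored or selected, so the per-map `scores`, the top-`k` selection rule, and the softmax weighting are all hypothetical choices, and each feature map is represented as a flat list of numbers.

```python
import math


def hard_attention(scores, k):
    """Hard attention: return the indices of the k highest-scoring feature maps."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # keep the selected maps in their original order


def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(feature_maps, scores, selected):
    """Soft attention: weight each selected feature map by a softmax weight."""
    weights = softmax([scores[i] for i in selected])
    weighted = [
        [w * value for value in feature_maps[i]]
        for w, i in zip(weights, selected)
    ]
    return weighted, weights


def local_attention(feature_maps, scores, k):
    """Apply hard attention, then soft attention, to the CNN feature maps."""
    selected = hard_attention(scores, k)
    weighted, weights = soft_attention(feature_maps, scores, selected)
    return selected, weights, weighted
```

In the system described, the weighted feature maps would then be passed to the LSTM, from which Q values are computed; that part is outside this sketch.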

    Combinatorial shape regression for face alignment in images

    Publication number: US10528839B2

    Publication date: 2020-01-07

    Application number: US15573631

    Application date: 2015-06-26

    Abstract: Combinatorial shape regression is described as a technique for face alignment and facial landmark detection in images. As described, stages of regression may be built for multiple ferns for a facial landmark detection system. In one example, a regression is performed on a training set of images using face shapes, using facial component groups, and using individual face point pairs to learn shape increments for each respective image in the set of images. A fern is built based on this regression. Additional regressions are performed for building additional ferns. The ferns are then combined to build the facial landmark detection system.

    DYNAMIC NEURAL NETWORK SURGERY
    Invention application

    Publication number: US20190188567A1

    Publication date: 2019-06-20

    Application number: US16328689

    Application date: 2016-09-30

    CPC classification number: G06N3/08 G06N3/04 G06N3/0454 G06N3/082

    Abstract: Techniques related to compressing a pre-trained dense deep neural network to a sparsely connected deep neural network for efficient implementation are discussed. Such techniques may include iteratively pruning and splicing available connections between adjacent layers of the deep neural network and updating weights corresponding to both currently disconnected and currently connected connections between the adjacent layers.
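
One iteration of pruning and splicing can be sketched as below. The abstract states only that connections are iteratively pruned and spliced and that weights of both connected and disconnected connections are updated; the magnitude thresholds `prune_below` and `splice_above`, the plain gradient step, and all function names are assumptions made for illustration.

```python
def effective_weights(weights, mask):
    """Weights actually used by the sparse network: pruned entries become zero."""
    return [w * m for w, m in zip(weights, mask)]


def update_mask(weights, mask, prune_below, splice_above):
    """Prune small-magnitude connections and splice large-magnitude ones back.

    Assumption: magnitudes between the two thresholds keep their current state.
    """
    new_mask = []
    for w, m in zip(weights, mask):
        magnitude = abs(w)
        if magnitude < prune_below:
            new_mask.append(0)   # prune: disconnect this connection
        elif magnitude > splice_above:
            new_mask.append(1)   # splice: reconnect this connection
        else:
            new_mask.append(m)   # unchanged
    return new_mask


def surgery_step(weights, mask, grads, lr, prune_below, splice_above):
    """One surgery iteration.

    grads are assumed to be gradients of the loss with respect to the
    masked (effective) weights. They are applied to every weight, including
    currently disconnected ones, so a pruned connection can grow and return.
    """
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    new_mask = update_mask(new_weights, mask, prune_below, splice_above)
    return new_weights, new_mask
```

Because disconnected weights keep receiving updates, a connection pruned early is not lost permanently, which is what distinguishes this scheme from one-shot magnitude pruning.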

    LOW-COST FACE RECOGNITION USING GAUSSIAN RECEPTIVE FIELD FEATURES

    Publication number: US20180082107A1

    Publication date: 2018-03-22

    Application number: US15562133

    Application date: 2015-03-27

    Abstract: Methods and systems may provide for facial recognition of at least one input image utilizing hierarchical feature learning and pair-wise classification. Receptive field theory may be used on the input image to generate a pre-processed multi-channel image. Channels in the pre-processed image may be activated based on the amount of feature rich details within the channels. Similarly, local patches may be activated based on the discriminant features within the local patches. Features may be extracted from the local patches and the most discriminant features may be selected in order to perform feature matching on pair sets. The system may utilize patch feature pooling, pair-wise matching, and large-scale training in order to quickly and accurately perform facial recognition at a low cost for both system memory and computation.
