    Image augmentation and object detection

    Publication Number: US20210150282A1

    Publication Date: 2021-05-20

    Application Number: US16686051

    Application Date: 2019-11-15

    Abstract: Computing systems may support image classification and image detection services, and these services may utilize object detection/image classification machine learning models. The described techniques provide for normalization of confidence scores corresponding to manipulated target images and for non-max suppression within the range of confidence scores for manipulated images. In one example, the techniques provide for generating different scales of a test image, and the system performs normalization of confidence scores corresponding to each scaled image and non-max suppression per scaled image. These techniques may be used to provide more accurate image detection (e.g., object detection and/or image classification) and may be used with models that are not trained on modified image sets. The model may be trained on a standard (e.g., non-manipulated) image set but used with manipulated target images and the described techniques to provide accurate object detection.
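
    The per-scale flow in this abstract can be sketched in a few lines of Python. In the sketch below, `detector` and `rescale` are hypothetical callables standing in for the model and a resize helper, and min-max normalization plus greedy IoU-based suppression are common choices assumed for illustration; the abstract does not pin down the exact operators.

        import numpy as np

        def normalize_scores(scores):
            # Map one scale's confidences onto [0, 1] (min-max is an assumption;
            # the abstract only says scores are normalized per scaled image).
            lo, hi = scores.min(), scores.max()
            return (scores - lo) / (hi - lo + 1e-8)

        def nms(boxes, scores, iou_thresh=0.5):
            # Greedy non-max suppression, run separately within each scale.
            order = scores.argsort()[::-1]
            keep = []
            while order.size > 0:
                i, rest = order[0], order[1:]
                keep.append(i)
                x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
                y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
                x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
                y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
                inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
                area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
                areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
                order = rest[inter / (area_i + areas - inter + 1e-8) <= iou_thresh]
            return keep

        def detect_multiscale(image, detector, rescale, scales=(0.5, 1.0, 2.0)):
            # Normalize and suppress per scaled image, then map the surviving
            # boxes back into the original image's coordinate frame.
            merged = []
            for s in scales:
                boxes, scores = detector(rescale(image, s))
                scores = normalize_scores(scores)
                merged += [(boxes[k] / s, scores[k]) for k in nms(boxes, scores)]
            return merged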

    Dialogue state tracking using a global-local encoder

    Publication Number: US10929607B2

    Publication Date: 2021-02-23

    Application Number: US15978445

    Application Date: 2018-05-14

    Abstract: A method for maintaining a dialogue state associated with a dialogue between a user and a digital system includes receiving, by a dialogue state tracker associated with the digital system, a representation of a user communication; updating, by the dialogue state tracker, the dialogue state; and providing a system response based on the updated dialogue state. The dialogue state is updated by evaluating, based on the representation of the user communication, a plurality of member scores corresponding to a plurality of ontology members of an ontology set, and selecting, based on the plurality of member scores, zero or more of the plurality of ontology members to add to or remove from the dialogue state. The dialogue state tracker includes a global-local encoder with a global branch and a local branch, the global branch having global trained parameters that are shared among the plurality of ontology members and the local branch having local trained parameters that are determined separately for each of the plurality of ontology members.
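
    A minimal PyTorch sketch of the global-local split: the global branch's parameters are shared across every ontology member, while each member gets its own local parameters. The layer types, sizes, fixed 0.5 mixing weight, and threshold-based selection are illustrative assumptions, not the patented architecture.

        import torch
        import torch.nn as nn

        class GlobalLocalEncoder(nn.Module):
            def __init__(self, n_members, d_in, d_hid):
                super().__init__()
                # Global branch: one set of parameters shared by all ontology members.
                self.global_enc = nn.Linear(d_in, d_hid)
                # Local branch: separate parameters determined per ontology member.
                self.local_enc = nn.ModuleList(
                    nn.Linear(d_in, d_hid) for _ in range(n_members)
                )
                self.scorer = nn.Linear(d_hid, 1)

            def forward(self, utterance_repr, member_idx):
                g = torch.relu(self.global_enc(utterance_repr))
                l = torch.relu(self.local_enc[member_idx](utterance_repr))
                return self.scorer(0.5 * g + 0.5 * l)  # fixed mix, for illustration

        def update_dialogue_state(state, encoder, utterance_repr, n_members, thresh=0.0):
            # Add every ontology member whose score clears the threshold; zero
            # members may be added if nothing scores above it.
            for m in range(n_members):
                if encoder(utterance_repr, m).item() > thresh:
                    state.add(m)
            return state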

    Two-stage online detection of action start in untrimmed videos

    Publication Number: US10902289B2

    Publication Date: 2021-01-26

    Application Number: US16394992

    Application Date: 2019-04-25

    Abstract: Embodiments described herein provide a two-stage online detection of action start system including a classification module and a localization module. The classification module generates a set of action scores corresponding to a first video frame from the video, based on the first video frame and the video frames before the first video frame in the video. Each action score indicates a respective probability that the first video frame contains a respective action class. The localization module is coupled to the classification module for receiving the set of action scores from the classification module and generating an action-agnostic start probability that the first video frame contains an action start. A fusion component is coupled to the classification module and the localization module for generating, based on the set of action scores and the action-agnostic start probability, a set of action-specific start probabilities, each action-specific start probability corresponding to a start of an action belonging to the respective action class.

    Two-Stage Online Detection of Action Start In Untrimmed Videos

    Publication Number: US20200302236A1

    Publication Date: 2020-09-24

    Application Number: US16394992

    Application Date: 2019-04-25

    Abstract: Embodiments described herein provide a two-stage online detection of action start system including a classification module and a localization module. The classification module generates a set of action scores corresponding to a first video frame from the video, based on the first video frame and the video frames before the first video frame in the video. Each action score indicates a respective probability that the first video frame contains a respective action class. The localization module is coupled to the classification module for receiving the set of action scores from the classification module and generating an action-agnostic start probability that the first video frame contains an action start. A fusion component is coupled to the classification module and the localization module for generating, based on the set of action scores and the action-agnostic start probability, a set of action-specific start probabilities, each action-specific start probability corresponding to a start of an action belonging to the respective action class.
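
    A short sketch of the two-stage pipeline described in the two entries above. Here `classifier` and `localizer` are assumed callables standing in for the classification and localization modules, and multiplicative fusion is a guess; the abstract does not specify the fusion operator.

        import numpy as np

        def online_detect_starts(frames, classifier, localizer):
            # Process the stream frame by frame, using only the current frame
            # and the frames before it (online, no look-ahead).
            per_frame = []
            for t in range(len(frames)):
                history = frames[: t + 1]
                action_scores = classifier(history)  # (n_classes,) per-class action scores
                p_start = localizer(action_scores)   # scalar action-agnostic start probability
                # Fusion: per-class start probabilities (product is an assumption).
                per_frame.append(action_scores * p_start)
            return np.stack(per_frame)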

    Neural network based translation of natural language queries to database queries

    Publication Number: US10747761B2

    Publication Date: 2020-08-18

    Application Number: US15885613

    Application Date: 2018-01-31

    Abstract: A computing system uses neural networks to translate natural language queries to database queries. The computing system uses a plurality of machine learning based models, each generating a portion of the database query. The machine learning models use an input representation generated based on terms of the input natural language query, a set of columns of the database schema, and the vocabulary of a database query language, for example, structured query language (SQL). The plurality of machine learning based models may include an aggregation classifier model for determining an aggregation operator in the database query, a result column predictor model for determining the result columns of the database query, and a condition clause predictor model for determining the condition clause of the database query. The condition clause predictor is based on reinforcement learning.
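
    The three-model decomposition can be sketched as three output heads over a shared query representation. The encoder, sizes, operator set, and greedy argmax decoding below are illustrative assumptions; in particular, the patent trains the condition-clause predictor with reinforcement learning, which this sketch omits.

        import torch
        import torch.nn as nn

        class NL2SQLHeads(nn.Module):
            AGG_OPS = ("", "MAX", "MIN", "COUNT", "SUM", "AVG")  # assumed operator set

            def __init__(self, d_repr, n_columns, n_cond_tokens):
                super().__init__()
                self.agg_head = nn.Linear(d_repr, len(self.AGG_OPS))  # aggregation classifier
                self.col_head = nn.Linear(d_repr, n_columns)          # result-column predictor
                self.cond_head = nn.Linear(d_repr, n_cond_tokens)     # condition-clause predictor

            def forward(self, query_repr):
                # query_repr: encoding built from the NL query terms, the schema
                # columns, and the SQL vocabulary, as the abstract describes.
                agg = self.agg_head(query_repr).argmax(-1)
                col = self.col_head(query_repr).argmax(-1)
                cond = self.cond_head(query_repr).argmax(-1)
                return agg, col, cond

    Greedy argmax keeps the sketch small; a real condition clause would normally be decoded token by token rather than in one shot.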

    End-to-end speech recognition with policy learning

    Publication Number: US10573295B2

    Publication Date: 2020-02-25

    Application Number: US15878113

    Application Date: 2018-01-23

    Abstract: The disclosed technology teaches a deep end-to-end speech recognition model, including using multi-objective learning criteria to train the model on training data comprising speech samples temporally labeled with ground truth transcriptions. The multi-objective learning criteria update the model parameters over one thousand to millions of backpropagation iterations by combining, at each iteration, a maximum likelihood objective function that modifies the model parameters to maximize the probability of outputting a correct transcription and a policy gradient function that modifies the model parameters to maximize a positive reward defined based on a non-differentiable performance metric, which penalizes incorrect transcriptions in accordance with their conformity to corresponding ground truth transcriptions. Upon convergence after a final backpropagation iteration, the modified model parameters learned using the multi-objective learning criteria are persisted with the model to be applied to further end-to-end speech recognition.
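
    The combined objective can be written as a single per-iteration loss. The REINFORCE-style gradient estimator and the weight `lam` below are assumptions; `reward` stands in for the non-differentiable performance metric the abstract mentions (e.g., negative word error rate).

        import torch

        def multi_objective_loss(gt_log_prob, sampled_log_prob, reward, lam=0.1):
            # Maximum-likelihood term: raise the probability of outputting
            # the correct (ground truth) transcription.
            ml_loss = -gt_log_prob.mean()
            # Policy-gradient term (REINFORCE): raise the probability of
            # sampled transcriptions in proportion to their reward, which
            # comes from a non-differentiable metric such as negative WER.
            pg_loss = -(reward * sampled_log_prob).mean()
            return ml_loss + lam * pg_loss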

    Spatial attention model for image captioning

    Publication Number: US10558750B2

    Publication Date: 2020-02-11

    Application Number: US15817153

    Application Date: 2017-11-17

    Abstract: The technology disclosed presents a novel spatial attention model that uses current hidden state information of a decoder long short-term memory (LSTM) to guide attention and to extract spatial image features for use in image captioning. The technology disclosed also presents a novel adaptive attention model for image captioning that mixes visual information from a convolutional neural network (CNN) and linguistic information from an LSTM. At each timestep, the adaptive attention model automatically decides how heavily to rely on the image, as opposed to the linguistic model, to emit the next caption word. The technology disclosed further adds a new auxiliary sentinel gate to an LSTM architecture and produces a sentinel LSTM (Sn-LSTM). The sentinel gate produces a visual sentinel at each timestep, which is an additional representation, derived from the LSTM's memory, of long- and short-term visual and linguistic information.
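
    One step of the adaptive attention can be sketched as attention over the K spatial CNN features plus the visual sentinel, where the weight landing on the sentinel plays the role of the gate deciding how heavily to rely on linguistic rather than visual information. The projections, a shared dimension d for features and sentinel, and the sizes are illustrative assumptions; producing the sentinel itself from the Sn-LSTM's auxiliary gate is omitted here.

        import torch
        import torch.nn as nn

        class AdaptiveAttention(nn.Module):
            def __init__(self, d, d_att):
                super().__init__()
                self.w_v = nn.Linear(d, d_att)  # projects spatial CNN features
                self.w_s = nn.Linear(d, d_att)  # projects the visual sentinel
                self.w_h = nn.Linear(d, d_att)  # projects the decoder LSTM hidden state
                self.w_a = nn.Linear(d_att, 1)

            def forward(self, features, sentinel, hidden):
                # features: (B, K, d) spatial grid; sentinel, hidden: (B, d).
                cand = torch.cat(
                    [self.w_v(features), self.w_s(sentinel).unsqueeze(1)], dim=1
                )
                logits = self.w_a(
                    torch.tanh(cand + self.w_h(hidden).unsqueeze(1))
                ).squeeze(-1)
                alpha = torch.softmax(logits, dim=-1)  # weights over K features + sentinel
                values = torch.cat([features, sentinel.unsqueeze(1)], dim=1)
                context = (alpha.unsqueeze(-1) * values).sum(1)
                # alpha[:, -1] is the sentinel's share: how much the decoder
                # leaned on linguistic rather than visual information this step.
                return context, alpha[:, -1]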
