-
Publication No.: US20230133690A1
Publication Date: 2023-05-04
Application No.: US17453070
Filing Date: 2021-11-01
Applicant: Salesforce.com, inc.
Inventor: Mingfei Gao , Ran Xu
IPC: G06K9/00 , G06F40/174 , G06F40/205 , G06N20/00
Abstract: An application server may receive an input document including a set of input text fields and an input key phrase querying a value for a key-value pair that corresponds to one or more of the set of input text fields. The application server may extract, using an optical character recognition model, a set of character strings and a set of two-dimensional locations of the set of character strings on a layout of the input document. After extraction, the application server may input the extracted set of character strings and the set of two-dimensional locations into a machine learned model that is trained to compute a probability that a character string corresponds to the value for the key-value pair. The application server may then identify the value for the key-value pair corresponding to the input key phrase and may output the identified value.
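The pipeline in this abstract can be pictured with a minimal sketch: OCR yields strings with two-dimensional positions, each candidate string is scored against the key phrase, and a softmax turns the scores into probabilities. The abstract does not describe the trained model, so `toy_score` below is a hypothetical stand-in (it simply prefers tokens to the right of or below the key), and the `Token` type and sample coordinates are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    x: float  # left edge on the page layout
    y: float  # top edge on the page layout


def toy_score(key: Token, cand: Token) -> float:
    """Hypothetical stand-in for the learned model (not the patented method)."""
    dx, dy = cand.x - key.x, cand.y - key.y
    if dx < 0 or dy < 0:
        return -10.0  # tokens above or to the left of the key are unlikely values
    return -math.hypot(dx, dy) / 100.0  # nearer tokens score higher


def value_probabilities(key_phrase, tokens):
    """Return (text, probability) for every token other than the key itself."""
    key = next(t for t in tokens if t.text.lower() == key_phrase.lower())
    cands = [t for t in tokens if t is not key]
    scores = [toy_score(key, c) for c in cands]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    return [(c.text, e / total) for c, e in zip(cands, exps)]


# Assumed OCR output for a small invoice-like layout.
tokens = [
    Token("Invoice", 40, 20),
    Token("Total", 40, 300),
    Token("$120.00", 160, 300),
    Token("Thanks", 40, 520),
]
probs = value_probabilities("Total", tokens)
best = max(probs, key=lambda p: p[1])[0]
print(best)
```

In a real system the score would come from a model trained on both text and layout, but the surrounding steps (candidate listing, probability computation, picking the most probable value) would look similar.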
-
Publication No.: US20220300761A1
Publication Date: 2022-09-22
Application No.: US17328779
Filing Date: 2021-05-24
Applicant: salesforce.com, inc.
Inventor: Shu Zhang , Chetan Ramaiah , Caiming Xiong , Ran Xu
Abstract: Embodiments described herein provide a hierarchical multi-label framework to learn an embedding function that may capture the hierarchical relationship between classes at different levels in the hierarchy. Specifically, a supervised contrastive learning framework may be extended to the hierarchical multi-label setting. Each data point has multiple dependent labels, and the relationship between labels is represented as a hierarchy of labels. The relationship between the different levels of labels may then be learned by a contrastive learning framework.
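One way to picture the hierarchical multi-label setting: each sample's labels form a path from the coarsest to the finest level, and the number of leading levels two samples share decides at which levels they can serve as a positive pair. The sketch below only builds those positive sets. The contrastive loss itself, and how the described framework weights the levels, are not specified in the abstract and are not shown. The label names and helper functions are illustrative assumptions.

```python
def shared_depth(path_a, path_b):
    """Number of leading hierarchy levels on which two label paths agree."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth


def positives_at_level(labels, anchor, level):
    """Indices sharing the anchor's labels down to `level` (1 = coarsest)."""
    return [
        i
        for i, path in enumerate(labels)
        if i != anchor and shared_depth(labels[anchor], path) >= level
    ]


# Assumed three-level label paths: category, subcategory, product type.
labels = [
    ("clothing", "shoes", "sneakers"),
    ("clothing", "shoes", "boots"),
    ("clothing", "shirts", "tees"),
    ("electronics", "phones", "android"),
]

for level in (1, 2, 3):
    print(level, positives_at_level(labels, anchor=0, level=level))
```

The positive set shrinks as the level gets finer, which is the structure a level-aware contrastive objective could exploit.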
-
Publication No.: US11347708B2
Publication Date: 2022-05-31
Application No.: US16680302
Filing Date: 2019-11-11
Applicant: salesforce.com, inc.
Inventor: Ankit Chadha , Zeyuan Chen , Caiming Xiong , Ran Xu , Richard Socher
Abstract: Embodiments described herein provide unsupervised density-based clustering to infer table structure from a document. Specifically, a number of words are identified from a block of text in a noneditable document, and the spatial coordinates of each word relative to the rectangular region are identified. Based on the word density of the rectangular region, the words are grouped into clusters using a heuristic radius search method. Words that are grouped into the same cluster are determined to be elements that belong to the same cell. In this way, the cells of the table structure can be identified. Once the cells are identified based on the word density of the block of text, the identified cells can be expanded horizontally or grouped vertically to identify rows or columns of the table structure.
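The grouping of words into cells can be illustrated with a simple radius-based region-growing sketch: starting from a seed word, every word whose center lies within a radius of a word already in the group joins that group. The abstract does not say how the radius is derived from the word density, so the fixed `radius` parameter here is an assumption, as are the sample coordinates and the `cluster_words` helper.

```python
import math


def cluster_words(words, radius):
    """Group words whose centers are chained together within `radius`.

    `words` is a list of (text, x, y) tuples. Returns lists of word indices.
    """
    clusters = []
    unvisited = set(range(len(words)))
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        group, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            _, xi, yi = words[i]
            # Radius search around the current word among unassigned words.
            near = [
                j
                for j in unvisited
                if math.hypot(words[j][1] - xi, words[j][2] - yi) <= radius
            ]
            for j in near:
                unvisited.remove(j)
            group.extend(near)
            frontier.extend(near)
        clusters.append(sorted(group))
    return clusters


# Assumed word centers from a two-column table fragment.
words = [
    ("Unit", 10, 10),
    ("price", 40, 10),
    ("Qty", 200, 10),
    ("4.50", 12, 60),
    ("3", 200, 60),
]
cells = [
    " ".join(words[i][0] for i in cluster)
    for cluster in cluster_words(words, radius=35)
]
print(cells)
```

With these coordinates, "Unit" and "price" fall into one cell while the remaining words each form their own. Grouping cells into rows and columns would be a separate step on top of this.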
-
Publication No.: US20210150282A1
Publication Date: 2021-05-20
Application No.: US16686051
Filing Date: 2019-11-15
Applicant: salesforce.com, inc.
Inventor: Ankit Chadha , Caiming Xiong , Ran Xu
Abstract: Computing systems may support image classification and image detection services, and these services may utilize object detection/image classification machine learning models. The described techniques provide for normalization of confidence scores corresponding to manipulated target images and for non-max suppression within the range of confidence scores for manipulated images. In one example, the techniques provide for generating different scales of a test image, and the system performs normalization of confidence scores corresponding to each scaled image and non-max suppression per scaled image. These techniques may be used to provide more accurate image detection (e.g., object detection and/or image classification) and may be used with models that are not trained on modified image sets. The model may be trained on a standard (e.g., non-manipulated) image set but used with manipulated target images and the described techniques to provide accurate object detection.
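A minimal sketch of the per-scale processing described here: detections from each scaled copy of the test image have their confidence scores normalized within that scale, and non-max suppression is then applied per scale. The abstract does not state which normalization is used, so the min-max scheme below is an assumption. The detections are made up, and their boxes are assumed to be already mapped back to the original image coordinates.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def normalize(dets):
    """Min-max normalize scores within one scaled image (assumed scheme)."""
    scores = [s for _, s in dets]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(box, (s - lo) / span if span > 0 else 1.0) for box, s in dets]


def nms(dets, iou_thresh=0.5):
    """Greedy non-max suppression: keep the best box, drop heavy overlaps."""
    kept = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections keyed by image scale: (box, raw confidence).
per_scale = {
    1.0: [((10, 10, 50, 50), 0.90), ((12, 12, 52, 52), 0.60),
          ((100, 100, 140, 140), 0.30)],
    0.5: [((11, 9, 51, 49), 0.40), ((98, 102, 138, 142), 0.20)],
}
results = {scale: nms(normalize(dets)) for scale, dets in per_scale.items()}
for scale, kept in results.items():
    print(scale, kept)
```

Normalizing within each scale puts the scores from differently scaled copies on a comparable range before any later merging step, which is one plausible reason for normalizing per scaled image.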
-
Publication No.: US11710077B2
Publication Date: 2023-07-25
Application No.: US17457163
Filing Date: 2021-12-01
Applicant: Salesforce.com, Inc.
Inventor: Ankit Chadha , Caiming Xiong , Ran Xu
IPC: G06N20/00 , G06T3/40 , G06T3/60 , G06N3/04 , G06N3/08 , G06T3/20 , G06F18/21 , G06F18/214 , G06V10/764 , G06V10/80 , G06V10/82 , G06V10/20
CPC classification number: G06N20/00 , G06F18/217 , G06F18/2148 , G06N3/04 , G06N3/08 , G06T3/20 , G06T3/40 , G06T3/60 , G06V10/20 , G06V10/764 , G06V10/809 , G06V10/82
Abstract: Computing systems may support image classification and image detection services, and these services may utilize object detection/image classification machine learning models. The described techniques provide for normalization of confidence scores corresponding to manipulated target images and for non-max suppression within the range of confidence scores for manipulated images. In one example, the techniques provide for generating different scales of a test image, and the system performs normalization of confidence scores corresponding to each scaled image and non-max suppression per scaled image. These techniques may be used to provide more accurate image detection (e.g., object detection and/or image classification) and may be used with models that are not trained on modified image sets. The model may be trained on a standard (e.g., non-manipulated) image set but used with manipulated target images and the described techniques to provide accurate object detection.
-
Publication No.: US20220237403A1
Publication Date: 2022-07-28
Application No.: US17161378
Filing Date: 2021-01-28
Applicant: salesforce.com, inc.
Inventor: Pan Zhou , Peng Tang , Ran Xu , Chu Hong Hoi
Abstract: A system uses a neural network based model to perform scene text recognition. The system achieves high accuracy of prediction of text from scenes based on a neural network architecture that uses a double attention mechanism. The neural network based model includes a convolutional neural network component that outputs a set of visual features and an attention extractor neural network component that determines attention scores based on the visual features. The visual features and the attention scores are combined to generate mixed features that are provided as input to a character recognizer component that determines a second attention score and recognizes the characters based on the second attention score. The system trains the neural network based model by adjusting the neural network parameters to minimize a multi-class gradient harmonizing mechanism (GHM) loss. The multi-class GHM loss varies based on a level of difficulty of the sample.
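The abstract does not define the multi-class GHM loss, so the following is only a sketch of the general gradient-harmonizing idea under stated assumptions. A sample's difficulty is taken as 1 minus the predicted probability of its true class, difficulties are binned, and each sample's cross-entropy term is weighted by the inverse of how crowded its bin is. The difficulty measure, the bin count, and the normalization by the number of non-empty bins are all assumptions, not the patent's formulation.

```python
import math


def ghm_weights(difficulties, bins=10):
    """Weight samples by the inverse population of their difficulty bin."""
    idx = [min(int(d * bins), bins - 1) for d in difficulties]
    counts = [0] * bins
    for i in idx:
        counts[i] += 1
    nonempty = sum(1 for c in counts if c > 0)
    n = len(difficulties)
    # Assumed normalization: weights sum to n across the batch.
    return [n / (counts[i] * nonempty) for i in idx]


def multiclass_ghm_loss(probs, targets, bins=10):
    """Cross-entropy reweighted by difficulty-bin density (assumed form)."""
    p_true = [p[t] for p, t in zip(probs, targets)]
    difficulties = [1.0 - p for p in p_true]
    weights = ghm_weights(difficulties, bins)
    ce = [-math.log(max(p, 1e-12)) for p in p_true]
    return sum(w * c for w, c in zip(weights, ce)) / len(ce)


# Hypothetical per-character class probabilities: three easy, one hard.
probs = [
    [0.95, 0.03, 0.02],
    [0.94, 0.04, 0.02],
    [0.93, 0.05, 0.02],
    [0.05, 0.90, 0.05],
]
targets = [0, 0, 0, 0]
weights = ghm_weights([1.0 - p[t] for p, t in zip(probs, targets)])
loss = multiclass_ghm_loss(probs, targets)
print(weights, loss)
```

Here the three easy samples share one crowded bin and are down-weighted, while the lone hard sample receives a larger weight, so the loss contribution depends on sample difficulty.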