-
11.
Publication No.: US20210201182A1
Publication Date: 2021-07-01
Application No.: US17200448
Filing Date: 2021-03-12
Inventor: Yulin Li , Xiameng Qin , Chengquan Zhang , Junyu Han , Errui Ding , Tian Wu , Haifeng Wang
IPC: G06N5/04 , G06N3/04 , G06F16/901
Abstract: Embodiments of the present disclosure provide a method and apparatus for performing a structured extraction on a text, a device and a storage medium. The method may include: performing a text detection on an entity text image to obtain a position and content of a text line of the entity text image; extracting multivariate information of the text line based on the position and the content of the text line; performing a feature fusion on the multivariate information of the text line to obtain a multimodal fusion feature of the text line; performing category and relationship reasoning based on the multimodal fusion feature of the text line to obtain a category and a relationship probability matrix of the text line; and constructing structured information of the entity text image based on the category and the relationship probability matrix of the text line.
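The last step of this method turns per-line categories and a relationship probability matrix into structured information. The sketch below assumes three hypothetical categories ("key", "value", "other") and pairs each key line with its most probable value line. The category names, the 0.5 threshold and the function `build_structure` are illustrative assumptions, not the patent's implementation.

```python
from typing import Dict, List


def build_structure(texts: List[str], categories: List[str],
                    rel_prob: List[List[float]],
                    threshold: float = 0.5) -> Dict[str, str]:
    """Pair each 'key' text line with its most probable 'value' text line.

    rel_prob[i][j] is assumed to be the probability that line i and line j
    are related; the category labels are hypothetical.
    """
    result: Dict[str, str] = {}
    value_ids = [j for j, c in enumerate(categories) if c == "value"]
    for i, cat in enumerate(categories):
        if cat != "key" or not value_ids:
            continue
        # Pick the value line with the highest relationship probability.
        best = max(value_ids, key=lambda j: rel_prob[i][j])
        if rel_prob[i][best] >= threshold:
            result[texts[i]] = texts[best]
    return result


texts = ["Name", "Alice", "Date", "2021-03-12"]
categories = ["key", "value", "key", "value"]
rel_prob = [
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.8],
    [0.2, 0.1, 0.8, 0.0],
]
print(build_structure(texts, categories, rel_prob))
```

Matching from the key side keeps the output a plain key-to-value mapping; a real system might instead solve a global assignment so that two keys cannot claim the same value.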
-
12.
Publication No.: US12211304B2
Publication Date: 2025-01-28
Application No.: US17200448
Filing Date: 2021-03-12
Inventor: Yulin Li , Xiameng Qin , Chengquan Zhang , Junyu Han , Errui Ding , Tian Wu , Haifeng Wang
IPC: G06F16/901 , G06N3/047 , G06N5/04 , G06V10/22 , G06V10/80 , G06V30/262 , G06V30/414 , G06V10/24
Abstract: Embodiments of the present disclosure provide a method and apparatus for performing a structured extraction on a text, a device and a storage medium. The method may include: performing a text detection on an entity text image to obtain a position and content of a text line of the entity text image; extracting multivariate information of the text line based on the position and the content of the text line; performing a feature fusion on the multivariate information of the text line to obtain a multimodal fusion feature of the text line; performing category and relationship reasoning based on the multimodal fusion feature of the text line to obtain a category and a relationship probability matrix of the text line; and constructing structured information of the entity text image based on the category and the relationship probability matrix of the text line.
-
13.
Publication No.: US11915484B2
Publication Date: 2024-02-27
Application No.: US17304296
Filing Date: 2021-06-17
Inventor: Zhigang Wang , Jian Wang , Errui Ding , Hao Sun
IPC: G06K9/46 , G06K9/62 , G06V20/52 , G06F18/23 , G06F18/214 , G06F18/21 , G06V10/762 , G06V10/764 , G06V10/774 , G06V10/82 , G06V20/64 , G06V40/10
CPC classification number: G06V20/52 , G06F18/214 , G06F18/2178 , G06F18/23 , G06V10/762 , G06V10/764 , G06V10/7753 , G06V10/82 , G06V20/64 , G06V40/10 , G06V2201/07
Abstract: A method, an apparatus, device and a storage medium for generating a target re-recognition model are provided. The method may include: acquiring a set of labeled samples, a set of unlabeled samples and an initialization model obtained through supervised training; performing feature extraction on each sample in the set of the unlabeled samples by using the initialization model; clustering features extracted from the set of the unlabeled samples by using a clustering algorithm; assigning, for each sample in the set of the unlabeled samples, a pseudo label to the sample according to a cluster corresponding to the sample in a feature space; and mixing a set of samples with a pseudo label and the set of the labeled samples as a set of training samples, and performing supervised training on the initialization model to obtain a target re-recognition model.
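The pseudo-labelling loop described above can be sketched as follows, assuming the initialization model's features are already extracted as plain vectors and that k-means is the clustering algorithm (the abstract only says "a clustering algorithm"). `kmeans`, `build_training_set` and the `label_offset` used to keep pseudo-label IDs apart from real identity IDs are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(features: Sequence[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain k-means; the first k features are the initial centroids."""
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each feature joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(f, centroids[c]))
                  for f in features]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = [sum(d) / len(members) for d in zip(*members)]
    return labels


def build_training_set(labeled: List[Tuple[Vector, int]],
                       unlabeled: Sequence[Vector], k: int,
                       label_offset: int) -> List[Tuple[Vector, int]]:
    """Assign cluster IDs as pseudo labels and mix with the labeled set."""
    pseudo = kmeans(unlabeled, k)
    pseudo_samples = [(f, label_offset + lab) for f, lab in zip(unlabeled, pseudo)]
    return list(labeled) + pseudo_samples


unlabeled = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
train = build_training_set([((1.0, 1.0), 0)], unlabeled, k=2, label_offset=100)
print(train)
```

The mixed list would then feed the supervised fine-tuning of the initialization model; in practice the features would come from that model rather than being hand-written.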
-
14.
Publication No.: US11600069B2
Publication Date: 2023-03-07
Application No.: US17144205
Filing Date: 2021-01-08
Inventor: Tianwei Lin , Xin Li , Dongliang He , Fu Li , Hao Sun , Shilei Wen , Errui Ding
Abstract: A method and apparatus for detecting a temporal action of a video, an electronic device and a storage medium are disclosed, which relates to the field of video processing technologies. An implementation includes: acquiring an initial temporal feature sequence of a video to be detected; acquiring, by a pre-trained video-temporal-action detecting module, implicit features and explicit features of a plurality of configured temporal anchor boxes based on the initial temporal feature sequence; and acquiring, by the video-temporal-action detecting module, the starting position and the ending position of a video clip containing a specified action, the category of the specified action and the probability that the specified action belongs to the category from the plural temporal anchor boxes according to the explicit features and the implicit features of the plural temporal anchor boxes.
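The final step, reading a start position, end position, category and probability off each temporal anchor box, can be illustrated with a conventional anchor-decoding scheme. This is an assumption: the abstract does not say how the explicit and implicit features are turned into predictions, so here each anchor is a hypothetical (center, width) pair refined by predicted offsets and scored by per-class probabilities.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    start: float
    end: float
    category: int
    probability: float


def decode_anchors(anchors: Sequence[Tuple[float, float]],
                   offsets: Sequence[Tuple[float, float]],
                   class_probs: Sequence[Sequence[float]],
                   threshold: float = 0.5) -> List[Detection]:
    """Turn anchors plus predicted offsets into scored action segments.

    anchors: (center, width) in seconds; offsets: (d_center, d_log_width).
    Both encodings are assumptions borrowed from common anchor detectors.
    """
    detections = []
    for (center, width), (dc, dw), probs in zip(anchors, offsets, class_probs):
        c = center + dc * width   # shift the center relative to anchor width
        w = width * math.exp(dw)  # rescale the width in log space
        cat = max(range(len(probs)), key=probs.__getitem__)
        if probs[cat] >= threshold:
            detections.append(Detection(c - w / 2, c + w / 2, cat, probs[cat]))
    return sorted(detections, key=lambda d: d.probability, reverse=True)


anchors = [(10.0, 4.0), (30.0, 8.0)]
offsets = [(0.0, 0.0), (0.25, 0.0)]
class_probs = [[0.1, 0.9], [0.6, 0.4]]
for det in decode_anchors(anchors, offsets, class_probs):
    print(det)
```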
-
15.
Publication No.: US20210312172A1
Publication Date: 2021-10-07
Application No.: US17353324
Filing Date: 2021-06-21
Inventor: Zipeng Lu , Jian Wang , Yuchen Yuan , Hao Sun , Errui Ding
Abstract: A human body identification method, an electronic device and a storage medium, related to the technical field of artificial intelligence such as computer vision and deep learning, are provided. The method includes: inputting an image to be identified into a human body detection model, to obtain a plurality of preselected detection boxes; identifying a plurality of key points from each of the preselected detection boxes respectively according to a human body key point detection model, and obtaining a key point score of each of the key points; determining a target detection box from each of the preselected detection boxes, according to a number of the key points whose key point scores meet a key point threshold; and inputting the target detection box into a human body key point classification model, to obtain a human body identification result for the image to be identified.
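The box-selection step counts, per preselected box, how many key points score at or above a key point threshold. A minimal sketch, assuming "meet the threshold" means `>=` and that a box is kept when that count reaches a minimum; both `keypoint_threshold` and `min_keypoints` values are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def select_target_boxes(boxes: Sequence[Box],
                        keypoint_scores: Sequence[Sequence[float]],
                        keypoint_threshold: float = 0.5,
                        min_keypoints: int = 8) -> List[Box]:
    """Keep boxes with enough confidently detected key points."""
    targets = []
    for box, scores in zip(boxes, keypoint_scores):
        n_valid = sum(1 for s in scores if s >= keypoint_threshold)
        if n_valid >= min_keypoints:
            targets.append(box)
    return targets


boxes = [(0.0, 0.0, 50.0, 120.0), (60.0, 0.0, 90.0, 40.0)]
scores = [[0.9] * 10, [0.9] * 3 + [0.1] * 7]
print(select_target_boxes(boxes, scores))
```

Only the surviving boxes would then be passed to the key point classification model, which keeps that (usually heavier) model off boxes that are unlikely to contain a person.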
-
16.
Publication No.: US20210019531A1
Publication Date: 2021-01-21
Application No.: US16830895
Filing Date: 2020-03-26
Inventor: Xiang Long , Dongliang He , Fu Li , Zhizhen Chi , Zhichao Zhou , Xiang Zhao , Ping Wang , Hao Sun , Shilei Wen , Errui Ding
Abstract: A method and an apparatus for classifying a video are provided. The method may include: acquiring a to-be-classified video; extracting a set of multimodal features of the to-be-classified video; inputting the set of multimodal features into a post-fusion model corresponding to each modality respectively, to obtain multimodal category information of the to-be-classified video; and fusing the multimodal category information of the to-be-classified video, to obtain category information of the to-be-classified video. This embodiment improves the accuracy of video classification.
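The last step fuses per-modality category information into one prediction. A minimal sketch, assuming each modality's model outputs a class-probability vector and that fusion is a weighted average; the abstract does not specify the fusion rule, and the modality names and weights here are hypothetical.

```python
from typing import Dict, List, Optional


def fuse_modal_predictions(modal_probs: Dict[str, List[float]],
                           weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Weighted average of per-modality class probabilities."""
    if weights is None:
        weights = {m: 1.0 for m in modal_probs}  # equal weighting by default
    total = sum(weights[m] for m in modal_probs)
    n_classes = len(next(iter(modal_probs.values())))
    fused = [0.0] * n_classes
    for modality, probs in modal_probs.items():
        for i, p in enumerate(probs):
            fused[i] += weights[modality] * p / total
    return fused


preds = {"rgb": [0.6, 0.4], "audio": [0.2, 0.8]}
fused = fuse_modal_predictions(preds)
print(fused, "predicted class:", max(range(len(fused)), key=fused.__getitem__))
```

Averaging probabilities keeps the fused output a valid distribution; learned weights or a small fusion network are common alternatives.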
-
17.
Publication No.: US10861133B1
Publication Date: 2020-12-08
Application No.: US16810986
Filing Date: 2020-03-06
Inventor: Chao Li , Dongliang He , Xiao Liu , Yukang Ding , Shilei Wen , Errui Ding , Henan Zhang , Hao Sun
IPC: G06T3/40
Abstract: A super-resolution video reconstruction method, device, apparatus and a computer-readable storage medium are provided. The method includes: extracting a hypergraph from consecutive frames of an original video; inputting a hypergraph vector of the hypergraph into a residual convolutional neural network to obtain an output result of the residual convolutional neural network; and inputting the output result of the residual convolutional neural network into a spatial upsampling network to obtain a super-resolution frame, wherein a super-resolution video of the original video is formed by multiple super-resolution frames.
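The abstract does not say how the spatial upsampling network enlarges the residual network's output. One widely used option is sub-pixel convolution, whose final "pixel shuffle" rearranges C·r² low-resolution channels into C channels that are r times larger in each dimension. The sketch below shows only that rearrangement (channel order as in PyTorch's `PixelShuffle`); treating it as this patent's upsampler is an assumption.

```python
from typing import List

Image = List[List[float]]  # one channel, indexed [row][col]


def pixel_shuffle(channels: List[Image], r: int) -> List[Image]:
    """Depth-to-space: (C*r*r, H, W) -> (C, H*r, W*r)."""
    if len(channels) % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_c = len(channels) // (r * r)
    h, w = len(channels[0]), len(channels[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(out_c)]
    for c in range(out_c):
        for i in range(r):
            for j in range(r):
                # Channel c*r*r + i*r + j supplies sub-pixel (i, j) of every block.
                src = channels[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


# Four 1x1 channels become one 2x2 channel.
print(pixel_shuffle([[[1.0]], [[2.0]], [[3.0]], [[4.0]]], r=2))
```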
-
18.
Publication No.: US11908219B2
Publication Date: 2024-02-20
Application No.: US17244291
Filing Date: 2021-04-29
Inventor: Zihan Ni , Yipeng Sun , Kun Yao , Junyu Han , Errui Ding , Jingtuo Liu , Haifeng Wang
IPC: G06V30/413 , G06F40/30 , G06V30/414 , G06V10/70
CPC classification number: G06V30/413 , G06F40/30 , G06V10/70 , G06V30/414
Abstract: The disclosure provides a method and a device for processing information, an electronic device, and a storage medium, belonging to a field of artificial intelligence including computer vision, deep learning, and natural language processing. In the method, the computing device recognizes multiple text items in the image. The computing device classifies multiple text items into a first set of name text items and a second set of content text items based on semantics of the text items. The computing device performs a matching operation between the first set and the second set based on a layout of the text items in the image, and determines matched name-content text items. The matched name-content text items include a name text item in the first set and a content text item matching the name text item and in the second set. The computing device outputs the matched name-content text items.
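The layout-based matching between name items and content items can be sketched greedily: each name takes the nearest unused content item that lies to its right or below it, with vertical distance penalized so that a value on the same row wins. The bounding-box format, the `vertical_weight` penalty and the greedy order are hypothetical choices, not the patent's matching procedure.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


@dataclass
class TextItem:
    text: str
    box: Box


def _center(item: TextItem) -> Tuple[float, float]:
    x0, y0, x1, y1 = item.box
    return (x0 + x1) / 2, (y0 + y1) / 2


def match_name_content(names: List[TextItem], contents: List[TextItem],
                       vertical_weight: float = 3.0) -> List[Tuple[str, str]]:
    """Greedily pair each name item with the cheapest unused content item."""
    pairs: List[Tuple[str, str]] = []
    used = set()
    for name in names:
        nx, ny = _center(name)
        best_cost, best_idx = None, None
        for idx, content in enumerate(contents):
            if idx in used:
                continue
            cx, cy = _center(content)
            if cx <= nx and cy <= ny:
                continue  # only consider items to the right of or below the name
            cost = abs(cx - nx) + vertical_weight * abs(cy - ny)
            if best_cost is None or cost < best_cost:
                best_cost, best_idx = cost, idx
        if best_idx is not None:
            used.add(best_idx)
            pairs.append((name.text, contents[best_idx].text))
    return pairs


names = [TextItem("Name:", (0, 0, 40, 10)), TextItem("Date:", (0, 20, 40, 30))]
contents = [TextItem("Alice", (50, 0, 90, 10)), TextItem("2021-03-12", (50, 20, 120, 30))]
print(match_name_content(names, contents))
```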
-
19.
Publication No.: US11615140B2
Publication Date: 2023-03-28
Application No.: US17144523
Filing Date: 2021-01-08
Inventor: Xiang Long , Dongliang He , Fu Li , Xiang Zhao , Tianwei Lin , Hao Sun , Shilei Wen , Errui Ding
IPC: G06F16/738 , G06V20/40 , G06F18/214 , G06F18/25
Abstract: A method includes screening, by a video-clip screening module in a video description model, a plurality of video proposal clips acquired from a video to be analyzed, to acquire a plurality of video clips suitable for description. The plural video proposal clips acquired from the video to be analyzed may be screened by the video-clip screening module to acquire the plural video clips suitable for description; and then, each video clip is described by a video-clip describing module, thus avoiding description of all the video proposal clips, only describing the screened video clips which have strong correlation with the video and are suitable for description, removing the interference of the description of the video clips which are not suitable for description in the description of the video, guaranteeing the accuracy of the final descriptions of the video clips, and improving the quality of the descriptions of the video clips.
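The screening module's role, passing only clips "suitable for description" to the describing module, can be sketched as a score filter followed by a top-k cut. The suitability score, threshold, `max_clips` limit and the stand-in `describe` callback are all assumptions; the patent uses learned modules for both steps.

```python
from typing import Callable, List, Sequence, Tuple

Proposal = Tuple[float, float, float]  # (start_sec, end_sec, suitability score)


def screen_proposals(proposals: Sequence[Proposal], score_threshold: float = 0.5,
                     max_clips: int = 3) -> List[Proposal]:
    """Keep the highest-scoring proposals, returned in temporal order."""
    kept = [p for p in proposals if p[2] >= score_threshold]
    kept.sort(key=lambda p: p[2], reverse=True)
    return sorted(kept[:max_clips], key=lambda p: p[0])


def describe_video(proposals: Sequence[Proposal],
                   describe: Callable[[float, float], str]) -> List[str]:
    """Describe only the screened clips instead of every proposal."""
    return [describe(start, end) for start, end, _ in screen_proposals(proposals)]


proposals = [(0, 5, 0.9), (3, 8, 0.2), (10, 15, 0.7), (20, 25, 0.6), (30, 35, 0.8)]
print(describe_video(proposals, lambda s, e: f"clip {s}-{e}s"))
```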
-
20.
Publication No.: US20210271870A1
Publication Date: 2021-09-02
Application No.: US17244291
Filing Date: 2021-04-29
Inventor: Zihan Ni , Yipeng Sun , Kun Yao , Junyu Han , Errui Ding , Jingtuo Liu , Haifeng Wang
Abstract: The disclosure provides a method and a device for processing information, an electronic device, and a storage medium, belonging to a field of artificial intelligence including computer vision, deep learning, and natural language processing. In the method, the computing device recognizes multiple text items in the image. The computing device classifies multiple text items into a first set of name text items and a second set of content text items based on semantics of the text items. The computing device performs a matching operation between the first set and the second set based on a layout of the text items in the image, and determines matched name-content text items. The matched name-content text items include a name text item in the first set and a content text item matching the name text item and in the second set. The computing device outputs the matched name-content text items.
-