-
1.
Publication No.: US12106052B2
Publication Date: 2024-10-01
Application No.: US17205894
Filing Date: 2021-03-18
Inventor: Shuohuan Wang , Siyu Ding , Yu Sun
IPC: G06N5/02 , G06F18/21 , G06F40/279 , G06F40/30
CPC classification number: G06F40/30 , G06F18/2163 , G06F40/279 , G06N5/02
Abstract: The disclosure provides a method and an apparatus for generating a semantic representation model, and a storage medium. The detailed implementation includes: performing recognition and segmentation on the original text included in an original text set to obtain knowledge units and non-knowledge units in the original text; performing knowledge-unit-level disorder processing on the knowledge units and the non-knowledge units in the original text to obtain a disorder text; generating a training text set based on the character attribute of each character in the disorder text; and training an initial semantic representation model on the training text set to generate the semantic representation model.
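The unit-level disorder step above can be sketched as follows; the function name, the seeded shuffle, and the use of each character's original position as its "character attribute" are illustrative assumptions, not the patent's specification:

```python
import random

def make_disorder_sample(units, seed=0):
    """Shuffle a text at the unit level (knowledge and non-knowledge units
    stay internally intact) and label every character of the disorder text
    with its original position in the source text."""
    rng = random.Random(seed)
    order = list(range(len(units)))
    rng.shuffle(order)
    disorder_text = "".join(units[i] for i in order)

    # Start offset of each unit in the original, in-order text.
    starts, pos = [], 0
    for unit in units:
        starts.append(pos)
        pos += len(unit)

    # Character attribute: original index of each character of the disorder text.
    labels = []
    for i in order:
        labels.extend(range(starts[i], starts[i] + len(units[i])))
    return disorder_text, labels
```

A model trained to predict these labels has to recover the original unit order, which is the kind of pre-training signal the abstract describes.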
-
2.
Publication No.: US20230004753A9
Publication Date: 2023-01-05
Application No.: US17209051
Filing Date: 2021-03-22
Abstract: The present disclosure provides a method, apparatus, electronic device and storage medium for training a semantic similarity model, which relates to the field of artificial intelligence. A specific implementation solution is as follows: obtaining a target field to be used by a semantic similarity model to be trained; calculating respective correlations between the target field and the application fields corresponding to each of multiple known training datasets; and training the semantic similarity model with the training datasets in turn, according to the respective correlations between the target field and the application fields corresponding to the training datasets. According to the technical solution of the present disclosure, the semantic similarity model can, in the fine-tuning phase, be trained more purposefully with the training datasets with reference to these correlations, thereby effectively improving the learning capability of the semantic similarity model and the accuracy of the trained model.
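A minimal sketch of the correlation-based ordering described above; the cosine measure over field vectors and the ascending order (most relevant dataset trained last) are my assumptions about one reasonable reading of "in turn, according to the respective correlations":

```python
import math

def order_by_relevance(target_vec, datasets):
    """datasets: list of (name, field_vec) pairs. Returns dataset names
    sorted by ascending cosine similarity to the target field, so the
    most relevant dataset is used in the final fine-tuning stage."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
    return [name for name, vec in sorted(datasets, key=lambda d: cos(target_vec, d[1]))]
```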
-
3.
Publication No.: US11481656B2
Publication Date: 2022-10-25
Application No.: US16008559
Filing Date: 2018-06-14
Inventor: Shengxian Wan , Yu Sun , Dianhai Yu
Abstract: The present disclosure provides a method and apparatus for evaluating a matching degree of multi-domain information based on artificial intelligence, a device and a medium. The method comprises: respectively obtaining valid words in a query, and valid words in each information domain in at least two information domains in a to-be-queried document; respectively obtaining word expressions of valid words in the query and word expressions of valid words in said each information domain in at least two information domains in the to-be-queried document; based on the word expressions, respectively obtaining context-based word expressions of valid words in the query and context-based word expressions of valid words in said each information domain; generating matching features corresponding to said each information domain according to the obtained information; determining a matching degree score between the query and the to-be-queried document according to the matching features corresponding to said each information domain.
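As a toy stand-in for the per-domain matching features above, one can compute a word-overlap feature for each information domain and combine the features into a score; the Jaccard overlap and the fixed domain weights are illustrative assumptions, not the patent's context-based features:

```python
def matching_features(query_words, domains):
    """domains: dict mapping an information-domain name (e.g. title, body)
    to its list of valid words. Toy feature: Jaccard overlap with the query."""
    q = set(query_words)
    feats = {}
    for name, words in domains.items():
        d = set(words)
        union = q | d
        feats[name] = len(q & d) / len(union) if union else 0.0
    return feats

def matching_score(feats, weights):
    """Combine per-domain matching features into one matching-degree score."""
    return sum(weights.get(name, 0.0) * f for name, f in feats.items())
```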
-
4.
Publication No.: US11232140B2
Publication Date: 2022-01-25
Application No.: US16054920
Filing Date: 2018-08-03
Inventor: Shuohuan Wang , Yu Sun , Dianhai Yu
IPC: G06F16/33 , G10L15/18 , G06F16/31 , G06F16/951 , G06F40/30 , G06F40/211
Abstract: Embodiments of the present disclosure disclose a method and apparatus for processing information. A specific implementation of the method includes: acquiring a search result set related to a search statement inputted by a user; parsing the search statement to generate a first syntax tree, and parsing a search result in the search result set to generate a second syntax tree set; calculating a similarity between the search statement and the search result in the search result set using a pre-trained semantic matching model on the basis of the first syntax tree and the second syntax tree set, the semantic matching model being used to determine the similarity between the syntax trees; and sorting the search result in the search result set on the basis of the similarity between the search statement and the search result in the search result set, and pushing the sorted search result set to the user.
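The syntax-tree comparison and ranking above can be sketched with trees as (label, children) tuples; the label-overlap similarity below is a deliberately simple stand-in for the pre-trained semantic matching model:

```python
def tree_labels(tree):
    """Collect all node labels of a (label, children) syntax tree."""
    label, children = tree
    out = [label]
    for child in children:
        out.extend(tree_labels(child))
    return out

def tree_similarity(t1, t2):
    """Toy similarity between two syntax trees: overlap of their node labels."""
    a, b = set(tree_labels(t1)), set(tree_labels(t2))
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_results(query_tree, results):
    """results: list of (result_id, syntax_tree); sort by similarity, best first."""
    return [rid for rid, t in
            sorted(results, key=lambda r: tree_similarity(query_tree, r[1]), reverse=True)]
```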
-
5.
Publication No.: US20210390257A1
Publication Date: 2021-12-16
Application No.: US17116846
Filing Date: 2020-12-09
Inventor: Chao Pang , Shuohuan Wang , Yu Sun , Hua Wu , Haifeng Wang
IPC: G06F40/295 , G06F40/30 , G06F40/137 , G06N5/02
Abstract: A method, an apparatus, a device and a storage medium for learning a knowledge representation are provided. The method can include: sampling a sub-graph of a knowledge graph from a knowledge base; serializing the sub-graph to obtain a serialized text; and reading the serialized text with a pre-trained language model in the order given by the sub-graph, to learn a knowledge representation of each word in the serialized text. In this embodiment, knowledge representation learning targets the representations of entities and relationships in the knowledge base.
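The sampling and serialization steps might look like the sketch below, using a breadth-first traversal over (head, relation, tail) triples; the traversal strategy and the space-joined serialization are assumptions for illustration:

```python
from collections import deque

def sample_subgraph(triples, seed_entity, max_edges=3):
    """Toy breadth-first sampler: collect up to max_edges triples reachable
    from the seed entity in the knowledge base."""
    adj = {}
    for h, r, t in triples:
        adj.setdefault(h, []).append((r, t))
    visited = {seed_entity}
    sub = []
    queue = deque([seed_entity])
    while queue and len(sub) < max_edges:
        head = queue.popleft()
        for rel, tail in adj.get(head, []):
            if len(sub) >= max_edges:
                break
            sub.append((head, rel, tail))
            if tail not in visited:
                visited.add(tail)
                queue.append(tail)
    return sub

def serialize(subgraph):
    """Flatten the sampled triples into one serialized text in graph order."""
    return " ".join(" ".join(triple) for triple in subgraph)
```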
-
6.
Publication No.: US11995405B2
Publication Date: 2024-05-28
Application No.: US17348104
Filing Date: 2021-06-15
Inventor: Xuan Ouyang , Shuohuan Wang , Chao Pang , Yu Sun , Hao Tian , Hua Wu , Haifeng Wang
Abstract: The present disclosure provides a multi-lingual model training method, apparatus, electronic device and readable storage medium, and relates to the technical fields of deep learning and natural language processing. The technical solution for training the multi-lingual model is: obtaining training corpuses comprising a plurality of bilingual corpuses and a plurality of monolingual corpuses; training the multi-lingual model with a first training task by using the plurality of bilingual corpuses; training the multi-lingual model with a second training task by using the plurality of monolingual corpuses; and completing the training of the multi-lingual model upon determining that the loss functions of the first and second training tasks converge. In the present disclosure, the multi-lingual model can achieve semantic interaction between different languages, which improves the accuracy with which the model learns semantic representations.
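The two-task loop with a joint convergence check can be sketched as below; `step_fn` stands in for one optimization step of the real model, and the change-in-loss convergence test is an illustrative assumption:

```python
def train_multilingual(bilingual, monolingual, step_fn, tol=1e-3, max_steps=200):
    """Alternate the bilingual task and the monolingual task each step,
    stopping once neither task loss changes by more than tol."""
    prev_bi = prev_mono = float("inf")
    for step in range(max_steps):
        loss_bi = step_fn("bilingual", bilingual, step)
        loss_mono = step_fn("monolingual", monolingual, step)
        if abs(prev_bi - loss_bi) < tol and abs(prev_mono - loss_mono) < tol:
            return step + 1  # number of steps until both tasks converged
        prev_bi, prev_mono = loss_bi, loss_mono
    return max_steps
```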
-
7.
Publication No.: US11928432B2
Publication Date: 2024-03-12
Application No.: US17319189
Filing Date: 2021-05-13
Inventor: Fei Yu , Jiji Tang , Weichong Yin , Yu Sun , Hao Tian , Hua Wu , Haifeng Wang
CPC classification number: G06F40/284 , G06F40/30 , G06N5/04 , G06N20/00 , G06V10/811 , G06V20/30
Abstract: A multi-modal pre-training model acquisition method, an electronic device and a storage medium, which relate to the fields of deep learning and natural language processing, are disclosed. The method may include: determining, for each image-text pair used as training data, the to-be-processed fine-grained semantic words in the text; masking the to-be-processed fine-grained semantic words; and training the multi-modal pre-training model using the training data with the fine-grained semantic words masked.
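The masking step reduces to replacing every caption token that belongs to the chosen fine-grained semantic word set; the `[MASK]` token and the simple set-membership test are assumptions in this sketch:

```python
def mask_fine_grained(tokens, fine_grained_words, mask_token="[MASK]"):
    """Mask the to-be-processed fine-grained semantic words of a caption,
    leaving all other tokens unchanged."""
    return [mask_token if t in fine_grained_words else t for t in tokens]
```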
-
8.
Publication No.: US11461549B2
Publication Date: 2022-10-04
Application No.: US16988907
Filing Date: 2020-08-10
Inventor: Han Zhang , Dongling Xiao , Yukun Li , Yu Sun , Hao Tian , Hua Wu , Haifeng Wang
IPC: G06F40/274 , G06F40/56 , G06F40/30 , G06K9/62
Abstract: The present disclosure discloses a method and an apparatus for generating a text based on a semantic representation and relates to a field of natural language processing (NLP) technologies. The method for generating the text includes: obtaining an input text, the input text comprising a source text; obtaining a placeholder of an ith word to be predicted in a target text; obtaining a vector representation of the ith word to be predicted, in which the vector representation of the ith word to be predicted is obtained by calculating the placeholder of the ith word to be predicted, the source text and 1st to (i−1)th predicted words by employing a self-attention mechanism; and generating an ith predicted word based on the vector representation of the ith word to be predicted, to obtain the target text.
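The placeholder-driven decoding loop can be sketched as follows; `predict_fn` stands in for the self-attention model that attends over the source text, the already-predicted words, and the placeholder for the ith word (the placeholder and end-of-sequence tokens are assumed names):

```python
def generate(source_tokens, predict_fn, max_len=8, placeholder="[A]", eos="</s>"):
    """At step i the model sees the source, the 1st..(i-1)th predicted words,
    and a placeholder standing in for the ith word, then emits that word."""
    predicted = []
    for _ in range(max_len):
        word = predict_fn(source_tokens + predicted + [placeholder])
        if word == eos:
            break
        predicted.append(word)
    return predicted
```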
-
9.
Publication No.: US20220019736A1
Publication Date: 2022-01-20
Application No.: US17211669
Filing Date: 2021-03-24
Inventor: Xuan Ouyang , Shuohuan Wang , Yu Sun
IPC: G06F40/253 , G06F40/166
Abstract: The present application discloses a method and apparatus for training a natural language processing model, a device and a storage medium, which relates to the natural language processing field based on artificial intelligence. An implementation includes: constructing training language material pairs of a coreference resolution task based on a preset language material set, wherein each training language material pair includes a positive sample and a negative sample; training the natural language processing model with the training language material pairs to enable the natural language processing model to learn the capability of recognizing corresponding positive and negative samples; and training the natural language processing model with the positive samples of the training language material pairs to enable it to learn the capability of performing the coreference resolution task.
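Constructing the positive/negative language-material pairs might look like this sketch; the string-level replacement is a simplification (a real implementation would operate on tokens and spans), and all names are illustrative:

```python
def build_training_pairs(sentence, pronoun, antecedent, distractors):
    """Positive sample: the pronoun replaced by its true antecedent.
    Negative samples: the pronoun replaced by each wrong (distractor) entity."""
    positive = sentence.replace(pronoun, antecedent, 1)
    negatives = [sentence.replace(pronoun, d, 1) for d in distractors]
    return positive, negatives
```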
-
10.
Publication No.: US11151177B2
Publication Date: 2021-10-19
Application No.: US16054842
Filing Date: 2018-08-03
Inventor: Yukun Li , Yi Liu , Yu Sun , Dianhai Yu
Abstract: Embodiments of the present disclosure disclose a search method and apparatus based on artificial intelligence. A specific implementation of the method comprises: acquiring at least one candidate document related to a query sentence; determining a query word vector sequence corresponding to a segmented word sequence of the query sentence, and determining a candidate document word vector sequence corresponding to a segmented word sequence of each candidate document in the at least one candidate document; performing a similarity calculation for each candidate document in the at least one candidate document; selecting, in a descending order of similarities between the candidate document and the query sentence, a preset number of candidate documents from the at least one candidate document as a search result.
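A minimal sketch of the similarity calculation and selection: average the word vectors of each segmented word sequence, score candidates by cosine similarity, and keep the top results. The mean-pooling step is an assumption; the abstract only specifies a similarity calculation over the word-vector sequences:

```python
import math

def avg_vec(vecs):
    """Mean-pool a list of equal-length word vectors into one vector."""
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vecs, docs, k):
    """docs: list of (doc_id, word_vec_sequence). Rank candidate documents by
    cosine similarity of pooled vectors, descending, and keep the top k ids."""
    q = avg_vec(query_vecs)
    ranked = sorted(docs, key=lambda d: cosine(q, avg_vec(d[1])), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```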