Publication number: US11663258B2
Publication date: 2023-05-30
Application number: US17133869
Filing date: 2020-12-24
Inventor: Zhe Hu, Cheng Peng, Xuefeng Luo
IPC: G06F16/35, G06F16/242, G06F16/22, G06F16/2455, G06V30/414, G06F18/214
CPC classification number: G06F16/35, G06F16/2237, G06F16/243, G06F16/24556, G06F18/2148, G06V30/414
Abstract: The present disclosure provides a method and apparatus for processing a dataset. The method includes: obtaining, from multiple text blocks provided by a target user, a first text set that meets a preset similarity matching condition with a target text; obtaining a second text set from the first text set, in which no text in the second text set belongs to the same text block as the target text; generating a negative sample set of the target text based on content of the candidate text block to which each text in the second text set belongs; generating a positive sample set of the target text based on content of the target text block to which the target text belongs; and generating a dataset of the target user based on the negative sample set and the positive sample set, and training a matching model based on the dataset.
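The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `build_dataset`, the use of `difflib.SequenceMatcher` as the similarity measure, the `threshold` value, and the choice to pair the target text with every text of a block are all assumptions made for illustration, since the abstract does not specify them.

```python
from difflib import SequenceMatcher


def build_dataset(blocks, block_idx, text_idx, threshold=0.5):
    """Build labeled (target, text, label) pairs for one target text.

    blocks: list of text blocks, each a list of strings (hypothetical layout).
    Label 1 marks a positive sample, label 0 a negative sample.
    """
    target = blocks[block_idx][text_idx]

    # First text set: every other text whose similarity to the target
    # meets the (assumed) threshold condition.
    first_set = [
        (b, text)
        for b, block in enumerate(blocks)
        for i, text in enumerate(block)
        if (b, i) != (block_idx, text_idx)
        and SequenceMatcher(None, target, text).ratio() >= threshold
    ]

    # Second text set: similar texts that are NOT in the target's own block.
    second_set = [(b, text) for b, text in first_set if b != block_idx]

    # Negative samples: content of each candidate block that contains
    # a text from the second set (assumption: use all texts of the block).
    candidate_blocks = sorted({b for b, _ in second_set})
    negatives = [(target, text, 0) for b in candidate_blocks for text in blocks[b]]

    # Positive samples: the other texts of the target's own block.
    positives = [
        (target, text, 1)
        for i, text in enumerate(blocks[block_idx])
        if i != text_idx
    ]
    return positives + negatives


if __name__ == "__main__":
    example_blocks = [
        ["how to reset my password", "password reset steps"],
        ["how to reset my router", "router restart guide"],
    ]
    for sample in build_dataset(example_blocks, 0, 0):
        print(sample)
```

Selecting negatives from blocks that contain a text similar to the target yields hard negatives, that is, pairs that look alike on the surface but belong to different blocks. The resulting list of labeled pairs could then be passed to whatever matching model is being trained.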