Patent search ap:("BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO LTD") AND inv:"Xuefeng Luo" Page 1

1.

Granted invention patent
Method and apparatus for processing dataset (in force)

Publication No.: US11663258B2

Publication date: 2023-05-30

Application No.: US17133869

Filing date: 2020-12-24

Applicant: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO LTD

Inventor: Zhe Hu, Cheng Peng, Xuefeng Luo

IPC: G06F16/35, G06F16/242, G06F16/22, G06F16/2455, G06V30/414, G06F18/214

CPC classification number: G06F16/35, G06F16/2237, G06F16/243, G06F16/24556, G06F18/2148, G06V30/414

Abstract: The present disclosure discloses a method and apparatus for processing a dataset. The method includes: obtaining a first text set meeting a preset similarity matching condition with a target text from multiple text blocks provided by a target user; obtaining a second text set from the first text set, in which each text in the second text set does not belong to a same text block as the target text; generating a negative sample set of the target text based on content of a candidate text block to which each text in the second text set belongs; generating a positive sample set of the target text based on content of a target text block to which the target text belongs; and generating a dataset of the target user based on the negative sample set and the positive sample set, and training a matching model based on the dataset.
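
Illustrative sketch: the sampling steps named in the abstract can be pictured roughly as follows. This is not the patented implementation. The token-level Jaccard measure, the 0.3 threshold, the pairing of the target text with every text in a block, and all function and variable names are assumptions made only for illustration; the abstract does not specify any of them. Training of the matching model is left out.

```python
from typing import List, Tuple

Sample = Tuple[str, str, int]  # (target text, other text, label)


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (assumed stand-in for the
    'preset similarity matching condition' in the abstract)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_dataset(
    blocks: List[List[str]],
    target_block_idx: int,
    target_text: str,
    threshold: float = 0.3,  # hypothetical threshold
) -> List[Sample]:
    # Step 1: first text set = texts from all blocks that are similar
    # enough to the target text.
    first_set = [
        (i, text)
        for i, block in enumerate(blocks)
        for text in block
        if text != target_text and jaccard(text, target_text) >= threshold
    ]

    # Step 2: second text set = similar texts that do NOT belong to the
    # same block as the target text.
    second_set = [(i, text) for i, text in first_set if i != target_block_idx]

    # Step 3: negative samples built from the content of each candidate
    # block that contributed a text to the second set.
    candidate_blocks = sorted({i for i, _ in second_set})
    negatives = [(target_text, t, 0) for i in candidate_blocks for t in blocks[i]]

    # Step 4: positive samples built from the content of the target block.
    positives = [
        (target_text, t, 1) for t in blocks[target_block_idx] if t != target_text
    ]

    # Step 5: the user's dataset combines both sample sets.
    return positives + negatives


# Made-up text blocks for one user; block 0 holds the target text.
blocks = [
    ["how do I reset my password", "I forgot my password", "password reset steps"],
    ["how do I reset my router", "router keeps restarting"],
    ["what are your opening hours"],
]
dataset = build_dataset(blocks, target_block_idx=0, target_text=blocks[0][0])
for sample in dataset:
    print(sample)
```

In this toy input, block 1 is selected as a candidate block because one of its texts is lexically close to the target while belonging to a different block, so its texts act as hard negatives. Block 2 contains nothing similar to the target and contributes no samples.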
