METHOD AND APPARATUS FOR PROCESSING DATASET

    Publication No.: US20210365444A1

    Publication Date: 2021-11-25

    Application No.: US17133869

    Filing Date: 2020-12-24

    Abstract: The present disclosure discloses a method and apparatus for processing a dataset. The method includes: obtaining a first text set meeting a preset similarity matching condition with a target text from multiple text blocks provided by a target user; obtaining a second text set from the first text set, in which each text in the second text set does not belong to a same text block as the target text; generating a negative sample set of the target text based on content of a candidate text block to which each text in the second text set belongs; generating a positive sample set of the target text based on content of a target text block to which the target text belongs; and generating a dataset of the target user based on the negative sample set and the positive sample set, and training a matching model based on the dataset.
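The sample-construction steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the similarity measure (`difflib.SequenceMatcher`), the 0.7 threshold, the function name `build_dataset`, and the reading of "content of a text block" as "every text in that block" are all assumptions made for illustration. Training the matching model is left out.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Assumed similarity measure; the abstract only says "preset similarity matching condition".
    return SequenceMatcher(None, a, b).ratio()


def build_dataset(blocks, target_block, target_text, threshold=0.7):
    """Return (target, text, label) triples; label 1 = positive, 0 = negative.

    blocks: list of text blocks, each a list of strings (hypothetical layout).
    target_block: index of the block that contains target_text.
    """
    # First text set: texts meeting the similarity condition with the target.
    first_set = [
        (block_id, text)
        for block_id, block in enumerate(blocks)
        for text in block
        if text != target_text and similarity(text, target_text) >= threshold
    ]
    # Second text set: drop texts that share a block with the target.
    second_set = [(b, t) for b, t in first_set if b != target_block]
    # Negative samples: content of the candidate blocks the second set came from
    # (assumed here to mean all texts in those blocks).
    candidate_blocks = sorted({b for b, _ in second_set})
    negatives = [t for b in candidate_blocks for t in blocks[b]]
    # Positive samples: the other texts in the target's own block.
    positives = [t for t in blocks[target_block] if t != target_text]
    return ([(target_text, t, 1) for t in positives]
            + [(target_text, t, 0) for t in negatives])


blocks = [
    ["how do I reset my password", "I forgot my password"],
    ["how do I reset my router", "my router is not working"],
    ["what is the weather today"],
]
dataset = build_dataset(blocks, 0, "how do I reset my password")
```

In this toy input, the router block supplies negatives because one of its texts is superficially similar to the target, while the unrelated weather block contributes nothing. That matches the abstract's apparent intent of collecting negatives that look similar but come from a different block.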

    IMPLEMENTING TEXT GENERATION
    Invention application

    Publication No.: US20210286934A1

    Publication Date: 2021-09-16

    Application No.: US17331526

    Filing Date: 2021-05-26

    Abstract: A method for implementing text generation, a device and a medium are provided. The method includes: determining a target task type of a target text generation task from multiple task types supported by a pre-trained general text generation model; determining, based on a requirement of the target text generation task for a target output text, a first target output text attribute for the target text generation task from multiple output text attributes supported by the general text generation model; and fine-tuning the general text generation model based on a target training data set associated with the target text generation task to obtain a task-specific text generation model, by taking task indication information for the target task type and first attribute indication information for the first target output text attribute as at least part of an input of the general text generation model.
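One way to picture "taking task indication information and attribute indication information as part of the input" is to prepend control tags to each source text before fine-tuning. The Python sketch below shows only that input-construction step. The tag format, the registries `SUPPORTED_TASKS` and `SUPPORTED_ATTRIBUTES`, and both function names are hypothetical; the abstract does not specify how the indication information is encoded, and the fine-tuning itself is not shown.

```python
# Hypothetical registries of what the general model supports.
SUPPORTED_TASKS = {"summarization", "title_generation"}
SUPPORTED_ATTRIBUTES = {"length": {"short", "long"}}


def build_model_input(task, attr_name, attr_value, source_text):
    """Prefix the source text with task and attribute indication tags."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task type: {task}")
    if attr_value not in SUPPORTED_ATTRIBUTES.get(attr_name, set()):
        raise ValueError(f"unsupported attribute: {attr_name}={attr_value}")
    # Assumed encoding: plain-text tags placed before the source text.
    return f"[task={task}] [{attr_name}={attr_value}] {source_text}"


def build_finetuning_examples(task, attr_name, attr_value, pairs):
    """Turn (source, target) pairs into (model_input, target) pairs."""
    return [
        (build_model_input(task, attr_name, attr_value, source), target)
        for source, target in pairs
    ]


examples = build_finetuning_examples(
    "summarization", "length", "short",
    [("A long article about patents.", "Patents, briefly.")],
)
```

The resulting `(model_input, target)` pairs would then serve as the target training data set for fine-tuning the general model into a task-specific one.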
