NOISE DATA AUGMENTATION FOR NATURAL LANGUAGE PROCESSING

    公开(公告)号:US20210304733A1

    公开(公告)日:2021-09-30

    申请号:US17016117

    申请日:2020-09-09

    Abstract: Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof irrelevant of original text within the utterances of the training set of utterances, and incorporating the noise text within the utterances relative to the original text in the utterances of the training set of utterances at a predefined augmentation ratio to generate augmented utterances.

Patent Agency Ranking