Patent search ap:("BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY") AND inv:"Leonhard APPLIS" Page 1

1.

Invention publication
PRE-PROCESSING FOR NATURAL LANGUAGE PROCESSING (Under examination, published)

Publication (announcement) number: US20240046036A1

Publication (announcement) date: 2024-02-08

Application number: US18258867

Application date: 2021-12-07

Applicant: BRITISH TELECOMMUNICATIONS PUBLIC LIMITED COMPANY

Inventor: Aygul GARIFULLINA, Mathias KERN, Leonhard APPLIS

IPC: G06F40/284

CPC classification number: G06F40/284

Abstract: A computer implemented method of pre-processing an input text for a natural language processing operation based on a training corpus of documents, can include accessing a set of stop words including predetermined words for de-emphasis in the text for the natural language processing operation, the set of stop words being separated into at least two subsets including a first subset and a second subset, the second subset containing stop words predetermined to be of potential semantic significance to documents in the training corpus; tokenizing documents in a training corpus to an ordered set of corpus tokens; removing, from the set of corpus tokens, tokens corresponding to stop words in the first subset of stop words; generating a set of n-grams by identifying n-grams from groups of tokens in the set of corpus tokens based on predetermined rules for n-gram identification; tokenizing the input text to an ordered set of input text tokens; identifying groups of tokens in the set of input text tokens corresponding to n-grams in the set of n-grams and replacing, in the set of input text tokens, each identified group of tokens by a singular n-gram token; removing, from the set of input text tokens, tokens corresponding to stop words in the second subset of stop words; and processing the input text by the natural language processing operation based on the set of input text tokens for the input text.
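The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything the abstract leaves open is an assumption made here for illustration: the regex tokenizer, the contents of the two stop-word subsets, the n-gram rule (contiguous groups of 2 to 3 tokens that occur at least twice in the corpus), the underscore-joined n-gram token, and the bag-of-words count standing in for the natural language processing operation. The function names `tokenize`, `build_ngrams`, and `preprocess` are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into an ordered list of word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_ngrams(corpus, first_subset, max_n=3, min_count=2):
    """Collect n-grams from the corpus after dropping first-subset stop words.

    Assumed rule (not from the source): keep contiguous groups of
    2..max_n tokens that occur at least min_count times.
    """
    counts = Counter()
    for doc in corpus:
        tokens = [t for t in tokenize(doc) if t not in first_subset]
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {gram for gram, c in counts.items() if c >= min_count}

def preprocess(text, ngrams, second_subset, max_n=3):
    """Merge known n-grams into single tokens, then drop second-subset stop words."""
    tokens = tokenize(text)
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate group first so trigrams win over bigrams.
        for n in range(max_n, 1, -1):
            group = tuple(tokens[i:i + n])
            if len(group) == n and group in ngrams:
                merged.append("_".join(group))
                i += n
                break
        else:
            merged.append(tokens[i])
            i += 1
    # Second-subset words that were absorbed into an n-gram token survive here.
    return [t for t in merged if t not in second_subset]

# Illustrative data only (assumed, not from the patent).
first_subset = {"the", "a", "is", "and", "there"}
second_subset = {"of", "no", "on", "out"}   # may carry meaning inside phrases
corpus = [
    "The line is out of order",
    "Router out of order and no signal",
    "There is no signal on the line",
]

ngrams = build_ngrams(corpus, first_subset)
result = preprocess("Router out of order, no signal on line", ngrams, second_subset)
print(result)  # ['router', 'out_of_order', 'no_signal', 'line']

# Stand-in for the downstream natural language processing operation.
bag_of_words = Counter(result)
```

In this sketch the ordering matters: "out", "of", and "no" are removed only after n-gram replacement, so they are kept inside `out_of_order` and `no_signal` while the standalone "on" is dropped.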
