Text classification using automatically generated seed data
摘要:
Certain aspects produce a scoring model that can automatically classify future text samples. In some examples, a processing device perform operations for producing a scoring model using active learning. The operations includes receiving existing text samples and searching a stored, pre-trained corpus defining embedding vectors for selected words, phrases, or documents to produce nearest neighbor vectors for each embedding vector. Nearest neighbor selections are identified based on distance between each nearest neighbor vector and the embedding vector for each selection to produce a text cloud. Text samples are selected from the text cloud to produce seed data that is used to train a text classifier. A scoring model can be produced based on the text classifier. The scoring model can receive a plurality of new text samples and provide a score indicative of a likelihood of being a member of a selected class.
信息查询
0/0