PROHIBITING INCONSISTENT NAMED ENTITY RECOGNITION TAG SEQUENCES

    Publication (Announcement) No.: US20230136965A1

    Publication Date: 2023-05-04

    Application No.: US17978023

    Filing Date: 2022-10-31

    Abstract: In some aspects, a computer obtains a trained conditional random field (CRF) model comprising a set of model parameters learned from training data and stored in a transition matrix. Among the tags within the transition matrix, the computer identifies pairs of tags whose sequence is inconsistent with the tag sequence logic. For each such pair, the computer sets, within the transition matrix, the cost associated with transitioning between the pair of tags equal to a predefined hyperparameter value that penalizes the transition between the inconsistent pair of tags. The computer receives a string of text comprising one or more named entities and inputs the string of text into the CRF model having the cost associated with the transitioning between the pair of tags set equal to the predefined hyperparameter value. The CRF model classifies the words within the string of text into different classes, which may include the one or more named entities.
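
The abstract's core step, overwriting entries of a trained transition matrix for inconsistent tag pairs, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and is not taken from the patent: the BIO tag set, the rule that an I-X tag may only follow B-X or I-X, the convention that the matrix holds scores (higher is better, so the penalty is a large negative number), the value of PENALTY, and the helper names is_inconsistent, penalize, and viterbi. A zero matrix stands in for the learned transitions, and the emission scores are made up.

```python
import numpy as np

# Hypothetical BIO tag set; the abstract does not name a tagging scheme.
TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
# Hypothetical value for the predefined hyperparameter (score convention).
PENALTY = -1e4


def is_inconsistent(prev_tag, next_tag):
    """Assumed tag logic: I-X may only follow B-X or I-X."""
    if not next_tag.startswith("I-"):
        return False
    entity = next_tag[2:]
    return prev_tag not in (f"B-{entity}", f"I-{entity}")


def penalize(transitions, tags, penalty):
    """Return a copy with every inconsistent (prev, next) entry set to penalty."""
    out = transitions.copy()
    for i, prev_tag in enumerate(tags):
        for j, next_tag in enumerate(tags):
            if is_inconsistent(prev_tag, next_tag):
                out[i, j] = penalty
    return out


def viterbi(emissions, transitions):
    """Highest-scoring tag index path for (n_tokens, n_tags) emission scores."""
    n_steps, n_tags = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n_steps, n_tags), dtype=int)
    for t in range(1, n_steps):
        # cand[i, j]: best score ending in tag i, then moving to tag j at step t.
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Stand-in for a trained transition matrix (all zeros, purely illustrative).
learned = np.zeros((len(TAGS), len(TAGS)))
# Made-up emission scores for three tokens; token 2 slightly prefers I-PER.
emissions = np.array([
    [2.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.5, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0, 0.0],
])

free_path = [TAGS[i] for i in viterbi(emissions, learned)]
fixed_path = [TAGS[i] for i in viterbi(emissions, penalize(learned, TAGS, PENALTY))]
print(free_path)   # ['O', 'I-PER', 'O']
print(fixed_path)  # ['O', 'B-PER', 'O']
```

In this sketch the unmodified matrix lets the decoder emit O followed by I-PER, while the penalized matrix steers it to B-PER instead. A transition matrix alone cannot rule out an I-X tag on the first token; handling that would need a separate start-of-sequence score, which the sketch leaves out.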
