Systems and methods for generating labeled short text sequences

    Publication number: US11797594B2

    Publication date: 2023-10-24

    Application number: US17093722

    Filing date: 2020-11-10

    CPC classification number: G06F16/355 G06F16/367 G06F40/289

    Abstract: A set of documents related to a particular topic, industry, or entity is received. Sentences are extracted from each document. The sentences are grouped into tuples of one, two, or three consecutive sentences (i.e., short text sequences). The sentence tuples are clustered based on vector representations of the sentences. For each cluster, a set of tuples that best represents or best fits the cluster is selected. These selected sentence tuples are fed to an ontology to determine the ontological entities associated with each tuple. The determined ontological entities are then associated with the cluster corresponding to each tuple. The sentence tuples associated with each cluster are labeled based on the ontological entities associated with that cluster. The labeled sentence tuples may then be used for a variety of purposes, such as training a model to determine the topic of short text sequences.
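    The pipeline summarized in the abstract can be sketched end to end in Python. This is a minimal, illustrative sketch under stated assumptions, not the patented implementation: the abstract does not specify the sentence splitter, the vector representation, the clustering algorithm, the "best fit" criterion, or the ontology, so the sketch substitutes a naive regex splitter, bag-of-words vectors compared by cosine similarity, k-means with deterministic farthest-point seeding, closeness to the cluster centroid, and a hypothetical keyword-to-entity dictionary. All function names (`split_sentences`, `sentence_tuples`, `label_tuples`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_tuples(sentences, max_len=3):
    # Every run of 1..max_len consecutive sentences within one document.
    out = []
    for n in range(1, max_len + 1):
        for i in range(len(sentences) - n + 1):
            out.append(tuple(sentences[i:i + n]))
    return out


def vectorize(tup):
    # Stand-in for a sentence embedding: bag-of-words counts (assumption).
    return Counter(re.findall(r"[a-z]+", " ".join(tup).lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    # Mean of sparse count vectors, as a word -> weight dict.
    total = Counter()
    for v in vectors:
        total.update(v)
    return {w: c / len(vectors) for w, c in total.items()}


def kmeans(vectors, k, iters=10):
    # Deterministic farthest-point seeding, then plain k-means on cosine similarity.
    centers = [dict(vectors[0])]
    while len(centers) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centers))
        centers.append(dict(far))
    assign = []
    for _ in range(iters):
        assign = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centers[c] = centroid(members)
    return assign, centers


def representatives(tuples, vectors, assign, centers, top_n=2):
    # "Best fit" taken to mean highest cosine similarity to the centroid (assumption).
    reps = {}
    for c, center in enumerate(centers):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: cosine(vectors[i], center), reverse=True)
        reps[c] = [tuples[i] for i in members[:top_n]]
    return reps


def ontology_entities(tup, ontology):
    # Hypothetical ontology: a plain dict mapping a keyword to an entity name.
    words = set(re.findall(r"[a-z]+", " ".join(tup).lower()))
    return {entity for term, entity in ontology.items() if term in words}


def label_tuples(documents, ontology, k=2, top_n=2):
    tuples = []
    for doc in documents:
        tuples.extend(sentence_tuples(split_sentences(doc)))
    vectors = [vectorize(t) for t in tuples]
    assign, centers = kmeans(vectors, k)
    reps = representatives(tuples, vectors, assign, centers, top_n)
    # A cluster's labels are the entities found in its representative tuples only.
    labels = {c: set().union(*(ontology_entities(t, ontology) for t in r))
              for c, r in reps.items()}
    # Every tuple in a cluster inherits that cluster's labels.
    return [(t, sorted(labels[a])) for t, a in zip(tuples, assign)]


docs = [
    "Banks issue loans. Loans carry interest.",
    "Vaccines prevent disease. Hospitals give vaccines.",
]
ontology = {"banks": "Banking", "loans": "Finance", "interest": "Finance",
            "vaccines": "Healthcare", "hospitals": "Healthcare"}
for tup, tags in label_tuples(docs, ontology):
    print(tup, tags)
```

    In this toy run the two documents fall into separate clusters, and the single-sentence tuple "Loans carry interest." is labeled `Banking` as well as `Finance` even though it never mentions banks, because labels are assigned per cluster rather than per tuple. Swapping in learned sentence embeddings or a real ontology would only change `vectorize` and `ontology_entities`.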

Systems and methods for generating labeled short text sequences

    Publication number: US20210173862A1

    Publication date: 2021-06-10

    Application number: US17093722

    Filing date: 2020-11-10