Keyphrase generation leveraging public repository categories

    Publication No.: US12072841B2

    Publication date: 2024-08-27

    Application No.: US18054984

    Filing date: 2022-11-14

    CPC classification number: G06F16/16 G06F16/148 G06N20/00

    Abstract: One or more systems, devices, computer program products and/or computer-implemented methods of use provided herein relate to a process for generating the classification of files to allow for file system organization and/or query augmentation. A system can comprise a memory that stores computer executable components, and a processor that executes the computer executable components stored in the memory, wherein the computer executable components can comprise a generating component that generates a keyphrase based on a context derived from evaluation of an input file, wherein the generating component employs a public repository of files annotated with a plurality of keyphrases, including the keyphrase, to generate the keyphrase based on the context, and an execution component that classifies the input file based on the keyphrase. In one or more embodiments, the input file can comprise a query, and classification of the input file can comprise augmenting the query based on the keyphrase.

    Unsupervised corpus expansion using domain-specific terms

    Publication No.: US11615154B2

    Publication date: 2023-03-28

    Application No.: US17177459

    Filing date: 2021-02-17

    Abstract: In an approach to unsupervised corpus expansion using domain-specific terms, one or more computer processors retrieve one or more domain-specific terms from a corpus of text. One or more computer processors search the World Wide Web for the one or more domain-specific terms to produce a plurality of web pages associated with each of the one or more domain-specific terms. One or more computer processors determine a confidence score for each of the plurality of web pages. One or more computer processors determine the confidence score of at least one of the plurality of web pages exceeds a pre-defined threshold. One or more computer processors add the at least one of the plurality of web pages to the corpus of text.
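    The thresholding step in this abstract can be illustrated with a minimal sketch. It assumes the web search has already produced the candidate pages as plain strings; `expand_corpus`, `term_overlap_score`, and the default threshold of 0.8 are hypothetical names and values chosen for illustration, not details taken from the patent.

```python
from typing import Callable, Iterable, List


def term_overlap_score(terms: Iterable[str]) -> Callable[[str], float]:
    """Hypothetical scorer: fraction of domain terms that occur in a page."""
    lowered = [t.lower() for t in terms]

    def score(page: str) -> float:
        if not lowered:
            return 0.0
        text = page.lower()
        return sum(t in text for t in lowered) / len(lowered)

    return score


def expand_corpus(
    corpus: Iterable[str],
    candidate_pages: Iterable[str],
    score_page: Callable[[str], float],
    threshold: float = 0.8,  # assumed value; the abstract only says "pre-defined"
) -> List[str]:
    """Return a new corpus containing every page whose score exceeds the threshold."""
    expanded = list(corpus)
    for page in candidate_pages:
        if score_page(page) > threshold:
            expanded.append(page)
    return expanded
```

    Passing the scorer in as a function keeps the threshold logic separate from whatever confidence measure is actually used.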

    HYPERNYM DETECTION USING STRICT PARTIAL ORDER NETWORKS

    Publication No.: US20210081500A1

    Publication date: 2021-03-18

    Application No.: US16575107

    Filing date: 2019-09-18

    Abstract: One embodiment of the present invention provides a method comprising receiving a text corpus, and generating a first list of triples based on the text corpus. Each triple of the first list comprises a first term representing a candidate hyponym, a second term representing a candidate hypernym, and a frequency value indicative of a number of times a hypernymy relation is observed between the candidate hyponym and the candidate hypernym in the text corpus. The method further comprises training a neural network for hypernym induction based on the first list. The trained neural network is a strict partial order network (SPON) model.
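    The triple list described in this abstract can be sketched as follows. The abstract does not say how hypernymy relations are observed in the corpus; this sketch assumes a single "X such as Y" lexical pattern over single-word terms, and `extract_triples` is a hypothetical name. Training the SPON model itself is not shown.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed pattern: "<hypernym> such as <hyponym>", single-word terms only.
SUCH_AS = re.compile(r"\b(\w+) such as (\w+)\b", re.IGNORECASE)


def extract_triples(sentences: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (candidate hyponym, candidate hypernym, frequency) triples."""
    counts: Counter = Counter()
    for sentence in sentences:
        for hypernym, hyponym in SUCH_AS.findall(sentence):
            counts[(hyponym.lower(), hypernym.lower())] += 1
    # most_common() orders the pairs by descending frequency.
    return [(hypo, hyper, freq) for (hypo, hyper), freq in counts.most_common()]
```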

    Unsupervised hypernym induction machine learning

    Publication No.: US11507828B2

    Publication date: 2022-11-22

    Application No.: US16666800

    Filing date: 2019-10-29

    Abstract: Training a machine learning model such as a neural network, which can automatically extract a hypernym from unstructured data, is disclosed. A preliminary candidate list of hyponym-hypernym pairs can be parsed from the corpus. A preliminary super-term—sub-term glossary can be generated from the corpus, the preliminary super-term—sub-term glossary containing one or more super-term—sub-term pairs. A super-term—sub-term pair can be filtered from the preliminary super-term—sub-term glossary, responsive to detecting that the super-term—sub-term pair is not a candidate for hyponym-hypernym pair, to generate a final super-term—sub-term glossary. The preliminary candidate list of hyponym-hypernym pairs and the final super-term—sub-term glossary can be combined to generate a final list of hyponym-hypernym pairs. An artificial neural network can be trained using the final list of hyponym-hypernym pairs as a training data set, the artificial neural network trained to identify a hypernym given new text data.
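    The data-preparation steps in this abstract (filter the glossary, then combine it with the candidate list) can be sketched as below. The filtering criterion is not given in the abstract, so it is passed in as a hypothetical predicate; `build_training_pairs` and the tuple layouts are likewise assumptions for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]


def build_training_pairs(
    candidate_pairs: Iterable[Pair],   # (hyponym, hypernym)
    glossary_pairs: Iterable[Pair],    # (super-term, sub-term)
    is_hypernymy_candidate: Callable[[Pair], bool],  # hypothetical filter
) -> List[Pair]:
    """Combine candidate pairs with the filtered glossary, without duplicates."""
    # Keep only glossary entries that pass the filter, and flip each one
    # from (super-term, sub-term) to (hyponym, hypernym) order.
    final_glossary = [
        (sub, sup) for (sup, sub) in glossary_pairs if is_hypernymy_candidate((sup, sub))
    ]
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(list(candidate_pairs) + final_glossary))
```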

    KNOWLEDGE INDUCTION USING CORPUS EXPANSION

    Publication No.: US20220067539A1

    Publication date: 2022-03-03

    Application No.: US17008856

    Filing date: 2020-09-01

    Abstract: A method, a computer program product, and a computer system induce knowledge from a knowledge graph. The method includes receiving a request indicative of a domain. The method includes determining a corpus corresponding to the domain and determining a quality of the corpus in generating the knowledge graph relative to a quality threshold. If the quality threshold is not met, the method includes determining a candidate expansion corpus to incorporate further data therefrom into the corpus relative to an expansion threshold. If the expansion threshold is met, the method includes generating an expanded corpus by expanding the corpus with the further data. The method includes generating the knowledge graph based on the expanded corpus from which the knowledge is induced and generating a response to the request based on the knowledge graph.

    HYPERNYM-HYPONYM PAIR INDUCTION

    Publication No.: US20220004711A1

    Publication date: 2022-01-06

    Application No.: US16918018

    Filing date: 2020-07-01

    Abstract: An approach to induction of unknown terms into a term taxonomy graph may be provided. The approach may include analyzing a domain specific corpus to generate a term taxonomy graph using a term taxonomy graph generation model with a term knowledge base and determining which terms within the domain specific corpus are out of vocabulary (OOV) terms. The approach may also analyze the terms in the domain specific corpus with a semantic representation model to generate feature vectors of the OOV terms and terms known within the generated term taxonomy graph. The approach may determine if an OOV can be a hyponym of a term within the term taxonomy graph based on the feature vectors and insert the OOV term into the graph at the appropriate location.
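    The final step of this abstract, placing an OOV term in the graph based on feature vectors, can be sketched as below. This is a simplified illustration under stated assumptions: the taxonomy is a dict mapping each known term to its hyponyms, vectors are plain lists of floats, and an OOV term is attached under its most similar known term when cosine similarity reaches a hypothetical `min_similarity`. The patent's actual semantic representation model and placement rule are not specified here.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def insert_oov_terms(
    taxonomy: Dict[str, List[str]],
    vectors: Dict[str, Sequence[float]],
    min_similarity: float = 0.7,  # assumed cut-off, not from the patent
) -> Dict[str, List[str]]:
    """Return a copy of the taxonomy with OOV terms attached as hyponyms."""
    updated = {term: list(children) for term, children in taxonomy.items()}
    for term, vec in vectors.items():
        if term in taxonomy:
            continue  # already known, so not an OOV term
        best, best_sim = None, min_similarity
        for known in taxonomy:
            if known in vectors:
                sim = cosine(vec, vectors[known])
                if sim >= best_sim:
                    best, best_sim = known, sim
        if best is not None:
            updated[best].append(term)  # attach as hyponym of closest term
            updated[term] = []
    return updated
```

    Terms that fall below the cut-off for every known term are left out of the graph rather than forced into a poor position.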

    HYPERNYM DETECTION USING STRICT PARTIAL ORDER NETWORKS

    Publication No.: US20210303800A1

    Publication date: 2021-09-30

    Application No.: US17343643

    Filing date: 2021-06-09

    Abstract: One embodiment of the present invention provides a method comprising receiving a text corpus, and generating a first list of triples based on the text corpus. Each triple of the first list comprises a first term representing a candidate hyponym, a second term representing a candidate hypernym, and a frequency value indicative of a number of times a hypernymy relation is observed between the candidate hyponym and the candidate hypernym in the text corpus. The method further comprises training a neural network for hypernym induction based on the first list. The trained neural network is a strict partial order network (SPON) model.

    Geographic location specific models for information extraction and knowledge discovery

    Publication No.: US11055491B2

    Publication date: 2021-07-06

    Application No.: US16268044

    Filing date: 2019-02-05

    Abstract: Computer-implemented methods, computer systems and computer program products for providing geographic location specific models for information extraction and knowledge discovery are provided. Aspects include receiving a body of input text using a processor having natural language processing functionality. Aspects also include using information extraction functionality of the processor to extract preliminary information including a relational table from the body of input text. Aspects also include determining one or more geographical contexts associated with the input text based on the preliminary information. Aspects also include determining inferred information based on the preliminary information and the one or more geographical contexts associated with the input text. Aspects also include augmenting the relational table with the inferred information.

    GEOGRAPHIC LOCATION SPECIFIC MODELS FOR INFORMATION EXTRACTION AND KNOWLEDGE DISCOVERY

    Publication No.: US20200250275A1

    Publication date: 2020-08-06

    Application No.: US16268044

    Filing date: 2019-02-05

    Abstract: Computer-implemented methods, computer systems and computer program products for providing geographic location specific models for information extraction and knowledge discovery are provided. Aspects include receiving a body of input text using a processor having natural language processing functionality. Aspects also include using information extraction functionality of the processor to extract preliminary information including a relational table from the body of input text. Aspects also include determining one or more geographical contexts associated with the input text based on the preliminary information. Aspects also include determining inferred information based on the preliminary information and the one or more geographical contexts associated with the input text. Aspects also include augmenting the relational table with the inferred information.
