-
Publication No.: US12072841B2
Publication date: 2024-08-27
Application No.: US18054984
Filing date: 2022-11-14
Applicant: International Business Machines Corporation
Inventor: Gaetano Rossiello , Md Faisal Mahbub Chowdhury , Alfio Massimiliano Gliozzo , Nandana Mihindukulasooriya , Michael Robert Glass
CPC classification number: G06F16/16 , G06F16/148 , G06N20/00
Abstract: One or more systems, devices, computer program products and/or computer-implemented methods of use provided herein relate to a process for generating the classification of files to allow for file system organization and/or query augmentation. A system can comprise a memory that stores computer executable components, and a processor that executes the computer executable components stored in the memory, wherein the computer executable components can comprise a generating component that generates a keyphrase based on a context derived from evaluation of an input file, wherein the generating component employs a public repository of files annotated with a plurality of keyphrases, including the keyphrase, to generate the keyphrase based on the context, and an execution component that classifies the input file based on the keyphrase. In one or more embodiments, the input file can comprise a query, and classification of the input file can comprise augmenting the query based on the keyphrase.
-
Publication No.: US11615154B2
Publication date: 2023-03-28
Application No.: US17177459
Filing date: 2021-02-17
Applicant: International Business Machines Corporation
Inventor: Md Faisal Mahbub Chowdhury , Alfio Massimiliano Gliozzo
IPC: G06F16/00 , G06F16/951 , G06F16/9535 , G06F40/284 , G06F40/30
Abstract: In an approach to unsupervised corpus expansion using domain-specific terms, one or more computer processors retrieve one or more domain-specific terms from a corpus of text. One or more computer processors search the World Wide Web for the one or more domain-specific terms to produce a plurality of web pages associated with each of the one or more domain-specific terms. One or more computer processors determine a confidence score for each of the plurality of web pages. One or more computer processors determine the confidence score of at least one of the plurality of web pages exceeds a pre-defined threshold. One or more computer processors add the at least one of the plurality of web pages to the corpus of text.
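The retrieve, search, score, and threshold steps in this abstract can be sketched roughly as follows. This is only an illustrative sketch: the abstract does not say how pages are searched or scored, so `expand_corpus`, `fake_search`, `fake_score`, and the toy `FAKE_WEB` data are hypothetical stand-ins, and the term-frequency score is an assumption rather than the patented confidence measure.

```python
from typing import Callable, Iterable


def expand_corpus(
    corpus: list[str],
    terms: Iterable[str],
    search: Callable[[str], list[str]],
    score: Callable[[str, str], float],
    threshold: float,
) -> list[str]:
    """Return the corpus plus every retrieved page whose score exceeds the threshold."""
    expanded = list(corpus)
    for term in terms:
        for page in search(term):
            # Keep a page only if its confidence score exceeds the threshold.
            if score(term, page) > threshold and page not in expanded:
                expanded.append(page)
    return expanded


# Hypothetical stand-in for a web search: a fixed term -> pages mapping.
FAKE_WEB = {
    "hypernym": ["a hypernym is a broader term than its hyponym", "cooking recipes"],
}


def fake_search(term: str) -> list[str]:
    return FAKE_WEB.get(term, [])


def fake_score(term: str, page: str) -> float:
    # Assumed score: relative frequency of the term among the page's words.
    words = page.split()
    return words.count(term) / len(words)


result = expand_corpus(["seed text"], ["hypernym"], fake_search, fake_score, 0.05)
print(result)
```

In this toy run only the page that actually mentions the term passes the threshold and is added to the corpus.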
-
Publication No.: US20210081500A1
Publication date: 2021-03-18
Application No.: US16575107
Filing date: 2019-09-18
Applicant: International Business Machines Corporation
Inventor: Sarthak Dash , Alfio Massimiliano Gliozzo , Md Faisal Mahbub Chowdhury
Abstract: One embodiment of the present invention provides a method comprising receiving a text corpus, and generating a first list of triples based on the text corpus. Each triple of the first list comprises a first term representing a candidate hyponym, a second term representing a candidate hypernym, and a frequency value indicative of a number of times a hypernymy relation is observed between the candidate hyponym and the candidate hypernym in the text corpus. The method further comprises training a neural network for hypernym induction based on the first list. The trained neural network is a strict partial order network (SPON) model.
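To make the triple format concrete, the following sketch builds (candidate hyponym, candidate hypernym, frequency) triples from a toy corpus. The abstract does not state how hypernymy relations are detected, so the single "X is a Y" pattern used here is an assumption, and `PATTERN` and `build_triples` are hypothetical names. Training the SPON model itself is not shown.

```python
import re
from collections import Counter

# Assumed detection rule: a single "X is a/an Y" surface pattern.
PATTERN = re.compile(r"\b(\w+) is an? (\w+)\b")


def build_triples(sentences):
    """Count how often each (hyponym, hypernym) pair is observed in the sentences."""
    counts = Counter()
    for sentence in sentences:
        for hypo, hyper in PATTERN.findall(sentence.lower()):
            counts[(hypo, hyper)] += 1
    # Emit (hyponym, hypernym, frequency) triples in a stable, sorted order.
    return [(hypo, hyper, freq) for (hypo, hyper), freq in sorted(counts.items())]


corpus = [
    "A dog is an animal.",
    "Every dog is an animal, and a cat is an animal too.",
    "Python is a language.",
]
triples = build_triples(corpus)
print(triples)
```

A list of this shape is what the abstract describes as the training input for the hypernym-induction network.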
-
Publication No.: US11507828B2
Publication date: 2022-11-22
Application No.: US16666800
Filing date: 2019-10-29
Applicant: International Business Machines Corporation
Inventor: Md Faisal Mahbub Chowdhury , Robert G. Farrell , Nicholas Brady Garvan Monath , Michael Robert Glass , Md Arafat Sultan
IPC: G06F40/205 , G06N3/08 , G06K9/62 , G06N5/04
Abstract: Training a machine learning model such as a neural network, which can automatically extract a hypernym from unstructured data, is disclosed. A preliminary candidate list of hyponym-hypernym pairs can be parsed from the corpus. A preliminary super-term—sub-term glossary can be generated from the corpus, the preliminary super-term—sub-term glossary containing one or more super-term—sub-term pairs. A super-term—sub-term pair can be filtered from the preliminary super-term—sub-term glossary, responsive to detecting that the super-term—sub-term pair is not a candidate for hyponym-hypernym pair, to generate a final super-term—sub-term glossary. The preliminary candidate list of hyponym-hypernym pairs and the final super-term—sub-term glossary can be combined to generate a final list of hyponym-hypernym pairs. An artificial neural network can be trained using the final list of hyponym-hypernym pairs as a training data set, the artificial neural network trained to identify a hypernym given new text data.
-
Publication No.: US20220067539A1
Publication date: 2022-03-03
Application No.: US17008856
Filing date: 2020-09-01
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Nandana Mihindukulasooriya , Md Faisal Mahbub Chowdhury , Yu Deng , Ruchi Mahindru , Nicolas Rodolfo Fauceglia , Alfio Massimiliano Gliozzo
IPC: G06N5/02
Abstract: A method, a computer program product, and a computer system induce knowledge from a knowledge graph. The method includes receiving a request indicative of a domain. The method includes determining a corpus corresponding to the domain and determining a quality of the corpus in generating the knowledge graph relative to a quality threshold. If the quality threshold is not met, the method includes determining a candidate expansion corpus to incorporate further data therefrom into the corpus relative to an expansion threshold. If the expansion threshold is met, the method includes generating an expanded corpus by expanding the corpus with the further data. The method includes generating the knowledge graph based on the expanded corpus from which the knowledge is induced and generating a response to the request based on the knowledge graph.
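The two-threshold control flow in this abstract can be sketched as below. The abstract does not define either measure, so `toy_quality` (vocabulary size) and `toy_relevance` (Jaccard overlap of vocabularies) are assumed stand-ins, and all function names are hypothetical. Knowledge graph generation from the resulting corpus is not shown.

```python
def vocabulary(docs):
    """Set of lower-cased whitespace tokens across all documents."""
    return {word for doc in docs for word in doc.lower().split()}


def toy_quality(docs):
    # Assumed quality measure: number of distinct words in the corpus.
    return len(vocabulary(docs))


def toy_relevance(docs, candidate):
    # Assumed expansion measure: Jaccard overlap between the two vocabularies.
    a, b = vocabulary(docs), vocabulary(candidate)
    return len(a & b) / len(a | b) if a | b else 0.0


def maybe_expand(corpus, candidate, quality, relevance,
                 quality_threshold, expansion_threshold):
    """Expand the corpus only if it is below quality and the candidate qualifies."""
    if quality(corpus) >= quality_threshold:
        return list(corpus)  # quality threshold met: use the corpus as is
    if relevance(corpus, candidate) >= expansion_threshold:
        return list(corpus) + list(candidate)  # expansion threshold met
    return list(corpus)  # candidate rejected


corpus = ["graph nodes"]
candidate = ["graph edges link nodes"]
expanded = maybe_expand(corpus, candidate, toy_quality, toy_relevance, 4, 0.3)
print(expanded)
```

Here the corpus falls short of the quality threshold and the candidate overlaps enough to be merged; an unrelated candidate would leave the corpus unchanged.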
-
Publication No.: US20220004711A1
Publication date: 2022-01-06
Application No.: US16918018
Filing date: 2020-07-01
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Feifei Pan , Md Faisal Mahbub Chowdhury , Alfio Massimiliano Gliozzo
IPC: G06F40/279 , G06F16/33 , G06N20/00 , G06F40/30
Abstract: An approach to induction of unknown terms into a term taxonomy graph may be provided. The approach may include analyzing a domain specific corpus to generate a term taxonomy graph using a term taxonomy graph generation model with a term knowledge base and determining which terms within the domain specific corpus are out of vocabulary (OOV) terms. The approach may also analyze the terms in the domain specific corpus with a semantic representation model to generate feature vectors of the OOV terms and terms known within the generated term taxonomy graph. The approach may determine if an OOV can be a hyponym of a term within the term taxonomy graph based on the feature vectors and insert the OOV term into the graph at the appropriate location.
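One simple way to picture the OOV insertion step is a nearest-neighbour lookup over feature vectors, sketched below. The abstract does not say how feature vectors are compared, so cosine similarity, the `min_similarity` cutoff, the hand-written vectors, and the names `cosine` and `insert_oov` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def insert_oov(taxonomy, vectors, oov_term, min_similarity):
    """Attach oov_term under the most similar known term, if similar enough.

    taxonomy maps each term to the list of its direct hyponyms.
    Returns the chosen parent term, or None if no term qualifies.
    """
    best_parent, best_sim = None, min_similarity
    for term in taxonomy:
        sim = cosine(vectors[oov_term], vectors[term])
        if sim >= best_sim:
            best_parent, best_sim = term, sim
    if best_parent is not None:
        taxonomy[best_parent].append(oov_term)
        taxonomy[oov_term] = []  # the new term starts with no hyponyms
    return best_parent


taxonomy = {"animal": ["dog"], "dog": [], "vehicle": []}
vectors = {
    "animal": (1.0, 0.0),
    "dog": (0.9, 0.1),
    "vehicle": (0.0, 1.0),
    "puppy": (0.8, 0.2),  # the out-of-vocabulary term
}
parent = insert_oov(taxonomy, vectors, "puppy", 0.5)
print(parent, taxonomy)
```

With these toy vectors the OOV term lands under the closest known term rather than under the broader root.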
-
Publication No.: US20210303800A1
Publication date: 2021-09-30
Application No.: US17343643
Filing date: 2021-06-09
Applicant: International Business Machines Corporation
Inventor: Sarthak Dash , Alfio Massimiliano Gliozzo , Md Faisal Mahbub Chowdhury
Abstract: One embodiment of the present invention provides a method comprising receiving a text corpus, and generating a first list of triples based on the text corpus. Each triple of the first list comprises a first term representing a candidate hyponym, a second term representing a candidate hypernym, and a frequency value indicative of a number of times a hypernymy relation is observed between the candidate hyponym and the candidate hypernym in the text corpus. The method further comprises training a neural network for hypernym induction based on the first list. The trained neural network is a strict partial order network (SPON) model.
-
Publication No.: US11055491B2
Publication date: 2021-07-06
Application No.: US16268044
Filing date: 2019-02-05
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Md Faisal Mahbub Chowdhury , Michael Robert Glass
Abstract: Computer-implemented methods, computer systems and computer program products for providing geographic location specific models for information extraction and knowledge discovery are provided. Aspects include receiving a body of input text using a processor having natural language processing functionality. Aspects also include using information extraction functionality of the processor to extract preliminary information including a relational table from the body of input text. Aspects also include determining one or more geographical contexts associated with the input text based on the preliminary information. Aspects also include determining inferred information based on the preliminary information and the one or more geographical contexts associated with the input text. Aspects also include augmenting the relational table with the inferred information.
-
Publication No.: US10740559B2
Publication date: 2020-08-11
Application No.: US15469766
Filing date: 2017-03-27
Applicant: International Business Machines Corporation
IPC: G06F16/2457 , G06F40/295 , G06F16/242 , G06F16/28 , G06F16/33 , G06F16/35 , G06F16/951 , G06F16/36 , G06F40/40 , G06F40/284
Abstract: A terminology extraction method, system, and computer program product include extracting terminology specific to a domain from a corpus of domain-specific text, where no external general domain reference corpus is required. The method assumes that terms which share common noun token(s) in a domain corpus are likely to be very related, that terms which are very related in a domain are likely to be equally or similarly important even though there might be large differences among their term frequencies, and that an abbreviation and its corresponding expansion have equal importance as terms.
-
Publication No.: US20200250275A1
Publication date: 2020-08-06
Application No.: US16268044
Filing date: 2019-02-05
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Md Faisal Mahbub Chowdhury , Michael Robert Glass
IPC: G06F17/27
Abstract: Computer-implemented methods, computer systems and computer program products for providing geographic location specific models for information extraction and knowledge discovery are provided. Aspects include receiving a body of input text using a processor having natural language processing functionality. Aspects also include using information extraction functionality of the processor to extract preliminary information including a relational table from the body of input text. Aspects also include determining one or more geographical contexts associated with the input text based on the preliminary information. Aspects also include determining inferred information based on the preliminary information and the one or more geographical contexts associated with the input text. Aspects also include augmenting the relational table with the inferred information.