-
Publication No.: US11113469B2
Publication date: 2021-09-07
Application No.: US16366628
Application date: 2019-03-27
Applicant: International Business Machines Corporation
Inventor: Corville O. Allen, Roberto Delima, Chris Mwarabu, David Contreras, Kandhan Sekar, Krishna Mahajan
IPC: G10L15/00, G10L15/18, G06F40/284, G06F17/16, G06F40/289
Abstract: A phrase may be received that includes a plurality of tokens in a natural language format. A plurality of levels relating to dependencies between tokens of the plurality of tokens within the phrase is determined. A matrix structure is generated for the phrase. The matrix structure utilizes a plurality of rows and a plurality of columns to store data of the phrase. The plurality of rows and the plurality of columns each indicate one of an order of tokens of the plurality of tokens or levels of the plurality of levels.
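The matrix described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a token's "level" is its depth in a dependency tree, that rows stand for levels, and that columns stand for token order; the abstract does not fix any of these choices. The names `dependency_levels` and `build_matrix` and the hand-written `heads` list are hypothetical and not taken from the patent.

```python
from typing import List, Optional


def dependency_levels(heads: List[int]) -> List[int]:
    """Return the depth of each token in the dependency tree.

    heads[i] is the index of token i's head; the root has head -1 (level 0).
    Assumes the heads form a tree (no cycles).
    """
    levels = []
    for i in range(len(heads)):
        depth, node = 0, i
        while heads[node] != -1:
            node = heads[node]
            depth += 1
        levels.append(depth)
    return levels


def build_matrix(tokens: List[str], heads: List[int]) -> List[List[Optional[str]]]:
    """Place each token at (row = level, column = position in the phrase)."""
    levels = dependency_levels(heads)
    n_rows = max(levels) + 1 if levels else 0
    matrix: List[List[Optional[str]]] = [[None] * len(tokens) for _ in range(n_rows)]
    for col, (token, level) in enumerate(zip(tokens, levels)):
        matrix[level][col] = token
    return matrix


# Hand-written example parse (an assumption, not the output of a real parser).
tokens = ["the", "patient", "denies", "chest", "pain"]
heads = [1, 2, -1, 4, 2]
matrix = build_matrix(tokens, heads)
for row in matrix:
    print(row)
```

Reading a row gives every token at one dependency level, and reading a column recovers the token's position in the phrase.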
-
Publication No.: US20200311197A1
Publication date: 2020-10-01
Application No.: US16366628
Application date: 2019-03-27
Applicant: International Business Machines Corporation
Inventor: Corville O. Allen, Roberto Delima, Chris Mwarabu, David Contreras, Kandhan Sekar, Krishna Mahajan
Abstract: A phrase may be received that includes a plurality of tokens in a natural language format. A plurality of levels relating to dependencies between tokens of the plurality of tokens within the phrase is determined. A matrix structure is generated for the phrase. The matrix structure utilizes a plurality of rows and a plurality of columns to store data of the phrase. The plurality of rows and the plurality of columns each indicate one of an order of tokens of the plurality of tokens or levels of the plurality of levels.
-
Publication No.: US11687796B2
Publication date: 2023-06-27
Application No.: US16386652
Application date: 2019-04-17
Applicant: International Business Machines Corporation
Inventor: Brien H. Muschett, Andrew R. Freed, Roberto Delima, David Contreras, Krishna Mahajan
IPC: G06F17/00, G06N5/022, G06F16/93, G06N20/00, G06F16/908, G06F40/30, G06F40/284, G06F40/289, G06N5/025
CPC classification number: G06N5/022, G06F16/908, G06F16/93, G06F40/284, G06F40/289, G06F40/30, G06N5/025, G06N20/00
Abstract: An approach is provided that receives a document and a document type of the document. The document type identifies a document category to which the received document belongs. A set of linguistic metrics are retrieved that correspond to the document type. A quality of the received document is automatically determined based on a set of linguistic features found in the document as compared to the retrieved set of linguistic metrics. The document is then ingested into a corpus that is utilized by a question-answering (QA) system. The ingestion of the document is based on the determined quality.
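The quality-gated ingestion flow in this abstract can be sketched as follows. Everything concrete here is an assumption made for illustration: the two features (average sentence length and type-token ratio), the `METRICS` ranges, the `clinical_note` document type, and the scoring rule (fraction of features inside their range) are hypothetical and do not come from the patent.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical acceptable ranges per document type (not from the patent).
METRICS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "clinical_note": {
        "avg_sentence_len": (5.0, 30.0),
        "type_token_ratio": (0.3, 1.0),
    },
}


def linguistic_features(text: str) -> Dict[str, float]:
    """Compute two simple linguistic features from raw text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "avg_sentence_len": len(words) / len(sentences) if sentences else 0.0,
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
    }


def quality(text: str, doc_type: str) -> float:
    """Fraction of features that fall inside the range stored for doc_type."""
    features = linguistic_features(text)
    ranges = METRICS[doc_type]
    hits = sum(lo <= features[name] <= hi for name, (lo, hi) in ranges.items())
    return hits / len(ranges)


def ingest(corpus: List[str], text: str, doc_type: str, threshold: float = 1.0) -> bool:
    """Add the document to the corpus only if its quality meets the threshold."""
    if quality(text, doc_type) >= threshold:
        corpus.append(text)
        return True
    return False


corpus: List[str] = []
good = "The patient reports mild chest pain. No fever was observed today."
bad = "Pain. Pain. Pain. Pain."
print(ingest(corpus, good, "clinical_note"), ingest(corpus, bad, "clinical_note"))
```

Keeping the ranges in a per-type table mirrors the abstract's step of retrieving metrics that correspond to the document type, so a new category only needs a new table entry.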
-
Publication No.: US11593561B2
Publication date: 2023-02-28
Application No.: US16567186
Application date: 2019-09-11
Applicant: International Business Machines Corporation
Inventor: David Contreras, Krishna Mahajan, Roberto Delima, Kandhan Sekar, Corville O. Allen, Chris Mwarabu
IPC: G06F40/30, G06K9/62, G06F16/31, G06F40/205
Abstract: A phrase that includes a trigger word that modifies a meaning within the phrase is received. The trigger word is identified. The words of the phrase that are modified by the trigger word are identified by analyzing features of the phrase that link the trigger word to other words. The phrase is interpreted by modifying the second subset of words according to the modification of the trigger word.
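One way to picture the trigger-word idea in this abstract is a negation example. The sketch below assumes that the "features that link the trigger word to other words" are dependency links, and that a trigger modifies the word it attaches to plus everything depending on that word. The `TRIGGERS` set, the function names, and the hand-written parse are hypothetical; the patent abstract does not specify them.

```python
from typing import List, Set, Tuple

# Hypothetical negation triggers (an assumption for illustration).
TRIGGERS = {"no", "not", "without"}


def subtree(heads: List[int], root: int) -> Set[int]:
    """Indices of root and every token that depends on it, directly or indirectly."""
    found = {root}
    changed = True
    while changed:
        changed = False
        for i, head in enumerate(heads):
            if head in found and i not in found:
                found.add(i)
                changed = True
    return found


def interpret(tokens: List[str], heads: List[int]) -> List[Tuple[str, bool]]:
    """Pair each token with a flag saying whether a trigger modifies it."""
    modified: Set[int] = set()
    for i, token in enumerate(tokens):
        if token.lower() in TRIGGERS and heads[i] != -1:
            # The trigger modifies its head's subtree, excluding the trigger itself.
            modified |= subtree(heads, heads[i]) - {i}
    return [(token, i in modified) for i, token in enumerate(tokens)]


# Hand-written parse: "no" and "high" attach to "fever"; "has" is the root.
tokens = ["patient", "has", "no", "high", "fever"]
heads = [1, -1, 4, 4, 1]
result = interpret(tokens, heads)
print(result)
```

In this example only "high" and "fever" are flagged, so "patient" and "has" keep their plain reading.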
-
Publication No.: US11017171B2
Publication date: 2021-05-25
Application No.: US16434112
Application date: 2019-06-06
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Roberto Delima, Andrew R. Freed, Brien Muschett, Krishna Mahajan, David Contreras
IPC: G06F40/284, G06N5/02, G06F40/169
Abstract: A method, computer system, and a computer program product for relevancy-based document quality assessment is provided. The present invention may include computing a document quality score based on at least one container relevancy score determined based on at least one domain link to a domain knowledge base.
-
Publication No.: US20200175116A1
Publication date: 2020-06-04
Application No.: US16567186
Application date: 2019-09-11
Applicant: International Business Machines Corporation
Inventor: David Contreras, Krishna Mahajan, Roberto Delima, Kandhan Sekar, Corville O. Allen, Chris Mwarabu
Abstract: A phrase that includes a trigger word that modifies a meaning within the phrase is received. The trigger word is identified. The words of the phrase that are modified by the trigger word are identified by analyzing features of the phrase that link the trigger word to other words. The phrase is interpreted by modifying the second subset of words according to the modification of the trigger word.
-
Publication No.: US11295080B2
Publication date: 2022-04-05
Application No.: US16430427
Application date: 2019-06-04
Applicant: International Business Machines Corporation
Inventor: Corville O. Allen, Roberto Delima, David Contreras, Krishna Mahajan
IPC: G06F40/242, G06F40/279, G06N20/00, G06F40/40, G06F40/211
Abstract: A method, system, and computer program product include providing a list of triggers, training the natural language processor with the list of triggers, providing to the natural language processor a text including one trigger, selecting nodes in the text to create an original potential span, predicting whether the original potential span includes another trigger, and adjusting, in response to predicting that the original potential span includes another trigger, the original potential span to exclude the another trigger to create a new potential span.
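The span adjustment in this abstract can be sketched in a few lines. Note the substitution: the abstract describes a trained natural language processor that predicts whether a span contains another trigger, whereas this sketch uses a plain lookup in a hypothetical `TRIGGERS` list. The choice of "every token after the trigger" as the original potential span is also an assumption.

```python
from typing import List

# Hypothetical trigger list (a lookup stands in for the trained predictor).
TRIGGERS = {"no", "denies", "without"}


def original_span(tokens: List[str], trigger_idx: int) -> List[int]:
    """Original potential span: every token index after the trigger (an assumption)."""
    return list(range(trigger_idx + 1, len(tokens)))


def adjust_span(tokens: List[str], trigger_idx: int) -> List[int]:
    """Cut the span just before the next trigger, if one is found inside it."""
    span = original_span(tokens, trigger_idx)
    for pos, idx in enumerate(span):
        if tokens[idx].lower() in TRIGGERS:
            return span[:pos]  # new potential span excludes the other trigger
    return span


tokens = ["no", "fever", "and", "no", "cough"]
new_span = adjust_span(tokens, 0)
print([tokens[i] for i in new_span])
```

Cutting at the second trigger keeps each trigger responsible for its own stretch of text, so the first "no" does not also claim "cough".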
-
Publication No.: US20200334546A1
Publication date: 2020-10-22
Application No.: US16386652
Application date: 2019-04-17
Applicant: International Business Machines Corporation
Inventor: Brien H. Muschett, Andrew R. Freed, Roberto Delima, David Contreras, Krishna Mahajan
IPC: G06N5/02, G06F16/93, G06F16/908, G06F17/27, G06N20/00
Abstract: An approach is provided that receives a document and a document type of the document. The document type identifies a document category to which the received document belongs. A set of linguistic metrics are retrieved that correspond to the document type. A quality of the received document is automatically determined based on a set of linguistic features found in the document as compared to the retrieved set of linguistic metrics. The document is then ingested into a corpus that is utilized by a question-answering (QA) system. The ingestion of the document is based on the determined quality.
-
Publication No.: US20200302332A1
Publication date: 2020-09-24
Application No.: US16359591
Application date: 2019-03-20
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: David Contreras, Krishna Mahajan, Roberto Delima, Andrew R. Freed, Brien Muschett
Abstract: A computer-implemented method, system and computer program product for generating a client-specific document quality model, by: analyzing data using existing quality heuristics to identify new, unexpected or problem patterns in the data; forming the quality heuristics into one or more clusters for each container level of the data; exploring each of the clusters to identify sources of the patterns; and developing new quality heuristics based on the sources of the patterns, wherein the new quality heuristics are used to generate the client-specific document quality model.