DATA AUGMENTATION FOR TEXT-BASED AI APPLICATIONS

    Publication number: US20200372395A1

    Publication date: 2020-11-26

    Application number: US16416837

    Filing date: 2019-05-20

    Abstract: A cognitive system (artificial intelligence) is optimized by assessing different data augmentation methods used to augment training data, and then training the system using a training set augmented by the best identified method. The augmentation methods are assessed by applying them to the same set of training data to generate different augmented training data sets. Respective instances of the cognitive system are trained with the augmented sets, and each instance is subjected to validation testing to assess its goodness. The validation testing can include multiple validation tests leading to component scores, and a combined validation score is computed as a weighted average of the component scores using respective weights for each validation test. The augmentation method corresponding to the instance having the highest combined validation score is selected as the optimum augmentation method for the particular cognitive system at hand.
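
    The selection step in this abstract can be illustrated with a minimal sketch. It assumes each trained instance has already produced one component score per validation test. The function names, the augmentation method names, and the numbers are hypothetical and do not come from the patent.

```python
from typing import Dict, Sequence


def combined_score(component_scores: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Weighted average of component scores, one weight per validation test."""
    if len(component_scores) != len(weights):
        raise ValueError("need exactly one weight per component score")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(s * w for s, w in zip(component_scores, weights))
    return weighted_sum / total_weight


def select_best_method(scores_by_method: Dict[str, Sequence[float]],
                       weights: Sequence[float]) -> str:
    """Return the method whose trained instance has the highest combined score."""
    return max(scores_by_method,
               key=lambda name: combined_score(scores_by_method[name], weights))


# Hypothetical component scores from two validation tests per trained instance.
scores = {
    "synonym_replacement": [0.80, 0.70],
    "back_translation": [0.75, 0.90],
}
test_weights = [2.0, 1.0]  # assumed: the first validation test counts double

best = select_best_method(scores, test_weights)
```

    With these assumed numbers the combined scores are roughly 0.767 and 0.800, so `best` is `"back_translation"`. The training set augmented by that method would then be used to train the final system.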

    Data augmentation for text-based AI applications

    Publication number: US11568307B2

    Publication date: 2023-01-31

    Application number: US16416837

    Filing date: 2019-05-20

    Abstract: A cognitive system (artificial intelligence) is optimized by assessing different data augmentation methods used to augment training data, and then training the system using a training set augmented by the best identified method. The augmentation methods are assessed by applying them to the same set of training data to generate different augmented training data sets. Respective instances of the cognitive system are trained with the augmented sets, and each instance is subjected to validation testing to assess its goodness. The validation testing can include multiple validation tests leading to component scores, and a combined validation score is computed as a weighted average of the component scores using respective weights for each validation test. The augmentation method corresponding to the instance having the highest combined validation score is selected as the optimum augmentation method for the particular cognitive system at hand.

    Natural language processing payload generation

    Publication number: US11556705B2

    Publication date: 2023-01-17

    Application number: US17083510

    Filing date: 2020-10-29

    Abstract: An input text that is also transmitted to a text processing service (e.g., a cloud based text processing service) is received. Characterizing information (e.g., contiguous parts of speech, terms used per part of speech, payload length, etc.) is extracted from the input text. A text payload is generated using the characterizing information. A performance test is run on the text payload. The performance test can include performing at least one selected from a group consisting of: sentiment analysis on the text payload, entity analysis on the text payload, content classification on the text payload, and syntax analysis on the text payload. The performance test can yield a processing time required to perform the performance test. Memory and processing power resource allocation to the text processing service can be altered based on the processing time of the performance test.

    Knowledge-based information retrieval system evaluation

    Publication number: US11461376B2

    Publication date: 2022-10-04

    Application number: US16507770

    Filing date: 2019-07-10

    Abstract: Embodiments provide a computer implemented method of evaluating one or more IR systems, the method including: providing, by a processor, a pre-indexed knowledge-based document to a pre-trained sentence identification model; identifying, by the sentence identification model, a predetermined number of query-worthy sentences from the pre-indexed knowledge-based document, wherein the query-worthy sentences are ranked based on a prediction probability value of each query-worthy sentence; providing, by the sentence identification model, the query-worthy sentences to a pre-trained query generation model; generating, by the query generation model, a query for each query-worthy sentence; and evaluating, by the processor, the one or more IR systems using the generated queries, wherein one or more searches are performed via the one or more IR systems, and the one or more searches are performed in a set of knowledge-based documents including the pre-indexed knowledge-based document.
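
    The pipeline in this abstract can be sketched roughly as follows. The abstract does not name an evaluation metric, so the top-n hit rate used here is an assumption. The pre-trained sentence identification and query generation models are replaced by hypothetical stand-ins: sentence scores are supplied by the caller, and the toy search function ranks documents by word overlap.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def top_query_worthy(scored_sentences: Sequence[Tuple[str, float]],
                     k: int) -> List[str]:
    """Keep the k sentences with the highest prediction probability."""
    ranked = sorted(scored_sentences, key=lambda pair: pair[1], reverse=True)
    return [sentence for sentence, _ in ranked[:k]]


def hit_rate(queries: Sequence[str],
             search: Callable[[str], List[str]],
             source_doc_id: str,
             top_n: int = 1) -> float:
    """Fraction of queries whose top-n results contain the source document.

    This metric is an assumption; the abstract only says the IR systems are
    evaluated using the generated queries.
    """
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if source_doc_id in search(q)[:top_n])
    return hits / len(queries)


# Hypothetical two-document knowledge base and a toy word-overlap IR system.
DOCS: Dict[str, str] = {
    "kb1": "reset the router by holding the power button",
    "kb2": "update the firmware from the admin page",
}


def toy_search(query: str) -> List[str]:
    """Rank document ids by the number of words shared with the query."""
    query_words = set(query.lower().split())
    overlap = {doc_id: len(query_words & set(text.split()))
               for doc_id, text in DOCS.items()}
    return sorted(overlap, key=lambda doc_id: overlap[doc_id], reverse=True)


# Assumed output of a query generation model for sentences taken from kb1.
generated_queries = ["how to reset the router", "holding the power button"]
score = hit_rate(generated_queries, toy_search, "kb1", top_n=1)
```

    Comparing this score across several IR systems over the same document set would give a relative ranking of those systems without hand-written queries.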

    Method and System for Unlabeled Data Selection Using Failed Case Analysis

    Publication number: US20210326719A1

    Publication date: 2021-10-21

    Application number: US16850985

    Filing date: 2020-04-16

    Abstract: A method, system, and a computer program product automatically select training data for updating a model by applying human-annotated training data to a model to generate results that are evaluated to identify correct case results and false case results that are categorized into error type categories for use in building error models corresponding to the error type categories, where each error model is built from at least failed case results belonging to a corresponding error type, and where unlabeled data samples are applied to each error model to compute an error likelihood for each unlabeled data sample with respect to each error type category, thereby enabling the selection and display of unlabeled data samples for annotation by a subject matter expert based on a computed error likelihood for the one or more unlabeled data samples in a specified error type category meeting or exceeding an error threshold requirement.
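
    The final selection step in this abstract can be sketched as follows, assuming one error model per error type category has already been built. Each error model is represented here as a plain function returning an error likelihood between 0 and 1. The category name, the toy model, and the sample texts are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

ErrorModel = Callable[[str], float]  # maps a sample to an error likelihood


def select_for_annotation(samples: Sequence[str],
                          error_models: Dict[str, ErrorModel],
                          category: str,
                          threshold: float) -> List[Tuple[str, float]]:
    """Return samples whose likelihood for `category` meets or exceeds the threshold.

    Results are sorted from most to least likely to fail, so a subject matter
    expert sees the most error-prone samples first.
    """
    model = error_models[category]
    scored = [(sample, model(sample)) for sample in samples]
    flagged = [pair for pair in scored if pair[1] >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


def toy_negation_model(text: str) -> float:
    """Hypothetical error model: more negation words, higher error likelihood."""
    negations = {"not", "never", "no"}
    hits = sum(word in negations for word in text.lower().split())
    return min(1.0, hits / 2)


unlabeled = ["I do not like it", "never say no", "great product"]
models = {"negation": toy_negation_model}

to_annotate = select_for_annotation(unlabeled, models, "negation", threshold=0.5)
```

    Here `to_annotate` holds `("never say no", 1.0)` followed by `("I do not like it", 0.5)`, and `"great product"` is left out because its likelihood of 0.0 is below the threshold.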

    DATA AUGMENTATION FOR TEXT-BASED AI APPLICATIONS

    Publication number: US20200372404A1

    Publication date: 2020-11-26

    Application number: US16510951

    Filing date: 2019-07-14

    Abstract: A cognitive system (artificial intelligence) is optimized by assessing different data augmentation methods used to augment training data, and then training the system using a training set augmented by the best identified method. The augmentation methods are assessed by applying them to the same set of training data to generate different augmented training data sets. Respective instances of the cognitive system are trained with the augmented sets, and each instance is subjected to validation testing to assess its goodness. The validation testing can include multiple validation tests leading to component scores, and a combined validation score is computed as a weighted average of the component scores using respective weights for each validation test. The augmentation method corresponding to the instance having the highest combined validation score is selected as the optimum augmentation method for the particular cognitive system at hand.

    Machine learning model error detection

    Publication number: US11720819B2

    Publication date: 2023-08-08

    Application number: US16888356

    Filing date: 2020-05-29

    CPC classification number: G06N20/00 G06N5/02

    Abstract: A system includes a memory having instructions therein and at least one processor in communication with the memory. The at least one processor is configured to execute the instructions to determine a global-level importance magnitude value for a global-level importance of an explainable feature of a machine learning base model based on a first prediction of the machine learning base model. The at least one processor is also configured to execute the instructions to determine a global-level importance direction label for the global-level importance of the explainable feature based on the first prediction. The at least one processor is also configured to execute the instructions to generate a communication for presentation to a user based on a second prediction of the machine learning base model, based on the global-level importance magnitude value, and based on the global-level importance direction label.
