-
Publication number: US12153885B2
Publication date: 2024-11-26
Application number: US17580535
Application date: 2022-01-20
Applicant: Oracle International Corporation
Inventor: Thanh Long Duong , Vishal Vishnoi , Mark Edward Johnson , Elias Luqman Jalaluddin , Tuyen Quang Pham , Cong Duy Vu Hoang , Poorya Zaremoodi , Srinivasa Phani Kumar Gadde , Aashna Devang Kanuga , Zikai Li , Yuanxu Wu
IPC: G06F40/289 , G06F40/166 , G06F40/205 , G06F40/263 , G06F40/279 , G06F40/295 , G06N3/08 , H04L51/02
Abstract: Techniques are disclosed for multi-feature balancing for natural language processors. In an embodiment, a method includes receiving a natural language query to be processed by a machine learning model, the machine learning model utilizing a dataset of natural language phrases for processing natural language queries; determining, based on the machine learning model and the natural language query, a feature dropout value; generating, based on the natural language query, one or more contextual features and one or more expressional features that may be input to the machine learning model; modifying at least one of the one or more contextual features and the one or more expressional features based on the feature dropout value to generate a set of input features for the machine learning model; and processing the set of input features to generate an output dataset corresponding to the natural language query.
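The modification step in this abstract can be illustrated with a minimal sketch. The abstract does not say which feature group is altered or how, so the choices below are assumptions: the function name `apply_feature_dropout` is hypothetical, features are plain lists of floats, and the dropout value is treated as the probability of zeroing the expressional features.

```python
import random

def apply_feature_dropout(contextual, expressional, dropout, rng):
    """Build the model input from two feature groups.

    Assumption: with probability `dropout`, the expressional features are
    zeroed out so the model has to rely on the contextual features alone.
    """
    if rng.random() < dropout:
        expressional = [0.0] * len(expressional)
    # The input feature set is the concatenation of both groups.
    return contextual + expressional

rng = random.Random(0)
contextual = [0.3, 0.7]
expressional = [1.0, 0.5, 0.2]

always_dropped = apply_feature_dropout(contextual, expressional, 1.0, rng)
never_dropped = apply_feature_dropout(contextual, expressional, 0.0, rng)
```

Keeping the vector length fixed (zeroing rather than removing entries) is a design choice of this sketch, so that the downstream model always sees the same input shape.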
-
Publication number: US20240062011A1
Publication date: 2024-02-22
Application number: US18351680
Application date: 2023-07-13
Applicant: Oracle International Corporation
Inventor: Aashna Devang Kanuga , Cong Duy Vu Hoang , Mark Edward Johnson , Vasisht Raghavendra , Yuanxu Wu , Steve Wai-Chun Siu , Nitika Mathur , Gioacchino Tangari , Shubham Pawankumar Shah , Vanshika Sridharan , Zikai Li , Diego Andres Cornejo Barra , Stephen Andrew McRitchie , Christopher Mark Broadbent , Vishal Vishnoi , Srinivasa Phani Kumar Gadde , Poorya Zaremoodi , Thanh Long Duong , Bhagya Gayathri Hettige , Tuyen Quang Pham , Arash Shamaei , Thanh Tien Vu , Yakupitiyage Don Thanuja Samodhve Dharmasiri
IPC: G06F40/295 , G06F40/284 , G06F40/211 , G06F40/35
CPC classification number: G06F40/295 , G06F40/284 , G06F40/211 , G06F40/35
Abstract: Techniques are disclosed herein for using named entity recognition to resolve entity expression while transforming natural language to a meaning representation language. In one aspect, a method includes accessing natural language text, predicting, by a first machine learning model, a class label for a token in the natural language text, predicting, by a second machine-learning model, operators for a meaning representation language and a value or value span for each attribute of the operators, in response to determining that the value or value span for a particular attribute matches the class label, converting a portion of the natural language text for the value or value span into a resolved format, and outputting syntax for the meaning representation language. The syntax comprises the operators with the portion of the natural language text for the value or value span in the resolved format.
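A rough sketch of the resolution step described in this abstract follows. Everything concrete here is an assumption made for illustration: the `CURRENCY` class label, the `RESOLVERS` table, the tuple layout for operators and attributes, and the exact-text match between a value span and a labeled span.

```python
# Hypothetical resolvers: one normalization function per entity class label.
RESOLVERS = {
    "CURRENCY": lambda text: "{:.2f}".format(
        float(text.replace("$", "").replace(",", ""))
    ),
}

def resolve_values(ner_labels, attributes):
    """Convert value spans into a resolved format when they match a label.

    ner_labels: maps span text to the class label from the first model.
    attributes: (operator, attribute, value_span) triples from the second model.
    """
    resolved = []
    for operator, attribute, span in attributes:
        label = ner_labels.get(span)
        if label in RESOLVERS:
            span = RESOLVERS[label](span)
        resolved.append((operator, attribute, span))
    return resolved

ner_labels = {"$1,200": "CURRENCY"}
attributes = [("filter", "amount", "$1,200"), ("select", "column", "name")]
output = resolve_values(ner_labels, attributes)
```

Spans without a matching class label pass through unchanged, which mirrors the conditional wording of the abstract.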
-
23.
Publication number: US20230325599A1
Publication date: 2023-10-12
Application number: US18185675
Application date: 2023-03-17
Applicant: Oracle International Corporation
Inventor: Omid Mohamad Nezami , Shivashankar Subramanian , Thanh Tien Vu , Tuyen Quang Pham , Budhaditya Saha , Aashna Devang Kanuga , Shubham Pawankumar Shah
IPC: G06F40/295 , G06N3/006
CPC classification number: G06F40/295 , G06N3/006
Abstract: Techniques are provided for augmenting training data using gazetteers and perturbations to facilitate training named entity recognition models. The training data can be augmented by generating additional utterances from original utterances in the training data and combining the generated additional utterances with the original utterances to form the augmented training data. The additional utterances can be generated by replacing the named entities in the original utterances with different named entities and/or perturbed versions of the named entities in the original utterances selected from a gazetteer. Gazetteers of named entities can be generated from the training data and expanded by searching a knowledge base and/or perturbing the named entities therein. The named entity recognition model can be trained using the augmented training data.
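The augmentation loop in this abstract can be sketched as below. This is an illustration under stated assumptions, not the patented method: utterances are `(text, entity, label)` triples, the gazetteer is built only from the training data (no knowledge-base search), and the only perturbation is a swap of two adjacent characters. The names `build_gazetteer`, `perturb`, and `augment` are hypothetical.

```python
import random

def build_gazetteer(utterances):
    """Collect the named entities seen in the training data, grouped by label."""
    gazetteer = {}
    for _, entity, label in utterances:
        gazetteer.setdefault(label, set()).add(entity)
    return {label: sorted(names) for label, names in gazetteer.items()}

def perturb(name, rng):
    """Swap two adjacent characters to imitate a typo (one possible perturbation)."""
    if len(name) < 2:
        return name
    i = rng.randrange(len(name) - 1)
    return name[:i] + name[i + 1] + name[i] + name[i + 2:]

def augment(utterances, gazetteer, rng):
    """Return the original utterances plus generated variants."""
    augmented = list(utterances)
    for text, entity, label in utterances:
        # Variant 1: replace the entity with a different gazetteer entry.
        candidates = [e for e in gazetteer[label] if e != entity]
        if candidates:
            other = rng.choice(candidates)
            augmented.append((text.replace(entity, other), other, label))
        # Variant 2: replace the entity with a perturbed version of itself.
        noisy = perturb(entity, rng)
        augmented.append((text.replace(entity, noisy), noisy, label))
    return augmented

data = [
    ("book a flight to Paris", "Paris", "LOC"),
    ("weather in Berlin", "Berlin", "LOC"),
]
rng = random.Random(0)
augmented = augment(data, build_gazetteer(data), rng)
```

The augmented list keeps the originals first, so it can be passed directly to whatever training routine the named entity recognition model uses.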
-
24.
Publication number: US20230185834A1
Publication date: 2023-06-15
Application number: US18065434
Application date: 2022-12-13
Applicant: Oracle International Corporation
Inventor: Philip Arthur , Vishal Vishnoi , Mark Edward Johnson , Thanh Long Duong , Srinivasa Phani Kumar Gadde , Balakota Srinivas Vinnakota , Cong Duy Vu Hoang , Steve Wai-Chun Siu , Nitika Mathur , Gioacchino Tangari , Aashna Devang Kanuga
IPC: G06F16/332 , G06N20/00 , G06F40/47 , G06F40/211 , G06F40/237 , G06F40/284
CPC classification number: G06F16/3329 , G06N20/00 , G06F40/47 , G06F40/211 , G06F40/237 , G06F40/284 , G06F40/35
Abstract: Techniques are disclosed herein for generating synthetic training data to facilitate training a natural language to logical form model. In one aspect, training data can be synthesized from original training data under a framework based on templates and a synchronous context-free grammar. In one aspect, training data can be synthesized under a framework based on a probabilistic context-free grammar and a translator. In one aspect, training data can be synthesized under a framework based on tree-to-string translation. In one aspect, the synthetic training data can be combined with original training data in order to train a machine learning model to translate an utterance to a logical form.
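The first framework mentioned in this abstract pairs templates with a synchronous context-free grammar, in which each rule rewrites the utterance side and the logical-form side together. A toy sketch of that idea follows; the grammar, the SQL-like logical form, and the names `RULES` and `expand` are all invented for illustration.

```python
import itertools

# Each nonterminal maps to paired (utterance, logical form) right-hand sides.
RULES = {
    "QUERY": [("show {COL} of {TABLE}", "SELECT {COL} FROM {TABLE}")],
    "COL": [("the name", "name"), ("the salary", "salary")],
    "TABLE": [("employees", "emp"), ("departments", "dept")],
}

def expand(symbol):
    """Yield every (utterance, logical_form) pair derivable from `symbol`."""
    for utterance, logical_form in RULES[symbol]:
        slots = [s for s in RULES if "{" + s + "}" in utterance]
        choices = [list(expand(s)) for s in slots]
        # With no slots, product() yields one empty tuple, so the rule is
        # emitted unchanged.
        for combo in itertools.product(*choices):
            u, lf = utterance, logical_form
            for slot, (sub_u, sub_lf) in zip(slots, combo):
                # The same slot is filled on both sides in lockstep.
                u = u.replace("{" + slot + "}", sub_u)
                lf = lf.replace("{" + slot + "}", sub_lf)
            yield u, lf

pairs = list(expand("QUERY"))
```

Each generated pair is a synthetic training example that could be mixed with original data, as the last sentence of the abstract describes.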
-
Publication number: US20230136965A1
Publication date: 2023-05-04
Application number: US17978023
Application date: 2022-10-31
Applicant: Oracle International Corporation
Inventor: Thanh Tien Vu , Tuyen Quang Pham , Mark Edward Johnson , Thanh Long Duong , Aashna Devang Kanuga , Srinivasa Phani Kumar Gadde , Vishal Vishnoi
IPC: G06F40/40 , G06F40/295
Abstract: In some aspects, a computer obtains a trained conditional random field (CRF) model comprising a set of model parameters learned from training data and stored in a transition matrix. Pairs of tags whose transitions are inconsistent with the tag sequence logic are identified within the transition matrix. For each such pair, the cost associated with transitioning between the pair of tags is set, within the transition matrix, equal to a predefined hyperparameter value that penalizes the transition between the inconsistent pair of tags. The CRF model, with those costs set equal to the predefined hyperparameter value, receives a string of text comprising one or more named entities and classifies the words within the string of text into different classes, which might include the one or more named entities.
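The transition-matrix adjustment in this abstract can be sketched as follows. Several details are assumptions: the tags follow the BIO scheme, the tag sequence logic is "an I-X tag may only follow B-X or I-X", the matrix holds scores (so a large negative value acts as the penalty), and the helper names are hypothetical.

```python
def inconsistent_pairs(tags):
    """List (previous, next) tag pairs that violate the assumed BIO logic."""
    pairs = []
    for prev in tags:
        for nxt in tags:
            if nxt.startswith("I-"):
                entity = nxt[2:]
                if prev not in ("B-" + entity, "I-" + entity):
                    pairs.append((prev, nxt))
    return pairs

def penalize(transitions, tags, penalty):
    """Return a copy of the matrix with inconsistent transitions overwritten."""
    index = {tag: i for i, tag in enumerate(tags)}
    fixed = [row[:] for row in transitions]
    for prev, nxt in inconsistent_pairs(tags):
        fixed[index[prev]][index[nxt]] = penalty
    return fixed

TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
# Stand-in for learned transition scores; real values come from training.
learned = [[0.1 * (i + j) for j in range(len(TAGS))] for i in range(len(TAGS))]
PENALTY = -1e4  # the predefined hyperparameter value (assumed magnitude)
constrained = penalize(learned, TAGS, PENALTY)
```

Because only the matrix entries change, decoding can proceed as usual; sequences such as `O` followed by `I-PER` simply become very unlikely to be selected.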
-