Patent search ap:("Oracle International Corporation") AND inv:"Duy Vu" Page 2

11.

Invention publication
NAMED ENTITY BIAS DETECTION AND MITIGATION TECHNIQUES FOR SENTENCE SENTIMENT ANALYSIS (Status: Pending, published)

Publication (announcement) number: US20230153687A1

Publication (announcement) date: 2023-05-18

Application number: US17984717

Application date: 2022-11-10

Applicant: Oracle International Corporation

Inventor: Duy Vu , Varsha Kuppur Rajendra , Shivashankar Subramanian , Ahmed Ataallah Ataallah Abobakr , Thanh Long Duong , Mark Edward Johnson

IPC: G06N20/00 , G06K9/62

CPC classification number: G06N20/00 , G06K9/6259 , G06K9/6262

Abstract: Techniques for named entity bias detection and mitigation for sentence sentiment analysis. In one particular aspect, a method is provided that includes obtaining a training set of labeled examples for training a machine learning model to classify sentiment, preparing a list of named entities using one or more data sources, for each example in the training set of labeled examples with a named entity, replacing the named entity with a corresponding entity type tag to generate a labeled template data set, executing a sampling process for each entity type t within the labeled template data set to generate an augmented invariance data set comprising one or more invariance groups having labeled examples for each entity type t, and training the machine learning model using labeled examples from the augmented invariance data set.
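The template and sampling steps in this abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the `<TYPE>` tag format, the function names `to_templates` and `invariance_groups`, the group size `k`, and the use of plain substring matching are all illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (sentence, sentiment label)


def to_templates(examples: List[Example], entities: Dict[str, str]) -> List[Example]:
    """Replace each listed named entity with an entity-type tag, e.g. 'Paris' -> '<LOC>'.

    Plain substring replacement is a simplification (assumption); a real system
    would match entity spans rather than raw substrings.
    """
    templates = []
    for text, label in examples:
        tagged = text
        for name, etype in entities.items():
            tagged = tagged.replace(name, f"<{etype}>")
        if tagged != text:  # keep only examples that mentioned a listed entity
            templates.append((tagged, label))
    return templates


def invariance_groups(templates: List[Example], entities: Dict[str, str],
                      k: int, seed: int = 0) -> List[List[Example]]:
    """For each template and each entity type t it contains, fill the <t> tag with
    up to k sampled entities of type t. Every member of a group keeps the
    template's label, so the label should not depend on which entity appears."""
    rng = random.Random(seed)
    names_by_type: Dict[str, List[str]] = {}
    for name, etype in entities.items():
        names_by_type.setdefault(etype, []).append(name)

    groups = []
    for template, label in templates:
        for etype, names in names_by_type.items():
            tag = f"<{etype}>"
            if tag not in template:
                continue
            picks = rng.sample(names, min(k, len(names)))
            groups.append([(template.replace(tag, n), label) for n in picks])
    return groups
```

Each inner list returned by `invariance_groups` is one invariance group: the same sentence with different entities of one type, all carrying the original label, which a training objective could use to discourage predictions that change with the entity.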

12.

Invention application
FUSION OF WORD EMBEDDINGS AND WORD SCORES FOR TEXT CLASSIFICATION (Status: In force)

Publication (announcement) number: US20230100508A1

Publication (announcement) date: 2023-03-30

Application number: US17936679

Application date: 2022-09-29

Applicant: Oracle International Corporation

Inventor: Ahmed Ataallah Ataallah Abobakr , Mark Edward Johnson , Thanh Long Duong , Vladislav Blinov , Yu-Heng Hong , Cong Duy Vu Hoang , Duy Vu

IPC: G06F40/295 , G06F40/205 , G06F40/263

Abstract: Techniques disclosed herein relate generally to text classification and include techniques for fusing word embeddings with word scores for text classification. In one particular aspect, a method for text classification is provided that includes obtaining an embedding vector for a textual unit, based on a plurality of word embedding vectors and a plurality of word scores. The plurality of word embedding vectors includes a corresponding word embedding vector for each of a plurality of words of the textual unit, and the plurality of word scores includes a corresponding word score for each of the plurality of words of the textual unit. The method also includes passing the embedding vector for the textual unit through at least one feed-forward layer to obtain a final layer output, and performing a classification on the final layer output.
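The abstract does not say how embeddings and scores are fused. As one hypothetical reading, the sketch below weights each word embedding by a softmax of its word score, averages them into the textual-unit embedding, passes that through a single feed-forward (ReLU) layer, and picks the highest-scoring class. The names `fuse`, `feed_forward`, and `classify`, and the softmax weighting itself, are assumptions.

```python
import math
from typing import List

Vector = List[float]


def fuse(word_embeddings: List[Vector], word_scores: List[float]) -> Vector:
    """Textual-unit embedding = score-weighted average of its word embeddings.

    Scores become weights via a softmax (shifted by the max score for numerical
    stability); this weighting scheme is an assumption, not the patent's.
    """
    top = max(word_scores)
    weights = [math.exp(s - top) for s in word_scores]
    total = sum(weights)
    dim = len(word_embeddings[0])
    return [sum(w * emb[i] for w, emb in zip(weights, word_embeddings)) / total
            for i in range(dim)]


def feed_forward(x: Vector, weight_rows: List[Vector], bias: Vector) -> Vector:
    """One dense layer with ReLU: out[j] = max(0, sum_i W[j][i] * x[i] + b[j])."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weight_rows, bias)]


def classify(final_output: Vector) -> int:
    """Predicted class index = position of the largest final-layer output."""
    return max(range(len(final_output)), key=lambda j: final_output[j])
```

With equal scores the fusion reduces to a plain mean; a high score lets one word dominate the textual-unit embedding.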

13.

Invention application
TECHNIQUES FOR OUT-OF-DOMAIN (OOD) DETECTION (Status: In force)

Publication (announcement) number: US20210303798A1

Publication (announcement) date: 2021-09-30

Application number: US17217909

Application date: 2021-03-30

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong , Mark Edward Johnson , Vishal Vishnoi , Crystal C. Pan , Vladislav Blinov , Cong Duy Vu Hoang , Elias Luqman Jalaluddin , Duy Vu , Balakota Srinivas Vinnakota

IPC: G06F40/30 , G06F40/289 , H04L12/58 , G06N20/00

Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances. One particular technique includes receiving an utterance and a target domain of a chatbot, generating a sentence embedding for the utterance, obtaining an embedding representation for each cluster of in-domain utterances associated with the target domain, predicting, using a metric learning model, a first probability that the utterance belongs to the target domain based on a similarity or difference between the sentence embedding and each embedding representation for each cluster, predicting, using an outlier detection model, a second probability that the utterance belongs to the target domain based on a determined distance or density deviation between the sentence embedding and embedding representations for neighboring clusters, evaluating the first probability and the second probability to determine a final probability, and classifying the utterance as in-domain or out-of-domain for the chatbot based on the final probability.
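A toy Python sketch of the two-score decision, under loudly stated assumptions: cosine similarity to cluster representations stands in for the trained metric-learning model, an exponential decay in distance to the nearest cluster stands in for the outlier-detection model, and the two probabilities are combined by a weighted average with a 0.5 threshold. None of these choices, nor the names `metric_probability`, `outlier_probability`, and `classify_utterance`, come from the patent.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def metric_probability(emb: Vector, cluster_embs: List[Vector]) -> float:
    """Stand-in for the metric-learning model: best cosine similarity to any
    in-domain cluster, rescaled from [-1, 1] to [0, 1]."""
    return (max(cosine(emb, c) for c in cluster_embs) + 1.0) / 2.0


def outlier_probability(emb: Vector, cluster_embs: List[Vector], radius: float) -> float:
    """Stand-in for the outlier detector: decays with Euclidean distance to the
    nearest cluster representation (1.0 when the utterance sits on a cluster)."""
    nearest = min(math.dist(emb, c) for c in cluster_embs)
    return math.exp(-nearest / radius)


def classify_utterance(emb: Vector, cluster_embs: List[Vector], radius: float = 1.0,
                       alpha: float = 0.5, threshold: float = 0.5) -> Tuple[str, float]:
    """Combine both probabilities with a weighted average, then threshold."""
    p = (alpha * metric_probability(emb, cluster_embs)
         + (1.0 - alpha) * outlier_probability(emb, cluster_embs, radius))
    return ("in-domain" if p >= threshold else "out-of-domain"), p
```

Using two signals means an utterance must both point in an in-domain direction and lie near an in-domain cluster to score highly.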

14.

Invention application
SYSTEM AND METHOD FOR IMPROVING AN END-TO-END AUTOMATIC SPEECH RECOGNITION MODEL (Status: In force)

Publication (announcement) number: US20250095636A1

Publication (announcement) date: 2025-03-20

Application number: US18823371

Application date: 2024-09-03

Applicant: Oracle International Corporation

Inventor: Duy Vu , Yu-Heng Hong , Ying Xu , Philip Arthur

IPC: G10L15/06 , G10L13/08 , G10L15/26

Abstract: Techniques are disclosed herein for improving the performance of an end-to-end (E2E) Automatic Speech Recognition (ASR) model in a target domain. A set of test examples is generated. The set of test examples comprises multiple subsets of test examples, and each subset of test examples corresponds to a particular test category. A machine language model is then used to convert audio samples of the subset of test examples to text transcripts. A word error rate is determined for the subset of test examples. A test category is then selected based on the word error rates, and a set of training examples for training the ASR model in a particular target domain is generated from a selected subset of test examples. The training examples are used to fine-tune the model in the target domain. The trained model is then deployed in a cloud infrastructure of a cloud service provider.
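The word-error-rate step is standard and can be sketched concretely: WER is the word-level edit distance between reference and transcript divided by the number of reference words. The category-selection rule here (pick the highest-WER category) and the names `wer_by_category` and `pick_category` are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # delete r
                         cur[j - 1] + 1,           # insert h
                         prev[j - 1] + (r != h))   # substitute (or match)
        prev = cur
    return prev[-1]


def wer_by_category(pairs: Dict[str, List[Tuple[str, str]]]) -> Dict[str, float]:
    """Corpus-level WER for each test category from (reference, transcript) pairs:
    total word errors divided by total reference words."""
    rates = {}
    for category, items in pairs.items():
        errors = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in items)
        words = sum(len(ref.split()) for ref, _ in items)
        rates[category] = errors / max(words, 1)
    return rates


def pick_category(rates: Dict[str, float]) -> str:
    """Choose the weakest category (highest WER) as the source of fine-tuning data."""
    return max(rates, key=rates.get)
```

For example, a category whose reference "call five five five" is transcribed as "call five nine" has WER 2/4 = 0.5 (one substitution, one deletion).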

15.

Invention publication
MULTI-TASK MODEL WITH CONTEXT MASKING (Status: Pending, published)

Publication (announcement) number: US20240143934A1

Publication (announcement) date: 2024-05-02

Application number: US18485700

Application date: 2023-10-12

Applicant: Oracle International Corporation

Inventor: Poorya Zaremoodi , Duy Vu , Nagaraj N. Bhat , Srijon Sarkar , Varsha Kuppur Rajendra , Thanh Long Duong , Mark Edward Johnson , Pramir Sarkar , Shahid Reza

IPC: G06F40/30 , G06F40/284 , G06F40/289

CPC classification number: G06F40/30 , G06F40/284 , G06F40/289

Abstract: A method includes accessing a document including sentences, the document being associated with a configuration flag indicating whether aspect-based sentiment analysis (ABSA), sentence-level sentiment analysis (SLSA), or both are to be performed; inputting the document into a language model that generates chunks of token embeddings for the document; and, based on the configuration flag, performing at least one of the ABSA and the SLSA by inputting the chunks of token embeddings into a multi-task model. When the SLSA is performed, a part of the token embeddings in each of the chunks is masked, namely the token embeddings that do not belong to the particular sentence on which the SLSA is performed.
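A minimal sketch of the flag dispatch and the SLSA context mask, assuming each token in a chunk carries the index of the sentence it came from. Zeroing masked embeddings and mean-pooling the rest are simplifications chosen for illustration (a transformer would more typically mask attention scores); the flag values `ABSA`/`SLSA`/`BOTH` and all function names are hypothetical.

```python
from typing import List

Vector = List[float]


def tasks_for(flag: str) -> List[str]:
    """Map the document's configuration flag to the task heads to run (hypothetical values)."""
    return {"ABSA": ["absa"], "SLSA": ["slsa"], "BOTH": ["absa", "slsa"]}[flag]


def slsa_keep_mask(sentence_ids: List[int], target_sentence: int) -> List[bool]:
    """True for tokens of the target sentence; every other token in the chunk
    is context that gets masked for SLSA."""
    return [sid == target_sentence for sid in sentence_ids]


def apply_mask(chunk: List[Vector], keep: List[bool]) -> List[Vector]:
    """Zero out masked token embeddings (a simple stand-in for attention masking)."""
    return [emb if k else [0.0] * len(emb) for emb, k in zip(chunk, keep)]


def sentence_vector(chunk: List[Vector], keep: List[bool]) -> Vector:
    """Mean of the unmasked token embeddings: one possible input to an SLSA head."""
    kept = [emb for emb, k in zip(chunk, keep) if k]
    return [sum(col) / len(kept) for col in zip(*kept)]
```

The ABSA head would receive the unmasked chunk, so one shared encoding pass can serve both tasks while SLSA sees only the sentence it is scoring.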

16.

Invention publication
CONTEXT TAG INTEGRATION WITH NAMED ENTITY RECOGNITION MODELS (Status: Pending, published)

Publication (announcement) number: US20240095454A1

Publication (announcement) date: 2024-03-21

Application number: US18521805

Application date: 2023-11-28

Applicant: Oracle International Corporation

Inventor: Duy Vu , Tuyen Quang Pham , Cong Duy Vu Hoang , Srinivasa Phani Kumar Gadde , Thanh Long Duong , Mark Edward Johnson , Vishal Vishnoi

IPC: G06F40/295 , G06F40/205 , G06F40/279 , G06F40/35 , G06F40/40 , G06V30/19

CPC classification number: G06F40/295 , G06F40/205 , G06F40/279 , G06F40/35 , G06F40/40 , G06V30/19147

Abstract: Techniques are provided for using context tags in named-entity recognition (NER) models. In one particular aspect, a method is provided that includes receiving an utterance, generating embeddings for words of the utterance, generating a regular expression and gazetteer feature vector for the utterance, generating a context tag distribution feature vector for the utterance, concatenating or interpolating the embeddings with the regular expression and gazetteer feature vector and the context tag distribution feature vector to generate a set of feature vectors, generating an encoded form of the utterance based on the set of feature vectors, generating log-probabilities based on the encoded form of the utterance, and identifying one or more constraints for the utterance.
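The feature-construction step can be sketched as per-word concatenation. In this hypothetical Python example, the regex/gazetteer vector holds one indicator per pattern and one per entity type, and a single utterance-level context tag distribution is appended to every word. The encoder that turns these vectors into log-probabilities is omitted, and all names are illustrative assumptions.

```python
import re
from typing import Dict, List

Vector = List[float]


def regex_gazetteer_features(word: str, patterns: List[re.Pattern],
                             gazetteer: Dict[str, str], entity_types: List[str]) -> Vector:
    """One indicator per regex (full match) plus one per entity type (gazetteer hit)."""
    regex_bits = [1.0 if p.fullmatch(word) else 0.0 for p in patterns]
    gaz_bits = [1.0 if gazetteer.get(word.lower()) == t else 0.0 for t in entity_types]
    return regex_bits + gaz_bits


def word_feature_vectors(words: List[str], embeddings: List[Vector],
                         patterns: List[re.Pattern], gazetteer: Dict[str, str],
                         entity_types: List[str], context_tag_dist: Vector) -> List[Vector]:
    """Concatenate, per word: embedding + regex/gazetteer features + the
    utterance-level context tag distribution (shared by every word)."""
    return [emb + regex_gazetteer_features(w, patterns, gazetteer, entity_types) + context_tag_dist
            for w, emb in zip(words, embeddings)]
```

The abstract also allows interpolating rather than concatenating; concatenation is shown because it keeps every feature visible to the encoder.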

17.

Invention publication
TECHNIQUES FOR OUT-OF-DOMAIN (OOD) DETECTION (Status: Pending, published)

Publication (announcement) number: US20230376696A1

Publication (announcement) date: 2023-11-23

Application number: US18364298

Application date: 2023-08-02

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong , Mark Edward Johnson , Vishal Vishnoi , Crystal C. Pan , Vladislav Blinov , Cong Duy Vu Hoang , Elias Luqman Jalaluddin , Duy Vu , Balakota Srinivas Vinnakota

IPC: G06F40/30 , G06N20/00 , G06F40/289 , H04L51/02

CPC classification number: G06F40/30 , G06N20/00 , G06F40/289 , H04L51/02 , G06F40/205

Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances. One particular technique includes receiving an utterance and a target domain of a chatbot, generating a sentence embedding for the utterance, obtaining an embedding representation for each cluster of in-domain utterances associated with the target domain, predicting, using a metric learning model, a first probability that the utterance belongs to the target domain based on a similarity or difference between the sentence embedding and each embedding representation for each cluster, predicting, using an outlier detection model, a second probability that the utterance belongs to the target domain based on a determined distance or density deviation between the sentence embedding and embedding representations for neighboring clusters, evaluating the first probability and the second probability to determine a final probability, and classifying the utterance as in-domain or out-of-domain for the chatbot based on the final probability.

18.

Invention publication
WIDE AND DEEP NETWORK FOR LANGUAGE DETECTION USING HASH EMBEDDINGS (Status: Pending, published)

Publication (announcement) number: US20230141853A1

Publication (announcement) date: 2023-05-11

Application number: US18052694

Application date: 2022-11-04

Applicant: Oracle International Corporation

Inventor: Thanh Tien Vu , Poorya Zaremoodi , Duy Vu , Mark Edward Johnson , Thanh Long Duong , Xu Zhong , Vladislav Blinov , Cong Duy Vu Hoang , Yu-Heng Hong , Vinamr Goel , Philip Victor Ogren , Srinivasa Phani Kumar Gadde , Vishal Vishnoi

IPC: G06F40/263 , G06F16/31

CPC classification number: G06F40/263 , G06F16/325 , H04L51/02

Abstract: Techniques disclosed herein relate generally to language detection. In one particular aspect, a method is provided that includes obtaining a sequence of n-grams of a textual unit; using an embedding layer to obtain an ordered plurality of embedding vectors for the sequence of n-grams; using a deep network to obtain an encoded vector that is based on the ordered plurality of embedding vectors; and using a classifier to obtain a language prediction for the textual unit that is based on the encoded vector. The deep network includes an attention mechanism, and using the embedding layer to obtain the ordered plurality of embedding vectors comprises, for each n-gram in the sequence of n-grams: obtaining hash values for the n-gram; based on the hash values, selecting component vectors from among the plurality of component vectors; and obtaining an embedding vector for the n-gram that is based on the component vectors.
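The hash-embedding lookup can be sketched in plain Python. Assumptions: character trigrams with `<`/`>` boundary markers as the n-grams, BLAKE2b salted with the hash index as the hash family, and a plain sum of the selected component vectors (hash-embedding variants also use learned importance weights). The attention-based deep network and the classifier are not shown, and the function names are hypothetical.

```python
import hashlib
import random
from typing import List

Vector = List[float]


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Character n-grams of the text, padded with '<' and '>' boundary markers."""
    padded = f"<{text}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def hash_values(ngram: str, num_hashes: int, num_buckets: int) -> List[int]:
    """num_hashes bucket indices for one n-gram. Each uses BLAKE2b over the n-gram
    salted with the hash index, so results are stable across runs (unlike hash())."""
    values = []
    for i in range(num_hashes):
        digest = hashlib.blake2b(f"{i}:{ngram}".encode("utf-8"), digest_size=8).digest()
        values.append(int.from_bytes(digest, "big") % num_buckets)
    return values


def component_table(num_buckets: int, dim: int, seed: int = 0) -> List[Vector]:
    """Shared pool of component vectors (randomly initialised here; trained in practice)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_buckets)]


def ngram_embedding(ngram: str, table: List[Vector], num_hashes: int = 2) -> Vector:
    """Embedding of an n-gram = sum of the component vectors its hash values select."""
    picked = [table[j] for j in hash_values(ngram, num_hashes, len(table))]
    return [sum(column) for column in zip(*picked)]


def embed_text(text: str, table: List[Vector], n: int = 3) -> List[Vector]:
    """Ordered embedding vectors for the text's n-gram sequence (input to the deep network)."""
    return [ngram_embedding(g, table) for g in char_ngrams(text, n)]
```

Because an n-gram's vector is assembled from several shared components, the table stays small while two n-grams rarely collide on every hash at once.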

19.

Invention application
CONTEXT TAG INTEGRATION WITH NAMED ENTITY RECOGNITION MODELS (Status: In force)

Publication (announcement) number: US20220229993A1

Publication (announcement) date: 2022-07-21

Application number: US17648376

Application date: 2022-01-19

Applicant: Oracle International Corporation

Inventor: Duy Vu , Tuyen Quang Pham , Cong Duy Vu Hoang , Srinivasa Phani Kumar Gadde , Thanh Long Duong , Mark Edward Johnson , Vishal Vishnoi

IPC: G06F40/295 , G06F40/205 , G06F40/35 , G06F40/40 , G06V30/19

Abstract: Techniques are provided for using context tags in named-entity recognition (NER) models. In one particular aspect, a method is provided that includes receiving an utterance, generating embeddings for words of the utterance, generating a regular expression and gazetteer feature vector for the utterance, generating a context tag distribution feature vector for the utterance, concatenating or interpolating the embeddings with the regular expression and gazetteer feature vector and the context tag distribution feature vector to generate a set of feature vectors, generating an encoded form of the utterance based on the set of feature vectors, generating log-probabilities based on the encoded form of the utterance, and identifying one or more constraints for the utterance.
