-
公开(公告)号:US20230186914A1
公开(公告)日:2023-06-15
申请号:US18092170
申请日:2022-12-30
Applicant: Oracle International Corporation
Inventor: Thanh Long Duong , Mark Edward Johnson , Vu Cong Duy Hoang , Tuyen Quang Pham , Yu-Heng Hong , Vladislavs Dovgalecs , Guy Bashkansky , Jason Eric Black , Andrew David Bleeker , Serge Le Huitouze
IPC: G10L15/22
CPC classification number: G10L15/22
Abstract: Described herein are dialog systems, and techniques for providing such dialog systems, that are suitable for use on standalone computing devices. In some embodiments, a dialog system includes a dialog manager, which takes as input an input logical form, which may be a representation of user input. The dialog manager may include a dialog state tracker, an execution subsystem, a dialog policy subsystem, and a context stack. The dialog state tracker may generate an intermediate logical form from the input logical form combined with a context from the context stack. The context stack may maintain a history of a current dialog, and thus, the intermediate logical form may include contextual information potentially missing from the input logical form. The execution subsystem may execute the intermediate logical form to produce an execution result, and the dialog policy subsystem may generate an output logical form based on the execution result.
-
公开(公告)号:US20230169955A1
公开(公告)日:2023-06-01
申请号:US17993130
申请日:2022-11-23
Applicant: Oracle International Corporation
Inventor: Elias Luqman Jalaluddin , Vishal Vishnoi , Mark Edward Johnson , Thanh Long Duong , Yu-Heng Hong , Balakota Srinivas Vinnakota
CPC classification number: G10L15/063 , G10L15/05 , G10L15/18 , G10L15/22 , G10L15/26 , G10L2015/0633 , G10L2015/0638 , G10L2015/227
Abstract: Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof irrelevant of original text within the utterances of the training set of utterances, and incorporating the noise text within the utterances relative to the original text in the utterances of the training set of utterances at a predefined augmentation ratio to generate augmented utterances.
-
公开(公告)号:US20230100508A1
公开(公告)日:2023-03-30
申请号:US17936679
申请日:2022-09-29
Applicant: Oracle International Corporation
Inventor: Ahmed Ataallah Ataallah Abobakr , Mark Edward Johnson , Thanh Long Duong , Vladislav Blinov , Yu-Heng Hong , Cong Duy Vu Hoang , Duy Vu
IPC: G06F40/295 , G06F40/205 , G06F40/263
Abstract: Techniques disclosed herein relate generally to text classification and include techniques for fusing word embeddings with word scores for text classification. In one particular aspect, a method for text classification is provided that includes obtaining an embedding vector for a textual unit, based on a plurality of word embedding vectors and a plurality of word scores. The plurality of word embedding vectors includes a corresponding word embedding vector for each of a plurality of words of the textual unit, and the plurality of word scores includes a corresponding word score for each of the plurality of words of the textual unit. The method also includes passing the embedding vector for the textual unit through at least one feed-forward layer to obtain a final layer output, and performing a classification on the final layer output.
-
公开(公告)号:US11538457B2
公开(公告)日:2022-12-27
申请号:US17016117
申请日:2020-09-09
Applicant: Oracle International Corporation
Inventor: Elias Luqman Jalaluddin , Vishal Vishnoi , Mark Edward Johnson , Thanh Long Duong , Yu-Heng Hong , Balakota Srinivas Vinnakota
Abstract: Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof irrelevant of original text within the utterances of the training set of utterances, and incorporating the noise text within the utterances relative to the original text in the utterances of the training set of utterances at a predefined augmentation ratio to generate augmented utterances.
-
公开(公告)号:US20220171947A1
公开(公告)日:2022-06-02
申请号:US17456916
申请日:2021-11-30
Applicant: Oracle International Corporation
Inventor: Ying Xu , Poorya Zaremoodi , Thanh Tien Vu , Cong Duy Vu Hoang , Vladislav Blinov , Yu-Heng Hong , Yakupitiyage Don Thanuja Samodhye Dharmasiri , Vishal Vishnoi , Elias Luqman Jalaluddin , Manish Parekh , Thanh Long Duong , Mark Edward Johnson
Abstract: Techniques for using logit values for classifying utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system. The chatbot system can input the utterance into a machine-learning model including a set of binary classifiers. Each binary classifier of the set of binary classifiers can be associated with a modified logit function. The method can also include the machine-learning model using the modified logit function to generate a set of distance-based logit values for the utterance. The method can also include the machine-learning model applying an enhanced activation function to the set of distance-based logit values to generate a predicted output. The method can also include the chatbot system classifying, based on the predicted output, the utterance as being associated with the particular class.
-
公开(公告)号:US20210304075A1
公开(公告)日:2021-09-30
申请号:US17217623
申请日:2021-03-30
Applicant: Oracle International Corporation
Inventor: Thanh Long Duong , Mark Edward Johnson , Vishal Vishnoi , Balakota Srinivas Vinnakota , Yu-Heng Hong , Elias Luqman Jalaluddin
Abstract: The present disclosure relates to chatbot systems, and more particularly, to batching techniques for handling unbalanced training data when training a model such that bias is removed from the trained machine learning model when performing inference. In an embodiment, a plurality of raw utterances is obtained. A bias eliminating distribution is determined and a subset of the plurality of raw utterances is batched according to the bias-reducing distribution. The resulting unbiased training data may be input into a prediction model for training the prediction model. The trained prediction model may be obtained and utilized to predict unbiased results from new inputs received by the trained prediction model.
-
公开(公告)号:US20210065709A1
公开(公告)日:2021-03-04
申请号:US17005847
申请日:2020-08-28
Applicant: Oracle International Corporation
Inventor: Thanh Long Duong , Mark Edward Johnson , Vu Cong Duy Hoang , Tuyen Quang Pham , Yu-Heng Hong , Vladislavs Dovgalecs , Guy Bashkansky , Jason Black , Andrew David Bleeker , Serge Le Huitouze
IPC: G10L15/22
Abstract: Described herein are dialog systems, and techniques for providing such dialog systems, that are suitable for use on standalone computing devices. In some embodiments, a dialog system includes a dialog manager, which takes as input an input logical form, which may be a representation of user input. The dialog, manager may include a dialog state tracker, an execution subsystem, a dialog policy subsystem, and a context stack. The dialog state tracker may generate an intermediate logical form from the input logical form combined with a context from the context stack. The context stack may maintain a history of a current dialog, and thus, the intermediate logical form may include contextual information potentially missing from the input logical form. The execution subsystem may execute the intermediate logical form to produce an execution result, and the dialog policy subsystem may generate an output logical form based on the execution result.
-
-
-
-
-
-