Patent search ap:("Oracle International Corporation") AND inv:"Xu Zhong" Page 1

1.

Invention application
EXTRACTING KEY INFORMATION FROM DOCUMENT USING TRAINED MACHINE-LEARNING MODELS (Valid)

Publication (announcement) No.: US20250157209A1

Publication (announcement) date: 2025-05-15

Application No.: US19002208

Application date: 2024-12-26

Applicant: Oracle International Corporation

Inventors: Yakupitiyage Don Thanuja Samodhye Dharmasiri, Xu Zhong, Ahmed Ataallah Ataallah Abobakr, Hongtao Yang, Budhaditya Saha, Shaoke Xu, Shashi Prasad Suravarapu, Mark Edward Johnson, Thanh Long Duong

IPC: G06V10/82, G06V30/148, G06V30/412

Abstract: Techniques for extracting key information from a document using machine-learning models in a chatbot system are disclosed herein. In one particular aspect, a method is provided that includes receiving a set of data, which includes key fields, within a document at a data processing system that includes a table detection module, a key information extraction module, and a table extraction module. Text information and corresponding location data are extracted via optical character recognition. The table detection module detects whether one or more tables are present in the document and, if applicable, a location of each of the tables. The key information extraction module extracts text from the key fields. The table extraction module extracts each of the tables based on input from the optical character recognition and the table detection module. Extraction results, including the text from the key fields and each of the tables, can be output.
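Illustrative sketch (not from the patent): the data flow described above, with OCR output feeding both key-field extraction and table extraction, can be outlined in Python. The trained models are replaced here by simple geometric heuristics, and the names `Word`, `run_pipeline`, the "value sits just right of its label" rule, and the row-grouping rule are all hypothetical. OCR output is assumed to be a list of words with bounding boxes in reading order, and the table detection module's output is assumed to be a list of table regions.

```python
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1) in page units


@dataclass
class Word:
    text: str
    box: Box


@dataclass
class ExtractionResult:
    key_fields: dict[str, str] = field(default_factory=dict)
    tables: list[list[list[str]]] = field(default_factory=list)


def inside(box: Box, region: Box) -> bool:
    """True if `box` lies entirely within `region`."""
    return (box[0] >= region[0] and box[1] >= region[1]
            and box[2] <= region[2] and box[3] <= region[3])


def extract_key_fields(words: list[Word], keys: set[str]) -> dict[str, str]:
    """Hypothetical stand-in for the key information extraction model:
    takes the word immediately after a key label if it is on the same line."""
    fields = {}
    for i, w in enumerate(words):
        if w.text in keys and i + 1 < len(words):
            nxt = words[i + 1]
            if abs(nxt.box[1] - w.box[1]) < 5:  # same line (assumed tolerance)
                fields[w.text] = nxt.text
    return fields


def extract_table(words: list[Word], region: Box) -> list[list[str]]:
    """Hypothetical stand-in for the table extraction model: groups the words
    inside `region` into rows by top coordinate, then orders each row by x."""
    rows: dict[int, list[Word]] = {}
    for w in words:
        if inside(w.box, region):
            rows.setdefault(round(w.box[1]), []).append(w)
    return [[w.text for w in sorted(row, key=lambda w: w.box[0])]
            for _, row in sorted(rows.items())]


def run_pipeline(words: list[Word], table_regions: list[Box],
                 keys: set[str]) -> ExtractionResult:
    # Words outside every detected table feed key-field extraction.
    outside = [w for w in words
               if not any(inside(w.box, r) for r in table_regions)]
    result = ExtractionResult(key_fields=extract_key_fields(outside, keys))
    # Each detected table is extracted from OCR words plus its detected region.
    result.tables = [extract_table(words, r) for r in table_regions]
    return result
```

Splitting words by table region first mirrors the abstract's division of labor: the table extraction step consumes both OCR output and table detection output, while key-field extraction works on the remaining text.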

2.

Invention application
TRAINING DATA COLLECTION AND EVALUATION FOR FINE-TUNING A MACHINE-LEARNING MODEL FOR AUTOMATIC SOAP NOTE GENERATION (Valid)

Publication (announcement) No.: US20250118398A1

Publication (announcement) date: 2025-04-10

Application No.: US18884459

Application date: 2024-09-13

Applicant: Oracle International Corporation

Inventors: Shubham Pawankumar Shah, Syed Najam Abbas Zaidi, Xu Zhong, Poorya Zaremoodi, Srinivasa Phani Kumar Gadde, Arash Shamaei, Ganesh Kumar, Thanh Tien Vu, Nitika Mathur, Chang Xu, Shiquan Yang, Sagar Kalyan Gollamudi

IPC: G16H10/60, G06N20/00

Abstract: Techniques are disclosed for automatically generating Subjective, Objective, Assessment and Plan (SOAP) notes. Particularly, techniques are disclosed for training data collection and evaluation for automatic SOAP note generation. Training data is accessed, and an evaluation process is performed on the training data to produce evaluated training data. A fine-tuned machine-learning model is generated using the evaluated training data. The fine-tuned machine-learning model can be used to perform a task associated with generating a SOAP note.

3.

Invention grant
Generating tagged content from text of an electronic document (Valid)

Publication (announcement) No.: US12056434B2

Publication (announcement) date: 2024-08-06

Application No.: US18150924

Application date: 2023-01-06

Applicant: Oracle International Corporation

Inventors: Vishank Bhatia, Xu Zhong, Thanh Long Duong, Mark Johnson, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, King-Hwa Lee, Christopher Kennewick

IPC: G06F40/117, G06F16/9538, G06F16/955, G06F40/134, G06F40/143, G06F40/205, G06T7/70

CPC classification number: G06F40/117, G06F16/9538, G06F16/9558, G06F40/134, G06F40/143, G06F40/205, G06T7/70, G06T2207/30176

Abstract: Techniques for generating formatting tags for textual content obtained from a source electronic document are disclosed. A system parses a digital file to obtain information about characters in an electronic document. The system applies tags to text generated based on the textual content of the electronic document by creating segments of textually-consecutive characters and applying corresponding text formatting style tags to the segments. The system further identifies segments of text overlapping bounding boxes in the electronic document. The system generates textual content including a segment of text and a corresponding hyperlink associated with the segment of text. The system further generates textual content by selectively applying line breaks from the source electronic document in the textual content.
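Illustrative sketch (not from the patent): the segmenting step, where textually consecutive characters that share a style become one tagged segment, can be shown in a few lines of Python. This assumes per-character style flags have already been parsed from the file and that the link target of any overlapping link annotation has already been resolved per character; the HTML-like `<b>`, `<i>`, and `<a href>` tags and the `Char` type are hypothetical choices. Line-break handling is omitted.

```python
from dataclasses import dataclass
from html import escape
from itertools import groupby
from typing import Optional


@dataclass(frozen=True)
class Char:
    ch: str
    bold: bool = False
    italic: bool = False
    # URL of a link annotation whose bounding box overlaps this character
    # (assumed to be resolved upstream).
    link: Optional[str] = None


def style_key(c: Char) -> tuple:
    return (c.bold, c.italic, c.link)


def tag_segments(chars: list[Char]) -> str:
    """Group runs of consecutive characters with identical style into segments
    and wrap each segment in the matching formatting tags."""
    parts = []
    for (bold, italic, link), group in groupby(chars, key=style_key):
        text = escape("".join(c.ch for c in group))
        if italic:
            text = f"<i>{text}</i>"
        if bold:
            text = f"<b>{text}</b>"
        if link:
            text = f'<a href="{escape(link)}">{text}</a>'
        parts.append(text)
    return "".join(parts)
```

`itertools.groupby` only merges adjacent items with equal keys, which matches the "textually-consecutive characters" condition: the same style appearing twice with other text in between yields two separate segments.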

4.

Invention publication
AUTOMATING LARGE-SCALE DATA COLLECTION (Pending, published)

Publication (announcement) No.: US20240169161A1

Publication (announcement) date: 2024-05-23

Application No.: US18452803

Application date: 2023-08-21

Applicant: Oracle International Corporation

Inventors: Paria Jamshid Lou, Gioacchino Tangari, Jason Black, Bhagya Gayathri Hettige, Xu Zhong, Poorya Zaremoodi, Thanh Long Duong, Mark Edward Johnson

IPC: G06F40/40, G06F40/284, G06F40/289, G10L15/06

CPC classification number: G06F40/40, G06F40/284, G06F40/289, G10L15/063

Abstract: Obtaining collections of sentences in different languages that are usable for training models in various applications of artificial intelligence is provided. A method is provided that obtains, from a text corpus, webpages in a plurality of languages, each of the webpages corresponding to a URL; obtains annotations for each of the webpages based on its URL, to obtain annotated data entries corresponding to the webpages, each of the annotated data entries including a classification label corresponding to a sub-topic of one of a plurality of topics, where each of the plurality of topics includes a corresponding plurality of sub-topics; filters the annotated data entries to obtain topic-specific content in a target language based on the classification labels, the topic-specific content corresponding to one or more sub-topics; performs post-processing on the topic-specific content to obtain result data; and outputs the result data for the topic.
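Illustrative sketch (not from the patent): the filtering and post-processing steps can be outlined in Python, assuming each annotated entry already carries a language code and a sub-topic label. The `TOPICS` taxonomy, the `Entry` fields, and the post-processing rules (whitespace normalization plus de-duplication) are hypothetical; the abstract does not specify what post-processing is performed.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    url: str
    language: str  # language code of the webpage, e.g. "fr"
    label: str     # sub-topic classification label from URL-based annotation
    text: str


# Hypothetical topic taxonomy: each topic owns several sub-topics.
TOPICS = {
    "sports": {"football", "tennis"},
    "finance": {"banking", "stocks"},
}


def filter_topic(entries: list[Entry], topic: str,
                 target_language: str) -> list[Entry]:
    """Keep entries in the target language whose label is a sub-topic of `topic`."""
    subtopics = TOPICS[topic]
    return [e for e in entries
            if e.language == target_language and e.label in subtopics]


def post_process(entries: list[Entry]) -> list[str]:
    """Assumed post-processing: collapse whitespace, drop empty and duplicate texts."""
    seen: set[str] = set()
    result = []
    for e in entries:
        text = " ".join(e.text.split())
        if text and text not in seen:
            seen.add(text)
            result.append(text)
    return result
```

Because labels are attached at the sub-topic level, one topic query selects several sub-topics at once, which is why the filter checks set membership rather than equality.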

5.

Invention application
DEEP LEARNING TECHNIQUES FOR EXTRACTION OF EMBEDDED DATA FROM DOCUMENTS (Valid)

Publication (announcement) No.: US20230139397A1

Publication (announcement) date: 2023-05-04

Application No.: US17819445

Application date: 2022-08-12

Applicant: Oracle International Corporation

Inventors: Xu Zhong, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Thanh Long Duong, Mark Edward Johnson

IPC: G06F40/35

Abstract: Deep learning techniques are disclosed for extraction of embedded data from documents. In an exemplary technique, a set of unstructured text data is received. One or more text groupings are generated by processing the set of unstructured text data. One or more text grouping embeddings are generated in a format for input to a machine learning model based on the one or more generated text groupings. One or more output predictions are generated by inputting the one or more text grouping embeddings into the machine learning model. Each output prediction of the one or more output predictions corresponds to a predicted aspect of a text grouping of the one or more text groupings.

6.

Invention application
EXTRACTING KEY INFORMATION FROM DOCUMENT USING TRAINED MACHINE-LEARNING MODELS (Valid)

Publication (announcement) No.: US20230095673A1

Publication (announcement) date: 2023-03-30

Application No.: US17888300

Application date: 2022-08-15

Applicant: Oracle International Corporation

Inventors: Yakupitiyage Don Thanuja Samodhye Dharmasiri, Xu Zhong, Ahmed Ataallah Ataallah Abobakr, Hongtao Yang, Budhaditya Saha, Shaoke Xu, Shashi Prasad Suravarapu, Mark Edward Johnson, Thanh Long Duong

IPC: G06V10/82, G06V30/412, G06V30/148

Abstract: Techniques for extracting key information from a document using machine-learning models in a chatbot system are disclosed herein. In one particular aspect, a method is provided that includes receiving a set of data, which includes key fields, within a document at a data processing system that includes a table detection module, a key information extraction module, and a table extraction module. Text information and corresponding location data are extracted via optical character recognition. The table detection module detects whether one or more tables are present in the document and, if applicable, a location of each of the tables. The key information extraction module extracts text from the key fields. The table extraction module extracts each of the tables based on input from the optical character recognition and the table detection module. Extraction results, including the text from the key fields and each of the tables, can be output.

7.

Invention grant
Extracting key information from document using trained machine-learning models (Valid)

Publication (announcement) No.: US12217497B2

Publication (announcement) date: 2025-02-04

Application No.: US17888300

Application date: 2022-08-15

Applicant: Oracle International Corporation

Inventors: Yakupitiyage Don Thanuja Samodhye Dharmasiri, Xu Zhong, Ahmed Ataallah Ataallah Abobakr, Hongtao Yang, Budhaditya Saha, Shaoke Xu, Shashi Prasad Suravarapu, Mark Edward Johnson, Thanh Long Duong

IPC: G06V10/82, G06V30/148, G06V30/412

Abstract: Techniques for extracting key information from a document using machine-learning models in a chatbot system are disclosed herein. In one particular aspect, a method is provided that includes receiving a set of data, which includes key fields, within a document at a data processing system that includes a table detection module, a key information extraction module, and a table extraction module. Text information and corresponding location data are extracted via optical character recognition. The table detection module detects whether one or more tables are present in the document and, if applicable, a location of each of the tables. The key information extraction module extracts text from the key fields. The table extraction module extracts each of the tables based on input from the optical character recognition and the table detection module. Extraction results, including the text from the key fields and each of the tables, can be output.

8.

Invention publication
EXECUTING UNSUPERVISED PRE-TRAINING TASKS WITH A MACHINE LEARNING MODEL TO PREDICT DOCUMENT GRAPH ATTRIBUTES (Pending, published)

Publication (announcement) No.: US20240338395A1

Publication (announcement) date: 2024-10-10

Application No.: US18298060

Application date: 2023-04-10

Applicant: Oracle International Corporation

Inventors: Xu Zhong, Don Dharmasiri, Thanh Long Duong, Mark Johnson, Srinivasa Phani Kumar Gadde, Vishal Vishnoi

IPC: G06F16/332, G06F40/205, G06F40/284

CPC classification number: G06F16/3329, G06F40/205, G06F40/284

Abstract: Techniques for multi-layer training of a machine learning model are disclosed. A system pre-trains a machine learning model on training data obtained from unlabeled document graph data by executing unsupervised pre-training tasks on the unlabeled document graph data to generate a labeled pre-training data set. The system modifies document graphs to change attributes of nodes in the document graphs. The system pre-trains the machine learning model with a data set including the modified document graphs and un-modified document graphs to generate predictions associated with the modifications to the document graphs. Subsequent to pre-training, the system fine-tunes the machine learning model with a set of labeled training data to generate predictions associated with a specific attribute of a document graph.
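Illustrative sketch (not from the patent): one way to turn unlabeled document graphs into a labeled pre-training set is to perturb a node attribute and label each node by whether it was changed, in the spirit of replaced-token detection. Here a document graph is reduced to its node list (edges are not modified, so they are omitted), and the `kind` attribute, the `KINDS` vocabulary, and the swap rate are hypothetical.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    text: str
    kind: str  # node attribute to perturb, e.g. "title" or "body"


# Hypothetical attribute vocabulary for document-graph nodes.
KINDS = ["title", "body", "table", "caption"]


def corrupt(graph: list[Node], rate: float,
            rng: random.Random) -> tuple[list[Node], list[int]]:
    """Copy `graph`, swapping each node's `kind` for a different value with
    probability `rate`. Labels are 1 for modified nodes and 0 otherwise."""
    nodes, labels = [], []
    for node in graph:
        if rng.random() < rate:
            new_kind = rng.choice([k for k in KINDS if k != node.kind])
            nodes.append(replace(node, kind=new_kind))
            labels.append(1)
        else:
            nodes.append(node)
            labels.append(0)
    return nodes, labels


def build_pretraining_set(graphs: list[list[Node]], rate: float = 0.15,
                          seed: int = 0) -> list[tuple[list[Node], list[int]]]:
    """Pair every unmodified graph (all labels 0) with a modified copy, so the
    labels come from the modification itself rather than human annotation."""
    rng = random.Random(seed)
    data = []
    for graph in graphs:
        data.append((list(graph), [0] * len(graph)))
        data.append(corrupt(graph, rate, rng))
    return data
```

A model pre-trained to predict these per-node labels could then be fine-tuned on a small labeled set for a specific attribute, matching the two-stage flow the abstract describes.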

9.

Invention publication
GENERATING AN ELECTRONIC DOCUMENT WITH A CONSISTENT TEXT ORDERING (Pending, published)

Publication (announcement) No.: US20240061989A1

Publication (announcement) date: 2024-02-22

Application No.: US18169740

Application date: 2023-02-15

Applicant: Oracle International Corporation

Inventors: Xu Zhong, Vishank Bhatia, Thanh Long Duong, Mark Johnson, Srinivasa Phani Kumar Gadde, Vishal Vishnoi

IPC: G06F40/103, G06F40/205, G06F40/284, G06F40/30

CPC classification number: G06F40/103, G06F40/205, G06F40/284, G06F40/30

Abstract: Techniques for generating text content arranged in a consistent read order from a source document including text corresponding to different read orders are disclosed. A system parses a binary file representing an electronic document to identify characters and metadata associated with the characters. The system pre-sorts a character order of characters in each line of the electronic document to generate an ordered list of characters arranged according to the right-to-left reading order. The system performs a layout-mirroring operation to change a position of characters within the modified document relative to a right edge of the document and a left edge of the document. Subsequent to performing layout-mirroring, the system identifies native left-to-right reading-order text in-line with the native right-to-left reading-order text. The system flips the reading order of the native left-to-right read-order characters into the left-to-right reading order to be consistent with the native right-to-left read-order text.
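Illustrative sketch (not from the patent): the reordering idea, ordering a line right-to-left by distance from the right edge and then flipping embedded left-to-right runs back, can be shown in Python. This is a deliberate simplification, not the Unicode Bidirectional Algorithm: only strong left-to-right letters and European digits (as reported by `unicodedata.bidirectional`) count as LTR, a space joins an LTR run only when it sits between two LTR characters, glyph widths are ignored, and the `Glyph` type and function names are hypothetical.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Glyph:
    ch: str
    x: float  # left edge of the glyph on the page, in page units


def is_ltr(ch: str) -> bool:
    # Strong left-to-right letters ("L") and European digits ("EN").
    return unicodedata.bidirectional(ch) in ("L", "EN")


def flip_ltr_runs(chars: list[str]) -> list[str]:
    """Reverse each run of LTR characters in an RTL-ordered list so the run
    reads left-to-right again. A space joins a run only between LTR chars."""
    out = list(chars)
    n, i = len(out), 0
    while i < n:
        if not is_ltr(out[i]):
            i += 1
            continue
        j = i
        while j + 1 < n and (is_ltr(out[j + 1]) or
                             (out[j + 1] == " " and j + 2 < n
                              and is_ltr(out[j + 2]))):
            j += 1
        out[i:j + 1] = out[i:j + 1][::-1]
        i = j + 1
    return out


def to_rtl_logical(line: list[Glyph], page_width: float) -> str:
    # Layout mirroring: measure each glyph from the right edge, so sorting by
    # the mirrored coordinate yields right-to-left order.
    mirrored = sorted(line, key=lambda g: page_width - g.x)
    return "".join(flip_ltr_runs([g.ch for g in mirrored]))
```

For a line that appears on the page as `abc` followed by a Hebrew word, the mirrored sort first yields the Hebrew word in reading order followed by `cba`; flipping the LTR run restores `abc`.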

10.

Invention publication
WIDE AND DEEP NETWORK FOR LANGUAGE DETECTION USING HASH EMBEDDINGS (Pending, published)

Publication (announcement) No.: US20230141853A1

Publication (announcement) date: 2023-05-11

Application No.: US18052694

Application date: 2022-11-04

Applicant: Oracle International Corporation

Inventors: Thanh Tien Vu, Poorya Zaremoodi, Duy Vu, Mark Edward Johnson, Thanh Long Duong, Xu Zhong, Vladislav Blinov, Cong Duy Vu Hoang, Yu-Heng Hong, Vinamr Goel, Philip Victor Ogren, Srinivasa Phani Kumar Gadde, Vishal Vishnoi

IPC: G06F40/263, G06F16/31

CPC classification number: G06F40/263, G06F16/325, H04L51/02

Abstract: Techniques disclosed herein relate generally to language detection. In one particular aspect, a method is provided that includes obtaining a sequence of n-grams of a textual unit; using an embedding layer to obtain an ordered plurality of embedding vectors for the sequence of n-grams; using a deep network to obtain an encoded vector that is based on the ordered plurality of embedding vectors; and using a classifier to obtain a language prediction for the textual unit that is based on the encoded vector. The deep network includes an attention mechanism, and using the embedding layer to obtain the ordered plurality of embedding vectors comprises, for each n-gram in the sequence of n-grams: obtaining hash values for the n-gram; based on the hash values, selecting component vectors from among the plurality of component vectors; and obtaining an embedding vector for the n-gram that is based on the component vectors.
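Illustrative sketch (not from the patent): the hash-embedding step, where several hash values per n-gram select component vectors from a shared pool and are combined into one embedding, can be written in Python. Only the embedding layer is shown; the deep network, attention mechanism, wide component, and classifier are omitted. Character trigrams with `<`/`>` boundary markers, SHA-256-derived hash functions, Gaussian initialization, and plain summation of the selected components (rather than learned importance weights) are all assumptions.

```python
import hashlib
import random


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Character n-grams of the text, padded with boundary markers."""
    padded = f"<{text}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def hash_index(ngram: str, seed: int, num_buckets: int) -> int:
    """Deterministic hash of `ngram` under hash function number `seed`."""
    digest = hashlib.sha256(f"{seed}:{ngram}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets


class HashEmbedding:
    def __init__(self, num_buckets: int = 1000, dim: int = 8,
                 num_hashes: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.num_hashes = num_hashes
        # Shared pool of component vectors, much smaller than the n-gram vocabulary.
        self.components = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                           for _ in range(num_buckets)]

    def embed(self, ngram: str) -> list[float]:
        # Each hash function selects one component vector; the n-gram's
        # embedding is their element-wise sum.
        picked = [self.components[hash_index(ngram, s, self.num_buckets)]
                  for s in range(self.num_hashes)]
        return [sum(vals) for vals in zip(*picked)]

    def embed_sequence(self, ngrams: list[str]) -> list[list[float]]:
        # Ordered embedding vectors, one per n-gram, for a downstream encoder.
        return [self.embed(g) for g in ngrams]
```

Using several hashes per n-gram means two n-grams that collide under one hash function are unlikely to collide under all of them, so their combined embeddings usually still differ even though the component pool is small.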
