Providing Fairness in Fine-Tuning of Pre-Trained Language Models

    Publication Number: US20230409969A1

    Publication Date: 2023-12-21

    Application Number: US18176374

    Application Date: 2023-02-28

    CPC classification number: G06N20/00

    Abstract: Bias in a language model generated through fine-tuning of a pre-trained language model may be mitigated, whether the bias is incorporated in the pre-trained language model or in the fine-tuning data. A pre-trained language model may be fine-tuned using downstream training data. Prior to tuning, elements within the downstream data may be identified that either match or serve as proxies for one or more identity elements associated with training bias sensitivity. Proxy elements may be identified by analyzing the distributions of the downstream elements against the distributions of identity elements. Once the elements are identified, instances of the identified elements may be replaced in the downstream data with one or more masking elements to generate masked downstream data. A fine-tuned language model with reduced bias may then be generated from the pre-trained language model by tuning the pre-trained language model using the masked downstream data.
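
    A minimal sketch of the masking step, assuming illustrative identity and proxy term lists, a [MASK] token, and regular-expression matching; the patent instead identifies proxy elements by analyzing element distributions, so everything below is an assumption for the example:

```python
import re

# Assumed example terms; the described approach would identify proxy
# elements by comparing distributions of downstream elements with
# distributions of identity elements.
IDENTITY_TERMS = {"she", "he", "mrs", "mr"}
PROXY_TERMS = {"nurse", "engineer"}
MASK_TOKEN = "[MASK]"  # assumed masking element

def mask_sensitive_elements(texts, terms, mask=MASK_TOKEN):
    """Replace each whole-word occurrence of a sensitive term with a mask."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in sorted(terms)) + r")\b",
        flags=re.IGNORECASE,
    )
    return [pattern.sub(mask, text) for text in texts]

downstream = [
    "Mrs. Lee, a nurse, reviewed the chart.",
    "He is an engineer at the plant.",
]
masked = mask_sensitive_elements(downstream, IDENTITY_TERMS | PROXY_TERMS)
# The masked downstream data, not the raw data, would then be used to
# fine-tune the pre-trained language model.
print(masked)
```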

    Enforcing Fairness on Unlabeled Data to Improve Modeling Performance

    Publication Number: US20230394371A1

    Publication Date: 2023-12-07

    Application Number: US18453929

    Application Date: 2023-08-22

    CPC classification number: G06N20/00 G06N3/088

    Abstract: Fairness of a trained classifier may be ensured by generating a training data set from input data points of a multi-dimensional feature space according to several parameters: an amount of label bias, a control for the discrepancy between the rarity of features, and an amount of selection bias. Unlabeled data points of the input data, comprising unobserved ground truths, are labeled according to the amount of label bias, and the input data is sampled according to the amount of selection bias and the control for the discrepancy between the rarity of features. The classifier is then trained using the sampled and labeled data points as well as additional unlabeled data points. The trained classifier is then usable to determine unbiased classifications of one or more labels for one or more other data sets.
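
    A minimal sketch of this kind of parameterized data generation, assuming a two-dimensional feature space, two groups, and illustrative parameter and function names; none of the specifics below are from the patent:

```python
import numpy as np

def generate_biased_dataset(n, label_bias=0.1, rarity_gap=0.3,
                            selection_bias=0.4, seed=0):
    """Illustrative generator: points in a 2-D feature space, ground-truth
    labels flipped at the label-bias rate, one group made rarer via
    `rarity_gap`, and labels withheld from that group at the
    selection-bias rate."""
    rng = np.random.default_rng(seed)
    # group membership; `rarity_gap` shrinks the rarer group's share
    group = rng.random(n) < (0.5 - rarity_gap / 2)
    x = rng.normal(loc=group.astype(float), scale=1.0, size=(2, n)).T
    # unobserved ground truth, then corrupt a fraction of it (label bias)
    y_true = (x.sum(axis=1) > 0.5).astype(int)
    flip = rng.random(n) < label_bias
    y = np.where(flip, 1 - y_true, y_true)
    # selection bias: rarer-group points are less likely to keep labels
    keep = rng.random(n) > selection_bias * group
    return x[keep], y[keep], x[~keep]  # labeled X, labels, unlabeled X

labeled_x, labels, unlabeled_x = generate_biased_dataset(1000)
# A classifier would then be trained on (labeled_x, labels) together
# with unlabeled_x, e.g. in a semi-supervised setup.
```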

    Evaluating language models using negative data

    Publication Number: US11488579B2

    Publication Date: 2022-11-01

    Application Number: US16890263

    Application Date: 2020-06-02

    Abstract: A method of evaluating a language model using negative data may include accessing a first language model that is trained using a first training corpus, and accessing a second language model. The second language model may be configured to generate outputs that are less grammatical than outputs generated by the first language model. The method may also include training the second language model using a second training corpus, and generating output text from the second language model. The method may further include testing the first language model using the output text from the second language model.
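
    The evaluation flow might look like the following sketch. The model interfaces (`token_log_probs`, `sample`) and the toy stand-in models are assumptions for illustration; the idea is that a well-trained model should assign low likelihood, i.e. high perplexity, to the weaker model's ungrammatical output:

```python
import math
import random

class ToyLM:
    """Stand-in for a real language model; both methods are assumed
    interfaces, not from the patent."""
    def __init__(self, grammatical=True, seed=0):
        self.grammatical = grammatical
        self.rng = random.Random(seed)

    def sample(self):
        # the "negative" model emits scrambled, less grammatical text
        words = ["the", "cat", "sat", "on", "mat"]
        if not self.grammatical:
            self.rng.shuffle(words)
        return " ".join(words)

    def token_log_probs(self, text):
        # toy scoring: a grammatical model finds in-order text likelier
        p = 0.5 if text == "the cat sat on mat" else 0.05
        return [math.log(p)] * len(text.split())

def negative_data_perplexity(model_under_test, negative_model, n_samples=100):
    """Generate text from the weaker model and measure how surprised the
    model under test is; higher perplexity suggests it correctly treats
    the ungrammatical text as unlikely."""
    total, count = 0.0, 0
    for _ in range(n_samples):
        log_probs = model_under_test.token_log_probs(negative_model.sample())
        total += sum(log_probs)
        count += len(log_probs)
    return math.exp(-total / count)

print(negative_data_perplexity(ToyLM(grammatical=True),
                               ToyLM(grammatical=False)))
```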

    Similarity Analysis Using Enhanced MinHash
    Invention Publication

    Publication Number: US20240168934A1

    Publication Date: 2024-05-23

    Application Number: US18426100

    Application Date: 2024-01-29

    CPC classification number: G06F16/2228 G06F17/18 G06F18/22 G06F18/231

    Abstract: A first set and a second set are identified as operands for a set operation of a similarity analysis task iteration. Using the respective minimum hash information arrays and contributor count arrays of the two sets, a minimum hash information array and a contributor count array are generated for the derived set resulting from the set operation. An entry in the contributor count array of the derived set indicates the number of child sets of the derived set that meet a criterion with respect to the corresponding entry in the minimum hash information array of the derived set. The generated minimum hash information array and contributor count array are stored as part of the input for a subsequent iteration. After a termination criterion of the task is met, output of the task is stored.
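
    A minimal sketch of one union iteration, assuming the criterion is that a contributor count entry tallies the child sets whose array entry equals the derived minimum; the data layout and all names are illustrative, not taken from the patent:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MinHashSketch:
    """Per-slot minimum hash values plus, for each slot, a count of the
    child sets whose entry attains that minimum (the criterion assumed
    here; the abstract says only that the count reflects children
    meeting a criterion with respect to the minimum-hash entry)."""
    min_hashes: List[int]
    contributor_counts: List[int]

def union(a: MinHashSketch, b: MinHashSketch) -> MinHashSketch:
    """One iteration of a set operation: derive the sketch of a union."""
    mh, cc = [], []
    for am, ac, bm, bc in zip(a.min_hashes, a.contributor_counts,
                              b.min_hashes, b.contributor_counts):
        m = min(am, bm)
        # accumulate counts from every child whose entry equals the new minimum
        c = (ac if am == m else 0) + (bc if bm == m else 0)
        mh.append(m)
        cc.append(c)
    # the derived arrays become input for a subsequent iteration
    return MinHashSketch(mh, cc)

# leaf sets start with a contributor count of 1 in every slot
s1 = MinHashSketch([5, 9, 2], [1, 1, 1])
s2 = MinHashSketch([5, 3, 7], [1, 1, 1])
print(union(s1, s2))  # min_hashes=[5, 3, 2], contributor_counts=[2, 1, 1]
```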
