Patent search ap:("Oracle International Corporation") AND inv:"Sanjay Jinturkar" Page 2

11.

发明公开
ONE-PASS APPROACH TO AUTOMATED TIMESERIES FORECASTING 审中-公开

公开(公告)号：US20230153394A1

公开(公告)日：2023-05-18

申请号：US17528305

申请日：2021-11-17

Applicant: Oracle International Corporation

Inventor： Ritesh Ahuja , Anatoly Yakovlev , Venkatanathan Varadarajan , Sandeep R. Agrawal , Hesam Fathi Moghadam , Sanjay Jinturkar , Nipun Agarwal

IPC: G06K9/62 , G06N20/00

CPC classification number: G06K9/6227 , G06K9/6257 , G06K9/6265 , G06K9/6298 , G06N20/00

Abstract: Herein are timeseries preprocessing, model selection, and hyperparameter tuning techniques for forecasting development based on temporal statistics of a timeseries and a single feed-forward pass through a machine learning (ML) pipeline. In an embodiment, a computer hosts and operates the ML pipeline that automatically measures temporal statistic(s) of a timeseries. ML algorithm selection, cross validation, and hyperparameters tuning is based on the temporal statistics of the timeseries. The result from the ML pipeline is a rigorously trained and production ready ML model that is validated to have increased accuracy for multiple prediction horizons. Based on the temporal statistics, efficiency is achieved by asymmetry of investment of computer resources in the tuning and training of the most promising ML algorithm(s). Compared to other approaches, this ML pipeline produces a more accurate ML model for a given amount of computer resources and consumes fewer computer resources to achieve a given accuracy.

12.

发明申请
SEPARATION MAXIMIZATION TECHNIQUE FOR ANOMALY SCORES TO COMPARE ANOMALY DETECTION MODELS 有权

公开(公告)号：US20220138504A1

公开(公告)日：2022-05-05

申请号：US17083536

申请日：2020-10-29

Applicant: Oracle International Corporation

Inventor： Hesam Fathi Moghadam , Anatoly Yakovlev , Sandeep Agrawal , Venkatanathan Varadarajan , Robert Hopkins , Matteo Casserini , Milos Vasic , Sanjay Jinturkar , Nipun Agarwal

IPC: G06K9/62 , G06N20/20

Abstract: In an embodiment based on computer(s), an ML model is trained to detect outliers. The ML model calculates anomaly scores that include a respective anomaly score for each item in a validation dataset. The anomaly scores are automatically organized by sorting and/or clustering. Based on the organized anomaly scores, a separation is measured that indicates fitness of the ML model. In an embodiment, a computer performs two-clustering of anomaly scores into a first organization that consists of a first normal cluster of anomaly scores and a first anomaly cluster of anomaly scores. The computer performs three-clustering of the same anomaly scores into a second organization that consists of a second normal cluster of anomaly scores, a second anomaly cluster of anomaly scores, and a middle cluster of anomaly scores. A distribution difference between the first organization and the second organization is measured. An ML model is processed based on the distribution difference.

13.

发明授权
Personal information indexing for columnar data storage format 有权

公开(公告)号：US11238035B2

公开(公告)日：2022-02-01

申请号：US16814855

申请日：2020-03-10

Applicant: Oracle International Corporation

Inventor： Hamed Ahmadi , Jian Wen , Shrikumar Hariharasubrahmanian , Sanjay Jinturkar , Nipun Agarwal

IPC: G06F16/245 , G06F16/22

Abstract: Techniques are described herein for indexing personal information in columnar data storage format based files. In an embodiment, row groups of rows that comprise a plurality of columns are stored in a set of files. Each column of a row group is stored in a chunk of column pages in the set of files. A regular expression index that indexes a particular column in the set of files is stored for each row group. The regular expression index identifies column pages in the chunk of the particular column that include a particular column value that satisfies a regular expression specified in a query. The regular expression specified in the query in evaluated against the particular column using the regular expression index.

14.

发明申请
FAST, PREDICTIVE, AND ITERATION-FREE AUTOMATED MACHINE LEARNING PIPELINE 有权

公开(公告)号：US20210390466A1

公开(公告)日：2021-12-16

申请号：US17086204

申请日：2020-10-30

Applicant: Oracle International Corporation

Inventor： Venkatanathan Varadarajan , Sandeep R. Agrawal , Hesam Fathi Moghadam , Anatoly Yakovlev , Ali Moharrer , Jingxiao Cai , Sanjay Jinturkar , Nipun Agarwal , Sam Idicula , Nikan Chavoshi

IPC: G06N20/20 , G06N5/04

Abstract: A proxy-based automatic non-iterative machine learning (PANI-ML) pipeline is described, which predicts machine learning model configuration performance and outputs an automatically-configured machine learning model for a target training dataset. Techniques described herein use one or more proxy models—which implement a variety of machine learning algorithms and are pre-configured with tuned hyperparameters—to estimate relative performance of machine learning model configuration parameters at various stages of the PANI-ML pipeline. The PANI-ML pipeline implements a radically new approach of rapidly narrowing the search space for machine learning model configuration parameters by performing algorithm selection followed by algorithm-specific adaptive data reduction (i.e., row- and/or feature-wise dataset sampling), and then hyperparameter tuning. Furthermore, because of the one-pass nature of the PANI-ML pipeline and because each stage of the pipeline has convergence criteria by design, the whole PANI-ML pipeline has a novel convergence property that stops the configuration search after one pass.

15.

发明申请
EFFICIENT AND ACCURATE REGIONAL EXPLANATION TECHNIQUE FOR NLP MODELS 有权

公开(公告)号：US20220309360A1

公开(公告)日：2022-09-29

申请号：US17212163

申请日：2021-03-25

Applicant: Oracle International Corporation

Inventor： Zahra Zohrevand , Tayler Hetherington , Karoon Rashedi Nia , Yasha Pushak , Sanjay Jinturkar , Nipun Agarwal

IPC: G06N5/04 , G06N20/00 , G06F40/20

Abstract: Herein are techniques for topic modeling and content perturbation that provide machine learning (ML) explainability (MLX) for natural language processing (NLP). A computer hosts an ML model that infers an original inference for each of many text documents that contain many distinct terms. To each text document (TD) is assigned, based on terms in the TD, a topic that contains a subset of the distinct terms. In a perturbed copy of each TD, a perturbed subset of the distinct terms is replaced. For the perturbed copy of each TD, the ML model infers a perturbed inference. For TDs of a topic, the computer detects that a difference between original inferences of the TDs of the topic and perturbed inferences of the TDs of the topic exceeds a threshold. Based on terms in the TDs of the topic, the topic is replaced with multiple, finer-grained new topics. After sufficient topic modeling, a regional explanation of the ML model is generated.

16.

发明授权
Anomaly detection in SS7 control network using reconstructive neural networks 有权

公开(公告)号：US11451670B2

公开(公告)日：2022-09-20

申请号：US17123235

申请日：2020-12-16

Applicant: Oracle International Corporation

Inventor： Hamed Ahmadi , Ali Moharrer , Venkatanathan Varadarajan , Vaseem Akram , Nishesh Rai , Reema Hingorani , Sanjay Jinturkar , Nipun Agarwal

IPC: H04M7/06 , G06N20/00 , G06K9/62 , H04W84/04 , H04M3/22

Abstract: Herein are machine learning (ML) techniques for unsupervised training with a corpus of signaling system 7 (SS7) messages having a diversity of called and calling parties, operation codes (opcodes) and transaction types, numbering plans and nature of address indicators, and mobile country codes and network codes. In an embodiment, a computer stores SS7 messages that are not labeled as anomalous or non-anomalous. Each SS7 message contains an opcode and other fields. For each SS7 message, the opcode of the SS7 message is stored into a respective feature vector (FV) of many FVs that are based on respective unlabeled SS7 messages. The FVs contain many distinct opcodes. Based on the FVs that contain many distinct opcodes and that are based on respective unlabeled SS7 messages, an ML model such as a reconstructive model such as an autoencoder is unsupervised trained to detect an anomalous SS7 message.

17.

发明申请
FAST, APPROXIMATE CONDITIONAL DISTRIBUTION SAMPLING 有权

公开(公告)号：US20220261400A1

公开(公告)日：2022-08-18

申请号：US17179265

申请日：2021-02-18

Applicant: Oracle International Corporation

Inventor： Yasha Pushak , Tayler Hetherington , Karoon Rashedi Nia , Zahra Zohrevand , Sanjay Jinturkar , Nipun Agarwal

IPC: G06F16/2458 , G06N20/00

Abstract: Techniques are described for fast approximate conditional sampling by randomly sampling a dataset and then performing a nearest neighbor search on the pre-sampled dataset to reduce the data over which the nearest neighbor search must be performed and, according to an embodiment, to effectively reduce the number of nearest neighbors that are to be found within the random sample. Furthermore, KD-Tree-based stratified sampling is used to generate a representative sample of a dataset. KD-Tree-based stratified sampling may be used to identify the random sample for fast approximate conditional sampling, which reduces variance in the resulting data sample. As such, using KD-Tree-based stratified sampling to generate the random sample for fast approximate conditional sampling ensures that any nearest neighbor selected, for a target data instance, from the random sample is likely to be among the nearest neighbors of the target data instance within the unsampled dataset.

18.

发明申请
POST-HOC EXPLANATION OF MACHINE LEARNING MODELS USING GENERATIVE ADVERSARIAL NETWORKS 有权

公开(公告)号：US20220198277A1

公开(公告)日：2022-06-23

申请号：US17131387

申请日：2020-12-22

Applicant: Oracle International Corporation

Inventor： Karoon Rashedi Nia , Tayler Hetherington , Zahra Zohrevand , Yasha Pushak , Sanjay Jinturkar , Nipun Agarwal

IPC: G06N3/08 , G06N3/04

Abstract: Herein are generative adversarial networks to ensure realistic local samples and surrogate models to provide machine learning (ML) explainability (MLX). Based on many features, an embodiment trains an ML model. The ML model inferences an original inference for original feature values respectively for many features. Based on the same features, a generator model is trained to generate realistic local samples that are distinct combinations of feature values for the features. A surrogate model is trained based on the generator model and based on the original inference by the ML model and/or the original feature values that the original inference is based on. Based on the surrogate model, the ML model is explained. The local samples may be weighted based on semantic similarity to the original feature values, which may facilitate training the surrogate model and/or ranking the relative importance of the features. Local sample weighting may be based on populating a random forest with the local samples.

19.

发明申请
AUTOMATED MACHINE LEARNING PIPELINE FOR TIMESERIES DATASETS UTILIZING POINT-BASED ALGORITHMS 有权

公开(公告)号：US20220121955A1

公开(公告)日：2022-04-21

申请号：US17071285

申请日：2020-10-15

Applicant: Oracle International Corporation

Inventor： Nikan Chavoshi , Anatoly Yakovlev , Hesam Fathi Moghadam , Venkatanathan Varadarajan , Sandeep Agrawal , Ali Moharrer , Jingxiao Cai , Sanjay Jinturkar , Nipun Agarwal

IPC: G06N3/08 , G06N20/00

Abstract: Herein, a computer generates and evaluates many preprocessor configurations for a window preprocessor that transforms a training timeseries dataset for an ML model. With each preprocessor configuration, the window preprocessor is configured. The window preprocessor then converts the training timeseries dataset into a configuration-specific point-based dataset that is based on the preprocessor configuration. The ML model is trained based on the configuration-specific point-based dataset to calculate a score for the preprocessor configuration. Based on the scores of the many preprocessor configurations, an optimal preprocessor configuration is selected for finally configuring the window preprocessor, after which, the window preprocessor can optimally transform a new timeseries dataset such as in an offline or online production environment such as for real-time processing of a live streaming timeseries.

20.

发明公开
TUNING-FREE UNSUPERVISED ANOMALY DETECTION BASED ON DISTANCE TO NEAREST NORMAL POINT 审中-公开

公开(公告)号：US20240281455A1

公开(公告)日：2024-08-22

申请号：US18444454

申请日：2024-02-16

Applicant: Oracle International Corporation

Inventor： Youssef Mohamed Saied , Mohamed Ridha Chahed , Anatoly Yakovlev , Sandeep R. Agrawal , Sanjay Jinturkar , Nipun Agarwal

IPC: G06F16/28 , G06F16/22

CPC classification number: G06F16/285 , G06F16/2282

Abstract: Disclosed is an improved approach to implement anomaly detection, where an ensemble detection mechanism is provided. An improvement is provided for the KNN algorithm where scaling is applied to permit efficient detection of multiple categories of anomalies. Further extensions are used to optimize local anomaly detection.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification