Patent search: ap:("INTERNATIONAL BUSINESS MACHINES CORPORATION") AND inv:"Horst Cornelius Samulowitz" (page 2)

11.

Patent application
Knowledge Aided Feature Engineering [In force]

Publication number: US20210216904A1

Publication date: 2021-07-15

Application number: US16741084

Filing date: 2020-01-13

Applicant: International Business Machines Corporation

Inventors: Udayan Khurana, Sainyam Galhotra, Oktie Hassanzadeh, Kavitha Srinivas, Horst Cornelius Samulowitz

IPC classification: G06N20/00, G06F11/34

Abstract: Embodiments relate to a system, program product, and method for employing feature engineering to improve classifier performance. A first machine learning (ML) model with a first learning program is selected. The first selected ML model is operatively associated with a first structured dataset. First features in the first dataset directed at performance of the selected ML model are identified. A second structured dataset is assessed with respect to the identified features in the first dataset, and new features in the second dataset are identified, where the new features are semantically related to the identified features in the first dataset. The first dataset is dynamically augmented with the identified new features in the second dataset. The dynamically augmented first dataset is applied to the selected ML model to subject an embedded learning algorithm of the selected ML model to training using the augmented first dataset.
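
The augmentation step in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the relatedness test (a shared word in the column names stands in for semantic relatedness), the function names, and the made-up sample rows. The abstract itself does not say how relatedness is computed.

```python
def tokens(name):
    """Split a column name into lowercase word tokens."""
    return set(name.lower().replace("-", "_").split("_"))

def related_columns(first_cols, second_cols, key):
    """Assumed relatedness test: a column of the second dataset counts as
    related if it shares a name token with a feature of the first dataset."""
    first_tokens = set().union(*(tokens(c) for c in first_cols if c != key))
    return [c for c in second_cols
            if c != key and c not in first_cols and tokens(c) & first_tokens]

def augment(first_rows, second_rows, key):
    """Add the related columns of second_rows to first_rows, joined on key."""
    new_cols = related_columns(list(first_rows[0]), list(second_rows[0]), key)
    lookup = {row[key]: row for row in second_rows}
    augmented = []
    for row in first_rows:
        extra = lookup.get(row[key], {})
        augmented.append({**row, **{c: extra.get(c) for c in new_cols}})
    return augmented, new_cols

# Made-up sample data, for illustration only.
first = [{"city": "Oslo", "city_population": 700000},
         {"city": "Bergen", "city_population": 290000}]
second = [{"city": "Oslo", "city_area_km2": 454, "timezone": "CET"},
          {"city": "Bergen", "city_area_km2": 465, "timezone": "CET"}]
augmented, new_cols = augment(first, second, key="city")
```

In this toy run only `city_area_km2` is carried over; `timezone` shares no token with the first dataset's features and is left out. The augmented rows would then be passed to whatever training routine the selected model uses.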

12.

Granted patent
Mining code expressions for data analysis [In force]

Publication number: US12124822B2

Publication date: 2024-10-22

Application number: US17895881

Filing date: 2022-08-25

Applicant: International Business Machines Corporation

Inventors: Julian Timothy Dolby, Horst Cornelius Samulowitz, Kavitha Srinivas

IPC classification: G06F8/75, G06F8/35, G06F8/41, G06F8/51, G06F11/36, G06F21/56, H04L9/40

CPC classification: G06F8/35, G06F8/75, G06F11/3604, G06F21/562, G06F21/563, H04L63/1433

Abstract: Techniques for computer software code analysis are disclosed. One or more data flows are generated, based on analyzing software code using static analysis. A data object is identified in the software code using the one or more data flows, the data object relating to a structured dataset. A correspondence between a code expression in the software code and a characteristic of the structured dataset is identified, based on analyzing one or more reads from and one or more writes to the data object using the one or more data flows. The code expression for the structured dataset is analyzed, based on the correspondence, including at least one of: (i) generating a software code recommendation engine based on the code expression and the structured dataset, or (ii) generating one or more lambda expressions for application to the structured dataset, based on the code expression.
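
A much simplified picture of the read/write analysis: the sketch below walks a Python syntax tree and records which columns of a data object are read and which are written. It is a purely syntactic pass, not the data-flow analysis the abstract describes. The variable name `df`, the function name, and the sample statement are assumptions, and the code relies on the Python 3.9+ `ast` layout, where a subscript's index is stored directly in `node.slice`.

```python
import ast

def column_accesses(source, frame_name="df"):
    """Return (reads, writes): column names accessed as frame_name["col"]."""
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.Name)
                and node.value.id == frame_name
                and isinstance(node.slice, ast.Constant)
                and isinstance(node.slice.value, str)):
            # A Store context means the subscript is an assignment target.
            target = writes if isinstance(node.ctx, ast.Store) else reads
            target.add(node.slice.value)
    return reads, writes

# Hypothetical analysis input: one statement deriving a new column.
code = 'df["bmi"] = df["weight"] / df["height"] ** 2'
reads, writes = column_accesses(code)
```

Here the expression on the right-hand side corresponds to the written column `bmi` and depends on the read columns `weight` and `height`, which is the kind of correspondence a recommendation engine could be built on.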

13.

Granted patent
Automated machine learning using nearest neighbor recommender systems [In force]

Publication number: US11941541B2

Publication date: 2024-03-26

Application number: US16988809

Filing date: 2020-08-10

Applicant: International Business Machines Corporation

Inventors: Saket Sathe, Gregory Bramble, Horst Cornelius Samulowitz, Charu C. Aggarwal

IPC classification: G06N20/00, G06F17/16, G06F18/21, G06F18/2113, G06N5/04

CPC classification: G06N5/04, G06F17/16, G06F18/2113, G06F18/2193, G06N20/00

Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a performance matrix representing accuracies obtained by executing a plurality of pipelines on a plurality of training data sets, wherein a pipeline comprises a series of operations performed on a data set; selecting a defined number of top pipelines as potential pipelines for a testing data set based, at least in part, on a similarity between the testing data set and each of the plurality of training data sets represented in the performance matrix; storing results from executing each of the potential pipelines as a new data set; determining a pipeline accuracy for each of the potential pipelines when executed against the testing data set; and providing a recommended pipeline for use with the testing data set based, at least in part, on the pipeline accuracy for each potential pipeline.
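
The operations listed in this abstract can be sketched roughly as follows. The similarity measure (cosine similarity over small descriptor vectors), the averaging over neighbors, all names, and all numbers are assumptions chosen for illustration; the abstract does not fix any of them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(perf, train_meta, test_meta, k_neighbors, top_n, evaluate):
    """perf[d][p] is the accuracy of pipeline p on training data set d."""
    # 1. Find the training data sets most similar to the testing data set.
    ranked = sorted(perf, key=lambda d: cosine(train_meta[d], test_meta),
                    reverse=True)
    neighbors = ranked[:k_neighbors]
    # 2. Keep the top_n pipelines by mean accuracy on those neighbors.
    pipelines = perf[neighbors[0]].keys()
    mean_acc = {p: sum(perf[d][p] for d in neighbors) / len(neighbors)
                for p in pipelines}
    candidates = sorted(mean_acc, key=mean_acc.get, reverse=True)[:top_n]
    # 3. Run each candidate on the testing data set and keep the results.
    results = {p: evaluate(p) for p in candidates}
    # 4. Recommend the candidate with the best measured accuracy.
    return max(results, key=results.get), results

# Made-up performance matrix and data set descriptors.
perf = {"A": {"rf": 0.90, "svm": 0.70, "knn": 0.60},
        "B": {"rf": 0.85, "svm": 0.80, "knn": 0.65},
        "C": {"rf": 0.50, "svm": 0.55, "knn": 0.95}}
train_meta = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0]}
test_meta = [1.0, 0.05]
measured = {"rf": 0.82, "svm": 0.88, "knn": 0.40}  # stand-in for real runs
best, results = recommend(perf, train_meta, test_meta, 2, 2, measured.get)
```

In this toy run the neighbors are A and B, the candidates are `rf` and `svm`, and `svm` is recommended because it scores higher on the testing data set even though `rf` looked better on the neighbors.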

14.

Granted patent
Methods for automatically configuring performance evaluation schemes for machine learning algorithms [In force]

Publication number: US11681931B2

Publication date: 2023-06-20

Application number: US16580953

Filing date: 2019-09-24

Applicant: International Business Machines Corporation

Inventors: Bo Zhang, Gregory Bramble, Parikshit Ram, Horst Cornelius Samulowitz

IPC classification: G06N5/04, G06N20/00

CPC classification: G06N5/04, G06N20/00

Abstract: A system provides a mathematical formulation for a new problem of model validation and model selection in the presence of test data feedback. The system comprises a memory that stores computer-executable components. A processor, operably coupled to the memory, executes the computer-executable components stored in the memory. A selection component selects a metric of performance evaluation accuracy; and a configuration component configures performance evaluation schemes for machine learning algorithms. A characterization component employs a supervised learning-based approach to characterize the relationship between the configuration of the performance evaluation scheme and the fidelity of performance estimates; and an optimization component optimizes accuracy of the machine learning algorithms as a function of the size of the training data set relative to the size of the validation data set through selection of values associated with the configuration parameters.

15.

Patent application
AUTOMATED MACHINE LEARNING PIPELINE GENERATION [In force]

Publication number: US20220036246A1

Publication date: 2022-02-03

Application number: US16942247

Filing date: 2020-07-29

Applicant: International Business Machines Corporation

Inventors: Bei Chen, Long VU, Syed Yousaf Shah, Xuan-Hong Dang, Peter Daniel Kirchner, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Dhavalkumar C. Patel, Gregory Bramble, Horst Cornelius Samulowitz, Saket Sathe, Chuang Gan

IPC classification: G06N20/20

Abstract: Techniques regarding one or more automated machine learning processes that analyze time series data are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a time series analysis component that selects a machine learning pipeline for meta transfer learning on time series data by sequentially allocating subsets of training data from the time series data amongst a plurality of machine learning pipeline candidates.
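
One way to picture "sequentially allocating subsets of training data" among candidates is a successive-halving loop: every surviving candidate is scored on a growing window of the most recent observations, and the worse half is dropped each round. This is an assumed strategy used for illustration only; the abstract does not name it, and the meta transfer learning aspect is not modeled. The toy forecasters and the error measure are likewise hypothetical.

```python
def drift(history):
    """Extrapolate the average step seen so far (needs two points)."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[0]) / (len(history) - 1)

# Toy stand-ins for machine learning pipeline candidates.
FORECASTERS = {
    "last_value": lambda history: history[-1],
    "mean": lambda history: sum(history) / len(history),
    "zero": lambda history: 0.0,
    "drift": drift,
}

def one_step_error(name, subset):
    """Mean absolute error of one-step-ahead forecasts within subset."""
    predict = FORECASTERS[name]
    errors = [abs(predict(subset[:i]) - subset[i])
              for i in range(1, len(subset))]
    return sum(errors) / len(errors)

def select_pipeline(series, candidates, score, start=4):
    """Successive-halving-style selection over growing recent windows."""
    alive = list(candidates)
    size = start
    while len(alive) > 1:
        size = min(size, len(series))
        subset = series[-size:]                     # most recent observations
        ranked = sorted(alive, key=lambda name: score(name, subset))
        alive = ranked[:max(1, len(ranked) // 2)]   # keep the lower-error half
        size *= 2                                   # allocate more data next round
    return alive[0]

series = [float(x) for x in range(1, 17)]  # made-up linear series 1..16
winner = select_pipeline(series, FORECASTERS, one_step_error)
```

On this linear series the first round keeps `drift` and `last_value`, and the second round, run on twice as much data, selects `drift`.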

16.

Patent application
CREATING OPTIMIZED MACHINE-LEARNING MODELS [Under examination, published]

Publication number: US20200184380A1

Publication date: 2020-06-11

Application number: US16216138

Filing date: 2018-12-11

Applicant: International Business Machines Corporation

Inventors: Gegi Thomas, Adelmo Cristiano Innocenza Malossi, Tejaswini Pedapati, Ganesh Venkataraman, Roxana Istrate, Martin Wistuba, Florian Michael Scheidegger, Chao Xue, Rong Yan, Horst Cornelius Samulowitz, Benjamin Herta, Debashish Saha, Hendrik Strobelt

IPC classification: G06N20/20, G06N3/08, G06N3/04

Abstract: A machine-learning model generation method, system, and computer program product include deciding, via a first algorithm, a machine-learning algorithm that is best for customer data, invoking the machine-learning algorithm to train a neural network model with the customer data, analyzing the neural network model produced by the training for an accuracy, and improving the accuracy by iteratively repeating the training of the neural network model until a customer-defined constraint is met, as determined by the first algorithm.

17.

Granted patent
Automatic domain annotation of structured data [In force]

Publication number: US11954424B2

Publication date: 2024-04-09

Application number: US17661619

Filing date: 2022-05-02

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Horst Cornelius Samulowitz, Kavitha Srinivas

IPC classification: G06F17/00, G06F16/245, G06F40/117, G06F40/169, G06F40/177, G06F40/20

CPC classification: G06F40/169, G06F16/245, G06F40/117, G06F40/177, G06F40/20

Abstract: A processor may receive structured data. The structured data may include one or more columns and associated column names. The processor may analyze the structured data. Analyzing the structured data may include gathering a requisite set of keywords from the associated column names across all columns and/or a sample of column cells. The processor may access a corpus of documents. Each of the documents in the corpus may be associated with a respective keyword. The processor may search the corpus of documents based on the requisite set of keywords. The processor may summarize one or more documents associated with the requisite set of keywords.
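
The keyword-gathering, search, and summarization steps might look like the sketch below. The tokenization rule, the corpus keyed directly by keyword, and the "first sentence" summary are all simplifying assumptions; the function names and sample documents are hypothetical.

```python
import re

def keywords_from_columns(column_names):
    """Lowercase word tokens from names like 'patient_age' or 'BloodPressure'."""
    words = []
    for name in column_names:
        # Insert a space at lowercase-to-uppercase boundaries (camel case).
        spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
        for word in re.split(r"[^A-Za-z]+", spaced):
            if word and word.lower() not in words:
                words.append(word.lower())
    return words

def annotate(column_names, corpus):
    """corpus maps a keyword to a document; return keyword -> first sentence."""
    summaries = {}
    for keyword in keywords_from_columns(column_names):
        document = corpus.get(keyword)
        if document:
            # Crude stand-in for summarization: keep the first sentence.
            summaries[keyword] = document.split(". ")[0].rstrip(".") + "."
    return summaries

# Made-up corpus for illustration.
corpus = {"blood": "Blood is a body fluid. It circulates.",
          "pressure": "Pressure is force per unit area."}
notes = annotate(["patient_age", "BloodPressure"], corpus)
```

Keywords with no matching document (`patient` and `age` in this run) are simply skipped.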

18.

Granted patent
Question answering approach to semantic parsing of mathematical formulas [In force]

Publication number: US11663251B2

Publication date: 2023-05-30

Application number: US17447126

Filing date: 2021-09-08

Applicant: International Business Machines Corporation

Inventors: William Karol Lynch, Kavitha Srinivas, Horst Cornelius Samulowitz, Fabio Lorenzi

IPC classification: G06F16/335, G06V30/416, G06F40/205, G06F16/332

CPC classification: G06F16/3329, G06F16/335, G06V30/416, G06F40/205

Abstract: A method, system, and computer program product are disclosed. The method includes extracting at least one identifier from a formula in a document and extracting text passages in the document that contain the identifier(s). The method also includes selecting an identifier and extracted text passages containing the identifier, as well as generating identifier-passage pairs for the selected text passages and the identifier. Further, the method includes submitting the identifier-passage pairs to a question answering (QA) model, which generates candidate answers from the selected text passages. A definition of the identifier is then selected from the candidate answers.
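
The pairing step can be sketched as below. The QA model is replaced by a hypothetical pattern-matching placeholder (`toy_qa_model`), identifiers are assumed to be single letters in a plain-text formula, and the first candidate answer is taken as the definition; none of these choices comes from the abstract.

```python
import re

def extract_identifiers(formula):
    """Single-letter identifiers in a plain-text formula (a simplification)."""
    seen = []
    for ident in re.findall(r"\b[A-Za-z]\b", formula):
        if ident not in seen:
            seen.append(ident)
    return seen

def build_pairs(identifier, passages):
    """Pair the identifier with every passage that mentions it."""
    pattern = re.compile(rf"\b{re.escape(identifier)}\b")
    return [(identifier, p) for p in passages if pattern.search(p)]

def toy_qa_model(identifier, passage):
    """Placeholder for a real QA model: matches '<identifier> is the <answer>'."""
    match = re.search(rf"\b{re.escape(identifier)} is the ([^,.]+)", passage)
    return match.group(1).strip() if match else None

def define(identifier, passages, qa_model=toy_qa_model):
    """Return the first candidate answer, or None if there is none."""
    candidates = [qa_model(i, p) for i, p in build_pairs(identifier, passages)]
    candidates = [c for c in candidates if c]
    return candidates[0] if candidates else None

# Made-up document fragments.
formula = "F = m * a"
passages = ["Here F is the net force, and m is the mass of the body.",
            "The quantity a is the acceleration."]
definitions = {i: define(i, passages) for i in extract_identifiers(formula)}
```

A real system would pass each identifier-passage pair to a trained QA model and rank the candidate answers rather than take the first match.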

19.

Granted patent
Knowledge aided feature engineering [In force]

Publication number: US11599826B2

Publication date: 2023-03-07

Application number: US16741084

Filing date: 2020-01-13

Applicant: International Business Machines Corporation

Inventors: Udayan Khurana, Sainyam Galhotra, Oktie Hassanzadeh, Kavitha Srinivas, Horst Cornelius Samulowitz

IPC classification: G06N20/00, G06F11/34

Abstract: Embodiments relate to a system, program product, and method for employing feature engineering to improve classifier performance. A first machine learning (ML) model with a first learning program is selected. The first selected ML model is operatively associated with a first structured dataset. First features in the first dataset directed at performance of the selected ML model are identified. A second structured dataset is assessed with respect to the identified features in the first dataset, and new features in the second dataset are identified, where the new features are semantically related to the identified features in the first dataset. The first dataset is dynamically augmented with the identified new features in the second dataset. The dynamically augmented first dataset is applied to the selected ML model to subject an embedded learning algorithm of the selected ML model to training using the augmented first dataset.

20.

Patent application
INTERACTIVE FEATURE ENGINEERING IN AUTOMATIC MACHINE LEARNING WITH DOMAIN KNOWLEDGE [In force]

Publication number: US20220366269A1

Publication date: 2022-11-17

Application number: US17317242

Filing date: 2021-05-11

Applicant: International Business Machines Corporation

Inventors: Dakuo Wang, Udayan Khurana, Daniel Karl I. Weidele, Arunima Chaudhary, Carolina Maria Spina, Abel Valente, Chuang Gan, Horst Cornelius Samulowitz, Lisa Amini

IPC classification: G06N5/02, G06K9/62, G06N20/00

Abstract: A dataset including features and values associated with the features can be received. Each of the features in the dataset can be mapped to a corresponding node in a knowledge graph based on the concept represented by the corresponding node. The knowledge graph can be traversed to find a candidate node connected to at least one mapped node, the candidate node not being mapped to a feature in the dataset. A concept associated with the candidate node can be identified as a new feature. A machine learning model pipeline can use the features in the dataset and the new feature to select a subset of features for training a machine learning model.
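
The graph traversal described here can be sketched with a plain adjacency map. The one-hop neighborhood, the feature-to-node mapping, and the tiny graph are assumptions for illustration; the abstract does not state how far the traversal goes or how features are mapped to nodes.

```python
def suggest_features(features, concept_of, graph):
    """features: dataset column names; concept_of: feature -> graph node;
    graph: node -> set of neighboring nodes (undirected adjacency)."""
    mapped = {concept_of[f] for f in features if f in concept_of}
    candidates = []
    for node in sorted(mapped):
        for neighbor in sorted(graph.get(node, ())):
            # A candidate is connected to a mapped node but not itself mapped.
            if neighbor not in mapped and neighbor not in candidates:
                candidates.append(neighbor)
    return candidates

# Hypothetical knowledge graph and feature mapping.
graph = {"City": {"Population", "Country", "Area"},
         "Population": {"City"},
         "Country": {"City"},
         "Area": {"City"}}
concept_of = {"city": "City", "population": "Population"}
new_features = suggest_features(["city", "population"], concept_of, graph)
```

In this toy graph the suggested concepts are `Area` and `Country`; a downstream pipeline would then decide which of the original and suggested features to keep for training.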
