-
Publication No.: US20210216904A1
Publication date: 2021-07-15
Application No.: US16741084
Filing date: 2020-01-13
Inventors: Udayan Khurana, Sainyam Galhotra, Oktie Hassanzadeh, Kavitha Srinivas, Horst Cornelius Samulowitz
Abstract: Embodiments relate to a system, program product, and method for employing feature engineering to improve classifier performance. A first machine learning (ML) model with a first learning program is selected. The first selected ML model is operatively associated with a first structured dataset. First features in the first dataset directed at performance of the selected ML model are identified. A second structured dataset is assessed with respect to the identified features in the first dataset, and new features in the second dataset are identified, where the new features are semantically related to the identified features in the first dataset. The first dataset is dynamically augmented with the identified new features in the second dataset. The dynamically augmented first dataset is applied to the selected ML model to subject an embedded learning algorithm of the selected ML model to training using the augmented first dataset.
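The augmentation step can be sketched in Python. This is a minimal illustration under stated assumptions and is not the claimed method. Tables are lists of dicts that share a join key. "Semantic relatedness" is approximated by token overlap (Jaccard similarity) between column names, a hypothetical stand-in for whatever semantic matching the embodiments actually use. The names `augment` and `related_columns`, and the 0.3 threshold, are illustrative.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def tokens(name: str) -> Set[str]:
    # Split a column name such as "median_income" into {"median", "income"}.
    return set(name.lower().replace("-", "_").split("_"))


def jaccard(a: str, b: str) -> float:
    # Token-overlap similarity: a crude, hypothetical proxy for semantic relatedness.
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def related_columns(important: List[str], candidates: List[str], threshold: float = 0.3) -> List[str]:
    # Keep candidate columns related to at least one feature found to matter for the model.
    return [c for c in candidates if any(jaccard(c, f) >= threshold for f in important)]


def augment(first: List[Row], second: List[Row], key: str,
            important: List[str], threshold: float = 0.3) -> List[Row]:
    existing = set(first[0]) if first else set()
    candidates = [c for c in (second[0] if second else {}) if c != key and c not in existing]
    new_cols = related_columns(important, candidates, threshold)
    lookup = {row[key]: row for row in second}
    # Left-join the selected columns onto the first dataset; unmatched rows get None.
    return [{**row, **{c: lookup.get(row[key], {}).get(c) for c in new_cols}} for row in first]
```

The output of `augment` would then be used to retrain the selected model. Any real semantic matcher (embeddings, ontology lookups) could replace `jaccard` without changing the join logic.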
-
Publication No.: US12124822B2
Publication date: 2024-10-22
Application No.: US17895881
Filing date: 2022-08-25
CPC classification: G06F8/35, G06F8/75, G06F11/3604, G06F21/562, G06F21/563, H04L63/1433
Abstract: Techniques for computer software code analysis are disclosed. One or more data flows are generated, based on analyzing software code using static analysis. A data object is identified in the software code using the one or more data flows, the data object relating to a structured dataset. A correspondence between a code expression in the software code and a characteristic of the structured dataset is identified, based on analyzing one or more reads from and one or more writes to the data object using the one or more data flows. The code expression for the structured dataset is analyzed, based on the correspondence, including at least one of: (i) generating a software code recommendation engine based on the code expression and the structured dataset, or (ii) generating one or more lambda expressions for application to the structured dataset, based on the code expression.
-
Publication No.: US11941541B2
Publication date: 2024-03-26
Application No.: US16988809
Filing date: 2020-08-10
IPC classification: G06N20/00, G06F17/16, G06F18/21, G06F18/2113, G06N5/04
CPC classification: G06N5/04, G06F17/16, G06F18/2113, G06F18/2193, G06N20/00
Abstract: Methods, computer program products and/or systems are provided that perform the following operations: obtaining a performance matrix representing accuracies obtained by executing a plurality of pipelines on a plurality of training data sets, wherein a pipeline comprises a series of operations performed on a data set; selecting a defined number of top pipelines as potential pipelines for a testing data set based, at least in part, on a similarity between the testing data set and each of the plurality of training data sets represented in the performance matrix; storing results from executing each of the potential pipelines as a new data set; determining a pipeline accuracy for each of the potential pipelines when executed against the testing data set; and providing a recommended pipeline for use with the testing data set based, at least in part, on the pipeline accuracy for each potential pipeline.
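The recommendation flow can be sketched as follows. This is a minimal Python sketch under assumptions not taken from the abstract. Each dataset is described by a hypothetical numeric meta-feature vector, and similarity is cosine similarity. Candidate pipelines are ranked by their similarity-weighted accuracy in the performance matrix. `evaluate` is a caller-supplied, hypothetical function that runs a pipeline on the testing data set and returns its accuracy.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_pipelines(perf: Dict[str, List[float]], train_meta: List[Sequence[float]],
                  test_meta: Sequence[float], k: int) -> List[str]:
    # perf[p][i] = accuracy of pipeline p on training data set i (one matrix row per pipeline).
    sims = [cosine(m, test_meta) for m in train_meta]
    total = sum(sims) or 1.0
    # Similarity-weighted average accuracy: data sets resembling the test set count more.
    score = {p: sum(s * a for s, a in zip(sims, accs)) / total for p, accs in perf.items()}
    return sorted(score, key=score.get, reverse=True)[:k]


def recommend(perf: Dict[str, List[float]], train_meta: List[Sequence[float]],
              test_meta: Sequence[float], k: int,
              evaluate: Callable[[str], float]) -> Tuple[str, Dict[str, float]]:
    candidates = top_pipelines(perf, train_meta, test_meta, k)
    # Results of executing each candidate are kept as a new record (e.g. a new matrix column).
    results = {p: evaluate(p) for p in candidates}
    return max(results, key=results.get), results
```

The `results` dict corresponds loosely to "storing results ... as a new data set". Appending it to the performance matrix would let later recommendations benefit from this run.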
-
Publication No.: US11681931B2
Publication date: 2023-06-20
Application No.: US16580953
Filing date: 2019-09-24
Abstract: A system that provides a mathematical formulation for a new problem: model validation and model selection in the presence of test data feedback. The system comprises a memory that stores computer-executable components and a processor, operably coupled to the memory, that executes the computer-executable components stored in the memory. A selection component selects a metric of performance evaluation accuracy, and a configuration component configures performance evaluation schemes for machine learning algorithms. A characterization component employs a supervised learning-based approach to characterize the relationship between the configuration of the performance evaluation scheme and the fidelity of performance estimates. An optimization component optimizes the accuracy of the machine learning algorithms as a function of the size of the training data set relative to the size of the validation data set by selecting values for the configuration parameters.
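The abstract does not spell out the procedure, so the following is only a toy illustration of the trade-off it names, written in Python. It is not the patented supervised-learning characterization. For each candidate validation fraction (a hypothetical configuration parameter), a trivial mean predictor is fit on the remaining data. The "fidelity gap" between its validation error and its error on test feedback is then measured, and the fraction with the smallest gap is kept. `choose_split` and the mean predictor are illustrative assumptions.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def mse(prediction: float, targets: Sequence[float]) -> float:
    return mean((prediction - y) ** 2 for y in targets)


def choose_split(pool: Sequence[float], test_feedback: Sequence[float],
                 fractions: Sequence[float]) -> Optional[Tuple[float, float]]:
    best: Optional[Tuple[float, float]] = None
    for f in fractions:
        n_val = max(1, int(len(pool) * f))
        fit, val = pool[:-n_val], pool[-n_val:]
        if not fit:
            continue  # no data left to fit on
        model = mean(fit)  # trivial mean predictor standing in for a real learner
        # Fidelity gap: distance between the validation estimate and the error on test feedback.
        gap = abs(mse(model, val) - mse(model, test_feedback))
        if best is None or gap < best[1]:
            best = (f, gap)
    return best
```

A learned characterization, as the abstract describes, would predict this gap from configuration features rather than measuring it exhaustively for every candidate.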
-
Publication No.: US20220036246A1
Publication date: 2022-02-03
Application No.: US16942247
Filing date: 2020-07-29
Inventors: Bei Chen, Long VU, Syed Yousaf Shah, Xuan-Hong Dang, Peter Daniel Kirchner, Si Er Han, Ji Hui Yang, Jun Wang, Jing James Xu, Dakuo Wang, Dhavalkumar C. Patel, Gregory Bramble, Horst Cornelius Samulowitz, Saket Sathe, Chuang Gan
IPC classification: G06N20/20
Abstract: Techniques regarding one or more automated machine learning processes that analyze time series data are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a time series analysis component that selects a machine learning pipeline for meta transfer learning on time series data by sequentially allocating subsets of training data from the time series data amongst a plurality of machine learning pipeline candidates.
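One plausible reading of "sequentially allocating subsets of training data" is a successive-halving style loop. Here that technique is swapped in as an assumption, not confirmed as the patent's scheme. Every surviving candidate is scored on the most recent `n` observations, the better half survives, and survivors get twice as much data in the next round. Each candidate is a hypothetical scoring function (for example, negative forecast error on a holdout inside the subset).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Scorer = Callable[[Sequence[float]], float]


def select_pipeline(series: Sequence[float], candidates: Dict[str, Scorer],
                    initial: int) -> Tuple[str, List[int]]:
    alive = list(candidates)
    n = max(1, initial)
    allocations: List[int] = []
    while len(alive) > 1:
        subset = series[-n:]  # the n most recent observations
        allocations.append(len(subset))
        scores = {name: candidates[name](subset) for name in alive}
        # Keep the better half; survivors receive twice as much data next round.
        alive = sorted(alive, key=scores.get, reverse=True)[: max(1, len(alive) // 2)]
        n = min(2 * n, len(series))
    return alive[0], allocations
```

Taking the most recent points (rather than a random sample) respects temporal order. This is usually what matters for forecasting pipelines.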
-
Publication No.: US20200184380A1
Publication date: 2020-06-11
Application No.: US16216138
Filing date: 2018-12-11
Inventors: Gegi Thomas, Adelmo Cristiano Innocenza Malossi, Tejaswini Pedapati, Ganesh Venkataraman, Roxana Istrate, Martin Wistuba, Florian Michael Scheidegger, Chao Xue, Rong Yan, Horst Cornelius Samulowitz, Benjamin Herta, Debashish Saha, Hendrik Strobelt
Abstract: A machine-learning model generation method, system, and computer program product including deciding, via a first algorithm, a machine-learning algorithm that is best for customer data, invoking the machine-learning algorithm to train a neural network model with the customer data, analyzing the neural network model produced by the training for an accuracy, and improving the accuracy by iteratively repeating the training of the neural network model until a customer-defined constraint is met, as determined by the first algorithm.
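The control loop in this abstract can be sketched in Python. This is a minimal sketch, assuming the "first algorithm" simply picks the candidate with the best quick probe score and that the customer constraint is a target accuracy with a round budget. `pick_algorithm`, `train_until`, and `train_round` are hypothetical names; `train_round(r)` is assumed to train (or continue training) and return the measured accuracy.

```python
from typing import Callable, Dict, List, Tuple


def pick_algorithm(probe_scores: Dict[str, float]) -> str:
    # Stand-in for the "first algorithm": choose the candidate with the best probe score.
    return max(probe_scores, key=probe_scores.get)


def train_until(train_round: Callable[[int], float], target_accuracy: float,
                max_rounds: int) -> Tuple[int, List[float]]:
    history: List[float] = []
    for r in range(1, max_rounds + 1):
        acc = train_round(r)  # train (or continue training) and measure accuracy
        history.append(acc)
        if acc >= target_accuracy:  # customer-defined constraint satisfied
            break
    return len(history), history
```

The round budget guards against looping forever when the constraint is unreachable. Real constraints might also include latency or model size.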
-
Publication No.: US11954424B2
Publication date: 2024-04-09
Application No.: US17661619
Filing date: 2022-05-02
IPC classification: G06F17/00, G06F16/245, G06F40/117, G06F40/169, G06F40/177, G06F40/20
CPC classification: G06F40/169, G06F16/245, G06F40/117, G06F40/177, G06F40/20
Abstract: A processor may receive structured data. The structured data may include one or more columns and associated column names. The processor may analyze the structured data. Analyzing the structured data may include gathering a requisite set of keywords from the associated column names across all columns and/or a sample of column cells. The processor may access a corpus of documents. Each of the documents in the corpus may be associated with a respective keyword. The processor may search the corpus of documents based on the requisite set of keywords. The processor may summarize one or more documents associated with the requisite set of keywords.
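The keyword-gathering, search, and summarize steps can be sketched in Python. This is a minimal sketch with simplifying assumptions. Keywords come from splitting snake_case and camelCase column names (plus optional sample cells). The corpus is a dict mapping each document's keyword to its text. "Summarization" is the naive first-sentence extract, a stand-in for a real summarizer. All function names are hypothetical.

```python
import re
from typing import Dict, Sequence, Set


def gather_keywords(columns: Sequence[str], sample_cells: Sequence[object] = ()) -> Set[str]:
    words: Set[str] = set()
    for text in list(columns) + [str(c) for c in sample_cells]:
        # Split snake_case, camelCase, and spaced names into lowercase words.
        for w in re.findall(r"[A-Za-z][a-z]*", text):
            if len(w) > 2:  # drop very short fragments such as "id"
                words.add(w.lower())
    return words


def first_sentences(text: str, n: int = 1) -> str:
    # Naive extractive summary: the first n sentences.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:n])


def summarize_table(columns: Sequence[str], corpus: Dict[str, str],
                    sample_cells: Sequence[object] = ()) -> Dict[str, str]:
    keywords = gather_keywords(columns, sample_cells)
    # Search: keep documents whose associated keyword was found in the table.
    return {kw: first_sentences(doc) for kw, doc in corpus.items() if kw in keywords}
```

Exact keyword matching is the simplest search. Stemming or an inverted index would be natural next steps for a larger corpus.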
-
Publication No.: US11663251B2
Publication date: 2023-05-30
Application No.: US17447126
Filing date: 2021-09-08
IPC classification: G06F16/335, G06V30/416, G06F40/205, G06F16/332
CPC classification: G06F16/3329, G06F16/335, G06V30/416, G06F40/205
Abstract: A method, system, and computer program product are disclosed. The method includes extracting at least one identifier from a formula in a document and extracting text passages in the document that contain the identifier(s). The method also includes selecting an identifier and extracted text passages containing the identifier, as well as generating identifier-passage pairs for the selected text passages and the identifier. Further, the method includes submitting the identifier-passage pairs to a question answering (QA) model, which generates candidate answers from the selected text passages. A definition of the identifier is then selected from the candidate answers.
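The pipeline can be sketched in Python with a rule-based stand-in for the QA model. `naive_qa` is hypothetical and only matches phrases like "m is the rest mass". A real system would pass each identifier-passage pair to a trained QA model instead. Identifiers are assumed to be single letters with optional subscripts, and passages are assumed to be sentences.

```python
import re
from typing import Callable, List, Optional, Tuple

Candidate = Tuple[str, float]  # (answer text, confidence)


def extract_identifiers(formula: str) -> List[str]:
    # Single-letter symbols, optionally subscripted like "m_0"; words such as "sin" are skipped.
    found: List[str] = []
    for tok in re.findall(r"\b[A-Za-z](?:_\w+)?\b", formula):
        if tok not in found:
            found.append(tok)
    return found


def _mentions(identifier: str, text: str) -> bool:
    return re.search(rf"(?<!\w){re.escape(identifier)}(?!\w)", text) is not None


def naive_qa(identifier: str, passage: str) -> List[Candidate]:
    # Hypothetical QA stand-in: matches "<id> is/denotes/represents (the) <phrase>".
    pattern = rf"(?<!\w){re.escape(identifier)}(?!\w)\s+(?:is|denotes|represents)\s+(?:the\s+)?([^,.;]+)"
    return [(m.group(1).strip(), 1.0) for m in re.finditer(pattern, passage)]


def define(identifier: str, document: str,
           qa: Callable[[str, str], List[Candidate]] = naive_qa) -> Optional[str]:
    sentences = re.split(r"(?<=[.!?])\s+", document)
    pairs = [(identifier, s) for s in sentences if _mentions(identifier, s)]  # identifier-passage pairs
    candidates = [c for ident, passage in pairs for c in qa(ident, passage)]
    # Select the highest-confidence candidate answer as the definition.
    return max(candidates, key=lambda c: c[1])[0] if candidates else None
```

Because `qa` is a parameter, a real QA model can be plugged in as any function that returns `(answer, confidence)` pairs.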
-
Publication No.: US11599826B2
Publication date: 2023-03-07
Application No.: US16741084
Filing date: 2020-01-13
Inventors: Udayan Khurana, Sainyam Galhotra, Oktie Hassanzadeh, Kavitha Srinivas, Horst Cornelius Samulowitz
Abstract: Embodiments relate to a system, program product, and method for employing feature engineering to improve classifier performance. A first machine learning (ML) model with a first learning program is selected. The first selected ML model is operatively associated with a first structured dataset. First features in the first dataset directed at performance of the selected ML model are identified. A second structured dataset is assessed with respect to the identified features in the first dataset, and new features in the second dataset are identified, where the new features are semantically related to the identified features in the first dataset. The first dataset is dynamically augmented with the identified new features in the second dataset. The dynamically augmented first dataset is applied to the selected ML model to subject an embedded learning algorithm of the selected ML model to training using the augmented first dataset.
-
Publication No.: US20220366269A1
Publication date: 2022-11-17
Application No.: US17317242
Filing date: 2021-05-11
Inventors: Dakuo Wang, Udayan Khurana, Daniel Karl I. Weidele, Arunima Chaudhary, Carolina Maria Spina, Abel Valente, Chuang Gan, Horst Cornelius Samulowitz, Lisa Amini
Abstract: A dataset including features and values associated with the features can be received. Each of the features in the dataset can be mapped to a corresponding node in a knowledge graph based on the concept represented by the corresponding node. The knowledge graph can be traversed to find a candidate node connected to at least one mapped node, the candidate node not being mapped to a feature in the dataset. A concept associated with the candidate node can be identified as a new feature. A machine learning model pipeline can use the features in the dataset and the new feature to select a subset of features for training a machine learning model.
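The mapping and traversal steps can be sketched in Python. This is a minimal sketch, assuming features map to nodes by case-insensitive exact label match and that traversal goes only one hop from each mapped node. The knowledge graph is a plain adjacency dict of hypothetical node ids. The downstream feature-subset selection by the ML pipeline is not shown.

```python
from typing import Dict, List, Set


def map_features(features: List[str], labels: Dict[str, str]) -> Dict[str, str]:
    # Map each feature to the node whose concept label matches it (case-insensitive).
    by_label = {label.lower(): node for node, label in labels.items()}
    return {f: by_label[f.lower()] for f in features if f.lower() in by_label}


def candidate_features(features: List[str], labels: Dict[str, str],
                       edges: Dict[str, Set[str]]) -> List[str]:
    mapped_nodes = set(map_features(features, labels).values())
    new: List[str] = []
    for node in sorted(mapped_nodes):
        for neighbor in sorted(edges.get(node, set())):
            concept = labels.get(neighbor, neighbor)
            # A neighbor of a mapped node that is itself unmapped yields a candidate feature.
            if neighbor not in mapped_nodes and concept not in new:
                new.append(concept)
    return new
```

Sorting node ids only makes the output order deterministic. A production system would more likely rank candidates by relation type or relevance before handing them to the pipeline.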