Knowledge Aided Feature Engineering

    Publication number: US20210216904A1

    Publication date: 2021-07-15

    Application number: US16741084

    Application date: 2020-01-13

    IPC classification: G06N20/00 G06F11/34

    Abstract: Embodiments relate to a system, program product, and method for employing feature engineering to improve classifier performance. A first machine learning (ML) model with a first learning program is selected. The first selected ML model is operatively associated with a first structured dataset. First features in the first dataset directed at performance of the selected ML model are identified. A second structured dataset is assessed with respect to the identified features in the first dataset, and new features in the second dataset are identified, where the new features are semantically related to the identified features in the first dataset. The first dataset is dynamically augmented with the identified new features in the second dataset. The dynamically augmented first dataset is applied to the selected ML model to subject an embedded learning algorithm of the selected ML model to training using the augmented first dataset.
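    The augmentation step described in this abstract can be illustrated with a minimal sketch. It is not the patented method. The abstract does not say how semantic relatedness is determined, so the `SYNONYMS` table, the function names `find_related_column` and `augment`, and the toy rows are all assumptions made for illustration. The sketch matches an identified feature of the first dataset to a related column in the second dataset and copies over the remaining columns as new features.

```python
# Hypothetical synonym table standing in for a real semantic matcher (assumption).
SYNONYMS = {"region": {"region", "area", "zone"}}


def find_related_column(feature, second_columns):
    """Return the second-dataset column related to `feature`, or None."""
    related = SYNONYMS.get(feature, {feature})
    for column in second_columns:
        if column.lower() in related:
            return column
    return None


def augment(first_rows, second_rows, feature):
    """Add the second dataset's other columns to each row of the first dataset."""
    second_columns = list(second_rows[0].keys()) if second_rows else []
    join_column = find_related_column(feature, second_columns)
    if join_column is None:
        # Nothing related was found: return unchanged copies.
        return [dict(row) for row in first_rows]

    lookup = {row[join_column]: row for row in second_rows}
    augmented = []
    for row in first_rows:
        match = lookup.get(row[feature], {})
        new_row = dict(row)
        for column in second_columns:
            # Skip the join column and never overwrite an existing feature.
            if column != join_column and column not in new_row:
                new_row[column] = match.get(column)  # None when no match exists
        augmented.append(new_row)
    return augmented


# Toy data, invented for this example.
first = [{"region": "r1", "label": 1}, {"region": "r2", "label": 0}]
second = [{"Area": "r1", "density": 10}, {"Area": "r2", "density": 20}]

augmented = augment(first, second, "region")
print(augmented)
```

    The augmented rows would then be passed to whatever training routine the selected model exposes. That step is omitted here because the abstract does not name a specific learner.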

    Methods for automatically configuring performance evaluation schemes for machine learning algorithms

    Publication number: US11681931B2

    Publication date: 2023-06-20

    Application number: US16580953

    Application date: 2019-09-24

    IPC classification: G06N5/04 G06N20/00

    CPC classification: G06N5/04 G06N20/00

    Abstract: A system provides a mathematical formulation for a new problem of model validation and model selection in the presence of test data feedback. The system comprises a memory that stores computer-executable components. A processor, operably coupled to the memory, executes the computer-executable components stored in the memory. A selection component selects a metric of performance evaluation accuracy, and a configuration component configures performance evaluation schemes for machine learning algorithms. A characterization component employs a supervised learning-based approach to characterize the relationship between the configuration of the performance evaluation scheme and the fidelity of performance estimates, and an optimization component optimizes accuracy of the machine learning algorithms as a function of the size of the training data set relative to the size of the validation data set through selection of values associated with the configuration parameters.
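    The idea of tuning the training-to-validation size ratio can be sketched as a plain grid search. This is a simplification and not the patented system: the abstract describes a supervised learning-based characterization, which is replaced here by a direct measurement. The synthetic data, the toy threshold learner, the candidate fractions, and every function name below are assumptions. Fidelity is approximated as the mean absolute gap between validation accuracy and accuracy on a large held-out test set.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic 1-D binary data; class 1 tends to have larger x (illustrative)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(1.0 if y else -1.0, 1.5)
        data.append((x, y))
    return data


def fit_threshold(train):
    """Toy learner: threshold at the midpoint of the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        return 0.0
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def accuracy(threshold, data):
    """Fraction of points whose predicted class (x > threshold) matches y."""
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)


def estimate_gap(val_fraction, n_labeled=200, n_test=2000, repeats=30, seed=0):
    """Mean |validation accuracy - test accuracy| for one split configuration."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(repeats):
        labeled = make_data(n_labeled, rng)
        test = make_data(n_test, rng)
        n_val = max(1, int(n_labeled * val_fraction))
        val, train = labeled[:n_val], labeled[n_val:]
        threshold = fit_threshold(train)
        gaps.append(abs(accuracy(threshold, val) - accuracy(threshold, test)))
    return statistics.mean(gaps)


# Candidate validation fractions (assumed values).
candidates = [0.1, 0.2, 0.3, 0.5]
gap_by_fraction = {f: estimate_gap(f) for f in candidates}
best_fraction = min(gap_by_fraction, key=gap_by_fraction.get)
print(gap_by_fraction, best_fraction)
```

    A smaller gap means the validation estimate tracks test performance more closely for that split. A fuller treatment would also weigh the accuracy lost by training on fewer examples, which this sketch does not do.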

    Knowledge aided feature engineering

    Publication number: US11599826B2

    Publication date: 2023-03-07

    Application number: US16741084

    Application date: 2020-01-13

    IPC classification: G06N20/00 G06F11/34

    Abstract: Embodiments relate to a system, program product, and method for employing feature engineering to improve classifier performance. A first machine learning (ML) model with a first learning program is selected. The first selected ML model is operatively associated with a first structured dataset. First features in the first dataset directed at performance of the selected ML model are identified. A second structured dataset is assessed with respect to the identified features in the first dataset, and new features in the second dataset are identified, where the new features are semantically related to the identified features in the first dataset. The first dataset is dynamically augmented with the identified new features in the second dataset. The dynamically augmented first dataset is applied to the selected ML model to subject an embedded learning algorithm of the selected ML model to training using the augmented first dataset.