SEMANTIC ANNOTATION FOR TABULAR DATA
    Invention publication

    Publication No.: US20230161774A1

    Publication date: 2023-05-25

    Application No.: US17534489

    Filing date: 2021-11-24

    IPC classification: G06F16/2457 G06N3/08 G06N7/00

    Abstract: An approach to column-to-semantic-concept mapping using joint estimation through piecewise maximum likelihood estimation and utilizing large openly available structured data may be provided. The approach may include special estimation methods for categorical, numeric, and alphanumeric/symbolic data, while unifying the overarching estimation with a common framework of likelihood estimation. The approach may also include indexes to support quick estimation computations for numeric, categorical, and mixed-type data. Additionally, the approach may include semantic context utilization without a polynomial increase in mapping runtime or resource utilization.
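As a rough illustration of likelihood-based column-to-concept mapping, the sketch below scores a column against a few candidate concepts and picks the one with the highest average log-likelihood. It is not the method claimed in the application: the joint, piecewise estimation and the indexes are not reproduced, and the reference data, the Gaussian model for numeric values, the smoothed uniform model for categorical values, and all names (`NUMERIC_CONCEPTS`, `CATEGORICAL_CONCEPTS`, `map_column`) are assumptions made for the example.

```python
import math

# Hypothetical reference data standing in for "large openly available
# structured data": concept name -> known example values.
NUMERIC_CONCEPTS = {
    "age_years": [23, 35, 41, 52, 67, 30, 45],
    "height_cm": [150, 162, 171, 180, 175, 168, 190],
}
CATEGORICAL_CONCEPTS = {
    "country": {"France", "Japan", "Brazil", "Kenya"},
    "color": {"red", "green", "blue"},
}


def numeric_log_likelihood(column, reference):
    """Average Gaussian log-likelihood of the column under the reference fit."""
    mean = sum(reference) / len(reference)
    var = sum((v - mean) ** 2 for v in reference) / len(reference)
    var = max(var, 1e-6)  # guard against a degenerate (constant) reference
    total = sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        for x in column
    )
    return total / len(column)


def categorical_log_likelihood(column, vocabulary, smoothing=1e-3):
    """Average log-likelihood under a smoothed uniform model over the vocabulary."""
    in_vocab = (1 - smoothing) / len(vocabulary)
    total = sum(math.log(in_vocab if x in vocabulary else smoothing) for x in column)
    return total / len(column)


def map_column(column):
    """Return the candidate concept with the highest average log-likelihood."""
    is_numeric = all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in column
    )
    if is_numeric:
        scores = {c: numeric_log_likelihood(column, ref)
                  for c, ref in NUMERIC_CONCEPTS.items()}
    else:
        scores = {c: categorical_log_likelihood(column, vocab)
                  for c, vocab in CATEGORICAL_CONCEPTS.items()}
    return max(scores, key=scores.get)
```

Using one likelihood score for both data types mirrors the abstract's idea of a common estimation framework, although here each column is scored independently rather than jointly.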

    DATA AUGMENTATION USING SEMANTIC TRANSFORMS
    Invention publication

    Publication No.: US20240144084A1

    Publication date: 2024-05-02

    Application No.: US18051900

    Filing date: 2022-11-02

    IPC classification: G06N20/00

    CPC classification: G06N20/00

    Abstract: A method of data augmentation includes receiving, by a processor, a set of data including a plurality of variables, mapping each variable to one or more target concepts associated with a name of each variable, and acquiring a set of semantic transforms, each semantic transform including a function applied to one or more concepts mapped to a respective variable. The method also includes comparing the one or more target concepts to the one or more concepts of each semantic transform, selecting at least one semantic transform based on the comparing, generating an expression for each selected semantic transform, each expression configured to apply a function of a selected semantic transform to at least one of the plurality of variables, and augmenting the set of data for use in an application by adding each expression to the set of data.
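The steps in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the claimed implementation: the name-to-concept lookup table, the `SemanticTransform` class, the two example transforms, and the `augment` function are all hypothetical, and concept matching is reduced to exact string equality.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical lookup from variable names to target concepts.
NAME_TO_CONCEPT = {
    "weight_kg": "weight",
    "height_m": "height",
    "dob_year": "birth_year",
}


@dataclass
class SemanticTransform:
    name: str                   # name of the derived variable
    concepts: Tuple[str, ...]   # concepts the function expects, in order
    func: Callable[..., float]  # function applied to the mapped variables


# Hypothetical transform catalogue.
TRANSFORMS = [
    SemanticTransform("bmi", ("weight", "height"), lambda w, h: w / h ** 2),
    SemanticTransform("speed", ("distance", "duration"), lambda d, t: d / t),
]


def augment(rows):
    """Select applicable transforms, build expressions, and add derived columns.

    `rows` is a list of dicts sharing the same keys (the variables).
    Returns the augmented rows and the generated expression strings.
    """
    if not rows:
        return [], []
    # Map each variable to a concept via its name.
    concept_to_var = {
        NAME_TO_CONCEPT[v]: v for v in rows[0] if v in NAME_TO_CONCEPT
    }
    # Compare target concepts with each transform's concepts and select matches.
    selected = []
    for t in TRANSFORMS:
        if all(c in concept_to_var for c in t.concepts):
            args = [concept_to_var[c] for c in t.concepts]
            selected.append((t, args, f"{t.name}({', '.join(args)})"))
    # Evaluate each selected expression and add the result to every row.
    augmented = []
    for row in rows:
        new_row = dict(row)
        for t, args, _ in selected:
            new_row[t.name] = t.func(*(row[a] for a in args))
        augmented.append(new_row)
    return augmented, [expr for _, _, expr in selected]
```

For a row with `weight_kg` and `height_m`, only the `bmi` transform is selected, because the dataset has no variables mapped to `distance` or `duration`.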

    Knowledge aided feature engineering

    Publication No.: US11599826B2

    Publication date: 2023-03-07

    Application No.: US16741084

    Filing date: 2020-01-13

    IPC classification: G06N20/00 G06F11/34

    Abstract: Embodiments relate to a system, program product, and method for employing feature engineering to improve classifier performance. A first machine learning (ML) model with a first learning program is selected. The first selected ML model is operatively associated with a first structured dataset. First features in the first dataset directed at performance of the selected ML model are identified. A second structured dataset is assessed with respect to the identified features in the first dataset, and new features in the second dataset are identified, where the new features are semantically related to the identified features in the first dataset. The first dataset is dynamically augmented with the identified new features in the second dataset. The dynamically augmented first dataset is applied to the selected ML model to subject an embedded learning algorithm of the selected ML model to training using the augmented first dataset.
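The augmentation step described here can be sketched as follows. The abstract does not say how semantic relatedness is measured, so this example swaps in a simple stand-in: Jaccard overlap between the underscore-separated tokens of column names. The functions `related_columns` and `augment_dataset`, the `threshold` value, and the shared `key` column are assumptions made for illustration; model selection and training are left out.

```python
def tokens(name):
    """Split a column name such as 'house_price' into a set of lowercase tokens."""
    return set(name.lower().split("_"))


def related_columns(identified, candidates, threshold=0.3):
    """Return candidates whose name tokens overlap an identified feature's tokens.

    Jaccard overlap on name tokens is an assumed stand-in for semantic relatedness.
    """
    related = []
    for cand in candidates:
        for feat in identified:
            a, b = tokens(feat), tokens(cand)
            if len(a & b) / len(a | b) >= threshold:
                related.append(cand)
                break
    return related


def augment_dataset(first, second, key, identified):
    """Add related columns from `second` to `first`, joining rows on `key`.

    Both datasets are non-empty lists of dicts; rows without a match get None.
    """
    candidates = [c for c in second[0] if c != key and c not in first[0]]
    new_cols = related_columns(identified, candidates)
    lookup = {row[key]: row for row in second}
    augmented = []
    for row in first:
        match = lookup.get(row[key], {})
        augmented.append({**row, **{c: match.get(c) for c in new_cols}})
    return augmented, new_cols
```

The augmented rows could then be passed to whatever training routine the selected model uses.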