-
Publication No.: US20230161774A1
Publication date: 2023-05-25
Application No.: US17534489
Filing date: 2021-11-24
Inventors: Udayan Khurana, Sainyam Galhotra
IPC classification: G06F16/2457, G06N3/08, G06N7/00
CPC classification: G06F16/24573, G06N3/08, G06N7/005
Abstract: An approach to column-to-semantic-concept mapping using joint estimation through piecewise maximum likelihood estimation and utilizing large, openly available structured data may be provided. The approach may include special estimation methods for categorical, numeric, and alphanumeric/symbolic data, while unifying the overarching estimation within a common framework of likelihood estimation. The approach may also include indexes to support quick estimation computations for numeric, categorical, and mixed-type data. Additionally, the approach may utilize semantic context without a polynomial increase in mapping runtime or resource utilization.
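The abstract describes mapping a column to a semantic concept by likelihood estimation. A minimal sketch of that idea, assuming each candidate concept comes with a list of reference values (for example, harvested from open structured data), and using a Laplace-smoothed categorical model for text columns and a Gaussian for numeric ones. The function names and the choice of models are illustrative assumptions, not the claimed method; the sketch scores each column independently and omits the joint estimation, semantic context, and indexes the abstract mentions.

```python
import math
from collections import Counter


def categorical_loglik(values, reference, alpha=1.0):
    """Log-likelihood of the column values under a Laplace-smoothed categorical
    distribution estimated from one concept's reference values (hypothetical model)."""
    counts = Counter(reference)
    vocab = set(reference) | set(values)
    total = len(reference) + alpha * len(vocab)
    return sum(math.log((counts[v] + alpha) / total) for v in values)


def gaussian_loglik(values, reference):
    """Log-likelihood of numeric values under a Gaussian fitted to the reference values."""
    n = len(reference)
    mu = sum(reference) / n
    var = sum((x - mu) ** 2 for x in reference) / n or 1e-9  # guard against zero variance
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var) for x in values)


def map_column(values, concepts):
    """Return the concept whose reference data gives the column the highest log-likelihood.

    `concepts` maps a concept name to reference values of the same type as the column.
    """
    numeric = all(isinstance(v, (int, float)) for v in values)
    scorer = gaussian_loglik if numeric else categorical_loglik
    scores = {name: scorer(values, ref) for name, ref in concepts.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    text_concepts = {"country": ["India", "France", "Japan", "India"], "color": ["red", "blue", "green"]}
    print(map_column(["India", "Japan"], text_concepts))
```

Summing per-value log-likelihoods treats the column values as independent draws from the concept's distribution, which is the simplest form of a maximum likelihood assignment.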
-
Publication No.: US20160110410A1
Publication date: 2016-04-21
Application No.: US14744204
Filing date: 2015-06-19
Inventors: Udayan Khurana, Srinivasan Parthasarathy, Venkata N. Pavuluri, Deepak S. Turaga, Long H. Vu
IPC classification: G06F17/30
CPC classification: G06F16/242, G06F16/245, G06F16/24553, G06F16/248, G06F16/3323
Abstract: Embodiments relate to analyzing datasets. A method of analyzing data is provided. The method obtains a description of a dataset. The method automatically generates a plurality of analysis options from the description of the dataset. The method generates a plurality of queries based on the analysis options. The method deploys the queries on the dataset to build a plurality of statistical models from the dataset.
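The steps in this abstract (description, analysis options, queries, deployment) can be sketched as follows. This assumes the dataset description is a hypothetical `{column: type}` dictionary, that "analysis options" are per-column summaries and categorical-by-numeric group means, and that the dataset lives in an SQLite table; the "statistical models" here are only aggregate results, a deliberate simplification of what the patent covers.

```python
import sqlite3


def generate_options(description):
    """Derive analysis options from a {column: type} description (hypothetical scheme)."""
    numeric = [c for c, t in description.items() if t == "numeric"]
    categorical = [c for c, t in description.items() if t == "categorical"]
    options = [("summary", col, None) for col in numeric]
    options += [("group_mean", num, cat) for cat in categorical for num in numeric]
    return options


def to_query(table, option):
    """Turn one analysis option into an SQL query (table/column names assumed trusted)."""
    kind, num, cat = option
    if kind == "summary":
        return f"SELECT COUNT({num}), AVG({num}), MIN({num}), MAX({num}) FROM {table}"
    return f"SELECT {cat}, AVG({num}) FROM {table} GROUP BY {cat} ORDER BY {cat}"


def analyze(conn, table, description):
    """Deploy every generated query on the table and collect results keyed by option."""
    return {
        opt: conn.execute(to_query(table, opt)).fetchall()
        for opt in generate_options(description)
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 10.0), ("west", 20.0), ("east", 30.0)])
    for option, rows in analyze(conn, "sales", {"region": "categorical", "amount": "numeric"}).items():
        print(option, rows)
```

Because identifiers are interpolated directly into the SQL text, this sketch is only safe when the description comes from a trusted schema.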
-
Publication No.: US20240144084A1
Publication date: 2024-05-02
Application No.: US18051900
Filing date: 2022-11-02
Inventors: Horst Cornelius Samulowitz, Udayan Khurana, Kavitha Srinivas, Takaaki Tateishi, Ibrahim Abdelaziz, Julian Timothy Dolby
IPC classification: G06N20/00
CPC classification: G06N20/00
Abstract: A method of data augmentation includes receiving, by a processor, a set of data including a plurality of variables; mapping each variable to one or more target concepts associated with the name of that variable; and acquiring a set of semantic transforms, each semantic transform including a function applied to one or more concepts mapped to a respective variable. The method also includes comparing the one or more target concepts to the one or more concepts of each semantic transform, selecting at least one semantic transform based on the comparison, generating an expression for each selected semantic transform, each expression configured to apply the function of a selected semantic transform to at least one of the plurality of variables, and augmenting the set of data for use in an application by adding each expression to the set of data.
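The concept-matching step in this abstract can be illustrated with a small sketch. It assumes each variable maps to exactly one concept (the abstract allows several), that a semantic transform is a hypothetical record of required input concepts plus a function, and that the dataset is a dictionary of column lists. A transform is selected only when every concept it needs is covered, and its generated expression becomes the name of the new column.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SemanticTransform:
    """Hypothetical transform: a function over values of the listed concepts, in order."""
    name: str
    concepts: tuple
    func: Callable


def augment(data, concept_of, transforms):
    """Add one new column per transform whose input concepts are all mapped to variables.

    data:       {variable: [values]}
    concept_of: {variable: concept} (one concept per variable, a simplification)
    """
    by_concept = {c: var for var, c in concept_of.items()}
    out = dict(data)  # keep the original variables unchanged
    for t in transforms:
        if all(c in by_concept for c in t.concepts):  # compare target concepts to transform concepts
            variables = [by_concept[c] for c in t.concepts]
            expr = f"{t.name}({', '.join(variables)})"  # generated expression, used as the column name
            columns = [data[v] for v in variables]
            out[expr] = [t.func(*row) for row in zip(*columns)]
    return out


if __name__ == "__main__":
    data = {"wt": [70.0, 80.0], "ht": [1.75, 2.0]}
    concept_of = {"wt": "weight_kg", "ht": "height_m"}
    bmi = SemanticTransform("bmi", ("weight_kg", "height_m"), lambda w, h: w / h ** 2)
    print(augment(data, concept_of, [bmi]))
```

Using the expression string as the column name keeps the provenance of each added feature visible to whatever application consumes the augmented data.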
-
Publication No.: US20230153634A1
Publication date: 2023-05-18
Application No.: US17525932
Filing date: 2021-11-14
Inventors: Dakuo Wang, Udayan Khurana, Chuang Gan, Gregory Bramble, Abel Valente, Arunima Chaudhary, Carolina Maria Spina, Micah Smith
Abstract: A domain of an input dataset is identified, and one or more archived domain knowledge features corresponding to the identified domain are identified. One or more user feature definitions, for one or more features defined by a user, are input. The identified archived domain knowledge features and the user features are processed to generate a set of candidate features for presentation to the user. A selection of a subset of the candidate features is obtained from the user, and one or more predictive models are generated based on the selected features.
-
Publication No.: US11048718B2
Publication date: 2021-06-29
Application No.: US15673812
Filing date: 2017-08-10
Inventors: Elias Khalil, Udayan Khurana, Fatemeh Nargesian, Horst Cornelius Samulowitz, Deepak S. Turaga
Abstract: Embodiments for feature engineering by one or more processors are described. A plurality of transformations are applied to a set of features in each of a plurality of datasets, and each transformation outputs a score. For each set of features, the transformations whose score is above a predetermined threshold are selected, and a signal representative of that selection is generated.
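The threshold-based selection in this abstract can be sketched as below. The abstract does not say how a transformation's score is computed, so this sketch assumes, purely for illustration, that the score is the absolute Pearson correlation between the transformed feature and a target; the `TRANSFORMS` table is hypothetical, and only a single dataset is handled.

```python
import math

# Hypothetical candidate transformations applied to each feature value.
TRANSFORMS = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
    "log1p": lambda x: math.log1p(abs(x)),
}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_transforms(features, target, threshold=0.9):
    """For each feature, keep the transformations whose score exceeds the threshold.

    Score (an assumption): |correlation| between the transformed feature and the target.
    """
    selected = {}
    for name, values in features.items():
        scores = {t: abs(pearson([f(v) for v in values], target)) for t, f in TRANSFORMS.items()}
        selected[name] = [t for t, s in scores.items() if s > threshold]
    return selected


if __name__ == "__main__":
    print(select_transforms({"x": [1, 2, 3, 4]}, [1, 4, 9, 16], threshold=0.99))
```

With a target equal to the squared feature, only `square` clears a 0.99 threshold here; the untransformed feature scores about 0.98.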
-
Publication No.: US10353890B2
Publication date: 2019-07-16
Application No.: US14744204
Filing date: 2015-06-19
Inventors: Udayan Khurana, Srinivasan Parthasarathy, Venkata N. Pavuluri, Deepak S. Turaga, Long H. Vu
IPC classification: G06F16/242, G06F16/245, G06F16/248, G06F16/332, G06F16/2455
Abstract: Embodiments relate to analyzing datasets. A method of analyzing data is provided. The method obtains a description of a dataset. The method automatically generates a plurality of analysis options from the description of the dataset. The method generates a plurality of queries based on the analysis options. The method deploys the queries on the dataset to build a plurality of statistical models from the dataset.
-
Publication No.: US11599826B2
Publication date: 2023-03-07
Application No.: US16741084
Filing date: 2020-01-13
Inventors: Udayan Khurana, Sainyam Galhotra, Oktie Hassanzadeh, Kavitha Srinivas, Horst Cornelius Samulowitz
Abstract: Embodiments relate to a system, program product, and method for employing feature engineering to improve classifier performance. A first machine learning (ML) model with a first learning program is selected. The selected ML model is operatively associated with a first structured dataset. First features in the first dataset that bear on the performance of the selected ML model are identified. A second structured dataset is assessed with respect to the identified features in the first dataset, and new features in the second dataset are identified, where the new features are semantically related to the identified features in the first dataset. The first dataset is dynamically augmented with the identified new features from the second dataset. The dynamically augmented first dataset is applied to the selected ML model to train its embedded learning algorithm on the augmented first dataset.
-
Publication No.: US20220366269A1
Publication date: 2022-11-17
Application No.: US17317242
Filing date: 2021-05-11
Inventors: Dakuo Wang, Udayan Khurana, Daniel Karl I. Weidele, Arunima Chaudhary, Carolina Maria Spina, Abel Valente, Chuang Gan, Horst Cornelius Samulowitz, Lisa Amini
Abstract: A dataset including features and values associated with the features can be received. Each feature in the dataset can be mapped to a corresponding node in a knowledge graph based on the concept represented by that node. The knowledge graph can be traversed to find a candidate node connected to at least one mapped node, the candidate node not being mapped to any feature in the dataset. A concept associated with the candidate node can be identified as a new feature. A machine learning model pipeline can use the features in the dataset and the new feature to select a subset of features for training a machine learning model.
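The traversal step in this abstract can be sketched as a breadth-first search outward from the mapped nodes. This assumes the knowledge graph is a hypothetical directed adjacency dictionary, that the feature-to-node mapping is already done, and that `max_hops` (an illustrative parameter) bounds how far from a mapped node a candidate may lie; the downstream pipeline that selects a feature subset is not shown.

```python
from collections import deque


def candidate_features(graph, feature_to_node, max_hops=1):
    """Return nodes reachable within max_hops of a mapped node that no feature maps to.

    graph:           {node: [neighbor, ...]} (directed adjacency, an assumption)
    feature_to_node: {feature_name: node}
    """
    mapped = set(feature_to_node.values())
    seen = set(mapped)  # mapped nodes are never candidates
    frontier = deque((node, 0) for node in mapped)
    candidates = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                candidates.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return sorted(candidates)


if __name__ == "__main__":
    graph = {"City": ["Population", "Country"], "Country": ["GDP"], "Population": []}
    mapping = {"city_name": "City", "nation": "Country"}
    print(candidate_features(graph, mapping))
```

Each returned node's concept would then be offered as a new feature, for example `GDP` for a dataset that already has a country column.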
-
Publication No.: US10346393B2
Publication date: 2019-07-09
Application No.: US14518506
Filing date: 2014-10-20
Inventors: Udayan Khurana, Srinivasan Parthasarathy, Venkata N. Pavuluri, Deepak S. Turaga, Long H. Vu
IPC classification: G06F16/242, G06F16/245, G06F16/248, G06F16/332, G06F16/2455
Abstract: Embodiments relate to analyzing datasets. A method of analyzing data is provided. The method obtains a description of a dataset. The method automatically generates a plurality of analysis options from the description of the dataset. The method generates a plurality of queries based on the analysis options. The method deploys the queries on the dataset to build a plurality of statistical models from the dataset.
-
Publication No.: US20230177032A1
Publication date: 2023-06-08
Application No.: US17545880
Filing date: 2021-12-08
Inventors: Daniel Karl I. Weidele, Lisa Amini, Udayan Khurana, Kavitha Srinivas, Horst Cornelius Samulowitz, Takaaki Tateishi, Carolina Maria Spina, Dakuo Wang, Abel Valente, Arunima Chaudhary, Toshihiro Takahashi
IPC classification: G06F16/22, G06F16/2457, G06F16/28
CPC classification: G06F16/221, G06F16/2457, G06F16/288, G06F16/2282
Abstract: A computer-implemented method according to one embodiment includes identifying a data set and meta information, and augmenting the data set with additional features in response to an automatic analysis of the data set in view of the meta information.
-