Convolutional neural network (CNN)-based anomaly detection

    Publication No.: US11093816B2

    Publication date: 2021-08-17

    Application No.: US15726267

    Application date: 2017-10-05

    Abstract: The technology disclosed determines which field values in a set of unique field values for a particular field in a fielded dataset are anomalous using six similarity measures. A factor vector is generated per similarity measure and combined to form an input matrix. A convolutional neural network processes the input matrix to generate evaluation vectors. A fully-connected network evaluates the evaluation vectors to generate an anomaly scalar for a particular unique field value. Thresholding is applied to the anomaly scalar to determine whether the particular unique field value is anomalous.
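
As a rough illustration of the data flow in this abstract, the sketch below builds one factor vector per similarity measure, stacks them into an input matrix, and thresholds a score. It rests on several assumptions: the abstract does not name its six measures, so three stand-in measures are used; the reading of "factor vector" as one row of pairwise similarities is only one plausible interpretation; and the CNN plus fully-connected network is replaced by a simple placeholder function, not a trained model. All function names and the threshold are hypothetical.

```python
from difflib import SequenceMatcher


def bigrams(s):
    """Return the set of character bigrams in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def edit_ratio(a, b):
    """Similarity in [0, 1] based on difflib's matching-blocks ratio."""
    return SequenceMatcher(None, a, b).ratio()


def bigram_jaccard(a, b):
    """Jaccard overlap of character bigrams."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y) if x | y else 1.0


def length_ratio(a, b):
    """Shorter length divided by longer length."""
    longest = max(len(a), len(b))
    return min(len(a), len(b)) / longest if longest else 1.0


# Stand-in measures (assumption): the abstract calls for six and does not name them.
MEASURES = [edit_ratio, bigram_jaccard, length_ratio]


def input_matrix(value, unique_values):
    """One row per similarity measure; each row is that measure's factor vector,
    comparing `value` with every other unique value of the field."""
    others = [v for v in unique_values if v != value]
    return [[measure(value, other) for other in others] for measure in MEASURES]


def placeholder_anomaly_scalar(matrix):
    """Placeholder for the CNN + fully-connected network (assumption):
    1 minus the best average similarity to any other value."""
    n_cols = len(matrix[0])
    col_means = [sum(row[j] for row in matrix) / len(matrix) for j in range(n_cols)]
    return 1.0 - max(col_means)


values = ["California", "Californa", "Calfornia", "Texas", "Texsa", "Qz#9"]
THRESHOLD = 0.6  # hypothetical cut-off
scores = {v: placeholder_anomaly_scalar(input_matrix(v, values)) for v in values}
anomalous = [v for v, s in scores.items() if s > THRESHOLD]
print(anomalous)
```

In a real system the placeholder would be replaced by learned convolutional and fully-connected layers; only the matrix construction and the final thresholding step are meant to mirror the abstract.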

    High-dimensional data management and presentation

    Publication No.: US10831757B2

    Publication date: 2020-11-10

    Application No.: US15885499

    Application date: 2018-01-31

    Abstract: An online system manages data by determining relevance of data dimensions to users. The online system determines which data dimensions a user is likely to be interested in. If a user requests to access a data set that includes data of different dimensions, the online system analyzes the dimensions' relevance to the user before providing the data set to the user. The online system provides the data to the user by prioritizing data dimensions that are more relevant to the user. As such, the online system improves the user experience by allowing users to conveniently and quickly locate relevant data and minimizing the distraction caused by irrelevant data. The online system may create and provide a user interface to present data dimensions that are determined to be relevant.

    Automated data discovery with external knowledge bases

    Publication No.: US11113256B2

    Publication date: 2021-09-07

    Application No.: US16455133

    Application date: 2019-06-27

    Abstract: Systems and methods are described for improving automated data discovery analysis in a cloud computing environment. A method includes receiving a request to analyze a data set stored in a memory device, the data set including one or more columns, each column including one or more data values in one or more cells; classifying each of the one or more columns as a type of column; for a selected one of the one or more columns, if the selected column's type is an external type, joining one or more columns of an external knowledge base correlated to the selected column into the data set to create an expanded data set; and executing an automated data discovery model on the expanded data set.
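
The classify-then-join step in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the knowledge base `COUNTRY_KB`, the toy classifier `classify_column`, and the 50% match rule are all hypothetical, and the final "automated data discovery model" is not shown.

```python
# Hypothetical external knowledge base keyed by country name (illustrative only).
COUNTRY_KB = {
    "France": {"continent": "Europe", "currency": "EUR"},
    "Japan": {"continent": "Asia", "currency": "JPY"},
}


def classify_column(values):
    """Toy classifier: 'external' when at least half the values are KB keys."""
    hits = sum(1 for v in values if v in COUNTRY_KB)
    if values and hits / len(values) >= 0.5:
        return "external"
    if all(isinstance(v, (int, float)) for v in values):
        return "numeric"
    return "text"


def expand(rows):
    """Join KB columns onto every column classified as external."""
    columns = list(rows[0].keys())
    expanded = [dict(r) for r in rows]  # copy so the input rows are untouched
    for col in columns:
        if classify_column([r[col] for r in rows]) != "external":
            continue
        for row in expanded:
            facts = COUNTRY_KB.get(row[col], {})
            for kb_col in ("continent", "currency"):
                row[f"{col}_{kb_col}"] = facts.get(kb_col)  # None when no match
    return expanded


data = [
    {"country": "France", "revenue": 120},
    {"country": "Japan", "revenue": 95},
    {"country": "Atlantis", "revenue": 7},
]
expanded = expand(data)
for row in expanded:
    print(row)
```

Rows whose value has no match in the knowledge base keep the new columns with `None`, which behaves like a left join and leaves the original data set's row count unchanged.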

    MULTI-OBJECTIVE RECOMMENDATIONS IN A DATA ANALYTICS SYSTEM

    Publication No.: US20220092504A1

    Publication date: 2022-03-24

    Application No.: US17030044

    Application date: 2020-09-23

    Abstract: A method to provide multi-objective recommendations. The method includes receiving user input indicating a plurality of objectives, where each of the plurality of objectives indicates a desired goal for a field of interest, receiving user input indicating a plurality of actionable fields, receiving user input indicating selection of one of a plurality of records in a data set, determining, based on applying an evolutionary algorithm, one or more candidate changes to values of the plurality of actionable fields of the selected record, determining, for each of the one or more candidate changes, a multi-objective score for that candidate change, selecting one or more of the one or more candidate changes to recommend to a user based on the multi-objective scores of the one or more candidate changes, and providing, for display to the user, the selected one or more candidate changes as recommended changes.
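
The loop described in this abstract (propose changes to actionable fields with an evolutionary algorithm, score each candidate against several objectives, recommend the best) might look roughly like the sketch below. Everything concrete here is an assumption: the weighted-sum scoring, the Gaussian mutation, the keep-the-better-half selection, and the example record, fields, and objective functions are illustrative and are not taken from the patent.

```python
import random


def multi_objective_score(record, objectives):
    """Weighted sum of per-objective scores (an assumed scalarisation)."""
    return sum(weight * fn(record) for fn, weight in objectives)


def recommend(record, actionable, objectives,
              generations=30, pop_size=20, top_k=3, seed=0):
    """Evolve changes to the actionable fields and return the top_k candidates.

    `actionable` maps a field name to its (low, high) bounds.
    """
    rng = random.Random(seed)  # seeded for repeatable runs

    def mutate(candidate):
        child = dict(candidate)
        field = rng.choice(sorted(actionable))
        low, high = actionable[field]
        step = rng.gauss(0, (high - low) * 0.1)
        child[field] = min(high, max(low, child[field] + step))  # clamp to bounds
        return child

    def score(candidate):
        return multi_objective_score({**record, **candidate}, objectives)

    start = {f: record[f] for f in actionable}
    population = [start] + [mutate(start) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # keep the better half
        children = [mutate(rng.choice(parents))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    population.sort(key=score, reverse=True)
    return [(c, score(c)) for c in population[:top_k]]


# Hypothetical record and objectives: raise a margin proxy, keep a churn proxy low.
record = {"discount": 0.30, "support_hours": 2.0, "region": "EMEA"}
actionable = {"discount": (0.0, 0.5), "support_hours": (0.0, 10.0)}
objectives = [
    (lambda r: 1.0 - r["discount"], 1.0),                    # toy margin proxy
    (lambda r: -abs(r["support_hours"] - 6.0) / 10.0, 1.0),  # toy churn proxy
]
results = recommend(record, actionable, objectives)
for candidate, value in results:
    print(candidate, round(value, 3))
```

A weighted sum is the simplest way to collapse several objectives into one score; a Pareto-based ranking would be a reasonable alternative when the objectives should not be traded off at a fixed rate.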

    Techniques for determining and presenting dataset join candidates

    Publication No.: US11138202B2

    Publication date: 2021-10-05

    Application No.: US16525199

    Application date: 2019-07-29

    Abstract: Examples are described herein that relate to determining a level of relatedness between datasets. An approximation can be made of whether an entry in a first dataset appears in a same row as an entry in a second dataset. The approximation can be made by grouping entries in the second dataset together and determining an occurrence that an entry occurs in a same row as any of the entries in a grouping of entries. A test of independence between datasets can be made based at least on the occurrence values. Datasets can be ranked according to level of independence and presented to a user as candidates to join with a dataset. Occurrence values or rankings can be precomputed and available for use so that join candidates can be presented with little perceived delay to a user. A user interface can present join candidates for a dataset and allow the user to select datasets for joining. Joining of first and second datasets can supplement entries in both of the datasets and create a third dataset.
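
One way to picture the grouping and independence test in this abstract is the sketch below. It assumes a Pearson chi-square statistic as the test of independence and a simple character-sum bucket as the grouping; the abstract names neither, so both are stand-ins. The function names and sample columns are hypothetical.

```python
from collections import Counter


def chi_square(pairs):
    """Pearson chi-square statistic for a list of (left, right) row pairs."""
    n = len(pairs)
    observed = Counter(pairs)
    left_totals = Counter(a for a, _ in pairs)
    right_totals = Counter(b for _, b in pairs)
    stat = 0.0
    for a, row_total in left_totals.items():
        for b, col_total in right_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat


def grouped(value, n_groups=4):
    """Bucket an entry so co-occurrence is counted per group, not per entry.
    A character-sum bucket is used because Python's hash() of strings is
    salted per process."""
    return sum(map(ord, str(value))) % n_groups


def relatedness(base_column, candidate_column, n_groups=4):
    """Higher statistic means stronger evidence against independence."""
    pairs = [(a, grouped(b, n_groups))
             for a, b in zip(base_column, candidate_column)]
    return chi_square(pairs)


base = ["red", "red", "blue", "blue", "red", "blue"]
candidates = {
    "tracks_base": ["r1", "r1", "b2", "b2", "r1", "b2"],
    "unrelated": ["x", "y", "x", "y", "y", "x"],
}
ranked = sorted(candidates,
                key=lambda name: relatedness(base, candidates[name]),
                reverse=True)
print(ranked)
```

Grouping trades accuracy for size: the contingency table has at most `n_groups` columns regardless of how many distinct entries the candidate column holds, which is what makes precomputing the occurrence counts practical.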

    Convolutional neural network (CNN)-based suggestions for anomaly input

    Publication No.: US11087211B2

    Publication date: 2021-08-10

    Application No.: US15726268

    Application date: 2017-10-05

    Abstract: The technology disclosed determines one or more field values in a set of field values for a particular field in a fielded dataset that are similar to an input value using six similarity measures. A factor vector is generated per similarity measure and combined to form an input matrix. A convolutional neural network processes the input matrix to generate evaluation vectors. A fully-connected network evaluates the evaluation vectors to generate suggestion scalars for similarity to a particular input value. Thresholding is applied to the suggestion scalars to determine one or more suggestion candidates for the particular input value.

    HIGH-DIMENSIONAL DATA MANAGEMENT AND PRESENTATION

    Publication No.: US20190236191A1

    Publication date: 2019-08-01

    Application No.: US15885499

    Application date: 2018-01-31

    Abstract: An online system manages data by determining relevance of data dimensions to users. The online system determines which data dimensions a user is likely to be interested in. If a user requests to access a data set that includes data of different dimensions, the online system analyzes the dimensions' relevance to the user before providing the data set to the user. The online system provides the data to the user by prioritizing data dimensions that are more relevant to the user. As such, the online system improves the user experience by allowing users to conveniently and quickly locate relevant data and minimizing the distraction caused by irrelevant data. The online system may create and provide a user interface to present data dimensions that are determined to be relevant.
