SYSTEMS AND METHODS FOR UNSUPERVISED NAMED ENTITY RECOGNITION

    Publication No.: US20240242032A1

    Publication Date: 2024-07-18

    Application No.: US17317110

    Filing Date: 2021-05-11

    IPC Classification: G06F40/295 G06K9/62 G06N3/08

    Abstract: Systems, apparatuses, methods, and computer program products are disclosed for unsupervised named entity recognition. An example method includes receiving, by communications circuitry, a reference named entity list that identifies a set of named entities and the entity type of each identified named entity. The example method further includes generating, by a vectorizer, vectors from the named entities identified in the reference named entity list, and consolidating, by a synthesizer, the generated vectors into a set of representative vectors, wherein each representative vector is associated with a particular entity type. Finally, the example method includes receiving, by an analysis engine, a set of text, and performing, by the analysis engine, named entity recognition on the set of text using the set of representative vectors to generate a tagged set of text.
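
    The flow described in this abstract (vectorize reference entities, consolidate them into one representative vector per entity type, then tag text by similarity) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how vectors are built or compared, so the character-trigram vectorizer, the per-type averaging, the cosine similarity, and the `threshold` parameter are all hypothetical choices, as are the function names.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Hypothetical vectorizer: a bag of character trigrams (an assumption)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def consolidate(reference):
    """Average the vectors of each entity type into one representative vector."""
    sums, counts = defaultdict(Counter), Counter()
    for name, entity_type in reference:
        sums[entity_type].update(vectorize(name))
        counts[entity_type] += 1
    return {t: {k: v / counts[t] for k, v in vec.items()} for t, vec in sums.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tag(text, representatives, threshold=0.3):
    """Tag each token with the closest entity type, or "O" below the threshold."""
    tagged = []
    for token in text.split():
        vec = vectorize(token.strip(".,"))
        best_type, best_score = max(
            ((t, cosine(vec, rep)) for t, rep in representatives.items()),
            key=lambda pair: pair[1],
        )
        tagged.append((token, best_type if best_score >= threshold else "O"))
    return tagged


# Hypothetical reference named entity list: (named entity, entity type).
reference = [("London", "CITY"), ("Londonderry", "CITY"),
             ("Alice", "PERSON"), ("Alicia", "PERSON")]
representatives = consolidate(reference)
print(tag("Alice visited London", representatives))
# [('Alice', 'PERSON'), ('visited', 'O'), ('London', 'CITY')]
```

    No labeled training text is needed in this sketch; only the reference list drives the tagging, which is the sense in which the approach is unsupervised.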

    Auto-Triage Failures In A/B Testing
    Document type: Invention publication

    Publication No.: US20240086495A1

    Publication Date: 2024-03-14

    Application No.: US17942234

    Filing Date: 2022-09-12

    Applicant: ThoughtSpot, Inc.

    IPC Classification: G06K9/62 G06F3/0484 G06F8/71

    Abstract: First images that are screenshots from a first version of a software component are obtained. Second images that are screenshots from a second version are obtained. A collection of image deviations, which includes pair-wise image deviations between pairs of images, is identified. A pair of images includes a first image from the first images and a corresponding second image from the second images. An image deviation indicates a portion of the second image identified as differing from a spatially corresponding portion of the first image. The image deviations are grouped into deviation groups. At least some of the second images are associated with at least some of the deviation groups. A subset of the second images corresponding to a deviation group is output responsive to a selection of an indication of the deviation group.
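
    As a rough illustration of the pair-wise comparison and grouping steps, the sketch below treats each screenshot as a small grid of pixel values. The abstract does not state how deviations are detected or grouped, so the bounding-box deviation, the rule of grouping by identical bounding box, and the names `deviation` and `group_deviations` are assumptions made for this example only.

```python
from collections import defaultdict


def deviation(first, second):
    """Return the bounding box (top, left, bottom, right) of differing pixels, or None."""
    cells = [
        (r, c)
        for r, row in enumerate(first)
        for c, pixel in enumerate(row)
        if pixel != second[r][c]
    ]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), min(cols), max(rows), max(cols))


def group_deviations(pairs):
    """Group screenshot names by the region in which the second version differs.

    `pairs` maps a screenshot name to (first_version_image, second_version_image).
    """
    groups = defaultdict(list)
    for name, (first, second) in pairs.items():
        box = deviation(first, second)
        if box is not None:
            groups[box].append(name)
    return dict(groups)


# Hypothetical 2x3 "screenshots"; real images would be decoded pixel arrays.
base = [[0, 0, 0], [0, 0, 0]]
pairs = {
    "home": (base, [[0, 9, 0], [0, 0, 0]]),
    "settings": (base, [[0, 7, 0], [0, 0, 0]]),
    "about": (base, [[0, 0, 0], [0, 0, 5]]),
    "help": (base, base),
}
groups = group_deviations(pairs)
print(groups[(0, 1, 0, 1)])  # ['home', 'settings']
```

    Selecting a deviation group then amounts to looking up its key and returning the associated second-version screenshots.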

    Systems and Methods for Application Clustering Based on Included Libraries and Observed Events

    Publication No.: US20230388346A1

    Publication Date: 2023-11-30

    Application No.: US17752987

    Filing Date: 2022-05-25

    IPC Classification: H04L9/40 G06K9/62

    Abstract: A system of one embodiment provides proactive security policy suggestions for applications based on the applications' software composition and runtime behavior. The system includes a memory and a processor. The system is operable to access data that represents one or more features of an application. The application is running on one or more nodes in a computer network, and a feature indicates an application library of the node. The system is operable to apply a clustering algorithm to the data to generate a plurality of cluster sets. The system is operable to determine a security policy to apply to a cluster set of the plurality of cluster sets and apply the security policy to an application whose features are represented by the data in the cluster set.
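
    A minimal sketch of clustering applications by their included libraries and then applying one policy per cluster could look like this. The abstract does not name a clustering algorithm, so the greedy threshold clustering on Jaccard similarity used here is a stand-in, and the application names, library sets, `threshold` value, and policy naming are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of library names."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_apps(features, threshold=0.5):
    """Greedy clustering: join the first cluster holding a sufficiently similar app."""
    clusters = []
    for app in sorted(features):
        for cluster in clusters:
            if any(jaccard(features[app], features[m]) >= threshold for m in cluster):
                cluster.append(app)
                break
        else:
            clusters.append([app])
    return clusters


def suggest_policies(clusters, policy_for_cluster):
    """Apply the policy chosen for a cluster to every application in it."""
    return {app: policy_for_cluster(cluster) for cluster in clusters for app in cluster}


# Hypothetical feature data: application -> libraries observed on its node.
features = {
    "api": {"flask", "requests", "sqlalchemy"},
    "web": {"flask", "requests", "jinja2"},
    "batch": {"numpy", "pandas"},
    "etl": {"numpy", "pandas", "pyarrow"},
}
clusters = cluster_apps(features)
print(clusters)  # [['api', 'web'], ['batch', 'etl']]
print(suggest_policies(clusters, lambda c: "policy-" + c[0])["web"])  # policy-api
```

    Observed runtime events could be added to each feature set in the same way as library names; that extension is not shown here.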

    ROI-BASED DATA CONTENT GRAPH FOR WIDE DATA MANAGEMENT

    Publication No.: US20230334122A1

    Publication Date: 2023-10-19

    Application No.: US17719724

    Filing Date: 2022-04-13

    Abstract: This disclosure provides systems, methods, and media for creating a data graph database from various structured and unstructured data items for use by various services. The method comprises the operations of identifying unstructured data items in data subjects; recognizing regions of interest (ROIs) in the unstructured data items; and extracting the ROIs from the unstructured data items. The method further comprises encoding the extracted ROIs into ROI vectors; creating a data graph to represent the data subjects, the data items, and the ROI vectors; and storing the data graph into a graph database. The various embodiments can manage data items of different data formats together rather than separately, thus creating a data management system for managing data across data formats. The data management system can also store structured data items into the graph database, thus complementing the existing ETL procedure for structured data items.
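
    The recognize, extract, encode, and graph-building operations might be sketched as below for text items. Everything specific in it is an assumption: the abstract does not define what counts as an ROI or how it is encoded, so the regular-expression recognizer (e-mail addresses and ISO dates), the three-number `encode` function, the edge labels, and the in-memory node and edge structures are hypothetical. Storing the result in a graph database is left out.

```python
import re

# Hypothetical ROI pattern: e-mail addresses or ISO-style dates.
ROI_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\d{4}-\d{2}-\d{2}")


def recognize_rois(text):
    """Recognize and extract regions of interest from an unstructured text item."""
    return ROI_PATTERN.findall(text)


def encode(roi):
    """Hypothetical ROI vector: (letter count, digit count, total length)."""
    return (sum(ch.isalpha() for ch in roi), sum(ch.isdigit() for ch in roi), len(roi))


def build_graph(subjects):
    """Build nodes and edges linking data subjects, data items, and ROI vectors."""
    nodes, edges = {}, []
    for subject, items in subjects.items():
        nodes[subject] = {"kind": "subject"}
        for item_id, text in items.items():
            nodes[item_id] = {"kind": "item"}
            edges.append((subject, "HAS_ITEM", item_id))
            for i, roi in enumerate(recognize_rois(text)):
                roi_id = f"{item_id}#roi{i}"
                nodes[roi_id] = {"kind": "roi", "text": roi, "vector": encode(roi)}
                edges.append((item_id, "HAS_ROI", roi_id))
    return nodes, edges


# Hypothetical data subject with one unstructured item.
subjects = {"customer-1": {"note.txt": "Contact ana@example.com by 2024-01-31."}}
nodes, edges = build_graph(subjects)
print(nodes["note.txt#roi0"]["vector"])  # (13, 0, 15)
print(len(edges))  # 3
```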

    GENERATING PREDICTIONS VIA MACHINE LEARNING
    Document type: Invention publication

    Publication No.: US20230274126A1

    Publication Date: 2023-08-31

    Application No.: US17682953

    Filing Date: 2022-02-28

    Applicant: PayPal, Inc.

    Abstract: A plurality of first entities have been previously associated with a predefined activity. By performing a clustering algorithm on the first entities, a subset of the first entities is identified that have met a predefined criterion. Via a Natural Language Processing (NLP) technique, a multi-dimensional matrix is generated. The matrix has a plurality of vectors associated with attributes of the subset of the first entities. A neural network model is trained with the multi-dimensional matrix. A plurality of second entities are on a list that contains entities that have been flagged for engaging in, or having engaged in, the predefined activity. Based on the trained neural network model, a prediction is made whether scanning the second entities against a plurality of third entities for matches will cause a number of alerts having a predefined characteristic to exceed a predefined threshold. The alerts correspond to matches that need further investigation.

    META-LEARNING SYSTEMS AND/OR METHODS FOR ERROR DETECTION IN STRUCTURED DATA

    Publication No.: US20230205740A1

    Publication Date: 2023-06-29

    Application No.: US17560688

    Filing Date: 2021-12-23

    Applicant: SOFTWARE AG

    Inventor: Mohamed ABDELAAL

    Abstract: Certain example embodiments relate to meta-learning based error detection. Base classifiers are provided for historical attributes in historical datasets. Each is trained to indicate dirtiness of a value for the associated historical attribute. Clusters and a clustering model are generated using historical clustering features determined for each historical attribute, which are then associated with the clusters. For each dirty attribute in a dirty dataset, corresponding dirty clustering features are determined. The dirty attributes are assigned to the clusters using the corresponding determined dirty clustering features and the clustering model. The base classifiers associated with the clusters to which the dirty attributes were assigned are retrieved. Dirty features are extracted from the dirty dataset, and selectively modified. The extracted dirty features are applied to the retrieved base classifiers to determine meta-features. A meta-classifier is trained using labeled meta-features. Predictions about the dirty dataset's dirtiness can be made using the meta-classifier.

    INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD

    Publication No.: US20190244132A1

    Publication Date: 2019-08-08

    Application No.: US16329303

    Filing Date: 2017-08-18

    Applicant: SONY CORPORATION

    Inventor: NAOKI IDE

    IPC Classification: G06N20/00 G06K9/62 G06F17/18

    Abstract: [Object] To predict learning performance in advance in accordance with the labeling status of learning data. [Solution] Provided is an information processing device including: a data distribution presentation unit configured to perform dimensionality reduction on input learning data to generate a data distribution diagram related to the learning data; a learning performance prediction unit configured to predict learning performance on the basis of the data distribution diagram and a labeling status related to the learning data; and a display control unit configured to control a display related to the data distribution diagram and the learning performance. The data distribution diagram includes overlap information about clusters including the learning data and information about the number of pieces of the learning data belonging to each of the clusters.