NAME MATCHING ENGINE BOOSTED BY MACHINE LEARNING

    Publication No.: US20240370500A1

    Publication Date: 2024-11-07

    Application No.: US18773452

    Filing Date: 2024-07-15

    Abstract: Techniques are described herein for a Name Matching Engine that integrates two Machine Learning (ML) module options. The first ML module is a feature-engineered classifier that boosts text-based name matching techniques with a binary classifier ML model. The feature-engineered classifier comprises a first stage of text-based candidate finding, and a second stage in which a binary classifier model predicts whether each string, of the candidate match list, is a match or not. The binary classifier model is based on features from two or more of: a name feature level, a word feature level, a character feature level, and an initial feature level. The second ML module of the Name Matching Engine comprises an end-to-end Recurrent Neural Network (RNN) model that directly accepts name strings as a sequence of n-grams and generates learned text embeddings. The text embeddings of matching name strings are close to each other in the feature space.
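
    As a rough illustration of the feature-engineered module described above, the sketch below (Python, not the patented implementation) finds candidates with a cheap token filter and then builds features at the name, word, character, and initial levels; a simple averaged score stands in for the trained binary classifier, and every name and threshold is an illustrative assumption.

        # Hypothetical sketch of the two-stage feature-engineered matcher.
        from difflib import SequenceMatcher

        def initials(name):
            return "".join(w[0] for w in name.lower().split() if w)

        def trigram_overlap(a, b):
            grams = lambda s: {s[i:i + 3] for i in range(max(len(s) - 2, 1))}
            ga, gb = grams(a.lower()), grams(b.lower())
            return len(ga & gb) / max(len(ga | gb), 1)

        def features(query, candidate):
            q_words, c_words = set(query.lower().split()), set(candidate.lower().split())
            return {
                "name_ratio": SequenceMatcher(None, query.lower(), candidate.lower()).ratio(),  # name level
                "word_jaccard": len(q_words & c_words) / max(len(q_words | c_words), 1),        # word level
                "char_trigram": trigram_overlap(query, candidate),                              # character level
                "initials_match": float(initials(query) == initials(candidate)),                # initial level
            }

        def find_candidates(query, names):
            q_words = set(query.lower().split())
            return [n for n in names if q_words & set(n.lower().split())]  # stage 1: cheap text-based filter

        def is_match(query, candidate, threshold=0.5):
            f = features(query, candidate)
            return sum(f.values()) / len(f) >= threshold  # stand-in for the trained binary classifier

        names = ["Jon A. Smith", "Smith Jonathan", "Joan Smythe", "Robert Jones"]
        for cand in find_candidates("Jonathan Smith", names):
            print(cand, is_match("Jonathan Smith", cand))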

    NAME MATCHING ENGINE BOOSTED BY MACHINE LEARNING

    Publication No.: US20210287069A1

    Publication Date: 2021-09-16

    Application No.: US16989306

    Filing Date: 2020-08-10

    Abstract: Techniques are described herein for a Name Matching Engine that integrates two Machine Learning (ML) module options. The first ML module is a feature-engineered classifier that boosts text-based name matching techniques with a binary classifier ML model. The feature-engineered classifier comprises a first stage of text-based candidate finding, and a second stage in which a binary classifier model predicts whether each string, of the candidate match list, is a match or not. The binary classifier model is based on features from two or more of: a name feature level, a word feature level, a character feature level, and an initial feature level. The second ML module of the Name Matching Engine comprises an end-to-end Recurrent Neural Network (RNN) model that directly accepts name strings as a sequence of n-grams and generates learned text embeddings. The text embeddings of matching name strings are close to each other in the feature space.
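
    The second, end-to-end module can be sketched as follows, assuming PyTorch is available; the encoder is untrained and only illustrates the shape of the computation: a name becomes a sequence of hashed character n-grams, and a GRU's final hidden state serves as the learned text embedding. The vocabulary size, embedding width, and hashing scheme are assumptions.

        # Untrained, illustrative sketch of an n-gram RNN name encoder (assumes PyTorch).
        import torch
        import torch.nn as nn

        N, VOCAB, DIM = 3, 5000, 64  # trigram size, hashed vocabulary, embedding width (illustrative)

        def ngram_ids(name, n=N):
            s = name.lower()
            grams = [s[i:i + n] for i in range(max(len(s) - n + 1, 1))]
            return torch.tensor([[hash(g) % VOCAB for g in grams]])  # shape (1, seq_len)

        class NameEncoder(nn.Module):
            def __init__(self):
                super().__init__()
                self.embed = nn.Embedding(VOCAB, DIM)
                self.rnn = nn.GRU(DIM, DIM, batch_first=True)

            def forward(self, ids):
                _, h = self.rnn(self.embed(ids))  # final hidden state as the name embedding
                return h.squeeze(0)

        enc = NameEncoder()
        a, b = enc(ngram_ids("Jonathan Smith")), enc(ngram_ids("Jon Smith"))
        print(torch.cosine_similarity(a, b).item())  # would be high after training on matching pairs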

    GRAPH MACHINE LEARNING FOR CASE SIMILARITY
    Invention Publication

    Publication No.: US20240330130A1

    Publication Date: 2024-10-03

    Application No.: US18740689

    Filing Date: 2024-06-12

    CPC classification number: G06F11/1476 G06N3/04 G06V30/18181

    Abstract: Described herein is machine learning for anomalous graph detection based on graph embedding, shuffling, comparison, and unsupervised training techniques that can characterize an unfamiliar graph. In an embodiment, a computer obtains many known vectors that respectively represent known graphs. A new vector is generated that represents a new graph that contains multiple vertices. The new vector may contain an arithmetic aggregation of vertex vectors that respectively represent the multiple vertices and/or a vector that represents a virtual vertex that is connected to the multiple vertices by respective virtual edges. Among the many known vectors, vectors that are similar to the new vector are identified. The new graph is automatically characterized based on a subset of the known graphs that the similar vectors represent.
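
    A rough sketch of the characterization step, on synthetic data and not the claimed method: a new graph's vector is the arithmetic mean of its vertex vectors, the most similar known graph vectors are found by cosine similarity, and the new graph inherits the majority label of those neighbors. The dimensions, labels, and neighbor count are assumptions.

        # Synthetic sketch of characterizing a new graph by its nearest known graphs.
        import numpy as np

        rng = np.random.default_rng(0)
        known_vectors = rng.normal(size=(100, 16))                 # one vector per known graph
        known_labels = np.array(["normal"] * 90 + ["anomalous"] * 10)

        def graph_vector(vertex_vectors):
            return np.mean(vertex_vectors, axis=0)                 # arithmetic aggregation of vertex vectors

        def characterize(new_vertex_vectors, k=5):
            v = graph_vector(new_vertex_vectors)
            sims = known_vectors @ v / (np.linalg.norm(known_vectors, axis=1) * np.linalg.norm(v))
            nearest = np.argsort(-sims)[:k]                        # k most similar known graphs
            labels, counts = np.unique(known_labels[nearest], return_counts=True)
            return labels[np.argmax(counts)]                       # majority label of the neighbors

        print(characterize(rng.normal(size=(12, 16))))             # vertex vectors of the new graph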

    NAMED ENTITY DISAMBIGUATION USING ENTITY DISTANCE IN A KNOWLEDGE GRAPH

    Publication No.: US20200342055A1

    Publication Date: 2020-10-29

    Application No.: US16392386

    Filing Date: 2019-04-23

    Abstract: Techniques are described herein for performing named entity disambiguation. According to an embodiment, a method includes receiving input text, extracting a first mention and a second mention from the input text, and selecting, from a knowledge graph, a plurality of first candidate vertices for the first mention and a plurality of second candidate vertices for the second mention. The present method also includes evaluating a score function that analyzes vertex embedding similarity between the plurality of first candidate vertices and the plurality of second candidate vertices. In response to evaluating and seeking to optimize the score function, the method selects a first selected candidate vertex from the plurality of first candidate vertices and a second selected candidate vertex from the plurality of second candidate vertices. Further, the present method includes mapping a first entry from the knowledge graph to the first mention and mapping a second entry from the knowledge graph to the second mention. In this embodiment, the first entry corresponds to the first selected candidate vertex and the second entry corresponds to the second selected candidate vertex.
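
    The embedding-similarity selection can be sketched with toy data (the entity names and embeddings below are assumptions, not from the patent): every pair of candidate vertices is scored by cosine similarity, and the pair that maximizes the score determines which knowledge-graph entries the two mentions map to.

        # Toy sketch of selecting candidate vertices by maximizing embedding similarity.
        import numpy as np
        from itertools import product

        vertex_embedding = {                      # assumed vertex embeddings from a knowledge graph
            "Paris_(France)": np.array([0.9, 0.1]),
            "Paris_(Texas)":  np.array([0.2, 0.8]),
            "France":         np.array([0.95, 0.05]),
            "Texas":          np.array([0.1, 0.9]),
        }

        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        def link(first_candidates, second_candidates):
            # score every candidate pair and keep the most similar one
            return max(product(first_candidates, second_candidates),
                       key=lambda pair: cosine(vertex_embedding[pair[0]], vertex_embedding[pair[1]]))

        mention_map = dict(zip(["Paris", "France"],
                               link(["Paris_(France)", "Paris_(Texas)"], ["France"])))
        print(mention_map)   # {'Paris': 'Paris_(France)', 'France': 'France'}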

    Learning property graph representations edge-by-edge

    Publication No.: US11205050B2

    Publication Date: 2021-12-21

    Application No.: US16179049

    Filing Date: 2018-11-02

    Abstract: Techniques are described herein for learning property graph representations edge-by-edge. In an embodiment, an input graph is received. The input graph comprises a plurality of vertices and a plurality of edges. Each vertex of the plurality of vertices is associated with vertex properties of the respective vertex. A vertex-to-property mapping is generated for each vertex of the plurality of vertices. The mapping maps each vertex to a vertex-property signature of a plurality of vertex-property signatures. A plurality of edge words is generated. Each edge word corresponds to one or more edges that each begin at a first vertex having a particular vertex-property signature of the plurality of vertex-property signatures and end at a second vertex having a particular vertex-property signature of the plurality of vertex-property signatures. A plurality of sentences is generated. Each sentence comprises edge words directly connected along a path of a plurality of paths in the input graph. Using the plurality of sentences and the plurality of edge words, a document vectorization model is used to generate machine learning vectors that represent the input graph.
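
    A loose sketch of the pipeline on a toy property graph (all vertices, property names, and paths are assumptions): vertices are mapped to property signatures, edges become words built from their endpoints' signatures, and paths become sentences; training of the document vectorization model itself is omitted.

        # Toy sketch of turning a property graph into edge words and sentences.
        vertices = {1: {"kind": "person"}, 2: {"kind": "company"}, 3: {"kind": "person"}}
        edges = [(1, 2), (2, 3)]
        paths = [[(1, 2), (2, 3)]]                       # one walk through the graph

        def signature(props):
            return "|".join(f"{k}={v}" for k, v in sorted(props.items()))   # vertex-property signature

        vertex_signature = {v: signature(p) for v, p in vertices.items()}   # vertex-to-property mapping

        def edge_word(edge):
            src, dst = edge
            return f"({vertex_signature[src]})->({vertex_signature[dst]})"  # edge word

        sentences = [[edge_word(e) for e in path] for path in paths]        # one sentence per path
        print(sentences)
        # e.g. [['(kind=person)->(kind=company)', '(kind=company)->(kind=person)']]
        # These sentences would be fed to a document vectorization model (such as doc2vec)
        # to obtain machine learning vectors representing the whole input graph.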

    NAMED ENTITY DISAMBIGUATION USING ENTITY DISTANCE IN A KNOWLEDGE GRAPH

    Publication No.: US20210142008A1

    Publication Date: 2021-05-13

    Application No.: US17153078

    Filing Date: 2021-01-20

    Abstract: According to an embodiment, a method includes converting a knowledge base into a graph. In this embodiment, the knowledge base contains a plurality of entities and specifies a plurality of relationships among the plurality of entities, and entities in the knowledge base correspond to vertices in the graph, and relationships between entities in the knowledge base correspond to edges between vertices in the graph. The method may also include extracting a plurality of vertex embeddings from the graph. An example vertex embedding of the plurality of vertex embeddings represents, for a particular vertex, a proximity of the particular vertex to other vertices of the graph. Further, the method may include performing, based at least in part on the plurality of vertex embeddings, entity linking between input text and the knowledge base.
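
    A minimal sketch of this flow on a toy knowledge base (the triples and the crude two-hop proximity vector are assumptions standing in for real vertex embeddings): triples are converted into a graph, each vertex gets a vector describing its proximity to other vertices, and an ambiguous mention is linked to the candidate whose vector lies closest to that of an unambiguous entity from the same text.

        # Toy sketch: knowledge base -> graph -> proximity vectors -> entity linking.
        from collections import defaultdict

        triples = [("Paris_(France)", "capital_of", "France"),
                   ("Paris_(Texas)", "located_in", "Texas"),
                   ("Texas", "part_of", "USA")]

        adj = defaultdict(set)
        for s, _, o in triples:                 # entities -> vertices, relationships -> edges
            adj[s].add(o); adj[o].add(s)

        nodes = sorted(adj)
        index = {n: i for i, n in enumerate(nodes)}

        def proximity_vector(v):
            vec = [0] * len(nodes)
            for n1 in adj[v]:                   # one-hop neighbors
                vec[index[n1]] += 2
                for n2 in adj[n1]:              # two-hop neighbors, weighted less
                    vec[index[n2]] += 1
            return vec

        def distance(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        anchor = proximity_vector("France")     # unambiguous entity from the same input text
        candidates = ["Paris_(France)", "Paris_(Texas)"]
        print(min(candidates, key=lambda c: distance(proximity_vector(c), anchor)))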

    CATEGORICAL FEATURE ENCODING FOR PROPERTY GRAPHS BY VERTEX PROXIMITY

    Publication No.: US20200257982A1

    Publication Date: 2020-08-13

    Application No.: US16270535

    Filing Date: 2019-02-07

    Abstract: Techniques are described herein for encoding categorical features of property graphs by vertex proximity. In an embodiment, an input graph is received. The input graph comprises a plurality of vertices, and each vertex of said plurality of vertices is associated with vertex properties of said vertex. The vertex properties include at least one categorical feature value of one or more potential categorical feature values. For each of the one or more potential categorical feature values of each vertex, a numerical feature value is generated. The numerical feature value represents a proximity of the respective vertex to other vertices of the plurality of vertices that have a categorical feature value corresponding to the respective potential categorical feature value. Using the numerical feature values for each vertex, proximity encoding data is generated representing said input graph. The proximity encoding data is used to efficiently train machine learning models that produce results with enhanced accuracy.
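
    A toy sketch of the encoding (the graph, category values, and hop weights are assumptions, not the claimed algorithm): each vertex's categorical feature is replaced by one numeric value per possible category, measuring how close the vertex is to vertices carrying that category.

        # Toy sketch of proximity-based encoding of a categorical vertex feature.
        from collections import defaultdict

        category = {1: "A", 2: "B", 3: "A", 4: "B"}           # categorical feature value per vertex
        edges = [(1, 2), (2, 3), (3, 4)]

        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v); adj[v].add(u)

        values = sorted(set(category.values()))               # potential categorical feature values

        def encode(v):
            score = defaultdict(float)
            for n1 in adj[v]:
                score[category[n1]] += 1.0                    # one-hop neighbors count fully
                for n2 in adj[n1] - {v}:
                    score[category[n2]] += 0.5                # two-hop neighbors count half
            return [score[val] for val in values]             # one numeric value per potential category

        proximity_encoding = {v: encode(v) for v in category}
        print(values, proximity_encoding)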

    Invalid traffic detection using explainable unsupervised graph ML

    Publication No.: US12184692B2

    Publication Date: 2024-12-31

    Application No.: US17558342

    Filing Date: 2021-12-21

    Abstract: Herein are graph machine learning explainability (MLX) techniques for invalid traffic detection. In an embodiment, a computer generates a graph that contains: a) domain vertices that represent network domains that received requests and b) address vertices that respectively represent network addresses from which the requests originated. Based on the graph, domain embeddings are generated that respectively encode the domain vertices. Based on the domain embeddings, multidomain embeddings are generated that respectively encode the network addresses. The multidomain embeddings are organized into multiple clusters of multidomain embeddings. A particular cluster is detected as suspicious. In an embodiment, a graph model trained in an unsupervised manner generates the multidomain embeddings. Based on the clusters of multidomain embeddings, feature importances are trained in an unsupervised manner. Based on the feature importances, an explanation is automatically generated for why an object is or is not suspicious. The explained object may be a cluster or other batch of network addresses or a single network address.
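
    A simplified sketch of the clustering side, on synthetic data and assuming scikit-learn is available; the real system derives the embeddings from an unsupervised graph model rather than from random vectors: domain embeddings are averaged per address into a multidomain embedding, the addresses are clustered, and the small outlying cluster is flagged as suspicious.

        # Synthetic sketch: per-address multidomain embeddings, clustering, suspicious cluster.
        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(1)
        domain_embedding = {d: rng.normal(size=8) for d in ["a.com", "b.com", "c.com", "d.com"]}

        # requests[address] = domains that address contacted (synthetic)
        requests = {f"10.0.0.{i}": ["a.com", "b.com"] for i in range(20)}          # organic-looking traffic
        requests.update({f"10.9.9.{i}": ["c.com", "d.com"] for i in range(5)})     # bot-like block

        addresses = list(requests)
        multidomain = np.array([np.mean([domain_embedding[d] for d in requests[a]], axis=0)
                                for a in addresses])

        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(multidomain)
        suspicious = min(set(labels), key=lambda c: (labels == c).sum())            # smaller cluster flagged
        print([a for a, c in zip(addresses, labels) if c == suspicious])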

    INVALID TRAFFIC DETECTION USING EXPLAINABLE UNSUPERVISED GRAPH ML

    Publication No.: US20230199026A1

    Publication Date: 2023-06-22

    Application No.: US17558342

    Filing Date: 2021-12-21

    CPC classification number: H04L63/1483 H04L63/1425 G06N20/00

    Abstract: Herein are graph machine learning explainability (MLX) techniques for invalid traffic detection. In an embodiment, a computer generates a graph that contains: a) domain vertices that represent network domains that received requests and b) address vertices that respectively represent network addresses from which the requests originated. Based on the graph, domain embeddings are generated that respectively encode the domain vertices. Based on the domain embeddings, multidomain embeddings are generated that respectively encode the network addresses. The multidomain embeddings are organized into multiple clusters of multidomain embeddings. A particular cluster is detected as suspicious. In an embodiment, a graph model trained in an unsupervised manner generates the multidomain embeddings. Based on the clusters of multidomain embeddings, feature importances are trained in an unsupervised manner. Based on the feature importances, an explanation is automatically generated for why an object is or is not suspicious. The explained object may be a cluster or other batch of network addresses or a single network address.
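
    The explainability side can be sketched as a companion to the clustering above, again on synthetic data and not the claimed MLX method: a crude per-feature importance is the gap between the suspicious cluster's mean and the overall mean, and an individual address is explained by the features where it deviates most along the important directions. The feature names are invented for illustration.

        # Synthetic sketch of feature importances and per-address explanations.
        import numpy as np

        rng = np.random.default_rng(2)
        feature_names = ["req_rate", "domains_per_hour", "night_ratio", "ua_entropy"]
        features = rng.normal(size=(25, 4))
        features[20:] += np.array([3.0, 2.5, 2.0, 0.0])       # the suspicious block stands out
        labels = np.array([0] * 20 + [1] * 5)                 # cluster 1 was flagged as suspicious

        importance = np.abs(features[labels == 1].mean(axis=0) - features.mean(axis=0))

        def explain(address_row):
            contribution = importance * np.abs(features[address_row] - features.mean(axis=0))
            top = np.argsort(-contribution)[:2]               # the two most influential features
            return [feature_names[i] for i in top]

        print("importances:", dict(zip(feature_names, importance.round(2))))
        print("address 22 flagged because of:", explain(22))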

    Named entity disambiguation using entity distance in a knowledge graph

    Publication No.: US10902203B2

    Publication Date: 2021-01-26

    Application No.: US16392386

    Filing Date: 2019-04-23

    Abstract: Techniques are described herein for performing named entity disambiguation. According to an embodiment, a method includes receiving input text, extracting a first mention and a second mention from the input text, and selecting, from a knowledge graph, a plurality of first candidate vertices for the first mention and a plurality of second candidate vertices for the second mention. The present method also includes evaluating a score function that analyzes vertex embedding similarity between the plurality of first candidate vertices and the plurality of second candidate vertices. In response to evaluating and seeking to optimize the score function, the method selects a first selected candidate vertex from the plurality of first candidate vertices and a second selected candidate vertex from the plurality of second candidate vertices. Further, the present method includes mapping a first entry from the knowledge graph to the first mention and mapping a second entry from the knowledge graph to the second mention. In this embodiment, the first entry corresponds to the first selected candidate vertex and the second entry corresponds to the second selected candidate vertex.
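
    The front end of this flow can be sketched with a toy alias table (the names and the lookup-based extractor are assumptions): mentions are extracted from the input text and each is mapped to its candidate vertices in the knowledge graph, which the embedding-similarity scoring sketched earlier would then disambiguate.

        # Toy sketch of mention extraction and candidate-vertex selection.
        aliases = {                                   # surface form -> candidate knowledge-graph vertices
            "paris":  ["Paris_(France)", "Paris_(Texas)"],
            "france": ["France"],
        }

        def extract_mentions(text):
            # stand-in for a real mention extractor: keep tokens that appear in the alias table
            return [tok.strip(",.").lower() for tok in text.split() if tok.strip(",.").lower() in aliases]

        def candidate_vertices(text):
            return {m: aliases[m] for m in extract_mentions(text)}

        print(candidate_vertices("Paris is the capital of France."))
        # {'paris': ['Paris_(France)', 'Paris_(Texas)'], 'france': ['France']}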
