Systems and methods for estimating typed graphlets in large data

    Publication No.: US11343325B2

    Publication date: 2022-05-24

    Application No.: US17008339

    Application date: 2020-08-31

    Applicant: Adobe Inc.

    Abstract: A system and method for fast, accurate, and scalable typed graphlet estimation. The system and method utilize typed edge sampling and typed path sampling to estimate typed graphlet counts in large graphs in a small fraction of the computing time of existing systems. The obtained unbiased estimates of typed graphlets are highly accurate, and have applications in the analysis, mining, and predictive modeling of massive real-world networks. During operation, the system obtains a dataset indicating nodes and edges of a graph. The system samples a portion of the graph and counts a number of graph features in the sampled portion of the graph. The system then computes an occurrence frequency of a typed graphlet pattern and a total number of typed graphlets associated with the typed graphlet pattern in the graph.
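
The edge-sampling idea in this abstract can be illustrated with a minimal sketch for one graphlet shape, the typed triangle. This is not the patented method: it assumes an undirected graph given as a list of unique edges with no self-loops, independent sampling of each edge with probability `p`, and a hypothetical `node_type` mapping. Each triangle has three edges, so it is counted 3p times in expectation, and dividing the raw count by 3p gives an unbiased estimate.

```python
import random
from collections import Counter, defaultdict


def estimate_typed_triangles(edges, node_type, p, seed=0):
    """Estimate the number of triangles per type combination.

    edges: unique undirected edges (u, v), no self-loops (assumption).
    node_type: hypothetical mapping from node to its type label.
    p: probability of keeping each edge in the sample.
    """
    rng = random.Random(seed)
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    raw = Counter()
    for u, v in edges:
        if rng.random() >= p:
            continue  # edge not sampled
        # Count every triangle that contains the sampled edge exactly.
        for w in adj[u] & adj[v]:
            key = tuple(sorted((node_type[u], node_type[v], node_type[w])))
            raw[key] += 1

    # A triangle is reachable from each of its 3 edges, each kept with prob. p.
    return {key: count / (3 * p) for key, count in raw.items()}


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("b", "d")]
node_type = {"a": "user", "b": "user", "c": "item", "d": "item"}
# With p=1.0 every edge is kept, so the estimate equals the exact count.
print(estimate_typed_triangles(edges, node_type, p=1.0))
```

Lower values of `p` trade accuracy for speed, because only the sampled edges trigger neighborhood intersections.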

    DETERMINING PATTERNS WITHIN A STRING SEQUENCE OF USER ACTIONS

    Publication No.: US20220148015A1

    Publication date: 2022-05-12

    Application No.: US17096255

    Application date: 2020-11-12

    Applicant: Adobe Inc.

    Abstract: Techniques are provided for analyzing user actions that have occurred over a time period. The user actions can be, for example, with respect to the user's navigation of content or interaction with an application. Such user data is provided in an action string, which is converted into a highly searchable format. As such, the presence and frequency of particular user actions and patterns of user actions within an action string of a particular user, as well as among multiple action strings of multiple users, are determinable. Subsequences of one or more action strings are identified and both the number of action strings that include a particular subsequence and the frequency that a particular subsequence is present in a given action string are determinable. The conversion involves breaking that string into a sorted list of locations for the actions within that string. Queries can be readily applied against the sorted list.
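
One way to picture the conversion described in this abstract is an index that maps each action to the sorted list of positions where it occurs. The sketch below is an illustration under stated assumptions, not the claimed data structure: each action is a single character, a "subsequence" is treated as a contiguous run of actions, and the function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def build_position_index(action_string):
    """Map each action to the sorted list of positions where it occurs."""
    index = defaultdict(list)
    for pos, action in enumerate(action_string):
        index[action].append(pos)  # positions arrive in increasing order
    return dict(index)


def _has_position(positions, pos):
    """Binary search for pos in a sorted position list."""
    i = bisect_left(positions, pos)
    return i < len(positions) and positions[i] == pos


def count_occurrences(index, pattern):
    """Count contiguous occurrences of pattern within one indexed string."""
    if not pattern:
        return 0
    starts = index.get(pattern[0], [])
    return sum(
        all(
            _has_position(index.get(action, []), start + offset)
            for offset, action in enumerate(pattern)
        )
        for start in starts
    )


def strings_containing(indexes, pattern):
    """Count how many indexed action strings contain pattern at least once."""
    return sum(1 for index in indexes if count_occurrences(index, pattern) > 0)


indexes = [build_position_index(s) for s in ["ABCABAB", "CCC", "BAB"]]
print(count_occurrences(indexes[0], "AB"))  # occurrences in the first string
print(strings_containing(indexes, "AB"))    # strings that contain "AB"
```

Both query types named in the abstract, frequency within one string and the number of strings containing a pattern, reduce to lookups against the sorted position lists.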

    GENERATING OVERLAP ESTIMATIONS BETWEEN HIGH-VOLUME DIGITAL DATA SETS BASED ON MULTIPLE SKETCH VECTOR SIMILARITY ESTIMATORS

    Publication No.: US20220138218A1

    Publication date: 2022-05-05

    Application No.: US17090556

    Application date: 2020-11-05

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, methods, and non-transitory computer-readable media that estimate the overlap between sets of data samples. In particular, in one or more embodiments, the disclosed systems utilize a sketch-based sampling routine and a flexible, accurate estimator to determine the overlap (e.g., the intersection) between sets of data samples. For example, in some implementations, the disclosed systems generate a sketch vector—such as a one permutation hashing vector—for each set of data samples. The disclosed systems further compare the sketch vectors to determine an equal bin similarity estimator, a lesser bin similarity estimator, and a greater bin similarity estimator. The disclosed systems utilize one or more of the determined similarity estimators in generating an overlap estimation for the sets of data samples.
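
The sketch-and-compare flow in this abstract can be illustrated roughly as follows. The abstract does not give formulas for the three estimators, so this sketch only tallies equal, lesser, and greater bins and then applies the standard one permutation hashing Jaccard estimate (equal bins divided by bins that are non-empty in at least one sketch) to the equal-bin count. The hash function, the bin assignment, the treatment of an empty bin as larger than any value, and all function names are assumptions made for illustration.

```python
import hashlib


def _hash64(item):
    """Deterministic 64-bit hash of an item (assumed stand-in for a permutation)."""
    digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def oph_sketch(items, num_bins):
    """One permutation hashing: keep the minimum hashed value in each bin."""
    sketch = [None] * num_bins  # None marks an empty bin
    for item in items:
        h = _hash64(item)
        b, value = h % num_bins, h // num_bins
        if sketch[b] is None or value < sketch[b]:
            sketch[b] = value
    return sketch


def compare_sketches(s1, s2):
    """Tally bins where s1 is equal to, lesser than, or greater than s2."""
    counts = {"equal": 0, "lesser": 0, "greater": 0, "both_empty": 0}
    for a, b in zip(s1, s2):
        if a is None and b is None:
            counts["both_empty"] += 1
        elif a == b:
            counts["equal"] += 1
        elif b is None or (a is not None and a < b):
            counts["lesser"] += 1  # an empty bin is treated as larger (assumption)
        else:
            counts["greater"] += 1
    return counts


def estimate_overlap(s1, s2, size1, size2):
    """Estimate |A ∩ B| from the equal-bin Jaccard estimate and the set sizes."""
    counts = compare_sketches(s1, s2)
    non_empty = len(s1) - counts["both_empty"]
    if non_empty == 0:
        return 0.0
    jaccard = counts["equal"] / non_empty
    # |A ∩ B| = J * (|A| + |B|) / (1 + J) follows from J = |A ∩ B| / |A ∪ B|.
    return jaccard * (size1 + size2) / (1 + jaccard)


a = [f"user{i}" for i in range(1000)]
b = [f"user{i}" for i in range(500, 1500)]
sa, sb = oph_sketch(a, 256), oph_sketch(b, 256)
print(compare_sketches(sa, sb))
print(estimate_overlap(sa, sb, len(a), len(b)))  # true overlap is 500
```

How the lesser and greater counts would feed into the final estimate is left open here, since the abstract only states that one or more of the three estimators is used.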

    Determining patterns within a string sequence of user actions

    Publication No.: US11978067B2

    Publication date: 2024-05-07

    Application No.: US17096255

    Application date: 2020-11-12

    Applicant: Adobe Inc.

    CPC classification number: G06Q30/0201 G06F7/08 G06Q10/10 G06F3/14

    Abstract: Techniques are provided for analyzing user actions that have occurred over a time period. The user actions can be, for example, with respect to the user's navigation of content or interaction with an application. Such user data is provided in an action string, which is converted into a highly searchable format. As such, the presence and frequency of particular user actions and patterns of user actions within an action string of a particular user, as well as among multiple action strings of multiple users, are determinable. Subsequences of one or more action strings are identified and both the number of action strings that include a particular subsequence and the frequency that a particular subsequence is present in a given action string are determinable. The conversion involves breaking that string into a sorted list of locations for the actions within that string. Queries can be readily applied against the sorted list.

    Trait expansion techniques in binary matrix datasets

    Publication No.: US11899693B2

    Publication date: 2024-02-13

    Application No.: US17677323

    Application date: 2022-02-22

    Applicant: Adobe Inc.

    CPC classification number: G06F16/285

    Abstract: A cluster generation system identifies data elements, from a first binary record, that each have a particular value and correspond to respective binary traits. A candidate description function describing the binary traits is generated, the candidate description function including a model factor that describes the data elements. Responsive to determining that a second record has additional data elements having the particular value and corresponding to the respective binary traits, the candidate description function is modified to indicate that the model factor describes the additional elements. The candidate description function is also modified to include a correction factor describing an additional binary trait excluded from the respective binary traits. Based on the modified candidate description function, the cluster generation system generates a data summary cluster, which includes a compact representation of the binary traits of the data elements and additional data elements.

    ONLINE INFERENCE AND LEARNING FOR NONSYMMETRIC DETERMINANTAL POINT PROCESSES

    Publication No.: US20230368265A1

    Publication date: 2023-11-16

    Application No.: US17743360

    Application date: 2022-05-12

    Applicant: Adobe Inc.

    CPC classification number: G06Q30/0631 G06Q30/0629 G06Q30/0643

    Abstract: Embodiments provide systems, methods, and computer storage media for a Nonsymmetric Determinantal Point Process (NDPP) for compatible set recommendations in a setting where data representing entities (e.g., items) arrives in a stream. A stream representing compatible sets of entities is received and used to update a latent representation of the entities and a compatibility distribution indicating likelihood of compatibility of subsets of the entities. The probability distribution is accessed in a single sequential pass to predict a compatible complete set of entities that completes an incomplete set of entities. The predicted complete compatible set is provided as a recommendation for entities that complete the incomplete set of entities.

    GRAPH NEURAL NETWORKS FOR DATASETS WITH HETEROPHILY

    Publication No.: US20220309334A1

    Publication date: 2022-09-29

    Application No.: US17210157

    Application date: 2021-03-23

    Applicant: Adobe Inc.

    Abstract: Techniques are provided for training graph neural networks with heterophily datasets and generating predictions for such datasets with heterophily. A computing device receives a dataset including a graph data structure and processes the dataset using a graph neural network. The graph neural network defines prior belief vectors respectively corresponding to nodes of the graph data structure and executes a compatibility-guided propagation from the set of prior belief vectors using a compatibility matrix. The graph neural network predicts a class label for a node of the graph data structure based on the compatibility-guided propagations and a characteristic of at least one node within a neighborhood of the node. The computing device outputs the graph data structure where it is usable by a software tool for modifying an operation of a computing environment.
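
The propagation step in this abstract can be pictured with a small, training-free sketch. It assumes the prior belief scores and the compatibility matrix are already given, uses a simplified update (prior scores plus neighbor beliefs multiplied by the compatibility matrix, followed by a softmax), and uses hypothetical function names. The network described in the abstract would learn these quantities rather than take them as inputs.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def propagate(prior, adjacency, compat, num_rounds=2):
    """Simplified compatibility-guided propagation (illustrative only).

    prior: per-node prior belief scores, one list per node.
    adjacency: per-node list of neighbor indices.
    compat: compat[i][j] scores how strongly class i neighbors suggest class j.
    """
    beliefs = [softmax(row) for row in prior]
    num_classes = len(compat)
    for _ in range(num_rounds):
        updated = []
        for v, neighbors in enumerate(adjacency):
            scores = list(prior[v])
            for u in neighbors:
                # Add the neighbor's belief vector times the compatibility matrix.
                for j in range(num_classes):
                    scores[j] += sum(
                        beliefs[u][i] * compat[i][j] for i in range(num_classes)
                    )
            updated.append(softmax(scores))
        beliefs = updated
    return beliefs


def predict(beliefs):
    """Pick the highest-belief class for every node."""
    return [max(range(len(row)), key=row.__getitem__) for row in beliefs]


# Path graph 0 - 1 - 2 with heterophily: neighbors tend to have opposite classes.
prior = [[2.0, 0.0], [0.0, 0.0], [2.0, 0.0]]
adjacency = [[1], [0, 2], [1]]
compat = [[0.0, 1.0], [1.0, 0.0]]
print(predict(propagate(prior, adjacency, compat)))
```

In this example the middle node has an uninformative prior, so its label comes entirely from its neighbors: the off-diagonal compatibility matrix pushes it toward the class opposite to theirs.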
