Self-supervised document-to-document similarity system

    Publication No.: US11875590B2

    Publication date: 2024-01-16

    Application No.: US18068519

    Filing date: 2022-12-19

    CPC classification number: G06V30/418 G06F18/2113 G06F18/2178 G06V10/751

    Abstract: Examples provide a self-supervised language model for document-to-document similarity scoring and ranking long documents of arbitrary length in an absence of similarity labels. In a first stage of a two-stage hierarchical scoring, a sentence similarity matrix is created for each paragraph in the candidate document. A sentence similarity score is calculated based on the sentence similarity matrix. In the second stage, a paragraph similarity matrix is constructed based on aggregated sentence similarity scores associated with the first candidate document. A total similarity score for the document is calculated based on the normalized paragraph similarity matrix for each candidate document in a collection of documents. The model is trained using a masked language model and intra- and inter-document sampling. The documents are ranked based on the similarity scores for the documents.
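The two-stage scoring described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: sentence embeddings are taken as given (the trained language model is out of scope), cosine similarity fills the matrices, each matrix is reduced by taking the best match per row and averaging, and normalization is a z-score over all candidates. The function names `cosine`, `sentence_score`, `paragraph_matrix`, and `rank_candidates` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sentence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sentence_score(src_par, cand_par):
    """Stage 1: build the sentence similarity matrix for one paragraph pair
    and reduce it to a single score (assumed: mean of row-wise maxima)."""
    matrix = [[cosine(s, c) for c in cand_par] for s in src_par]
    return sum(max(row) for row in matrix) / len(matrix)

def paragraph_matrix(src_doc, cand_doc):
    """Stage 2 input: one aggregated sentence score per paragraph pair."""
    return [[sentence_score(sp, cp) for cp in cand_doc] for sp in src_doc]

def rank_candidates(src_doc, candidates):
    """Normalize the paragraph matrices (assumed: z-score over all
    candidates), reduce each to a total score, and sort best first."""
    mats = {name: paragraph_matrix(src_doc, doc) for name, doc in candidates.items()}
    values = [x for m in mats.values() for row in m for x in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values)) or 1.0
    scores = {}
    for name, m in mats.items():
        normed = [[(x - mean) / std for x in row] for row in m]
        scores[name] = sum(max(row) for row in normed) / len(normed)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Here a document is a list of paragraphs and a paragraph is a list of sentence vectors, so `rank_candidates(source, {"a": doc_a, "b": doc_b})` returns `(name, score)` pairs with the most similar candidate first.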

    Diagnostic tool for deep learning similarity models

    Publication No.: US11532147B2

    Publication date: 2022-12-20

    Application No.: US17084468

    Filing date: 2020-10-29

    Abstract: A diagnostic tool for deep learning similarity models and image classifiers provides valuable insight into neural network decision-making. A disclosed solution generates a saliency map by: receiving a baseline image and a test image; determining, with a convolutional neural network (CNN), a first similarity between the baseline image and the test image; based on at least determining the first similarity, determining, for the test image, a first activation map for at least one CNN layer; based on at least determining the first similarity, determining, for the test image, a first gradient map for the at least one CNN layer; and generating a first saliency map as an element-wise function of the first activation map and the first gradient map. Some examples further determine a region of interest (ROI) in the first saliency map, crop the test image to an area corresponding to the ROI, and determine a refined similarity score.
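A minimal sketch of the saliency and ROI steps in this abstract, under several assumptions: the activation and gradient maps are already available as nested lists shaped `[channel][row][col]`, the "element-wise function" is taken to be a product summed over channels and passed through a ReLU (the abstract does not name the function), and the saliency map is assumed to have the same resolution as the image being cropped. The names `saliency_map`, `roi_bounds`, and `crop` are hypothetical.

```python
def saliency_map(activations, gradients):
    """Combine activation and gradient maps element-wise.
    Assumed form: ReLU(sum over channels of activation * gradient),
    scaled so the largest value is 1."""
    channels = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    sal = [[max(0.0, sum(activations[c][i][j] * gradients[c][i][j]
                         for c in range(channels)))
            for j in range(w)] for i in range(h)]
    peak = max(max(row) for row in sal)
    return [[v / peak for v in row] for row in sal] if peak > 0 else sal

def roi_bounds(sal, threshold=0.5):
    """Bounding box (top, left, bottom, right) of cells at or above threshold."""
    cells = [(i, j) for i, row in enumerate(sal)
             for j, v in enumerate(row) if v >= threshold]
    if not cells:
        return None
    rows = [i for i, _ in cells]
    cols = [j for _, j in cells]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1

def crop(image, bounds):
    """Crop a 2-D image (list of rows) to the bounding box."""
    top, left, bottom, right = bounds
    return [row[left:right] for row in image[top:bottom]]
```

The cropped image would then be passed back through the similarity model to obtain the refined score; that step depends on the CNN and is omitted here.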

    Representation learning with side information

    Publication No.: US12223274B2

    Publication date: 2025-02-11

    Application No.: US17452818

    Filing date: 2021-10-29

    Abstract: A relational similarity determination engine receives as input a dataset including a set of entities and co-occurrence data that defines co-occurrence relations for pairs of the entities. The relational similarity determination engine also receives as input side information defining explicit relations between the entities. The relational similarity determination engine jointly models the co-occurrence relations and the explicit relations for the entities to compute a similarity metric for each different pair of entities within the dataset. Based on the computed similarity metrics, the relational similarity determination engine identifies a most similar replacement entity from the dataset for each of the entities within the dataset. For a select entity received as an input, the relational similarity determination engine outputs the identified most similar replacement entity.

    Self-supervised document-to-document similarity system

    Publication No.: US11580764B2

    Publication date: 2023-02-14

    Application No.: US17354333

    Filing date: 2021-06-22

    Abstract: Examples provide a self-supervised language model for document-to-document similarity scoring and ranking long documents of arbitrary length in an absence of similarity labels. In a first stage of a two-stage hierarchical scoring, a sentence similarity matrix is created for each paragraph in the candidate document. A sentence similarity score is calculated based on the sentence similarity matrix. In the second stage, a paragraph similarity matrix is constructed based on aggregated sentence similarity scores associated with the first candidate document. A total similarity score for the document is calculated based on the normalized paragraph similarity matrix for each candidate document in a collection of documents. The model is trained using a masked language model and intra- and inter-document sampling. The documents are ranked based on the similarity scores for the documents.

    Diagnostic tool for deep learning similarity models

    Publication No.: US11769315B2

    Publication date: 2023-09-26

    Application No.: US18052568

    Filing date: 2022-11-03

    CPC classification number: G06V10/464 G06F18/2113 G06F18/22 G06N3/08 G06V10/25

    Abstract: A diagnostic tool for deep learning similarity models and image classifiers provides valuable insight into neural network decision-making. A disclosed solution generates a saliency map by: receiving a baseline image and a test image; determining, with a convolutional neural network (CNN), a first similarity between the baseline image and the test image; based on at least determining the first similarity, determining, for the test image, a first activation map for at least one CNN layer; based on at least determining the first similarity, determining, for the test image, a first gradient map for the at least one CNN layer; and generating a first saliency map as an element-wise function of the first activation map and the first gradient map. Some examples further determine a region of interest (ROI) in the first saliency map, crop the test image to an area corresponding to the ROI, and determine a refined similarity score.

    Machine learning multiple features of depicted item

    Publication No.: US11373095B2

    Publication date: 2022-06-28

    Application No.: US16725652

    Filing date: 2019-12-23

    Abstract: Machine learning of multiple features of an item depicted in images. Upon accessing multiple images that depict the item, a neural network is trained on the images to generate embedding vectors for each of multiple features of the item. For each of the multiple features of the item depicted in the images, in each iteration of the machine learning, the embedding vector is converted into a probability vector that represents probabilities that the feature has respective values. That probability vector is then compared with a value vector representing the actual value of that feature in the depicted item, and an error between the two vectors is determined. That error is used to adjust parameters of the neural network used to generate the embedding vector, allowing for the next iteration in the generation of the embedding vectors. These iterative changes continue, thereby training the neural network.
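One iteration of the loop described in this abstract might look like the sketch below, for a single feature. The assumptions are significant: the conversion to a probability vector is taken to be a linear projection followed by softmax, the error is taken to be cross-entropy against a one-hot value vector, and only a hypothetical linear head (`weights`) is updated, whereas the abstract adjusts the network that generates the embedding itself. All function names are illustrative.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def project(embedding, weights):
    """Hypothetical linear head: one weight row per possible feature value."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in weights]

def cross_entropy(probs, value_vector):
    """Error between the probability vector and the one-hot value vector."""
    return -sum(t * math.log(p) for p, t in zip(probs, value_vector) if t > 0)

def train_step(embedding, weights, value_vector, lr=0.5):
    """One iteration: predict, measure the error, adjust the parameters.
    For softmax with cross-entropy, the gradient with respect to each
    logit is (probability - target)."""
    probs = softmax(project(embedding, weights))
    error = cross_entropy(probs, value_vector)
    new_weights = [[w - lr * (p - t) * e for w, e in zip(row, embedding)]
                   for row, p, t in zip(weights, probs, value_vector)]
    return new_weights, error
```

Calling `train_step` repeatedly with the returned weights drives the error down, which mirrors the iterative adjustment the abstract describes; a separate head of this kind would be kept per feature.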

    Hierarchical multisource playlist generation

    Publication No.: US10242098B2

    Publication date: 2019-03-26

    Application No.: US15169305

    Filing date: 2016-05-31

    Abstract: A playlist generator utilizes multiple data sources to rank each track within a set of candidate tracks, enabling selection of candidate tracks according to the ranking. Candidate tracks are each scored according to one or more features, such as acoustic similarity and/or similar usage patterns of the candidate track or artist of the candidate track to a current or previously played track or artist. Each feature is weighted according to historical listening patterns surrounding a user-selected playlist seed artist. The weighting may also be further corrected according to historical listening patterns of the particular user. When historical usage data related to a particular seed artist is limited, more generalized historical usage data related to a higher level in a genre hierarchy may be used.
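The weighting and fallback logic in this abstract can be illustrated with a short sketch. Everything concrete here is assumed for illustration: the dictionaries `weights_by_node`, `parent_of`, and `usage_count`, the `min_usage` cutoff that decides when data is "limited", and the weighted-sum scoring are not taken from the patent.

```python
def feature_weights(seed_artist, weights_by_node, parent_of, usage_count,
                    min_usage=100):
    """Pick feature weights for the seed artist, climbing the genre
    hierarchy while the current node has too little listening history."""
    node = seed_artist
    while usage_count.get(node, 0) < min_usage and node in parent_of:
        node = parent_of[node]
    return weights_by_node[node]

def rank_tracks(candidates, weights):
    """Score each candidate as a weighted sum of its feature values
    and return track names ordered best first."""
    def score(features):
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return sorted(candidates, key=lambda track: score(candidates[track]),
                  reverse=True)
```

For a seed artist with only a handful of plays, `feature_weights` would return the weights learned for the parent genre, and `rank_tracks` would then order the candidates with those weights. The per-user correction mentioned in the abstract is not modeled.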

    Deep gradient activation map model refinement

    Publication No.: US12153651B2

    Publication date: 2024-11-26

    Application No.: US17452961

    Filing date: 2021-10-29

    Abstract: A method of generating an aggregate saliency map using a convolutional neural network. Convolutional activation maps of the convolutional neural network model are received into a saliency map generator, the convolutional activation maps being generated by the neural network model while computing the one or more prediction scores based on unlabeled input data. Each convolutional activation map corresponds to one of the multiple encoding layers. The saliency map generator generates a layer-dependent saliency map for each encoding layer of the unlabeled input data, each layer-dependent saliency map being based on a summation of element-wise products of the convolutional activation maps and their corresponding gradients. The layer-dependent saliency maps are combined into the aggregate saliency map indicating the relative contributions of individual components of the unlabeled input data to the one or more prediction scores computed by the convolutional neural network model on the unlabeled input data.
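The layer-wise summation and aggregation in this abstract can be sketched as below. The abstract does not say how the layer-dependent maps are combined, so this sketch assumes square maps, nearest-neighbour upsampling to a common size, and a plain element-wise sum; the names `layer_saliency`, `upsample`, and `aggregate_saliency` are hypothetical, and activations and gradients are assumed to be precomputed nested lists shaped `[channel][row][col]`.

```python
def layer_saliency(activations, gradients):
    """Layer-dependent map: sum over channels of the element-wise product
    of each activation map with its corresponding gradient map."""
    h, w = len(activations[0]), len(activations[0][0])
    return [[sum(a[i][j] * g[i][j] for a, g in zip(activations, gradients))
             for j in range(w)] for i in range(h)]

def upsample(matrix, size):
    """Nearest-neighbour upsampling of a square map.
    Assumes size is a multiple of the map's side length."""
    factor = size // len(matrix)
    return [[matrix[i // factor][j // factor] for j in range(size)]
            for i in range(size)]

def aggregate_saliency(layers, size):
    """Combine per-layer maps (assumed: element-wise sum after upsampling).
    `layers` is a list of (activations, gradients) pairs, one per layer."""
    total = [[0.0] * size for _ in range(size)]
    for activations, gradients in layers:
        up = upsample(layer_saliency(activations, gradients), size)
        for i in range(size):
            for j in range(size):
                total[i][j] += up[i][j]
    return total
```

Deeper layers typically have coarser maps, which is why the sketch upsamples before summing; a weighted combination would be an equally plausible reading of the abstract.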
