Systems and methods of data augmentation for pre-trained embeddings

    Publication number: US11461537B2

    Publication date: 2022-10-04

    Application number: US16827830

    Application date: 2020-03-24

    Abstract: Systems and methods are provided for generating textual embeddings by tokenizing text data and generating vectors to be provided to a transformer system, where the textual embeddings are vector representations of semantic meanings of text that is part of the text data. The vectors may be averaged for every token of the generated textual embeddings, and the average output activations of two layers of the transformer system may be concatenated. Image embeddings may be generated with a convolutional neural network (CNN) from image data, wherein the image embeddings are vector representations of the images that are part of the image data. The textual embeddings and image embeddings may be combined to form combined embeddings to be provided to the transformer system.
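The pooling step described in this abstract can be illustrated with a minimal sketch. It assumes the per-token activations of two transformer layers are already available as plain lists of floats, and it assumes the text and image embeddings are combined by concatenation; the abstract does not say how they are combined. The names `mean_pool`, `text_embedding`, and `combine` are hypothetical and do not come from the patent.

```python
from typing import List

Vector = List[float]


def mean_pool(token_vectors: List[Vector]) -> Vector:
    """Average per-token vectors into a single vector (one value per dimension)."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def text_embedding(layer_a: List[Vector], layer_b: List[Vector]) -> Vector:
    """Concatenate the token-averaged activations of two layers."""
    return mean_pool(layer_a) + mean_pool(layer_b)


def combine(text_vec: Vector, image_vec: Vector) -> Vector:
    """Assumption: combine modalities by simple concatenation."""
    return text_vec + image_vec


# Toy activations: two tokens, two dimensions per layer (made-up values).
layer_a = [[1.0, 2.0], [3.0, 4.0]]
layer_b = [[0.0, 1.0], [2.0, 3.0]]
image_vec = [0.5, 0.5]  # stands in for a CNN image embedding

text_vec = text_embedding(layer_a, layer_b)
combined = combine(text_vec, image_vec)
print(text_vec)   # [2.0, 3.0, 1.0, 2.0]
print(combined)   # [2.0, 3.0, 1.0, 2.0, 0.5, 0.5]
```

In a real pipeline the layer activations would come from a transformer model and the image vector from a CNN; the toy lists only show the shape of the computation.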

    ESTIMATING PRODUCT ATTRIBUTE PREFERENCES

    Publication number: US20220343389A1

    Publication date: 2022-10-27

    Application number: US17230257

    Application date: 2021-04-14

    Abstract: Methods, computer readable media, and devices for estimating product attribute preferences are disclosed. One method may include identifying a set of users, a set of products offered to users of the set of users, and a set of product attributes associated with products in the set of products, creating a product embedding matrix, an attribute embedding matrix, a user interaction matrix, a product attribute matrix, and a user attribute matrix, assigning an attribute weight to each product attribute, assigning, for each user, a user attribute weight for each product attribute, and displaying the set of products to a user in a ranked order based on the attribute weights and the user attribute weights assigned to the user.
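As a rough illustration of the final ranking step, the sketch below orders products for one user. The scoring rule is an assumption: each product's score is the sum, over its attributes, of the global attribute weight times that user's attribute weight. The abstract does not specify how the two sets of weights are combined, and the function name `rank_products` and all sample data are hypothetical.

```python
from typing import Dict, List, Set


def rank_products(
    product_attributes: Dict[str, Set[str]],
    attribute_weights: Dict[str, float],
    user_attribute_weights: Dict[str, float],
) -> List[str]:
    """Return product ids sorted from highest to lowest score for one user."""

    def score(product: str) -> float:
        # Assumed rule: global weight times user weight, summed over attributes.
        return sum(
            attribute_weights.get(attr, 0.0) * user_attribute_weights.get(attr, 0.0)
            for attr in product_attributes[product]
        )

    return sorted(product_attributes, key=score, reverse=True)


# Made-up catalog and weights for illustration only.
product_attributes = {
    "shirt": {"cotton", "blue"},
    "jacket": {"wool", "blue"},
    "scarf": {"wool"},
}
attribute_weights = {"cotton": 1.0, "wool": 1.0, "blue": 0.5}
user_attribute_weights = {"cotton": 0.2, "wool": 0.9, "blue": 0.4}

ranking = rank_products(product_attributes, attribute_weights, user_attribute_weights)
print(ranking)  # ['jacket', 'scarf', 'shirt']
```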

    SYSTEMS AND METHODS OF ONTOLOGICAL MACHINE LEARNING FOR LABELING PRODUCTS IN AN ELECTRONIC PRODUCT CATALOG

    Publication number: US20210049664A1

    Publication date: 2021-02-18

    Application number: US16707441

    Application date: 2019-12-09

    Abstract: Systems and methods are provided for receiving, at a server, a selection of an anchor product from an electronic catalog stored in at least one storage device communicatively coupled to the server, and vectorizing at least one of text and images associated with the selected anchor product and other products in the catalog. At least one of keywords from text data and key images from image data may be determined for each product of the catalog. Vectors may be formed from at least one of the keywords and key images, and the separate vectors may be concatenated together to form final vectors for the products. A similarity search may be performed using the final vectors to determine a group of similar products from the vectorized products of the catalog. Selected products that are within a same slot as the anchor product may be labelled in batch.
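The vector concatenation and similarity search in this abstract might look like the sketch below. It assumes cosine similarity as the search metric and a brute-force scan of the catalog; the abstract names neither. The helpers `final_vector` and `similar_products` and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def final_vector(keyword_vec: Sequence[float], image_vec: Sequence[float]) -> List[float]:
    """Concatenate the keyword vector and the key-image vector for one product."""
    return list(keyword_vec) + list(image_vec)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_products(anchor_id: str, catalog: Dict[str, List[float]], top_k: int = 2) -> List[str]:
    """Return the top_k product ids most similar to the anchor product."""
    anchor = catalog[anchor_id]
    scored = [
        (cosine(anchor, vec), product_id)
        for product_id, vec in catalog.items()
        if product_id != anchor_id
    ]
    scored.sort(reverse=True)
    return [product_id for _, product_id in scored[:top_k]]


# Made-up keyword and image vectors for four products.
catalog = {
    "A": final_vector([1.0, 0.0], [1.0, 0.0]),
    "B": final_vector([1.0, 0.0], [0.9, 0.1]),
    "C": final_vector([0.0, 1.0], [0.0, 1.0]),
    "D": final_vector([1.0, 0.2], [0.8, 0.0]),
}

group = similar_products("A", catalog, top_k=2)
print(group)  # ['B', 'D']
```

The returned group is the candidate set that could then be labelled in batch; a production system would likely replace the linear scan with an approximate nearest-neighbor index.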

    Exceeding the limits of visual-linguistic multi-task learning

    Publication number: US11915471B2

    Publication date: 2024-02-27

    Application number: US17485985

    Application date: 2021-09-27

    CPC classification number: G06V10/811 G06V10/776 G06V30/194

    Abstract: Methods, computer readable media, and devices for exceeding the limits of visual-linguistic multi-task learning are disclosed. One method may include identifying a multi-modal multi-task classification dataset including a plurality of data examples, creating a transformer machine learning model to predict a plurality of categorical attributes of a product, and training the transformer machine learning model based on the multi-modal multi-task classification dataset using an alpha decay schedule and dynamically allocating task-specific parameters for at least one of the plurality of task-specific classification heads based on task complexity.
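The abstract does not define the alpha decay schedule or how task complexity is measured, so the following is only a sketch under two loud assumptions: alpha is treated as a scalar that decays linearly over training steps, and task complexity is approximated by the number of classes in each task. The names `alpha_at` and `allocate_head_widths`, and the idea of splitting a fixed parameter budget across heads, are hypothetical and are not taken from the patent.

```python
from typing import Dict


def alpha_at(step: int, total_steps: int, alpha_start: float = 1.0, alpha_end: float = 0.0) -> float:
    """Assumed linear decay of alpha from alpha_start to alpha_end over training."""
    fraction = min(max(step / total_steps, 0.0), 1.0)
    return alpha_start + (alpha_end - alpha_start) * fraction


def allocate_head_widths(task_num_classes: Dict[str, int], budget: int) -> Dict[str, int]:
    """Assumed rule: split a hidden-unit budget across heads in proportion to class count."""
    total = sum(task_num_classes.values())
    return {
        task: max(1, round(budget * n / total))
        for task, n in task_num_classes.items()
    }


# Made-up tasks: predicting a product's color (10 classes) and category (30 classes).
widths = allocate_head_widths({"color": 10, "category": 30}, budget=400)
print(widths)             # {'color': 100, 'category': 300}
print(alpha_at(50, 100))  # 0.5
```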

    Method and system utilizing ontological machine learning for labeling products in an electronic product catalog

    Publication number: US11361362B2

    Publication date: 2022-06-14

    Application number: US16707441

    Application date: 2019-12-09

    Abstract: Systems and methods are provided for receiving, at a server, a selection of an anchor product from an electronic catalog stored in at least one storage device communicatively coupled to the server, and vectorizing at least one of text and images associated with the selected anchor product and other products in the catalog. At least one of keywords from text data and key images from image data may be determined for each product of the catalog. Vectors may be formed from at least one of the keywords and key images, and the separate vectors may be concatenated together to form final vectors for the products. A similarity search may be performed using the final vectors to determine a group of similar products from the vectorized products of the catalog. Selected products that are within a same slot as the anchor product may be labelled in batch.

    SYSTEMS AND METHODS OF DATA AUGMENTATION FOR PRE-TRAINED EMBEDDINGS

    Publication number: US20210141995A1

    Publication date: 2021-05-13

    Application number: US16827830

    Application date: 2020-03-24

    Abstract: Systems and methods are provided for generating textual embeddings by tokenizing text data and generating vectors to be provided to a transformer system, where the textual embeddings are vector representations of semantic meanings of text that is part of the text data. The vectors may be averaged for every token of the generated textual embeddings, and the average output activations of two layers of the transformer system may be concatenated. Image embeddings may be generated with a convolutional neural network (CNN) from image data, wherein the image embeddings are vector representations of the images that are part of the image data. The textual embeddings and image embeddings may be combined to form combined embeddings to be provided to the transformer system.
