SCALABLE KNOWLEDGE DISTILLATION TECHNIQUES FOR MACHINE LEARNING

    Publication No.: US20230401831A1

    Publication date: 2023-12-14

    Application No.: US17837636

    Application date: 2022-06-10

    CPC classification number: G06V10/776 G06V10/7747 G06V10/82 G06N3/0454

    Abstract: A data processing system implements a dynamic knowledge distillation process including dividing training data into a plurality of batches of samples and distilling a student model from a teacher model using an iterative knowledge distillation. The process includes instantiating an instance of the teacher model and the student model in a memory of the data processing system and obtaining a respective batch of training data from the plurality of batches of samples in the memory. The process includes training the teacher and student models using each of the samples in the respective batch of the training data, evaluating the performance of the student model compared with the performance of the teacher model, and providing feedback to the student model to adjust the behavior of the student model based on the performance of the student model.
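
The batch-wise loop described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the teacher here is a frozen, hand-set scorer (the abstract trains both models), the student is a small linear softmax model, and the temperature-scaled KL divergence stands in for the "evaluation" step, with a gradient update as the "feedback". All names (`teacher_logits`, `Student`, `distill`, `mean_kl`) and hyperparameters are assumptions made for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q): how far the student distribution q is from the teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def teacher_logits(x):
    # Hypothetical frozen teacher: a hand-set linear scorer over two features.
    return [2.0 * x[0] - x[1], x[1] - 2.0 * x[0]]


class Student:
    """Small linear model that learns to imitate the teacher's outputs."""

    def __init__(self, n_features, n_classes):
        self.w = [[0.0] * n_features for _ in range(n_classes)]

    def logits(self, x):
        return [sum(wj * xj for wj, xj in zip(row, x)) for row in self.w]


def make_batches(samples, batch_size):
    """Divide the training data into consecutive batches of samples."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def mean_kl(student, samples, temperature):
    """Evaluate the student against the teacher over a set of samples."""
    total = 0.0
    for x in samples:
        p = softmax(teacher_logits(x), temperature)
        q = softmax(student.logits(x), temperature)
        total += kl_divergence(p, q)
    return total / len(samples)


def distill(student, batches, temperature=2.0, lr=1.0, epochs=30):
    """Iterate over batches, comparing student to teacher and updating the student."""
    for _ in range(epochs):
        for batch in batches:
            grads = [[0.0] * len(row) for row in student.w]
            for x in batch:
                p = softmax(teacher_logits(x), temperature)
                q = softmax(student.logits(x), temperature)
                # Gradient of KL(p || q) w.r.t. student logit k is (q_k - p_k) / T.
                for k in range(len(q)):
                    err = (q[k] - p[k]) / temperature
                    for j, xj in enumerate(x):
                        grads[k][j] += err * xj
            # Feedback step: move the student weights against the batch gradient.
            for k, row in enumerate(student.w):
                for j in range(len(row)):
                    row[j] -= lr * grads[k][j] / len(batch)
    return student


if __name__ == "__main__":
    random.seed(0)
    data = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(32)]
    model = Student(n_features=2, n_classes=2)
    print("KL before:", mean_kl(model, data, 2.0))
    distill(model, make_batches(data, batch_size=8))
    print("KL after: ", mean_kl(model, data, 2.0))
```

Holding only one batch in memory at a time is what makes this kind of loop attractive for large training sets; the sketch keeps everything in a list purely for brevity.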

    SCALABLE RETRIEVAL SYSTEM FOR SUGGESTING TEXTUAL CONTENT

    Publication No.: US20230161825A1

    Publication date: 2023-05-25

    Application No.: US17530982

    Application date: 2021-11-19

    CPC classification number: G06F16/953 G06N20/00

    Abstract: A data processing system implements receiving query text for a search query for textual content recommendation. The query text includes one or more words indicating a type of textual content items being sought. The system implements analyzing the query text using a first machine learning (ML) model to obtain encoded query text, where the first ML model is trained to identify features within the query text and to generate the encoded query text by mapping the features to a hyper-dimensional latent space (HDLS). The system implements identifying one or more content items in a database of encoded content items mapped to the HDLS that satisfy the search query by comparing attributes of the encoded query text with attributes of the encoded content items to identify content items that are closest to the encoded query text within the HDLS, and causing the one or more content items to be displayed.
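
The retrieval step in this abstract (encode the query, then find the encoded content items closest to it in the latent space) can be sketched as follows. This is an illustration under stated assumptions, not the patented system: a bag-of-words vector over a corpus vocabulary stands in for the trained ML encoder, cosine similarity is assumed as the closeness measure, and `build_vocab`, `encode`, `build_index`, and `search` are hypothetical names.

```python
import math


def build_vocab(texts):
    """Assign each distinct word in the corpus one dimension of the vector space."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    # Stand-in for the trained encoder: a unit-length bag-of-words vector.
    vec = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def build_index(items, vocab):
    """Pre-encode every content item once so queries only encode the query text."""
    return [(item, encode(item, vocab)) for item in items]


def search(query, index, vocab, top_k=3):
    """Return the top_k content items closest to the encoded query."""
    q = encode(query, vocab)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [item for item, _ in ranked[:top_k]]


if __name__ == "__main__":
    items = [
        "quarterly sales report template",
        "birthday party invitation",
        "sales meeting agenda",
    ]
    vocab = build_vocab(items)
    index = build_index(items, vocab)
    print(search("birthday invitation", index, vocab, top_k=1))
```

Encoding the content items ahead of time, as `build_index` does, is the part that lets such a system scale: at query time only the query is encoded, and a production system would likely replace the linear scan in `search` with an approximate nearest-neighbor index.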

    Machine learning-powered framework to transform overloaded text documents

    Publication No.: US11423207B1

    Publication date: 2022-08-23

    Application No.: US17355673

    Application date: 2021-06-23

    Inventor: Ji Li

    Abstract: Systems and methods for providing a machine learning-powered framework to transform overloaded text documents are provided. The system generates a plurality of candidate templates offline. During runtime, the system accesses a text document and analyzes the text document to identify segmentation data. The segmentation data can indicate a plurality of segments derived from the text document. The system then accesses a plurality of candidate templates, whereby each candidate template comprises a plurality of pages having a different background element that shares a common theme. The plurality of candidate templates are ranked based on at least the segmentation data. The network then generates multiple presentation pages for each of a predetermined number of top ranked candidate templates by incorporating each of the plurality of segments into a corresponding page of the plurality of pages for each of the top ranked candidate templates. The multiple presentation pages are presented for each of the top ranked candidate templates as a recommendation.
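
The runtime pipeline in this abstract (segment the document, rank candidate templates using the segmentation data, then place each segment on a page of the top-ranked templates) can be sketched roughly as below. The abstract does not state how segments are found or how templates are scored, so both rules here are assumptions: segments are split on blank lines, and templates are ranked by how closely their page count matches the number of segments. `Template`, `segment`, `rank_templates`, and `build_presentations` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Template:
    name: str
    # One background element per page; all pages share a common theme.
    # Must be non-empty, since pages are reused cyclically below.
    page_backgrounds: list


def segment(text):
    # Assumption: blank lines separate the segments of an overloaded document.
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def rank_templates(templates, segments):
    # Hypothetical score: prefer templates whose page count is closest
    # to the number of segments. sorted() is stable, so ties keep input order.
    return sorted(
        templates,
        key=lambda t: abs(len(t.page_backgrounds) - len(segments)),
    )


def build_presentations(text, templates, top_n=2):
    """Generate one set of presentation pages per top-ranked template."""
    segments = segment(text)
    presentations = {}
    for template in rank_templates(templates, segments)[:top_n]:
        pages = []
        for i, seg in enumerate(segments):
            # Reuse backgrounds cyclically if there are more segments than pages.
            background = template.page_backgrounds[i % len(template.page_backgrounds)]
            pages.append({"background": background, "text": seg})
        presentations[template.name] = pages
    return presentations


if __name__ == "__main__":
    doc = "Goals for Q3\n\nBudget overview\n\nNext steps"
    candidates = [
        Template("ocean", ["wave-1", "wave-2", "wave-3"]),
        Template("forest", ["leaf-1", "leaf-2", "leaf-3", "leaf-4", "leaf-5"]),
        Template("mono", ["grey-1"]),
    ]
    for name, pages in build_presentations(doc, candidates).items():
        print(name, [page["background"] for page in pages])
```

Because the candidate templates are generated offline, the runtime work in a design like this reduces to segmentation, a ranking pass, and page assembly.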

    IMAGE CLASSIFICATION MODELING WHILE MAINTAINING DATA PRIVACY COMPLIANCE

    Publication No.: US20200265153A1

    Publication date: 2020-08-20

    Application No.: US16276908

    Application date: 2019-02-15

    Abstract: The present disclosure relates to processing operations that execute image classification training for domain-specific traffic, where training operations are entirely compliant with data privacy regulations and policies. Image classification model training, as described herein, is configured to classify meaningful image categories in domain-specific scenarios where there is unknown data traffic and strict data compliance requirements that result in privacy-limited image data sets. Iterative image classification training satisfies data compliance requirements through a combination of online image classification training and offline image classification training. This results in tuned image recognition classifiers that have improved accuracy and efficiency over general image recognition classifiers when working with domain-specific data traffic. One or more image recognition classifiers are independently trained and tuned to detect an image class for image classification. Training of independent image recognition classifiers is also utilized for training and tuning of deeper learning models for image classification.
