EFFICIENT ON-DEVICE TRANSFORMER ARCHITECTURE FOR IMAGE PROCESSING

    Publication No.: US20240414448A1

    Publication Date: 2024-12-12

    Application No.: US18515732

    Application Date: 2023-11-21

    Abstract: Provided is a U-shaped network for image restoration. The U-shaped network is lightweight, is built around a transformer block, and is suitable for on-device deployment, such as on a smartphone. The U-shaped network uses the transformer block to implement encoder, decoder and bottleneck functions. Decoders are connected to respective encoders using skip connections based on element-wise addition, which avoids the dimension expansion of concatenation. The transformer block uses a configuration of scaling and pool mixing to process input image data without the need for self-attention computations, which permits reductions in memory, latency, and computational demand while maintaining good output image quality.
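
    The following is a minimal PyTorch sketch of how a pooling-based token mixer with learned scaling and a small U-shaped network with additive skip connections might look. The module names (PoolMixerBlock, TinyUNet), layer sizes, and the exact mixing formula are illustrative assumptions, not the patented design.

    import torch
    import torch.nn as nn

    class PoolMixerBlock(nn.Module):
        """Transformer-style block whose token mixing is average pooling plus a
        learned per-channel scale, instead of self-attention (assumed form)."""
        def __init__(self, channels: int, pool_size: int = 3):
            super().__init__()
            self.norm1 = nn.GroupNorm(1, channels)
            self.pool = nn.AvgPool2d(pool_size, stride=1, padding=pool_size // 2)
            self.scale = nn.Parameter(torch.ones(1, channels, 1, 1))
            self.norm2 = nn.GroupNorm(1, channels)
            self.mlp = nn.Sequential(
                nn.Conv2d(channels, 2 * channels, 1), nn.GELU(),
                nn.Conv2d(2 * channels, channels, 1))

        def forward(self, x):
            n = self.norm1(x)
            x = x + self.scale * (self.pool(n) - n)   # pool mixing with scaling
            return x + self.mlp(self.norm2(x))        # channel MLP

    class TinyUNet(nn.Module):
        """One encoder stage, one bottleneck, one decoder stage; the skip
        connection uses element-wise addition, so channels do not expand."""
        def __init__(self, channels: int = 32):
            super().__init__()
            self.stem = nn.Conv2d(3, channels, 3, padding=1)
            self.encoder = PoolMixerBlock(channels)
            self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            self.bottleneck = PoolMixerBlock(channels)
            self.up = nn.ConvTranspose2d(channels, channels, 2, stride=2)
            self.decoder = PoolMixerBlock(channels)
            self.head = nn.Conv2d(channels, 3, 3, padding=1)

        def forward(self, x):
            e = self.encoder(self.stem(x))
            b = self.bottleneck(self.down(e))
            d = self.decoder(self.up(b) + e)          # additive skip connection
            return self.head(d)

    print(TinyUNet()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])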

    METHOD AND SYSTEM FOR WEIGHTED KNOWLEDGE DISTILLATION BETWEEN NEURAL NETWORK MODELS

    Publication No.: US20220398459A1

    Publication Date: 2022-12-15

    Application No.: US17835457

    Application Date: 2022-06-08

    Abstract: A method of training a student model includes: providing an input to a teacher model that is larger than the student model, where a layer of the teacher model outputs a first output vector; providing the input to the student model, where a layer of the student model outputs a second output vector; determining an importance value associated with each dimension of the first output vector based on gradients from the teacher model; and updating at least one parameter of the student model to minimize a difference between the second output vector and the first output vector based on the importance values.
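
    Below is a hedged PyTorch sketch of the described distillation idea: per-dimension importance weights are derived from teacher gradients and used to weight the student-to-teacher output distance. The toy models, the gradient-magnitude importance measure, and the squared-error form are assumptions for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical tiny teacher and student; sizes are illustrative only.
    teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
    student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))

    x = torch.randn(8, 16)                 # a batch of inputs
    labels = torch.randint(0, 10, (8,))

    # 1) Teacher forward pass; gradients of the task loss w.r.t. its output.
    t_out = teacher(x)
    grads = torch.autograd.grad(F.cross_entropy(t_out, labels), t_out)[0]

    # 2) Per-dimension importance from gradient magnitude (one possible choice).
    importance = grads.abs().mean(dim=0)
    importance = importance / importance.sum()

    # 3) Importance-weighted distance between student and teacher outputs.
    s_out = student(x)
    distill_loss = (importance * (s_out - t_out.detach()).pow(2)).mean()

    opt = torch.optim.SGD(student.parameters(), lr=1e-2)
    opt.zero_grad()
    distill_loss.backward()
    opt.step()
    print(float(distill_loss))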

    ENABLING DEVICE CONTROL PLANNING CAPABILITIES OF SMALL LANGUAGE MODEL

    Publication No.: US20250094820A1

    Publication Date: 2025-03-20

    Application No.: US18824166

    Application Date: 2024-09-04

    Abstract: Disclosed is a method for enabling an improved device control capability of a small language model (SLM) that is transferrable to a hub device configured to be operated by a user in an environment. The method includes performing fine-tuning of the SLM based on a data set including base plans and contrastive plans; generating computer code corresponding to the fine-tuned SLM; and transferring the generated computer code to the hub device to be connected with a group of electronic devices in the environment.
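
    As a rough illustration, the fine-tuning data set might pair each instruction with a base (correct) plan and a contrastive (deliberately wrong) plan. The record fields, device identifiers, and JSON-lines layout below are hypothetical; the abstract does not specify a data format.

    import json

    # Each record pairs an instruction with a base plan and a contrastive plan;
    # all field names and device identifiers are hypothetical.
    records = [
        {
            "instruction": "Turn off the living room lights and lock the front door.",
            "base_plan": [
                {"device": "light.living_room", "action": "turn_off"},
                {"device": "lock.front_door", "action": "lock"},
            ],
            "contrastive_plan": [
                {"device": "light.bedroom", "action": "turn_off"},   # wrong device
                {"device": "lock.front_door", "action": "unlock"},   # wrong action
            ],
        },
    ]

    # Write the records in the JSON-lines form many fine-tuning pipelines accept.
    with open("plan_finetune.jsonl", "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    print(open("plan_finetune.jsonl").read())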

    SEMI-SUPERVISED AND ROBUST MULTISPECTRAL VIDEO SEMANTIC SEGMENTATION SYSTEM

    Publication No.: US20250166339A1

    Publication Date: 2025-05-22

    Application No.: US18955003

    Application Date: 2024-11-21

    Abstract: A method includes generating a pair of features of a current frame by inputting the current frame into a first cross-collaborative consistency learning (C3L) model, the current frame comprising a red-green-blue (RGB) image and a thermal image; generating a pair of denoised features by inputting the pair of features of the current frame and one or more pairs of features of past frames into a denoised memory read (DMR) model; generating an updated pair of denoised features by inputting the pair of denoised features into a second C3L model, the updated pair of denoised features comprising an updated RGB image feature and an updated thermal feature; and generating a segmentation mask by inputting the updated pair of denoised features into a segmentation head.
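
    The following PyTorch sketch traces only the data flow named in the abstract: a first C3L pass over the RGB and thermal inputs, a denoised memory read over past-frame features, a second C3L pass, and a segmentation head. The C3LStub and DMRStub modules are placeholders (the abstract does not describe their internals), and all channel sizes are assumptions.

    import torch
    import torch.nn as nn

    class C3LStub(nn.Module):
        """Placeholder for a cross-collaborative consistency learning model:
        returns one feature map per modality (internals are not specified)."""
        def __init__(self, rgb_in=3, thermal_in=1, channels=16):
            super().__init__()
            self.rgb_net = nn.Conv2d(rgb_in, channels, 3, padding=1)
            self.thermal_net = nn.Conv2d(thermal_in, channels, 3, padding=1)

        def forward(self, rgb, thermal):
            return self.rgb_net(rgb), self.thermal_net(thermal)

    class DMRStub(nn.Module):
        """Placeholder denoised memory read: fuses current features with
        past-frame features (a plain average stands in for the real module)."""
        def forward(self, current, past):
            rgb_f, th_f = current
            rgb_d = torch.stack([p[0] for p in past] + [rgb_f]).mean(0)
            th_d = torch.stack([p[1] for p in past] + [th_f]).mean(0)
            return rgb_d, th_d

    class SegPipeline(nn.Module):
        def __init__(self, channels=16, num_classes=5):
            super().__init__()
            self.c3l_first = C3LStub(3, 1, channels)
            self.dmr = DMRStub()
            self.c3l_second = C3LStub(channels, channels, channels)
            self.head = nn.Conv2d(2 * channels, num_classes, 1)

        def forward(self, rgb, thermal, past_features):
            feats = self.c3l_first(rgb, thermal)            # pair of features
            denoised = self.dmr(feats, past_features)       # denoised pair
            rgb_u, th_u = self.c3l_second(*denoised)        # updated pair
            return self.head(torch.cat([rgb_u, th_u], 1))   # segmentation mask

    rgb = torch.randn(1, 3, 64, 64)
    thermal = torch.randn(1, 1, 64, 64)
    past = [(torch.randn(1, 16, 64, 64), torch.randn(1, 16, 64, 64))]
    print(SegPipeline()(rgb, thermal, past).shape)  # torch.Size([1, 5, 64, 64])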

    APPARATUS AND METHOD FOR SHARING AND PRUNING WEIGHTS FOR VISION AND LANGUAGE MODELS

    Publication No.: US20240119077A1

    Publication Date: 2024-04-11

    Application No.: US18368353

    Application Date: 2023-09-14

    CPC classification number: G06F16/334 G06F16/5846 G06N3/0985

    Abstract: A method of performing multimodal tasks by using a multimodal model that includes a text encoder and a vision encoder may include: obtaining a text feature from a query via the text encoder; obtaining an image feature from one or more input images via the vision encoder; and outputting a response to the query based on similarity between the text feature and the image feature, wherein weight vectors of the text encoder and the vision encoder are pruned and shared according to a sharing vector and a pruning vector that are generated by a hypernetwork, and wherein the hypernetwork and the multimodal model are jointly trained to minimize at least one of a difference between the weight vectors in the text encoder and the vision encoder, a difference between the weight vectors in different layers of the text encoder, and a number of parameters in the multimodal model.
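
    A minimal sketch of the sharing and pruning mechanism follows, assuming single-layer stand-ins for the encoders: a small hypernetwork emits a sharing vector and a pruning vector, the sharing vector interpolates vision-encoder weights toward text-encoder weights, the pruning vector masks rows, and a joint loss combines the weight difference with a sparsity term. All shapes and the specific loss form are illustrative, not taken from the patent.

    import torch
    import torch.nn as nn

    dim = 8  # illustrative hidden size

    # Single weight matrices stand in for the text and vision encoders.
    text_w = nn.Parameter(torch.randn(dim, dim))
    vision_w = nn.Parameter(torch.randn(dim, dim))

    # Hypernetwork emits a sharing vector and a pruning vector in [0, 1];
    # its input is just a learned embedding in this sketch.
    hyper = nn.Sequential(nn.Linear(4, 16), nn.ReLU(),
                          nn.Linear(16, 2 * dim), nn.Sigmoid())
    embed = nn.Parameter(torch.randn(4))

    share, prune = hyper(embed).split(dim)   # each has shape (dim,)

    # Sharing: interpolate vision rows toward the corresponding text rows.
    shared_vision = share.unsqueeze(1) * text_w + (1 - share).unsqueeze(1) * vision_w
    # Pruning: soft row masks that drive parameters toward zero.
    pruned_text = prune.unsqueeze(1) * text_w
    pruned_vision = prune.unsqueeze(1) * shared_vision

    # Joint objective: encoder weight difference plus a parameter-count proxy.
    loss = (pruned_text - pruned_vision).pow(2).mean() + 0.01 * prune.sum()
    loss.backward()
    print(float(loss))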
