MEMORY-EFFICIENT DIFFERENTIABLE WEIGHT CLUSTERING FOR LARGE LANGUAGE MODEL COMPRESSION

    Publication number: US20250037018A1

    Publication date: 2025-01-30

    Application number: US18658919

    Application date: 2024-05-08

    Applicant: Apple Inc.

    Abstract: The subject technology provides memory-efficient differentiable weight clustering for large language model compression. An apparatus determines a tensor including an attention map between learned weights of a trained machine learning model and corresponding centroids. The apparatus also determines a compressed attention table and a plurality of index lists during compression of the trained machine learning model based on a uniquification of the attention map and sharding of an associated index list. The apparatus determines whether the tensor exists at a destination device during compression of the trained machine learning model using a marshaling layer. The apparatus refrains from copying the tensor to the destination device when the tensor exists at the destination device, or copies the tensor to the destination device when the tensor does not exist at the destination device. The apparatus deploys a compressed machine learning model based on the compression of the trained machine learning model.
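
The abstract names three mechanisms: uniquifying the attention map into a compact table, sharding the associated index list, and skipping a copy when the tensor already exists at the destination. The following is a minimal sketch of those ideas only, not the patented implementation. The names `uniquify`, `shard`, and `MarshalingLayer` are hypothetical, the attention map is modeled as a plain list of rows, and the "destination device" is modeled as an in-memory dictionary.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[float, ...]


def uniquify(attention_map: Sequence[Sequence[float]]) -> Tuple[List[Row], List[int]]:
    """Collapse duplicate rows into a table plus one index per original row."""
    table: List[Row] = []
    index_list: List[int] = []
    seen: Dict[Row, int] = {}
    for row in attention_map:
        key = tuple(row)
        if key not in seen:
            seen[key] = len(table)  # first time this row appears
            table.append(key)
        index_list.append(seen[key])
    return table, index_list


def shard(index_list: List[int], num_shards: int) -> List[List[int]]:
    """Split the index list into contiguous shards (at most num_shards of them)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    size = max(1, -(-len(index_list) // num_shards))  # ceiling division
    return [index_list[i:i + size] for i in range(0, len(index_list), size)]


class MarshalingLayer:
    """Hypothetical stand-in: copies a tensor only if the destination lacks it."""

    def __init__(self) -> None:
        self._on_device: Dict[str, list] = {}  # assumption: dict models the device
        self.copies = 0

    def to_device(self, name: str, tensor: Sequence) -> list:
        if name in self._on_device:
            return self._on_device[name]  # already present: refrain from copying
        self._on_device[name] = list(tensor)  # not present: copy once
        self.copies += 1
        return self._on_device[name]
```

Under these assumptions, each original row can be recovered as `table[i]` for `i` in the index list, so only the unique rows and the small integer indices need to be stored or moved.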

    SELECTIVELY USING SENSORS FOR CONTEXTUAL DATA

    Publication number: US20230199297A1

    Publication date: 2023-06-22

    Application number: US18112371

    Application date: 2023-02-21

    Applicant: Apple Inc.

    CPC classification number: H04N23/631 G10L15/1815 G06F3/013 H04N23/633

    Abstract: Systems and processes for operating a digital assistant are provided. An example process for determining a response includes, at an electronic device having one or more processors and memory, receiving a spoken input including a request, performing a semantic analysis on the spoken input, determining, based on the semantic analysis, a likelihood that the electronic device requires additional contextual data to satisfy the request, and in accordance with the determined likelihood exceeding a threshold, enabling a camera of the electronic device and determining a response to the request based on data captured by the camera of the electronic device.
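
The gating step in this abstract (enable the camera only when the estimated need for extra context exceeds a threshold) can be sketched as follows. This is an illustration under loud assumptions: the abstract does not say how the likelihood is computed, so `context_likelihood` uses a toy heuristic based on deictic words, and `respond`, `capture`, and the 0.5 threshold are hypothetical.

```python
from typing import Callable, Tuple

# Assumption: deictic words hint that the request refers to something visible.
DEICTIC_WORDS = {"this", "that", "these", "those", "here", "there"}


def context_likelihood(utterance: str) -> float:
    """Toy stand-in for semantic analysis: score in [0, 1]."""
    tokens = [t.strip(".,?!").lower() for t in utterance.split()]
    hits = sum(1 for t in tokens if t in DEICTIC_WORDS)
    return min(1.0, 0.6 * hits)


def respond(utterance: str, capture: Callable[[], str],
            threshold: float = 0.5) -> Tuple[bool, str]:
    """Call capture() only when the likelihood exceeds the threshold."""
    if context_likelihood(utterance) > threshold:
        frame = capture()  # camera is enabled only on this branch
        return True, f"Response based on camera data: {frame}"
    return False, "Response based on the spoken input alone"
```

The point of the design is that the sensor stays off by default: `capture` is never invoked for requests that the spoken input alone can satisfy.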

    DIGITAL ASSISTANT REFERENCE RESOLUTION

    Publication number: US20230046337A1

    Publication date: 2023-02-16

    Application number: US17402328

    Application date: 2021-08-13

    Applicant: Apple Inc.

    Abstract: Systems and processes for operating a digital assistant are provided. An example process for performing a task includes, at an electronic device having one or more processors and memory, receiving a spoken input including a request, receiving an image input including a plurality of objects, selecting a reference resolution module of a plurality of reference resolution modules based on the request and the image input, determining, with the selected reference resolution module, whether the request references a first object of the plurality of objects based on at least the spoken input, and in accordance with a determination that the request references the first object of the plurality of objects, determining a response to the request including information about the first object.
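
The selection step described above (choose one reference resolution module from several, based on the request and the image input, then ask it whether the request references an object) might look like the sketch below. The abstract does not name the modules, so the two resolvers here, one matching object labels and one matching "left"/"right", are purely hypothetical, as are `DetectedObject`, `select_module`, and `respond`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DetectedObject:
    label: str
    x_center: float  # assumption: 0.0 is the left edge, 1.0 the right edge


Resolver = Callable[[str, List[DetectedObject]], Optional[DetectedObject]]


def resolve_by_label(request: str, objects: List[DetectedObject]) -> Optional[DetectedObject]:
    """Return the first object whose label appears as a word in the request."""
    words = {w.strip(".,?!") for w in request.lower().split()}
    for obj in objects:
        if obj.label in words:
            return obj
    return None


def resolve_by_position(request: str, objects: List[DetectedObject]) -> Optional[DetectedObject]:
    """Return the leftmost or rightmost object when the request asks for it."""
    words = {w.strip(".,?!") for w in request.lower().split()}
    if not objects:
        return None
    if "left" in words:
        return min(objects, key=lambda o: o.x_center)
    if "right" in words:
        return max(objects, key=lambda o: o.x_center)
    return None


def select_module(request: str, objects: List[DetectedObject]) -> Resolver:
    """Pick a resolver from both inputs: spatial words plus several objects."""
    words = {w.strip(".,?!") for w in request.lower().split()}
    if len(objects) > 1 and words & {"left", "right"}:
        return resolve_by_position
    return resolve_by_label


def respond(request: str, objects: List[DetectedObject]) -> str:
    target = select_module(request, objects)(request, objects)
    if target is None:
        return "No referenced object found."
    return f"That is a {target.label}."
```

Keeping each resolver behind the same callable signature lets the selector swap strategies without the caller knowing which one ran.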
