SYSTEMS AND METHODS FOR STATIC CACHED DECODING

    Publication No.: US20250094775A1

    Publication date: 2025-03-20

    Application No.: US18468574

    Filing date: 2023-09-15

    Abstract: Cached decoding systems and techniques are described. A system (e.g., decoder) receives an input token (e.g., input vector). The system applies a projection tensor (e.g., a projection matrix) to the input token to generate a feature tensor (e.g., a key tensor or a value tensor). The system processes at least the feature tensor and at least one previous feature tensor using at least one attention calculation to generate an output token. The at least one previous feature tensor is retrieved from a buffer. The at least one previous feature tensor can be stored in the buffer after having been previously calculated based on application of the projection tensor to a previous input token (e.g., from a previous iteration before the iteration in which the input token is received).
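
The flow in this abstract (project the new token, read earlier projections from a buffer, run attention) can be sketched roughly as below. This is a minimal illustration, not the claimed implementation: the class name `CachedAttentionDecoder`, the helper `matvec`, the single attention head, the query projection, and the scaled dot-product softmax are all assumptions that the abstract does not specify. The buffer here is a growing Python list, so the fixed-size ("static") buffer suggested by the title is not modeled.

```python
import math


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class CachedAttentionDecoder:
    """Hypothetical single-head attention step backed by a key/value buffer."""

    def __init__(self, w_query, w_key, w_value):
        # Projection matrices; names and shapes are assumptions for illustration.
        self.w_query, self.w_key, self.w_value = w_query, w_key, w_value
        self.key_buffer = []    # key tensors from previous iterations
        self.value_buffer = []  # value tensors from previous iterations

    def step(self, token):
        # Project only the new token; earlier projections are reused from the buffer.
        query = matvec(self.w_query, token)
        self.key_buffer.append(matvec(self.w_key, token))
        self.value_buffer.append(matvec(self.w_value, token))

        # Scaled dot-product scores against the current and all buffered keys.
        scale = math.sqrt(len(query))
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in self.key_buffer
        ]

        # Numerically stable softmax over the scores.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]

        # The output token is the attention-weighted sum of the value tensors.
        dim = len(self.value_buffer[0])
        return [
            sum(w * value[i] for w, value in zip(weights, self.value_buffer))
            for i in range(dim)
        ]
```

Calling `step` once per input token performs three matrix-vector products per iteration regardless of sequence length, because the keys and values of earlier tokens are never recomputed.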

    SINGLE SEARCH FOR ARCHITECTURES ON EMBEDDED DEVICES

    Publication No.: US20240152726A1

    Publication date: 2024-05-09

    Application No.: US18363487

    Filing date: 2023-08-01

    CPC classification number: G06N3/04 G06N3/084

    Abstract: A processor-implemented method for a neural architecture search (NAS) starts by generating an over-parameterized super network having multiple layers. The super network has multiple operator types. Each of the layers includes a largest super kernel corresponding to a search space. The method also includes performing gradient descent to evolve a largest super kernel to a small kernel corresponding to the search space in order to generate a range of kernel encodings. The method further includes identifying a subset of kernel encodings from the range of kernel encodings, for each layer of the super network, based on the gradient descent. The method determines a set of candidate architectures based on the subset of kernel encodings, each of the candidate architectures having a different model size. The method selects a target model, from the set of architectures, based on meeting hardware specifications, and then applies the target model.
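
The last two steps of this abstract (building candidate architectures from per-layer kernel encodings, then selecting one that meets hardware specifications) can be sketched as below. This covers only the enumeration and selection stages; the super network and the gradient descent that produces the kernel encodings are not modeled. The function names, the parameter-count cost model, and the use of a parameter budget as the hardware specification are assumptions made for illustration.

```python
from itertools import product


def candidate_architectures(kernel_options, channels):
    """Enumerate one kernel size per layer and estimate each model's size.

    kernel_options: per-layer tuples of kernel sizes that survived the search.
    channels: per-layer channel widths (hypothetical, equal in and out).
    """
    candidates = []
    for kernels in product(*kernel_options):
        # Assumed cost model: square convolution kernels, no biases.
        params = sum(k * k * c * c for k, c in zip(kernels, channels))
        candidates.append({"kernels": kernels, "params": params})
    return candidates


def select_target(candidates, max_params):
    """Return the largest candidate within the budget, or None if none fits."""
    feasible = [c for c in candidates if c["params"] <= max_params]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c["params"])
```

For example, `select_target(candidate_architectures([(3, 5), (3, 5)], [1, 2]), max_params=100)` considers four two-layer candidates and keeps the largest one whose estimated parameter count stays within the budget. A real hardware specification would more likely combine latency, memory, and energy limits measured on the target device.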
