-
Publication No.: US20250094775A1
Publication Date: 2025-03-20
Application No.: US18468574
Application Date: 2023-09-15
Applicant: QUALCOMM Incorporated
Inventors: Shaojie ZHUO, Ramchalam KINATTINKARA RAMAKRISHNAN, Xiaopeng ZHANG, Yicheng LIN, Chenzheng SU, Liang SHEN
IPC: G06N3/0455
Abstract: Cached decoding systems and techniques are described. A system (e.g., a decoder) receives an input token (e.g., an input vector). The system applies a projection tensor (e.g., a projection matrix) to the input token to generate a feature tensor (e.g., a key tensor or a value tensor). The system processes at least the feature tensor and at least one previous feature tensor using at least one attention calculation to generate an output token. The at least one previous feature tensor is retrieved from a buffer, in which it can be stored after having been calculated by applying the projection tensor to a previous input token (e.g., an input token from an iteration preceding the one in which the current input token is received).
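The buffering idea in this abstract resembles what is commonly called key/value (KV) caching. The sketch below is a minimal, hypothetical illustration, not the patented implementation: it assumes single-head scaled dot-product attention, a separate query projection, and plain Python lists in place of tensors. All names (`CachedDecoder`, `key_buffer`, `uncached_output`, etc.) are illustrative. Each step projects only the new token, appends the resulting key and value to a buffer, and attends over everything buffered so far; `uncached_output` recomputes the same result from scratch for comparison.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    """Apply a projection matrix (one row per output feature) to a token vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one query (assumed form)."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class CachedDecoder:
    """Hypothetical decoder step that buffers key/value feature tensors across iterations."""

    def __init__(self, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> None:
        self.w_q, self.w_k, self.w_v = w_q, w_k, w_v
        self.key_buffer: List[Vector] = []
        self.value_buffer: List[Vector] = []

    def step(self, token: Vector) -> Vector:
        # Project only the new input token; earlier tokens are never re-projected.
        query = matvec(self.w_q, token)
        key = matvec(self.w_k, token)
        value = matvec(self.w_v, token)
        # Store the new feature tensors so that later iterations can retrieve them.
        self.key_buffer.append(key)
        self.value_buffer.append(value)
        # Attend over the previously buffered tensors plus the current ones.
        return attend(query, self.key_buffer, self.value_buffer)


def uncached_output(tokens: List[Vector], w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Vector:
    """Reference: re-project every token from scratch to get the last output."""
    query = matvec(w_q, tokens[-1])
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    return attend(query, keys, values)
```

Feeding a sequence of tokens through `CachedDecoder.step` yields the same outputs as `uncached_output` on each growing prefix, but each iteration performs one key projection and one value projection instead of one per token seen so far.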
-
Publication No.: US20240152726A1
Publication Date: 2024-05-09
Application No.: US18363487
Application Date: 2023-08-01
Applicant: QUALCOMM Incorporated
Inventors: Chen FENG, Xiaopeng ZHANG, Shaojie ZHUO, Ramchalam KINATTINKARA RAMAKRISHNAN, Chenzheng SU, Liang SHEN, Zi Wen HAN, Yicheng LIN
Abstract: A processor-implemented method for a neural architecture search (NAS) starts by generating an over-parameterized super network having multiple layers and multiple operator types. Each of the layers includes a largest super kernel corresponding to a search space. The method also includes performing gradient descent to evolve the largest super kernel into a small kernel corresponding to the search space, in order to generate a range of kernel encodings. The method further includes identifying, for each layer of the super network, a subset of kernel encodings from the range of kernel encodings based on the gradient descent. The method determines a set of candidate architectures based on the subset of kernel encodings, each of the candidate architectures having a different model size. The method then selects, from the set of candidate architectures, a target model that meets hardware specifications, and applies the target model.
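The last two steps of this abstract (building candidate architectures from per-layer kernel subsets, then choosing one that meets a hardware specification) can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the patented method: the gradient-descent search that produces the per-layer subsets is not shown (the subsets are passed in directly), kernel encodings are represented as plain kernel sizes, model size is approximated by a made-up depthwise-kernel parameter count, the hardware specification is reduced to a single parameter budget, and the "largest model that fits" rule is an assumed selection criterion.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

Arch = Tuple[int, ...]  # one kernel size per layer


def model_size(arch: Arch, channels: Sequence[int]) -> int:
    """Hypothetical size model: one depthwise k x k kernel per channel in each layer."""
    return sum(k * k * c for k, c in zip(arch, channels))


def candidate_architectures(
    layer_subsets: Sequence[Sequence[int]], channels: Sequence[int]
) -> Dict[int, Arch]:
    """Combine the per-layer kernel subsets, keeping one architecture per model size."""
    by_size: Dict[int, Arch] = {}
    for arch in product(*layer_subsets):
        by_size.setdefault(model_size(arch, channels), arch)
    return by_size


def select_target(
    layer_subsets: Sequence[Sequence[int]], channels: Sequence[int], max_params: int
) -> Arch:
    """Pick the largest candidate whose size fits the (assumed) hardware budget."""
    candidates = candidate_architectures(layer_subsets, channels)
    feasible = [size for size in candidates if size <= max_params]
    if not feasible:
        raise ValueError("no candidate architecture meets the hardware budget")
    return candidates[max(feasible)]
```

For example, with hypothetical per-layer subsets `[[3, 5], [3], [5, 7]]` and channel counts `[16, 32, 64]`, the four candidates have sizes 2032, 3568, 2288 and 3824, so a budget of 3000 selects `(5, 3, 5)`. Keying candidates by size mirrors the abstract's statement that each candidate has a different model size.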
-