-
公开(公告)号:US20240095541A1
公开(公告)日:2024-03-21
申请号:US17946409
申请日:2022-09-16
Applicant: Apple Inc.
Inventor: Sayyed Karen Khatamifard , Thomas G. Anderl , Alexander J. Kirchhoff , Keith Wyss , Dylan H. Rush , Chenfan Sun , Jeffrey D Marker
Abstract: Embodiments relate to compiling neural network operations into tasks that may be performed in a streaming manner by a neural processor. In a streaming operation, a tensor is spatially partitioned, and tasks associated two or more layers of the neural network are performed simultaneously in an overlapping manner. To enable efficient memory usage during streaming operation, a subset of the tasks having completion times close in time are assigned to a same portion of memory in the neural processor during a compilation process. After the tasks assigned to the same portion of the memory is finished, the portion of the memory may be flushed to make space for subsequent tasks. Multiple tasks may also be coalesced into a single task to reduce the number of tasks and more efficiently perform the operations at the neural processor.
-
公开(公告)号:US20230368008A1
公开(公告)日:2023-11-16
申请号:US17745032
申请日:2022-05-16
Applicant: Apple Inc.
Inventor: Sayyed Karen Khatamifard , Alexander J. Kirchhoff , Rohit K. Gupta , Jeffrey D. Marker , Thomas G. Anderl , Saman Naderiparizi , Chenfan Sun , Alon Yaakov , Husam Khashiboun , Ramana V. Rachakonda
IPC: G06N3/063
CPC classification number: G06N3/063
Abstract: Embodiments relate to streaming operations in a neural processor circuit that includes a neural engine circuit and a data processor circuit. The neural engine circuit performs first operations on a first input tensor of a first layer to generate a first output tensor, and second operations on a second input tensor of a second layer at a higher hierarchy than the first layer, the second input tensor corresponding to the first output tensor. The data processor circuit stores a portion of the first input tensor for access by the neural engine circuit to perform a subset of the first operations and generate a portion of the first output tensor. The data processor circuit stores the portion of the first output tensor for access by the neural engine circuit as a portion of the second input tensor to perform a subset of the second operations.
-