-
Publication No.: US20240329933A1
Publication Date: 2024-10-03
Application No.: US18127650
Filing Date: 2023-03-28
Applicant: Apple Inc.
Inventor: Lei Wang , Jaewon Shin , Seungjin Lee , Ji Liang Song , Michael L. Liu , Christopher L. Mills
CPC classification number: G06F7/5443 , G06N3/063
Abstract: Embodiments of the present disclosure relate to a multiply-accumulator circuit that includes a main multiplier circuit operable in a floating-point mode or an integer mode and a supplemental multiplier circuit that operates in the integer mode. The main multiplier circuit generates a multiplied output that undergoes subsequent operations, including a shifting operation in the floating-point mode, whereas the supplemental multiplier circuit generates another multiplied output that does not undergo any shifting operations. Hence, in the integer mode, two multiply-add operations may be performed in parallel by the two multiplier circuits, accelerating the multiply-add operations. Because no additional shifters are associated with the supplemental multiplier circuit, the multiply-accumulator circuit does not have a significantly increased footprint.
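A minimal behavioral sketch of the two multiplier paths may help; the mode flags, operand names, and plain-integer modeling below are illustrative assumptions, not the patent's actual circuit interface:

```python
# Behavioral model: one accumulate cycle of the dual-mode MAC.
MODE_FP, MODE_INT = "fp", "int"

def mac_cycle(acc, a0, b0, a1, b1, mode, align_shift=0):
    main_product = a0 * b0                     # main multiplier circuit
    if mode == MODE_FP:
        # Floating-point mode: only the main path runs, and its product
        # undergoes a shifting operation for binary-point alignment.
        return acc + (main_product << align_shift)
    # Integer mode: the supplemental multiplier circuit also runs, and
    # its product reaches the accumulator without any shifting.
    supplemental_product = a1 * b1
    return acc + main_product + supplemental_product

acc = mac_cycle(0, 3, 4, 5, 6, MODE_INT)       # 3*4 + 5*6 = 42, one cycle
```

In integer mode the cycle folds two multiply-add results into the accumulator at once, which is the source of the claimed speedup.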
-
Publication No.: US12079724B2
Publication Date: 2024-09-03
Application No.: US18484203
Filing Date: 2023-10-10
Applicant: Apple Inc.
Inventor: Christopher L. Mills
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: Embodiments of the present disclosure relate to a texture unit circuit in a neural processor circuit. The neural processor circuit includes a tensor access operation circuit with the texture unit circuit, a data processor circuit, and at least one neural engine circuit. The texture unit circuit fetches a source tensor from a system memory by referencing an index tensor in the system memory representing indexing information into the source tensor. The data processor circuit stores an output version of the source tensor obtained from the tensor access operation circuit and sends the output version of the source tensor as multiple units of input data to the at least one neural engine circuit. The at least one neural engine circuit performs at least convolution operations on the units of input data and at least one kernel to generate output data.
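A minimal sketch of the gather-style fetch, using NumPy in place of the texture unit circuit; the shapes and values are illustrative assumptions:

```python
import numpy as np

# Source tensor in "system memory" and an index tensor holding
# indexing information into it.
source = np.arange(16.0).reshape(4, 4)
index = np.array([[3, 0], [1, 2]])

# Fetch the source tensor by referencing the index tensor; the result
# stands in for the output version sent on to the neural engine
# circuits as units of input data.
output = source[index]        # shape (2, 2, 4): rows 3, 0, then 1, 2
```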
-
Publication No.: US20230169308A1
Publication Date: 2023-06-01
Application No.: US18095960
Filing Date: 2023-01-11
Applicant: Apple Inc.
Inventor: Christopher L. Mills
Abstract: Embodiments relate to a neural engine circuit that includes an input buffer circuit, a kernel extract circuit, and a multiply-accumulator (MAC) circuit. The MAC circuit receives input data from the input buffer circuit and a kernel coefficient from the kernel extract circuit. The MAC circuit contains several multiply-add (MAD) circuits and accumulators used to perform neural network operations on the received input data and kernel coefficients. The MAD circuits are configured to support fixed-point (e.g., INT8) and floating-point (e.g., FP16) operand precisions. In floating-point mode, each MAD circuit multiplies the integer bits of the input data and kernel coefficients and adds their exponent bits to determine a binary point for alignment. In fixed-point mode, the input data and kernel coefficients are multiplied directly. In both operation modes, the output data is stored in an accumulator and may be sent back as accumulated values for further multiply-add operations in subsequent processing cycles.
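A hedged sketch of the floating-point MAD path: multiply the operands' integer (mantissa) bits and add their exponent bits to locate the binary point. The FP16 field layout is standard, but the helper names are illustrative and subnormals are ignored:

```python
def fp16_fields(bits):
    sign = bits >> 15
    exponent = (bits >> 10) & 0x1F        # 5-bit biased exponent
    mantissa = (bits & 0x3FF) | 0x400     # 10 bits plus implicit leading 1
    return sign, exponent, mantissa

def mad_fp_multiply(a_bits, b_bits):
    sa, ea, ma = fp16_fields(a_bits)
    sb, eb, mb = fp16_fields(b_bits)
    product = ma * mb                     # multiply the integer bits
    exponent = ea + eb - 15               # add exponent bits, drop one bias
    sign = sa ^ sb
    # The (sign, exponent, product) triple is what a later alignment
    # stage would shift before accumulation.
    return sign, exponent, product
```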
-
Publication No.: US11513799B2
Publication Date: 2022-11-29
Application No.: US16673499
Filing Date: 2019-11-04
Applicant: Apple Inc.
Inventor: Christopher L. Mills
Abstract: Embodiments of the present disclosure relate to chained buffers in a neural processor circuit. The neural processor circuit includes multiple neural engines, a planar engine, a buffer memory, and a flow control circuit. At least one neural engine operates as a first producer of first data or a first consumer of second data. The planar engine operates as a second consumer receiving the first data from the first producer or a second producer sending the second data to the first consumer. Data flow between the at least one neural engine and the planar engine is controlled using at least a subset of buffers in the buffer memory operating as at least one chained buffer that chains flow of the first data and the second data between the at least one neural engine and the planar engine.
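As a software analogue, the chained flow can be modeled with two queues standing in for buffer slices; the class and method names below are assumptions, not the patent's terminology:

```python
from collections import deque

class ChainedBuffer:
    """Chains first data (neural engine -> planar engine) and second
    data (planar engine -> neural engine) through shared buffer memory."""
    def __init__(self):
        self.first = deque()    # neural engine produces, planar engine consumes
        self.second = deque()   # planar engine produces, neural engine consumes

    def neural_engine_step(self, unit):
        self.first.append(unit)             # act as the first producer
        return self.second.popleft() if self.second else None  # first consumer

    def planar_engine_step(self, result):
        self.second.append(result)          # act as the second producer
        return self.first.popleft() if self.first else None    # second consumer
```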
-
Publication No.: US20220237438A1
Publication Date: 2022-07-28
Application No.: US17155878
Filing Date: 2021-01-22
Applicant: Apple Inc.
Inventor: Christopher L. Mills , Kenneth W. Waters
Abstract: A neural processor includes neural engines for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural processor also includes a data processor circuit coupled to external system memory. The data processor circuit includes a buffer for storing the output data from the neural engines. The neural processor further includes a task manager coupled to the data processor circuit. The task manager receives a context-switch task. The context-switch task specifies a switch of the data processor circuit from handling an outgoing task to handling an incoming task. The task manager sends configuration data of the context-switch task to cause the data processor circuit to transmit the output data corresponding to the outgoing task from the buffer to the external system memory. The data processor circuit also fetches data corresponding to the incoming task from the external system memory to the buffer.
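An illustrative sketch of the context-switch sequence, with plain dictionaries standing in for the buffer and the external system memory (both stand-ins are assumptions):

```python
def context_switch(buffer, system_memory, outgoing_id, incoming_id):
    # Spill: transmit the outgoing task's output data from the buffer
    # to the external system memory.
    system_memory[outgoing_id] = buffer.pop(outgoing_id, {})
    # Fill: fetch data corresponding to the incoming task from the
    # external system memory into the buffer.
    buffer[incoming_id] = system_memory.get(incoming_id, {})

buffer = {"taskA": {"out": [1, 2, 3]}}
memory = {"taskB": {"in": [4, 5]}}
context_switch(buffer, memory, "taskA", "taskB")
```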
-
Publication No.: US20220222510A1
Publication Date: 2022-07-14
Application No.: US17148432
Filing Date: 2021-01-13
Applicant: Apple Inc.
Inventor: Christopher L. Mills , Sung Hee Park
Abstract: Embodiments relate to a neural engine circuit of a neural network processor circuit that performs a convolution operation on input data in a first mode and a parallel sorting operation on input data in a second mode. The neural engine circuit includes a plurality of operation circuits and an accumulator circuit coupled to the plurality of operation circuits. The plurality of operation circuits receives input data. In the first mode, the plurality of operation circuits performs multiply-add operations of a convolution on the input data using a kernel. In the second mode, the plurality of operation circuits performs a portion of a parallel sorting operation on the input data. In the first mode, the accumulator circuit receives and stores first results of the multiply-add operations. In the second mode, the accumulator circuit receives and stores second results of the parallel sorting operation.
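A hedged sketch of the two operation modes: convolution folds multiply-add results into the accumulator, while the sorting mode performs one parallel compare-exchange pass. The patent does not name a particular sorting network, so the odd-even transposition step below is an assumption:

```python
def operate(mode, data, kernel=None, phase=0):
    if mode == "convolution":
        # First mode: multiply-add results of a convolution step,
        # destined for the accumulator circuit.
        return sum(d * k for d, k in zip(data, kernel))
    # Second mode: one compare-exchange pass of a parallel sort; each
    # pair below maps to one operation circuit working in parallel.
    out = list(data)
    for i in range(phase % 2, len(out) - 1, 2):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

partial = operate("sort", [4, 1, 3, 2])   # one pass -> [1, 4, 2, 3]
```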
-
Publication No.: US20220222509A1
Publication Date: 2022-07-14
Application No.: US17148358
Filing Date: 2021-01-13
Applicant: Apple Inc.
Inventor: Christopher L. Mills
Abstract: A neural processor includes one or more neural engine circuits for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural engine circuits process the input data having a power-of-two (P2) shape. The neural processor circuit also includes a data processor circuit. The data processor circuit fetches source data having a non-power-of-two (NP2) shape. The source data may correspond to data of a machine learning model. The data processor circuit also reshapes the source data to generate reshaped source data with the P2 shape. The data processor circuit further sends the reshaped source data to the one or more neural engine circuits as the input data for performing convolution operations. In some cases, the data processor circuit may also perform padding on the source data before the source data is reshaped to the P2 shape.
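A minimal sketch of padding an NP2 source tensor out to a P2 shape before it reaches the neural engines; zero padding and the use of NumPy are assumptions:

```python
import numpy as np

def next_pow2(n):
    return 1 << (n - 1).bit_length()      # smallest power of two >= n

def reshape_to_p2(source):
    # Pad each NP2 dimension up to the next power of two.
    pad = [(0, next_pow2(d) - d) for d in source.shape]
    return np.pad(source, pad)            # reshaped source data, P2 shape

p2 = reshape_to_p2(np.ones((3, 5)))       # (3, 5) -> (4, 8)
```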
-
Publication No.: US20220138553A1
Publication Date: 2022-05-05
Application No.: US17086023
Filing Date: 2020-10-30
Applicant: Apple Inc.
Inventor: Christopher L. Mills
IPC: G06N3/08
Abstract: Embodiments of the present disclosure relate to a texture unit circuit in a neural processor circuit. The neural processor circuit includes a tensor access operation circuit with the texture unit circuit, a data processor circuit, and at least one neural engine circuit. The texture unit circuit fetches a source tensor from a system memory by referencing an index tensor in the system memory representing indexing information into the source tensor. The data processor circuit stores an output version of the source tensor obtained from the tensor access operation circuit and sends the output version of the source tensor as multiple units of input data to the at least one neural engine circuit. The at least one neural engine circuit performs at least convolution operations on the units of input data and at least one kernel to generate output data.
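This abstract repeats the texture-unit gather sketched earlier in this listing; as a complementary view, the same fetch expressed over a flattened source tensor (NumPy usage is again an assumption):

```python
import numpy as np

source = np.arange(12.0)                  # flattened source tensor
index = np.array([7, 0, 3, 11])           # index tensor into source
output = np.take(source, index)           # -> [7., 0., 3., 11.]
```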
-
Publication No.: US20190340501A1
Publication Date: 2019-11-07
Application No.: US15971332
Filing Date: 2018-05-04
Applicant: Apple Inc.
Inventor: Christopher L. Mills
Abstract: Embodiments of the present disclosure relate to splitting input data into smaller units for loading into a data buffer and neural engines in a neural processor circuit for performing neural network operations. Input data of a large size is split into slices, and each slice is further split into tiles. Each tile is uploaded from an external source to a data buffer inside the neural processor circuit but outside the neural engines. Each tile is further split into work units sized for storing in an input buffer circuit inside each neural engine. The input data stored in the data buffer and the input buffer circuits is reused by the neural engines to reduce re-fetching of input data. The splitting of the input data is performed at various components of the neural processor circuit under the management of rasterizers provided in those components.
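A minimal sketch of the slice, tile, and work-unit splitting; the sizes are illustrative, and the rasterizers' bookkeeping is reduced to nested loops:

```python
def split(data, size):
    return [data[i:i + size] for i in range(0, len(data), size)]

def work_units(input_data, slice_len, tile_len, unit_len):
    for sl in split(input_data, slice_len):      # input data -> slices
        for tile in split(sl, tile_len):         # slice -> tiles
            # Each tile is buffered once in the data buffer, then its
            # work units are replayed into the neural engines for reuse.
            yield from split(tile, unit_len)     # tile -> work units

units = list(work_units(list(range(32)), 16, 8, 4))   # 8 units of 4 values
```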
-
Publication No.: US09911174B2
Publication Date: 2018-03-06
Application No.: US14836915
Filing Date: 2015-08-26
Applicant: Apple Inc.
Inventor: Suk Hwan Lim , Christopher L. Mills , D. Amnon Silverstein , David R. Pope , Sheng Lin
CPC classification number: G06T1/20 , G06T1/60 , G06T3/4015 , H04N5/23229 , H04N5/3765 , H04N5/917 , H04N9/045
Abstract: An image processing pipeline may process image data at multiple rates. A stream of raw pixel data collected from an image sensor for an image frame may be processed through one or more pipeline stages of an image signal processor. The stream of raw pixel data may then be converted into a full-color domain and scaled to a data size that is less than the initial data size for the image frame. The converted pixel data may be processed through one or more other pipeline stages and output for storage, further processing, or display. In some embodiments, a back-end interface may be implemented as part of the image signal processor, via which image data collected from sources other than the image sensor may be received and processed through various pipeline stages at the image signal processor.
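A hedged sketch of the two-rate pipeline: raw-domain stages run at the initial rate, a full-color conversion and downscale reduce the data size, and back-end stages run on the smaller data. The stage functions are illustrative placeholders, and the back-end entry point for non-sensor sources is an assumption about where scaled data enters:

```python
def isp_pipeline(raw_frame, front_stages, demosaic, scale, back_stages):
    data = raw_frame
    for stage in front_stages:       # raw-domain stages at the initial rate
        data = stage(data)
    data = scale(demosaic(data))     # full-color conversion, then downscale
    # Data from sources other than the sensor could enter here via the
    # back-end interface, already in the reduced-size full-color domain.
    for stage in back_stages:        # stages running on the smaller data size
        data = stage(data)
    return data                      # for storage, further processing, or display

frame = isp_pipeline(
    raw_frame=[8, 8, 8, 8],
    front_stages=[lambda d: d],                  # e.g., raw noise filtering
    demosaic=lambda d: [(v, v, v) for v in d],   # raw -> full-color domain
    scale=lambda d: d[::2],                      # cut the data size in half
    back_stages=[lambda d: d],                   # e.g., color correction
)
```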