RECONFIGURABLE, STREAMING-BASED CLUSTERS OF PROCESSING ELEMENTS, AND MULTI-MODAL USE THEREOF

    公开(公告)号:US20240281397A1

    公开(公告)日:2024-08-22

    申请号:US18192631

    申请日:2023-03-29

    CPC classification number: G06F13/4022 G06F13/1668

    Abstract: A hardware accelerator includes processing elements of a neural network, each processing element having a memory; a stream switch; stream engines coupled to functional circuits via the stream switch, wherein the stream engines, in operation, generate data streaming requests to stream data to and from functional circuits of the plurality of functional circuits; a first system bus interface coupled to the stream engines; a second system bus interface coupled to the processing elements; and mode control circuitry, which, in operation, sets respective modes of operation for the plurality of processing elements. The modes of operation include: a compute mode of operation in which the processing element performs computing operations using the memory associated with the processing element; and a memory mode of operation in which the memory associated with the processing element performs memory operations, bypassing the stream switch, via the second system bus interface.

    RECONFIGURABLE HARDWARE BUFFER IN A NEURAL NETWORKS ACCELERATOR FRAMEWORK

    公开(公告)号:US20220101086A1

    公开(公告)日:2022-03-31

    申请号:US17039653

    申请日:2020-09-30

    Abstract: A convolutional accelerator framework (CAF) has a plurality of processing circuits including one or more convolution accelerators, a reconfigurable hardware buffer configurable to store data of a variable number of input data channels, and a stream switch coupled to the plurality of processing circuits. The reconfigurable hardware buffer has a memory and control circuitry. A number of the variable number of input data channels is associated with an execution epoch. The stream switch streams data of the variable number of input data channels between processing circuits of the plurality of processing circuits and the reconfigurable hardware buffer during processing of the execution epoch. The control circuitry of the reconfigurable hardware buffer configures the memory to store data of the variable number of input data channels, the configuring including allocating a portion of the memory to each of the variable number of input data channels.

    DATA VOLUME SCULPTOR FOR DEEP LEARNING ACCELERATION

    公开(公告)号:US20210192833A1

    公开(公告)日:2021-06-24

    申请号:US17194055

    申请日:2021-03-05

    Abstract: A device include on-board memory, an applications processor, a digital signal processor (DSP) cluster, a configurable accelerator framework (CAF), and at least one communication bus architecture. The communication bus communicatively couples the applications processor, the DSP cluster, and the CAF to the on-board memory. The CAF includes a reconfigurable stream switch and data volume sculpting circuitry, which has an input and an output coupled to the reconfigurable stream switch. The data volume sculpting circuitry receives a series of frames, each frame formed as a two dimensional (2D) data structure, and determines a first dimension and a second dimension of each frame of the series of frames. Based on the first and second dimensions, the data volume sculpting circuitry determines for each frame a position and a size of a region-of-interest to be extracted from the respective frame, and extracts from each frame, data in the frame that is within the region-of-interest.

    HARDWARE ACCELERATOR METHOD, SYSTEM AND DEVICE

    公开(公告)号:US20200310761A1

    公开(公告)日:2020-10-01

    申请号:US16833340

    申请日:2020-03-27

    Abstract: A system includes an addressable memory array, one or more processing cores, and an accelerator framework coupled to the addressable memory. The accelerator framework includes a Multiply ACcumulate (MAC) hardware accelerator cluster. The MAC hardware accelerator cluster has a binary-to-residual converter, which, in operation, converts binary inputs to a residual number system. Converting a binary input to the residual number system includes a reduction modulo 2m and a reduction modulo 2m−1, where m is a positive integer. A plurality of MAC hardware accelerators perform modulo 2m multiply-and-accumulate operations and modulo 2m−1 multiply-and-accumulate operations using the converted binary input. A residual-to-binary converter generates a binary output based on the output of the MAC hardware accelerators.

Patent Agency Ranking