Programmable interface to in-memory cache processor

    Publication No.: US10705967B2

    Publication Date: 2020-07-07

    Application No.: US16160270

    Application Date: 2018-10-15

    Abstract: The present disclosure is directed to systems and methods of implementing a neural network using in-memory mathematical operations performed by pipelined SRAM architecture (PISA) circuitry disposed in on-chip processor memory circuitry. A high-level compiler may be provided to compile data representative of a multi-layer neural network model and one or more neural network data inputs from a first high-level programming language to an intermediate domain-specific language (DSL). A low-level compiler may be provided to compile the representative data from the intermediate DSL to multiple instruction sets in accordance with an instruction set architecture (ISA), such that each of the multiple instruction sets corresponds to a single respective layer of the multi-layer neural network model. Each of the multiple instruction sets may be assigned to a respective SRAM array of the PISA circuitry for in-memory execution. Thus, the systems and methods described herein beneficially leverage the on-chip processor memory circuitry to perform a relatively large number of in-memory vector/tensor calculations in furtherance of neural network processing without burdening the processor circuitry.
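
    A minimal Python sketch of the two-stage flow the abstract describes: a high-level compiler lowers each layer of the model to an intermediate DSL, a low-level compiler emits one ISA instruction set per layer, and each instruction set is assigned to its own SRAM array. Every name here (Layer, compile_to_dsl, compile_to_isa, assign_to_sram_arrays) is a hypothetical illustration, not the patent's actual interface.

        from dataclasses import dataclass

        @dataclass
        class Layer:
            kind: str              # e.g. "dense" or "conv"
            weights_shape: tuple   # shape of this layer's weight tensor

        def compile_to_dsl(model):
            # High-level compiler: lower the model to intermediate DSL statements.
            return [f"{layer.kind}(shape={layer.weights_shape})" for layer in model]

        def compile_to_isa(dsl_program):
            # Low-level compiler: one ISA instruction set per DSL statement,
            # i.e. one instruction set per network layer.
            return [[f"pisa_op {stmt}"] for stmt in dsl_program]

        def assign_to_sram_arrays(instruction_sets):
            # Map each per-layer instruction set to its own SRAM array
            # for in-memory execution.
            return {f"sram_array_{i}": insts
                    for i, insts in enumerate(instruction_sets)}

        model = [Layer("dense", (784, 128)), Layer("dense", (128, 10))]
        mapping = assign_to_sram_arrays(compile_to_isa(compile_to_dsl(model)))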

    In-memory analog neural cache
    Invention Grant

    Publication No.: US11502696B2

    Publication Date: 2022-11-15

    Application No.: US16160800

    Application Date: 2018-10-15

    Abstract: Embodiments are directed to systems and methods of implementing an analog neural network using pipelined SRAM architecture (“PISA”) circuitry disposed in on-chip processor memory circuitry. The on-chip processor memory circuitry may include processor last level cache (LLC) circuitry. One or more physical parameters, such as a stored charge or voltage, may be used to permit the generation of an in-memory analog output using an SRAM array. The generation of an in-memory analog output using only word-line and bit-line capabilities beneficially increases the computational density of the PISA circuit without increasing power requirements. Thus, the systems and methods described herein beneficially leverage the existing capabilities of on-chip SRAM processor memory circuitry to perform a relatively large number of analog vector/tensor calculations associated with execution of a neural network, such as a recurrent neural network, without burdening the processor circuitry and without significant impact to the processor power requirements.
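
    A rough numerical model, in Python, of the charge-summation idea: each cell of an SRAM array holds an analog quantity, and driving several word-lines at once accumulates the cells' contributions on each shared bit-line, producing one analog dot product per column in a single array access. This is an illustrative assumption about the mechanism, not the patent's circuit; analog_bitline_mac and its parameters are invented for the sketch.

        import numpy as np

        def analog_bitline_mac(stored_values, wordline_drive):
            # stored_values: (rows, cols) analog quantity held in each SRAM cell
            # wordline_drive: (rows,) drive level applied to each word-line
            # Each bit-line (column) sums the contributions of every driven row,
            # yielding one analog output per column.
            return wordline_drive @ stored_values

        weights = np.random.rand(64, 16)   # one analog value per cell
        inputs = np.random.rand(64)        # word-line drive levels
        outputs = analog_bitline_mac(weights, inputs)  # 16 analog column outputs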

    HARDWARE APPARATUSES AND METHODS RELATING TO ELEMENTAL REGISTER ACCESSES

    Publication No.: US20190138305A1

    Publication Date: 2019-05-09

    Application No.: US16003555

    Application Date: 2018-06-08

    CPC classification number: G06F9/30036

    Abstract: Methods and apparatuses relating to a vector instruction with a register operand with an elemental offset are described. In one embodiment, a hardware processor includes a decode unit to decode a vector instruction with a register operand with an elemental offset to access a first number of elements in a register specified by the register operand, wherein the first number is a total number of elements in the register minus the elemental offset, access a second number of elements in a next logical register, wherein the second number is the elemental offset, and combine the first number of elements and the second number of elements as a data vector, and an execution unit to execute the vector instruction on the data vector.
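
    One natural reading of that access pattern, sketched in Python: the data vector starts at the elemental offset within the named register and spills into the next logical register, so the instruction reads (total - offset) elements from the first register and offset elements from the second. The register-file model and all names below are illustrative assumptions.

        REG_ELEMS = 8  # elements per vector register (assumed width)

        def elemental_register_read(regfile, reg_idx, offset):
            # First access: the (REG_ELEMS - offset) elements of the named
            # register starting at the elemental offset.
            first = regfile[reg_idx][offset:]
            # Second access: the first `offset` elements of the next
            # logical register.
            second = regfile[reg_idx + 1][:offset]
            # Combine both accesses into a single data vector.
            return first + second

        regfile = [list(range(i * REG_ELEMS, (i + 1) * REG_ELEMS))
                   for i in range(4)]
        vec = elemental_register_read(regfile, reg_idx=1, offset=3)
        # vec == [11, 12, 13, 14, 15, 16, 17, 18]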

    Weight prefetch for in-memory neural network execution

    Publication No.: US11347994B2

    Publication Date: 2022-05-31

    Application No.: US16160466

    Application Date: 2018-10-15

    Abstract: The present disclosure is directed to systems and methods of bit-serial, in-memory, execution of at least an nth layer of a multi-layer neural network in a first on-chip processor memory circuitry portion contemporaneous with prefetching and storing layer weights associated with the (n+1)st layer of the multi-layer neural network in a second on-chip processor memory circuitry portion. The storage of layer weights in on-chip processor memory circuitry beneficially decreases the time required to transfer the layer weights upon execution of the (n+1)st layer of the multi-layer neural network by the first on-chip processor memory circuitry portion. In addition, the on-chip processor memory circuitry may include a third on-chip processor memory circuitry portion used to store intermediate and/or final input/output values associated with one or more layers included in the multi-layer neural network.
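
    A minimal Python sketch of that overlap: layer n executes out of one weight buffer while layer n+1's weights are prefetched into a second buffer, ping-pong style, and activations (the third memory portion in the abstract) flow between layers. The threading model and every name here are assumptions made for illustration, not the patent's mechanism.

        import threading

        def run_network(layers, fetch_weights, execute_layer, x):
            buffers = [None, None]                 # two weight portions, ping-pong
            buffers[0] = fetch_weights(layers[0])  # initial (non-overlapped) fetch
            for n, layer in enumerate(layers):
                prefetch = None
                if n + 1 < len(layers):
                    # Start moving layer n+1's weights while layer n executes.
                    def _prefetch(slot=(n + 1) % 2, nxt=layers[n + 1]):
                        buffers[slot] = fetch_weights(nxt)
                    prefetch = threading.Thread(target=_prefetch)
                    prefetch.start()
                # Intermediate values (x) would live in the third memory portion.
                x = execute_layer(layer, buffers[n % 2], x)
                if prefetch:
                    prefetch.join()  # next layer's weights have landed
            return x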

    HARDWARE APPARATUSES AND METHODS RELATING TO ELEMENTAL REGISTER ACCESSES
    Invention Application (In force)

    Publication No.: US20160188334A1

    Publication Date: 2016-06-30

    Application No.: US14582784

    Application Date: 2014-12-24

    CPC classification number: G06F9/30036

    Abstract: Methods and apparatuses relating to a vector instruction with a register operand with an elemental offset are described. In one embodiment, a hardware processor includes a decode unit to decode a vector instruction with a register operand with an elemental offset to access a first number of elements in a register specified by the register operand, wherein the first number is a total number of elements in the register minus the elemental offset, access a second number of elements in a next logical register, wherein the second number is the elemental offset, and combine the first number of elements and the second number of elements as a data vector, and an execution unit to execute the vector instruction on the data vector.


    Programmable interface to in-memory cache processor

    Publication No.: US11151046B2

    Publication Date: 2021-10-19

    Application No.: US16921685

    Application Date: 2020-07-06

    Abstract: The present disclosure is directed to systems and methods of implementing a neural network using in-memory mathematical operations performed by pipelined SRAM architecture (PISA) circuitry disposed in on-chip processor memory circuitry. A high-level compiler may be provided to compile data representative of a multi-layer neural network model and one or more neural network data inputs from a first high-level programming language to an intermediate domain-specific language (DSL). A low-level compiler may be provided to compile the representative data from the intermediate DSL to multiple instruction sets in accordance with an instruction set architecture (ISA), such that each of the multiple instruction sets corresponds to a single respective layer of the multi-layer neural network model. Each of the multiple instruction sets may be assigned to a respective SRAM array of the PISA circuitry for in-memory execution. Thus, the systems and methods described herein beneficially leverage the on-chip processor memory circuitry to perform a relatively large number of in-memory vector/tensor calculations in furtherance of neural network processing without burdening the processor circuitry.

    PROGRAMMABLE INTERFACE TO IN-MEMORY CACHE PROCESSOR

    Publication No.: US20200334161A1

    Publication Date: 2020-10-22

    Application No.: US16921685

    Application Date: 2020-07-06

    Abstract: The present disclosure is directed to systems and methods of implementing a neural network using in-memory mathematical operations performed by pipelined SRAM architecture (PISA) circuitry disposed in on-chip processor memory circuitry. A high-level compiler may be provided to compile data representative of a multi-layer neural network model and one or more neural network data inputs from a first high-level programming language to an intermediate domain-specific language (DSL). A low-level compiler may be provided to compile the representative data from the intermediate DSL to multiple instruction sets in accordance with an instruction set architecture (ISA), such that each of the multiple instruction sets corresponds to a single respective layer of the multi-layer neural network model. Each of the multiple instruction sets may be assigned to a respective SRAM array of the PISA circuitry for in-memory execution. Thus, the systems and methods described herein beneficially leverage the on-chip processor memory circuitry to perform a relatively large number of in-memory vector/tensor calculations in furtherance of neural network processing without burdening the processor circuitry.
