Dynamic processing element array expansion

    Publication No.: US11568238B2

    Publication date: 2023-01-31

    Application No.: US16456414

    Filing date: 2019-06-28

    Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, and dividing the tensor operation into sub-operations. The sub-operations include at least two sub-operations that have no data dependency between them. The computer-implemented method further includes assigning a first of the two sub-operations to a first computing engine, assigning the second of the two sub-operations to a second computing engine, and generating instructions for performing, in parallel, the first sub-operation by the first computing engine and the second sub-operation by the second computing engine. An inference is then made based on a result of the first sub-operation, a result of the second sub-operation, or both. The first computing engine and the second computing engine are in a same integrated circuit device or in two different integrated circuit devices.
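
As a rough illustration of the idea in this abstract, the sketch below splits a matrix multiplication (standing in for the tensor operation) into column blocks that share no data dependency, hands each block to a separate worker, and stitches the partial results together. The function names, the column-wise split, and the use of Python threads to stand in for computing engines are all assumptions made for illustration; the abstract does not specify how the operation is divided.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n); both are lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def split_columns(b, parts):
    """Split b into `parts` column blocks; each block is an independent sub-operation."""
    n = len(b[0])
    bounds = [n * i // parts for i in range(parts + 1)]
    return [[row[lo:hi] for row in b] for lo, hi in zip(bounds, bounds[1:])]


def parallel_matmul(a, b, engines=2):
    """Assign one column block per (hypothetical) engine and merge the results."""
    blocks = split_columns(b, engines)
    with ThreadPoolExecutor(max_workers=engines) as pool:
        # Each worker models one computing engine handling one sub-operation.
        partials = list(pool.map(lambda blk: matmul(a, blk), blocks))
    # Concatenate the per-engine output columns row by row.
    return [sum((p[i] for p in partials), []) for i in range(len(a))]
```

Because each output column depends only on the full left operand and its own column of the right operand, the blocks can be computed in any order or at the same time without changing the result.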

    Target port with distributed transactions

    Publication No.: US11138106B1

    Publication date: 2021-10-05

    Application No.: US16836780

    Filing date: 2020-03-31

    Abstract: Provided are integrated circuit devices and methods for operating integrated circuit devices. In various examples, the integrated circuit device can include a target port operable to receive transactions from a master port. The target port can be configured with a multicast address range that is associated with a plurality of indices corresponding to memory banks of the device. When the target port receives a write transaction that has an address that is within the multicast address range, the target port can determine an index from the plurality of indices, and can use the index to determine a second address, which combines the index and an offset value with the address. The target port can then use the second address to write the data to the memory.
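
The address translation described here can be sketched as follows. The constants, the bit layout (bank index in the high bits, offset in the low bits), and the choice to fan a single write out to every configured index are assumptions for illustration only; the abstract does not give the actual address format.

```python
BANK_SHIFT = 12                   # hypothetical: each bank spans 4 KiB
MULTICAST_BASE = 0x8000_0000      # hypothetical start of the multicast range
MULTICAST_SIZE = 1 << BANK_SHIFT  # hypothetical size of the multicast range
MULTICAST_INDICES = (0, 2, 3)     # hypothetical bank indices tied to the range


def second_addresses(addr):
    """Return the address(es) a write to `addr` is actually performed at."""
    if not (MULTICAST_BASE <= addr < MULTICAST_BASE + MULTICAST_SIZE):
        return [addr]  # outside the multicast range: ordinary write
    offset = addr - MULTICAST_BASE
    # Combine each bank index with the offset to form a per-bank address.
    return [(index << BANK_SHIFT) | offset for index in MULTICAST_INDICES]


def write(memory, addr, data):
    """Model of the target port writing `data` into a dict-backed memory."""
    for target in second_addresses(addr):
        memory[target] = data
```

With these assumed constants, one write to `0x8000_0010` lands at offset `0x10` in banks 0, 2, and 3, while a write outside the range is passed through unchanged.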

    Secure data processing
    Granted invention patent

    Publication No.: US10956584B1

    Publication date: 2021-03-23

    Application No.: US16141770

    Filing date: 2018-09-25

    Abstract: Systems and methods for performing neural network processing are provided. In one example, a system comprises a neural network processor comprising: a data decryption engine that receives encrypted data and decrypts the encrypted data, the encrypted data comprising at least one of: encrypted weights data, encrypted input data, or encrypted instruction data related to a neural network model; and a computing engine that receives the weights data and performs computations of neural network processing using the input data and the weights data and based on the instruction data.
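
The split between a decryption engine and a computing engine can be pictured with the toy pipeline below. The repeating-key XOR is only a placeholder for whatever cipher the decryption engine really uses (it is not secure and is not taken from the patent), and the dot product stands in for the neural network computation; all function names are hypothetical.

```python
def xor_decrypt(blob: bytes, key: bytes) -> bytes:
    """Placeholder 'decryption engine': repeating-key XOR (NOT a real cipher)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))


def computing_engine(weights, inputs):
    """Placeholder 'computing engine': a single dot product."""
    return sum(w * x for w, x in zip(weights, inputs))


def run_secure(enc_weights: bytes, enc_inputs: bytes, key: bytes) -> int:
    """Decrypt weights and inputs inside the processor model, then compute."""
    weights = list(xor_decrypt(enc_weights, key))
    inputs = list(xor_decrypt(enc_inputs, key))
    return computing_engine(weights, inputs)
```

The point of the arrangement is that plaintext weights and inputs exist only after the decryption step inside the processor model; everything passed in from outside is ciphertext.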

    Assisted indirect memory addressing

    Publication No.: US10929063B1

    Publication date: 2021-02-23

    Application No.: US16368538

    Filing date: 2019-03-28

    Abstract: Systems and methods for assisted indirect memory addressing are provided. Some computing systems move data between levels of a hierarchical memory system. To accommodate data movement for computing systems that do not natively support indirect addressing between levels of the memory hierarchy, a direct memory access (DMA) engine is used to fetch data. The DMA engine executes a first set of memory instructions that modify a second set of memory instructions to fetch data stored at one level of the memory hierarchy from dynamically computed indirect addresses stored in memory locations at another level of the memory hierarchy.
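
One way to picture descriptors that rewrite other descriptors is the small simulation below. It assumes a single flat word-addressed memory that holds both the data and the two-word copy descriptors (`[src, dst]`), and a DMA engine that can only copy one word at a time; the layout and every address are made up for illustration and do not come from the patent.

```python
def dma_run(mem, descriptor_addrs):
    """Execute copy descriptors in order; each occupies two words: [src, dst]."""
    for d in descriptor_addrs:
        # Fields are read at execution time, so earlier copies can patch them.
        src, dst = mem[d], mem[d + 1]
        mem[dst] = mem[src]


def indirect_fetch(pointer):
    """Fetch table[pointer - 16] using only plain DMA copies."""
    mem = [0] * 32
    mem[16:20] = [100, 101, 102, 103]  # data held at the "far" memory level
    mem[8] = pointer                   # dynamically computed indirect address
    # Descriptor A (first set) at words 0-1: copy mem[8] into word 2,
    # which is the src field of descriptor B.
    mem[0], mem[1] = 8, 2
    # Descriptor B (second set) at words 2-3: src is a placeholder, dst is 10.
    mem[2], mem[3] = 0, 10
    dma_run(mem, [0, 2])
    return mem[10]
```

Descriptor A never touches the data itself; it only patches the source field of descriptor B, so the engine ends up fetching from an address it did not know when the descriptors were built.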

    Power reduction in processor pipeline by detecting zeros

    Publication No.: US10901492B1

    Publication date: 2021-01-26

    Application No.: US16369696

    Filing date: 2019-03-29

    Abstract: Techniques are described for power reduction in a computer processor based on detection of whether data destined for input to an arithmetic logic unit (ALU) has a particular value. The data is written to a register prior to performing an arithmetic or logical operation using the data as an operand. Depending on a timing of when the data is supplied to the register, the determination is made before or after the data is written to the register, and a memory associated with the register is updated with a result of the determination. Contents of the memory are used to make a decision whether to allow the ALU to perform the arithmetic or logical operation. The memory can be implemented as a non-architectural register.
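
A behavioral sketch of the gating decision might look like the following. It assumes the "particular value" is zero (as the title suggests) and that the gated operation is a multiplication, and it counts ALU activations as a stand-in for power; the class and attribute names are hypothetical, and the timing-dependent choice of checking before or after the register write is not modeled.

```python
class GatedMultiplier:
    """Operand register plus a side flag that records whether it holds zero."""

    def __init__(self):
        self.reg = 0
        self.is_zero = True       # models the non-architectural memory bit
        self.alu_activations = 0  # rough proxy for dynamic power spent

    def write_operand(self, value):
        """Write the register and update the zero flag alongside it."""
        self.reg = value
        self.is_zero = (value == 0)

    def multiply(self, other):
        """Use the flag to decide whether the ALU is allowed to run."""
        if self.is_zero:
            return 0              # result is known; the ALU stays idle
        self.alu_activations += 1
        return self.reg * other
```

For sparse operand streams, which are common in neural network workloads, every zero operand skips an ALU activation while the visible result stays the same.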

    Registers for restricted memory
    Granted invention patent

    Publication No.: US10678479B1

    Publication date: 2020-06-09

    Application No.: US16204943

    Filing date: 2018-11-29

    Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank, and can read from and write to only the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
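
The two data-movement patterns in this abstract can be sketched with one register per bank. The model below represents banks as Python lists and assumes that a serial operation walks the registers one at a time against a single chosen bank; that wiring, and all function names, are assumptions rather than details from the patent.

```python
def parallel_load(banks, regs, addr):
    """Every register loads from its own bank at the same address."""
    for i, bank in enumerate(banks):
        regs[i] = bank[addr]


def parallel_store(banks, regs, addr):
    """Every register stores to its own bank at the same address."""
    for i, bank in enumerate(banks):
        bank[addr] = regs[i]


def serial_store(banks, regs, bank_idx, base):
    """Registers are written one after another into a single bank."""
    for i, value in enumerate(regs):
        banks[bank_idx][base + i] = value


def serial_load(banks, regs, bank_idx, base):
    """Registers are filled one after another from a single bank."""
    for i in range(len(regs)):
        regs[i] = banks[bank_idx][base + i]


def gather(banks, addr, bank_idx, base):
    """Many banks -> one bank: parallel load, then serial store."""
    regs = [0] * len(banks)
    parallel_load(banks, regs, addr)
    serial_store(banks, regs, bank_idx, base)


def scatter(banks, bank_idx, base, addr):
    """One bank -> many banks: serial load, then parallel store."""
    regs = [0] * len(banks)
    serial_load(banks, regs, bank_idx, base)
    parallel_store(banks, regs, addr)
```

The registers are the only path between banks in this model, which mirrors the restriction that each execution component can read and write only its own bank.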
