Memory storage format for supporting machine learning acceleration

    Publication No.: US12165237B2

    Publication Date: 2024-12-10

    Application No.: US17946753

    Filing Date: 2022-09-16

    Abstract: A processor-implemented method for a memory storage format to accelerate machine learning (ML) on a computing device is described. The method includes receiving an image in a first layer storage format of a neural network. The method also includes assigning addresses to image pixels of each of three channels of the first layer storage format for accessing the image pixels in a blocked ML storage acceleration format. The method further includes storing the image pixels in the blocked ML storage acceleration format according to the assigned addresses of the image pixels. The method also includes accelerating inference video processing of the image according to the assigned addresses for the image pixels corresponding to the blocked ML storage acceleration format.
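The abstract describes mapping each pixel of a three-channel image to an address in a blocked layout so that tiles of pixels are contiguous in memory. The patent does not publish its address formula, so the following is a minimal sketch of one plausible blocked addressing scheme; the function names, the tile size `block=4`, and the tile-major ordering are all our illustrative assumptions, not the patented format.

```python
def blocked_address(c, y, x, width, channels=3, block=4):
    """Map (channel, row, col) to a linear address in a blocked layout.

    Hypothetical scheme: pixels are grouped into block x block tiles,
    and each tile stores its three channel planes back to back, so a
    tile's pixels for one channel are contiguous in memory.
    """
    blocks_per_row = width // block
    tile = (y // block) * blocks_per_row + (x // block)
    within = (y % block) * block + (x % block)
    return (tile * channels + c) * block * block + within


def to_blocked(image, block=4):
    """Repack a [channel][row][col] image into a flat blocked buffer."""
    channels = len(image)
    height = len(image[0])
    width = len(image[0][0])
    buf = [0] * (channels * height * width)
    for c in range(channels):
        for y in range(height):
            for x in range(width):
                buf[blocked_address(c, y, x, width, channels, block)] = image[c][y][x]
    return buf
```

With a 4x4 image and `block=4` the whole image is a single tile, so the layout degenerates to planar channel storage; larger images interleave tiles, which is what lets an accelerator stream one tile at a time.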

    Providing self-resetting multi-producer multi-consumer semaphores in distributed processor-based systems

    Publication No.: US11144368B2

    Publication Date: 2021-10-12

    Application No.: US16443954

    Filing Date: 2019-06-18

    Abstract: Providing self-resetting multi-producer multi-consumer semaphores in distributed processor-based systems is disclosed. In one aspect, a synchronization management circuit provides a semaphore including a counting semaphore value indicator, a current wait count indicator, and a target wait count indicator. When a consumer completes a wait operation, the synchronization management circuit adjusts the value of the current wait count indicator towards the value of the target wait count indicator, and compares the value of the current wait count indicator to the value of the target wait count indicator. If the value of the current wait count indicator has reached the value of the target wait count indicator, the synchronization management circuit infers that all consumers have observed the semaphore, and accordingly resets both the counting semaphore value indicator and the current wait count indicator to an initial wait value to place the semaphore in its initial state for reuse.
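The reset logic the abstract walks through (count completed waits toward a target, then restore the semaphore's initial state once every consumer has observed it) can be sketched in software. This is a minimal single-process analogue using a condition variable; the class and field names are ours, and the patent describes a hardware synchronization management circuit, not this Python code.

```python
import threading


class SelfResettingSemaphore:
    """Illustrative sketch of a self-resetting multi-consumer semaphore."""

    def __init__(self, initial_value, target_wait_count):
        self._initial_value = initial_value
        self._value = initial_value        # counting semaphore value indicator
        self._current_waits = 0            # current wait count indicator
        self._target_waits = target_wait_count  # target wait count indicator
        self._cond = threading.Condition()

    def signal(self, count=1):
        """Producer side: raise the counting semaphore value."""
        with self._cond:
            self._value += count
            self._cond.notify_all()

    def wait(self):
        """Consumer side: complete one wait operation."""
        with self._cond:
            while self._value <= 0:
                self._cond.wait()
            self._value -= 1
            self._current_waits += 1
            # Once the current wait count reaches the target, every
            # consumer has observed the semaphore, so reset both the
            # value and the wait count to their initial state for reuse.
            if self._current_waits >= self._target_waits:
                self._value = self._initial_value
                self._current_waits = 0
```

The point of the self-reset is that no separate coordination round is needed between reuses: the last consumer to complete its wait restores the initial state as a side effect.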

    Providing flexible matrix processors for performing neural network convolution in matrix-processor-based devices

    Publication No.: US10936943B2

    Publication Date: 2021-03-02

    Application No.: US16117952

    Filing Date: 2018-08-30

    Abstract: Providing flexible matrix processors for performing neural network convolution in matrix-processor-based devices is disclosed. In this regard, a matrix-processor-based device provides a central processing unit (CPU) and a matrix processor. The matrix processor reorganizes a plurality of weight matrices and a plurality of input matrices into swizzled weight matrices and swizzled input matrices, respectively, that have regular dimensions natively supported by the matrix processor. The matrix-processor-based device then performs a convolution operation using the matrix processor to perform matrix multiplication/accumulation operations for the regular dimensions of the weight matrices and the input matrices, and further uses the CPU to execute instructions for handling the irregular dimensions of the weight matrices and the input matrices (e.g., by executing a series of nested loops, as a non-limiting example). The matrix-processor-based device thus provides efficient hardware acceleration by taking advantage of dimensional regularity, while maintaining the flexibility to handle different variations of convolution.
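The regular/irregular split the abstract describes can be illustrated with a tiled matrix multiply: pad ("swizzle") the operands up to the tile shape a matrix processor natively supports, process whole tiles, and trim the padding afterward. This is a software sketch only; the tile size and the zero-padding strategy are our assumptions, and the actual device dispatches tile work to a hardware matrix processor while a CPU handles the irregular remainder.

```python
def pad_to_tile(matrix, tile):
    """Pad a matrix so both dimensions are multiples of `tile` (a stand-in
    for reorganizing operands into natively supported regular dimensions)."""
    rows, cols = len(matrix), len(matrix[0])
    pad_rows = (-rows) % tile
    pad_cols = (-cols) % tile
    out = [row + [0] * pad_cols for row in matrix]
    out += [[0] * (cols + pad_cols) for _ in range(pad_rows)]
    return out


def tiled_matmul(a, b, tile=4):
    """Multiply a (m x k) by b (k x n) one tile at a time."""
    m, n = len(a), len(b[0])
    ap, bp = pad_to_tile(a, tile), pad_to_tile(b, tile)
    mp, kp, np_ = len(ap), len(bp), len(bp[0])
    c = [[0] * np_ for _ in range(mp)]
    for i0 in range(0, mp, tile):        # each (i0, j0, l0) block below is
        for j0 in range(0, np_, tile):   # what one matrix-processor
            for l0 in range(0, kp, tile):  # multiply/accumulate would cover
                for i in range(i0, i0 + tile):
                    for j in range(j0, j0 + tile):
                        c[i][j] += sum(ap[i][l] * bp[l][j]
                                       for l in range(l0, l0 + tile))
    return [row[:n] for row in c[:m]]    # trim padding from the result
```

Zero-padding trades a little wasted arithmetic for fully regular tile shapes; the patent instead keeps the irregular fringe on the CPU, which avoids that waste at the cost of a second code path.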

    Providing efficient floating-point operations using matrix processors in processor-based systems

    Publication No.: US10747501B2

    Publication Date: 2020-08-18

    Application No.: US16118099

    Filing Date: 2018-08-30

    Abstract: Providing efficient floating-point operations using matrix processors in processor-based systems is disclosed. In this regard, a matrix-processor-based device provides a matrix processor comprising a positive partial sum accumulator and a negative partial sum accumulator. As the matrix processor processes pairs of floating-point operands, the matrix processor calculates an intermediate product based on a first floating-point operand and a second floating-point operand and determines a sign of the intermediate product. Based on the sign, the matrix processor normalizes the intermediate product with a partial sum fraction of the positive partial sum accumulator or the negative partial sum accumulator, then adds the intermediate product to the positive sum accumulator or the negative sum accumulator. After processing all pairs of floating-point operands, the matrix processor subtracts the negative partial sum accumulator from the positive partial sum accumulator to generate a final sum, then renormalizes the final sum a single time.
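The high-level idea in the abstract (route each product to a positive or negative partial sum accumulator by sign, subtract once at the end, renormalize once) can be sketched as a dot product. This is only an illustration of the accumulation strategy: the hardware operates on fraction and exponent fields and performs normalization explicitly, whereas Python floats here stand in for both accumulators.

```python
def dual_accumulator_dot(a, b):
    """Dot product using separate positive and negative partial sums.

    Sketch of the described strategy: each intermediate product is added
    to one of two accumulators by sign, and a single final subtraction
    produces the result (which hardware would then renormalize once).
    """
    pos_sum = 0.0  # positive partial sum accumulator
    neg_sum = 0.0  # negative partial sum accumulator (stores magnitudes)
    for x, y in zip(a, b):
        product = x * y            # intermediate product
        if product >= 0:           # sign of the intermediate product
            pos_sum += product
        else:
            neg_sum += -product
    # Single subtraction after all operand pairs are processed.
    return pos_sum - neg_sum
```

Keeping same-sign values in each accumulator means every addition is an addition of like signs, which is what lets the hardware defer cancellation handling and renormalization to one final step instead of performing them per product.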
