Asymmetric quantization of multiple-and-accumulate operations in deep learning processing

    Publication Number: US10977001B2

    Publication Date: 2021-04-13

    Application Number: US16250874

    Application Date: 2019-01-17

    Applicant: MediaTek Inc.

    Abstract: A processing unit performs multiply-and-accumulate (MAC) operations on asymmetrically quantized data. The processing unit includes a MAC hardware unit to perform the MAC operations on a first data sequence and a second data sequence to generate an asymmetric MAC output. Both the first data sequence and the second data sequence are asymmetrically quantized. The processing unit further includes an accumulator hardware unit to accumulate the first data sequence concurrently with the MAC operations to generate an accumulated output. The processing unit further includes a multiply-and-add (MAD) hardware unit to multiply the accumulated output with a second offset to generate a multiplication output, and to add the multiplication output, the asymmetric MAC output and a pre-computed value calculated before runtime to generate a final output. The second offset indicates an amount of asymmetry of the second data sequence with respect to zero.
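
    The abstract above describes splitting an asymmetric MAC into a plain MAC plus offset-correction terms, one of which can be computed before runtime. A minimal numerical sketch of that decomposition follows (Python; the names x, w, offset_x, offset_w and the sign convention are illustrative assumptions, not taken from the patent):

        # Reference result: dot product of the re-centered (asymmetrically quantized) sequences.
        def asymmetric_mac(x, w, offset_x, offset_w):
            return sum((xi + offset_x) * (wi + offset_w) for xi, wi in zip(x, w))

        # Decomposition as in the abstract: MAC unit + concurrent accumulator + MAD unit.
        def decomposed_mac(x, w, offset_x, offset_w):
            n = len(x)
            mac_output = sum(xi * wi for xi, wi in zip(x, w))  # MAC hardware unit
            acc_x = sum(x)                                      # accumulator unit (runs concurrently)
            # Pre-computed before runtime: depends only on the weights and the offsets.
            precomputed = offset_x * sum(w) + n * offset_x * offset_w
            # MAD unit: multiply the accumulated output by the second offset, then add
            # the asymmetric MAC output and the pre-computed value.
            return acc_x * offset_w + mac_output + precomputed

        x, w = [3, -1, 4, 1], [2, 7, -1, 8]
        assert asymmetric_mac(x, w, 5, -3) == decomposed_mac(x, w, 5, -3)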

    Hybrid non-uniform convolution transform engine for deep learning applications

    Publication Number: US10755169B2

    Publication Date: 2020-08-25

    Application Number: US15841733

    Application Date: 2017-12-14

    Applicant: MediaTek Inc.

    Abstract: A system performs convolution operations based on an analysis of the input size. The input includes data elements and filter weights. The system includes multiple processing elements. Each processing element includes multipliers and adders, with more of the adders than the multipliers. According to at least the analysis result which indicates whether the input size matches a predetermined size, the system is operative to select a first mode or a second mode. In the first mode, a greater number of the adders than the multipliers are enabled for each processing element to multiply transformed input and to perform an inverse transformation. In the second mode, an equal number of the adders and the multipliers are enabled for each processing element to multiply-and-accumulate the input. One or more of the multipliers are shared by the first mode and the second mode.
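
    The two modes in the abstract trade multipliers for adders: a transform-domain path that multiplies transformed input and then applies an inverse transform, and a direct multiply-and-accumulate path. The sketch below illustrates the trade-off for one output pair (Python; the patent does not name a specific transform, so Winograd F(2,3) is used here purely as a stand-in for the transformed path):

        # Second mode: direct multiply-and-accumulate (6 multiplies for two outputs).
        def direct_conv(d, g):
            return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

        # First mode: multiply transformed input, then inverse-transform
        # (4 multiplies, but more additions than the direct path).
        def transformed_conv(d, g):
            m0 = (d[0] - d[2]) * g[0]
            m1 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
            m2 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
            m3 = (d[1] - d[3]) * g[2]
            return [m0 + m1 + m2, m1 - m2 - m3]

        d, g = [1, 2, 3, 4], [2, 4, 6]
        assert transformed_conv(d, g) == direct_conv(d, g)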

    HYBRID NON-UNIFORM CONVOLUTION TRANSFORM ENGINE FOR DEEP LEARNING APPLICATIONS

    Publication Number: US20190114536A1

    Publication Date: 2019-04-18

    Application Number: US15841733

    Application Date: 2017-12-14

    Applicant: MediaTek Inc.

    Abstract: A system performs convolution operations based on an analysis of the input size. The input includes data elements and filter weights. The system includes multiple processing elements. Each processing element includes multipliers and adders, with more of the adders than the multipliers. According to at least the analysis result which indicates whether the input size matches a predetermined size, the system is operative to select a first mode or a second mode. In the first mode, a greater number of the adders than the multipliers are enabled for each processing element to multiply transformed input and to perform an inverse transformation. In the second mode, an equal number of the adders and the multipliers are enabled for each processing element to multiply-and-accumulate the input. One or more of the multipliers are shared by the first mode and the second mode.

    SNOOP FILTER FOR MULTI-PROCESSOR SYSTEM AND RELATED SNOOP FILTERING METHOD

    Publication Number: US20160117249A1

    Publication Date: 2016-04-28

    Application Number: US14820571

    Application Date: 2015-08-07

    Applicant: MEDIATEK INC.

    Abstract: A snoop filter for a multi-processor system has a storage device and a control circuit. The control circuit manages at least a first-type entry and at least a second-type entry stored in the storage device. The first-type entry is configured to record information indicative of a first cache of the multi-processor system and first requested memory addresses that are associated with multiple first cache lines each being only available in the first cache. The second-type entry is configured to record information indicative of multiple second caches of the multi-processor system and at least a second requested memory address that is associated with a second cache line being available in each of the multiple second caches.
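
    The abstract distinguishes two entry types: one that tracks a single cache holding several exclusively-owned lines, and one that tracks a single line shared by several caches. A minimal data-structure sketch is given below (Python; the class and field names are hypothetical and the patent's storage layout is not reproduced):

        from dataclasses import dataclass, field

        @dataclass
        class FirstTypeEntry:
            # One cache, many addresses: cache lines available ONLY in this cache.
            cache_id: int
            addresses: set = field(default_factory=set)

        @dataclass
        class SecondTypeEntry:
            # One address, many caches: a cache line available in EACH listed cache.
            address: int
            sharer_cache_ids: set = field(default_factory=set)

        # Snooping a request would target entry.cache_id on a first-type hit,
        # or every id in entry.sharer_cache_ids on a second-type hit.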

    NEURAL NETWORK ENGINE WITH TILE-BASED EXECUTION

    Publication Number: US20190220742A1

    Publication Date: 2019-07-18

    Application Number: US16246884

    Application Date: 2019-01-14

    Applicant: MediaTek Inc.

    CPC classification number: G06N3/08

    Abstract: An accelerator for neural network computing includes hardware engines and a buffer memory. The hardware engines include a convolution engine and at least a second engine. Each hardware engine includes circuitry to perform neural network operations. The buffer memory stores a first input tile and a second input tile of an input feature map. The second input tile overlaps with the first input tile in the buffer memory. The convolution engine is operative to retrieve the first input tile from the buffer memory, perform convolution operations on the first input tile to generate an intermediate tile of an intermediate feature map, and pass the intermediate tile to the second engine via the buffer memory.
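
    Because adjacent output tiles of a convolution need overlapping input regions, the buffer memory holds input tiles that overlap one another. A small sketch of how such overlapping tile ranges could be derived is shown below (Python; the tile width and halo formula are illustrative assumptions, not taken from the patent):

        def tile_ranges(width, tile_w, halo):
            """Yield (start, end) input-column ranges; consecutive tiles overlap
            by `halo` columns so each output tile can be computed independently."""
            start = 0
            while start < width:
                yield start, min(start + tile_w + halo, width)
                start += tile_w

        # A 3x3 convolution (stride 1, no padding) needs a halo of 2 columns.
        print(list(tile_ranges(width=16, tile_w=8, halo=2)))  # [(0, 10), (8, 16)]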

    NEURAL NETWORK PROCESSING UNIT FOR HYBRID AND MIXED PRECISION COMPUTING

    Publication Number: US20220156567A1

    Publication Date: 2022-05-19

    Application Number: US17505422

    Application Date: 2021-10-19

    Applicant: MediaTek Inc.

    Abstract: A neural network (NN) processing unit includes an operation circuit to perform tensor operations of a given layer of a neural network in one of a first number representation and a second number representation. The NN processing unit further includes a conversion circuit coupled to at least one of an input port and an output port of the operation circuit to convert between the first number representation and the second number representation. The first number representation is one of a fixed-point number representation and a floating-point number representation, and the second number representation is the other one of the fixed-point number representation and the floating-point number representation.
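
    The conversion circuit sits on an input or output port of the operation circuit and converts tensors between fixed-point and floating-point representations. A minimal sketch of such a conversion around one operation is shown below (Python; the Q4.4 format and the function names are assumptions for illustration only):

        def fixed_to_float(q, frac_bits):
            """Interpret a signed fixed-point integer with `frac_bits` fractional bits."""
            return q / (1 << frac_bits)

        def float_to_fixed(x, frac_bits):
            """Round a float to the nearest representable fixed-point value."""
            return round(x * (1 << frac_bits))

        q_in = 48                      # 3.0 in Q4.4
        x = fixed_to_float(q_in, 4)    # input-port conversion: fixed -> float
        y = x * 0.5                    # placeholder for the layer's tensor operation
        q_out = float_to_fixed(y, 4)   # output-port conversion: float -> fixed (24 == 1.5)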

    ASYMMETRIC QUANTIZATION OF MULTIPLE-AND-ACCUMULATE OPERATIONS IN DEEP LEARNING PROCESSING

    Publication Number: US20190243610A1

    Publication Date: 2019-08-08

    Application Number: US16250874

    Application Date: 2019-01-17

    Applicant: MediaTek Inc.

    CPC classification number: G06F7/5443 G06N3/063

    Abstract: A processing unit performs multiply-and-accumulate (MAC) operations on asymmetrically quantized data. The processing unit includes a MAC hardware unit to perform the MAC operations on a first data sequence and a second data sequence to generate an asymmetric MAC output. Both the first data sequence and the second data sequence are asymmetrically quantized. The processing unit further includes an accumulator hardware unit to accumulate the first data sequence concurrently with the MAC operations to generate an accumulated output. The processing unit further includes a multiply-and-add (MAD) hardware unit to multiply the accumulated output with a second offset to generate a multiplication output, and to add the multiplication output, the asymmetric MAC output and a pre-computed value calculated before runtime to generate a final output. The second offset indicates an amount of asymmetry of the second data sequence with respect to zero.
