Adaptive execution engine for convolution computing systems

    Publication No.: US10394929B2

    Publication date: 2019-08-27

    Application No.: US15787897

    Filing date: 2017-10-19

    Applicant: MediaTek Inc.

    Abstract: A system performs convolution computing in either a matrix mode or a filter mode. An analysis module generates a mode select signal to select the matrix mode or the filter mode based on results of analyzing convolution characteristics. The results include at least a comparison of resource utilization between the matrix mode and the filter mode. A convolution module includes processing elements, each of which further includes arithmetic computing circuitry. The convolution module is configured according to the matrix mode for performing matrix multiplications converted from convolution computations, and is configured according to the filter mode for performing the convolution computations.
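
The two modes in the abstract can be illustrated with a minimal pure-Python sketch. The filter mode computes the convolution directly, and the matrix mode unrolls image patches into rows (often called im2col) and multiplies by the flattened kernel. The `tiled_utilization` model and the `select_mode` rule are assumptions made for illustration only; the abstract does not say how resource utilization is measured or compared.

```python
import math

def conv2d_filter_mode(image, kernel):
    """Direct 2D convolution (no kernel flip, stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)] for r in range(oh)]

def conv2d_matrix_mode(image, kernel):
    """Same result via matrix multiplication on unrolled patches."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    flat_kernel = [kernel[i][j] for i in range(kh) for j in range(kw)]
    # Each row of `patches` is one receptive field, flattened.
    patches = [[image[r + i][c + j] for i in range(kh) for j in range(kw)]
               for r in range(oh) for c in range(ow)]
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel))
                for patch in patches]
    return [flat_out[r * ow:(r + 1) * ow] for r in range(oh)]

def tiled_utilization(rows, cols, pe_rows, pe_cols):
    """Hypothetical model: useful work divided by work padded to whole PE tiles."""
    padded = (math.ceil(rows / pe_rows) * pe_rows
              * math.ceil(cols / pe_cols) * pe_cols)
    return rows * cols / padded

def select_mode(image_shape, kernel_shape, pe_shape):
    """Hypothetical mode select: pick the mode with higher modeled utilization."""
    (ih, iw), (kh, kw) = image_shape, kernel_shape
    oh, ow = ih - kh + 1, iw - kw + 1
    matrix_util = tiled_utilization(oh * ow, kh * kw, *pe_shape)
    filter_util = tiled_utilization(oh, ow, *pe_shape)
    return "matrix" if matrix_util >= filter_util else "filter"
```

Both functions return the same output for the same inputs; only the way the work maps onto the processing elements differs, which is what the mode select signal would exploit.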

    Apparatus for mutual-transposition of scalar and vector data sets and related method

    Type: Granted invention patent (in force)

    Publication No.: US09507601B2

    Publication date: 2016-11-29

    Application No.: US14184663

    Filing date: 2014-02-19

    Applicant: MEDIATEK INC.

    Abstract: An apparatus for processing a plurality of data sets is disclosed, wherein one data set of the plurality of data sets includes N components and has a data type of one of a scalar type and a vector type, wherein N is a positive integer number. The apparatus includes a memory module and a data accessing module. The memory module comprises N memory units configured to store the plurality of data sets. The data accessing module is configured to write the data set into the memory module according to a write data index corresponding to the data set and one of a first writing mapping information and a second writing mapping information, wherein the first writing mapping information is employed when the data type is one of the scalar and the vector type and the second writing mapping information is employed when the data type is the other of the scalar and the vector type.
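
One common way to realize two write mappings over N memory units is skewed (diagonal) bank addressing, sketched below. This is an assumption chosen for illustration, not the mapping claimed in the patent: the names `vector_mapping`, `scalar_mapping`, `write_set` and `read_set` are hypothetical. A vector data set is treated as a row of a logical N x N matrix and a scalar data set as a column, and both mappings place element (row, column) in bank `(row + column) % N` at address `row`.

```python
N = 4  # number of memory units (banks) and components per data set

def make_memory(n=N):
    """memory[bank][address], one row of storage per memory unit."""
    return [[None] * n for _ in range(n)]

def vector_mapping(index, component, n=N):
    """Data set `index` is a vector: row `index` of the logical matrix."""
    return (index + component) % n, index        # (bank, address)

def scalar_mapping(index, component, n=N):
    """Data set `index` is a scalar set: column `index` of the logical matrix."""
    return (index + component) % n, component    # (bank, address)

def write_set(memory, data_set, index, data_type):
    """Write one data set using the mapping that matches its data type."""
    mapping = vector_mapping if data_type == "vector" else scalar_mapping
    for component, value in enumerate(data_set):
        bank, address = mapping(index, component, len(memory))
        memory[bank][address] = value

def read_set(memory, index, data_type):
    """Read one data set back as either a vector or a scalar set."""
    mapping = vector_mapping if data_type == "vector" else scalar_mapping
    n = len(memory)
    values = []
    for component in range(n):
        bank, address = mapping(index, component, n)
        values.append(memory[bank][address])
    return values
```

Under either mapping the N components of one data set land in N different banks, so a whole data set can be accessed in parallel. Writing four vectors and then calling `read_set(memory, 1, "scalar")` returns the second component of each vector, which is the transposition the title refers to.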

    Memory shuffle engine for efficient work execution in a parallel computing system

    Publication No.: US10324730B2

    Publication date: 2019-06-18

    Application No.: US15285472

    Filing date: 2016-10-04

    Applicant: MediaTek Inc.

    Abstract: A computing device performs parallel computations using a set of thread processing units and a memory shuffle engine. The memory shuffle engine includes a register array to store an array of data elements retrieved from a memory buffer, and an array of input selectors. According to a first control signal, each input selector transfers at least a first data element from a corresponding subset of the register array, which is coupled to the input selector via input lines, to one or more corresponding thread processing units. According to a second control signal, each input selector transfers at least a second data element from another subset of the register array, which is coupled to another input selector via other input lines, to the one or more corresponding thread processing units.
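
The selector behavior described above can be sketched as a small function. The register array is modeled as a flat list split into equal subsets, one per input selector. Treating "another subset" as the next selector's subset (with wrap-around), and the `offset` parameter that picks an element within a subset, are assumptions for illustration; the abstract does not fix either choice.

```python
def shuffle(register_array, subset_size, control, offset=0):
    """Return the element each input selector forwards to its thread unit.

    control == 0: selector i reads from its own subset of the register array.
    control == 1: selector i reads from the subset wired to selector i + 1
                  (wrapping around), a hypothetical choice of "another subset".
    offset selects which element of the chosen subset is forwarded.
    """
    num_selectors = len(register_array) // subset_size
    outputs = []
    for i in range(num_selectors):
        source = i if control == 0 else (i + 1) % num_selectors
        outputs.append(register_array[source * subset_size + offset])
    return outputs
```

With eight registers holding 0 to 7 and a subset size of 2, the first control signal delivers one element from each selector's own subset, while the second delivers the neighboring subset's element to the same thread processing units without another trip to the memory buffer.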

    Graphic processing system and method thereof

    Publication No.: US09760969B2

    Publication date: 2017-09-12

    Application No.: US14641449

    Filing date: 2015-03-09

    Applicant: MEDIATEK INC.

    CPC classification number: G06T1/20

    Abstract: A graphic processing system and a method of graphic processing are provided. The graphic processing system has a collector, a plurality of slots, a scheduler, an arbiter and at least an arithmetic logic unit (ALU). The collector is configured to group a plurality of workitems into elementary wavefronts. Each of the elementary wavefronts comprises workitems configured to execute the same kernel code. The scheduler is configured to allocate the elementary wavefronts to the slots. Two or more of the elementary wavefronts exist at one slot to form one of a plurality of macro wavefronts. The arbiter is configured to select one of the macro wavefronts. The ALU is configured to execute workitems of at least an elementary wavefront of the selected macro wavefront and output results of execution of the workitems.
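
The collector, scheduler, arbiter and ALU pipeline described above can be modeled in a few lines. This is a sketch under stated assumptions: workitems are `(kernel_name, item)` pairs, the scheduler uses round-robin allocation, and the arbiter picks the slot holding the most elementary wavefronts. None of these policies come from the abstract, and all function names are hypothetical.

```python
from collections import defaultdict

def collect(workitems, wavefront_size):
    """Collector: group workitems that run the same kernel into elementary wavefronts."""
    by_kernel = defaultdict(list)
    for kernel, item in workitems:
        by_kernel[kernel].append(item)
    wavefronts = []
    for kernel, items in by_kernel.items():
        for start in range(0, len(items), wavefront_size):
            wavefronts.append((kernel, items[start:start + wavefront_size]))
    return wavefronts

def schedule(wavefronts, num_slots):
    """Scheduler: round-robin allocation; a slot's contents form a macro wavefront."""
    slots = [[] for _ in range(num_slots)]
    for i, wavefront in enumerate(wavefronts):
        slots[i % num_slots].append(wavefront)
    return slots

def arbitrate(slots):
    """Arbiter: pick the non-empty slot with the most elementary wavefronts."""
    candidates = [i for i, slot in enumerate(slots) if slot]
    if not candidates:
        return None
    return max(candidates, key=lambda i: len(slots[i]))

def execute(slots, slot_index, kernels):
    """ALU: run one elementary wavefront of the selected macro wavefront."""
    kernel, items = slots[slot_index].pop(0)
    return [kernels[kernel](item) for item in items]
```

A caller would loop on `arbitrate` and `execute` until every slot is empty, passing a dictionary that maps kernel names to Python callables.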

    ADAPTIVE EXECUTION ENGINE FOR CONVOLUTION COMPUTING SYSTEMS

    Publication No.: US20180173676A1

    Publication date: 2018-06-21

    Application No.: US15787897

    Filing date: 2017-10-19

    Applicant: MediaTek Inc.

    CPC classification number: G06F17/15 G06F17/16 G06N3/0454 G06N3/063

    Abstract: A system performs convolution computing in either a matrix mode or a filter mode. An analysis module generates a mode select signal to select the matrix mode or the filter mode based on results of analyzing convolution characteristics. The results include at least a comparison of resource utilization between the matrix mode and the filter mode. A convolution module includes processing elements, each of which further includes arithmetic computing circuitry. The convolution module is configured according to the matrix mode for performing matrix multiplications converted from convolution computations, and is configured according to the filter mode for performing the convolution computations.

    APPARATUS FOR MUTUAL-TRANSPOSITION OF SCALAR AND VECTOR DATA SETS AND RELATED METHOD

    Type: Invention application (in force)

    Publication No.: US20150234662A1

    Publication date: 2015-08-20

    Application No.: US14184663

    Filing date: 2014-02-19

    Applicant: MEDIATEK INC.

    Abstract: An apparatus for processing a plurality of data sets is disclosed, wherein one data set of the plurality of data sets includes N components and has a data type of one of a scalar type and a vector type, wherein N is a positive integer number. The apparatus includes a memory module and a data accessing module. The memory module comprises N memory units configured to store the plurality of data sets. The data accessing module is configured to write the data set into the memory module according to a write data index corresponding to the data set and one of a first writing mapping information and a second writing mapping information, wherein the first writing mapping information is employed when the data type is one of the scalar and the vector type and the second writing mapping information is employed when the data type is the other of the scalar and the vector type.

    AUTONOMOUS COPY BETWEEN EXTERNAL MEMORY AND INTERNAL MEMORY

    Publication No.: US20240319904A1

    Publication date: 2024-09-26

    Application No.: US18407990

    Filing date: 2024-01-09

    Applicant: MediaTek Inc.

    CPC classification number: G06F3/065 G06F3/0604 G06F3/0659 G06F3/0683

    Abstract: A method of managing access to a first memory via a second memory includes autonomously copying data from one or more of the data blocks in the first plurality of data blocks in the first memory to corresponding one or more of the data blocks in the second plurality of data blocks in the second memory sequentially. Access to the first memory with a first plurality of data blocks is at a first speed and access to the second memory with a second plurality of data blocks is at a second speed. A command is received for reading from the second memory. Responsive to receiving the command, a pointer is obtained indicating an address of a data block in the second memory that contains data copied from the first memory and that is first available for access. The data is obtained from the data block based on the pointer.
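
The method in the abstract can be modeled as a background copier plus a read pointer. In this minimal sketch the first memory is a Python list of blocks and the second memory is a fixed-size buffer. Reusing second-memory blocks as a ring buffer once they have been read is an assumption made here, as are the class and method names; the abstract only states that copying is sequential and that a pointer identifies the first block available for access.

```python
class AutonomousCopier:
    def __init__(self, first_memory, num_second_blocks):
        self.first = first_memory                   # slower memory (list of blocks)
        self.second = [None] * num_second_blocks    # faster memory
        self.copy_index = 0     # next block of the first memory to copy
        self.write_slot = 0     # next block of the second memory to fill
        self.read_pointer = 0   # first block of the second memory available to read
        self.available = 0      # copied blocks not yet read

    def copy_step(self):
        """Copy one block sequentially; return False if nothing can be copied."""
        if self.copy_index >= len(self.first) or self.available == len(self.second):
            return False
        self.second[self.write_slot] = self.first[self.copy_index]
        self.write_slot = (self.write_slot + 1) % len(self.second)
        self.copy_index += 1
        self.available += 1
        return True

    def read(self):
        """Handle a read command: return (pointer, data) or None if nothing is ready."""
        if self.available == 0:
            return None
        pointer = self.read_pointer
        data = self.second[pointer]
        self.read_pointer = (pointer + 1) % len(self.second)
        self.available -= 1
        return pointer, data
```

In hardware `copy_step` would run without host involvement; here the caller invokes it explicitly so the ordering of copies and reads stays visible.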

    Efficient work execution in a parallel computing system

    Publication No.: US11175920B2

    Publication date: 2021-11-16

    Application No.: US16395193

    Filing date: 2019-04-25

    Applicant: MediaTek Inc.

    Abstract: A computing device is operative to perform parallel computations. The computing device includes a controller unit to assign workgroups to a set of batches. Each batch includes a program counter shared by M workgroups assigned to the batch, where M is a positive integer determined according to a configurable batch setting. Each batch further includes a set of thread processing units operative to execute, in parallel, a subset of work items in each of the M workgroups. Each batch further includes a spilling memory to store intermediate data of the M workgroups when one or more workgroups in the M workgroups encounter a synchronization barrier.
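
A rough software model of the batch idea is sketched below. It assumes a kernel split into two phases by a single barrier: `phase1` produces one intermediate value per work item, and `phase2` consumes that value together with all intermediates of the same workgroup. The function names, the two-phase kernel shape and the dictionary used as spilling memory are all hypothetical, and the loop over each subset stands in for thread processing units that would run in parallel.

```python
def assign_workgroups(workgroup_ids, num_batches, m):
    """Controller: give each batch up to M workgroups; return (batches, unassigned)."""
    pending = list(workgroup_ids)
    batches = []
    for _ in range(num_batches):
        batches.append(pending[:m])
        pending = pending[m:]
    return batches, pending

def run_batch(workgroups, num_threads, phase1, phase2):
    """Run the M workgroups of one batch through a kernel with one barrier.

    workgroups maps a workgroup id to its list of work items.
    """
    spill = {}  # spilling memory: (workgroup id, work item) -> intermediate data
    # Before the barrier: process num_threads work items of each workgroup at a time.
    for wg_id, items in workgroups.items():
        for start in range(0, len(items), num_threads):
            for item in items[start:start + num_threads]:
                spill[(wg_id, item)] = phase1(item)
    # After the barrier: every work item of a workgroup has spilled its data.
    results = {}
    for wg_id, items in workgroups.items():
        group_values = [spill[(wg_id, item)] for item in items]
        results[wg_id] = [phase2(spill[(wg_id, item)], group_values)
                          for item in items]
    return results
```

The spill step is what lets a workgroup with more work items than thread processing units reach the barrier: earlier subsets park their intermediate data while later subsets catch up.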
