METHOD AND APPARATUS FOR PARALLELIZED QRD-BASED OPERATIONS OVER A MULTIPLE EXECUTION UNIT PROCESSING SYSTEM
    1.
    Invention application
    Status: pending (published)

    Publication No.: US20160226468A1

    Publication date: 2016-08-04

    Application No.: US14610365

    Filing date: 2015-01-30

    CPC classification number: G06F17/16

    Abstract: Methods and apparatuses relating to QR decomposition using a multiple execution unit processing system are provided. A method includes receiving input values at the processing system and generating a first set of values based on the input values, where at least some of the values in the first set are computed in parallel. A second set of values is generated recursively based on values in the first set. A third set of values is generated based on values in the second set, where at least some of the values in the third set are computed in parallel. The recursive component may be simplified to consist of one or more low latency operations. The processing performance of operations relating to QR decomposition may therefore be improved by using the parallelism available in multiple execution unit systems.
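The three-stage structure in this abstract can be illustrated with a small sketch. The example below is an assumption chosen for illustration, not the method claimed in the application: it computes the Givens rotation parameters that fold a vector into (norm, 0, ..., 0), a common building block of QR decomposition. The function name `givens_chain` and the choice of suffix sums as the recursive step are hypothetical. Stages 1 and 3 are element-wise and could be spread over several execution units; stage 2 is the only recursive part, and each of its steps is a single addition.

```python
import math
from itertools import accumulate


def givens_chain(x):
    """Rotation parameters that fold vector x into (norm, 0, ..., 0).

    Hypothetical helper for illustration; not taken from the patent.
    Rotation k acts on the pair (x[k], b[k]) and maps it to (r[k], 0).
    """
    n = len(x)
    # Stage 1 (parallelizable): each square depends on one input only.
    sq = [v * v for v in x]
    # Stage 2 (recursive): suffix sums t[k] = t[k+1] + sq[k].
    # Each step of the recursion is a single addition.
    t = list(accumulate(reversed(sq)))[::-1]
    # Stage 3 (parallelizable): every output depends only on x and t.
    r = [math.sqrt(v) for v in t]
    # Second input of rotation k: the partial norm r[k+1],
    # except for the last rotation, which sees the raw last element.
    b = r[1:]
    if b:
        b[-1] = x[-1]
    # A zero partial norm means there is nothing to rotate: use the identity.
    c = [x[k] / r[k] if r[k] else 1.0 for k in range(n - 1)]
    s = [b[k] / r[k] if r[k] else 0.0 for k in range(n - 1)]
    return r, c, s


if __name__ == "__main__":
    r, c, s = givens_chain([3.0, 4.0])
    print(r[0], c[0], s[0])
```

In this sketch the square roots and divisions, which are typically the slow operations, sit entirely outside the recursion, which is one way to read the abstract's remark about keeping the recursive component to low latency operations.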


    METHOD AND APPARATUS FOR REALIZING SELF-TIMED PARALLELIZED MANY-CORE PROCESSOR
    2.
    Invention application
    Status: pending (published)

    Publication No.: US20160224349A1

    Publication date: 2016-08-04

    Application No.: US14611140

    Filing date: 2015-01-30

    Abstract: A self-timed parallelized multi-core processor and method for operating the processor are provided. The processor has an instruction decoder unit to receive a program code instruction, determine an operating code and latency for the program code instruction, and assign a loop index to the program code instruction. The processor further includes an instruction decomposer unit coupled to the instruction decoder unit, the instruction decomposer unit configured to create a primitive by decomposing the instruction, replace the loop index with a core index, and broadcast the primitive. The processor further has a plurality of self-timed processing cores coupled to the instruction decomposer unit, each core having a unique core index and having a dispatch unit for comparing the core index in the primitive with the core index of its processing core, each core acting on the primitive when the index of the processing core is within a threshold of the core index in the primitive.
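The broadcast-and-match flow in this abstract can be sketched as a small software model. Everything below is a hypothetical illustration, not the hardware described in the application: the names `Primitive`, `Core`, `decompose`, and `broadcast` are invented here, the mapping from loop index to core index is assumed to be a simple modulo, and "within a threshold" is assumed to mean an absolute index difference no larger than the threshold.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Primitive:
    opcode: str       # operating code determined by the decoder
    latency: int      # latency determined by the decoder
    core_index: int   # replaces the loop index assigned by the decoder


def decompose(opcode, latency, loop_index, num_cores):
    """Create a primitive, replacing the loop index with a core index.

    The modulo mapping is an assumption made for this sketch.
    """
    return Primitive(opcode, latency, loop_index % num_cores)


@dataclass
class Core:
    core_index: int
    threshold: int = 0
    executed: list = field(default_factory=list)

    def dispatch(self, prim):
        """Act on the primitive only if its core index is close enough to ours."""
        if abs(self.core_index - prim.core_index) <= self.threshold:
            self.executed.append(prim)
            return True
        return False


def broadcast(prim, cores):
    """Send the primitive to every core; return the indices of cores that acted."""
    return [core.core_index for core in cores if core.dispatch(prim)]


if __name__ == "__main__":
    cores = [Core(i) for i in range(4)]
    prim = decompose("mul", latency=2, loop_index=6, num_cores=4)
    print(broadcast(prim, cores))
```

With a threshold of 0 each primitive is picked up by exactly one core; a larger threshold lets neighbouring cores act on the same primitive. No central scheduler selects a target in this model, since each core decides locally from the broadcast primitive.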

