Memory address collision detection of ordered parallel threads with bloom filters
    4.
    Granted invention patent
    Memory address collision detection of ordered parallel threads with bloom filters (in force)

    Publication No.: US09542193B2

    Publication date: 2017-01-10

    Application No.: US13730704

    Filing date: 2012-12-28

    Abstract: A semiconductor chip is described having a load collision detection circuit comprising a first bloom filter circuit. The semiconductor chip has a store collision detection circuit comprising a second bloom filter circuit. The semiconductor chip has one or more processing units capable of executing ordered parallel threads coupled to the load collision detection circuit and the store collision detection circuit. The load collision detection circuit and the store collision detection circuit is to detect younger stores for load operations of said threads and younger loads for store operations of said threads.
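
    The collision check described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the class names `BloomFilter` and `CollisionDetector`, the per-thread filter layout, the filter size, and the SHA-256-based hashing are all hypothetical choices made for illustration. Threads are numbered by program order, so a higher index means a younger thread.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over integer addresses (no false negatives)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not values from the patent.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit vector

    def _positions(self, address):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{address}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, address):
        for pos in self._positions(address):
            self.bits |= 1 << pos

    def might_contain(self, address):
        # True may be a false positive; False is always exact.
        return all((self.bits >> pos) & 1 for pos in self._positions(address))


class CollisionDetector:
    """Hypothetical model: one load filter and one store filter per ordered thread."""

    def __init__(self, num_threads):
        self.load_filters = [BloomFilter() for _ in range(num_threads)]
        self.store_filters = [BloomFilter() for _ in range(num_threads)]

    def on_load(self, order, address):
        """Record a load; report whether a younger thread may already have stored here."""
        self.load_filters[order].add(address)
        return any(f.might_contain(address) for f in self.store_filters[order + 1:])

    def on_store(self, order, address):
        """Record a store; report whether a younger thread may already have loaded here."""
        self.store_filters[order].add(address)
        return any(f.might_contain(address) for f in self.load_filters[order + 1:])


detector = CollisionDetector(num_threads=2)
detector.on_load(1, 0x1000)          # younger thread 1 reads the address first
print(detector.on_store(0, 0x1000))  # older thread 0 then writes it: possible collision
```

    Because a Bloom filter never produces false negatives, a real collision is always flagged in this model; a flagged collision may still be a false positive and would need a precise check or a conservative recovery.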


    Apparatus and method for a hybrid latency-throughput processor
    5.
    Granted invention patent
    Apparatus and method for a hybrid latency-throughput processor (in force)

    Publication No.: US09417873B2

    Publication date: 2016-08-16

    Application No.: US13730055

    Filing date: 2012-12-28

    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.


    Apparatus and method for a hybrid latency-throughput processor

    Publication No.: US10664284B2

    Publication date: 2020-05-26

    Application No.: US16289075

    Filing date: 2019-02-28

    Abstract: An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.

    Memory address collision detection of ordered parallel threads with bloom filters

    Publication No.: US10101999B2

    Publication date: 2018-10-16

    Application No.: US15403101

    Filing date: 2017-01-10

    Abstract: A semiconductor chip is described having a load collision detection circuit comprising a first bloom filter circuit. The semiconductor chip has a store collision detection circuit comprising a second bloom filter circuit. The semiconductor chip has one or more processing units capable of executing ordered parallel threads coupled to the load collision detection circuit and the store collision detection circuit. The load collision detection circuit and the store collision detection circuit is to detect younger stores for load operations of said threads and younger loads for store operations of said threads.

    Apparatus and method for low-latency invocation of accelerators

    Publication No.: US10089113B2

    Publication date: 2018-10-02

    Application No.: US15282082

    Filing date: 2016-09-30

    Abstract: An apparatus and method are described for providing low-latency invocation of accelerators. For example, a system according to one embodiment comprises: a processor includes a plurality of simultaneous multithreading (SMT) cores, at least one shared cache circuit to be shared among two or more of the SMT cores; and at least one of the SMT cores including at least one level 2 (L2) cache circuit to store both instructions and data and communicatively coupled to the instruction cache circuit and the data cache circuit, a communication interconnect circuit including a peripheral component interconnect express (PCIe) circuit to communicatively couple one or more of the SMT cores to an accelerator device and a memory access circuit to identify an accelerator context save/restore region in a memory responsive to a context save/restore value, the accelerator context save/restore region to share an accelerator context state.
