Multi-threaded instruction buffer design

    Publication No.: US10346173B2

    Publication date: 2019-07-09

    Application No.: US13041881

    Filing date: 2011-03-07

    IPC classification: G06F9/30 G06F9/38

    Abstract: An instruction buffer for a processor configured to execute multiple threads is disclosed. The instruction buffer is configured to receive instructions from a fetch unit and provide instructions to a selection unit. The instruction buffer includes one or more memory arrays comprising a plurality of entries configured to store instructions and/or other information (e.g., program counter addresses). One or more indicators are maintained by the processor and correspond to the plurality of threads. The one or more indicators are usable such that for instructions received by the instruction buffer, one or more of the plurality of entries of a memory array can be determined as a write destination for the received instructions, and for instructions to be read from the instruction buffer (and sent to a selection unit), one or more entries can be determined as the correct source location from which to read.
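The per-thread indicators described in this abstract can be pictured with a small software model. This is only a sketch under stated assumptions: the class name `InstructionBuffer`, the fixed per-thread partitioning of the array, and the use of circular write/read offsets as the "indicators" are illustrative choices, not details taken from the patent.

```python
class InstructionBuffer:
    """Toy model: one shared entry array plus per-thread indicators (assumed layout)."""

    def __init__(self, num_threads, entries_per_thread):
        self.entries_per_thread = entries_per_thread
        # One flat memory array; thread t is assumed to own a contiguous block.
        self.entries = [None] * (num_threads * entries_per_thread)
        # Per-thread indicators: next write offset, next read offset, occupancy.
        self.write_ptr = [0] * num_threads
        self.read_ptr = [0] * num_threads
        self.count = [0] * num_threads

    def _index(self, thread, offset):
        # Map a thread-local offset to an entry of the shared array.
        return thread * self.entries_per_thread + offset

    def write(self, thread, instruction):
        """Store an instruction from the fetch unit; return False if the thread's block is full."""
        if self.count[thread] == self.entries_per_thread:
            return False
        self.entries[self._index(thread, self.write_ptr[thread])] = instruction
        self.write_ptr[thread] = (self.write_ptr[thread] + 1) % self.entries_per_thread
        self.count[thread] += 1
        return True

    def read(self, thread):
        """Return the thread's oldest instruction for the selection unit, or None if empty."""
        if self.count[thread] == 0:
            return None
        idx = self._index(thread, self.read_ptr[thread])
        instruction = self.entries[idx]
        self.entries[idx] = None
        self.read_ptr[thread] = (self.read_ptr[thread] + 1) % self.entries_per_thread
        self.count[thread] -= 1
        return instruction
```

In this model the write indicator alone determines the destination entry for a fetched instruction, and the read indicator alone determines the source entry, which is the property the abstract emphasizes.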

    System and method for balancing instruction loads between multiple execution units using assignment history
    2.
    Granted patent
    Legal status: In force

    Publication No.: US09122487B2

    Publication date: 2015-09-01

    Application No.: US12490005

    Filing date: 2009-06-23

    IPC classification: G06F9/38

    Abstract: A system and method for balancing instruction loads between multiple execution units are disclosed. One or more execution units may be represented by a slot configured to accept instructions on behalf of the execution unit(s). A decode unit may assign instructions to a particular slot for subsequent scheduling for execution. Slot assignments may be made based on an instruction's type and/or on a history of previous slot assignments. A cumulative slot assignment history may be maintained in a bias counter, the value of which reflects the bias of previous slot assignments. Slot assignments may be determined based on the value of the bias counter, in order to balance the instruction load across all slots, and all execution units. The bias counter may reflect slot assignments made only within a desired historical window. A separate data structure may store data reflecting the actual slot assignments made during the desired historical window.
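A bias counter limited to a historical window can be sketched as follows. The two-slot setup, the sign convention of the counter, the tie-break toward slot 0, and the names `SlotAssigner`, `window` and `required_slot` are all assumptions made for illustration; the abstract does not fix any of them.

```python
from collections import deque


class SlotAssigner:
    """Toy two-slot assigner driven by a windowed bias counter (assumed policy)."""

    def __init__(self, window=4):
        self.window = window
        # Assumed convention: bias > 0 means slot 0 received more of the
        # recent assignments, bias < 0 means slot 1 did.
        self.bias = 0
        # Separate structure holding the actual assignments inside the window.
        self.history = deque()

    def assign(self, required_slot=None):
        """Pick a slot; required_slot models an instruction type tied to one slot."""
        if required_slot is not None:
            slot = required_slot
        else:
            # Steer flexible instructions toward the less-loaded slot.
            slot = 1 if self.bias > 0 else 0
        self.history.append(slot)
        self.bias += 1 if slot == 0 else -1
        # Retire the oldest assignment once it falls outside the window,
        # undoing its contribution to the counter.
        if len(self.history) > self.window:
            oldest = self.history.popleft()
            self.bias -= 1 if oldest == 0 else -1
        return slot
```

Keeping the actual assignments in a separate queue is what lets the counter forget old decisions exactly, rather than decaying approximately.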


    Accessing a multibank register file using a thread identifier
    3.
    Granted patent
    Legal status: In force

    Publication No.: US08458446B2

    Publication date: 2013-06-04

    Application No.: US12570682

    Filing date: 2009-09-30

    IPC classification: G06F9/30

    Abstract: A processor includes an instruction fetch unit configured to issue instructions for execution, where the instructions are selected from a number of threads, where each given instruction has a corresponding thread identifier, and where at least some of the instructions specify operand(s) via register identifiers. A register file stores operands usable by the instructions, and may include several banks, each corresponding to a register identifier and including several entries corresponding to the several threads, wherein the entries are configured to store data values. In response to receiving a request to read a particular register identifier for a given thread identifier, the register file may be configured to decode the given thread identifier to retrieve entries from the banks that correspond to the given thread identifier. The register file may further select, from among the retrieved entries, a data value corresponding to the particular register identifier to be output.
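The two-step read described in this abstract (decode the thread identifier first, then select by register identifier) can be modeled in a few lines. The class name `MultibankRegisterFile` and the list-of-lists layout are illustrative assumptions, not the patented circuit.

```python
class MultibankRegisterFile:
    """Toy model: one bank per register identifier, one entry per thread in each bank."""

    def __init__(self, num_registers, num_threads):
        # banks[r][t] holds the value of register r for thread t.
        self.banks = [[0] * num_threads for _ in range(num_registers)]

    def write(self, thread_id, register_id, value):
        self.banks[register_id][thread_id] = value

    def read(self, thread_id, register_id):
        # Step 1: the thread identifier picks one entry out of every bank.
        retrieved = [bank[thread_id] for bank in self.banks]
        # Step 2: the register identifier selects among the retrieved entries.
        return retrieved[register_id]
```

In hardware the first step would happen in all banks in parallel, so the selection in the second step is a multiplexer over values that are already available.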


    Method and system for sharing functional units of a multithreaded processor
    4.
    Granted patent
    Legal status: In force

    Publication No.: US08095778B1

    Publication date: 2012-01-10

    Application No.: US10880712

    Filing date: 2004-06-30

    Applicant: Robert T. Golla

    Inventor: Robert T. Golla

    IPC classification: G06F9/40

    Abstract: Sharing functional units within a multithreaded processor. In one embodiment, the multithreaded processor may include a multithreaded instruction source that may provide an instruction from each of a plurality of thread groups in a given cycle. A given thread group may include one or more instructions from one or more threads. Arbitration functionality may arbitrate between the plurality of thread groups for access to a functional unit such as a load store unit, for example, that may be shared between the thread groups.
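The abstract does not say which arbitration policy is used between thread groups. As one possible illustration, the sketch below uses a round-robin policy; that policy and the name `RoundRobinArbiter` are assumptions, not something the patent states.

```python
class RoundRobinArbiter:
    """Toy arbiter granting a shared functional unit to one thread group per cycle."""

    def __init__(self, num_groups):
        self.num_groups = num_groups
        # Start so that group 0 has the highest priority on the first cycle.
        self.last_granted = num_groups - 1

    def grant(self, requests):
        """requests[g] is True if thread group g wants the unit this cycle.

        Returns the index of the granted group, or None if nobody requested.
        """
        # Search starting just after the most recently granted group.
        for offset in range(1, self.num_groups + 1):
            group = (self.last_granted + offset) % self.num_groups
            if requests[group]:
                self.last_granted = group
                return group
        return None
```

With two groups both requesting every cycle, the grants alternate, so neither group can monopolize the shared load store unit.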


    THREAD FAIRNESS ON A MULTI-THREADED PROCESSOR WITH MULTI-CYCLE CRYPTOGRAPHIC OPERATIONS
    5.
    Patent application
    Legal status: In force

    Publication No.: US20110276783A1

    Publication date: 2011-11-10

    Application No.: US12773278

    Filing date: 2010-05-04

    IPC classification: G06F9/38

    Abstract: Systems and methods for efficient execution of operations in a multi-threaded processor. Each thread may include a blocking instruction. A blocking instruction blocks other threads from utilizing hardware resources for an appreciable amount of time. One example of a blocking type instruction is a Montgomery multiplication cryptographic instruction. Each thread can operate in a thread-based mode that allows the insertion of stall cycles during the execution of blocking instructions, during which other threads may utilize the previously blocked hardware resources. At times when multiple threads are scheduled to execute blocking instructions, the thread-based mode may be changed to increase throughput for these multiple threads. For example, the mode may be changed to disallow the insertion of stall cycles. Therefore, the time for sequential operation of the blocking instructions corresponding to the multiple threads may be reduced.


    Register access protocol in a multihreaded multi-core processor
    6.
    Granted patent
    Legal status: In force

    Publication No.: US07747771B1

    Publication date: 2010-06-29

    Application No.: US10881178

    Filing date: 2004-06-30

    IPC classification: G06F15/16 G06F15/76 G06F13/00

    CPC classification: G06F15/16

    Abstract: A method and mechanism for managing access to a plurality of registers in a processing device are contemplated. A processing device includes multiple nodes coupled to a ring bus, each of which includes one or more registers which may be accessed by processes executing within the device. Also coupled to the ring bus is a ring control unit which is configured to initiate transactions targeted to nodes on the ring bus. Each of the nodes is configured to receive and process a bus transaction with a fixed latency, whether or not the transaction is targeted to the receiving node. The ring control unit is configured to periodically convey idle transactions on the ring bus in order to allow nodes responding to indeterminate transactions to gain access to the bus.


    Arbitration of window swap operations
    7.
    Granted patent
    Legal status: In force

    Publication No.: US07426630B1

    Publication date: 2008-09-16

    Application No.: US10881151

    Filing date: 2004-06-30

    Abstract: In one embodiment, a processor comprises a register file, register management logic coupled to the register file, and at least two sources of window swap operations coupled to the register management logic. The register management logic is configured to control an interface to the register file to switch register windows in the register file in response to one or more window swap operations. The sources of window swap operations and the register management logic are configured to cooperate according to an arbitration scheme to arbitrate between conflicting window swap operations to be performed using the interface. In one particular implementation, for example, block signals may be used from higher priority sources to lower priority sources to block issuance of window swap operations by the lower priority sources.


    Software accessible fast VA to PA translation
    8.
    Granted patent
    Legal status: In force

    Publication No.: US07350053B1

    Publication date: 2008-03-25

    Application No.: US11034345

    Filing date: 2005-01-11

    IPC classification: G06F9/26 G06F9/34 G06F12/00

    CPC classification: G06F12/1081 G06F12/1027

    Abstract: A method to communicate data is disclosed which includes communicating a virtual address to a translation lookaside buffer (TLB) and translating the virtual address to a physical address of a computer memory. The method also includes loading the physical address translated by the TLB into a register within a processor and transmitting the data from the physical address to a destination computing device.
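The sequence in this abstract (translate through the TLB, load the physical address into a register, then transmit the data found at that address) can be walked through with a toy model. Everything concrete here is assumed for illustration: the 4 KiB page size, the dictionary standing in for the TLB, the `registers["pa"]` slot, and the function names `translate` and `send_from_virtual`.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def translate(tlb, virtual_address):
    """Translate a virtual address using a dict that maps virtual to physical page numbers."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    if vpn not in tlb:
        raise KeyError(f"TLB miss for virtual page {vpn}")
    return tlb[vpn] * PAGE_SIZE + offset


def send_from_virtual(tlb, memory, registers, virtual_address, length, destination):
    """Translate, expose the physical address in a register, then copy the data out."""
    # The translated physical address lands in a software-visible register.
    registers["pa"] = translate(tlb, virtual_address)
    pa = registers["pa"]
    # Transmit the bytes at the physical address to the destination device,
    # modeled here as a bytearray that collects what it receives.
    destination.extend(memory[pa:pa + length])
```

A real implementation would also have to handle TLB misses and transfers that cross a page boundary; the sketch raises on a miss and assumes the transfer stays within one page.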


    Minimal address state in a fine grain multithreaded processor
    9.
    Granted patent
    Legal status: In force

    Publication No.: US07343474B1

    Publication date: 2008-03-11

    Application No.: US10881616

    Filing date: 2004-06-30

    IPC classification: G06F9/30

    Abstract: In one embodiment, a processor comprises a plurality of pipeline stages and a first circuit operable at a first pipeline stage of the plurality of pipeline stages. The first circuit is configured to maintain a plurality of program counters (PCs), each of which corresponds to one of a plurality of threads that the processor is configured to have concurrently in process with respect to the plurality of pipeline stages. The first circuit is configured to provide a first PC to a second pipeline stage of the plurality of pipeline stages. The first PC is derived from one of the plurality of PCs that corresponds to a first thread of the plurality of threads, and a first instruction entering the second pipeline stage is from the first thread.


    Method and apparatus for executing fixed-point instructions within idle execution units of a superscalar processor
    10.
    Granted patent
    Legal status: Expired

    Publication No.: US5809323A

    Publication date: 1998-09-15

    Application No.: US530552

    Filing date: 1995-09-19

    IPC classification: G06F9/302 G06F9/38

    Abstract: A superscalar processor and method for executing fixed-point instructions within a superscalar processor are disclosed. The superscalar processor has a memory and multiple execution units, including a fixed point execution unit (FXU) and a non-fixed point execution unit (non-FXU). According to the present invention, a set of instructions to be executed are fetched from among a number of instructions stored within memory. A determination is then made if n instructions, the maximum number possible, can be dispatched to the multiple execution units during a first processor cycle if fixed point arithmetic and logical instructions are dispatched only to the FXU. If so, n instructions are dispatched to the multiple execution units for execution. In response to a determination that n instructions cannot be dispatched during the first processor cycle, a determination is made whether a fixed point instruction is available to be dispatched and whether dispatching the fixed point instruction to the non-FXU for execution will result in greater efficiency. In response to a determination that a fixed point instruction is not available to be dispatched or that dispatching the fixed point instruction to the non-FXU will not result in greater efficiency, dispatch of the fixed point instruction is delayed until a second processor cycle. However, in response to a determination that dispatching the fixed point instruction to the non-FXU will result in greater efficiency, the fixed point instruction is dispatched to the non-FXU and executed, thereby improving execution unit utilization.
