1. LOAD BALANCING WHEN ASSIGNING OPERATIONS IN A PROCESSOR
    Invention Application, Pending (Published)

    Publication No.: US20120110594A1

    Publication Date: 2012-05-03

    Application No.: US12914483

    Filing Date: 2010-10-28

    IPC Class: G06F9/46

    Abstract: A method and apparatus for assigning operations in a processor are provided. An incoming instruction is received. The incoming instruction can be processed only by a first processing unit (PU), only by a second PU, or by either the first or the second PU. Processing on the first and second PUs is load balanced by assigning the received instructions that can be processed by either PU based on a metric representing the differential load placed on the first and the second PUs.
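
    As a rough illustration of the idea in the abstract (not the patented circuit), the C sketch below steers an instruction to one of two processing units using a single signed counter as the differential-load metric; the capability flags, the counter, and its update policy are assumptions made for the example. In hardware the metric could just as well be queue occupancies or credit counts; the counter is only the simplest stand-in.

        #include <stdbool.h>

        typedef struct {
            bool can_pu0;   /* instruction executable on the first PU  */
            bool can_pu1;   /* instruction executable on the second PU */
        } instr_t;

        static int load_diff;   /* >0: first PU busier; <0: second PU busier */

        /* Returns 0 or 1, the PU the incoming instruction is assigned to. */
        int assign_pu(instr_t in)
        {
            if (in.can_pu0 && !in.can_pu1) { load_diff++; return 0; }
            if (in.can_pu1 && !in.can_pu0) { load_diff--; return 1; }
            /* Either PU can process it: steer toward the less loaded one. */
            if (load_diff > 0) { load_diff--; return 1; }
            load_diff++;
            return 0;
        }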

2. MECHANISM FOR IDENTIFYING THE SOURCE OF PERFORMANCE LOSS IN A MICROPROCESSOR
    Invention Application, Granted (In Force)

    Publication No.: US20090019317A1

    Publication Date: 2009-01-15

    Application No.: US11776986

    Filing Date: 2007-07-12

    Applicants: Nhon Quach; Sean Lie

    Inventors: Nhon Quach; Sean Lie

    IPC Class: G06F11/34

    Abstract: A system and method of accounting for lost clock cycles in a microprocessor. A method includes detecting a first reason which prevents exit of an entry from an instruction retirement queue, and incrementing a first count corresponding to the first reason, wherein the first count is incremented while the first reason prevents exit of the entry from the queue. A first point in time is determined when said first reason no longer prevents exit of the entry from the queue. A second reason which prevents exit of the entry from the queue is detected, wherein the second reason came into existence prior to said first point in time. A second count corresponding to the second reason is incremented, wherein incrementing the second count begins at the first point in time.
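
    A minimal software model of the counting scheme described above, not the patented logic: each cycle is charged to exactly one stall reason, and when the charged reason clears, charging moves to a reason that was already active at that point. The reason codes, the array sizes, and the choice of which pre-existing reason to charge next are assumptions made for the sketch.

        #define NUM_REASONS 4            /* assumed number of distinct stall reasons */

        static unsigned long lost_cycles[NUM_REASONS];  /* per-reason cycle counts */
        static int charged = -1;                        /* reason currently charged */

        /* Called once per clock cycle; active[r] is nonzero while reason r is
         * preventing the oldest entry from leaving the retirement queue. */
        void account_cycle(const int active[NUM_REASONS])
        {
            /* When the charged reason clears, switch to a reason that already
             * existed at that point (simplified: the first still-active one). */
            if (charged < 0 || !active[charged]) {
                charged = -1;
                for (int r = 0; r < NUM_REASONS; r++)
                    if (active[r]) { charged = r; break; }
            }
            if (charged >= 0)
                lost_cycles[charged]++;
        }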

3. Method and Apparatus for Length Decoding and Identifying Boundaries of Variable Length Instructions
    Invention Application, Granted (In Force)

    Publication No.: US20090019257A1

    Publication Date: 2009-01-15

    Application No.: US11775456

    Filing Date: 2007-07-10

    Applicants: Gene W. Shen; Sean Lie

    Inventors: Gene W. Shen; Sean Lie

    IPC Class: G06F15/76

    Abstract: A mechanism for superscalar decode of variable length instructions. A length decode unit may obtain a plurality of instruction bytes based on a scan window of a predetermined size. The instruction bytes may be associated with a plurality of variable length instructions, which are scheduled to be executed by a processing unit. The length decode unit may, for each instruction byte, estimate the start of a next variable length instruction following a current variable length instruction, and store a first pointer. A pre-pick unit may, for each instruction byte, use the first pointer to estimate the start of a subsequent variable length instruction following the next variable length instruction within the scan window, and store a second pointer. A pick unit may use a start pointer and related first and second pointers to determine the actual start of the variable length instructions within the scan window, and generate instruction pointers.
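
    A sketch, under an assumed window size and with a placeholder length decoder, of the pointer chaining the abstract describes: the length-decode pass records a first pointer per byte to the estimated start of the next instruction, and the pre-pick pass follows that pointer once more to record a second pointer. The pick stage, which walks these pointers from a known instruction start, is omitted here.

        #define SCAN_WINDOW 16   /* assumed scan-window size in bytes */

        /* Placeholder for a real x86 length decoder; every instruction is
         * treated as one byte long purely so the sketch is self-contained. */
        static int estimate_length(const unsigned char *bytes, int pos)
        {
            (void)bytes; (void)pos;
            return 1;
        }

        /* For each byte position, first[] holds the estimated start of the next
         * instruction and second[] the estimated start of the one after that. */
        void chain_pointers(const unsigned char bytes[SCAN_WINDOW],
                            int first[SCAN_WINDOW], int second[SCAN_WINDOW])
        {
            for (int i = 0; i < SCAN_WINDOW; i++)          /* length decode stage */
                first[i] = i + estimate_length(bytes, i);

            for (int i = 0; i < SCAN_WINDOW; i++)          /* pre-pick stage */
                second[i] = (first[i] < SCAN_WINDOW) ? first[first[i]] : first[i];
        }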

4. Distributed packet switching in a source routed cluster server
    Granted Patent, In Force

    Publication No.: US09331958B2

    Publication Date: 2016-05-03

    Application No.: US13731331

    Filing Date: 2012-12-31

    Abstract: A cluster compute server includes nodes coupled in a network topology via a fabric that source routes packets based on location identifiers assigned to the nodes, the location identifiers representing the locations in the network topology. Host interfaces at the nodes may be associated with link layer addresses that do not reflect the location identifier associated with the nodes. The nodes therefore implement locally cached link layer address translations that map link layer addresses to corresponding location identifiers in the network topology. In response to originating a packet directed to one of these host interfaces, the node accesses the local translation cache to obtain a link layer address translation for a destination link layer address of the packet. When a node experiences a cache miss, the node queries a management node to obtain the specified link layer address translation from a master translation table maintained by the management node.
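
    An illustrative sketch of the translation path described above, with the cache geometry, the 48-bit MAC and 32-bit location-identifier types, and the query_management_node() helper all invented for the example: a node first probes its local translation cache and only queries the management node's master table on a miss, caching the answer for later packets.

        #include <stdint.h>
        #include <string.h>

        #define CACHE_SLOTS 64   /* assumed size of the node-local translation cache */

        typedef struct {
            uint8_t  mac[6];     /* destination link-layer address */
            uint32_t loc_id;     /* fabric location identifier     */
            int      valid;
        } xlat_entry;

        static xlat_entry cache[CACHE_SLOTS];

        /* Placeholder for the query sent to the management node's master
         * translation table on a local miss. */
        static uint32_t query_management_node(const uint8_t mac[6])
        {
            (void)mac;
            return 0;
        }

        uint32_t resolve_location(const uint8_t mac[6])
        {
            unsigned idx = mac[5] % CACHE_SLOTS;     /* toy direct-mapped index */
            if (cache[idx].valid && memcmp(cache[idx].mac, mac, 6) == 0)
                return cache[idx].loc_id;            /* local cache hit */

            uint32_t loc = query_management_node(mac);   /* miss: ask management node */
            memcpy(cache[idx].mac, mac, 6);
            cache[idx].loc_id = loc;
            cache[idx].valid = 1;
            return loc;
        }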

5. Method and apparatus for length decoding variable length instructions
    Granted Patent, In Force

    Publication No.: US07818542B2

    Publication Date: 2010-10-19

    Application No.: US11775451

    Filing Date: 2007-07-10

    Applicants: Gene W. Shen; Sean Lie

    Inventors: Gene W. Shen; Sean Lie

    IPC Class: G06F9/30; G06F9/32

    Abstract: A mechanism for superscalar decode of variable length instructions. The decode mechanism may be included within a processing unit, and may comprise a length decode unit. The length decode unit may obtain a plurality of instruction bytes. The instruction bytes may be associated with a plurality of variable length instructions, which are to be executed by the processing unit. The length decode unit may perform a length decode operation for each of the plurality of instruction bytes. For each instruction byte, the length decode unit may estimate the instruction length of a current variable length instruction associated with a current instruction byte. Furthermore, during the length decode operation, for each instruction byte, the length decode unit may estimate the start of a next variable length instruction based on the estimated instruction length of the current variable length instruction, and store a first pointer to the estimated start of the next variable length instruction.

6. MULTIPLE-CORE PROCESSOR WITH HIERARCHICAL MICROCODE STORE
    Invention Application, Granted (In Force)

    Publication No.: US20090024836A1

    Publication Date: 2009-01-22

    Application No.: US11779642

    Filing Date: 2007-07-18

    IPC Class: G06F9/26

    CPC Class: G06F9/28; G06F9/223

    Abstract: A multiple-core processor having a hierarchical microcode store. A processor may include multiple processor cores, each configured to independently execute instructions defined according to a programmer-visible instruction set architecture (ISA). Each core may include a respective local microcode unit configured to store microcode entries. The processor may also include a remote microcode unit accessible by each of the processor cores. Any given one of the processor cores may be configured to generate a given microcode entrypoint corresponding to a particular microcode entry including one or more operations to be executed by the given processor core, and to determine whether the particular microcode entry is stored within the respective local microcode unit of the given core. In response to determining that the particular microcode entry is not stored within the respective local microcode unit, the given core may convey a request for the particular microcode entry to the remote microcode unit.
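
    A small sketch of the two-level lookup, with the entry format, store size, and request_from_remote_unit() helper assumed for the example: a core consults its local microcode unit first and forwards the entrypoint to the shared remote unit only when the entry is not resident locally.

        #include <stddef.h>

        #define LOCAL_ENTRIES 64   /* assumed capacity of a core's local microcode unit */

        typedef struct {
            unsigned    entrypoint;   /* microcode entrypoint generated by the core */
            const void *ops;          /* operations stored for that entry           */
            int         valid;
        } ucode_entry;

        static ucode_entry local_store[LOCAL_ENTRIES];   /* one core's local unit */

        /* Placeholder for the request/response path to the remote microcode
         * unit shared by all cores. */
        static const void *request_from_remote_unit(unsigned entrypoint)
        {
            (void)entrypoint;
            return NULL;
        }

        const void *get_microcode(unsigned entrypoint)
        {
            for (size_t i = 0; i < LOCAL_ENTRIES; i++)    /* check the local unit first */
                if (local_store[i].valid && local_store[i].entrypoint == entrypoint)
                    return local_store[i].ops;
            return request_from_remote_unit(entrypoint);  /* miss: go to the remote unit */
        }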

7. PROCESSING PIPELINE HAVING STAGE-SPECIFIC THREAD SELECTION AND METHOD THEREOF
    Invention Application, Granted (In Force)

    Publication No.: US20090172362A1

    Publication Date: 2009-07-02

    Application No.: US11967923

    Filing Date: 2007-12-31

    IPC Class: G06F9/30

    Abstract: One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.
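
    The sketch below models only the stage-specific thread-selection aspect, not the shared front end or the parallel dispatch buses: each pipeline stage keeps its own round-robin state and picks among ready threads independently of the choice made at any other stage. The stage count, thread count, and round-robin policy are assumptions for the example.

        #define NUM_THREADS 2   /* assumed hardware thread count */
        #define NUM_STAGES  4   /* e.g. fetch, decode, dispatch, execute */

        /* Each stage keeps its own selection state, so the thread chosen at one
         * stage is independent of the choice made at any other stage. */
        static int last_picked[NUM_STAGES];

        /* ready[t] is nonzero when thread t has work available at this stage.
         * Returns the selected thread, or -1 if no thread is ready. */
        int pick_thread(int stage, const int ready[NUM_THREADS])
        {
            for (int i = 1; i <= NUM_THREADS; i++) {
                int t = (last_picked[stage] + i) % NUM_THREADS;
                if (ready[t]) {
                    last_picked[stage] = t;
                    return t;
                }
            }
            return -1;
        }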

8. PROCESSING PIPELINE HAVING PARALLEL DISPATCH AND METHOD THEREOF
    Invention Application, Granted (In Force)

    Publication No.: US20090172359A1

    Publication Date: 2009-07-02

    Application No.: US11967924

    Filing Date: 2007-12-31

    Applicants: Gene Shen; Sean Lie

    Inventors: Gene Shen; Sean Lie

    IPC Class: G06F9/30

    Abstract: One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.

9. Redirect Recovery Cache
    Invention Application, Granted (In Force)

    Publication No.: US20080195844A1

    Publication Date: 2008-08-14

    Application No.: US11674566

    Filing Date: 2007-02-13

    Applicants: Gene W. Shen; Sean Lie

    Inventors: Gene W. Shen; Sean Lie

    IPC Class: G06F9/312

    Abstract: In one embodiment, a processor comprises a branch resolution unit and a redirect recovery cache. The branch resolution unit is configured to detect a mispredicted branch operation, and to transmit a redirect address for fetching instructions from a correct target of the branch operation responsive to detecting the mispredicted branch operation. The redirect recovery cache comprises a plurality of cache entries, each cache entry configured to store operations corresponding to instructions fetched in response to respective mispredicted branch operations. The redirect recovery cache is coupled to receive the redirect address and, if the redirect address is a hit in the redirect recovery cache, the redirect recovery cache is configured to supply operations from the hit cache entry to a pipeline of the processor, bypassing at least one initial pipeline stage.
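
    A sketch of the lookup side of such a cache, with the entry format and sizes assumed for the example: the redirect address produced by the branch resolution unit indexes the cache, and a hit returns previously captured operations so the initial fetch and decode stages can be bypassed. Filling the cache after a miss is not shown.

        #include <stdint.h>
        #include <stddef.h>

        #define RRC_ENTRIES   32   /* assumed number of cache entries      */
        #define OPS_PER_ENTRY  8   /* assumed operations stored per entry  */

        typedef struct {
            uint64_t redirect_addr;        /* correct target of the mispredicted branch */
            uint32_t ops[OPS_PER_ENTRY];   /* decoded operations captured earlier       */
            int      valid;
        } rrc_entry;

        static rrc_entry rrc[RRC_ENTRIES];

        /* On a hit, returns decoded operations that can be fed directly into
         * the pipeline, bypassing the initial stages; on a miss, returns NULL
         * and the processor fetches from the redirect address as usual. */
        const uint32_t *rrc_lookup(uint64_t redirect_addr)
        {
            unsigned idx = (unsigned)(redirect_addr % RRC_ENTRIES);
            if (rrc[idx].valid && rrc[idx].redirect_addr == redirect_addr)
                return rrc[idx].ops;
            return NULL;
        }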

10. Processing pipeline having stage-specific thread selection and method thereof
    Granted Patent, In Force

    Publication No.: US08086825B2

    Publication Date: 2011-12-27

    Application No.: US11967923

    Filing Date: 2007-12-31

    IPC Class: G06F9/38; G06F9/48

    Abstract: One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.
