Unpacking packed data in multiple lanes

    Publication No.: US09086872B2

    Publication date: 2015-07-21

    Application No.: US12494667

    Filing date: 2009-06-30

    CPC classification number: G06F9/30145 G06F9/30032 G06F9/30036

    Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.
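As a rough illustration of the result layout described in this abstract, the sketch below models each operand as a Python list holding two lanes, with index 0 as the lowest-order element. The function name `unpack_low_high`, the `lane_width` parameter, and the choice of four elements per lane are assumptions made for illustration; they do not come from the patent.

```python
def unpack_low_high(a, b, lane_width=4):
    """Model a two-lane unpack.

    Lane 0 of the result interleaves the lowest-order half of lane 0 of
    `a` and `b`; lane 1 interleaves the highest-order half of lane 1.
    Element sizes and lane counts are assumed, not taken from the patent.
    """
    if lane_width < 2 or lane_width % 2:
        raise ValueError("lane_width must be an even number >= 2")
    if len(a) != 2 * lane_width or len(b) != 2 * lane_width:
        raise ValueError("each operand must hold exactly two lanes")

    half = lane_width // 2
    result = []

    # First lane: only the lowest-order elements of each operand's lane 0.
    for x, y in zip(a[:half], b[:half]):
        result += [x, y]

    # Second lane: only the highest-order elements of each operand's lane 1.
    for x, y in zip(a[lane_width + half:], b[lane_width + half:]):
        result += [x, y]

    return result


a = ["a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7"]
b = ["b0", "b1", "b2", "b3", "b4", "b5", "b6", "b7"]
print(unpack_low_high(a, b))
# ['a0', 'b0', 'a1', 'b1', 'a6', 'b6', 'a7', 'b7']
```

The result has the same width as each operand: every lane contributes half of its elements from each source, so no element crosses a lane boundary.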

    Vector shuffle instructions operating on multiple lanes each having a plurality of data elements using a common set of per-lane control bits
    3.
    Type: invention grant
    Status: in force

    Publication No.: US08078836B2

    Publication date: 2011-12-13

    Application No.: US11967211

    Filing date: 2007-12-30

    CPC classification number: G06F9/30032 G06F9/30036 G06F9/3885 G06F9/3887

    Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
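The single-source embodiment can be sketched as follows: one control field is decoded once and then applied identically inside every lane, so elements never move between lanes. This is a minimal model under assumed parameters. The name `shuffle_in_lanes`, the list representation, and the encoding of the control field (one selector per destination element, lowest bits first) are hypothetical choices, not details confirmed by the abstract.

```python
def shuffle_in_lanes(src, control, lane_width=4):
    """Apply the same per-lane control bits to every lane of `src`.

    For destination position i within a lane, the selector is the i-th
    group of bits in `control`; it picks a source element from the SAME
    lane. The bit layout is an assumption made for this sketch.
    """
    if lane_width < 2 or lane_width & (lane_width - 1):
        raise ValueError("lane_width must be a power of two >= 2")
    if len(src) % lane_width:
        raise ValueError("src must contain a whole number of lanes")

    bits = (lane_width - 1).bit_length()   # selector width per element
    mask = (1 << bits) - 1

    dst = []
    for base in range(0, len(src), lane_width):
        lane = src[base:base + lane_width]
        for i in range(lane_width):
            selector = (control >> (bits * i)) & mask
            dst.append(lane[selector])
    return dst


# Selectors 3, 2, 1, 0 (lowest bits first) reverse each lane independently.
print(shuffle_in_lanes([0, 1, 2, 3, 4, 5, 6, 7], 0b00011011))
# [3, 2, 1, 0, 7, 6, 5, 4]
```

Sharing one control field across lanes keeps the encoding short: the number of control bits depends on the lane width, not on the total vector width.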

    METHOD AND APPARATUS FOR AFFINITY-GUIDED SPECULATIVE HELPER THREADS IN CHIP MULTIPROCESSORS
    4.
    Type: invention application
    Status: in force

    Publication No.: US20110035555A1

    Publication date: 2011-02-10

    Application No.: US12909774

    Filing date: 2010-10-21

    CPC classification number: G06F9/3842 G06F9/383 G06F9/3851 G06F12/0862

    Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the helper core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.
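The two delivery paths named in this abstract (pushing prefetched data to an affinity group, and serving it from the helper core's local cache on request) can be contrasted with a toy software model. Everything here is a simplification: the `Core` class, the dictionary caches, and the functions `push_to_group` and `load` are hypothetical names, and real hardware would act on cache lines through a coherence protocol rather than on Python dictionaries.

```python
class Core:
    """A core with a private cache, modeled as an address -> value dict."""

    def __init__(self, name):
        self.name = name
        self.cache = {}


def push_to_group(memory, addr, affinity_group):
    """Push variant: broadcast one prefetched value to every core in the group."""
    value = memory[addr]
    for core in affinity_group:
        core.cache[addr] = value


def load(core, memory, addr, helper=None):
    """Pull variant: on a local miss, ask the helper core's cache before memory.

    Returns (value, source) so the caller can see where the data came from.
    """
    if addr in core.cache:
        return core.cache[addr], "local cache"
    if helper is not None and addr in helper.cache:
        core.cache[addr] = helper.cache[addr]
        return core.cache[addr], "helper cache"
    core.cache[addr] = memory[addr]
    return core.cache[addr], "memory"


memory = {0x40: 99, 0x80: 5, 0xC0: 1}
main, helper = Core("main"), Core("helper")

push_to_group(memory, 0x40, [main, helper])   # pushed ahead of the main thread
helper.cache[0x80] = memory[0x80]             # prefetched into the helper only

print(load(main, memory, 0x40))          # (99, 'local cache')
print(load(main, memory, 0x80, helper))  # (5, 'helper cache')
print(load(main, memory, 0xC0, helper))  # (1, 'memory')
```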

    Method and apparatus for reducing clock frequency during low workload periods
    5.
    Type: invention grant
    Status: in force

    Publication No.: US07721129B2

    Publication date: 2010-05-18

    Application No.: US11330647

    Filing date: 2006-01-12

    CPC classification number: G06F1/324 G06F1/3203 Y02D10/126

    Abstract: A clock frequency control unit for an integrated circuit (IC) includes a clock generator, a finite state machine (FSM), and a gating circuit (GC). The FSM has at least first and second states corresponding to non-low workload and low workload states, respectively. In the first state, the GC provides a clock signal to functional units of the IC with the same frequency as the clock generator output. In the second state, the GC reduces the frequency of the clock signal. In one embodiment, the GC masks out selected cycles of the clock generator output to reduce the clock signal frequency. The FSM monitors the operation of the IC to transition from the first state to the second state when selected “low workload” conditions are detected (e.g., a long-latency cache miss). Similarly, the FSM transitions from the second state to the first state when selected “non-low workload” conditions are detected.
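A cycle-by-cycle software model may make the masking embodiment easier to follow. The sketch below assumes a two-state FSM and a gating rule that passes one generator cycle out of every `keep_every` while in the low workload state. The names `State`, `gate_clock`, and `keep_every`, and the string labels used for detected conditions, are illustrative assumptions rather than terms from the patent.

```python
from enum import Enum


class State(Enum):
    NON_LOW = 1   # first state: clock passes at full frequency
    LOW = 2       # second state: selected cycles are masked out


def gate_clock(conditions, keep_every=2):
    """Return one entry per generator cycle: 1 if the pulse passes, 0 if masked.

    `conditions[i]` is the condition detected in cycle i: "low",
    "non_low", or None when nothing relevant was observed.
    """
    state = State.NON_LOW
    gated = []
    for cycle, condition in enumerate(conditions):
        # FSM transitions on detected workload conditions.
        if state is State.NON_LOW and condition == "low":
            state = State.LOW
        elif state is State.LOW and condition == "non_low":
            state = State.NON_LOW

        # Gating circuit: full rate in NON_LOW, masked cycles in LOW.
        if state is State.NON_LOW:
            gated.append(1)
        else:
            gated.append(1 if cycle % keep_every == 0 else 0)
    return gated


# A cache miss is detected in cycle 1; work resumes in cycle 5.
print(gate_clock([None, "low", None, None, None, "non_low", None]))
# [1, 0, 1, 0, 1, 1, 1]
```

With `keep_every=2` the effective clock frequency in the low workload state is half the generator frequency; larger values mask more cycles.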

    Memory system for multiple data types
    7.
    Type: invention grant
    Status: lapsed

    Publication No.: US06944720B2

    Publication date: 2005-09-13

    Application No.: US10402827

    Filing date: 2003-03-27

    CPC classification number: G06F12/0875 G06F12/1054 G06F2212/401

    Abstract: A memory system is provided for storing multiple data types. The memory system includes a main memory, a local cache, and a translation unit. The local cache has multiple entries, each of which includes a data field to store data and a status field to indicate a storage state for the stored data. The translation unit includes a translation lookaside buffer (TLB) and a status-cache (STC). The TLB stores address translations for data in the main memory, and the STC stores storage states for data indicated by the address translations.
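One way to picture the translation unit is as two lookup tables consulted together: the TLB yields the physical address and the STC yields the storage state for the same data. The sketch below is only a guess at the organization. The 4 KiB page size, indexing both tables by virtual page number, and the state labels `"type_a"` and `"type_b"` are all assumptions, since the abstract does not specify them.

```python
PAGE_SIZE = 4096  # assumed page size; not stated in the abstract


class TranslationUnit:
    """Toy translation unit pairing a TLB with a status cache (STC)."""

    def __init__(self):
        self.tlb = {}   # virtual page number -> physical page number
        self.stc = {}   # virtual page number -> storage state label

    def map_page(self, vpage, ppage, state):
        """Install a translation and the storage state of the data it covers."""
        self.tlb[vpage] = ppage
        self.stc[vpage] = state

    def lookup(self, vaddr):
        """Return (physical address, storage state); KeyError models a miss."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        ppage = self.tlb[vpage]
        return ppage * PAGE_SIZE + offset, self.stc[vpage]


unit = TranslationUnit()
unit.map_page(vpage=2, ppage=7, state="type_b")
print(unit.lookup(2 * PAGE_SIZE + 5))
# (28677, 'type_b')
```

Under this reading, the state returned by `lookup` is what would be recorded in the status field of a local cache entry when the data is filled.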

    Method and apparatus for resuming memory operations from a low latency wake-up low power state
    8.
    Type: invention grant
    Status: in force

    Publication No.: US06886105B2

    Publication date: 2005-04-26

    Application No.: US09504003

    Filing date: 2000-02-14

    Abstract: A method and apparatus for resuming operations from a low latency wake-up low power state. One embodiment provides a system including a processor, an operating system, and a memory subsystem that requires initialization commands to exit a memory low power state. Control logic detects exit from an operating system low latency low power state and responsively generates a plurality of initialization commands to remove the memory subsystem from the memory low power state prior to deasserting a stop clock signal and allowing execution to resume.

    Method and apparatus for providing memory access in a processor pipeline
    9.
    Type: invention grant
    Status: lapsed

    Publication No.: US5787026A

    Publication date: 1998-07-28

    Application No.: US575780

    Filing date: 1995-12-20

    CPC classification number: G06F9/3826 G06F9/3867

    Abstract: The invention provides a method and apparatus for providing operand reads in a processor pipeline. According to one aspect of the invention, a method is described for executing an instruction in a computer pipeline that requires different operands be read from the same register file in different stages of the computer pipeline. According to another aspect of the invention, a method is described for executing an instruction in a processor pipeline. According to this method, at least a first operand is read from a register file in a first stage of the processor pipeline. If execution of the instruction causes the processor to place the first operand in a storage area other than the register file, then the first operand is written to that storage area in a subsequent stage of the processor pipeline. Otherwise, one or more ALU operations are performed on the first operand and at least a second operand in a different subsequent stage of the processor pipeline.
