PROCESSING MEMORY ACCESS INSTRUCTIONS THAT HAVE DUPLICATE MEMORY INDICES
    1.
    Invention application
    Status: Granted

    Publication number: US20140095779A1

    Publication date: 2014-04-03

    Application number: US13631378

    Filing date: 2012-09-28

    IPC classification: G06F12/00 G06F12/02

    Abstract: A method of an aspect includes receiving an instruction indicating a first source packed memory indices, a second source packed data operation mask, and a destination storage location. Memory indices of the packed memory indices are compared with one another. One or more sets of duplicate memory indices are identified. Data corresponding to each set of duplicate memory indices is loaded only once. The loaded data corresponding to each set of duplicate memory indices is replicated for each of the duplicate memory indices in the set. A packed data result is stored in the destination storage location in response to the instruction. The packed data result includes data elements from memory locations that are indicated by corresponding memory indices of the packed memory indices when not blocked by corresponding elements of the packed data operation mask.
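
The gather behavior described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function name `gather_dedup`, the use of Python lists for memory and registers, and the choice to leave masked-off lanes holding their previous destination value are all hypothetical, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def gather_dedup(
    memory: Sequence[int],
    indices: Sequence[int],
    mask: Sequence[bool],
    dest: Sequence[int],
) -> Tuple[List[int], int]:
    """Masked gather that loads each set of duplicate indices only once.

    Returns the packed result and the number of memory loads performed.
    Assumption: lanes blocked by the mask keep their old destination value.
    """
    result = list(dest)
    loads = 0
    for lane, idx in enumerate(indices):
        if not mask[lane]:
            continue  # blocked by the packed data operation mask
        # Compare this index with the indices of earlier active lanes.
        leader = next(
            (j for j in range(lane) if mask[j] and indices[j] == idx),
            None,
        )
        if leader is None:
            result[lane] = memory[idx]  # first occurrence: load once
            loads += 1
        else:
            result[lane] = result[leader]  # duplicate: replicate loaded data
    return result, loads


memory = [10, 11, 12, 13, 14, 15, 16, 17]
indices = [3, 5, 3, 3, 0, 5, 7, 1]
mask = [True, True, True, False, True, True, False, True]
packed, load_count = gather_dedup(memory, indices, mask, [-1] * 8)
```

In this example six lanes are active, but only four loads are issued, because index 3 and index 5 each appear twice among the active lanes.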

    METHOD AND APPARATUS FOR EFFICIENTLY MANAGING ARCHITECTURAL REGISTER STATE OF A PROCESSOR
    3.
    Invention application
    Status: Granted

    Publication number: US20160179527A1

    Publication date: 2016-06-23

    Application number: US14581535

    Filing date: 2014-12-23

    IPC classification: G06F9/30

    Abstract: An apparatus and method for efficiently managing the architectural state of a processor. For example, one embodiment of a processor comprises: a source mask register to be logically subdivided into at least a first portion to store a usable portion of a mask value and a second portion to store an indication of whether the usable portion of the mask value has been updated; a control register to store an unusable portion of the mask value; architectural state management logic to read the indication to determine whether the mask value has been updated prior to performing a store operation, wherein if the mask value has been updated, then the architectural state management logic is to read the usable portion of the mask value from the first portion of the source mask register and zero out bits of the unusable portion of the mask value to generate a final mask value to be saved to memory, and wherein if the mask value has not been updated, then the architectural state management logic is to concatenate the usable portion of the mask value with the unusable portion of the mask value read from the control register to generate a final mask value to be saved to memory.
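
The two save paths in this abstract reduce to a small piece of bit arithmetic. The sketch below is an illustration only: the widths `USABLE_BITS` and `TOTAL_BITS`, the placement of the usable portion in the low bits, and the function name `final_mask_value` are assumptions made for the example and are not taken from the patent.

```python
USABLE_BITS = 16  # assumed width of the usable portion (hypothetical)
TOTAL_BITS = 64   # assumed architectural width of the mask value (hypothetical)


def final_mask_value(usable: int, updated: bool, unusable: int) -> int:
    """Build the mask value that would be saved to memory.

    usable   -- first portion of the source mask register
    updated  -- indication stored in the second portion of that register
    unusable -- unusable portion held in the control register
    Assumption: the usable portion occupies the low-order bits.
    """
    usable &= (1 << USABLE_BITS) - 1
    if updated:
        # Mask was written since the last restore: unusable bits are zeroed.
        return usable
    # Mask untouched: concatenate the saved unusable bits with the usable bits.
    unusable &= (1 << (TOTAL_BITS - USABLE_BITS)) - 1
    return (unusable << USABLE_BITS) | usable


saved_when_updated = final_mask_value(0xBEEF, True, 0x123456789ABC)
saved_when_clean = final_mask_value(0xBEEF, False, 0x123456789ABC)
```

With these assumed widths, the updated path yields `0xBEEF` and the unmodified path yields `0x123456789ABCBEEF`.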

    Processors having fully-connected interconnects shared by vector conflict instructions and permute instructions

    Publication number: US10678541B2

    Publication date: 2020-06-09

    Application number: US13977126

    Filing date: 2011-12-29

    IPC classification: G06F9/30 G06F9/38

    Abstract: An apparatus includes a decode unit to decode a permute instruction and a vector conflict instruction. A vector execution unit is coupled with the decode unit and includes a fully-connected interconnect. The fully-connected interconnect has at least four inputs to receive at least four corresponding data elements of at least one source vector. The fully-connected interconnect has at least four outputs. Each of the at least four inputs is coupled with each of the at least four outputs. The execution unit also includes a permute instruction execution logic coupled with the at least four outputs and operable to store a first vector result in response to the permute instruction. The execution unit also includes a vector conflict instruction execution logic coupled with the at least four outputs and operable to store a second vector result in a destination storage location in response to the vector conflict instruction.
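
Why one fully-connected interconnect can serve both instructions is easier to see in a toy model: both operations need every output lane to be able to see every input element. The sketch below is a software analogy only. The helper `crossbar`, the modulo wrap of control values, and the conflict result format (a bitmask of earlier equal elements, modeled on the AVX-512CD `VPCONFLICT` convention) are assumptions, since the abstract does not define the result encoding.

```python
from typing import List, Sequence


def crossbar(source: Sequence[int]) -> List[List[int]]:
    """Model the interconnect: each output lane receives every input element."""
    return [list(source) for _ in source]


def permute(source: Sequence[int], control: Sequence[int]) -> List[int]:
    """Each output lane selects one of the inputs it receives."""
    fanout = crossbar(source)
    n = len(source)
    return [fanout[lane][control[lane] % n] for lane in range(n)]


def vector_conflict(source: Sequence[int]) -> List[int]:
    """Each output lane compares its own element with all earlier elements.

    Assumed result format: bit j of lane i is set when element j (j < i)
    equals element i.
    """
    fanout = crossbar(source)
    result = []
    for lane, value in enumerate(source):
        bits = 0
        for earlier in range(lane):
            if fanout[lane][earlier] == value:
                bits |= 1 << earlier
        result.append(bits)
    return result


permuted = permute([10, 20, 30, 40], [3, 3, 0, 1])
conflicts = vector_conflict([7, 5, 7, 7])
```

Both functions consume the same `crossbar` output and differ only in the logic attached to it, which mirrors the sharing the abstract describes.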

    PROCESSORS HAVING FULLY-CONNECTED INTERCONNECTS SHARED BY VECTOR CONFLICT INSTRUCTIONS AND PERMUTE INSTRUCTIONS
    8.
    Invention application
    Status: Under examination (published)

    Publication number: US20140181466A1

    Publication date: 2014-06-26

    Application number: US13977126

    Filing date: 2011-12-29

    IPC classification: G06F9/30 G06F9/38

    Abstract: An apparatus includes a decode unit to decode a permute instruction and a vector conflict instruction. A vector execution unit is coupled with the decode unit and includes a fully-connected interconnect. The fully-connected interconnect has at least four inputs to receive at least four corresponding data elements of at least one source vector. The fully-connected interconnect has at least four outputs. Each of the at least four inputs is coupled with each of the at least four outputs. The execution unit also includes a permute instruction execution logic coupled with the at least four outputs and operable to store a first vector result in response to the permute instruction. The execution unit also includes a vector conflict instruction execution logic coupled with the at least four outputs and operable to store a second vector result in a destination storage location in response to the vector conflict instruction.
