Managing history information for branch prediction

    Publication No.: US10007524B2

    Publication date: 2018-06-26

    Application No.: US14541882

    Filing date: 2014-11-14

    Applicant: Cavium, Inc.

    Abstract: Branch history information characterizes results of branch instructions previously executed by a processor. A count is stored of a number of consecutive branch instructions previously executed by the processor whose results all indicate a not taken branch. In a first pipeline stage, a predicted branch result is provided based on at least a portion of the branch history information, and one or more of the branch history information, and the count, is updated based on the predicted branch result. In a second pipeline stage an actual branch result is provided based on an executed branch instruction, and the branch history information is updated based on the actual branch result. If the predicted branch result indicates a taken branch, the branch history information is updated based on the count, and if the predicted branch result indicates a not taken branch, the count is updated but not the branch history information.
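The first-stage behaviour described in this abstract can be sketched roughly as follows. This is only an illustrative model, not the patented design: the class name, the register width, the saturation limit, and the way the count is folded into the history (one 0 bit per counted not-taken branch, followed by a 1 bit) are all assumptions, since the abstract does not say how the history is "updated based on the count". The second-stage update from the actual branch result is not modelled.

```python
class SpeculativeBranchHistory:
    """Toy first-stage model: a history shift register plus a not-taken run counter."""

    def __init__(self, history_bits=16, max_count=7):
        self.history_bits = history_bits  # assumed register width
        self.max_count = max_count        # assumed saturation limit for the counter
        self.history = 0
        self.not_taken_count = 0

    def on_prediction(self, predicted_taken):
        if predicted_taken:
            # Assumed encoding: shift in one 0 per counted not-taken branch, then a 1.
            mask = (1 << self.history_bits) - 1
            shift = self.not_taken_count + 1
            self.history = ((self.history << shift) | 1) & mask
            self.not_taken_count = 0
        else:
            # Not-taken prediction: only the counter changes, the history is untouched.
            self.not_taken_count = min(self.not_taken_count + 1, self.max_count)


model = SpeculativeBranchHistory()
for predicted_taken in [False, False, True, False, True]:
    model.on_prediction(predicted_taken)
print(bin(model.history), model.not_taken_count)  # prints: 0b101 0
```

Under these assumptions the history register is written only on taken predictions, which is the saving the abstract points at: a run of not-taken branches costs counter increments rather than history updates.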

    2. Microprocessor with ALU integrated into load unit
    Granted patent; status: in force

    Publication No.: US09501286B2

    Publication date: 2016-11-22

    Application No.: US12609169

    Filing date: 2009-10-30

    Abstract: A superscalar pipelined microprocessor includes a register set defined by its instruction set architecture, a cache memory, execution units, and a load unit, coupled to the cache memory and distinct from the other execution units. The load unit comprises an ALU. The load unit receives an instruction that specifies a memory address of a source operand, an operation to be performed on the source operand to generate a result, and a destination register of the register set to which the result is to be stored. The load unit reads the source operand from the cache memory. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The load unit outputs the result for subsequent retirement to the destination register.


    3. Apparatus and method for controlling the number of vector elements written to a data store while performing speculative vector write operations
    Granted patent; status: in force

    Publication No.: US09483438B2

    Publication date: 2016-11-01

    Application No.: US14462194

    Filing date: 2014-08-18

    Applicant: ARM LIMITED

    Abstract: A data processing apparatus and method for performing speculative vector access operations are provided. The data processing apparatus has a reconfigurable buffer accessible to vector data access circuitry and comprising a storage array for storing up to M vectors of N vector elements. The vector data access circuitry performs speculative data write operations in order to cause vector elements from selected vector operands in a vector register bank to be stored into the reconfigurable buffer. On occurrence of a commit condition, the vector elements currently stored in the reconfigurable buffer are then written to a data store. Speculation control circuitry maintains a speculation width indication indicating the number of vector elements of each selected vector operand stored in the reconfigurable buffer. The speculation width indication is initialized to an initial value, but on detection of an overflow condition within the reconfigurable buffer the speculation width indication is modified to reduce the number of vector elements of each selected vector operand stored in the reconfigurable buffer. The reconfigurable buffer then responds to a change in the speculation width indication by reconfiguring the storage array to increase the number of vectors M and reduce the number of vector elements N per vector. This provides an efficient mechanism for supporting performance of speculative data write operations.
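The trade-off between M and N in this abstract can be illustrated with a small model. It is a sketch under stated assumptions, not the circuit in the patent: the class and method names are hypothetical, the storage array is assumed to have a fixed number of element slots, and halving the speculation width on overflow is an assumed policy (the abstract only says the width is reduced).

```python
class ReconfigurableBuffer:
    """Toy buffer holding up to M vectors of N elements in a fixed number of slots."""

    def __init__(self, capacity=16, initial_width=8):
        self.capacity = capacity    # total element slots (assumed fixed)
        self.width = initial_width  # speculation width N
        self.vectors = []           # buffered vectors, each `width` elements long

    @property
    def max_vectors(self):
        # M grows as N shrinks because the slot count is fixed.
        return self.capacity // self.width

    def speculative_write(self, operand):
        if len(self.vectors) == self.max_vectors:
            if self.width == 1:
                raise OverflowError("buffer full at minimum speculation width")
            # Overflow: reduce N (assumed: halve it) and truncate what is stored.
            self.width //= 2
            self.vectors = [v[:self.width] for v in self.vectors]
        self.vectors.append(list(operand[:self.width]))

    def commit(self, data_store):
        # Commit condition: flush every buffered element to the data store.
        for vector in self.vectors:
            data_store.extend(vector)
        self.vectors.clear()


buf = ReconfigurableBuffer(capacity=16, initial_width=8)
for base in (0, 10, 20):
    buf.speculative_write(list(range(base, base + 8)))
store = []
buf.commit(store)
print(buf.width, store)
```

In this run the third write overflows the initial 2 x 8 layout, so the buffer switches to 4 x 4 and only the first four elements of each operand are committed.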


    4. RUN-TIME PARALLELIZATION OF CODE EXECUTION BASED ON AN APPROXIMATE REGISTER-ACCESS SPECIFICATION
    Patent application; status: in force

    Publication No.: US20160306633A1

    Publication date: 2016-10-20

    Application No.: US14690424

    Filing date: 2015-04-19

    Abstract: A method includes, in a processor that processes instructions of program code, processing a first segment of the instructions. One or more destination registers are identified in the first segment using an approximate specification of register access by the instructions. Respective values of the destination registers are made available to a second segment of the instructions only upon verifying that the values are valid for readout by the second segment in accordance with the approximate specification. The second segment is processed at least partially in parallel with processing of the first segment, using the values made available from the first segment.


    5. MANAGING HISTORY INFORMATION FOR BRANCH PREDICTION
    Patent application; status: in force

    Publication No.: US20160139932A1

    Publication date: 2016-05-19

    Application No.: US14541882

    Filing date: 2014-11-14

    Applicant: Cavium, Inc.

    Abstract: Branch history information characterizes results of branch instructions previously executed by a processor. A count is stored of a number of consecutive branch instructions previously executed by the processor whose results all indicate a not taken branch. In a first pipeline stage, a predicted branch result is provided based on at least a portion of the branch history information, and one or more of the branch history information, and the count, is updated based on the predicted branch result. In a second pipeline stage an actual branch result is provided based on an executed branch instruction, and the branch history information is updated based on the actual branch result. If the predicted branch result indicates a taken branch, the branch history information is updated based on the count, and if the predicted branch result indicates a not taken branch, the count is updated but not the branch history information.


    6. Multi-level instruction cache prefetching
    Granted patent; status: in force

    Publication No.: US09110810B2

    Publication date: 2015-08-18

    Application No.: US13312962

    Filing date: 2011-12-06

    CPC classification number: G06F12/0862 G06F9/3802 G06F9/3875 G06F2212/6026

    Abstract: One embodiment of the present invention sets forth an improved way to prefetch instructions in a multi-level cache. Fetch unit initiates a prefetch operation to transfer one of a set of multiple cache lines, based on a function of a pseudorandom number generator and the sector corresponding to the current instruction L1 cache line. The fetch unit selects a prefetch target from the set of multiple cache lines according to some probability function. If the current instruction L1 cache 370 is located within the first sector of the corresponding L1.5 cache line, then the selected prefetch target is located at a sector within the next L1.5 cache line. The result is that the instruction L1 cache hit rate is improved and instruction fetch latency is reduced, even where the processor consumes instructions in the instruction L1 cache at a fast rate.
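A rough sketch of the target selection described here is given below. Only the first-sector rule comes from the abstract; the function name, the `sectors_per_line` parameter, and the behaviour for sectors other than the first (pick a random later sector of the same L1.5 line, or wrap to the start of the next line) are hypothetical choices made to keep the example complete.

```python
import random


def select_prefetch_target(line, sector, sectors_per_line, rng):
    """Return (l15_line, sector) of the L1-line-sized sector to prefetch."""
    if sector == 0:
        # Stated in the abstract: the target lies in the next L1.5 cache line.
        return line + 1, rng.randrange(sectors_per_line)
    later_sectors = list(range(sector + 1, sectors_per_line))
    if later_sectors:
        # Assumed policy: a pseudorandomly chosen later sector of the same line.
        return line, rng.choice(later_sectors)
    # Assumed policy: from the last sector, move to the start of the next line.
    return line + 1, 0


rng = random.Random(0)  # seeded stand-in for the pseudorandom number generator
print(select_prefetch_target(5, 0, 4, rng))
print(select_prefetch_target(5, 1, 4, rng))
print(select_prefetch_target(5, 3, 4, rng))
```

Passing the generator in as a parameter keeps the selection deterministic for a given seed, which makes the policy easy to exercise in isolation.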


    7. DATA PROCESSING APPARATUS AND METHOD FOR PERFORMING SCAN OPERATIONS
    Patent application; status: in force

    Publication No.: US20150212972A1

    Publication date: 2015-07-30

    Application No.: US14165967

    Filing date: 2014-01-28

    Applicant: ARM LIMITED

    Abstract: A data processing apparatus and method are provided for executing a vector scan instruction. The data processing apparatus comprises a vector register store configured to store vector operands, and processing circuitry configured to perform operations on vector operands retrieved from said vector register store. Further, control circuitry is configured to control the processing circuitry to perform the operations required by one or more instructions, said one or more instructions including a vector scan instruction specifying a vector operand comprising N vector elements and defining a scan operation to be performed on a sequence of vector elements within the vector operand. The control circuitry is responsive to the vector scan instruction to partition the N vector elements of the specified vector operand into P groups of adjacent vector elements, where P is between 2 and N/2, and to control the processing circuitry to perform a partitioned scan operation yielding the same result as the defined scan operation. The processing circuitry is configured to perform the partitioned scan operation by performing separate scan operations on those vector elements of the sequence contained within each group to produce intermediate results for each group, and to perform a computation operation to combine the intermediate results into a final result vector operand containing a sequence of result vector elements. The partitioned scan operation approach of the present invention enables a balance to be achieved between energy consumption and performance.
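The partitioned scan in this abstract follows a well-known pattern: scan each group independently, then combine the group results by carrying each group's running total into the next. The software sketch below shows that pattern for an inclusive scan. The function name and the requirement that N divide evenly by P are assumptions, and the combine step is only valid for an associative operation such as addition.

```python
from itertools import accumulate
import operator


def partitioned_scan(elements, groups, op=operator.add):
    """Inclusive scan computed as per-group scans plus a combine step."""
    n = len(elements)
    if not 2 <= groups <= n // 2:
        raise ValueError("P must be between 2 and N/2")
    if n % groups:
        raise ValueError("this sketch assumes N is a multiple of P")
    size = n // groups
    # Step 1: separate scans over each group of adjacent elements.
    partials = [list(accumulate(elements[i:i + size], op))
                for i in range(0, n, size)]
    # Step 2: combine by folding the previous group's final value into each group.
    result = []
    carry = None
    for part in partials:
        if carry is not None:
            part = [op(carry, x) for x in part]
        result.extend(part)
        carry = part[-1]
    return result


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(partitioned_scan(data, 2))  # prints: [1, 3, 6, 10, 15, 21, 28, 36]
```

The per-group scans in step 1 have no dependencies on each other, which is where hardware can trade parallel work against energy by choosing P.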


    8. DATA PROCESSING APPARATUS AND METHOD FOR PERFORMING SPECULATIVE VECTOR ACCESS OPERATIONS
    Patent application; status: in force

    Publication No.: US20150100754A1

    Publication date: 2015-04-09

    Application No.: US14462194

    Filing date: 2014-08-18

    Applicant: ARM LIMITED

    Abstract: A data processing apparatus and method for performing speculative vector access operations are provided. The data processing apparatus has a reconfigurable buffer accessible to vector data access circuitry and comprising a storage array for storing up to M vectors of N vector elements. The vector data access circuitry performs speculative data write operations in order to cause vector elements from selected vector operands in a vector register bank to be stored into the reconfigurable buffer. On occurrence of a commit condition, the vector elements currently stored in the reconfigurable buffer are then written to a data store. Speculation control circuitry maintains a speculation width indication indicating the number of vector elements of each selected vector operand stored in the reconfigurable buffer. The speculation width indication is initialised to an initial value, but on detection of an overflow condition within the reconfigurable buffer the speculation width indication is modified to reduce the number of vector elements of each selected vector operand stored in the reconfigurable buffer. The reconfigurable buffer then responds to a change in the speculation width indication by reconfiguring the storage array to increase the number of vectors M and reduce the number of vector elements N per vector. This provides an efficient mechanism for supporting performance of speculative data write operations.


    10. DIVISION UNIT WITH MULTIPLE DIVIDE ENGINES
    Patent application; status: in force

    Publication No.: US20130179664A1

    Publication date: 2013-07-11

    Application No.: US13345391

    Filing date: 2012-01-06

    Abstract: Techniques are disclosed relating to integrated circuits that include hardware support for divide and/or square root operations. In one embodiment, an integrated circuit is disclosed that includes a division unit that, in turn, includes a normalization circuit and a plurality of divide engines. The normalization circuit is configured to normalize a set of operands. Each divide engine is configured to operate on a respective normalized set of operands received from the normalization circuit. In some embodiments, the integrated circuit includes a scheduler unit configured to select instructions for issuance to a plurality of execution units including the division unit. The scheduler unit is further configured to maintain a counter indicative of a number of instructions currently being operated on by the division unit, and to determine, based on the counter whether to schedule subsequent instructions for issuance to the division unit.
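The scheduler-side counter mentioned at the end of this abstract can be modelled in a few lines. This is a minimal sketch with hypothetical names; in particular, using the number of divide engines as the issue limit is an assumption, since the abstract only says the decision is based on the counter.

```python
class DivideScheduler:
    """Toy scheduler that tracks how many instructions the division unit holds."""

    def __init__(self, num_engines=2):
        self.num_engines = num_engines  # assumed issue limit
        self.in_flight = 0              # counter maintained by the scheduler

    def try_issue(self):
        # Hold back further divides once the counter reaches the assumed limit.
        if self.in_flight >= self.num_engines:
            return False
        self.in_flight += 1
        return True

    def complete(self):
        # Called when the division unit finishes an instruction.
        if self.in_flight == 0:
            raise RuntimeError("no divide instruction in flight")
        self.in_flight -= 1


sched = DivideScheduler(num_engines=2)
issued = [sched.try_issue() for _ in range(3)]
sched.complete()
issued.append(sched.try_issue())
print(issued)  # prints: [True, True, False, True]
```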

