Method and apparatus for performing table lookup
    1.
    Granted invention patent
    Method and apparatus for performing table lookup (in force)

    Publication number: US09058284B1

    Publication date: 2015-06-16

    Application number: US13422979

    Application date: 2012-03-16

    CPC classification number: G06F12/10 G06F12/0855 G06F12/1009 G06F12/1027

    Abstract: Method and apparatus for performing table lookup are disclosed. In one embodiment, the method includes providing a lookup table, where the lookup table includes a plurality of translation modes and each translation mode includes a corresponding translation table tree supporting a plurality of page sizes. The method further includes receiving a search request from a requester, determining a translation table tree for conducting the search request, determining a lookup sequence based on the translation table tree, generating a search output using the lookup sequence, and transmitting the search output to the requester. The plurality of translation modes includes a first set of page sizes for 32-bit operating system software and a second set of page sizes for 64-bit operating system software. The plurality of page sizes includes non-global pages, global pages, and both non-global and global pages.
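
The lookup flow in the abstract above (pick a translation table tree for the mode, derive a lookup sequence, walk it, return the result) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the mode names `os32` and `os64`, the per-level index widths, the 4 KiB smallest page, and the dictionary encoding of table entries. Global versus non-global pages are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    index_bits: tuple  # address bits consumed at each tree level (assumed layout)
    offset_bits: int   # offset bits of the smallest page (assumed 4 KiB)


# Hypothetical translation modes; the real level layouts are not given in the abstract.
MODES = {
    "os32": Mode(index_bits=(10, 10), offset_bits=12),
    "os64": Mode(index_bits=(9, 9, 9, 9), offset_bits=12),
}


def lookup_sequence(mode, vaddr):
    """Return (index, remaining_bits) for each tree level, top level first."""
    shift = mode.offset_bits + sum(mode.index_bits)
    seq = []
    for bits in mode.index_bits:
        shift -= bits
        seq.append(((vaddr >> shift) & ((1 << bits) - 1), shift))
    return seq


def translate(trees, mode_name, vaddr):
    """Walk the tree for the given mode; return a physical address or None."""
    mode = MODES[mode_name]
    node = trees[mode_name]
    for index, shift in lookup_sequence(mode, vaddr):
        entry = node.get(index)
        if entry is None:
            return None  # no mapping: the requester would see a miss or fault
        if "base" in entry:
            # A leaf above the last level maps a larger page: more offset bits remain.
            return entry["base"] | (vaddr & ((1 << shift) - 1))
        node = entry["next"]
    return None
```

In this sketch a leaf found at the first level of the `os32` tree maps a 4 MiB page, while a leaf at the second level maps a 4 KiB page, which is one way a single tree can support several page sizes.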

    Unified multi-function operation scheduler for out-of-order execution in a superscaler processor
    2.
    Granted invention patent
    Unified multi-function operation scheduler for out-of-order execution in a superscaler processor (in force)

    Publication number: US06195744B1

    Publication date: 2001-02-27

    Application number: US09252898

    Application date: 1999-02-18

    Abstract: A superscalar processor includes a scheduler which selects operations for out-of-order execution. The scheduler contains storage and control logic which is partitioned into entries corresponding to operations to be executed, being executed, or completed. The scheduler issues operations to execution units for parallel pipelined execution, selects and provides operands as required for execution, and acts as a reorder buffer keeping the results of operations until the results can be safely committed. The scheduler is tightly coupled to execution pipelines and provides a large parallel path for initial operation stages which minimize pipeline bottlenecks and hold ups into and out of the execution units. The scheduler monitors the entries to determine when all operands required for execution of an operation are available and provides required operands to the execution units. The operands selected can be from a register file, a scheduler entry, or an execution unit. Control logic in the entries is linked together into scan chains which identify operations and operands for execution.

    Processing system that rapidly indentifies first or second operations of selected types for execution
    3.
    Granted invention patent
    Processing system that rapidly indentifies first or second operations of selected types for execution (expired)

    Publication number: US5881261A

    Publication date: 1999-03-09

    Application number: US650055

    Application date: 1996-05-16

    Abstract: A processing system includes sequential entries for storing operations of different types and a scan chain which can identify an operation of a first type which follows after an operation of a second type. The first and second types can be identical so that the scan chain identifies the second operation of a particular type in the sequence. The scan chain includes single-entry "generate", "propagate", "kill", and "only" terms which control a scan bit. Conceptually, if the "only" term is not asserted, an entry of the second type generates the scan bit and asserts the "only" term. After the "only" term is asserted, further generation of the scan bit is inhibited. Each entry either propagates the scan bit to the next entry or if the entry is of the first type, kills the scan bit and identifies itself as the selected entry. Look-ahead logic determines group terms from single-entry terms to indicate whether a scan bit would be generated, propagated, or killed by a group of entries. Accordingly, the scan bit is not required to propagate through every entry, and scans can be performed quickly.
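
The entry-by-entry behaviour described in the abstract above can be modelled in software as a simple serial scan. This is a sketch under assumptions: the function name `scan_select` and the representation of entries as a list of type labels are hypothetical, and the look-ahead group terms that let hardware skip entries are not modelled.

```python
def scan_select(entry_types, first_type, second_type):
    """Return the index of the first entry of first_type that follows an
    entry of second_type, or None if no entry is selected.

    entry_types is ordered in scan direction (assumed: oldest entry first).
    """
    scan_bit = False  # travels from entry to entry
    only = False      # once asserted, no further scan bit is generated
    for index, entry_type in enumerate(entry_types):
        if scan_bit and entry_type == first_type:
            # "kill": the entry absorbs the scan bit and selects itself
            return index
        if not only and entry_type == second_type:
            # "generate": start the scan bit and assert "only"
            scan_bit = True
            only = True
        # otherwise "propagate": the scan bit passes through unchanged
    return None
```

When `first_type` and `second_type` are the same, the first matching entry generates the scan bit and the next matching entry kills it, so the function returns the second operation of that type, as the abstract describes.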

    Self-modifying code handling system
    4.
    Granted invention patent
    Self-modifying code handling system (expired)

    Publication number: US5826073A

    Publication date: 1998-10-20

    Application number: US592150

    Application date: 1996-01-26

    Abstract: A processor which includes tags indicating memory addresses for instructions advancing through pipeline stages of the processor and which includes an instruction decoder having a store target address buffer allows a self-modifying code handling system to detect store operations writing into the instruction stream and trigger a self-modifying code fault. In one embodiment of a self-modifying code handling system, a store pipe is coupled to a data cache to commit results of a store operation to a memory subsystem. The store pipe supplies a store operation target address indication on commitment of a store operation result. A scheduler includes ordered Op entries for Ops decoded from instructions and includes corresponding first address tags covering memory addresses for the instructions. First comparison logic is coupled to the store pipe and to the first address tags to trigger self-modifying code fault handling means in response to a match between the store operation target address and one of the first address tags. An instruction decoder is coupled between the instruction cache and the scheduler. The instruction decoder includes instruction buffer entries and second address tags associated with the instruction buffer entries. Second comparison logic is coupled to the store pipe and to the second address tags to trigger the self-modifying code fault handling means in response to a match between the store operation target address and one of the second address tags.
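
The two comparisons in the abstract above (store target address against the scheduler's address tags, and against the instruction decoder's address tags) can be sketched as follows. The tag granularity `LINE_BITS`, the helper `tag_of`, and the use of Python sets for the tag stores are assumptions for illustration only; the abstract does not state how much of the address a tag covers.

```python
LINE_BITS = 5  # assumed tag granularity of 32 bytes; not stated in the source


def tag_of(addr):
    """Reduce an address to the (assumed) granularity used by the tags."""
    return addr >> LINE_BITS


def store_triggers_smc_fault(store_addr, scheduler_tags, decoder_tags):
    """Return True if a committing store hits an in-flight instruction.

    scheduler_tags: tags for Ops already held in the scheduler (first tags)
    decoder_tags:   tags for instruction buffer entries (second tags)
    """
    tag = tag_of(store_addr)
    return tag in scheduler_tags or tag in decoder_tags
```

A store that matches either tag set would trigger the self-modifying code fault handling; a store that matches neither commits normally.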

    Out-of-order processing that removes an issued operation from an execution pipeline upon determining that the operation would cause a lengthy pipeline delay
    5.
    Granted invention patent
    Out-of-order processing that removes an issued operation from an execution pipeline upon determining that the operation would cause a lengthy pipeline delay (expired)

    Publication number: US5799165A

    Publication date: 1998-08-25

    Application number: US649242

    Application date: 1996-05-16

    Abstract: A superscalar microprocessor includes a scheduler which contains storage for information related to operations and scan logic for selecting operations for out-of-order execution by a set of execution units. To provide fast operation, the selection is made without regard for the availability of operands which are required for execution of the operation but may be unavailable pending completion of an operation. An operand forward stage, which follows the issue stage, selects sources for an operand which may be a register file or a sourcing operation in the scheduler, completed or not. The scheduler contains all information describing the sourcing operations and forwards an operand value and information indicating the state of a sourcing operation. The state information indicates whether the sourcing operation is complete and execution of the issued operation can continue. The state also indicates a wait until the sourcing operation will complete. If the wait is too long, the issued operation is bumped so that another operation can be executed. This reduces pipeline hold ups and increases execution unit utilization.

    Prefetch instruction mechanism for processor
    6.
    Granted invention patent
    Prefetch instruction mechanism for processor (expired)

    Publication number: US06253306B1

    Publication date: 2001-06-26

    Application number: US09124098

    Application date: 1998-07-29

    CPC classification number: G06F9/383 G06F9/30101

    Abstract: Accordingly, a prefetch instruction mechanism is desired for implementing a prefetch instruction which is non-faulting, non-blocking, and non-modifying of architectural register state. Advantageously, a prefetch mechanism described herein is provided largely without the addition of substantial complexity to a load execution unit. In one embodiment, the non-faulting attribute of the prefetch mechanism is provided through use of the vector decode supplied Op sequence that activates an alternate exception handler. The non-modifying of architectural register state attribute is provided (in an exemplary embodiment) by first decoding a PREFETCH instruction to an Op sequence targeting a scratch register wherein the scratch register has scope limited to the Op sequence corresponding to the PREFETCH instruction. Although described in the context of a vector decode embodiment, the prefetch mechanism can be implemented with hardware decoders and suitable modifications to decode paths will be appreciated by those of skill in the art based on the description herein. Similarly, although in one particular embodiment such a scratch register is architecturally defined to read as a NULL (or zero) value, any target for the Op sequence that is not part of the architectural state of the processor would also be suitable. Finally, in one embodiment the non-blocking attribute is provided by the Op sequence completing (without waiting for return of fill data) upon posting of a cache fill request to load logic of a data cache. In this way, LdOps which follow in a load pipe are not stalled by a prefetch-related miss and can instead execute concurrently with the prefetch-related line fill.

    Even bus clock circuit
    7.
    Granted invention patent
    Even bus clock circuit (expired)

    Publication number: US5898640A

    Publication date: 1999-04-27

    Application number: US938219

    Application date: 1997-09-26

    CPC classification number: G11C7/22 H03K5/26

    Abstract: An even bus clock circuit generates logic pulses in response to substantially coincident rising edges of a processor clock and a bus clock over a given range of processor clock to bus clock ratios that includes whole integers and half integers. The even bus clock circuit includes a delay element for receiving the bus clock and generating a delayed bus clock, a first flip-flop for receiving the processor clock at a data input and receiving the delayed bus clock at a clock input, and a second flip-flop for receiving a data output of the first flip-flop at a data input, receiving the processor clock at a clock input and generating a data output that is coupled to an asynchronous reset input of the first flip-flop. The logic pulses are generated at the data output of the first flip-flop and have a pulse width of substantially the same duration as a single cycle of the processor clock.
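
To see why half-integer ratios matter for coincident rising edges, the following sketch lists which bus-clock rising edges line up with a processor-clock rising edge. It is an idealized arithmetic model, not a model of the flip-flop circuit: it assumes both clocks have a rising edge at time zero and have no skew or jitter, and the function name `coincident_edges` is hypothetical.

```python
from fractions import Fraction


def coincident_edges(ratio, bclk_cycles):
    """Return the indices of bus-clock rising edges that coincide with a
    processor-clock rising edge.

    ratio is the processor-clock to bus-clock frequency ratio. Edge j of the
    bus clock occurs after j * ratio processor-clock periods, so it coincides
    exactly when that product is a whole number.
    """
    n = Fraction(ratio)
    return [j for j in range(bclk_cycles) if (j * n).denominator == 1]
```

With an integer ratio every bus-clock edge coincides with a processor-clock edge; with a half-integer ratio such as `Fraction(5, 2)` only every other bus-clock edge does.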

    Apparatus and method for a coincident rising edge detection circuit
    8.
    Granted invention patent
    Apparatus and method for a coincident rising edge detection circuit (in force)

    Publication number: US06194927B1

    Publication date: 2001-02-27

    Application number: US09314556

    Application date: 1999-05-19

    CPC classification number: H03L7/183 H03L7/06

    Abstract: In a data processing system, a circuit for providing an even bus clock signal, EVENBCLK, when the leading edges of the bus clock signal BCLK and a processor clock signal PCLK are coincident includes a phase-locked loop unit and a coincidence unit. The phase-locked loop unit provides PCLK signals that have a frequency N times the frequency of the BCLK signals, where N can have an integer or a half integer value. The phase-locked loop unit includes a divide-by-M unit, where M=2N, that receives the PCLK signal at an input terminal and applies an output signal, PCLK/M, to the phase detector unit of the phase-locked loop unit. The operation of the phase-locked loop results in the BCLK signal and the PCLK/M signal having an established phase relationship. The PCLK signal and the PCLK/M signal are applied to the coincidence unit, the simultaneous application of the two signals resulting in the coincidence unit providing the EVENBCLK signals. When N is an integer, the PCLK signal and the BCLK signal have coincident rising edges that do not coincide with a leading edge of a PCLK/M signal. In this situation, a delayed signal, triggered by a previous PCLK/M signal, is generated that is applied to the coincidence unit in place of the missing PCLK/M signal to provide the EVENBCLK signal.

    Integration of multi-stage execution units with a scheduler for single-stage execution units
    9.
    Granted invention patent
    Integration of multi-stage execution units with a scheduler for single-stage execution units (in force)

    Publication number: US6161173A

    Publication date: 2000-12-12

    Application number: US307316

    Application date: 1999-05-07

    Abstract: A superscalar processor includes a central scheduler for multiple execution units. The scheduler presumes operations issued to a particular execution unit all have the same latency, e.g., one clock cycle, even though some of the operations have longer latencies, e.g., two clock cycles. The execution unit that executes the operations having longer than expected latencies includes scheduling circuitry that holds up particular operation pipelines when operands required for the pipelines will not be valid when the scheduler presumes. Accordingly, the design of the scheduler can be simplified and can accommodate longer latency operations without being significantly redesigned for the longer latency operations.

    Scan chains for out-of-order load/store execution control
    10.
    Granted invention patent
    Scan chains for out-of-order load/store execution control (expired)

    Publication number: US6038657A

    Publication date: 2000-03-14

    Application number: US40087

    Application date: 1998-03-17

    Abstract: Scan logic which tracks the relative age of stores with respect to a particular load (or of loads with respect to a particular store) allows a processor to hold younger stores until the completion of older loads (or to hold younger loads until completion of older stores). Embodiments of propagate-kill style lookahead scan logic or of tree-structured, hierarchically-organized scan logic constructed in accordance with the present invention provide store older and load older indications with very few gate delays, even in processor embodiments adapted to concurrently evaluate large numbers of operations. Operating in conjunction with the scan logic, address matching logic allows the processor to more precisely tailor its avoidance of load-store (or store-load) dependencies. In a processor having a load unit and a store unit, a load/store execution control system allows load and store instructions to execute generally out-of-order with respect to each other while enforcing data dependencies between the load and store instructions.
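
The combination of relative-age tracking and address matching in the abstract above can be sketched as a software check. This is a sketch under assumptions: the `Op` record, the oldest-first list ordering, and the exact full-address comparison are hypothetical choices made for illustration, and the hardware scan chains that compute the age indications are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Op:
    kind: str          # "load" or "store"
    addr: int
    done: bool = False


def must_wait(entries, index):
    """Return True if entries[index] must be held for an older, uncompleted
    operation of the opposite kind to the same address.

    entries is ordered oldest first, so entries[:index] are the older ops.
    """
    op = entries[index]
    other = "store" if op.kind == "load" else "load"
    return any(
        older.kind == other and not older.done and older.addr == op.addr
        for older in entries[:index]
    )
```

Without the address comparison, every younger load would wait for every older uncompleted store; with it, only operations that actually touch the same address are held, which is the more precise dependency avoidance the abstract refers to.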
