Coordination and synchronization of an asymmetric, single-chip, dual
multiprocessor
    1.
    Granted invention patent
    Status: Expired

    Publication No.: US5978838A

    Publication date: 1999-11-02

    Application No.: US703434

    Filing date: 1996-08-26

    Abstract: An integrated multiprocessor architecture simplifies synchronization of multiple processing units. The multiple processing units constitute a general-purpose or control processor and a vector processor which has a single-instruction-multiple-data (SIMD) architecture so that multiple parallel processing units in the vector processor all complete an instruction simultaneously and do not require software synchronization. The control processor controls the vector processor and creates a fork in a program flow by starting the vector processor. An instruction set for the control processor includes special instructions that enable the control processor to access registers of the vector processor, start or halt execution by the vector processor, and test flags written by the vector processor to indicate completion of tasks. The two processors then execute separate program threads in parallel until the control processor stops the vector processor, an exception is encountered, or the vector processor completes its program thread and enters an idle state. An instruction set for the vector processor includes special instructions that interrupt the control processor to indicate a task is complete. A register coupled to and accessible by both processors stores a state bit indicating whether the vector processor is running or idle. The control processor can synchronize the separate program threads by executing a loop which polls the state bit. When the state bit indicates the vector processor is idle, the general-purpose processor can process results from the vector processor and restart the vector processor.
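
The polling loop described in the abstract can be sketched in software. This is only an illustration under stated assumptions: a Python thread stands in for the vector processor, and `SharedStateRegister`, `vector_task`, and `control_program` are hypothetical names, not part of the patent's instruction set.

```python
import threading
import time


class SharedStateRegister:
    """Hypothetical register visible to both processors; holds one run/idle bit."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = False

    def set_running(self, running):
        with self._lock:
            self._running = running

    def is_running(self):
        with self._lock:
            return self._running


def vector_task(state, data, results):
    """Stand-in for the vector thread: one pass over the data, then go idle."""
    results.extend(x * 2 for x in data)
    state.set_running(False)  # the vector processor enters the idle state


def control_program(data):
    state = SharedStateRegister()
    results = []
    # Mark the vector processor as running before starting it, so the
    # polling loop below cannot observe a stale "idle" value.
    state.set_running(True)
    worker = threading.Thread(target=vector_task, args=(state, data, results))
    worker.start()  # this is the fork in the program flow
    # The control thread could do unrelated work here.
    # Synchronize by polling the state bit until it reads idle.
    while state.is_running():
        time.sleep(0.001)
    worker.join()
    return sum(results)  # process the vector processor's results


print(control_program([1, 2, 3, 4]))  # prints 20
```

The abstract also mentions interrupts from the vector processor as an alternative to polling; that path is not modeled here.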

    Execution unit for processing a data stream independently and in parallel
    3.
    Granted invention patent
    Status: Expired

    Publication No.: US06401194B1

    Publication date: 2002-06-04

    Application No.: US08790142

    Filing date: 1997-01-28

    IPC classification: G06F9/302

    Abstract: A vector processor provides a data path divided into smaller slices of data, with each slice processed in parallel with the other slices. Furthermore, an execution unit provides smaller arithmetic and functional units chained together to execute more complex microprocessor instructions requiring multiple cycles by sharing single-cycle operations, thereby reducing both the cost and the size of the microprocessor. One embodiment handles 288-bit data widths using 36-bit data path slices. Another embodiment executes integer multiply and multiply-and-accumulate and floating-point add/subtract and multiply operations using single-cycle arithmetic logic units. Other embodiments support 8-bit, 9-bit, 16-bit, and 32-bit integer data types and 32-bit floating-point data types.
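
The slicing arithmetic in the 288-bit embodiment (288 / 36 = 8 slices) can be illustrated with a short sketch. The function names are hypothetical, and the element-wise add is an assumed example operation chosen only to show that slices are processed independently.

```python
SLICE_BITS = 36
NUM_SLICES = 288 // SLICE_BITS  # 8 slices
SLICE_MASK = (1 << SLICE_BITS) - 1


def split_slices(value):
    """Split a 288-bit integer into eight 36-bit slices, least significant first."""
    return [(value >> (i * SLICE_BITS)) & SLICE_MASK for i in range(NUM_SLICES)]


def join_slices(slices):
    """Reassemble 36-bit slices into one 288-bit integer."""
    value = 0
    for i, s in enumerate(slices):
        value |= (s & SLICE_MASK) << (i * SLICE_BITS)
    return value


def sliced_add(a, b):
    """Add two 288-bit vectors slice by slice; carries never cross a slice boundary."""
    return join_slices(
        [(x + y) & SLICE_MASK for x, y in zip(split_slices(a), split_slices(b))]
    )
```

Because no carry propagates between slices, each slice could be handled by its own copy of the same hardware, which is the property the abstract relies on.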

    Processor that decodes a multi-cycle instruction into single-cycle
micro-instructions and schedules execution of the micro-instructions
    4.
    Granted invention patent
    Status: Expired

    Publication No.: US5923862A

    Publication date: 1999-07-13

    Application No.: US789574

    Filing date: 1997-01-28

    Abstract: An instruction decoder in a processor decodes an instruction by creating a decode buffer entry that includes global fields, operand fields, and a set of micro-instructions. Each micro-instruction represents an operation that an associated execution unit can execute in a single clock cycle. A scheduler issues the micro-instructions from one or more entries to the execution units for possible parallel and out-of-order execution. Each execution unit typically completes an operation in one clock cycle and does not monitor instructions that may block a pipeline. The execution units do not need separate decoding for multiple stages. One global field indicates which micro-instructions are executed first. Further, micro-instructions have fields that indicate an execution sequence. The scheduler issues operations in the order indicated by the global fields and the micro-instructions. When the last operation for an instruction is completed, the instruction is retired and removed from the decode buffer.
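
A toy model may help picture the decode buffer and scheduler. It is a sketch under several assumptions: the instruction names, the `DECODE_TABLE` mapping, and the one-operation-per-unit-per-cycle rule are all hypothetical, and operand dependencies between instructions are ignored.

```python
from dataclasses import dataclass


@dataclass
class DecodeEntry:
    name: str
    units: list        # one execution unit per single-cycle micro-instruction
    next_seq: int = 0  # stands in for the field marking what executes next


# Hypothetical decode table: instruction -> ordered micro-instructions.
DECODE_TABLE = {
    "MAC": ["multiplier", "alu"],  # multiply, then accumulate
    "ADD": ["alu"],
    "LOAD": ["load_store"],
}


def run(program):
    """Issue at most one micro-instruction per unit per cycle; retire finished entries."""
    buffer = [DecodeEntry(n, list(DECODE_TABLE[n])) for n in program]
    retired, cycles = [], 0
    while buffer:
        cycles += 1
        busy = set()
        for entry in buffer:
            unit = entry.units[entry.next_seq]
            if unit not in busy:  # the unit is free this cycle
                busy.add(unit)
                entry.next_seq += 1
        # Retire every entry whose last micro-instruction has completed.
        retired += [e.name for e in buffer if e.next_seq == len(e.units)]
        buffer = [e for e in buffer if e.next_seq < len(e.units)]
    return retired, cycles


print(run(["MAC", "ADD", "LOAD"]))  # prints (['ADD', 'LOAD', 'MAC'], 2)
```

In this run the two-cycle `MAC` is decoded first but retires last, which illustrates the out-of-order completion the abstract allows.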

    Instruction fetch unit including instruction buffer and secondary or
branch target buffer that transfers prefetched instructions to the
instruction buffer
    5.
    Granted invention patent
    Status: Expired

    Publication No.: US5889986A

    Publication date: 1999-03-30

    Application No.: US790028

    Filing date: 1997-01-28

    IPC classification: G06F9/38 G06F9/06

    CPC classification: G06F9/3806 G06F9/3804

    Abstract: An instruction fetch unit includes a program buffer for sequential instructions being decoded and a target buffer for an instruction sequence including the target of the next branch instruction. Scan logic coupled to the program buffer scans the program buffer for branch instructions. A target for the first branch instruction is determined and a request to external memory fills the target buffer with a sequence of instructions including a target instruction before sequential decoding reaches the branch instruction. If the branch is subsequently taken, the instructions from the branch target buffer are transferred to the program buffer. The program buffer may be divided into a main and a secondary buffer that have the same size as the target buffer, and an instruction bus between the instruction fetch unit and external memory is sufficiently wide to fill the main, secondary, or target buffer in a single write operation.
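
The interaction between the program buffer, the scan logic, and the target buffer can be sketched as follows. The buffer size, the tuple encoding of instructions, and the `MEMORY` contents are assumptions made for illustration; the main/secondary buffer split is not modeled, and the example uses only a forward branch.

```python
BUFFER_SIZE = 4  # instructions per buffer (assumed)

# Hypothetical program: ("op", name) or ("branch", target_address).
MEMORY = [
    ("op", "a"), ("branch", 6), ("op", "b"), ("op", "c"),
    ("op", "d"), ("op", "e"), ("op", "f"), ("op", "g"),
]


def fetch_line(memory, addr):
    """One wide read from external memory fills a whole buffer."""
    return memory[addr:addr + BUFFER_SIZE]


def find_first_branch(buffer):
    """Scan logic: return the target of the first branch in the buffer, if any."""
    for instr in buffer:
        if instr[0] == "branch":
            return instr[1]
    return None


def run(memory, taken):
    """Return the decoded instructions; `taken` resolves every branch the same way."""
    pc, trace = 0, []
    program_buf = fetch_line(memory, pc)
    while program_buf:
        target = find_first_branch(program_buf)
        # Prefetch the target sequence before decoding reaches the branch.
        target_buf = fetch_line(memory, target) if target is not None else []
        next_pc, next_buf = pc + len(program_buf), None
        for instr in program_buf:
            trace.append(instr)
            if instr[0] == "branch" and taken:
                # Branch taken: the target buffer becomes the program buffer.
                next_pc, next_buf = target, target_buf
                break
        pc = next_pc
        program_buf = next_buf if next_buf is not None else fetch_line(memory, pc)
    return trace


print([i[1] for i in run(MEMORY, taken=True)])  # prints ['a', 6, 'f', 'g']
```

When the branch is taken, decoding continues from the already filled target buffer without waiting on a new memory request, which is the benefit the abstract describes.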

    Resizable and relocatable memory scratch pad as a cache slice
    6.
    Granted invention patent
    Status: Expired

    Publication No.: US5966734A

    Publication date: 1999-10-12

    Application No.: US733818

    Filing date: 1996-10-18

    IPC classification: G06F12/00 G06F12/08 G06F12/12

    Abstract: A cache system supports a resizable software-managed fast scratch pad that is implemented as a cache slice. A processor register indicates the size and base address of the scratch pad. Instructions that facilitate use of the scratch pad include a prefetch instruction which loads multiple lines of data from external memory into the scratch pad and a writeback instruction which writes multiple lines of data from the scratch pad to external memory. The prefetch and writeback instructions are non-blocking instructions to allow instructions following in the program order to be executed while a prefetch or writeback operation is pending.
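
A minimal software model of the scratch pad's multi-line prefetch and writeback is sketched below. The 64-byte line size and the `ScratchPad` class and method names are assumptions; the non-blocking behavior of the real instructions is not modeled, so each call here completes before it returns.

```python
LINE_BYTES = 64  # assumed cache-line size


class ScratchPad:
    def __init__(self, base, size_lines):
        self.configure(base, size_lines)

    def configure(self, base, size_lines):
        """Model of the processor register that holds the base address and size."""
        self.base = base
        self.size_lines = size_lines
        self.lines = [bytes(LINE_BYTES)] * size_lines

    def prefetch(self, memory, mem_addr, first_line, count):
        """Copy `count` lines from external memory into the scratch pad."""
        for i in range(count):
            start = mem_addr + i * LINE_BYTES
            self.lines[first_line + i] = bytes(memory[start:start + LINE_BYTES])

    def writeback(self, memory, mem_addr, first_line, count):
        """Copy `count` lines from the scratch pad back to external memory."""
        for i in range(count):
            start = mem_addr + i * LINE_BYTES
            memory[start:start + LINE_BYTES] = self.lines[first_line + i]
```

Calling `configure` again with different arguments models resizing or relocating the scratch pad; in this sketch that also clears its contents, which is a simplification.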

    Load and store unit for a vector processor
    7.
    Granted invention patent
    Status: Expired

    Publication No.: US5961628A

    Publication date: 1999-10-05

    Application No.: US789575

    Filing date: 1997-01-28

    Abstract: An apparatus is coupled to a requesting unit and a memory. The apparatus includes a data path and a request control circuit. The data path, which is coupled to the requesting unit and the memory, buffers a vector that includes multiple data elements of a substantially similar data type. The request control circuit is coupled to the data path and the requesting unit and receives a vector memory request from the requesting unit. It services the request by transferring the vector between the requesting unit and the memory via the data path.

    Method for performing dead-zone quantization in a single processor
instruction
    8.
    Granted invention patent
    Status: Expired

    Publication No.: US5845112A

    Publication date: 1998-12-01

    Application No.: US812774

    Filing date: 1997-03-06

    IPC classification: H04N7/24 G06F9/302 G06T9/00

    CPC classification: G06F9/3001 G06T9/005

    Abstract: An extension to existing vector instruction sets is presented in the form of new vector instructions that perform operations specialized for efficient digital video compression and decompression. A processor is designed to implement the arithmetic operation of each of these instructions in a single clock cycle, and some of the instructions perform arithmetic operations selectively and directly on elements of the same registers.
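
The abstract does not give the formula behind the dead-zone quantization named in the title. As a rough illustration only, the sketch below uses one common textbook form of a dead-zone quantizer: magnitudes below a threshold map to zero and the rest are divided by a step size. The function names, the `step` and `dead_zone` parameters, and the rounding rule are assumptions, not the patented instruction's definition.

```python
def dead_zone_quantize(x, step, dead_zone):
    """Map |x| < dead_zone to 0; otherwise quantize the magnitude by `step`."""
    magnitude = abs(x)
    if magnitude < dead_zone:
        return 0
    level = magnitude // step  # truncate toward zero on the magnitude
    return level if x >= 0 else -level


def quantize_vector(values, step, dead_zone):
    """Apply the same quantizer to every element, as one vector instruction would."""
    return [dead_zone_quantize(v, step, dead_zone) for v in values]


print(quantize_vector([-9, -3, 0, 3, 9, 17], step=4, dead_zone=4))
# prints [-2, 0, 0, 0, 2, 4]
```

In video coding the dead zone suppresses small transform coefficients, which is why a single-instruction form is attractive for compression loops.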

    Efficient context saving and restoring in a multi-tasking computing
system environment
    9.
    Granted invention patent
    Status: Expired

    Publication No.: US06061711A

    Publication date: 2000-05-09

    Application No.: US699280

    Filing date: 1996-08-19

    Abstract: In a multi-tasking computing system environment, one program is halted and context switched out so that a processor may context switch in a subsequent program for execution. Processor state information exists which reflects the state of the program being context switched out. Storage of this processor state information permits successful resumption of the context switched out program. When the context switched out program is subsequently context switched in, the stored processor information is loaded in preparation for successfully resuming the program at the point at which execution was previously halted. Although large areas of memory can be allocated to processor state information storage, only a portion of this may need to be preserved across a context switch for successfully saving and resuming the context switched out program. Unnecessarily saving and loading all available processor state information can be noticeably inefficient, particularly where relatively large amounts of processor state information exist. In one embodiment, a processor requests a co-processor to context switch out the currently executing program. At a predetermined appropriate point in the executing program, the co-processor responds by halting program execution and saving only the minimal amount of processor state information necessary for successful restoration of the program. The appropriate point is chosen by the application programmer at a location in the executing program that requires preserving a minimal portion of the processor information across a context switch. By saving only a minimal amount of processor information, processor time savings are accumulated across context save and restoration operations.
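
The idea of saving only the live state at a programmer-chosen point can be sketched as follows. The register names in `FULL_STATE`, the `CoProcessor` class, and its methods are hypothetical and chosen only to show the contrast between the full state and the saved subset.

```python
# Assumed full processor state: a program counter, flags, and 32 vector registers.
FULL_STATE = ["pc", "flags"] + [f"vr{i}" for i in range(32)]


class CoProcessor:
    def __init__(self):
        self.regs = {name: 0 for name in FULL_STATE}
        self.switch_requested = False

    def request_context_switch(self):
        """Called by the host processor; honored at the next safe point."""
        self.switch_requested = True

    def safe_point(self, live_regs):
        """Programmer-chosen point: save only the state named in `live_regs`."""
        if not self.switch_requested:
            return None  # no switch pending, keep running
        self.switch_requested = False
        return {name: self.regs[name] for name in live_regs}

    def restore(self, saved):
        """Reload the saved subset before resuming the program."""
        self.regs.update(saved)
```

If a safe point is placed where only `pc` and `vr0` are live, the saved context holds 2 values instead of all 34, and the same saving is repeated on every save and restore.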

    Multifunction data aligner in wide data width processor
    10.
    Granted invention patent
    Status: Expired

    Publication No.: US5922066A

    Publication date: 1999-07-13

    Application No.: US805392

    Filing date: 1997-02-24

    IPC classification: G06F9/38 G06F5/01 G06F11/26

    Abstract: A wide data width processor has an execution unit including an aligner that aligns data for load/store instructions and shifts or rotates data for arithmetic logic instructions. Use of the same circuitry and execution unit for these different types of instructions reduces overall circuit size because alignment circuitry need not be duplicated, once in a load/store unit and once in an arithmetic logic unit.
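
The sharing described in the abstract can be pictured with one rotate routine used by both an unaligned load and a rotate instruction. The 32-byte width and all function names are assumptions for illustration; the sketch also assumes the memory holds the full aligned line that follows the one containing `addr`.

```python
WIDTH_BYTES = 32  # assumed data-path width


def rotate_left(data, amount):
    """The shared aligner: rotate a byte vector left by `amount` positions."""
    amount %= len(data)
    return data[amount:] + data[:amount]


def unaligned_load(memory, addr):
    """Load WIDTH_BYTES from any address using two aligned reads and one rotate."""
    base = addr - addr % WIDTH_BYTES
    offset = addr - base
    low = memory[base:base + WIDTH_BYTES]
    high = memory[base + WIDTH_BYTES:base + 2 * WIDTH_BYTES]
    # Keep the tail of the low line and the head of the high line in place,
    # then rotate so the requested first byte lands at position 0.
    merged = high[:offset] + low[offset:]
    return rotate_left(merged, offset)


def rotate_instruction(register, amount):
    """An arithmetic-logic rotate reuses the same aligner."""
    return rotate_left(register, amount)
```

Both entry points call `rotate_left`, mirroring the single aligner circuit that serves load/store alignment and shift or rotate instructions.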
