Communication across shared mutually exclusive direction paths between clustered processing elements
    1.
    Granted invention patent
    Legal status: In force

    Publication No.: US09390057B2

    Publication date: 2016-07-12

    Application No.: US13616942

    Filing date: 2012-09-14

    IPC classification: G06F15/173 G06F15/80 G06F9/30

    Abstract: An array processor includes processing elements arranged in clusters to form a rectangular array. Inter-cluster communication paths are mutually exclusive. Due to the mutual exclusivity of the data paths, communications between the processing elements of each cluster may be combined in a single inter-cluster path, thus eliminating half the wiring required for the path. The length of the longest communication path is not directly determined by the overall dimension of the array, as in conventional torus arrays. Rather, the longest communications path is limited by the inter-cluster spacing. Transpose elements of an N×N torus may be combined in clusters and communicate with one another through intra-cluster communications paths. Transpose operation latency is eliminated in this approach. Each PE may have a single transmit port and a single receive port. Thus, the individual PEs are decoupled from the array topology.

    Methods and apparatus for abbreviated instruction sets adaptable to configurable processor architecture
    5.
    Reissue patent
    Legal status: In force

    Publication No.: USRE40509E1

    Publication date: 2008-09-16

    Application No.: US10848615

    Filing date: 2004-05-18

    IPC classification: G06F9/45

    Abstract: An improved manifold array (ManArray) architecture addresses the problem of configurable application-specific instruction set optimization and instruction memory reduction using an instruction abbreviation process, thereby further optimizing the general ManArray architecture for application to high-volume and portable battery-powered types of products. In the ManArray abbreviation process, a standard 32-bit ManArray instruction is reduced to a smaller-length instruction format, such as 14 bits. An application is first programmed with the full ManArray instruction set using the native 32-bit instructions. After the application program is completed and verified, an instruction-abbreviation tool analyzes the 32-bit application program and generates the abbreviated program using the abbreviated instructions. This instruction abbreviation process allows different program-reduction optimizations tailored for each application program. This process develops an optimized instruction set for the intended application. The abbreviated program, now located in a significantly smaller instruction memory, is functionally equivalent to the original native 32-bit application program. The abbreviated instructions are fetched from this smaller memory and then dynamically translated into native ManArray instruction form in a sequence processor controller. Since the instruction set is now determined for the specific application, an optimized processor design can be easily produced. The system and process can be applied to native instructions having other numbers of bits and to other processing architectures.
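
The abbreviate-then-translate flow described in this abstract can be illustrated with a deliberately simplified sketch. It assumes a plain dictionary scheme, in which each distinct 32-bit word is replaced by a 14-bit index into a translation table; the abstract does not say the ManArray tool works this way, and the names `abbreviate`, `translate`, and `ABBREV_BITS` are hypothetical.

```python
from typing import Dict, List, Tuple

ABBREV_BITS = 14  # abbreviated width given as an example in the abstract


def abbreviate(program: List[int], bits: int = ABBREV_BITS) -> Tuple[List[int], List[int]]:
    """Replace each 32-bit word with an index into a per-application table.

    Returns (abbreviated_program, translation_table). This is an assumed
    dictionary scheme, not the patented abbreviation format.
    """
    table: List[int] = []
    index_of: Dict[int, int] = {}
    short: List[int] = []
    for word in program:
        if not 0 <= word < 2 ** 32:
            raise ValueError("not a 32-bit instruction word")
        if word not in index_of:
            if len(table) >= 2 ** bits:
                raise ValueError("too many distinct instructions for this width")
            index_of[word] = len(table)
            table.append(word)
        short.append(index_of[word])
    return short, table


def translate(short: List[int], table: List[int]) -> List[int]:
    """Expand abbreviated instructions back to native 32-bit words."""
    return [table[i] for i in short]


program = [0x12345678, 0xDEADBEEF, 0x12345678]
short, table = abbreviate(program)
restored = translate(short, table)
```

In this sketch `restored` equals `program`, which mirrors the abstract's claim that the abbreviated program is functionally equivalent to the native one. Under the dictionary assumption, memory is saved only when the program reuses instruction words often enough to offset the size of the table.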

    Manifold array processor
    6.
    Granted invention patent
    Legal status: In force

    Publication No.: US07197624B2

    Publication date: 2007-03-27

    Application No.: US10774815

    Filing date: 2004-02-09

    IPC classification: G06F15/16

    Abstract: An array processor includes processing elements (00, 01, 02, 03, 10, 11, 12, 13, 20, 21, 22, 23, 30, 31, 32, 33) arranged in clusters (e.g., 44, 46, 48, 50) to form a rectangular array (40). Inter-cluster communication paths (88) are mutually exclusive. Due to the mutual exclusivity of the data paths, communications between the processing elements of each cluster may be combined in a single inter-cluster path, thus eliminating half the wiring required for the path. The length of the longest communication path is not directly determined by the overall dimension of the array, as in conventional torus arrays. Rather, the longest communications path is limited by the inter-cluster spacing. Transpose elements of an N×N torus may be combined in clusters and communicate with one another through intra-cluster communications paths. Transpose operation latency is eliminated in this approach. Each PE may have a single transmit port (35) and a single receive port (37). Thus, the individual PEs are decoupled from the array topology.

    Cascaded event detection modules for generating combined events interrupt for processor action
    7.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US07058790B2

    Publication date: 2006-06-06

    Application No.: US10786604

    Filing date: 2004-02-25

    IPC classification: G06F11/30

    Abstract: An eventpoint chaining apparatus for generalized event detection and action specification in a processing environment is described. In one aspect, the eventpoint chaining apparatus includes a first processor which has a programmable eventpoint module with an input trigger (InTrig) input. The first processing element detects an occurrence of a first processor event (p-event) and produces an OutTrigger (OT) signal. The eventpoint chaining apparatus also includes a second processor which has a programmable eventpoint module with an input trigger (InTrig) input which receives the OT signal from the first processing element. The second processing element detects an occurrence of a second p-event and produces, in response to the OT signal received from the first processing element and the detection of a second p-event, an eventpoint (EP) interrupt signal. The eventpoint chaining apparatus also includes a sequence processor interrupt control unit for receiving the EP interrupt signals indicating the occurrence of both the first and second p-events and causing a p-action in response to the occurrence of both the first and second p-events.
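
The chaining behaviour in this abstract can be modelled in software as two cascaded detectors. The sketch below is a behavioural illustration only: it assumes the OT signal is latched at the second module's InTrig input until the second p-event arrives (the abstract does not state whether the trigger is latched or must coincide), and the names `EventpointModule`, `matches`, and `run_chain` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EventpointModule:
    matches: Callable[[int], bool]  # hypothetical predicate defining the p-event
    needs_in_trig: bool = False     # True if the module must be armed via InTrig
    in_trig: bool = False           # assumed to be a latched InTrig input

    def step(self, observed: int) -> bool:
        """Return True when the module fires (OutTrigger or EP interrupt)."""
        armed = self.in_trig or not self.needs_in_trig
        return armed and self.matches(observed)


def run_chain(first: EventpointModule, second: EventpointModule,
              trace_first: List[int], trace_second: List[int]) -> Optional[int]:
    """Return the cycle at which the EP interrupt is raised, or None."""
    for cycle, (a, b) in enumerate(zip(trace_first, trace_second)):
        if first.step(a):
            second.in_trig = True  # OT of the first module drives InTrig of the second
        if second.step(b):
            return cycle           # both p-events have occurred: raise EP interrupt
    return None


first = EventpointModule(matches=lambda v: v == 0x100)
second = EventpointModule(matches=lambda v: v == 7, needs_in_trig=True)
fired_at = run_chain(first, second, [0, 0x100, 0, 0], [7, 0, 0, 7])
```

Here the second module ignores its own p-event in cycle 0 because it has not yet been triggered, and raises the interrupt only in cycle 3, after the first module's p-event in cycle 1. A sequence processor interrupt control unit would then initiate the p-action; that step is left out of the sketch.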

    Twisted and wrapped array organized into clusters of processing elements
    8.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US08341381B2

    Publication date: 2012-12-25

    Application No.: US11830357

    Filing date: 2007-07-30

    IPC classification: G06F15/80

    Abstract: An array of processing elements (PEs) is logically twisted in a first direction, wrapped to form a cylindrical array, and grouped in a second direction to determine PEs that are to be located in clusters and implemented to form physical clusters of PEs. Inter-cluster communication paths are mutually exclusive. Due to the mutual exclusivity of the data paths, communications between the processing elements of each cluster may be combined in a single inter-cluster path, thus eliminating half the wiring required for the path. The length of the longest communication path is not directly determined by the overall dimension of the array, as in conventional torus arrays. Rather, the longest communications path is limited by the inter-cluster spacing. Transpose elements of an N×N torus may be combined in clusters and communicate with one another through intra-cluster communications paths. Transpose operation latency is eliminated in this approach. Each PE may have a single transmit port and a single receive port. Thus, the individual PEs are decoupled from the array topology.
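
The twist, wrap, and group steps can be illustrated with a small sketch that also checks the transpose property stated in the abstract. It assumes that the twist rotates row i of the logical N×N array to the right by i positions with wraparound, and that each column of the twisted array becomes one cluster; the abstract names only "a first direction" and "a second direction", so these choices, and the function names, are illustrative.

```python
from typing import List, Tuple

PE = Tuple[int, int]  # (row, column) label of a processing element


def twisted_clusters(n: int) -> List[List[PE]]:
    """Twist rows of an n x n array, then group columns into clusters."""
    rows = [[(i, j) for j in range(n)] for i in range(n)]
    # Twist: rotate row i right by i positions, wrapping around the row.
    twisted = [row[-i:] + row[:-i] if i else row[:] for i, row in enumerate(rows)]
    # Group: column c of the twisted array becomes cluster c.
    return [[twisted[i][c] for i in range(n)] for c in range(n)]


def transposes_share_cluster(clusters: List[List[PE]]) -> bool:
    """Check that PE(i, j) and PE(j, i) always fall in the same cluster."""
    return all((j, i) in cluster for cluster in clusters for (i, j) in cluster)


clusters = twisted_clusters(4)
ok = transposes_share_cluster(clusters)
```

With these assumptions, cluster c holds the PEs whose row and column indices sum to c modulo N, so each PE shares a cluster with its transpose partner and a transpose needs only intra-cluster paths. Twisting in the opposite direction would group along the other diagonal and would not have this property.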
