Patent search ap:"Charles W. Kurak, JR." Page 1

1.

Invention application
Manifold Array Processor (Status: under examination, published)

Publication No.: US20130019082A1

Publication date: 2013-01-17

Application No.: US13616942

Filing date: 2012-09-14

Applicants: Gerald G. Pechanek; Charles W. Kurak, JR.

Inventors: Gerald G. Pechanek; Charles W. Kurak, JR.

IPC classification: G06F15/80

CPC classification: G06F15/17381, G06F9/30076, G06F15/17337, G06F15/8023

Abstract: An array processor includes processing elements arranged in clusters to form a rectangular array. Inter-cluster communication paths are mutually exclusive. Due to the mutual exclusivity of the data paths, communications between the processing elements of each cluster may be combined in a single inter-cluster path, thus eliminating half the wiring required for the path. The length of the longest communication path is not directly determined by the overall dimension of the array, as in conventional torus arrays. Rather, the longest communications path is limited by the inter-cluster spacing. Transpose elements of an N×N torus may be combined in clusters and communicate with one another through intra-cluster communications paths. Transpose operation latency is eliminated in this approach. Each PE may have a single transmit port and a single receive port. Thus, the individual PEs are decoupled from the array topology.

2.

Invention application
Methods and Apparatus for Video Decoding (Status: in force)

Publication No.: US20100238999A1

Publication date: 2010-09-23

Application No.: US12792228

Filing date: 2010-06-02

Applicants: Doina Petrescu; Trampas Stern; Marco Jacobs; Dan Searles; Charles W. Kurak, JR.

Inventors: Doina Petrescu; Trampas Stern; Marco Jacobs; Dan Searles; Charles W. Kurak, JR.

IPC classification: H04N7/12

CPC classification: H04N19/42, H04N19/436, H04N19/44, H04N19/61

Abstract: Techniques are described for processing blocks of video in multiple stages. Each stage is executed for blocks of data in the frame that need to go through that stage, based on the coding type, before moving to the next stage. This order of execution allows blocks of data to be processed in a nonsequential order, unless the blocks need to go through the same processing stages. Multiple processing elements (PEs) operating in SIMD mode executing the same task and operating on different blocks of data may be utilized, avoiding idle times for the PEs. In another aspect, inverse scan and dequantization operations for blocks of data are merged in a single procedure operating on multiple PEs operating in SIMD mode. This procedure makes efficient use of the multiple PEs and speeds up processing by combining two operations, inverse scan (reordering) and dequantization, which load the execution units differently. The reordering loads mainly the load and store units of the PEs, while the dequantization loads mainly other units. By combining the inverse scan and dequantization in an efficient VLIW packing, a performance gain is achieved.
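The merged inverse scan and dequantization step can be illustrated with a minimal scalar sketch. It assumes a zigzag scan of an 8×8 block and a plain per-coefficient multiply by a quantization matrix; the function names `zigzag_order` and `inverse_scan_dequant` are hypothetical, and real codecs add scale factors and clipping that are omitted here. It does not model the SIMD or VLIW packing described in the abstract.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run from bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(cells) if s % 2 == 0 else cells)
    return order


def inverse_scan_dequant(coeffs, qmatrix, n=8):
    """Reorder scanned coefficients and scale them in one pass.

    Assumption: dequantization is a simple multiply by qmatrix[r][c].
    """
    block = [[0] * n for _ in range(n)]
    for k, (r, c) in enumerate(zigzag_order(n)):
        # The store (reordering) and the multiply (dequantization) share one loop.
        block[r][c] = coeffs[k] * qmatrix[r][c]
    return block
```

Doing both operations in one loop means each coefficient is read once and written once, which is the scalar analogue of pairing load/store work with arithmetic work in the same instruction word.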

3.

Invention grant
Methods and apparatus to support conditional execution in a VLIW-based array processor with subword execution (Status: lapsed)

Publication No.: US07010668B2

Publication date: 2006-03-07

Application No.: US10650340

Filing date: 2003-08-28

Applicants: Thomas L. Drabenstott; Gerald G. Pechanek; Edwin F. Barry; Charles W. Kurak, Jr.

Inventors: Thomas L. Drabenstott; Gerald G. Pechanek; Edwin F. Barry; Charles W. Kurak, Jr.

IPC classification: G06F15/76, G06F15/80

CPC classification: G06F9/30094, G06F9/30036, G06F9/30072, G06F9/30181, G06F9/3842, G06F9/3885, G06F9/3887, G06F9/3891, G06F15/8007

Abstract: General purpose flags (ACFs) are defined and encoded utilizing a hierarchical one-, two- or three-bit encoding. Each added bit provides a superset of the previous functionality. With condition combination, a sequential series of conditional branches based on complex conditions may be avoided and complex conditions can then be used for conditional execution. ACF generation and use can be specified by the programmer. By varying the number of flags affected, conditional operation parallelism can be widely varied, for example, from mono-processing to octal-processing in VLIW execution, and across an array of processing elements (PEs). Multiple PEs can generate condition information at the same time, with the programmer being able to specify a conditional execution in one processor based upon a condition generated in a different processor, using the communications interface between the processing elements to transfer the conditions. Each processor in a multiple processor array may independently have different units conditionally operate based upon their ACFs.

4.

Invention grant
Methods and apparatus to support conditional execution in a VLIW-based array processor with subword execution (Status: in force)

Publication No.: US06366999B1

Publication date: 2002-04-02

Application No.: US09238446

Filing date: 1999-01-28

Applicants: Thomas L. Drabenstott; Gerald G. Pechanek; Edwin F. Barry; Charles W. Kurak, Jr.

Inventors: Thomas L. Drabenstott; Gerald G. Pechanek; Edwin F. Barry; Charles W. Kurak, Jr.

IPC classification: G06F15/80

CPC classification: G06F9/30094, G06F9/30036, G06F9/30072, G06F9/30181, G06F9/3842, G06F9/3885, G06F9/3887, G06F9/3891, G06F15/8007

Abstract: General purpose flags (ACFs) are defined and encoded utilizing a hierarchical one-, two- or three-bit encoding. Each added bit provides a superset of the previous functionality. With condition combination, a sequential series of conditional branches based on complex conditions may be avoided and complex conditions can then be used for conditional execution. ACF generation and use can be specified by the programmer. By varying the number of flags affected, conditional operation parallelism can be widely varied, for example, from mono-processing to octal-processing in VLIW execution, and across an array of processing elements (PEs). Multiple PEs can generate condition information at the same time, with the programmer being able to specify a conditional execution in one processor based upon a condition generated in a different processor, using the communications interface between the processing elements to transfer the conditions. Each processor in a multiple processor array may independently have different units conditionally operate based upon their ACFs.

5.

Invention grant
Communication across shared mutually exclusive direction paths between clustered processing elements (Status: in force)

Publication No.: US09390057B2

Publication date: 2016-07-12

Application No.: US13616942

Filing date: 2012-09-14

Applicants: Gerald George Pechanek; Charles W. Kurak, Jr.

Inventors: Gerald George Pechanek; Charles W. Kurak, Jr.

IPC classification: G06F15/173, G06F15/80, G06F9/30

CPC classification: G06F15/17381, G06F9/30076, G06F15/17337, G06F15/8023

Abstract: An array processor includes processing elements arranged in clusters to form a rectangular array. Inter-cluster communication paths are mutually exclusive. Due to the mutual exclusivity of the data paths, communications between the processing elements of each cluster may be combined in a single inter-cluster path, thus eliminating half the wiring required for the path. The length of the longest communication path is not directly determined by the overall dimension of the array, as in conventional torus arrays. Rather, the longest communications path is limited by the inter-cluster spacing. Transpose elements of an N×N torus may be combined in clusters and communicate with one another through intra-cluster communications paths. Transpose operation latency is eliminated in this approach. Each PE may have a single transmit port and a single receive port. Thus, the individual PEs are decoupled from the array topology.

6.

Invention grant
System core for transferring data between an external device and memory (Status: in force)

Publication No.: US07266620B1

Publication date: 2007-09-04

Application No.: US10797726

Filing date: 2004-03-10

Applicants: Gerald George Pechanek; David Strube; Edwin Franklin Barry; Charles W. Kurak, Jr.; Carl Donald Busboom; Dale Edward Schneider; Nikos P. Pitsianis; Grayson Morris; Edward A. Wolff; Patrick R. Marchand; Ricardo E. Rodriguez; Marco C. Jacobs

Inventors: Gerald George Pechanek; David Strube; Edwin Franklin Barry; Charles W. Kurak, Jr.; Carl Donald Busboom; Dale Edward Schneider; Nikos P. Pitsianis; Grayson Morris; Edward A. Wolff; Patrick R. Marchand; Ricardo E. Rodriguez; Marco C. Jacobs

IPC classification: G06F13/28, G06F13/00

CPC classification: G06F15/82, G06F9/30145, G06F11/263, Y10S707/99943

Abstract: A system core that transfers data from an external device to its internal memory is described. To this end, the system core includes a processor, a direct memory access (DMA) controller, an instruction memory and a plurality of memories. The instruction memory contains processor instructions and DMA instructions. The DMA controller fetches DMA instructions from the instruction memory. The DMA controller executes the fetched DMA instructions and thus populates the plurality of memories with data from the external device. The processor then operates on the data found in the populated memories.

7.

Invention grant
Methods and apparatus for efficient cosine transform implementations (Status: in force)

Publication No.: US06754687B1

Publication date: 2004-06-22

Application No.: US09711218

Filing date: 2000-11-09

Applicants: Charles W. Kurak, Jr.; Gerald G. Pechanek

Inventors: Charles W. Kurak, Jr.; Gerald G. Pechanek

IPC classification: G06F17/14

CPC classification: G06F9/30014, G06F9/30032, G06F9/30036, G06F9/3885, G06F17/147

Abstract: Many video processing applications, such as the decoding and encoding standards promulgated by the moving picture experts group (MPEG), are time constrained applications with multiple complex compute intensive algorithms such as the two-dimensional 8×8 IDCT. In addition, for encoding applications, cost, performance, and programming flexibility for algorithm optimizations are important design requirements. Consequently, it is of great advantage in meeting performance requirements to have a programmable processor that can achieve extremely high performance on the 2D 8×8 IDCT function. The ManArray 2×2 processor is able to process the 2D 8×8 IDCT in 34 cycles and meet the IEEE standard 1180-1990 for precision of the IDCT. A unique distributed 2D 8×8 IDCT process is presented along with the unique data placement supporting the high performance algorithm. In addition, a scalable 2D 8×8 IDCT algorithm that is operable on a 1×0, 1×1, 1×2, 2×2, 2×3, and further arrays of greater numbers of processors is presented that minimizes the VIM memory size by reuse of VLIWs and streamlines further application processing by having the IDCT results output in a standard row-major order. The techniques are applicable to cosine transforms more generally, such as discrete cosine transforms (DCTs).
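For orientation, the 2D 8×8 IDCT that the abstract refers to can be written as a plain floating-point reference. The sketch below uses the textbook separable form (a 1D inverse DCT over rows, then over columns) with orthonormal scaling. It is not the distributed ManArray algorithm and says nothing about cycle counts or IEEE 1180-1990 compliance; the function names are hypothetical.

```python
import math

N = 8


def idct_1d(v):
    """Direct-form 1D inverse DCT of N samples with orthonormal scaling."""
    out = []
    for x in range(N):
        total = 0.0
        for u in range(N):
            # Scale factor: sqrt(1/N) for the DC term, sqrt(2/N) otherwise.
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            total += cu * v[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(total)
    return out


def idct_2d(block):
    """Separable 2D IDCT: 1D transform over each row, then over each column."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols[c][r] holds output sample (r, c); return the result in row-major order.
    return [[cols[c][r] for c in range(N)] for r in range(N)]
```

A block whose only nonzero coefficient is the DC term decodes to a flat block, which makes a convenient sanity check for any faster implementation.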

8.

Invention grant
Constructing database representing manifold array architecture instruction set for use in support tool code creation (Status: in force)

Publication No.: US06748517B1

Publication date: 2004-06-08

Application No.: US09599980

Filing date: 2000-06-22

Applicants: Gerald G. Pechanek; David Carl Strube; Edwin Frank Barry; Charles W. Kurak, Jr.; Carl Donald Busboom; Dale Edward Schneider; Nikos P. Pitsianis; Grayson Morris; Edward A. Wolff; Patrick R. Marchand; Ricardo E. Rodriguez; Marco C. Jacobs

Inventors: Gerald G. Pechanek; David Carl Strube; Edwin Frank Barry; Charles W. Kurak, Jr.; Carl Donald Busboom; Dale Edward Schneider; Nikos P. Pitsianis; Grayson Morris; Edward A. Wolff; Patrick R. Marchand; Ricardo E. Rodriguez; Marco C. Jacobs

IPC classification: G06F9/44

CPC classification: G06F15/82, G06F9/30145, G06F11/263, Y10S707/99943

Abstract: Details of a highly cost effective and efficient implementation of a manifold array (ManArray) architecture and instruction syntax for use therewith are described herein. Various aspects of this approach include the regularity of the syntax, the relative ease with which the instruction set can be represented in database form, the ready ability with which tools can be created, and the ready generation of self-checking codes and parameterized test cases. Parameterizations can be fairly easily mapped and system maintenance is significantly simplified.

9.

Invention grant
Manifold array processor (Status: in force)

Publication No.: US06338129B1

Publication date: 2002-01-08

Application No.: US09323609

Filing date: 1999-06-01

Applicants: Gerald G. Pechanek; Charles W. Kurak, Jr.

Inventors: Gerald G. Pechanek; Charles W. Kurak, Jr.

IPC classification: G06F15/16

CPC classification: G06F15/17381, G06F9/30076, G06F15/17337, G06F15/8023

Abstract: An array processor includes processing elements arranged in clusters which are, in turn, combined in a rectangular array. Each cluster is formed of processing elements which preferably communicate with the processing elements of at least two other clusters. Additionally, each inter-cluster communication path is mutually exclusive, that is, each path carries either north and west, south and east, north and east, or south and west communications. Due to the mutual exclusivity of the data paths, communications between the processing elements of each cluster may be combined in a single inter-cluster path. That is, communications from a cluster which communicates to the north and east with another cluster may be combined in one path, thus eliminating half the wiring required for the path. Additionally, the length of the longest communication path is not directly determined by the overall dimension of the array, as it is in conventional torus arrays. Rather, the longest communications path is limited only by the inter-cluster spacing. In one implementation, transpose elements of an N×N torus are combined in clusters and communicate with one another through intra-cluster communications paths. Since transpose elements have direct connections to one another, transpose operation latency is eliminated in this approach. Additionally, each PE may have a single transmit port and a single receive port. As a result, the individual PEs are decoupled from the topology of the array.
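The idea of placing transpose elements in the same cluster can be checked with a small sketch. The abstract does not give the grouping rule, so the code below assumes one simple rule with the stated property: PE (i, j) is assigned to cluster (i + j) mod N. Because i + j equals j + i, every PE shares a cluster with its transpose. The function names are hypothetical, and the rule is an illustration rather than the patent's stated construction.

```python
def clusters_by_antidiagonal(n):
    """Group PE coordinates (i, j) of an n x n array by (i + j) mod n.

    Assumption: this grouping rule is illustrative only.
    """
    clusters = {k: [] for k in range(n)}
    for i in range(n):
        for j in range(n):
            clusters[(i + j) % n].append((i, j))
    return clusters


def transposes_share_cluster(n):
    """Check that every PE (i, j) lands in the same cluster as PE (j, i)."""
    cluster_of = {
        pe: k for k, pes in clusters_by_antidiagonal(n).items() for pe in pes
    }
    return all(
        cluster_of[(i, j)] == cluster_of[(j, i)]
        for i in range(n)
        for j in range(n)
    )
```

For a 4×4 array this yields four clusters of four PEs each, and a transpose then only needs data movement inside a cluster.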

10.

Invention grant
Manifold array processor (Status: lapsed)

Publication No.: US6023753A

Publication date: 2000-02-08

Application No.: US885310

Filing date: 1997-06-30

Applicants: Gerald G. Pechanek; Charles W. Kurak, Jr.

Inventors: Gerald G. Pechanek; Charles W. Kurak, Jr.

IPC classification: G06F15/173, G06F15/80, G06F15/00

CPC classification: G06F15/17381, G06F15/17337, G06F15/8023, G06F9/30076

Abstract: An array processor includes processing elements arranged in clusters which are, in turn, combined in a rectangular array. Each cluster is formed of processing elements which preferably communicate with the processing elements of at least two other clusters. Additionally, each inter-cluster communication path is mutually exclusive, that is, each path carries either north and west, south and east, north and east, or south and west communications. Due to the mutual exclusivity of the data paths, communications between the processing elements of each cluster may be combined in a single inter-cluster path. That is, communications from a cluster which communicates to the north and east with another cluster may be combined in one path, thus eliminating half the wiring required for the path. Additionally, the length of the longest communication path is not directly determined by the overall dimension of the array, as it is in conventional torus arrays. Rather, the longest communications path is limited only by the inter-cluster spacing. In one implementation, transpose elements of an N×N torus are combined in clusters and communicate with one another through intra-cluster communications paths. Since transpose elements have direct connections to one another, transpose operation latency is eliminated in this approach. Additionally, each PE may have a single transmit port and a single receive port. As a result, the individual PEs are decoupled from the topology of the array.
