摘要:
A method of rotating data in a plurality of processing elements comprises a plurality of shifting operations and a plurality of storing operations, with the shifting and storing operations coordinated to enable a three shears operation to be performed on the data. The plurality of storing operations is responsive to the processing element's positions.
摘要:
A method of determining an interleave pattern for n lots of A and y lots of B, when n plus y equals a power of two such that the expression 2z−n may be used to represent the value of y, includes generating a key including the reverse bit order of a serially indexed count from 0 to 2z. An interleave pattern can be generated from the key in which all values less than n are replace by A and all other values are replaced by B. The key can be used to generate a table that contains all possible combinations of values of A and B. The table can then be stored such that an interleave pattern can be automatically selected based on either the number of lots of A or the number of lots of B.
摘要:
A method for balancing the load of a parallel processing system having parallel processing elements (PEs) linked serially in a line with first and second ends, wherein each of the PEs has a local number of tasks associated therewith, the method comprising determining a total number of tasks present on the line; notifying each of the PEs of the total number of tasks, calculating a local mean number of tasks for each of the PEs, and calculating a local deviation for each of the PEs. The method also comprises determining a first local cumulative deviation for each of the PEs, determining a second local cumulative deviation for each of the PEs, and redistributing tasks among the PEs in response to the first local cumulative deviation and the second local cumulative deviation.
摘要:
A method for transposing data in a plurality of processing elements is comprised of a plurality of shifting operations and a plurality of storing operations. The shifting and storing operations are coordinated to enable data to be stored along a diagonal of processing elements from a first direction or first pair of directions and to be output from the diagonal in a second direction or a second pair of directions perpendicular to the first pair of directions, respectively. The plurality of storing operations are responsive to the processing elements' positions. The first and second pairs of directions are selected from among the dimensions of the array, e.g., the +x/−x, +z/−z and +y/−y pairs of directions.
摘要翻译:用于在多个处理元件中转置数据的方法包括多个移位操作和多个存储操作。 协调移位和存储操作,使数据沿着第一方向或第一对方向的处理元件的对角线存储,并且在垂直于第一对的第二方向或第二对方向上从对角线输出 的方向。 多个存储操作响应于处理元件的位置。 从阵列的尺寸中选择第一和第二对方向,例如+ x / -x,+ z / -z和+ y / -y对方向。
摘要:
A method of rotating data in a plurality of processing elements comprises a plurality of shifting operations and a plurality of storing operations, with the shifting and storing operations coordinated to enable a three shears operation to be performed on the data. The plurality of storing operations is responsive to the processing element's positions.
摘要:
A method of rotating data in a plurality of processing elements comprises a plurality of shifting operations and a plurality of storing operations, with the shifting and storing operations coordinated to enable a three shears operation to be performed on the data. The plurality of storing operations is responsive to the processing element's positions.
摘要:
A method for generating a reflection of data in a plurality of processing elements comprises shifting the data along, for example, each row in the array until each processing element in the row has received all the data held by every other processing element in that row. Each processing element stores and outputs final data as a function of its position in the row. A similar reflection along a horizontal line can be achieved by shifting data along columns instead of rows. Also disclosed is a method for reflecting data in a matrix of processing elements about a vertical line comprising shifting data between processing elements arranged in rows. An initial count is set in each processing element according to the expression (2×Col_Index) MOD (array size). In one embodiment, a counter counts down from the initial count in each processing element as a function of the number of shifts that have peen performed. Output is selected as a function of the current count. A similar reflection about a horizontal line can be achieved by shifting data between processing elements arranged in columns and setting the initial count according to the expression (2×Row_Index) MOD (array size). The present invention represents an efficient method for obtaining the reflection of data.
摘要:
A method of rotating data in a plurality of processing elements comprises a plurality of shifting operations and a plurality of storing operations, with the shifting and storing operations coordinated to enable a three shears operation to be performed on the data. The plurality of storing operations is responsive to the processing element's positions.
摘要:
A method for finding a local extrema for a single processing element having a set of values associated therewith includes separating the set of values into an odd set of values and an even set of values, determining a first extrema from the odd set of values, determining a second extrema from the even set of values, and determining the local extrema from the first extrema and the second extrema. The first extrema is found by comparing each odd-numbered value in the set to each other odd-numbered value in the set and the second extrema is found by comparing each even-numbered value in the set to each other even-numbered value in the set.
摘要:
One aspect of the present invention relates to a method for balancing the load of a parallel processing system having a plurality of parallel processing elements arranged in a loop, wherein each processing element has a local number of tasks associated therewith. The method comprises determining within each processing element a total number of tasks present within the loop, calculating a local mean number of tasks within each processing element, assigning a weight to each of said plurality of processing elements, and calculating a local weighted deviation within each processing element. The method also comprises determining the sum weighted deviations within each processing element for one-half the loop in an anti-clockwise direction and in a clockwise direction, determining clockwise and anti-clockwise transfer parameters within each processing element, and redistributing tasks among the processing elements in response to the clockwise and anti-clockwise transfer parameters.