Stack mechanism for a data processor

    Publication No.: US3810117A

    Publication date: 1974-05-07

    Application No.: US29949972

    Filing date: 1972-10-20

    Applicant: IBM

    Inventor: HEALEY R

    Abstract: A storage device (hereinafter referred to as a high speed stack) having an access speed compatible with that of its processor has operands and/or operators entered therein (a push operation) and removed therefrom (a pop operation) for processing in a last-in-first-out order. The number of entries stored in the stack at any moment can become very large due to the nesting of operators. Since it is not economically feasible to provide a large capacity high speed stack, overflow of the stack into a slower speed storage device (hereinafter called a low speed stack) is provided. "Roll out" of entries to the low speed stack and "roll in" of the entries back to the high speed stack is effected as the high speed stack becomes relatively full and empty. A backup register, which normally stores the last entry transferred to the low speed stack, permits delay of roll in, roll out operations until the last possible moment. When a new entry is to be stored into the high speed stack (hereinafter referred to as a push operation) and the stack is full, the new entry is put into the backup register, a selected number of entries are rolled out from the high speed stack to the low speed stack and the new entry is then transferred from the register to the high speed stack. Roll in is not initiated even when the high speed stack is empty since the next available entry in the low speed stack is available in the backup register for fast access by the processor. Only after the entry in the backup register is accessed for processing, the high speed stack being empty, does roll in of entries from the low speed stack to the high speed stack begin. High speed stack top and bottom pointers and a low speed stack pointer are incremented and decremented to address the stacks and to determine the full, empty states of the high speed stack.
    With the stack bottom movable, the number of entries left in high speed storage on a roll out (or the number of entries not filled with valid data on a roll in) can be controlled by the roll in and roll out routines. Thus the stack mechanism can be tuned to an optimum based on the program language being processed.
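
The push, pop, roll-out and roll-in behavior described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class name `TwoLevelStack`, the parameters `capacity`, `roll_out_count` and `roll_in_count`, and the use of Python lists in place of pointer-addressed storage are all hypothetical and do not come from the patent.

```python
class TwoLevelStack:
    """Toy model of a high speed stack that overflows into a low speed stack.

    Assumptions (not from the patent): Python lists stand in for both
    storage devices, and roll in happens immediately after the backup
    register has been read.
    """

    def __init__(self, capacity=4, roll_out_count=2, roll_in_count=2):
        if not 1 <= roll_out_count <= capacity:
            raise ValueError("roll_out_count must be between 1 and capacity")
        if not 0 <= roll_in_count <= capacity:
            raise ValueError("roll_in_count must be between 0 and capacity")
        self.capacity = capacity
        self.roll_out_count = roll_out_count
        self.roll_in_count = roll_in_count
        self.fast = []      # high speed stack, top of stack at the end
        self.slow = []      # low speed stack, most recently rolled-out entry at the end
        self.backup = None  # normally mirrors the last entry moved to the low speed stack

    def push(self, entry):
        if len(self.fast) == self.capacity:
            # Park the new entry in the backup register while rolling out.
            self.backup = entry
            for _ in range(self.roll_out_count):
                self.slow.append(self.fast.pop(0))  # oldest entries leave first
            entry = self.backup
            # The register returns to mirroring the last rolled-out entry.
            self.backup = self.slow[-1]
        self.fast.append(entry)

    def pop(self):
        if self.fast:
            return self.fast.pop()
        if not self.slow:
            raise IndexError("pop from empty stack")
        # High speed stack is empty: serve the entry from the backup register.
        entry = self.backup
        self.slow.pop()  # the register held a copy of this entry
        # Only now does roll in begin.
        count = min(self.roll_in_count, len(self.slow))
        start = len(self.slow) - count
        self.fast = self.slow[start:]
        del self.slow[start:]
        self.backup = self.slow[-1] if self.slow else None
        return entry


stack = TwoLevelStack(capacity=4, roll_out_count=2, roll_in_count=2)
for value in range(1, 8):
    stack.push(value)
popped = [stack.pop() for _ in range(7)]  # entries come back in LIFO order
```

Changing `roll_out_count` and `roll_in_count` corresponds loosely to the tuning that the abstract attributes to the movable stack bottom.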

    Stack register renamer
    72.
    Granted patent (expired)

    Publication No.: US3737871A

    Publication date: 1973-06-05

    Application No.: US3737871D

    Filing date: 1971-07-28

    Inventor: KATZMAN J

    IPC classification: G06F7/78 G11C19/00 G06F9/06

    CPC classification: G06F7/78

    Abstract: A stack oriented memory system for a computer is provided with a plurality of top-of-the-stack registers. The top elements of a logical stack of information are stored in the stack registers and the remaining information is stored in core memory. An embodiment of a bookkeeping scheme for keeping track of the order of the information in the stack registers comprises two additional registers. A first register stores the number of stack registers filled with stack information. A second register stores a number representing a naming state which defines the logical order of the stack registers. There is also a third register for storing the location of the top piece of information in the stack in core memory. These three registers store the necessary information to keep track of the order of the information in and the size of the logical stack. These registers also facilitate the bookkeeping when information is added to or deleted from the stack registers.
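
The three bookkeeping registers can be sketched as follows. The abstract only says that the naming state "defines the logical order of the stack registers"; encoding it as the index of the physical register that currently holds the logical top is an assumption made here for illustration, as are the names `RenamedStack`, `count`, `state` and `mem_top`.

```python
class RenamedStack:
    """Toy model: stack registers are renamed instead of having data moved.

    Assumption (not from the patent): the naming state is a rotation index
    pointing at the physical register that holds the logical top.
    """

    def __init__(self, num_regs=4):
        self.regs = [None] * num_regs  # physical top-of-stack registers
        self.count = 0      # first register: number of filled stack registers
        self.state = 0      # second register: naming state (rotation index)
        self.mem_top = -1   # third register: location of the top entry in core memory
        self.memory = []    # core memory part of the logical stack

    def push(self, value):
        n = len(self.regs)
        if self.count == n:
            # All registers full: spill the logical bottom register to memory.
            bottom = (self.state + 1) % n
            self.memory.append(self.regs[bottom])
            self.mem_top += 1
            self.count -= 1
        # Rename: the next physical register becomes the logical top.
        self.state = (self.state + 1) % n
        self.regs[self.state] = value
        self.count += 1

    def pop(self):
        if self.count > 0:
            value = self.regs[self.state]
            self.state = (self.state - 1) % len(self.regs)
            self.count -= 1
            return value
        if self.mem_top < 0:
            raise IndexError("pop from empty stack")
        self.mem_top -= 1
        return self.memory.pop()

    def logical_order(self):
        """Register contents from logical top to logical bottom."""
        n = len(self.regs)
        return [self.regs[(self.state - i) % n] for i in range(self.count)]


stack = RenamedStack(num_regs=3)
for value in range(1, 6):
    stack.push(value)
```

In this sketch a push or pop never copies data between stack registers; only `count`, `state` and `mem_top` change, which is the bookkeeping benefit the abstract points to.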

    Dynamically ordered magnetic bubble shift register memory
    73.
    Granted patent (expired)

    Publication No.: US3670313A

    Publication date: 1972-06-13

    Application No.: US3670313D

    Filing date: 1971-03-22

    Applicant: IBM

    CPC classification: G06F7/78 G11C19/0875

    Abstract: This specification discloses a bubble domain memory in which data is arranged for immediacy of access in accordance with its last use. The memory comprises a plurality of parallel shift registers in which data can be accessed in parallel. In other words, each of the shift registers contains a bit of a page or word so that by the performance of one shifting operation all of the bits of the page or word can be accessed. Data in each shift register is arranged in its order of last use so that the access position K of a shift register having K bit positions contains the last bit of information used and the position K-1 preceding the access position K in the shift register contains the bit of data used just previously to the data in the access position K and so on. In these shift registers the shift positions are arranged in loops for shifting the data between the positions of the shift register. Two such loops are provided. One of the loops contains all the shift positions so that data in any position in the shift register can be shifted into the access position K of the register for reading or writing. The other loop contains all the positions of the shift register but the access position K. This second loop is for reordering the data in the shift register in order of last use after data has been shifted into the access position K for reading or writing by the first loop.
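
The two-loop reordering can be sketched with list rotations. As an assumption for illustration, one Python list models a single shift register whose last element is the access position K, and each element stands for one stored item (in the patent, parallel registers each hold one bit of a word and shift together). The function names `rotate_right` and `access` are hypothetical.

```python
def rotate_right(cells, steps):
    """Shift every cell `steps` positions forward, wrapping around the loop."""
    if not cells:
        return []
    steps %= len(cells)
    if steps == 0:
        return cells[:]
    return cells[-steps:] + cells[:-steps]


def access(register, index):
    """Bring register[index] to the access position and restore last-use order.

    The last list element models access position K. Returns the new
    register contents; the accessed item ends up in the last position.
    """
    k = len(register)
    if not 0 <= index < k:
        raise IndexError("position out of range")
    # First loop: all K positions shift until the wanted item reaches K.
    shifted = rotate_right(register, k - 1 - index)
    # Second loop: every position except K shifts so that the remaining
    # items are again ordered by last use.
    rest = rotate_right(shifted[:-1], index)
    return rest + [shifted[-1]]


register = list("abcde")           # "e" was used most recently
register = access(register, 1)     # use "b"
# register is now ['a', 'c', 'd', 'e', 'b']
```

In this model the two loops together always take K-1 shifts, regardless of which position is accessed.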

    SIMILARITY CONTRIBUTION DETECTING METHOD AND SIMILARITY CONTRIBUTION DETECTING SYSTEM

    Publication No.: US20240361989A1

    Publication date: 2024-10-31

    Application No.: US18761712

    Filing date: 2024-07-02

    Inventor: Teng-Yok LEE

    IPC classification: G06F7/78 G06N3/0464

    CPC classification: G06F7/78 G06N3/0464

    Abstract: A method comprises calculating a first difference d between first and second input data a and b that are provided to a machine learning model that has a function f and outputs first and second results f(a) and f(b), where d=(elements d[1], . . . , d[n]), a=(elements a[1], . . . , a[n]), b=(elements b[1], . . . , b[n]), f(a)=(f(a)[1], . . . , f(a)[m]), f(b)=(f(b)[1], . . . , f(b)[m]); calculating transposed Jacobian matrices JaT and JbT by partially differentiating the function f with respect to the first and second input data a and b to yield Jacobian matrices Ja and Jb; calculating a first product of the matrix JaT and the result f(a), and a second product of the matrix JbT and the result f(b); calculating a second difference w between the products, where w=(elements w[1], . . . , w[n]); and judging that a larger product of an element d[j] of the first difference d and an element w[j] of the second difference w contributes more to a similarity between the results f(a) and f(b).
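
The steps in this abstract can be traced with a small numerical sketch. Several things here are assumptions rather than details from the patent: the differences are taken as d = a - b and w = JaT f(a) - JbT f(b), the Jacobians are approximated by central finite differences instead of analytic partial derivatives, and the toy function `f` and all helper names are hypothetical.

```python
def jacobian(f, x, eps=1e-6):
    """Approximate the m-by-n Jacobian of f at x by central differences."""
    n = len(x)
    m = len(f(x))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        hi = list(x)
        lo = list(x)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = f(hi), f(lo)
        for i in range(m):
            jac[i][j] = (f_hi[i] - f_lo[i]) / (2 * eps)
    return jac


def transposed_times(jac, v):
    """Compute J^T v for an m-by-n matrix J and a length-m vector v."""
    m, n = len(jac), len(jac[0])
    return [sum(jac[i][j] * v[i] for i in range(m)) for j in range(n)]


def contributions(f, a, b):
    """Return the per-element products d[j] * w[j]."""
    d = [x - y for x, y in zip(a, b)]                  # first difference
    prod_a = transposed_times(jacobian(f, a), f(a))    # JaT f(a)
    prod_b = transposed_times(jacobian(f, b), f(b))    # JbT f(b)
    w = [p - q for p, q in zip(prod_a, prod_b)]        # second difference
    return [dj * wj for dj, wj in zip(d, w)]


def f(x):
    """Toy stand-in for a model: scales the first input by 2."""
    return [2.0 * x[0], x[1]]


scores = contributions(f, [1.0, 0.0], [0.0, 0.0])
# scores is approximately [4.0, 0.0]
```

Following the abstract, the element with the larger product would be judged to contribute more to the similarity between f(a) and f(b).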

    Broadcasting mode of planar engine for neural processor

    Publication No.: US12124943B2

    Publication date: 2024-10-22

    Application No.: US18120218

    Filing date: 2023-03-10

    Applicant: Apple Inc.

    Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. A planar engine circuit can be configured to operate in multiple modes. In an elementwise mode, the planar engine circuit may combine two tensors by performing operations element by element. The planar engine circuit may support elementwise operation for two tensors that are of different sizes and ranks. The planar engine circuit may perform a broadcasting operation to duplicate one or more values across one or more channels to make a smaller tensor match the size of the larger tensor.
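
The broadcasting step can be sketched in software. This is an illustration only: tensors are modeled as nested lists indexed as [channel][row][column], the smaller tensor is assumed to have a single channel, and the function names `broadcast_channels` and `elementwise` are hypothetical. It says nothing about how the circuit itself is built.

```python
import operator


def broadcast_channels(small, channels):
    """Duplicate a single-channel tensor across `channels` channels."""
    if len(small) != 1:
        raise ValueError("expected a tensor with exactly one channel")
    return [[row[:] for row in small[0]] for _ in range(channels)]


def elementwise(op, x, y):
    """Combine two equally sized [channel][row][column] tensors element by element."""
    return [
        [[op(p, q) for p, q in zip(row_x, row_y)] for row_x, row_y in zip(ch_x, ch_y)]
        for ch_x, ch_y in zip(x, y)
    ]


large = [[[1, 2]], [[3, 4]]]   # 2 channels, 1 row, 2 columns
small = [[[10, 20]]]           # 1 channel, 1 row, 2 columns

expanded = broadcast_channels(small, channels=len(large))
total = elementwise(operator.add, large, expanded)
# total is [[[11, 22]], [[13, 24]]]
```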

    Selecting an I
    78.
    Granted patent

    Publication No.: US12086566B2

    Publication date: 2024-09-10

    Application No.: US16670482

    Filing date: 2019-10-31

    Inventor: Thomas Rose

    Abstract: A method of selecting, in hardware logic, an ith largest or a pth smallest number from a set of n m-bit numbers is described. The method is performed iteratively and in the rth iteration, the method comprises: summing an (m−r)th bit from each of the m-bit numbers to generate a summation result and comparing the summation result to a threshold value. Depending upon the outcome of the comparison, the rth bit of the selected number is determined and output and additionally the (m−r−1)th bit of each of the m-bit numbers is selectively updated based on the outcome of the comparison and the value of the (m−r)th bit in the m-bit number. In a first iteration, a most significant bit from each of the m-bit numbers is summed and each subsequent iteration sums bits occupying successive bit positions in their respective numbers.
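
A software sketch of bit-serial selection, working from the most significant bit down, may help make the iteration concrete. It is not the patent's hardware logic: the abstract describes updating only the next bit of each number per iteration, whereas this sketch uses a common simplification in which all remaining lower bits of a number are overwritten once the number is known to be larger or smaller than the answer. The threshold is assumed to be i, and the name `select_ith_largest` is hypothetical.

```python
def select_ith_largest(numbers, i, m):
    """Return the i-th largest (1-based) of the given non-negative m-bit integers."""
    if not 1 <= i <= len(numbers):
        raise ValueError("i must be between 1 and len(numbers)")
    work = list(numbers)   # working copies whose lower bits may be overwritten
    result = 0
    for bit in range(m - 1, -1, -1):           # most significant bit first
        ones = sum((x >> bit) & 1 for x in work)
        lower_mask = (1 << bit) - 1
        if ones >= i:                          # compare the sum to the threshold
            result |= 1 << bit
            # Numbers with a 0 here are smaller than the answer:
            # clear their lower bits so they never count again.
            work = [x if (x >> bit) & 1 else x & ~lower_mask for x in work]
        else:
            # Numbers with a 1 here are larger than the answer:
            # set their lower bits so they always count.
            work = [x | lower_mask if (x >> bit) & 1 else x for x in work]
    return result


values = [5, 3, 9, 7]
ranked = [select_ith_largest(values, i, m=4) for i in range(1, 5)]
# ranked is [9, 7, 5, 3]
```

Selecting the pth smallest of n numbers with this sketch amounts to calling it with i = n - p + 1.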

    MATRIX TRANSPOSITION IN MATRIX MULTIPLICATION ARRAY CIRCUITRY

    Publication No.: US20240168723A1

    Publication date: 2024-05-23

    Application No.: US18056822

    Filing date: 2022-11-18

    Applicant: Intel Corporation

    IPC classification: G06F7/78 G06F17/16

    CPC classification: G06F7/78 G06F17/16

    Abstract: An apparatus to facilitate matrix transposition in matrix multiplication array circuitry is disclosed. The apparatus includes a processor comprising matrix acceleration hardware comprising storage buffers and an array of data processing units (DPUs), wherein the matrix acceleration hardware is to: load data for a source matrix to the storage buffers; generate a transposed matrix comprising transposed elements of the source matrix; and input the transposed matrix to the array of DPUs for a matrix multiplication operation.
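
The load, transpose and multiply sequence can be mirrored in a few lines of software. This is a purely illustrative sketch: nested lists stand in for the storage buffers, a plain triple loop stands in for the DPU array, and the names `transpose` and `matmul` are hypothetical.

```python
def transpose(matrix):
    """Return a new matrix whose rows are the columns of the source matrix."""
    return [list(column) for column in zip(*matrix)]


def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    if len(left[0]) != len(right):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(left[i][k] * right[k][j] for k in range(len(right)))
         for j in range(len(right[0]))]
        for i in range(len(left))
    ]


source = [[1, 2], [3, 4]]          # data loaded for the source matrix
other = [[5, 6], [7, 8]]

transposed = transpose(source)     # [[1, 3], [2, 4]]
product = matmul(transposed, other)
# product is [[26, 30], [38, 44]]
```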