MANAGEMENT DEVICE, ARITHMETIC PROCESSING DEVICE, AND LOAD DISTRIBUTION METHOD AND COMPUTER-READABLE RECORDING MEDIUM STORING PROGRAM OF ARITHMETIC PROCESSING DEVICE

    Publication No.: US20240020154A1

    Publication date: 2024-01-18

    Application No.: US18308824

    Filing date: 2023-04-28

    Applicant: Fujitsu Limited

    IPC classification: G06F9/48 G06F7/57

    CPC classification: G06F9/4881 G06F7/57

    Abstract: A device includes a processor configured to: classify arithmetic processing devices that execute tasks in parallel by distributing loads into arithmetic processing device groups; select a representative arithmetic processing device; notify the representative arithmetic processing device of identification information of the other arithmetic processing devices of the arithmetic processing device group to which the representative arithmetic processing device belongs; instruct the representative arithmetic processing device to acquire, from a first task list, information regarding tasks to be executed by the arithmetic processing devices of the arithmetic processing device group, and to generate a second task list; notify each of the other arithmetic processing devices of identification information of the representative arithmetic processing device; and instruct each of the other arithmetic processing devices to acquire, from the second task list, information regarding tasks to be executed by the representative arithmetic processing device and that other arithmetic processing device, and to generate a third task list.
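    The two-level flow in this abstract (first task list, then a per-group second list built by a representative, then a per-device third list) can be illustrated with a small sketch. It is not Fujitsu's implementation: the grouping policy (consecutive IDs, fixed group size), the choice of the first device in each group as representative, the proportional split of the first list, and the round-robin split of the second list are all assumptions, and `classify` and `distribute` are hypothetical names.

```python
def classify(device_ids, group_size):
    """Split device IDs into consecutive groups of at most group_size devices (assumed policy)."""
    return [device_ids[i:i + group_size] for i in range(0, len(device_ids), group_size)]


def distribute(tasks, device_ids, group_size):
    """Two-level distribution: first list -> per-group second list -> per-device third list."""
    groups = classify(device_ids, group_size)
    representatives = {}  # device -> its representative (what the management device notifies)
    third_lists = {}      # device -> tasks it will execute
    assigned = 0
    for group in groups:
        rep = group[0]  # hypothetical choice: first device of the group is the representative
        for dev in group:
            representatives[dev] = rep
        # The representative copies its group's proportional share of the first task list
        # into a second task list; only representatives touch the first list.
        begin = len(tasks) * assigned // len(device_ids)
        assigned += len(group)
        end = len(tasks) * assigned // len(device_ids)
        second_list = tasks[begin:end]
        # Every group member (representative included) takes its own tasks from the
        # second list, round-robin, into its third task list.
        for k, dev in enumerate(group):
            third_lists[dev] = second_list[k::len(group)]
    return representatives, third_lists


reps, lists = distribute(list(range(10)), [0, 1, 2, 3, 4], group_size=2)
print(reps)   # each device mapped to its group's representative
print(lists)  # each device's third task list
```

    The point of the hierarchy, under these assumptions, is that only the representatives contend for the shared first list, while the other devices read only their own group's smaller second list.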

    AI synaptic coprocessor
    Granted invention patent

    Publication No.: US11868776B2

    Publication date: 2024-01-09

    Application No.: US18168597

    Filing date: 2023-02-14

    Abstract: A coprocessor may include a memory configured to store a plurality of Very Long Data Words, each as a test Very Long Data Word (VLDW) having a length in the range of about one thousand bits to one million or more bits and containing encoded information that is distributed across the length of the VLDW. A processor generates search terms and a processing logic unit receives a test VLDW from the memory, receives a search term from the processor, and computes a Boolean inner product between the search term and the test VLDW read from memory indicative of the measure of similarity between the test VLDW and the search term. Optionally, buffers within logic circuits of processing pipelines may receive the test VLDWs.
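    One common reading of a "Boolean inner product" between two bit vectors is the number of positions where both are 1, i.e. popcount(a AND b). The sketch below assumes that reading and uses Python integers as arbitrary-width bit vectors to score stored 1024-bit words against a search term. `boolean_inner_product` and `best_match` are hypothetical names; the patent's hardware pipeline and encoding are not modeled.

```python
import random


def boolean_inner_product(word: int, term: int) -> int:
    """Count bit positions set in both words: popcount(word AND term)."""
    return bin(word & term).count("1")


def best_match(memory, term):
    """Score every stored word against the search term; return (index, score) of the best."""
    scores = [boolean_inner_product(w, term) for w in memory]
    best = max(range(len(memory)), key=lambda i: scores[i])
    return best, scores[best]


# Usage: eight random 1024-bit test words; the search term is one of them.
rng = random.Random(0)
WIDTH = 1024
memory = [rng.getrandbits(WIDTH) for _ in range(8)]
term = memory[5]
index, score = best_match(memory, term)
print(index, score)
```

    Note that raw popcount(AND) favors densely populated words (an all-ones word would match everything), so a practical similarity search may normalize the score or use a Hamming-distance variant; which one the patent intends is not stated in the abstract.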

    Dynamic allocation of arithmetic logic units for vectorized operations

    Publication No.: US11816061B2

    Publication date: 2023-11-14

    Application No.: US17127757

    Filing date: 2020-12-18

    Applicant: RED HAT, INC.

    Inventor: Ulrich Drepper

    Abstract: A system includes a processing device that includes a vector arithmetic logic unit comprising a plurality of arithmetic logic units (ALUs), and a first processor core operatively coupled to the vector arithmetic logic unit, the processing device to receive a first vector instruction from the first processor core, wherein the first vector instruction specifies at least one first input vector having a first vector length, identify a first subset of the ALUs in view of the first vector length and one or more allocation criteria, execute, using the first subset of the set of ALUs, one or more first ALU operations specified by the first vector instruction, wherein the vector arithmetic logic unit executes the first ALU operations in parallel with one or more second ALU operations specified by a second vector instruction received from a second processor core.
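    The allocation idea (size an ALU subset from the vector length plus an allocation criterion, so two cores can use disjoint subsets at once) can be sketched as a toy model. This is an illustration under assumptions, not Red Hat's design: the single criterion used here (one instruction may claim at most half of the ALUs) and the strided lane mapping are hypothetical, as are the names `VectorALU`, `allocate`, `release`, and `run`.

```python
import operator


class VectorALU:
    """Toy vector ALU: a pool of scalar ALUs shared by several processor cores."""

    def __init__(self, num_alus, max_share=0.5):
        self.num_alus = num_alus
        # Hypothetical allocation criterion: one instruction may claim at most
        # max_share of all ALUs, so another core can still get a subset.
        self.max_share = max_share
        self.free = set(range(num_alus))

    def allocate(self, vector_length):
        """Pick a subset of free ALUs sized by the vector length and the cap."""
        cap = max(1, int(self.num_alus * self.max_share))
        count = min(vector_length, cap, len(self.free))
        if count == 0:
            raise RuntimeError("no free ALUs")
        subset = sorted(self.free)[:count]
        self.free.difference_update(subset)
        return subset

    def release(self, subset):
        self.free.update(subset)

    @staticmethod
    def run(op, a, b, subset):
        """Apply op elementwise; lane k handles elements k, k + width, k + 2*width, ..."""
        width = len(subset)
        out = [None] * len(a)
        for lane in range(width):
            for i in range(lane, len(a), width):
                out[i] = op(a[i], b[i])
        passes = -(-len(a) // width)  # ceiling division: rounds this subset needs
        return out, passes


# Usage: two cores hold disjoint ALU subsets at the same time.
valu = VectorALU(8)
s1 = valu.allocate(3)   # core 1, vector length 3
s2 = valu.allocate(16)  # core 2, vector length 16, capped at 4 ALUs
out1, p1 = VectorALU.run(operator.add, [1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60], s1)
out2, p2 = VectorALU.run(operator.mul, list(range(16)), [2] * 16, s2)
valu.release(s1)
valu.release(s2)
```

    A longer vector simply takes more passes through its subset instead of starving the other core, which is the trade-off an allocation cap makes.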

    Extended memory communication
    Granted invention patent

    Publication No.: US11810618B2

    Publication date: 2023-11-07

    Application No.: US17453136

    Filing date: 2021-11-01

    Abstract: Systems, apparatuses, and methods related to extended memory communication subsystems for performing extended memory operations are described. An example method can include receiving, at a processing unit that is coupled between a host device and a non-volatile memory device, signaling indicative of a plurality of operations to be performed on data written to or read from the non-volatile memory device. The method can further include performing, at the processing unit, at least one operation of the plurality of operations in response to the signaling. The method can further include accessing a portion of a memory array in the non-volatile memory device. The method can further include transmitting additional signaling indicative of a command to perform one or more additional operations of the plurality of operations on the data written to or read from the non-volatile memory device.
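    The method's shape (a processing unit between host and non-volatile memory receives a batch of operations, performs some itself, and forwards a command for the rest) can be modeled in a few lines. Everything concrete here is an assumption for illustration: the dict-backed memory array, the two "local" operations, the rule that once one operation must be forwarded all later ones are forwarded too (to preserve order), and the names `NonVolatileMemory`, `ProcessingUnit`, `handle`, and `outbox`. The abstract does not say which unit receives the additional signaling.

```python
from dataclasses import dataclass, field


@dataclass
class NonVolatileMemory:
    array: dict = field(default_factory=dict)  # toy model: address -> stored value


class ProcessingUnit:
    # Hypothetical set of operations this unit can perform itself.
    LOCAL_OPS = {"increment": lambda v: v + 1, "double": lambda v: v * 2}

    def __init__(self, nvm):
        self.nvm = nvm
        self.outbox = []  # additional signaling sent onward for unperformed operations

    def handle(self, address, operations):
        value = self.nvm.array[address]  # access a portion of the memory array
        deferred = []
        for op in operations:
            # Perform locally only while no earlier op was deferred, to keep order.
            if op in self.LOCAL_OPS and not deferred:
                value = self.LOCAL_OPS[op](value)
            else:
                deferred.append(op)
        self.nvm.array[address] = value
        if deferred:
            self.outbox.append({"address": address, "ops": deferred})
        return value


nvm = NonVolatileMemory({0: 5})
pu = ProcessingUnit(nvm)
result = pu.handle(0, ["increment", "double", "compress", "increment"])
print(result, pu.outbox)
```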

    Arithmetic logic unit register sequencing

    Publication No.: US11789732B2

    Publication date: 2023-10-17

    Application No.: US17574026

    Filing date: 2022-01-12

    Abstract: A graphics processing unit (GPU) sequences provision of operands to a set of operand registers, thereby allowing the GPU to share at least one of the operand registers between processing. The GPU includes a plurality of arithmetic logic units (ALUs) with at least one of the ALUs configured to perform double precision operations. The GPU further includes a set of operand registers configured to store single precision operands. For a plurality of executing threads that request double precision operations, the GPU stores the corresponding operands at the operand registers. Over a plurality of execution cycles, the GPU sequences transfer of operands from the set of operand registers to a designated double precision operand register. During each execution cycle, the double-precision ALU executes a double precision operation using the operand stored at the double precision operand register.
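    The sequencing idea can be illustrated by storing each 64-bit double as two 32-bit halves in single-precision-width register slots and, one thread per cycle, reassembling them into a shared double-precision operand register before issuing one DP operation. This sketch assumes both operands of each operation are staged that way (the abstract mentions only "the operand"), and `split_double`, `join_double`, and `sequence_dp_ops` are hypothetical names; it models data movement only, not GPU timing.

```python
import operator
import struct


def split_double(x):
    """Split a 64-bit double into two 32-bit words (low, high)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & 0xFFFFFFFF, bits >> 32


def join_double(lo, hi):
    """Reassemble two 32-bit words into a 64-bit double."""
    return struct.unpack("<d", struct.pack("<Q", (hi << 32) | lo))[0]


def sequence_dp_ops(thread_operands, op):
    # Operand register file: every entry is a 32-bit (single-precision-width) slot.
    operand_regs = []
    for a, b in thread_operands:
        operand_regs.extend(split_double(a) + split_double(b))
    results = []
    cycles = 0
    # One thread per execution cycle: move its four 32-bit slots into the shared
    # double-precision operand register(s) and issue one DP operation.
    for base in range(0, len(operand_regs), 4):
        a_lo, a_hi, b_lo, b_hi = operand_regs[base:base + 4]
        dp_a, dp_b = join_double(a_lo, a_hi), join_double(b_lo, b_hi)
        results.append(op(dp_a, dp_b))
        cycles += 1
    return results, cycles


ops = [(1.5, 2.25), (1e300, 3.0), (-0.1, 0.2)]
res, cyc = sequence_dp_ops(ops, operator.mul)
print(res, cyc)
```

    Because the split and join are bit-exact, each result equals the product computed directly in double precision; the cost of sharing one DP register is one cycle per requesting thread.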

    Systems and methods for data placement for in-memory-compute

    Publication No.: US11782707B2

    Publication date: 2023-10-10

    Application No.: US17548220

    Filing date: 2021-12-10

    IPC classification: G06F9/30 G06F7/57 G06F7/53

    Abstract: According to one embodiment, a memory module includes: a memory die including dynamic random access memory (DRAM) banks, each including: an array of DRAM cells arranged in pages; a row buffer to store values of one of the pages; an input/output (IO) module; and an in-memory compute (IMC) module including: an arithmetic logic unit (ALU) to receive operands from the row buffer or the IO module and to compute an output based on the operands and one of a plurality of ALU operations; and a result register to store the output of the ALU; and a controller to: receive, from a host processor, operands and an instruction; determine, based on the instruction, a data layout; supply the operands to the DRAM banks in accordance with the data layout; and control an IMC module to perform one of the ALU operations on the operands in accordance with the instruction.
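    The controller's role (pick a data layout from the instruction, place operands in banks so each bank's ALU finds both operands locally, then combine per-bank results) can be sketched as follows. The two instructions ("add" and "dot"), the mapping from instruction to layout (interleaved vs. contiguous blocks), and the names `choose_layout`, `bank_indices`, and `imc_execute` are hypothetical; the patent's actual layout rules are not given in the abstract.

```python
def choose_layout(instruction):
    # Hypothetical policy: elementwise ops are interleaved across banks;
    # reductions use contiguous blocks so each bank sums one chunk.
    return "interleaved" if instruction == "add" else "blocked"


def bank_indices(n, num_banks, layout):
    """Return, for each bank, the element indices placed in it."""
    if layout == "interleaved":
        return [list(range(b, n, num_banks)) for b in range(num_banks)]
    size = -(-n // num_banks)  # ceiling division
    return [list(range(b * size, min(n, (b + 1) * size))) for b in range(num_banks)]


def imc_execute(instruction, a, b, num_banks=4):
    layout = choose_layout(instruction)
    placement = bank_indices(len(a), num_banks, layout)
    # Each bank's row buffer holds the pair (a[i], b[i]) for its indices,
    # so its ALU never needs data from another bank.
    banks = [[(a[i], b[i]) for i in idx] for idx in placement]
    if instruction == "add":
        out = [0] * len(a)
        for idx, row in zip(placement, banks):
            for i, (x, y) in zip(idx, row):
                out[i] = x + y  # per-element ALU output written back
        return out
    if instruction == "dot":
        # Each bank's result register holds a partial sum; the controller combines them.
        partials = [sum(x * y for x, y in row) for row in banks]
        return sum(partials)
    raise ValueError(f"unsupported instruction: {instruction}")


print(imc_execute("add", [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
print(imc_execute("dot", [1, 2, 3], [4, 5, 6]))
```

    Either layout keeps each operand pair in one bank; the choice between them only changes how work and partial results are spread across banks.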