-
Publication No.: US20240045810A1
Publication date: 2024-02-08
Application No.: US18488494
Filing date: 2023-10-17
IPC classification: G06F12/1045 , G06F9/30 , G06F9/345 , G06F9/38 , G06F11/00 , G06F11/10 , G06F7/24 , G06F7/487 , G06F7/499 , G06F7/53 , G06F7/57 , G06F9/48 , G06F17/16 , H03H17/06 , G06F9/32 , G06F12/0875 , G06F12/0897 , G06F12/0862 , G06F12/1009
CPC classification: G06F12/1045 , G06F9/30145 , G06F9/345 , G06F9/30014 , G06F9/30036 , G06F9/30112 , G06F9/383 , G06F9/3867 , G06F11/00 , G06F11/1048 , G06F9/30065 , G06F7/24 , G06F7/487 , G06F7/4876 , G06F7/49915 , G06F7/53 , G06F7/57 , G06F9/3001 , G06F9/30021 , G06F9/30149 , G06F9/3818 , G06F9/3836 , G06F9/3851 , G06F9/48 , G06F17/16 , G06F9/30032 , G06F9/30072 , G06F9/3887 , H03H17/0664 , G06F9/3856 , G06F9/30098 , G06F9/3016 , G06F9/32 , G06F9/3802 , G06F12/0875 , G06F12/0897 , G06F12/0862 , G06F12/1009 , G06F11/10 , G06F9/3822 , G06F9/30018 , G06F9/325 , G06F9/381 , G06F15/7807
Abstract: A processor and a vector sort instruction for the processor to execute are provided, in which the vector sort instruction includes instructions for comparing a first element of a set of vector elements of a vector to a remainder of the set of vector elements; determining, based on the comparing, a control vector that specifies a respective sorted position for each element of the set of vector elements; and reordering the set of vector elements based on the control vector.
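Illustrative note: the compare, control-vector, and reorder steps in the abstract above resemble a rank (enumeration) sort. The following is a minimal software sketch under that assumption, not the patented hardware instruction. The function names `control_vector`, `reorder`, and `vector_sort` are hypothetical, and the index-based tie-breaking rule is an assumption added so that every element receives a unique position.

```python
def control_vector(elements):
    """Return, for each element, its position in ascending sorted order.

    Assumption: ties are broken by original index so positions are unique.
    """
    n = len(elements)
    positions = []
    for i in range(n):
        # Count the elements that must precede elements[i] in sorted order.
        rank = sum(
            1
            for j in range(n)
            if elements[j] < elements[i]
            or (elements[j] == elements[i] and j < i)
        )
        positions.append(rank)
    return positions


def reorder(elements, control):
    """Scatter each element to the position named by the control vector."""
    out = [None] * len(elements)
    for value, pos in zip(elements, control):
        out[pos] = value
    return out


def vector_sort(elements):
    """Sort by computing a control vector and then reordering."""
    return reorder(elements, control_vector(elements))
```

In this sketch every comparison is independent of the others, which is why the approach suits a vector unit: all ranks can in principle be computed in parallel before a single permutation step.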
-
Publication No.: US20240020154A1
Publication date: 2024-01-18
Application No.: US18308824
Filing date: 2023-04-28
Applicant: Fujitsu Limited
Inventors: Masashi HAYANO , Takumi HONDA , Naoto FUKUMOTO
CPC classification: G06F9/4881 , G06F7/57
Abstract: A device includes a processor configured to: classify arithmetic processing devices that execute tasks in parallel by distributing loads into arithmetic processing device groups; select a representative arithmetic processing device; notify the representative arithmetic processing device of identification information of other arithmetic processing devices of an arithmetic processing device group to which the representative arithmetic processing device belongs; instruct the representative arithmetic processing device to acquire information regarding tasks to be executed by the arithmetic processing devices of the arithmetic processing device group from a first task list, and to generate a second task list; notify each of the other arithmetic processing devices of identification information of the representative arithmetic processing device; and instruct each of the other arithmetic processing devices to acquire information regarding tasks to be executed by the representative arithmetic processing device and each of the other arithmetic processing devices from the second task list, and to generate a third task list.
-
Publication No.: US11868776B2
Publication date: 2024-01-09
Application No.: US18168597
Filing date: 2023-02-14
Inventors: David Sherwood , Terry A. Higbee
CPC classification: G06F9/30152 , G06F7/57 , G06F13/28 , G06N3/063
Abstract: A coprocessor may include a memory configured to store a plurality of Very Long Data Words, each as a test Very Long Data Word (VLDW) having a length in the range of about one thousand bits to one million or more bits and containing encoded information that is distributed across the length of the VLDW. A processor generates search terms and a processing logic unit receives a test VLDW from the memory, receives a search term from the processor, and computes a Boolean inner product between the search term and the test VLDW read from memory indicative of the measure of similarity between the test VLDW and the search term. Optionally, buffers within logic circuits of processing pipelines may receive the test VLDWs.
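Illustrative note: the abstract above does not define the Boolean inner product precisely. One common reading, assumed here, is the number of bit positions set in both words (a bitwise AND followed by a population count). The sketch below models each VLDW as a Python integer; the names `boolean_inner_product` and `most_similar` are hypothetical and do not come from the patent.

```python
def boolean_inner_product(search_term: int, test_vldw: int) -> int:
    """Count bit positions set in both words (assumed similarity measure)."""
    return bin(search_term & test_vldw).count("1")


def most_similar(search_term: int, memory: list) -> int:
    """Return the index of the stored word with the highest score.

    Assumption: `memory` is a non-empty list of integers, each standing in
    for one stored VLDW. Ties resolve to the lowest index.
    """
    scores = [boolean_inner_product(search_term, word) for word in memory]
    return scores.index(max(scores))
```

Python integers have arbitrary width, so the same code works unchanged for words of a thousand bits or a million bits, although a hardware pipeline would process them in fixed-width slices.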
-
Publication No.: US11853869B2
Publication date: 2023-12-26
Application No.: US17987369
Filing date: 2022-11-15
Inventors: Sungho Kim , Cheheung Kim , Jaeho Lee
Abstract: A neural network apparatus that is configured to process an operation includes neural network circuitry configured to receive a first input of an n-bit activation, store a second input of an m-bit weight, perform a determination whether to perform an operation on an ith bit of the first input and a jth bit of the second input, output an operation value of an operation performed on the ith bit of the first input and the jth bit of the second input based on the determination, and produce an operation value of the operation based on the determination.
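Illustrative note: the abstract above does not state the criterion used to decide whether a bit-pair operation is performed. The sketch below assumes the simplest one, namely skipping any pair in which either bit is zero, since such a pair contributes nothing to a product. The function name `bitwise_multiply_with_skip` and its return shape are hypothetical.

```python
def bitwise_multiply_with_skip(activation, weight, n_bits, m_bits):
    """Multiply two unsigned integers one bit pair at a time.

    Assumption: a bit-pair operation is skipped whenever either bit is 0.
    Returns (product, number_of_bit_pair_operations_performed).
    """
    total = 0
    performed = 0
    for i in range(n_bits):
        a_bit = (activation >> i) & 1
        for j in range(m_bits):
            w_bit = (weight >> j) & 1
            if a_bit == 0 or w_bit == 0:
                continue  # determination: this pair contributes nothing
            total += 1 << (i + j)  # partial product of bit i and bit j
            performed += 1
    return total, performed
```

For a 4-bit activation of 5 and a 4-bit weight of 3, only 4 of the 16 possible bit pairs are evaluated, which illustrates how sparsity at the bit level can reduce work.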
-
Publication No.: US11829439B2
Publication date: 2023-11-28
Application No.: US17137226
Filing date: 2020-12-29
Inventors: Yun Du , Gang Zhong , Fei Wei , Yibin Zhang , Jing Han , Hongjiang Shang , Elina Kamenetskaya , Minjie Huang , Alexei Vladimirovich Bourd , Chun Yu , Andrew Evan Gruber , Eric Demers
Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
-
Publication No.: US11816061B2
Publication date: 2023-11-14
Application No.: US17127757
Filing date: 2020-12-18
Applicant: RED HAT, INC.
Inventor: Ulrich Drepper
CPC classification: G06F15/8053 , G06F7/57 , G06F9/3877 , G06F9/3887 , G06F9/5027
Abstract: A system includes a processing device that includes a vector arithmetic logic unit comprising a plurality of arithmetic logic units (ALUs), and a first processor core operatively coupled to the vector arithmetic logic unit, the processing device to receive a first vector instruction from the first processor core, wherein the first vector instruction specifies at least one first input vector having a first vector length, identify a first subset of the ALUs in view of the first vector length and one or more allocation criteria, execute, using the first subset of the set of ALUs, one or more first ALU operations specified by the first vector instruction, wherein the vector arithmetic logic unit executes the first ALU operations in parallel with one or more second ALU operations specified by a second vector instruction received from a second processor core.
-
Publication No.: US11810618B2
Publication date: 2023-11-07
Application No.: US17453136
Filing date: 2021-11-01
Inventors: Vijay S. Ramesh , Allan Porterfield
CPC classification: G11C13/0023 , G06F7/57 , G06F9/3001 , G06F9/542 , G06F9/546 , G11C13/004 , G11C13/0069 , G11C2213/71
Abstract: Systems, apparatuses, and methods related to extended memory communication subsystems for performing extended memory operations are described. An example method can include receiving, at a processing unit that is coupled between a host device and a non-volatile memory device, signaling indicative of a plurality of operations to be performed on data written to or read from the non-volatile memory device. The method can further include performing, at the processing unit, at least one operation of the plurality of operations in response to the signaling. The method can further include accessing a portion of a memory array in the non-volatile memory device. The method can further include transmitting additional signaling indicative of a command to perform one or more additional operations of the plurality of operations on the data written to or read from the non-volatile memory device.
-
Publication No.: US11789732B2
Publication date: 2023-10-17
Application No.: US17574026
Filing date: 2022-01-12
Inventors: Bin He , Jiasheng Chen , Jian Huang
CPC classification: G06F9/3001 , G06F7/57 , G06F9/3009 , G06F9/30101 , G06F9/4806
Abstract: A graphics processing unit (GPU) sequences provision of operands to a set of operand registers, thereby allowing the GPU to share at least one of the operand registers between processing. The GPU includes a plurality of arithmetic logic units (ALUs) with at least one of the ALUs configured to perform double precision operations. The GPU further includes a set of operand registers configured to store single precision operands. For a plurality of executing threads that request double precision operations, the GPU stores the corresponding operands at the operand registers. Over a plurality of execution cycles, the GPU sequences transfer of operands from the set of operand registers to a designated double precision operand register. During each execution cycle, the double-precision ALU executes a double precision operation using the operand stored at the double precision operand register.
-
Publication No.: US11782707B2
Publication date: 2023-10-10
Application No.: US17548220
Filing date: 2021-12-10
Inventors: Krishna T. Malladi , Wenqin Huangfu
CPC classification: G06F9/3001 , G06F7/5318 , G06F7/57 , G06F9/3016 , G06F9/30036 , G06F9/30098
Abstract: According to one embodiment, a memory module includes: a memory die including dynamic random access memory (DRAM) banks, each including: an array of DRAM cells arranged in pages; a row buffer to store values of one of the pages; an input/output (IO) module; and an in-memory compute (IMC) module including: an arithmetic logic unit (ALU) to receive operands from the row buffer or the IO module and to compute an output based on the operands and one of a plurality of ALU operations; and a result register to store the output of the ALU; and a controller to: receive, from a host processor, operands and an instruction; determine, based on the instruction, a data layout; supply the operands to the DRAM banks in accordance with the data layout; and control an IMC module to perform one of the ALU operations on the operands in accordance with the instruction.
-
Publication No.: US20230289296A1
Publication date: 2023-09-14
Application No.: US18321050
Filing date: 2023-05-22
IPC classification: G06F12/1045 , G06F9/30 , G06F9/345 , G06F9/38 , G06F11/00 , G06F11/10 , G06F7/24 , G06F7/487 , G06F7/499 , G06F7/53 , G06F7/57 , G06F9/48 , G06F17/16 , H03H17/06 , G06F9/32 , G06F12/0875 , G06F12/0897 , G06F12/0862 , G06F12/1009
CPC classification: G06F12/1045 , G06F9/30145 , G06F9/345 , G06F9/30014 , G06F9/30036 , G06F9/30112 , G06F9/383 , G06F9/3867 , G06F11/00 , G06F11/1048 , G06F9/30065 , G06F7/24 , G06F7/487 , G06F7/4876 , G06F7/49915 , G06F7/53 , G06F7/57 , G06F9/3001 , G06F9/30021 , G06F9/30149 , G06F9/3818 , G06F9/3836 , G06F9/3851 , G06F9/48 , G06F17/16 , G06F9/30032 , G06F9/30072 , G06F9/3887 , H03H17/0664 , G06F9/30098 , G06F9/3016 , G06F9/32 , G06F9/3802 , G06F12/0875 , G06F12/0897 , G06F12/0862 , G06F12/1009 , G06F11/10 , G06F9/3822 , G06F9/30018 , G06F9/325 , G06F9/381 , G06F15/7807
Abstract: A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.