-
Publication No.: US20240354107A1
Publication date: 2024-10-24
Application No.: US18754447
Filing date: 2024-06-26
Applicant: Intel Corporation
CPC classification: G06F9/30047, G06F9/321, G06F9/3836
Abstract: In one example, a processor includes: at least one core to execute instructions; and at least one cache memory coupled to the at least one core, the at least one cache memory to store data, at least some of the data a copy of data stored in a memory. The at least one core is to determine whether to conditionally offload a sequence of instructions for execution on a compute circuit associated with the memory, based at least in part on whether one or more first data is present in the at least one cache memory, the one or more first data for use during execution of the sequence of instructions. Other embodiments are described and claimed.
-
Publication No.: US12106104B2
Publication date: 2024-10-01
Application No.: US17133328
Filing date: 2020-12-23
Applicant: Intel Corporation
IPC classification: G06F9/30, G06F12/0862, H03M7/30
CPC classification: G06F9/30047, G06F9/30145, G06F12/0862, H03M7/30, G06F2212/602
Abstract: A processor that includes compression instructions to compress multiple adjacent data blocks of uncompressed read-only data stored in memory into one compressed read-only data block and store the compressed read-only data block in multiple adjacent blocks in the memory is provided. During execution of an application to operate on the read-only data, one of the multiple adjacent blocks storing the compressed read-only block is read from memory, stored in a prefetch buffer and decompressed in the memory controller. In response to a subsequent request during execution of the application for an adjacent data block in the compressed read-only data block, the uncompressed adjacent block is read directly from the prefetch buffer.
-
Publication No.: US11972230B2
Publication date: 2024-04-30
Application No.: US16914318
Filing date: 2020-06-27
Applicant: Intel Corporation
Inventors: Menachem Adelman, Robert Valentine, Barukh Ziv, Amit Gradstein, Simon Rubanovich, Zeev Sperber, Mark J. Charney, Christopher J. Hughes, Alexander F. Heinecke, Evangelos Georganas, Binh Pham
CPC classification: G06F7/78, G06F9/3001, G06F9/3016, G06F17/16
Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor includes a decoder and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, and a second source operand field to specify a second source matrix location. The execution circuitry is to, in response to the decoded instruction, transpose the first source matrix to generate a transposed first source matrix, perform a matrix multiplication using the transposed first source matrix and the second source matrix to generate a result, and store the result in a destination matrix location.
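Illustrative sketch (not part of the patent record): the arithmetic described in this abstract, a transpose of the first source followed by a matrix multiplication with the second source, can be modeled in plain Python. The function names and the list-of-lists matrix layout are assumptions made for illustration; the sketch shows only the mathematical result, not the instruction encoding or the execution circuitry.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of rows)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose_multiply(src1, src2):
    """Hypothetical software model: dst = transpose(src1) x src2."""
    return matmul(transpose(src1), src2)


src1 = [[1, 2], [3, 4], [5, 6]]   # 3 x 2, transposed to 2 x 3
src2 = [[1, 0], [0, 1], [1, 1]]   # 3 x 2
dst = transpose_multiply(src1, src2)
print(dst)  # [[6, 8], [8, 10]]
```

Both sources must have the same number of rows, because the transposed first source supplies the inner dimension of the product.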
-
Publication No.: US11892952B2
Publication date: 2024-02-06
Application No.: US17867673
Filing date: 2022-07-18
Applicant: Intel Corporation
IPC classification: G06F12/0877, G06F9/30, G06F12/0862, G06F12/0811, G06F15/80, G06F12/0897
CPC classification: G06F12/0877, G06F9/30, G06F9/30036, G06F12/0811, G06F12/0862, G06F12/0897, G06F15/8069, G06F2212/1016, G06F2212/1024, G06F2212/27, G06F2212/283, G06F2212/6028
Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction to indicate a packed data register of the plurality of packed data registers that is to have a source packed memory indices. The source packed memory indices to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.
-
Publication No.: US11847185B2
Publication date: 2023-12-19
Application No.: US17485055
Filing date: 2021-09-24
Applicant: Intel Corporation
Inventors: Dan Baum, Chen Koren, Elmoustapha Ould-Ahmed-Vall, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
CPC classification: G06F17/16, G06F9/3001, G06F9/3016, G06F9/30101, G06F9/3802
Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE further to store an NZ element for use in subsequent multiplications.
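Illustrative sketch (not part of the patent record): the NZ-bitmask idea in this abstract can be modeled in software by building one bitmask per row of the first matrix and per column of the second, ANDing them to find matching non-zero positions, and multiply-accumulating only those positions into the third matrix. The names `nz_bitmask` and `sparse_mac` are hypothetical, and the sketch ignores the PE grid, the two-elements-at-a-time broadcast, and the per-PE element storage.

```python
def nz_bitmask(vec):
    """Return an integer whose bit k is set when vec[k] is non-zero."""
    mask = 0
    for k, value in enumerate(vec):
        if value != 0:
            mask |= 1 << k
    return mask


def sparse_mac(a, b, c):
    """Hypothetical model: return c + a x b, touching only matching NZ pairs."""
    cols_b = list(zip(*b))
    row_masks = [nz_bitmask(row) for row in a]
    col_masks = [nz_bitmask(col) for col in cols_b]
    out = [row[:] for row in c]          # accumulate into a copy of c
    for i, row in enumerate(a):
        for j, col in enumerate(cols_b):
            match = row_masks[i] & col_masks[j]   # positions non-zero in both
            k = 0
            while match:
                if match & 1:
                    out[i][j] += row[k] * col[k]
                match >>= 1
                k += 1
    return out


a = [[1, 0, 2], [0, 0, 3]]
b = [[4, 0], [5, 6], [0, 7]]
c = [[1, 1], [1, 1]]
print(sparse_mac(a, b, c))  # [[5, 15], [1, 22]]
```

Skipping positions where either operand is zero does not change the result, since those products contribute nothing to the sum; it only reduces the number of multiplications.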
-
Publication No.: US11675590B2
Publication date: 2023-06-13
Application No.: US17865849
Filing date: 2022-07-15
Applicant: Intel Corporation
Inventors: Raanan Sade, Robert Valentine, Bret Toll, Christopher J. Hughes, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
IPC classification: G06F12/128, G06T1/00, G06F9/30
CPC classification: G06F9/30167, G06F9/30101, G06F9/30149
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
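Illustrative sketch (not part of the patent record): one possible reading of the row-interleaved transform, assuming the simplest case K = J, so that each J-element sub-column of the source becomes J adjacent elements in a single destination row. The function name `to_row_interleaved`, the list-of-lists layout, and the K = J restriction are all assumptions; the abstract also covers other K values and a choice of row-major or column-major ordering, which this sketch does not model.

```python
def to_row_interleaved(src, j):
    """Hypothetical model with K = J: pack each J-element sub-column of src
    into J consecutive elements of one destination row."""
    rows, cols = len(src), len(src[0])
    if rows % j != 0:
        raise ValueError("row count must be a multiple of j")
    dst = []
    for r0 in range(0, rows, j):          # each group of j source rows
        out_row = []
        for c in range(cols):             # walk the columns left to right
            # the j vertically adjacent elements of this sub-column
            out_row.extend(src[r0 + k][c] for k in range(j))
        dst.append(out_row)
    return dst


src = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9],
       [10, 11, 12]]
print(to_row_interleaved(src, 2))
# [[1, 4, 2, 5, 3, 6], [7, 10, 8, 11, 9, 12]]
```

Under this assumption a source with M rows and N columns maps to a destination with M/J rows and N*J columns.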
-
Publication No.: US11513957B2
Publication date: 2022-11-29
Application No.: US17027248
Filing date: 2020-09-21
Applicant: Intel Corporation
Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
IPC classification: G06F12/0842, G06F12/0893, G06F12/109, G06F12/0813, G06F12/0831, G06F9/455
Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in a multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
-
Publication No.: US20220237123A1
Publication date: 2022-07-28
Application No.: US17712632
Filing date: 2022-04-04
Applicant: Intel Corporation
Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David B. Papworth, James D. Allen
IPC classification: G06F12/0831, G06F12/1027, G06F12/1009, G06F9/30
Abstract: Embodiments of an invention for a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.
-
Publication No.: US11003455B2
Publication date: 2021-05-11
Application No.: US16398183
Filing date: 2019-04-29
Applicant: Intel Corporation
IPC classification: G06F9/38, G06F9/30, G06F12/0875, G06F12/1027, G06F15/80, G06F13/42
Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read a contiguous first and second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
-
Publication No.: US10963256B2
Publication date: 2021-03-30
Application No.: US16147254
Filing date: 2018-09-28
Applicant: Intel Corporation
Inventors: Raanan Sade, Robert Valentine, Bret Toll, Christopher J. Hughes, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
IPC classification: G06F12/128, G06T1/00, G06F9/30
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
-