Patent search: ap:("Intel Corporation") AND inv:"Christopher J. Hughes", page 1

1.

Invention publication
SYSTEM, METHOD AND APPARATUS FOR CONDITIONALLY OFFLOADING INSTRUCTION EXECUTION (Pending, published)

Publication number: US20240354107A1

Publication date: 2024-10-24

Application number: US18754447

Filing date: 2024-06-26

Applicant: Intel Corporation

Inventors: Frank Hady, Christopher J. Hughes, Scott Peterson

IPC classification: G06F9/30, G06F9/32, G06F9/38

CPC classification: G06F9/30047, G06F9/321, G06F9/3836

Abstract: In one example, a processor includes: at least one core to execute instructions; and at least one cache memory coupled to the at least one core, the at least one cache memory to store data, at least some of the data being a copy of data stored in a memory. The at least one core is to determine whether to conditionally offload a sequence of instructions for execution on a compute circuit associated with the memory, based at least in part on whether one or more first data is present in the at least one cache memory, the one or more first data for use during execution of the sequence of instructions. Other embodiments are described and claimed.
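As a rough illustration, the offload decision can be modeled as a predicate over the set of cache lines the instruction sequence will touch. The function name `should_offload` and the `max_cached_fraction` threshold are hypothetical: the abstract says only that the decision depends at least in part on whether the needed data is already in cache, and a real core would likely weigh other factors as well.

```python
from typing import Iterable, Set


def should_offload(needed_lines: Iterable[int], cached_lines: Set[int],
                   max_cached_fraction: float = 0.0) -> bool:
    """Return True if the sequence should run on the near-memory compute circuit.

    Hypothetical policy: offload when the fraction of needed cache lines that
    are already resident in the core's caches is at most max_cached_fraction.
    With the default of 0.0, offload only when none of the data is cached.
    """
    needed = set(needed_lines)
    if not needed:
        return False  # no data to fetch, so keep execution on the core
    hits = len(needed & cached_lines)
    return hits / len(needed) <= max_cached_fraction
```

For example, if the sequence needs lines `0`, `64` and `128` and line `64` is already cached, the default policy keeps execution on the core, since reusing cached data is cheap.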

2.

Granted invention
Processor instructions for data compression and decompression (In force)

Publication number: US12106104B2

Publication date: 2024-10-01

Application number: US17133328

Filing date: 2020-12-23

Applicant: Intel Corporation

Inventors: Zhe Wang, Alaa R. Alameldeen, Christopher J. Hughes

IPC classification: G06F9/30, G06F12/0862, H03M7/30

CPC classification: G06F9/30047, G06F9/30145, G06F12/0862, H03M7/30, G06F2212/602

Abstract: A processor that includes compression instructions to compress multiple adjacent data blocks of uncompressed read-only data stored in memory into one compressed read-only data block and store the compressed read-only data block in multiple adjacent blocks in the memory is provided. During execution of an application to operate on the read-only data, one of the multiple adjacent blocks storing the compressed read-only block is read from memory, stored in a prefetch buffer and decompressed in the memory controller. In response to a subsequent request during execution of the application for an adjacent data block in the compressed read-only data block, the uncompressed adjacent block is read directly from the prefetch buffer.
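The read path described in this abstract can be sketched as a toy software model. Everything here is an assumption for illustration: a 64-byte block size, groups of two adjacent blocks, `zlib` standing in for whatever compression the hardware uses, a single memory read per compressed group, and no prefetch-buffer eviction. The class and method names are hypothetical.

```python
import zlib

BLOCK_SIZE = 64  # assumed block size in bytes (hypothetical)


class CompressedReadOnlyMemory:
    """Toy model: groups of adjacent read-only blocks are stored compressed,
    and a decompressed group is kept in a prefetch buffer."""

    def __init__(self, data: bytes, group_size: int = 2) -> None:
        span = BLOCK_SIZE * group_size
        if len(data) % span:
            raise ValueError("data must be a whole number of block groups")
        self.group_size = group_size
        # One compressed payload per group of adjacent uncompressed blocks.
        self.compressed = [zlib.compress(data[off:off + span])
                           for off in range(0, len(data), span)]
        self.prefetch_buffer = {}  # block index -> decompressed block bytes
        self.memory_reads = 0      # counts requests that had to go to memory

    def read_block(self, index: int) -> bytes:
        # A later request for an adjacent block is served from the buffer.
        if index in self.prefetch_buffer:
            return self.prefetch_buffer[index]
        group = index // self.group_size
        self.memory_reads += 1  # assume one read fetches the compressed group
        raw = zlib.decompress(self.compressed[group])  # "memory controller" step
        base = group * self.group_size
        for i in range(self.group_size):
            self.prefetch_buffer[base + i] = raw[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        return self.prefetch_buffer[index]
```

Reading block 0 and then block 1 costs one memory read in this model, because block 1 is already sitting decompressed in the prefetch buffer.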

3.

Granted invention
Matrix transpose and multiply (In force)

Publication number: US11972230B2

Publication date: 2024-04-30

Application number: US16914318

Filing date: 2020-06-27

Applicant: Intel Corporation

Inventors: Menachem Adelman, Robert Valentine, Barukh Ziv, Amit Gradstein, Simon Rubanovich, Zeev Sperber, Mark J. Charney, Christopher J. Hughes, Alexander F. Heinecke, Evangelos Georganas, Binh Pham

IPC classification: G06F7/78, G06F9/30, G06F17/16

CPC classification: G06F7/78, G06F9/3001, G06F9/3016, G06F17/16

Abstract: Embodiments for a matrix transpose and multiply operation are disclosed. In an embodiment, a processor includes a decoder and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode, a first destination operand field to specify a destination matrix location, a first source operand field to specify a first source matrix location, and a second source operand field to specify a second source matrix location. The execution circuitry is to, in response to the decoded instruction, transpose the first source matrix to generate a transposed first source matrix, perform a matrix multiplication using the transposed first source matrix and the second source matrix to generate a result, and store the result in a destination matrix location.
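The arithmetic the instruction performs is the product $A^\mathsf{T}B$. The sketch below is a plain reference model of that result only (nested lists, no tiles, no data types or register encoding); the name `transpose_multiply` is hypothetical.

```python
def transpose_multiply(a, b):
    """Return a^T @ b for a (K x M) and b (K x N) as nested lists (M x N)."""
    k = len(a)
    if len(b) != k:
        raise ValueError("a and b must have the same number of rows")
    m, n = len(a[0]), len(b[0])
    # Element (i, j) of a^T @ b is the dot product of column i of a with column j of b,
    # so the transpose never has to be materialized.
    return [[sum(a[t][i] * b[t][j] for t in range(k)) for j in range(n)]
            for i in range(m)]
```

Indexing `a[t][i]` reads column `i` of the first source directly, which is one way to see why fusing the transpose into the multiply can avoid a separate transpose pass.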

4.

Granted invention
No-locality hint vector memory access processors, methods, systems, and instructions (In force)

Publication number: US11892952B2

Publication date: 2024-02-06

Application number: US17867673

Filing date: 2022-07-18

Applicant: Intel Corporation

Inventors: Christopher J. Hughes

IPC classification: G06F12/0877, G06F9/30, G06F12/0862, G06F12/0811, G06F15/80, G06F12/0897

CPC classification: G06F12/0877, G06F9/30, G06F9/30036, G06F12/0811, G06F12/0862, G06F12/0897, G06F15/8069, G06F2212/1016, G06F2212/1024, G06F2212/27, G06F2212/283, G06F2212/6028

Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode a no-locality hint vector memory access instruction. The no-locality hint vector memory access instruction is to indicate a packed data register of the plurality of packed data registers that is to have source packed memory indices. The source packed memory indices are to have a plurality of memory indices. The no-locality hint vector memory access instruction is to provide a no-locality hint to the processor for data elements that are to be accessed with the memory indices. The processor also includes an execution unit coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the no-locality hint vector memory access instruction, is to access the data elements at memory locations that are based on the memory indices.

5.

Granted invention
Systems and methods of instructions to accelerate multiplication of sparse matrices using bitmasks that identify non-zero elements (In force)

Publication number: US11847185B2

Publication date: 2023-12-19

Application number: US17485055

Filing date: 2021-09-24

Applicant: Intel Corporation

Inventors: Dan Baum, Chen Koren, Elmoustapha Ould-Ahmed-Vall, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke

IPC classification: G06F17/16, G06F9/38, G06F9/30

CPC classification: G06F17/16, G06F9/3001, G06F9/3016, G06F9/30101, G06F9/3802

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and to execute the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE is further to store an NZ element for use in subsequent multiplications.
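A minimal sketch of the bitmask idea, assuming small integer matrices as nested lists: build an NZ bitmask per row of the first matrix and per column of the second, AND them, and multiply only at the set bits. It models the arithmetic result only, not the PE grid, the two-element broadcast, or the per-PE storage of NZ elements; the names `nz_bitmask` and `sparse_mac` are hypothetical.

```python
def nz_bitmask(vec):
    """Bit t is set when vec[t] is non-zero."""
    mask = 0
    for t, x in enumerate(vec):
        if x != 0:
            mask |= 1 << t
    return mask


def sparse_mac(a, b, c):
    """Return c + a @ b, multiplying only matching non-zero pairs (a: MxK, b: KxN)."""
    m, k, n = len(a), len(b), len(b[0])
    out = [row[:] for row in c]  # accumulate into a copy of the third matrix
    row_masks = [nz_bitmask(a[i]) for i in range(m)]
    col_masks = [nz_bitmask([b[t][j] for t in range(k)]) for j in range(n)]
    for i in range(m):
        for j in range(n):
            match = row_masks[i] & col_masks[j]  # positions where both are NZ
            while match:
                t = (match & -match).bit_length() - 1  # index of lowest set bit
                out[i][j] += a[i][t] * b[t][j]
                match &= match - 1  # clear that bit
    return out
```

Because a product term is zero whenever either factor is zero, skipping unmatched positions gives the same result as a dense multiply-accumulate while doing fewer multiplications.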

6.

Granted invention
Systems and methods for performing instructions to transform matrices into row-interleaved format (In force)

Publication number: US11675590B2

Publication date: 2023-06-13

Application number: US17865849

Filing date: 2022-07-15

Applicant: Intel Corporation

Inventors: Raanan Sade, Robert Valentine, Bret Toll, Christopher J. Hughes, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney

IPC classification: G06F12/128, G06T1/00, G06F9/30

CPC classification: G06F9/30167, G06F9/30101, G06F9/30149

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
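One plausible reading of the row-interleaved transform, sketched for a row-major source: each group of J consecutive rows is collapsed into one destination row in which the J elements of every sub-column sit next to each other. This interpretation, the zero padding when the row count is not a multiple of J, and the name `row_interleave` are all assumptions; the sketch does not model the column-major variant or the K-wide submatrix tiling named in the abstract.

```python
def row_interleave(src, j):
    """Interleave each J-element sub-column of a row-major matrix into one row."""
    rows, cols = len(src), len(src[0])
    # Assumed behavior: pad with zero rows so every sub-column has J elements.
    padded = src + [[0] * cols for _ in range((-rows) % j)]
    out = []
    for r in range(0, len(padded), j):
        group = padded[r:r + j]  # J consecutive source rows
        # For each column c, emit the J elements of that sub-column in order.
        out.append([group[t][c] for c in range(cols) for t in range(j)])
    return out
```

With J = 2, rows `[1, 2]` and `[3, 4]` become the single row `[1, 3, 2, 4]`, so a dot-product unit that consumes J adjacent elements sees one sub-column at a time.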

7.

Granted invention
Processor and method implementing a cacheline demote machine instruction (In force)

Publication number: US11513957B2

Publication date: 2022-11-29

Application number: US17027248

Filing date: 2020-09-21

Applicant: Intel Corporation

Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran

IPC classification: G06F12/0842, G06F12/0893, G06F12/109, G06F12/0813, G06F12/0831, G06F9/455

Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus includes multi-core processors with multi-level cache hierarchies including an L1 and an L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture systems, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.

8.

Invention application
APPARATUSES AND METHODS FOR A PROCESSOR ARCHITECTURE (In force)

Publication number: US20220237123A1

Publication date: 2022-07-28

Application number: US17712632

Filing date: 2022-04-04

Applicant: Intel Corporation

Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David B. Papworth, James D. Allen

IPC classification: G06F12/0831, G06F12/1027, G06F12/1009, G06F9/30

Abstract: Embodiments of an invention for a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache-line-sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.

9.

Granted invention
Coalescing adjacent gather/scatter operations (In force)

Publication number: US11003455B2

Publication date: 2021-05-11

Application number: US16398183

Filing date: 2019-04-29

Applicant: Intel Corporation

Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes

IPC classification: G06F9/38, G06F9/30, G06F12/0875, G06F12/1027, G06F15/80, G06F13/42

Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous first and second data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and the second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
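The effect of coalescing two adjacent gathers can be shown with a small model: for each index, one access reads two contiguous elements, and they are split into corresponding entries of two destination vectors (for example, the two fields of an array of two-field structures). Memory is modeled as a flat Python list, and the name `gather_pair` is hypothetical; this does not model masking, faults, or the register encoding.

```python
def gather_pair(memory, indices):
    """Gather memory[i] and memory[i + 1] for each index in one pass.

    Equivalent to two separate gathers with indices and indices + 1,
    but each index is visited once and reads a contiguous pair.
    """
    first, second = [], []
    for idx in indices:
        lo, hi = memory[idx], memory[idx + 1]  # one contiguous two-element read
        first.append(lo)   # entry k of the first destination
        second.append(hi)  # corresponding entry k of the second destination
    return first, second
```

For memory `[10, 11, 12, 13, 14, 15]` and indices `[0, 4, 2]`, the two destinations receive `[10, 14, 12]` and `[11, 15, 13]`, the same values two independent gathers would produce.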

10.

发明授权
Systems and methods for performing instructions to transform matrices into row-interleaved format 有权

公开(公告)号：US10963256B2

公开(公告)日：2021-03-30

申请号：US16147254

申请日：2018-09-28

申请人： Intel Corporation

发明人： Raanan Sade , Robert Valentine , Bret Toll , Christopher J. Hughes , Alexander F. Heinecke , Elmoustapha Ould-Ahmed-Vall , Mark J. Charney

IPC分类号： G06F12/128 , G06T1/00 , G06F9/30

摘要： Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
