专利检索 ap:"Christopher J Hughes" 第 1 页

1.

发明申请
HARDWARE/SOFTWARE CO-OPTIMIZATION TO IMPROVE PERFORMANCE AND ENERGY FOR INTER-VM COMMUNICATION FOR NFVS AND OTHER PRODUCER-CONSUMER WORKLOADS 有权

公开(公告)号：US20210004328A1

公开(公告)日：2021-01-07

申请号：US17027248

申请日：2020-09-21

申请人： Ren Wang , Andrew J. Herdrich , Yen-cheng Liu , Herbert H. Hum , Jong Soo Park , Christopher J. Hughes , Namakkal N. Venkatesan , Adrian C. Moga , Aamer Jaleel , Zeshan A. Chishti , Mesut A. Ergin , Jr-shian Tsai , Alexander W. Min , Tsung-yuan C. Tai , Christian Maciocco , Rajesh Sankaran

发明人： Ren Wang , Andrew J. Herdrich , Yen-cheng Liu , Herbert H. Hum , Jong Soo Park , Christopher J. Hughes , Namakkal N. Venkatesan , Adrian C. Moga , Aamer Jaleel , Zeshan A. Chishti , Mesut A. Ergin , Jr-shian Tsai , Alexander W. Min , Tsung-yuan C. Tai , Christian Maciocco , Rajesh Sankaran

IPC分类号： G06F12/0842 , G06F12/0831 , G06F12/0893 , G06F12/109 , G06F12/0813 , G06F9/455

摘要： Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.

2.

发明申请
APPARATUS AND METHOD FOR PROCESSING STRUCTURE OF ARRAYS (SOA) AND ARRAY OF STRUCTURES (AOS) DATA 审中-公开

公开(公告)号：US20200097298A1

公开(公告)日：2020-03-26

申请号：US16140294

申请日：2018-09-24

申请人： CHRISTOPHER J. HUGHES , BRET TOLL , ALEXANDER HEINECKE , DAN BAUM , ELMOUSTAPHA OULD-AHMED-VALL , RAANAN SADE , ROBERT VALENTINE , MARK CHARNEY

发明人： CHRISTOPHER J. HUGHES , BRET TOLL , ALEXANDER HEINECKE , DAN BAUM , ELMOUSTAPHA OULD-AHMED-VALL , RAANAN SADE , ROBERT VALENTINE , MARK CHARNEY

IPC分类号： G06F9/38 , G06F9/30 , G06F15/80

摘要： An apparatus and method for processing array of structures (AoS) and structure of arrays (SoA) data. For example, one embodiment of a processor comprises: a destination tile register to store data elements in a structure of arrays (SoA) format; a first source tile register to store indices associated with the data elements; instruction fetch circuitry to fetch an array of structures (AoS) gather instruction comprising operands identifying the first source tile register and the destination tile register; a decoder to decode the AoS gather instruction; and execution circuitry to determine a plurality of system memory addresses based on the indices from the first source tile register, to read data elements from the system memory addresses in an AoS format, and to load the data elements to the destination tile register in an SoA format.

3.

发明申请
SYSTEMS FOR PERFORMING INSTRUCTIONS FOR FAST ELEMENT UNPACKING INTO 2-DIMENSIONAL REGISTERS 审中-公开

公开(公告)号：US20190042245A1

公开(公告)日：2019-02-07

申请号：US16146854

申请日：2018-09-28

申请人： Bret TOLL , Alexander F. HEINECKE , Christopher J. HUGHES , Ronen ZOHAR , Michael ESPIG , Dan BAUM , Raanan SADE , Robert VALENTINE

发明人： Bret TOLL , Alexander F. HEINECKE , Christopher J. HUGHES , Ronen ZOHAR , Michael ESPIG , Dan BAUM , Raanan SADE , Robert VALENTINE

IPC分类号： G06F9/30 , G06F9/38 , G06F12/02 , G06F12/06

摘要： Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.

4.

发明申请
METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE PERMUTE CONTROLS WITH LEADING ZERO COUNT FUNCTIONALITY 审中-公开

公开(公告)号：US20180196671A1

公开(公告)日：2018-07-12

申请号：US15912486

申请日：2018-03-05

申请人： Christopher J. Hughes , Mikhail Plotnikov , Andrey Naraikin , Robert Valentine

发明人： Christopher J. Hughes , Mikhail Plotnikov , Andrey Naraikin , Robert Valentine

IPC分类号： G06F9/30 , G06F9/38

CPC分类号： G06F9/30145 , G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/3834

摘要： Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.

5.

发明申请
APPARATUS AND METHOD FOR LAZY TRANSLATION LOOKASIDE BUFFER (TLB) COHERENCE 审中-公开

公开(公告)号：US20170286316A1

公开(公告)日：2017-10-05

申请号：US15089211

申请日：2016-04-01

申请人： KSHITIJ A. DOSHI , CHRISTOPHER J. HUGHES

发明人： KSHITIJ A. DOSHI , CHRISTOPHER J. HUGHES

IPC分类号： G06F12/10

CPC分类号： G06F12/1063 , G06F12/1009 , G06F2212/682

摘要： An apparatus and method are described for managing TLB coherence. For example, one embodiment of a processor comprises: one or more cores to execute instructions and process data; one or more translation lookaside buffers (TLBs) each comprising a plurality of entries to cache virtual-to-physical address translations usable by the set of one or more cores when executing the instructions; one or more epoch counters each programmed with a specified epoch value; and TLB validation logic to validate a specified set of TLB entries at intervals specified by the epoch value.

6.

发明申请
HYBRID HARDWARE AND SOFTWARE IMPLEMENTATION OF TRANSACTIONAL MEMORY ACCESS 审中-公开

公开(公告)号：US20170206159A1

公开(公告)日：2017-07-20

申请号：US15477052

申请日：2017-04-01

申请人： Sanjeev KUMAR , Christopher J. HUGHES , Partha KUNDU , Anthony NGUYEN

发明人： Sanjeev KUMAR , Christopher J. HUGHES , Partha KUNDU , Anthony NGUYEN

IPC分类号： G06F12/0806 , G06F9/38 , G06F12/0846 , G06F9/30 , G06F12/084 , G06F12/0831 , G06F9/46 , G06F9/52

CPC分类号： G06F12/0806 , G06F9/3004 , G06F9/30043 , G06F9/30087 , G06F9/30145 , G06F9/3834 , G06F9/3857 , G06F9/3859 , G06F9/467 , G06F9/528 , G06F12/0831 , G06F12/084 , G06F12/0848 , G06F2212/60 , G06F2212/621

摘要： Embodiments of the invention relate a hybrid hardware and software implementation of transactional memory accesses in a computer system. A processor including a transactional cache and a regular cache is utilized in a computer system that includes a policy manager to select one of a first mode (a hardware mode) or a second mode (a software mode) to implement transactional memory accesses. In the hardware mode the transactional cache is utilized to perform read and write memory operations and in the software mode the regular cache is utilized to perform read and write memory operations.

7.

发明申请
INSTRUCTION AND LOGIC FOR SUPPRESSION OF HARDWARE PREFETCHERS 审中-公开
标题翻译：用于抑制硬件预制器的指令和逻辑

公开(公告)号：US20160179544A1

公开(公告)日：2016-06-23

申请号：US14580999

申请日：2014-12-23

申请人： Alexander F. Heinecke , Christopher J. Hughes , Daehyun Kim , Jong Soo Park

发明人： Alexander F. Heinecke , Christopher J. Hughes , Daehyun Kim , Jong Soo Park

IPC分类号： G06F9/38 , G06F9/30

摘要： A processor includes a core, a hardware prefetcher, and a prefetcher control module. The hardware prefetcher includes logic to make speculative prefetch requests, through a memory subsystem, for elements for execution by the core, and logic to store prefetched elements in a cache. The prefetcher control module includes logic to selectively suppress, based on a hardware-prefetch suppression instruction executed by the core, a speculative prefetch request to be made by the hardware prefetcher.

摘要翻译： 处理器包括核心，硬件预取器和预取器控制模块。硬件预取器包括用于通过存储器子系统进行推测预取请求的逻辑，用于由核心执行的元素以及将预取元素存储在高速缓存中的逻辑。预取器控制模块包括用于基于由核心执行的硬件预取抑制指令来选择性地抑制由硬件预取器进行的推测预取请求的逻辑。

8.

发明申请
COALESCING ADJACENT GATHER/SCATTER OPERATIONS 审中-公开

公开(公告)号：US20160124749A1

公开(公告)日：2016-05-05

申请号：US14976231

申请日：2015-12-21

申请人： Andrew T. FORSYTH , Brian J. HICKMANN , Jonathan C. HALL , Christopher J. HUGHES

发明人： Andrew T. FORSYTH , Brian J. HICKMANN , Jonathan C. HALL , Christopher J. HUGHES

IPC分类号： G06F9/38 , G06F13/42 , G06F9/30

CPC分类号： G06F9/3853 , G06F9/30018 , G06F9/30036 , G06F9/30043 , G06F9/30098 , G06F9/30105 , G06F9/30145 , G06F9/3804 , G06F9/3824 , G06F9/3836 , G06F9/3887 , G06F12/0875 , G06F12/1027 , G06F13/4282 , G06F15/8007 , G06F2212/1016 , G06F2212/452 , G06F2212/68

摘要： According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.

9.

发明授权
Automatic transaction coarsening 有权
标题翻译：自动交易粗化

公开(公告)号：US09244746B2

公开(公告)日：2016-01-26

申请号：US13956609

申请日：2013-08-01

申请人： Christopher J. Hughes , Richard M. Yoo

发明人： Christopher J. Hughes , Richard M. Yoo

IPC分类号： G06F9/46 , G06F9/52

CPC分类号： G06F9/526 , G06F9/466 , G06F9/467

摘要： A processing device comprises an instruction execution unit and track and combing logic to combine a plurality of transactions into a single combined transaction. The track and combine logic comprises a transaction monitoring module to monitor an execution of a plurality of transactions by the instruction execution unit, each of the plurality of transactions comprising a transaction begin instruction, at least one operation instruction and a transaction end instruction. The track and combine logic further comprises a transaction combination module to identify, in view of the monitoring, a subset of the plurality of transactions to combine into a single combined transaction for execution on the processing device and to combine the identified subset of the plurality of transactions into the single combined transaction, the single combined transaction comprising a single transaction begin instruction, a plurality of operation instructions corresponding to the subset of the plurality of transactions and a single transaction end instruction.

摘要翻译： 处理装置包括指令执行单元和跟踪和组合逻辑以将多个事务组合成单个组合事务。跟踪和组合逻辑包括事务监视模块，用于监视指令执行单元执行多个事务，所述多个事务中的每个事务包括事务开始指令，至少一个操作指令和事务结束指令。轨道和组合逻辑还包括交易组合模块，用于鉴于监视，识别多个事务的子集以组合成单个组合事务以在处理设备上执行，并且将所识别的多个事务转换为单个组合事务，单个组合事务包括单个事务开始指令，对应于多个事务的子集的多个操作指令和单个事务结束指令。

10.

发明申请
METHOD AND APPARATUS FOR SELECTING CACHE LOCALITY FOR ATOMIC OPERATIONS 有权
标题翻译：选择用于原子操作的缓存本地化的方法和装置

公开(公告)号：US20150178086A1

公开(公告)日：2015-06-25

申请号：US14137218

申请日：2013-12-20

申请人： Christopher J. Hughes , Daehyun Kim , Camilo A. Moreno , Jong Soo Park , Richard M. Yoo

发明人： Christopher J. Hughes , Daehyun Kim , Camilo A. Moreno , Jong Soo Park , Richard M. Yoo

IPC分类号： G06F9/38 , G06F12/08

CPC分类号： G06F9/3806 , G06F9/3004 , G06F9/30087 , G06F9/382 , G06F9/3834 , G06F9/3836 , G06F11/0724 , G06F12/0806 , G06F12/0811 , G06F12/0842 , G06F12/0897 , G06F15/80

摘要： An apparatus and method for determining whether to execute an atomic operation locally or remotely. For example, one embodiment of a processor comprises: a decoder to decode an atomic operation on a local core; prediction logic on the local core to estimate a cost associated with execution of the atomic operation on the local core and a cost associated with execution of the atomic operation on a remote core; and the remote core to execute the atomic operation remotely if the prediction logic determines that the cost for execution on the local core is relatively greater than the cost for execution on the remote core; and the local core to execute the atomic operation locally if the prediction logic determines that the cost for local execution on the local core is relatively less than the cost for execution on the remote core.

摘要翻译： 一种用于确定是在本地还是远程执行原子操作的装置和方法。例如，处理器的一个实施例包括：解码器，用于解码局部核心上的原子操作; 本地核心上的预测逻辑来估计与本地核心上的原子操作的执行相关的成本以及与在远程核心上执行原子操作相关联的成本; 以及所述远程核心，如果所述预测逻辑确定所述本地核上的执行成本相对大于所述远程核上的执行成本，则远程执行所述原子操作; 如果预测逻辑确定本地核心上的本地执行成本相对低于在远程核心上执行的成本，本地核心将在本地执行原子操作。

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类