Patent search ap:("Intel Corporation") AND inv:"Roger Gramunt" Page 1

1.

Invention application
METHOD, SYSTEM, AND APPARATUS FOR PAGE SIZING EXTENSION (Active)

Publication (announcement) number: US20210374069A1

Publication (announcement) date: 2021-12-02

Application number: US17404770

Application date: 2021-08-17

Applicant: Intel Corporation

Inventor: Edward Grochowski, Julio Gago, Roger Gramunt, Roger Espasa, Rolf Kassa

IPC: G06F12/1009, G06F12/1027, G06F12/14, G06F12/0864

Abstract: A method, system, and apparatus may initialize a fixed plurality of page table entries for a fixed plurality of pages in memory, each page having a first size, wherein a linear address for each page table entry corresponds to a physical address and the fixed plurality of pages are aligned. A bit in each of the page table entries for the aligned pages may be set to indicate whether or not the fixed plurality of pages is to be treated as one combined page having a second page size larger than the first page size. Other embodiments are described and claimed.
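As a rough illustration of the abstract above, the following Python sketch models a group of page table entries that may be flagged as one larger page. The 4 KiB base size, the group of 16 pages, the requirement that the pages be contiguous, and all names (`PTE`, `can_combine`, `mark_combined`) are assumptions made for illustration; none of them is taken from the patent.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096   # assumed first (small) page size
GROUP = 16         # assumed fixed number of pages per combined page


@dataclass
class PTE:
    linear: int             # linear address of the page
    physical: int           # physical address the page maps to
    combined: bool = False  # hypothetical "treat group as one page" bit


def can_combine(ptes):
    """Return True if the entries form one aligned, contiguous group."""
    if len(ptes) != GROUP:
        return False
    big = PAGE_SIZE * GROUP  # second, larger page size
    base = ptes[0]
    # Both the linear and the physical base must be aligned to the large size.
    if base.linear % big or base.physical % big:
        return False
    # Assumption: the pages must also be contiguous in both address spaces.
    return all(
        p.linear == base.linear + i * PAGE_SIZE
        and p.physical == base.physical + i * PAGE_SIZE
        for i, p in enumerate(ptes)
    )


def mark_combined(ptes):
    """Set or clear the bit in every entry of the group; return the bit value."""
    ok = can_combine(ptes)
    for p in ptes:
        p.combined = ok
    return ok
```

In this sketch the bit is written identically into every entry of the group, which mirrors the abstract's "a bit in each of the page table entries".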

2.

Invention application
OPTIMIZED COMPUTE HARDWARE FOR MACHINE LEARNING OPERATIONS (Active)

Publication (announcement) number: US20210019631A1

Publication (announcement) date: 2021-01-21

Application number: US16983107

Application date: 2020-08-03

Applicant: Intel Corporation

Inventor: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke

IPC: G06N3/08, G06N3/063, G06N3/04, G06F17/16, G06F9/30

Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.
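One possible reading of the abstract above is that the number of parallel multiplies in a 32-bit lane is the lane width divided by the bit length of the smallest input value. The Python sketch below works through that arithmetic. The function names and the plain integer accumulator are assumptions for illustration and do not describe the claimed hardware.

```python
LANE_BITS = 32  # lane width named in the abstract


def multiplies_per_lane(bit_length):
    """Number of elements of the given bit length that fit in one lane."""
    return LANE_BITS // bit_length


def lane_dot_product(a_vals, b_vals, a_bits, accumulator=0):
    """Multiply element pairs for one lane and add them to the accumulator.

    a_bits is the bit length of the smallest-sized operand's elements;
    it determines how many multiplies the lane performs.
    """
    n = multiplies_per_lane(a_bits)
    if len(a_vals) != n or len(b_vals) != n:
        raise ValueError(f"expected {n} elements per lane")
    return accumulator + sum(x * y for x, y in zip(a_vals, b_vals))
```

With 8-bit elements, for instance, `multiplies_per_lane(8)` gives four multiplies per lane, followed by a single accumulate.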

3.

Invention application
OPTIMIZED COMPUTE HARDWARE FOR MACHINE LEARNING OPERATIONS (Pending, published)

Publication (announcement) number: US20180322390A1

Publication (announcement) date: 2018-11-08

Application number: US15869564

Application date: 2018-01-12

Applicant: Intel Corporation

Inventor: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke

IPC: G06N3/08, G06F17/16, G06N3/04, G06N3/063

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a fetch unit to fetch a single instruction having multiple input operands, wherein the multiple input operands have an unequal bit-length, a first input operand having a first bit-length and a second input operand having a second bit-length; a decode unit to decode the single instruction into a decoded instruction; an operand length unit to determine a smaller bit-length of the first bit-length and the second bit-length; and a compute unit to perform a matrix operation on the multiple input operands to generate an output value having a bit length of the smaller bit length.
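The operand-length step in the abstract above can be sketched in Python as follows. The sketch picks the smaller of two operand bit lengths and produces matrix-product outputs at that length. Saturating the signed results is an assumption (the abstract does not say how values are narrowed), and `smaller_bit_length`, `saturate`, and `matmul_narrow` are hypothetical names.

```python
def smaller_bit_length(first_bits, second_bits):
    """Role of the 'operand length unit': pick the smaller bit length."""
    return min(first_bits, second_bits)


def saturate(value, bits):
    """Clamp a signed integer to the range of the given bit length (assumed)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def matmul_narrow(a, b, a_bits, b_bits):
    """Multiply matrices a and b; outputs use the smaller operand bit length."""
    out_bits = smaller_bit_length(a_bits, b_bits)
    cols = list(zip(*b))  # columns of b
    result = [
        [saturate(sum(x * y for x, y in zip(row, col)), out_bits) for col in cols]
        for row in a
    ]
    return result, out_bits
```

For a 16-bit operand paired with an 8-bit operand, the outputs in this sketch are 8-bit values.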

4.

Granted invention patent
Optimized compute hardware for machine learning operations (Active)

Publication (announcement) number: US10776699B2

Publication (announcement) date: 2020-09-15

Application number: US15869564

Application date: 2018-01-12

Applicant: Intel Corporation

Inventor: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke

IPC: G06F17/16, G06F9/30, G06N3/08, G06N3/063, G06N3/04

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a fetch unit to fetch a single instruction having multiple input operands, wherein the multiple input operands have an unequal bit-length, a first input operand having a first bit-length and a second input operand having a second bit-length; a decode unit to decode the single instruction into a decoded instruction; an operand length unit to determine a smaller bit-length of the first bit-length and the second bit-length; and a compute unit to perform a matrix operation on the multiple input operands to generate an output value having a bit length of the smaller bit length.

5.

Granted invention patent
Stateless capture of data linear addresses during precise event based sampling (Active)

Publication (announcement) number: US10175986B2

Publication (announcement) date: 2019-01-08

Application number: US15589510

Application date: 2017-05-08

Applicant: Intel Corporation

Inventor: Roger Gramunt, Ramon Matas, Benjamin C. Chaffin, Neal S. Moyer, Rammohan Padmanabhan, Alexey P. Suprun, Matthew G. Smith

IPC: G06F9/30, G06F9/38, G06F11/36, G06F11/34, G06F11/30

Abstract: A processor includes logic for stateless capture of data linear addresses (DLA) during precise event based sampling (PEBS) for an out-of-order execution engine. The engine may include a PEBS unit with logic to increment a counter each time an instance of a designated micro-op is retired from a reorder buffer, capture an output DLA referenced by an instance of the micro-op that executes after the counter overflows, set a captured bit associated with a reorder buffer identifier for the instance of the micro-op, and store a PEBS record in a debug storage when the instance of the micro-op is retired from the reorder buffer. The designated micro-op references a DLA of a memory accessible to the processor.
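The sequence of steps in the abstract above (count retirements, capture an address after overflow, mark the reorder-buffer entry, write a record at retirement) can be modeled loosely in Python. This is a simplified software sketch, not the claimed hardware: the `PebsUnit` class, the overflow threshold, the limit of one capture in flight, and the dictionary standing in for the captured bit are all assumptions.

```python
class PebsUnit:
    def __init__(self, threshold):
        self.threshold = threshold  # assumed count at which the counter overflows
        self.counter = 0
        self.armed = False          # True once the counter has overflowed
        self.captured = {}          # rob_id -> DLA; stands in for the captured bit
        self.records = []           # stands in for the debug storage

    def on_execute(self, rob_id, dla):
        """Called when an instance of the designated micro-op executes."""
        # Assumption: only one capture is in flight at a time.
        if self.armed and not self.captured:
            self.captured[rob_id] = dla

    def on_retire(self, rob_id):
        """Called when an instance of the designated micro-op retires."""
        self.counter += 1
        if rob_id in self.captured:
            # The marked instance retires: store the record, disarm.
            self.records.append({"rob_id": rob_id, "dla": self.captured.pop(rob_id)})
            self.armed = False
        if self.counter >= self.threshold:
            self.counter = 0
            self.armed = True
```

In this model the address is attached to the reorder-buffer identifier at execution time and only becomes a record when that same instance retires.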

6.

Granted invention patent
Method and apparatus for efficiently managing architectural register state of a processor (Active)

Publication (announcement) number: US09804842B2

Publication (announcement) date: 2017-10-31

Application number: US14581535

Application date: 2014-12-23

Applicant: Intel Corporation

Inventor: Jesus Corbal San Adrian, Dennis R. Bradford, Benjamin C. Chaffin, Taraneh Bahrami, Jonathan C. Hall, Thomas B. Maciukenas, Roger Gramunt, Rohan Sharma

IPC: G06F9/302, G06F9/305, G06F9/312, G06F9/315, G06F15/76, G06F9/30, G06F15/80

CPC classification number: G06F9/30036, G06F9/30018, G06F9/30032, G06F9/30072, G06F9/30101, G06F15/8084

Abstract: An apparatus and method for efficiently managing the architectural state of a processor. For example, one embodiment of a processor comprises: a source mask register to be logically subdivided into at least a first portion to store a usable portion of a mask value and a second portion to store an indication of whether the usable portion of the mask value has been updated; a control register to store an unusable portion of the mask value; architectural state management logic to read the indication to determine whether the mask value has been updated prior to performing a store operation, wherein if the mask value has been updated, then the architectural state management logic is to read the usable portion of the mask value from the first portion of the source mask register and zero out bits of the unusable portion of the mask value to generate a final mask value to be saved to memory, and wherein if the mask value has not been updated, then the architectural state management logic is to concatenate the usable portion of the mask value with the unusable portion of the mask value read from the control register to generate a final mask value to be saved to memory.
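The two branches described in the abstract above can be expressed as a small Python function. The 16-bit usable portion, the 64-bit total width, the placement of the usable portion in the low bits, and the name `final_mask` are assumptions made for illustration only.

```python
USABLE_BITS = 16  # assumed width of the usable portion
TOTAL_BITS = 64   # assumed width of the full mask value


def final_mask(usable, updated, control_unusable):
    """Build the mask value to be saved to memory.

    usable           -- usable portion read from the source mask register
    updated          -- indication of whether the mask value has been updated
    control_unusable -- unusable portion held in the control register
    """
    usable &= (1 << USABLE_BITS) - 1
    if updated:
        # Updated: keep the usable portion, zero the unusable bits.
        return usable
    # Not updated: concatenate the usable portion with the unusable
    # portion read from the control register.
    upper = control_unusable & ((1 << (TOTAL_BITS - USABLE_BITS)) - 1)
    return (upper << USABLE_BITS) | usable
```

For example, `final_mask(0xABCD, True, 0xFFFF)` returns `0xABCD`, while `final_mask(0xABCD, False, 0x1)` returns `0x1ABCD`.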

7.

Invention application
STATELESS CAPTURE OF DATA LINEAR ADDRESSES DURING PRECISE EVENT BASED SAMPLING (Pending, published)

Publication (announcement) number: US20170242698A1

Publication (announcement) date: 2017-08-24

Application number: US15589510

Application date: 2017-05-08

Applicant: Intel Corporation

Inventor: Roger Gramunt, Ramon Matas, Benjamin C. Chaffin, Neal S. Moyer, Rammohan Padmanabhan, Alexey P. Suprun, Matthew G. Smith

IPC: G06F9/30, G06F11/30, G06F11/34, G06F9/38, G06F11/36

CPC classification number: G06F9/3016, G06F9/30098, G06F9/30101, G06F9/30145, G06F9/3855, G06F9/3857, G06F11/3024, G06F11/34, G06F11/3466, G06F11/36, G06F11/362, G06F11/3636

Abstract: A processor includes logic for stateless capture of data linear addresses (DLA) during precise event based sampling (PEBS) for an out-of-order execution engine. The engine may include a PEBS unit with logic to increment a counter each time an instance of a designated micro-op is retired from a reorder buffer, capture an output DLA referenced by an instance of the micro-op that executes after the counter overflows, set a captured bit associated with a reorder buffer identifier for the instance of the micro-op, and store a PEBS record in a debug storage when the instance of the micro-op is retired from the reorder buffer. The designated micro-op references a DLA of a memory accessible to the processor.

8.

Invention publication
SYSTEM, APPARATUS AND METHOD FOR THROTTLING FUSION OF MICRO-OPERATIONS IN A PROCESSOR (Pending, published)

Publication (announcement) number: US20230195456A1

Publication (announcement) date: 2023-06-22

Application number: US17558978

Application date: 2021-12-22

Applicant: Intel Corporation

Inventor: Sufiyan Syed, Roger Gramunt, Jayesh Gaur, Priyank Deshpande

IPC: G06F9/28, G06F9/22, G06F9/48, G06F9/50

CPC classification number: G06F9/28, G06F9/223, G06F9/4806, G06F9/5027, G06F2209/5014

Abstract: In one embodiment, an apparatus includes: a plurality of execution circuits to execute micro-operations (μops), where a subset of the plurality of execution circuits is capable of executing a fused μop; a fusion circuit coupled to at least the subset of the plurality of execution circuits, wherein the fusion circuit is to fuse at least some pairs of producer-consumer μops into fused μops; and a fusion throttle circuit coupled to the fusion circuit, wherein the fusion throttle circuit is to prevent a first μop from being fused with another μop based at least in part on historical information associated with the first μop. Other embodiments are described and claimed.
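The abstract above does not say what the "historical information" is. Purely as an illustration, the Python sketch below assumes a small per-μop counter of past fusions that did not pay off, and blocks fusion once that counter reaches a limit. The `FusionThrottle` class, keying by μop address, and the notion of a fusion having "helped" are all hypothetical.

```python
from collections import defaultdict


class FusionThrottle:
    def __init__(self, limit=2):
        self.limit = limit                   # assumed throttle threshold
        self.bad_history = defaultdict(int)  # hypothetical per-uop counter

    def record_outcome(self, uop_pc, fusion_helped):
        """Update the history for the uop at uop_pc after a fusion attempt."""
        if fusion_helped:
            self.bad_history[uop_pc] = max(0, self.bad_history[uop_pc] - 1)
        else:
            self.bad_history[uop_pc] += 1

    def may_fuse(self, uop_pc):
        """Return False when the history says this uop should not be fused."""
        return self.bad_history[uop_pc] < self.limit
```

A fusion stage modeled this way would consult `may_fuse` before pairing a producer μop with its consumer.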

9.

Invention application
METHOD, SYSTEM, AND APPARATUS FOR PAGE SIZING EXTENSION (Pending, published)

Publication (announcement) number: US20200242046A1

Publication (announcement) date: 2020-07-30

Application number: US16560213

Application date: 2019-09-04

Applicant: Intel Corporation

Inventor: Edward Grochowski, Julio Gago, Roger Gramunt, Roger Espasa, Rolf Kassa

IPC: G06F12/1009, G06F12/0864, G06F12/14, G06F12/1027

Abstract: A method, system, and apparatus may initialize a fixed plurality of page table entries for a fixed plurality of pages in memory, each page having a first size, wherein a linear address for each page table entry corresponds to a physical address and the fixed plurality of pages are aligned. A bit in each of the page table entries for the aligned pages may be set to indicate whether or not the fixed plurality of pages is to be treated as one combined page having a second page size larger than the first page size. Other embodiments are described and claimed.

10.

Invention application
OPTIMIZED COMPUTE HARDWARE FOR MACHINE LEARNING OPERATIONS (Active)

Publication (announcement) number: US20220343174A1

Publication (announcement) date: 2022-10-27

Application number: US17742581

Application date: 2022-05-12

Applicant: Intel Corporation

Inventor: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke

IPC: G06N3/08, G06N3/063, G06N3/04, G06F17/16, G06F9/30, G06F9/38, G06F7/544

Abstract: Described herein is a graphics processor including a processing resource including a multiplier configured to multiply input associated with an instruction at one of a first plurality of bit widths, an adder configured to add a product output from the multiplier with an accumulator value at one of a second plurality of bit widths, and circuitry to select a first bit width of the first plurality of bit widths for the multiplier and a second bit width of the second plurality of bit widths for the adder.
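The width selection in the abstract above can be illustrated with a short Python sketch of a multiply-accumulate whose multiplier and adder widths are chosen independently. The particular width sets, the unsigned wraparound behavior, and the names `wrap` and `multiply_accumulate` are assumptions and are not taken from the patent.

```python
MULTIPLIER_WIDTHS = (8, 16)  # assumed first plurality of bit widths
ADDER_WIDTHS = (16, 32)      # assumed second plurality of bit widths


def wrap(value, bits):
    """Keep only the low `bits` bits (assumed unsigned wraparound)."""
    return value & ((1 << bits) - 1)


def multiply_accumulate(a, b, acc, mul_bits, add_bits):
    """Multiply at mul_bits, then add the accumulator at add_bits."""
    if mul_bits not in MULTIPLIER_WIDTHS or add_bits not in ADDER_WIDTHS:
        raise ValueError("unsupported bit width")
    product = wrap(a, mul_bits) * wrap(b, mul_bits)
    return wrap(product + acc, add_bits)
```

Calling `multiply_accumulate(3, 4, 5, 8, 16)` multiplies at 8 bits, accumulates at 16 bits, and returns 17.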
