-
Publication No.: US20210374069A1
Publication Date: 2021-12-02
Application No.: US17404770
Filing Date: 2021-08-17
Applicant: Intel Corporation
Inventor: Edward Grochowski , Julio Gago , Roger Gramunt , Roger Espasa , Rolf Kassa
IPC: G06F12/1009 , G06F12/1027 , G06F12/14 , G06F12/0864
Abstract: A method, system, and apparatus may initialize a fixed plurality of page table entries for a fixed plurality of pages in memory, each page having a first size, wherein a linear address for each page table entry corresponds to a physical address and the fixed plurality of pages are aligned. A bit in each of the page table entries for the aligned pages may be set to indicate whether or not the fixed plurality of pages is to be treated as one combined page having a second page size larger than the first page size. Other embodiments are described and claimed.
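The Python sketch below models the combined-page bit under several assumptions. Base pages are 4 KiB, and they are grouped 16 at a time into one 64 KiB page. The page-table-entry layout is hypothetical, with the frame number at bit 12 and the combine flag at bit 11. The abstract specifies none of these values.

```python
# Hypothetical parameters: the abstract does not fix page sizes, group size, or bit positions.
BASE_PAGE = 4 * 1024        # first page size (4 KiB)
GROUP = 16                  # fixed number of pages treated as one combined page
COMBINED_BIT = 1 << 11      # hypothetical "treat as one combined page" flag
FRAME_SHIFT = 12            # hypothetical position of the frame number in a PTE


def init_ptes(base_frame, combine):
    """Initialize GROUP PTEs mapping GROUP consecutive, aligned physical frames."""
    if base_frame % GROUP:
        raise ValueError("the group of pages must be aligned to GROUP pages")
    flag = COMBINED_BIT if combine else 0
    return [((base_frame + i) << FRAME_SHIFT) | flag for i in range(GROUP)]


def effective_page_size(pte):
    """Page size indicated by the combine flag for this mapping."""
    return BASE_PAGE * GROUP if pte & COMBINED_BIT else BASE_PAGE


ptes = init_ptes(base_frame=0x100, combine=True)
print(all(effective_page_size(p) == 64 * 1024 for p in ptes))  # True
```

Every entry in the aligned group carries the same bit. A lookup through any of the 16 entries can therefore report the larger size, which presumably lets a single translation cache entry cover the whole group.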
-
Publication No.: US20210019631A1
Publication Date: 2021-01-21
Application No.: US16983107
Filing Date: 2020-08-03
Applicant: Intel Corporation
Inventor: Dipankar Das , Roger Gramunt , Mikhail Smelyanskiy , Jesus Corbal , Dheevatsa Mudigere , Naveen K. Mellempudi , Alexander F. Heinecke
Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.
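The Python sketch below models one 32-bit lane, assuming two's-complement packed inputs. The number of multiplies per lane is taken as 32 divided by the first operand's element width, so 8-bit elements give 4 products and 4-bit elements give 8. The helper names are hypothetical, and the sketch does not model accumulator width or saturation.

```python
LANE_BITS = 32


def pack_signed(values, width):
    """Pack signed integers into one 32-bit lane, lowest field first."""
    if len(values) * width > LANE_BITS:
        raise ValueError("values do not fit in one lane")
    mask = (1 << width) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * width)
    return word


def unpack_signed(word, width):
    """Split a 32-bit lane into LANE_BITS // width sign-extended fields."""
    mask = (1 << width) - 1
    fields = []
    for i in range(LANE_BITS // width):
        v = (word >> (i * width)) & mask
        if v >> (width - 1):          # sign bit set: convert from two's complement
            v -= 1 << width
        fields.append(v)
    return fields


def lane_dot(acc, a_word, a_width, b_values):
    """One lane: LANE_BITS // a_width parallel multiplies, then one accumulate.

    a_width is the smallest element width of the first operand; b_values may be
    wider integers. Accumulator overflow handling is not modeled.
    """
    a_vals = unpack_signed(a_word, a_width)
    if len(b_values) != len(a_vals):
        raise ValueError("second operand must supply one value per multiply")
    return acc + sum(x * y for x, y in zip(a_vals, b_values))


word = pack_signed([1, -2, 3, 4], 8)             # four 8-bit values -> four multiplies
print(lane_dot(5, word, 8, [10, 20, 30, 40]))    # 225
```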
-
Publication No.: US20180322390A1
Publication Date: 2018-11-08
Application No.: US15869564
Filing Date: 2018-01-12
Applicant: Intel Corporation
Inventor: Dipankar Das , Roger Gramunt , Mikhail Smelyanskiy , Jesus Corbal , Dheevatsa Mudigere , Naveen K. Mellempudi , Alexander F. Heinecke
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a fetch unit to fetch a single instruction having multiple input operands, wherein the multiple input operands have an unequal bit-length, a first input operand having a first bit-length and a second input operand having a second bit-length; a decode unit to decode the single instruction into a decoded instruction; an operand length unit to determine a smaller bit-length of the first bit-length and the second bit-length; and a compute unit to perform a matrix operation on the multiple input operands to generate an output value having a bit length of the smaller bit length.
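The Python sketch below models the bit-length selection, assuming signed integer inputs and that results are saturated to the smaller width. The abstract states only that the output has the smaller bit length; it does not say how out-of-range values are handled. The function names are hypothetical.

```python
def smaller_bit_length(a_bits, b_bits):
    """Stand-in for the operand length unit: choose the smaller input width."""
    return min(a_bits, b_bits)


def saturate(value, bits):
    """Clamp to the signed range of `bits` (saturation is an assumed policy)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def matmul_narrow(a, a_bits, b, b_bits):
    """Matrix multiply whose outputs are held at the smaller operand bit length."""
    out_bits = smaller_bit_length(a_bits, b_bits)
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    cols = len(b[0])
    out = [[saturate(sum(row[k] * b[k][j] for k in range(inner)), out_bits)
            for j in range(cols)]
           for row in a]
    return out, out_bits


result, bits = matmul_narrow([[100, 50], [-3, 4]], 16, [[2, 0], [1, 1]], 8)
print(bits, result)  # 8 [[127, 50], [-2, 4]]
```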
-
Publication No.: US10776699B2
Publication Date: 2020-09-15
Application No.: US15869564
Filing Date: 2018-01-12
Applicant: Intel Corporation
Inventor: Dipankar Das , Roger Gramunt , Mikhail Smelyanskiy , Jesus Corbal , Dheevatsa Mudigere , Naveen K. Mellempudi , Alexander F. Heinecke
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a fetch unit to fetch a single instruction having multiple input operands, wherein the multiple input operands have an unequal bit-length, a first input operand having a first bit-length and a second input operand having a second bit-length; a decode unit to decode the single instruction into a decoded instruction; an operand length unit to determine a smaller bit-length of the first bit-length and the second bit-length; and a compute unit to perform a matrix operation on the multiple input operands to generate an output value having a bit length of the smaller bit length.
-
Publication No.: US10175986B2
Publication Date: 2019-01-08
Application No.: US15589510
Filing Date: 2017-05-08
Applicant: Intel Corporation
Inventor: Roger Gramunt , Ramon Matas , Benjamin C. Chaffin , Neal S. Moyer , Rammohan Padmanabhan , Alexey P. Suprun , Matthew G. Smith
Abstract: A processor includes logic for stateless capture of data linear addresses (DLA) during precise event-based sampling (PEBS) for an out-of-order execution engine. The engine may include a PEBS unit with logic to increment a counter each time an instance of a designated micro-op is retired from a reorder buffer, capture the output DLA referenced by an instance of the micro-op that executes after the counter overflows, set a captured bit associated with a reorder buffer identifier for the instance of the micro-op, and store a PEBS record in debug storage when the instance of the micro-op is retired from the reorder buffer. The designated micro-op references a DLA of a memory accessible to the processor.
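The toy Python model below follows the sequence in the abstract under three assumptions: only designated μops are fed to it, the counter reloads after each overflow, and at most one capture is pending at a time. The class and method names are hypothetical. The example deliberately executes ROB 3 before ROB 2, showing that capture follows execution order after overflow while the record is written at retirement.

```python
class PebsDlaSampler:
    """Toy model of DLA capture for one designated μop type (hypothetical structure)."""

    def __init__(self, sample_after):
        self.sample_after = sample_after
        self.remaining = sample_after   # retirements left before the counter overflows
        self.armed = False              # True once the counter has overflowed
        self.captured = {}              # ROB id -> DLA; presence acts as the "captured bit"
        self.debug_store = []           # PEBS records written at retirement

    def on_execute(self, rob_id, dla):
        # Capture the DLA of the first designated μop instance executing after overflow.
        if self.armed and not self.captured:
            self.captured[rob_id] = dla
            self.armed = False

    def on_retire(self, rob_id):
        # Emit the record only when the captured instance itself retires.
        if rob_id in self.captured:
            self.debug_store.append({"rob_id": rob_id, "dla": self.captured.pop(rob_id)})
        # Count every retirement of the designated μop; reload on overflow.
        self.remaining -= 1
        if self.remaining == 0:
            self.armed = True
            self.remaining = self.sample_after


s = PebsDlaSampler(sample_after=2)
for rob_id, dla in [(0, 0x1000), (1, 0x1040)]:
    s.on_execute(rob_id, dla)
    s.on_retire(rob_id)
s.on_execute(3, 0x10C0)      # out of order: ROB 3 executes before ROB 2
s.on_execute(2, 0x1080)
s.on_retire(2)
s.on_retire(3)
print(s.debug_store)  # [{'rob_id': 3, 'dla': 4288}]
```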
-
Publication No.: US09804842B2
Publication Date: 2017-10-31
Application No.: US14581535
Filing Date: 2014-12-23
Applicant: Intel Corporation
Inventor: Jesus Corbal San Adrian , Dennis R. Bradford , Benjamin C. Chaffin , Taraneh Bahrami , Jonathan C. Hall , Thomas B. Maciukenas , Roger Gramunt , Rohan Sharma
CPC classification number: G06F9/30036 , G06F9/30018 , G06F9/30032 , G06F9/30072 , G06F9/30101 , G06F15/8084
Abstract: An apparatus and method for efficiently managing the architectural state of a processor. For example, one embodiment of a processor comprises: a source mask register to be logically subdivided into at least a first portion to store a usable portion of a mask value and a second portion to store an indication of whether the usable portion of the mask value has been updated; a control register to store an unusable portion of the mask value; architectural state management logic to read the indication to determine whether the mask value has been updated prior to performing a store operation, wherein if the mask value has been updated, then the architectural state management logic is to read the usable portion of the mask value from the first portion of the source mask register and zero out bits of the unusable portion of the mask value to generate a final mask value to be saved to memory, and wherein if the mask value has not been updated, then the architectural state management logic is to concatenate the usable portion of the mask value with the unusable portion of the mask value read from the control register to generate a final mask value to be saved to memory.
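The Python sketch below models the save path under these hypothetical assumptions:

- The usable portion is 16 bits wide and sits in the low bits of the source mask register.
- The updated indication sits in the next bit up.
- The unusable upper portion is kept right-aligned in the control register.

The abstract specifies none of these widths or positions.

```python
USABLE_BITS = 16                    # hypothetical width of the usable mask portion
USABLE_MASK = (1 << USABLE_BITS) - 1
UPDATED_BIT = 1 << USABLE_BITS      # hypothetical location of the "updated" indication


def final_mask_value(source_mask_reg, control_reg):
    """Mask value the state-management logic would save to memory (sketch).

    control_reg is assumed to hold the unusable portion right-aligned.
    """
    usable = source_mask_reg & USABLE_MASK
    if source_mask_reg & UPDATED_BIT:
        # Mask was updated: keep the usable bits and zero the unusable portion.
        return usable
    # Not updated: concatenate the saved unusable portion above the usable portion.
    return (control_reg << USABLE_BITS) | usable


print(hex(final_mask_value(UPDATED_BIT | 0xBEEF, 0xDEAD)))  # 0xbeef
print(hex(final_mask_value(0xBEEF, 0xDEAD)))                # 0xdeadbeef
```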
-
Publication No.: US20170242698A1
Publication Date: 2017-08-24
Application No.: US15589510
Filing Date: 2017-05-08
Applicant: Intel Corporation
Inventor: Roger Gramunt , Ramon Matas , Benjamin C. Chaffin , Neal S. Moyer , Rammohan Padmanabhan , Alexey P. Suprun , Matthew G. Smith
CPC classification number: G06F9/3016 , G06F9/30098 , G06F9/30101 , G06F9/30145 , G06F9/3855 , G06F9/3857 , G06F11/3024 , G06F11/34 , G06F11/3466 , G06F11/36 , G06F11/362 , G06F11/3636
Abstract: A processor includes logic for stateless capture of data linear addresses (DLA) during precise event-based sampling (PEBS) for an out-of-order execution engine. The engine may include a PEBS unit with logic to increment a counter each time an instance of a designated micro-op is retired from a reorder buffer, capture the output DLA referenced by an instance of the micro-op that executes after the counter overflows, set a captured bit associated with a reorder buffer identifier for the instance of the micro-op, and store a PEBS record in debug storage when the instance of the micro-op is retired from the reorder buffer. The designated micro-op references a DLA of a memory accessible to the processor.
-
Publication No.: US20230195456A1
Publication Date: 2023-06-22
Application No.: US17558978
Filing Date: 2021-12-22
Applicant: Intel Corporation
Inventor: Sufiyan Syed , Roger Gramunt , Jayesh Gaur , Priyank Deshpande
CPC classification number: G06F9/28 , G06F9/223 , G06F9/4806 , G06F9/5027 , G06F2209/5014
Abstract: In one embodiment, an apparatus includes: a plurality of execution circuits to execute micro-operations (μops), wherein a subset of the plurality of execution circuits is capable of executing a fused μop; a fusion circuit coupled to at least the subset of the plurality of execution circuits, wherein the fusion circuit is to fuse at least some pairs of producer-consumer μops into fused μops; and a fusion throttle circuit coupled to the fusion circuit, wherein the fusion throttle circuit is to prevent a first μop from being fused with another μop based at least in part on historical information associated with the first μop. Other embodiments are described and claimed.
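The abstract does not say what history is kept, so the Python sketch below assumes a small table of saturating counters keyed by the producer μop's address. A counter is incremented when fusing that μop was observed to hurt and decremented otherwise. The notion of "hurt", the threshold, and the table layout are all hypothetical.

```python
from collections import defaultdict


class FusionThrottle:
    """Per-μop saturating counters used to veto fusion (hypothetical policy)."""

    def __init__(self, threshold=2, max_count=3):
        self.threshold = threshold
        self.max_count = max_count
        self.history = defaultdict(int)   # μop address -> saturating counter

    def allow_fusion(self, uop_addr):
        return self.history[uop_addr] < self.threshold

    def record_outcome(self, uop_addr, fusion_hurt):
        count = self.history[uop_addr]
        if fusion_hurt:
            self.history[uop_addr] = min(count + 1, self.max_count)
        else:
            self.history[uop_addr] = max(count - 1, 0)


def fuse_pairs(pairs, throttle):
    """Fuse producer-consumer pairs unless history vetoes the producer μop."""
    return [("fused" if throttle.allow_fusion(p) else "separate", p, c)
            for p, c in pairs]


t = FusionThrottle()
t.record_outcome(0x40, fusion_hurt=True)
t.record_outcome(0x40, fusion_hurt=True)
print(fuse_pairs([(0x40, 0x44), (0x50, 0x54)], t))
# [('separate', 64, 68), ('fused', 80, 84)]
```

Saturating counters are one common way to keep such history cheaply in hardware. A single bad outcome does not immediately disable fusion, and a later good outcome can re-enable it.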
-
Publication No.: US20200242046A1
Publication Date: 2020-07-30
Application No.: US16560213
Filing Date: 2019-09-04
Applicant: Intel Corporation
Inventor: Edward Grochowski , Julio Gago , Roger Gramunt , Roger Espasa , Rolf Kassa
IPC: G06F12/1009 , G06F12/0864 , G06F12/14 , G06F12/1027
Abstract: A method, system, and apparatus may initialize a fixed plurality of page table entries for a fixed plurality of pages in memory, each page having a first size, wherein a linear address for each page table entry corresponds to a physical address and the fixed plurality of pages are aligned. A bit in each of the page table entries for the aligned pages may be set to indicate whether or not the fixed plurality of pages is to be treated as one combined page having a second page size larger than the first page size. Other embodiments are described and claimed.
-
Publication No.: US20220343174A1
Publication Date: 2022-10-27
Application No.: US17742581
Filing Date: 2022-05-12
Applicant: Intel Corporation
Inventor: Dipankar Das , Roger Gramunt , Mikhail Smelyanskiy , Jesus Corbal , Dheevatsa Mudigere , Naveen K. Mellempudi , Alexander F. Heinecke
Abstract: Described herein is a graphics processor including a processing resource including a multiplier configured to multiply inputs associated with an instruction at one of a first plurality of bit widths, an adder configured to add a product output from the multiplier with an accumulator value at one of a second plurality of bit widths, and circuitry to select a first bit width of the first plurality of bit widths for the multiplier and a second bit width of the second plurality of bit widths for the adder.
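The Python sketch below models the width selection under these assumptions:

- The multiplier supports hypothetical widths of 4, 8, and 16 bits.
- The adder supports hypothetical widths of 16 and 32 bits.
- Selection picks the narrowest width that fits.
- The accumulator wraps around in two's complement.

The abstract names none of these choices.

```python
MUL_WIDTHS = (4, 8, 16)     # hypothetical first plurality of bit widths
ADD_WIDTHS = (16, 32)       # hypothetical second plurality of bit widths


def select_width(requested, supported):
    """Pick the narrowest supported width that can hold `requested` bits."""
    for w in sorted(supported):
        if w >= requested:
            return w
    raise ValueError(f"no supported width >= {requested}")


def fits(value, bits):
    """True if `value` is representable as a signed `bits`-bit integer."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def wrap(value, bits):
    """Two's-complement wrap to `bits` (wrapping vs. saturating is an assumption)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(a, b, acc, in_bits, acc_bits):
    """Multiply at the selected multiplier width, accumulate at the selected adder width."""
    mul_w = select_width(in_bits, MUL_WIDTHS)
    add_w = select_width(acc_bits, ADD_WIDTHS)
    if not (fits(a, mul_w) and fits(b, mul_w)):
        raise ValueError("inputs exceed the selected multiplier width")
    return wrap(acc + a * b, add_w), (mul_w, add_w)


print(mac(7, -8, 100, in_bits=4, acc_bits=16))        # (44, (4, 16))
print(mac(127, 127, 32767, in_bits=8, acc_bits=16))   # (-16640, (8, 16))
```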
-