-
Publication number: US20190138309A1
Publication date: 2019-05-09
Application number: US16004081
Filing date: 2018-06-08
Applicant: Intel Corporation
Inventor: Victor Lee , Mikhail Smelyanskiy , Alexander Heinecke
IPC: G06F9/30 , G06F12/02 , G06F12/0862 , G06F9/345 , G06F12/0875 , G06F9/34 , G06F12/0811
Abstract: Methods and apparatuses relating to a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache. In one embodiment, a hardware processor includes a decoder to decode a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements, and an execution unit to execute the prefetch instruction to generate system memory addresses of the other elements of the multidimensional block of elements, and load the multidimensional block of elements into the cache from the system memory addresses.
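The address generation described in this abstract can be illustrated with a minimal Python sketch. It assumes a two-dimensional block, a 64-byte cache line, and a byte-granular row stride; the function name `block_line_addresses` and all parameter names are hypothetical and do not come from the patent.

```python
def block_line_addresses(base, row_stride, rows, row_bytes, line_size=64):
    """Return the cache-line-aligned addresses that cover a 2D block.

    base:       byte address of the first element of the block (assumed operand)
    row_stride: bytes between the starts of consecutive rows (assumed operand)
    rows:       number of rows in the block (one block boundary)
    row_bytes:  bytes occupied by each block row (the other boundary)
    line_size:  cache line size in bytes (64 is an assumption)
    """
    lines = []
    seen = set()
    for r in range(rows):
        start = base + r * row_stride      # address of the first byte in row r
        end = start + row_bytes            # exclusive end of row r
        line = start - (start % line_size) # align down to a cache line
        while line < end:
            if line not in seen:           # avoid requesting a line twice
                seen.add(line)
                lines.append(line)
            line += line_size
    return lines


if __name__ == "__main__":
    # Three rows of 64 bytes each, rows 1024 bytes apart.
    for addr in block_line_addresses(0x1000, 1024, 3, 64):
        print(hex(addr))
```

A single call stands in for one prefetch instruction here: the base address, stride, and boundaries are enough to derive every other address in the block.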
-
Publication number: US20150228091A1
Publication date: 2015-08-13
Application number: US14693056
Filing date: 2015-04-22
Applicant: Intel Corporation
Inventor: Victor W. Lee , Mikhail Smelyanskiy , Ganesh S. Dasika , Jose Gonzalez , Jatin Chhugani , Yen-Kuang Chen , Changkyu Kim , Julio Gago , Santiago Galan , Victor Moya Del Barrio
IPC: G06T11/00
CPC classification number: G06T11/001 , G06F17/16 , G06T1/00
Abstract: A texture unit may be used to perform general purpose mathematical computations such as dot products. This enables some general purpose computations and operations to be offloaded from a central processing unit to the texture unit. The texture unit may use linear interpolators in order to perform the dot product calculations.
-
Publication number: US11334796B2
Publication date: 2022-05-17
Application number: US16983107
Filing date: 2020-08-03
Applicant: Intel Corporation
Inventor: Dipankar Das , Roger Gramunt , Mikhail Smelyanskiy , Jesus Corbal , Dheevatsa Mudigere , Naveen K. Mellempudi , Alexander F. Heinecke
Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.
-
Publication number: US10198264B2
Publication date: 2019-02-05
Application number: US14969864
Filing date: 2015-12-15
Applicant: Intel Corporation
Inventor: Asit K. Mishra , Deborah T. Marr , Jong Soo Park , Nadathur Rajagopalan Satish , Mikhail Smelyanskiy , Michael Anderson , Mostofa Ali Patwary , Narayanan Sundaram , Sheng Li
IPC: G06F9/30
Abstract: A processing device includes a sorting module, which adds to each of a plurality of elements a position value of a corresponding position in a register rest resulting in a plurality of transformed elements in corresponding positions. The plurality of elements include a plurality of bits. The sorting module compares each of the plurality of transformed elements to itself and to one another. The sorting module also assigns one of an enabled or disabled indicator to each of the plurality of the transformed elements based on the comparison. The sorting module further counts a number of the enabled indicators assigned to each of the plurality of the transformed elements to generate a sorted sequence of the plurality of elements.
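The comparison-and-count scheme in this abstract resembles a rank (counting-by-comparison) sort. The Python sketch below is one possible reading, assuming integer elements and assuming the position value is folded in as `element * n + position` so that equal elements become distinct; the function name `rank_sort` and that encoding are illustrative assumptions, not the claimed hardware design.

```python
def rank_sort(elements):
    """Sort integers by counting, for each element, how many compare <= it."""
    n = len(elements)
    # Fold each element's position into its value so ties become unique
    # (assumed encoding: scale by n, then add the position).
    transformed = [e * n + i for i, e in enumerate(elements)]
    result = [None] * n
    for i, t in enumerate(transformed):
        # One indicator per comparison, including the comparison with itself:
        # enabled (True) when the other transformed element is <= this one.
        enabled = [other <= t for other in transformed]
        # The count of enabled indicators, minus one, is the output position.
        rank = sum(enabled) - 1
        result[rank] = elements[i]
    return result


if __name__ == "__main__":
    print(rank_sort([3, 1, 2, 1]))
```

Because every comparison is independent of the others, all of them could in principle be evaluated in parallel, which is what makes this style of sort a natural fit for register-wide hardware.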
-
Publication number: US09996350B2
Publication date: 2018-06-12
Application number: US14583651
Filing date: 2014-12-27
Applicant: Intel Corporation
Inventor: Victor Lee , Mikhail Smelyanskiy , Alexander Heinecke
IPC: G06F9/30 , G06F9/34 , G06F12/0875 , G06F9/345
CPC classification number: G06F9/30047 , G06F9/30145 , G06F9/34 , G06F9/3455 , G06F12/0207 , G06F12/0811 , G06F12/0862 , G06F12/0875 , G06F2212/452 , G06F2212/6026
Abstract: Methods and apparatuses relating to a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache. In one embodiment, a hardware processor includes a decoder to decode a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements, and an execution unit to execute the prefetch instruction to generate system memory addresses of the other elements of the multidimensional block of elements, and load the multidimensional block of elements into the cache from the system memory addresses.
-
Publication number: US09720663B2
Publication date: 2017-08-01
Application number: US14750635
Filing date: 2015-06-25
Applicant: Intel Corporation
Inventor: Hongbo Rong , Jong Soo Park , Mikhail Smelyanskiy , Geoff Lowney
CPC classification number: G06F8/41 , G06F8/443 , G06F8/4434 , G06F8/4435 , G06F9/44521 , G06F17/16
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to optimize sparse matrix execution. An example disclosed apparatus includes a context former to identify a matrix function call from a matrix function library, the matrix function call associated with a sparse matrix, a pattern matcher to identify an operational pattern associated with the matrix function call, and a code generator to associate a function data structure with the matrix function call exhibiting the operational pattern, the function data structure stored external to the matrix function library, and facilitate a runtime link between the function data structure and the matrix function call.
-
Publication number: US20210019631A1
Publication date: 2021-01-21
Application number: US16983107
Filing date: 2020-08-03
Applicant: Intel Corporation
Inventor: Dipankar Das , Roger Gramunt , Mikhail Smelyanskiy , Jesus Corbal , Dheevatsa Mudigere , Naveen K. Mellempudi , Alexander F. Heinecke
Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.
-
Publication number: US20190026158A1
Publication date: 2019-01-24
Application number: US15872762
Filing date: 2018-01-16
Applicant: Intel Corporation
Inventor: Anthony Nguyen , Engin Ipek , Victor Lee , Daehyun Kim , Mikhail Smelyanskiy
Abstract: Methods and apparatus to provide virtualized vector processing are described. In one embodiment, one or more operations corresponding to a virtual vector request are distributed to one or more processor cores for execution.
-
Publication number: US20180322390A1
Publication date: 2018-11-08
Application number: US15869564
Filing date: 2018-01-12
Applicant: Intel Corporation
Inventor: Dipankar Das , Roger Gramunt , Mikhail Smelyanskiy , Jesus Corbal , Dheevatsa Mudigere , Naveen K. Mellempudi , Alexander F. Heinecke
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a fetch unit to fetch a single instruction having multiple input operands, wherein the multiple input operands have an unequal bit-length, a first input operand having a first bit-length and a second input operand having a second bit-length; a decode unit to decode the single instruction into a decoded instruction; an operand length unit to determine a smaller bit-length of the first bit-length and the second bit-length; and a compute unit to perform a matrix operation on the multiple input operands to generate an output value having a bit length of the smaller bit length.
-
Publication number: US09076254B2
Publication date: 2015-07-07
Application number: US14054933
Filing date: 2013-10-16
Applicant: Intel Corporation
Inventor: Victor W. Lee , Mikhail Smelyanskiy , Ganesh S. Dasika , Jose Gonzalez , Jatin Chhugani , Yen-Kuang Chen , Changkyu Kim , Julio Gago , Santiago Galan , Victor Moya Del Barrio
CPC classification number: G06T11/001 , G06F17/16 , G06T1/00
Abstract: A texture unit may be used to perform general purpose mathematical computations such as dot products. This enables some general purpose computations and operations to be offloaded from a central processing unit to the texture unit. The texture unit may use linear interpolators in order to perform the dot product calculations.