-
公开(公告)号:US12039000B2
公开(公告)日:2024-07-16
申请号:US18163418
申请日:2023-02-02
Applicant: Intel Corporation
Inventor: Joydeep Ray , Fangwen Fu , Dhiraj D. Kalamkar , Sasikanth Avancha
Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.
-
公开(公告)号:US20190042939A1
公开(公告)日:2019-02-07
申请号:US15994930
申请日:2018-05-31
Applicant: Intel Corporation
Inventor: Martin Langhammer , Sudarshan Srinivasan , Gregg William Baeckler , Duncan Moss , Sasikanth Avancha , Dipankar Das
Abstract: The present disclosure relates generally to techniques for improving the implementation of certain operations on an integrated circuit. In particular, deep learning techniques, which may use a deep neural network (DNN) topology, may be implemented more efficiently using low-precision weights and activation values by efficiently performing down conversion of data to a lower precision and by preventing data overflow during suitable computations. Further, by more efficiently mapping multipliers to programmable logic on the integrated circuit device, the resources used by the DNN topology to perform, for example, inference tasks may be reduced, resulting in improved integrated circuit operating speeds.
-
公开(公告)号:US20240320000A1
公开(公告)日:2024-09-26
申请号:US18621539
申请日:2024-03-29
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Jorge Parra , Ashutosh Garg , Chandra Gurram , Chunhui Mei , Durgesh Borkar , Shubra Marwaha , Supratim Pal , Varghese George , Wei Xiong , Yan Li , Yongsheng Liu , Dipankar Das , Sasikanth Avancha , Dharma Teja Vooturi , Naveen K. Mellempudi
CPC classification number: G06F9/30036 , G06F9/3001 , G06F9/30101 , G06F9/3893 , G06F15/8046
Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.
-
公开(公告)号:US11593454B2
公开(公告)日:2023-02-28
申请号:US16890122
申请日:2020-06-02
Applicant: Intel Corporation
Inventor: Joydeep Ray , Fangwen Fu , Dhiraj D. Kalamkar , Sasikanth Avancha
Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.
-
公开(公告)号:US20220413803A1
公开(公告)日:2022-12-29
申请号:US17304803
申请日:2021-06-25
Applicant: Intel Corporation
Inventor: Jorge Parra , Fangwen Fu , Subramaniam Maiyuran , Varghese George , Mike Macpherson , Supratim Pal , Chandra Gurram , Sabareesh Ganapathy , Sasikanth Avancha , Dharma Teja Vooturi , Naveen Mellempudi , Dipankar Das
Abstract: A processing apparatus is described herein that includes a general-purpose parallel processing engine comprising a matrix accelerator including one or more systolic arrays, at least one of the one or more systolic arrays comprising multiple pipeline stages, each pipeline stage of the multiple pipeline stages including multiple processing elements, the multiple processing elements configured to perform processing operations on input matrix elements based on output sparsity metadata. The output sparsity metadata indicates to the multiple processing elements to bypass multiplication for a first row of elements of a second matrix and multiply a second row of elements of the second matrix with a column of matrix elements of a first matrix.
-
公开(公告)号:US11275998B2
公开(公告)日:2022-03-15
申请号:US15994930
申请日:2018-05-31
Applicant: Intel Corporation
Inventor: Martin Langhammer , Sudarshan Srinivasan , Gregg William Baeckler , Duncan Moss , Sasikanth Avancha , Dipankar Das
IPC: G06N3/08 , G06N3/04 , G06N3/063 , G06F17/16 , G06F7/501 , G06F5/01 , G06F7/509 , H03M7/40 , H03M7/42 , H03M7/30
Abstract: The present disclosure relates generally to techniques for improving the implementation of certain operations on an integrated circuit. In particular, deep learning techniques, which may use a deep neural network (DNN) topology, may be implemented more efficiently using low-precision weights and activation values by efficiently performing down conversion of data to a lower precision and by preventing data overflow during suitable computations. Further, by more efficiently mapping multipliers to programmable logic on the integrated circuit device, the resources used by the DNN topology to perform, for example, inference tasks may be reduced, resulting in improved integrated circuit operating speeds.
-
公开(公告)号:US20210374209A1
公开(公告)日:2021-12-02
申请号:US16890122
申请日:2020-06-02
Applicant: Intel Corporation
Inventor: Joydeep Ray , Fangwen Fu , Dhiraj D. Kalamkar , Sasikanth Avancha
Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.
-
-
-
-
-
-