-
公开(公告)号:US20210397925A1
公开(公告)日:2021-12-23
申请号:US17446101
申请日:2021-08-26
Applicant: Intel Corporation
Inventor: Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Ben J. Ashbaugh , Jingyi Jin , Jeremy Bottleson , Mike B. Macpherson , Kevin Nealis , Dhawal Srivastava , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Altug Koker , Abhishek R. Appu
Abstract: A library of machine learning primitives is provided to optimize a machine learning model to improve the efficiency of inference operations. In one embodiment a trained convolutional neural network (CNN) model is processed into a trained CNN model via pruning, convolution window optimization, and quantization.
-
公开(公告)号:US20210350215A1
公开(公告)日:2021-11-11
申请号:US17323694
申请日:2021-05-18
Applicant: Intel Corporation
Inventor: Brian T. Lewis , Rajkishore Barik , Murali Sundaresan , Leonard Truong , Feng Chen , Xiaoming Chen , Mike B. MacPherson
Abstract: A mechanism is described for facilitating efficient training of neural networks at computing devices. A method of embodiments, as described herein, includes detecting one or more inputs for training of a neural network, and introducing randomness in floating point (FP) numbers to prevent overtraining of the neural network, where introducing randomness includes replacing less-significant low-order bits of operand and result values with new low-order bits during the training of the neural network.
-
公开(公告)号:US20210334637A1
公开(公告)日:2021-10-28
申请号:US17317857
申请日:2021-05-11
Applicant: INTEL CORPORATION
Inventor: Kamal Sinha , Balaji Vembu , Eriko Nurvitadhi , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Farshad Akhbari , Narayan Srinivasa , Feng Chen , Dukhwan Kim , Nadathur Rajagopalan Satish , John C. Weast , Mike B. MacPherson , Linda L. Hurd , Vasanth Ranganathan , Sanjeev S. Jahagirdar
IPC: G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30 , G06T15/00 , G06F15/78 , G06F15/76 , G06F1/3287 , G06F1/3293
Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US11080813B2
公开(公告)日:2021-08-03
申请号:US16584076
申请日:2019-09-26
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Linda L. Hurd , Dukhwan Kim , Mike B. Macpherson , John C. Weast , Feng Chen , Farshad Akhbari , Narayan Srinivasa , Nadathur Rajagopalan Satish , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman
IPC: G06T1/20 , G06F3/14 , G06F9/30 , G06F9/38 , G06N3/04 , G06N3/063 , G06N3/08 , G06T15/00 , G09G5/36 , G06T15/04
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed precision core to perform a mixed precision multi-dimensional matrix multiply and accumulate operation on 8-bit and/or 32 bit signed or unsigned integer elements.
-
公开(公告)号:US11010659B2
公开(公告)日:2021-05-18
申请号:US15495020
申请日:2017-04-24
Applicant: Intel Corporation
Inventor: Kamal Sinha , Balaji Vembu , Eriko Nurvitadhi , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Farshad Akhbari , Narayan Srinivasa , Feng Chen , Dukhwan Kim , Nadathur Rajagopalan Satish , John C. Weast , Mike B. MacPherson , Linda L. Hurd , Vasanth Ranganathan , Sanjeev S. Jahagirdar
IPC: G06F17/50 , G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30 , G06T15/00 , G06F15/78 , G06F15/76 , G06F1/3287 , G06F1/3293 , G06T1/60
Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20210081774A1
公开(公告)日:2021-03-18
申请号:US17083080
申请日:2020-10-28
Applicant: Intel Corporation
Inventor: Rajkishore Barik , Elmoustapha Ould-Ahmed-Vall , Xiaoming Chen , Dhawal Srivastava , Anbang Yao , Kevin Nealis , Eriko Nurvitadhi , Sara S. Baghsorkhi , Balaji Vembu , Tatiana Shpeisman , Ping T. Tang
Abstract: One embodiment provides for a general-purpose graphics processing unit including a scheduler to schedule multiple matrix operations for execution by a general-purpose graphics processing unit. The multiple matrix operations are determined based on a single machine learning compute instruction. The single machine learning compute instruction is a convolution instruction and the multiple matrix operations are associated with a convolution operation.
-
公开(公告)号:US10853906B2
公开(公告)日:2020-12-01
申请号:US16197821
申请日:2018-11-21
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland
IPC: G06T1/20 , G06F7/483 , G06N3/08 , G06F9/30 , G06N3/04 , G06N3/063 , G06F9/50 , G06F9/38 , G06N20/00 , G06F3/14 , G06T1/60 , G06T15/00
Abstract: One embodiment provides an accelerator module comprising a memory stack including multiple memory dies; a graphics processing unit (GPU) coupled with the memory stack via one or more memory controllers, the GPU including a plurality of multiprocessors having a single instruction, multiple thread (SIMT) architecture, the multiprocessors to execute at least one single instruction. The at least one single instruction is to cause at least a portion of the GPU to perform a floating point operation on input having differing precisions. The floating point operation is a two-dimensional matrix multiply and accumulate operation.
-
公开(公告)号:US20200020070A1
公开(公告)日:2020-01-16
申请号:US16584076
申请日:2019-09-26
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Linda L. Hurd , Dukhwan Kim , Mike B. Macpherson , John C. Weast , Feng Chen , Farshad Akhbari , Narayan Srinivasa , Nadathur Rajagopalan Satish , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman
IPC: G06T1/20 , G09G5/36 , G06T15/00 , G06N3/08 , G06N3/063 , G06N3/04 , G06F9/38 , G06F9/30 , G06F3/14
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed precision core to perform a mixed precision multi-dimensional matrix multiply and accumulate operation on 8-bit and/or 32 bit signed or unsigned integer elements.
-
公开(公告)号:US20200019844A1
公开(公告)日:2020-01-16
申请号:US16518828
申请日:2019-07-22
Applicant: Intel Corporation
Inventor: Brian T. Lewis , Feng Chen , Jeffrey R. Jackson , Justin E. Gottschlich , Rajkishore Barik , Xiaoming Chen , Prasoonkumar Surti , Mike B. Macpherson , Murali Sundaresan
IPC: G06N3/063 , B60W30/095 , G06N3/00 , G06N3/04
Abstract: A mechanism is described for facilitating smart collection of data and smart management of autonomous machines. A method of embodiments, as described herein, includes detecting one or more sets of data from one or more sources over one or more networks, and combining a first computation directed to be performed locally at a local computing device with a second computation directed to be performed remotely at a remote computing device in communication with the local computing device over the one or more networks, where the first computation consumes low power, wherein the second computation consumes high power.
-
公开(公告)号:US10303953B2
公开(公告)日:2019-05-28
申请号:US15488555
申请日:2017-04-17
Applicant: Intel Corporation
Inventor: Mayuresh M. Varerkar , Barnan Das , Narayan Biswal , Stanley J. Baran , Gokcen Cilingir , Nilesh V. Shah , Archie Sharma , Sherine Abdelhak , Sachin Godse , Farshad Akhbari , Narayan Srinivasa , Altug Koker , Nadathur Rajagopalan Satish , Dukhwan Kim , Feng Chen , Abhishek R. Appu , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Vasanth Ranganathan , Sanjeev Jahagirdar
Abstract: A mechanism is described for facilitating person tracking and data security in machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, by a camera associated with one or more trackers, a person within a physical vicinity, where detecting includes capturing one or more images the person. The method may further include tracking, by the one or more trackers, the person based on the one or more images of the person, where tracking includes collect tracking data relating to the person. The method may further include selecting a tracker of the one or more trackers as a preferred tracker based on the tracking data.
-
-
-
-
-
-
-
-
-