-
公开(公告)号:US11803935B2
公开(公告)日:2023-10-31
申请号:US17881720
申请日:2022-08-05
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Nicolas C. Galoppo Von Borries
IPC: G06F17/16 , G06T1/20 , G06F9/30 , G06F9/38 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , H03M7/30 , G06N20/00 , G06F12/02 , G06F18/2136 , G06F9/48 , G06N3/04 , G06N3/08 , G06T1/60 , G06T15/00
CPC classification number: G06T1/20 , G06F9/3001 , G06F9/3885 , G06F9/4881 , G06F12/0207 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , G06F17/16 , G06F18/2136 , G06N3/04 , G06N3/08 , G06N20/00 , G06T1/60 , G06T15/005 , H03M7/30 , G06F2212/1024 , G06F2212/302 , G06F2212/401 , G06F2212/621 , G06T2200/28
Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
-
公开(公告)号:US11797837B2
公开(公告)日:2023-10-24
申请号:US15494971
申请日:2017-04-24
Applicant: Intel Corporation
Inventor: Altug Koker , Abhishek R. Appu , Kamal Sinha , Joydeep Ray , Balaji Vembu , Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , John C. Weast , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Farshad Akhbari , Nadathur Rajagopalan Satish , Liwei Ma , Jeremy Bottleson , Eriko Nurvitadhi , Travis T. Schluessler , Ankur N. Shah , Jonathan Kennedy , Vasanth Ranganathan , Sanjeev Jahagirdar
Abstract: In an example, an apparatus comprises a plurality of execution units comprising at least a first type of execution unit and a second type of execution unit and logic, at least partially including hardware logic, to analyze a workload and assign the workload to one of the first type of execution unit or the second type of execution unit. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20230334316A1
公开(公告)日:2023-10-19
申请号:US18314450
申请日:2023-05-09
Applicant: Intel Corporation
Inventor: Altug Koker , Abhishek R. Appu , Kamal Sinha , Joydeep Ray , Balaji Vembu , Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , John C. Weast , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Farshad Akhbari , Nadathur Rajagopalan Satish , Liwei Ma , Jeremy Bottleson , Eriko Nurvitadhi , Travis T. Schluessler , Ankur N. Shah , Jonathan Kennedy , Vasanth Ranganathan , Sanjeev Jahagirdar
Abstract: Described herein is a graphics processor comprising a memory device and a graphics processing cluster coupled with the memory device. The graphics processing cluster includes a plurality of graphics multiprocessors interconnected via a data interconnect. A graphics multiprocessor includes circuitry configured to load a modular neural network including a plurality of subnetworks, each of the plurality of subnetworks trained to perform a computer vision operation on a separate subject.
-
公开(公告)号:US20230297526A1
公开(公告)日:2023-09-21
申请号:US17832305
申请日:2022-06-03
Applicant: Intel Corporation
Inventor: David Puffer , Ankur Shah , Niranjan Cooray , Bryan White , Balaji Vembu , Hema Chand Nalluri , Kritika Bala
CPC classification number: G06F13/24 , G06F13/1668 , G06T1/20
Abstract: Embodiments described herein provide techniques to facilitate scalable interrupts and workload submission for a virtualized graphics processor. The techniques include memory-based interrupt reporting and shared work queue submission for multiple software domains.
-
公开(公告)号:US11748283B1
公开(公告)日:2023-09-05
申请号:US17832305
申请日:2022-06-03
Applicant: Intel Corporation
Inventor: David Puffer , Ankur Shah , Niranjan Cooray , Bryan White , Balaji Vembu , Hema Chand Nalluri , Kritika Bala
IPC: G06F9/48 , G06F12/084 , G06F13/24 , G06F13/16 , G06T1/20
CPC classification number: G06F13/24 , G06F13/1668 , G06T1/20
Abstract: Embodiments described herein provide techniques to facilitate scalable interrupts and workload submission for a virtualized graphics processor. The techniques include memory-based interrupt reporting and shared work queue submission for multiple software domains.
-
公开(公告)号:US11748106B2
公开(公告)日:2023-09-05
申请号:US17683564
申请日:2022-03-01
Applicant: Intel Corporation
Inventor: Liwei Ma , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Eriko Nurvitadhi , Abhishek R. Appu , Altug Koker , Kamal Sinha , Joydeep Ray , Balaji Vembu , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/38
CPC classification number: G06F9/3832
Abstract: A mechanism is described for facilitating fast data operations and for facilitating a finite state machine for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting input data to be used in computational tasks by a computation component of a processor including a graphics processor. The method may further include determining one or more frequently-used data values (FDVs) from the data, and pushing the one or more frequent data values to bypass the computational tasks.
-
公开(公告)号:US20230260072A1
公开(公告)日:2023-08-17
申请号:US18168207
申请日:2023-02-13
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
CPC classification number: G06T1/20 , G06F9/45533 , G06F9/5061 , G06F9/5094 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045 , G06F8/41
Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
-
18.
公开(公告)号:US11720355B2
公开(公告)日:2023-08-08
申请号:US17834482
申请日:2022-06-07
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G09G5/393 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/063 , G06N3/08 , G06N3/044 , G06N3/045 , G06T15/00 , G06N20/00 , G06F17/16
CPC classification number: G06F9/3001 , G06F7/483 , G06F7/5443 , G06F9/30014 , G06F9/30036 , G06F9/3851 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F9/3013 , G06F9/30025 , G06F17/16 , G06F2207/3824 , G06N20/00 , G06T15/005
Abstract: One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on first input including weight data and second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, quantized intermediate data in a determined floating-point format.
-
公开(公告)号:US20230142472A1
公开(公告)日:2023-05-11
申请号:US17959374
申请日:2022-10-04
Applicant: Intel Corporation
Inventor: Eric J. Asperheim , Subramaniam Maiyuran , Kiran C. Veernapu , Sanjeev S. Jahagirdar , Balaji Vembu , Devan Burke , Philip R. Laws , Kamal Sinha , Abhishek R. Appu , Elmoustapha Ould-Ahmed-Vall , Peter L. Doyle , Joydeep Ray , Travis T. Schluessler , John H. Feit , Nikos Kaburlasos , Jacek Kwiatkowski , Altug Koker
IPC: G06F3/14 , G06F3/01 , G09G5/391 , G06F3/0484
CPC classification number: G06F3/1438 , G06F3/013 , G09G5/391 , G06F3/0484 , G09G2354/00 , G09G2352/00 , G09G2360/08 , G09G2340/0435 , G09G2360/121 , G09G5/001
Abstract: In accordance with some embodiments, the render rate is varied across and/or up and down the display screen. This may be done based on where the user is looking in order to reduce power consumption and/or increase performance. Specifically the screen display is separated into regions, such as quadrants. Each of these regions is rendered at a rate determined by at least one of what the user is currently looking at, what the user has looked at in the past and/or what it is predicted that the user will look at next. Areas of less focus may be rendered at a lower rate, reducing power consumption in some embodiments.
-
20.
公开(公告)号:US20230046506A1
公开(公告)日:2023-02-16
申请号:US17967283
申请日:2022-10-17
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F7/483 , G06N3/063 , G06N3/04 , G06F9/38 , G06N3/08 , G09G5/393 , G06F7/544 , G06T15/00 , G06N20/00 , G06F17/16
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
-
-
-
-
-
-
-
-
-