-
Publication No.: US12086120B2
Publication date: 2024-09-10
Application No.: US18066436
Filing date: 2022-12-15
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti , Abhishek R. Appu , Karol Szerszen , Eric Liskay , Karthik Vaidyanathan
CPC classification number: G06F16/2237 , G06N20/00 , G06T1/20
Abstract: Embodiments are generally directed to compression for sparse data structures utilizing mode search approximation. An embodiment of an apparatus includes one or more processors including a graphics processor to process data; and a memory for storage of data, including compressed data. The one or more processors are to provide for compression of a data structure, including identification of a mode in the data structure, the data structure including a plurality of values and the mode being the most repeated value in the data structure, wherein identification of the mode includes application of a mode approximation operation, and encoding of an output vector to include the identified mode, a significance map to indicate locations at which the mode is present in the data structure, and remaining uncompressed data from the data structure.
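The encoding described in this abstract (mode, significance map, remaining values) can be illustrated with a minimal Python sketch. The function names and the prefix-sampling approximation are assumptions made for illustration only; the abstract does not say how the mode approximation operation works, and the input is assumed to be non-empty.

```python
from collections import Counter


def approximate_mode(values, sample_size=8):
    """Estimate the mode from a prefix sample (hypothetical approximation)."""
    sample = values[:sample_size]
    return Counter(sample).most_common(1)[0][0]


def compress(values, sample_size=8):
    """Return (mode, significance_map, remaining) for a non-empty list."""
    mode = approximate_mode(values, sample_size)
    # 1 marks a position holding the mode; 0 marks a position stored verbatim.
    significance_map = [1 if v == mode else 0 for v in values]
    remaining = [v for v in values if v != mode]
    return mode, significance_map, remaining


def decompress(mode, significance_map, remaining):
    """Rebuild the original list from the three encoded parts."""
    rest = iter(remaining)
    return [mode if bit else next(rest) for bit in significance_map]


data = [0, 0, 7, 0, 3, 0, 0, 9]
encoded = compress(data)
restored = decompress(*encoded)
```

Because the significance map is built against whichever value the approximation returns, the round trip stays lossless even when the estimate is not the true mode; only the compression ratio suffers.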
-
Publication No.: US12056906B2
Publication date: 2024-08-06
Application No.: US18466141
Filing date: 2023-09-13
Applicant: Intel Corporation
Inventor: Joydeep Ray , Ben Ashbaugh , Prasoonkumar Surti , Pradeep Ramani , Rama Harihara , Jerin C. Justin , Jing Huang , Xiaoming Cui , Timothy B. Costa , Ting Gong , Elmoustapha Ould-ahmed-vall , Kumar Balasubramanian , Anil Thomas , Oguz H. Elibol , Jayaram Bobba , Guozhong Zhuang , Bhavani Subramanian , Gokce Keskin , Chandrasekaran Sakthivel , Rajesh Poornachandran
CPC classification number: G06T9/002 , G06F12/023 , G06T15/005 , G06F2212/302 , G06F2212/401
Abstract: Embodiments are generally directed to compression in machine learning and deep learning processing. An embodiment of an apparatus for compression of untyped data includes a graphical processing unit (GPU) including a data compression pipeline, the data compression pipeline including a data port coupled with one or more shader cores, wherein the data port is to allow transfer of untyped data without format conversion, and a 3D compression/decompression unit to provide for compression of untyped data to be stored to a memory subsystem and decompression of untyped data from the memory subsystem.
-
Publication No.: US20240257433A1
Publication date: 2024-08-01
Application No.: US18414841
Filing date: 2024-01-17
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti , Abhishek R. Appu , Karthik Vaidyanathan , Saikat Mandal , Michael Norris
CPC classification number: G06T15/005 , G06F3/0604 , G06F3/0659 , G06F3/0673 , G06T15/06
Abstract: Apparatus and method for asynchronous ray tracing. For example, one embodiment of a processor comprises: a bounding volume hierarchy (BVH) generator to construct a BVH comprising a plurality of hierarchically arranged nodes including a root node, a plurality of internal nodes, and a plurality of leaf nodes comprising primitives, wherein each internal node comprises a child node to either the root node or another internal node and each leaf node comprises a child node to an internal node; a first storage bank to be arranged as a first plurality of entries; a second storage bank to be arranged as a second plurality of entries, wherein each entry of the first plurality of entries and the second plurality of entries is to store a ray to be traversed through the BVH; an allocator circuit to distribute an incoming ray to either the first storage bank or the second storage bank based on the relative numbers of rays currently stored in the first and second storage banks; and traversal circuitry to alternate between selecting a next ray from the first storage bank and the second storage bank, the traversal circuitry to traverse the next ray through the BVH by reading a next BVH node from a top of a BVH node stack and determining whether the next ray intersects the next BVH node.
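The two-bank allocation and alternating selection described in this abstract can be modeled in software roughly as follows. This is a simplified sketch, not the claimed circuit: the class name `RayBanks`, the tie-breaking rule, and the fallback to the other bank when the preferred one is empty are all assumptions, and bank capacity is not modeled.

```python
from collections import deque


class RayBanks:
    """Toy model of two ray storage banks with a balancing allocator."""

    def __init__(self):
        self.banks = (deque(), deque())
        self.next_bank = 0  # bank the traversal side will try first

    def allocate(self, ray):
        """Store the ray in the less occupied bank (ties go to bank 0)."""
        target = 0 if len(self.banks[0]) <= len(self.banks[1]) else 1
        self.banks[target].append(ray)
        return target

    def select_next(self):
        """Alternate between banks; fall back to the other bank if one is empty."""
        for offset in (0, 1):
            idx = (self.next_bank + offset) % 2
            if self.banks[idx]:
                self.next_bank = 1 - idx
                return self.banks[idx].popleft()
        return None  # both banks are empty


banks = RayBanks()
placements = [banks.allocate(r) for r in ("r0", "r1", "r2", "r3")]
order = [banks.select_next() for _ in range(5)]
```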
-
Publication No.: US20240256456A1
Publication date: 2024-08-01
Application No.: US18391346
Filing date: 2023-12-20
Applicant: Intel Corporation
Inventor: Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, Jr. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei
IPC: G06F12/0862 , G06T1/20 , G06T1/60
CPC classification number: G06F12/0862 , G06T1/20 , G06T1/60 , G06F2212/602 , G06F2212/608
Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus is to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than the threshold value, allowing the prefetch of data to the L1 cache.
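The threshold rule in this abstract reduces to a single comparison. The sketch below is illustrative only: the function name, the counter-based hit-rate measurement, and the default threshold of 0.9 are assumptions, since the abstract gives no concrete values.

```python
def prefetch_target(l1_hits, l1_accesses, threshold=0.9):
    """Return the cache level a prefetch may fill, based on the L1 hit rate."""
    # With no accesses recorded yet, treat the hit rate as 0 (an assumption).
    hit_rate = l1_hits / l1_accesses if l1_accesses else 0.0
    # High L1 hit rate: keep prefetched data out of L1 and stop at L3.
    # Low L1 hit rate: allow the prefetch to reach L1.
    return "L3" if hit_rate >= threshold else "L1"


high = prefetch_target(95, 100)
low = prefetch_target(50, 100)
```

The intuition is that an L1 cache that already hits often holds a useful working set, so prefetched lines are kept in L3 rather than risk evicting it.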
-
Publication No.: US11915357B2
Publication date: 2024-02-27
Application No.: US16820483
Filing date: 2020-03-16
Applicant: Intel Corporation
Inventor: Karthik Vaidyanathan , Abhishek Appu , Vasanth Ranganathan , Joydeep Ray , Prasoonkumar Surti
CPC classification number: G06T15/005 , G06T15/06
Abstract: Apparatus and method for stack throttling. For example, one embodiment of an apparatus comprises: execution circuitry comprising a plurality of functional units to execute a plurality of ray shaders and generate a plurality of primary rays and a corresponding plurality of ray messages; a first in first out (FIFO) buffer to queue the ray messages generated by the EUs; a cache to store one or more of the plurality of primary rays; a memory-backed stack to store a first subset of the plurality of ray messages in a corresponding plurality of entries; memory-backed stack management circuitry to either store a second subset of the plurality of ray messages to the memory-backed stack, or to temporarily store one or more of the second subset of the plurality of ray messages to a memory subsystem based, at least in part, on a number of entries currently occupied by ray messages in the memory-backed stack; and ray traversal circuitry to read a next ray message from the memory-backed stack, retrieve a next primary ray identified by the ray message from the cache or a memory subsystem, and perform traversal operations on the next primary ray.
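A rough software analogue of the occupancy-based decision in this abstract is sketched below. The class name `ThrottledStack`, the fixed `spill_threshold`, and the policy of refilling the stack from the oldest spilled message after each read are assumptions for illustration; the abstract states only that the choice depends, at least in part, on the number of occupied entries.

```python
class ThrottledStack:
    """Toy model of a stack that spills new messages once it is too full."""

    def __init__(self, spill_threshold):
        self.spill_threshold = spill_threshold
        self.stack = []    # stands in for the memory-backed stack entries
        self.spilled = []  # stands in for temporary storage in the memory subsystem

    def push(self, message):
        """Store on the stack if occupancy allows, otherwise spill."""
        if len(self.stack) < self.spill_threshold:
            self.stack.append(message)
            return "stack"
        self.spilled.append(message)
        return "memory"

    def pop(self):
        """Read the next message, then refill from the oldest spilled one."""
        if not self.stack:
            return None
        message = self.stack.pop()
        if self.spilled:
            self.stack.append(self.spilled.pop(0))
        return message


s = ThrottledStack(spill_threshold=2)
destinations = [s.push(m) for m in ("a", "b", "c")]
reads = [s.pop() for _ in range(4)]
```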
-
Publication No.: US20240013337A1
Publication date: 2024-01-11
Application No.: US18351898
Filing date: 2023-07-13
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , John C. Weast , Mike B. Macpherson , Linda L. Hurd , Sara S. Baghsorkhi , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Kamal Sinha , Joydeep Ray , Balaji Vembu , Sanjeev Jahagirdar , Vasanth Ranganathan , Dukhwan Kim
Abstract: A mechanism is described for detecting, at training time, information related to one or more tasks to be performed by the one or more processors according to a training dataset for a neural network, analyzing the information to determine one or more portions of hardware of a processor of the one or more processors that is configurable to support the one or more tasks, configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks, and monitoring utilization of the hardware via a hardware unit of the graphics processor and, via a scheduler of the graphics processor, adjusting allocation of the one or more tasks to the one or more portions of the hardware based on the utilization.
-
Publication No.: US20240012767A1
Publication date: 2024-01-11
Application No.: US18358550
Filing date: 2023-07-25
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Michael Macpherson , Aravindh V. Anantaraman , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Varghese George , Abhishek Appu , Prasoonkumar Surti
CPC classification number: G06T15/005 , G06F9/3013 , G06F9/38873
Abstract: An apparatus to facilitate efficient data sharing for graphics data processing operations is disclosed. The apparatus includes a processing resource to generate a stream of instructions, an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to determine that a set of memory requests in the stream of instructions access a same memory page; and set a marker in a first request of the set of memory requests; and arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the memory page and to, in response to receiving the first request with the marker set, remain with the processing resource to process the set of memory requests.
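The on-page detection step in this abstract can be sketched as a pass over a request stream that marks the first request of each run of accesses to the same memory page. The 4 KiB page size, the dictionary request layout, and the function name are assumptions for illustration; the arbitration behavior is not modeled.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def mark_same_page_runs(addresses, page_size=PAGE_SIZE):
    """Set a marker on the first request of each run hitting one page."""
    requests = [
        {"addr": a, "page": a // page_size, "marker": False} for a in addresses
    ]
    i = 0
    while i < len(requests):
        j = i
        # Extend the run while the following request falls on the same page.
        while j + 1 < len(requests) and requests[j + 1]["page"] == requests[i]["page"]:
            j += 1
        if j > i:  # only runs of two or more requests get a marker
            requests[i]["marker"] = True
        i = j + 1
    return requests


marked = mark_same_page_runs([0x1000, 0x1040, 0x1080, 0x3000])
markers = [r["marker"] for r in marked]
```

An arbiter that sees the marker could then keep servicing the same requester until the run ends, which is the behavior the abstract attributes to the arbitration circuitry.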
-
Publication No.: US20230418355A1
Publication date: 2023-12-28
Application No.: US18339827
Filing date: 2023-06-22
Applicant: Intel Corporation
Inventor: Altug Koker , Abhishek R. Appu , Kiran C. Veernapu , Joydeep Ray , Balaji Vembu , Prasoonkumar Surti , Kamal Sinha , Eric J. Hoekstra , Wenyin Fu , Nikos Kaburlasos , Bhushan M. Borole , Travis T. Schluessler , Ankur N. Shah , Jonathan Kennedy
IPC: G06F1/3209 , H04W52/02 , G06F1/324 , G06F1/3203 , G06F1/3212 , G06F1/3218 , G06F1/3231 , G06F3/01 , G06F11/07 , G06F11/30
CPC classification number: G06F1/3209 , H04W52/0258 , G06F1/324 , G06F1/3203 , G06F1/3212 , G06F1/3218 , G06F1/3231 , G06F3/01 , G06F11/0781 , G06F11/3062 , Y02D10/00 , Y02D30/70 , H04M1/72448
Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to collect user information for a user of a data processing device, generate a user profile for the user of the data processing device from the user information, and set a power profile of a processor in the data processing device using the user profile. Other embodiments are also disclosed and claimed.
-
Publication No.: US11810222B2
Publication date: 2023-11-07
Application No.: US17521009
Filing date: 2021-11-08
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti , Larry Seiler , Adam Z. Leibel
CPC classification number: G06T1/20 , G06T15/005
Abstract: Systems and methods may provide for receiving a pixel shader and sending the pixel shader to shader bypass hardware if the pixel shader and a render target associated with the pixel shader satisfy a simplicity condition. In one example, the shader bypass hardware is dedicated to pixel shaders and associated render targets that satisfy the simplicity condition.
-
Publication No.: US20230351543A1
Publication date: 2023-11-02
Application No.: US18310688
Filing date: 2023-05-02
Applicant: Intel Corporation
Inventor: Joydeep Ray , Scott Janus , Varghese George , Subramaniam Maiyuran , Altug Koker , Abhishek Appu , Prasoonkumar Surti , Vasanth Ranganathan , Valentin Andrei , Ashutosh Garg , Yoav Harel , Arthur Hunter, Jr. , SungYe Kim , Mike Macpherson , Elmoustapha Ould-Ahmed-Vall , William Sadler , Lakshminarayanan Striramassarma , Vikranth Vemulapalli
IPC: G06N3/084 , G06F15/80 , G06F17/16 , G06N3/048 , G06T1/20 , G06F9/50 , G06F12/0806 , G06F7/544 , G06N3/08
CPC classification number: G06T1/20 , G06F7/5443 , G06F9/5027 , G06F12/0806 , G06F15/8046 , G06F17/16 , G06N3/048 , G06N3/08 , G06N3/084
Abstract: Embodiments described herein include software, firmware, and hardware logic that provide techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiments described herein provide techniques to detect zero value elements within a vector or a set of packed data elements output by a processing resource and generate metadata to indicate a location of the zero value elements within the plurality of data elements.
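The zero-detection metadata in this abstract can be pictured as a bitmask with one bit per element. The sketch below assumes that encoding (bit i set when element i is zero) and adds a hypothetical `sparse_dot` consumer to show how such metadata might let arithmetic skip zero operands; neither function is taken from the patent.

```python
def zero_metadata(elements):
    """Return a bitmask whose bit i is set when elements[i] is zero."""
    bitmask = 0
    for i, value in enumerate(elements):
        if value == 0:
            bitmask |= 1 << i
    return bitmask


def sparse_dot(a, b):
    """Dot product that skips every position flagged as zero in either input."""
    skip = zero_metadata(a) | zero_metadata(b)
    return sum(
        x * y
        for i, (x, y) in enumerate(zip(a, b))
        if not (skip >> i) & 1
    )


mask = zero_metadata([3, 0, 0, 5])
result = sparse_dot([3, 0, 0, 5], [1, 2, 0, 4])
```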