Patent search ap:("Intel Corporation") AND inv:"Balaji Vembu" Page 4

31.

发明授权
Apparatus and method for protecting content in virtualized and graphics environments 有权

公开(公告)号：US11341212B2

公开(公告)日：2022-05-24

申请号：US16792822

申请日：2020-02-17

Applicant: INTEL CORPORATION

Inventor： Joydeep Ray , Abhishek R. Appu , Pattabhiraman K , Balaji Vembu , Altug Koker

IPC: G06F21/10 , G06F12/14 , G06F12/0895 , G06F9/455 , G06F12/0815 , G06T15/00 , H04N19/00 , H04N21/4405

Abstract: An apparatus and method for protecting content in a graphics processor. For example, one embodiment of an apparatus comprises: encode/decode circuitry to decode protected audio and/or video content to generate decoded audio and/or video content; a graphics cache of a graphics processing unit (GPU) to store the decoded audio and/or video content; first protection circuitry to set a protection attribute for each cache line containing the decoded audio and/or video data in the graphics cache; a cache coherency controller to generate a coherent read request to the graphics cache; second protection circuitry to read the protection attribute to determine whether the cache line identified in the read request is protected, wherein if it is protected, the second protection circuitry to refrain from including at least some of the data from the cache line in a response.

32.

发明申请
COMPUTE OPTIMIZATION MECHANISM FOR DEEP NEURAL NETWORKS 有权

公开(公告)号：US20220156876A1

公开(公告)日：2022-05-19

申请号：US17529862

申请日：2021-11-18

Applicant: Intel Corporation

Inventor： Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu

IPC: G06T1/20 , G06T1/60 , G09G5/36 , G06F3/06 , G06N3/08 , G06F3/14 , G06N3/04 , G06N3/063

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes one or more processing units to provide a first set of shader operations associated with a shader stage of a graphics pipeline, a scheduler to schedule shader threads for processing, and a field-programmable gate array (FPGA) dynamically configured to provide a second set of shader operations associated with the shader stage of the graphics pipeline.

33.

发明授权
Compute optimization mechanism for deep neural networks 有权

公开(公告)号：US11334962B2

公开(公告)日：2022-05-17

申请号：US17385693

申请日：2021-07-26

Applicant: Intel Corporation

Inventor： Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu

IPC: G06T1/20 , G06F9/455 , G06F9/50 , G06N3/04 , G06N3/063 , G06N3/08 , G06F8/41

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of processing cores of a first type and a second type. A first set of processing cores of a first type perform multi-dimensional matrix operations and a second set of processing cores of a second type perform general purpose graphics processing unit (GPGPU) operations.

34.

发明授权
Replacement policies for a hybrid hierarchical cache 有权

公开(公告)号：US11263152B2

公开(公告)日：2022-03-01

申请号：US16881271

申请日：2020-05-22

Applicant: Intel Corporation

Inventor： Abhishek R. Appu , Joydeep Ray , James A. Valerio , Altug Koker , Prasoonkumar Surti , Balaji Vembu , Wenyin Fu , Bhushan M. Borole , Kamal Sinha

IPC: G06F12/12 , G06F12/128 , G06F12/0811 , G06F13/40 , G06T1/60 , G06F12/0897 , G06F12/084

Abstract: A hybrid hierarchical cache is implemented at the same level in the access pipeline, to get the faster access behavior of a smaller cache and, at the same time, a higher hit rate at lower power for a larger cache, in some embodiments. A split cache at the same level in the access pipeline includes two caches that work together. In the hybrid, split, low level cache (e.g., L1) evictions are coordinated locally between the two L1 portions, and on a miss to both L1 portions, a line is allocated from a larger L2 cache to the smallest L1 cache.

35.

发明授权
Order independent asynchronous compute and streaming for graphics 有权

公开(公告)号：US11257274B2

公开(公告)日：2022-02-22

申请号：US16887439

申请日：2020-05-29

Applicant: Intel Corporation

Inventor： Devan Burke , Adam T. Lake , Jeffery S. Boles , John H. Feit , Karthik Vaidyanathan , Abhishek R. Appu , Joydeep Ray , Subramaniam Maiyuran , Altug Koker , Balaji Vembu , Murali Ramadoss , Prasoonkumar Surti , Eric J. Hoekstra , Gabor Liktor , Jonathan Kennedy , Slawomir Grajewski , Elmoustapha Ould-Ahmed-Vall

IPC: G06T15/00 , G06F9/48 , G06T15/04 , G06T15/80 , G06T17/10 , G06T17/20

Abstract: An embodiment of an electronic processing system may include an application processor, persistent storage media communicatively coupled to the application processor, and a graphics subsystem communicatively coupled to the application processor. The system may include one or more of a draw call re-orderer communicatively coupled to the application processor and the graphics subsystem to re-order two or more draw calls, a workload re-orderer communicatively coupled to the application processor and the graphics subsystem to re-order two or more work items in an order independent mode, a queue primitive included in at least one of the two or more draw calls to define a producer stage and a consumer stage, and an order-independent executor communicatively coupled to the application processor and the graphics subsystem to provide tile-based order independent execution of a compute stage. Other embodiments are disclosed and claimed.

36.

发明申请
COMPUTE OPTIMIZATION MECHANISM FOR DEEP NEURAL NETWORKS 有权

公开(公告)号：US20210350499A1

公开(公告)日：2021-11-11

申请号：US17385693

申请日：2021-07-26

Applicant: Intel Corporation

Inventor： Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu

IPC: G06T1/20 , G06F9/455 , G06F9/50 , G06N3/04 , G06N3/063 , G06N3/08

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of processing cores of a first type and a second type. A first set of processing cores of a first type perform multi-dimensional matrix operations and a second set of processing cores of a second type perform general purpose graphics processing unit (GPGPU) operations.

37.

发明授权
Instructions and logic to perform floating-point and integer operations for machine learning 有权

公开(公告)号：US11169799B2

公开(公告)日：2021-11-09

申请号：US16432402

申请日：2019-06-05

Applicant: Intel Corporation

Inventor： Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar

IPC: G06F9/30 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/063 , G06N20/00 , G09G5/393 , G06N3/04 , G06N3/08 , G06T15/00

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.

38.

发明申请
ADAPTIVE COMPUTE SIZE PER WORKLOAD 有权

公开(公告)号：US20210287328A1

公开(公告)日：2021-09-16

申请号：US17211743

申请日：2021-03-24

Applicant: INTEL CORPORATION

Inventor： Balaji Vembu , Josh B. Mastronarde , Altug Koker , Nikos Kaburlasos , Abhishek R. Appu , Joydeep Ray

IPC: G06T1/60 , G06T1/20 , G06F1/329 , G06F9/48

Abstract: Methods and apparatus relating to techniques for power management. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive one or more frames for a workload, determine one or more compute resource parameters for the workload, and store the one or more compute resource parameters for the workload in a memory in association with workload context data for the workload. Other embodiments are also disclosed and claimed.

39.

发明申请
WORKLOAD SCHEDULING AND DISTRIBUTION ON A DISTRIBUTED GRAPHICS DEVICE 有权

公开(公告)号：US20210241418A1

公开(公告)日：2021-08-05

申请号：US17234039

申请日：2021-04-19

Applicant: Intel Corporation

Inventor： Balaji Vembu , Brandon Fliflet , James Valerio , Michael Apodaca , Ben Ashbaugh , Hema Nalluri , Ankur Shah , Murali Ramadoss , David Puffer , Altug Koker , Aditya Navale , Abhishek R. Appu , Joydeep Ray , Travis Schluessler

IPC: G06T1/20 , G06F9/48 , G06F9/50 , G06F9/52 , G06T1/60

Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.