-
公开(公告)号:US11074072B2
公开(公告)日:2021-07-27
申请号:US16505012
申请日:2019-07-08
Applicant: Intel Corporation
Inventor: Kevin Nealis , Anbang Yao , Xiaoming Chen , Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha
Abstract: One embodiment provides for a compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including a multi-bit input value and a bipolar binary weight associated with a neural network and an arithmetic logic unit including a multiplier, an adder, and an accumulator register. To execute the decoded instruction, the multiplier is to perform a multiplication operation on the multi-bit input based on the bipolar binary weight to generate an intermediate product and the adder is to add the intermediate product to a value stored in the accumulator register and update the value stored in the accumulator register.
-
公开(公告)号:US20210216467A1
公开(公告)日:2021-07-15
申请号:US16952817
申请日:2020-11-19
Applicant: Intel Corporation
Inventor: Altug Koker , Balaji Vembu , Joydeep Ray , Abhishek R. Appu
IPC: G06F12/0895 , G06F12/126 , G06F12/02 , G06T1/60
Abstract: A mechanism is described for facilitating optimization of cache associated with graphics processors at computing devices. A method of embodiments, as described herein, includes introducing coloring bits to contents of a cache associated with a processor including a graphics processor, wherein the coloring bits to represent a signal identifying one or more caches available for use, while avoiding explicit invalidations and flushes.
-
公开(公告)号:US20210192676A1
公开(公告)日:2021-06-24
申请号:US17197126
申请日:2021-03-10
Applicant: Intel Corporation
Inventor: Balaji Vembu , Altug Koker , Joydeep Ray
Abstract: One embodiment provides an apparatus comprising an interconnect fabric comprising one or more fabric switches, a plurality of memory interfaces coupled to the interconnect fabric to provide access to a plurality of memory devices, an input/output (TO) interface coupled to the interconnect fabric to provide access to IO devices, an array of multiprocessors coupled to the interconnect fabric, scheduling circuitry to distribute a plurality of thread groups across the array of multiprocessors, each thread group comprising a plurality of threads and each thread comprising a plurality of instructions to be executed by at least one of the multiprocessors, and a first multiprocessor of the array of multiprocessors to be assigned to process a first thread group comprising a first plurality of threads, the first multiprocessor comprising a plurality of parallel execution circuits.
-
44.
公开(公告)号:US20210182058A1
公开(公告)日:2021-06-17
申请号:US17169232
申请日:2021-02-05
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
-
公开(公告)号:US10955896B2
公开(公告)日:2021-03-23
申请号:US16782791
申请日:2020-02-05
Applicant: INTEL CORPORATION
Inventor: Abhishek R. Appu , Altug Koker , Eric J. Hoekstra , Kiran C. Veernapu , Prasoonkumar Surti , Vasanth Ranganathan , Kamal Sinha , Balaji Vembu , Eric J. Asperheim , Sanjeev S. Jahagirdar , Joydeep Ray
IPC: G06F3/06 , G06F1/3225 , G06F1/3234
Abstract: Methods and apparatus relating to techniques for avoiding cache lookup for cold cache. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive data for a current write operation to a memory, determine a number of bits in the received data for the current write operation to the memory which have changed from a previous write operation to the memory and in response to a determination that the number of bits in the received data for the current write operation to the memory which have changed from a previous write operation to the memory exceeds a threshold, to toggle a plurality of bits in the data for the current write operation to create an encoded data set and set an indicator bit to a value which indicates that the plurality of bits have been toggled. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US10922085B2
公开(公告)日:2021-02-16
申请号:US16825129
申请日:2020-03-20
Applicant: Intel Corporation
Inventor: Balaji Vembu , Abhishek R. Appu , Joydeep Ray , Altug Koker
IPC: G06T1/20 , G06F9/50 , G06F9/48 , G06F9/38 , G06F9/46 , G06F9/52 , G06F9/54 , G06F15/16 , G06F15/76 , G06F12/0897 , G06F12/0866 , G06T1/60
Abstract: An apparatus to facilitate thread scheduling is disclosed. The apparatus includes logic to store barrier usage data based on a magnitude of barrier messages in an application kernel and a scheduler to schedule execution of threads across a plurality of multiprocessors based on the barrier usage data.
-
公开(公告)号:US20210035254A1
公开(公告)日:2021-02-04
申请号:US16924895
申请日:2020-07-09
Applicant: Intel Corporation
Inventor: Altug Koker , Ingo Wald , David Puffer , Subramaniam M. Maiyuran , Prasoonkumar Surti , Balaji Vembu , Guei-Yuan Lueh , Murali Ramadoss , Abhishek R. Appu , Joydeep Ray
Abstract: One embodiment provides for a parallel processor comprising a processing array within the parallel processor, the processing array including multiple compute blocks, each compute block including multiple processing clusters configured for parallel operation, wherein each of the multiple compute blocks is independently preemptable. In one embodiment a preemption hint can be generated for source code during compilation to enable a compute unit to determine an efficient point for preemption.
-
公开(公告)号:US10908865B2
公开(公告)日:2021-02-02
申请号:US16586043
申请日:2019-09-27
Applicant: Intel Corporation
Inventor: Deepak S. Vembar , Atsuo Kuwahara , Chandrasekaran Sakthivel , Radhakrishnan Venkataraman , Brent E. Insko , Anupreet S. Kalra , Hughes Labbe , Altug Koker , Michael Apodaca , Kai Xiao , Jeffery S. Boles , Adam T. Lake , David M. Cimini , Balaji Vembu , Elmoustapha Ould-Ahmed-Vall , Jacek Kwiatkowski , Philip R. Laws , Ankur N. Shah , Abhishek R. Appu , Joydeep Ray , Wenyin Fu , Nikos Kaburlasos , Prasoonkumar Surti , Bhushan M. Borole
Abstract: An embodiment of a graphics apparatus may include a processor, memory communicatively coupled to the processor, and a collaboration engine communicatively coupled to the processor to identify a shared graphics component between two or more users in an environment, and share the shared graphics components with the two or more users in the environment. Embodiments of the collaboration engine may include one or more of a centralized sharer, a depth sharer, a shared preprocessor, a multi-port graphics subsystem, and a decode sharer. Other embodiments are disclosed and claimed.
-
公开(公告)号:US10901909B2
公开(公告)日:2021-01-26
申请号:US15488592
申请日:2017-04-17
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Joydeep Ray , Altug Koker , Balaji Vembu , Kamal Sinha , Prasoonkumar Surti , Wenyin Fu , Bhushan M. Borole , Vasanth Ranganathan
IPC: G06T1/60 , G06F12/0893 , G06F12/0891 , G06F12/0846 , G06F12/0875 , G06F12/0811
Abstract: In accordance with some embodiments, a separate pipe is used in graphics processor for handling accesses, namely reads, to read only (RO) surfaces within caches. Moreover, the caches may have defined read only section and defined read write (RW) sections. The read only section may be accessed through a dedicated read only pipe and the read write section may be accessed through a read write pipe for those surfaces that can also be written. Thus, the read only sections are handled in a read only fashion without the need to accommodate writes.
-
公开(公告)号:US10853132B2
公开(公告)日:2020-12-01
申请号:US16379565
申请日:2019-04-09
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Balaji Vembu , James A. Valerio , Abhishek R. Appu
Abstract: A mechanism is described for facilitating memory-based software barriers to emulate hardware barriers at graphics processors in computing devices. A method of embodiments, as described herein, includes facilitating converting thread scheduling at a processor from hardware barriers to software barriers, where the software barriers emulate the hardware barriers.
-
-
-
-
-
-
-
-
-