-
公开(公告)号:US12130739B2
公开(公告)日:2024-10-29
申请号:US16833304
申请日:2020-03-27
申请人: Intel Corporation
发明人: Ayan Mandal , Neetu Jindal , Leon Polishuk , Yossi Grotas , Aravindh Anantaraman
IPC分类号: G06F12/00 , G06F12/0811 , G06F12/0831 , G06F12/123
CPC分类号: G06F12/0811 , G06F12/0831 , G06F12/123 , G06F2212/1021
摘要: Systems, methods, and apparatuses relating to circuitry to implement dynamic bypassing of last level cache are described. In one embodiment, a hardware processor includes a cache to store a plurality of cache lines of data, a processing element to generate a memory request and mark the memory request with a reuse hint value, and a cache controller circuit to mark a corresponding cache line in the cache as more recently used when the memory request is a read request that is a hit in the cache and the reuse hint value is a first value, and mark the corresponding cache line in the cache as less recently used when the memory request is the read request that is the hit in the cache and the reuse hint value is a second, different value.
-
公开(公告)号:US20240345990A1
公开(公告)日:2024-10-17
申请号:US18626775
申请日:2024-04-04
申请人: Intel Corporation
发明人: Lakshminarayanan Striramassarma , Prasoonkumar Surti , Varghese George , Ben Ashbaugh , Aravindh Anantaraman , Valentin Andrei , Abhishek Appu , Nicolas Galoppo Von Borries , Altug Koker , Mike Macpherson , Subramaniam Maiyuran , Nilay Mistry , Elmoustapha Ould-Ahmed-Vall , Selvakumar Panneer , Vasanth Ranganathan , Joydeep Ray , Ankur Shah , Saurabh Tangri
IPC分类号: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06N3/08 , G06T1/20 , G06T1/60 , G06T15/06 , H03M7/46
CPC分类号: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
摘要: Multi-tile Memory Management for Detecting Cross Tile Access, Providing Multi-Tile Inference Scaling with multicasting of data via copy operation, and Providing Page Migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU.
-
公开(公告)号:US20240345956A1
公开(公告)日:2024-10-17
申请号:US18754499
申请日:2024-06-26
IPC分类号: G06F12/0811 , G06F9/30 , G06F9/38 , G06F9/46 , G06F9/54 , G06F11/30 , G06F12/0808 , G06F12/0815 , G06F12/0817 , G06F12/0831 , G06F12/084 , G06F12/0895 , G06F12/128 , G06F13/16
CPC分类号: G06F12/0811 , G06F9/3004 , G06F9/30047 , G06F9/30079 , G06F9/3867 , G06F9/467 , G06F9/544 , G06F9/546 , G06F11/3037 , G06F12/0808 , G06F12/0815 , G06F12/0828 , G06F12/0831 , G06F12/084 , G06F12/0895 , G06F12/128 , G06F13/1668 , G06F2212/1021 , G06F2212/608
摘要: An apparatus includes a CPU core and a L1 cache subsystem including a L1 main cache, a L1 victim cache, and a L1 controller. The apparatus includes a L2 cache subsystem coupled to the L1 cache subsystem by a transaction bus and a tag update bus. The L2 cache subsystem includes a L2 main cache, a shadow L1 main cache, a shadow L1 victim cache, and a L2 controller. The L2 controller receives a message from the L1 controller over the tag update bus, including a valid signal, an address, and a coherence state. In response to the valid signal being asserted, the L2 controller identifies an entry in the shadow L1 main cache or the shadow L1 victim cache having an address corresponding to the address of the message and updates a coherence state of the identified entry to be the coherence state of the message.
-
公开(公告)号:US12099446B2
公开(公告)日:2024-09-24
申请号:US18131348
申请日:2023-04-05
发明人: Kevin David Howard
IPC分类号: G06F12/08 , G06F12/0842
CPC分类号: G06F12/0842 , G06F2212/1021
摘要: Methods and systems for existing software applications to automatically take advantage of multicore computer systems outside of the conventional simultaneous processing of multiple applications and without performance problems from cache misses and mismatched task processing times are presented. Unlike other multicore optimization techniques, the present invention uses techniques that are applied to design graphs and work for scaled and standard speedup-based parallel processing. The methods and systems optimize software designs that are attached to code for maximum performance on multicore computer hardware by analyzing and modifying loop structures to produce a general parallel solution, not just simple loop unrolling.
-
公开(公告)号:US12079155B2
公开(公告)日:2024-09-03
申请号:US17428216
申请日:2020-03-14
申请人: Intel Corporation
发明人: Joydeep Ray , Selvakumar Panneer , Saurabh Tangri , Ben Ashbaugh , Scott Janus , Abhishek Appu , Varghese George , Ravishankar Iyer , Nilesh Jain , Pattabhiraman K , Altug Koker , Mike MacPherson , Josh Mastronarde , Elmoustapha Ould-Ahmed-Vall , Jayakrishna P. S , Eric Samson
IPC分类号: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06
CPC分类号: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
摘要: Embodiments described herein include software, firmware, and hardware that provides techniques to enable deterministic scheduling across multiple general-purpose graphics processing units. One embodiment provides a multi-GPU architecture with uniform latency. One embodiment provides techniques to distribute memory output based on memory chip thermals. One embodiment provides techniques to enable thermally aware workload scheduling. One embodiment provides techniques to enable end to end contracts for workload scheduling on multiple GPUs.
-
公开(公告)号:US12061909B2
公开(公告)日:2024-08-13
申请号:US18312380
申请日:2023-05-04
发明人: Salma Ayub , Sundeep Chadha , Robert Allen Cordes , David Allen Hrusecky , Hung Qui Le , Dung Quoc Nguyen , Brian William Thompto
IPC分类号: G06F9/38 , G06F9/30 , G06F12/0875
CPC分类号: G06F9/3802 , G06F9/30043 , G06F9/30145 , G06F9/3836 , G06F9/3885 , G06F12/0875 , G06F2212/1021 , G06F2212/452 , G06F2212/608
摘要: An execution unit circuit for use in a processor core provides efficient use of area and energy by reducing the per-entry storage requirement of a load-store unit issue queue. The execution unit circuit includes a recirculation queue that stores the effective address of the load and store operations and the values to be stored by the store operations. A queue control logic controls the recirculation queue and issue queue so that that after the effective address of a load or store operation has been computed, the effective address of the load operation or the store operation is written to the recirculation queue and the operation is removed from the issue queue, so that address operands and other values that were in the issue queue entry no longer require storage. When a load or store operation is rejected by the cache unit, it is subsequently reissued from the recirculation queue.
-
公开(公告)号:US12056051B2
公开(公告)日:2024-08-06
申请号:US17723559
申请日:2022-04-19
IPC分类号: G06F12/0811 , G06F9/30 , G06F9/38 , G06F9/46 , G06F9/54 , G06F11/30 , G06F12/0808 , G06F12/0815 , G06F12/0817 , G06F12/0831 , G06F12/084 , G06F12/0895 , G06F12/128 , G06F13/16
CPC分类号: G06F12/0811 , G06F9/3004 , G06F9/30047 , G06F9/30079 , G06F9/3867 , G06F9/467 , G06F9/544 , G06F9/546 , G06F11/3037 , G06F12/0808 , G06F12/0815 , G06F12/0828 , G06F12/0831 , G06F12/084 , G06F12/0895 , G06F12/128 , G06F13/1668 , G06F2212/1021 , G06F2212/608
摘要: An apparatus includes a CPU core and a L1 cache subsystem including a L1 main cache, a L1 victim cache, and a L1 controller. The apparatus includes a L2 cache subsystem coupled to the L1 cache subsystem by a transaction bus and a tag update bus. The L2 cache subsystem includes a L2 main cache, a shadow L1 main cache, a shadow L1 victim cache, and a L2 controller. The L2 controller receives a message from the L1 controller over the tag update bus, including a valid signal, an address, and a coherence state. In response to the valid signal being asserted, the L2 controller identifies an entry in the shadow L1 main cache or the shadow L1 victim cache having an address corresponding to the address of the message and updates a coherence state of the identified entry to be the coherence state of the message.
-
公开(公告)号:US12038845B2
公开(公告)日:2024-07-16
申请号:US17029913
申请日:2020-09-23
申请人: Intel Corporation
IPC分类号: G06F12/00 , G06F12/0895
CPC分类号: G06F12/0895 , G06F2212/1021
摘要: Techniques and mechanisms for identifying tag information that describes data to be cached at a processor. In an embodiment, a memory controller services a memory access request from the processor, wherein the memory controller reads multiple chunks of data from a memory device, and determines first tag information which corresponds to the multiple chunks. One or more of the multiple chunks are sent to the processor in a response to the request. Based on the first tag information, the memory controller detects for a match—if any—between at least two tags. Where such a match is detected, the memory controller further indicates to the processor that second tag information corresponds to the one or more chunks. In another embodiment, the first tag information is more granular than the second tag information.
-
公开(公告)号:US20240232100A1
公开(公告)日:2024-07-11
申请号:US18584151
申请日:2024-02-22
IPC分类号: G06F12/128 , G06F9/30 , G06F9/54 , G06F11/10 , G06F12/02 , G06F12/0802 , G06F12/0804 , G06F12/0806 , G06F12/0811 , G06F12/0815 , G06F12/0817 , G06F12/0853 , G06F12/0855 , G06F12/0864 , G06F12/0884 , G06F12/0888 , G06F12/0891 , G06F12/0895 , G06F12/0897 , G06F12/12 , G06F12/121 , G06F12/126 , G06F12/127 , G06F13/16 , G06F15/80 , G11C5/06 , G11C7/10 , G11C7/22 , G11C29/42 , G11C29/44
CPC分类号: G06F12/128 , G06F9/3001 , G06F9/30043 , G06F9/30047 , G06F9/546 , G06F11/1064 , G06F12/0215 , G06F12/0238 , G06F12/0292 , G06F12/0802 , G06F12/0804 , G06F12/0806 , G06F12/0811 , G06F12/0815 , G06F12/082 , G06F12/0853 , G06F12/0855 , G06F12/0864 , G06F12/0884 , G06F12/0888 , G06F12/0891 , G06F12/0895 , G06F12/0897 , G06F12/12 , G06F12/121 , G06F12/126 , G06F12/127 , G06F13/1605 , G06F13/1642 , G06F13/1673 , G06F13/1689 , G06F15/8069 , G11C5/066 , G11C7/10 , G11C7/1015 , G11C7/106 , G11C7/1075 , G11C7/1078 , G11C7/1087 , G11C7/222 , G11C29/42 , G11C29/44 , G06F2212/1016 , G06F2212/1021 , G06F2212/1024 , G06F2212/1041 , G06F2212/1044 , G06F2212/301 , G06F2212/454 , G06F2212/603 , G06F2212/6032 , G06F2212/6042 , G06F2212/608 , G06F2212/62
摘要: Methods, apparatus, systems and articles of manufacture are disclosed to reduce read-modify-write cycles for non-aligned writes. An example apparatus includes a memory that includes a plurality of memory banks, an interface configured to be coupled to a central processing unit, the interface to obtain a write operation from the central processing unit, wherein the write operation is to write a subset of the plurality of memory banks, and bank processing logic coupled to the interface and to the memory, the bank processing logic to determine the subset of the plurality of memory banks to write based on the write operation, and determine whether to cause a read operation to be performed in response to the write operation based on whether a number of addresses in the subset of the plurality of memory banks to write satisfies a threshold.
-
公开(公告)号:US20240211405A1
公开(公告)日:2024-06-27
申请号:US18504468
申请日:2023-11-08
申请人: NXP USA, Inc.
发明人: Rajan Srivastava , Sourav Roy
IPC分类号: G06F12/0875 , G06F12/0811
CPC分类号: G06F12/0875 , G06F12/0811 , G06F2212/1021
摘要: A system on chip (SoC) architecture includes an integrated branch and cache hit-miss trace circuit operably coupled to a CPU core, a first trace circuit, and a cache hit-miss trace circuit. Following an occurrence of a cache-fetch instruction: the cache hit-miss trace circuit identifies whether the fetch instruction is a cache-missed instruction, and, in response thereto, sends a cache miss report message that includes a fetch instruction address to the first trace circuit. The first trace circuit is configured to identify whether the fetch instruction is a taken-branch instruction and creates a modified branch trace response message (BTM) that includes the fetch instruction address and sends the modified BTM to a create trace messages circuit. The modified BTM indicates an instruction address of the cache miss.
-
-
-
-
-
-
-
-
-