-
Publication No.: US20240345990A1
Publication Date: 2024-10-17
Application No.: US18626775
Application Date: 2024-04-04
Applicant: Intel Corporation
Inventor: Lakshminarayanan Striramassarma , Prasoonkumar Surti , Varghese George , Ben Ashbaugh , Aravindh Anantaraman , Valentin Andrei , Abhishek Appu , Nicolas Galoppo Von Borries , Altug Koker , Mike Macpherson , Subramaniam Maiyuran , Nilay Mistry , Elmoustapha Ould-Ahmed-Vall , Selvakumar Panneer , Vasanth Ranganathan , Joydeep Ray , Ankur Shah , Saurabh Tangri
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06N3/08 , G06T1/20 , G06T1/60 , G06T15/06 , H03M7/46
CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
Abstract: Multi-tile memory management for detecting cross-tile access, providing multi-tile inference scaling with multicasting of data via a copy operation, and providing page migration is disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory, and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross-tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when such frequent cross-tile accesses occur.
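A minimal sketch of the kind of detection logic this abstract describes: the memory controller counts cross-tile accesses over a sampling window and raises a message to start the data transfer mechanism once they become frequent. The window size, threshold, and all names below are illustrative assumptions, not taken from the patent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values; the patent does not specify concrete thresholds. */
#define WINDOW_ACCESSES   4096   /* accesses observed per sampling window   */
#define CROSS_TILE_LIMIT  1024   /* "frequent" if more than this are remote */

typedef struct {
    uint32_t total_accesses;       /* all accesses seen this window              */
    uint32_t cross_tile_accesses;  /* accesses targeting the other GPU's memory  */
} tile_access_stats;

/* Called by the memory controller for each access it services. Returns true
 * when a message should be sent to initiate the data transfer mechanism
 * (for example, page migration or a multicast copy). */
bool record_access(tile_access_stats *s, bool targets_remote_tile)
{
    s->total_accesses++;
    if (targets_remote_tile)
        s->cross_tile_accesses++;

    if (s->total_accesses < WINDOW_ACCESSES)
        return false;

    bool frequent = s->cross_tile_accesses > CROSS_TILE_LIMIT;
    s->total_accesses = 0;         /* start a new sampling window */
    s->cross_tile_accesses = 0;
    return frequent;
}
```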
-
Publication No.: US20240330203A1
Publication Date: 2024-10-03
Application No.: US18739768
Application Date: 2024-06-11
Applicant: Texas Instruments Incorporated
Inventor: Mujibur Rahman , Timothy David Anderson
IPC: G06F12/1045 , G06F7/24 , G06F7/487 , G06F7/499 , G06F7/53 , G06F7/57 , G06F9/30 , G06F9/32 , G06F9/345 , G06F9/38 , G06F9/48 , G06F11/00 , G06F11/10 , G06F12/0862 , G06F12/0875 , G06F12/0897 , G06F12/1009 , G06F15/78 , G06F17/16 , H03H17/06
CPC classification number: G06F12/1045 , G06F7/24 , G06F7/487 , G06F7/4876 , G06F7/49915 , G06F7/53 , G06F7/57 , G06F9/3001 , G06F9/30014 , G06F9/30021 , G06F9/30032 , G06F9/30036 , G06F9/30065 , G06F9/30072 , G06F9/30098 , G06F9/30112 , G06F9/30145 , G06F9/30149 , G06F9/3016 , G06F9/32 , G06F9/345 , G06F9/3802 , G06F9/3818 , G06F9/383 , G06F9/3836 , G06F9/3851 , G06F9/3856 , G06F9/3867 , G06F9/3887 , G06F9/48 , G06F11/00 , G06F11/1048 , G06F12/0862 , G06F12/0875 , G06F12/0897 , G06F12/1009 , G06F17/16 , H03H17/0664 , G06F9/30018 , G06F9/325 , G06F9/381 , G06F9/3822 , G06F11/10 , G06F15/7807 , G06F15/781 , G06F2212/452 , G06F2212/60 , G06F2212/602 , G06F2212/68
Abstract: Devices and methods are provided for performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers. In an example, a device includes a processor that includes a multiply circuit. The multiply circuit is configured to multiply floating point numbers in response to a floating point multiply instruction, and is further configured to determine values of implied bits of mantissas of the floating point numbers, and multiply the mantissas in parallel with the determining operation.
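A rough illustration of the mantissa handling the abstract describes, assuming IEEE-754 binary32 (the patent is not limited to that format). The implied bit is 1 for normal numbers and 0 for zeros and subnormals, and the fraction multiply can start before the implied bits are known; the hardware performs the two steps concurrently, while this sketch simply folds the implied bits into the product afterwards.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Assumes IEEE-754 binary32; the abstract does not fix a particular format. */
typedef struct {
    uint32_t sign;      /* 1 sign bit                                       */
    uint32_t exponent;  /* 8 biased exponent bits                           */
    uint32_t fraction;  /* 23 stored mantissa bits (implied bit not stored) */
} fp32_fields;

static fp32_fields decode(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (fp32_fields){ bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}

/* The implied (hidden) mantissa bit is 1 for normal numbers and 0 for zeros
 * and subnormals, i.e. exactly when the exponent field is nonzero. */
static uint64_t implied_bit(fp32_fields x) { return x.exponent != 0; }

int main(void)
{
    fp32_fields a = decode(1.5f), b = decode(0.75f);

    /* The 23x23-bit fraction multiply can start immediately ...            */
    uint64_t partial = (uint64_t)a.fraction * b.fraction;

    /* ... while the implied bits are determined "in parallel" (sequential
     * here, concurrent in the hardware the abstract describes).            */
    uint64_t ia = implied_bit(a), ib = implied_bit(b);

    /* Fold the implied bits in to get the full 24x24-bit mantissa product:
     * (ia*2^23 + fa) * (ib*2^23 + fb).                                     */
    uint64_t product = partial
                     + ((ia * b.fraction) << 23)
                     + ((ib * a.fraction) << 23)
                     + ((ia & ib) << 46);

    printf("mantissa product = 0x%llx\n", (unsigned long long)product);
    return 0;
}
```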
-
Publication No.: US20240330196A1
Publication Date: 2024-10-03
Application No.: US18388602
Application Date: 2023-11-10
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Skyler J. SALEH , Samuel NAFFZIGER , Milind S. BHAGAVAT , Rahul AGARWAL
IPC: G06F12/0897 , G06F13/16 , G06F13/40
CPC classification number: G06F12/0897 , G06F13/1668 , G06F13/4027 , G06F2212/1024
Abstract: A chiplet system includes a central processing unit (CPU) communicably coupled to a first GPU chiplet of a GPU chiplet array. The GPU chiplet array includes the first GPU chiplet communicably coupled to the CPU via a bus and a second GPU chiplet communicably coupled to the first GPU chiplet via a passive crosslink. The passive crosslink is a passive interposer die dedicated to inter-chiplet communications and partitions system-on-a-chip (SoC) functionality into smaller functional chiplet groupings.
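A small data-structure sketch of the topology the abstract describes: the CPU reaches one chiplet over a bus, and that chiplet reaches its peers over a passive crosslink used only for inter-chiplet traffic. The types, field names, and routing helper are illustrative assumptions.

```c
#include <stddef.h>

typedef enum { LINK_BUS, LINK_PASSIVE_CROSSLINK } link_kind;

typedef struct gpu_chiplet {
    int                 id;
    struct gpu_chiplet *peers[4];   /* reached via the passive crosslink */
    size_t              peer_count;
} gpu_chiplet;

typedef struct {
    gpu_chiplet *primary;           /* the only chiplet attached to the CPU bus */
} chiplet_system;

/* A CPU-originated request always enters at the primary chiplet; if it is
 * addressed to another chiplet it crosses the passive interposer, which is
 * dedicated to inter-chiplet communication. */
link_kind last_hop_for(const chiplet_system *sys, const gpu_chiplet *target)
{
    return (target == sys->primary) ? LINK_BUS : LINK_PASSIVE_CROSSLINK;
}
```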
-
Publication No.: US12105635B2
Publication Date: 2024-10-01
Application No.: US17384858
Application Date: 2021-07-26
Applicant: TEXAS INSTRUMENTS INCORPORATED
Inventor: Timothy David Anderson , Mujibur Rahman , Dheera Balasubramanian Samudrala , Peter Richard Dent , Duc Quang Bui
IPC: G06F9/30 , G06F7/24 , G06F7/487 , G06F7/499 , G06F7/53 , G06F7/57 , G06F9/32 , G06F9/345 , G06F9/38 , G06F9/48 , G06F11/00 , G06F11/10 , G06F12/0862 , G06F12/0875 , G06F12/0897 , G06F12/1009 , G06F12/1045 , G06F17/16 , H03H17/06 , G06F15/78
CPC classification number: G06F12/1045 , G06F7/24 , G06F7/487 , G06F7/4876 , G06F7/49915 , G06F7/53 , G06F7/57 , G06F9/3001 , G06F9/30014 , G06F9/30021 , G06F9/30032 , G06F9/30036 , G06F9/30065 , G06F9/30072 , G06F9/30098 , G06F9/30112 , G06F9/30145 , G06F9/30149 , G06F9/3016 , G06F9/32 , G06F9/345 , G06F9/3802 , G06F9/3818 , G06F9/383 , G06F9/3836 , G06F9/3851 , G06F9/3856 , G06F9/3867 , G06F9/3887 , G06F9/48 , G06F11/00 , G06F11/1048 , G06F12/0862 , G06F12/0875 , G06F12/0897 , G06F12/1009 , G06F17/16 , H03H17/0664 , G06F9/30018 , G06F9/325 , G06F9/381 , G06F9/3822 , G06F11/10 , G06F15/7807 , G06F15/781 , G06F2212/452 , G06F2212/60 , G06F2212/602 , G06F2212/68
Abstract: A method is provided that includes performing, by a processor in response to a vector permutation instruction, permutation of values stored in lanes of a vector to generate a permuted vector, wherein the permutation is responsive to a control storage location storing permute control input for each lane of the permuted vector, wherein the permute control input corresponding to each lane of the permuted vector indicates a value to be stored in the lane of the permuted vector, wherein the permute control input for at least one lane of the permuted vector indicates a value of a selected lane of the vector is to be stored in the at least one lane, and storing the permuted vector in a storage location indicated by an operand of the vector permutation instruction.
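A scalar C sketch of the permute semantics spelled out in the abstract: each lane of the control input names the source lane whose value is written to the corresponding lane of the permuted vector. The 8-lane width and the control encoding are assumptions made for illustration.

```c
#include <stdint.h>

#define LANES 8   /* illustrative vector width */

/* control[i] selects the source lane whose value is stored in lane i of the
 * permuted vector, mirroring the per-lane permute control input described
 * in the abstract. */
void vector_permute(const uint32_t src[LANES],
                    const uint8_t  control[LANES],
                    uint32_t       dst[LANES])
{
    for (int i = 0; i < LANES; i++)
        dst[i] = src[control[i] % LANES];
}
```

For instance, a control input of {7, 6, 5, 4, 3, 2, 1, 0} reverses the lanes; real encodings may additionally allow constants or zeroing, which this sketch omits.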
-
Publication No.: US20240319909A1
Publication Date: 2024-09-26
Application No.: US18679895
Application Date: 2024-05-31
Applicant: Lodestar Licensing Group LLC
Inventor: Frank F. Ross
IPC: G06F3/06 , G06F11/20 , G06F12/0868 , G06F12/0897 , G06F13/16 , G11C7/10
CPC classification number: G06F3/0655 , G06F3/0635 , G06F3/0679 , G06F3/0688 , G06F11/201 , G06F12/0868 , G06F12/0897 , G06F13/1668 , G11C7/1075
Abstract: The present disclosure includes apparatuses and methods related to data transfer in memory. An example apparatus can include a first number of memory devices coupled to a host via a first number of ports and a second number of memory devices coupled to the first number of memory devices via a second number of ports, wherein a first number of commands are executed to transfer data between the first number of memory devices and the host via the first number of ports and a second number of commands are executed to transfer data between the first number of memory devices and the second number of memory devices via the second number of ports.
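A sketch of the two-hop command routing the abstract describes, with host-facing first ports and device-to-device second ports. The enum names and the routing rule are illustrative assumptions.

```c
typedef enum { PORT_FIRST, PORT_SECOND } port_group;
typedef enum { DEV_HOST, DEV_FIRST_TIER, DEV_SECOND_TIER } endpoint;

/* Commands between the host and the first number of memory devices use the
 * first number of ports; commands between the first and second number of
 * memory devices use the second number of ports. */
port_group ports_for_transfer(endpoint from, endpoint to)
{
    if (from == DEV_HOST || to == DEV_HOST)
        return PORT_FIRST;
    return PORT_SECOND;
}
```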
-
Publication No.: US12086067B2
Publication Date: 2024-09-10
Application No.: US18141463
Application Date: 2023-04-30
Applicant: SiFive, Inc.
Inventor: Andrew Waterman , Krste Asanovic
IPC: G06F12/0855 , G06F12/0815 , G06F12/0875 , G06F12/0897
CPC classification number: G06F12/0855 , G06F12/0815 , G06F12/0875 , G06F12/0897
Abstract: Systems and methods are disclosed for load-store pipeline selection for vectors. For example, an integrated circuit (e.g., a processor) for executing instructions includes an L1 cache that provides an interface to a memory system; an L2 cache connected to the L1 cache that implements a cache coherency protocol with the L1 cache; a first store unit configured to write data to the memory system via the L1 cache; a second store unit configured to bypass the L1 cache and write data to the memory system via the L2 cache; and store pipeline selection circuitry configured to: identify an address associated with a first beat of a store instruction with a vector argument; select between the first store unit and the second store unit based on the address associated with the first beat of the store instruction; and dispatch the store instruction to the selected store unit.
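A sketch of the dispatch decision described above: examine the address of the first beat of a vector store and pick either the store unit that writes via the L1 or the one that bypasses it. The abstract does not say what the address-based rule is, so the streaming-region check below is purely an assumed example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { STORE_UNIT_L1, STORE_UNIT_L2_BYPASS } store_unit;

/* Assumed criterion: addresses in a configured "streaming" region do not
 * benefit from L1 allocation and go to the unit that writes via the L2.
 * The abstract only states that the choice depends on the address of the
 * first beat, not what the rule is. */
static bool in_streaming_region(uint64_t addr)
{
    const uint64_t base = 0x80000000ULL, size = 0x40000000ULL;  /* assumed */
    return addr >= base && addr < base + size;
}

store_unit select_store_unit(uint64_t first_beat_addr)
{
    return in_streaming_region(first_beat_addr) ? STORE_UNIT_L2_BYPASS
                                                : STORE_UNIT_L1;
}
```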
-
Publication No.: US12079470B2
Publication Date: 2024-09-03
Application No.: US17379345
Application Date: 2021-07-19
Applicant: TEXAS INSTRUMENTS INCORPORATED
Inventor: Matthew Pierson
IPC: G06F3/06 , G06F9/30 , G06F9/32 , G06F9/38 , G06F12/0875 , G06F12/0897 , G06F13/14
CPC classification number: G06F3/0604 , G06F3/0656 , G06F3/0659 , G06F3/0683 , G06F9/3004 , G06F9/30047 , G06F9/30076 , G06F9/3016 , G06F9/32 , G06F9/3802 , G06F9/383 , G06F12/0875 , G06F12/0897 , G06F13/14 , G06F2212/1016 , G06F2212/452 , G06F2212/60
Abstract: Disclosed embodiments relate to one or more techniques to control access by a requestor of a computing system to a shared memory resource. In one embodiment, a technique includes determining a number (N) of pending requests to be sent to the memory by the requestor, determining a number (M) of requests that the requestor is limited to sending based on an amount of buffering resources available, and comparing M to N. When N is both greater than zero and less than or equal to M, the requestor sends the N pending requests to the memory. When N is both greater than zero and greater than M, M is compared to a hysteresis value (R) and, when M is less than R, the requestor sends R of the N pending requests to the memory.
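The comparisons in the abstract translate almost directly into code. The sketch below keeps the abstract's names (N pending requests, M buffer-limited maximum, R hysteresis value); the behavior for the one case the abstract leaves unstated (N greater than M with M at least R) is an assumption and is marked as such.

```c
#include <stdio.h>

/* N pending requests, M = buffering-limited maximum, R = hysteresis value,
 * as named in the abstract. Returns how many requests are sent this cycle. */
unsigned requests_to_send(unsigned n, unsigned m, unsigned r)
{
    if (n == 0)
        return 0;
    if (n <= m)          /* 0 < N <= M: send all N pending requests         */
        return n;
    if (m < r)           /* N > M and M < R: send R of the pending requests */
        return r;
    return m;            /* N > M and M >= R: not stated in the abstract;
                            assumed to send up to the buffering limit M.    */
}

int main(void)
{
    printf("%u\n", requests_to_send(10, 16, 4)); /* 10: fits under the limit */
    printf("%u\n", requests_to_send(10,  2, 4)); /*  4: hysteresis case      */
    return 0;
}
```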
-
Publication No.: US12079155B2
Publication Date: 2024-09-03
Application No.: US17428216
Application Date: 2020-03-14
Applicant: Intel Corporation
Inventor: Joydeep Ray , Selvakumar Panneer , Saurabh Tangri , Ben Ashbaugh , Scott Janus , Abhishek Appu , Varghese George , Ravishankar Iyer , Nilesh Jain , Pattabhiraman K , Altug Koker , Mike MacPherson , Josh Mastronarde , Elmoustapha Ould-Ahmed-Vall , Jayakrishna P. S , Eric Samson
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06
CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
Abstract: Embodiments described herein include software, firmware, and hardware that provide techniques to enable deterministic scheduling across multiple general-purpose graphics processing units. One embodiment provides a multi-GPU architecture with uniform latency. One embodiment provides techniques to distribute memory output based on memory chip thermals. One embodiment provides techniques to enable thermally aware workload scheduling. One embodiment provides techniques to enable end-to-end contracts for workload scheduling on multiple GPUs.
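One of the listed embodiments, thermally aware workload scheduling, can be pictured with a deliberately simple policy: dispatch the next workload to the GPU reporting the lowest temperature. The structure, field names, and the policy itself are illustrative assumptions rather than the patent's mechanism.

```c
#include <stddef.h>

typedef struct {
    int   id;
    float temperature_c;   /* reported chip/memory thermal sensor value */
} gpu_info;

/* Pick the coolest GPU for the next workload; a simplistic stand-in for the
 * thermally aware scheduling the abstract mentions. */
int pick_gpu_for_workload(const gpu_info *gpus, size_t count)
{
    size_t best = 0;
    for (size_t i = 1; i < count; i++)
        if (gpus[i].temperature_c < gpus[best].temperature_c)
            best = i;
    return gpus[best].id;
}
```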
-
Publication No.: US12061908B2
Publication Date: 2024-08-13
Application No.: US17472852
Application Date: 2021-09-13
Applicant: TEXAS INSTRUMENTS INCORPORATED
Inventor: Joseph Zbiciak , Timothy Anderson
IPC: G06F9/32 , G06F9/30 , G06F9/345 , G06F9/38 , G06F11/00 , G06F12/02 , G06F12/0875 , G06F12/0897 , G06F13/16 , G06F13/40 , G06F11/10
CPC classification number: G06F9/321 , G06F9/30014 , G06F9/30036 , G06F9/30043 , G06F9/30047 , G06F9/30098 , G06F9/30112 , G06F9/30145 , G06F9/3016 , G06F9/32 , G06F9/345 , G06F9/3802 , G06F9/383 , G06F9/3867 , G06F11/00 , G06F12/0207 , G06F12/0875 , G06F12/0897 , G06F13/1605 , G06F13/4068 , G06F9/3836 , G06F11/10 , G06F2212/452 , G06F2212/60
Abstract: A streaming engine employed in a digital data processor specifies fixed first and second read-only data streams. A corresponding stream address generator produces addresses of the data elements of the two streams. Corresponding stream head registers store the data elements next to be supplied to functional units for use as operands. The two streams share two memory ports. A toggling preference of stream to port ensures fair allocation. The arbiters permit one stream to borrow the other's interface when the other interface is idle. Thus one stream may issue two memory requests, one from each memory port, if the other stream is idle. This spreads the bandwidth demand for each stream across both interfaces, ensuring neither interface becomes a bottleneck.
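A sketch of the port arbitration the abstract describes: each stream has a preferred memory port, the preference toggles for fairness, and an idle stream's interface can be borrowed so the active stream issues two requests in one cycle. Names and structure are illustrative assumptions.

```c
#include <stdbool.h>

/* Two streams (0 and 1) sharing two memory ports (0 and 1). */
typedef struct {
    int preferred_port[2];   /* preferred_port[s] is stream s's current port */
} stream_arbiter;

/* Grant ports to stream `s` for this cycle. Returns how many requests the
 * stream may issue (1 or 2) and writes the granted port numbers into
 * ports_out. The preference toggles on every grant so neither stream keeps
 * the same port indefinitely, and an idle peer's interface is borrowed. */
int grant_ports(stream_arbiter *arb, int s, bool peer_idle, int ports_out[2])
{
    int own  = arb->preferred_port[s];
    int peer = 1 - own;

    ports_out[0] = own;

    /* Toggle both preferences so they stay complementary. */
    arb->preferred_port[s]     = peer;
    arb->preferred_port[1 - s] = own;

    if (peer_idle) {             /* borrow the idle stream's interface */
        ports_out[1] = peer;
        return 2;
    }
    return 1;
}
```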
-
Publication No.: US12038843B1
Publication Date: 2024-07-16
Application No.: US18537927
Application Date: 2023-12-13
Applicant: Next Silicon Ltd
Inventor: Yiftach Gilad , Liron Zur
IPC: G06F12/08 , G06F12/0862 , G06F12/0897
CPC classification number: G06F12/0862 , G06F12/0897 , G06F2212/602 , G06F2212/6024
Abstract: A joint scheduler adapted for dispatching prefetch and demand accesses of data relating to a plurality of instructions loaded in an execution pipeline of processing circuit(s). Each prefetch access comprises checking whether a respective data is cached in a cache entry and each demand access comprises accessing a respective data. The joint scheduler is adapted to, responsive to each hit prefetch access dispatched for a respective data relating to a respective instruction, associate the respective instruction with a valid indication and a pointer to a respective cache entry storing the respective data such that the demand access relating to the respective instruction uses the associated pointer to access the respective data in the cache, and responsive to each missed prefetch access dispatched for a respective data relating to a respective instruction, initiate a read cycle for loading the respective data from next level memory and cache it in the cache.
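A sketch of the bookkeeping the abstract describes: a prefetch that hits attaches a valid indication and a cache-entry pointer to the instruction so the later demand access can use that pointer directly, while a prefetch that misses initiates a read cycle that fills the cache from the next-level memory. The direct-mapped cache, the helper functions, and all names are assumptions made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define CACHE_ENTRIES 256

typedef struct { uint64_t tag; uint8_t data[64]; bool present; } cache_entry;

static cache_entry cache[CACHE_ENTRIES];

/* Minimal direct-mapped stand-ins for the cache lookup and the read cycle
 * from next-level memory; both are assumptions made for this sketch. */
static cache_entry *cache_lookup(uint64_t addr)
{
    cache_entry *e = &cache[(addr / 64) % CACHE_ENTRIES];
    return (e->present && e->tag == addr / 64) ? e : NULL;
}

static cache_entry *fill_from_next_level(uint64_t addr)
{
    cache_entry *e = &cache[(addr / 64) % CACHE_ENTRIES];
    e->tag = addr / 64;
    e->present = true;           /* data[] would be filled by the read cycle */
    return e;
}

typedef struct {
    uint64_t     addr;           /* address the instruction will demand      */
    bool         prefetched;     /* valid indication set by a hit prefetch   */
    cache_entry *entry;          /* pointer to the cache entry with the data */
} pipeline_instr;

/* Prefetch access: check the cache and record the outcome on the instruction. */
void dispatch_prefetch(pipeline_instr *in)
{
    cache_entry *e = cache_lookup(in->addr);
    if (e) {                     /* hit: attach valid indication and pointer */
        in->prefetched = true;
        in->entry = e;
    } else {                     /* miss: initiate the next-level read cycle
                                    so the data is cached before the demand  */
        fill_from_next_level(in->addr);
        in->prefetched = false;
        in->entry = NULL;
    }
}

/* Demand access: reuse the pointer recorded by the prefetch when valid. */
const uint8_t *dispatch_demand(pipeline_instr *in)
{
    if (in->prefetched && in->entry)            /* use the recorded pointer */
        return in->entry->data;
    cache_entry *e = cache_lookup(in->addr);    /* otherwise look up again  */
    return e ? e->data : fill_from_next_level(in->addr)->data;
}
```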