-
Publication No.: US20240403052A1
Publication Date: 2024-12-05
Application No.: US18329456
Application Date: 2023-06-05
Applicant: Arm Limited
Inventor: Joshua Randall, Siying Feng
IPC: G06F9/30
Abstract: The present disclosure relates generally to integrated circuits and relates more particularly to indexed vector permutation operations.
-
Publication No.: US20240403050A1
Publication Date: 2024-12-05
Application No.: US18509121
Application Date: 2023-11-14
Applicant: Arm Limited
Inventor: Joshua Randall, Siying Feng
IPC: G06F9/30
Abstract: The present disclosure relates generally to integrated circuits and relates more particularly to vector comparison and/or population count operations, such as for vector sorting, merging, and/or intersection.
-
Publication No.: US20210374059A1
Publication Date: 2021-12-02
Application No.: US16884359
Application Date: 2020-05-27
Applicant: Arm Limited
Inventor: Jose Alberto Joao, Tiago Rogerio Muck, Joshua Randall, Alejandro Rico Carro, Bruce James Mathewson
IPC: G06F12/0842, G06F12/0875
Abstract: A method and apparatus are disclosed for transferring data from a first processor core to a second processor core. The first processor core executes a stash instruction having a first operand associated with a data address of the data. A second processor core is determined to be a stash target for a stash message, based on the data address or a second operand. A stash message is sent to the second processor core, notifying the second processor core of the written data. Responsive to receiving the stash message, the second processor core can opt to store the data in its cache. The data may be included in the stash message or retrieved in response to a read request by the second processor core. The second processor core may be determined by prediction based, at least in part, on monitored data transactions.
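The stash flow in this abstract can be illustrated with a small software model. This is a hypothetical sketch, not Arm's implementation: the names `Core`, `StashPredictor`, and `stash`, the per-core opt-in flag, and the "last core that read the address" prediction rule are all assumptions chosen for illustration. It models only the variant in which the data travels inside the stash message.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Core:
    core_id: int
    accept_stash: bool = True  # hypothetical policy: does this core opt in to stashes?
    cache: Dict[int, int] = field(default_factory=dict)

    def on_stash(self, address: int, data: int) -> bool:
        """Handle a stash message; the core may opt to store the data in its cache."""
        if self.accept_stash:
            self.cache[address] = data
            return True
        return False


class StashPredictor:
    """Hypothetical predictor: the last core seen reading an address is the target."""

    def __init__(self) -> None:
        self.last_reader: Dict[int, int] = {}

    def observe_read(self, core_id: int, address: int) -> None:
        self.last_reader[address] = core_id

    def predict(self, address: int) -> Optional[int]:
        return self.last_reader.get(address)


def stash(cores: Dict[int, Core], predictor: StashPredictor,
          address: int, data: int, target: Optional[int] = None) -> Optional[int]:
    """Model of a stash instruction. `target` plays the role of the optional second
    operand; without it, the target is predicted from monitored transactions.
    Returns the id of the core that cached the data, or None."""
    if target is None:
        target = predictor.predict(address)
    if target is None or target not in cores:
        return None
    # The data is carried in the stash message here; the alternative described in
    # the abstract (the target issues its own read request) is not modelled.
    return target if cores[target].on_stash(address, data) else None


cores = {0: Core(0), 1: Core(1), 2: Core(2, accept_stash=False)}
predictor = StashPredictor()
predictor.observe_read(1, 0x1000)
print(stash(cores, predictor, 0x1000, 42))           # 1: predicted target accepts
print(stash(cores, predictor, 0x2000, 7, target=2))  # None: core 2 declines
```

The point of the prediction path is that software need not name the consumer core explicitly; any monitored-transaction heuristic could replace the last-reader rule used here.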
-
Publication No.: US11151039B2
Publication Date: 2021-10-19
Application No.: US16821271
Application Date: 2020-03-17
Applicant: Arm Limited
Inventor: Joshua Randall, Jesse Garrett Beu
IPC: G06F12/0815, G06F12/0895, G06F12/14, G06F12/0884, G06F12/02
Abstract: An apparatus is provided for receiving requests from a plurality of processing units, at least some of which may have associated cache storage. A snoop unit implements a cache coherency protocol when a request received by the apparatus identifies a cacheable memory address. Snoop filter storage is provided comprising an N-way set associative storage structure with a plurality of entries. Each entry stores coherence data for an associated address range identifying a memory block, and the coherence data is used to determine which cache storages need to be subjected to a snoop operation when implementing the cache coherency protocol in response to a received request. The snoop filter storage stores coherence data for memory blocks of at least a plurality P of different size granularities, and is organised as a plurality of at least P banks that are accessible in parallel, where each bank has entries within each of the N ways of the snoop filter storage. Snoop control circuitry controls access to the snoop filter storage, and is responsive to a received address to create a group of indexes, the group of indexes comprising an index for each different size granularity amongst the P different size granularities, and each index in the group being constrained so as to identify an entry in a different bank of the snoop filter storage. The snoop control circuitry uses the group of indexes to perform a lookup operation in parallel within the snoop filter storage in order to determine, taking into account each of the different size granularities, whether an entry stores coherence data for the received address.
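The key constraint above (one index per granularity, each landing in a different bank so all P lookups can proceed in parallel) can be sketched in software. This is a hypothetical model, not the patented circuit: the class name, the bank-selection rule `(base + g) % num_banks`, the modulo set index, and the naive replacement are assumptions. Deriving `base` only from address bits above the largest block size keeps every address within a block mapped to the same entry.

```python
from typing import List, Optional, Tuple


class MultiGranularitySnoopFilter:
    """Toy model: P block-size granularities, at least P banks, N ways per set."""

    def __init__(self, shifts: List[int], num_banks: int, sets: int, ways: int) -> None:
        assert num_banks >= len(shifts), "need at least one bank per granularity"
        self.shifts = shifts          # log2(block size) per granularity, e.g. [6, 12]
        self.num_banks = num_banks
        self.sets = sets              # sets per bank
        self.ways = ways
        self.max_shift = max(shifts)
        # banks[b][s] holds up to `ways` entries of (granularity, block, sharer_mask)
        self.banks = [[[] for _ in range(sets)] for _ in range(num_banks)]

    def index_group(self, addr: int) -> List[Tuple[int, int, int]]:
        """One (bank, set, block) index per granularity, each in a different bank."""
        # The base bank depends only on bits above the largest block size, so every
        # address inside a given block (of any granularity) yields the same index.
        base = (addr >> self.max_shift) % self.num_banks
        group = []
        for g, shift in enumerate(self.shifts):
            block = addr >> shift
            bank = (base + g) % self.num_banks  # distinct g gives distinct banks
            group.append((bank, block % self.sets, block))
        return group

    def insert(self, addr: int, granularity: int, sharers: int) -> None:
        bank, s, block = self.index_group(addr)[granularity]
        entries = self.banks[bank][s]
        if len(entries) >= self.ways:
            entries.pop(0)  # naive replacement; real hardware must back-invalidate
        entries.append((granularity, block, sharers))

    def lookup(self, addr: int) -> Optional[int]:
        """OR together the sharer masks of all granularities that hit; None on a miss.
        Hardware probes the banks in parallel; this model simply loops."""
        mask, hit = 0, False
        for g, (bank, s, block) in enumerate(self.index_group(addr)):
            for eg, eblock, sharers in self.banks[bank][s]:
                if eg == g and eblock == block:
                    mask |= sharers
                    hit = True
        return mask if hit else None


sf = MultiGranularitySnoopFilter(shifts=[6, 12], num_banks=2, sets=64, ways=4)
sf.insert(0x1040, granularity=0, sharers=0b01)  # 64-byte block tracked for cache 0
sf.insert(0x1000, granularity=1, sharers=0b10)  # 4 KiB block tracked for cache 1
print(sf.lookup(0x1044))  # 3: both granularities hit, so caches 0 and 1 are snooped
```

Because the P indexes never share a bank, no two granularities compete for the same bank port during a lookup, which is what lets the hardware perform them in one parallel access.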
-
Publication No.: US11934307B2
Publication Date: 2024-03-19
Application No.: US17905566
Application Date: 2021-01-18
Applicant: Arm Limited
Inventor: Joshua Randall, Jesse Garrett Beu
IPC: G06F12/00, G06F12/02, G06F12/0831, G06F12/0871
CPC classification number: G06F12/0292, G06F12/0831, G06F12/0871
Abstract: An apparatus and method are provided for receiving requests from a plurality of processing units, multiple of which have associated cache storage. A snoop unit is used to implement a cache coherency protocol when a request is received that identifies a cacheable memory address. The snoop unit has snoop filter storage comprising a plurality of snoop filter tables organized in a hierarchical arrangement. The snoop filter tables comprise a primary snoop filter table at the highest level in the hierarchy, and each snoop filter table at a lower level in the hierarchy forms a backup snoop filter table for the adjacent snoop filter table at the next higher level. Each snoop filter table is arranged as a multi-way set associative storage structure, and each backup snoop filter table has a different number of sets from the adjacent snoop filter table.
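A minimal sketch of why a backup table with a different set count helps, assuming (this is not stated in the abstract) that an entry evicted from one level spills into the next lower level. The classes, the oldest-first replacement, and the modulo indexing are hypothetical choices for illustration only.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class SetAssocTable:
    """One snoop filter table: `num_sets` sets of `ways` ways, oldest-first replacement."""

    def __init__(self, num_sets: int, ways: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]  # block -> sharer mask

    def lookup(self, block: int) -> Optional[int]:
        return self.sets[block % self.num_sets].get(block)

    def insert(self, block: int, sharers: int) -> Optional[Tuple[int, int]]:
        """Insert an entry; return the (block, sharers) victim if a way was evicted."""
        s = self.sets[block % self.num_sets]
        s[block] = sharers
        s.move_to_end(block)
        if len(s) > self.ways:
            return s.popitem(last=False)
        return None


class HierarchicalSnoopFilter:
    """tables[0] is the primary table; tables[i + 1] backs up tables[i]."""

    def __init__(self, set_counts: List[int], ways: int) -> None:
        self.tables = [SetAssocTable(n, ways) for n in set_counts]

    def lookup(self, block: int) -> Optional[int]:
        for table in self.tables:  # hardware could probe the levels in parallel
            sharers = table.lookup(block)
            if sharers is not None:
                return sharers
        return None

    def insert(self, block: int, sharers: int) -> Optional[int]:
        """Insert into the primary table; each victim spills into the next backup table.
        Returns a block evicted from the last table (it would need back-invalidation)."""
        pending: Optional[Tuple[int, int]] = (block, sharers)
        for table in self.tables:
            pending = table.insert(*pending)
            if pending is None:
                return None
        return pending[0]


hsf = HierarchicalSnoopFilter(set_counts=[4, 3], ways=1)
for blk, mask in [(0, 0b001), (4, 0b010), (8, 0b100)]:
    hsf.insert(blk, mask)  # blocks 0, 4 and 8 all collide in set 0 of the primary
print(hsf.lookup(0), hsf.lookup(4), hsf.lookup(8))  # 1 2 4
```

Blocks 0 and 4 conflict in the 4-set primary table but fall into different sets of the 3-set backup, so both victims survive. With 4 sets at both levels they would collide again and one would be forced out of the filter.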
-
Publication No.: US11144318B2
Publication Date: 2021-10-12
Application No.: US16550644
Application Date: 2019-08-26
Applicant: Arm Limited
Inventor: Alejandro Rico Carro, Joshua Randall, Jose Alberto Joao
Abstract: A method and apparatus for application thread prioritization to mitigate the effects of operating system noise is disclosed. The method generally includes executing in parallel a plurality of application threads of a parallel application. An interrupt condition of an application thread of the plurality of application threads is detected. A priority of the interrupted application thread is changed relative to priorities of one or more other application threads of the plurality of application threads, and control is returned to the interrupted application thread after the interrupt condition. The interrupted application thread then resumes execution in accordance with the changed priority.
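The prioritization step can be sketched as follows. In bulk-synchronous parallel code every thread waits at barriers for the slowest one, so a thread delayed by an interrupt holds up the rest; raising its priority lets it catch up. The names `AppThread`, `return_from_interrupt`, and `pick_next`, the fixed `boost` amount, and the tie-breaking rule are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AppThread:
    tid: int
    priority: int = 0


def return_from_interrupt(threads: List[AppThread], interrupted_tid: int,
                          boost: int = 1) -> AppThread:
    """Raise the interrupted thread's priority relative to its siblings, then
    return it as the thread that resumes execution."""
    resumed = next(t for t in threads if t.tid == interrupted_tid)
    resumed.priority += boost
    return resumed


def pick_next(threads: List[AppThread]) -> AppThread:
    """Hypothetical scheduler: highest priority wins; ties go to the lowest tid."""
    return max(threads, key=lambda t: (t.priority, -t.tid))


threads = [AppThread(tid) for tid in range(4)]
print(pick_next(threads).tid)  # 0: all priorities are equal
return_from_interrupt(threads, interrupted_tid=2)
print(pick_next(threads).tid)  # 2: the interrupted thread now runs first
```

A practical policy would also restore the original priority once the thread has caught up; the abstract does not say when or how that happens, so the sketch leaves it out.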
-
Publication No.: US10776266B2
Publication Date: 2020-09-15
Application No.: US16182741
Application Date: 2018-11-07
Applicant: Arm Limited
Inventor: Joshua Randall, Alejandro Rico Carro, Jose Alberto Joao, Richard William Earnshaw, Alasdair Grant
IPC: G06F12/00, G06F12/0802, G06F13/00, G06F13/28
Abstract: Aspects of the present disclosure relate to an apparatus comprising a requester master processing device having an associated private cache storage to store data for access by the requester master processing device. The requester master processing device is arranged to issue a request to modify data that is associated with a given memory address and stored in a private cache storage associated with a recipient master processing device. The private cache storage associated with the recipient master processing device is arranged to store data for access by the recipient master processing device. The apparatus further comprises the recipient master processing device having its private cache storage. One of the recipient master processing device and its associated private cache storage is arranged to perform the requested modification of the data while the data is stored in the cache storage associated with the recipient master processing device.
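The contrast with conventional behaviour can be shown in a small model. Under a typical invalidation-based protocol, a requester that wants to modify a line first obtains a unique copy, which removes the line from the recipient's cache. Here the modification is instead forwarded to the cache that already holds the data. `PrivateCache`, `request_modify`, and the one-word-per-line simplification are hypothetical.

```python
from typing import Callable, Dict


class PrivateCache:
    """Private cache of one master; `lines` maps address -> value (one word per line)."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.lines: Dict[int, int] = {}

    def modify_in_place(self, address: int, op: Callable[[int], int]) -> int:
        """Apply a requested modification to a line held here; return the old value."""
        old = self.lines[address]
        self.lines[address] = op(old)
        return old


def request_modify(recipient: PrivateCache, address: int,
                   op: Callable[[int], int]) -> int:
    """Requester side: send the operation to the cache that holds the data instead
    of pulling the line into the requester's own private cache."""
    if address not in recipient.lines:
        raise KeyError(f"{address:#x} is not held by {recipient.owner}")
    return recipient.modify_in_place(address, op)


cpu1_cache = PrivateCache("cpu1")
cpu1_cache.lines[0x40] = 5
old = request_modify(cpu1_cache, 0x40, lambda v: v + 1)  # e.g. a remote atomic add
print(old, cpu1_cache.lines[0x40])  # 5 6
```

Leaving the line in place avoids moving it back and forth between private caches when the recipient is likely to keep using it.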
-
Publication No.: US11899583B2
Publication Date: 2024-02-13
Application No.: US17388927
Application Date: 2021-07-29
Applicant: Arm Limited
Inventor: Joshua Randall, Alejandro Rico Carro, Dam Sunwoo, Saurabh Pijuskumar Sinha, Jamshed Jalal
IPC: G06F12/0811, G06F12/084, H04L45/42, H04L49/109, G06F12/0813, G06F12/0893
CPC classification number: G06F12/0811, G06F12/084, G06F12/0813, G06F12/0893, H04L45/42, H04L49/109
Abstract: Various implementations described herein are directed to a device with a multi-layered logic structure with multiple layers including a first layer and a second layer arranged vertically in a stacked configuration. The device may have a first cache memory with first interconnect logic disposed in the first layer. The device may have a second cache memory with second interconnect logic disposed in the second layer, wherein the second interconnect logic in the second layer is linked to the first interconnect logic in the first layer.
-
Publication No.: US20230367843A1
Publication Date: 2023-11-16
Application No.: US17743705
Application Date: 2022-05-13
Applicant: Arm Limited
Inventor: Joshua Randall, Jesse Garrett Beu, Krishnendra Nathella, Tuan Quang Ta
Abstract: A data processing method and processor instructions are provided that use scatter operations to merge vector and matrix indices more efficiently than standard matrix and vector operations, and also to merge other arithmetic results, lists of numbers, and similar data.
-
Publication No.: US11625349B1
Publication Date: 2023-04-11
Application No.: US17529768
Application Date: 2021-11-18
Applicant: Arm Limited
Inventor: Joshua Randall, Alexander Cole Shulyak, Jose Alberto Joao
Abstract: An apparatus and method are provided for managing prefetch transactions. The apparatus has an interconnect for providing communication paths between elements coupled to the interconnect. The elements coupled to the interconnect comprise at least a requester element to initiate transactions, and a plurality of completer elements each of which is arranged to respond to a transaction received by that completer element. Congestion tracking circuitry maintains, in association with the requester element, a congestion indication for each of a plurality of routes through the interconnect used to propagate transactions initiated by that requester element. Each route comprises one or more communication paths, and the route employed to propagate a given transaction is dependent on a target completer element for that transaction. Prefetch throttling circuitry then identifies, in response to an indication of a given prefetch transaction that the requester element wishes to initiate, the target completer element amongst the plurality of completer elements to which that given prefetch transaction would be issued. It then determines whether to issue the given prefetch transaction in dependence on the congestion indication for the route that has been determined.
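The throttling decision can be sketched as a per-requester table of route congestion counters. Everything concrete here is an assumption made for illustration: the completer-to-route map, the 64-byte address interleaving used to find the target completer, and the latency-based saturating counter used as the congestion indication. Only prefetches pass through this check; demand requests are not throttled.

```python
from typing import Dict


class PrefetchThrottler:
    """Per-requester model: one saturating congestion counter per interconnect route."""

    def __init__(self, num_completers: int, route_of: Dict[int, int],
                 threshold: int, max_count: int = 7) -> None:
        self.num_completers = num_completers
        self.route_of = route_of    # hypothetical: completer id -> route id
        self.threshold = threshold
        self.max_count = max_count
        self.congestion: Dict[int, int] = {r: 0 for r in set(route_of.values())}

    def target_completer(self, address: int) -> int:
        # Hypothetical address interleaving across completers at 64-byte granularity.
        return (address >> 6) % self.num_completers

    def record_response(self, completer: int, latency: int, expected: int) -> None:
        """Hypothetical congestion signal: slow responses raise the route's counter."""
        route = self.route_of[completer]
        if latency > expected:
            self.congestion[route] = min(self.congestion[route] + 1, self.max_count)
        else:
            self.congestion[route] = max(self.congestion[route] - 1, 0)

    def should_issue_prefetch(self, address: int) -> bool:
        route = self.route_of[self.target_completer(address)]
        return self.congestion[route] < self.threshold


# Completers 0 and 1 share route 0; completer 2 is reached via route 1.
pt = PrefetchThrottler(num_completers=3, route_of={0: 0, 1: 0, 2: 1}, threshold=2)
for _ in range(3):
    pt.record_response(completer=1, latency=300, expected=100)  # route 0 congested
print(pt.should_issue_prefetch(0x00))  # False: 0x00 -> completer 0 -> route 0
print(pt.should_issue_prefetch(0x80))  # True: 0x80 -> completer 2 -> route 1
```

Tracking congestion per route rather than globally means prefetches headed for an uncongested completer keep flowing while only those that would add to a congested path are dropped.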