-
公开(公告)号:US10983793B2
公开(公告)日:2021-04-20
申请号:US16369846
申请日:2019-03-29
Applicant: INTEL CORPORATION
Inventor: Joshua Fryman , Ankit More , Jason Howard , Robert Pawlowski , Yigit Demir , Nick Pepperling , Fabrizio Petrini , Sriram Aananthakrishnan , Shaden Smith
Abstract: The present disclosure is directed to systems and methods of performing one or more broadcast or reduction operations using direct memory access (DMA) control circuitry. The DMA control circuitry executes a modified instruction set architecture (ISA) that facilitates the broadcast distribution of data to a plurality of destination addresses in system memory circuitry. The broadcast instruction may include broadcast of a single data value to each destination address. The broadcast instruction may include broadcast of a data array to each destination address. The DMA control circuitry may also execute a reduction instruction that facilitates the retrieval of data from a plurality of source addresses in system memory and performing one or more operations using the retrieved data. Since the DMA control circuitry, rather than the processor circuitry performs the broadcast and reduction operations, system speed and efficiency is beneficially enhanced.
-
公开(公告)号:US12158852B2
公开(公告)日:2024-12-03
申请号:US17358832
申请日:2021-06-25
Applicant: Intel Corporation
Inventor: Robert Pawlowski , Bharadwaj Krishnamurthy , Shruti Sharma , Byoungchan Oh , Jing Fang , Daniel Klowden , Jason Howard , Joshua Fryman
IPC: G06F13/28
Abstract: Systems, methods, and apparatuses for direct memory access instruction set architecture support for flexible dense compute using a reconfigurable spatial array are described. In one embodiment, a processor includes a first type of hardware processor core that includes a two-dimensional grid of compute circuits, a memory, and a direct memory access circuit coupled to the memory and the two-dimensional grid of compute circuits; and a second different type of hardware processor core that includes a decoder circuit to decode a single instruction into a decoded single instruction, the single instruction including a first field to identify a base address of two-dimensional data in the memory, a second field to identify a number of elements in each one-dimensional array of the two-dimensional data, a third field to identify a number of one-dimensional arrays of the two-dimensional data, a fourth field to identify an operation to be performed by the two-dimensional grid of compute circuits, and a fifth field to indicate the direct memory access circuit is to move the two-dimensional data indicated by the first field, the second field, and the third field into the two-dimensional grid of compute circuits and the two-dimensional grid of compute circuits is to perform the operation on the two-dimensional data according to the fourth field, and an execution circuit to execute the decoded single instruction according to the fields.
-
公开(公告)号:US11360809B2
公开(公告)日:2022-06-14
申请号:US16024343
申请日:2018-06-29
Applicant: Intel Corporation
Inventor: William Paul Griffin , Joshua Fryman , Jason Howard , Sang Phill Park , Robert Pawlowski , Michael Abbott , Scott Cline , Samkit Jain , Ankit More , Vincent Cave , Fabrizio Petrini , Ivan Ganev
Abstract: Embodiments of apparatuses, methods, and systems for scheduling tasks to hardware threads are described. In an embodiment, a processor includes a multiple hardware threads and a task manager. The task manager is to issue a task to a hardware thread. The task manager includes a hardware task queue to store a descriptor for the task. The descriptor is to include a field to store a value to indicate whether the task is a single task, a collection of iterative tasks, and a linked list of tasks.
-
公开(公告)号:US20240256283A1
公开(公告)日:2024-08-01
申请号:US18566068
申请日:2022-03-31
Applicant: Intel Corporation
Inventor: Joshua B. Fryman , Byoungchan Oh , Sai Dheeraj Polagani , Kevin P. Ma , Robert S. Pawlowski , Bharadwaj Coimbatore Krishnamurthy , Shruti Sharma , Smitha P. Vasantha Kumar , Jason Howard , Daniel S. Klowden
CPC classification number: G06F9/3851 , G06F11/3409
Abstract: A system is provided that includes a set of graph processing cores and a set of dense compute cores. where the set of graph processing cores and the set of dense cores are interconnected in a network. The dense compute cores include offload queue circuitry to receive an offload request from the set of graph processing cores to handle dense compute workloads. Memory controllers are also provided in the system for use by the graph processing cores in reading and writing to memory in association with sparse graph applications. the memory controllers enhanced to efficiently handle memory transactions in sparse graph applications.
-
公开(公告)号:US20220413855A1
公开(公告)日:2022-12-29
申请号:US17359305
申请日:2021-06-25
Applicant: Intel Corporation
Inventor: Robert Pawlowski , Sriram Aananthakrishnan , Jason Howard , Joshua Fryman
IPC: G06F9/30 , G06F9/38 , G06F12/0875
Abstract: Techniques for operating on an indirect memory access instruction, where the instruction accesses a memory location via at least one indirect address. A pipeline processes the instruction and a memory operation engine generates a first access to the at least one indirect address and a second access to a target address determined by the at least one indirect address. A cache memory used with the pipeline and the memory operation engine caches pointers. In response to a cache hit when executing the indirect memory access instruction, operations dereference a pointer to obtain the at least one indirect address, not set a cache bit, and return data for the instruction without storing the data in the cache memory; and in response to a cache miss, operations set the cache bit, obtain, and store a cache line for a missed pointer, and return data without storing the data in the cache memory.
-
公开(公告)号:US20200310795A1
公开(公告)日:2020-10-01
申请号:US16369846
申请日:2019-03-29
Applicant: INTEL CORPORATION
Inventor: Joshua Fryman , Ankit More , Jason Howard , Robert Pawlowski , Yigit Demir , Nick Pepperling , Fabrizio Petrini , Sriram Aananthakrishnan , Shaden Smith
Abstract: The present disclosure is directed to systems and methods of performing one or more broadcast or reduction operations using direct memory access (DMA) control circuitry. The DMA control circuitry executes a modified instruction set architecture (ISA) that facilitates the broadcast distribution of data to a plurality of destination addresses in system memory circuitry. The broadcast instruction may include broadcast of a single data value to each destination address. The broadcast instruction may include broadcast of a data array to each destination address. The DMA control circuitry may also execute a reduction instruction that facilitates the retrieval of data from a plurality of source addresses in system memory and performing one or more operations using the retrieved data. Since the DMA control circuitry, rather than the processor circuitry performs the broadcast and reduction operations, system speed and efficiency is beneficially enhanced.
-
公开(公告)号:US10929132B1
公开(公告)日:2021-02-23
申请号:US16579806
申请日:2019-09-23
Applicant: Intel Corporation
Inventor: Robert Pawlowski , Scott Hagan Schmittel , Joshua Fryman , Wim Heirman , Jason Howard , Ankit More , Shaden Smith , Scott Cline
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to access a compressed graphic list. In one example, a processor includes fetch and decode circuitry to fetch and decode the single instruction to access the compressed graphic list, and execution circuitry to execute the decoded single instruction to cause access to the compressed graphic list by: receiving, from a load store queue, at a first op-engine associated with a first data location, an indirection request, computing, via the first op-engine, a second data location associated with a second op-engine, computing, via the second op-engine, a third data location associated with a third op-engine responsive to the indirection request, and providing, via the third op-engine, a data response to the load store queue responsive to receiving data from the third data location.
-
公开(公告)号:US20200004587A1
公开(公告)日:2020-01-02
申请号:US16024343
申请日:2018-06-29
Applicant: Intel Corporation
Inventor: Paul Griffin , Joshua Fryman , Jason Howard , Sang Phill Park , Robert Pawlowski , Michael Abbott , Scott Cline , Samkit Jain , Ankit More , Vincent Cave , Fabrizio Petrini , Ivan Ganev
Abstract: Embodiments of apparatuses, methods, and systems for a multithreaded processor core with hardware-assisted task scheduling are described. In an embodiment, a processor includes a first hardware thread, a second hardware thread, and a task manager. The task manager is to issue a task to the first hardware thread. The task manager includes a hardware task queue in which to store a plurality of task descriptors. Each of the task descriptors is to represent one of a single task, a collection of iterative tasks, and a linked list of tasks.
-
公开(公告)号:US12204901B2
公开(公告)日:2025-01-21
申请号:US17359305
申请日:2021-06-25
Applicant: Intel Corporation
Inventor: Robert Pawlowski , Sriram Aananthakrishnan , Jason Howard , Joshua Fryman
IPC: G06F9/30 , G06F9/38 , G06F12/0875
Abstract: Techniques for operating on an indirect memory access instruction, where the instruction accesses a memory location via at least one indirect address. A pipeline processes the instruction and a memory operation engine generates a first access to the at least one indirect address and a second access to a target address determined by the at least one indirect address. A cache memory used with the pipeline and the memory operation engine caches pointers. In response to a cache hit when executing the indirect memory access instruction, operations dereference a pointer to obtain the at least one indirect address, not set a cache bit, and return data for the instruction without storing the data in the cache memory; and in response to a cache miss, operations set the cache bit, obtain, and store a cache line for a missed pointer, and return data without storing the data in the cache memory.
-
公开(公告)号:US20240020428A1
公开(公告)日:2024-01-18
申请号:US18476026
申请日:2023-09-27
Applicant: Intel Corporation
Inventor: Akhilesh Thyagaturu , Jason Howard , Nicholas Ross , Sanjaya Tayal , Vinodh Gopal
CPC classification number: G06F21/85 , G06F21/71 , G06F21/577 , G06F2221/034
Abstract: Systems, apparatus, articles of manufacture, and methods are disclosed to generate and manage a firewall policy. An example includes interface circuitry, machine readable instructions, and programmable circuitry to at least one of instantiate or execute the machine readable instructions to determine whether an operation is allowed to pass between a first component on a system-on-chip (SoC) and a second component on the SoC, detect an interconnect between the first component on the SoC and the second component on the SoC, cause the interconnect to filter the operation based on the determination of whether the operation is allowed to pass between the first component and the second component, and transmit a request to filter the operation based on the determination of whether the operation is allowed to pass between the first component and the second component.
-
-
-
-
-
-
-
-
-