-
Publication number: US20190042534A1
Publication date: 2019-02-07
Application number: US15980579
Application date: 2018-05-15
Applicant: Intel Corporation
Inventor: William J. Butera , Simon C. Steely, JR. , Richard J. Dischler
Abstract: Embodiments herein may present a multi-tile processor including a plurality of processor tiles, and a plurality of interconnects selectively coupling the plurality of processor tiles to each other. A first processor tile may include a memory to store a bulletin board to hold a message, an execution unit, and an encapsulated software module. The encapsulated software module may select a second processor tile coupled with the first processor tile by an interconnect to be a part of a signal pathway. The second processor tile may be selected based on a selection criterion of the signal pathway and the message held in the bulletin board. The encapsulated software module may post and read a message at the bulletin board stored in the memory, or read a message from a bulletin board stored in a memory of the second processor tile. Other embodiments may be described and/or claimed.
-
Publication number: US20180095756A1
Publication date: 2018-04-05
Application number: US15283259
Application date: 2016-09-30
Applicant: Intel Corporation
Inventor: William C. Hasenplaugh , Chris J. Newburn , Simon C. Steely, JR. , Samantika S. Sury
IPC: G06F9/30 , G06F12/1045
CPC classification number: G06F9/30032 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/3013 , G06F9/3016 , G06F12/0886 , G06F12/0897 , G06F12/1027 , G06F12/1054 , G06F12/126 , G06F2212/1024 , G06F2212/1028 , G06F2212/681
Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.
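The load behavior described in this abstract resembles a gather operation. The following is a minimal software sketch of that idea only, not the claimed hardware: `gather_load`, the flat `memory` list, and the sample addresses are hypothetical names and values chosen for illustration, and the destination is modeled as an ordinary list standing in for a storage location that is not a packed data register.

```python
def gather_load(memory, packed_addresses):
    """Load one data element per address held in the packed address operand.

    `memory` models addressable storage as a flat list (an assumption);
    `packed_addresses` models the memory address information data elements
    held in the source packed data register.
    """
    return [memory[addr] for addr in packed_addresses]


# Hypothetical 16-word memory whose word at address i holds 100 + i.
memory = [100 + i for i in range(16)]

# Four address elements, as they might sit in one packed data register.
packed_addresses = [3, 0, 7, 12]

# The destination here is a plain list, standing in for a non-register
# destination storage location.
destination = gather_load(memory, packed_addresses)
```

Each destination element corresponds to a different address element, in the same order as the source operand.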
-
Publication number: US20200371566A1
Publication date: 2020-11-26
Application number: US16862263
Application date: 2020-04-29
Applicant: Intel Corporation
Inventor: Simon C. Steely, JR. , Richard Dischler , David Bach , Olivier Franza , William J. Butera , Christian Karl , Benjamin Keen , Brian Leung
IPC: G06F1/18 , H01L23/538 , G06F9/50 , H01L25/065 , G06F15/76
Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.
-
Publication number: US20190087240A1
Publication date: 2019-03-21
Application number: US16192322
Application date: 2018-11-15
Applicant: Intel Corporation
Inventor: Samantika S. Sury , Robert G. Blankenship , Simon C. Steely, JR.
IPC: G06F9/52 , G06F12/0817
Abstract: In an embodiment, a processor includes a plurality of cores and synchronization logic. The synchronization logic includes circuitry to: receive a first memory request and a second memory request; determine whether the second memory request is in contention with the first memory request; and in response to a determination that the second memory request is in contention with the first memory request, process the second memory request using a non-blocking cache coherence protocol. Other embodiments are described and claimed.
-
15.
Publication number: US20190005161A1
Publication date: 2019-01-03
Application number: US15640535
Application date: 2017-07-01
Applicant: Intel Corporation
Inventor: Kermin Fleming , Kent D. Glossop , Simon C. Steely, JR. , Ping Tak Peter Tang
IPC: G06F17/50 , G06F15/78 , G06F12/0802
CPC classification number: G06F17/505 , G06F12/0802 , G06F12/0862 , G06F12/0888 , G06F12/0895 , G06F15/7867 , G06F15/8015 , G11C8/12 , H03K19/17736 , H03K19/17756 , H03K19/17764 , H03K19/17776 , H03K19/1778
Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. At least one of the plurality of processing elements includes a plurality of control inputs.
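The firing rule mentioned in this abstract (an operation is performed when an incoming operand set arrives) can be illustrated in software. This is a rough sketch under stated assumptions, not the patented design: the `DataflowOperator` class, its per-port queues, and the `push`/`fire` methods are hypothetical names, and control inputs are not modeled.

```python
from collections import deque


class DataflowOperator:
    """Hypothetical model of one dataflow-graph node mapped to a processing element."""

    def __init__(self, op, num_inputs):
        self.op = op
        # One FIFO per input port; buffering per port is an assumption.
        self.inputs = [deque() for _ in range(num_inputs)]

    def push(self, port, value):
        """Deliver one operand to the given input port."""
        self.inputs[port].append(value)

    def ready(self):
        """A complete operand set has arrived when every port holds a value."""
        return all(len(q) > 0 for q in self.inputs)

    def fire(self):
        """Perform the operation if an operand set is available, else return None."""
        if not self.ready():
            return None
        operands = [q.popleft() for q in self.inputs]
        return self.op(*operands)


adder = DataflowOperator(lambda a, b: a + b, num_inputs=2)
adder.push(0, 2)
first_attempt = adder.fire()   # only one operand present, so nothing fires
adder.push(1, 3)
second_attempt = adder.fire()  # both operands present, so the node fires
```

The node does nothing until both ports hold a value, which is the data-driven execution the abstract describes at a high level.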
-
Publication number: US20190004955A1
Publication date: 2019-01-03
Application number: US15640534
Application date: 2017-07-01
Applicant: Intel Corporation
Inventor: Michael C. Adler , Chiachen Chou , Neal C. Crago , Kermin Fleming , Kent D. Glossop , Aamer Jaleel , Pratik M. Marolia , Simon C. Steely, JR. , Samantika S. Sury
IPC: G06F12/0862 , G06F12/0802 , H03K19/177 , G06F15/78
CPC classification number: G06F12/0862 , G06F12/0802 , G06F15/7867 , G06F15/8015 , G06F17/505 , G06F2212/6026 , G11C8/12 , H03K19/17736 , H03K19/17756 , H03K19/1776 , H03K19/17764 , H03K19/17776 , H03K19/1778
Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. The processor also includes a streamer element to prefetch the incoming operand set from two or more levels of a memory system.
-
Publication number: US20180188983A1
Publication date: 2018-07-05
Application number: US15396049
Application date: 2016-12-30
Applicant: INTEL CORPORATION
Inventor: Kermin Elliott Fleming, JR. , Simon C. Steely, JR. , Kent D. Glossop
IPC: G06F3/06
Abstract: An integrated circuit includes a processor to execute instructions and to interact with memory, and acceleration hardware, to execute a sub-program corresponding to instructions. A set of input queues includes a store address queue to receive, from the acceleration hardware, a first address of the memory, the first address associated with a store operation and a store data queue to receive, from the acceleration hardware, first data to be stored at the first address of the memory. The set of input queues also includes a completion queue to buffer response data for a load operation. A disambiguator circuit, coupled to the set of input queues and the memory, is to, responsive to determining the load operation, which succeeds the store operation, has an address conflict with the first address, copy the first data from the store data queue into the completion queue for the load operation.
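The disambiguation step in this abstract is a form of store-to-load forwarding. The sketch below is one possible software reading of it, not the circuit itself: `disambiguate_load` is a hypothetical name, memory is modeled as a dict, and two behaviors are assumptions not stated in the abstract (searching pending stores youngest first, and reading memory when no address conflict exists).

```python
from collections import deque


def disambiguate_load(load_addr, store_addr_q, store_data_q, memory, completion_q):
    """Service a load that follows the queued stores in program order.

    On an address conflict with a pending store, the store's data is copied
    into the completion queue. Otherwise the value is read from memory
    (assumed fallback).
    """
    pending = list(zip(store_addr_q, store_data_q))
    # Assumption: the youngest matching store wins, so scan newest first.
    for addr, data in reversed(pending):
        if addr == load_addr:
            completion_q.append(data)
            return
    completion_q.append(memory[load_addr])


memory = {0x10: 1, 0x20: 7}    # hypothetical memory contents
store_addr_q = deque([0x10])   # pending store address
store_data_q = deque([99])     # pending store data
completion_q = deque()

disambiguate_load(0x10, store_addr_q, store_data_q, memory, completion_q)  # conflict
disambiguate_load(0x20, store_addr_q, store_data_q, memory, completion_q)  # no conflict
```

After both calls, the completion queue holds the forwarded store data for the conflicting load, followed by the value read from memory for the other load.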
-
18.
Publication number: US20190102295A1
Publication date: 2019-04-04
Application number: US15721121
Application date: 2017-09-29
Applicant: Intel Corporation
Inventor: Samantika S. Sury , Robert G. Blankenship , Simon C. Steely, JR. , Yen-Cheng Liu
IPC: G06F12/084 , G06F12/0846 , G06F12/128 , G06F12/0811
Abstract: A method for adaptively performing a set of data transfer processes in a multi-core processor is described. The method may include receiving, by a shared cache from a first core cache, a first request for a cache line; determining, by the shared cache in response to receipt of the first request, whether the cache line is a widely-shared cache line or a single-producer-single-consumer cache line; and performing, by the first core cache and a second core cache, a three-hop data transfer process in response to determining that the cache line is a single-producer-single-consumer cache line, wherein the three-hop data transfer process transfers the cache line directly from the second core cache to the first core cache.
-
19.
Publication number: US20190095369A1
Publication date: 2019-03-28
Application number: US15719285
Application date: 2017-09-28
Applicant: Intel Corporation
Inventor: Kermin Fleming , Kent D. Glossop , Simon C. Steely, JR.
IPC: G06F13/28 , G06F12/0813 , G06F12/0811
Abstract: Systems, methods, and apparatuses relating to a memory fence mechanism in a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a plurality of operations, each by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements. The processor also includes a fence manager to manage a memory fence between a first operation and a second operation of the plurality of operations.
-
20.
Publication number: US20190004945A1
Publication date: 2019-01-03
Application number: US15640533
Application date: 2017-07-01
Applicant: Intel Corporation
Inventor: Kermin Fleming , Kent D. Glossop , Simon C. Steely, JR. , Samantika S. Sury
IPC: G06F12/0802 , G06F17/50 , H03K19/177
CPC classification number: G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0815 , G06F15/7867 , G06F15/8015 , G06F15/825 , G06F17/505 , G11C7/1012 , G11C8/12 , G11C2207/2245 , H03K19/17736 , H03K19/17756 , H03K19/1776 , H03K19/17764 , H03K19/17776 , H03K19/1778
Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In an embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an atomic operation when an incoming operand set arrives at the plurality of processing elements.