-
Publication Number: US20190155620A1
Publication Date: 2019-05-23
Application Number: US16259608
Application Date: 2019-01-28
Applicant: Intel Corporation
Inventor: Meenakshi Arunachalam, Kushal Datta, Vikram Saletore, Vishal Verma, Deepthi Karkada, Vamsi Sripathi, Rahul Khanna, Mohan Kumar
Abstract: Systems, apparatuses and methods may provide for technology that identifies a first set of compute nodes and a second set of compute nodes, wherein the first set of compute nodes execute more slowly than the second set of compute nodes. The technology may also automatically determine a compute node configuration that results in a relatively low difference in completion time between the first set of compute nodes and the second set of compute nodes with respect to a neural network workload. In an example, the technology applies the compute node configuration to an execution of the neural network workload on one or more nodes in the first set of compute nodes and one or more nodes in the second set of compute nodes.
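The abstract above describes balancing a neural network workload across heterogeneous node sets so that slow and fast nodes finish at similar times. Below is a minimal illustrative sketch of one such balancing strategy, assigning per-node batch sizes in proportion to measured throughput; all names (NodeProfile, balance_batches) and numbers are hypothetical and not taken from the patent.

```python
# Hypothetical sketch: split a global batch across nodes in proportion to each
# node's measured throughput, so slow and fast nodes finish a step at roughly
# the same time. Names and numbers are illustrative, not from the patent.

from dataclasses import dataclass

@dataclass
class NodeProfile:
    name: str
    step_time_per_sample: float  # seconds per training sample, measured offline

def balance_batches(nodes: list[NodeProfile], global_batch: int) -> dict[str, int]:
    """Split a global batch so each node's expected step time is roughly equal."""
    # A node's share is proportional to its throughput (1 / step time).
    throughputs = {n.name: 1.0 / n.step_time_per_sample for n in nodes}
    total = sum(throughputs.values())
    return {name: round(global_batch * t / total) for name, t in throughputs.items()}

if __name__ == "__main__":
    cluster = [
        NodeProfile("slow-0", step_time_per_sample=0.020),
        NodeProfile("fast-0", step_time_per_sample=0.010),
        NodeProfile("fast-1", step_time_per_sample=0.010),
    ]
    print(balance_batches(cluster, global_batch=1024))
    # {'slow-0': 205, 'fast-0': 410, 'fast-1': 410}  (approximately)
```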
-
Publication Number: US10157142B2
Publication Date: 2018-12-18
Application Number: US15280965
Application Date: 2016-09-29
Applicant: Intel Corporation
Inventor: Ashok Raj, Sivakumar Radhakrishnan, Dan J. Williams, Vishal Verma, Narayan Ranganathan, Chet R. Douglas
Abstract: In one embodiment, a block data transfer interface employing an offload data transfer engine in accordance with the present description includes an offload data transfer engine executing a data transfer command set to transfer a block of data in a transfer data path from a source memory to a new region of a destination memory, wherein the transfer data path bypasses a central processing unit to minimize or reduce involvement of the central processing unit in the block transfer. In response to a successful transfer indication, a logical address is re-mapped to a physical address of the new region of the destination memory, instead of a physical address of the original region of the destination memory. In one embodiment, the re-mapping is performed by a central processing unit. In another embodiment, the re-mapping is performed by the offload data transfer engine. Other aspects are described herein.
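The key step in this abstract is remapping a logical address to the new physical region only after the offload engine signals a successful transfer, keeping the CPU out of the data path. The sketch below illustrates that ordering under assumed names; the engine's copy_block call and the RemapTable type are invented for illustration and do not reflect the actual hardware interface.

```python
# Hypothetical sketch of the remap step: after an offload engine reports a
# successful block copy, the logical address is pointed at the new physical
# region rather than the original one. The engine API here is invented.

class RemapTable:
    def __init__(self):
        self._map = {}  # logical address -> physical address

    def lookup(self, logical: int) -> int:
        return self._map[logical]

    def remap(self, logical: int, new_physical: int) -> None:
        self._map[logical] = new_physical

def offload_block_transfer(engine, table: RemapTable,
                           logical: int, src_phys: int, dst_phys: int) -> bool:
    """Copy a block via the offload engine (the CPU stays out of the data
    path), then remap the logical address only if the transfer succeeded."""
    ok = engine.copy_block(src=src_phys, dst=dst_phys)  # assumed engine call
    if ok:
        table.remap(logical, dst_phys)  # point logical address at new region
    return ok
```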
-
Publication Number: US11734204B2
Publication Date: 2023-08-22
Application Number: US16825538
Application Date: 2020-03-20
Applicant: Intel Corporation
Inventor: Gang Cao, James R. Harris, Ziye Yang, Vishal Verma, Changpeng Liu, Chong Han, Benjamin Walker
CPC classification number: G06F13/1668, G06F9/5027
Abstract: Examples herein relate to polling for input/output (I/O) transactions of a network interface, a storage device, or any peripheral device. Some examples monitor clock cycles spent checking for the presence of I/O events and processing them, as well as clock cycles spent checking for I/O events without completing any. Central processing unit (CPU) core utilization can be based on both counts. For example, if core utilization is below a threshold, the frequency of the core performing polling of I/O events can be reduced; if core utilization is at or above the threshold, the frequency of that core can be increased.
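The utilization metric described here is the ratio of cycles spent finding and processing I/O events to total polling cycles (useful plus empty polls), with the polling core's frequency stepped down below a threshold and up at or above it. A minimal hypothetical sketch follows; the threshold, step size, and frequency bounds are illustrative assumptions, not values from the patent.

```python
# Hypothetical sketch of the utilization metric described above: useful cycles
# (polls that found and processed I/O events) over total polling cycles, with
# core frequency stepped down below a threshold and up at or above it.

def core_utilization(busy_cycles: int, idle_poll_cycles: int) -> float:
    """Fraction of polling cycles that did useful I/O work."""
    total = busy_cycles + idle_poll_cycles
    return busy_cycles / total if total else 0.0

def adjust_core_frequency(current_khz: int, utilization: float,
                          threshold: float = 0.5, step_khz: int = 100_000) -> int:
    """Lower the polling core's frequency when it is mostly idle-polling,
    raise it when the core is kept busy completing I/O events."""
    if utilization < threshold:
        return max(800_000, current_khz - step_khz)   # floor at 800 MHz
    return min(3_000_000, current_khz + step_khz)     # cap at 3 GHz

if __name__ == "__main__":
    u = core_utilization(busy_cycles=2_000, idle_poll_cycles=8_000)  # 0.2
    print(adjust_core_frequency(2_200_000, u))  # steps down to 2_100_000
```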
-
Publication Number: US11029971B2
Publication Date: 2021-06-08
Application Number: US16259608
Application Date: 2019-01-28
Applicant: Intel Corporation
Inventor: Meenakshi Arunachalam, Kushal Datta, Vikram Saletore, Vishal Verma, Deepthi Karkada, Vamsi Sripathi, Rahul Khanna, Mohan Kumar
Abstract: Systems, apparatuses and methods may provide for technology that identifies a first set of compute nodes and a second set of compute nodes, wherein the first set of compute nodes execute more slowly than the second set of compute nodes. The technology may also automatically determine a compute node configuration that results in a relatively low difference in completion time between the first set of compute nodes and the second set of compute nodes with respect to a neural network workload. In an example, the technology applies the compute node configuration to an execution of the neural network workload on one or more nodes in the first set of compute nodes and one or more nodes in the second set of compute nodes.
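This granted patent shares its abstract with the pre-grant publication US20190155620A1 above. As a complement to the batch-balancing sketch there, the hypothetical sketch below illustrates the other element the abstract names: automatically selecting, from candidate configurations, the one with the smallest completion-time gap between the slow and fast node sets. pick_configuration and measure are illustrative stand-ins, not the patented method.

```python
# Hypothetical sketch: evaluate candidate compute node configurations and keep
# the one with the smallest completion-time gap between the slow and fast node
# sets. The measure callback is a stand-in for running or modeling the workload.

from typing import Callable, Iterable

def pick_configuration(candidates: Iterable[dict],
                       measure: Callable[[dict], tuple[float, float]]) -> dict:
    """Return the candidate config whose slow-set and fast-set completion
    times differ the least. Assumes at least one candidate is given."""
    best, best_gap = None, float("inf")
    for config in candidates:
        slow_time, fast_time = measure(config)  # run or model the workload
        gap = abs(slow_time - fast_time)
        if gap < best_gap:
            best, best_gap = config, gap
    return best
```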