-
Publication number: US09880842B2
Publication date: 2018-01-30
Application number: US13834049
Filing date: 2013-03-15
Applicant: Intel Corporation
Inventor: Jayaram Bobba, Ruchira Sasanka, Jeffrey J. Cook, Abhinav Das, Arvind Krishnaswamy, David J. Sager, Jason M. Agron
CPC classification number: G06F9/3005, G06F8/433, G06F11/0715, G06F11/0721, G06F11/076, G06F11/3466
Abstract: A mechanism for tracking the control flow of instructions in an application and performing one or more optimizations of a processing device, based on the control flow of the instructions in the application, is disclosed. Control flow data is generated to indicate the control flow of blocks of instructions in the application. The control flow data may include annotations that indicate whether optimizations may be performed for different blocks of instructions. The control flow data may also be used to track the execution of the instructions to determine whether an instruction in a block of instructions is assigned to a thread, a process, and/or an execution core of a processor, and to determine whether errors have occurred during the execution of the instructions.
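As a rough illustration of the idea in this abstract, the sketch below models control flow data as a table of instruction blocks, each with its permitted successor blocks and an annotation saying whether optimization is allowed, and then checks an executed block trace against that table. The names `BlockInfo`, `optimizable`, and `find_control_flow_error` are hypothetical and not taken from the patent; the patent itself may represent and check control flow quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class BlockInfo:
    # Blocks that may legally execute right after this one (assumed representation).
    successors: set = field(default_factory=set)
    # Hypothetical annotation: whether optimizations may be applied to this block.
    optimizable: bool = False


def find_control_flow_error(cfg, trace):
    """Return the index of the first transition in `trace` that the
    control flow data does not allow, or None if the whole trace is valid."""
    for i in range(1, len(trace)):
        prev, cur = trace[i - 1], trace[i]
        if prev not in cfg or cur not in cfg[prev].successors:
            return i
    return None


if __name__ == "__main__":
    cfg = {
        "A": BlockInfo({"B", "C"}, optimizable=True),
        "B": BlockInfo({"A"}),
        "C": BlockInfo(),
    }
    print(find_control_flow_error(cfg, ["A", "B", "A", "C"]))  # valid trace
    print(find_control_flow_error(cfg, ["A", "B", "C"]))       # B -> C is not allowed
```

A trace that follows only listed successors reports no error, while an unexpected transition is flagged at the position where it occurs.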
-
Publication number: US09170789B2
Publication date: 2015-10-27
Application number: US13997140
Filing date: 2013-03-05
Applicant: INTEL CORPORATION
Inventor: Ruchira Sasanka, Jeffrey J. Cook, Abhinav Das, Jayaram Bobba, Michael R. Greenfield, Suresh Srinivas
Abstract: Embodiments of computer-implemented methods, systems, computing devices, and computer-readable media (transitory and non-transitory) are described herein for analyzing execution of a plurality of executable instructions and, based on the analysis, providing an indication of a benefit to be obtained by vectorization of at least a subset of the plurality of executable instructions. In various embodiments, the analysis may include identification of the subset of the plurality of executable instructions suitable for conversion to one or more single-instruction multiple-data (“SIMD”) instructions.
-
Publication number: US20190235849A1
Publication date: 2019-08-01
Application number: US16378641
Filing date: 2019-04-09
Applicant: Intel Corporation
Inventor: Paul Caprioli, Jeffrey J. Cook
CPC classification number: G06F8/52, G06F9/30, G06F9/45525, G06F11/3409, G06F12/023
Abstract: Technologies for optimized binary translation include a computing device that determines a cost-benefit metric associated with each translated code block of a translation cache. The cost-benefit metric is indicative of translation cost and performance benefit associated with the translated code block. The translation cost may be determined by measuring translation time of the translated code block. The cost-benefit metric may be calculated using a weighted cost-benefit function based on an expected workload of the computing device. In response to determining to free space in the translation cache, the computing device determines whether to discard each translated code block as a function of the cost-benefit metric. In response to determining to free space in the translation cache, the computing device may increment an iteration count and skip each translated code block if the iteration count modulo the corresponding cost-benefit metric is non-zero. Other embodiments are described and claimed.
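The modulo-based eviction rule described in this abstract can be sketched as follows. This is a minimal illustration only: the `TranslatedBlock` and `TranslationCache` classes, and the assumption that the cost-benefit metric is a positive integer where a larger value means the block is more worth keeping, are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class TranslatedBlock:
    name: str
    # Assumed: a positive integer; larger means more costly to retranslate
    # or more beneficial to keep.
    cost_benefit: int


class TranslationCache:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.iteration = 0  # incremented each time space must be freed

    def free_space(self):
        """Run one eviction pass and return the names of discarded blocks."""
        self.iteration += 1
        kept, discarded = [], []
        for block in self.blocks:
            if self.iteration % block.cost_benefit != 0:
                kept.append(block)            # non-zero remainder: skip this block
            else:
                discarded.append(block.name)  # zero remainder: discard it
        self.blocks = kept
        return discarded


if __name__ == "__main__":
    cache = TranslationCache([
        TranslatedBlock("cold", 1),
        TranslatedBlock("warm", 2),
        TranslatedBlock("hot", 4),
    ])
    for _ in range(4):
        print(cache.iteration + 1, cache.free_space())
```

Under these assumptions a block with metric 1 is discarded on the first pass, while a block with metric 4 survives until the fourth pass, so higher-valued translations stay in the cache longer.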
-
Publication number: US09588766B2
Publication date: 2017-03-07
Application number: US13630154
Filing date: 2012-09-28
Applicant: Intel Corporation
Inventor: Paul Caprioli, Abhay S. Kanhere, Jeffrey J. Cook, Muawya M. Al-Otoom
CPC classification number: G06F9/30036, G06F9/30014, G06F9/30032, G06F9/3012, G06F9/3887, G06F9/3893
Abstract: A vector reduction instruction is executed by a processor to provide efficient reduction operations on an array of data elements. The processor includes vector registers. Each vector register is divided into a plurality of lanes, and each lane stores the same number of data elements. The processor also includes execution circuitry that receives the vector reduction instruction to reduce the array of data elements stored in a source operand into a result in a destination operand using a reduction operator. Each of the source operand and the destination operand is one of the vector registers. Responsive to the vector reduction instruction, the execution circuitry applies the reduction operator to two of the data elements in each lane, and shifts one or more remaining data elements when there is at least one of the data elements remaining in each lane.
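The per-lane behavior described in this abstract can be modeled in software roughly as below. This is a sketch under stated assumptions: a vector register is represented as a list of non-empty lanes, each lane is a Python list, and "shifting" the remaining elements is modeled by the lane simply becoming one element shorter (the abstract does not say what fills the vacated position). The function names `reduce_step` and `reduce_lanes` are hypothetical.

```python
from operator import add


def reduce_step(register, op):
    """Model one execution of the reduction: in every lane, combine the
    first two elements with `op` and shift the remaining elements down."""
    result = []
    for lane in register:
        if len(lane) < 2:
            result.append(list(lane))  # nothing left to combine in this lane
        else:
            result.append([op(lane[0], lane[1])] + list(lane[2:]))
    return result


def reduce_lanes(register, op):
    """Repeat the step until each lane holds a single reduced value."""
    while any(len(lane) > 1 for lane in register):
        register = reduce_step(register, op)
    return [lane[0] for lane in register]


if __name__ == "__main__":
    reg = [[1, 2, 3, 4], [5, 6, 7, 8]]  # two lanes, four elements each
    print(reduce_step(reg, add))
    print(reduce_lanes(reg, add))
```

Because every lane holds the same number of elements, all lanes finish after the same number of steps, which is what lets a single instruction be applied uniformly across the register.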
-
Publication number: US11188341B2
Publication date: 2021-11-30
Application number: US16364704
Filing date: 2019-03-26
Applicant: Intel Corporation
Inventor: Jeffrey J. Cook, Srikanth T. Srinivasan, Jonathan D. Pearce, David B. Sheffield
Abstract: In one embodiment, an apparatus includes: a plurality of execution lanes to perform parallel execution of instructions; and a unified symbolic store address buffer coupled to the plurality of execution lanes, the unified symbolic store address buffer comprising a plurality of entries each to store a symbolic store address for a store instruction to be executed by at least some of the plurality of execution lanes. Other embodiments are described and claimed.
-
Publication number: US10725755B2
Publication date: 2020-07-28
Application number: US15615798
Filing date: 2017-06-06
Applicant: Intel Corporation
Inventor: David J. Sager, Ruchira Sasanka, Ron Gabor, Shlomo Raikin, Joseph Nuzman, Leeor Peled, Jason A. Domer, Ho-Seop Kim, Youfeng Wu, Koichi Yamada, Tin-Fook Ngai, Howard H. Chen, Jayaram Bobba, Jeffrey J. Cook, Omar M. Shaikh, Suresh Srinivas
Abstract: Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
-
Publication number: US10282182B2
Publication date: 2019-05-07
Application number: US15274624
Filing date: 2016-09-23
Applicant: Intel Corporation
Inventor: Paul Caprioli, Jeffrey J. Cook
Abstract: Technologies for optimized binary translation include a computing device that determines a cost-benefit metric associated with each translated code block of a translation cache. The cost-benefit metric is indicative of translation cost and performance benefit associated with the translated code block. The translation cost may be determined by measuring translation time of the translated code block. The cost-benefit metric may be calculated using a weighted cost-benefit function based on an expected workload of the computing device. In response to determining to free space in the translation cache, the computing device determines whether to discard each translated code block as a function of the cost-benefit metric. In response to determining to free space in the translation cache, the computing device may increment an iteration count and skip each translated code block if the iteration count modulo the corresponding cost-benefit metric is non-zero. Other embodiments are described and claimed.
-
Publication number: US12135981B2
Publication date: 2024-11-05
Application number: US18207870
Filing date: 2023-06-09
Applicant: Intel Corporation
Inventor: Rajesh M. Sankaran, Gilbert Neiger, Narayan Ranganathan, Stephen R. Van Doren, Joseph Nuzman, Niall D. McDonnell, Michael A. O'Hanlon, Lokpraveen B. Mosur, Tracy Garrett Drysdale, Eriko Nurvitadhi, Asit K. Mishra, Ganesh Venkatesh, Deborah T. Marr, Nicholas P. Carter, Jonathan D. Pearce, Edward T. Grochowski, Richard J. Greco, Robert Valentine, Jesus Corbal, Thomas D. Fletcher, Dennis R. Bradford, Dwight P. Manley, Mark J. Charney, Jeffrey J. Cook, Paul Caprioli, Koichi Yamada, Kent D. Glossop, David B. Sheffield
Abstract: Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
-
Publication number: US11416281B2
Publication date: 2022-08-16
Application number: US16474978
Filing date: 2016-12-31
Applicant: Intel Corporation
Inventor: Rajesh M. Sankaran, Gilbert Neiger, Narayan Ranganathan, Stephen R. Van Doren, Joseph Nuzman, Niall D. McDonnell, Michael A. O'Hanlon, Lokpraveen B. Mosur, Tracy Garrett Drysdale, Eriko Nurvitadhi, Asit K. Mishra, Ganesh Venkatesh, Deborah T. Marr, Nicholas P. Carter, Jonathan D. Pearce, Edward T. Grochowski, Richard J. Greco, Robert Valentine, Jesus Corbal, Thomas D. Fletcher, Dennis R. Bradford, Dwight P. Manley, Mark J. Charney, Jeffrey J. Cook, Paul Caprioli, Koichi Yamada, Kent D. Glossop, David B. Sheffield
Abstract: Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
-
Publication number: US10983773B2
Publication date: 2021-04-20
Application number: US16378641
Filing date: 2019-04-09
Applicant: Intel Corporation
Inventor: Paul Caprioli, Jeffrey J. Cook
Abstract: Technologies for optimized binary translation include a computing device that determines a cost-benefit metric associated with each translated code block of a translation cache. The cost-benefit metric is indicative of translation cost and performance benefit associated with the translated code block. The translation cost may be determined by measuring translation time of the translated code block. The cost-benefit metric may be calculated using a weighted cost-benefit function based on an expected workload of the computing device. In response to determining to free space in the translation cache, the computing device determines whether to discard each translated code block as a function of the cost-benefit metric. In response to determining to free space in the translation cache, the computing device may increment an iteration count and skip each translated code block if the iteration count modulo the corresponding cost-benefit metric is non-zero. Other embodiments are described and claimed.