Dynamically rewriting branch instructions in response to cache line eviction
    1.
    Type: Granted patent
    Status: In force

    Publication No.: US08782381B2

    Publication date: 2014-07-15

    Application No.: US13444890

    Filing date: 2012-04-12

    CPC classification number: G06F8/4442 G06F9/3806 G06F12/0875 G06F12/12

    Abstract: Mechanisms are provided for evicting cache lines from an instruction cache of the data processing system. The mechanisms store, for a portion of code in a current cache line, a linked list of call sites that directly or indirectly target the portion of code in the current cache line. A determination is made as to whether the current cache line is to be evicted from the instruction cache. The linked list of call sites is processed to identify one or more rewritten branch instructions having associated branch stubs, that either directly or indirectly target the portion of code in the current cache line. In addition, the one or more rewritten branch instructions are rewritten to restore the one or more rewritten branch instructions to an original state based on information in the associated branch stubs.
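
    The eviction step described above can be sketched roughly as follows. This is a minimal model, not the patented implementation: `BranchStub`, `CallSite`, `evict_line`, and the dictionary standing in for code memory are all hypothetical names, and the stub is assumed to record only the target the branch had before it was rewritten.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BranchStub:
    """Hypothetical stub: records the target a branch had before rewriting."""
    original_target: int


@dataclass
class CallSite:
    """One node of the per-cache-line linked list of call sites."""
    address: int                       # address of the rewritten branch
    stub: BranchStub                   # stub associated with that branch
    next: Optional["CallSite"] = None  # next call site targeting the same line


def evict_line(code: Dict[int, int], head: Optional[CallSite]) -> List[int]:
    """Walk the list and restore every rewritten branch from its stub.

    `code` maps a branch address to its current target (an assumption made
    for illustration). Returns the addresses that were restored.
    """
    restored = []
    node = head
    while node is not None:
        code[node.address] = node.stub.original_target
        restored.append(node.address)
        node = node.next
    return restored


# Two call sites were rewritten to jump straight to cache location 0x9000.
sites = CallSite(0x10, BranchStub(0x100), CallSite(0x20, BranchStub(0x100)))
code = {0x10: 0x9000, 0x20: 0x9000, 0x30: 0x9400}
restored = evict_line(code, sites)
```

    Only the branches reachable from the evicted line's list are touched; the branch at 0x30, which targets a different line, keeps its rewritten target.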

    Building approximate data dependences with a moving window
    2.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US08667260B2

    Publication date: 2014-03-04

    Application No.: US12717985

    Filing date: 2010-03-05

    CPC classification number: G06F9/32

    Abstract: Mechanisms for building approximate data dependences using a moving look-back window are provided. The mechanisms track dependence information for memory accesses over iterations of execution of a portion of code. The mechanisms receive a memory access of an iteration of the portion of code, the memory access having an address for accessing the memory and an access type indicating at least one of a read or a write access type. An entry in a moving look-back window data structure is generated corresponding to a memory location accessed by the memory access. The entry comprises at least an identification of the address, the access type, and an iteration number corresponding to the iteration of the memory access. The moving look-back window data structure is utilized to determine dependence information for memory accesses over a plurality of iterations of the portion of code.
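
    A moving look-back window of this kind might be sketched as below. The class name `LookBackWindow`, the fixed capacity, and the conflict rule (same address, different iterations, at least one write) are assumptions chosen for illustration; the abstract only specifies that each entry holds an address, an access type, and an iteration number.

```python
from collections import deque
from typing import Deque, List, Tuple

Entry = Tuple[int, str, int]        # (address, access type "r"/"w", iteration)
Dependence = Tuple[int, int, int]   # (address, earlier iteration, later iteration)


class LookBackWindow:
    """Bounded history of recent memory accesses (hypothetical sketch)."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops the oldest entry once it is full,
        # which is what makes the window "move".
        self.entries: Deque[Entry] = deque(maxlen=capacity)

    def record(self, address: int, access: str, iteration: int) -> List[Dependence]:
        deps = []
        for addr, kind, it in self.entries:
            # Conflict: same address, different iterations, at least one write.
            if addr == address and it != iteration and "w" in (kind, access):
                deps.append((address, it, iteration))
        self.entries.append((address, access, iteration))
        return deps


win = LookBackWindow(capacity=3)
trace = [(0xA0, "w", 0), (0xB0, "r", 0), (0xA0, "r", 1),
         (0xC0, "w", 1), (0xB0, "w", 2), (0xA0, "w", 3)]
found = []
for addr, kind, it in trace:
    found += win.record(addr, kind, it)
```

    By the time iteration 3 writes 0xA0, the write from iteration 0 has already slid out of the window, so that pair is not reported. This is the sense in which the resulting dependences are approximate.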

    Dynamically rewriting branch instructions to directly target an instruction cache location
    3.
    Type: Granted patent
    Status: In force

    Publication No.: US08627051B2

    Publication date: 2014-01-07

    Application No.: US13442919

    Filing date: 2012-04-10

    CPC classification number: G06F9/3806 G06F12/0875

    Abstract: Mechanisms are provided for dynamically rewriting branch instructions in a portion of code. The mechanisms execute a branch instruction in the portion of code. The mechanisms determine if a target instruction of the branch instruction, to which the branch instruction branches, is present in an instruction cache associated with the processor. Moreover, the mechanisms directly branch execution of the portion of code to the target instruction in the instruction cache, without intervention from an instruction cache runtime system, in response to a determination that the target instruction is present in the instruction cache. In addition, the mechanisms redirect execution of the portion of code to the instruction cache runtime system in response to a determination that the target instruction cannot be determined to be present in the instruction cache.
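
    The two execution paths in this abstract (branch directly when the target is known to be cached, otherwise hand control to the runtime) can be modelled with a small simulation. `ICacheRuntime` and `take_branch` are hypothetical names, and a dictionary lookup stands in for whatever presence check the real mechanism uses.

```python
from typing import Dict, Tuple


class ICacheRuntime:
    """Hypothetical software instruction-cache runtime."""

    def __init__(self) -> None:
        self.resident: Dict[int, int] = {}  # target address -> cache slot
        self.next_slot = 0
        self.misses = 0

    def load(self, target: int) -> int:
        """Bring the target into the cache and return its slot."""
        self.misses += 1
        slot = self.next_slot
        self.next_slot += 1
        self.resident[target] = slot
        return slot


def take_branch(target: int, runtime: ICacheRuntime) -> Tuple[str, int]:
    """Return which path was taken and the cache slot execution lands in."""
    slot = runtime.resident.get(target)
    if slot is not None:
        return ("direct", slot)               # no runtime intervention
    return ("runtime", runtime.load(target))  # presence could not be confirmed


rt = ICacheRuntime()
path = [take_branch(t, rt) for t in (0x400, 0x400, 0x800, 0x400)]
```

    In this toy model only the first branch to each target goes through the runtime; later branches to the same target take the direct path.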

    Partitioning programs between a general purpose core and one or more accelerators
    4.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US08375374B2

    Publication date: 2013-02-12

    Application No.: US12127395

    Filing date: 2008-05-27

    CPC classification number: G06F8/45 G06F8/451 G06F8/456

    Abstract: A mechanism is provided for partitioning programs between a general purpose core and one or more accelerators. With the apparatus and method, a compiler front end is provided for converting a program source code in a corresponding high level programming language into an intermediate code representation. This intermediate code representation is provided to an interprocedural optimizer which determines which core processor or accelerator each portion of the program should execute on and partitions the program into sub-programs based on this set of decisions. The interprocedural optimizer may further add instructions to the partitions to coordinate and synchronize the sub-programs as required. Each sub-program is compiled on an appropriate compiler backend for the instruction set architecture of the particular core processor or accelerator selected to execute the sub-program. The compiled sub-programs are then linked to generate an executable program.

    Runtime dependence-aware scheduling using assist thread
    5.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US08214831B2

    Publication date: 2012-07-03

    Application No.: US12435809

    Filing date: 2009-05-05

    CPC classification number: G06F8/445

    Abstract: A mechanism for runtime dependence-aware scheduling of dependent iterations is provided. Computation is performed for one or more iterations of computer executable code by a main thread. Dependence information is determined for a plurality of memory accesses within the computer executable code using modified executable code using a set of dependence threads. Using the dependence information, a determination is made as to whether a subset of a set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time by one or more available threads in the data processing system. If the subset of the set of uncompleted iterations in the plurality of iterations is capable of being executed ahead-of-time, the main thread is signaled to skip the subset of the set of uncompleted iterations and the set of assist threads is signaled to execute the subset of the set of uncompleted iterations.

    Arranging Binary Code Based on Call Graph Partitioning
    6.
    Type: Patent application
    Status: In force

    Publication No.: US20110321021A1

    Publication date: 2011-12-29

    Application No.: US12823244

    Filing date: 2010-06-25

    CPC classification number: G06F8/4442

    Abstract: Mechanisms are provided for arranging binary code to reduce instruction cache conflict misses. These mechanisms generate a call graph of a portion of code. Nodes and edges in the call graph are weighted to generate a weighted call graph. The weighted call graph is then partitioned according to the weights, affinities between nodes of the call graph, and the size of cache lines in an instruction cache of the data processing system, so that binary code associated with one or more subsets of nodes in the call graph are combined into individual cache lines based on the partitioning. The binary code corresponding to the partitioned call graph is then output for execution in a computing device.
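
    One simple way to realise partitioning of this kind is a greedy merge over call edges, sketched below. This is a swapped-in heuristic for illustration, not the method claimed in the application: it uses only edge weights (not node weights), and the function name `partition`, the byte sizes, and the 128-byte line size are assumptions.

```python
from typing import Dict, List, Tuple


def partition(sizes: Dict[str, int],
              edges: Dict[Tuple[str, str], int],
              line_size: int) -> List[List[str]]:
    """Group functions so that each group fits in one cache line."""
    group = {f: [f] for f in sizes}  # function -> the (shared) list it belongs to
    # Visit the heaviest call edges first so hot caller/callee pairs merge early.
    for (a, b), _weight in sorted(edges.items(), key=lambda e: (-e[1], e[0])):
        ga, gb = group[a], group[b]
        if ga is gb:
            continue  # already placed together
        if sum(sizes[f] for f in ga + gb) <= line_size:
            ga.extend(gb)
            for f in gb:
                group[f] = ga
    # Collect each distinct group once, in first-seen order.
    seen, result = set(), []
    for f in sizes:
        g = group[f]
        if id(g) not in seen:
            seen.add(id(g))
            result.append(g)
    return result


sizes = {"main": 40, "parse": 60, "emit": 50, "log": 20}   # code size in bytes
edges = {("main", "parse"): 90, ("parse", "emit"): 70, ("main", "log"): 5}
layout = partition(sizes, edges, line_size=128)
```

    Here `main` and `parse` merge first (100 bytes), `emit` is rejected because it would overflow the line, and the small `log` function still fits alongside the first pair.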

    Rewriting Branch Instructions Using Branch Stubs
    7.
    Type: Patent application
    Status: In force

    Publication No.: US20110321002A1

    Publication date: 2011-12-29

    Application No.: US12823204

    Filing date: 2010-06-25

    CPC classification number: G06F8/4436 G06F8/433 G06F8/4442

    Abstract: Mechanisms are provided for rewriting branch instructions in a portion of code. The mechanisms receive a portion of source code having an original branch instruction. The mechanisms generate a branch stub for the original branch instruction. The branch stub stores information about the original branch instruction including an original target address of the original branch instruction. Moreover, the mechanisms rewrite the original branch instruction so that a target of the rewritten branch instruction references the branch stub. In addition, the mechanisms output compiled code including the rewritten branch instruction and the branch stub for execution by a computing device. The branch stub is utilized by the computing device at runtime to determine if execution of the rewritten branch instruction can be redirected directly to a target instruction corresponding to the original target address in an instruction cache of the computing device without intervention by an instruction cache runtime system.
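
    The compile-time step (create a stub per branch, record the original target in it, and retarget the branch at the stub) could look roughly like this. `Branch`, `Stub`, `add_stubs`, the stub base address, and the 16-byte stub size are all hypothetical; real stubs would hold more information about the original instruction.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Branch:
    address: int  # where the branch instruction lives
    target: int   # where it currently jumps


@dataclass
class Stub:
    address: int          # where the stub itself is placed
    branch_address: int   # the branch this stub belongs to
    original_target: int  # target of the branch before rewriting


def add_stubs(branches: List[Branch], stub_base: int,
              stub_size: int = 16) -> Tuple[List[Branch], List[Stub]]:
    """Return rewritten branches (now targeting stubs) and the stubs."""
    rewritten, stubs = [], []
    for i, br in enumerate(branches):
        stub = Stub(stub_base + i * stub_size, br.address, br.target)
        stubs.append(stub)
        rewritten.append(Branch(br.address, stub.address))
    return rewritten, stubs


original = [Branch(0x10, 0x400), Branch(0x24, 0x800)]
rewritten, stubs = add_stubs(original, stub_base=0x1000)
```

    The original branch list is left untouched, so the before and after states can be compared directly.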

    System and method for advanced polyhedral loop transformations of source code in a compiler
    8.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US08060870B2

    Publication date: 2011-11-15

    Application No.: US11861449

    Filing date: 2007-09-26

    CPC classification number: G06F8/447

    Abstract: A system and method for advanced polyhedral loop transformations of source code in a compiler are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.

    Computer program code size partitioning system for multiple memory multi-processing systems
    9.
    Type: Granted patent
    Status: Lapsed

    Publication No.: US08032873B2

    Publication date: 2011-10-04

    Application No.: US12337197

    Filing date: 2008-12-17

    Abstract: The present invention provides for a system for computer program code size partitioning for multiple memory multi-processor systems. At least one system parameter of a computer system comprising one or more disparate processing nodes is identified. Computer program code comprising a program to be run on the computer system is received. A program representation based on received computer program code is generated. At least one single-entry-single-exit (SESE) region is identified based on the whole program representation. At least one SESE region of less than a certain size (store-size-specific) is identified based on identified SESE regions and the at least one system parameter. Each store-size-specific SESE region is grouped into a node-specific subroutine. The non node-specific parts of the computer program code are modified based on the partitioning into node-specific subroutines. The modified computer program code including each node-specific subroutine is compiled based on a specified node characteristic.

    Compiler method for employing multiple autonomous synergistic processors to simultaneously operate on longer vectors of data
    10.
    Type: Granted patent
    Status: In force

    Publication No.: US07962906B2

    Publication date: 2011-06-14

    Application No.: US11686400

    Filing date: 2007-03-15

    CPC classification number: G06F8/456

    Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.
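
    The runtime effect of this arrangement, a main program handing slices of one long vector loop to several workers, can be illustrated with a small sketch. Python threads stand in for the synergistic processors purely for illustration, and `chunk_bounds`, `saxpy_chunk`, and `run` are hypothetical names; the compiler analysis that extracts the loop is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def chunk_bounds(n: int, workers: int) -> List[Tuple[int, int]]:
    """Split range(n) into `workers` contiguous, near-equal slices."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        end = start + base + (1 if w < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def saxpy_chunk(a: int, x: List[int], y: List[int], lo: int, hi: int) -> List[int]:
    """The extracted loop body, applied to one slice of the vectors."""
    return [a * x[i] + y[i] for i in range(lo, hi)]


def run(a: int, x: List[int], y: List[int], workers: int) -> List[int]:
    """Main program: dispatch slices to workers and stitch results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: saxpy_chunk(a, x, y, *b),
                         chunk_bounds(len(x), workers))
        return [v for part in parts for v in part]


bounds = chunk_bounds(10, 4)
result = run(2, list(range(10)), [1] * 10, workers=4)
```

    `pool.map` yields results in submission order, so the concatenated output matches what a single sequential loop would produce.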
