摘要:
A computer-implemented method, system and computer program product for improving the performance of a program that manipulates two vectors of data. It is determined whether the program contains one of the following patterns: a first pattern corresponding to v0.rearrange(s, v1); a second pattern corresponding to v0.blend(v1, m); and a third pattern corresponding to v0.rearrange(s).blend(v1.rearrange(s), m). Upon identifying code written as the first pattern in the program, the first pattern is rewritten and replaced with the second or third pattern if the execution time of the program with the second or third pattern is less than the execution time of the program with the first program. In a similar manner, upon identifying code written as the second or third pattern in the program, the second or third pattern is rewritten and replaced with the first pattern if the execution time of the program can be improved.
摘要:
A method includes identifying a code portion that accesses a primitive value in a user-defined function included in a user program, converting the code portion and an argument in a manner to directly reference an internal data representation of the user program, and generating a code for calling the user-defined function converted by the conversion.
摘要:
A method for improving performance of a system including a first processor and a second processor includes obtaining a code region specified to be executed on the second processor, the code region including a plurality of instructions, calculating a performance improvement of executing at least one of the plurality of instructions included in the code region on the second processor over executing the at least one instruction on the first processor, removing the at least one instruction from the code region in response to a condition including that the performance improvement does not exceed a first threshold, and repeating the calculating and the removing to produce a modified code region specified to be executed on the second processor.
摘要:
A method is provided for buffer allocation on a graphics processing unit. The method includes analyzing, by the graphics processing unit, a program to be executed on the graphics processing unit to determine, for an object in the program, a set of elements in the object that are designated to be accessed during an execution of the program. The method further includes allocating, by the graphics processing unit, a placement of the object in a device buffer on the graphics processing unit based on the set of elements to minimize a number of memory accesses during the execution of the program.
摘要:
An allocation system and a method for allocating an architectural register in a system having one or more mapping tables. When the allocation system detects a plurality of available architectural registers to an allocation target virtual register, it identifies adjacent instructions to all instructions having the allocation target virtual register in its destination operand, counts the number of uses of the architectural register appearing in the destination operand for each architectural register, summing the number of uses for each architectural register for each entry group in one or more mapping tables having the same assignment rule for correlations with the architectural registers, calculating the total of the numbers of uses of entries for each entry group, and allocating the architectural register to the allocation target virtual register such that the total of the numbers of uses of entries for each entry group approaches uniformity.
摘要:
A computer-implemented method and a computer program product are provided for converting a first object having a first data format to a second object having a second data format that is different from the first format in that the second data format requires an object header. The method includes adding the object header to the first object. The method further includes returning, as a pointer, an address of the added object header to a user defined function that uses the second object. The first object lacks pointers to other objects, and does not escape.
摘要:
Computer-implemented methods are provided for compiling a parallel loop and generating Graphics Processing Unit (GPU) code and Central Processing Unit (CPU) code for writing an array for the GPU and the CPU. A method includes compiling the parallel loop by (i) checking, based on a range of array elements to be written, whether the parallel loop can update all of the array elements and (ii) checking whether an access order of the array elements that the parallel loop reads or writes is known at compilation time. The method further includes determining an approach, from among a plurality of available approaches, to generate the CPU code and the GPU code based on (i) the range of the array elements to be written and (ii) the access order to the array elements in the parallel loop.
摘要:
A method and system are provided for executing, by a processor including a read-only cache, a program having a plurality of variables including a first variable and a second variable. Each variable is for executing a respective read operation or a respective write operation for an object. The method includes providing a first code that uses the read-only cache and a second code that does not use the read-only cache. The method further includes determining, by the processor, whether a first object designated by the first variable is aliased or not aliased with a second object designated by the second variable. The method also includes executing, by the processor, the first code when the first object is not aliased with the second object, and the second code when the first object is aliased with the second object.
摘要:
An allocation system and a method for allocating an architectural register in a system having one or more mapping tables. When the allocation system detects a plurality of available architectural registers to an allocation target virtual register, it identifies adjacent instructions to all instructions having the allocation target virtual register in its destination operand, counts the number of uses of the architectural register appearing in the destination operand for each architectural register, summing the number of uses for each architectural register for each entry group in one or more mapping tables having the same assignment rule for correlations with the architectural registers, calculating the total of the numbers of uses of entries for each entry group, and allocating the architectural register to the allocation target virtual register such that the total of the numbers of uses of entries for each entry group approaches uniformity.
摘要:
An allocation system and a method for allocating an architectural register in a system having one or more mapping tables. When the allocation system detects a plurality of available architectural registers to an allocation target virtual register, it identifies adjacent instructions to all instructions having the allocation target virtual register in its destination operand, counts the number of uses of the architectural register appearing in the destination operand for each architectural register, summing the number of uses for each architectural register for each entry group in one or more mapping tables having the same assignment rule for correlations with the architectural registers, calculating the total of the numbers of uses of entries for each entry group, and allocating the architectural register to the allocation target virtual register such that the total of the numbers of uses of entries for each entry group approaches uniformity.