Abstract:
Examples are described for a device to receive intermediate code that was generated from compiling source code of an application. The intermediate code includes information generated from the compiling that identifies a hierarchical structure of lower level sub-routines in higher level sub-routines, and the lower level sub-routines are defined in the source code of the application to execute more frequently than the higher level sub-routines that identify the lower level sub-routines. The device is configured to compile the intermediate code to generate object code based on the information that identifies lower level sub-routines in higher level sub-routines, and store the object code.
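The abstract does not say how the hierarchy information is used when the object code is generated. As a minimal sketch of one possible use, the Python below assumes the intermediate code carries a parent-to-children map of sub-routines and that the device spends its optimization effort on the most deeply nested sub-routines first, since those are defined to execute most often. The names `HIERARCHY`, `nesting_depths`, and `compile_order`, and the tree-shaped map itself, are hypothetical and do not come from the source.

```python
# Hypothetical hierarchy metadata: each sub-routine maps to the lower level
# sub-routines it contains. Assumed to be a tree (no cycles, one parent each).
HIERARCHY = {
    "main": ["outer_loop", "setup"],
    "outer_loop": ["inner_loop"],
    "inner_loop": [],
    "setup": [],
}


def nesting_depths(hierarchy, root):
    """Return the nesting depth of every sub-routine reachable from root."""
    depths = {root: 0}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in hierarchy.get(parent, []):
            depths[child] = depths[parent] + 1
            stack.append(child)
    return depths


def compile_order(hierarchy, root):
    """Order sub-routines deepest first (ties broken by name).

    Assumption for illustration: deeper sub-routines run more frequently,
    so a compiler might prioritize them when generating object code.
    """
    depths = nesting_depths(hierarchy, root)
    return sorted(depths, key=lambda name: (-depths[name], name))


if __name__ == "__main__":
    print(compile_order(HIERARCHY, "main"))
```

Sorting by depth is only a stand-in for whatever decision the second compilation stage makes; the point is that the decision is driven by the hierarchy metadata rather than by re-analyzing the source code.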
Abstract:
Techniques are described for copying data only from a subset of memory locations allocated to a set of instructions to free memory locations for higher priority instructions to execute. Data from a dynamic portion of one or more general purpose registers (GPRs) allocated to the set of instructions may be copied and stored to another memory unit while data from a static portion of the one or more GPRs allocated to the set of instructions may not be copied and stored to another memory unit.
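The partial save described above can be modeled in a few lines. The Python sketch below treats the GPRs as a list and assumes the dynamic portion is given as a list of register indices; only those registers are copied to a separate store, reused by higher priority work, and then restored. The helper names `save_dynamic`, `restore_dynamic`, and `run_with_preemption` are hypothetical, and real hardware would do this in a register file rather than in a Python list.

```python
def save_dynamic(gprs, dynamic_indices):
    """Copy only the dynamic portion of the GPRs to another memory unit
    (modeled here as a dict keyed by register index)."""
    return {index: gprs[index] for index in dynamic_indices}


def restore_dynamic(gprs, saved):
    """Write the saved dynamic values back into their registers."""
    for index, value in saved.items():
        gprs[index] = value


def run_with_preemption(gprs, dynamic_indices, high_priority_values):
    """Preempt, let higher priority work use the freed registers, then resume.

    Returns the saved copy and a snapshot of the GPRs during preemption.
    The static portion is never copied and never overwritten in this model.
    """
    saved = save_dynamic(gprs, dynamic_indices)
    for index, value in zip(dynamic_indices, high_priority_values):
        gprs[index] = value  # higher priority instructions reuse freed GPRs
    during = list(gprs)
    restore_dynamic(gprs, saved)
    return saved, during


if __name__ == "__main__":
    registers = [10, 11, 12, 13, 20, 21, 22, 23]  # 0-3 static, 4-7 dynamic
    saved, during = run_with_preemption(registers, [4, 5, 6, 7], [90, 91, 92, 93])
    print(saved, during, registers)
```

Copying four registers instead of eight is the saving this model illustrates: the cost of the switch scales with the size of the dynamic portion, not with the full allocation.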
Abstract:
In an example, a method for speculative scalarization may include receiving, by a first processor, vector code that includes a plurality of instructions. The method may include determining, during compilation of the vector code, whether at least one instruction of the plurality of instructions is a speculatively uniform instruction. The method may include generating, during compilation of the vector code, uniformity detection code for the at least one speculatively uniform instruction. The uniformity detection code, when executed, may be configured to determine whether the at least one speculatively uniform instruction is uniform during runtime. The method may include generating, during compilation of the vector code, scalar code by scalarizing the at least one speculatively uniform instruction. The scalar code may be configured to be compiled for execution by the first processor, a scalar processor, a scalar processing unit of a vector processor, or a vector pipeline of the vector processor.
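The runtime side of this idea can be illustrated with a small simulation. The Python sketch below assumes a vector instruction is modeled as a function applied per lane, that "uniform" means every lane holds the same operand, and that a non-uniform result falls back to the original vector path. The names `is_uniform`, `scalar_op`, `vector_op`, and `run_speculative`, and the example arithmetic, are hypothetical; the source does not specify how uniformity is detected or what happens when the speculation fails.

```python
def is_uniform(lanes):
    """Stand-in for uniformity detection code: true when every lane holds
    the same value. Assumes at least one lane."""
    first = lanes[0]
    return all(value == first for value in lanes)


def scalar_op(x, k):
    """Scalarized form of the instruction: computed once."""
    return x * x + k


def vector_op(lanes, k):
    """Original vector form: computed once per lane."""
    return [value * value + k for value in lanes]


def run_speculative(lanes, k):
    """Take the scalar path when the runtime check succeeds; otherwise
    fall back to the vector path. Returns (results, path_taken)."""
    if is_uniform(lanes):
        result = scalar_op(lanes[0], k)
        return [result] * len(lanes), "scalar"  # broadcast to all lanes
    return vector_op(lanes, k), "vector"


if __name__ == "__main__":
    print(run_speculative([3, 3, 3, 3], 1))
    print(run_speculative([1, 2, 3, 4], 1))
```

Both paths return the same values for uniform input, so the check affects only how much work is done, not the result.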