Abstract:
System and methods for the parallelization of software applications are described. In some embodiments, a compiler may automatically identify within source code dependencies of a function called by another function. A persistent database may be generated to store identified dependencies. When calls the function are encountered within the source code, the persistent database may be checked, and a parallelized implementation of the function may be employed dependent upon the dependency indicated in the persistent database.
Abstract:
A processor may include a vector functional unit that supports concurrent operations on multiple data elements of a maximum element size. The functional unit may also support concurrent execution of multiple distinct vector program instructions, where the multiple vector instructions each operate on multiple data elements of less than the maximum element size.
Abstract:
A processor may generate a result vector when executing a RunningShiftForDivide1P or RunningShiftForDivide2P instruction. For example, upon executing a RunningShiftForDivide1P/2P instruction, the processor may receive a first input vector and a second input vector. The processor then may record a base value from an element at a key element position in the first input vector. Next, when generating the result vector, for each active element in the result vector to the right of the key element position, the processor may generate a shifted base value using shift values from the second input vector. The processor then may correct the shifted base value when a predetermined condition is met. Next, the processor may set the element of the result vector equal to the shifted base value.
Abstract:
In an embodiment, a processor may implement a vector instruction set including a conditional termination instruction (CTerm). The CTerm instruction may take two source operands and compare them according to a specified condition, updating flags as a result of the instruction. The flags may be used to affect predicate vector generation to control vectorized loop execution. In an embodiment, the vector instruction set may also include a conditional termination predicate instruction (CTPred). The CTPred instruction may take a pair of predicate vectors and a set of flags as operands, and may generate: a predicate vector to control parallel processing of vector elements, and a set of flags to control further loop processing. Either instruction may be used to efficiently manage vector loops in various embodiments, or the instructions may be used together.
Abstract:
In an embodiment, a processor may implement a conditional stop instruction that includes a first predicate vector identifying the active elements of the instruction, a second predicate vector indicating true and false results for a conditional expression within a loop that is being vectorized, and a source operand specifying which combinations in the true and false results may indicate a dependency. The conditional stop instruction may generate a vector result indicating vector elements that have a dependency on a prior vector element, as well as an identification of which element position the dependency is on. More particularly, dependencies may be detected only on active elements as indicated by the first predicate vector. False dependencies that may occur due to inactive elements may be avoided, which may improve performance and/or provide for correct functional operation.
Abstract:
In an embodiment, a processor may implement a vector hazard check instruction to detect dependencies between vector memory operations based on the addresses of the vectors accessed by the vector memory operations. The addresses may be specified via a base address and a vector of indexes for each vector. In an embodiment, one of the base addresses may be an implied (or assumed) zero address, reducing the number of operands of the hazard check instruction.
Abstract:
A processor may include a vector functional unit that supports concurrent operations on multiple data elements of a maximum element size. The functional unit may also support concurrent execution of multiple distinct vector program instructions, where the multiple vector instructions each operate on multiple data elements of less than the maximum element size.
Abstract:
Systems, methods, and devices for dynamically mapping and remapping memory when a portion of memory is activated or deactivated are provided. In accordance with an embodiment, an electronic device may include several memory banks, one or more processors, and a memory controller. The memory banks may store data in hardware memory locations and may be independently deactivated. The processors may request the data using physical memory addresses, and the memory controller may translate the physical addresses to hardware memory locations. The memory controller may use a first memory mapping function when a first number of memory banks is active and a second memory mapping function when a second number is active. When one of the memory banks is to be deactivated, the memory controller may copy data from only the memory bank that is to be deactivated to the active remainder of memory banks.
Abstract:
Systems, apparatuses and methods for utilizing enhanced predicate registers which specify the element width and which elements are to be processed. The predicate size is dynamic, depending on the contents of the enhanced predicate register used for an instruction rather than being a static quality of a specific instruction. Specifying the element size in the enhanced predicate registers results in fewer instructions in an instruction set.
Abstract:
A method of predicting a backward conditional branch instruction used in a vector partitioning loop includes detecting the first conditional branch instruction that occurs after consumption of a dependency index vector by a predicate generating instruction. The dependency index vector includes information indicative of a number of iterations of the vector partitioning loop, and the conditional branch instruction may branch backwards when taken. The conditional branch instruction may then be predicted to be taken a number of times that is determined by the dependency index vector.