Abstract:
Embodiments include computing devices, systems, and methods for task-based handling of repetitive processes in parallel. At least one processor of the computing device, or a specialized hardware controller, may be configured to partition the iterations of a repetitive process and assign the partitions to initialized tasks to be executed in parallel by a plurality of processor cores. When a task completes, a remaining divisible partition of an ongoing task may be subpartitioned, with the subpartitions assigned to the ongoing task and to either the completed task or a newly initialized task. Information about the iteration space of a repetitive process may be stored in a descriptor table, and status information for all partitions of the repetitive process may be stored in a status table. Each processor core may have an associated local table that tracks the execution of iterations by each task and is synchronized with the status table.
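The partition-and-resplit scheme can be sketched as a small deterministic simulation in Python. This is a minimal illustration under stated assumptions, not the claimed implementation. The `Partition` entries stand in for the status table. Tasks advance in rounds at hypothetical per-task `speeds` (which must be positive) instead of running on real cores. A finished task is reused, rather than a newly initialized one, to receive the back half of the largest remaining divisible partition.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Status-table entry: the half-open iteration range [next, end) a task still owns."""
    task_id: int
    next: int
    end: int

    def remaining(self) -> int:
        return self.end - self.next


def initial_partitions(n_iters: int, n_tasks: int) -> list[Partition]:
    """Split the iteration space [0, n_iters) into n_tasks contiguous, near-equal partitions."""
    base, extra = divmod(n_iters, n_tasks)
    parts, start = [], 0
    for t in range(n_tasks):
        size = base + (1 if t < extra else 0)
        parts.append(Partition(t, start, start + size))
        start += size
    return parts


def resplit(status: list[Partition], idle: Partition) -> bool:
    """Move the back half of the largest ongoing partition to the idle task, if divisible."""
    donor = max((p for p in status if p is not idle), key=Partition.remaining, default=None)
    if donor is None or donor.remaining() < 2:
        return False  # nothing left that can be subpartitioned
    mid = donor.next + donor.remaining() // 2
    idle.next, idle.end = mid, donor.end  # idle task takes [mid, end)
    donor.end = mid                       # ongoing task keeps [next, mid)
    return True


def run(n_iters: int, speeds: list[int]) -> list[int]:
    """Simulate one task per entry in `speeds`; return how many times each iteration ran."""
    status = initial_partitions(n_iters, len(speeds))
    executed = [0] * n_iters
    progress = True
    while progress:
        progress = False
        for part, speed in zip(status, speeds):
            for _ in range(speed):  # a task executes up to `speed` iterations per round
                if part.remaining() == 0 and not resplit(status, part):
                    break
                executed[part.next] += 1
                part.next += 1
                progress = True
    return executed


print(run(20, [1, 4, 2]) == [1] * 20)  # True
```

Halving the donor's remaining range is just one simple policy; the abstract does not say how subpartition sizes are chosen.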
Abstract:
Embodiments include computing devices, apparatus, and methods implemented by the apparatus for implementing speculative loop iteration partitioning (SLIP) for heterogeneous processing devices. A computing device may receive iteration information for a first partition of iterations of a repetitive process and select a SLIP heuristic based on available SLIP information and iteration information for the first partition. The computing device may determine a split value for the first partition using the SLIP heuristic, and partition the first partition using the split value to produce a plurality of next partitions.
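A minimal Python sketch of the SLIP steps follows. It assumes exactly two processing devices (a CPU and a GPU) and uses a hypothetical `info` dictionary as the "available SLIP information". The heuristic names, the `cpu_iters_per_s`, `gpu_iters_per_s`, and `min_split` keys, and the proportional-throughput rule are illustrative assumptions, not heuristics defined by the source. Here the split value is an iteration index that divides the first partition into two next partitions.

```python
from typing import Callable, Optional

Range = tuple[int, int]                     # half-open iteration range [start, end)
Heuristic = Callable[[Range, dict], float]  # returns the fraction of iterations for the CPU


def even_heuristic(part: Range, info: dict) -> float:
    """Fallback when no profiling data is available: split the partition in half."""
    return 0.5


def throughput_heuristic(part: Range, info: dict) -> float:
    """Give each device a share proportional to its measured iterations per second."""
    cpu, gpu = info["cpu_iters_per_s"], info["gpu_iters_per_s"]
    return cpu / (cpu + gpu)


def select_heuristic(part: Range, info: dict) -> Optional[Heuristic]:
    """Choose a heuristic from the available SLIP information and the partition size."""
    start, end = part
    if end - start < max(2, info.get("min_split", 2)):
        return None  # partition too small to be worth splitting
    if "cpu_iters_per_s" in info and "gpu_iters_per_s" in info:
        return throughput_heuristic
    return even_heuristic


def slip(part: Range, info: dict) -> list[Range]:
    """Compute a split value for `part` and return the resulting next partitions."""
    heuristic = select_heuristic(part, info)
    if heuristic is None:
        return [part]
    start, end = part
    split = start + round((end - start) * heuristic(part, info))
    split = min(max(split, start + 1), end - 1)  # keep both next partitions non-empty
    return [(start, split), (split, end)]        # [CPU share, GPU share]


print(slip((0, 100), {"cpu_iters_per_s": 1.0, "gpu_iters_per_s": 3.0}))  # [(0, 25), (25, 100)]
```

Because the throughput figures are only estimates, the split is speculative. Each next partition could be passed back through `slip` as better information becomes available.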
Abstract:
Embodiments include computing devices, apparatus, and methods implemented by the apparatus for implementing shared virtual index translation on a computing device. The computing device may receive a base virtual address for storing an output of a kernel function execution to a dedicated memory, and may determine whether the base virtual address is in a range of virtual addresses for a privatized output buffer within the dedicated memory; the privatized output buffer may be smaller than the dedicated memory. In response to determining that the base virtual address is in the range of virtual addresses, the computing device may calculate a first modified physical address using the physical address mapped to the base virtual address and an offset of a first processing device associated with the dedicated memory. The computing device may then store the output of the kernel function execution to the privatized output buffer at the first modified physical address.
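The address calculation can be sketched in Python as follows. This is a simplified model built on assumptions the source does not state: 4 KiB pages, a dictionary standing in for the page table, a dictionary standing in for the dedicated memory, and a hypothetical `device_offsets` map that gives each processing device the byte offset of its private copy of the output buffer. Addresses outside the privatized buffer's virtual range are translated normally and receive no per-device offset.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size


@dataclass
class IndexTranslator:
    page_table: dict[int, int]      # virtual page number -> physical page number
    buf_va_start: int               # privatized output buffer: virtual range [start, end)
    buf_va_end: int
    device_offsets: dict[int, int]  # processing-device id -> byte offset of its private copy

    def physical(self, va: int) -> int:
        """Ordinary translation of a virtual address through the page table."""
        vpn, page_off = divmod(va, PAGE_SIZE)
        return self.page_table[vpn] * PAGE_SIZE + page_off

    def store_address(self, base_va: int, device_id: int) -> int:
        """Physical address at which `device_id` should store output for `base_va`."""
        pa = self.physical(base_va)
        if self.buf_va_start <= base_va < self.buf_va_end:
            return pa + self.device_offsets[device_id]  # modified physical address
        return pa  # outside the privatized buffer: no per-device offset

    def store(self, memory: dict[int, int], base_va: int, device_id: int, value: int) -> None:
        """Write a kernel output value into the dict-modeled dedicated memory."""
        memory[self.store_address(base_va, device_id)] = value


t = IndexTranslator({0x10: 0x200, 0x20: 0x300}, 0x10000, 0x11000, {0: 0x0, 1: 0x1000})
mem: dict[int, int] = {}
t.store(mem, 0x10008, 0, 7)
t.store(mem, 0x10008, 1, 9)
print({hex(k): v for k, v in mem.items()})  # {'0x200008': 7, '0x201008': 9}
```

Because each device adds its own offset, two devices writing to the same base virtual address land at different physical addresses in their private copies of the buffer.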
Abstract:
Various embodiments include methods for reclaiming memory in a computing device. A method may include storing a first pointer to a first memory location that stores the beginning of a data structure that a plurality of threads executing on the computing device may access concurrently, and storing a second pointer to the current beginning of the data structure. In response to performing an operation on the data structure that moves its beginning from the first memory location to a second memory location, the second pointer may be updated to point to the second memory location. In response to determining that memory allocated to the data structure may be reclaimed, that memory, including the memory at the first memory location pointed to by the first pointer, may be reclaimed.
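The two-pointer scheme can be illustrated with a linked queue in Python. The queue itself is an assumption, not the source's data structure. In this sketch `first` plays the role of the first pointer and `head` the second. Dequeuing advances `head` without freeing anything, and `try_reclaim` later releases every node from `first` up to `head`. A lock and a simple reader count stand in for the synchronization and the "may be reclaimed" test that a real concurrent implementation would use. "Freeing" a node here only means unlinking it so Python's garbage collector can collect it.

```python
import threading
from typing import Any, Optional


class Node:
    __slots__ = ("value", "next")

    def __init__(self, value: Any) -> None:
        self.value = value
        self.next: Optional["Node"] = None


class ReclaimingQueue:
    """Linked queue with a dummy node. `first` marks where the structure began (the oldest
    unreclaimed node); `head` marks its current beginning."""

    def __init__(self) -> None:
        dummy = Node(None)
        self.first = dummy  # first pointer
        self.head = dummy   # second pointer
        self.tail = dummy
        self.lock = threading.Lock()
        self.readers = 0    # threads that may still hold references to old nodes

    def enqueue(self, value: Any) -> None:
        node = Node(value)
        with self.lock:
            self.tail.next = node
            self.tail = node

    def dequeue(self) -> Optional[Any]:
        with self.lock:
            nxt = self.head.next
            if nxt is None:
                return None
            self.head = nxt  # the beginning moves; the old node is not freed here
            return nxt.value

    def begin_read(self) -> None:
        with self.lock:
            self.readers += 1

    def end_read(self) -> None:
        with self.lock:
            self.readers -= 1

    def try_reclaim(self) -> int:
        """Reclaim nodes from `first` up to (not including) `head` once no reader is active."""
        with self.lock:
            if self.readers:
                return 0  # a thread may still be using an old node
            freed, node = 0, self.first
            while node is not self.head:
                nxt = node.next
                node.next = None  # unlink so the garbage collector can free the node
                node = nxt
                freed += 1
            self.first = self.head
            return freed


q = ReclaimingQueue()
for v in (1, 2, 3):
    q.enqueue(v)
print(q.dequeue(), q.dequeue(), q.try_reclaim())  # 1 2 2
```

Deferring the free until `try_reclaim` is what lets concurrent readers keep dereferencing nodes that have already been logically removed from the front of the structure.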