Abstract:
An approach and a method for efficient execution of nested map-reduce framework workloads to take advantage of the combined execution of central processing units (CPUs) and graphics processing units (GPUs) and lower latency of data access in accelerated processing units (APUs) is described. In embodiments, metrics are generated to determine whether a map or reduce function is more efficiently processed on a CPU or a GPU. A first metric is based on ratio of a number of branch instructions to a number of non-branch instructions, and a second metric is based on the comparison of execution times on each of the CPU and the GPU. Selecting execution of map and reduce functions based on the first and second metrics result in accelerated computations. Some embodiments include scheduling pipelined executions of functions on the CPU and functions on the GPU concurrently to achieve power-efficient nested map reduce framework execution.