Abstract:
Embodiments relate to joining data across a parallel database and a distributed processing system. Aspects include receiving a query on data T stored in the parallel database and data L stored in the distributed processing system, applying local query predicates and projection to T to create T′, and applying local query predicates and projection to L to create L′. Aspects also include, based on determining that a size of L′ is less than a size of T′ and that the size of L′ is less than a first threshold, transmitting L′ to the parallel database and executing a join between T′ and L′. Aspects further include, based on determining that a number n of nodes of the distributed processing system multiplied by the size of T′ is less than the size of L′ and that the size of T′ is less than a second threshold, transmitting T′ to the distributed processing system and executing a join between T′ and L′.
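The two size tests described above can be sketched as a small decision function. This is only an illustrative sketch under stated assumptions: the names `choose_join_plan` and `JoinPlan`, the use of plain numeric sizes, and the `OTHER` fallback are hypothetical and do not come from the abstract, which does not say what happens when neither condition holds. For n of at least 1 the two conditions cannot both be true, so the order of the checks does not affect the result.

```python
from enum import Enum


class JoinPlan(Enum):
    # Hypothetical labels for the outcomes; not terminology from the abstract.
    SEND_L_TO_DATABASE = "transmit L' to the parallel database and join there"
    SEND_T_TO_CLUSTER = "transmit T' to the distributed processing system and join there"
    OTHER = "neither condition holds; the abstract does not specify this case"


def choose_join_plan(size_t, size_l, n_nodes, first_threshold, second_threshold):
    """Pick a join location from the sizes of the filtered inputs T' and L'.

    size_t and size_l are the sizes of T' and L' after local predicates and
    projection have been applied; n_nodes is the number of nodes n in the
    distributed processing system. All sizes must use the same unit.
    """
    # Condition 1: L' is smaller than T' and below the first threshold.
    if size_l < size_t and size_l < first_threshold:
        return JoinPlan.SEND_L_TO_DATABASE
    # Condition 2: n * |T'| is smaller than |L'| and T' is below the second threshold.
    if n_nodes * size_t < size_l and size_t < second_threshold:
        return JoinPlan.SEND_T_TO_CLUSTER
    # Assumption: anything else is left to a strategy not covered in the abstract.
    return JoinPlan.OTHER
```

For example, with 4 nodes and both thresholds set to 50, `choose_join_plan(100, 10, 4, 50, 50)` selects sending L′ to the database, while `choose_join_plan(10, 100, 4, 50, 50)` selects sending T′ to the distributed processing system, because 4 × 10 is less than 100.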