Design and GPGPU performance of Futhark's redomap construct
Publication: Contribution to book/anthology/report › Article in proceedings › Research › Peer-reviewed
Design and GPGPU performance of Futhark's redomap construct. / Henriksen, Troels; Larsen, Ken Friis; Oancea, Cosmin Eugen.
Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming. Association for Computing Machinery, 2016. pp. 17-24.
RIS
TY - GEN
T1 - Design and GPGPU performance of Futhark's redomap construct
AU - Henriksen, Troels
AU - Larsen, Ken Friis
AU - Oancea, Cosmin Eugen
N1 - Conference code: 3
PY - 2016
Y1 - 2016
N2 - This paper presents and evaluates a novel second-order operator, named 'redomap', that stems from 'map'-'reduce' compositions in the context of the purely-functional array language Futhark, which is aimed at efficient GPGPU execution. Main contributions are: First, we demonstrate an aggressive fusion technique that is centered on the 'redomap' operator. Second, we present a compilation technique for 'redomap' that efficiently sequentializes the excess parallelism and ensures coalesced access to global memory, even for non-commutative 'reduce' operators. Third, a detailed performance evaluation shows that Futhark's automatically generated code matches or exceeds performance of hand-tuned Thrust code. Our evaluation infrastructure is publicly available and we encourage replication and verification of our results.
U2 - 10.1145/2935323.2935326
DO - 10.1145/2935323.2935326
M3 - Article in proceedings
SP - 17
EP - 24
BT - Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming
PB - Association for Computing Machinery
T2 - 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming
Y2 - 14 June 2016 through 14 June 2016
ER -
ID: 164443159
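The abstract describes 'redomap' as arising from fused 'map'-'reduce' compositions, and mentions sequentializing excess parallelism while staying correct for non-commutative 'reduce' operators. As a rough illustration only (not the paper's compilation scheme or Futhark's generated code), the Python sketch below assumes that redomap means "reduce op ne (map f xs)" computed in a single pass, with `op` associative and `ne` its neutral element. The names `redomap` and `redomap_chunked` are hypothetical. The chunked variant models one possible way of sequentializing: each "thread" folds a contiguous chunk on its own, and the partial results are then combined in chunk order. It does not model GPU memory layout or coalescing.

```python
from functools import reduce
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def redomap(op: Callable[[B, B], B], f: Callable[[A], B], ne: B, xs: List[A]) -> B:
    """Assumed semantics: reduce(op, map(f, xs), ne), fused so that no
    intermediate mapped array is materialized."""
    acc = ne
    for x in xs:
        acc = op(acc, f(x))  # apply f and fold immediately
    return acc


def redomap_chunked(op: Callable[[B, B], B], f: Callable[[A], B], ne: B,
                    xs: List[A], num_threads: int) -> B:
    """Hypothetical sketch of sequentializing excess parallelism: each of
    num_threads simulated threads folds one contiguous chunk sequentially,
    then the per-chunk results are combined left to right. Keeping chunk
    order is what makes this correct for associative but non-commutative op."""
    n = len(xs)
    chunk = -(-n // num_threads)  # ceiling division; assumes num_threads >= 1
    partials = []
    for t in range(num_threads):
        part = xs[t * chunk:(t + 1) * chunk]  # may be empty for trailing threads
        partials.append(redomap(op, f, ne, part))
    return reduce(op, partials, ne)  # ordered combination of partial results


if __name__ == "__main__":
    concat = lambda a, b: a + b  # associative, not commutative
    print(redomap(concat, lambda x: x * x, 0, [1, 2, 3]))
    print(redomap_chunked(concat, str, "", list(range(10)), 3))
```

String concatenation is used here because it is associative but not commutative: the chunked version agrees with the sequential one only because the partial results are combined in the same order as the chunks they came from.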