AD for an Array Language with Nested Parallelism
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
AD for an Array Language with Nested Parallelism. / Schenck, Robert; Rønning, Ola; Henriksen, Troels; Oancea, Cosmin E.
Proceedings of SC 2022: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE Computer Society Press, 2022. (International Conference for High Performance Computing, Networking, Storage and Analysis, SC, Vol. 2022-November).
RIS
TY - GEN
T1 - AD for an Array Language with Nested Parallelism
AU - Schenck, Robert
AU - Rønning, Ola
AU - Henriksen, Troels
AU - Oancea, Cosmin E.
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - We present a technique for applying reverse mode automatic differentiation (AD) on a non-recursive second-order functional array language that supports nested parallelism and is primarily aimed at efficient GPU execution. The key idea is to eliminate the need for a tape by relying on redundant execution to bring into each new scope all program variables that may be needed by the differentiated code. Efficient execution is enabled by the observation that perfectly nested scopes do not introduce re-execution and that such perfect nests can be readily produced by application of known compiler transformations. Our technique differentiates loops and bulk-parallel operators (e.g., map, reduce(-by-index), scan, and scatter) by specific rewrite rules and aggressively optimizes the resulting nested-parallel code. We report an evaluation that compares with established AD solutions and demonstrates competitive performance on ten common benchmarks from recent applied AD literature.
AB - We present a technique for applying reverse mode automatic differentiation (AD) on a non-recursive second-order functional array language that supports nested parallelism and is primarily aimed at efficient GPU execution. The key idea is to eliminate the need for a tape by relying on redundant execution to bring into each new scope all program variables that may be needed by the differentiated code. Efficient execution is enabled by the observation that perfectly nested scopes do not introduce re-execution and that such perfect nests can be readily produced by application of known compiler transformations. Our technique differentiates loops and bulk-parallel operators (e.g., map, reduce(-by-index), scan, and scatter) by specific rewrite rules and aggressively optimizes the resulting nested-parallel code. We report an evaluation that compares with established AD solutions and demonstrates competitive performance on ten common benchmarks from recent applied AD literature.
KW - automatic differentiation
KW - compilers
KW - functional data parallel language
KW - GPGPU
U2 - 10.1109/SC41404.2022.00063
DO - 10.1109/SC41404.2022.00063
M3 - Article in proceedings
AN - SCOPUS:85149327504
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2022
PB - IEEE Computer Society Press
T2 - 2022 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2022
Y2 - 13 November 2022 through 18 November 2022
ER -