Activation Compression of Graph Neural Networks Using Block-Wise Quantization with Improved Variance Minimization
Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed
Standard
Activation Compression of Graph Neural Networks Using Block-Wise Quantization with Improved Variance Minimization. / Eliassen, Sebastian; Selvan, Raghavendra.
2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings. IEEE, 2024. pp. 7430-7434.
Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed
RIS
TY - GEN
T1 - Activation Compression of Graph Neural Networks Using Block-Wise Quantization with Improved Variance Minimization
AU - Eliassen, Sebastian
AU - Selvan, Raghavendra
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Efficient training of large-scale graph neural networks (GNNs) has been studied with a specific focus on reducing their memory consumption. Work by Liu et al. (2022) proposed extreme activation compression (EXACT), which demonstrated a drastic reduction in memory consumption by quantizing the intermediate activation maps down to INT2 precision. They showed little to no reduction in performance while achieving large reductions in GPU memory consumption. In this work, we present an improvement to the EXACT strategy by using block-wise quantization of the intermediate activations. We experimentally analyze different block sizes and show a further reduction in memory consumption (> 15%) and a runtime speedup per epoch (≈ 5%), even at extreme levels of quantization, with performance trade-offs similar to those of the original EXACT. Further, we present a correction to the assumptions on the distribution of intermediate activation maps in EXACT (assumed to be uniform) and show improved variance estimates for the quantization and dequantization steps.
AB - Efficient training of large-scale graph neural networks (GNNs) has been studied with a specific focus on reducing their memory consumption. Work by Liu et al. (2022) proposed extreme activation compression (EXACT), which demonstrated a drastic reduction in memory consumption by quantizing the intermediate activation maps down to INT2 precision. They showed little to no reduction in performance while achieving large reductions in GPU memory consumption. In this work, we present an improvement to the EXACT strategy by using block-wise quantization of the intermediate activations. We experimentally analyze different block sizes and show a further reduction in memory consumption (> 15%) and a runtime speedup per epoch (≈ 5%), even at extreme levels of quantization, with performance trade-offs similar to those of the original EXACT. Further, we present a correction to the assumptions on the distribution of intermediate activation maps in EXACT (assumed to be uniform) and show improved variance estimates for the quantization and dequantization steps.
KW - activation compression
KW - deep learning
KW - efficient machine learning
KW - graph neural networks
KW - quantization
U2 - 10.1109/ICASSP48485.2024.10446393
DO - 10.1109/ICASSP48485.2024.10446393
M3 - Article in proceedings
AN - SCOPUS:85188900287
SP - 7430
EP - 7434
BT - 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
PB - IEEE
T2 - 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Y2 - 14 April 2024 through 19 April 2024
ER -
ID: 395155271
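
The abstract describes compressing intermediate activation maps with block-wise quantization down to INT2 precision, storing only the quantized blocks plus per-block statistics and dequantizing them for the backward pass. The Python sketch below is a rough illustration of that general idea, not the authors' published implementation: the block size of 64, the stochastic-rounding scheme, and the helper names are assumptions made for illustration only.

import numpy as np

def blockwise_quantize(x, block_size=64, num_bits=2, rng=None):
    """Quantize a flattened activation tensor in fixed-size blocks.

    Each block gets its own zero-point (min) and scale (range / levels),
    so a single outlier only distorts one block rather than a whole row.
    Stochastic rounding keeps the quantizer unbiased in expectation.
    """
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** num_bits - 1                        # e.g. 3 levels above zero for INT2
    flat = x.reshape(-1).astype(np.float32)
    pad = (-flat.size) % block_size                   # pad so blocks divide evenly
    flat = np.concatenate([flat, np.zeros(pad, dtype=np.float32)])
    blocks = flat.reshape(-1, block_size)

    zero = blocks.min(axis=1, keepdims=True)          # per-block zero-point
    scale = (blocks.max(axis=1, keepdims=True) - zero) / levels
    scale = np.where(scale == 0, 1.0, scale)          # avoid divide-by-zero in flat blocks

    normalized = (blocks - zero) / scale
    q = np.floor(normalized + rng.random(blocks.shape))   # stochastic rounding
    q = np.clip(q, 0, levels).astype(np.uint8)
    return q, zero, scale, x.shape, pad

def blockwise_dequantize(q, zero, scale, shape, pad):
    """Reconstruct an approximate activation tensor for the backward pass."""
    blocks = q.astype(np.float32) * scale + zero
    flat = blocks.reshape(-1)
    flat = flat[:flat.size - pad] if pad else flat
    return flat.reshape(shape)

# Example: compress a batch of hidden activations and measure the reconstruction error.
h = np.random.randn(128, 256).astype(np.float32)
q, zero, scale, shape, pad = blockwise_quantize(h, block_size=64, num_bits=2)
h_hat = blockwise_dequantize(q, zero, scale, shape, pad)
print("mean squared error:", float(np.mean((h - h_hat) ** 2)))

Smaller blocks tighten each block's value range (lowering quantization variance) at the cost of storing more per-block zero-points and scales; the paper's reported memory and runtime gains come from tuning that trade-off.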