Information Bottleneck: Exact Analysis of (Quantized) Neural Networks

Publication: Working paper › Preprint › Research

Standard

Information Bottleneck: Exact Analysis of (Quantized) Neural Networks. / Lorenzen, Stephan Sloth; Igel, Christian; Nielsen, Mads.

arXiv.org, 2022.


Harvard

Lorenzen, SS, Igel, C & Nielsen, M 2022 'Information Bottleneck: Exact Analysis of (Quantized) Neural Networks' arXiv.org.

APA

Lorenzen, S. S., Igel, C., & Nielsen, M. (2022). Information Bottleneck: Exact Analysis of (Quantized) Neural Networks. arXiv.org.

Vancouver

Lorenzen SS, Igel C, Nielsen M. Information Bottleneck: Exact Analysis of (Quantized) Neural Networks. arXiv.org. 2022.

Author

Lorenzen, Stephan Sloth; Igel, Christian; Nielsen, Mads. / Information Bottleneck: Exact Analysis of (Quantized) Neural Networks. arXiv.org, 2022.

Bibtex

@techreport{02d9011ef0224309b8bf26d58aa38f74,
title = "Information Bottleneck: Exact Analysis of (Quantized) Neural Networks",
abstract = "The information bottleneck (IB) principle has been suggested as a way to analyze deep neural networks. The learning dynamics are studied by inspecting the mutual information (MI) between the hidden layers and the input and output. Notably, separate fitting and compression phases during training have been reported. This led to some controversy, including claims that the observations are not reproducible and strongly dependent on the type of activation function used as well as on the way the MI is estimated. Our study confirms that different ways of binning when computing the MI lead to qualitatively different results, either supporting or refuting IB conjectures. To resolve the controversy, we study the IB principle in settings where MI is non-trivial and can be computed exactly. We monitor the dynamics of quantized neural networks, that is, we discretize the whole deep learning system so that no approximation is required when computing the MI. This allows us to quantify the information flow without measurement errors. In this setting, we observed a fitting phase for all layers and a compression phase for the output layer in all experiments; the compression in the hidden layers was dependent on the type of activation function. Our study shows that the initial IB results were not artifacts of binning when computing the MI. However, the critical claim that the compression phase may not be observed for some networks also holds true.",
author = "Lorenzen, {Stephan Sloth} and Christian Igel and Mads Nielsen",
year = "2022",
language = "English",
publisher = "arXiv.org",
type = "WorkingPaper",
institution = "arXiv.org",

}

RIS

TY - UNPB

T1 - Information Bottleneck

T2 - Exact Analysis of (Quantized) Neural Networks

AU - Lorenzen, Stephan Sloth

AU - Igel, Christian

AU - Nielsen, Mads

PY - 2022

Y1 - 2022

N2 - The information bottleneck (IB) principle has been suggested as a way to analyze deep neural networks. The learning dynamics are studied by inspecting the mutual information (MI) between the hidden layers and the input and output. Notably, separate fitting and compression phases during training have been reported. This led to some controversy, including claims that the observations are not reproducible and strongly dependent on the type of activation function used as well as on the way the MI is estimated. Our study confirms that different ways of binning when computing the MI lead to qualitatively different results, either supporting or refuting IB conjectures. To resolve the controversy, we study the IB principle in settings where MI is non-trivial and can be computed exactly. We monitor the dynamics of quantized neural networks, that is, we discretize the whole deep learning system so that no approximation is required when computing the MI. This allows us to quantify the information flow without measurement errors. In this setting, we observed a fitting phase for all layers and a compression phase for the output layer in all experiments; the compression in the hidden layers was dependent on the type of activation function. Our study shows that the initial IB results were not artifacts of binning when computing the MI. However, the critical claim that the compression phase may not be observed for some networks also holds true.

AB - The information bottleneck (IB) principle has been suggested as a way to analyze deep neural networks. The learning dynamics are studied by inspecting the mutual information (MI) between the hidden layers and the input and output. Notably, separate fitting and compression phases during training have been reported. This led to some controversy, including claims that the observations are not reproducible and strongly dependent on the type of activation function used as well as on the way the MI is estimated. Our study confirms that different ways of binning when computing the MI lead to qualitatively different results, either supporting or refuting IB conjectures. To resolve the controversy, we study the IB principle in settings where MI is non-trivial and can be computed exactly. We monitor the dynamics of quantized neural networks, that is, we discretize the whole deep learning system so that no approximation is required when computing the MI. This allows us to quantify the information flow without measurement errors. In this setting, we observed a fitting phase for all layers and a compression phase for the output layer in all experiments; the compression in the hidden layers was dependent on the type of activation function. Our study shows that the initial IB results were not artifacts of binning when computing the MI. However, the critical claim that the compression phase may not be observed for some networks also holds true.

M3 - Preprint

BT - Information Bottleneck

PB - arXiv.org

ER -
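The abstract's central point is that once the whole network is discretized, the MI between layers and the input/output can be computed exactly from counts, with no estimator or binning choice involved. A minimal sketch of exact MI for two discrete variables (an illustration of the general idea, not code from the paper; the function name and signature are hypothetical) might look like:

```python
from collections import Counter
from math import log2

def exact_mutual_information(xs, ys):
    """Exact MI (in bits) between two discrete variables,
    computed from their empirical joint distribution.

    Because both variables take finitely many values, the joint
    distribution is just a table of counts: no approximation needed.
    """
    n = len(xs)
    pxy = Counter(zip(xs, ys))  # joint counts over (x, y) pairs
    px = Counter(xs)            # marginal counts of x
    py = Counter(ys)            # marginal counts of y
    # I(X; Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum(
        (c / n) * log2((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )
```

For example, a variable perfectly predicting a balanced binary partner carries 1 bit of MI, while independent variables carry 0 bits; for a quantized network, `xs` would be the (discrete) activations of a layer and `ys` the inputs or labels.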
