Bits-Back Coding as Error Correction with Side Information
Gergely Flamich
09/10/2024
gergely-flamich.github.io
In Collaboration With
Bits-back Coding
\begin{align*}
&\text{error correction} \\
&\quad\quad\quad\quad+\text{ invertible sampling } \\
&\quad\quad\quad\quad\quad\quad\quad\quad= \text{bits-back coding}
\end{align*}
Error correction
Image from [1].
I will pick \(P_{Y \mid X}\) instead of assuming it is fixed.
Repetition code
\begin{align*}
0 &\mapsto 000 \\
1 &\mapsto 111
\end{align*}
can perfectly decode if at most one bit is flipped, i.e.
\begin{align*}
y = x + \epsilon, \quad \epsilon \in B_1(0),
\end{align*}
where \(B_1(0)\) is the Hamming ball of radius 1.
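A minimal sketch of this code (illustrative, not from the slides): each bit is tripled, and a majority vote corrects any single flip per block.

```python
# 3x repetition code: encode by copying each bit three times,
# decode by majority vote, which corrects one flipped bit per block.

def rep_encode(bits):
    return [b for b in bits for _ in range(3)]

def rep_decode(coded):
    out = []
    for i in range(0, len(coded), 3):
        block = coded[i:i + 3]
        out.append(int(sum(block) >= 2))  # majority vote
    return out
```

For example, corrupting one bit of a block, `rep_decode([1, 0, 1, 0, 0, 0])` still recovers `[1, 0]`.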
a puzzle
- source \(P\) over \(\Omega\)
- entropy code \(\mathtt{enc}_P: \Omega \to \{0, 1\}^*\)
If \(X \sim P\), then \(\mathtt{enc}_P(X) \sim ?\)
the "solution"
- \(\mathbb{E}[\lvert\mathtt{enc}_P(X)\rvert] \approx \mathbb{H}[X]\)
- \(\mathbb{H}[\mathtt{enc}_P(X)] = \mathbb{H}[X]\)
\(\therefore \mathtt{enc}_P(X) \stackrel{\text{approx.}}{\sim} \mathrm{Bern}(1/2)^{\otimes \mathbb{H}[X]}\)
invertible sampling
- extend \(\mathtt{enc}_P: \{0, 1\}^* \times \Omega \to \{0, 1\}^*\)
- \(\mathtt{enc}_P^{-1} = \mathtt{dec}_P\)
- let \(m = (b_1, b_2, \dots), \quad b_i \sim \mathrm{Bern}(1/2)\)
If \(m', X' \gets \mathtt{dec}_P(m)\), what is \(X' \sim ?\)
\(X' \sim P\)!
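This can be checked empirically (a sketch, not the talk's code): for the dyadic distribution \(P(a) = 1/2,\ P(b) = 1/4,\ P(c) = 1/4\) with prefix code \(a \to 0,\ b \to 10,\ c \to 11\), decoding a stack of fair coin flips yields symbols distributed according to \(P\).

```python
import random

# Treat the message as a stack of fair bits and repeatedly run the
# entropy *decoder* for P. The decoded symbols follow P.

def dec_P(bits):
    """Pop bits off the stack; return (remaining bits, decoded symbol)."""
    if bits.pop() == 0:
        return bits, "a"
    return (bits, "b") if bits.pop() == 0 else (bits, "c")

random.seed(0)
m = [random.randint(0, 1) for _ in range(300000)]
counts = {"a": 0, "b": 0, "c": 0}
while len(m) > 2:  # a decode needs at most 2 bits
    m, s = dec_P(m)
    counts[s] += 1
total = sum(counts.values())
print({s: round(c / total, 3) for s, c in counts.items()})
```

The empirical frequencies come out close to \((0.5, 0.25, 0.25)\), i.e. the decoder doubles as an exact sampler for \(P\).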
bits-back coding
Error correction: For source \(P_X\) design \(P_{Y \mid X}\)
Encoding, given \(m\) and \(X \sim P_X\):
- Decode: \(m', y \gets \mathtt{dec}_{\color{red} P_{Y \mid X}}(m)\)
- Encode: \(m'' \gets \mathtt{enc}_{\color{blue} P_{Y}}(m', y)\)
Decoding, given \(m''\):
- Decode: \(m', y \gets \mathtt{dec}_{\color{blue} P_{Y}}(m'')\)
- Error correction: recover \(x\) from \(y\)
- Encode: \(m \gets \mathtt{enc}_{\color{red} P_{Y \mid X}}(m', y)\)
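The encode/decode steps above can be run end-to-end on a toy example (an illustrative sketch with assumed helper names, not the talk's implementation): \(X \in \{0, 1\}\) with \(P_X\) uniform, and \(y\) a pair whose first entry equals \(x\), so "error correction" just reads that entry.

```python
import random

# Messages are stacks of fair bits: each dec_* pops bits, each enc_*
# pushes them back, so the pairs are exact inverses of each other.

def dec_cond(m, x):
    """Decode y ~ P_{Y|X=x}: pop one bit b, giving y = (x, b)."""
    return m, (x, m.pop())

def enc_cond(m, y):
    """Inverse of dec_cond: push y's second entry back onto m."""
    m.append(y[1])
    return m

def dec_marg(m):
    """Decode y ~ P_Y (uniform over four pairs): pop two bits."""
    return m, (m.pop(), m.pop())

def enc_marg(m, y):
    """Inverse of dec_marg: push y's bits in reverse pop order."""
    m.append(y[1])
    m.append(y[0])
    return m

random.seed(1)
x = 1
m0 = [random.randint(0, 1) for _ in range(16)]  # initial random message

# Encoding, given m0 and x:
m, y = dec_cond(list(m0), x)   # decode y under P_{Y|X=x}
m_sent = enc_marg(m, y)        # encode y under P_Y

# Decoding, given m_sent:
m_r, y_rec = dec_marg(list(m_sent))
x_rec = y_rec[0]               # error correction: read x off y
m_rec = enc_cond(m_r, y_rec)   # restores the original message m0
```

The round trip recovers both \(x\) and the original message, and the message grows by exactly \(1 = -\log_2 P_X(x)\) bit.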
bits-back coding efficiency
\begin{align*}
\vert m'' \vert - \vert m \vert
&\approx -\log P_{Y}(y) - \big(-\log P_{Y \mid X}(y \mid x)\big) \\
&= -\log \frac{P_X(x)}{P_{X \mid Y}(x \mid y)} \\
&= - \log P_X(x)
\end{align*}
The second equality is Bayes' rule; the last holds because \(x\) is perfectly recoverable from \(y\), so \(P_{X \mid Y}(x \mid y) = 1\).
What have we gained?
Warm-up: ANS
ANS - Asymmetric Numeral Systems [2, 3]
- \(x \in \{0, 1, 2\}\)
- \(P_X(x) = \frac{|\mathfrak{z}(x)|}{n}\), where the sets \(\mathfrak{z}(x)\) partition \(\{0, 1, \dots, n-1\}\)
Set \((Y \mid X=x) \sim \mathrm{Unif}(\mathfrak{z}(x))\)
Case study 1: unordered data
Have:
Want to encode:
Shuffle coding [4]
Idea: canonical representative \(\leftrightarrow\) equivalence class ("sorting")
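The canonicalisation idea can be sketched for sequences of distinct items (this is illustrative, not the shuffle-coding implementation of [4]): split an ordered sequence into its sorted representative plus a permutation index; bits-back then recovers the \(\approx \log_2 n!\) bits spent on the index, so only the unordered content is paid for.

```python
# Canonical representative = sorted tuple; the ordering is captured by
# the permutation's rank in the factorial number system (Lehmer code).

def to_canonical(seq):
    """Return (sorted representative, rank of seq among its orderings)."""
    canon = sorted(seq)
    items = list(canon)
    rank = 0
    for v in seq:
        i = items.index(v)
        rank = rank * len(items) + i  # next mixed-radix digit
        items.pop(i)
    return tuple(canon), rank

def from_canonical(canon, rank):
    """Invert to_canonical: rebuild the original ordering from the rank."""
    digits = []
    for k in range(1, len(canon) + 1):
        digits.append(rank % k)
        rank //= k
    items = list(canon)
    return [items.pop(d) for d in reversed(digits)]
```

For example, `to_canonical([3, 1, 2])` gives `((1, 2, 3), 4)`, and `from_canonical` reconstructs `[3, 1, 2]` exactly.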
Case study 2: rotational symmetries in LLMs [5]
\(f(x \mid W) = f(x \mid QW)\) for any rotation matrix \(Q\)
SVD: error correction for matrices
\begin{align*}
W = UDV^\top
\end{align*}
Unique up to a sign change:
\begin{gather*}
UDV^\top = (U\sigma)D(V\sigma)^\top \\
\text{where } \sigma = \mathrm{diag}(\pm 1, \pm 1, \dots, \pm 1)
\end{gather*}
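A quick numerical check of this ambiguity (a sketch; requires NumPy): flipping the signs of matching columns of \(U\) and \(V\) leaves the reconstruction unchanged, since \(\sigma D \sigma = D\) for any diagonal \(\sigma\) with \(\pm 1\) entries.

```python
import numpy as np

# SVD sign ambiguity: (U sigma) D (V sigma)^T == U D V^T.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
U, d, Vt = np.linalg.svd(W)          # W = U @ diag(d) @ Vt
sigma = np.diag([1.0, -1.0, -1.0, 1.0])  # one of 2^4 sign choices

W_flipped = (U @ sigma) @ np.diag(d) @ (Vt.T @ sigma).T
print(np.allclose(W, W_flipped))     # prints True
```

So the \(2^4\) sign patterns all yield valid SVDs of the same \(W\), and those free sign bits are exactly what bits-back exploits.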
bits-back from LLMs
results
some notes and caveats
- \(\sigma\) is the side information
- \(P_{Y \mid X}\) is supported on a \(P_Y\) null set
- Numerical errors in SVD
- Use block coding!
Case study 3: bits-back + relative entropy coding
Relative entropy coding:
- \(x, y \sim P_{X, Y}\)
- shared randomness \(z\)
- want: encode \(y \sim P_{Y \mid X}\)
Idea: What happens if we set \(z\) as the side information in bits-back coding?
dithered quantization
- \(c \in \mathbb{R}\)
- \(U, U' \sim \mathrm{Unif}(-1/2, 1/2)\)
\begin{align*}
\lfloor c + U \rceil - U \sim c + U'
\end{align*}
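A small simulation of this identity (a sketch): draw many dithers \(U\) and check that \(\lfloor c + U \rceil - U\) always lands in \([c - 1/2, c + 1/2]\) with mean \(c\), as \(c + U'\) would.

```python
import math
import random

# Dithered quantization: round(c + U) - U behaves like c + U',
# with U, U' ~ Unif(-1/2, 1/2).

def dq(c, u):
    return math.floor(c + u + 0.5) - u  # round to nearest, subtract dither

random.seed(0)
c = 2.7
samples = [dq(c, random.uniform(-0.5, 0.5)) for _ in range(100000)]
print(min(samples) >= c - 0.5 and max(samples) <= c + 0.5)  # prints True
```

The quantization error is independent of \(c\) and exactly uniform, which is what makes the scheme usable for relative entropy coding.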
DQ for REC
For \(c \in [0, b)\) with \(b \in \mathbb{N}\):
- encode: set \(K = \lfloor c + U \rceil\), encode \(K\)
- decode: set \(y = K - U\)
equivalent to:
- \(P_Y = \mathrm{Unif}(0, 1)\)
- \(P_{Y \mid X} = \mathrm{Unif}(x - 1/(2b), x + 1/(2b))\)
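The encode/decode steps can be sketched as a two-party protocol (helper names are hypothetical; the shared randomness is realised as a common seed, and only the integer \(K\) is transmitted):

```python
import math
import random

def shared_u(seed):
    """Both parties derive the same dither U from the shared seed."""
    return random.Random(seed).uniform(-0.5, 0.5)

def rec_encode(c, seed):
    u = shared_u(seed)
    return math.floor(c + u + 0.5)   # K = round(c + U)

def rec_decode(k, seed):
    return k - shared_u(seed)        # y = K - U, distributed as c + U'

b = 8
c = 5.3                              # c in [0, b)
k = rec_encode(c, seed=42)
y = rec_decode(k, seed=42)           # K itself needs only ~log2(b) bits
```

The decoder's sample \(y\) always lands within \(1/2\) of \(c\), while the transmitted \(K\) ranges over only about \(b\) values.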
Could we extend to uniform target with width \(a/b\)?
bits-back quantization
References I
- [1] Cover, T. M., & Thomas, J. A. (1991). Elements of information theory.
- [2] Duda, J. (2009). Asymmetric numeral systems. arXiv preprint arXiv:0902.0271.
- [3] Bamler, R. (2022). Understanding entropy coding with asymmetric numeral systems (ANS): a statistician's perspective. arXiv preprint arXiv:2201.01741.
References II
- [4] Kunze, J., Severo, D., Zani, G., van de Meent, J. W., & Townsend, J. (2024). Entropy coding of unordered data structures. arXiv preprint arXiv:2408.08837.
- [5] He, J., Flamich, G., & Hernández-Lobato, J. M. (2024). Getting Free Bits Back from Rotational Symmetries in LLMs. arXiv preprint arXiv:2410.01309.
- [6] Flamich, G., & Theis, L. (2023, June). Adaptive greedy rejection sampling. In 2023 IEEE International Symposium on Information Theory (ISIT) (pp. 454-459). IEEE.