Data Compression with Relative Entropy Coding


Gergely Flamich

04/02/2025

gergely-flamich.github.io

what is data compression?

lossless compression

  • Source: \(x \sim P\)
  • Code: \(C_P(x)\)
  • Decode: \[ C_P^{-1}(C_P(x)) = x \]
  • Measure of efficiency: \[ \mathbb{E}[\vert C_P(x) \vert] \] (see the sketch below)
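
A quick numeric illustration (a hypothetical 4-symbol source, not from the talk): the expected codelength of a prefix-free code, which for this dyadic source matches the entropy exactly.

```python
import numpy as np

# Hypothetical toy source with a matching prefix-free code C_P.
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
C = {"a": "0", "b": "10", "c": "110", "d": "111"}

expected_length = sum(p * len(C[s]) for s, p in P.items())   # E[|C_P(x)|]
entropy = -sum(p * np.log2(p) for p in P.values())           # lower bound
print(expected_length, entropy)                              # 1.75 1.75
```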

lossy compression

  • Encode: \(C_P(x)\)
  • Decode: \[D_P(C_P(x)) = \hat{x} \approx x\]
  • Measures of efficiency:
    • Rate: \(\mathbb{E}[\vert C_P(x) \vert]\)
    • Distortion: \(\mathbb{E}[\Delta(x, \hat{x})]\)

transform coding

transform_coding.png

Usual approach: quantise in transform space, i.e., code \(Q(f(x))\) (sketch below).
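
A minimal sketch of the \(Q(f(x))\) pipeline, using a random orthogonal matrix as a hypothetical stand-in for the transform \(f\) (real codecs use e.g. the DCT or a learned network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear analysis transform f, uniform scalar quantiser Q, and the
# synthesis transform g = f^{-1} (A is orthogonal, so its inverse is A.T).
A = np.linalg.qr(rng.standard_normal((8, 8)))[0]

def encode(x, step=0.1):
    return np.round(A @ x / step).astype(int)   # integer code Q(f(x))

def decode(k, step=0.1):
    return A.T @ (k * step)                     # reconstruction g(...)

x = rng.standard_normal(8)
x_hat = decode(encode(x))
print(np.max(np.abs(x - x_hat)))                # distortion shrinks with step
```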

what is relative entropy coding?

as a lossy compression mechanism

💡 Idea: perturb instead of quantise

  • Source: \(x \sim P\)
  • Shared randomness: \(z \sim P_z\)
  • Code: \(C_P(x, z)\)
  • Decode: \[ D_P(C_P(x, z), z) \sim P_{y \mid x}\] (see the sketch below)
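
One classic instance of the perturbation idea (a standard textbook example, not specific to this thesis) is dithered quantisation: with a shared dither \(z\), the reconstruction is exactly \(x\) plus uniform noise, i.e. a genuine sample from a fixed \(P_{y \mid x}\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared randomness: encoder and decoder draw the same dither z.
z = rng.uniform(-0.5, 0.5)

def encode(x, z):
    return int(np.round(x - z))   # integer code C_P(x, z)

def decode(k, z):
    return k + z                  # y = D_P(C_P(x, z), z)

x = 1.234
y = decode(encode(x, z), z)
# The reconstruction error y - x is Unif(-1/2, 1/2), independent of x,
# so y is a sample from P_{y|x} = Unif(x - 1/2, x + 1/2).
print(y)
```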

"implementing" relative entropy coding

rec_sketch.png

why care?

learned transform coding

transform_coding.png

Can use the reparameterisation trick!

realistic lossy compression

Right-hand image from Careil et al. [1]

differentially private federated learning

main thesis contributions

  1. Established tight lower bounds for REC performance
  2. Developed optimal REC algorithms
  3. Applied REC to practical data compression problems

Fundamental Limits

Communication Complexity

Use \(\mathbb{H}[y \mid z]\) as a proxy for the rate. Li and El Gamal [2]:

\[ I[x; y] \leq \mathbb{H}[y \mid z] \leq I[x; y] + \log(I[x; y] + 1) + \mathcal{O}(1) \]

❌ not tight!
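
For scale, the logarithmic overhead in the upper bound stays small (ignoring the \(\mathcal{O}(1)\) term; all values in bits):

```python
import numpy as np

# Overhead of the Li-El Gamal upper bound over the mutual information I[x; y],
# ignoring the additive O(1) constant; all quantities in bits.
for I in [1, 5, 20, 100]:
    print(f"I = {I:3d} -> I + log(I + 1) = {I + np.log2(I + 1):.2f}")
```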

The width function

Set \(Q \gets P_{y \mid x}\), \(P \gets P_y\).

width_fn.png

\(w_P\) is a probability density!
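
A one-line check, assuming the width function is \(w_P(h) = \mathbb{P}_{Z \sim P}\left[ \frac{dQ}{dP}(Z) \geq h \right]\):

\[ \int_0^\infty w_P(h) \, dh = \mathbb{E}_{Z \sim P}\left[ \int_0^\infty \mathbf{1}\left\{ \frac{dQ}{dP}(Z) \geq h \right\} dh \right] = \mathbb{E}_{Z \sim P}\left[ \frac{dQ}{dP}(Z) \right] = 1. \]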

representing divergences

KL divergence:

\begin{align*} D_{KL}[Q || P] &= -h_{Z \sim P}\left[\frac{dQ}{dP}(Z)\right] \\ &= \log e + \mathbb{E}_{H \sim w_P}[\log H] \end{align*}

Channel simulation divergence:

\[ D_{CS}[Q || P] = h[H], \quad H \sim w_P \]

\[ D_{KL}[Q || P] \leq D_{CS}[Q || P] \]
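
A numerical sanity check of both identities, assuming the width-function definition above and using the toy pair \(P = \mathcal{N}(0, 1)\), \(Q = \mathcal{N}(1, 1/16)\) that appears later in the talk:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

# For these Gaussians, log dQ/dP(z) = -7.5 z^2 + 16 z - 8 + log 4 is a concave
# quadratic, so {z : dQ/dP(z) >= h} is an interval and w_P(h) is a Gaussian
# CDF difference.
def w_P(h):
    disc = np.maximum(256.0 - 30.0 * (8.0 - np.log(4.0) + np.log(h)), 0.0)
    lo = (16.0 - np.sqrt(disc)) / 15.0
    hi = (16.0 + np.sqrt(disc)) / 15.0
    return norm.cdf(hi) - norm.cdf(lo)

M = np.exp(-7.5 * (16 / 15) ** 2 + 16 * (16 / 15) - 8.0 + np.log(4.0))
hs = np.linspace(1e-9, M, 50_000)
w = w_P(hs)
print(trapezoid(w, hs))                       # ~1: w_P is a probability density

D_CS = -trapezoid(w * np.log2(np.clip(w, 1e-300, None)), hs)     # h[H] in bits
D_KL = (np.log(4.0) + (1 / 16 + 1.0) / 2.0 - 0.5) / np.log(2.0)  # closed form
print(f"D_KL = {D_KL:.3f} bits <= D_CS = {D_CS:.3f} bits")
```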

tight lower bound

one_shot_bound.png

behaviour of the lower bound

csd_behaviour.png

  • A: \(P = \mathcal{L}(0, 1)\), \(Q = \mathcal{L}(0, b)\)
  • B: \(P = \mathcal{N}(0, 1)^{\otimes d}\), \(Q = \mathcal{N}(1, 1/4)^{\otimes d}\)

asymptotic lower bound

asymptotic_result.png

Relative entropy coding using Poisson processes

Poisson Processes

  • Collection of random points in space
  • Focus on spatio-temporal processes on \(\mathbb{R}^D \times \mathbb{R}^+\)
  • Exponential inter-arrival times
  • Spatial distribution \(P_{X \mid T}\)
  • Idea: use the process as common randomness in REC (see the sketch below)
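
A minimal simulation of such a process (assuming unit rate, as in the figures that follow):

```python
import numpy as np

rng = np.random.default_rng(0)

# First few points of a spatio-temporal Poisson process on R x R+:
# unit-rate exponential inter-arrival times T_1 < T_2 < ..., each paired
# with an independent spatial sample X_i ~ P_{X|T} = N(0, 1).
n = 10
times = np.cumsum(rng.exponential(1.0, size=n))   # arrival times
locations = rng.standard_normal(n)                # spatial coordinates

for t, x in zip(times, locations):
    print(f"T = {t:5.2f}, X = {x:+.2f}")
```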

Example with \(P_{X \mid T} = \mathcal{N}(0, 1)\)

empty_pp.png

Example with \(P_{X \mid T} = \mathcal{N}(0, 1)\)

pp_t1.png

Example with \(P_{X \mid T} = \mathcal{N}(0, 1)\)

pp_x1.png

Example with \(P_{X \mid T} = \mathcal{N}(0, 1)\)

pp_t1_x1.png

Example with \(P_{X \mid T} = \mathcal{N}(0, 1)\)

pp_t2.png

Example with \(P_{X \mid T} = \mathcal{N}(0, 1)\)

pp_x2.png

Example with \(P_{X \mid T} = \mathcal{N}(0, 1)\)

pp_t2_x2.png

Example with \(P_{X \mid T} = \mathcal{N}(0, 1)\)

pp_sim.png

Rejection Sampling

  • Sampling algorithm for a target distribution \(Q\)
  • Uses a proposal distribution \(P\)
  • Requires an upper bound \(M\) on the density ratio \(q/p\) (see the sketch after this list)
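
A minimal sketch, using the toy pair from the next slides; the bound \(M \approx 6.82\) follows from maximising \(\log(q/p)\), which occurs at \(x^* = 16/15\):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# The slides' toy example: proposal P = N(0,1), target Q = N(1, 1/16).
p, q = norm(0.0, 1.0), norm(1.0, 0.25)

# Bound M on the density ratio q/p, attained at x* = 16/15
# (set d/dx log(q(x)/p(x)) = 0).
x_star = 16.0 / 15.0
M = q.pdf(x_star) / p.pdf(x_star)

def rejection_sample():
    while True:
        x = rng.standard_normal()          # propose X ~ P
        u = rng.uniform()                  # accept with probability q(x)/(M p(x))
        if u * M * p.pdf(x) <= q.pdf(x):
            return x

samples = np.array([rejection_sample() for _ in range(2000)])
print(samples.mean(), samples.std())       # should be close to 1.0 and 0.25
```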

Rejection Sampling

RS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

rs_0.png

RS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

rs_1.png

RS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

rs_2.png

RS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

rs_3.png

RS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

rs_4.png

RS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

rs_5.png

RS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

rs_6.png

RS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

rs_7.png

RS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

rs_8.png

RS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

rs_9.png

RS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

rs_10.png

RS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

rs_accept.png

Greedy Poisson Rejection Sampling

Motivation

gprs_motivation_illustration.png

Fact: \((x, y) \sim \mathrm{Unif}(A) \, \Rightarrow\, x \sim P\), where \(A\) is the area under the graph of the density of \(P\).

Can we do the same with Poisson processes?

Yes!

\begin{align*} \varphi &= \sigma \circ \frac{dQ}{dP} \\ \sigma(h) &= \int_0^h \frac{1}{\mathbb{P}[H \geq \eta]} \, d\eta \end{align*}
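
A numerical sketch of GPRS for the toy pair on the next slides, under the definitions above (with \(H \sim w_P\), so \(\mathbb{P}[H \geq \eta] = \mathbb{E}_{Z \sim P}[(\tfrac{dQ}{dP}(Z) - \eta)^+]\)); the grid for \(\sigma\) is truncated just below the maximum density ratio, where \(\sigma\) diverges, so the sampler is mildly approximate there:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.stats import norm

rng = np.random.default_rng(0)
p, q = norm(0.0, 1.0), norm(1.0, 0.25)       # P = N(0,1), Q = N(1, 1/16)

def log_r(z):                                # log dQ/dP(z), a concave quadratic
    return -7.5 * z**2 + 16.0 * z - 8.0 + np.log(4.0)

M = np.exp(log_r(16.0 / 15.0))               # max of dQ/dP, attained at z* = 16/15

def survival(eta):
    # P[H >= eta] = E_P[(dQ/dP - eta)^+] = Q[dQ/dP >= eta] - eta P[dQ/dP >= eta];
    # {z : dQ/dP(z) >= eta} is an interval with endpoints from the quadratic.
    disc = np.maximum(256.0 - 30.0 * (8.0 - np.log(4.0) + np.log(eta)), 0.0)
    lo = (16.0 - np.sqrt(disc)) / 15.0
    hi = (16.0 + np.sqrt(disc)) / 15.0
    return (q.cdf(hi) - q.cdf(lo)) - eta * (p.cdf(hi) - p.cdf(lo))

# sigma on a grid by cumulative trapezoidal integration of 1 / P[H >= eta].
etas = np.linspace(1e-9, 0.9999 * M, 20_000)
sigma = cumulative_trapezoid(1.0 / survival(etas), etas, initial=0.0)
phi = lambda x: np.interp(np.exp(log_r(x)), etas, sigma)

def gprs():
    """Return the first Poisson process point under the graph of phi."""
    t = 0.0
    while True:
        t += rng.exponential(1.0)            # unit-rate arrival times T_i
        x = rng.standard_normal()            # spatial samples X_i ~ P
        if t <= phi(x):
            return x

samples = np.array([gprs() for _ in range(2000)])
print(samples.mean(), samples.std())         # should be close to 1.0 and 0.25
```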

GPRS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

gprs_0.png

GPRS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

gprs_1.png

GPRS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

gprs_2.png

GPRS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

gprs_3.png

GPRS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

gprs_4.png

GPRS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

gprs_5.png

GPRS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

gprs_accept.png

Runtime of GPRS

gprs_runtime.png

Codelength of GPRS

gprs_codelength.png

Fast GPRS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

fast_gprs_0.png

Fast GPRS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

fast_gprs_1.png

Fast GPRS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

fast_gprs_2.png

Fast GPRS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

fast_gprs_3.png

Fast GPRS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

fast_gprs_4.png

Fast GPRS with \(P = \mathcal{N}(0, 1), Q = \mathcal{N}(1, 1/16)\)

fast_gprs_5.png

Runtime of fast GPRS

sac_gprs_runtime.png

This is optimal.

Codelength of fast GPRS

Now, encode search path \(\nu\).

sac_gprs_codelength.png

Computationally lightweight ML-based data compression

Data compression with implicit neural representations (INRs)

Image from Dupont et al. [4]

  • computationally lightweight
  • short codelength

COMBINER

COMpression with Bayesian Implicit Neural Representations

Image from Blundell et al. [7]

💡 Gradient descent is the transform!
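
A minimal COIN-style sketch [4] of the idea (COMBINER additionally makes the weights Bayesian and communicates them with REC, which is omitted here); the image, network size, and training loop are illustrative placeholders:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a real image; COIN fits real RGB images.
H, W = 32, 32
image = torch.rand(H, W)

# Pixel coordinates in [-1, 1]^2, mapped to intensities by a small MLP.
# (COIN uses sinusoidal activations; plain ReLU is used here for brevity.)
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
targets = image.reshape(-1, 1)

inr = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(inr.parameters(), lr=1e-3)

# "Gradient descent is the transform": the trained weights are the code.
for step in range(2000):
    opt.zero_grad()
    loss = ((inr(coords) - targets) ** 2).mean()
    loss.backward()
    opt.step()

print(loss.item())   # reconstruction MSE after overfitting
```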


Take-home messages

  • Relative entropy coding is a stochastic alternative to quantisation for lossy source coding
  • Analysed selection sampling-based REC algorithms
  • Greedy Poisson rejection sampling is an optimal selection sampler
  • Implicit neural representations are an exciting, compute-efficient approach to data compression with huge potential

References I

  • [1] M. Careil, M. J. Muckley, J. Verbeek, and S. Lathuilière, "Towards image compression with perfect realism at ultra-low bitrates," ICLR 2024.
  • [2] C. T. Li and A. El Gamal, "Strong functional representation lemma and applications to coding theorems," IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 6967–6978, 2018.

References II

  • [4] E. Dupont, A. Goliński, M. Alizadeh, Y. W. Teh, and A. Doucet, "COIN: Compression with implicit neural representations," arXiv preprint arXiv:2103.03123, 2021.
  • [5] G. Flamich and L. Wells, "Some notes on the sample complexity of approximate channel simulation," to appear at the Learning to Compress workshop @ ISIT 2024.
  • [6] D. Goc and G. Flamich, "On channel simulation with causal rejection samplers," to appear at ISIT 2024.

References III

  • [7] C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra, "Weight uncertainty in neural networks," ICML 2015.