Does training on Knonik-compressed data produce models that learn as well as training on raw uncompressed data? We benchmarked three production policy architectures on a 14-DOF dual-arm manipulation task, comparing learning quality and deployment transfer across both formats.
This benchmark answers two questions simultaneously. First, learning quality: when both the compressed and uncompressed training runs start from the same random initialisation and train for the same number of steps, does the compressed training signal produce a model of equivalent quality? Second, domain transfer: does a model trained exclusively on compressed data generalise to clean uncompressed observations - the typical real-world deployment scenario?
Both questions are answered using surrogate models that replicate the architecture and training dynamics of production robotics policy networks without requiring the full training infrastructure or massive compute budgets. The surrogate approach is standard practice for data pipeline benchmarking: it isolates the effect of the data source from the effect of model scale. All three models were trained on the act_14dof dataset - 50 episodes of 14-DOF dual-arm manipulation, each 400 timesteps, with RGB observations at 480×640 and full joint state (position, velocity, action) at 14 dimensions.
For each model, two runs are conducted from an identical random initialisation. Run A trains on Knonik-compressed data using the Knonik loader. Run B trains on raw uncompressed HDF5 files using a standard PyTorch DataLoader. In both cases validation is always performed on the uncompressed val set, so neither format has an advantage in evaluation. Periodic validation during training is capped at 32 batches for speed; after training completes, each compressed run receives a full post-training evaluation over the entire uncompressed dataset with no batch cap, and this full evaluation yields the domain transfer score.
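A minimal sketch of this paired-run protocol, assuming placeholder `make_model`, `train_fn`, and loader objects (none of these names come from Knonik's actual API):

```python
import random

def run_pair(make_model, train_fn, loaders, seed=0):
    """Train one model per data source from the same random initialisation.

    `loaders` maps a run label ("compressed" / "uncompressed") to a data
    loader; `make_model` builds a fresh model and `train_fn` trains it.
    All of these are placeholders for the benchmark's real components.
    """
    results = {}
    for label, loader in loaders.items():
        random.seed(seed)     # reset the RNG so both runs share one init
        model = make_model()  # weights drawn from the shared seed
        results[label] = train_fn(model, loader)
    return results
```

Validation would then run on the uncompressed val set for both entries, matching the no-format-advantage rule above.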
| Model | Training objective | Key hyperparameters |
|---|---|---|
| Diffusion Policy | Single-step denoising MSE on a noisy action sample | hidden_dim=256 · obs_history=2 · lr=1e-4 · grad_clip=1.0 |
| ACT | L1 reconstruction loss + KL divergence (weight 10.0) via CVAE | hidden_dim=512 · history_len=16 · kl_weight=10.0 · lr=1e-4 |
| Flow Matching Policy | Flow-matching MSE on the velocity field u_t = noise − action | hidden_dim=256 · num_layers=4 · action_horizon=16 · lr=1e-4 |
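The flow-matching objective can be sketched in a few lines, assuming the standard linear interpolation path from action (t=0) to noise (t=1); the helper names here are illustrative, not taken from the benchmark code:

```python
def flow_matching_pair(action, noise, t):
    """Point on the linear path from action (t=0) to noise (t=1), plus target.

    x_t = (1 - t) * action + t * noise
    u_t = noise - action   (the constant velocity of the linear path)
    """
    x_t = [(1.0 - t) * a + t * n for a, n in zip(action, noise)]
    u_t = [n - a for a, n in zip(action, noise)]
    return x_t, u_t

def mse(pred, target):
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)
```

The policy network predicts the velocity from x_t and the observation; the training loss is `mse(prediction, u_t)`.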
Final validation MAE, best loss checkpoints, training time, and domain transfer gap for all six runs.
| Model | Training | Final Val MAE | MAE Ratio | Best Val Loss | Best Step | Train Time | Full Uncomp. MAE | Domain Gap |
|---|---|---|---|---|---|---|---|---|
| Diffusion Policy | compressed | 0.0604 | 0.774× | 0.00918 | 500 | 107.4 s | 0.0637 | +0.0033 |
| Diffusion Policy | uncompressed | 0.0780 | - | 0.01344 | 500 | 98.8 s | - | - |
| ACT | compressed | 0.0911 | 0.991× | 0.09243 | 450 | 105.8 s | 0.0931 | +0.0020 |
| ACT | uncompressed | 0.0919 | - | 0.09489 | 500 | 96.3 s | - | - |
| Flow Matching Policy | compressed | 0.0917 | 0.945× | 0.12391 | 500 | 110.8 s | 0.0930 | +0.0012 |
| Flow Matching Policy | uncompressed | 0.0970 | - | 0.13166 | 500 | 100.2 s | - | - |
A MAE ratio (compressed final MAE ÷ uncompressed final MAE) below 1.0 means compressed training outperforms uncompressed.
Do models trained on compressed data generalise to clean uncompressed observations at deployment time?
| Model | Periodic Val MAE | Full Uncompressed MAE | Gap | Gap % | Verdict |
|---|---|---|---|---|---|
| Diffusion Policy | 0.0604 | 0.0637 | +0.0033 | +5.5% | Excellent |
| ACT | 0.0911 | 0.0931 | +0.0020 | +2.2% | Excellent |
| Flow Matching Policy | 0.0917 | 0.0930 | +0.0012 | +1.3% | Excellent |
All domain transfer gaps are within 0.004 MAE absolute across all three models. Models trained on Knonik-compressed data generalise cleanly to uncompressed deployment data - the periodic val scores recorded during training closely match the post-training full-dataset evaluation, confirming the compressed training signal is genuine and not a measurement artefact.
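The gap columns above follow directly from the two MAE measurements; as a sanity check, this is the arithmetic (function name illustrative):

```python
def domain_gap(periodic_mae, full_mae):
    """Absolute and relative domain transfer gap (full minus periodic)."""
    gap = full_mae - periodic_mae
    return gap, 100.0 * gap / periodic_mae
```

For Diffusion Policy, `domain_gap(0.0604, 0.0637)` reproduces the reported +0.0033 absolute and +5.5% relative gap.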
Training loss, validation loss, and validation action MAE curves for each architecture.
The most striking result is that all three models achieve lower validation MAE when trained on Knonik-compressed data. Diffusion Policy shows the largest effect - compressed training reaches a final MAE of 0.0604, compared to 0.0780 with raw uncompressed data, a 22.6% reduction. ACT and the Flow Matching Policy show smaller but consistent advantages (0.991× and 0.945× respectively). The most plausible mechanism is that video codec compression introduces subtle temporal smoothing in the RGB stream, acting as implicit data augmentation that reduces overfitting on fine visual texture. This effect is consistent across architectures with very different inductive biases - from denoising diffusion to CVAE-based action chunking to flow matching - suggesting it is a property of the data format rather than any specific model.
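The ratios quoted above are simply compressed-over-uncompressed final validation MAE; reproducing them from the results table (values copied from the table, rounded to three decimals):

```python
# (compressed, uncompressed) final validation MAE per model, from the table
final_mae = {
    "Diffusion Policy": (0.0604, 0.0780),
    "ACT": (0.0911, 0.0919),
    "Flow Matching Policy": (0.0917, 0.0970),
}
ratios = {model: round(comp / uncomp, 3)
          for model, (comp, uncomp) in final_mae.items()}
```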
The central practical question is whether models trained on compressed data generalise to uncompressed observations at deployment. All three models show small, bounded domain transfer gaps: +5.5% for Diffusion Policy (0.0637 vs 0.0604), +2.2% for ACT (0.0931 vs 0.0911), and +1.3% for the Flow Matching Policy (0.0930 vs 0.0917). In absolute MAE terms the largest gap is 0.0033. These are well within acceptable bounds for continuous action prediction - the model has learned the underlying manipulation task geometry, not the specific artefacts of the encoding format. The periodic val evaluator, which uses uncompressed data throughout training, closely tracked the post-training full-dataset evaluation in all three cases, validating that the compressed-data training signal is genuine.
None of the six runs meets the automatic convergence criterion at 500 steps - all models retain headroom. Diffusion Policy and the Flow Matching Policy are both still descending at step 500; ACT compressed is the only run that shows early saturation, reaching its best validation loss at step 450. Training time overhead for compressed data is modest across all three architectures - roughly 9% to 11% - consistent with Knonik's video decode running in background worker processes overlapped with GPU forward passes.