Compression Benchmark

Training Quality Analysis:
Compressed vs Uncompressed Data

Does training on Knonik-compressed data produce models that learn as well as training on raw uncompressed data? We benchmarked three production policy architectures on a 14-DOF dual-arm manipulation task, comparing learning quality and deployment transfer across both formats.

18.4 GB uncompressed → 126.2 MB Knonik: 146× smaller, full signal preserved

Overview
Overview

What Was Tested

This benchmark answers two questions simultaneously. First, learning quality: when both the compressed and uncompressed training runs start from the same random initialisation and train for the same number of steps, does the compressed training signal produce a model of equivalent quality? Second, domain transfer: does a model trained exclusively on compressed data generalise to clean uncompressed observations - the typical real-world deployment scenario?

Both questions are answered using surrogate models that replicate the architecture and training dynamics of production robotics policy networks without requiring the full training infrastructure or massive compute budgets. The surrogate approach is standard practice for data pipeline benchmarking: it isolates the effect of the data source from the effect of model scale. All three models were trained on the act_14dof dataset - 50 episodes of 14-DOF dual-arm manipulation, each 400 timesteps, with RGB observations at 480×640 and full joint state (position, velocity, action) at 14 dimensions.

Methodology

Experimental Design

For each model, two runs are conducted from the identical random initialisation. Run A trains on Knonik-compressed data using the Knonik loader. Run B trains on raw uncompressed HDF5 files using a standard PyTorch DataLoader. In both cases, validation is always performed on the uncompressed val set - there is no data-format advantage in evaluation. Periodic validation is capped at 32 batches for speed; after training completes, compressed runs receive a full post-training evaluation over the entire uncompressed dataset with no batch cap, which is the domain transfer score.
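A minimal sketch of this paired-run design (hypothetical harness, not the actual benchmark code; the `make_model` factory and its `train_step`/`evaluate` interface are illustrative assumptions):

```python
import itertools
import random

EVAL_EVERY = 50      # validation cadence (steps)
VAL_BATCH_CAP = 32   # periodic validation is capped for speed
TOTAL_STEPS = 500

def train_run(make_model, train_batches, make_val_batches, seed=0):
    """One arm of the paired A/B design. The shared seed fixes the
    random initialisation, so Run A (Knonik loader) and Run B (raw
    HDF5 loader) start from identical weights; validation always
    reads the uncompressed val set."""
    random.seed(seed)  # identical random initialisation for both runs
    model = make_model()
    history = []
    for step, batch in enumerate(itertools.islice(train_batches, TOTAL_STEPS), 1):
        model.train_step(batch)
        if step % EVAL_EVERY == 0:
            # Periodic val: capped at 32 batches; the full uncapped
            # evaluation happens once after training completes.
            capped = itertools.islice(make_val_batches(), VAL_BATCH_CAP)
            history.append((step, model.evaluate(capped)))
    return model, history

# Run A: Knonik loader; Run B: raw HDF5 loader -- same seed, same steps:
# model_a, hist_a = train_run(MakePolicy, knonik_loader, uncompressed_val, seed=0)
# model_b, hist_b = train_run(MakePolicy, hdf5_loader, uncompressed_val, seed=0)
```

The only asymmetry between the two runs is the training data source; everything downstream of the loader is shared.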

| Parameter | Value |
|---|---|
| Dataset | act_14dof (14-DOF dual-arm manipulation) |
| Runs | Single run per condition |
| Steps | 500 |
| Batch size | 8 |
| History length | 16 |
| Action / state dim | 14 |
| Image size | 96 × 96 |
| Eval cadence | Every 50 steps |
| Optimizer | AdamW, lr=1e-4, wd=1e-4 |
| Val loader | Always uncompressed HDF5 |
Models

Surrogate Architectures

Diffusion Policy (2023)
Architecture
  • Image encoder: ResNet-18 style - Conv + 3 residual blocks + AdaptiveAvgPool → Linear(256 → hidden_dim)
  • Condition encoder: Linear projection of concatenated image features and joint positions
  • 1-D UNet denoiser: 3 down-blocks → mid → 3 up-blocks with skip connections + noise timestep input
  • Output: predicted clean action sequence (B, history_len, 14)
Training Objective

Single-step denoising MSE on a noisy action sample.

Key Hyperparameters

hidden_dim=256 · obs_history=2 · lr=1e-4 · grad_clip=1.0
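The single-step denoising objective can be sketched as follows (NumPy stand-in; `model` is a hypothetical callable playing the role of the UNet denoiser, and `alpha_bar_t` stands for the cumulative noise-schedule coefficient, an assumption since the schedule is not specified here):

```python
import numpy as np

def denoising_mse(model, actions, alpha_bar_t, rng):
    """Single-step denoising MSE (sketch). A clean action chunk is
    corrupted with Gaussian noise at level alpha_bar_t; the model
    predicts the clean chunk (x0-prediction, matching the stated
    'predicted clean action sequence' output)."""
    noise = rng.standard_normal(actions.shape)
    noisy = np.sqrt(alpha_bar_t) * actions + np.sqrt(1.0 - alpha_bar_t) * noise
    pred_clean = model(noisy, alpha_bar_t)       # denoiser forward pass
    return np.mean((pred_clean - actions) ** 2)  # MSE against the clean chunk
```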

ACT: Action Chunking Transformer (2023)
Architecture
  • Image encoder: same ResNet-18 block structure as Diffusion Policy
  • VAE encoder: Linear → ReLU → Linear with separate μ and log-σ heads; reparameterisation trick
  • Transformer decoder: 3 layers, 8 heads, d_model=512, ffn_dim=2048; learned action queries attend to image + state + latent z
  • Output: predicted action sequence (B, history_len, 14)
Training Objective

L1 reconstruction loss + KL divergence (weight 10.0) via CVAE.

Key Hyperparameters

hidden_dim=512 · history_len=16 · kl_weight=10.0 · lr=1e-4
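The CVAE objective (L1 reconstruction plus weighted KL against a standard normal) can be written out as a sketch; the latent dimension and array shapes below are illustrative assumptions, and `mu` / `log_sigma` correspond to the VAE encoder's two heads:

```python
import numpy as np

KL_WEIGHT = 10.0  # kl_weight from the run configuration above

def act_loss(pred_actions, target_actions, mu, log_sigma):
    """CVAE training objective (sketch): L1 reconstruction over the
    action chunk plus KL(q(z|x) || N(0, I)) for a diagonal Gaussian
    parameterised by mu and log-sigma, averaged over the batch."""
    recon = np.mean(np.abs(pred_actions - target_actions))
    kl = np.mean(
        np.sum(0.5 * (np.exp(2 * log_sigma) + mu**2 - 1.0) - log_sigma, axis=-1)
    )
    return recon + KL_WEIGHT * kl
```

With mu = 0 and log-sigma = 0 the KL term vanishes, so a perfect reconstruction gives zero loss, which is a quick sanity check on the formula.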

Flow Matching Policy (2024)
Architecture
  • ViT tokenizer: 16×16 patch embedding → flatten to (B, 36, hidden_dim) + learned positional embedding + LayerNorm
  • State encoder: 2-layer MLP projecting joint positions to a single token appended to image tokens
  • Sinusoidal flow-time embedding: conditioned on flow time t, injected into action tokens
  • Transformer decoder: 4 pre-LN layers, 8 heads; action tokens attend to image + state memory
Training Objective

Flow-matching MSE on the velocity field u_t = noise − action.

Key Hyperparameters

hidden_dim=256 · num_layers=4 · action_horizon=16 · lr=1e-4
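Under the convention stated above (target velocity u_t = noise − action), the flow-matching loss can be sketched as follows; `model` is a hypothetical velocity-field predictor standing in for the transformer:

```python
import numpy as np

def flow_matching_mse(model, actions, t, rng):
    """Flow-matching MSE (sketch). With u_t = noise - action, the
    matching straight-line path is x_t = (1 - t) * action + t * noise,
    which moves from data (t=0) toward noise (t=1) at constant
    velocity u_t. The model regresses that velocity at x_t."""
    noise = rng.standard_normal(actions.shape)
    x_t = (1.0 - t) * actions + t * noise  # point on the interpolation path
    u_target = noise - actions             # constant velocity along the path
    v_pred = model(x_t, t)                 # predicted velocity field
    return np.mean((v_pred - u_target) ** 2)
```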

Results

Full Results Table

Final validation MAE, best loss checkpoints, training time, and domain transfer gap for all six runs.

| Model | Training | Final Val MAE | Best Val Loss | Best Step | Train Time | Full Uncomp. MAE | Domain Gap |
|---|---|---|---|---|---|---|---|
| Diffusion Policy | compressed | 0.0604 | 0.00918 | 500 | 107.4 s | 0.0637 | +0.0033 |
| Diffusion Policy | uncompressed | 0.0780 | 0.01344 | 500 | 98.8 s | - | - |
| ACT | compressed | 0.0911 | 0.09243 | 450 | 105.8 s | 0.0931 | +0.0020 |
| ACT | uncompressed | 0.0919 | 0.09489 | 500 | 96.3 s | - | - |
| Flow Matching Policy | compressed | 0.0917 | 0.12391 | 500 | 110.8 s | 0.0930 | +0.0012 |
| Flow Matching Policy | uncompressed | 0.0970 | 0.13166 | 500 | 100.2 s | - | - |

MAE Ratio: Compressed ÷ Uncompressed

Below 1.0 means compressed training outperforms uncompressed.

| Model | MAE Ratio | Compressed MAE | Uncompressed MAE |
|---|---|---|---|
| Diffusion Policy | 0.774× (22.6% better) | 0.0604 | 0.0780 |
| ACT | 0.991× (0.9% better) | 0.0911 | 0.0919 |
| Flow Matching Policy | 0.945× (5.5% better) | 0.0917 | 0.0970 |
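The ratios follow directly from the reported MAE values; a quick arithmetic check:

```python
# MAE ratio = compressed MAE / uncompressed MAE, from the table above.
pairs = {
    "diffusion_policy": (0.0604, 0.0780),
    "act":              (0.0911, 0.0919),
    "flow_matching":    (0.0917, 0.0970),
}
ratios = {name: round(c / u, 3) for name, (c, u) in pairs.items()}
# -> {'diffusion_policy': 0.774, 'act': 0.991, 'flow_matching': 0.945}
```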
Domain Transfer

Compressed Training → Uncompressed Deployment

Do models trained on compressed data generalise to clean uncompressed observations at deployment time?

| Model | Periodic Val MAE | Full Uncompressed MAE | Gap | Gap % | Verdict |
|---|---|---|---|---|---|
| Diffusion Policy | 0.0604 | 0.0637 | +0.0033 | +5.5% | Excellent |
| ACT | 0.0911 | 0.0931 | +0.0020 | +2.2% | Excellent |
| Flow Matching Policy | 0.0917 | 0.0930 | +0.0012 | +1.3% | Excellent |

All domain transfer gaps are within 0.004 MAE absolute across all three models. Models trained on Knonik-compressed data generalise cleanly to uncompressed deployment data - the periodic val scores recorded during training closely match the post-training full-dataset evaluation, confirming the compressed training signal is genuine and not a measurement artefact.

Learning Curves

Per-Model Deep Dive

Training loss, validation loss, and validation action MAE curves for each architecture.

Diffusion Policy
[Figures: Diffusion Policy training loss, validation loss, and validation action MAE curves]
Key Findings
  • Best overall: compressed MAE 0.0604, the lowest of all conditions tested
  • Compressed trains 22.6% better than uncompressed (0.0604 vs 0.0780)
  • Both conditions reach their best validation loss at step 500; the compressed curve is smoother throughout
  • Domain gap +5.5% (0.0033 absolute): generalises well to uncompressed at deployment
ACT
[Figures: ACT training loss, validation loss, and validation action MAE curves]
Key Findings
  • Compressed trains better: MAE 0.0911 vs 0.0919 uncompressed (0.991x ratio)
  • Compressed reaches its best checkpoint at step 450; training converges more reliably than uncompressed
  • Training time overhead negligible: 105.8 s compressed vs 96.3 s uncompressed (~10%)
  • Domain gap +2.2% (0.0020 absolute): safe for production deployment on uncompressed data
Flow Matching Policy
[Figures: Flow Matching Policy training loss, validation loss, and validation action MAE curves]
Key Findings
  • Most consistent convergence curves across all three models
  • Near-identical behaviour across compressed and uncompressed conditions
  • Flow-matching objective is the most data-format agnostic of the three
  • Still clearly converging at step 500; extended runs would likely yield further improvement
Analysis

Detailed Analysis

Compression as implicit regularisation

The most striking result is that all three models achieve lower validation MAE when trained on Knonik-compressed data. Diffusion Policy shows the largest effect - compressed training reaches a final MAE of 0.0604, compared to 0.0780 with raw uncompressed data, a 22.6% reduction. ACT and the Flow Matching Policy show smaller but consistent advantages (0.991× and 0.945× respectively). The most plausible mechanism is that video codec compression introduces subtle temporal smoothing in the RGB stream, acting as implicit data augmentation that reduces overfitting on fine visual texture. This effect is consistent across architectures with very different inductive biases - from denoising diffusion to CVAE-based action chunking to flow matching - suggesting it is a property of the data format rather than any specific model.

Domain transfer - compressed training deploys on uncompressed data

The central practical question is whether models trained on compressed data generalise to uncompressed observations at deployment. All three models show small, bounded domain transfer gaps: +5.5% for Diffusion Policy (0.0637 vs 0.0604), +2.2% for ACT (0.0931 vs 0.0911), and +1.3% for the Flow Matching Policy (0.0930 vs 0.0917). In absolute MAE terms the largest gap is 0.0033. These are well within acceptable bounds for continuous action prediction - the model has learned the underlying manipulation task geometry, not the specific artefacts of the encoding format. The periodic val evaluator, which uses uncompressed data throughout training, closely tracked the post-training full-dataset evaluation in all three cases, validating that the compressed-data training signal is genuine.

Convergence and training overhead

None of the six runs meets the automatic convergence criterion at 500 steps - all models retain headroom. Diffusion Policy and the Flow Matching Policy are both still descending at step 500; ACT compressed is the only run that shows early saturation, reaching its best loss at step 450. Training time overhead for compressed data is small across all three architectures, roughly 9% to 11%, consistent with Knonik's video decode running in background worker processes overlapped with GPU forward passes.