Research Paper · 2025

Knonik Is All You Need for Scaffolding Robot Learning

Arjun P S — knonik.com

60 BC-Transformer training runs across 3 storage modes, 10 tasks, and 2 seeds on the LIBERO-Object benchmark (500 demonstrations, 149,014 RGB frames). Policy, optimizer, and evaluation are held fixed; only the data format and dataloader change.

Knonik Lossless

64.25% vs 54.35% HDF5

+9.9 pp · 3.51× smaller

Knonik Lossy

54.83% (tied baseline)

+0.5 pp · 18.23× smaller

Stability

HDF5 worst collapse: 90.5 pp

Lossless max: 61.5 pp · Lossy: 46 pp

+9.9 pp accuracy gain · Lossless vs. raw HDF5
3.51× smaller · Knonik lossless storage
18.23× smaller · Knonik lossy storage
1.5× less stable · HDF5 seed variance vs Knonik

Figure 1 — Storage cost (log scale) vs. mean held-out success

[Scatter plot. x-axis: dataset size (GB, log scale). y-axis: mean success (%). Series: Uncompressed (HDF5), Knonik Lossless, Knonik Lossy.]

Lossy is nearly an order of magnitude more storage-efficient per unit of task success. Lossless gets both a size reduction and a quality gain.

Figure 2 — Throughput (samples/s) and per-batch data-fetch latency (log scale)

Throughput (samples / s)

HDF5: 102.9 · Lossless: 119.6 · Lossy: 118.2

Batch-fetch latency (ms, log scale)

HDF5: 65 ms · Lossless: 21 ms · Lossy: 22 ms

Knonik's parallel prefetch hides decode overhead — compressed data arrives faster than raw HDF5. Median batch-fetch latency drops from 65 ms → 21 ms despite adding a video decode step.

Background

The Data-Infrastructure Problem

Oblivious format

  • 500 demos → 7 GB at 128×128. At 480p+ teleoperation scale: 2.5 TB/day.
  • HDF5 ignores temporal and spatial redundancy — compressed-video codecs exploit both.
  • Teams work around this by downsampling or by retraining less often.
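
As a rough sanity check on the 7 GB figure, the raw pixel payload can be estimated from the numbers quoted on this page. The sketch below assumes that the 149,014 frame count already includes both cameras and that each pixel is stored as three uint8 values; neither assumption is stated explicitly in the source.

```python
# Back-of-envelope size of the raw RGB frames, using figures quoted on this page.
FRAMES = 149_014          # total RGB frames (assumed to include both cameras)
HEIGHT, WIDTH = 128, 128  # frame resolution
CHANNELS = 3              # RGB, one uint8 per channel (assumption)

raw_bytes = FRAMES * HEIGHT * WIDTH * CHANNELS
raw_gb = raw_bytes / 1e9      # decimal gigabytes
raw_gib = raw_bytes / 2**30   # binary gibibytes

print(f"{raw_bytes:,} bytes = {raw_gb:.2f} GB = {raw_gib:.2f} GiB")
```

The estimate comes out at roughly 7.3 GB (about 6.8 GiB), which is in the same range as the 6.93 GB reported for the uncompressed HDF5 file. The exact figure depends on what else the file stores besides pixels.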

Wrong dataloader

  • PyTorch DataLoader assumes i.i.d. samples. Robotics data is correlated trajectories.
  • Single-process sequential reads cannot prefetch behind compute.
  • No global shuffle → correlated batches → seed-sensitive policies.
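
The two missing ingredients named above, prefetching behind compute and mixing samples across trajectories, can be illustrated with a small standard-library sketch. This is not Knonik's loader and the function name and parameters are hypothetical. It uses a bounded shuffle buffer, which only approximates a global shuffle, and a single background thread, where a real pipeline would more likely use several worker processes for decoding.

```python
import queue
import random
import threading

def shuffled_prefetch(samples, batch_size, buffer_size=1024, depth=4, seed=0):
    """Yield shuffled batches while a background thread keeps a queue filled."""
    rng = random.Random(seed)
    batches = queue.Queue(maxsize=depth)  # bounded: producer blocks when full
    done = object()                       # sentinel marking the end of the stream

    def producer():
        buffer, batch = [], []

        def emit(item):
            batch.append(item)
            if len(batch) == batch_size:
                batches.put(list(batch))
                batch.clear()

        for sample in samples:
            buffer.append(sample)
            if len(buffer) >= buffer_size:
                # Move a random element to the end and pop it (O(1) removal).
                i = rng.randrange(len(buffer))
                buffer[i], buffer[-1] = buffer[-1], buffer[i]
                emit(buffer.pop())
        rng.shuffle(buffer)               # drain what is left in random order
        for item in buffer:
            emit(item)
        if batch:
            batches.put(list(batch))      # final, possibly short, batch
        batches.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = batches.get()              # blocks only if the producer is behind
        if item is done:
            return
        yield item

# Usage: 5 toy "episodes" of 10 timesteps each, mixed across episodes.
frames = [(ep, t) for ep in range(5) for t in range(10)]
for batch in shuffled_prefetch(frames, batch_size=8, buffer_size=32):
    pass  # a training step would consume the batch here
```

The larger the buffer relative to an episode, the less likely it is that one batch is dominated by consecutive frames from a single trajectory.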

Silent GPU tax

  • 37% of GPU wall-clock is idle data-starvation with HDF5 + stock loader.
  • At 62.9% utilisation, you pay roughly 1.59× the GPU-hours the model actually needs (1 / 0.629).
  • nvidia-smi won't show this — only the inter-step gap reveals it.
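
One way to expose the inter-step gap is to time the wait for each batch separately from the training step. The sketch below is a minimal illustration, not the profiler used in the paper; `profile_loop` and its arguments are hypothetical names. It treats the time blocked on the loader as the gap, which may be narrower than the paper's definition. On a GPU, `train_step` would also need to block until the kernels finish (for example by calling `torch.cuda.synchronize()` in PyTorch), otherwise the step time is under-counted.

```python
import time

def profile_loop(batches, train_step):
    """Split wall-clock time into data-wait (gap) and compute (step)."""
    wait_s = 0.0
    step_s = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)       # time spent blocked on the dataloader
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)          # forward, backward, optimizer update
        t2 = time.perf_counter()
        wait_s += t1 - t0
        step_s += t2 - t1
    total = wait_s + step_s
    util = step_s / total if total else 0.0  # step / (step + gap)
    return {"wait_s": wait_s, "step_s": step_s, "util": util}

# Usage with a deliberately slow toy loader and a toy training step.
def slow_loader(n=3, delay=0.02):
    for i in range(n):
        time.sleep(delay)          # stands in for reading and decoding
        yield i

report = profile_loop(slow_loader(), lambda batch: time.sleep(0.01))
print(report)
```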
Section III

Methodology

Dataset

  • LIBERO-Object: 10 pick-and-place tasks
  • 50 demos per task · ~149,000 RGB frames
  • 128×128 · dual camera · 126–248 timesteps/ep

Training

  • BC-Transformer policy (unchanged from LIBERO)
  • Batch 32 · 50 epochs · AdamW
  • 60 total runs: 3 modes × 10 tasks × 2 seeds
  • Post-hoc eval: 200 rollouts per task
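
The run grid can be enumerated in a few lines. The mode labels and dictionary keys below are illustrative rather than taken from the paper's code; the seed values 0 and 47 are the ones shown in Figure 5.

```python
from itertools import product

MODES = ["hdf5_uncompressed", "knonik_lossless", "knonik_lossy"]  # illustrative labels
TASKS = [f"T{i}" for i in range(10)]                               # T0..T9
SEEDS = [0, 47]                                                    # seeds shown in Figure 5

# One configuration per (mode, task, seed); everything else is held fixed.
runs = [
    {"mode": mode, "task": task, "seed": seed,
     "batch_size": 32, "epochs": 50, "optimizer": "AdamW"}
    for mode, task, seed in product(MODES, TASKS, SEEDS)
]

print(len(runs))  # total number of training runs
```

Each mode ends up with 10 × 2 = 20 runs, which matches the N = 20 column in Table 3.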

Table 1 — The 10 LIBERO-Object Tasks

ID   Task instruction
T0   Pick up the alphabet soup and place it in the basket
T1   Pick up the cream cheese and place it in the basket
T2   Pick up the salad dressing and place it in the basket
T3   Pick up the bbq sauce and place it in the basket
T4   Pick up the ketchup and place it in the basket
T5   Pick up the tomato sauce and place it in the basket
T6   Pick up the butter and place it in the basket
T7   Pick up the milk and place it in the basket
T8   Pick up the chocolate pudding and place it in the basket
T9   Pick up the orange juice and place it in the basket

Table 2 — Dataset Footprint

Mode                 Format              Size      Ratio
Uncompressed (HDF5)  HDF5 raw uint8      6.93 GB   1.00×
Knonik Lossless      Knonik lossless     1.98 GB   3.51×
Knonik Lossy         Knonik lossy (ul2)  0.38 GB   18.23×

Figure 3 — Storage Footprint (GB)

Uncompressed (HDF5): 6.93
Knonik Lossless: 1.98
Knonik Lossy: 0.38
Section VI

Results

Headline success rate

Figure 4 — Mean Held-out Success · best ckpt · 200 rollouts · 10 tasks × 2 seeds

Uncompressed (HDF5): 54.35
Knonik Lossless: 64.25
Knonik Lossy: 54.83

Mean success rate (%). HDF5 std ±34.48 pp — highest of three.

Table 3 — Held-out Success Rate (mean ± std)

Mode                 Best (%)        Latest (%)      N
Uncompressed (HDF5)  54.35 ± 34.48   49.33 ± 32.64   20
Knonik Lossless      64.25 ± 22.17   53.83 ± 25.84   20
Knonik Lossy         54.83 ± 25.43   51.00 ± 24.79   20

  • Lossless beats uncompressed by +9.9 pp on bit-identical data — the delta is entirely the pipeline, not the data content.
  • Lossy matches the baseline (+0.5 pp) at 18.23× smaller storage and outperforms it on five of the ten tasks (T0, T6, T7, T8, T9).

Per-task breakdown

Table 4 — Per-task Held-out Success (%) · Seed-averaged · * = winner per row

Task   Object              Uncompressed   Knonik Lossless   Knonik Lossy
T0     alphabet soup       64.8           52.8              65.5*
T1     cream cheese        60.2*          58.5              59
T2     salad dressing      92*            89.5              80.2
T3     bbq sauce           73.8*          64.8              54
T4     ketchup             38.8*          29.2              8.2
T5     tomato sauce        83.5*          77.8              68.8
T6     butter              16.8           70.8*             34.2
T7     milk                29.2           74.8*             45.5
T8     chocolate pudding   39.2           66.2              81.2*
T9     orange juice        45.2           58.2*             51.5
Mean                       54.35          64.25*            54.83
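
Counting row winners and averaging columns tell different stories here, so it may help to compute both. The sketch below copies the seed-averaged values from Table 4; because those values are rounded to one decimal, the recomputed means can differ from the Mean row in the second decimal place.

```python
MODES = ("Uncompressed", "Knonik Lossless", "Knonik Lossy")

# Seed-averaged success (%) per task, copied from Table 4.
TABLE_4 = {
    "T0": (64.8, 52.8, 65.5),
    "T1": (60.2, 58.5, 59.0),
    "T2": (92.0, 89.5, 80.2),
    "T3": (73.8, 64.8, 54.0),
    "T4": (38.8, 29.2, 8.2),
    "T5": (83.5, 77.8, 68.8),
    "T6": (16.8, 70.8, 34.2),
    "T7": (29.2, 74.8, 45.5),
    "T8": (39.2, 66.2, 81.2),
    "T9": (45.2, 58.2, 51.5),
}

# Number of tasks on which each mode has the highest score.
wins = {mode: 0 for mode in MODES}
for scores in TABLE_4.values():
    wins[MODES[scores.index(max(scores))]] += 1

# Column means across the ten tasks.
means = {
    mode: sum(scores[i] for scores in TABLE_4.values()) / len(TABLE_4)
    for i, mode in enumerate(MODES)
}

for mode in MODES:
    print(f"{mode:16s} wins={wins[mode]}  mean={means[mode]:.2f}")
```

Uncompressed HDF5 wins the most rows (T1 to T5), while Knonik Lossless has the highest mean: its deficits on T0 to T5 are at most 12 pp, whereas its gains on T6 and T7 are about 54 pp and 46 pp.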

Seed stability heatmap

HDF5 shows catastrophic per-seed collapses absent from both Knonik conditions — T8 drops 78% → 0.5%, T9 drops 90.5% → 0%.

Figure 5 — Success by Task × Mode × Seed · Green = high · Red = collapse

                        Seed 0                       Seed 47
Task                    HDF5   Lossless   Lossy      HDF5   Lossless   Lossy
T0 alphabet_soup        57.5   68.5       75         72     37         56
T1 cream_cheese         79     65.5       68.5       41.5   51.5       49.5
T2 salad_dressing       92     84.5       66         92     94.5       94.5
T3 bbq_sauce            82.5   73         62         65     56.5       46
T4 ketchup              0      11         0          77.5   47.5       16.5
T5 tomato_sauce         78.5   91         76         88.5   64.5       61.5
T6 butter               32.5   75.5       15.5       1      66         53
T7 milk                 19     71.5       62         39.5   78         29
T8 chocolate_pudding    78     46.5       89         0.5    86         73.5
T9 orange_juice         90.5   89         74.5       0      27.5       28.5
Section VII

Discussion

Cross-seed stability

Table 5 — |Δ| Between Seed 0 and Seed 47 (lower = better)

Mode                 Mean |Δ|   Max |Δ|   Std
Uncompressed (HDF5)  37.7 pp    90.5 pp   32.4 pp
Knonik Lossless      25.2 pp    61.5 pp   17.4 pp
Knonik Lossy         24.6 pp    46.0 pp   11.0 pp

  • HDF5's worst-case seed-to-seed swing (90.5 pp) is about 1.5× the Lossless maximum (61.5 pp) and nearly double the Lossy maximum (46.0 pp). A high mean with high variance is worse in production than a lower mean with low variance: the floor matters more once you deploy.
  • Likely cause: Knonik's parallel prefetch produces more uniformly mixed sample streams, reducing correlated batches that steer training into seed-dependent local minima.
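
The stability figures in Table 5 can be recomputed from the per-seed values in Figure 5. The sketch below takes the absolute seed-0 versus seed-47 difference per task and then the mean, maximum, and sample standard deviation (n − 1 denominator) across the ten tasks. Using the sample rather than the population standard deviation is an assumption, chosen because it reproduces the published Std column to one decimal.

```python
from statistics import mean, stdev

# (seed 0, seed 47) success rates (%) for tasks T0..T9, read from Figure 5.
FIG_5 = {
    "HDF5": [(57.5, 72), (79, 41.5), (92, 92), (82.5, 65), (0, 77.5),
             (78.5, 88.5), (32.5, 1), (19, 39.5), (78, 0.5), (90.5, 0)],
    "Lossless": [(68.5, 37), (65.5, 51.5), (84.5, 94.5), (73, 56.5), (11, 47.5),
                 (91, 64.5), (75.5, 66), (71.5, 78), (46.5, 86), (89, 27.5)],
    "Lossy": [(75, 56), (68.5, 49.5), (66, 94.5), (62, 46), (0, 16.5),
              (76, 61.5), (15.5, 53), (62, 29), (89, 73.5), (74.5, 28.5)],
}

stats = {}
for mode, pairs in FIG_5.items():
    gaps = [abs(s0 - s47) for s0, s47 in pairs]  # per-task |delta| in pp
    stats[mode] = {"mean": mean(gaps), "max": max(gaps), "std": stdev(gaps)}

for mode, s in stats.items():
    print(f"{mode:9s} mean={s['mean']:.2f}  max={s['max']:.1f}  std={s['std']:.2f}")
```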

Pipeline efficiency

  • GPU utilisation rises from 62.9% to 74.8%, so the GPU spends 11.9 pp less of each cycle idle.
  • Batch-fetch latency: 67.5 ms → 23.8 ms. 50 epochs complete in 48.9 min vs 56.9 min.
  • Counterintuitive: compression should add decode latency — Knonik's parallel dataloader hides it so effectively that compressed data arrives faster than raw HDF5.

Figure 6 — GPU Utilisation vs. Idle (step/(step+gap))

Uncompressed (HDF5): 62.9% compute · 37.1% idle
Knonik Lossless: 74.8% compute · 25.2% idle
Knonik Lossy: 74.2% compute · 25.8% idle

Table 6 — Pipeline Efficiency (averaged across all tasks and seeds)

Metric                     Uncompressed   Knonik Lossless   Knonik Lossy
Throughput (samples/s)     102.9          119.6             118.2
Total wall time (min)      56.9           48.9              49.5
GPU util (step/cycle, %)   62.9           74.8              74.2
GPU idle (data-wait, %)    37.1           25.2              25.8
Batch fetch mean (ms)      67.53          23.80             24.25
Inter-step gap mean (ms)   108.62         63.30             65.43
GPU energy (Wh)            133.3          122.9             123.8
Est. cost (USD)            2.90           2.50              2.53
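
A few relative figures follow directly from Table 6. The sketch below derives the throughput speedup, the wall-time and energy savings, and the hourly rate implied by the cost row. The dictionary keys are illustrative, and the implied rate is only an inference from cost divided by wall time, not a number stated in the source.

```python
# Selected rows of Table 6 (keys are illustrative names).
TABLE_6 = {
    "Uncompressed":    {"throughput": 102.9, "wall_min": 56.9, "energy_wh": 133.3, "cost_usd": 2.90},
    "Knonik Lossless": {"throughput": 119.6, "wall_min": 48.9, "energy_wh": 122.9, "cost_usd": 2.50},
    "Knonik Lossy":    {"throughput": 118.2, "wall_min": 49.5, "energy_wh": 123.8, "cost_usd": 2.53},
}

base = TABLE_6["Uncompressed"]
derived = {}
for mode, row in TABLE_6.items():
    derived[mode] = {
        "speedup": row["throughput"] / base["throughput"],         # relative throughput
        "time_saved": 1 - row["wall_min"] / base["wall_min"],      # fraction of wall time saved
        "energy_saved": 1 - row["energy_wh"] / base["energy_wh"],  # fraction of energy saved
        "usd_per_hour": row["cost_usd"] / (row["wall_min"] / 60),  # implied hourly rate
    }

for mode, d in derived.items():
    print(f"{mode:16s} speedup={d['speedup']:.3f}  time_saved={d['time_saved']:.1%}  "
          f"energy_saved={d['energy_saved']:.1%}  usd/h={d['usd_per_hour']:.2f}")
```

For Knonik Lossless this works out to roughly 16% higher throughput, 14% less wall time, and 8% less GPU energy than the HDF5 baseline. All three modes imply about the same hourly rate, which suggests the cost row is simply wall time multiplied by a flat GPU price.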

Read the full paper

Complete methodology, all figures, extended ablations, and raw per-step profiler data.

Download Full Paper (PDF)