// how it works

From Raw Sensor Data to Training-Ready.
Fully Automated.

Six specialised agents run inside your infrastructure, end to end. Click any stage to learn what it does.

Ingest - rosbag · HDF5 · MP4 · raw streams
Compress - 100–150× · full signal preserved
Score - quality flags for low-signal episodes
Process - VLM annotation, filtering, curation
Visualize - compare, analyze and inspect multimodal data
Dataloader - optimized dataloaders, minimal GPU idle
All six stages run inside your infrastructure. Zero data egress.
01
// ingest agent

Ingest

The universal receiving dock for your robot data.

Every robot speaks a slightly different language. Knonik's Ingest Agent is fluent in all of them. It accepts raw data in every format your robots produce - rosbag files, HDF5 archives, raw MP4 video, or live sensor streams - and normalises everything into a single consistent structure without any manual conversion work from your team.

Think of it as a receiving dock staffed 24/7. Every delivery is checked, sorted, and put away correctly - automatically.

Formats accepted
rosbag, HDF5, MP4, raw sensor streams, LeRobot v2, LeRobot v3, Zarr
What it does
Validates, deduplicates, and normalises incoming episodes into a unified schema
What you don't do
Write custom parsers, convert formats by hand, or babysit uploads
Where it runs
Entirely inside your infrastructure - no data is uploaded anywhere
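The normalisation step can be pictured as a small key-mapping pass. This is a hypothetical sketch - the `Episode` schema and the per-format key maps are invented for illustration and are not Knonik's actual internal format:

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical unified episode schema -- illustrative only.
@dataclass
class Episode:
    episode_id: str
    source_format: str
    observations: list[dict[str, Any]] = field(default_factory=list)

def normalise(raw: dict[str, Any], source_format: str) -> Episode:
    """Map format-specific field names onto one consistent structure."""
    # Each format names the same fields differently; the agent's job is
    # to resolve them all to one vocabulary.  Maps below are invented.
    key_maps = {
        "rosbag": {"stamp": "t", "joint_states": "joints"},
        "hdf5":   {"timestamp": "t", "qpos": "joints"},
    }
    key_map = key_maps[source_format]
    obs = [{key_map.get(k, k): v for k, v in frame.items()}
           for frame in raw["frames"]]
    return Episode(raw["id"], source_format, obs)
```

Once every episode lands in the same shape, every downstream agent can assume a single vocabulary regardless of which robot produced the data.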
02
// compress agent

Compress

Shrink your dataset 100–150× without losing a single bit of learning signal.

Raw robot teleoperation data is extremely redundant. Cameras capture nearly identical frames at high frequency; joint sensors record tiny incremental movements. The Compress Agent orchestrates a carefully chosen combination of existing compression methods - tuned specifically for the patterns in robotics sensor data - and achieves 100–150× size reduction while preserving every meaningful signal your model will ever need. The insight isn't a new algorithm; it's knowing exactly which tools to use, in what order, and with what settings for this data type.

Smaller is faster. Data that fits in memory loads without stalling your GPU - and cloud and local storage costs drop with it.

Typical reduction
100–150× smaller (same dataset)
Signal fidelity
No loss of learning signal
Training impact
Compressed data trains to equal or better validation loss (see proof page)
Benefit
Faster I/O and lower storage bills
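The "right tools in the right order" idea can be shown on one stage of such a pipeline. This toy sketch delta-encodes joint samples before handing them to a generic codec - because teleop joints barely move between samples, the deltas are tiny and highly compressible. It is an assumption-laden illustration of the principle, not Knonik's actual compression stack:

```python
import struct
import zlib

def compress_joints(samples: list[list[float]]) -> bytes:
    """Delta-encode successive joint samples, then entropy-code them.

    The ordering matters: raw positions compress poorly, but their
    near-zero deltas compress very well.  Illustrative sketch only.
    """
    deltas, prev = [], [0.0] * len(samples[0])
    for s in samples:
        deltas.extend(a - b for a, b in zip(s, prev))
        prev = s
    raw = struct.pack(f"<{len(deltas)}f", *deltas)
    return zlib.compress(raw, level=9)

def decompress_joints(blob: bytes, n_joints: int) -> list[list[float]]:
    """Invert the pipeline: entropy-decode, then re-accumulate deltas."""
    raw = zlib.decompress(blob)
    flat = list(struct.unpack(f"<{len(raw) // 4}f", raw))
    out, prev = [], [0.0] * n_joints
    for i in range(0, len(flat), n_joints):
        prev = [p + d for p, d in zip(prev, flat[i:i + n_joints])]
        out.append(prev)
    return out
```

Video frames get a different treatment than joint streams, which is exactly the point: the ratio comes from matching each modality to the method that suits it.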
03
// score agent

Score

An AI reviewer that catches bad demonstrations before they reach your model.

Not every robot demonstration is worth training on. Shaky grasps, incomplete tasks, sensor glitches, and operator mistakes all produce episodes that teach your model the wrong thing. The Score Agent watches every episode and assigns a quality score based on task completion, motion smoothness, sensor consistency, and outcome success - automatically, before your training run even starts.

Garbage in, garbage out. The Score Agent is the quality gate that prevents bad data from ever reaching your model.

What it detects
Failed tasks, noisy trajectories, sensor drop-outs, repetitive or near-duplicate episodes
Output
A quality score and flag per episode - bad ones are quarantined, not deleted
Why it matters
One bad episode in a small dataset can degrade policy performance by 15–30%
Human role
Review flagged episodes if you want to - the agent handles the rest
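A quality gate like this can be sketched with two of the signals named above - motion smoothness and sensor drop-outs. The heuristics and thresholds here are hypothetical stand-ins; the real agent also weighs task completion and outcome success:

```python
def score_episode(positions: list, dt: float = 0.02) -> dict:
    """Toy quality score: penalise jerky motion and sensor drop-outs.

    Hypothetical heuristics for illustration only.
    """
    # Sensor drop-outs: missing (None) samples in the stream.
    dropouts = sum(1 for p in positions if p is None)
    clean = [p for p in positions if p is not None]
    # Motion smoothness: mean absolute jerk via 3rd finite difference.
    jerks = [abs(clean[i] - 3 * clean[i - 1] + 3 * clean[i - 2] - clean[i - 3]) / dt ** 3
             for i in range(3, len(clean))]
    mean_jerk = sum(jerks) / len(jerks) if jerks else 0.0
    smoothness = 1.0 / (1.0 + mean_jerk / 1e4)   # squash to (0, 1]
    dropout_rate = dropouts / len(positions)
    score = smoothness * (1.0 - dropout_rate)
    return {"score": score, "flagged": score < 0.5}
```

Flagged episodes are quarantined rather than deleted, so a human can always overrule the gate.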
04
// process agent

Process

A Vision-Language Model that reads your robot's videos and writes descriptions - no human labellers needed.

Training modern robot policies often requires natural-language task descriptions, object labels, and structured episode metadata. The Process Agent uses a Vision-Language Model (VLM) to watch each episode's video, understand what the robot is doing, and automatically generate accurate annotations - descriptions, object labels, task phases, and quality notes. It then filters and curates based on your criteria.

Language-conditioned policies need language labels. This agent generates them at the speed of your data pipeline, not your annotation budget.

Annotations generated
Task descriptions, object labels, phase segmentation, success / failure tags
Technology
Vision-Language Model (VLM) running on your infrastructure
Filtering
Apply custom rules to curate which episodes enter your training set
Cost vs humans
Eliminates manual labelling time entirely for standard annotation tasks
Compute efficiency
Input prompt optimisation sends only relevant frames to the VLM, not the whole video - reducing compute cost by over 70%
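The frame-selection idea behind that compute saving can be sketched as a change-detection filter: keep a frame only when it differs enough from the last kept one, then cap the count. The thresholds and even representation (flat pixel lists) are illustrative assumptions, not the agent's actual selection logic:

```python
def select_keyframes(frames: list[list[int]], max_frames: int = 8,
                     min_change: float = 0.1) -> list[int]:
    """Pick the frame indices worth sending to the VLM.

    Near-duplicate frames add cost without adding information, so we
    keep a frame only when it moves past a change threshold, then
    subsample evenly if still over budget.  Illustrative sketch.
    """
    kept = [0]
    for i in range(1, len(frames)):
        prev = frames[kept[-1]]
        diff = sum(abs(a - b) for a, b in zip(frames[i], prev)) / len(prev)
        if diff >= min_change:
            kept.append(i)
    # Still over budget: spread the picks evenly across what survived.
    if len(kept) > max_frames:
        step = len(kept) / max_frames
        kept = [kept[int(j * step)] for j in range(max_frames)]
    return kept
```

A static tabletop scene collapses to a handful of frames; a cluttered, fast-moving one keeps more - the prompt size tracks information, not video length.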
05
// visualize agent

Visualize

A live dashboard for understanding what your robot actually collected.

You shouldn't be training blind. The Visualize Agent generates a rich, interactive dashboard where your team can browse every episode, watch the raw video alongside joint trajectories, compare episodes side by side, and explore annotations and quality scores. It's the difference between trusting your dataset and actually knowing what's in it.

Every robot team has a dataset they've never fully looked at. This makes it possible to actually understand what you have.

Episode browser
Search, filter, and sort by task, score, date, or annotation
Multimodal playback
Video, joint angles, end-effector pose, and force signals in sync
Comparison view
Overlay two episodes to spot differences in strategy or execution
Access
Web dashboard running inside your infrastructure - no external accounts
06
// dataloader agent

Dataloader

High-speed data delivery that keeps your GPU busy instead of waiting.

GPU time is expensive. A dataloader that stalls - even for 100 ms per batch - wastes 10–30% of your training budget. Knonik's Dataloader is purpose-built for compressed robotics data, using parallel decoding and shared-memory inter-process communication to deliver the next batch to your GPU before it's needed, every time.

The fastest loader isn't the one with the highest throughput spec - it's the one that never makes your GPU wait.

Epoch speedup
4.3× faster per epoch than standard LeRobot v3 loading on the same dataset
Modes
Batched (best for long runs), OnDemand (instant start), Pipelined (parallel decode)
Cross-epoch cache
Data decoded once and served from RAM across all subsequent epochs
Tail latency
p95 wait under 150 ms across all modes - fewer GPU stalls per training run
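The core idea - decode the next batch while the GPU consumes the current one - can be shown with a single background thread and a bounded queue. This is a minimal sketch of prefetching in general; the real Dataloader uses parallel decoding and shared-memory inter-process communication, not this toy:

```python
import queue
import threading

class PrefetchLoader:
    """Decode batches on a background thread, ahead of the consumer.

    Minimal sketch: a bounded queue of depth `depth` lets the worker
    run at most `depth` batches ahead, overlapping decode with compute.
    """
    def __init__(self, batches, decode, depth: int = 2):
        self._q = queue.Queue(maxsize=depth)
        self._sentinel = object()

        def worker():
            for b in batches:
                self._q.put(decode(b))   # blocks once `depth` ahead
            self._q.put(self._sentinel)  # signal end of stream

        threading.Thread(target=worker, daemon=True).start()

    def __iter__(self):
        while (item := self._q.get()) is not self._sentinel:
            yield item
```

If decoding a batch is faster than training on the previous one, the queue is never empty and the consumer never waits - which is the whole game for GPU utilisation.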
// benchmarks

Don't Take Our Word For It.
See The Numbers.

We benchmarked Knonik against standard tooling on real robotics data. Compression quality verified across three policy architectures. DataLoader performance measured end-to-end.