Lucky Robots Blog Open Roles

2.15 · Data / Observation

Records simulation observations — joint states, sensor data, camera frames — into Parquet, LeRobot-compatible. Drives episode reporting and streaming to LuckyHub.

Module: Hazel/src/Hazel/Data/ Observer.h ParquetDataWriter Recording-critical
Recording-critical — read this before editing

This subsystem is the one whose product value is the data it produces. Datasets that look fine but have missing frames, off-by-one timestamps, silently-dropped writes, or non-deterministic noise corrupt downstream training without being noticed. Before you touch Observer, Recorders/, Writers/, or EpisodeReportStreamer, read Recording Integrity (and .claude/docs/RecordingIntegrity.md) end-to-end. /cr auto-promotes findings here to must-fix.

Idle DataState::Idle PendingApproval background POST Recording 2xx from Hub Rejected auth / 4xx / 5xx Idle session end Tick promotes Tick disarms StartSession → RecordingApiGateway::RequestApproval 1. IsAuthenticated() — sync 2. POST /api/luckyengine/tasks — background worker
Recording lifecycle. StartSession does not record — it requests approval; Tick promotes on 2xx or disarms on failure.

Overview

A central Observer coordinates a set of Recorders (per-source data collectors) which hand rows and frames to Writers (per-format serialisers). Output is Parquet for tabular data and MP4/PNG for camera streams — LeRobot-compatible by construction. Episode lifecycle and Hub coordination live alongside the Observer and route through a single worker thread.

Files

PathRole
Hazel/src/Hazel/Data/Observer.{h,cpp}Central observation layer. Owns recorders, drives state machine, exposes StartSession / Tick.
Hazel/src/Hazel/Data/Recorders/RobotRecorder, MujocoFullRecorder, DroneRecorder, CameraRecorder, DataRecorder.
Hazel/src/Hazel/Data/Writers/ParquetDataWriter, FFmpegWriter, MaskImageWriter, WriterInterfaces.
Hazel/src/Hazel/Data/EpisodeReportStreamer.{h,cpp}Episode → LuckyHub streaming.
Hazel/src/Hazel/Hub/HubRecordingClient.{h,cpp}/api/luckyengine/tasks endpoint client.

TimeManager hooks

CallbackPhasePurpose
UpdateObserverData(StepContext)Acquisition / ExportCaptures observation rows, hands them to writers.
UpdateObserverVideo(StepContext)Acquisition / ExportCaptures camera frames, feeds the FFmpeg writer.
Observer::TickPer-frame poll from Scene::OnUpdateRuntimePromotes PendingApprovalRecording on Hub approval; disarms on failure.

LuckyHub recording lifecycle

Recording is gated on Hub approval. There is no path that opens a writer before approval lands.

  1. Observer::StartSession does not start recording directly — it calls RecordingApiGateway::RequestApproval which synchronously checks AuthService::Get().IsAuthenticated(). Failure → state Rejected, abort.
  2. Posts /api/luckyengine/tasks on a background worker. Observer sits in DataState::PendingApproval.
  3. On 2xx, Tick promotes to Recording and fires m_OnRecordingStarted.
  4. On failure, Tick disarms and surfaces the error through apiGateway.GetLastError() (consumed by DataPanel).

Episode end & session end

Episode end fires SubmitEpisodeAsync — fire-and-forget POST /api/luckyengine/tasks/:taskId/episodes/submit with executionMetadata only (no file upload). Session end fires StopTaskAsyncPOST /api/luckyengine/tasks/:taskId/stop. Both paths route through a single worker thread + FIFO queue.

Hub submission must never stall recording

Episode submission failures are logged, never propagated. The simulation thread must not block on Hub latency, must not retry inline, and must not abandon the recording because a network call failed. If you find yourself catching a Hub error and "doing something about it" inside a recording path, stop — the rule is fire-and-forget plus a log line.

Data flow

Scene / MuJoCo step output Cameras render frames Sensors IMU, force, ... Observer central layer RobotRecorder MujocoFullRecorder CameraRecorder DroneRecorder DataRecorder generic ParquetDataWriter tabular → .parquet FFmpegWriter video → .mp4 MaskImageWriter segmentation .png
Sources → Observer → Recorders → Writers → Parquet / MP4 / PNG. EpisodeReportStreamer + HubRecordingClient emit metadata to LuckyHub in parallel.

Recording integrity essentials

Full rules live in Recording Integrity. The non-negotiable ones at a glance:

No try/catch in the recording path

catch (...) { warn; continue; } in a writer silently drops data and is the textbook way to produce a corrupt dataset that no one notices. If a write fails, fail the episode — do not cover it up.

1. Completeness

Every step that ran produced a row. Every captured camera frame got written. No silent skips. Dropped frames are detected and either backfilled with a sentinel or recorded as a gap with explicit metadata.

2. Determinism

Same seed → byte-identical output. No std::unordered_map iterated to disk, no steady_clock::now() deltas as comparable fields, no thread-local RNG in recording paths.

3. Atomicity

Crash mid-episode leaves either "complete and finalised" or "in progress, not claimed complete." The manifest update is the last write — everything else flushes first.

4. Schema fidelity

Every field has a documented type, shape, and unit. Schema changes are versioned. Don't change the meaning of an existing column without renaming it — old datasets become invalid otherwise.

Pitfalls

Don't allocate per row

Episodes run for hours. A std::string constructed per row over 10M rows is 10M allocations — pre-size and reuse buffers. std::format per write is the same problem; batch into a column writer.

Don't open / close file handles per row

Open the writer once per episode. Don't churn handles inside a tight loop — it crushes performance and increases the risk of dangling temp files on crash.

Camera encoding is already async

FFmpegWriter buffers frames and drains them on an encoder thread. Don't bypass it by encoding synchronously on the simulation thread — extend the existing pattern instead.

Extending

  • New recorded source — add a recorder under Recorders/ implementing the recorder interface; register it with Observer for the appropriate TimeManager phase.
  • New output format — add a writer under Writers/ behind WriterInterfaces; reuse the buffered/async pattern from FFmpegWriter if writes are expensive.
  • New schema field — bump the schema version and document the field's type, shape, and unit. Decide explicitly how old datasets are handled.
  • New Hub endpoint — route through HubRecordingClient on the existing worker + FIFO queue. Do not add inline retries that block the simulation thread.