2.15 · Data / Observation
Records simulation observations — joint states, sensor data, camera frames — into Parquet, LeRobot-compatible. Drives episode reporting and streaming to LuckyHub.
This subsystem is the one whose product value is the data it produces. Datasets that
look fine but have missing frames, off-by-one timestamps, silently-dropped writes, or non-deterministic
noise corrupt downstream training without being noticed. Before you touch
Observer, Recorders/, Writers/, or
EpisodeReportStreamer, read
Recording Integrity (and
.claude/docs/RecordingIntegrity.md) end-to-end. /cr auto-promotes findings here to
must-fix.
StartSession does not record — it requests approval; Tick promotes on 2xx or disarms on failure.Overview
A central Observer coordinates a set of Recorders (per-source data collectors) which
hand rows and frames to Writers (per-format serialisers). Output is Parquet for tabular data and
MP4/PNG for camera streams — LeRobot-compatible by construction. Episode lifecycle and Hub coordination
live alongside the Observer and route through a single worker thread.
Files
| Path | Role |
|---|---|
Hazel/src/Hazel/Data/Observer.{h,cpp} | Central observation layer. Owns recorders, drives state machine, exposes StartSession / Tick. |
Hazel/src/Hazel/Data/Recorders/ | RobotRecorder, MujocoFullRecorder, DroneRecorder, CameraRecorder, DataRecorder. |
Hazel/src/Hazel/Data/Writers/ | ParquetDataWriter, FFmpegWriter, MaskImageWriter, WriterInterfaces. |
Hazel/src/Hazel/Data/EpisodeReportStreamer.{h,cpp} | Episode → LuckyHub streaming. |
Hazel/src/Hazel/Hub/HubRecordingClient.{h,cpp} | /api/luckyengine/tasks endpoint client. |
TimeManager hooks
| Callback | Phase | Purpose |
|---|---|---|
UpdateObserverData(StepContext) | Acquisition / Export | Captures observation rows, hands them to writers. |
UpdateObserverVideo(StepContext) | Acquisition / Export | Captures camera frames, feeds the FFmpeg writer. |
Observer::Tick | Per-frame poll from Scene::OnUpdateRuntime | Promotes PendingApproval → Recording on Hub approval; disarms on failure. |
LuckyHub recording lifecycle
Recording is gated on Hub approval. There is no path that opens a writer before approval lands.
Observer::StartSessiondoes not start recording directly — it callsRecordingApiGateway::RequestApprovalwhich synchronously checksAuthService::Get().IsAuthenticated(). Failure → stateRejected, abort.- Posts
/api/luckyengine/taskson a background worker. Observer sits inDataState::PendingApproval. - On 2xx,
Tickpromotes toRecordingand firesm_OnRecordingStarted. - On failure,
Tickdisarms and surfaces the error throughapiGateway.GetLastError()(consumed byDataPanel).
Episode end & session end
Episode end fires SubmitEpisodeAsync — fire-and-forget
POST /api/luckyengine/tasks/:taskId/episodes/submit with executionMetadata only
(no file upload). Session end fires StopTaskAsync —
POST /api/luckyengine/tasks/:taskId/stop. Both paths route through a single worker thread + FIFO
queue.
Episode submission failures are logged, never propagated. The simulation thread must not block on Hub latency, must not retry inline, and must not abandon the recording because a network call failed. If you find yourself catching a Hub error and "doing something about it" inside a recording path, stop — the rule is fire-and-forget plus a log line.
Data flow
Observer → Recorders → Writers → Parquet / MP4 / PNG. EpisodeReportStreamer + HubRecordingClient emit metadata to LuckyHub in parallel.Recording integrity essentials
Full rules live in Recording Integrity. The non-negotiable ones at a glance:
catch (...) { warn; continue; } in a writer silently drops data and is the textbook way to
produce a corrupt dataset that no one notices. If a write fails, fail the episode — do not cover it
up.
1. Completeness
Every step that ran produced a row. Every captured camera frame got written. No silent skips. Dropped frames are detected and either backfilled with a sentinel or recorded as a gap with explicit metadata.
2. Determinism
Same seed → byte-identical output. No std::unordered_map iterated to disk, no
steady_clock::now() deltas as comparable fields, no thread-local RNG in recording paths.
3. Atomicity
Crash mid-episode leaves either "complete and finalised" or "in progress, not claimed complete." The manifest update is the last write — everything else flushes first.
4. Schema fidelity
Every field has a documented type, shape, and unit. Schema changes are versioned. Don't change the meaning of an existing column without renaming it — old datasets become invalid otherwise.
Pitfalls
Episodes run for hours. A std::string constructed per row over 10M rows is 10M allocations
— pre-size and reuse buffers. std::format per write is the same problem; batch into a
column writer.
Open the writer once per episode. Don't churn handles inside a tight loop — it crushes performance and increases the risk of dangling temp files on crash.
FFmpegWriter buffers frames and drains them on an encoder thread. Don't bypass it by
encoding synchronously on the simulation thread — extend the existing pattern instead.
Extending
- New recorded source — add a recorder under
Recorders/implementing the recorder interface; register it withObserverfor the appropriate TimeManager phase. - New output format — add a writer under
Writers/behindWriterInterfaces; reuse the buffered/async pattern fromFFmpegWriterif writes are expensive. - New schema field — bump the schema version and document the field's type, shape, and unit. Decide explicitly how old datasets are handled.
- New Hub endpoint — route through
HubRecordingClienton the existing worker + FIFO queue. Do not add inline retries that block the simulation thread.