TH · Threading & Concurrency

The engine has several long-lived threads plus per-task workers. Putting code on the wrong thread causes UI hitches (long blocking work on main), torn data (unsynchronised cross-thread access), or silent failures (ImGui calls off-main). This page lists every thread context, what's allowed on it, and how to bounce between them.

Doc source: .claude/docs/Threading.md
TimeManager: Hazel/src/Hazel/Core/TimeManager.h
ImGui & NVRHI: main only

[Figure: TimeManager phase pipeline (per tick). PreAcquisition (free rate, refresh-rate prelude) → Acquisition (PhaseID 0, sensors / inputs) → Control (PhaseID 1, IK, policy, PID) → Physics (PhaseID 2, Jolt / mj_step) → Validation (PhaseID 3, constraints / limits) → Export (PhaseID 4, record / telemetry) → PostExport (free rate).]
TimeManager fixed phases execute in order at each runner's frequency; free phases bracket them at refresh rate.

Quick reference

| Thread context | Allowed | Forbidden |
| --- | --- | --- |
| Main thread | ImGui, NVRHI / Vulkan, OS window / input, asset registry mutations, scene mutations | Anything > ~16 ms (long file I/O, archive extract, network, subprocess wait, asset import) |
| TimeManager fixed runners (Acquisition / Control / Physics / Validation / Export) | The phase-appropriate work (see PhaseID table below) | Cross-phase mutations, ImGui, anything outside the phase contract |
| TimeManager free runners (PreAcquisition / PostExport) | Refresh-rate work that doesn't fit a fixed phase | Same as fixed runners |
| std::jthread workers | Long-running file / network / subprocess work, archive ops, computation | ImGui, NVRHI / Vulkan, direct mutation of main-thread-owned state |
| gRPC server thread(s) | RPC handlers, request marshalling | Direct main-thread state mutation; bounce via the gRPC step system |
| File watchers (filewatch::FileWatch) | Setting a "needs reload" flag the main thread polls | Doing the reload itself; ImGui; touching project state directly |
| AuthService callback | HTTP roundtrip, parsing, setting auth state under its mutex | UI updates; the main thread observes the state machine |

Default placement

When in doubt: assume work belongs on a std::jthread worker, with results signalled to the main thread via a flag / queue / atomic that the main thread polls in its OnUpdate.

Detection helpers

  • Application::IsMainThread() — global check. Use in HZ_CORE_ASSERT(Application::IsMainThread()) to enforce a function is main-thread only, or HZ_CORE_ASSERT(!Application::IsMainThread()) to enforce it isn't (the worker family uses this).
  • MainThreadDispatcher::IsMainThread() — equivalent for editor code in LuckyEditor/src/Agent/MainThreadDispatcher.h.

Both record the main thread's std::thread::id at startup. If neither has been initialised yet, the check returns false — so don't rely on these before the engine has booted.
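
The "returns false before boot" behaviour follows naturally if the helper compares against a stored std::thread::id. A minimal sketch of that idea, assuming a hypothetical MainThreadGuard class (this is not the engine's actual implementation of Application::IsMainThread):

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for Application::IsMainThread(); not the engine's code.
class MainThreadGuard {
public:
    // Call once from the main thread during startup.
    static void Init() { s_MainThreadId = std::this_thread::get_id(); }

    // A default-constructed std::thread::id never equals the id of a running
    // thread, so this returns false on every thread until Init() has run.
    static bool IsMainThread() { return std::this_thread::get_id() == s_MainThreadId; }

private:
    static inline std::thread::id s_MainThreadId{};
};

// Reports what IsMainThread() returns when called from a freshly spawned thread.
// A plain std::thread keeps the demo short; engine code would use the job pattern.
inline bool QueryFromWorker() {
    bool result = true;
    std::thread worker([&result] { result = MainThreadGuard::IsMainThread(); });
    worker.join();
    return result;
}
```

Calling MainThreadGuard::IsMainThread() before Init() yields false even on the main thread, which is why an early assert on it would fire spuriously.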

Main thread

Owns: GLFW / OS window, NVRHI device + command lists, ImGui context, the active Scene, the active Project, the asset registry. Runs Application::Run, which pumps OS events, calls each layer's OnUpdate / OnImGuiRender / OnEvent, drives TimeManager::Tick, and presents the frame.

Do on main
  • UI: ImGui calls, panel / popup rendering.
  • Rendering: NVRHI command-list building, draw submission.
  • Asset registry mutations (EditorAssetManager is single-threaded).
  • Scene mutations: adding / removing entities, attaching components.
Don't do on main
  • Long file I/O (> a couple of MB, slow disk). Bounce to a worker.
  • Archive extract / zip create. Use FileSystem::ExtractZip from a worker.
  • Network roundtrips — even a "fast" HTTP can stall for seconds.
  • Subprocess WaitProcess. Spawn from a worker, or fire-and-forget.
  • Asset import (mesh / texture decode). EditorAssetSystem has its own worker.

Symptoms of a violation: UI freezes during the operation, glfwSwapBuffers reports late frames, the editor stops responding to input until the work finishes.

Editor vs. runtime threading

The single most-misunderstood fact in this doc

In the editor (LuckyEditor), the main thread and the render thread are the same thread. Only the runtime (HeadlessApplication and shipping standalone builds) runs the renderer on a dedicated thread. The split is controlled by Application::Specification::CoreThreadingPolicy (enum in Hazel/src/Hazel/Core/ThreadingPolicy.h).

[Figure: Editor (SingleThreaded) vs. Runtime (MultiThreaded). Editor: the main thread runs Application::Run (OnUpdate / sim), pushes to the queue via Renderer::Submit, then drains the queue at end-of-frame and swaps / presents, all on the same thread. Runtime: the main thread (sim) submits lambdas and moves on to tick N+1 while the render thread (std::thread) drains the queue and submits to NVRHI.]
Left: editor — one thread plays both main and render roles, queue drains end-of-frame. Right: runtime — a dedicated render thread drains while main starts the next sim tick.

| Mode | Where set | What RenderThread actually does |
| --- | --- | --- |
| SingleThreaded | LuckyEditor.cpp (editor binary) | No std::thread spawned. RenderThread::Kick / Pump / BlockUntilRenderComplete are no-ops or run synchronously on the calling thread. |
| MultiThreaded | HeadlessApplication / runtime defaults | A real std::thread is spawned. Submitted commands are drained on that thread; main blocks on it at frame boundaries. |
| None | Off by default | Reserved; not currently used. |

Renderer::Submit(lambda) in either mode pushes the lambda into the global RenderCommandQueue. The queue is drained on whichever thread plays the "render thread" role for that mode:

  • In the editor (SingleThreaded), the main thread itself drains the queue at the end of the application frame, before swap. The lambda runs synchronously, just deferred until the queue-drain point.
  • In the runtime (MultiThreaded), the dedicated render thread drains the queue during its WaitAndRender cycle while the main thread starts on the next simulation tick.
What this means in practice
  • There is no race condition between the "main thread" and the "render thread" inside the editor — they're the same thread.
  • SubmitResourceFree is not crossing a thread boundary in the editor; it's still a deferral, but only across the queue-drain point on the same thread.
  • Renderer::IsCurrentThreadRT() returns true for the main thread in the editor. Don't assert "this is not the render thread" from the main thread in editor-targeted code — it'll fire.
  • A bug that reproduces only under MultiThreaded (runtime, not editor) points to a real cross-thread issue. A bug that reproduces in both modes is not a main-vs-render race; investigate logic and frame ordering instead.
  • Don't add locks "just to be safe" around state read by a Renderer::Submit lambda in editor code paths.

When you genuinely need the render-thread-vs-main distinction (i.e. you're writing code that ships in the runtime and needs correctness guarantees there), use Renderer::Submit for the GPU work and add main-thread state synchronisation only where the runtime build needs it. The editor will run the same code correctly because submission collapses to a same-thread deferral.
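
To make "same-thread deferral" concrete, here is a minimal sketch of a submit-then-drain queue. CommandQueue and RunSingleThreadedFrame are hypothetical names for illustration; the real RenderCommandQueue is not shown in this doc and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the RenderCommandQueue.
class CommandQueue {
public:
    void Submit(std::function<void()> command) {
        std::scoped_lock lock(m_Mutex);
        m_Pending.push_back(std::move(command));
    }

    // Runs everything submitted so far, in submission order.
    // Returns the number of commands executed.
    std::size_t Drain() {
        std::vector<std::function<void()>> batch;
        {
            std::scoped_lock lock(m_Mutex);
            batch.swap(m_Pending);
        }
        for (auto& command : batch) command();
        return batch.size();
    }

private:
    std::mutex m_Mutex;
    std::vector<std::function<void()>> m_Pending;
};

// Editor-style frame: submit during the update, drain on the same thread
// just before the (omitted) swap / present step.
inline std::vector<int> RunSingleThreadedFrame() {
    CommandQueue queue;
    std::vector<int> log;
    queue.Submit([&log] { log.push_back(1); });
    log.push_back(0);  // main-thread work after Submit still runs first
    queue.Submit([&log] { log.push_back(2); });
    queue.Drain();
    return log;        // {0, 1, 2}
}
```

The submitted lambdas run later than the code around them even though only one thread is involved. In a MultiThreaded setup the same Drain() would be called from the dedicated thread instead, which is the only reason this sketch carries a mutex.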

TimeManager runners

The engine's deterministic execution model. Header: Hazel/src/Hazel/Core/TimeManager.h. Five fixed phases (executed in order at each runner's frequency) + two free phases (refresh-rate-based).

| Phase | PhaseID | Purpose | Examples |
| --- | --- | --- | --- |
| Acquisition | PhaseID::Acquisition (0) | Read sensors, sample inputs | Camera capture, joint encoder reads |
| Control | PhaseID::Control (1) | Run control algorithms | IK solvers, policy networks, PID controllers |
| Physics | PhaseID::Physics (2) | Step physics engines | Jolt step, MuJoCo step (mj_step) |
| Validation | PhaseID::Validation (3) | Check state, enforce constraints | Collision checks, workspace limits, episode termination |
| Export | PhaseID::Export (4) | Record data, emit telemetry | Dataset recording, logging, metrics, video frame writes |
| PreAcquisition (free) | FreePhaseID::PreAcquisition (0) | Free-rate work before fixed phases | Camera frame capture timing, certain rendering preludes |
| PostExport (free) | FreePhaseID::PostExport (1) | Free-rate work after fixed phases | UI refresh, statistics aggregation |

Modes

TimeManager::Mode — three execution policies:

  • RealtimeNonDeterministic — keep wall-clock pace; drop ticks if overloaded. UI / sim default during interactive editing.
  • LowPerformanceDeterministic — never drop ticks; may run slower than real time when overloaded; capped at 1× speed.
  • HighPerformanceDeterministic — never drop ticks; run as fast as the hardware permits. The headless / training mode.

The mode determines whether long work in a runner is acceptable: in RealtimeNonDeterministic, a slow Physics runner causes dropped ticks; in HighPerformanceDeterministic, the whole simulation just slows down.
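
One way to read the three policies is as a per-frame decision about how many ticks to run and how many to drop. The sketch below is an interpretation of the descriptions above, not TimeManager's actual logic; PlanTicks, TickPlan, and both parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

enum class Mode {
    RealtimeNonDeterministic,
    LowPerformanceDeterministic,
    HighPerformanceDeterministic
};

struct TickPlan {
    int TicksToRun;    // ticks executed this frame
    int TicksDropped;  // ticks skipped to keep wall-clock pace
};

// ticksDue:        ticks the wall clock says should have happened by now.
// ticksAffordable: ticks the hardware can actually execute this frame.
inline TickPlan PlanTicks(Mode mode, int ticksDue, int ticksAffordable) {
    switch (mode) {
    case Mode::RealtimeNonDeterministic: {
        // Keep pace with the wall clock: anything we cannot afford is dropped.
        const int run = std::min(ticksDue, ticksAffordable);
        return {run, ticksDue - run};
    }
    case Mode::LowPerformanceDeterministic:
        // Never drop, never exceed 1x: the backlog carries over and the
        // simulation falls behind real time when overloaded.
        return {std::min(ticksDue, ticksAffordable), 0};
    case Mode::HighPerformanceDeterministic:
        // Never drop, ignore the wall clock: run whatever the hardware allows.
        return {ticksAffordable, 0};
    }
    return {0, 0};
}
```

With five ticks due and room for three, the realtime mode runs three and drops two, while the low-performance deterministic mode runs three and drops none. With two due and room for three, only the high-performance mode runs all three.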

Registration

  • Fixed-rate work: TimeManager::RegisterWithRunner(registration) or RegisterWithRunnerID(registration). Pick a phase from the table above; pick a frequency that matches the work (typically 500 Hz physics, 100 Hz control, 30 Hz recording — never assume 60 Hz).
  • Free-rate work: TimeManager::RegisterFreeUpdate(callback, FreePhaseID::PostExport) (or PreAcquisition).
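
The real registration types live in TimeManager.h and are not reproduced here. As a rough mental model only, the sketch below shows phase-ordered execution with per-runner rates. MiniScheduler, Registration, and the "every N ticks" divisor are hypothetical simplifications; the engine expresses rates as frequencies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class PhaseID : std::uint8_t {
    Acquisition = 0, Control = 1, Physics = 2, Validation = 3, Export = 4
};

// Hypothetical registration record: phase, rate divisor, callback.
struct Registration {
    PhaseID Phase;
    std::uint64_t EveryNTicks;  // 1 = every tick, 2 = every other tick, ...
    std::function<void()> Callback;
};

class MiniScheduler {
public:
    void Register(Registration registration) {
        m_Runners.push_back(std::move(registration));
        // Keep runners sorted by phase; ties keep registration order.
        std::stable_sort(m_Runners.begin(), m_Runners.end(),
                         [](const Registration& a, const Registration& b) {
                             return a.Phase < b.Phase;
                         });
    }

    // Runs every runner that is due this tick, in phase order.
    void Tick() {
        for (auto& runner : m_Runners) {
            if (m_TickIndex % runner.EveryNTicks == 0) runner.Callback();
        }
        ++m_TickIndex;
    }

private:
    std::vector<Registration> m_Runners;
    std::uint64_t m_TickIndex = 0;
};

// Registers runners out of phase order and records the execution trace.
inline std::string RunSchedulerDemo() {
    MiniScheduler scheduler;
    std::string trace;
    scheduler.Register({PhaseID::Export, 2, [&trace] { trace += 'E'; }});
    scheduler.Register({PhaseID::Physics, 1, [&trace] { trace += 'P'; }});
    scheduler.Register({PhaseID::Control, 2, [&trace] { trace += 'C'; }});
    for (int i = 0; i < 3; ++i) {
        scheduler.Tick();
        trace += '|';
    }
    return trace;  // "CPE|P|CPE|"
}
```

Registration order does not matter: Control always runs before Physics, and Physics before Export, within a tick. The slower runners simply sit out the ticks where they are not due.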
Forbidden

Putting periodic logic in Scene::OnUpdate or EditorLayer::OnUpdate. Those files are orchestration-only — see the "Protected Files" rule in send-pr/SKILL.md § 13.

Thread placement

In the current implementation, TimeManager::Tick runs on the main thread (called from Application::Run). Runners therefore execute on the main thread, in serial within a tick. This is an implementation detail that can change — write runners as if they may run on a dedicated thread later:

  • Don't touch ImGui from a runner.
  • Don't assume you can read Scene mutations made by another runner in the same phase; cross-runner data flow goes through the StepContext or explicit shared state with synchronisation.
  • Keep each runner within a per-tick budget; don't let a runner accumulate work that grows without bound across ticks.

std::jthread workers — one-shot bg jobs

Pattern: a class owns a std::jthread m_Worker declared last (so the destructor joins before other members are destroyed). The worker calls into a private RunBackground(std::stop_token st) method. State is exchanged via atomics / a shared_ptr<Progress> protected by a mutex.

Existing instances — don't add new bespoke threading patterns; extend or copy these:

  • ImportProjectJob — extracts a project zip, renames, patches identity. LuckyEditor/src/Utilities/Content/ImportProjectJob.{h,cpp}.
  • ExportProjectJob — zips a project, applies export filters.
  • CreateProjectFromTemplateJob — downloads a vault project, instantiates as new project, optionally installs a robot pack.
  • SaveProjectAsJob — copies a project to a new location with patched identity.
  • EpisodeReportStreamer — streams episode reports to LuckyHub.
  • ContentVaultSystem — heavy-op slot guarded by HeavyOpGuard; only one bg vault job at a time.
  • EditorAssetSystem — async asset import worker.
Hard rules for workers
  • No ImGui calls. Ever. Communicate via a result struct the main thread reads after IsDone().
  • No NVRHI calls. GPU work belongs on the main thread (or on the renderer's own command-list worker if it ever exists).
  • Cancellation: accept a std::stop_token. Check st.stop_requested() at every cancellation checkpoint. Past the point of no return, set m_Cancellable = false.
  • Progress reporting: use the existing Progress struct ({ std::string StatusText; float Percent; std::mutex Mutex; }) with SetProgress(progress, "...", 0.5f) under the lock. Don't re-invent.
  • Result handoff: IsDone() returns true once the worker has executed m_Done.store(true, std::memory_order_release). TakeResult() is called once, on the main thread.
  • Cleanup on failure: delete partial output (zip, extracted dir, copied tree) before reporting failure so the user can retry from a clean state.

gRPC server thread

The gRPC server (configured in LuckyEditor/src/Panels/GrpcPanel and friends) runs its handlers on gRPC's own pool — a different thread from the engine main loop.

  • RPC handlers don't directly mutate Scene or EditorAssetManager. They go through the GrpcStepSystem / GrpcCapturePool which marshals work to the engine's main loop at safe checkpoints (typically the Acquisition or Validation phase).
  • gRPC handlers may capture state from a recent simulation tick (e.g. via GrpcStepCapture) but must not block while waiting for the main thread to produce a future tick.

If you're adding a new RPC: don't reach into engine state from the handler. Add a marshalling step and read from the capture / snapshot the engine produces per tick. See gRPC / Cross-System for the full service surface.
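
A minimal sketch of the marshalling idea, assuming a hypothetical StepBridge: handlers enqueue mutations and read the latest published snapshot, while the main loop applies the queue at a checkpoint. The real GrpcStepSystem / GrpcCapturePool interfaces are not shown in this doc and will differ.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

struct TickSnapshot {
    std::uint64_t TickIndex = 0;
    double JointAngle = 0.0;
};

// Hypothetical bridge between handler threads and the engine main loop.
class StepBridge {
public:
    // Handler thread: queue a mutation instead of touching engine state.
    void Enqueue(std::function<void(double&)> command) {
        std::scoped_lock lock(m_Mutex);
        m_Commands.push_back(std::move(command));
    }

    // Handler thread: read the most recent snapshot; never waits for a tick.
    std::shared_ptr<const TickSnapshot> LatestSnapshot() const {
        std::scoped_lock lock(m_Mutex);
        return m_Snapshot;
    }

    // Main thread, at a safe checkpoint: apply queued commands, then publish.
    void RunCheckpoint(std::uint64_t tickIndex, double& jointAngle) {
        std::vector<std::function<void(double&)>> batch;
        {
            std::scoped_lock lock(m_Mutex);
            batch.swap(m_Commands);
        }
        for (auto& command : batch) command(jointAngle);

        auto snapshot =
            std::make_shared<const TickSnapshot>(TickSnapshot{tickIndex, jointAngle});
        std::scoped_lock lock(m_Mutex);
        m_Snapshot = std::move(snapshot);
    }

private:
    mutable std::mutex m_Mutex;
    std::vector<std::function<void(double&)>> m_Commands;
    std::shared_ptr<const TickSnapshot> m_Snapshot =
        std::make_shared<const TickSnapshot>();
};

inline double RunBridgeDemo() {
    StepBridge bridge;
    double jointAngle = 0.0;  // owned by the "main thread"

    // Stand-in for an RPC handler running on another thread.
    std::thread handler([&bridge] {
        bridge.Enqueue([](double& angle) { angle = 1.5; });
    });
    handler.join();

    assert(bridge.LatestSnapshot()->JointAngle == 0.0);  // not applied yet
    bridge.RunCheckpoint(1, jointAngle);                 // main loop checkpoint
    return bridge.LatestSnapshot()->JointAngle;          // 1.5
}
```

The handler never sees engine state directly. It sees only immutable snapshots, and its request takes effect at the next checkpoint rather than mid-phase.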

File watchers

m_ScriptFileWatcher in EditorLayer.cpp uses filewatch::FileWatch<WatcherString> to detect script-DLL rebuilds. The watcher callback runs on filewatch's own thread.

  • The callback only sets a flag (m_ShouldReloadCSharp = true) — it does not perform the reload.
  • The reload happens in OnUpdate on the main thread, after observing the flag.

When adding a new file watcher: follow the same pattern. Watcher thread sets a flag; main thread polls and acts.
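
A minimal sketch of the flag handoff, assuming the flag is a std::atomic<bool> (the doc does not say how m_ShouldReloadCSharp is declared). ReloadFlag and CountReloads are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

class ReloadFlag {
public:
    // Watcher thread: record that something changed. Cheap and non-blocking.
    void NotifyChanged() { m_ShouldReload.store(true, std::memory_order_release); }

    // Main thread (OnUpdate): returns true once per batch of notifications
    // and clears the flag in the same atomic step.
    bool ConsumeIfSet() { return m_ShouldReload.exchange(false, std::memory_order_acq_rel); }

private:
    std::atomic<bool> m_ShouldReload{false};
};

inline int CountReloads() {
    ReloadFlag flag;
    int reloads = 0;

    // Stand-in for the filewatch callback firing twice during one rebuild.
    std::thread watcher([&flag] {
        flag.NotifyChanged();
        flag.NotifyChanged();
    });
    watcher.join();

    // Three "frames" of the main loop polling the flag.
    for (int frame = 0; frame < 3; ++frame) {
        if (flag.ConsumeIfSet()) ++reloads;  // the reload itself would run here
    }
    return reloads;  // 1
}
```

Using exchange rather than a separate load and store means a burst of watcher events collapses into one reload, and a notification that arrives between the check and the clear is not lost.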

AuthService

Hazel::AuthService (singleton, Hazel/src/Hazel/Auth/AuthService.{h,cpp}) handles the OAuth login flow. The browser callback comes back to a local HTTP listener running on a worker thread. The main thread observes auth state via AuthService::Get().GetState() (AuthState::PendingCallback, Authenticated, etc.).

  • UI code (SimpleUXWelcome's Login button) reads state on the main thread.
  • The login worker writes state under the service's internal mutex.
  • When auth completes, the worker sets the new state and the main thread picks it up next frame.

Bouncing between threads

Background → main: MainThreadDispatcher

Header: LuckyEditor/src/Agent/MainThreadDispatcher.h.

// From a background thread, run something on the main thread and wait for the result.
std::string result = MainThreadDispatcher::Execute([]() {
    return DoMainThreadOnlyWork();
});

// From a background thread, fire-and-forget a main-thread operation.
MainThreadDispatcher::ExecuteAsync([]() {
    MutateSomething();
});

MainThreadDispatcher::ProcessPendingOperations() drains the queue. It's already wired into EditorLayer::OnUpdate — don't call it from anywhere else.
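
For intuition about what such a dispatcher has to do internally, here is a minimal sketch built on std::packaged_task. MiniDispatcher is hypothetical and instance-based for brevity; it is not the code in MainThreadDispatcher.h.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

class MiniDispatcher {
public:
    // Background thread: run fn on the main thread and wait for its result.
    // Calling this on the main thread itself would deadlock in this sketch.
    template <typename F>
    auto Execute(F&& fn) -> std::invoke_result_t<F&> {
        using R = std::invoke_result_t<F&>;
        auto task = std::make_shared<std::packaged_task<R()>>(std::forward<F>(fn));
        std::future<R> future = task->get_future();
        Push([task] { (*task)(); });
        return future.get();
    }

    // Background thread: queue fn for the main thread and return immediately.
    void ExecuteAsync(std::function<void()> fn) { Push(std::move(fn)); }

    // Main thread: run everything queued so far. Returns how many ops ran.
    std::size_t ProcessPendingOperations() {
        std::vector<std::function<void()>> batch;
        {
            std::scoped_lock lock(m_Mutex);
            batch.swap(m_Pending);
        }
        for (auto& op : batch) op();
        return batch.size();
    }

private:
    void Push(std::function<void()> op) {
        std::scoped_lock lock(m_Mutex);
        m_Pending.push_back(std::move(op));
    }

    std::mutex m_Mutex;
    std::vector<std::function<void()>> m_Pending;
};

inline int RunDispatcherDemo() {
    MiniDispatcher dispatcher;
    int mainOnlyState = 0;  // only ever touched by ops run on the "main thread"
    int observed = -1;

    std::thread worker([&] {
        dispatcher.ExecuteAsync([&mainOnlyState] { mainOnlyState = 41; });
        observed = dispatcher.Execute([&mainOnlyState] { return mainOnlyState + 1; });
    });

    // Stand-in for the per-frame pump in OnUpdate.
    std::size_t processed = 0;
    while (processed < 2) {
        processed += dispatcher.ProcessPendingOperations();
        std::this_thread::yield();
    }
    worker.join();
    return observed;  // 42
}
```

The blocking Execute only makes progress while the main loop keeps pumping. A worker that calls it while the main thread is itself waiting on that worker will hang, so prefer ExecuteAsync or the polling pattern when no return value is needed.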

Background → main: scoped pattern (preferred for jobs)

Workers that have a clear completion event don't need MainThreadDispatcher. They:

  1. Compute results on the worker thread.
  2. Set m_Done (release) when finished.
  3. Main thread polls IsDone() each frame in OnUpdate.
  4. Main thread calls TakeResult() and acts on the outcome.

This is what CreateProjectFromTemplateFlow::OnUpdate does — see LuckyEditor/src/Popups/CreateProjectFromTemplateFlow.cpp.

Main → background: just spawn a job

Use the *Job::Start(...) static factory pattern. The job allocates itself, validates inputs synchronously, launches the jthread, and returns a Scope<...> the main thread owns. Don't manually create std::thread / std::jthread for one-off work — use the job pattern.

Synchronisation primitives

  • std::atomic<T> for single-value flags / counters where no other state needs to be coordinated.
  • std::mutex + std::scoped_lock for compound state. Don't roll your own spinlock.
  • std::shared_mutex if read-heavy and contention matters.
  • std::stop_token / std::stop_source for cancellation. Don't use a custom bool flag.

Locks should be held for the minimum time. If you find yourself holding a lock across an I/O call, restructure: copy the data out under the lock, do the I/O unlocked, then re-acquire to write back.
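
The copy-out, unlock, write-back shape looks like this in practice. TelemetryBuffer is a hypothetical example class, and std::ostream stands in for whatever slow I/O the real code performs.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

class TelemetryBuffer {
public:
    void Append(std::string line) {
        std::scoped_lock lock(m_Mutex);
        m_Lines.push_back(std::move(line));
    }

    // Writes all buffered lines to `out`. Returns the number of lines written.
    std::size_t Flush(std::ostream& out) {
        std::vector<std::string> snapshot;
        {
            std::scoped_lock lock(m_Mutex);  // 1. take the data out under the lock
            snapshot.swap(m_Lines);
        }

        for (const auto& line : snapshot) {  // 2. slow I/O with no lock held
            out << line << '\n';
        }

        {
            std::scoped_lock lock(m_Mutex);  // 3. re-acquire briefly to write back
            m_TotalFlushed += snapshot.size();
        }
        return snapshot.size();
    }

    std::size_t TotalFlushed() const {
        std::scoped_lock lock(m_Mutex);
        return m_TotalFlushed;
    }

private:
    mutable std::mutex m_Mutex;
    std::vector<std::string> m_Lines;
    std::size_t m_TotalFlushed = 0;
};
```

Other threads can keep calling Append() while a flush is writing, because the mutex is only held for the swap and the counter update, never for the stream writes.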

When you're not sure

If you can't figure out which thread a piece of code runs on, walk up the call graph until you hit one of:

  • Application::Run / OnUpdate / OnImGuiRender / OnEvent → main thread.
  • A std::jthread constructor or Job::Start → that worker.
  • TimeManager::Tick → currently main, but treat as if it could move.
  • A gRPC handler entry point → gRPC pool.
  • A filewatch::FileWatch callback → file watcher thread.

If you still can't tell, add HZ_CORE_ASSERT(Application::IsMainThread()) (or !IsMainThread) and run the editor — the assert tells you on the first frame.