feat: async run_experiment via RunHandle + cancellation + status widget #10

Open
hinderling wants to merge 37 commits into pertzlab:main from hinderling:feat/async-run-handle
Collaborator

@hinderling hinderling commented May 15, 2026

Summary

Move the MDA feed loop onto a worker thread, expose live status through a RunHandle (psygnal Signal), and add a napari dock widget that mirrors + steers the current run. Replaces the synchronous-blocking run_experiment / continue_experiment API.

Draft: breaks the public API. Notebook updates required (see below) before merging. The async demo notebook included here is a test artifact — it must be removed before merge (see Demo notebook section).

Why

The controller's feed loop ran on the main thread, so:

  • napari froze for the duration of every run (no Qt-event processing).
  • run_experiment blocked the calling cell — no interactive monitoring / cancellation without Ctrl-C (which sometimes left device state half-set).
  • Status was opaque: "what timepoint are we on, are we lagging?" was unanswerable.
  • No clean way to cancel or pause a long run.

Moving the loop onto its own thread fixes all of these: napari is responsive by construction, the cell returns immediately, and cancellation / pause / live status become natural.

What changed

New: faro/core/run_status.py

  • RunStatus — immutable snapshot dataclass: state, current_event_index, current_fov, n_events_total, n_events_consumed, n_frames_received, started_at / finished_at, lag_ms, background_errors, fatal_error, …
  • RunHandle — owns the worker thread + cooperative cancel/pause events, carries the run's (sorted) event list. Methods: status(), wait(), cancel(), pause(), resume(), is_running(), is_paused(). Signal: statusChanged (psygnal) emitting the latest RunStatus.
  • RunState: pending → running ⇄ pausing/paused → done/error (cancelling on cancel).
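
The split between an immutable snapshot and a handle that owns the worker thread can be sketched with the standard library alone. This is a minimal illustration, not faro's implementation: it keeps only a few of the listed fields, replaces the psygnal statusChanged Signal with a plain callback list, omits pause/resume, and the `start()` and `connect()` helpers and the `feed` function are hypothetical names.

```python
import threading
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class RunStatus:
    """Immutable snapshot; every update publishes a new instance."""
    state: str = "pending"
    n_events_total: int = 0
    n_events_consumed: int = 0
    fatal_error: Optional[str] = None


class RunHandle:
    """Illustrative stand-in: plain callbacks replace the psygnal Signal."""

    def __init__(self, target: Callable[["RunHandle"], None], n_events: int):
        self.cancel_event = threading.Event()
        self._lock = threading.Lock()
        self._status = RunStatus(n_events_total=n_events)
        self._listeners: List[Callable[[RunStatus], None]] = []
        self._thread = threading.Thread(
            target=self._run, args=(target,), daemon=True
        )

    def start(self) -> "RunHandle":
        self._thread.start()
        return self

    def _run(self, target: Callable[["RunHandle"], None]) -> None:
        self.update(state="running")
        try:
            target(self)
            self.update(state="done")
        except Exception as exc:
            # Worker failures are recorded on the handle, not raised at the caller.
            self.update(state="error", fatal_error=repr(exc))

    def update(self, **changes) -> None:
        with self._lock:
            self._status = replace(self._status, **changes)
            snapshot = self._status
        for listener in list(self._listeners):
            listener(snapshot)  # runs on the worker thread in this sketch

    def connect(self, listener: Callable[[RunStatus], None]) -> None:
        self._listeners.append(listener)

    def status(self) -> RunStatus:
        with self._lock:
            return self._status

    def wait(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    def cancel(self) -> None:
        self.cancel_event.set()

    def is_running(self) -> bool:
        return self._thread.is_alive()


def feed(handle: RunHandle) -> None:
    """Toy feed loop: consume events until done or cancelled."""
    for i in range(handle.status().n_events_total):
        if handle.cancel_event.is_set():
            return
        handle.update(n_events_consumed=i + 1)
```

Calling `RunHandle(feed, n_events=5).start()` returns immediately, and `wait()` restores blocking behaviour. Because each snapshot is frozen, a reader on another thread never observes a half-updated status.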

faro/core/controller.py

  • Controller.runStarted = Signal(object) fires on each new run/continue carrying the fresh RunHandle.
  • run_experiment / continue_experiment spawn a worker thread and return the handle immediately; validation still runs synchronously on the caller. Events are sorted once and stashed on the handle so the widget renders them in execution order.
  • _run_worker centralises pre-flight setup and wraps the feed loop so failures land in handle.fatal_error instead of crashing the user.
  • _run_mda_with_events polls cancel_event and pause_event each iteration — pause halts feeding after the in-flight backpressure window drains; resume continues.
  • fix: the engine queue is recreated per run. A cancelled run aborts the engine mid-drain, leaving a stale STOP_EVENT behind; reusing the queue made the next run's engine consume that sentinel and stall after a few events ("stuck at 3/80").
  • fix: _bump_status_for_frame skips IMG_STIM snaps — a stim emission is the SLM-illuminated snap paired with its imaging frame; counting it double-updated lag/elapsed and drifted the frame count off the RTMEvent count.
  • napari preview: the controller no longer carries its own preview-layer machinery, and live mode no longer has to be manually disconnected before a run. napari-micromanager's own _NapariMDAHandler keeps routing frames into the preview layer throughout the run; the controller just stops continuous sequence acquisition once at MDA start to avoid a snap-buffer race. Notebooks can drop the old "break the CoreViewerLink before running" dance.
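
The cooperative pause/cancel polling described for _run_mda_with_events can be modelled as below. This is a simplified sketch under stated assumptions: `feed_loop`, its parameters and the polling interval are hypothetical, events are plain strings, and appending to a list stands in for handing an event to the engine queue. It does not model the backpressure window.

```python
import threading
import time
from typing import Iterable, List


def feed_loop(
    events: Iterable[str],
    cancel_event: threading.Event,
    pause_event: threading.Event,
    fed: List[str],
    poll_s: float = 0.01,
) -> str:
    """Feed events one at a time, honouring cooperative pause and cancel."""
    for event in events:
        # Idle while paused; a cancel must still be able to break out.
        while pause_event.is_set() and not cancel_event.is_set():
            time.sleep(poll_s)
        if cancel_event.is_set():
            return "cancelled"
        fed.append(event)  # stand-in for handing the event to the engine queue
    return "done"
```

Checking `cancel_event` inside the pause wait is what lets a cancel issued during a pause release the loop, which matches the behaviour the PR describes for cancel() while paused.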

New: faro/widgets/experiment_status.py

ExperimentStatusWidget — a napari dock panel that mirrors and controls the current run:

  • State chip, legend (imaging / stim / ref).
  • Event strip — one cell per RTMEvent, color-coded by type, past=opaque / future=dimmed progress fill, current cell bordered. Scales to thousands of events.
  • FOV map — one dot per unique stage position, equal-aspect, visit-order path, active dot recolored to the current event type.
  • Stats — event N/M, elapsed, scheduled, lag (red > 5 s), remaining, errors.
  • Pause / Resume + Stop buttons.
  • Theme-adaptive (napari light/dark), auto-rebinds on every new run via runStarted.

Async/Qt fixes folded in

  • PYMM_SIGNALS_BACKEND=psygnal forced in faro/microscope/base.py — with a QApplication loaded, pymmcore-plus otherwise picks the Qt signal backend and queues frameReady to the main thread; if the main thread is blocked (handle.wait()), frames never reach the controller. Forcing psygnal keeps the data path direct/synchronous on the engine thread.
  • Widget connects statusChanged with thread="main" + drives psygnal.qt.start_emitting_from_queue() so worker-thread emits reach QWidgets safely.
  • uv.lock: bumped pymmcore-widgets past an upstream fix (_presets_widget crashing on an empty device label during MDA events).
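
Forcing the backend comes down to setting the environment variable before the core's signal objects exist. The helper below is a hedged sketch: the variable name comes from this PR, while `force_psygnal_backend` and the override warning are hypothetical additions, and it assumes pymmcore-plus reads the variable when the core is created.

```python
import os
import warnings

ENV_VAR = "PYMM_SIGNALS_BACKEND"  # name taken from the PR text


def force_psygnal_backend() -> None:
    """Pin the signal backend to psygnal, warning if a user choice is overridden."""
    previous = os.environ.get(ENV_VAR)
    if previous not in (None, "psygnal"):
        # Hypothetical courtesy warning; the PR does not say faro emits one.
        warnings.warn(f"overriding {ENV_VAR}={previous!r} with 'psygnal'")
    os.environ[ENV_VAR] = "psygnal"
```

Such a helper would have to run before any CMMCorePlus instance is constructed for the setting to take effect.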

BREAKING: notebook updates required

Before

ctrl.run_experiment(events, stim_mode="current")   # blocked here
ctrl.finish_experiment()

After — choose one:

(a) Blocking equivalent (smallest diff):

ctrl.run_experiment(events, stim_mode="current").wait()
ctrl.finish_experiment()

(b) Non-blocking, with status / cancel / pause:

handle = ctrl.run_experiment(events, stim_mode="current")
# other cells can run; handle.status() / handle.cancel() / handle.pause()
handle.wait()                  # block at the end if desired
ctrl.finish_experiment()

Optional napari widget:

from faro.widgets import ExperimentStatusWidget
viewer.window.add_dock_widget(ExperimentStatusWidget(ctrl), name="Experiment")

Demo notebook (test artifact — remove before merge)

experiments/02_demo_sim_optogenetic/demo_sim_optogenetic_napari_async.ipynb is included only to exercise this PR against the virtual-microscope optogenetic backend (async run, pause/resume, cancel/restart, the status widget, multi-FOV). It doubles as a worked example of what the migrated notebooks could look like. It should be deleted before this PR merges — the real deliverable is the API + widget, not this notebook.

What to check / test before merging

  • Every notebook in experiments/* that calls run_experiment / continue_experiment — migrate to .wait() or the non-blocking flow. Confirm none rely on the old blocking return.
  • Notebooks that manually tear down the napari live link / CoreViewerLink before a run: that workaround is no longer needed. Remove it and verify that the preview layer keeps updating during the run.
  • tests/hardware/* — update for the new RunHandle return type; run on the Moench rig.
  • Multi-channel imaging: the widget's frame counter / strip cursor assume ~1 imaging frame per RTMEvent. For multi-channel plans n_frames_received outpaces the RTMEvent count — verify the strip/stats still read sensibly or gate the assumption.
  • continue_experiment + the widget: confirm the strip/map rebuild correctly for the appended events and the FOV map merges positions.
  • Headless / no-Qt runs (CI, non-microscope dev machine) — import faro stays Qt-free; .wait() path works without a QApplication.
  • Cancel-then-restart and pause/resume on real hardware (verified on the simulator; engine-abort semantics differ per device).
  • Bump the virtual-microscope lockfile pin: run uv lock --upgrade-package virtual-microscope to pick up the fixes now on its default branch (JIT pre-warm; SimCameraDevice digital ROI / MDA-teardown fix). Without this the demo notebook's first ~4 s of frames stall and the napari Snap preview freezes after a run. Commit the uv.lock change separately (it is not async/widget code).

Related (separate repo)

Two virtual-microscope fixes were needed for the demo notebook and have already landed on its default branch (virtual-env):

  • JIT pre-warm — pre-warms the numba physics-step JIT before the RealtimeEngine starts; otherwise the first ~4 s of snaps stall behind a compile holding the sim lock, so frames arrive in a burst instead of paced.
  • SimCameraDevice digital ROI — implements real ROI cropping. It also fixes an MDA-teardown bug: the camera previously raised NotImplementedError from set_roi, which aborted MDARunner._finish_run before it emitted sequenceFinished; napari-micromanager then never cleared _mda_running, so the Snap preview silently stopped updating after a run.

These are not part of this PR — faro just needs the lockfile bump above to pick them up.

Verification

Exercised end-to-end against the virtual-microscope optogenetic backend (napari + napari-micromanager + the widget):

  • Live status flows worker → widget on the main thread (psygnal queued delivery); strip / FOV map / stats update in real time.
  • Cancel mid-run, then restart from the notebook — reaches steady state, no stall.
  • Pause halts feeding after the backpressure window drains; resume runs to completion.
  • Frame count tracks RTMEvents 1:1 for single-channel plans; stim snaps no longer double-count.
  • 87 unit tests pass.

Compatibility notes

  • Headless / no Qt: works — psygnal delivers slots synchronously without Qt. Widget package is opt-in (import faro.widgets); import faro / import faro.core stay Qt-free.
  • MDA engines other than pymmcore-plus: no regression — the controller still talks to hardware exclusively through AbstractMicroscope.

Screenshot

Screenshot 2026-05-16 at 11 16 31 AM

hinderling and others added 7 commits May 16, 2026 11:35
Move the MDA feed loop onto a worker thread, expose live status through a
RunHandle + psygnal Signal, and add a minimal napari widget that mirrors
the current run.

Breaking change:
  ctrl.run_experiment(events, ...) and ctrl.continue_experiment(...) now
  return a RunHandle immediately instead of blocking until the run is
  done. Existing notebooks that did `ctrl.run_experiment(events, ...)`
  must be updated to either `handle = ctrl.run_experiment(events, ...);
  handle.wait()` for the old blocking semantics, or to use the new
  non-blocking flow (poll handle.status(), subscribe to
  handle.statusChanged, call handle.cancel() to stop early).

What's in this commit:

- faro/core/run_status.py (new):
  * RunStatus -- immutable snapshot dataclass with state, event/FOV
    indices, frame count, lag_ms, error info.
  * RunHandle -- owns the worker thread + cooperative cancel event,
    exposes status()/wait()/cancel()/is_running() + a psygnal
    statusChanged signal that emits the latest RunStatus on each update.
    Subscribers on the main thread see queued-connection delivery via
    psygnal's Qt integration.

- faro/core/controller.py:
  * Controller exposes a class-level runStarted = Signal(object). Fires
    on every new run/continue so widgets can re-bind.
  * run_experiment / continue_experiment spawn a worker thread, return
    the handle, emit runStarted. Validation still happens synchronously
    so a bad event list raises on the calling thread.
  * _run_worker centralises pre-flight setup (writer init -- including
    the potentially-slow zarr rmtree on overwrite -- and Analyzer
    construction) and wraps the feed loop in try/except so worker-side
    failures land in handle.fatal_error rather than crashing the user.
  * _run_mda_with_events accepts the handle, checks handle.cancel_event
    at each loop iteration and in the backpressure throttle, asks the
    engine to cancel the in-flight event when set, and emits status
    updates on each RTMEvent dequeue.
  * _on_frame_ready (and ControllerSimulated._on_frame_ready) call a
    shared _bump_status_for_frame helper that increments
    n_frames_received and computes lag_ms vs event.min_start_time.
  * Now off the main thread, all the prior Qt-pumping helpers
    (_pump_qt_and_sleep, _qt_join, _wait_for_frame_pumping_qt) and the
    superqt ensure_main_thread import are obsolete and removed. The
    preview-layer machinery (viewer=, _on_preview_frame, _apply_preview,
    PREVIEW_LAYER_NAME) is also removed -- napari-micromanager's own
    _NapariMDAHandler already routes generator events into the preview
    layer.
  * finish_experiment now waits for the current handle before shutting
    down the Analyzer.
  * _pending_sentinels guarded by a Lock since extend_experiment now
    runs on the calling thread while the feed loop runs on the worker.

- faro/widgets/experiment_status.py (new):
  * ExperimentStatusWidget -- read-out of state, FOV, event index,
    frame count, lag, elapsed time, error count. Has a Stop button
    that calls handle.cancel(). Subscribes to controller.runStarted
    so it automatically re-binds when a new run begins; cleans up the
    previous handle's signal subscription on each rebind.

Verified end-to-end via a Qt smoke test:
  - Live updates flow from the worker thread to the widget on the main
    thread (psygnal+Qt queued delivery).
  - Stop button triggers handle.cancel(); the worker's cancel-check
    fires within one iteration and the run exits at the next event
    boundary.
  - Starting a new run re-binds the widget to the new handle and resets
    the progress bar / counters.

The OmeZarrWriter init in _run_worker still pulled image height/width
via self._mic.mmc.getImageHeight/Width -- a pymmcore-plus-specific
call that breaks any non-pymmcore microscope.

Use the AbstractMicroscope-level convention: subclasses populate
self.image_height / self.image_width on the microscope instance (Moench
already does this in init_scope). Fall back to mmc if the attributes
aren't present but mmc is, so existing pymmcore-only microscopes keep
working without code changes. Raise a clear error when neither path is
available.
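
The lookup order described above (microscope attributes first, then mmc, then a clear error) might look like the following. `resolve_image_shape` is a hypothetical helper name; the attribute names and the getImageHeight / getImageWidth calls are the ones named in this commit message.

```python
from typing import Any, Tuple


def resolve_image_shape(mic: Any) -> Tuple[int, int]:
    """Return (height, width) without assuming a pymmcore-plus microscope."""
    height = getattr(mic, "image_height", None)
    width = getattr(mic, "image_width", None)
    if height is not None and width is not None:
        return int(height), int(width)

    # Fallback for existing pymmcore-only microscopes.
    mmc = getattr(mic, "mmc", None)
    if mmc is not None:
        return int(mmc.getImageHeight()), int(mmc.getImageWidth())

    raise AttributeError(
        "microscope must define image_height/image_width or expose an mmc core"
    )
```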
Three independent bugs surfaced when running the new async
run_experiment + ExperimentStatusWidget against a napari viewer
(reproduced with the optogenetic virtual_microscope backend):

1. pymmcore-plus's signals_backend() auto-selects the *qt* backend
   whenever a QApplication is loaded. core.mda.events.frameReady then
   becomes a QtCore.SignalInstance and cross-thread emits land in
   Qt.QueuedConnection, where they're delivered only when the main
   thread pumps events. With Controller.run_experiment now spawning a
   worker and RunHandle.wait() joining on it, the main thread is
   typically idle-blocked exactly when the engine is firing frames --
   so the controller's _on_frame_ready never ran, the engine completed
   "successfully" with zero frames received, and the pipeline never
   saw any data. Force PYMM_SIGNALS_BACKEND=psygnal in
   faro/microscope/base.py so the data path stays direct/synchronous
   on the engine thread regardless of whether Qt is loaded. The
   widget-side path (RunHandle.statusChanged) still uses psygnal's
   own queued delivery -- see fix #2.

2. ExperimentStatusWidget connected handle.statusChanged with the
   default (direct) connection. Status updates emitted from the worker
   thread therefore ran the widget's _refresh slot synchronously
   off-main, calling QLabel.setText / QProgressBar.setValue from a
   non-GUI thread. Under napari that lands in vispy's OpenGL
   compositor and aborts with "Cannot make QOpenGLContext current in
   a different thread" -> SIGABRT (kernel hard-crash in VSCode
   Jupyter). Switch to connect(..., thread="main") so psygnal queues
   the call into its main-thread queue.

3. psygnal's queued callbacks live in QueuedCallback._GLOBAL_QUEUE,
   which nothing drains by default -- the widget would be invoked on
   the main thread, but only when something explicitly calls
   psygnal.emit_queued(). RunHandle's docstring claims auto-Qt
   delivery; that's not how psygnal actually works. Call
   psygnal.qt.start_emitting_from_queue() in the widget's __init__,
   which installs a main-thread QTimer that fires emit_queued() on
   every Qt event-loop tick. Idempotent and global, so multiple
   widgets / multiple runs are safe.
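
Fixes 2 and 3 together amount to "enqueue on the worker, drain on the main thread". The model below uses only the standard library to show why a queued connection delivers nothing until something drains the queue. It is not psygnal's implementation: `emit_from_worker` and `drain_on_main` are hypothetical names standing in for a thread="main" connection and for the timer-driven emit_queued() call.

```python
import queue
import threading
from typing import Any, Callable

# Stand-in for a global queue of pending main-thread callbacks.
_pending: "queue.Queue[Callable[[], None]]" = queue.Queue()


def emit_from_worker(slot: Callable[[Any], None], value: Any) -> None:
    """Queue the call instead of running slot(value) on the worker thread."""
    _pending.put(lambda: slot(value))


def drain_on_main() -> int:
    """What a main-thread timer would do on each tick: run everything queued."""
    delivered = 0
    while True:
        try:
            callback = _pending.get_nowait()
        except queue.Empty:
            return delivered
        callback()
        delivered += 1
```

If nothing ever calls `drain_on_main`, the slot never runs, which mirrors the situation described in fix 3 before the periodic drain was installed.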

Lockfile: bump pymmcore-widgets (8c8f76e -> 48ff414) so the fix for the
unrelated upstream crash in pymmcore_widgets._presets_widget._on_property_changed
(triggered by an empty device label, here virtual_microscope's shutter)
is included. Without that bump, the MDA engine itself aborts on the
first setShutterOpen() once frames actually start flowing.

Verified end-to-end against virtual_microscope's optogenetic backend:
- headless async run: 5/5 frames (regression check, unchanged)
- napari.Viewer() + handle.wait():     5/5 frames (was 0/5)
- napari + napari-micromanager + widget: 5/5 frames, no crash, exit 0
- widget visibly updates progress / frames / state mid-experiment
  (sampled QLabel.text() while pumping Qt events)
- 87 unit tests still pass

Sibling of demo_sim_optogenetic.ipynb that exercises the new async
run_experiment + RunHandle + ExperimentStatusWidget end-to-end against
virtual_microscope's optogenetic backend, with a live napari viewer
dock-attached.

Walks through: handle = ctrl.run_experiment(...) is non-blocking, the
kernel is free; poll handle.status() while it runs; subscribe to
handle.statusChanged from the kernel side; cancel via the widget Stop
button or handle.cancel(); handle.wait() blocks if you want the
old synchronous semantics; continue_experiment() re-binds the widget
automatically via runStarted.

Phases are concatenated with combine(..., axis="t") per the new
RTMSequence API.

Backend changes that make an async run inspectable and steerable --
the data the new ExperimentStatusWidget renders, plus two bug fixes
surfaced while building it.

run_status.py
  - RunHandle.events: optional snapshot of the (sorted) RTMEvents the
    handle is driving, so widgets can render per-event visualisations
    (event strip, FOV map) that need the full plan up front.
  - Pause/resume: RunState gains "pausing"/"paused"; RunHandle gains
    pause()/resume()/is_paused() and a pause_event the feed loop polls.
    cancel() now also clears the pause event so a cancel while paused
    still releases the feed loop.

controller.py
  - run_experiment / continue_experiment sort events once (by
    min_start_time, then position) and stash the sorted list on the
    handle, so the order the worker processes matches what the widget
    displays.
  - Feed loop honors pause_event: before pulling the next RTMEvent it
    checks the flag, flips state to "paused", and idles until resume()
    -- the MDA engine drains whatever is already queued, then waits.
  - fix: the engine queue (self._queue) is recreated per run. The
    finally-block feeds a STOP_EVENT sentinel to stop the engine; on a
    *cancelled* run cancel_mda() aborts the engine, which may stop
    without draining the queue, leaving stale events + the sentinel
    behind. Reusing that queue made the next run's engine consume the
    stale sentinel and exit after a few events ("stuck at 3/80"). A
    fresh queue per run fixes it.
  - fix: _bump_status_for_frame skips IMG_STIM frames. A stim emission
    is the SLM-illuminated snap paired with its imaging frame; counting
    it double-updated the status (lag/elapsed refreshing twice per stim
    event) and made n_frames_received drift away from the RTMEvent
    count. Imaging + ref frames are the meaningful data frames.

Verified end-to-end against the optogenetic virtual-microscope backend:
cancel mid-run then restart reaches steady state (no stall); pause
halts feeding after the backpressure window drains and resume continues
to completion; frame count tracks RTMEvents 1:1 for single-channel plans.
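
The stale-sentinel failure can be reproduced with a toy engine. This is an illustration only: `STOP` stands in for STOP_EVENT, `feed_and_run` is a hypothetical function that feeds and consumes in one call, and `abort_after` models an engine that is cancelled before draining its queue.

```python
import queue
from typing import Any, List, Optional

STOP = object()  # stand-in for the STOP_EVENT sentinel


def feed_and_run(
    q: "queue.Queue[Any]", events: List[Any], abort_after: Optional[int] = None
) -> List[Any]:
    """Feed events plus a sentinel, then consume until the sentinel or an abort."""
    for event in events:
        q.put(event)
    q.put(STOP)

    executed: List[Any] = []
    while True:
        if abort_after is not None and len(executed) == abort_after:
            return executed  # aborted engine: the rest of the queue is left behind
        item = q.get()
        if item is STOP:
            return executed
        executed.append(item)


shared = queue.Queue()
first = feed_and_run(shared, [0, 1, 2, 3], abort_after=2)  # cancelled run
second = feed_and_run(shared, ["a", "b"])                  # reuses the stale queue
fresh = feed_and_run(queue.Queue(), ["a", "b"])            # new queue per run
```

With the shared queue, the second run consumes the leftovers and stops at the old sentinel without reaching its own events; with a fresh queue it executes exactly what was fed.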
Rework the minimal status widget into a full run dashboard, driven by
the RunHandle data exposed in the previous commit.

Components (top to bottom):
  - State chip -- RUNNING / PAUSED / DONE / ... as plain text in a
    translucent-neutral rounded chip (no per-state fill: a colored
    banner competed with the imaging/stim/ref legend colors).
  - Legend chips -- imaging / stim / ref; the chip matching the current
    event type is fully opaque, the others dimmed.
  - EventStrip -- one cell per RTMEvent, color-coded by type. Past +
    current cells opaque (progress fill), future cells dimmed. Same-type
    runs are coalesced into single fills so thousands of events render
    with correct alpha instead of over-stacking at sub-pixel widths.
    Empty state draws a "(no events loaded)" placeholder.
  - FovMap -- one dot per unique FOV position, equal-aspect (a straight
    line of FOVs stays a line), grey visit-order path, active dot
    recolored to the current event type. Pinned square via resizeEvent.
    Paints its own rounded panel background; "FOV X/Y" counter in the
    corner.
  - Stats form -- event N/M, elapsed, scheduled, lag, remaining, errors.
    Times formatted hh:mm:ss with the leading unit suffixed and dropped
    when zero; lag turns red past 5 s. Wrapped in a shaded panel echoing
    napari's layer-controls boxes.
  - Pause/Resume + Stop buttons.

Threading / theming details:
  - statusChanged is connected with thread="main" and the widget calls
    psygnal.qt.start_emitting_from_queue() so worker-thread emits are
    delivered on the GUI thread (drives QWidgets safely under napari).
  - A 250 ms QTimer ticks the elapsed/remaining clocks between status
    emissions so time fields don't freeze between frames.
  - The strip cursor tracks n_frames_received (actual snaps), not
    n_events_consumed (the feed loop runs 3-4 ahead via backpressure,
    which made the strip jump several cells at run start).
  - Colors/fonts derive from the Qt palette so the widget adapts to
    napari's light/dark theme; corner radii match napari widgets.
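
Coalescing same-type runs for the EventStrip can be sketched with itertools.groupby. This shows only the grouping step, under the assumption that each event is reduced to a type label; `coalesce_runs` is a hypothetical name, and the additional split at the cursor (past versus future alpha) is left out.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def coalesce_runs(kinds: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Collapse consecutive equal labels into (kind, start, stop) spans.

    `stop` is exclusive, so each span can be painted as a single rectangle.
    """
    runs: List[Tuple[str, int, int]] = []
    start = 0
    for kind, group in groupby(kinds):
        length = sum(1 for _ in group)
        runs.append((kind, start, start + length))
        start += length
    return runs
```

Painting one rectangle per span rather than one per event keeps the number of draw calls proportional to the number of type changes, not to the event count.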
Add a second stage position (20, 20, 0) to the baseline / stim /
recovery sequences so the demo exercises a 2-FOV acquisition -- the
ExperimentStatusWidget's FOV map then shows both positions and the
visit-order path between them. Drop the frame interval 1.5s -> 1s.
@hinderling hinderling force-pushed the feat/async-run-handle branch from d473b9b to 3c0e798 Compare May 16, 2026 09:53
@hinderling hinderling marked this pull request as ready for review May 16, 2026 11:29
@hinderling
Collaborator Author

@alandolt can you have a look if you see any general issues with this architecture change? still a few open TODOs before merging, but the main idea is there i think! but would be great to have your input before i start migrating the other notebooks etc. I think this will also be useful more long-term, running experiments on different microscopes simultaneously with BO for example, in combo with pymmcore-proxy.

Add FrameDispenser.cancel() and the FrameWaitCancelled exception so a
thread blocked in wait_for_frame / get_predecessor is woken immediately
instead of sitting out the full timeout. This lets an experiment abort
promptly: a feed loop parked in an up-to-80s stim-mask wait is released
the instant the run is cancelled.
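
A cancellable wait of this kind is commonly built on a condition variable whose predicate also checks a cancelled flag. The sketch below reuses the names FrameDispenser, cancel, wait_for_frame and FrameWaitCancelled from this commit message, but the internals, the `put` method and the signatures are assumptions, not faro's code.

```python
import threading
from typing import Any, Dict


class FrameWaitCancelled(Exception):
    """Raised in a waiter when the dispenser is cancelled."""


class FrameDispenser:
    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._frames: Dict[int, Any] = {}
        self._cancelled = False

    def put(self, index: int, frame: Any) -> None:
        with self._cond:
            self._frames[index] = frame
            self._cond.notify_all()

    def cancel(self) -> None:
        """Wake every blocked waiter immediately."""
        with self._cond:
            self._cancelled = True
            self._cond.notify_all()

    def wait_for_frame(self, index: int, timeout: float) -> Any:
        with self._cond:
            arrived = self._cond.wait_for(
                lambda: self._cancelled or index in self._frames, timeout
            )
            if self._cancelled:
                raise FrameWaitCancelled(f"wait for frame {index} was cancelled")
            if not arrived:
                raise TimeoutError(f"frame {index} did not arrive in {timeout} s")
            return self._frames[index]
```

Because `cancel()` notifies under the same lock the waiter holds, a thread parked with an 80 s timeout returns as soon as the flag is set instead of sitting out the timeout.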
Cancellation: RunHandle gains an on_cancel hook, invoked synchronously
from cancel(), that wakes a feed loop blocked in a stim-mask wait via
Analyzer.cancel_pending_waits(). Previously a cancel issued during that
wait took up to the stim-mask timeout (~80s) to take effect, leaving
the frame handler connected in the meantime.

Queue stats: Analyzer.queue_stats() / Controller.queue_stats() expose
storage, pipeline and deferred queue depths for the status widget.

finish_experiment runs its teardown (run wait + Analyzer drain) on a
worker thread and pumps Qt, so napari stays responsive during the drain.

Lag is anchored to the first frame's acquisition start rather than the
worker's start time, so worker/engine startup (~1s) is no longer
charged to every lag reading.

Stop now cancels the run and then runs finish_experiment(), so the next
run starts clean instead of leaking the old Analyzer; the state banner
shows STOPPING... while the drain runs.

Stats are split into three panels (timing / queues / errors). The
storage and pipeline queue depths render as grayscale fill bars that
turn red past 80% of capacity; deferred shows as a plain count. The
FovMap is freely resizable instead of pinned square.
@alandolt
Contributor

looks super cool and well executed. Thanks.
After a first glance through the code I don't see any issue, will probably soon push forward to also expand controller by an update method that replaces the old stored acquisition events (as seen here https://github.com/pertzlab/faro/blob/main/faro/core/controller.py), as for my agent stuff this is the way to go for some agent classes.
Will try it out on the real mic tomorrow.

hinderling added 15 commits May 21, 2026 12:58
Empty the backpressure window (~3 queued MDAEvents) into a held buffer
when the user pauses, refilling on resume. On sparse experiments the
queued events would otherwise keep snapping for minutes after Pause.
min_start_times are not shifted -- late events catch up on resume.
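
Moving the queued events into a held buffer on pause and putting them back on resume can be sketched as follows. `drain_to_buffer` and `refill` are hypothetical names, and the sketch assumes the engine is not concurrently pulling from the queue while the drain runs.

```python
import queue
from typing import Any, List


def drain_to_buffer(q: "queue.Queue[Any]") -> List[Any]:
    """On pause: move everything still queued into a held list, in order."""
    held: List[Any] = []
    while True:
        try:
            held.append(q.get_nowait())
        except queue.Empty:
            return held


def refill(q: "queue.Queue[Any]", held: List[Any]) -> None:
    """On resume: put the held events back in their original order."""
    for event in held:
        q.put(event)
    held.clear()
```

The events themselves are untouched, which is consistent with min_start_times not being shifted: anything that became late during the pause simply runs as soon as it is fed again.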
WaitEvent is an RTMEvent that emits no MDAEvents; wait(duration_s)
builds one and combine() treats it as a pure time marker -- it extends
wall-clock for subsequent phases but claims no t/p index. The feed loop
waits for the engine to catch up, then counts down to
max(scheduled_start, now) + duration_s so a pause-drain can't eat the
wait window. Adds a "waiting" RunState + wait_remaining_s, and anchors
started_at to the first acquired frame so a leading wait doesn't tick
elapsed/lag. Demo notebook brackets the stim phase with wait(5).
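
The countdown target is plain arithmetic, shown here as a worked example. `wait_deadline` is a hypothetical helper and the numbers are invented for illustration.

```python
def wait_deadline(scheduled_start: float, now: float, duration_s: float) -> float:
    """Time at which the wait ends: the full duration, counted from whichever
    is later, the scheduled start or the moment the feed loop reaches the wait."""
    return max(scheduled_start, now) + duration_s


# On schedule: the loop reaches the wait early, so the schedule anchors it.
on_time = wait_deadline(scheduled_start=30.0, now=28.0, duration_s=5.0)

# Late (for example after a pause): the wait still lasts the full 5 s.
late = wait_deadline(scheduled_start=30.0, now=42.0, duration_s=5.0)
```

Anchoring on the scheduled start alone would give a deadline of 35 s in the late case, which has already passed at 42 s, so the wait would be skipped entirely.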
Show a "WAITING hh:mm:ss" countdown banner, keep the strip cursor on the
wait cell during the countdown, and draw wait cells as solid gray (hatch
overlaid only when wide enough to read). Pause->Resume flips immediately
during a wait by reading pause_event rather than the run state. Also
remove the 1px inter-phase gap (runs span to the next run's start; active
border widened to match) and dim inactive legend chips.

Pin the invariants: pause/resume changes when frames are acquired, not
what -- a paused run yields byte-identical OME-Zarr to an unpaused one,
same frame count, clean cancel-during-pause. WaitEvents claim no t/p
index, emit no MDAEvents (add time, not frames), and shift subsequent
min_start_times by at least their duration. Pause is driven by polling
status() with min_start_time-spaced events so the tests are deterministic.

tests/fixtures.py imports TrackerMotile at module load, so test
collection fails without motile installed.

motile is the non-default tracking backend (a runtime dep), not a test
tool, so list it as a feature extra alongside the other backends and
have the test extra pull it in via faro[motile].

A WaitEvent carries no channels and no metadata, so events_to_dataframe
emitted a bogus zero-channel imaging row for it and validate_pipeline
falsely flagged it as missing required metadata. Skip WaitEvents in both
(they are timed gaps, not acquired frames). Add regression tests,
including that validate_hardware already tolerates them.

Niesen runs a WakeUpLaser keepalive thread and holds a DMD/SLM handle
but had no shutdown() override, so pymmcore native threads could keep
the process alive as a zombie that blocks the next session. Mirror
Moench.shutdown: stop the keepalive thread and unloadAllDevices.

Re-run outputs for the wait-bracketed experiment (no source changes).

run_experiment/continue_experiment are non-blocking and return a
RunHandle; update every example to call .wait() and add a usage-level
note covering pause/resume/cancel and the ExperimentStatusWidget, plus
a "Timed waits" note for wait() between phases via combine.

run_experiment is now non-blocking; these notebooks read results or call
post_experiment right after, so add .wait(). The demo_sim_optogenetic
runs had post_experiment() between run and finish, which raced.
Edit-only; outputs not refreshed.

Split each run cell into an async launch (dock ExperimentStatusWidget,
return handle) and a finalize cell (handle.wait() before post_experiment
and result reads). Drop the post-run time.sleep(10/90) plumbing waits,
now covered by handle.wait() + finish/post drain. Edit-only.

handle.wait() only joins the run worker, not the analyzer that writes
tracks. The deleted time.sleep() was the crude analyzer-drain wait;
replace it with ctrl.finish_experiment() (waits for the run + drains
the pipeline) before generate_exp_data_from_tracks / parquet reads.

stim_rtmsequence: dock ExperimentStatusWidget; its existing
finish_experiment() already waits for the async run + drains.
stim_dfacquire: serialize the two-controller phases with handle.wait()
so they don't run concurrently on one microscope, and dock the widget.
(Pre-existing bug left untouched: cell-12/cell-16 are mis-tagged as
markdown, so df_acquire_2 + post-processing don't execute.)

Six sequential phases driven by run_experiment + chained
continue_experiment on one controller, with manual drug pipetting at
each cell boundary. Add .wait() after every run/continue so a phase
finishes before the next continue_experiment (which would otherwise
raise "already running") and before the napari reconnect/pipetting
pause. Preserves the original blocking semantics; dock the status
widget once (re-binds on each phase via runStarted).
hinderling and others added 12 commits May 28, 2026 10:26
cell-12 (builds df_acquire_2) and cell-16 (post-processing) were tagged
as markdown, so they never executed -- cell-15 referenced an undefined
df_acquire_2 and results were never written. Convert both to code. The
post-processing cell also gets the async treatment: wait on phase 2,
drain via finish_experiment (replacing the dropped sleep), then read.

napari-micromanager keeps routing frames into its preview layer during a
run (the controller only stops continuous acquisition once at MDA start),
so the manual mm_wdg._core_link.cleanup() before each run + CoreViewerLink
reconnect after is no longer needed. Remove it from run/phase cells so
the live link stays connected throughout. DMD-calibration cells (which
drive the camera directly, outside an MDA run) and the manual
reconnect/break utility cells are left as-is.

uv lock --upgrade-package virtual-microscope: e2aca8da -> bd4ac3e3 to
pick up the JIT pre-warm + SimCameraDevice digital-ROI/MDA-teardown
fixes on its default branch. Also syncs the motile extra into the lock
(added to pyproject earlier but not yet relocked).

calibrate_dmd gains background=True (default): it runs dmd.calibrate on
a DMDCalibration worker thread while pumping Qt in the caller, so napari
keeps updating the live preview during calibration instead of freezing.
Adds a module-level _pump_qt_events helper; background=False runs the
calibration synchronously as before.

Also fix dmd.py's frameReady.disconnect() calls: the calibration's
frame-collection blocks now disconnect their own named handlers
(frameReady.disconnect(handler)) instead of the no-arg disconnect, which
also tore down napari-micromanager's preview listener and left the
preview dead after calibration.

tests/hardware/test_cell_migration.py and test_line_stimulation.py were
left behind when the hardware suite moved into tests/hardware/pertzlab/.
They're superseded by the pertzlab copies (line_stimulation is identical
bar the import; cell_migration's pertzlab version adds the shared
cellpose fixture + a timestep-ordering check). Running both trees in one
pytest session spun up a second session-scoped Moench, which then failed
to open the DMD ("Mosaic3: No Mosaic3 devices found") because the first
microscope already held it. Drop the duplicates and their now-orphaned
tests/hardware/conftest.py.

Scale the lag stress test down: N_FRAMES 12->6, interval 5->2 s, pipeline
delay 7->3 s. The lag invariant is the delay/interval ratio
(interval < delay < 2*interval), preserved here, so coverage is
identical -- the pipeline still lags ~1.5 frames and every stim mask
must still land without a dispenser skip or deadlock. Verified green on
the Moench.

TIME_BETWEEN_TIMESTEPS_S dominated wall-clock on every acquisition test
(4 frames x 5 s = 20 s scheduled per test). Drop it to 2 s -- the 3-FOV
acquisition fits comfortably and the tests assert correctness (which
frames stim, masks present, no background errors), not the interval, so
coverage is unchanged. Also drop stim_mask_timeout's pipeline delay
10 s -> 3 s (still above its 1 s timeout) to speed the post-run drain.

Verified on the Moench: pertzlab suite 14 passed / 1 skipped, 4:15 -> 2:50.
The cellpose + empty-fov tests got the same mechanical interval change
but were not run here (cellpose extra not installed; empty-fov skips
without .preflight.json).

apply_fov_batching gains offset_min_start_time (default True). FOVs in
a batch are imaged sequentially, not simultaneously, so the k-th FOV of
a batch only starts ~k * time_per_fov after the batch's first FOV.
Encoding that offset into each event's min_start_time keeps the
scheduled per-FOV frame interval consistent and makes lag measurement
meaningful (lag is acquisition-start minus min_start_time).

The first FOV of every batch gets a 0 within-batch offset; batches
after the first still get their batch wall-clock offset on top.
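
The effect of the within-batch offset on lag can be shown with a small worked example. The helper names and the numbers are hypothetical; only the two rules come from the text above (the k-th FOV is offset by k * time_per_fov, and lag is acquisition start minus min_start_time).

```python
from typing import List


def batch_min_start_times(
    batch_start: float, n_fovs: int, time_per_fov: float, offset: bool = True
) -> List[float]:
    """Scheduled start per FOV in one batch; FOV k is offset by k * time_per_fov."""
    return [
        batch_start + (k * time_per_fov if offset else 0.0) for k in range(n_fovs)
    ]


def lag_s(acquisition_start: float, min_start_time: float) -> float:
    """Lag as defined in the PR: acquisition start minus min_start_time."""
    return acquisition_start - min_start_time


# Three FOVs imaged back to back, 2 s each, starting at t = 0.
actual = [0.0, 2.0, 4.0]
with_offset = batch_min_start_times(0.0, 3, 2.0, offset=True)
without_offset = batch_min_start_times(0.0, 3, 2.0, offset=False)

lags_with = [lag_s(a, m) for a, m in zip(actual, with_offset)]
lags_without = [lag_s(a, m) for a, m in zip(actual, without_offset)]
```

Without the offset, a perfectly on-time run still reports 2 s and 4 s of lag on the second and third FOV; with it, all three read zero.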
statusChanged is delivered cross-thread (worker -> Qt main) via psygnal's
queued emission, which proved unreliable in some embeddings (notably
Jupyter notebooks): the event strip / FOV cursor / counters stayed
frozen even though handle.status() was current and Stop still worked.
The _tick QTimer already polled the time + queue fields between
emissions; make it do a full _refresh from the latest status snapshot so
the whole widget stays live regardless of signal delivery. The
statusChanged connection stays as a push optimisation; _refresh's
repaints are change-guarded, so a full refresh at the 250 ms tick is
cheap.
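
A polled, change-guarded refresh can be sketched without Qt. `GuardedLabel` and `tick` are hypothetical stand-ins for a QLabel wrapper and the widget's timer callback, and the status is reduced to a plain dict.

```python
from typing import Callable, Dict, Optional


class GuardedLabel:
    """Repaints only when the displayed text actually changes."""

    def __init__(self) -> None:
        self.text: Optional[str] = None
        self.repaints = 0

    def set_text(self, text: str) -> None:
        if text == self.text:
            return  # unchanged: skip the repaint
        self.text = text
        self.repaints += 1


def tick(get_status: Callable[[], Dict[str, int]], label: GuardedLabel) -> None:
    """Timer callback: pull the latest snapshot and refresh from it."""
    status = get_status()
    label.set_text(f"{status['consumed']}/{status['total']}")
```

Polling makes the display independent of signal delivery, and the guard keeps the cost of a tick that finds nothing new close to a string comparison.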
pipeline.run() names the per-phase tracks file
{fov}_phase_{phase_id}_latest.parquet whenever phase metadata is present
and reads metadata['phase_id'] directly. The guard keyed off
"phase_id in metadata OR phase_name in metadata", so supplying phase_name
without phase_id passed the guard and then KeyError'd mid-run, after
acquisition had already started.

- validate_pipeline now flags any event carrying phase_name but no
  phase_id, so validate_events fails up front with a clear message.
- run() keys the per-phase filename off phase_id alone, so the
  combination can no longer crash even when validation is skipped.
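
Both halves of this fix can be sketched in a few lines. The function names are hypothetical, and the filename used when no phase_id is present is a placeholder assumption, since the PR only gives the per-phase pattern.

```python
from typing import Any, Dict, List, Sequence


def tracks_filename(fov: int, metadata: Dict[str, Any]) -> str:
    """Key the per-phase filename off phase_id alone, never off phase_name."""
    if "phase_id" in metadata:
        return f"{fov}_phase_{metadata['phase_id']}_latest.parquet"
    # Assumption: placeholder name; the real non-phase filename is not in the PR.
    return f"{fov}_latest.parquet"


def events_missing_phase_id(metadatas: Sequence[Dict[str, Any]]) -> List[int]:
    """Indices of events that carry phase_name but no phase_id."""
    return [
        i
        for i, md in enumerate(metadatas)
        if "phase_name" in md and "phase_id" not in md
    ]
```

The validator reports the bad combination before acquisition starts, and the filename helper stays safe even when validation is skipped.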
pipeline_post.run() (the deferred/reprocess-from-disk path) had the same
"phase_id in metadata or phase_name in metadata" guard followed by a
direct metadata['phase_id'] access as pipeline.run(). Key it off
phase_id alone so a phase_name-without-phase_id event can't KeyError on
the deferred path either.

A WaitEvent carries an explicit gap (duration_s); _combine_pair was
adding the inferred inter-source interval on top of it, double-counting
the wait. The feed loop sleeps for duration_s AND the next source's
min_start_time included the wait *plus* an interval -- so
combine(wait(10), phase) at a 10 s interval started the first
acquisition at t=20 (10 s wait countdown + 10 s engine pacing) instead
of t=10.

Skip _infer_interval when either side of the merge boundary is a
WaitEvent. combine(wait(10), phase) now starts the first frame at t=10,
and combine(a, wait(10), b) starts b exactly duration_s after a's last
frame (verified: [0,10,20] + wait(10) -> [30,40,50]).
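
The corrected timing rule can be checked with a toy model that works on lists of start times. This is not faro's combine: `Wait`, `infer_interval` and this `combine` are simplified stand-ins, each phase is a non-empty list of relative start times, and taking the inferred gap from the preceding phase's frame spacing is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass(frozen=True)
class Wait:
    duration_s: float


def infer_interval(times: Sequence[float]) -> float:
    """Frame spacing of a phase; 0 for a single-frame phase."""
    return times[1] - times[0] if len(times) > 1 else 0.0


def combine(*parts: Union[Wait, Sequence[float]]) -> List[float]:
    out: List[float] = []
    cursor = 0.0                            # where the next part starts
    prev_interval: Optional[float] = None   # None: boundary touches a wait or the start
    for part in parts:
        if isinstance(part, Wait):
            cursor += part.duration_s       # the wait is the whole gap
            prev_interval = None            # so no inferred interval is added on top
            continue
        if prev_interval is not None:
            cursor += prev_interval         # inferred gap between adjacent phases
        shifted = [cursor + t for t in part]
        out.extend(shifted)
        cursor = shifted[-1]
        prev_interval = infer_interval(part)
    return out


a = [0.0, 10.0, 20.0]
b = [0.0, 10.0, 20.0]
bracketed = combine(a, Wait(10.0), b)
leading = combine(Wait(10.0), a)
```

Adding the inferred interval on top of the wait, as the old code did, would move the first frame of `leading` from t=10 to t=20.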

3 participants