Fix cold-launch races in PHP runtime boot and Laravel extraction#102

Merged
shanerbaner82 merged 2 commits into main from fix/persistent-boot-race-gate
Apr 24, 2026

shanerbaner82 commented Apr 24, 2026

Summary

This PR fixes two related cold-launch races that surfaced during background-tasks testing on both iOS and Android. Each is closed with a targeted gate, one at the C layer and one at the Kotlin layer, with no change to happy-path behavior.

1. PHP runtime boot race (C layer, iOS + Android)

The ephemeral and worker runtimes piggyback on the global tsrm_startup/sapi_startup done inside the persistent runtime's php_embed_init. They skip straight to ts_resource(0) and module startup, assuming persistent has already finished its init. If something triggers ephemeral/worker boot before persistent completes:

  • iOS: a BGTaskScheduler handler fires on a background thread while the main-thread persistent runtime is mid-php_embed_init. ts_resource returns valid TSRM storage but sapi_globals is still unallocated; sapi_initialize_empty_request writes to NULL and we crash (EXC_BAD_ACCESS). This is the crash reported by users running the background-tasks plugin on physical devices.
  • Android: WorkManager starts an ephemeral task while native_persistent_boot is mid-php_embed_init. They hold different mutexes, so the ephemeral reads php_initialized == 0 and takes the cold path, calling a second php_embed_init concurrently with the persistent thread and corrupting TSRM/SAPI globals.

Fix: add a pthread_cond_t-backed boot-state gate (NEVER_STARTED / IN_PROGRESS / SUCCEEDED / FAILED) that persistent_boot transitions through. Worker/ephemeral callers wait out IN_PROGRESS before proceeding. On iOS the gate returns an error if persistent didn't succeed, because the worker/ephemeral runtimes there require the persistent runtime's global startup to piggyback on. On Android the gate only waits for the state to settle and then preserves the cold-path fallback, so a WorkManager task started in a killed process still works. The wait times out after 10s on both platforms.

Shutdown resets state to NEVER_STARTED so a subsequent boot cycle transitions cleanly.
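
The gate described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: only the four state names and the idea of a timed wait come from the description, while `boot_gate_set`, `boot_gate_wait`, the `BOOT_` prefix, and the variable names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* The four states named in the PR; the BOOT_ prefix is hypothetical. */
typedef enum {
    BOOT_NEVER_STARTED,
    BOOT_IN_PROGRESS,
    BOOT_SUCCEEDED,
    BOOT_FAILED
} boot_state_t;

static pthread_mutex_t boot_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  boot_cv = PTHREAD_COND_INITIALIZER;
static boot_state_t    boot_state = BOOT_NEVER_STARTED;

/* Called by the persistent runtime (hypothetical helper): publish a new
 * state and wake every thread that is waiting on the gate. */
static void boot_gate_set(boot_state_t next) {
    pthread_mutex_lock(&boot_mu);
    boot_state = next;
    pthread_cond_broadcast(&boot_cv);
    pthread_mutex_unlock(&boot_mu);
}

/* Called by worker/ephemeral boot (hypothetical helper): block while the
 * state is IN_PROGRESS, for at most timeout_sec seconds. Returns the state
 * seen when the wait ended; BOOT_IN_PROGRESS means the wait timed out. */
static boot_state_t boot_gate_wait(int timeout_sec) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  /* default condvar clock */
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&boot_mu);
    while (boot_state == BOOT_IN_PROGRESS) {
        /* Loop guards against spurious wakeups; stop once the deadline passes. */
        if (pthread_cond_timedwait(&boot_cv, &boot_mu, &deadline) == ETIMEDOUT)
            break;
    }
    boot_state_t seen = boot_state;
    pthread_mutex_unlock(&boot_mu);
    return seen;
}
```

In this sketch, persistent boot would call `boot_gate_set(BOOT_IN_PROGRESS)` before `php_embed_init`, then `BOOT_SUCCEEDED` or `BOOT_FAILED` afterwards, and shutdown would set `BOOT_NEVER_STARTED`. A worker would call `boot_gate_wait(10)`: an iOS caller would treat anything other than `BOOT_SUCCEEDED` as an error, while an Android caller that sees `BOOT_NEVER_STARTED` would fall through to the cold path.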

2. Laravel extraction race (Kotlin layer, Android-only)

After an APK update, WorkManager can resurrect scheduled jobs from the previous install and fire PHPSchedulerWorker before MainActivity's init thread finishes deleting and re-extracting the Laravel bundle. The worker boots an ephemeral PHP runtime via the cold path, as intended, but then invokes \Native\Mobile\Runtime::artisan() against a mid-delete vendor/ tree, producing Class "Native\Mobile\Runtime" not found.

The C gate doesn't cover this because persistent_boot_state is still NEVER_STARTED when the worker arrives — the gate correctly allows the cold path. The gap is at the Kotlin layer: extraction and ephemeral dispatch run on different threads with no shared serialization.

Fix: add a process-wide ReentrantLock around extractLaravelBundle() and have initializeForBackground() run extraction too (the existing isUpToDate check makes it idempotent):

  • Whichever entry grabs the lock first extracts; the other blocks and short-circuits on version match.
  • WorkManager cold start after an APK update is safe — the worker does the extraction itself if MainActivity isn't handling it.
  • Warm launches pay one uncontended lock acquisition plus a version-file read.
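
The lock-plus-idempotent-check pattern in the list above can be illustrated with a small sketch. The PR's actual change is in Kotlin and uses a ReentrantLock; this version is written in C purely as an illustration, with a plain (non-reentrant) pthread mutex standing in for the lock and an in-memory string standing in for the version file. All names here (`ensure_bundle_extracted`, `installed_version`, `extraction_count`) are hypothetical.

```c
#include <pthread.h>
#include <string.h>

/* One process-wide lock shared by every entry point (assumption: a plain
 * mutex; the real Kotlin code uses a ReentrantLock). */
static pthread_mutex_t extraction_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the on-disk version file written after extraction. */
static char installed_version[32] = "";

/* Counts real extractions so the short-circuit is observable. */
static int extraction_count = 0;

/* Stand-in for the isUpToDate check: does the installed bundle match? */
static int is_up_to_date(const char *bundled_version) {
    return strcmp(installed_version, bundled_version) == 0;
}

/* Called from both the foreground and the background entry point.
 * Returns 1 if this call performed the extraction, 0 if it found the
 * bundle already current and short-circuited. */
static int ensure_bundle_extracted(const char *bundled_version) {
    int extracted = 0;
    pthread_mutex_lock(&extraction_lock);
    if (!is_up_to_date(bundled_version)) {
        /* The delete and re-extract of the bundle would happen here,
         * entirely inside the lock, so no caller sees a half-deleted tree. */
        extraction_count++;
        strncpy(installed_version, bundled_version,
                sizeof installed_version - 1);
        installed_version[sizeof installed_version - 1] = '\0';
        extracted = 1;
    }
    pthread_mutex_unlock(&extraction_lock);
    return extracted;
}
```

Whichever caller takes the lock first does the work; a second caller blocks until the first finishes and then returns 0 after the version comparison, which mirrors the warm-launch cost of one lock acquisition plus one version read.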

Testing

Verified both fixes end-to-end on an emulator against the bundled kitchen-sink app:

  • Before, Android: post-update install shows PHPSchedulerWorker: Executing scheduled command: sync:data at t=+1s followed by Ephemeral artisan error: Class "Native\Mobile\Runtime" not found while bash rm is still running.
  • After, Android: same scenario, post-update install. LaravelInit extracts 13:30:52 → 13:30:56; persistent boots at 13:30:57; worker boots at 13:30:58; PHPSchedulerWorker fires at 13:30:59, takes the hot path cleanly (ephemeral_embed_init: hot path — using existing TSRM), runs sync:data and sync:data-two which each fire a LocalNotification.Show successfully.
  • iOS: clean launch. Both gates transition correctly in order — PHP-WORKER: php_embed_init SUCCESS (C state → SUCCEEDED) → PHPScheduler: runtime marked ready (Swift latch opens) → worker thread spawns without waiting. Companion PHPScheduler gate in mobile-background-tasks 0.0.3 fast-paths this when the group is already empty.

Test plan

  • Android: update-install during queued WorkManager task, verify no Class "Native\Mobile\Runtime" not found.
  • iOS: cold-launch with BGTasks registered, verify no sapi_initialize_empty_request crash in logs.
  • Warm-launch regression check: persistent boot time unchanged (~135ms Android, ~136ms iOS).
  • Soak test on a physical iOS device overnight to confirm the reported "crashes multiple times a day" symptom is resolved.
  • Run existing unit/integration tests.

Related

Companion fix in NativePHP/mobile-background-tasks (commit d7c2514) adds a Swift-side PHPScheduler.runtimeReadyGroup latch so handleTask doesn't even spawn ephemeral work until persistent boot completes. That layer is defensive / fast-path; this PR closes the window at the C runtime layer so any future plugin doing background PHP work is also safe.

🤖 Generated with Claude Code

shanerbaner82 and others added 2 commits April 24, 2026 11:44
On both iOS and Android, the worker and ephemeral runtimes piggyback on
the global tsrm_startup/sapi_startup done inside the persistent runtime's
php_embed_init(). They skip straight to ts_resource(0) and module startup,
assuming persistent has already finished its init.

A cold-launch race could violate that assumption:

* iOS: a BGTaskScheduler handler fires on a background thread while the
  main-thread persistent runtime is mid-php_embed_init. ts_resource
  returns valid TSRM storage but sapi_globals is still unallocated,
  and sapi_initialize_empty_request writes to NULL (EXC_BAD_ACCESS).

* Android: WorkManager starts an ephemeral task while persistent_boot
  is mid-php_embed_init. They hold different mutexes, so the ephemeral
  reads php_initialized==0 and takes the cold path — calling a second
  php_embed_init concurrently with the persistent thread, corrupting
  TSRM/SAPI globals.

Add a pthread_cond-backed boot-state gate (NEVER_STARTED / IN_PROGRESS /
SUCCEEDED / FAILED) that persistent_boot transitions through, and have
worker/ephemeral callers wait out IN_PROGRESS before proceeding. On iOS
the gate returns an error if persistent didn't succeed (they require it).
On Android it just waits for settlement, preserving the cold-path fallback
that WorkManager needs when the process was killed. 10s timeout on both.

Shutdown resets the state to NEVER_STARTED so a subsequent boot cycle
transitions cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Android testing of the persistent_boot gate surfaced a distinct cold-launch
race. After an APK update, WorkManager can resurrect scheduled jobs from the
previous install and fire PHPSchedulerWorker before MainActivity's init
thread finishes deleting and re-extracting the Laravel bundle. The worker
then boots an ephemeral PHP runtime (correctly, via the cold path) but
invokes \Native\Mobile\Runtime::artisan() against a mid-delete vendor/
tree, producing `Class "Native\Mobile\Runtime" not found`.

The C-level boot gate doesn't cover this because persistent_boot_state is
still NEVER_STARTED when the worker arrives — the gate correctly allows the
cold path. The gap is at the Kotlin layer: extraction and ephemeral
dispatch run on different threads with no shared serialization.

Fix: add a process-wide ReentrantLock around extractLaravelBundle() and
have initializeForBackground() run extraction too. The existing
isUpToDate check makes extraction idempotent, so:

* Whichever entry grabs the lock first extracts; the other blocks and
  then short-circuits on the version match.
* WorkManager cold start after an APK update is safe — the worker will
  do the extraction itself if MainActivity isn't handling it.
* Warm launches pay one uncontended lock acquisition plus a version-file
  read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
shanerbaner82 merged commit 53b5d2f into main Apr 24, 2026
0 of 5 checks passed
shanerbaner82 deleted the fix/persistent-boot-race-gate branch April 24, 2026 17:37
shanerbaner82 added a commit that referenced this pull request Apr 28, 2026
#102 added an extractionLock companion at the top of the class without
noticing the existing companion further down, producing two top-level
companions and a Kotlin compile error ("Only one companion object is
allowed per class") that cascaded into ~150 unresolved references.

Move extractionLock into the existing companion so the constants and
the lock share one block.

bkuhl commented Apr 28, 2026

This PR was merged while its build was failing, which caused builds on the main branch to begin failing.
