diff --git a/docs/pse51-matrix.md b/docs/pse51-matrix.md
index 3ffcc90..acf9c01 100644
--- a/docs/pse51-matrix.md
+++ b/docs/pse51-matrix.md
@@ -18,16 +18,10 @@ PSE51 interfaces fare on top of that base.
 Mazu's PSE51-oriented userspace ABI is feature-complete: every
 mandatory PSE51 syscall is wired and exercised by selftests.
 
-Two narrow gaps remain that are not blocking and have reasonable
+One narrow gap remains that is not blocking and has reasonable
 default behavior today:
 
-1. **`sigqueue` per-signal value payload**. Mazu signals are
-   level-style: a single bit per signal. The wait-for-signal API
-   (`SYS_SIGSUSPEND`, `SYS_SIGTIMEDWAIT`) is wired and `sysconf`
-   reports `_POSIX_REALTIME_SIGNALS = 1` (subset), but
-   `sigqueue` with a payload value is not implemented. Closing
-   this gap requires a bounded per-signal queue subsystem.
-2. **`pthread_attr_*` libc family**. Strictly a libc-side
+1. **`pthread_attr_*` libc family**. Strictly a libc-side
    concern. The kernel ABI accepts the resolved (entry, arg,
    prio) tuple and exposes per-thread `setschedparam` /
    `getschedparam`; once a PSE51 libc lands it can synthesize
@@ -59,7 +53,8 @@ libc attr family are out of scope for the kernel layer.
 ## What Mazu ships today (PSE51-relevant)
 
 The following PSE51 services are present and exercised by selftests
-(`tests/tests-pse51.c`, `tests/tests-syscall.c`,
+(`tests/tests-pse51.c` as the consolidated profile suite, with
+subsystem regression detail in `tests/tests-syscall.c`,
 `tests/tests-mqueue.c`, `tests/tests-posix_timer.c`,
 `tests/tests-rwlock.c`, `tests/tests-barrier.c`,
 `tests/tests-condvar.c`, `tests/tests-semaphore.c`,
@@ -144,8 +139,8 @@ sync handle table (`kernel/sync/sync_handle.c`).
 | `pthread_sigmask` | `SYS_PTHREAD_SIGMASK` | implemented | Same wire shape as `SYS_SIGPROCMASK`; both operate on the calling thread's `td_sig.blocked`. Distinct syscall numbers so libc can keep `pthread_sigmask` and `sigprocmask` as separate ABI surfaces. |
 | `pthread_kill` | `SYS_PTHREAD_KILL` | implemented | Thread-directed signal: bit lands on the named thread's `td_sig.pending` rather than the per-proc `proc_pending` mask. Takes a `CAP_TYPE_THREAD` handle. SIGKILL rejected with EINVAL (must be process-wide). |
 | `sigsuspend` | `SYS_SIGSUSPEND` | implemented | Replace blocked mask with the supplied set, yield-loop until a deliverable signal arrives, restore prior mask, return EINTR. |
-| `sigtimedwait` / `sigwait` / `sigwaitinfo` | `SYS_SIGTIMEDWAIT` | implemented-with-mazu-abi | Block until any signal in the supplied set is pending; dequeue without invoking the handler; return signo. Honors `struct timespec *` timeout (NULL = wait forever; expired = EAGAIN). |
-| `sigqueue` value delivery | (none) | stubbed | Mazu signals are level-style: a single bit per signal in `pending`, no per-signal value queue. The wait API set above advertises `_POSIX_REALTIME_SIGNALS = 1` (subset) but `sigqueue` with a payload value requires an additional bounded queue subsystem. |
+| `sigtimedwait` / `sigwait` / `sigwaitinfo` | `SYS_SIGTIMEDWAIT` | implemented-with-mazu-abi | Block until any signal in the supplied set is pending; dequeue without invoking the handler; return signo. Honors `struct timespec *` timeout (NULL = wait forever; expired = EAGAIN). The Mazu ABI also accepts an optional payload-out pointer in `a3`; when the dequeued instance carries a queued `sigqueue` value, it is written there, and plain `kill`-style instances leave it unmodified. |
+| `sigqueue` value delivery | `SYS_SIGQUEUE` | implemented-with-mazu-abi | Process-directed queued values use a bounded per-signal ring (`SIGQUEUE_MAX_PER_SIGNO` entries per signo, with one extra internal slot reserved so a single in-flight `SYS_SIGTIMEDWAIT` consumer can losslessly roll back a dequeued payload if `copy_to_user` faults after the lock was dropped). Lossless rollback is guaranteed for the single-consumer case; if multiple threads simultaneously fault their rollbacks for the same signo, the helper drops one payload as defense-in-depth and surfaces a plain pending instance so the signal stays observable. Plain `kill` remains level-style and is tracked on a separate `proc_pending_plain` mask so it cannot be silently swallowed by a queued instance of the same signo. Queued payloads are observable via `SYS_SIGTIMEDWAIT` and friends, while one-argument signal handlers still receive only signo. |
 | `sigprocmask` (single-threaded) | `SYS_SIGPROCMASK` | implemented | Modify-and-return-old of the calling thread's `td_sig.blocked` under `sig_lock`. SIGKILL cannot be blocked. The mask migrated from per-process to per-thread when `SYS_PTHREAD_SIGMASK` landed; both syscalls now share the same backing field with distinct wire shapes so libc can keep them as separate ABI surfaces. |
 | `raise` | (libc) | not-applicable | Library-level wrapper for `kill(getpid(), sig)`; covered by `SYS_KILL`. |
 
@@ -249,7 +244,7 @@ feature-test value when implemented, or `-1` when absent):
 | `_SC_THREAD_PRIORITY_INHERIT` | `_POSIX_THREAD_PRIO_INHERIT` (200809L) | PI mutex is the only mutex flavor. |
 | `_SC_MESSAGE_PASSING` | `_POSIX_MESSAGE_PASSING` (1) | Anonymous queues only. |
 | `_SC_SPIN_LOCKS` | -1 | No userspace `pthread_spin_*` surface today, so the macro is intentionally not defined. |
-| `_SC_REALTIME_SIGNALS` | `_POSIX_REALTIME_SIGNALS` (1) | Wait-for-signal API present (sigsuspend / sigtimedwait); per-signal value queue (sigqueue) not implemented. |
+| `_SC_REALTIME_SIGNALS` | `_POSIX_REALTIME_SIGNALS` (1) | Wait-for-signal API is present. Bounded `sigqueue` payload delivery exists, but via a Mazu-specific extension rather than the full POSIX `siginfo_t` / `SA_SIGINFO` contract. |
 | `_SC_THREADS` | `_POSIX_THREADS` (1) | `SYS_THREAD_*` present; `PROC_THREAD_MAX = 4`. |
 | `_SC_THREAD_CPUTIME` | `_POSIX_THREAD_CPUTIME` (200809L) | `clock_gettime(CLOCK_THREAD_CPUTIME_ID, ...)` measures the calling thread's accumulated CPU time. |
 | `_SC_CPUTIME` | `_POSIX_CPUTIME` (200809L) | `clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ...)` returns the sum across all live threads in the calling process. |
@@ -277,7 +272,6 @@ The bounded multi-threaded process model is in place: per-thread
 state migration (signal pending/blocked, signal-frame chain, robust
 futex list, errno TLS) and the user-visible pthread surface
 (`SYS_THREAD_CREATE` and friends) have both landed, with
-`PROC_THREAD_MAX = 4`. The two remaining gaps are the `sigqueue` payload
-queue (requires a bounded per-signal queue subsystem) and the
-`pthread_attr_*` libc family (strictly a libc-side concern; the
-kernel ABI already accepts the resolved (entry, arg, prio) tuple).
+`PROC_THREAD_MAX = 4`. The remaining gap is the `pthread_attr_*`
+libc family (strictly a libc-side concern; the kernel ABI already
+accepts the resolved (entry, arg, prio) tuple).
diff --git a/include/mazu/proc.h b/include/mazu/proc.h
index 1568227..cda139c 100644
--- a/include/mazu/proc.h
+++ b/include/mazu/proc.h
@@ -22,6 +22,16 @@
 
 /* PSE51 signal state.  31 signals (1-31) in a 32-bit bitmask. */
 #define SIG_MAX 32
+#define SIGQUEUE_MAX_PER_SIGNO 4
+
+/* Internal ring capacity is one greater than the user-visible cap so that a
+ * sigtimedwait consumer that has already dequeued a payload can always put
+ * it back if copy_to_user faults after the lock was dropped, even if a
+ * concurrent sigqueue producer filled the slot we vacated. Producers still
+ * cap at SIGQUEUE_MAX_PER_SIGNO, so EAGAIN behavior is unchanged for user
+ * space.
+ */
+#define SIGQUEUE_RING_CAP (SIGQUEUE_MAX_PER_SIGNO + 1)
 typedef void (*sig_handler_fn_t)(i32);
 struct sigaction_entry {
     sig_handler_fn_t handler;
@@ -29,16 +39,33 @@ struct sigaction_entry {
     u32 sa_flags;
 };
 
+struct signal_value_queue {
+    u64 values[SIGQUEUE_RING_CAP];
+    u8 head;
+    u8 tail;
+    u8 count;
+};
+
 /* Per-process signal state. The blocked mask and signal-frame chain live
  * per-thread (struct sched_task::td_sig); the disposition table stays
- * per-process per POSIX. proc_pending holds process-directed signals that have
- * not yet been claimed by any specific thread; the return-to-user delivery
- * path folds it into each thread's local pending view, preserving the bit even
- * if the thread that the sender first observed has since exited.
+ * per-process per POSIX.
+ *
+ * Process-directed pending state has two distinct sources that must not be
+ * conflated:
+ *   - proc_pending_plain: kill()-style instances (no queued payload). One bit
+ *     per signo records "at least one plain instance is in flight".
+ *   - queued[signo].count: sigqueue()-style payload instances, FIFO.
+ * The summary mask proc_pending is the OR of the two and is what the lockless
+ * return-to-user fast path reads. Writers under sig_lock keep it in sync.
+ * Consumers (signal_claim_proc_pending_locked, signal_deliver) take exactly
+ * one source at a time so a plain pending instance cannot be silently dropped
+ * when a queued instance for the same signo is consumed first.
  */
 struct signal_state {
     struct sigaction_entry actions[SIG_MAX];
     u32 proc_pending;
+    u32 proc_pending_plain;
+    struct signal_value_queue queued[SIG_MAX];
 };
 
 #define PROC_MAX 16
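
The two-source pending model described in the `struct signal_state` comment above can be sketched in isolation. This is a minimal user-space illustration under stated assumptions, not the kernel code: `struct demo_pending` and the `demo_*` functions are hypothetical names, locking and the atomic summary update are omitted, and the per-signo queue is reduced to an unbounded counter with no payloads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SIG_MAX 32

/* Hypothetical stand-in for struct signal_state; not the kernel type. */
struct demo_pending {
    uint32_t summary;             /* OR of both sources, one bit per signo */
    uint32_t plain;               /* kill()-style: at most one bit per signo */
    uint8_t queued[DEMO_SIG_MAX]; /* sigqueue()-style instance counts */
};

static uint32_t demo_bit(int signo)
{
    return UINT32_C(1) << signo;
}

/* Recompute the summary bit for signo from the two underlying sources. */
static void demo_refresh(struct demo_pending *s, int signo)
{
    if ((s->plain & demo_bit(signo)) || s->queued[signo] > 0)
        s->summary |= demo_bit(signo);
    else
        s->summary &= ~demo_bit(signo);
}

static void demo_kill(struct demo_pending *s, int signo)
{
    assert(signo > 0 && signo < DEMO_SIG_MAX);
    s->plain |= demo_bit(signo);
    demo_refresh(s, signo);
}

static void demo_sigqueue(struct demo_pending *s, int signo)
{
    assert(signo > 0 && signo < DEMO_SIG_MAX);
    s->queued[signo]++; /* the real queue is bounded; omitted here */
    demo_refresh(s, signo);
}

/* Consume exactly one instance: a queued one if present, else the plain bit.
 * Returns false if nothing is pending; *had_value reports which source. */
static bool demo_claim(struct demo_pending *s, int signo, bool *had_value)
{
    if (!(s->summary & demo_bit(signo)))
        return false;
    *had_value = s->queued[signo] > 0;
    if (*had_value)
        s->queued[signo]--;
    else
        s->plain &= ~demo_bit(signo);
    demo_refresh(s, signo);
    return true;
}
```

With one `demo_kill` and one `demo_sigqueue` of the same signo outstanding, two claims are needed to clear the summary bit. That is the property the separate plain mask exists to preserve.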
diff --git a/include/mazu/syscall.h b/include/mazu/syscall.h
index 51d0926..6457bfb 100644
--- a/include/mazu/syscall.h
+++ b/include/mazu/syscall.h
@@ -188,7 +188,13 @@
 #define SYS_CAP_REVOKE_DELEGATE 98
 #define SYS_CAP_GET_TOKEN 99
 
-#define SYS_NR 100 /* total number of syscalls */
+/* PSE51 sigqueue(): queued process-directed signal with payload. Appended at
+ * the end of the syscall table so the rest of the numbering stays stable across
+ * this branch.
+ */
+#define SYS_SIGQUEUE 100
+
+#define SYS_NR 101 /* total number of syscalls */
 
 /* pthread_setcancelstate state values. */
 #define PTHREAD_CANCEL_ENABLE 0
diff --git a/include/mazu/sysconf.h b/include/mazu/sysconf.h
index 86a90e2..195dd39 100644
--- a/include/mazu/sysconf.h
+++ b/include/mazu/sysconf.h
@@ -36,9 +36,10 @@
 #define _POSIX_THREAD_CPUTIME 200809L
 #define _POSIX_THREADS 1 /* SYS_THREAD_*; PROC_THREAD_MAX = 4 */
 /* _POSIX_REALTIME_SIGNALS reports the wait-for-signal API set
- * (sigsuspend, sigtimedwait, sigwait, sigwaitinfo). Mazu does not
- * yet implement the per-signal value queue (sigqueue), so this
- * advertises the subset value 1 rather than 200809L.
+ * (sigsuspend, sigtimedwait, sigwait, sigwaitinfo). Mazu also has a
+ * bounded sigqueue-style payload path, but it is exposed through a
+ * Mazu-specific ABI extension rather than the full POSIX siginfo /
+ * SA_SIGINFO surface, so this remains the subset value 1.
  */
 #define _POSIX_REALTIME_SIGNALS 1
 /* _POSIX_SPIN_LOCKS is intentionally not defined: there is no
diff --git a/kernel/proc/signal.c b/kernel/proc/signal.c
index 2de6479..6d09c80 100644
--- a/kernel/proc/signal.c
+++ b/kernel/proc/signal.c
@@ -2,7 +2,8 @@
 /* Signal delivery for PSE51.
  *
  * Signal delivery model:
- * - Signals are pending bits, not queued (standard POSIX behavior).
+ * - Plain kill-style signals are pending bits.
+ * - sigqueue-style process-directed payloads use a bounded per-signo queue.
  * - Delivery happens on trap exit (return-to-user path).
  * - Handler invocation: save current trap frame on the user stack,
  *   set up execution to jump to the handler, sys_sigreturn restores.
@@ -25,10 +26,15 @@ void signal_init(struct signal_state *ss)
      * task creation in sched_create_user_task; this initializer is
      * for the per-process disposition table only.
      */
+    ss->proc_pending = 0;
+    ss->proc_pending_plain = 0;
     for (i32 i = 0; i < SIG_MAX; i++) {
         ss->actions[i].handler = SIG_DFL;
         ss->actions[i].sa_mask = 0;
         ss->actions[i].sa_flags = 0;
+        ss->queued[i].head = 0;
+        ss->queued[i].tail = 0;
+        ss->queued[i].count = 0;
     }
 }
 
@@ -37,6 +43,74 @@ static inline bool sig_valid(i32 signo)
     return signo > 0 && signo < SIG_MAX;
 }
 
+static bool signal_value_queue_push_locked(struct signal_state *ss,
+                                           i32 signo,
+                                           u64 value)
+{
+    struct signal_value_queue *q = &ss->queued[signo];
+
+    /* Producers cap at the user-visible limit; the +1 internal slot is
+     * reserved for the rollback-after-fault path below.
+     */
+    if (q->count >= SIGQUEUE_MAX_PER_SIGNO)
+        return false;
+    q->values[q->tail] = value;
+    q->tail = (u8) ((q->tail + 1) % SIGQUEUE_RING_CAP);
+    q->count++;
+    return true;
+}
+
+static bool signal_value_queue_pop_locked(struct signal_state *ss,
+                                          i32 signo,
+                                          u64 *out_value)
+{
+    struct signal_value_queue *q = &ss->queued[signo];
+
+    if (q->count == 0)
+        return false;
+    if (out_value)
+        *out_value = q->values[q->head];
+    q->head = (u8) ((q->head + 1) % SIGQUEUE_RING_CAP);
+    q->count--;
+    return true;
+}
+
+/* Push a payload back at the queue head (LIFO insert used only to undo a
+ * previous pop). Always succeeds for a single in-flight consumer because the
+ * ring is sized SIGQUEUE_MAX_PER_SIGNO + 1 (the producer cap leaves one slot
+ * unused for exactly this case). Returns false only on the pathological case
+ * where multiple consumers race their rollbacks past the reserved slot.
+ */
+static bool signal_value_queue_push_head_locked(struct signal_state *ss,
+                                                i32 signo,
+                                                u64 value)
+{
+    struct signal_value_queue *q = &ss->queued[signo];
+
+    if (q->count >= SIGQUEUE_RING_CAP)
+        return false;
+    q->head = (u8) ((q->head + SIGQUEUE_RING_CAP - 1) % SIGQUEUE_RING_CAP);
+    q->values[q->head] = value;
+    q->count++;
+    return true;
+}
+
+/* Refresh the summary proc_pending bit for signo from the underlying state.
+ * Caller must hold p->sig_lock. The atomic ensures the lockless fast-path
+ * reader (signal_has_deliverable) sees a coherent value.
+ */
+static inline void sig_refresh_proc_pending_locked(struct signal_state *ss,
+                                                   i32 signo)
+{
+    bool any = (ss->proc_pending_plain & sig_bit(signo)) ||
+               ss->queued[signo].count > 0;
+    if (any)
+        __atomic_or_fetch(&ss->proc_pending, sig_bit(signo), __ATOMIC_RELAXED);
+    else
+        __atomic_and_fetch(&ss->proc_pending, ~sig_bit(signo),
+                           __ATOMIC_RELAXED);
+}
+
 static inline bool signal_restore_tf_valid(struct proc *p,
                                            const struct trap_frame *tf)
 {
@@ -64,7 +138,7 @@ static inline void signal_restore_tf(struct trap_frame *dst,
  * a newly posted signal at the earliest opportunity.
  *
  * TD_STATE_SLEEPING (nanosleep): sched_wake_sleeping cancels the sleep
- * callout, removes from sleep_list, and enqueues as READY — all under
+ * callout, removes from sleep_list, and enqueues as READY, all under
  * sched_lock, which serializes against the normal sleep callout wake.
  *
  * TD_STATE_BLOCKED / TD_STATE_SEM_WAIT (sync primitives): we cannot
@@ -167,8 +241,8 @@ i32 signal_send(struct proc *p, i32 signo)
     u64 tflags = proc_table_lock_irqsave();
     if (p->state != PROC_STATE_FREE && p->state != PROC_STATE_ZOMBIE) {
         u64 sflags = proc_sig_lock_irqsave(p);
-        __atomic_or_fetch(&p->sig_state.proc_pending, sig_bit(signo),
-                          __ATOMIC_RELAXED);
+        p->sig_state.proc_pending_plain |= sig_bit(signo);
+        sig_refresh_proc_pending_locked(&p->sig_state, signo);
         proc_sig_unlock_irqrestore(p, sflags);
         delivered = true;
         if (signo == SIGKILL)
@@ -191,6 +265,97 @@ i32 signal_send(struct proc *p, i32 signo)
     return 0;
 }
 
+i32 signal_queue_send(struct proc *p, i32 signo, u64 value)
+{
+    if (!p || !sig_valid(signo))
+        return -(i32) EINVAL;
+
+    bool need_kill = false;
+    bool delivered = false;
+    i32 rc = 0;
+
+    u64 tflags = proc_table_lock_irqsave();
+    if (p->state != PROC_STATE_FREE && p->state != PROC_STATE_ZOMBIE) {
+        u64 sflags = proc_sig_lock_irqsave(p);
+        if (!signal_value_queue_push_locked(&p->sig_state, signo, value))
+            rc = -(i32) EAGAIN;
+        else {
+            sig_refresh_proc_pending_locked(&p->sig_state, signo);
+            delivered = true;
+            if (signo == SIGKILL)
+                need_kill = true;
+            else
+                signal_interrupt_task(signal_pick_wake_target_locked(p, signo));
+        }
+        proc_sig_unlock_irqrestore(p, sflags);
+    }
+    proc_table_unlock_irqrestore(tflags);
+
+    if (rc < 0)
+        return rc;
+    if (!delivered)
+        return 0;
+
+    if (need_kill) {
+        proc_exit(p, -SIGKILL);
+        return 0;
+    }
+    return 0;
+}
+
+bool signal_claim_proc_pending_locked(struct proc *p,
+                                      i32 signo,
+                                      u64 *out_value,
+                                      bool *out_has_value)
+{
+    if (!p || !sig_valid(signo) ||
+        (p->sig_state.proc_pending & sig_bit(signo)) == 0)
+        return false;
+
+    /* Prefer a queued sigqueue payload (FIFO). If none is queued but a plain
+     * kill-style instance is still pending, consume that instead. Each call
+     * consumes exactly one source so kill() and sigqueue() of the same signo
+     * can coexist without one silently swallowing the other.
+     */
+    bool had_value =
+        signal_value_queue_pop_locked(&p->sig_state, signo, out_value);
+    if (!had_value && (p->sig_state.proc_pending_plain & sig_bit(signo))) {
+        p->sig_state.proc_pending_plain &= ~sig_bit(signo);
+    }
+    if (out_has_value)
+        *out_has_value = had_value;
+
+    sig_refresh_proc_pending_locked(&p->sig_state, signo);
+    return true;
+}
+
+bool signal_restore_proc_pending_locked(struct proc *p,
+                                        i32 signo,
+                                        u64 value,
+                                        bool had_value)
+{
+    if (!p || !sig_valid(signo))
+        return false;
+
+    bool payload_dropped = false;
+    if (had_value) {
+        if (!signal_value_queue_push_head_locked(&p->sig_state, signo, value)) {
+            /* Even the reserved rollback slot is taken. Producers cap at
+             * SIGQUEUE_MAX_PER_SIGNO, so this is only reachable when
+             * multiple consumers race their rollbacks for the same signo.
+             * The exact payload is lost; surface a plain pending instance
+             * so the signal stays observable and the receiver retries.
+             */
+            p->sig_state.proc_pending_plain |= sig_bit(signo);
+            payload_dropped = true;
+        }
+    } else {
+        p->sig_state.proc_pending_plain |= sig_bit(signo);
+    }
+    sig_refresh_proc_pending_locked(&p->sig_state, signo);
+    return payload_dropped;
+}
+
 /* Saved signal context pushed onto the user stack. */
 struct signal_frame {
     u32 magic;
@@ -260,7 +425,7 @@ bool signal_deliver(struct sched_task *td, struct trap_frame *tf)
     if (thread_pending & sig_bit(signo))
         td->td_sig.pending &= ~sig_bit(signo);
     else
-        p->sig_state.proc_pending &= ~sig_bit(signo);
+        (void) signal_claim_proc_pending_locked(p, signo, NULL, NULL);
     sig_handler_fn_t handler = p->sig_state.actions[signo].handler;
 
     if (handler == SIG_IGN) {
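
The reserved-slot ring used by `signal_value_queue_push_locked`, `signal_value_queue_pop_locked`, and `signal_value_queue_push_head_locked` can be exercised outside the kernel. The sketch below mirrors the index arithmetic from the hunk above, but the `demo_ring_*` names and `DEMO_*` constants are illustrative stand-ins, and no locking is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_PER_SIGNO 4                  /* producer-visible cap */
#define DEMO_RING_CAP (DEMO_MAX_PER_SIGNO + 1) /* +1 slot for rollback */

struct demo_ring {
    uint64_t values[DEMO_RING_CAP];
    uint8_t head;  /* index of the oldest entry */
    uint8_t tail;  /* index where the next push lands */
    uint8_t count; /* number of live entries */
};

/* Producer path: refuses once the user-visible cap is reached, so the
 * extra slot is never consumed by a normal push. */
static bool demo_ring_push(struct demo_ring *q, uint64_t value)
{
    if (q->count >= DEMO_MAX_PER_SIGNO)
        return false;
    q->values[q->tail] = value;
    q->tail = (uint8_t) ((q->tail + 1) % DEMO_RING_CAP);
    q->count++;
    return true;
}

/* Consumer path: FIFO pop; out may be NULL to discard the payload. */
static bool demo_ring_pop(struct demo_ring *q, uint64_t *out)
{
    if (q->count == 0)
        return false;
    if (out)
        *out = q->values[q->head];
    q->head = (uint8_t) ((q->head + 1) % DEMO_RING_CAP);
    q->count--;
    return true;
}

/* Rollback path: re-insert at the head, allowed up to the full ring size. */
static bool demo_ring_push_head(struct demo_ring *q, uint64_t value)
{
    assert(q->count <= DEMO_RING_CAP);
    if (q->count >= DEMO_RING_CAP)
        return false;
    q->head = (uint8_t) ((q->head + DEMO_RING_CAP - 1) % DEMO_RING_CAP);
    q->values[q->head] = value;
    q->count++;
    return true;
}
```

The scenario the extra slot covers is: fill the ring to the producer cap, pop one entry, let a producer refill the vacated slot, then roll the popped entry back. The rollback still succeeds and FIFO order is preserved; only a second rollback without an intervening pop fails.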
diff --git a/kernel/proc/signal.h b/kernel/proc/signal.h
index 4842236..9dd7a5e 100644
--- a/kernel/proc/signal.h
+++ b/kernel/proc/signal.h
@@ -56,6 +56,41 @@ void signal_init(struct signal_state *ss);
  */
 i32 signal_send(struct proc *p, i32 signo);
 
+/* Enqueue a process-directed signal with a queued payload value.
+ * Repeated deliveries of the same signo are preserved up to the
+ * bounded per-signal queue depth. Returns -EAGAIN if that queue is
+ * full.
+ */
+i32 signal_queue_send(struct proc *p, i32 signo, u64 value);
+
+/* Claim one process-directed pending instance of signo. Caller must
+ * hold p->sig_lock. Queued sigqueue payloads are preferred (FIFO); when
+ * one is dequeued, *out_value receives the payload and *out_has_value is
+ * set true. When no payload is queued but a plain kill-style instance is
+ * pending, that instance is consumed and *out_has_value is set false.
+ * The summary proc_pending bit is updated to reflect any remaining source.
+ * Returns false if no instance of signo is pending.
+ */
+bool signal_claim_proc_pending_locked(struct proc *p,
+                                      i32 signo,
+                                      u64 *out_value,
+                                      bool *out_has_value);
+
+/* Re-insert a previously-claimed process-directed instance after a partial
+ * delivery failure (e.g. sigtimedwait copy_to_user fault after the signal
+ * was already dequeued). Caller must hold p->sig_lock. If had_value is true,
+ * the payload is pushed back at the queue head; if even the reserved
+ * rollback slot is taken (multiple consumers rolling back the same signo),
+ * the payload is dropped and a plain pending instance is recorded so the
+ * signal itself stays observable. If had_value is false, a plain pending
+ * instance is restored. The summary proc_pending bit is refreshed.
+ * Returns true if a payload had to be dropped because the queue was full.
+ */
+bool signal_restore_proc_pending_locked(struct proc *p,
+                                        i32 signo,
+                                        u64 value,
+                                        bool had_value);
+
 /* Return true if this specific thread can take signo immediately. */
 static inline bool signal_thread_can_deliver(const struct sched_task *td,
                                              i32 signo)
diff --git a/kernel/proc/syscall.c b/kernel/proc/syscall.c
index 101ce2a..355c1e6 100644
--- a/kernel/proc/syscall.c
+++ b/kernel/proc/syscall.c
@@ -2216,6 +2216,22 @@ static i64 sys_kill_h(struct trap_frame *tf, struct sched_task *td __unused)
     return (i64) signal_send(target, signo);
 }
 
+static i64 sys_sigqueue_h(struct trap_frame *tf, struct sched_task *td __unused)
+{
+    u16 pid = (u16) tf->a0;
+    i32 signo = (i32) tf->a1;
+    u64 value = tf->a2;
+
+    struct proc *target = proc_find(pid);
+    if (!target)
+        return -(i64) ESRCH;
+
+    if (signo == 0)
+        return 0;
+
+    return (i64) signal_queue_send(target, signo, value);
+}
+
 /* Forward declaration; defined alongside the other thread syscalls
  * later in the file.
  */
@@ -2340,14 +2356,17 @@ static i64 sys_sigsuspend_h(struct trap_frame *tf, struct sched_task *td)
     return -(i64) EINTR;
 }
 
-/* sigtimedwait(set, info, timeout):
+/* sigtimedwait(set, info, timeout, value_out):
  *   block until any signal in *set becomes pending; on success,
  *   atomically dequeue that signal (without invoking its handler) and
- *   write its number to *info (just the signo, since Mazu signals
- *   carry no siginfo payload). Returns the signal number on success,
- *   or -EAGAIN on timeout, -EINTR if a non-set signal arrives.
+ *   write its number to *info. If a queued sigqueue payload exists for
+ *   that signo, the kernel writes it to *value_out. Plain kill-style
+ *   signals report no payload and leave *value_out unmodified.
+ *   Returns the signal number on success, or -EAGAIN on timeout,
+ *   -EINTR if a non-set signal arrives.
  *   a0 = user *u32 set, a1 = user *i32 signo_out (NULL ok),
- *   a2 = user *struct timespec timeout (NULL = wait forever).
+ *   a2 = user *struct timespec timeout (NULL = wait forever),
+ *   a3 = user *u64 value_out (NULL ok).
  */
 static i64 sys_sigtimedwait_h(struct trap_frame *tf, struct sched_task *td)
 {
@@ -2356,6 +2375,7 @@ static i64 sys_sigtimedwait_h(struct trap_frame *tf, struct sched_task *td)
     ptr u_set = (ptr) tf->a0;
     ptr u_signo_out = (ptr) tf->a1;
     ptr u_timeout = (ptr) tf->a2;
+    ptr u_value_out = (ptr) tf->a3;
 
     if (!u_set)
         return -(i64) EFAULT;
@@ -2375,6 +2395,8 @@ static i64 sys_sigtimedwait_h(struct trap_frame *tf, struct sched_task *td)
      */
     if (u_signo_out && !user_addr_writable(u_signo_out, sizeof(i32)))
         return -(i64) EFAULT;
+    if (u_value_out && !user_addr_writable(u_value_out, sizeof(u64)))
+        return -(i64) EFAULT;
 
     /* Compute a monotonic deadline once. NULL timeout = no deadline.
      * Bound tv_sec so the (tv_sec * freq) and (now + add_ticks)
@@ -2428,6 +2450,9 @@ static i64 sys_sigtimedwait_h(struct trap_frame *tf, struct sched_task *td)
 
     i64 result = 0;
     i32 dequeued_signo = 0;
+    u64 dequeued_value = 0;
+    bool dequeued_has_value = false;
+    bool dequeued_from_proc = false;
     for (;;) {
         sflags = proc_sig_lock_irqsave(p);
         u32 thread_pending = td->td_sig.pending;
@@ -2443,8 +2468,13 @@ static i64 sys_sigtimedwait_h(struct trap_frame *tf, struct sched_task *td)
             }
             if (thread_pending & sig_bit(signo))
                 td->td_sig.pending &= ~sig_bit(signo);
-            else
-                p->sig_state.proc_pending &= ~sig_bit(signo);
+            else if (!signal_claim_proc_pending_locked(
+                         p, signo, &dequeued_value, &dequeued_has_value)) {
+                td->td_sig.sigwait_set = 0;
+                proc_sig_unlock_irqrestore(p, sflags);
+                continue;
+            } else
+                dequeued_from_proc = true;
             td->td_sig.sigwait_set = 0;
             proc_sig_unlock_irqrestore(p, sflags);
             dequeued_signo = signo;
@@ -2486,19 +2516,37 @@ static i64 sys_sigtimedwait_h(struct trap_frame *tf, struct sched_task *td)
 
     /* Write the signo out-of-line. u_signo_out was pre-validated so
      * a fault here is unlikely; if it does fault (concurrent munmap
-     * after validation), re-queue the bit so the caller can retry.
+     * after validation), restore the pending instance so the caller can
+     * retry. signal_restore_proc_pending_locked re-checks ring capacity
+     * under sig_lock: a concurrent sigqueue may have refilled the vacated
+     * slot, in which case the reserved rollback slot absorbs the payload.
      */
     if (u_signo_out) {
         i64 cprc =
             copy_to_user(u_signo_out, &dequeued_signo, sizeof(dequeued_signo));
         if (cprc < 0) {
             sflags = proc_sig_lock_irqsave(p);
-            /* Restore on the per-thread mask (a thread-directed
-             * deliver was either consumed here or was process-wide
-             * and is best preserved per-thread to avoid a feedback
-             * loop with signal_pick_wake_target).
-             */
-            td->td_sig.pending |= sig_bit(dequeued_signo);
+            if (dequeued_from_proc) {
+                (void) signal_restore_proc_pending_locked(
+                    p, dequeued_signo, dequeued_value, dequeued_has_value);
+            } else {
+                td->td_sig.pending |= sig_bit(dequeued_signo);
+            }
+            proc_sig_unlock_irqrestore(p, sflags);
+            return cprc;
+        }
+    }
+    if (u_value_out && dequeued_has_value) {
+        i64 cprc =
+            copy_to_user(u_value_out, &dequeued_value, sizeof(dequeued_value));
+        if (cprc < 0) {
+            sflags = proc_sig_lock_irqsave(p);
+            if (dequeued_from_proc) {
+                (void) signal_restore_proc_pending_locked(
+                    p, dequeued_signo, dequeued_value, dequeued_has_value);
+            } else {
+                td->td_sig.pending |= sig_bit(dequeued_signo);
+            }
             proc_sig_unlock_irqrestore(p, sflags);
             return cprc;
         }
@@ -3309,6 +3357,7 @@ static const struct syscall_entry syscall_table[SYS_NR] = {
     [SYS_PTHREAD_SIGMASK] = {sys_pthread_sigmask_h, SYSCALL_F_NEEDS_PROC},
     [SYS_SIGSUSPEND] = {sys_sigsuspend_h, SYSCALL_F_NEEDS_PROC},
     [SYS_SIGTIMEDWAIT] = {sys_sigtimedwait_h, SYSCALL_F_NEEDS_PROC},
+    [SYS_SIGQUEUE] = {sys_sigqueue_h, 0},
     [SYS_THREAD_CANCEL] = {sys_thread_cancel_h, SYSCALL_F_NEEDS_PROC},
     [SYS_THREAD_SETCANCELSTATE] = {sys_thread_setcancelstate_h,
                                    SYSCALL_F_NEEDS_PROC},
diff --git a/tests/tests-proc-helpers.h b/tests/tests-proc-helpers.h
new file mode 100644
index 0000000..b2222bd
--- /dev/null
+++ b/tests/tests-proc-helpers.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: MIT */
+/* Shared proc/task helpers for syscall-facing selftests.
+ *
+ * These helpers intentionally live in tests/ because they allocate mock
+ * sched_task instances and wire them into the proc/cap tables without
+ * going through the full user-thread creation path. That is appropriate
+ * for syscall ABI validation where we need a stable proc + current-task
+ * context but not a live scheduled peer thread.
+ */
+
+#ifndef TESTS_PROC_HELPERS_H
+#define TESTS_PROC_HELPERS_H
+
+#include <mazu/cap.h>
+#include <mazu/kvalloc.h>
+#include <mazu/proc.h>
+
+static inline struct proc *alloc_running_proc(void)
+{
+    struct proc *p = proc_alloc();
+
+    if (p)
+        proc_set_state(p, PROC_STATE_RUNNING);
+    return p;
+}
+
+static inline struct sched_task *alloc_mock_task(void)
+{
+    struct option_byte_array td_mem =
+        kvalloc_alloc(sizeof(struct sched_task), alignof(struct sched_task));
+
+    if (td_mem.is_none)
+        return NULL;
+    struct sched_task *td = byte_array_ptr(option_byte_array_checked(td_mem));
+
+    memset(td, 0, sizeof(*td));
+    td->td_cap_slot = -1;
+    return td;
+}
+
+static inline void free_mock_task(struct sched_task *td)
+{
+    kvalloc_free(byte_array_new((byte *) td, sizeof(*td)));
+}
+
+static inline i32 syscall_test_thread_cap_slot(u8 task_slot)
+{
+    return CAP_SPACE_SLOTS - PROC_THREAD_MAX + (i32) task_slot;
+}
+
+static inline i64 syscall_test_thread_token(struct proc *p,
+                                            struct sched_task *td)
+{
+    assert(p);
+    assert(td);
+    assert(td->td_cap_slot >= 0);
+    return cap_get_token(p, td->td_cap_slot, CAP_TYPE_THREAD);
+}
+
+static inline bool alloc_proc_and_task(struct proc **out_p,
+                                       struct sched_task **out_td)
+{
+    struct proc *p = alloc_running_proc();
+
+    if (!p)
+        return false;
+    struct sched_task *td = alloc_mock_task();
+    if (!td) {
+        proc_set_state(p, PROC_STATE_ZOMBIE);
+        proc_free(p);
+        return false;
+    }
+    td->proc = p;
+    {
+        u64 pflags = proc_table_lock_irqsave();
+        bool ok = proc_attach_task(p, td);
+        proc_table_unlock_irqrestore(pflags);
+        if (!ok) {
+            free_mock_task(td);
+            proc_set_state(p, PROC_STATE_ZOMBIE);
+            proc_free(p);
+            return false;
+        }
+    }
+    u8 thread_slot = proc_task_slot(p, td);
+    i32 thread_handle = cap_open_handle(
+        p, thread_slot, CAP_TYPE_THREAD, CAP_RIGHT_READ | CAP_RIGHT_WRITE,
+        syscall_test_thread_cap_slot(thread_slot), true);
+    if (thread_handle < 0) {
+        u64 pflags = proc_table_lock_irqsave();
+        (void) proc_reap_exited_thread_locked(p, td);
+        proc_table_unlock_irqrestore(pflags);
+        free_mock_task(td);
+        proc_set_state(p, PROC_STATE_ZOMBIE);
+        proc_free(p);
+        return false;
+    }
+    td->td_cap_slot = (i16) thread_handle;
+    *out_p = p;
+    *out_td = td;
+    return true;
+}
+
+static inline void free_proc_and_task(struct proc *p, struct sched_task *td)
+{
+    proc_set_state(p, PROC_STATE_ZOMBIE);
+    proc_free(p);
+    free_mock_task(td);
+}
+
+static inline bool attach_mock_thread(struct proc *p, struct sched_task *target)
+{
+    u8 slot = PROC_THREAD_MAX;
+    u64 flags = proc_table_lock_irqsave();
+    bool ok = proc_reserve_thread_slot(p, &slot);
+    proc_table_unlock_irqrestore(flags);
+    if (!ok)
+        return false;
+    flags = proc_table_lock_irqsave();
+    ok = proc_attach_task_slot(p, target, slot);
+    proc_table_unlock_irqrestore(flags);
+    if (!ok)
+        return false;
+    i32 thread_handle = cap_open_handle(
+        p, slot, CAP_TYPE_THREAD, CAP_RIGHT_READ | CAP_RIGHT_WRITE,
+        syscall_test_thread_cap_slot(slot), true);
+    if (thread_handle < 0) {
+        flags = proc_table_lock_irqsave();
+        (void) proc_reap_exited_thread_locked(p, target);
+        proc_table_unlock_irqrestore(flags);
+        return false;
+    }
+    target->td_cap_slot = (i16) thread_handle;
+    return true;
+}
+
+#endif /* TESTS_PROC_HELPERS_H */
diff --git a/tests/tests-pse51.c b/tests/tests-pse51.c
index 97cee36..56a5e30 100644
--- a/tests/tests-pse51.c
+++ b/tests/tests-pse51.c
@@ -1,170 +1,565 @@
 /* SPDX-License-Identifier: MIT */
-/* PSE51 conformance smoke tests.
+/* PSE51 conformance suite.
  *
- * Exercises the PSE51-facing surface: clocks, sleep, sync primitives,
- * scheduling control, barriers, rwlocks, message queues.
- * Each test validates a minimal operation of its subsystem.
+ * Inspired by externals/posix-conformance: group checks by API
+ * category, cover positive plus targeted negative/boundary cases,
+ * and exercise the user-visible syscall semantics where practical.
+ *
+ * Lower-level primitive tests still live in the subsystem-specific
+ * selftests. This file is the consolidated "does the PSE51-facing
+ * surface still match the matrix?" suite.
  */
 
-#include <kernel/ipc/mqueue.h>
-#include <kernel/sync/barrier.h>
-#include <kernel/sync/condvar.h>
-#include <kernel/sync/mutex.h>
-#include <kernel/sync/rwlock.h>
-#include <kernel/sync/semaphore.h>
-#include <kernel/sync/sync_handle.h>
+#include <kernel/proc/pipe.h>
+#include <mazu/cap.h>
 #include <mazu/ipi.h>
+#include <mazu/list.h>
 #include <mazu/posix_time.h>
-#include <mazu/sched.h>
+#include <mazu/selftest.h>
+#include <mazu/syscall.h>
+#include <mazu/sysconf.h>
 #include <mazu/time.h>
+#include <mazu/uaccess.h>
 #include "tests-common.h"
+#include "tests-proc-helpers.h"
 
-/* Smoke 1: clock monotonicity and resolution. */
-static i32 test_pse51_clocks(void)
+static bool pse51_map_user_page(struct proc *p, vaddr_t va)
 {
-    u64 freq = time_get_timebase_freq();
-    SELFTEST_ASSERT(freq > 0, 1);
-
-    u64 t1 = time_rdtime();
-    for (volatile i32 v = 0; v < 10; v++)
-        ;
-    u64 t2 = time_rdtime();
-    SELFTEST_ASSERT(t2 > t1, 2);
-
-    /* Resolution should be sub-millisecond. */
-    i64 res_ns = (i64) (NSEC_PER_SEC / freq);
-    SELFTEST_ASSERT(res_ns > 0 && res_ns < NSEC_PER_MSEC, 3);
+    return !proc_map_user_page(p, va, PT_FLAG_RW | PT_FLAG_USER).is_error;
+}
 
+static i32 test_pse51_sysconf_profile(void)
+{
+    SELFTEST_ASSERT(sys_sysconf_query(_SC_TIMERS) == _POSIX_TIMERS, 1);
+    SELFTEST_ASSERT(
+        sys_sysconf_query(_SC_MONOTONIC_CLOCK) == _POSIX_MONOTONIC_CLOCK, 2);
+    SELFTEST_ASSERT(sys_sysconf_query(_SC_PRIORITY_SCHEDULING) ==
+                        _POSIX_PRIORITY_SCHEDULING,
+                    3);
+    SELFTEST_ASSERT(sys_sysconf_query(_SC_SEMAPHORES) == _POSIX_SEMAPHORES, 4);
+    SELFTEST_ASSERT(sys_sysconf_query(_SC_BARRIERS) == _POSIX_BARRIERS, 5);
+    SELFTEST_ASSERT(sys_sysconf_query(_SC_READER_WRITER_LOCKS) ==
+                        _POSIX_READER_WRITER_LOCKS,
+                    6);
+    SELFTEST_ASSERT(sys_sysconf_query(_SC_THREAD_PRIORITY_INHERIT) ==
+                        _POSIX_THREAD_PRIO_INHERIT,
+                    7);
+    SELFTEST_ASSERT(
+        sys_sysconf_query(_SC_MESSAGE_PASSING) == _POSIX_MESSAGE_PASSING, 8);
+    SELFTEST_ASSERT(sys_sysconf_query(_SC_THREADS) == _POSIX_THREADS, 9);
+    SELFTEST_ASSERT(
+        sys_sysconf_query(_SC_THREAD_CPUTIME) == _POSIX_THREAD_CPUTIME, 10);
+    SELFTEST_ASSERT(sys_sysconf_query(_SC_CPUTIME) == _POSIX_CPUTIME, 11);
+    SELFTEST_ASSERT(
+        sys_sysconf_query(_SC_REALTIME_SIGNALS) == _POSIX_REALTIME_SIGNALS, 12);
+    SELFTEST_ASSERT(sys_sysconf_query(_SC_SPIN_LOCKS) == -1, 13);
+    SELFTEST_ASSERT(sys_sysconf_query(_SC_CLOCK_SELECTION) == -1, 14);
     return 0;
 }
-DEFINE_SELFTEST(pse51_clocks, test_pse51_clocks);
+DEFINE_SELFTEST(pse51_sysconf_profile, test_pse51_sysconf_profile);
 
-/* Smoke 2: nanosleep. */
-static i32 test_pse51_sleep(void)
+static i32 test_pse51_time_profile(void)
 {
-    u64 before = time_rdtime();
-    sleep_ms(time_ms_new(20));
-    u64 after = time_rdtime();
-    u64 elapsed_us = time_ticks_to_us(after - before);
-    SELFTEST_ASSERT(elapsed_us >= 15000, 1);
+    struct proc *p;
+    struct sched_task *td;
+    struct trap_frame tf = {0};
+    struct timespec ts;
+    const vaddr_t va = USER_DATA_BASE + (180UL * PAGE_SIZE);
+
+    SELFTEST_ASSERT(alloc_proc_and_task(&p, &td), 1);
+    SELFTEST_ASSERT(pse51_map_user_page(p, va), 2);
+
+    tf.a7 = SYS_CLOCK_GETTIME;
+    tf.a0 = CLOCK_MONOTONIC;
+    tf.a1 = (u64) va;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 3);
+    SELFTEST_ASSERT(copy_from_user(&ts, va, sizeof(ts)) == 0, 4);
+    SELFTEST_ASSERT(ts.tv_nsec >= 0 && ts.tv_nsec < NSEC_PER_SEC, 5);
+
+    tf.a0 = CLOCK_REALTIME;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 6);
+    SELFTEST_ASSERT(copy_from_user(&ts, va, sizeof(ts)) == 0, 7);
+    SELFTEST_ASSERT(ts.tv_nsec >= 0 && ts.tv_nsec < NSEC_PER_SEC, 8);
+
+    td->cpu_time_us = 1234567ULL;
+    tf.a0 = CLOCK_THREAD_CPUTIME_ID;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 9);
+    SELFTEST_ASSERT(copy_from_user(&ts, va, sizeof(ts)) == 0, 10);
+    SELFTEST_ASSERT(ts.tv_sec == 1 && ts.tv_nsec == 234567000, 11);
+
+    tf.a0 = CLOCK_PROCESS_CPUTIME_ID;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 12);
+    SELFTEST_ASSERT(copy_from_user(&ts, va, sizeof(ts)) == 0, 13);
+    SELFTEST_ASSERT(ts.tv_sec == 1 && ts.tv_nsec == 234567000, 14);
+
+    tf.a7 = SYS_CLOCK_GETRES;
+    tf.a0 = CLOCK_MONOTONIC;
+    tf.a1 = (u64) va;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 15);
+    SELFTEST_ASSERT(copy_from_user(&ts, va, sizeof(ts)) == 0, 16);
+    SELFTEST_ASSERT(ts.tv_sec >= 0 && ts.tv_nsec > 0, 17);
+
+    tf.a7 = SYS_NANOSLEEP;
+    tf.a0 = (u64) va;
+    tf.a1 = 0;
+    ts.tv_sec = 0;
+    ts.tv_nsec = -1;
+    SELFTEST_ASSERT(copy_to_user(va, &ts, sizeof(ts)) == 0, 18);
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EINVAL, 19);
+
+    ts.tv_sec = -1;
+    ts.tv_nsec = 0;
+    SELFTEST_ASSERT(copy_to_user(va, &ts, sizeof(ts)) == 0, 20);
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EINVAL, 21);
+
+    ts.tv_sec = 0;
+    ts.tv_nsec = (i64) NSEC_PER_SEC;
+    SELFTEST_ASSERT(copy_to_user(va, &ts, sizeof(ts)) == 0, 22);
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EINVAL, 23);
+
+    free_proc_and_task(p, td);
     return 0;
 }
-DEFINE_SELFTEST(pse51_sleep, test_pse51_sleep);
+DEFINE_SELFTEST(pse51_time_profile, test_pse51_time_profile);
 
-/* Smoke 3: mutex lock/unlock. */
-static i32 test_pse51_mutex(void)
+static i32 test_pse51_memory_syncio_profile(void)
 {
-    i32 h = sync_mutex_alloc(NULL);
-    SELFTEST_ASSERT(h >= 0, 1);
+    struct proc *p;
+    struct sched_task *td;
+    struct trap_frame tf = {0};
+    const vaddr_t va = USER_DATA_BASE + (181UL * PAGE_SIZE);
+    struct pipe *pipe;
 
-    struct pi_mutex *m = sync_mutex_get(h);
-    SELFTEST_ASSERT(m != NULL, 2);
+    SELFTEST_ASSERT(alloc_proc_and_task(&p, &td), 1);
+    SELFTEST_ASSERT(pse51_map_user_page(p, va), 2);
 
-    pi_mutex_lock(m);
-    pi_mutex_unlock(m);
+    tf.a7 = SYS_MLOCK;
+    tf.a0 = (u64) va;
+    tf.a1 = 0;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EINVAL, 3);
 
-    SELFTEST_ASSERT(pi_mutex_trylock(m) == 0, 3);
-    pi_mutex_unlock(m);
+    tf.a0 = U64_MAX - 16;
+    tf.a1 = 64;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EINVAL, 4);
 
-    sync_mutex_free(h, NULL);
-    return 0;
-}
-DEFINE_SELFTEST(pse51_mutex, test_pse51_mutex);
+    tf.a0 = 0;
+    tf.a1 = 16;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) ENOMEM, 5);
 
-/* Smoke 4: semaphore with timed wait. */
-static i32 test_pse51_sem(void)
-{
-    i32 h = sync_sem_alloc(NULL, 1);
-    SELFTEST_ASSERT(h >= 0, 1);
+    tf.a0 = (u64) va;
+    tf.a1 = 16;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 6);
 
-    struct semaphore *s = sync_sem_get(h);
-    SELFTEST_ASSERT(s != NULL, 2);
+    tf.a7 = SYS_MUNLOCK;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 7);
 
-    sem_wait(s);
-    sem_post(s);
+    tf.a7 = SYS_FSYNC;
+    tf.a0 = (u64) -1;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EBADF, 8);
 
-    /* Timed wait should succeed immediately since count is now 1. */
-    i32 rc = sem_timedwait(s, time_ms_new(100));
-    SELFTEST_ASSERT(rc == 0, 3);
+    tf.a7 = SYS_FDATASYNC;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EBADF, 9);
 
-    /* Timed wait should timeout since count is now 0. */
-    rc = sem_timedwait(s, time_ms_new(20));
-    SELFTEST_ASSERT(rc == -(i32) ETIMEDOUT, 4);
+    tf.a7 = SYS_FSYNC;
+    tf.a0 = PROC_FD_STDOUT;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 10);
 
-    sem_post(s);
-    sync_sem_free(h, NULL);
-    return 0;
-}
-DEFINE_SELFTEST(pse51_sem, test_pse51_sem);
+    tf.a7 = SYS_FDATASYNC;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 11);
 
-/* Smoke 5: scheduling priority query. */
-static i32 test_pse51_sched(void)
-{
-    SELFTEST_ASSERT(SCHED_PRIO_IDLE == 0, 1);
-    SELFTEST_ASSERT(CONFIG_SCHED_NPRIO > 1, 2);
+    pipe = pipe_alloc();
+    SELFTEST_ASSERT(pipe != NULL, 12);
+    SELFTEST_ASSERT(
+        cap_open_pipe(p, pipe, true, CAP_RIGHT_READ | CAP_RIGHT_GRANT,
+                      PROC_FD_STDIN, true) == PROC_FD_STDIN,
+        13);
+
+    tf.a7 = SYS_FSYNC;
+    tf.a0 = PROC_FD_STDIN;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EINVAL, 14);
 
-    struct sched_task *td = sched_current_task();
-    SELFTEST_ASSERT(td != NULL, 3);
-    SELFTEST_ASSERT(td->td_base_prio >= SCHED_PRIO_IDLE, 4);
+    tf.a7 = SYS_FDATASYNC;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EINVAL, 15);
 
+    free_proc_and_task(p, td);
     return 0;
 }
-DEFINE_SELFTEST(pse51_sched, test_pse51_sched);
+DEFINE_SELFTEST(pse51_memory_syncio_profile, test_pse51_memory_syncio_profile);
 
-/* Smoke 6: message queue send/receive. */
-static i32 test_pse51_mqueue(void)
+static i32 test_pse51_sync_profile(void)
 {
-    i32 h = mqueue_open(NULL, 4, 32);
-    SELFTEST_ASSERT(h >= 0, 1);
-
-    u8 msg[] = {1, 2, 3};
-    SELFTEST_ASSERT(mqueue_send(h, msg, 3, 10) == 0, 2);
-
-    u8 buf[32];
-    u32 prio;
-    i32 len = mqueue_receive(h, buf, sizeof(buf), &prio);
-    SELFTEST_ASSERT(len == 3, 3);
-    SELFTEST_ASSERT(prio == 10, 4);
-    SELFTEST_ASSERT(buf[0] == 1 && buf[1] == 2 && buf[2] == 3, 5);
-
-    mqueue_close(h);
+    struct proc *p;
+    struct sched_task *td;
+    struct trap_frame tf = {0};
+    struct timespec ts = {0};
+    i64 mutex_h;
+    i64 cond_h;
+    i64 sem_h;
+    i64 barrier_h;
+    i64 rwlock_h;
+    const vaddr_t va = USER_DATA_BASE + (182UL * PAGE_SIZE);
+
+    SELFTEST_ASSERT(alloc_proc_and_task(&p, &td), 1);
+    SELFTEST_ASSERT(pse51_map_user_page(p, va), 2);
+
+    tf.a7 = SYS_MUTEX_INIT;
+    mutex_h = syscall_dispatch(&tf, td);
+    SELFTEST_ASSERT(mutex_h >= 0, 3);
+
+    tf.a7 = SYS_MUTEX_LOCK;
+    tf.a0 = (u64) mutex_h;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 4);
+
+    tf.a7 = SYS_MUTEX_UNLOCK;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 5);
+
+    tf.a7 = SYS_MUTEX_TRYLOCK;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 6);
+    tf.a7 = SYS_MUTEX_UNLOCK;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 7);
+
+    tf.a7 = SYS_COND_INIT;
+    cond_h = syscall_dispatch(&tf, td);
+    SELFTEST_ASSERT(cond_h >= 0, 8);
+
+    tf.a7 = SYS_COND_SIGNAL;
+    tf.a0 = (u64) cond_h;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 9);
+
+    tf.a7 = SYS_COND_BROADCAST;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 10);
+
+    tf.a7 = SYS_SEM_INIT;
+    tf.a0 = 1;
+    sem_h = syscall_dispatch(&tf, td);
+    SELFTEST_ASSERT(sem_h >= 0, 11);
+
+    tf.a7 = SYS_SEM_WAIT;
+    tf.a0 = (u64) sem_h;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 12);
+
+    tf.a7 = SYS_SEM_POST;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 13);
+
+    tf.a7 = SYS_SEM_TRYWAIT;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 14);
+
+    ts.tv_sec = 0;
+    ts.tv_nsec = 0;
+    SELFTEST_ASSERT(copy_to_user(va, &ts, sizeof(ts)) == 0, 15);
+    tf.a7 = SYS_SEM_TIMEDWAIT;
+    tf.a0 = (u64) sem_h;
+    tf.a1 = (u64) va;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) ETIMEDOUT, 16);
+
+    tf.a7 = SYS_BARRIER_INIT;
+    tf.a0 = 1;
+    barrier_h = syscall_dispatch(&tf, td);
+    SELFTEST_ASSERT(barrier_h >= 0, 17);
+
+    tf.a7 = SYS_BARRIER_WAIT;
+    tf.a0 = (u64) barrier_h;
+    SELFTEST_ASSERT(
+        syscall_dispatch(&tf, td) == (i64) PTHREAD_BARRIER_SERIAL_THREAD, 18);
+
+    tf.a7 = SYS_BARRIER_DESTROY;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 19);
+
+    tf.a7 = SYS_RWLOCK_INIT;
+    rwlock_h = syscall_dispatch(&tf, td);
+    SELFTEST_ASSERT(rwlock_h >= 0, 20);
+
+    tf.a7 = SYS_RWLOCK_RDLOCK;
+    tf.a0 = (u64) rwlock_h;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 21);
+
+    tf.a7 = SYS_RWLOCK_UNLOCK;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 22);
+
+    tf.a7 = SYS_RWLOCK_TRYWRLOCK;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 23);
+
+    tf.a7 = SYS_RWLOCK_UNLOCK;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 24);
+
+    ts.tv_sec = 0;
+    ts.tv_nsec = 0;
+    SELFTEST_ASSERT(copy_to_user(va, &ts, sizeof(ts)) == 0, 25);
+    tf.a7 = SYS_RWLOCK_TIMEDRDLOCK;
+    tf.a0 = (u64) rwlock_h;
+    tf.a1 = (u64) va;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) ETIMEDOUT, 26);
+
+    tf.a7 = SYS_RWLOCK_DESTROY;
+    tf.a0 = (u64) rwlock_h;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 27);
+
+    free_proc_and_task(p, td);
     return 0;
 }
-DEFINE_SELFTEST(pse51_mqueue, test_pse51_mqueue);
+DEFINE_SELFTEST(pse51_sync_profile, test_pse51_sync_profile);
 
-/* Smoke 7: barrier with 1 thread (trivial case). */
-static i32 test_pse51_barrier_trivial(void)
+static i32 test_pse51_sched_thread_profile(void)
 {
-    struct barrier b;
-    barrier_init(&b, 1);
-
-    i32 ret = barrier_wait(&b);
-    SELFTEST_ASSERT(ret == PTHREAD_BARRIER_SERIAL_THREAD, 1);
-
-    SELFTEST_ASSERT(barrier_destroy(&b) == 0, 2);
+    struct proc *p;
+    struct sched_task *td;
+    struct sched_task *target;
+    struct trap_frame tf = {0};
+    i32 old_state = 0;
+
+    SELFTEST_ASSERT(alloc_proc_and_task(&p, &td), 1);
+    td->td_base_prio = (u8) (CONFIG_SCHED_NPRIO - 1);
+    list_init(&td->pi_held_mutexes);
+
+    tf.a7 = SYS_THREAD_SELF;
+    SELFTEST_ASSERT(
+        syscall_dispatch(&tf, td) == syscall_test_thread_token(p, td), 2);
+
+    tf.a7 = SYS_THREAD_GETSCHEDPARAM;
+    tf.a0 = 0;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == (i64) td->td_base_prio, 3);
+
+    tf.a7 = SYS_THREAD_SETSCHEDPARAM;
+    tf.a0 = 0;
+    tf.a1 = SCHED_PRIO_IDLE;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 4);
+    SELFTEST_ASSERT(td->td_base_prio == SCHED_PRIO_IDLE, 5);
+
+    tf.a7 = SYS_SCHED_GETSCHEDULER;
+    tf.a0 = 0;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == (i64) SCHED_FIFO, 6);
+
+    tf.a7 = SYS_SCHED_SETSCHEDULER;
+    tf.a0 = 0;
+    tf.a1 = SCHED_OTHER;
+    tf.a2 = SCHED_PRIO_IDLE;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == (i64) SCHED_FIFO, 7);
+
+    target = alloc_mock_task();
+    SELFTEST_ASSERT(target != NULL, 8);
+    target->proc = p;
+    target->td_join_state = TD_JOIN_JOINABLE;
+    init_waitqueue_head(&target->td_join_wq);
+    SELFTEST_ASSERT(attach_mock_thread(p, target), 9);
+
+    tf.a7 = SYS_THREAD_DETACH;
+    tf.a0 = (u64) syscall_test_thread_token(p, target);
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 10);
+    SELFTEST_ASSERT(target->td_join_state == TD_JOIN_DETACHED, 11);
+
+    tf.a7 = SYS_THREAD_JOIN;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EINVAL, 12);
+
+    tf.a0 = (u64) syscall_test_thread_token(p, td);
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EDEADLK, 13);
+
+    tf.a7 = SYS_THREAD_SETCANCELSTATE;
+    tf.a0 = PTHREAD_CANCEL_DISABLE;
+    tf.a1 = 0;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 14);
+    old_state =
+        td->td_cancel_disabled ? PTHREAD_CANCEL_DISABLE : PTHREAD_CANCEL_ENABLE;
+    SELFTEST_ASSERT(old_state == PTHREAD_CANCEL_DISABLE, 15);
+
+    tf.a0 = PTHREAD_CANCEL_ENABLE;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 16);
+    SELFTEST_ASSERT(td->td_cancel_disabled == false, 17);
+
+    tf.a7 = SYS_THREAD_CANCEL;
+    tf.a0 = (u64) syscall_test_thread_token(p, td);
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 18);
+    SELFTEST_ASSERT(td->td_cancel_pending == true, 19);
+
+    td->td_cancel_disabled = true;
+    tf.a7 = SYS_THREAD_TESTCANCEL;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 20);
+
+    {
+        u64 flags = proc_table_lock_irqsave();
+        proc_detach_task(p, target);
+        proc_table_unlock_irqrestore(flags);
+    }
+    free_mock_task(target);
+    free_proc_and_task(p, td);
     return 0;
 }
-DEFINE_SELFTEST(pse51_barrier_trivial, test_pse51_barrier_trivial);
+DEFINE_SELFTEST(pse51_sched_thread_profile, test_pse51_sched_thread_profile);
 
-/* Smoke 8: rwlock basic read/write. */
-static i32 test_pse51_rwlock(void)
+static i32 test_pse51_signal_profile(void)
 {
-    struct rwlock rw;
-    rwlock_init(&rw);
-
-    rwlock_rdlock(&rw);
-    rwlock_unlock(&rw);
-
-    rwlock_wrlock(&rw);
-    rwlock_unlock(&rw);
-
-    SELFTEST_ASSERT(rwlock_tryrdlock(&rw) == 0, 1);
-    rwlock_unlock(&rw);
-
-    SELFTEST_ASSERT(rwlock_trywrlock(&rw) == 0, 2);
-    rwlock_unlock(&rw);
+    struct proc *p;
+    struct sched_task *td;
+    struct trap_frame tf = {0};
+    const vaddr_t va = USER_DATA_BASE + (183UL * PAGE_SIZE);
+    u32 set = 0;
+    u32 old_mask = 0;
+    i32 signo_out = 0;
+    u64 value_out = 0;
+
+    SELFTEST_ASSERT(alloc_proc_and_task(&p, &td), 1);
+    SELFTEST_ASSERT(pse51_map_user_page(p, va), 2);
+
+    tf.a7 = SYS_SIGPROCMASK;
+    set = sig_bit(SIGUSR1);
+    SELFTEST_ASSERT(copy_to_user(va, &set, sizeof(set)) == 0, 3);
+    tf.a0 = 99;
+    tf.a1 = (u64) va;
+    tf.a2 = 0;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EINVAL, 4);
+
+    set = 0xFFFFFFFFu;
+    SELFTEST_ASSERT(copy_to_user(va, &set, sizeof(set)) == 0, 5);
+    tf.a0 = SIG_SETMASK;
+    tf.a1 = (u64) va;
+    tf.a2 = (u64) (va + sizeof(u32));
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 6);
+    SELFTEST_ASSERT((td->td_sig.blocked & sig_bit(SIGKILL)) == 0, 7);
+    SELFTEST_ASSERT(
+        copy_from_user(&old_mask, va + sizeof(u32), sizeof(old_mask)) == 0, 8);
+
+    tf.a7 = SYS_PTHREAD_KILL;
+    tf.a0 = (u64) syscall_test_thread_token(p, td);
+    tf.a1 = SIGUSR2;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 9);
+    SELFTEST_ASSERT((td->td_sig.pending & sig_bit(SIGUSR2)) != 0, 10);
+    SELFTEST_ASSERT((p->sig_state.proc_pending & sig_bit(SIGUSR2)) == 0, 11);
+
+    td->td_sig.pending = 0;
+    tf.a1 = SIGKILL;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EINVAL, 12);
+
+    tf.a7 = SYS_SIGTIMEDWAIT;
+    set = 0;
+    SELFTEST_ASSERT(copy_to_user(va, &set, sizeof(set)) == 0, 13);
+    tf.a0 = (u64) va;
+    tf.a1 = 0;
+    tf.a2 = 0;
+    tf.a3 = 0;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EINVAL, 14);
+
+    td->td_sig.pending = sig_bit(SIGUSR1) | sig_bit(SIGUSR2);
+    set = sig_bit(SIGUSR2);
+    SELFTEST_ASSERT(copy_to_user(va, &set, sizeof(set)) == 0, 15);
+    tf.a1 = (u64) (va + sizeof(u32));
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == SIGUSR2, 16);
+    SELFTEST_ASSERT(
+        copy_from_user(&signo_out, va + sizeof(u32), sizeof(signo_out)) == 0,
+        17);
+    SELFTEST_ASSERT(signo_out == SIGUSR2, 18);
+
+    td->td_sig.pending = 0;
+    tf.a7 = SYS_SIGQUEUE;
+    tf.a0 = p->pid;
+    tf.a1 = SIGUSR1;
+    tf.a2 = 0x1122334455667788ULL;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 19);
+    SELFTEST_ASSERT((p->sig_state.proc_pending & sig_bit(SIGUSR1)) != 0, 20);
+
+    set = sig_bit(SIGUSR1);
+    SELFTEST_ASSERT(copy_to_user(va, &set, sizeof(set)) == 0, 21);
+    tf.a7 = SYS_SIGTIMEDWAIT;
+    tf.a0 = (u64) va;
+    tf.a1 = (u64) (va + sizeof(u32));
+    tf.a2 = 0;
+    tf.a3 = (u64) (va + sizeof(u32) + sizeof(i32));
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == SIGUSR1, 22);
+    SELFTEST_ASSERT(
+        copy_from_user(&signo_out, va + sizeof(u32), sizeof(signo_out)) == 0,
+        23);
+    SELFTEST_ASSERT(copy_from_user(&value_out, va + sizeof(u32) + sizeof(i32),
+                                   sizeof(value_out)) == 0,
+                    24);
+    SELFTEST_ASSERT(signo_out == SIGUSR1, 25);
+    SELFTEST_ASSERT(value_out == 0x1122334455667788ULL, 26);
+
+    free_proc_and_task(p, td);
+    return 0;
+}
+DEFINE_SELFTEST(pse51_signal_profile, test_pse51_signal_profile);
 
-    SELFTEST_ASSERT(rwlock_destroy(&rw) == 0, 3);
+static i32 test_pse51_timer_mqueue_profile(void)
+{
+    struct proc *p;
+    struct sched_task *td;
+    struct trap_frame tf = {0};
+    const vaddr_t va = USER_DATA_BASE + (184UL * PAGE_SIZE);
+    struct timespec ts = {0};
+    char msg[] = "mazu";
+    char recv[8] = {0};
+    u32 prio = 0;
+    i64 timer_h;
+    i64 mq_h;
+
+    SELFTEST_ASSERT(alloc_proc_and_task(&p, &td), 1);
+    SELFTEST_ASSERT(pse51_map_user_page(p, va), 2);
+
+    tf.a7 = SYS_TIMER_CREATE;
+    timer_h = syscall_dispatch(&tf, td);
+    SELFTEST_ASSERT(timer_h >= 0, 3);
+
+    tf.a7 = SYS_TIMER_SETTIME;
+    tf.a0 = (u64) timer_h;
+    tf.a1 = 10;
+    tf.a2 = 0;
+    tf.a3 = 0;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 4);
+
+    tf.a7 = SYS_TIMER_GETTIME;
+    tf.a0 = (u64) timer_h;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) >= 0, 5);
+
+    tf.a7 = SYS_TIMER_GETOVERRUN;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) >= 0, 6);
+
+    tf.a7 = SYS_TIMER_DELETE;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 7);
+
+    tf.a7 = SYS_TIMER_GETTIME;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) EINVAL, 8);
+
+    tf.a7 = SYS_MQ_OPEN;
+    tf.a0 = 4;
+    tf.a1 = sizeof(recv);
+    mq_h = syscall_dispatch(&tf, td);
+    SELFTEST_ASSERT(mq_h >= 0, 9);
+
+    SELFTEST_ASSERT(copy_to_user(va, msg, sizeof(msg)) == 0, 10);
+    tf.a7 = SYS_MQ_SEND;
+    tf.a0 = (u64) mq_h;
+    tf.a1 = (u64) va;
+    tf.a2 = sizeof(msg);
+    tf.a3 = 7;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 11);
+
+    tf.a7 = SYS_MQ_RECEIVE;
+    tf.a0 = (u64) mq_h;
+    tf.a1 = (u64) (va + 16);
+    tf.a2 = sizeof(recv);
+    tf.a3 = (u64) (va + 32);
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == (i64) sizeof(msg), 12);
+    SELFTEST_ASSERT(copy_from_user(recv, va + 16, sizeof(msg)) == 0, 13);
+    SELFTEST_ASSERT(copy_from_user(&prio, va + 32, sizeof(prio)) == 0, 14);
+    SELFTEST_ASSERT(recv[0] == 'm' && recv[1] == 'a' && recv[2] == 'z' &&
+                        recv[3] == 'u' && recv[4] == '\0',
+                    15);
+    SELFTEST_ASSERT(prio == 7, 16);
+
+    ts.tv_sec = 0;
+    ts.tv_nsec = 0;
+    SELFTEST_ASSERT(copy_to_user(va + 48, &ts, sizeof(ts)) == 0, 17);
+    tf.a7 = SYS_MQ_TIMEDRECEIVE;
+    tf.a0 = (u64) mq_h;
+    tf.a1 = (u64) (va + 16);
+    tf.a2 = sizeof(recv);
+    tf.a3 = 0;
+    tf.a4 = (u64) (va + 48);
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == -(i64) ETIMEDOUT, 18);
+
+    tf.a7 = SYS_MQ_CLOSE;
+    tf.a0 = (u64) mq_h;
+    SELFTEST_ASSERT(syscall_dispatch(&tf, td) == 0, 19);
+
+    free_proc_and_task(p, td);
     return 0;
 }
-DEFINE_SELFTEST(pse51_rwlock, test_pse51_rwlock);
+DEFINE_SELFTEST(pse51_timer_mqueue_profile, test_pse51_timer_mqueue_profile);
diff --git a/tests/tests-syscall.c b/tests/tests-syscall.c
index b7aab3d..29fbf4c 100644
--- a/tests/tests-syscall.c
+++ b/tests/tests-syscall.c
@@ -7,31 +7,7 @@
 #include <mazu/uaccess.h>
 #include <mazu/vfs.h>
 #include "../kernel/sync/futex.h"
-
-static struct proc *alloc_running_proc(void)
-{
-    struct proc *p = proc_alloc();
-    if (p)
-        proc_set_state(p, PROC_STATE_RUNNING);
-    return p;
-}
-
-static struct sched_task *alloc_mock_task(void)
-{
-    struct option_byte_array td_mem =
-        kvalloc_alloc(sizeof(struct sched_task), alignof(struct sched_task));
-    if (td_mem.is_none)
-        return NULL;
-    struct sched_task *td = byte_array_ptr(option_byte_array_checked(td_mem));
-    memset(td, 0, sizeof(*td));
-    td->td_cap_slot = -1;
-    return td;
-}
-
-static void free_mock_task(struct sched_task *td)
-{
-    kvalloc_free(byte_array_new((byte *) td, sizeof(*td)));
-}
+#include "tests-proc-helpers.h"
 
 static bool syscall_test_vfs_available(void)
 {
@@ -43,96 +19,6 @@ static bool syscall_test_vfs_available(void)
     return true;
 }
 
-static inline i32 syscall_test_thread_cap_slot(u8 task_slot)
-{
-    return CAP_SPACE_SLOTS - PROC_THREAD_MAX + (i32) task_slot;
-}
-
-static i64 syscall_test_thread_token(struct proc *p, struct sched_task *td)
-{
-    assert(p);
-    assert(td);
-    assert(td->td_cap_slot >= 0);
-    return cap_get_token(p, td->td_cap_slot, CAP_TYPE_THREAD);
-}
-
-/* Allocate a RUNNING proc + mock task, linked together. */
-static bool alloc_proc_and_task(struct proc **out_p, struct sched_task **out_td)
-{
-    struct proc *p = alloc_running_proc();
-    if (!p)
-        return false;
-    struct sched_task *td = alloc_mock_task();
-    if (!td) {
-        proc_set_state(p, PROC_STATE_ZOMBIE);
-        proc_free(p);
-        return false;
-    }
-    td->proc = p;
-    {
-        u64 pflags = proc_table_lock_irqsave();
-        bool ok = proc_attach_task(p, td);
-        proc_table_unlock_irqrestore(pflags);
-        if (!ok) {
-            free_mock_task(td);
-            proc_set_state(p, PROC_STATE_ZOMBIE);
-            proc_free(p);
-            return false;
-        }
-    }
-    u8 thread_slot = proc_task_slot(p, td);
-    i32 thread_handle = cap_open_handle(
-        p, thread_slot, CAP_TYPE_THREAD, CAP_RIGHT_READ | CAP_RIGHT_WRITE,
-        syscall_test_thread_cap_slot(thread_slot), true);
-    if (thread_handle < 0) {
-        u64 pflags = proc_table_lock_irqsave();
-        (void) proc_reap_exited_thread_locked(p, td);
-        proc_table_unlock_irqrestore(pflags);
-        free_mock_task(td);
-        proc_set_state(p, PROC_STATE_ZOMBIE);
-        proc_free(p);
-        return false;
-    }
-    td->td_cap_slot = (i16) thread_handle;
-    *out_p = p;
-    *out_td = td;
-    return true;
-}
-
-/* Teardown: transition to ZOMBIE, free proc, free task. */
-static void free_proc_and_task(struct proc *p, struct sched_task *td)
-{
-    proc_set_state(p, PROC_STATE_ZOMBIE);
-    proc_free(p);
-    free_mock_task(td);
-}
-
-static bool attach_mock_thread(struct proc *p, struct sched_task *target)
-{
-    u8 slot = PROC_THREAD_MAX;
-    u64 flags = proc_table_lock_irqsave();
-    bool ok = proc_reserve_thread_slot(p, &slot);
-    proc_table_unlock_irqrestore(flags);
-    if (!ok)
-        return false;
-    flags = proc_table_lock_irqsave();
-    ok = proc_attach_task_slot(p, target, slot);
-    proc_table_unlock_irqrestore(flags);
-    if (!ok)
-        return false;
-    i32 thread_handle = cap_open_handle(
-        p, slot, CAP_TYPE_THREAD, CAP_RIGHT_READ | CAP_RIGHT_WRITE,
-        syscall_test_thread_cap_slot(slot), true);
-    if (thread_handle < 0) {
-        flags = proc_table_lock_irqsave();
-        (void) proc_reap_exited_thread_locked(p, target);
-        proc_table_unlock_irqrestore(flags);
-        return false;
-    }
-    target->td_cap_slot = (i16) thread_handle;
-    return ok;
-}
-
 static i32 selftest_sys_open_emfile(void)
 {
     struct proc *p;
@@ -1558,6 +1444,7 @@ static i32 selftest_sigtimedwait(void)
     tf.a0 = (u64) va;
     tf.a1 = 0;
     tf.a2 = 0;
+    tf.a3 = 0;
     assert(syscall_dispatch(&tf, td) == -(i64) EINVAL);
 
     /* Pre-pend a signal in the per-thread mask, then sigtimedwait
@@ -1569,6 +1456,7 @@ static i32 selftest_sigtimedwait(void)
     tf.a0 = (u64) va;
     tf.a1 = (u64) (va + sizeof(u32));
     tf.a2 = 0;
+    tf.a3 = 0;
     i64 ret = syscall_dispatch(&tf, td);
     assert(ret == SIGUSR2);
     assert((td->td_sig.pending & sig_bit(SIGUSR2)) == 0);
@@ -1581,3 +1469,397 @@ static i32 selftest_sigtimedwait(void)
     return 0;
 }
 DEFINE_SELFTEST(sigtimedwait, selftest_sigtimedwait);
+
+static i32 selftest_sigqueue_payload(void)
+{
+    struct proc *p;
+    struct sched_task *td;
+    assert(alloc_proc_and_task(&p, &td));
+
+    const vaddr_t va = USER_DATA_BASE + (136UL * PAGE_SIZE);
+    assert(proc_map_user_page(p, va, PT_FLAG_RW | PT_FLAG_USER).is_error ==
+           false);
+
+    struct trap_frame tf = {0};
+    tf.a7 = SYS_SIGQUEUE;
+    tf.a0 = p->pid;
+    tf.a1 = SIGUSR1;
+    tf.a2 = 0x1122334455667788ULL;
+    assert(syscall_dispatch(&tf, td) == 0);
+    assert((p->sig_state.proc_pending & sig_bit(SIGUSR1)) != 0);
+
+    u32 set = sig_bit(SIGUSR1);
+    assert(copy_to_user(va, &set, sizeof(set)) == 0);
+
+    tf.a7 = SYS_SIGTIMEDWAIT;
+    tf.a0 = (u64) va;
+    tf.a1 = (u64) (va + sizeof(u32));
+    tf.a2 = 0;
+    tf.a3 = (u64) (va + sizeof(u32) + sizeof(i32));
+    assert(syscall_dispatch(&tf, td) == SIGUSR1);
+    assert((p->sig_state.proc_pending & sig_bit(SIGUSR1)) == 0);
+    assert(p->sig_state.queued[SIGUSR1].count == 0);
+
+    i32 signo_out;
+    u64 value_out;
+    assert(copy_from_user(&signo_out, va + sizeof(u32), sizeof(signo_out)) ==
+           0);
+    assert(copy_from_user(&value_out, va + sizeof(u32) + sizeof(i32),
+                          sizeof(value_out)) == 0);
+    assert(signo_out == SIGUSR1);
+    assert(value_out == 0x1122334455667788ULL);
+
+    free_proc_and_task(p, td);
+    return 0;
+}
+DEFINE_SELFTEST(sigqueue_payload, selftest_sigqueue_payload);
+
+static i32 selftest_sigqueue_full(void)
+{
+    struct proc *p;
+    struct sched_task *td;
+    assert(alloc_proc_and_task(&p, &td));
+
+    struct trap_frame tf = {0};
+    tf.a7 = SYS_SIGQUEUE;
+    tf.a0 = p->pid;
+    tf.a1 = SIGUSR2;
+
+    for (u64 i = 0; i < SIGQUEUE_MAX_PER_SIGNO; i++) {
+        tf.a2 = i + 1;
+        assert(syscall_dispatch(&tf, td) == 0);
+    }
+
+    tf.a2 = 99;
+    assert(syscall_dispatch(&tf, td) == -(i64) EAGAIN);
+    assert(p->sig_state.queued[SIGUSR2].count == SIGQUEUE_MAX_PER_SIGNO);
+
+    free_proc_and_task(p, td);
+    return 0;
+}
+DEFINE_SELFTEST(sigqueue_full, selftest_sigqueue_full);
+
+/* FIFO order across multiple queued sigqueue payloads: dequeues must
+ * return values in the order they were posted.
+ */
+static i32 selftest_sigqueue_fifo(void)
+{
+    struct proc *p;
+    struct sched_task *td;
+    assert(alloc_proc_and_task(&p, &td));
+
+    const vaddr_t va = USER_DATA_BASE + (137UL * PAGE_SIZE);
+    assert(proc_map_user_page(p, va, PT_FLAG_RW | PT_FLAG_USER).is_error ==
+           false);
+
+    struct trap_frame tf = {0};
+    tf.a7 = SYS_SIGQUEUE;
+    tf.a0 = p->pid;
+    tf.a1 = SIGUSR1;
+    for (u64 i = 0; i < SIGQUEUE_MAX_PER_SIGNO; i++) {
+        tf.a2 = 0x100 + i;
+        assert(syscall_dispatch(&tf, td) == 0);
+    }
+    assert(p->sig_state.queued[SIGUSR1].count == SIGQUEUE_MAX_PER_SIGNO);
+
+    u32 set = sig_bit(SIGUSR1);
+    assert(copy_to_user(va, &set, sizeof(set)) == 0);
+
+    for (u64 i = 0; i < SIGQUEUE_MAX_PER_SIGNO; i++) {
+        tf.a7 = SYS_SIGTIMEDWAIT;
+        tf.a0 = (u64) va;
+        tf.a1 = (u64) (va + sizeof(u32));
+        tf.a2 = 0;
+        tf.a3 = (u64) (va + sizeof(u32) + sizeof(i32));
+        assert(syscall_dispatch(&tf, td) == SIGUSR1);
+        u64 value_out;
+        assert(copy_from_user(&value_out, va + sizeof(u32) + sizeof(i32),
+                              sizeof(value_out)) == 0);
+        assert(value_out == 0x100 + i);
+    }
+    assert(p->sig_state.queued[SIGUSR1].count == 0);
+    assert((p->sig_state.proc_pending & sig_bit(SIGUSR1)) == 0);
+    assert((p->sig_state.proc_pending_plain & sig_bit(SIGUSR1)) == 0);
+
+    free_proc_and_task(p, td);
+    return 0;
+}
+DEFINE_SELFTEST(sigqueue_fifo, selftest_sigqueue_fifo);
+
+/* kill() followed by sigqueue() on the same signo: consuming the queued
+ * payload must not silently clear the plain pending instance. The receiver
+ * dequeues the sigqueue payload first (queued before plain, even though
+ * kill() arrived earlier), then the plain kill instance with no payload.
+ */
+static i32 selftest_sigqueue_then_kill_coexist(void)
+{
+    struct proc *p;
+    struct sched_task *td;
+    assert(alloc_proc_and_task(&p, &td));
+
+    const vaddr_t va = USER_DATA_BASE + (138UL * PAGE_SIZE);
+    assert(proc_map_user_page(p, va, PT_FLAG_RW | PT_FLAG_USER).is_error ==
+           false);
+
+    struct trap_frame tf = {0};
+    /* First: plain kill(). */
+    tf.a7 = SYS_KILL;
+    tf.a0 = p->pid;
+    tf.a1 = SIGUSR2;
+    assert(syscall_dispatch(&tf, td) == 0);
+    /* Then: sigqueue() with payload. */
+    tf.a7 = SYS_SIGQUEUE;
+    tf.a0 = p->pid;
+    tf.a1 = SIGUSR2;
+    tf.a2 = 0xdeadbeefULL;
+    assert(syscall_dispatch(&tf, td) == 0);
+    assert((p->sig_state.proc_pending & sig_bit(SIGUSR2)) != 0);
+    assert((p->sig_state.proc_pending_plain & sig_bit(SIGUSR2)) != 0);
+    assert(p->sig_state.queued[SIGUSR2].count == 1);
+
+    /* First dequeue: sigqueue payload. */
+    u32 set = sig_bit(SIGUSR2);
+    assert(copy_to_user(va, &set, sizeof(set)) == 0);
+    u64 sentinel = 0xa5a5a5a5a5a5a5a5ULL;
+    assert(copy_to_user(va + sizeof(u32) + sizeof(i32), &sentinel,
+                        sizeof(sentinel)) == 0);
+    tf.a7 = SYS_SIGTIMEDWAIT;
+    tf.a0 = (u64) va;
+    tf.a1 = (u64) (va + sizeof(u32));
+    tf.a2 = 0;
+    tf.a3 = (u64) (va + sizeof(u32) + sizeof(i32));
+    assert(syscall_dispatch(&tf, td) == SIGUSR2);
+    u64 value_out;
+    assert(copy_from_user(&value_out, va + sizeof(u32) + sizeof(i32),
+                          sizeof(value_out)) == 0);
+    assert(value_out == 0xdeadbeefULL);
+    /* Plain instance must still be pending after consuming the queued one. */
+    assert((p->sig_state.proc_pending & sig_bit(SIGUSR2)) != 0);
+    assert((p->sig_state.proc_pending_plain & sig_bit(SIGUSR2)) != 0);
+    assert(p->sig_state.queued[SIGUSR2].count == 0);
+
+    /* Second dequeue: plain kill instance; value_out keeps the sentinel. */
+    assert(copy_to_user(va + sizeof(u32) + sizeof(i32), &sentinel,
+                        sizeof(sentinel)) == 0);
+    assert(syscall_dispatch(&tf, td) == SIGUSR2);
+    assert(copy_from_user(&value_out, va + sizeof(u32) + sizeof(i32),
+                          sizeof(value_out)) == 0);
+    assert(value_out == sentinel);
+    assert((p->sig_state.proc_pending & sig_bit(SIGUSR2)) == 0);
+    assert((p->sig_state.proc_pending_plain & sig_bit(SIGUSR2)) == 0);
+
+    free_proc_and_task(p, td);
+    return 0;
+}
+DEFINE_SELFTEST(sigqueue_then_kill_coexist,
+                selftest_sigqueue_then_kill_coexist);
+
+/* sigqueue() followed by kill() on the same signo: same invariant in
+ * reverse arrival order. The queued payload still comes out first.
+ */
+static i32 selftest_kill_then_sigqueue_coexist(void)
+{
+    struct proc *p;
+    struct sched_task *td;
+    assert(alloc_proc_and_task(&p, &td));
+
+    const vaddr_t va = USER_DATA_BASE + (139UL * PAGE_SIZE);
+    assert(proc_map_user_page(p, va, PT_FLAG_RW | PT_FLAG_USER).is_error ==
+           false);
+
+    struct trap_frame tf = {0};
+    tf.a7 = SYS_SIGQUEUE;
+    tf.a0 = p->pid;
+    tf.a1 = SIGUSR1;
+    tf.a2 = 0xcafebabeULL;
+    assert(syscall_dispatch(&tf, td) == 0);
+    tf.a7 = SYS_KILL;
+    tf.a0 = p->pid;
+    tf.a1 = SIGUSR1;
+    assert(syscall_dispatch(&tf, td) == 0);
+    assert(p->sig_state.queued[SIGUSR1].count == 1);
+    assert((p->sig_state.proc_pending_plain & sig_bit(SIGUSR1)) != 0);
+
+    u32 set = sig_bit(SIGUSR1);
+    assert(copy_to_user(va, &set, sizeof(set)) == 0);
+    tf.a7 = SYS_SIGTIMEDWAIT;
+    tf.a0 = (u64) va;
+    tf.a1 = (u64) (va + sizeof(u32));
+    tf.a2 = 0;
+    tf.a3 = (u64) (va + sizeof(u32) + sizeof(i32));
+    assert(syscall_dispatch(&tf, td) == SIGUSR1);
+    u64 value_out;
+    assert(copy_from_user(&value_out, va + sizeof(u32) + sizeof(i32),
+                          sizeof(value_out)) == 0);
+    assert(value_out == 0xcafebabeULL);
+    assert((p->sig_state.proc_pending_plain & sig_bit(SIGUSR1)) != 0);
+    assert((p->sig_state.proc_pending & sig_bit(SIGUSR1)) != 0);
+
+    /* Drain the plain instance. */
+    assert(syscall_dispatch(&tf, td) == SIGUSR1);
+    assert((p->sig_state.proc_pending & sig_bit(SIGUSR1)) == 0);
+
+    free_proc_and_task(p, td);
+    return 0;
+}
+DEFINE_SELFTEST(kill_then_sigqueue_coexist,
+                selftest_kill_then_sigqueue_coexist);
+
+/* Realistic concurrent-producer rollback: queue is full at the producer cap,
+ * a consumer dequeues the head, a producer immediately refills the vacated
+ * slot, then the consumer's copy_to_user faults and triggers rollback. The
+ * reserved internal slot must let the rollback succeed losslessly, restoring
+ * FIFO order (the originally popped value comes out next).
+ */
+static i32 selftest_sigqueue_restore_lossless(void)
+{
+    struct proc *p;
+    struct sched_task *td;
+    assert(alloc_proc_and_task(&p, &td));
+
+    u64 sflags = proc_sig_lock_irqsave(p);
+    /* Producer fills the queue to MAX. */
+    for (u64 i = 0; i < SIGQUEUE_MAX_PER_SIGNO; i++) {
+        p->sig_state.queued[SIGUSR1].values[i] = 0xb000 + i;
+    }
+    p->sig_state.queued[SIGUSR1].head = 0;
+    p->sig_state.queued[SIGUSR1].tail = SIGQUEUE_MAX_PER_SIGNO;
+    p->sig_state.queued[SIGUSR1].count = SIGQUEUE_MAX_PER_SIGNO;
+    p->sig_state.proc_pending |= sig_bit(SIGUSR1);
+
+    /* Consumer dequeues into a local. */
+    u64 popped;
+    bool had_value;
+    assert(signal_claim_proc_pending_locked(p, SIGUSR1, &popped, &had_value));
+    assert(had_value == true);
+    assert(popped == 0xb000);
+    assert(p->sig_state.queued[SIGUSR1].count == SIGQUEUE_MAX_PER_SIGNO - 1);
+    proc_sig_unlock_irqrestore(p, sflags);
+
+    /* Producer refills the slot we vacated. */
+    i32 rc = signal_queue_send(p, SIGUSR1, 0xb004);
+    assert(rc == 0);
+    assert(p->sig_state.queued[SIGUSR1].count == SIGQUEUE_MAX_PER_SIGNO);
+
+    /* Consumer faults during copy_to_user and rolls back. The reserved
+     * internal slot makes the push lossless.
+     */
+    sflags = proc_sig_lock_irqsave(p);
+    bool dropped = signal_restore_proc_pending_locked(p, SIGUSR1, popped, true);
+    assert(dropped == false);
+    assert(p->sig_state.queued[SIGUSR1].count == SIGQUEUE_MAX_PER_SIGNO + 1);
+
+    /* FIFO must place the restored value back at the head. */
+    u64 v;
+    bool h;
+    assert(signal_claim_proc_pending_locked(p, SIGUSR1, &v, &h));
+    assert(h == true && v == 0xb000);
+    proc_sig_unlock_irqrestore(p, sflags);
+
+    free_proc_and_task(p, td);
+    return 0;
+}
+DEFINE_SELFTEST(sigqueue_restore_lossless, selftest_sigqueue_restore_lossless);
+
+/* Defense-in-depth: if the ring is somehow filled past the user-visible cap
+ * (pathological multi-consumer race), the rollback helper must still avoid
+ * corrupting q->count and must surface a plain pending instance so the
+ * signal stays observable. The payload is lost in this branch by design.
+ */
+static i32 selftest_sigqueue_restore_overflow(void)
+{
+    struct proc *p;
+    struct sched_task *td;
+    assert(alloc_proc_and_task(&p, &td));
+
+    u64 sflags = proc_sig_lock_irqsave(p);
+    for (u64 i = 0; i < SIGQUEUE_RING_CAP; i++) {
+        p->sig_state.queued[SIGUSR1].values[i] = 0xa000 + i;
+    }
+    p->sig_state.queued[SIGUSR1].head = 0;
+    p->sig_state.queued[SIGUSR1].tail = 0;
+    p->sig_state.queued[SIGUSR1].count = SIGQUEUE_RING_CAP;
+    p->sig_state.proc_pending |= sig_bit(SIGUSR1);
+
+    bool dropped = signal_restore_proc_pending_locked(p, SIGUSR1, 0xdead, true);
+    proc_sig_unlock_irqrestore(p, sflags);
+
+    assert(dropped == true);
+    assert(p->sig_state.queued[SIGUSR1].count == SIGQUEUE_RING_CAP);
+    assert((p->sig_state.proc_pending_plain & sig_bit(SIGUSR1)) != 0);
+    assert((p->sig_state.proc_pending & sig_bit(SIGUSR1)) != 0);
+
+    free_proc_and_task(p, td);
+    return 0;
+}
+DEFINE_SELFTEST(sigqueue_restore_overflow, selftest_sigqueue_restore_overflow);
+
+/* sigtimedwait pre-validation rejects an unwritable signo_out without
+ * consuming the queued payload, so a retry can observe it.
+ */
+static i32 selftest_sigtimedwait_efault_rollback(void)
+{
+    struct proc *p;
+    struct sched_task *td;
+    assert(alloc_proc_and_task(&p, &td));
+
+    const vaddr_t va = USER_DATA_BASE + (140UL * PAGE_SIZE);
+    assert(proc_map_user_page(p, va, PT_FLAG_RW | PT_FLAG_USER).is_error ==
+           false);
+
+    struct trap_frame tf = {0};
+    tf.a7 = SYS_SIGQUEUE;
+    tf.a0 = p->pid;
+    tf.a1 = SIGUSR1;
+    tf.a2 = 0x42ULL;
+    assert(syscall_dispatch(&tf, td) == 0);
+
+    /* Pass an unmapped pointer for signo_out. Pre-validation in
+     * sys_sigtimedwait_h catches this before the bit is consumed, so the
+     * queue must still hold the payload.
+     */
+    u32 set = sig_bit(SIGUSR1);
+    assert(copy_to_user(va, &set, sizeof(set)) == 0);
+    tf.a7 = SYS_SIGTIMEDWAIT;
+    tf.a0 = (u64) va;
+    tf.a1 = (u64) 0xdeadc0deUL;
+    tf.a2 = 0;
+    tf.a3 = 0;
+    assert(syscall_dispatch(&tf, td) == -(i64) EFAULT);
+    assert(p->sig_state.queued[SIGUSR1].count == 1);
+    assert((p->sig_state.proc_pending & sig_bit(SIGUSR1)) != 0);
+    /* Retry with a valid pointer must observe the still-queued payload. */
+    tf.a1 = (u64) (va + sizeof(u32));
+    tf.a3 = (u64) (va + sizeof(u32) + sizeof(i32));
+    assert(syscall_dispatch(&tf, td) == SIGUSR1);
+    u64 value_out;
+    assert(copy_from_user(&value_out, va + sizeof(u32) + sizeof(i32),
+                          sizeof(value_out)) == 0);
+    assert(value_out == 0x42ULL);
+
+    free_proc_and_task(p, td);
+    return 0;
+}
+DEFINE_SELFTEST(sigtimedwait_efault_rollback,
+                selftest_sigtimedwait_efault_rollback);
+
+/* ABI stability check: SYS_SIGQUEUE is appended at the end of the table; the
+ * pre-existing trailing entries must keep their numbers.
+ */
+static i32 selftest_sigqueue_abi_numbering(void)
+{
+    static_assert(SYS_THREAD_CANCEL == 93, "SYS_THREAD_CANCEL must stay at 93");
+    static_assert(SYS_THREAD_SETCANCELSTATE == 94,
+                  "SYS_THREAD_SETCANCELSTATE must stay at 94");
+    static_assert(SYS_THREAD_TESTCANCEL == 95,
+                  "SYS_THREAD_TESTCANCEL must stay at 95");
+    static_assert(SYS_CAP_DROP == 96, "SYS_CAP_DROP must stay at 96");
+    static_assert(SYS_CAP_TRANSFER == 97, "SYS_CAP_TRANSFER must stay at 97");
+    static_assert(SYS_CAP_REVOKE_DELEGATE == 98,
+                  "SYS_CAP_REVOKE_DELEGATE must stay at 98");
+    static_assert(SYS_CAP_GET_TOKEN == 99, "SYS_CAP_GET_TOKEN must stay at 99");
+    static_assert(SYS_SIGQUEUE == 100, "SYS_SIGQUEUE must be appended at 100");
+    static_assert(SYS_NR == 101, "SYS_NR must be 101 after SYS_SIGQUEUE");
+    return 0;
+}
+DEFINE_SELFTEST(sigqueue_abi_numbering, selftest_sigqueue_abi_numbering);