1 change: 1 addition & 0 deletions mkdocs.yml
@@ -80,6 +80,7 @@ nav:
- runbooks/index.md
- runbooks/dqlite-write-contention.md
- runbooks/jiva-ctrl-eviction-iscsi-ro-filesystem.md
- runbooks/jiva-ctrl-node-rolling-restart.md
- runbooks/jiva-csi-mount-proliferation.md
- runbooks/kcm-stale-terminating-replicas.md
- runbooks/kubelet-volume-manager-stall.md
@@ -655,6 +655,7 @@ kubectl --context pvek8s wait pod <statefulset-pod-name> -n <namespace> --for=co
- Related PIR: [k8s03 Extended Recovery - kine Watch Corruption, VXLAN Route Corruption, and Kubelet Watch Stream Stall](2026-05-18-k8s03-extended-recovery-kine-watch-vxlan-route-corruption.md)
- Related PIR: [dqlite Snapshot Bloat → Watch Stream Failure](2026-04-02-dqlite-snapshot-crash-loop-watch-stream-failure.md)
- Runbook: [jiva-ctrl-eviction-iscsi-ro-filesystem.md](../runbooks/jiva-ctrl-eviction-iscsi-ro-filesystem.md) - jiva-ctrl eviction → iSCSI drop → EXT4 ro cascade (new, from this PIR)
- Runbook: [jiva-ctrl-node-rolling-restart.md](../runbooks/jiva-ctrl-node-rolling-restart.md) - safe pre-restart migration procedure to prevent this failure mode
- Runbook: [kubelet-volume-manager-stall.md](../runbooks/kubelet-volume-manager-stall.md) - iSCSI WaitForAttachAndMount hang from processorListener stall
- Runbook: [kcm-stale-terminating-replicas.md](../runbooks/kcm-stale-terminating-replicas.md) - stale terminatingReplicas after kine watch disruption
- Runbook: [kubelet-silent-stall.md](../runbooks/kubelet-silent-stall.md) - related failure modes (pod watch goroutine stall, PLEG stall)
1 change: 1 addition & 0 deletions src/runbooks/index.md
@@ -15,6 +15,7 @@ Operational runbooks for diagnosing and recovering from known failure patterns o
| [KCM Stale terminatingReplicas](kcm-stale-terminating-replicas.md) | microk8s/kube-controller-manager | ReplicaSet refuses to create pods: KCM pod informer stale after kine disruption; terminatingReplicas stuck |
| [Jiva CSI Mount Proliferation](jiva-csi-mount-proliferation.md) | openebs-jiva-csi | Duplicate bind mounts accumulate per kubelite restart, causing findmnt/Ansible hangs |
| [Jiva-ctrl Eviction → iSCSI → EXT4 Read-Only](jiva-ctrl-eviction-iscsi-ro-filesystem.md) | openebs-jiva-csi | Pod filesystem goes read-only after jiva-ctrl pod evicted, dropping iSCSI session and triggering EXT4 journal abort |
| [Safe Node Restart (jiva-ctrl hosted)](jiva-ctrl-node-rolling-restart.md) | openebs-jiva-csi | Pre-restart procedure for nodes hosting jiva-ctrl pods: migrate workloads and verify iSCSI sessions clear before restarting |
| [dqlite Write Contention](dqlite-write-contention.md) | microk8s/dqlite | `database is locked (try:500)` under kubelite restart storms: prevention, recovery, phantom RS fix |

## Scripts
4 changes: 2 additions & 2 deletions src/runbooks/jiva-ctrl-eviction-iscsi-ro-filesystem.md
@@ -255,7 +255,7 @@ Before applying a `NoExecute` taint to or draining a node:

4. **If no sessions exist:** Safe to proceed directly.

-See [PGM-223](https://linear.app/pgmac-net-au/issue/PGM-223) for the full rolling restart runbook (to be written).
+See [jiva-ctrl-node-rolling-restart.md](jiva-ctrl-node-rolling-restart.md) for the full step-by-step procedure including commands to identify sessions, migrate workloads, and verify logout before restarting.

### Structural mitigations (not yet implemented)

@@ -271,7 +271,7 @@ See [PGM-223](https://linear.app/pgmac-net-au/issue/PGM-223) for the full rollin
- Linear: [PGM-224](https://linear.app/pgmac-net-au/issue/PGM-224) - this runbook
- Linear: [PGM-221](https://linear.app/pgmac-net-au/issue/PGM-221) - log-based alerts (planned)
- Linear: [PGM-222](https://linear.app/pgmac-net-au/issue/PGM-222) - extended jiva-ctrl tolerations (planned)
-- Linear: [PGM-223](https://linear.app/pgmac-net-au/issue/PGM-223) - rolling restart runbook for jiva-ctrl nodes (planned)
+- Runbook: [jiva-ctrl-node-rolling-restart.md](jiva-ctrl-node-rolling-restart.md) - safe pre-restart migration procedure for nodes hosting jiva-ctrl pods (prevents this failure mode)
- Related: [jiva-csi-mount-proliferation.md](jiva-csi-mount-proliferation.md) - duplicate CSI mounts from kubelite restarts (separate but related failure mode affecting same jiva-csi-node DaemonSet)
- Related: [kubelet-volume-manager-stall.md](kubelet-volume-manager-stall.md) - iSCSI attach failure where pods are stuck ContainerCreating (vs this runbook: pod was already Running then lost storage)
- Related: [dqlite-write-contention.md](dqlite-write-contention.md) - KCM dqlite reconnect behaviour that causes batched evictions
218 changes: 218 additions & 0 deletions src/runbooks/jiva-ctrl-node-rolling-restart.md
@@ -0,0 +1,218 @@
---
tags:
- runbook
- microk8s
- storage
- openebs
- jiva
- iscsi
- node-restart
---

# Safe Node Restart for Nodes Hosting jiva-ctrl Pods

**Service:** OpenEBS Jiva iSCSI (pvek8s)
**First documented:** 2026-05-30
**PIR:** [pvek8s Post-Power-Outage Recovery - kubelet Volume Manager Stall and KCM Stale terminatingReplicas](../incidents/2026-05-28-pvek8s-post-outage-kubelet-informer-kcm-stall.md)
**Linear:** [PGM-223](https://linear.app/pgmac-net-au/issue/PGM-223)

---

## When to Use This Runbook

Use this runbook whenever you need to restart kubelite (or drain/taint) a node that may be hosting jiva-ctrl pods (iSCSI targets).

**Why this matters:** jiva-ctrl pods are iSCSI targets. When the node running them is restarted, those pods are evicted and the iSCSI target process exits. Any workload pod on *another* node that has an active iSCSI session to the controller will detect a TCP connection failure and enter 120-second session recovery. If the target does not reappear within that window, the SCSI device goes offline, the kernel's JBD2 journal aborts, and EXT4 remounts the filesystem read-only. This failure is data-safe but requires manual recovery.

The pre-restart procedure below migrates affected workload pods *before* the restart, so the iSCSI sessions are already gone and there is nothing to fail over.

See [jiva-ctrl-eviction-iscsi-ro-filesystem.md](jiva-ctrl-eviction-iscsi-ro-filesystem.md) for recovery if the filesystem has already gone read-only.

---

## Pre-Restart Procedure

### Step 1: Identify jiva-ctrl pods on the target node

```bash
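TARGET_NODE=<node>   # e.g. k8s01

# Select by spec.nodeName rather than by awk column position: the RESTARTS
# column can contain spaces (e.g. "1 (5h ago)"), which shifts field numbers.
kubectl --context pvek8s get pods -n openebs --no-headers \
  --field-selector spec.nodeName="$TARGET_NODE" \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName | \
  grep -E 'jiva.*ctrl'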
```

If the output is empty, no jiva-ctrl pods are on this node; skip to the [Node Restart Procedure](#node-restart-procedure).

Example output:
```
pvc-746b2837-...-jiva-ctrl-0 k8s01
pvc-a3a7e012-...-jiva-ctrl-0 k8s01
```

### Step 2: Find nodes with active iSCSI sessions to those controllers

For each jiva-ctrl pod, check whether any node has a live iSCSI session to its controller service:

```bash
# Get the ClusterIP of each controller's service
# The service name shares the PV prefix with the ctrl pod name
kubectl --context pvek8s get svc -n openebs | grep "jiva-ctrl"
# → pvc-746b2837-...-jiva-ctrl-svc   ClusterIP   10.152.183.57   ...
# → pvc-a3a7e012-...-jiva-ctrl-svc   ClusterIP   10.152.183.22   ...

# Check all nodes for active sessions to those IPs
for pod in $(kubectl --context pvek8s get pods -n openebs \
    -l app=openebs-jiva-csi-node -o name); do
  echo "=== $pod ==="
  kubectl --context pvek8s exec -n openebs "$pod" -c jiva-csi-plugin -- \
    iscsiadm -m session 2>/dev/null || echo "(no sessions)"
done
```

Note which nodes have sessions to each controller IP. Those are the nodes hosting workload pods that must be migrated before the restart.
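
Matching sessions to controller IPs by eye gets error-prone with more than a couple of controllers. The following is a minimal sketch of a filter for that step. It assumes the usual one-line-per-session output of `iscsiadm -m session` (`tcp: [N] <ip>:<port>,<tpgt> <iqn> (non-flash)`) and IPv4 portal addresses; the function name `sessions_to_controllers` is hypothetical, not part of any tool.

```bash
# Read `iscsiadm -m session` output on stdin and print "<ip> <iqn>" for every
# session whose portal IP is one of the controller ClusterIPs passed as arguments.
# Assumes IPv4 portals in the form 10.152.183.57:3260,1 (third field).
sessions_to_controllers() {
  awk -v ips="$*" '
    BEGIN {
      n = split(ips, list, " ")
      for (i = 1; i <= n; i++) want[list[i]] = 1
    }
    {
      portal = $3               # e.g. 10.152.183.57:3260,1
      sub(/:.*/, "", portal)    # keep only the IP part
      if (portal in want) print portal, $4
    }'
}

# Possible use inside the per-node loop above (not executed here):
#   kubectl --context pvek8s exec -n openebs "$pod" -c jiva-csi-plugin -- \
#     iscsiadm -m session 2>/dev/null | sessions_to_controllers 10.152.183.57 10.152.183.22
```

Empty output for a node means that node holds no session to any of the listed controllers.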

### Step 3: Migrate workload pods off the affected nodes

For each controller with active sessions on other nodes, find and delete the workload pod that holds that PVC:

```bash
# Derive the PV name from the ctrl pod name (strip -jiva-ctrl-N suffix)
CTRL_POD=pvc-746b2837-...-jiva-ctrl-0
PV_NAME=${CTRL_POD%-jiva-ctrl-*}

# Find the PVC bound to this PV
# (with -A the columns are NAMESPACE NAME STATUS VOLUME ..., so VOLUME is field 4)
kubectl --context pvek8s get pvc -A --no-headers | awk -v pv="$PV_NAME" '$4==pv {print $1, $2}'
# → media seerr-seerr-chart-config

# Find the pod in that namespace using that PVC
PVC_NS=media
PVC_NAME=seerr-seerr-chart-config
kubectl --context pvek8s get pods -n "$PVC_NS" -o json | \
  python3 -c "
import json,sys
data=json.load(sys.stdin)
pvc='$PVC_NAME'
for p in data['items']:
    for v in p['spec'].get('volumes',[]):
        if v.get('persistentVolumeClaim',{}).get('claimName')==pvc:
            print(p['metadata']['name'])
"
```
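
The controller-pod → PV → PVC lookup above can be wrapped in two small helpers when there are several controllers to process. This is a sketch only: the function names are hypothetical, and it assumes controller pod names end in `-jiva-ctrl-<suffix>` as in the examples in this runbook.

```bash
# Strip the "-jiva-ctrl-<suffix>" tail from a controller pod name to get the PV name.
pv_from_ctrl_pod() {
  printf '%s\n' "${1%-jiva-ctrl-*}"
}

# Read `kubectl get pvc -A --no-headers` output on stdin and print
# "<namespace> <pvc-name>" for the claim bound to the PV named in $1.
# With -A the columns are NAMESPACE NAME STATUS VOLUME ..., so VOLUME is field 4.
pvc_for_pv() {
  awk -v pv="$1" '$4 == pv { print $1, $2 }'
}

# Possible use (not executed here):
#   PV_NAME=$(pv_from_ctrl_pod "$CTRL_POD")
#   kubectl --context pvek8s get pvc -A --no-headers | pvc_for_pv "$PV_NAME"
```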

Once you have the pod name, delete it and wait for it to reschedule to a node that is **not** `$TARGET_NODE`:

```bash
kubectl --context pvek8s delete pod -n "$PVC_NS" <pod-name>

# Watch until Running on a different node
kubectl --context pvek8s get pod -n "$PVC_NS" <pod-name> -o wide -w
# → 1/1 Running on k8s02 or k8s03 (not TARGET_NODE)
```

!!! warning "StatefulSet pods do not reschedule automatically on cordoned nodes"
    If the node is already cordoned (or if you cordon it before deleting), StatefulSet pods will stay
    Pending until you uncordon another eligible node. Delete the pod *before* cordoning the target node
    so the scheduler can place it freely.

Repeat for every controller with active sessions.

### Step 4: Verify all sessions have logged out

Confirm no node retains an iSCSI session to the controllers that were on `$TARGET_NODE`:

```bash
for pod in $(kubectl --context pvek8s get pods -n openebs \
    -l app=openebs-jiva-csi-node -o name); do
  echo "=== $pod ==="
  kubectl --context pvek8s exec -n openebs "$pod" -c jiva-csi-plugin -- \
    iscsiadm -m session 2>/dev/null | grep "<controller-ClusterIP>" || echo "(none)"
done
# All nodes should show "(none)" for the affected controller IPs
```

Only proceed once all sessions to the affected controllers are gone.
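
Session logout may lag pod deletion by a few seconds, so a single check can race. One way to handle that is to poll until the session listing is empty or a timeout expires. The sketch below is generic and makes no assumptions about `kubectl` or `iscsiadm`; `wait_until_empty` and the `list_sessions` wrapper named in the usage comment are hypothetical names.

```bash
# Poll a command until it prints nothing, or give up after a timeout.
#   $1 = timeout in seconds, $2 = poll interval in seconds, remaining args = command
# Caveat: a command that fails and prints nothing also counts as "empty", so
# confirm the listing command works before trusting a success here.
wait_until_empty() {
  timeout=$1; interval=$2; shift 2
  elapsed=0
  while [ -n "$("$@" 2>/dev/null)" ]; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "sessions still present after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Possible use (not executed here), where list_sessions is a function that wraps
# the loop above and prints only lines matching the affected controller IPs:
#   wait_until_empty 120 5 list_sessions && echo "safe to proceed"
```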

---

## Node Restart Procedure

With iSCSI sessions safely cleared, restart the node using the standard dqlite → kubelite ordering:

1. **Cordon the node** (required; this prevents the kubelet watch-race stall on restart):

```bash
kubectl --context pvek8s cordon "$TARGET_NODE"
```

See [kubelet-silent-stall.md, Failure Mode 2](kubelet-silent-stall.md) for why cordoning before restart is mandatory.

2. **Restart k8s-dqlite first**, wait for it to stabilise:

```bash
ssh "$TARGET_NODE" "sudo systemctl restart snap.microk8s.daemon-k8s-dqlite.service"
# Wait until active and no 'database is locked' errors for 30s
ssh "$TARGET_NODE" "sudo systemctl is-active snap.microk8s.daemon-k8s-dqlite.service"
```

3. **Restart kubelite**:

```bash
ssh "$TARGET_NODE" "sudo systemctl restart snap.microk8s.daemon-kubelite.service"
```

4. **Wait for node Ready**:

```bash
kubectl --context pvek8s wait node/"$TARGET_NODE" --for=condition=Ready --timeout=300s
```

5. **Uncordon**:

```bash
kubectl --context pvek8s uncordon "$TARGET_NODE"
```

See [kubelet-volume-manager-stall.md, Option B](kubelet-volume-manager-stall.md) for the full dqlite restart safety procedure and lock-contention checks.
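
The "no `database is locked` errors for 30s" condition in step 2 can be checked by counting matching lines in the most recent journal output. This is a rough sketch, not the procedure from the linked runbook: `lock_error_count` is a hypothetical helper, and the `journalctl` invocation in the comment assumes the unit name used in the commands above.

```bash
# Count "database is locked" lines in the log text supplied on stdin.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
lock_error_count() {
  grep -c 'database is locked' || true
}

# Possible use (not executed here):
#   ssh "$TARGET_NODE" "sudo journalctl -u snap.microk8s.daemon-k8s-dqlite.service \
#     --since '30 seconds ago' --no-pager" | lock_error_count
# A result of 0 suggests dqlite has been quiet for the last 30 seconds.
```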

---

## Post-Restart Verification

```bash
# Node is Ready and schedulable
kubectl --context pvek8s get node "$TARGET_NODE"
# → Ready (no SchedulingDisabled)

# jiva-ctrl pods have rescheduled and are Running
kubectl --context pvek8s get pods -n openebs -o wide | grep 'jiva.*ctrl'
# → all Running, spread across nodes

# Workload pods that were migrated are Running with rw filesystems
kubectl --context pvek8s get pods -n <namespace> <pod-name> -o wide
# → 1/1 Running on a node other than TARGET_NODE

# iSCSI sessions re-established on the workload node
NEW_NODE=$(kubectl --context pvek8s get pod -n <namespace> <pod-name> \
-o jsonpath='{.spec.nodeName}')
NEW_JIVA_POD=$(kubectl --context pvek8s get pods -n openebs \
-l app=openebs-jiva-csi-node \
-o jsonpath="{.items[?(@.spec.nodeName=='$NEW_NODE')].metadata.name}")
kubectl --context pvek8s exec -n openebs "$NEW_JIVA_POD" -c jiva-csi-plugin -- \
iscsiadm -m session
# → tcp: [...] iqn.2016-09.com.openebs.jiva:<pvc-name> (non-flash)

# Filesystem is rw
kubectl --context pvek8s exec -n openebs "$NEW_JIVA_POD" -c jiva-csi-plugin -- \
grep "<pvc-name>" /proc/mounts
# → should show rw in mount options, not ro
```
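
Reading `rw` versus `ro` out of a long `/proc/mounts` line is easy to get wrong, because options such as `errors=remount-ro` also contain the string `ro`. A minimal sketch of a stricter check follows; `mount_modes` is a hypothetical helper that only assumes the standard `/proc/mounts` field order (device, mount point, type, options).

```bash
# Read /proc/mounts-style lines on stdin. For every mount whose device or
# mount point contains the string in $1, print "<mountpoint> rw" or "<mountpoint> ro".
# Options are split on commas and compared exactly, so "errors=remount-ro"
# is not mistaken for a read-only mount.
mount_modes() {
  awk -v pat="$1" '
    index($1, pat) || index($2, pat) {
      split($4, opts, ",")
      mode = "unknown"
      for (i in opts) if (opts[i] == "rw" || opts[i] == "ro") mode = opts[i]
      print $2, mode
    }'
}

# Possible use (not executed here):
#   kubectl --context pvek8s exec -n openebs "$NEW_JIVA_POD" -c jiva-csi-plugin -- \
#     cat /proc/mounts | mount_modes "<pvc-name>"
```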

---

## References

- PIR: [pvek8s Post-Power-Outage Recovery](../incidents/2026-05-28-pvek8s-post-outage-kubelet-informer-kcm-stall.md) - Chain 4 root cause (batched jiva-ctrl eviction → EXT4 ro)
- Linear: [PGM-223](https://linear.app/pgmac-net-au/issue/PGM-223) - this runbook
- Related: [jiva-ctrl-eviction-iscsi-ro-filesystem.md](jiva-ctrl-eviction-iscsi-ro-filesystem.md) - recovery if the filesystem has already gone read-only (use when it's too late to migrate first)
- Related: [kubelet-volume-manager-stall.md](kubelet-volume-manager-stall.md) - Option B: full dqlite+kubelite restart procedure and lock-contention safety checks
- Related: [kubelet-silent-stall.md](kubelet-silent-stall.md) - Failure Mode 2: why cordon-before-restart is required for kubelite restarts
1 change: 1 addition & 0 deletions src/runbooks/kubelet-silent-stall.md
@@ -449,3 +449,4 @@ ssh <node> "sudo journalctl -u snap.microk8s.daemon-kubelite --since '24 hours a
- PIR: [microk8s 1.34 → 1.35 Upgrade](../incidents/2026-05-16-microk8s-1.35-upgrade-cgroup-v2-containerd-disk-pressure.md) - Phases 4 and 8
- Linear: [PGM-187](https://linear.app/pgmac-net-au/issue/PGM-187), [PGM-195](https://linear.app/pgmac-net-au/issue/PGM-195), [PGM-203](https://linear.app/pgmac-net-au/issue/PGM-203), [PGM-201](https://linear.app/pgmac-net-au/issue/PGM-201)
- Related: [dqlite-write-contention runbook](dqlite-write-contention.md) - k8s-dqlite restart context
- Related: [jiva-ctrl-node-rolling-restart.md](jiva-ctrl-node-rolling-restart.md) - uses the cordon-before-restart requirement from Failure Mode 2; adds jiva-ctrl iSCSI session migration before the restart