add service fabric linux cluster certificate rotation #327

Draft
jagilber wants to merge 4 commits into Azure:master from jagilber:linux

@jagilber jagilber commented Apr 1, 2026

service fabric linux cluster certificate rotation

…shooting

- Add full ARM JSON snippets for all 6 rotation steps (VMSS secrets, extension settings, SF cluster resource, swap, remove)
- Add PowerShell commands (Add/Remove-AzServiceFabricClusterCertificate, New-AzResourceGroupDeployment)
- Fix cert format docs: clarify .crt/.prv (waagent) vs .crt/.key (KV extension) vs .pem
- Fix ClusterManifest locations: document both TempClusterManifest.xml and ClusterManifest.current.xml
- Fix recovery Option 2: replace incorrect systemctl stop/start servicefabric with pkill sfbootstrapagent/FabricHost + walinuxagent restart
- Fix recovery Option 1: add prereqs note about az CLI availability, add curl+managed identity alternative
- Add InfrastructureManifest.xml and Settings.xml updates to manual recovery
- Add Key Vault VM extension docs for common-name cert auto-rollover
- Add Managing Azure Resources reference link
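
The certificate format distinction above (.crt/.prv from waagent, .crt/.key from the Key Vault extension, plus .pem and .pfx) can be checked on a node by grouping files by name. The following is a minimal sketch, not part of the TSG: the helper name `group_cert_files` is hypothetical, and it assumes each file stem identifies one certificate (for example a thumbprint), which may not hold for every file in a given directory.

```python
from collections import defaultdict
from pathlib import Path

# Extensions named in this PR description.
CERT_EXTENSIONS = {".crt", ".prv", ".key", ".pem", ".pfx"}


def group_cert_files(directory):
    """Map each file stem to the sorted list of certificate-related extensions found.

    Assumption: files that belong to the same certificate share a stem.
    """
    groups = defaultdict(list)
    for path in Path(directory).iterdir():
        if path.is_file() and path.suffix in CERT_EXTENSIONS:
            groups[path.stem].append(path.suffix)
    return {stem: sorted(exts) for stem, exts in groups.items()}
```

Calling `group_cert_files("/var/lib/sfcerts")` as a user with read access would show, per stem, which formats are present; a stem with `.crt` and `.prv` suggests waagent-delivered material, while `.crt` and `.key` suggests the Key Vault extension.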

Tested against an Ubuntu 22.04 SF cluster (sfljagilber1lx3) with a full cert rotation cycle. Key corrections:

- Fix systemd services: servicefabric.service and
  servicefabricnodebootstrapagent.service DO exist on modern clusters
- Fix file permissions: POSIX ACLs (root:root + sfuser ACL), not sfuser ownership
- Fix certificateStore: null on Linux, not 'My'
- Fix typeHandlerVersion: 2.0, not 1.1
- Fix TempClusterManifest.xml: single-line XML, not updated by config upgrades
- Expand /var/lib/sfcerts/ contents (includes .pfx, .pem, transport certs)
- Add waagent .pem files note alongside .crt/.prv
- Add warning that SFRP does NOT auto-update VMSS extension settings
- Add Add-AzServiceFabricClusterCertificate deprecation notice (Az 6.0+)
- Fix manifest grep commands to use python3 xml pretty-print
- Add getfacl commands for ACL verification
- Split quick reference manifest check into runtime vs staging
- Add 'After Emergency Recovery: Update SFRP' section after Options 1/2
- Aligns with Manual Steps and Automated Script TSGs that require
  contacting Microsoft Support for SFRP backend update
- Note that there is no Linux equivalent of FixExpiredCert.ps1
- Add missing reference to Fix Expired Cert Automated Script TSG
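
Because TempClusterManifest.xml is stored as single-line XML, a plain grep returns the whole file; the python3 pretty-print approach mentioned above could look roughly like the sketch below. It is illustrative only: `SAMPLE_MANIFEST` is a made-up fragment with placeholder values, the helper names are hypothetical, and the sketch assumes certificate lookups appear as attributes whose names start with `X509FindValue`.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Made-up, single-line fragment standing in for a real manifest (placeholder values).
SAMPLE_MANIFEST = (
    '<ClusterManifest Name="example">'
    '<NodeTypes><NodeType Name="nt0"><Certificates>'
    '<ClusterCertificate X509FindValue="AAAA1111" X509FindValueSecondary="BBBB2222"/>'
    '</Certificates></NodeType></NodeTypes>'
    '</ClusterManifest>'
)


def pretty_print(xml_text):
    """Return the XML re-indented one element per line, so grep becomes useful."""
    return xml.dom.minidom.parseString(xml_text).toprettyxml(indent="  ")


def find_cert_values(xml_text):
    """List (element, attribute, value) for every non-empty X509FindValue* attribute."""
    found = []
    for elem in ET.fromstring(xml_text).iter():
        tag = elem.tag.split("}")[-1]  # drop any XML namespace prefix
        for attr, value in elem.attrib.items():
            if attr.startswith("X509FindValue") and value:
                found.append((tag, attr, value))
    return found


if __name__ == "__main__":
    print(pretty_print(SAMPLE_MANIFEST))
    for tag, attr, value in find_cert_values(SAMPLE_MANIFEST):
        print(f"{tag}.{attr} = {value}")
```

Running the same functions against both the runtime and the staging manifest makes it easier to spot a thumbprint that was updated in one file but not the other.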

@amenarde commented

@jagilber any reason not to target microsoft-learn for this?
