add service fabric linux cluster certificate rotation #327
Draft
jagilber wants to merge 4 commits into Azure:master
…shooting
- Add full ARM JSON snippets for all 6 rotation steps (VMSS secrets, extension settings, SF cluster resource, swap, remove)
- Add PowerShell commands (Add/Remove-AzServiceFabricClusterCertificate, New-AzResourceGroupDeployment)
- Fix cert format docs: clarify .crt/.prv (waagent) vs .crt/.key (KV extension) vs .pem
- Fix ClusterManifest locations: document both TempClusterManifest.xml and ClusterManifest.current.xml
- Fix recovery Option 2: replace incorrect systemctl stop/start servicefabric with pkill sfbootstrapagent/FabricHost + walinuxagent restart
- Fix recovery Option 1: add prereqs note about az CLI availability, add curl+managed identity alternative
- Add InfrastructureManifest.xml and Settings.xml updates to manual recovery
- Add Key Vault VM extension docs for common-name cert auto-rollover
- Add Managing Azure Resources reference link
Tested against Ubuntu 22.04 SF cluster (sfljagilber1lx3) with full cert rotation cycle. Key corrections:
- Fix systemd services: servicefabric.service and servicefabricnodebootstrapagent.service DO exist on modern clusters
- Fix file permissions: POSIX ACLs (root:root + sfuser ACL), not sfuser ownership
- Fix certificateStore: null on Linux, not 'My'
- Fix typeHandlerVersion: 2.0, not 1.1
- Fix TempClusterManifest.xml: single-line XML, not updated by config upgrades
- Expand /var/lib/sfcerts/ contents (includes .pfx, .pem, transport certs)
- Add waagent .pem files note alongside .crt/.prv
- Add SFRP does NOT auto-update VMSS extension settings warning
- Add Add-AzServiceFabricClusterCertificate deprecation notice (Az 6.0+)
- Fix manifest grep commands to use python3 xml pretty-print
- Add getfacl commands for ACL verification
- Split quick reference manifest check into runtime vs staging
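The manifest check mentioned in the corrections above (pretty-printing single-line XML before searching it) can be sketched roughly as follows. This is a minimal illustration, not the commands from the PR itself: the `SAMPLE` manifest, the thumbprint value `AAAA1111`, and the helper names `pretty_print_manifest` and `lines_matching` are all hypothetical. On a real node the input would be read from one of the manifest files named above.

```python
import xml.dom.minidom


def pretty_print_manifest(xml_text: str) -> str:
    """Re-indent XML so each element lands on its own line."""
    return xml.dom.minidom.parseString(xml_text).toprettyxml(indent="  ")


def lines_matching(xml_text: str, needle: str) -> list[str]:
    """Return pretty-printed lines containing `needle` (case-insensitive)."""
    pretty = pretty_print_manifest(xml_text)
    return [
        line.strip()
        for line in pretty.splitlines()
        if needle.lower() in line.lower()
    ]


# Hypothetical single-line manifest fragment, for illustration only.
SAMPLE = (
    '<ClusterManifest><NodeTypes><NodeType Name="nt0"><Certificates>'
    '<ClusterCertificate X509FindValue="AAAA1111" Name="ClusterCertificate"/>'
    "</Certificates></NodeType></NodeTypes></ClusterManifest>"
)

if __name__ == "__main__":
    # A plain grep on single-line XML would return the whole file as one match;
    # after pretty-printing, only the element carrying the thumbprint is shown.
    for line in lines_matching(SAMPLE, "X509FindValue"):
        print(line)
```

The point of the pretty-print step is that a line-oriented search becomes useful again: each match is one element rather than the entire manifest.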
- Add 'After Emergency Recovery: Update SFRP' section after Options 1/2
- Aligns with Manual Steps and Automated Script TSGs that require contacting Microsoft Support for SFRP backend update
- Note no Linux equivalent of FixExpiredCert.ps1
- Add missing reference to Fix Expired Cert Automated Script TSG
@jagilber any reason not to target microsoft-learn for this?