feat: add export-from-confluence action and workflow #16
Open
orangit-timo-aho wants to merge 1 commit into
87428dd to c292542
## Action (`.github/actions/export-from-confluence/`)

Composite action that exports Confluence pages to local Markdown files:

- Exports a full space tree, a named root-page subtree, or a single page by ID
- Downloads image attachments (png/jpg/gif/svg/webp/bmp/ico) alongside each page
- Reconstructs folder hierarchy from the Confluence page tree
- Writes YAML frontmatter (`confluence_url`, `page_id`) into each exported `.md` file
- Converts Confluence storage format to Markdown: code macros, info macros, tables, lists, headings, blockquotes, inline formatting, links

`setup-python` is pinned to a SHA, consistent with the `publish-to-confluence` action.

## Workflow (`.github/workflows/export-from-confluence.yml`)

Manual (`workflow_dispatch`) trigger with inputs: `space-key`, `root-page-title`, `page-id`, `output-dir`, `max-depth`.

Credentials are fetched from 1Password via a service account token (`op://orangit-documenter/confluence-credentials/`).

PR creation uses native `git` + `gh` CLI:

- Checks out a branch `confluence-export/<run-id>`
- Stages only the `output-dir`
- Skips PR creation if there are no changes
- Opens a PR with full context (space, page, actor) in the body
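As a rough illustration of two of the action bullets above, here is a minimal Python sketch of filtering image attachments by extension and prepending the frontmatter. The helper names `is_image` and `with_frontmatter` are hypothetical, not taken from `export.py`. Quoting values with `json.dumps` (a JSON string also parses as a YAML double-quoted scalar) is an assumption made so that numeric page IDs stay strings.

```python
import json
import os

# Extensions listed in the PR description; other attachments are skipped.
IMAGE_EXTENSIONS = {"png", "jpg", "gif", "svg", "webp", "bmp", "ico"}


def is_image(filename: str) -> bool:
    """Return True if the attachment's extension is one the export keeps."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    return ext in IMAGE_EXTENSIONS


def with_frontmatter(markdown: str, confluence_url: str, page_id: str) -> str:
    """Prepend YAML frontmatter recording the source page URL and ID.

    Values are written as JSON strings, which YAML reads as double-quoted
    scalars, so an ID like "123" is not parsed back as an integer.
    """
    header = (
        "---\n"
        f"confluence_url: {json.dumps(confluence_url)}\n"
        f"page_id: {json.dumps(str(page_id))}\n"
        "---\n\n"
    )
    return header + markdown
```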
c292542 to 485981e
**orangit-sami-bister** (Contributor) left a comment:
### Correctness / Logic Issues
4. Silent auth bypass in "Test Confluence connection" step (`action.yml` lines ~66-74)

```python
try:
    conf.get_page_by_id("0")
except Exception:
    pass  # Expected — just checking auth doesn't throw 401
```
This swallows every exception, including 401/403, so a bad token silently passes the check. At minimum, inspect the exception type or status code and fail if it is an auth error.
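A minimal sketch of that check, assuming the Confluence client surfaces HTTP failures as exceptions carrying a `response` object with a `status_code` (as `requests.HTTPError` does). `auth_failed` and `check_connection` are hypothetical names, and `probe` stands in for the existing `conf.get_page_by_id("0")` call.

```python
from typing import Callable


def auth_failed(exc: BaseException) -> bool:
    """Return True if `exc` carries an HTTP 401 or 403 response."""
    response = getattr(exc, "response", None)
    return getattr(response, "status_code", None) in (401, 403)


def check_connection(probe: Callable[[], object]) -> None:
    """Run a cheap request and abort the step if the credentials are rejected."""
    try:
        probe()
    except Exception as exc:
        if auth_failed(exc):
            raise SystemExit(f"Confluence authentication failed: {exc}") from exc
        # Any other error (e.g. a 404 for the dummy page ID) is tolerated,
        # as in the original step, but is no longer hidden from the log.
        print(f"Connection check: ignoring non-auth error: {exc}")
```

In the action this would be called as `check_connection(lambda: conf.get_page_by_id("0"))`.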
5. `download_attachments` silently ignores download failures (`export.py` line ~336)

```python
except Exception:
    pass
```
Failed attachment downloads are silently dropped, so the exported Markdown ends up with broken image links and no warning. At minimum this should log the failure, e.g. `print(f" ✗ Failed to download: {filename}")`.
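A sketch of reporting failures instead of dropping them. `download_with_report` and its `download` callback are hypothetical stand-ins for the per-file logic in `export.py`. Returning the failed names lets the caller decide whether to print a summary or exit non-zero.

```python
from typing import Callable, Iterable, List


def download_with_report(
    filenames: Iterable[str],
    download: Callable[[str], None],
) -> List[str]:
    """Download each attachment; log and collect failures instead of hiding them."""
    failed: List[str] = []
    for filename in filenames:
        try:
            download(filename)
        except Exception as exc:
            print(f" ✗ Failed to download: {filename} ({exc})")
            failed.append(filename)
    return failed
```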
6. Attachment pagination limit is hardcoded to 100 (`export.py` line ~309)

```python
result = conf.get_attachments_from_content(page_id, start=0, limit=100)
```
On pages with more than 100 attachments, everything past the first 100 is silently skipped. This should paginate until all attachments have been fetched.
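A minimal pagination loop. It assumes each response is a dict with a `results` list, and it stops on an empty batch rather than a short one, in case the server caps `limit` below the requested value. `iter_attachments` and `fetch_page` are hypothetical names.

```python
from typing import Any, Callable, Dict, Iterator


def iter_attachments(
    fetch_page: Callable[[int, int], Dict[str, Any]],
    limit: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield every attachment by requesting successive pages of `limit` items."""
    start = 0
    while True:
        batch = fetch_page(start, limit).get("results", [])
        if not batch:
            break  # an empty page means everything has been fetched
        yield from batch
        start += len(batch)  # advance by what was actually returned
```

Wired to the existing call, this would look like `iter_attachments(lambda start, limit: conf.get_attachments_from_content(page_id, start=start, limit=limit))`.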
7. Nested list handling is fragile (`export.py` lines ~200-220)

The 5-iteration loop for nested lists is a workaround: it will silently truncate lists nested deeper than five levels. Consider a recursive approach or a proper HTML parser (`html.parser` / `lxml`).
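A sketch of the parser-based option using the standard-library `html.parser`. It keeps an explicit stack of open lists, so nesting depth is unbounded. It only understands `ul`, `ol` and `li` (inline formatting and Confluence macros are out of scope here), and `ListToMarkdown` / `lists_to_markdown` are hypothetical names. Nested items are indented four spaces per level so CommonMark nests them under both `-` and `1.` markers.

```python
import re
from html.parser import HTMLParser
from typing import List


class ListToMarkdown(HTMLParser):
    """Convert nested <ul>/<ol>/<li> markup to Markdown lists of any depth."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[list] = []  # one [tag, item_count] per open list
        self.out: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.stack.append([tag, 0])
        elif tag == "li" and self.stack:
            current = self.stack[-1]
            current[1] += 1
            marker = f"{current[1]}." if current[0] == "ol" else "-"
            indent = "    " * (len(self.stack) - 1)
            self.out.append(f"\n{indent}{marker} ")

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = re.sub(r"\s+", " ", data)  # collapse HTML whitespace
        if self.out and self.out[-1].endswith((" ", "\n")):
            text = text.lstrip()  # no extra space after a marker
        if text:
            self.out.append(text)


def lists_to_markdown(html: str) -> str:
    parser = ListToMarkdown()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.out).strip().splitlines()
    return "\n".join(line.rstrip() for line in lines)
```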
### The masking gap
GitHub Actions automatically masks values from `secrets.*` in logs. In this workflow, however, the secrets are fetched via `op read` and written to `$GITHUB_ENV` (workflow.yml lines ~53-57):

```bash
echo "CONFLUENCE_URL=$(op read 'op://...')" >> "$GITHUB_ENV"
echo "CONFLUENCE_USER=$(op read '...')" >> "$GITHUB_ENV"
echo "CONFLUENCE_API_TOKEN=$(op read '...')" >> "$GITHUB_ENV"
```
Values written to `$GITHUB_ENV` this way are not automatically masked: GitHub only masks values that come directly from `secrets.*`. Because these values are set dynamically at runtime, they will appear in plain text in:

- Step debug logs (if debug logging is enabled)
- Any step that accidentally echoes an env var
- Error messages from the Python `atlassian` library that might include the URL or user in exception text
#### How to fix

Use `add-mask` immediately after reading each secret:

```bash
CONFLUENCE_URL=$(op read 'op://orangit-documenter/confluence-credentials/url')
CONFLUENCE_USER=$(op read 'op://orangit-documenter/confluence-credentials/username')
CONFLUENCE_API_TOKEN=$(op read 'op://orangit-documenter/confluence-credentials/token')

echo "::add-mask::$CONFLUENCE_URL"
echo "::add-mask::$CONFLUENCE_USER"
echo "::add-mask::$CONFLUENCE_API_TOKEN"

{
  echo "CONFLUENCE_URL=$CONFLUENCE_URL"
  echo "CONFLUENCE_USER=$CONFLUENCE_USER"
  echo "CONFLUENCE_API_TOKEN=$CONFLUENCE_API_TOKEN"
} >> "$GITHUB_ENV"
```
The `::add-mask::` workflow command tells the runner to redact that value from all subsequent log output for the rest of the job.

Note: `CONFLUENCE_URL` is not a secret per se, but `CONFLUENCE_USER` and especially `CONFLUENCE_API_TOKEN` definitely need masking.
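If a value is ever exported from the action's Python code rather than the shell step, the same rule applies. A minimal sketch, assuming single-line values (a multi-line secret would need each line masked and the `NAME<<DELIMITER` form for `$GITHUB_ENV`); `export_secret` is a hypothetical helper:

```python
import os
from typing import Optional


def export_secret(name: str, value: str, env_file: Optional[str] = None) -> None:
    """Mask `value` in the job log, then expose it to later steps as `name`."""
    if "\n" in value:
        raise ValueError(f"{name}: multi-line values need per-line masking")
    # Register the mask before the value can appear anywhere else in the log.
    print(f"::add-mask::{value}", flush=True)
    path = env_file or os.environ["GITHUB_ENV"]
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")
```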
## Summary

Adds a new `export-from-confluence` composite action and supporting workflow that exports Confluence pages to Markdown files in the repository.

### Action (`.github/actions/export-from-confluence/`)

- Writes YAML frontmatter (`confluence_url`, `page_id`) into each exported `.md` file

`setup-python` pinned to SHA for reproducible runs.

### Workflow (`.github/workflows/export-from-confluence.yml`)

Manual (`workflow_dispatch`) trigger with inputs: `space-key`, `root-page-title`, `page-id`, `output-dir`, `max-depth`.

Credentials fetched from 1Password via service account token (`op://orangit-documenter/confluence-credentials/`). 1Password CLI installed directly from the official tar release.

PR creation uses native `git` + `gh` CLI:

- Checks out a branch `confluence-export/<run-id>`
- Stages only the `output-dir`