feat: add export-from-confluence action and workflow#16

Open: orangit-timo-aho wants to merge 1 commit into main from feat/export-from-confluence

@orangit-timo-aho orangit-timo-aho commented May 7, 2026

Summary

Adds a new export-from-confluence composite action and supporting workflow that exports Confluence pages to Markdown files in the repository.

Action (.github/actions/export-from-confluence/)

  • Exports a full space tree, a named root-page subtree, or a single page by ID
  • Downloads image attachments (png/jpg/gif/svg/webp/bmp/ico) alongside each page
  • Reconstructs folder hierarchy from the Confluence page tree
  • Writes YAML frontmatter (confluence_url, page_id) into each exported .md file
  • Converts Confluence storage format to Markdown: code macros, info macros, tables, lists, headings, blockquotes, inline formatting, links
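
As an illustration of the frontmatter bullet above, here is a minimal sketch of how the two fields could be prepended to an exported page. The function name `with_frontmatter` and the exact quoting are assumptions; the action's real output format is not shown in this PR description.

```python
import json


def with_frontmatter(markdown_body, confluence_url, page_id):
    """Prepend a YAML frontmatter block with confluence_url and page_id."""
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
    header = "\n".join([
        "---",
        f"confluence_url: {json.dumps(confluence_url)}",
        f"page_id: {json.dumps(str(page_id))}",
        "---",
    ])
    return f"{header}\n\n{markdown_body}"
```

Quoting the page ID keeps it a string, so a consumer does not reinterpret a long numeric ID as an integer.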

setup-python is pinned to a commit SHA for reproducible runs, consistent with the publish-to-confluence action.

Workflow (.github/workflows/export-from-confluence.yml)

Manual (workflow_dispatch) trigger with inputs: space-key, root-page-title, page-id, output-dir, max-depth.

Credentials fetched from 1Password via service account token (op://orangit-documenter/confluence-credentials/). 1Password CLI installed directly from the official tar release.

PR creation uses native git + gh CLI:

  • Checks out branch confluence-export/<run-id>
  • Stages only the output-dir
  • Skips PR creation silently if there are no changes
  • Opens a PR with full context (space, page, actor) in the body

@orangit-timo-aho orangit-timo-aho force-pushed the feat/export-from-confluence branch from c292542 to 485981e Compare May 7, 2026 04:55
@orangit-timo-aho orangit-timo-aho changed the title Add export-from-confluence action and workflow feat: add export-from-confluence action and workflow May 7, 2026
Contributor

@orangit-sami-bister orangit-sami-bister left a comment

Correctness / Logic Issues
4. Silent auth bypass in "Test Confluence connection" step (action.yml lines ~66-74)
try:
    conf.get_page_by_id("0")
except Exception:
    pass  # Expected — just checking auth doesn't throw 401
This swallows all exceptions including 401/403. A bad token will silently pass this check. At minimum, inspect the exception type/status code and fail if it's an auth error.
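
One possible shape for a stricter check, as a sketch only. The helper names `http_status` and `check_auth` are hypothetical, and the code assumes the client's exception (or an exception it wraps via `reason` or `__cause__`) carries a requests-style `response.status_code`. That assumption should be verified against the atlassian library version in use.

```python
def http_status(exc):
    """Return an HTTP status code found on exc or a wrapped exception, else None."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        status = getattr(getattr(exc, "response", None), "status_code", None)
        if status is not None:
            return status
        # Assumption: wrappers expose the original error as .reason or __cause__.
        wrapped = getattr(exc, "reason", None)
        exc = wrapped if isinstance(wrapped, BaseException) else exc.__cause__
    return None


def check_auth(probe):
    """Call probe(); fail on 401/403, tolerate other errors (e.g. 404 for a dummy ID)."""
    try:
        probe()
    except Exception as exc:
        status = http_status(exc)
        if status in (401, 403):
            raise RuntimeError(
                f"Confluence authentication failed (HTTP {status})"
            ) from exc
        print(f"Probe raised {type(exc).__name__} (HTTP {status}); not an auth error")
```

In the action this might be called as `check_auth(lambda: conf.get_page_by_id("0"))`. Errors without a recognizable status (for example network failures) are still tolerated here; a stricter variant could fail on those as well.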
5. download_attachments silently ignores download failures (export.py line ~336)
except Exception:
pass
Failed attachment downloads are silently dropped. The exported Markdown will have broken image links with no warning. Should at least print(f" ✗ Failed to download: {filename}").
6. Attachment pagination limit is hardcoded to 100 (export.py line ~309)
result = conf.get_attachments_from_content(page_id, start=0, limit=100)
Pages with >100 attachments will silently miss some. Should paginate.
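
Pagination could look roughly like the sketch below. `fetch_all_attachments` is a hypothetical helper, and it assumes each response is a dict with a `results` list, as the call quoted above suggests.

```python
def fetch_all_attachments(fetch_page, page_size=100):
    """Collect every attachment by requesting pages until an empty one comes back."""
    results = []
    start = 0
    while True:
        # Assumption: the response is a dict holding the batch under "results".
        batch = fetch_page(start, page_size).get("results", [])
        if not batch:
            return results
        results.extend(batch)
        start += len(batch)
```

It might be wired up as `fetch_all_attachments(lambda start, limit: conf.get_attachments_from_content(page_id, start=start, limit=limit))`. Stopping on an empty batch costs one extra request but stays correct if the server returns fewer items per page than requested.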
7. Nested list handling is fragile (export.py lines ~200-220)
The 5-iteration loop for nested lists is a workaround. It will silently truncate lists nested deeper than 5 levels. Consider using a recursive approach or a proper HTML parser (html.parser / lxml).
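
To illustrate the parser-based alternative, here is a sketch built on the standard library's `html.parser`. It tracks open lists on a stack, so nesting depth is not capped. The class and function names are hypothetical, inline markup inside items is flattened to text, and text that follows a nested list inside the same item is dropped for brevity.

```python
from html.parser import HTMLParser


class ListToMarkdown(HTMLParser):
    """Convert nested <ul>/<ol> markup to indented Markdown list lines."""

    def __init__(self):
        super().__init__()
        self.stack = []      # one [kind, item_count] entry per open list
        self.lines = []
        self.current = None  # text pieces of the item being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._flush()    # emit the parent item's text before descending
            self.stack.append([tag, 0])
        elif tag == "li" and self.stack:
            self._flush()
            self.stack[-1][1] += 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "li":
            self._flush()
        elif tag in ("ul", "ol") and self.stack:
            self._flush()
            self.stack.pop()

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)

    def _flush(self):
        if self.current is None or not self.stack:
            self.current = None
            return
        text = " ".join("".join(self.current).split())
        self.current = None
        if text:
            kind, count = self.stack[-1]
            marker = f"{count}." if kind == "ol" else "-"
            indent = "    " * (len(self.stack) - 1)
            self.lines.append(f"{indent}{marker} {text}")


def lists_to_markdown(html):
    parser = ListToMarkdown()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Four spaces per level keeps nested items valid under both `-` and `1.` parents.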

The masking gap
GitHub Actions automatically masks values from secrets.* in logs. However, in this workflow the secrets are fetched via op read and written to $GITHUB_ENV:

workflow.yml lines ~53-57

echo "CONFLUENCE_URL=$(op read 'op://...')" >> "$GITHUB_ENV"
echo "CONFLUENCE_USER=$(op read '...')" >> "$GITHUB_ENV"
echo "CONFLUENCE_API_TOKEN=$(op read '...')" >> "$GITHUB_ENV"
Values written to $GITHUB_ENV this way are not automatically masked. GitHub only masks values that come directly from secrets.*. Since these values are set dynamically at runtime, they can appear in plain text in:

  • Step debug logs (if debug logging is enabled)
  • Any step that accidentally echoes an env var
  • Error messages from the Python atlassian library that might include the URL or user in exception text

How to fix
Use add-mask immediately after reading each secret:

CONFLUENCE_URL=$(op read 'op://orangit-documenter/confluence-credentials/url')
CONFLUENCE_USER=$(op read 'op://orangit-documenter/confluence-credentials/username')
CONFLUENCE_API_TOKEN=$(op read 'op://orangit-documenter/confluence-credentials/token')
echo "::add-mask::$CONFLUENCE_URL"
echo "::add-mask::$CONFLUENCE_USER"
echo "::add-mask::$CONFLUENCE_API_TOKEN"
{
  echo "CONFLUENCE_URL=$CONFLUENCE_URL"
  echo "CONFLUENCE_USER=$CONFLUENCE_USER"
  echo "CONFLUENCE_API_TOKEN=$CONFLUENCE_API_TOKEN"
} >> "$GITHUB_ENV"

The ::add-mask:: workflow command tells the runner to redact that value from all subsequent log output for the rest of the job.
Note: CONFLUENCE_URL is not a secret per se, but CONFLUENCE_USER and especially CONFLUENCE_API_TOKEN definitely need masking.
