fix(libreoffice): import existing office files#264
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 77c4ae6202
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Adds an import pipeline to the LibreOffice harness so existing ODF/Microsoft Office documents can be converted/parsed into the harness’s editable project JSON model, with corresponding CLI commands, docs, and tests.
Changes:
- Introduces
core/importer.pyto import ODF directly and convert Office formats to ODF via LibreOffice headless before parsing into the project model. - Extends the CLI with
document import,document import-formats, and enhancesdocument opento import Office/ODF files (optionally saving to JSON). - Updates LibreOffice conversion runtime isolation (temp profile/XDG dirs) and adds unit/E2E coverage + documentation updates for import workflows.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| libreoffice/agent-harness/setup.py | Updates package description to mention import support. |
| libreoffice/agent-harness/cli_anything/libreoffice/utils/lo_backend.py | Runs LibreOffice conversion with isolated temp profile/runtime/config/cache env. |
| libreoffice/agent-harness/cli_anything/libreoffice/core/importer.py | New importer: parse ODF content into project model; convert Office→ODF first. |
| libreoffice/agent-harness/cli_anything/libreoffice/libreoffice_cli.py | Adds document import / import-formats; enhances document open to import. |
| libreoffice/agent-harness/cli_anything/libreoffice/core/document.py | Improves invalid project file error to direct users to import. |
| libreoffice/agent-harness/cli_anything/libreoffice/tests/test_core.py | Adds unit tests for import formats + ODF import + conversion routing. |
| libreoffice/agent-harness/cli_anything/libreoffice/tests/test_full_e2e.py | Adds E2E coverage for importing existing Office docs and CLI open/import flow. |
| libreoffice/agent-harness/cli_anything/libreoffice/tests/TEST.md | Updates test inventory and results summary to include import tests. |
| libreoffice/agent-harness/cli_anything/libreoffice/skills/SKILL.md | Documents new import-related commands and recommended workflow. |
| libreoffice/agent-harness/cli_anything/libreoffice/README.md | Documents prerequisites and examples for importing existing files. |
| libreoffice/agent-harness/LIBREOFFICE.md | Updates architecture and adds import workflow example. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ed894b917c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| content_xml = parsed.get("content_xml") | ||
| if not content_xml: | ||
| return project |
There was a problem hiding this comment.
Reject ODF packages missing content.xml
import_odf returns a freshly created empty project when content.xml is absent, so a structurally broken .odt/.ods/.odp can be reported as a successful import and then re-exported, silently discarding original document data. Since content.xml is the primary document payload, this should fail fast with a user-facing error instead of returning an empty document.
Useful? React with 👍 / 👎.
| try: | ||
| parsed = parse_odf(path) | ||
| except zipfile.BadZipFile as e: | ||
| raise ValueError(f"Invalid ODF file: {path}") from e |
There was a problem hiding this comment.
Catch non-UTF8 XML decode errors in import pipeline
import_odf only wraps zipfile.BadZipFile, but parse_odf can also raise UnicodeDecodeError when XML entries are not UTF-8 decodable (e.g., malformed/corrupt archives). That exception is not handled by handle_error, so document import/open can terminate with a traceback instead of a clean CLI error, which is a user-visible robustness regression for file import.
Useful? React with 👍 / 👎.
Description
Fixes #
Type of Change
For New Software CLIs (in-repo)
<SOFTWARE>.mdSOP document exists at<software>/agent-harness/<SOFTWARE>.mdSKILL.mdexists atskills/cli-anything-<software>/SKILL.mdSKILL.mdexists atcli_anything/<software>/skills/SKILL.mdcli_anything/<software>/tests/test_core.pyare present and pass without backendcli_anything/<software>/tests/test_full_e2e.pyare presentREADME.mdincludes the new software (with link to harness directory)registry.jsonincludes an entry withsource_url: null(see Contributing guide)repl_skin.pyinutils/is an unmodified copy from the pluginFor New Software CLIs (standalone repo)
pip install <package-name>or apip install git+https://...URLSKILL.mdexists in the external reporegistry.jsonentry includessource_urlpointing to the external reporegistry.jsonentry includesskill_mdwith full URL to the external SKILL.mdinstall_cmdinregistry.jsonworks (tested locally)For Existing CLI Modifications
python3 -m pytest cli_anything/<software>/tests/test_core.py -vpython3 -m pytest cli_anything/<software>/tests/test_full_e2e.py -vregistry.jsonentry is updated if version, description, or requirements changedGeneral Checklist
--jsonflag is supported on any new commandsfeat:,fix:,docs:,test:)Test Results