DOCX writer and reader: support for endnotes #11501

Open
massifrg wants to merge 10 commits into jgm:main from massifrg:docx-endnotes

massifrg commented Mar 4, 2026

Initial support for endnotes for openxml (docx).

You get an endnote by embedding a Note in a Span that has the "endnote" class:

This text[^1] has a footnote and an endnote[[^E1]]{.endnote}.

[^1]: This is the footnote text.

[^E1]: This is the endnote text.

There is no need to modify the pandoc types; this is just a convention, like the one for custom styles.
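As an illustration of the convention, here is a minimal sketch of how such a Span could look in pandoc's JSON AST. The helper names `endnote` and `para` are hypothetical and exist only for this example; the node shapes follow the JSON that pandoc emits with `-t json`.

```python
import json

def endnote(blocks, identifier="", attrs=None):
    """Wrap a Note in a Span whose only class is "endnote" (hypothetical helper)."""
    note = {"t": "Note", "c": blocks}
    attr = [identifier, ["endnote"], attrs or []]
    return {"t": "Span", "c": [attr, [note]]}

def para(text):
    """Build a Para block from plain text, splitting on single spaces (hypothetical helper)."""
    inlines = []
    for i, word in enumerate(text.split(" ")):
        if i:
            inlines.append({"t": "Space"})
        inlines.append({"t": "Str", "c": word})
    return {"t": "Para", "c": inlines}

# The endnote from the Markdown example above, as a JSON AST fragment.
span = endnote([para("This is the endnote text.")])
print(json.dumps(span))
```

The Note itself is an ordinary Note; only the wrapping Span carries the "endnote" marker, which is why no type changes are needed.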

I did a small test and the docx writer works. Here's my plan:

  • implement endnotes also in the docx reader
  • modify the code so that the support for endnotes is an extension that is disabled by default
  • at this point the code could be merged into the main branch
  • in the future, once we know it does not break anything, the extension could become enabled by default

About the notation

In pundok-editor I'm supporting many types of notes, embedding the Note inside a Span with a note-type attribute, whose value can be "endnote", "marginnote", "translatornote" or whatever:

This text has an exotic[[^EN1]]{note-type=exoticnote} note.

[^EN1]: Text of the note.

But for docx endnote support, I prefer the "endnote" class approach, because

  • it's lighter
  • openxml does not support any other kind of notes (AFAIK)
  • writing filters that do the translation between the two notations is trivial
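To give an idea of how small such a filter could be, here is a sketch that rewrites a pandoc JSON AST in both directions. It only handles the note-type value "endnote"; the function names `walk`, `attr_to_class` and `class_to_attr` are made up for this example and are not part of pandoc or pundok-editor.

```python
def walk(node, fn):
    """Apply fn to every dict node of a pandoc JSON AST, children first."""
    if isinstance(node, list):
        return [walk(x, fn) for x in node]
    if isinstance(node, dict):
        node = {k: walk(v, fn) for k, v in node.items()}
        return fn(node)
    return node

def attr_to_class(node):
    """Span with note-type=endnote  ->  Span with class "endnote"."""
    if node.get("t") != "Span":
        return node
    (ident, classes, kvs), inlines = node["c"]
    if ["note-type", "endnote"] not in kvs:
        return node
    kvs = [kv for kv in kvs if kv != ["note-type", "endnote"]]
    if "endnote" not in classes:
        classes = classes + ["endnote"]
    return {"t": "Span", "c": [[ident, classes, kvs], inlines]}

def class_to_attr(node):
    """Span with class "endnote"  ->  Span with note-type=endnote."""
    if node.get("t") != "Span":
        return node
    (ident, classes, kvs), inlines = node["c"]
    if "endnote" not in classes:
        return node
    classes = [c for c in classes if c != "endnote"]
    kvs = kvs + [["note-type", "endnote"]]
    return {"t": "Span", "c": [[ident, classes, kvs], inlines]}

# A Span in the attribute notation, wrapping an (empty) Note.
example = {"t": "Span", "c": [["", [], [["note-type", "endnote"]]],
                              [{"t": "Note", "c": []}]]}
print(walk(example, attr_to_class))
```

Used as a JSON filter, the same functions would be applied to the whole document read from standard input and the result written back as JSON.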

Initial support for endnotes for
openxml.
You get an endnote embedding
a Note in a Span that has
an "endnote" class.
@massifrg massifrg marked this pull request as draft March 4, 2026 12:51

massifrg commented Mar 4, 2026

38 tests now fail because I needed to update the docx package contents in the data/docx directory.
Most of the failures are expected, since some files in the base docx package were missing or have been updated.

  1. "Can't extract ... from archive in stored file", which happens with these files in the docx package:
    • word/endnotes.xml (38/38)
    • word/_rels/endnotes.xml (38/38)
    • word/media/rId10.jpg (2/38): this is more worrisome, because it looks like a real side effect of the new code
  2. "Non-matching xml in ...", which happens with these files in the docx package:
    • [Content_Types].xml (38/38)
    • word/document.xml (13/38)
    • word/footnotes.xml (3/38)
    • word/_rels/document.xml.rels (38/38)
    • word/_rels/footnotes.xml.rels (4/38)
    • word/styles.xml (37/38)

The updates are needed to support endnotes, so it seems unlikely that the failing tests can be fixed through code changes alone, even with an endnotes extension that is disabled by default.
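For the first kind of failure, a quick way to see which parts a given docx lacks is to list the zip members. The sketch below is independent of pandoc's test suite; `missing_parts` and `make_archive` are hypothetical names, and the archives are built in memory only for demonstration.

```python
import io
import zipfile

# Part names copied from the failure list above.
ENDNOTE_PARTS = ["word/endnotes.xml", "word/_rels/endnotes.xml"]

def missing_parts(docx_bytes, expected=ENDNOTE_PARTS):
    """Return the expected part names that are absent from a docx (zip) archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        present = set(archive.namelist())
    return [name for name in expected if name not in present]

def make_archive(names):
    """Build a tiny in-memory zip with empty members, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "")
    return buffer.getvalue()

base = ["word/document.xml", "word/footnotes.xml"]
old_package = make_archive(base)
new_package = make_archive(base + ENDNOTE_PARTS)
print(missing_parts(old_package))
print(missing_parts(new_package))
```

A stored test file produced before the package update would be reported like `old_package` here, which matches the "Can't extract" errors.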

massifrg added 2 commits March 4, 2026 17:34
A new extension "endnotes" to
enable endnotes in docx format
The "endnotes" extension for docx
is working with the docx Writer.
No tests for this feature yet, but
old tests now pass.

massifrg commented Mar 4, 2026

There's a new extension for docx, "endnotes", which is disabled by default. All the old tests now pass, because they don't know about the extension and don't enable it.

To get endnotes in docx, you now need to specify -t docx+endnotes.

A couple of styles still need to be added to the style definitions when the extension is enabled, in particular to render endnote markers in superscript, but otherwise the Writer side is done.

massifrg added 4 commits March 5, 2026 10:51
The "endnotes" extension for docx
is working with the docx Writer.
No tests for this feature yet, but
old tests now pass.
When "endnotes" extension is enabled,
the Span of class "endnote" that
embeds a Note may have an id
or attributes for the Note to become
an endnote.
The only condition is that there is
ONLY ONE class and that must be
"endnote".
When the "endnotes" extension is enabled,
docx endnotes are converted to a Note
embedded in a Span of class "endnote".

massifrg commented Mar 5, 2026

Endnote support is now both in the Writer and in the Reader for docx format.
It is activated by the endnotes extension, which is disabled by default.

On the reader side, when you specify -f docx+endnotes, you get a Span with class endnote around every Note that comes from an endnote in the docx input file.

On the writer side, when you specify -t docx+endnotes, you get an endnote in the docx output file for every Note that is embedded in a Span of class endnote.
That Span may have an identifier or other attributes; the only condition is that it has exactly one class, "endnote". The identifier and attributes are discarded in the docx output, but they could be useful for other output formats.
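The writer-side condition can be stated as a small predicate over the JSON AST. This is only a sketch of the rule as described in this PR (exactly one class, "endnote", and contents consisting only of a Note), not the Haskell code of the writer; `is_endnote_span` is a hypothetical name.

```python
def is_endnote_span(inline):
    """True if the inline is a Span that should become a docx endnote."""
    if inline.get("t") != "Span":
        return False
    (_ident, classes, _kvs), content = inline["c"]
    # Identifier and key-value attributes are ignored; only the classes matter.
    if classes != ["endnote"]:
        return False
    # The Span must contain a single Note and nothing else.
    return len(content) == 1 and content[0].get("t") == "Note"

note = {"t": "Note", "c": []}
print(is_endnote_span({"t": "Span", "c": [["n1", ["endnote"], [["k", "v"]]], [note]]}))
print(is_endnote_span({"t": "Span", "c": [["", ["endnote", "x"], []], [note]]}))
```

A Span that fails the check is handled as any other Span, so its Note remains a regular footnote.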

Before removing the draft state from this PR, I want to update the manual and provide some tests.
In particular, I want to write a roundtrip test that goes from markdown to docx and then back to markdown:

pandoc -f markdown -t docx+endnotes -o test.docx test.md
pandoc -f docx+endnotes -t markdown test.docx

If the resulting Markdown has the Span.endnote around the endnotes, the test is successful.

The problem is I must create a temporary test.docx file, which should go away once the test is over, but I don't know how to do it yet.
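One common pattern for this is a scoped temporary directory that is removed when the test finishes. The sketch below shows the idea in Python rather than in the Haskell of pandoc's test suite; `roundtrip` is a hypothetical name, and the commands assume a pandoc build that includes the endnotes extension from this PR.

```python
import subprocess
import tempfile
from pathlib import Path

def roundtrip(markdown, run=subprocess.run):
    """Convert Markdown to docx and back; all files live in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "test.md"
        docx = Path(tmp) / "test.docx"
        src.write_text(markdown, encoding="utf-8")
        # Same two commands as above, with explicit paths.
        run(["pandoc", "-f", "markdown", "-t", "docx+endnotes",
             "-o", str(docx), str(src)], check=True)
        result = run(["pandoc", "-f", "docx+endnotes", "-t", "markdown",
                      str(docx)], check=True, capture_output=True, text=True)
        return result.stdout
    # Leaving the "with" block deletes the directory, including test.docx.
```

A test would then call `roundtrip(source)` and check that the returned Markdown contains `{.endnote}`. The `run` parameter exists only so that the function can be exercised without pandoc installed.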

@massifrg massifrg marked this pull request as ready for review March 9, 2026 09:03
An "endnote" class in a Span does not
alter the usual behavior when its contents
don't consist only of a Note.
@massifrg massifrg changed the title from "DOCX writer: support for endnotes" to "DOCX writer and reader: support for endnotes" Mar 10, 2026