Tex to StarMath: better math/formula rendering in LibreOffice .odt output #11470

Open
jdpipe wants to merge 6 commits into jgm:main from jdpipe:tex-to-starmath

@jdpipe jdpipe commented Feb 15, 2026

This pull request implements a writer for StarMath, which is the native syntax for expressing formulae in LibreOffice (LO). The current MathML is still emitted, but the StarMath code is added as an annotation. This greatly improves the formatting/rendering of LaTeX and math-embedded Markdown in LO Writer (.odt) documents, and also makes the equations far more idiomatic if ongoing hand-editing is needed in LO.

Example of problematic .md input:

$$
\dot{Q}_{\text{dem}}(t)=\dot{Q}_{\text{eb}}(t)+\dot{Q}_{\text{dis}}(t)+\dot{Q}_{\text{boil}}(t),
$$

The \dot{Q} was being converted by Pandoc to something like (notice the sneaky unicode combining-dot character):

 <mover><mi>Q</mi><mo accent="true">̇</mo></mover>

and this was then being rendered in LO as:

{ Q csup ̇ _ "dem" ( t ) = Q csup ̇ _ "eb" ( t ) + Q csup ̇ _ "dis" ( t ) + Q csup ̇ _ "boil" ( t ) , }

which is gobbledygook as far as an end-user is concerned, and also renders very poorly.

With this pull request, the same equation renders in LO instead as

{dot Q}_"dem"(t) = {dot Q}_"eb"(t) + {dot Q}_"dis"(t) + {dot Q}_"boil"(t),
%% TeX: 
%% \dot{Q}_{\text{dem}}(t)=\dot{Q}_{\text{eb}}(t)+\dot{Q}_{\text{dis}}(t)+\dot{Q}_{\text{boil}}(t),

and appears exactly as would be expected. Note that the incoming tex code is provided as a comment to allow corrections/further tweaks for any still-unimplemented latex commands.

- alignment of fractions within a stack
- correct separation of 'min' from 'left' in \min \left(...\right)
- translation of \quad and \qquad to ~ and ~~ respectively
- handle, for now, \mathcal{J} as simply {ital J} (no better option?)
- adding space after greek letters, as in '\lambda f_\text{l}'
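
To make the kind of rewriting described above concrete, here is a minimal sketch in Python. It is not the converter from this pull request (which is Haskell code inside pandoc); the function names `tex_to_starmath` and `with_tex_comment` and the small accent table are hypothetical, and the sketch handles only unnested `\dot{...}`-style accents, `\text{...}`, and `\quad`/`\qquad`.

```python
import re

# Hypothetical, deliberately tiny TeX -> StarMath mapping, for illustration only.
# StarMath uses attribute words such as "dot", "hat", "bar", "vec" before the base.
ACCENTS = ("ddot", "dot", "hat", "bar", "vec", "tilde")


def tex_to_starmath(tex: str) -> str:
    """Rewrite a small, unnested subset of TeX math into StarMath syntax."""
    out = tex
    # \dot{Q} -> {dot Q}
    accent_re = r"\\(" + "|".join(ACCENTS) + r")\{([^{}]*)\}"
    out = re.sub(accent_re, lambda m: "{%s %s}" % (m.group(1), m.group(2)), out)
    # \text{dem} -> "dem" (StarMath writes upright text in double quotes)
    out = re.sub(r"\\text\{([^{}]*)\}", r'"\1"', out)
    # _{"dem"} -> _"dem": drop braces that only wrap a quoted string
    out = re.sub(r'([_^])\{("[^"{}]*")\}', r"\1\2", out)
    # \qquad -> ~~ and \quad -> ~ (StarMath's explicit spaces)
    out = out.replace(r"\qquad", "~~").replace(r"\quad", "~")
    # put single spaces around = and + for readability
    out = re.sub(r"\s*([=+])\s*", r" \1 ", out)
    return out


def with_tex_comment(tex: str) -> str:
    """Append the original TeX as StarMath comments (lines starting with %%)."""
    return tex_to_starmath(tex) + "\n%% TeX: \n%% " + tex


if __name__ == "__main__":
    example = (r"\dot{Q}_{\text{dem}}(t)=\dot{Q}_{\text{eb}}(t)"
               r"+\dot{Q}_{\text{dis}}(t)+\dot{Q}_{\text{boil}}(t),")
    print(with_tex_comment(example))
```

A regex pass like this breaks down on nested braces, which is one reason a real implementation would work from a parsed expression tree rather than from the raw TeX string.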

@jgm
Owner

jgm commented Feb 15, 2026

ODT is based on opendocument, which explicitly allows MathML for formulas. I don't know what StarMath is, or how it fits with open document. I am guessing, from what you say, that it is something specific to LibreOffice? Can you give a link to its documentation? Is it the case that LibreOffice ignores the MathML when a StarMath annotation is present? When the equation is updated, does LibreOffice adjust the MathML as well as the StarMath?

From what you have written, it seems that LibreOffice translates MathML to StarMath in rendering documents. It also seems that this translation is buggy. If that's the case, shouldn't this be fixed on the LibreOffice side? After all, MathML support is called for in the opendocument spec. Have you submitted a bug report there?

You make a good case for supporting StarMath in pandoc, but the way to do it would be, first, to add a writer (and possibly a reader) for StarMath to texmath (jgm/texmath). (A lot of the needed code is in this PR, but it would have to be fit into texmath.) A reader would also be good to have if LibreOffice does not update changes to StarMath in the MathML, or does that badly. And I'd need to be convinced by test coverage that the change would be a net gain overall (not a case where some formulas are rendered better and others worse).

@jdpipe
Author

jdpipe commented Feb 17, 2026

Hi jgm, thanks for your thoughts and comments!

Regarding StarMath, it's the primary user-facing editing format for equations in LibreOffice (LO). The name dates back to the old StarOffice days, before OpenOffice.org. It's still called 'smath' in the LO code, although it has other names too.

As I understand it, LO documents that have been created in LO encode both the MathML and the StarMath code, which gets stored as an 'annotation' alongside the 'main' MathML. When a Word document gets imported into LO, I believe its Office Math Markup Language (OMML) gets XSLTed into MathML, and then if editing is being done, it gets further converted into StarMath.

An LO document can exist without the StarMath code, and LO will attempt to convert the MathML into an editable version of the equation, as you said. It generally works OK, but it produces far less readable code, and certainly not code that is comfortable to work with in any further hand-editing. I gave examples of the kind of code that we get from current-generation Pandoc; it's a similar story when opening Word (.docx) documents. Apparently Word uses Office Math Markup Language (OMML), which can be XSLT-ed to MathML with some degree of success. I presume that is what is done when loading a .docx into LO, and then a further conversion to StarMath is done if you actually attempt to edit an equation.

The issue is that the mapping from MathML back to human-friendly markup is one-to-many, and the converter built into LO seems to make some maybe more robust but less user-friendly choices -- for some reason, it might use something like "{Q} rsub {nital P}" instead of 'Q_"P"'. Maybe the 'rsub' has a good reason, e.g. compatibility in R-to-L language contexts, perhaps (?), but the upshot is that the resulting formulas are awful to edit, and you have to do lots of pointless fiddly edits in order to have something that you can further work with, if that's what you want to do. The converter I've provided is far, far better than the current conversion pathway provided by Pandoc in combination with the MathML to StarMath converter provided in LO.

Yes, one place to improve this would be in LO itself. However, LO will not have access to the original TeX; it only has the resulting MathML, which already loses some of the specificity of the original TeX.

Another place to implement this might be TexMath; I didn't consider that. One consideration is that the converter needs to poke code into the ODT structure. So at least that part needs to be in Pandoc, I presume. But yes, I guess the other part could be in TexMath perhaps.

Actually another possible reason for having the converter in Pandoc would be that as a superior pathway from MathML to StarMath, it might also support better conversion of Word format into ODT. I didn't test that, but it can be explored. Obviously the %% TeX: source wouldn't be available in that case.

@jgm
Owner

jgm commented Feb 17, 2026

If texmath had a StarMath writer, then pandoc's ODT writer could use that to produce the StarMath annotation, which it could then insert into the MathML. But conceptually that's where the converter belongs.

@jdpipe
Author

jdpipe commented Feb 17, 2026

Sorry, I had some edits on the go and didn't see your subsequent response.

@jdpipe
Author

jdpipe commented Mar 9, 2026

ok i've created a pull request now for texmath. the changes here now require that change to the texmath repo in order to function.

this commit has a few other tweaks for latex processing, to fix up glitches in processing:
https://github.com/wch/latexsheet/blob/gh-pages/latexsheet.tex

the cheatsheet now renders pretty well in both markdown and libreoffice writer. i had to suppress the @startsection macro from emitting rubbish, and also the \begin{multicols}{...} was dropping '3' and '2' into the output.

also had to do some hacking to get the \LaTeXe, \BibTeX etc macros to do the obvious thing.
