Add auto_identifiers support to Man Reader #11675
Add support for the auto_identifiers, gfm_auto_identifiers, and ascii_identifiers extensions in the man reader. Section headings parsed from `.SH` and `.SS` macros now receive auto-generated id attributes when the extension is enabled, so that `--toc` produces working anchor links.

- Add autoIdExtensions to the default man extensions
- Add HasReaderOptions, HasLogMessages and HasIdentifierList instances for ManState so that registerHeader can run
- Use headerWith instead of header to attach the computed Attr with identifiers

Closes jgm#8852
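The following is a minimal sketch of the default `auto_identifiers` algorithm as documented in the pandoc manual, applied to the plain text of an `.SH`/`.SS` heading. It shows what the reader now computes for each heading. It is not pandoc's implementation: `autoIdentifier` is a hypothetical name, and the sketch assumes the heading has already been reduced to a plain string.

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace, toLower)
import Data.List (intercalate)

-- Hypothetical helper sketching pandoc's documented default
-- auto_identifiers algorithm for a heading's plain text
-- (e.g. the argument of .SH or .SS). Formatting, links and
-- footnotes are assumed to have been stripped already.
autoIdentifier :: String -> String
autoIdentifier heading =
  case dropWhile (not . isAlpha) hyphenated of
    ""    -> "section"  -- nothing left: use the fallback identifier
    ident -> ident      -- identifiers must start with a letter
  where
    -- keep letters, digits, whitespace and the characters _ - .
    allowed c = isAlphaNum c || isSpace c || c `elem` "_-."
    -- lowercase, drop other characters, join words with hyphens
    -- (a run of whitespace becomes a single hyphen in this sketch)
    hyphenated = intercalate "-" (words (filter allowed (map toLower heading)))
```

With this sketch, `.SH The header` yields `the-header`, and a purely numeric heading such as `33` falls back to `section`.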
- The auto identifiers are verified with 3 different test cases.
- The same 3 test cases are verified with the GFM auto identifiers algorithm.
- The ascii_identifiers option is additionally tested with one test case.

Closes jgm#8852
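Here is a simplified sketch of the `gfm_auto_identifiers` variant, which makes the difference between the two algorithms concrete. `gfmIdentifier` is a hypothetical name, and the sketch omits the emoji-to-name replacement that the real extension performs. Unlike the default algorithm, it keeps leading digits and drops periods, so `3. Applications` becomes `3-applications`.

```haskell
import Data.Char (isAlphaNum, toLower)

-- Hypothetical, simplified sketch of gfm_auto_identifiers:
-- lowercase, remove punctuation other than '-' and '_', and turn
-- each space into a hyphen. Emoji handling is deliberately omitted.
gfmIdentifier :: String -> String
gfmIdentifier = map spaceToHyphen . filter keep . map toLower
  where
    keep c = isAlphaNum c || c `elem` "-_ "
    -- every single space becomes a hyphen (runs are not collapsed)
    spaceToHyphen ' ' = '-'
    spaceToHyphen c   = c
```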
E.g. for zh-Hant-TW look for (in order) zh-Hant-TW.yaml, zh-Hant.yaml, zh.yaml. Closes jgm#11648.
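The lookup order described in this commit amounts to dropping subtags from the right. Below is a minimal sketch, assuming the tag is a plain hyphen-separated string. `translationCandidates` and `splitTag` are hypothetical names, not pandoc functions.

```haskell
import Data.List (inits, intercalate)

-- Split a BCP 47 tag such as "zh-Hant-TW" on '-'.
splitTag :: String -> [String]
splitTag s = case break (== '-') s of
  (part, [])       -> [part]
  (part, _ : rest) -> part : splitTag rest

-- Candidate translation files, most specific first: drop subtags
-- from the right until only the primary language subtag remains.
translationCandidates :: String -> [String]
translationCandidates tag =
  [ intercalate "-" parts ++ ".yaml"
  | parts <- reverse (drop 1 (inits (splitTag tag))) ]
```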
If tblHeader exists but has `w:val="0"`, then don't consider the element a header. See jgm#8299...but this change doesn't seem to fix things completely.
This led to some table rows being wrongly considered header rows. We now correctly handle the example from jgm#8299 (comment) See jgm#8299.
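Below is a minimal sketch of the header-row decision. It uses a hypothetical `TblHeader` type in place of pandoc's XML handling. The commit only mentions `w:val="0"`. Treating `false` and `off` the same way is an assumption based on the OOXML on/off value convention.

```haskell
-- Hypothetical view of a row's <w:tblHeader> element: absent, or
-- present with an optional w:val attribute.
data TblHeader = NoTblHeader | TblHeader (Maybe String)

-- A row is a header row only if <w:tblHeader> is present and its
-- w:val does not switch it off. "false" and "off" are assumed to
-- behave like "0"; the commit itself only handles w:val="0".
isHeaderRow :: TblHeader -> Bool
isHeaderRow NoTblHeader          = False
isHeaderRow (TblHeader Nothing)  = True  -- bare <w:tblHeader/>
isHeaderRow (TblHeader (Just v)) = v `notElem` ["0", "false", "off"]
```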
(Instead of using raw HTML.) The "aside" class is added to the Div. Also, add "header" class to Divs created from headers. See jgm#11626.
...if otherwise the label doesn't come after anything. (In this case typst will raise an error.) Closes jgm#11568.
This change ensures that raw content marked `epub2` will appear in (only) EPUBv2 output and content marked `epub3` will appear in (only) EPUBv3 output.
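The selection rule can be sketched as a small function. `EpubVersion` and `epubRawDecision` are hypothetical names. Returning `Nothing` for every other format, which defers to the writer's usual handling, is an assumption the commit does not spell out.

```haskell
data EpubVersion = EPUB2 | EPUB3 deriving (Eq, Show)

-- Should raw content with this format name be emitted?
-- "epub2" goes only into EPUBv2 output, "epub3" only into EPUBv3.
-- Nothing means "not an EPUB-specific format; use the ordinary
-- handling" (an assumption of this sketch).
epubRawDecision :: EpubVersion -> String -> Maybe Bool
epubRawDecision v "epub2" = Just (v == EPUB2)
epubRawDecision v "epub3" = Just (v == EPUB3)
epubRawDecision _ _       = Nothing
```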
This fixes a bug which produced too-narrow columns in some cases. Closes jgm#11664.
`stringify` returns the empty string for a MetaString, so each keyword in the `cp:keywords` list of `docProps/core.xml` was rendered as empty. Convert each metadata value like `lookupMetaString` does instead. Signed-off-by: Sai Asish Y <say.apm35@gmail.com>
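The bug and the fix can be illustrated with cut-down, hypothetical versions of pandoc's metadata types. `inlinesOnly` mimics a `stringify`-style traversal that only looks at `Inline` nodes. `metaText` instead treats a `MetaString` as text in its own right. Neither is pandoc's actual code, and joining nested lists with spaces is an arbitrary choice of the sketch.

```haskell
-- Hypothetical, cut-down versions of pandoc's metadata types.
data Inline = Str String | Space deriving Show
data MetaValue
  = MetaString String
  | MetaInlines [Inline]
  | MetaList [MetaValue]
  deriving Show

inlineText :: Inline -> String
inlineText (Str s) = s
inlineText Space   = " "

-- Mimics the bug: only Inline nodes contribute text, so a bare
-- MetaString (which contains no Inlines) yields "".
inlinesOnly :: MetaValue -> String
inlinesOnly (MetaString _)   = ""
inlinesOnly (MetaInlines is) = concatMap inlineText is
inlinesOnly (MetaList vs)    = concatMap inlinesOnly vs

-- The fix: a MetaString is used as text directly.
metaText :: MetaValue -> String
metaText (MetaString s)   = s
metaText (MetaInlines is) = concatMap inlineText is
metaText (MetaList vs)    = unwords (map metaText vs)

-- One cp:keywords entry per list item.
keywords :: MetaValue -> [String]
keywords (MetaList vs) = map metaText vs
keywords v             = [metaText v]
```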
We parse these as DefinitionList items, but previously we sometimes stopped including material in the definition too early. We should include everything until we hit a new indentation-changing macro. Closes jgm#11668.
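Here is a rough sketch of the intended rule: after a `.TP` term, keep consuming lines until the next indentation-changing macro. The commit does not list which macros count, so `indentMacros` below is an assumed, illustrative set. `definitionBody` is a hypothetical helper, not the reader's parser.

```haskell
-- Assumed set of macros that end a definition (illustrative only;
-- the commit does not enumerate them).
indentMacros :: [String]
indentMacros = [".TP", ".IP", ".HP", ".RS", ".RE", ".PP", ".P", ".LP", ".SH", ".SS"]

-- True when a line invokes one of the macros above.
startsIndentMacro :: String -> Bool
startsIndentMacro line = case words line of
  (m : _) -> m `elem` indentMacros
  []      -> False

-- Split the lines after a .TP term into the definition body and the
-- remaining input. Font macros such as .B stay in the definition.
definitionBody :: [String] -> ([String], [String])
definitionBody = break startsIndentMacro
```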
Previously the OpenDocument writer emitted a fresh automatic style (L1..Ln, P1..Pn, T1..Tn) for nearly every list, list-item paragraph, block quote, preformatted block, and inline text style. This produced large ODT files, made `--reference-doc` customization ineffective (the user's predefined styles were never referenced), and gave each list its own indentation independent of any containing block quote.

This commit teaches the writer to reference the predefined styles that LibreOffice ships and that pandoc's reference.odt now exports:

- Bullet lists use `List_20_1`; ordered lists with default start and decimal format use `Numbering_20_1`. Non-default ordered lists generate a single named override style (`Pandoc_Numbering_N`) memoised by (ListNumberStyle, ListNumberDelim); a non-default start value with the default format is expressed via `text:start-value` on the `text:list` element instead of a new style.
- List-item paragraphs use `List_20_Bullet[_Tight]` and `List_20_Number[_Tight]`. The Tight variants are pandoc-specific (zero top/bottom margin) and are injected into the user's reference.odt if missing, just like the Skylighting token styles.
- Block quotes use the predefined `Quotations` paragraph style directly. Nested block quotes use a single automatic style that inherits from Quotations and only adds extra margin-left, so a list inside a block quote now inherits its container's indent (jgm#2747).
- Preformatted blocks use `Preformatted_20_Text` directly.
- Emphasis, Strong, Strikeout, Subscript, Superscript and Code spans use the predefined `Emphasis`, `Strong_20_Emphasis`, `Strikeout`, `Subscript`, `Superscript` and `Source_20_Text` text styles.
- `paraStyle`/`paraStyleFromParent` no longer emit a wrapper automatic style when its only attribute would be `parent-style-name`; the parent name is returned directly.

Closes jgm#9136. Closes jgm#5086. Closes jgm#2747. Closes jgm#3426. Closes jgm#7336.

Co-authored by: Claude Opus 4.7.
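The memoised override styles for ordered lists can be sketched with a `Data.Map` keyed by (style, delimiter). The constructors mirror a subset of pandoc-types' `ListNumberStyle` and `ListNumberDelim`, redeclared locally. Several details are assumptions of this sketch:

- the name `orderedListStyle`;
- what counts as the "default format";
- numbering the overrides from 1;
- emitting a start value for non-default formats as well.

Bullet lists are omitted, since they always map to `List_20_1`.

```haskell
import qualified Data.Map.Strict as Map

-- Local copies of (a subset of) the pandoc-types constructors.
data ListNumberStyle = DefaultStyle | Decimal | LowerRoman | UpperRoman | LowerAlpha | UpperAlpha
  deriving (Eq, Ord, Show)
data ListNumberDelim = DefaultDelim | Period | OneParen | TwoParens
  deriving (Eq, Ord, Show)

-- Memo table from (style, delim) to the generated override name.
type Overrides = Map.Map (ListNumberStyle, ListNumberDelim) String

-- Assumption: "default format" means decimal numbering with the
-- default or period delimiter.
isDefaultFormat :: ListNumberStyle -> ListNumberDelim -> Bool
isDefaultFormat s d = s `elem` [DefaultStyle, Decimal] && d `elem` [DefaultDelim, Period]

-- Pick the style name and optional text:start-value for an ordered
-- list, reusing an existing Pandoc_Numbering_N for a seen pair.
orderedListStyle
  :: Int -> ListNumberStyle -> ListNumberDelim -> Overrides
  -> ((String, Maybe Int), Overrides)
orderedListStyle start s d overrides
  | isDefaultFormat s d = (("Numbering_20_1", startValue), overrides)
  | otherwise =
      case Map.lookup (s, d) overrides of
        Just name -> ((name, startValue), overrides)
        Nothing ->
          let name = "Pandoc_Numbering_" ++ show (Map.size overrides + 1)
          in ((name, startValue), Map.insert (s, d) name overrides)
  where
    -- assumption: a start of 1 needs no text:start-value attribute
    startValue = if start == 1 then Nothing else Just start
```

Called twice with `LowerRoman`/`OneParen`, the sketch reuses `Pandoc_Numbering_1` rather than creating a second style. That reuse is the point of the memoisation.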
Like the other table syntaxes (pipe, simple, and multiline tables) and block-level constructs generally, a grid table may now be indented by up to three spaces and still be recognized as a table. Previously the grid-table parser required the table to begin at the left margin, so an indented grid table was parsed as a paragraph. The leading indentation is stripped uniformly from each line before the table is parsed, so an indented grid table produces the same AST as its non-indented equivalent. Adds a command test.
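Below is a minimal sketch of the uniform indentation stripping. It assumes the table's lines are already collected. It also assumes that lines with less indentation than the first are left unchanged, which the commit does not specify. `stripGridTableIndent` is a hypothetical name.

```haskell
-- Recognise a grid table indented by up to three spaces and strip
-- that same indentation from every line, so the table parser sees
-- an unindented table. Returns Nothing if the first line is not a
-- grid-table border starting with '+'.
stripGridTableIndent :: [String] -> Maybe [String]
stripGridTableIndent [] = Nothing
stripGridTableIndent ls@(firstLine : _)
  | n <= 3, take 1 rest == "+" = Just (map dropIndent ls)
  | otherwise                  = Nothing
  where
    (lead, rest) = span (== ' ') firstLine
    n = length lead
    -- drop n leading spaces; leave a line alone if it has fewer
    dropIndent l = let (pre, post) = splitAt n l
                   in if all (== ' ') pre then post else l
```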
Looks like there were new commits on the main branch with new tests. I will rebase from main and make the changes. Additionally, I wrote the previous test cases in the old test-suite style; I will move them to test/command/8852.md.
`attr <- registerHeader nullAttr contents`
Note that registerHeader doesn't emit log messages directly with report; it adds them to a list of log messages in state (using addLogMessage); to make sure that items in this list are actually output, you need to call reportLogMessages after parsing is finished. (I actually no longer remember why we had to do this indirect thing in the markdown reader, rather than using report directly, but there was some reason registerHeader was designed this way.)
registerHeader would not write any logs for ManState,
because registerHeader only generates logs when there are duplicate identifiers.
In Markdown, logs are generated when duplicate identifiers are defined explicitly in the source. But man has no syntax for defining identifiers.
I can still add a call to reportLogMessages in the parseMan function if required.
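The behaviour discussed in this thread can be sketched as follows. Automatic identifiers are made unique silently. Only a duplicate explicit identifier produces a log message, which is collected in state and emitted later. `HeaderState`, `registerHeading` and `flushLogs` are hypothetical stand-ins for the parser state, `registerHeader` and `reportLogMessages`. They model the described behaviour, not pandoc's code.

```haskell
import qualified Data.Set as Set
import System.IO (hPutStrLn, stderr)

-- Hypothetical parser state: identifiers in use, and log messages
-- collected so far (newest first).
data HeaderState = HeaderState
  { usedIds     :: Set.Set String
  , logMessages :: [String]
  }

emptyState :: HeaderState
emptyState = HeaderState Set.empty []

-- Append -1, -2, ... until the candidate identifier is unused.
uniqueId :: Set.Set String -> String -> String
uniqueId used base
  | base `Set.notMember` used = base
  | otherwise = go 1
  where
    go :: Int -> String
    go n = let cand = base ++ "-" ++ show n
           in if cand `Set.notMember` used then cand else go (n + 1)

-- An explicit identifier is kept and a duplicate is only logged;
-- an automatic one (built from autoBase) is made unique silently.
registerHeading :: Maybe String -> String -> HeaderState -> (String, HeaderState)
registerHeading explicitId autoBase st =
  case explicitId of
    Just ident
      | ident `Set.member` usedIds st ->
          (ident, st { logMessages = ("Duplicate identifier: " ++ ident) : logMessages st })
      | otherwise ->
          (ident, st { usedIds = Set.insert ident (usedIds st) })
    Nothing ->
      let ident = uniqueId (usedIds st) autoBase
      in (ident, st { usedIds = Set.insert ident (usedIds st) })

-- Counterpart of calling reportLogMessages after parsing:
-- emit the collected messages in the order they were added.
flushLogs :: HeaderState -> IO ()
flushLogs st = mapM_ (hPutStrLn stderr) (reverse (logMessages st))
```

Man input never supplies explicit identifiers, so only the `Nothing` branch is taken and no messages accumulate. This is consistent with the argument in the reply above.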
`, "H1" =:`
`".SH The header\n"`
`- =?> header 1 (text "The header")`
`+ =?> headerWith ("",[],[]) 1 (text "The header")`
I didn't understand why this was needed. Isn't `header == headerWith ("",[],[])`?
`]`
`, testGroup "man"`
`- [ test' "reader" ["-r", "man", "-w", "native", "-s"]`
`+ [ test' "reader" ["-r", "man-auto_identifiers", "-w", "native", "-s"]`
I think it makes more sense to test `man` itself and adjust the expected output.
You can do this quickly with `make TESTARGS=--accept`.
The test of the -auto_identifiers case could be moved to the command test above.
Right, I agree. I will move the old cases to the command tests.
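The `-r man-auto_identifiers` argument in the test uses pandoc's convention of appending `+EXTENSION` or `-EXTENSION` to a format name to enable or disable an extension. Below is a minimal sketch of how such a spec splits apart. `parseFormatSpec` is a hypothetical name, not pandoc's parser, and it does no validation of format or extension names.

```haskell
-- Split a spec like "man-auto_identifiers" into the base format and
-- a list of extension toggles (True = enable, False = disable).
parseFormatSpec :: String -> (String, [(Bool, String)])
parseFormatSpec spec = (base, toggles rest)
  where
    isToggle c = c == '+' || c == '-'
    (base, rest) = break isToggle spec
    -- each toggle is a '+' or '-' followed by an extension name
    toggles [] = []
    toggles (c : more) =
      let (ext, rest') = break isToggle more
      in (c == '+', ext) : toggles rest'
```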
Looks good - I squashed and merged in one commit.
This PR enables the `auto_identifiers` extension for the Man reader, allowing parsed headers to receive identifier attributes. These identifiers are used by writers for section links and table-of-contents generation. The `auto_identifiers` extension is now enabled by default for Man.

New test cases have been added to verify:

- `auto_identifiers`
- `gfm_auto_identifiers`
- `ascii_identifiers`

Closes #8852
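The `ascii_identifiers` post-processing can be sketched as mapping accented Latin letters to their base letters and dropping other non-ASCII characters. The accent table here is a tiny, hypothetical sample for illustration; a real implementation needs far wider Unicode coverage. `toAsciiIdentifier` is not pandoc's function.

```haskell
import Data.Char (isAscii)

-- Hypothetical, deliberately tiny accent table (illustration only).
accentTable :: [(Char, Char)]
accentTable =
  [ ('à','a'), ('á','a'), ('â','a'), ('é','e'), ('è','e')
  , ('ê','e'), ('î','i'), ('ô','o'), ('ü','u'), ('ç','c') ]

-- Keep ASCII characters, map known accented letters to their base
-- letter, and drop every other non-ASCII character.
toAsciiIdentifier :: String -> String
toAsciiIdentifier = concatMap convert
  where
    convert c
      | isAscii c = [c]
      | otherwise = maybe [] (: []) (lookup c accentTable)
```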