mediawiki reader: improve strong/emph conformance #10766
Draft
silby wants to merge 1 commit into
Owner
In general I'm not too concerned with divergences in edge cases. Nobody is ever going to write
Owner
Is Parsoid the parser MediaWiki uses? Or is that something else?
Collaborator
This looks ready but is still marked as draft. @silby, can we merge this?
cf. #10761 and #3044.
I made some progress with this today without completely blowing up the existing strong and emph parsers, but weird edge cases remain. E.g. consider `''foo''''bar''`. Pandoc today will give you `Emph [ Str "foo" , Str "bar" ]`, which has an obvious appeal. My work in progress gives `Emph [ Str "foo''" ] , Str "bar''"`, which is odder but defensible given other requirements for emphasized quote marks. The actual correct answer, according to MediaWiki, is `Emph [ Str "foo'" , Strong [ Str "bar" ] ]`, i.e. foo'bar, which is basically a koan.

Parsoid has a lot of code just for processing quotes, presumably aiming to maintain bug-for-bug compatibility with whatever MediaWiki's first parser did. So what a string of single quotes means varies depending on what comes after it in the line, in a more context-sensitive way than I expected.
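To make the `foo'` part of that answer less mysterious, here is a rough sketch of how I understand the first step of MediaWiki's quote handling: each run of apostrophes is classified by length, a run of four is read as one literal apostrophe followed by a bold toggle, and a run longer than five keeps the surplus as literal apostrophes. That rule is my reading of the legacy behaviour, not something taken from this PR's code, and `Tok`, `tokenize`, `classify` and `mergeText` are hypothetical names for illustration only. The later passes (rebalancing odd counts of toggles, closing whatever is still open at end of line) are not modelled here.

```haskell
-- Hypothetical token type for this sketch; not pandoc's or Parsoid's.
data Tok
  = Text String   -- literal text, including leftover apostrophes
  | Italic        -- a '' toggle
  | Bold          -- a ''' toggle
  | BoldItalic    -- a ''''' toggle
  deriving (Eq, Show)

-- Split one line into text and quote-run tokens.
tokenize :: String -> [Tok]
tokenize = mergeText . go
  where
    go [] = []
    go s@('\'' : _) =
      let (qs, rest) = span (== '\'') s
      in classify (length qs) ++ go rest
    go s =
      let (txt, rest) = break (== '\'') s
      in Text txt : go rest

-- Classify a run of n apostrophes (n >= 1).
-- Assumption: 4 means "apostrophe + bold", and more than 5 means
-- "surplus apostrophes + bold-italic".
classify :: Int -> [Tok]
classify n
  | n == 1    = [Text "'"]
  | n == 2    = [Italic]
  | n == 3    = [Bold]
  | n == 4    = [Text "'", Bold]
  | n == 5    = [BoldItalic]
  | otherwise = [Text (replicate (n - 5) '\''), BoldItalic]

-- Join adjacent literal text tokens so "foo" and "'" become "foo'".
mergeText :: [Tok] -> [Tok]
mergeText (Text a : Text b : rest) = mergeText (Text (a ++ b) : rest)
mergeText (t : rest)               = t : mergeText rest
mergeText []                       = []

main :: IO ()
main = print (tokenize "''foo''''bar''")
-- prints: [Italic,Text "foo'",Bold,Text "bar",Italic]
```

Under these assumptions the four-quote run in the middle never closes the emphasis: it leaves a stray apostrophe on `foo` and opens bold inside the still-open italic, which lines up with the `Emph [ Str "foo'" , Strong [ Str "bar" ] ]` shape above.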
Would it be better to merge code that makes us more conformant with MediaWiki for some cases and "wrong in a different way" for others, or to try to reach perfection?