JATS validation fails if footnotes include block quotes #5570
Comments
I think it makes sense to make pandoc do this transformation when generating JATS.
Okay, great! I don't want to slam you all at once with a bunch of issues. But I am seeing a few very similar problems, and it probably makes sense to at least show you the full scope of these problems, so that you can consider the issue comprehensively.
Both these problems are very similar to the one outlined above:
I'd be happy to write up each of these as separate issues with recreation examples, etc – whatever is most convenient for you. And overall, I'm happy to help in any way I can. Just let me know what I can do. Thanks again!
Is it safe to assume that
Good questions. The JATS spec should be helpful here as they strictly specify how elements may be nested. For the full story, I recommend looking at the Document Hierarchy Diagrams. But to answer your question:
Hopefully that helps! Let me know if you have any other questions.
OK, according to the spec,
That covers all block-level elements, as rendered by pandoc, except:
Here's the complete list:
So I think what you suggest is basically right:
I think the cleanest way to do the wrapping is to add a
Done. If you could use this version to generate some documents and make sure they validate, that would be great.
Wow so fast! I'd be happy to test this fix against my problem articles and follow up. I do have a related question though... Is there an easy way for me to install the bleeding edge in order to test this? (Otherwise, I could wait for you to cut a new version which would also be fine with me.) If it matters, currently I am installing via Dockerfile:
You could try https://github.com/pandoc-extras/pandoc-nightly
Great! I can test it tomorrow using the nightly build. I'll follow up on this thread to let you know how it works. Thanks again!
Following up here... I am testing my documents against the latest version,
Good news
The
Bad news
I believe version 2.7.3 also introduced a fairly major regression in the JATS writer. I suspect the bug was introduced in #5511:
I agree that grouping footnotes at the end of the document is better. However, this change created a bug where JATS output fails validation if the document does not contain any footnotes. The specific validation error I am seeing is:
If it's not obvious, that means that
Finally, I can confirm that when I add one or more footnotes anywhere in the document, the validation error disappears. Would you like me to create a separate issue for this bug?
@mb21 has fixed this regression, thanks for the testing!
Background
The JATS spec does not allow you to have block quotes (i.e. <disp-quote>) inside footnotes (i.e. <fn>).
Frankly, I think this is an odd and unreasonable restriction. Some authors – and some disciplines, such as legal scholarship – make heavy use of footnotes. As far as I know, there's nothing fundamentally wrong with placing a block quote inside a footnote.
Problem
I am encountering real-world examples of this problem, so I need some sort of workaround.
To be clear, here's an example of invalid JATS:
This JATS XML would fail validation with the error:
Element fn content does not follow the DTD, expecting (label? , p+), got (p disp-quote)
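To make the error concrete, here is a minimal Python sketch of the failing shape. The footnote markup is a hypothetical stand-in, not the example attached to this issue, and the regex only mimics the (label?, p+) content model quoted in the error message. It is not a substitute for validating against the DTD.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical footnote (not the example from the issue): a paragraph
# followed by a block quote placed directly inside <fn>.
INVALID_FN = """
<fn id="fn1">
  <p>See the original ruling:</p>
  <disp-quote><p>Quoted text.</p></disp-quote>
</fn>
"""

# Content model quoted in the validator error: (label?, p+).
# Child tag names are joined with trailing spaces so a regex can match them.
FN_CONTENT_MODEL = re.compile(r"(label )?(p )+")


def fn_is_valid(fn):
    """Return True if the direct children of <fn> match (label?, p+)."""
    child_tags = "".join(child.tag + " " for child in fn)
    return FN_CONTENT_MODEL.fullmatch(child_tags) is not None


fn = ET.fromstring(INVALID_FN.strip())
print([child.tag for child in fn])  # ['p', 'disp-quote']
print(fn_is_valid(fn))              # False
```

The child sequence (p disp-quote) is the same sequence the validator reports, which is why the check fails.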
Steps to recreate
Pandoc version: 2.7.2
Files: jats_example.zip
To reproduce the issue described above:
Run: pandoc -s --metadata-file metadata.json --to jats example.md -o output.xml
Validate output.xml using the PMC online validation tool: https://www.ncbi.nlm.nih.gov/pmc/tools/xmlchecker
Solution
After examining the JATS spec, I think I have a solid workaround. I want to wrap the <disp-quote> element in a <p> element. This will ensure the JATS is valid while only minimally changing the semantic meaning. Unless it's too much work, it would be nice if we could also include a specific-use attribute.
So in practice, we would convert this:
Into this:
For sure, this is a little weird. From a semantic (or even just commonsense) standpoint, it doesn't make sense to have a block quote inside a paragraph. But this is allowed / valid in JATS. In fact, here's a proof of concept demonstrating that it's valid / okay to wrap a <disp-quote> in a <p> tag. You can download this file and run it through the PMC Validator to confirm.
Finally, in case it's unclear, I only want to wrap <disp-quote> when nested inside <fn> (i.e. I don't want to wrap all <disp-quote>).
Questions
First, I can't decide if this fix should be made directly in the core JATS writer or only in my code (e.g. using a custom filter). Personally, I am leaning toward the core JATS writer because, imo, the JATS writer should strive to produce valid JATS and thus everyone would benefit from this fix. However, I can also imagine y'all feeling like this problem is too specific and should be solved in the client's code rather than Pandoc. And, of course, it's really not my decision to make. So... Let me know what y'all think.
Second, if y'all think this should be solved in the client's code, then I could use some help writing a Lua filter for this use case. I have successfully written some basic Lua filters in the past, but this problem is proving trickier than I expected. It seems that Pandoc's AST expects a paragraph to contain a list of inline elements, but I'm trying to nest a block quote, which is itself a block element. Anyway, for whatever reason, it's not working as expected, so any advice would be very much appreciated.
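One client-side option that sidesteps the AST limitation is to post-process the generated XML instead of filtering the AST. The following is a minimal Python sketch, not the fix that landed in pandoc. The function name wrap_quotes_in_footnotes and the attribute value "wrapper" are hypothetical choices for illustration. It also assumes the JATS elements are not namespaced, and note that ElementTree drops the DOCTYPE on serialization, so a real pipeline would need to re-add it or use a different XML library.

```python
import xml.etree.ElementTree as ET


def wrap_quotes_in_footnotes(root, specific_use="wrapper"):
    """Wrap each <disp-quote> that is a direct child of <fn> in a <p>.

    Block quotes outside footnotes are left untouched.
    """
    # Materialize the list first so the tree is not mutated mid-traversal.
    for fn in list(root.iter("fn")):
        for index, child in enumerate(list(fn)):
            if child.tag != "disp-quote":
                continue
            wrapper = ET.Element("p", {"specific-use": specific_use})
            # Text that followed the quote should now follow the wrapper.
            wrapper.tail = child.tail
            child.tail = None
            # Swap the quote for the wrapper at the same position,
            # then nest the quote inside the wrapper.
            fn.remove(child)
            fn.insert(index, wrapper)
            wrapper.append(child)
    return root


# Hypothetical fragment standing in for pandoc's JATS output.
SAMPLE = (
    '<fn-group><fn id="fn1"><p>Intro.</p>'
    "<disp-quote><p>Quoted.</p></disp-quote></fn></fn-group>"
)
root = wrap_quotes_in_footnotes(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

After the call, the footnote's direct children are two <p> elements, so it satisfies the (label?, p+) content model, and the second <p> carries the specific-use attribute and contains the original <disp-quote>.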
Thanks again for maintaining Pandoc! It's an amazing tool!