DocBook reader ignores the id attribute of formalpara #8666
Comments
I looked a bit more into the DocBook standard and into the Pandoc DocBook reader code, and I think that this might be better solved by using the already defined
Moreover, I realized that DocBook also has the
Thus, consider this DocBook example (example.xml):
<?xml version="1.0" encoding="UTF-8"?>
<?asciidoc-toc?>
<?asciidoc-numbered?>
<article xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
  <info>
    <title>My Document</title>
    <date>2023-03-06</date>
  </info>
  <formalpara xml:id="my_code_id">
    <title>Code title</title>
    <para>
      <programlisting language="bash" linenumbering="unnumbered">echo "hello world!"</programlisting>
    </para>
  </formalpara>
  <example xml:id="my_example_id">
    <title>Example title</title>
    <simpara>example content</simpara>
  </example>
  <sidebar xml:id="my_sidebar_id">
    <title>Sidebar title</title>
    <simpara>sidebar content</simpara>
  </sidebar>
</article>
With the current pandoc development version, the AST obtained is:
[ Div
    ( "" , [ "formalpara-title" ] , [] )
    [ Para [ Strong [ Str "Code" , Space , Str "title" ] ] ]
, CodeBlock ( "" , [ "bash" ] , [] ) "echo \"hello world!\""
, Para [ Str "example" , Space , Str "content" ]
, Para [ Str "sidebar" , Space , Str "content" ]
]
Note that ids and titles are lost. Now, if we modify the code to use the parseAdmonition function with these changes:
diff --git a/src/Text/Pandoc/Readers/DocBook.hs b/src/Text/Pandoc/Readers/DocBook.hs
index 855f1d188..521f9ec89 100644
--- a/src/Text/Pandoc/Readers/DocBook.hs
+++ b/src/Text/Pandoc/Readers/DocBook.hs
@@ -786,6 +786,9 @@ blockTags = Set.fromList $
 admonitionTags :: [Text]
 admonitionTags = ["caution","danger","important","note","tip","warning"]
 
+titledBlockElements :: [Text]
+titledBlockElements = ["example", "formalpara", "sidebar"]
+
 -- Trim leading and trailing newline characters
 trimNl :: Text -> Text
 trimNl = T.dropAround (== '\n')
@@ -849,12 +852,6 @@ parseBlock (Elem e) =
         "toc" -> skip -- skip TOC, since in pandoc it's autogenerated
         "index" -> skip -- skip index, since page numbers meaningless
         "para" -> parseMixed para (elContent e)
-        "formalpara" -> do
-           tit <- case filterChild (named "title") e of
-                    Just t -> divWith ("",["formalpara-title"],[]) .
-                                para . strong <$> getInlines t
-                    Nothing -> return mempty
-           (tit <>) <$> parseMixed para (elContent e)
         "simpara" -> parseMixed para (elContent e)
         "ackno" -> parseMixed para (elContent e)
         "epigraph" -> parseBlockquote
@@ -899,6 +896,7 @@ parseBlock (Elem e) =
         "refsect3" -> sect 3
         "refsection" -> gets dbSectionLevel >>= sect . (+1)
         l | l `elem` admonitionTags -> parseAdmonition l
+        l | l `elem` titledBlockElements -> parseAdmonition l
         "area" -> skip
         "areaset" -> skip
         "areaspec" -> skip
The resulting AST is:
[ Div
    ( "my_code_id" , [ "formalpara" ] , [] )
    [ Div
        ( "" , [ "title" ] , [] )
        [ Plain [ Str "Code" , Space , Str "title" ] ]
    , CodeBlock ( "" , [ "bash" ] , [] ) "echo \"hello world!\""
    ]
, Div
    ( "my_example_id" , [ "example" ] , [] )
    [ Div
        ( "" , [ "title" ] , [] )
        [ Plain [ Str "Example" , Space , Str "title" ] ]
    , Para [ Str "example" , Space , Str "content" ]
    ]
, Div
    ( "my_sidebar_id" , [ "sidebar" ] , [] )
    [ Div
        ( "" , [ "title" ] , [] )
        [ Plain [ Str "Sidebar" , Space , Str "title" ] ]
    , Para [ Str "sidebar" , Space , Str "content" ]
    ]
]
Note that in the above
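One way to confirm that the identifiers survive the reader is to convert with `-t json` instead of `-t native` and collect the identifier of every Div. The following is only a sketch: the helper name `div_ids` is made up for illustration, and `DOC` is a hand-written, trimmed stand-in that mirrors the shape of the AST shown above (it is not actual pandoc output and omits the `pandoc-api-version` field).

```python
# Hand-written stand-in for part of the JSON AST (assumption: it mirrors the
# native AST shown above; real output would come from `pandoc -t json`).
DOC = {
    "meta": {},
    "blocks": [
        {"t": "Div", "c": [
            ["my_code_id", ["formalpara"], []],
            [
                {"t": "Div", "c": [
                    ["", ["title"], []],
                    [{"t": "Plain", "c": [
                        {"t": "Str", "c": "Code"},
                        {"t": "Space"},
                        {"t": "Str", "c": "title"},
                    ]}],
                ]},
                {"t": "CodeBlock", "c": [["", ["bash"], []], 'echo "hello world!"']},
            ],
        ]},
        {"t": "Div", "c": [
            ["my_example_id", ["example"], []],
            [{"t": "Para", "c": [
                {"t": "Str", "c": "example"},
                {"t": "Space"},
                {"t": "Str", "c": "content"},
            ]}],
        ]},
    ],
}


def div_ids(node):
    """Return (identifier, classes) for every Div with a non-empty identifier."""
    found = []
    if isinstance(node, dict):
        if node.get("t") == "Div":
            # A Div's content is [attr, blocks]; attr is [id, classes, key-values].
            (ident, classes, _keyvals), _blocks = node["c"]
            if ident:
                found.append((ident, classes))
        for value in node.values():
            found.extend(div_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(div_ids(item))
    return found


print(div_ids(DOC))
```

With real output, the same function could be fed by piping `pandoc -f docbook -t json example.xml` into a script that calls `json.load(sys.stdin)`. Inner title Divs have an empty identifier and are skipped.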
Explain the problem.
The DocBook reader ignores the id attribute of formalpara elements. This attribute is needed for cross-references.
I found this problem when trying to convert an AsciiDoc document that references code blocks. Since pandoc cannot read AsciiDoc directly, I used the DocBook backend of asciidoctor to generate a DocBook document, but when I tried to convert the DocBook document to other formats, the references to the code blocks were broken.
For a minimal example, consider this AsciiDoc code:
When converting to DocBook with asciidoctor -b docbook example.adoc the following DocBook is produced:
Then, when pandoc reads the DocBook code with the command pandoc -t native -f docbook the following AST is returned:
The problem here is that in the AST the Div element is missing the id, and thus the previous reference to the code element is broken. The expected Div should be:
Pandoc version?
Pandoc development version
Possible fix
I have never programmed in Haskell, but I looked around the code a bit and found a working solution. This is the diff:
This fixes the id attribute, but note that there is also the related issue #3657, in which role attributes are not saved.
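To see which identifiers a DocBook file carries before conversion (and which a reader would therefore need to preserve for cross-references to work), a short script can list the xml:id of each titled block. This is a rough sketch outside pandoc, using only Python's standard `xml.etree.ElementTree`; the helper name `titled_block_ids` is made up for illustration, and `SAMPLE` is a trimmed copy of the example.xml from this thread with the XML declaration and processing instructions left out.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"
# The xml: prefix is bound to this namespace by the XML specification.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
TITLED_BLOCKS = ("example", "formalpara", "sidebar")

SAMPLE = """<article xmlns="http://docbook.org/ns/docbook" version="5.0" xml:lang="en">
  <formalpara xml:id="my_code_id">
    <title>Code title</title>
    <para>
      <programlisting language="bash">echo "hello world!"</programlisting>
    </para>
  </formalpara>
  <example xml:id="my_example_id">
    <title>Example title</title>
    <simpara>example content</simpara>
  </example>
  <sidebar xml:id="my_sidebar_id">
    <title>Sidebar title</title>
    <simpara>sidebar content</simpara>
  </sidebar>
</article>"""


def titled_block_ids(xml_text):
    """Return (element name, xml:id, title text) for each titled block, in document order."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DOCBOOK_NS + "}"
    found = []
    for elem in root.iter():
        # ElementTree reports tags as "{namespace}local"; skip anything else.
        if not isinstance(elem.tag, str) or not elem.tag.startswith(prefix):
            continue
        local = elem.tag[len(prefix):]
        if local in TITLED_BLOCKS:
            title = elem.find(prefix + "title")
            title_text = title.text if title is not None else None
            found.append((local, elem.get(XML_ID), title_text))
    return found


for name, ident, title in titled_block_ids(SAMPLE):
    print(name, ident, title)
```

Each identifier printed here should appear as a Div identifier in the converted AST; any that does not marks a cross-reference target that was dropped.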