<br /> preceded by a newline is lost when parsing an HTML fragment #178

Closed
rgrove opened this issue Dec 1, 2009 · 5 comments

Comments

@rgrove
Contributor

rgrove commented Dec 1, 2009

When a <br /> element is preceded by a newline in an HTML fragment, Nokogiri appears to drop the element when the fragment is parsed, leaving only a single text node. Here's an irb session demonstrating the issue (using Nokogiri 1.4.0 with libxml2 2.7.6):

>> require 'rubygems'
=> false
>> require 'nokogiri'
=> true
>> html = "First line\nSecond line<br />Broken line"
=> "First line\nSecond line<br />Broken line"
>> fragment = Nokogiri::HTML::DocumentFragment.parse(html)
=> #<Nokogiri::HTML::DocumentFragment:0x80c94f5c name="#document-fragment" children=[#<Nokogiri::XML::Text:0x80c94c64 "First line\nSecond lineBroken line">]>
>> fragment.to_xhtml
=> "First line\nSecond lineBroken line"
>> fragment.to_html
=> "First line\nSecond lineBroken line"

If I remove the newline, the fragment is parsed just fine:

>> html = "First line<br />Broken line"
=> "First line<br />Broken line"
>> fragment = Nokogiri::HTML::DocumentFragment.parse(html)
=> #<Nokogiri::HTML::DocumentFragment:0x80c8c118 name="#document-fragment" children=[#<Nokogiri::XML::Text:0x80c8be20 "First line">, #<Nokogiri::XML::Element:0x80c8bd80 name="br">, #<Nokogiri::XML::Text:0x80c8bc7c "Broken line">]>
>> fragment.to_xhtml
=> "First line<br />Broken line"
@wbharding

This also applies to other HTML. I have observed it with anchors, e.g., "One line\nTwo line\n\n<a href="http://brokenlink.com">This won't be a link after parsing</a>"

However, if I wrap the text block in <div> and </div>, it works. I suspect the newline somehow interferes with Nokogiri's ability to recognize the contents as HTML.

@flavorjones
Member

OK, will investigate.

@flavorjones
Member

Fixed fragment parsing when the leading text node contains a newline. Closed by b659302.

@wbharding

Great turnaround time, thanks a lot! Between this and the fix to the document.root.namespace exception you recently did, I'd love to see a gem bump soon so I can strip out my collection of kluge-fixes. :)

@flavorjones
Member

Should be bumped this weekend. Cross your fingers.

This issue was closed.