Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nokogiri::XML::Reader#encoding and Nokogiri::XML::Reader#xml_version giving wrong output #980

Closed
aruprakshit opened this issue Oct 13, 2013 · 6 comments · Fixed by #2377
Closed

Comments

@aruprakshit
Copy link

Hi,

I tried below code :

require 'nokogiri'

reader = (<<-eoxml)
<?xml version="1.0" encoding="ISO-8859-1"?>
<root>
<node label="Session">              
      <node id="session_query1" label="DTM Buffer Size" />
      <node id="session_query2" label="Buffer Block Size"/>
      <node id="session_query3" label="Enable Test Load"/>
</node>
<node label="Workflow" >
     <node id="workflow_query1" label="Enable HA Recovery"/>
     <node id="workflow_query2" label="Suspend on Error"/>
</node>
<node label="Mapping" >
     <node id="mapping_query1" label="SQL Override in SQ " />
     <node id="mapping_query2" label="SQL Override in lookup" />               
</node>
</root>
eoxml

reader.instance_of? Nokogiri::XML::Reader # => true
p reader.xml_version
p reader.encoding
# >> nil
# >> nil
@JunaidBKirkire
Copy link

@loveGitHub Hi I just attempted the above example with Nokogiri 1.5.6

"reader.instance_of? Nokogiri::XML::Reader" gave me "false".

@leejarvis
Copy link
Member

@loveGitHub Please update this with an accurate example. In your code, you're clearly creating a new String and nowhere in there are you creating any Nokogiri related objects

@aruprakshit
Copy link
Author

@leejarvis That was a typo..

require 'nokogiri'

reader = Nokogiri::XML::Reader(<<-eoxml)
<?xml version="1.0" encoding="ISO-8859-1"?>
<root>
<node label="Session">              
      <node id="session_query1" label="DTM Buffer Size" />
      <node id="session_query2" label="Buffer Block Size"/>
      <node id="session_query3" label="Enable Test Load"/>
</node>
<node label="Workflow" >
     <node id="workflow_query1" label="Enable HA Recovery"/>
     <node id="workflow_query2" label="Suspend on Error"/>
</node>
<node label="Mapping" >
     <node id="mapping_query1" label="SQL Override in SQ " />
     <node id="mapping_query2" label="SQL Override in lookup" />               
</node>
</root>
eoxml

reader.instance_of? Nokogiri::XML::Reader # => true
p reader.xml_version
p reader.encoding
# >> nil
# >> nil

@JunaidBKirkire
Copy link

@loveGitHub @leejarvis I think the encoding, xml_version info needs to be provided or parsed from the "source".

I think a slight change in constructor code would be required to fix this. (lib/nokogiri/xml/reader.rb)

def initialize source, url = nil, encoding = nil # :nodoc:

   // Apply regex here to extract encoding from source if encoding is nil.

    @source   = source
    @errors   = []
    @encoding = encoding
end

Let me know how this sounds.

@flavorjones
Copy link
Member

Hi, apologies for not responding on this issue for such an embarrassingly long time.

The xml_version attribute is only set once the reader has started reading. So if you update the example above to first do a #read (or read via #each) you'll see this populated.

p reader.read.xml_version # => "1.0"
reader.each do |node|
  p node.xml_version # => "1.0"
end

As for #encoding, it's a little trickier. Today this simply returns whatever is passed into the constructor (as @JunaidBKirkire mentioned above). There is a libxml2 function that will return the document encoding, and I'll play with that a bit to see if it works as we expect.

@flavorjones
Copy link
Member

PR #2377 has been created to improve the behavior of #encoding

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants