Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] write_to fails in Nokogiri 1.14.0 when writing to an IO that doesn't implement external_encoding, such as Zip::OutputStream #2773

Closed
dmolesUC opened this issue Jan 19, 2023 · 6 comments

Comments

@dmolesUC
Copy link

Please describe the bug

In Nokogiri 1.14.0, Node.write_to calls external_encoding on the output stream. (I'm not sure why it does this as from the IO and Encoding docs external_encoding appears to be only for reading, but whatever.)

This fails when the underlying output stream is not an actual IO but only a mostly-IO-like object such as Zip::OutputStream.

This was not an issue in 1.13.x, so I'm guessing it's related to 795464c.

Help us reproduce what you're seeing

#!/usr/bin/env ruby

require 'nokogiri'
require 'minitest/autorun'
require 'zip'

class Test < MiniTest::Spec
  before { @tmpdir = Dir.mktmpdir }
  after { FileUtils.rm_rf(@tmpdir) }

  describe "Node#write_to" do
    it "should write to a Zip::OutputStream" do
      doc = Nokogiri::XML("<test/>")

      zipfile_path = File.join(@tmpdir, 'test.zip')
      Zip::OutputStream.open(zipfile_path) do |out|
        out.put_next_entry('test.xml')
        doc.write_to(out, encoding: 'UTF-8')
      end
    end
  end
end

Expected behavior

  • Test above should complete without errors.

Actual behavior

  • Test fails with

    NoMethodError:
      undefined method `external_encoding' for #<Zip::OutputStream:0x0000000131dece98>
    # /Users/david/.rvm/gems/ruby-2.7.5/gems/nokogiri-1.14.0-arm64-darwin/lib/nokogiri/xml/node.rb:1368:in `native_write_to'
    # /Users/david/.rvm/gems/ruby-2.7.5/gems/nokogiri-1.14.0-arm64-darwin/lib/nokogiri/xml/node.rb:1368:in `write_to'
    # /Users/david/.rvm/gems/ruby-2.7.5/gems/nokogiri-1.14.0-arm64-darwin/lib/nokogiri/html5/node.rb:37:in `write_to'  
    

Environment

# Nokogiri (1.14.0)
    ---
    warnings: []
    nokogiri:
      version: 1.14.0
      cppflags:
      - "-I/Users/david/.rvm/gems/ruby-2.7.5/gems/nokogiri-1.14.0-arm64-darwin/ext/nokogiri"
      - "-I/Users/david/.rvm/gems/ruby-2.7.5/gems/nokogiri-1.14.0-arm64-darwin/ext/nokogiri/include"
      - "-I/Users/david/.rvm/gems/ruby-2.7.5/gems/nokogiri-1.14.0-arm64-darwin/ext/nokogiri/include/libxml2"
      ldflags: []
    ruby:
      version: 2.7.5
      platform: arm64-darwin21
      gem_platform: arm64-darwin-21
      description: ruby 2.7.5p203 (2021-11-24 revision f69aeb8314) [arm64-darwin21]
      engine: ruby
    libxml:
      source: packaged
      precompiled: true
      patches:
      - 0001-Remove-script-macro-support.patch
      - 0002-Update-entities-to-remove-handling-of-ssi.patch
      - 0003-libxml2.la-is-in-top_builddir.patch
      - '0009-allow-wildcard-namespaces.patch'
      libxml2_path: "/Users/david/.rvm/gems/ruby-2.7.5/gems/nokogiri-1.14.0-arm64-darwin/ext/nokogiri"
      memory_management: ruby
      iconv_enabled: true
      compiled: 2.10.3
      loaded: 2.10.3
    libxslt:
      source: packaged
      precompiled: true
      patches:
      - 0001-update-automake-files-for-arm64.patch
      datetime_enabled: true
      compiled: 1.1.37
      loaded: 1.1.37
    other_libraries:
      zlib: 1.2.13
      libiconv: '1.17'
      libgumbo: 1.0.0-nokogiri

Additional context

Above test requires RubyZip; tested with RubyZip 2.3.2.

@flavorjones
Copy link
Member

flavorjones commented Jan 19, 2023

Thanks for reporting this, and for providing a great test case to reproduce!

I'm not sure why it does this as from the IO and Encoding docs external_encoding appears to be only for reading

The original commit is referenced in the commit you mention. I think it explains pretty well what's going on. We explicitly set the encoding on the IO object so that the intermediate buffers created can match that encoding, and avoid some edge cases related to UTF-16 and BOMs. Edit: also see the subsequent commit b5768c6.

but whatever

😎

I'll take a look as soon as I can. This seems like it should be pretty easy to work around, but I will want to play with edge cases involving UTF-16 just to make sure it's working right.

@dmolesUC
Copy link
Author

@flavorjones OK, I just assumed that external_encoding had no effect on writing. I missed this bit buried in the IO docs:

When ext_enc is specified, strings read will be tagged by the encoding when reading, and strings output will be converted to the specified encoding when writing. [emphasis added]

Glad to see it's documented somewhere at least.

@flavorjones
Copy link
Member

flavorjones commented Jan 21, 2023

I've got a fix for CRuby, but JRuby isn't working right with Zip::OutputStream. I'm going to try to get that to work before pushing a PR.

@flavorjones
Copy link
Member

Fix is going through CI at #2775

@flavorjones
Copy link
Member

I hope to cut a patch release this weekend with this fixed.

@flavorjones flavorjones added this to the v1.14.x patch releases milestone Jan 30, 2023
@flavorjones
Copy link
Member

See related #3406 in which rubyzip's encoding behavior changed in 2.4.1 (and again in 3.0.dev) that we have no control over.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants