Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly handle pathnames by closing opened files again #1090

Merged
merged 1 commit into from
Nov 30, 2018

Conversation

haslo
Copy link
Contributor

@haslo haslo commented Nov 23, 2018

This PR fixes #975 by using a standard File.open block for handling path names as images.

The io is yielded within that block instead of returned from the method, and thus is correctly closed after it has been used to read from.

@pointlessone
Copy link
Member

Wouldn't it be easier to change verify_and_open_image to verify_and_read_image that would just return read data, instead of io? I imagine, the overall code would be simpler and easier to read as well.

Also, please add specs.

@haslo
Copy link
Contributor Author

haslo commented Nov 23, 2018

@pointlessone Thanks a lot for the feedback! I absolutely agree, directly reading the content is a lot simpler.

Do you know of a way to test open file handles from inside Ruby, without resorting to OS tools? I've added a file double way of testing whether the block syntax is used for File.open, and a version that uses lsof and only runs on systems that use that, but I'm not perfectly happy with either...

end
# String or Pathname
io_or_path = Pathname.new(io_or_path)
raise ArgumentError, "#{io_or_path} not found" unless io_or_path.file?
io = io_or_path.open('rb')
io
File.open(io_or_path, 'rb') do |file_io|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File.read(io_or_path)

Instead of the block open would work as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have to be #binread though, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. That is closer to the original code.

Copy link
Contributor

@fidothe fidothe Nov 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because Pathname supports a lot of class-level methods from File you could also just call binread on the Pathname instance directly

Suggested change
File.open(io_or_path, 'rb') do |file_io|
io_or_path.binread

You could then simplify your no-files-left-open spec by passing a known-non-existent file path in plus a correctly configured test double/spy and verify that the spy had binread.

Something like this:

    it 'uses binread, which closes files opened from pathnames' do
      pathname_double = instance_double("Pathname", binread: "1234567")
      expect(Pathname).to receive(:new).with(filename) { pathname_double }
      
      _info = pdf.image(filename)
      expect(pathname_double).to have_received(:binread)
    end

Not 100% happy with that, but hopefully you see the point - use of a non-existent path ensures that file system accesses outside calling #binread on the double would fail with a system error, so you don't need to verify through something like ObjectSpace.

@pointlessone
Copy link
Member

Well, this issue was never considered a serious one because Ruby is quite good a closing files for you. And does so fairly quickly.

It does so when it GCs the file object. I imagine, you can write a reliable spec by preventing that from Happening. Let say, you can disable GC for one spec so that opened file object could live long enough for you to do all the assertion you want.

@haslo
Copy link
Contributor Author

haslo commented Nov 23, 2018

Unfortunately, it happens to be serious for us because the files we open live on an SMB share, who complains about open files fairly quickly. The processes using Prawn are Rails workers in Passenger, who doesn't appear to close the files until the process dies and gets replaced by a new one, which can be minutes...

Both specs are reliable, but I'd prefer to just ask File whether it knows of any open handles. It doesn't look like Ruby has anything of that sort though.

@pointlessone
Copy link
Member

pointlessone commented Nov 23, 2018

No, Ruby doesn't keep a list of all open files.

You may be observing this more often in Passenger because many web-servers tune GC to not happen during request. However, it should happen in between requests so I don't know why it has to die to free up file handles.

Additionally, the files might be retained by something other than requests and so, leaking.

@haslo
Copy link
Contributor Author

haslo commented Nov 23, 2018

Leaking is of course possible, but the issue arises in the places where we feed Prawn just pathnames as strings. I'm not sure how long it's being retained, the fact is that it's sometimes minutes. We tried to fix the symptoms by always passing file instances instead of filenames (and then closing those ourselves), but figured that's silly when we can just help fixing the cause instead of digging through all our code and maintaining lists of files in dozens of places...

@haslo
Copy link
Contributor Author

haslo commented Nov 23, 2018

I found a way to write the spec that I'm happy with. What do you think, @pointlessone?

@pointlessone
Copy link
Member

Isn't ObjectSpace MRI-specific? I'd like to keep whatever JRuby compatibility we have.

@haslo
Copy link
Contributor Author

haslo commented Nov 23, 2018

I wasn't aware. What do you think of running that particular spec only for MRI?

@pointlessone
Copy link
Member

I'm not opposed to the idea of impl-specific specs. As long as there are corresponding specs for other implementations.

@haslo
Copy link
Contributor Author

haslo commented Nov 23, 2018

I've rebased it all into one commit. Thanks again for your feedback!

it 'closes opened files again after getting Pathname objects' do
if RUBY_PLATFORM == 'java'
# JRuby doesn't support ObjectSpace
expect(RUBY_PLATFORM).to eq('java')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There needs to be a spec here. Otherwise we can't be sure it works on Ruby.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to come up with something... The spec that used lsof isn't really great either, that doesn't test anything for Windows platforms. Or would that work for you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we don't run tests on Windows but we don't have any platform-specific code either. Everything is Ruby-only so in theory we shouldn't have any major issues with adding Windows. It at all possible I'd prefer it stayed that way.

it 'accepts Pathname objects' do
info = pdf.image(Pathname.new(filename))
expect(info.height).to eq(453)
end

context 'with File message spy' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec feels a bit too coupled to the implementation and doesn't describe the actual intent particularly well but since I can't give any useful advice on how to improve it, let's keep it. However, tag it with issue. Here's an example.

Copy link
Contributor Author

@haslo haslo Nov 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I don't like testing the implementation (and only implicitly the logic) either. Something like the version with ObjectSpace but working on every platform would've been awesome...

@haslo
Copy link
Contributor Author

haslo commented Nov 26, 2018

I think it would make sense to add the lsof and the ObjectSpace versions of the specs after all, even if they have no direct equivalents without those tools. The feature the specs test is whether the file handles are closed again. There will then be three specs for that, testing every angle on systems that have all three approaches available, and only testing the implementation-based angle currently in the PR if they don't.

For reference, lsof version:

  it 'closes opened files again after getting Pathname objects' do
    system_has_lsof = system('lsof -v > /dev/null 2>&1')
    system_has_grep = system('grep --version > /dev/null 2>&1')
    if system_has_lsof && system_has_grep
      gc_was_disabled = GC.disable # make sure GC doesn't remove File instance and close file
      open_files_before = `lsof -c ruby | grep "#{filename}"`
      _info = pdf.image(Pathname.new(filename))
      open_files_after = `lsof -c ruby | grep "#{filename}"`
      GC.enable unless gc_was_disabled
      expect(open_files_after).to eq(open_files_before)
    else
      expect(system_has_lsof && system_has_grep).to be_falsy # just for Rubocop
    end
  end

...and ObjectSpace version:

  it 'closes opened files again after getting Pathname objects' do
    if <guard ObjectSpace availability>
      gc_was_disabled = GC.disable # make sure GC doesn't remove File instance and close file
      open_files_before = ObjectSpace.each_object(File).count { |f| !f.closed? }
      _info = pdf.image(Pathname.new(filename))
      open_files_after = ObjectSpace.each_object(File).count { |f| !f.closed? }
      GC.enable unless gc_was_disabled
      expect(open_files_after).to eq(open_files_before)
    else
      expect(<tautology>).to be_truthy # just for Rubocop
    end
  end

If I add these to the PR, even without corresponding specs when the tools aren't available, the logic is tested from three non-perfect angles instead of just one:

  • Cross platform compatible, but implementation-based instead of logic-based
  • Logic-based, but only works on linux and macOS platforms
  • Logic-based, but only works on MRI

If anybody has an idea for a logic-based cross-platform spec after all, I'm all ears. But without that, personally I'd feel happier with these three specs and a comment explaining their raison d'être than just one...

@pointlessone
Copy link
Member

pointlessone commented Nov 26, 2018

Sounds reasonable.

However, instead of meaningless expectations in else branches please use pending with a good doc string.

end
end

it 'closes opened files, spec with lsof' do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An easier approach to these kind of specs is to wrap the definition with the platform check, which means you avoid the sometimes-this-spec-is-pending-but-not-always problem:

describe "situation" do
  system_has_lsof = system('lsof -v > /dev/null 2>&1')
  system_has_grep = system('grep --version > /dev/null 2>&1')
  if system_has_lsof && system_has_grep
    it 'closes opened files, spec with lsof' do
      ...
    end
  end

  if RUBY_PLATFORM != 'java'
    it 'closes opened files, spec with ObjectSpace' do
      ...
    end
  end
end

@haslo
Copy link
Contributor Author

haslo commented Nov 30, 2018

The PR is still in the status "changes requested" ... the specs are as fitting as I think is possible now, right? Is there something more I can / should do?

Copy link
Member

@pointlessone pointlessone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Prawn::Images#image method does not close IO object
3 participants