Gruf 2.13 seems to fail to allow connections #147

Closed
tmtrademarked opened this issue Feb 16, 2022 · 6 comments · Fixed by #148

Comments

@tmtrademarked

Please describe the issue

After updating to Gruf 2.13, I can no longer make connections from my gRPC clients to my gRPC server. With Gruf 2.12, everything works fine - but with 2.13, it seems like connections are just never serviced. I see the following error reported by my client:

[9a625990-be2e-41e0-be7f-638fed11094e] [::1] Gruf::Client::Errors::Unavailable (14:failed to connect to all addresses. debug_error_string:{"created":"@1645049640.389778000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3135,"referenced_errors":[{"created":"@1645049640.389777000","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}):

I don't see any logging in the server process, not even logs about starting up. (That is the same in both 2.12 and 2.13, but it makes it harder for me to diagnose where the problem might be.)

The truly odd part is that this feels like an error in the gRPC library itself somehow, given the error message. But I haven't changed gRPC versions, just gruf versions - and when I try upgrading gRPC as well, to 1.43.1, I seem to see the same behavior. I saw notes in previous Gruf releases about gRPC incompatibility, and figured maybe it was related, but thus far, no dice.

How to Reproduce

I don't know quite how to describe this for reproduction in another application - but in my application, it's very simple:

  • After spinning up the gRPC server with bundle exec gruf, attempt to make any gRPC request via Gruf::Client.new(<options>).call
  • Observe that the connection never completes and eventually times out with the error above.
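
Since the server process logs nothing, one way to narrow this down is to check whether anything is accepting TCP connections on the bound port at all. The following is a minimal sketch using only Ruby's standard library. The helper name `port_open?` is hypothetical, and the host `127.0.0.1` with port `50051` are assumptions taken from the `server_binding_url` quoted later in this thread.

```ruby
require 'socket'

# Hypothetical helper: returns true if something accepts a TCP connection
# on host:port within the timeout, false otherwise.
def port_open?(host, port, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  # Connection refused, timed out, or the host could not be resolved.
  false
end

# Assumed address; adjust to match your server_binding_url.
puts "listening: #{port_open?('127.0.0.1', 50_051)}" if __FILE__ == $PROGRAM_NAME
```

A `false` result suggests the server never bound the port, which would point at server startup rather than at the client or its channel configuration. A `true` result only shows that the socket is open, not that gRPC requests are being serviced.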

What should happen?

The server should accept connections from the calling client.

Anything else we should know?

  • gRPC version - 1.41.0
  • Ruby version - 3.0.2
  • OS - OSX 11.6.2, M1 MBP
@splittingred
Member

splittingred commented Feb 16, 2022

@tmtrademarked Hm, I wonder if we have an issue with the server start delegation that we pushed to grpc in ef139b7 - I'll take a look at this this week to figure out what's going on.

Strange that our unit + e2e tests and the gruf-demo app do not have this issue. Is there anything unique you're doing in your client/server instantiation? (Channel args, deadlines, interceptors, etc.)

@tmtrademarked
Author

We do have some interceptors configured, but they are nothing particularly special - just a logging interceptor, and two for error handling. From what I can see, no code in these interceptors ever gets invoked, though. And when I try running gruf with --suppress-default-interceptors, and all my interceptors removed, I still see the same behavior.

Our config for the server is pretty basic:

require 'gruf'

Gruf.configure do |c|
  c.server_binding_url = '0.0.0.0:50051'

  c.logger = Rails.logger
  c.grpc_logger = Rails.logger
end

I don't think we're doing anything special with channels or deadlines anywhere, but I'll keep digging and see if anything pops up. I confess that I'm pretty surprised by this too, because this release has been out for a while, and I would have expected something this bad to have been encountered before if it was at all common.

@splittingred
Member

@tmtrademarked Actually, taking a look at this - I think there's an issue where server.wait_till_running is not behaving as expected here. I'll have a PR up shortly to address the issue.

@tmtrademarked
Author

Awesome - thanks @splittingred! To test the hypothesis, I added some binding.pry statements to lib/gruf/server.rb in my local copy of the gem to see what happens. If I put breakpoints in the new thread call, the server does start serving - so maybe there's a race condition here in the server code?

@splittingred
Member

@tmtrademarked I think so. I think the thread we use to start the server and update the Process title isn't letting the underlying started variable get set properly, so the wait_till_running check never proceeds and the server stays stuck in starting mode.
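
The general shape of this kind of startup handshake can be sketched with Ruby's standard library. This is an illustration of the pattern only, not gruf's or grpc's actual code: `ToyServer` and its internals are hypothetical, and only the method name `wait_till_running` is borrowed from the discussion above. The point is that the waiting side proceeds only once the thread running the server has flipped the state, so if that never happens the caller stays stuck waiting or times out.

```ruby
# Hypothetical stand-in for a server; not gruf's or grpc's implementation.
class ToyServer
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @state = :not_started
  end

  # Marks the server as running, then blocks the calling thread until stop.
  def run
    @mutex.synchronize do
      @state = :running
      @cond.broadcast # wake any thread blocked in wait_till_running
      @cond.wait(@mutex) until @state == :stopped
    end
  end

  def stop
    @mutex.synchronize do
      @state = :stopped
      @cond.broadcast
    end
  end

  # Returns true once the server is running, false if the timeout elapses first.
  def wait_till_running(timeout = 1.0)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      until @state == :running
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return false if remaining <= 0

        @cond.wait(@mutex, remaining)
      end
      true
    end
  end
end

# A server whose run method is never called never reaches :running.
never_started = ToyServer.new.wait_till_running(0.05)

# A server run in a background thread reaches :running and releases the waiter.
server = ToyServer.new
runner = Thread.new { server.run }
reached_running = server.wait_till_running(1.0)
server.stop
runner.join
```

Guarding the state with a mutex and condition variable means the waiter cannot miss the transition, whichever thread gets scheduled first. A plain flag that is set or checked at the wrong moment relative to the server thread is the kind of ordering problem that a breakpoint can accidentally mask.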

@splittingred
Member

@tmtrademarked I've opened #148 to address the issue.

splittingred added a commit that referenced this issue Feb 16, 2022
… to bind connections and never reach serving state (fixes #147)
@splittingred splittingred self-assigned this Feb 17, 2022