
Fix 413 response handling and re-enable related couch replicator test #1234

Merged
2 commits merged on Mar 26, 2018

Conversation

@nickva (Contributor) commented Mar 23, 2018

Previously, when the server decided too much data was sent in the client's
request, it would immediately send a 413 response and close the socket. In the
meantime there could be unread data on the socket, since the client keeps
streaming data. When this happens, the connection is reset instead of going
through the regular close sequence. The client, specifically the replicator
client, detected the reset before it had a chance to process the 413 response.
This led to a retry, since it was interpreted as a generic network error,
instead of a proper 413 HTTP error.
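
For illustration only, here is a small self-contained Erlang toy (not CouchDB code; the port, sizes, and timings are arbitrary) that tends to reproduce the race: the server replies and closes while unread body data is still queued, so the client usually sees a reset or close error instead of the 413. Exact errors vary by OS and timing.

```erlang
%% Toy reproduction of the race described above. Not CouchDB code.
-module(reset_demo).
-export([run/0]).

run() ->
    {ok, L} = gen_tcp:listen(8135, [binary, {active, false}, {reuseaddr, true}]),
    Server = spawn(fun() -> server(L) end),
    {ok, C} = gen_tcp:connect("127.0.0.1", 8135, [binary, {active, false}]),
    %% Keep streaming a large body that the server never reads.
    Chunk = binary:copy(<<"x">>, 1 bsl 20),
    stream(C, Chunk, 20),
    %% Often {error, econnreset} or {error, closed} rather than the 413 bytes.
    io:format("client recv: ~p~n", [gen_tcp:recv(C, 0, 1000)]),
    gen_tcp:close(C),
    exit(Server, kill).

server(L) ->
    {ok, S} = gen_tcp:accept(L),
    timer:sleep(200),  % "decide" the request is too large without reading it
    gen_tcp:send(S, <<"HTTP/1.1 413 Request Entity Too Large\r\n\r\n">>),
    gen_tcp:close(S).  % unread data in the receive buffer => TCP reset

stream(_Sock, _Chunk, 0) -> ok;
stream(Sock, Chunk, N) ->
    case gen_tcp:send(Sock, Chunk) of
        ok -> stream(Sock, Chunk, N - 1);
        {error, _} -> ok
    end.
```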

The improvement is to flush the receive socket before and after sending a 413
response, then close the connection. This reduces the chance of the socket
being closed with unread data, avoids a TCP reset, and gives the client a
better chance of parsing the 413 response. This is mostly geared to work with
the replicator client but should help other clients as well.
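
A minimal sketch of the drain-then-respond idea, using plain gen_tcp calls rather than the actual mochiweb/chttpd code touched here; the module, function, and timeout names are made up for illustration:

```erlang
%% Illustrative sketch only, not the real server code: read and discard
%% pending request data, send the 413, drain once more, then close.
-module(flush_413_sketch).
-export([reply_413_and_close/1]).

-define(DRAIN_TIMEOUT, 500).  % ms spent soaking up leftover request data

reply_413_and_close(Socket) ->
    ok = inet:setopts(Socket, [{active, false}, binary]),
    drain(Socket),
    gen_tcp:send(Socket, <<"HTTP/1.1 413 Request Entity Too Large\r\n"
                           "Connection: close\r\n"
                           "Content-Length: 0\r\n\r\n">>),
    drain(Socket),
    gen_tcp:close(Socket).

%% Keep reading until the client pauses or the socket errors out.
drain(Socket) ->
    case gen_tcp:recv(Socket, 0, ?DRAIN_TIMEOUT) of
        {ok, _Data} -> drain(Socket);
        {error, _} -> ok
    end.
```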

Also, the connection on both the server and the client side is closed after a
413 event. This avoids a few race conditions where it is not clear how much
data is on the socket after the 413 is processed. On the server side, the
`close` response header is set and the socket is closed. On the client side, a
flag is set so that, right before the worker is released back to the pool, it
is stopped, which closes the socket.
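
The client-side idea can be sketched roughly as below; the record fields and the worker_pool calls are hypothetical stand-ins, not the real couch_replicator modules or pool API:

```erlang
%% Hypothetical sketch of the client-side handling; not the real
%% couch_replicator code or pool API.
-module(client_413_sketch).
-export([note_response/2, release_worker/1]).

-record(state, {worker, pool, stop_on_release = false}).

%% Remember that this request ended in a 413.
note_response(413, State) ->
    State#state{stop_on_release = true};
note_response(_Code, State) ->
    State.

%% Right before handing the worker back, stop it instead if the flag is
%% set; stopping the worker closes its socket.
release_worker(#state{stop_on_release = true, worker = W}) ->
    worker_pool:stop_worker(W);     % hypothetical pool call
release_worker(#state{worker = W, pool = Pool}) ->
    worker_pool:release(Pool, W).   % hypothetical pool call
```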

Also re-enable the previously disabled `replicate_one_with_attachment` test.

To test, run this for a while:

`make soak-eunit apps=couch_replicator suites=couch_replicator_small_max_request_size_target`

- Code is written and works correctly
- Changes are covered by tests
- Documentation reflects the changes

@rnewson (Member) commented Mar 23, 2018 via email

@nickva (Contributor, Author) commented Mar 24, 2018

I ran that test for 6 hours with no failures yet.

@janl (Member) commented Mar 26, 2018

+1

Good explanation, thanks!

@janl merged commit e7c48b3 into apache:master on Mar 26, 2018
@nickva deleted the issue-1211 branch on August 3, 2018