
Fix 413 response handling and re-enable related couch replicator test #1234

Merged
2 commits merged on Mar 26, 2018

Conversation

@nickva (Contributor) commented Mar 23, 2018

Previously, when the server decided too much data was sent in the client's
request, it would immediately send a 413 response and close the socket. In the
meantime there could be unread data on the socket, since the client keeps
streaming data. When this happens, the connection is reset instead of going
through the regular close sequence. The client, specifically the replicator
client, detected the reset before it had a chance to process the 413 response.
This led to a retry, since it was interpreted as a generic network error,
instead of a proper 413 HTTP error.
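
For illustration only, here is a small self-contained Erlang toy (not CouchDB code; the port, sizes, and timings are arbitrary) that tends to reproduce the race: the server replies and closes while unread body data is still queued, so the client usually sees a reset or close error instead of the 413. Exact errors vary by OS and timing.

```erlang
%% Toy reproduction of the race described above. Not CouchDB code.
-module(reset_demo).
-export([run/0]).

run() ->
    {ok, L} = gen_tcp:listen(8135, [binary, {active, false}, {reuseaddr, true}]),
    Server = spawn(fun() -> server(L) end),
    {ok, C} = gen_tcp:connect("127.0.0.1", 8135, [binary, {active, false}]),
    %% Keep streaming a large body that the server never reads.
    Chunk = binary:copy(<<"x">>, 1 bsl 20),
    stream(C, Chunk, 20),
    %% Often {error, econnreset} or {error, closed} rather than the 413 bytes.
    io:format("client recv: ~p~n", [gen_tcp:recv(C, 0, 1000)]),
    gen_tcp:close(C),
    exit(Server, kill).

server(L) ->
    {ok, S} = gen_tcp:accept(L),
    timer:sleep(200),  % "decide" the request is too large without reading it
    gen_tcp:send(S, <<"HTTP/1.1 413 Request Entity Too Large\r\n\r\n">>),
    gen_tcp:close(S).  % unread data in the receive buffer => TCP reset

stream(_Sock, _Chunk, 0) -> ok;
stream(Sock, Chunk, N) ->
    case gen_tcp:send(Sock, Chunk) of
        ok -> stream(Sock, Chunk, N - 1);
        {error, _} -> ok
    end.
```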

The improvement is to flush the receive socket before and after sending a 413
response, then close the connection. This reduces the chance of the socket
being closed with unread data, avoids a TCP reset, and gives the client a
better chance of parsing the 413 response. This is mostly geared to work with
the replicator client but should help other clients as well.
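
A minimal sketch of the drain-then-respond idea, using plain gen_tcp calls rather than the actual mochiweb/chttpd code touched here; the module, function, and timeout names are made up for illustration:

```erlang
%% Illustrative sketch only, not the real server code: read and discard
%% pending request data, send the 413, drain once more, then close.
-module(flush_413_sketch).
-export([reply_413_and_close/1]).

-define(DRAIN_TIMEOUT, 500).  % ms spent soaking up leftover request data

reply_413_and_close(Socket) ->
    ok = inet:setopts(Socket, [{active, false}, binary]),
    drain(Socket),
    gen_tcp:send(Socket, <<"HTTP/1.1 413 Request Entity Too Large\r\n"
                           "Connection: close\r\n"
                           "Content-Length: 0\r\n\r\n">>),
    drain(Socket),
    gen_tcp:close(Socket).

%% Keep reading until the client pauses or the socket errors out.
drain(Socket) ->
    case gen_tcp:recv(Socket, 0, ?DRAIN_TIMEOUT) of
        {ok, _Data} -> drain(Socket);
        {error, _} -> ok
    end.
```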

Also, the connection on both the server and the client side is closed after a
413 event. This avoids a few race conditions where it is not clear how much
data is on the socket after the 413 is processed. On the server side, the
`close` response header is set and the socket is closed. On the client side, a
flag is set so that, right before the worker is released back to the pool, it
is stopped, which closes the socket.
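
The client-side idea can be sketched roughly as below; the record fields and the worker_pool calls are hypothetical stand-ins, not the real couch_replicator modules or pool API:

```erlang
%% Hypothetical sketch of the client-side handling; not the real
%% couch_replicator code or pool API.
-module(client_413_sketch).
-export([note_response/2, release_worker/1]).

-record(state, {worker, pool, stop_on_release = false}).

%% Remember that this request ended in a 413.
note_response(413, State) ->
    State#state{stop_on_release = true};
note_response(_Code, State) ->
    State.

%% Right before handing the worker back, stop it instead if the flag is
%% set; stopping the worker closes its socket.
release_worker(#state{stop_on_release = true, worker = W}) ->
    worker_pool:stop_worker(W);     % hypothetical pool call
release_worker(#state{worker = W, pool = Pool}) ->
    worker_pool:release(Pool, W).   % hypothetical pool call
```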

Also re-enable the previously disabled `replicate_one_with_attachment` test.

To test, run this for a while:

`make soak-eunit apps=couch_replicator suites=couch_replicator_small_max_request_size_target`

- Code is written and works correctly
- Changes are covered by tests
- Documentation reflects the changes

@rnewson (Member) commented Mar 23, 2018 via email

@nickva (Contributor, Author) commented Mar 24, 2018

I ran that test for 6 hours with no failures yet.

@janl (Member) commented Mar 26, 2018

+1

Good explanation, thanks!

@janl merged commit e7c48b3 into apache:master on Mar 26, 2018
@nickva deleted the issue-1211 branch on August 3, 2018