fix: Multipart boundary predicate in URLSessionClient
#491
Only looking for "---" is insufficient to determine the multipart boundary, as the string can occur within JSON objects. As specified in https://datatracker.ietf.org/doc/html/rfc2046#section-5.1, the delimiter should be _on a separate line_, which means we can recognize it by the newline control characters that won't occur inside JSON objects.
@asmundg: Thank you for submitting a pull request! Before we can merge it, you'll need to sign the Apollo Contributor License Agreement here: https://contribute.apollographql.com/
@asmundg - at first pass this looks correct and I'm surprised we didn't have it already. It sounds like you encountered this in a response, is that correct? Can you please add a test case so we can catch any future regressions too?
Is your boundary being set as |
You're right, that comment doesn't make a lot of sense out of context. I expanded a bit on the description to point out the general problem and how it gets worse due to the commonly used delimiter. I did indeed hit this in the wild, with a response object containing a row of dashes. I'll see if I can add a test.
I began adding a few test cases yesterday and the proposed fix alone isn't enough to resolve the issue of |
Yep, this got hairier the more I looked into it. I added a test case that illustrates the problem (if you remove the changes in URLSessionClient, it shows how you end up with half a JSON object). The test required some refactoring and makes the MockURLProtocol timing-dependent, which I'm not happy about at all.
I should probably also mention that I'm using my own multipart response parsing interceptor, due to some technical requirements. So I haven't looked at behavior further up the stack than getting complete parts out of the session client.
Hm, I wonder if it might make more sense to test urlSession didReceive directly here. That should provide more precise control and support testing the various edge cases.
@calvincestari all right, I think the tests are in much better shape now. As you point out, the downstream parsers will still break for the problematic payload, but the session client is at least emitting complete parts.
The RFC doesn't strictly require CRLF to follow the boundary (although the relay reference implementation makes that assumption).
Thanks @asmundg. I started cleaning up the previous tests yesterday, but I'll review these updates today and go from there. I'll have to check whether the downstream parsers must be fixed too before merging. We can't leave things in a possibly inconsistent state for others who rely on a working chain at the moment. You're the first to report an issue with the delimiter though.
@asmundg I've updated this PR with some refactoring to the boundary indexing and cleaned up the tests. We're able to use the existing test infrastructure so no need to create fake data tasks, which use deprecated initializers anyways. I've also changed the base branch so I can ensure that the downstream parsers change as needed before merging all changes together into |
Thanks! I think we're not covering the edge case where you get an incomplete part, with the boundary string contained in the JSON object, e.g. ' \r\n{"data": "foo--"}'. Without a real boundary delimiter present, URLSessionClient will find the last instance of the boundary string, which is now inside the JSON object, and return a split part.
I'll add a test for that but I think the current changes will work. I expect this to fail parsing though because without the ending boundary there is no indication the chunk is complete.
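The edge case raised in the two comments above can be sketched in Swift as follows. This is only an illustration under stated assumptions: `prefixBeforeLast(_:in:)` is a hypothetical helper, the boundary string is assumed to be `--`, and none of this is the actual `URLSessionClient` code. It shows that a backwards search for the bare boundary string splits the JSON object, while a search that requires a preceding CRLF finds nothing in an incomplete chunk.

```swift
import Foundation

/// Hypothetical helper (not part of Apollo iOS): returns the bytes that
/// precede the LAST occurrence of `marker` in `buffer`, or nil if the
/// marker does not occur at all.
func prefixBeforeLast(_ marker: Data, in buffer: Data) -> Data? {
    guard let range = buffer.range(of: marker, options: .backwards) else {
        return nil
    }
    return buffer.subdata(in: buffer.startIndex..<range.lowerBound)
}

// Incomplete chunk: no real delimiter yet, but the JSON value ends in "--".
let buffer = Data("\r\n{\"data\": \"foo--\"}".utf8)

// Assumed boundary string "--", searched for on its own.
let naive = prefixBeforeLast(Data("--".utf8), in: buffer)

// Same boundary string, but required to start a new line (CRLF prefix).
let anchored = prefixBeforeLast(Data("\r\n--".utf8), in: buffer)

if let naive = naive {
    // The bare search cuts the JSON object in half.
    print("naive split:", String(decoding: naive, as: UTF8.self).debugDescription)
}
// The anchored search reports no delimiter, so the caller can keep buffering.
print("anchored found delimiter:", anchored != nil)
```

With the anchored search the caller has no complete part to emit for this chunk, which matches the expectation in the reply above that parsing cannot succeed until an ending boundary arrives.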
@asmundg - here is the missing test, and as expected the result is Once the tests pass I'll merge this into the base branch ( |
Thanks for following up here. I have a functioning workaround, so no immediate rush to getting the fix complete. 🙂 |
Merged commit dcf1c3b into apollographql:fix/multipart-delimiter-boundary-parsing
Only looking for the boundary delimiter by itself is insufficient to determine the multipart boundary, as the string can occur within JSON objects, especially since "---" is in common use (see https://github.com/ChilliCream/graphql-platform/blob/d39b40be047fab666a75f302056531498cb792a7/src/HotChocolate/Core/src/Execution/Serialization/MultiPartResultFormatter.cs#L202C1-L202C68, https://github.com/graphql/graphql-over-http/blob/main/rfcs/IncrementalDelivery.md#content-type-multipartmixed).
As specified in https://datatracker.ietf.org/doc/html/rfc2046#section-5.1, the delimiter should be on a separate line, which means we can recognize it by the newline control characters that won't occur inside JSON objects.
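A minimal Swift sketch of the difference between the two predicates described above. The `offsets(of:in:)` helper, the sample payload, and the `---` delimiter value are assumptions chosen for illustration; this is not the code in `URLSessionClient`.

```swift
import Foundation

/// Hypothetical helper: byte offsets at which `needle` starts in `haystack`.
func offsets(of needle: Data, in haystack: Data) -> [Int] {
    var result: [Int] = []
    var searchStart = haystack.startIndex
    while searchStart < haystack.endIndex,
          let found = haystack.range(of: needle, in: searchStart..<haystack.endIndex) {
        result.append(found.lowerBound - haystack.startIndex)
        // Continue one byte past the start of the previous match.
        searchStart = found.lowerBound + 1
    }
    return result
}

// Assumed delimiter "---", with and without the CRLF that puts it on its own line.
let bareDelimiter = Data("---".utf8)
let lineAnchoredDelimiter = Data("\r\n---".utf8)

// Sample multipart body whose JSON value contains a row of dashes.
let chunk = Data((
    "\r\n---\r\n" +
    "content-type: application/json\r\n\r\n" +
    "{\"data\": \"a---b\"}" +
    "\r\n---"
).utf8)

// The bare search also matches the dashes inside the JSON string.
let bareMatches = offsets(of: bareDelimiter, in: chunk)
// The line-anchored search matches only the two real delimiters.
let anchoredMatches = offsets(of: lineAnchoredDelimiter, in: chunk)

print("bare matches:", bareMatches.count)
print("line-anchored matches:", anchoredMatches.count)
```

Searching raw `Data` rather than `String` keeps the comparison byte-exact, which sidesteps Swift treating `"\r\n"` as a single `Character`.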