Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Multipart boundary predicate in URLSessionClient #491

Conversation

asmundg
Copy link

@asmundg asmundg commented Sep 25, 2024

Only looking for the boundary delimiter by itself is insufficient to determine the multipart boundary, as the string can occur within JSON objects, especially since "---" is in common use (see https://github.com/ChilliCream/graphql-platform/blob/d39b40be047fab666a75f302056531498cb792a7/src/HotChocolate/Core/src/Execution/Serialization/MultiPartResultFormatter.cs#L202C1-L202C68, https://github.com/graphql/graphql-over-http/blob/main/rfcs/IncrementalDelivery.md#content-type-multipartmixed).

As specified in https://datatracker.ietf.org/doc/html/rfc2046#section-5.1, the delimiter should be on a separate line, which means we can recognize it by the newline control characters that won't occur inside JSON objects.

Only looking for "---" is insufficient to determine the multipart boundary, as the string can occur within JSON objects.

As specified in https://datatracker.ietf.org/doc/html/rfc2046#section-5.1, the delimiter should be _on a separate line_, which means we can recognize it by the newline control characters that won't occur inside JSON objects.
@apollo-cla
Copy link

@asmundg: Thank you for submitting a pull request! Before we can merge it, you'll need to sign the Apollo Contributor License Agreement here: https://contribute.apollographql.com/

Copy link

netlify bot commented Sep 25, 2024

👷 Deploy request for apollo-ios-docc pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 24c97eb

Copy link

netlify bot commented Sep 25, 2024

👷 Deploy request for eclectic-pie-88a2ba pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 24c97eb

@calvincestari
Copy link
Member

@asmundg - at first pass this looks correct and I'm surprised we didn't have it already. It sounds like you encountered this in a response, is that correct? Can you add a test case so we can catch any future regressions too please.

@calvincestari
Copy link
Member

calvincestari commented Sep 25, 2024

Only looking for "---" is insufficient to determine the multipart boundary, as the string can occur within JSON objects.

Is your boundary being set as -? Which would create --- as the boundary delimiter.

@asmundg
Copy link
Author

asmundg commented Sep 26, 2024

Only looking for "---" is insufficient to determine the multipart boundary, as the string can occur within JSON objects.

Is your boundary being set as -? Which would create --- as the boundary delimiter.

You're right, that comment doesn't make a lot of sense out of context. I expanded a bit on the description to point out the general problem and how it gets worse due to the commonly used delimiter. I did indeed hit this in the wild, with a response object containing a row of dashes.

I'll see if I can add a test.

@calvincestari
Copy link
Member

I began adding a few test cases yesterday and the proposed fix alone isn't enough to resolve the issue of --- (or any hyphen-hyphen-boundary combination) being in the message data. There is another bug downstream in the parser sequence due to how the chunk handling is split up: URLSessionClient only tries to ensure it has a complete block of one or more chunks to pass on, leading any remaining data in the buffer; from there the individual specification parser needs to handle more than a single chunk being in the data it was given.

@asmundg
Copy link
Author

asmundg commented Sep 26, 2024

I began adding a few test cases yesterday and the proposed fix alone isn't enough to resolve the issue of --- (or any hyphen-hyphen-boundary combination) being in the message data. There is another bug downstream in the parser sequence due to how the chunk handling is split up: URLSessionClient only tries to ensure it has a complete block of one or more chunks to pass on, leading any remaining data in the buffer; from there the individual specification parser needs to handle more than a single chunk being in the data it was given.

Yep, this got hairier the more I looked into it. I added a test case that illustrates the problem (if you remove the changes in URLSessionClient, it illustrates how you end up with half a JSON object). The test required some refactoring and makes the MockURLProtocol timing dependent. Which I'm not happy about at all.

@asmundg
Copy link
Author

asmundg commented Sep 26, 2024

I should probably also mention that I'm using my own multipart response parsing interceptor, due to some technical requirements. So I haven't looked at behavior further up the stack than getting complete parts out of the session client.

@asmundg
Copy link
Author

asmundg commented Sep 27, 2024

Hm, I wonder if it might make more sense to test urlSession didReceive directly here. That should provide more precise control and support testing the various edge cases.

@asmundg
Copy link
Author

asmundg commented Sep 27, 2024

@calvincestari all right, I think the tests are in a much better shape now. As you point out, the downstream parsers will still break for the problematic payload, but the session client is at least emitting complete part.

The RFC doesn't strictly require crlf to follow the boundary (although
the relay reference implementation makes that assumption).
@calvincestari
Copy link
Member

Thanks @asmundg. I started cleaning up the previous tests yesterday but I'll review these updates today and go from there.

I'll have to check if the downstream parsers must be fixed too before merging. We can't leave things may be in an inconsistent state for others that rely on a working chain at the moment.

You're the first to report an issue with the delimiter though.

@calvincestari calvincestari changed the base branch from main to fix/multipart-delimiter-boundary-parsing October 1, 2024 22:46
@calvincestari
Copy link
Member

@asmundg I've updated this PR with some refactoring to the boundary indexing and cleaned up the tests. We're able to use the existing test infrastructure so no need to create fake data tasks, which use deprecated initializers anyways.

I've also changed the base branch so I can ensure that the downstream parsers change as needed before merging all changes together into main.

@calvincestari calvincestari changed the title feat: Fix multipart boundary predicate fix: Multipart boundary predicate Oct 1, 2024
@calvincestari calvincestari changed the title fix: Multipart boundary predicate fix: Multipart boundary predicate in URLSessionClient Oct 1, 2024
@asmundg
Copy link
Author

asmundg commented Oct 2, 2024

Thanks! I think we're not covering the edge case where you get an incomplete part, with the boundary string contained in the JSON object, e.g. '

\r\n{"data": "foo--"}'. Without a real boundary delimiter present, URLSessionClient will find the last instance of the boundary string, which is now inside the JSON object, and return a split part.

@calvincestari
Copy link
Member

I'll add a test for that but I think the current changes will work. I expect this to fail parsing though because without the ending boundary there is no indication the chunk is complete.

@calvincestari
Copy link
Member

@asmundg - here is the missing test, and as expected the result is .failure.

Once the tests pass I'll merge this into the base branch (fix/multipart-delimiter-boundary-parsing). If you need this fix immediately in your project you can target that branch while I do any work remaining on the downstream parsers. Once that work is completed it'll all get merged into main.

@asmundg
Copy link
Author

asmundg commented Oct 2, 2024

Thanks for following up here. I have a functioning workaround, so no immediate rush to getting the fix complete. 🙂

@calvincestari calvincestari merged commit dcf1c3b into apollographql:fix/multipart-delimiter-boundary-parsing Oct 2, 2024
26 checks passed
@calvincestari
Copy link
Member

calvincestari commented Oct 4, 2024

@asmundg - PR #502 is the complete fix (URLSessionClient + MultipartResponseParsingInterceptor + specification parsers (defer & subscriptions). I may add one more set of tests tomorrow morning and then put it up for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants