Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V2: Avoid the prototype chain with editions if possible #897

Merged
merged 5 commits into from
Jun 21, 2024

Conversation

timostamm
Copy link
Member

In proto3, serialization skips scalar and enum zero values. This known as "implicit presence". In proto2, and in edition 2023 with default features, fields use explicit presence. In our generated code, messages are plain objects for implicit presence, and use the prototype chain to track explicit presence.

Since there are disadvantages to the prototype chain (some libraries expect only plain objects, without a custom prototype), we want to avoid it if possible.

Currently, converting a proto3 file to editions changes the behavior, adding a prototype to generated messages.

This PR adds a little bit of logic to avoid the prototype chain for editions, if a message uses only implicit presence.

@@ -197,7 +199,7 @@ const messagePrototypes = new WeakMap<
*/
function createZeroMessage(desc: DescMessage): Message {
let msg: Record<string, unknown>;
if (desc.file.edition == EDITION_PROTO3) {
if (!needsPrototypeChain(desc)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we only avoided the prototype chain for proto3.

Comment on lines +265 to +284
/**
* Do we need the prototype chain to track field presence?
*/
function needsPrototypeChain(desc: DescMessage): boolean {
switch (desc.file.edition) {
case EDITION_PROTO3:
// proto3 always uses implicit presence, we never need the prototype chain.
return false;
case EDITION_PROTO2:
// proto2 never uses implicit presence, we always need the prototype chain.
return true;
default:
// If a message uses scalar or enum fields with explicit presence, we need
// the prototype chain to track presence. This rule does not apply to fields
// in a oneof group - they use a different mechanism to track presence.
return desc.fields.some(
(f) => f.presence != IMPLICIT && f.fieldKind != "message" && !f.oneof,
);
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we avoid the prototype chain for messages that only use implicit presence. Message fields, oneof, lists, and maps never need the prototype chain.

We're keeping the cases for proto2 and proto3 because this is a hot path.

Comment on lines +525 to +532
describe("from edition2023 with proto3 features", () => {
const desc = test_messages_proto3_editions_ts.TestAllTypesProto3Schema;
test("without custom prototype", () => {
const msg = create(desc);
const hasCustomPrototype =
Object.getPrototypeOf(msg) !== Object.prototype;
expect(hasCustomPrototype).toBe(false);
});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using editions/golden/test_messages_proto3_editions.proto to test that an editions file with file option features.field_presence = IMPLICIT does not use the prototype chain.

@timostamm timostamm changed the title Avoid the prototype chain with editions if possible V2: Avoid the prototype chain with editions if possible Jun 20, 2024
Copy link
Member

@srikrsna-buf srikrsna-buf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments in create still refer to proto3 as the deciding factor, apart from that LGTM.

// the prototype chain to track presence. This rule does not apply to fields
// in a oneof group - they use a different mechanism to track presence.
return desc.fields.some(
(f) => f.presence != IMPLICIT && f.fieldKind != "message" && !f.oneof,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do for repeated fields? I'm guessing repeated fields return "IMPLICIT" for presence, right? (Pretty sure that's what happens in other runtimes, like C++, Java, Go.) So if a field had only repeated fields, message fields, and oneof fields, it would return "no need for prototype chain", even when the default field presence for the file/message is explicit, right? If so, might we want to apply this test for "proto2" also, just so that we can further minimize the cases where a prototype is used? On the other hand, will it be surprising to users that use editions that sometimes messages use a prototype and sometimes not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, list (repeated) and map fields report presence IMPLICIT.

Copy link
Member

@jhump jhump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is not a behavior change from when the prototype was used, right? IIUC, the message constructor populates the object with all zero/default vals (except nested messages), so the field values are the same, even when unset and even without the prototype, right?

If so, LGTM. I left one remark about possibly being able to have no prototype chain for proto2 messages, but also with a bit of concern about predictability of when a message has a prototype chain and if that will be a confusion/sore spot for users.

@timostamm
Copy link
Member Author

I assume this is not a behavior change from when the prototype was used, right?

It is not. The only difference is that the object does not have an extra (empty) prototype in its chain.

Predictability is a concern, especially if the schema isn't authored by the user of the generated code. For example, if a field is added that requires the prototype chain, code can fail at runtime (if a 3rd party library is involved that requires plain objects).

I've considered a custom feature for explicit control, or a plugin option like prototype_chain=error so that users can have confidence in a build-time check. It might also be more straight-forward to look for a file option features.field_presence = IMPLICIT, instead of checking individual fields like it's done in this PR.

We'll be getting a better understanding for the DX by updating our examples and writing docs, and can amend if necessary.

@timostamm timostamm merged commit 3a7393b into v2 Jun 21, 2024
6 checks passed
@timostamm timostamm deleted the tstamm/avoid-prototype branch June 21, 2024 16:18
@timostamm timostamm mentioned this pull request Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants