Support proto2 default values and field presence #716

Merged on Feb 19, 2024 (12 commits)

Member

@timostamm timostamm commented Feb 14, 2024

This PR adds support for default values in proto2. For example, consider this simple Protobuf definition:

syntax="proto2";

message Example {
  optional int32 int32_field = 1;
  optional string string_field = 2 [default = "hello"];
  required bool bool_field = 3;
}

Note that string_field defines a default value, which so far has not been available in generated code (except through reflection).

If you have previously used a "required" field like bool_field in the example above, you might also have wondered why we generate an optional property boolField?: boolean.

This PR changes the generated code as follows:

export class Example extends Message<Example> {
  /**
   * @generated from field: optional int32 int32_field = 1;
   */
- int32Field?: number;
+ declare int32Field: number;

  /**
   * @generated from field: optional string string_field = 2 [default = "hello"];
   */
- stringField?: string;
+ declare stringField: string;

  /**
   * @generated from field: required bool bool_field = 3;
   */
- boolField?: boolean;
+ declare boolField: boolean;
  // ...
}

+ Example.prototype.int32Field = 0;
+ Example.prototype.stringField = "hello";
+ Example.prototype.boolField = false;

Default values

If you access string_field from a fresh message, the property will have the default value now. If no default value is specified, the property is initialized with the zero-value for the field type - an empty string in this case, false for a boolean field, and so on:

const e = new Example();
e.int32Field // 0
e.stringField // "hello"
e.boolField // false
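
As a rough illustration of the zero-value rule, the sketch below maps a few scalar kinds to their initial values. The `ScalarKind` type and the functions `zeroValueSketch` and `initialValueSketch` are hypothetical names made up for this example, not the library's API, and 64-bit integer types are left out to keep it short.

```typescript
// Hypothetical subset of Protobuf scalar kinds, for illustration only.
type ScalarKind = "int32" | "double" | "bool" | "string" | "bytes";

// Returns the zero-value used when a field declares no explicit default.
function zeroValueSketch(kind: ScalarKind): number | boolean | string | Uint8Array {
  switch (kind) {
    case "int32":
    case "double":
      return 0;
    case "bool":
      return false;
    case "string":
      return "";
    case "bytes":
      return new Uint8Array(0);
  }
}

// An explicit default (as in `[default = "hello"]`) takes precedence over the zero-value.
function initialValueSketch<T>(explicitDefault: T | undefined, zero: T): T {
  return explicitDefault !== undefined ? explicitDefault : zero;
}
```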

Field presence

The reason proto2 required fields were previously typed as optional in generated code is field presence. Proto2 fields use explicit presence tracking, which means we need to distinguish between the initial value and a value that is explicitly set. Many Protobuf implementations use getters and setters for this purpose. We use the prototype chain instead.
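
To make the prototype-chain idea concrete, here is a minimal sketch in plain TypeScript. It assumes nothing beyond standard JavaScript semantics; `ExampleSketch`, `newExampleSketch`, `isPresent`, and `clearPresent` are hypothetical stand-ins and not the generated code or the runtime's actual implementation.

```typescript
// Hypothetical stand-in for a generated message; the real generated code differs.
interface ExampleSketch {
  int32Field: number;
  stringField: string;
}

// Initial values live on one shared prototype object.
const exampleProto: ExampleSketch = { int32Field: 0, stringField: "hello" };

function newExampleSketch(): ExampleSketch {
  // A fresh instance has no own properties, so reads fall through to the prototype.
  return Object.create(exampleProto) as ExampleSketch;
}

// A field counts as present only if the instance itself owns the property.
function isPresent(target: object, name: string): boolean {
  return Object.prototype.hasOwnProperty.call(target, name);
}

// Deleting the own property makes the prototype's initial value visible again.
function clearPresent(target: object, name: string): void {
  Reflect.deleteProperty(target, name);
}

const msg = newExampleSketch();
const presentBefore = isPresent(msg, "int32Field"); // false
msg.int32Field = 123; // creates an own property on the instance
const presentAfter = isPresent(msg, "int32Field"); // true
clearPresent(msg, "int32Field");
const valueAfterClear = msg.int32Field; // 0, read from the prototype again
```

The appeal of this approach is that property reads stay plain property reads, with no getter or setter involved.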

To determine whether a field is present, we provide the function isFieldSet, and the function clearField to reset a field to its initial value. Here is a practical example:

import { isFieldSet, clearField } from "@bufbuild/protobuf";

const e = new Example();
isFieldSet(e, "int32Field"); // false

e.int32Field = 123; // setting a value makes the field present
isFieldSet(e, "int32Field"); // true

clearField(e, "int32Field"); // reset to initial value
e.int32Field // 0

These new functions provide a unified API to determine field presence for proto2, proto3, and - in the future - editions.

Notes

  • An object spread will not include initial values, but toPlainMessage does. Be careful when passing a plain message to a constructor – any default values in the plain message will become present.
  • Protobuf bytes fields can have default values. You must be careful not to modify this default value, as it would affect every other instance of the message. This only applies to explicitly set default values, not to zero-values, since a Uint8Array with zero length cannot be mutated.
  • The change for property typings only applies to scalar and enum fields, not to message fields. To do the same for message fields, we would need immutable messages, so that instances for default values can be shared. This may or may not be an option going forward, but we think that the change for primitive fields stands on its own.
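
The first two notes follow directly from how inherited properties behave in JavaScript. The sketch below shows both effects on plain objects; the `defaults` object and its field names are hypothetical and only mimic a message with an explicit bytes default.

```typescript
// Hypothetical prototype holding defaults, including an explicit bytes default.
const defaults = { stringField: "hello", bytesField: new Uint8Array([1, 2, 3]) };
type Defaults = typeof defaults;

const first: Defaults = Object.create(defaults);
const second: Defaults = Object.create(defaults);

// Spread copies own enumerable properties only, so inherited defaults are dropped.
const spreadFresh = { ...first };
const spreadKeys = Object.keys(spreadFresh); // []

// Mutating the shared default in place is visible through every instance.
first.bytesField[0] = 99;
const leaked = second.bytesField[0]; // 99

// Assigning a new array instead gives the instance its own value
// and leaves the shared default untouched.
first.bytesField = new Uint8Array([7]);
```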

@timostamm timostamm marked this pull request as ready for review February 14, 2024 18:19
Member

@jhump jhump left a comment

Should there be something in the generated code that can fail-fast if the new generated code is inadvertently combined with the old runtime library?

What about the reverse case? What happens if someone inadvertently upgrades to the new runtime library but forgets to re-generate code? It seems like there needs to be some affordance for this since any breakage would probably be pretty subtle and potentially inexplicable to users.

Protobuf bytes fields can have default values. You must be careful not to modify this default value, as it would affect every other instance of the message.

Could the default value be frozen with Object.seal()? Or perhaps bytes fields could be updated to use a union type that allows Uint8Array but also some read-only type (with similar read interface and a way to convert to Uint8Array)?

The change for property typings only applies to scalar and enum fields, not to message fields.

I think this is probably fine, perhaps even expected. (It is certainly similar to message types in protobuf-go, which are very different from other field types since they are pointers.)

@@ -34,7 +34,10 @@ export function isFieldSet(
      case "scalar":
        if (field.opt || field.req) {
          // explicit presence
-         return target[localName] !== undefined;
+         return (
+           Object.prototype.hasOwnProperty.call(target, localName) &&
Member

Is this done this way (vs. target.hasOwnProperty(localName)) in case somehow the target object has a separate member with that name (like maybe from a Protobuf field named has_own_property)? Or is this just best practice defensive code in JS for using any functionality inherited from Object?

Member Author

It's best practice. ESLint (the most popular linter in the ecosystem) considers a direct call an error.

ES2022 adds the static method Object.hasOwn as a replacement, but we target ES2017 and can't use it yet, and many users are most likely in the same position.
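
For readers wondering what can go wrong with a direct call, the following sketch shows the two classic failure cases on plain objects. The objects are made up for illustration; the relevant ESLint rule is `no-prototype-builtins`.

```typescript
// An object whose own member shadows the inherited method
// (think of a field that ends up being named hasOwnProperty).
const shadowed = { hasOwnProperty: (_name: string) => false, value: 1 };
const direct = shadowed.hasOwnProperty("value"); // false, the shadowing member answered
const viaPrototype = Object.prototype.hasOwnProperty.call(shadowed, "value"); // true

// An object without Object.prototype in its chain has no hasOwnProperty at all.
const bare = Object.create(null);
bare.value = 1;
const bareHasMethod = typeof bare.hasOwnProperty === "function"; // false
const bareOwnsValue = Object.prototype.hasOwnProperty.call(bare, "value"); // true
```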

@@ -67,16 +70,45 @@ export function clearField(
        target[localName] = {};
        break;
      case "enum":
-       target[localName] = implicitPresence ? field.T.values[0].no : undefined;
+       if (implicitPresence) {
+         target[localName] = field.T.values[0].no;
Member

No way to have scalarZeroValue compute this for enums? It would be nice to consolidate this case with the one below given how much repetition there is. (Just a thought; not a blocking comment.)

Member Author

I agree that it would be nice to avoid the repetition here, but it would add enough complication for scalarZeroValue and the ScalarValue type (both parts of the public API), that it wouldn't be a net improvement overall.

@timostamm
Member Author

Should there be something in the generated code that can fail-fast if the new generated code is inadvertently combined with the old runtime library?

What about the reverse case? What happens if someone inadvertently upgrades to the new runtime library but forgets to re-generate code? It seems like there needs to be some affordance for this since any breakage would probably be pretty subtle and potentially inexplicable to users.

That would definitely be nice to have, but it's quite a bit more complex to implement than in Go. A simple function call can throw off tree-shakers, which may either remove the call, or consider it an important side-effect and opt out of tree-shaking for the entire module.

Could the default value be frozen with Object.seal()? Or perhaps bytes fields could be updated to use a union type that allows Uint8Array but also some read-only type (with similar read interface and a way to convert to Uint8Array)?

Object.seal() still allows changing properties (including elements in the array). Object.freeze() would prevent mutation, but it's not supported on typed arrays. We could roll our own, but it's far from trivial because typed arrays provide access to the underlying ArrayBuffer, and we have to be extremely cautious to still pass instanceof Uint8Array. Since typed arrays are built-in types, it's quite easy to run into issues on different runtime implementations and constraints they may put on built-ins. It may be possible to pull it off, but it's a significant piece of work that'll require tests and ongoing maintenance.
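
A short sketch of the two behaviors mentioned here, assuming a spec-compliant engine: sealing a regular array leaves its elements writable, and freezing a typed array that has elements throws a TypeError. The helper `tryFreeze` is a hypothetical name used only in this example.

```typescript
// Object.seal() prevents adding or removing properties, but existing elements stay writable.
const sealedArray = Object.seal([1, 2, 3]);
sealedArray[0] = 9; // allowed

// Object.freeze() on a typed array with elements throws, because the elements
// cannot be made non-writable.
function tryFreeze(bytes: Uint8Array): boolean {
  try {
    Object.freeze(bytes);
    return true;
  } catch (_err) {
    return false;
  }
}

const freezeSucceeded = tryFreeze(new Uint8Array([1, 2, 3])); // false
```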

A union type has the disadvantage that it'll apply to every bytes field, and users will have to inspect and convert to Uint8Array every time they pass the value somewhere. To illustrate:

declare const bytesOrReadonlyBytes: Uint8Array | ReadonlyArray<number>;
const bytes: Uint8Array = "buffer" in bytesOrReadonlyBytes ? bytesOrReadonlyBytes : new Uint8Array(bytesOrReadonlyBytes);

This could be improved with a bespoke class for readonly bytes, or a helper function that converts if necessary, but there are pitfalls around assignability that are difficult to predict with TypeScript's complex type system, there is the dual package hazard for a bespoke class, and it would require significant work to get right, not much different from the runtime-based sealing.
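
For illustration, the helper-function variant could look roughly like the sketch below. `BytesOrReadonly` and `toUint8ArraySketch` are hypothetical names, not part of the library, and the sketch does not address the assignability or dual package concerns raised above.

```typescript
// Hypothetical union for a bytes field that may hold a read-only default.
type BytesOrReadonly = Uint8Array | ReadonlyArray<number>;

// Returns the input unchanged if it is already a Uint8Array, otherwise copies it.
function toUint8ArraySketch(value: BytesOrReadonly): Uint8Array {
  return value instanceof Uint8Array ? value : new Uint8Array(value);
}
```

Even with such a helper, every call site that needs a Uint8Array would still have to remember to call it, which is the ergonomic cost described above.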

If the ecosystem or we come up with a solution for immutable typed arrays, we can bolt it on later. I do not think we should invest the necessary time into this edge case now.

The change for property typings only applies to scalar and enum fields, not to message fields.

I think this is probably fine, perhaps even expected. (It is certainly similar to message types in protobuf-go, which are very different from other field types since they are pointers.)

The advantage with protobuf-go is that it's possible to call methods on nil pointers. msg.GetMsgField().GetBoolField() will return false even if Msg msg_field is unset. Since RPC response messages often contain a message field with an entity to return, this requires optional chaining in JS, as in resp.msgField?.boolField, and possibly more boilerplate when the value is passed somewhere that doesn't accept undefined. It's a bit of a pain point for users, especially on the frontend with functional paradigms that don't let you exit a code path.
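
The boilerplate in question can be sketched as follows. The interfaces `EntitySketch` and `ResponseSketch` and the function `readBoolField` are hypothetical and only model a response with an optional message field.

```typescript
// Hypothetical response shapes; message fields stay optional in generated code.
interface EntitySketch {
  boolField: boolean;
}
interface ResponseSketch {
  msgField?: EntitySketch;
}

// Optional chaining yields undefined when msgField is unset,
// so a fallback is needed wherever undefined is not accepted.
function readBoolField(resp: ResponseSketch): boolean {
  return resp.msgField?.boolField ?? false;
}
```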

@timostamm timostamm changed the title from "Improve support for proto2 required fields" to "Support proto2 default values and field presence" Feb 16, 2024
Member

@jhump jhump left a comment

LGTM!

@timostamm timostamm merged commit 0b0f084 into main Feb 19, 2024
8 checks passed
@timostamm timostamm deleted the tstamm/improve-proto2-required branch February 19, 2024 14:48
timostamm added a commit that referenced this pull request Feb 20, 2024
timostamm added a commit that referenced this pull request Feb 20, 2024