Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generated Fast Marshal Panics with Buffer Overflow in EncodeNested for proto3 Fields with Implicit Presence #183

Open
francoposa opened this issue Jan 31, 2025 · 1 comment
Labels
bug Something isn't working triage Issues that need to be reviewed by maintainers

Comments

@francoposa
Copy link

francoposa commented Jan 31, 2025

Title

Generated Fast Marshal Panics with Buffer Overflow in EncodeNested for proto3 Fields with Implicit Presence

Version

v0.35.0

Description

EncodeNested comes into play when the messages are nested more than one level deep.

A proto3 message with implicit presence will correctly report its Size as 0 when its members are equal to the default values
However, EncodeNested will still write the tag and size into the buffer, taking up two bytes for each message that should have implicit presence and therefore not be written into the buffer at all.
Those bytes were not accounted for in the initial buffer size allocation by Marshal before it calls into MarshalTo, so if there are any other fields to be serialized in the message, it will panic.

Note that this does not end up being true when dealing with implicit/explicit presence: https://protobuf.dev/reference/go/size/

Checking if proto.Size returns 0 is an easy way to recognize empty messages:

if proto.Size(m) == 0 {
    // No fields set (or, in proto3, all fields matching the default);
    // skip processing this message, or return an error, or similar.
}

Go's reflect-based marshaling itself has extensive code and full internal presence package to track field presence and resize the marshaling buffer if needed.

Repro steps

Full working code @ https://github.com/francoposa/proto-demo/tree/francoposa/prometheus-rw2-csproto-bug-repro

Simplified Prometheus Remote Write V2 Proto

syntax = "proto3";
package io.prometheus.write.v2;

option go_package = "github.com/prometheus/prometheus/write/v2;writev2";

// Request represents a request to write the given timeseries to a remote destination.
message Request {
  reserved 1 to 3;
  repeated string symbols = 4;
  repeated TimeSeries timeseries = 5;
}

// TimeSeries represents a single series.
message TimeSeries {
  repeated uint32 labels_refs = 1 [packed=false];

  repeated Sample samples = 2;
  reserved 3;
  reserved 4;

  Metadata metadata = 5;

  int64 created_timestamp = 6;
}

// Sample represents series sample.
message Sample {
  double value = 1;
  int64 timestamp = 2;
}

// Metadata represents the metadata associated with the given series' samples.
message Metadata {
  enum MetricType {
    METRIC_TYPE_UNSPECIFIED    = 0;
  }
  MetricType type = 1;
  uint32 help_ref = 3;
  uint32 unit_ref = 4;
}

Test to Repro Against the Generated fastmarshal Code

package writev2

import (
	"testing"

	"github.com/CrowdStrike/csproto"
	"github.com/stretchr/testify/assert"
)

func TestMarshalZeroValues(t *testing.T) {
	for _, tt := range []struct {
		name   string
		msg    csproto.Marshaler
		panics bool
	}{
		{
			name: "nonempty-zero-value-single-field-member",
			// does not panic because total size for entire message reports as zero;
			// there is no buffer and no serialization is performed
			msg: &TimeSeries{
				Samples: []*Sample{
					{
						Value:     0,
						Timestamp: 0,
					},
				},
			},
			panics: false,
		},
		{
			name: "nonempty-zero-value-single-nested-member-with-member-before",
			// panics because total size for each Sample message is report as zero
			// so bytes were not accounted for in the initial buffer size calculation,
			// but EncodeNested will still encode the tag and length for each Sample message;
			// buffer overflows when the Sample tag is written to it
			msg: &TimeSeries{
				LabelsRefs: []uint32{0, 1},
				Samples: []*Sample{
					{
						Value:     0,
						Timestamp: 0,
					},
				},
			},
			panics: true,
		},
		{
			name: "nonempty-zero-value-single-nested-member-with-member-after",
			// panics because total size for each Sample message is report as zero
			// so bytes were not accounted for in the initial buffer size calculation,
			// but EncodeNested will still encode the tag and length for each Sample message;
			// buffer overflows when the next (Metadata) field is written to it
			msg: &TimeSeries{
				Samples: []*Sample{
					{
						Value:     0,
						Timestamp: 0,
					},
				},
				Metadata: &Metadata{
					Type: Metadata_METRIC_TYPE_UNSPECIFIED,
				},
			},
			panics: true,
		},
	} {
		t.Run(tt.name, func(t *testing.T) {
			// csproto marshal
			if tt.panics {
				assert.Panics(t, func() {
					_, _ = tt.msg.Marshal()
				})
			} else {
				_, err := tt.msg.Marshal()
				assert.NoError(t, err)
			}
		})
	}
}
@francoposa francoposa added bug Something isn't working triage Issues that need to be reviewed by maintainers labels Jan 31, 2025
@francoposa
Copy link
Author

francoposa commented Jan 31, 2025

The difficulty appears to be that Size is not a good enough proxy for EncodeNested to know whether a zero-sized message field should still get its tag & size serialized or not - you also need to know whether it has implicit presence or not.
This is not unique to the csproto fastmarshal Size() Patching in upstream google.golang.org/protobuf/proto.Size has the same issue.

In proto3, the same message could have implicit or explicit presence depending on how it is included in the parent message:
https://protobuf.dev/programming-guides/field_presence/#presence-in-proto3-apis
E.g. a "Singular message" always has explicit presence, but a message included in a Repeated or Map always has implicit presence.

It seems like the codegen would have to propagate a parameter down from the parent message indicating whether the field has implicit presence or not into EncodeNested (or a new, similar method to avoid breaking that interface).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage Issues that need to be reviewed by maintainers
Projects
None yet
Development

No branches or pull requests

1 participant