Add s3 batch consumer #43

cortze · 2025-01-16T17:10:29Z

Description

Due to the large number of traces Hermes can generate, we've decided to add the option to submit batched traces to a given S3 bucket. Furthermore, we've agreed (at least so far) to rely on Parquet files to make the traces easier to import on the data-processing side.

This PR adds that functionality into Hermes, adding all the necessary checks and tests to ensure nothing is broken.

Tasks:

S3 configuration flags at cmd
S3 batcher datastream interface
Local test for the s3 batcher interfaces
Parquet formatting support for traces
Integration of S3 tests with the localstack s3 docker image
(Optional) Allow defining, from a template, which metrics we want to trace within Hermes

NOTE: this PR is still WIP, as I still need to check the performance of both the Parquet formatter and the S3 submitter.

dennis-tra · 2025-01-17T07:56:35Z

host/s3.go

+	ctx      context.Context
+	cancelFn context.CancelFunc


If possible, let's avoid contexts on structs. From experience this was always possible with some restructuring. I think this is one of my more controversial opinions when it comes to writing Go code :D I just checked the rest of the code and the Kinesis Datastream also follows this pattern. So better be consistent instead of mixing patterns.

Yeah, I had the same dilemma in mind when I saw the AWS Kinesis DataStream implementation, so I decided to keep it consistent.

Happy to reorg the logic if you think it's a dealbreaker and it doesn't take much time to implement
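
For reference, a minimal sketch of what the restructuring could look like, assuming the flusher goroutine is the only consumer of the stored context. The `periodicFlusher` type and the `Start`/`Wait` names are hypothetical and not taken from the Hermes code; the idea is only that the caller owns the context and passes it in as an argument.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// periodicFlusher is a hypothetical stand-in for the data stream.
// It keeps no context.Context field.
type periodicFlusher struct {
	wg sync.WaitGroup
}

// Start takes the context as an argument; the goroutine captures it directly
// instead of reading it from the struct.
func (p *periodicFlusher) Start(ctx context.Context, interval time.Duration, flush func()) {
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				flush()
			}
		}
	}()
}

// Wait blocks until the goroutine started by Start has returned.
func (p *periodicFlusher) Wait() { p.wg.Wait() }

// stopsOnCancel reports whether the flusher goroutine exits once the
// caller-owned context is cancelled.
func stopsOnCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	p := &periodicFlusher{}
	p.Start(ctx, time.Millisecond, func() {})
	cancel()
	p.Wait() // returns only after the goroutine has exited
	return ctx.Err() != nil
}

func main() {
	fmt.Println("flusher stopped:", stopsOnCancel())
}
```

The trade-off is that every method needing cancellation has to accept a `ctx` parameter, which is why staying consistent with the existing Kinesis pattern is also a reasonable choice here.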

dennis-tra · 2025-01-17T07:57:35Z

host/s3.go

+}
+
+func (s3ds *S3DataStream) OutputType() DataStreamOutputType {
+	return DataStreamOutputTypeKinesis


Is this correct?

Yeah, the naming might not be the clearest one, but there are only two types of Outputs:

The original Kinesis one

The extended one with more details for the Callback function (added by the EF team, and used at Xatu)

dennis-tra · 2025-01-17T08:15:59Z

host/s3.go

+	opCtx, cancel := context.WithTimeout(ctx, S3OpTimeout)
+	defer cancel()
+
+	s3ds.client.DeleteObject(opCtx, &s3.DeleteObjectInput{


doesn't this return an error?

ah just for testing. Maybe still worth adding if at some point in the future someone just uses this method and expects to see an error?
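
A rough sketch of returning the error instead of dropping it. To keep it self-contained, `objectDeleter` is a hypothetical, narrowed interface rather than the real S3 client (in aws-sdk-go-v2, `DeleteObject` returns an output struct and an `error`), and `opTimeout` is an assumed value standing in for `S3OpTimeout`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// objectDeleter is a hypothetical, narrowed stand-in for the S3 client.
type objectDeleter interface {
	DeleteObject(ctx context.Context, bucket, key string) error
}

// opTimeout is an assumed value; the PR uses its own S3OpTimeout constant.
const opTimeout = 5 * time.Second

// removeObject deletes one object and hands any failure back to the caller.
func removeObject(ctx context.Context, c objectDeleter, bucket, key string) error {
	opCtx, cancel := context.WithTimeout(ctx, opTimeout)
	defer cancel()

	if err := c.DeleteObject(opCtx, bucket, key); err != nil {
		return fmt.Errorf("delete s3://%s/%s: %w", bucket, key, err)
	}
	return nil
}

// errDenied and failingClient simulate a client that always fails.
var errDenied = errors.New("access denied")

type failingClient struct{}

func (failingClient) DeleteObject(context.Context, string, string) error {
	return errDenied
}

// deleteErrorIsPropagated reports whether the client error reaches the caller.
func deleteErrorIsPropagated() bool {
	err := removeObject(context.Background(), failingClient{}, "bucket", "key")
	return errors.Is(err, errDenied)
}

func main() {
	fmt.Println("error propagated:", deleteErrorIsPropagated())
}
```

Wrapping with `%w` keeps the original error inspectable via `errors.Is`, so tests can still assert on the underlying cause.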

dennis-tra · 2025-01-17T08:22:51Z

host/s3.go

+
+			}
+		}
+	}()


It would be great to ensure proper cleanup of this goroutine by signaling to the S3DataStream that the goroutine has exited.

I would say:

add a flusherDone chan struct{} to the S3DataStream struct and initialize it as an unbuffered channel

in this method, add defer close(s3ds.flusherDone) to the inner goroutine

in the Stop() method, wait for the flusherDone channel to be closed before exiting.

Just to be sure that the process always exits properly, you could also do it with a timeout like

select {
case <-time.After(5 * time.Second):
	// log that something didn't go as planned
case <-s3ds.flusherDone:
}

But if everything is set up correctly you wouldn't have to.

Lastly, in theory we then should make sure that spawnPeriodicFlusher is only called once during the lifecycle of the s3ds datastream. Otherwise the channel would be closed twice and we would panic. However, that also shouldn't happen - so I'd be fine not guarding for that.

That is indeed a really nice addition, especially since the ctx given to spawnPeriodicFlusher wasn't the main one. Updating it.
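
Putting the three steps together, a minimal sketch could look like the following. The `stream` type is a simplified, hypothetical stand-in for `S3DataStream`; the `flush` callback and the timeout argument to `Stop` are assumptions made for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// stream is a simplified, hypothetical stand-in for S3DataStream.
type stream struct {
	cancel      context.CancelFunc
	flusherDone chan struct{} // closed when the flusher goroutine exits
}

func newStream() *stream {
	return &stream{flusherDone: make(chan struct{})}
}

// spawnPeriodicFlusher must be called at most once per stream; a second call
// would close flusherDone twice and panic.
func (s *stream) spawnPeriodicFlusher(ctx context.Context, interval time.Duration, flush func()) {
	ctx, s.cancel = context.WithCancel(ctx)
	go func() {
		defer close(s.flusherDone)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				flush()
			}
		}
	}()
}

// Stop cancels the flusher and waits for it to exit, giving up after timeout.
func (s *stream) Stop(timeout time.Duration) error {
	if s.cancel == nil {
		return nil // flusher was never started
	}
	s.cancel()
	select {
	case <-s.flusherDone:
		return nil
	case <-time.After(timeout):
		return errors.New("flusher did not exit before the timeout")
	}
}

// stopCleanly starts a flusher and reports whether Stop observed its exit.
func stopCleanly() error {
	s := newStream()
	s.spawnPeriodicFlusher(context.Background(), time.Millisecond, func() {})
	return s.Stop(time.Second)
}

func main() {
	fmt.Println("stop error:", stopCleanly())
}
```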

dennis-tra · 2025-01-17T08:28:41Z

host/s3.go

+func (b *traceBatcher) reset() []ParquetTraceEvent {
+	b.Lock()
+	prevTraces := make([]ParquetTraceEvent, len(b.traces))
+	for i, trace := range b.traces {
+		prevTraces[i] = *trace.toParquet()
+	}
+	b.traces = make([]*TraceEvent, 0)
+	b.Unlock()
+	return prevTraces


To avoid the data copy I think (not sure) that you can do something like this:

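func (b *traceBatcher) reset() []*TraceEvent {
	b.Lock()
	prevTraces := b.traces
	b.traces = nil
	b.Unlock()
	// b.traces holds []*TraceEvent, so the toParquet() conversion
	// would have to happen in the caller, outside the lock
	return prevTraces
}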

I'm really not sure about this.
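
One way this could work while keeping the `[]ParquetTraceEvent` return type is to swap the slice under the lock and convert afterwards, so the lock is held only for the swap. This is a sketch under assumptions: `TraceEvent` and `ParquetTraceEvent` are reduced to a single hypothetical field, and the batcher is assumed to embed a `sync.Mutex`.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-ins for the real types; the Type field is illustrative.
type TraceEvent struct{ Type string }
type ParquetTraceEvent struct{ Type string }

func (t *TraceEvent) toParquet() *ParquetTraceEvent {
	return &ParquetTraceEvent{Type: t.Type}
}

type traceBatcher struct {
	sync.Mutex
	traces []*TraceEvent
}

// reset takes ownership of the current batch under the lock and converts it
// afterwards, so other goroutines are not blocked during the conversion.
func (b *traceBatcher) reset() []ParquetTraceEvent {
	b.Lock()
	prev := b.traces
	b.traces = nil // appending to a nil slice is fine
	b.Unlock()

	out := make([]ParquetTraceEvent, len(prev))
	for i, t := range prev {
		out[i] = *t.toParquet()
	}
	return out
}

// resetDrainsBatch reports whether reset returns every trace and leaves the
// batcher empty.
func resetDrainsBatch() bool {
	b := &traceBatcher{traces: []*TraceEvent{{Type: "a"}, {Type: "b"}}}
	out := b.reset()
	return len(out) == 2 && out[1].Type == "b" && len(b.traces) == 0
}

func main() {
	fmt.Println("batch drained:", resetDrainsBatch())
}
```

The conversion still copies each event once, so this shortens the critical section rather than removing the copy entirely.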

dennis-tra · 2025-01-17T08:29:51Z

Awesome that you have added all the tests!

dennis-tra · 2025-01-17T08:33:27Z

host/s3_test.go

+	require.NoError(t, err)
+
+	// wait 2,5 secs (flusher should kick in)
+	time.Sleep(2500 * time.Millisecond)


Let's avoid sleeps in tests. Is there another synchronization way?

dennis-tra · 2025-01-17T08:34:08Z

host/s3_test.go

+
+	// submit the traces
+	s3ds.submitRecords(ctx)
+	time.Sleep(300 * time.Millisecond)


same here. I'm happy to brainstorm how to add synchronization
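
One possible direction, sketched under assumptions: have the flush path publish a notification that tests can block on with a timeout, instead of sleeping for a fixed duration. The `batcher` type, its `flushed` channel, and `waitForFlush` are hypothetical names that do not exist in the PR. Polling with testify's `require.Eventually` would be another option, since the tests already use `require`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// batcher is a hypothetical stand-in; flushed receives the batch size after
// each flush so that tests can synchronize on it.
type batcher struct {
	flushed chan int
}

// flush would upload the batch; here it only publishes the notification.
func (b *batcher) flush(n int) {
	select {
	case b.flushed <- n:
	default: // nobody is listening and the buffer is full; do not block
	}
}

// waitForFlush blocks until a flush is reported or the timeout expires.
func waitForFlush(ch <-chan int, timeout time.Duration) (int, error) {
	select {
	case n := <-ch:
		return n, nil
	case <-time.After(timeout):
		return 0, errors.New("timed out waiting for flush")
	}
}

// flushIsObserved reports whether a flush of three traces is seen by a waiter.
func flushIsObserved() bool {
	b := &batcher{flushed: make(chan int, 1)}
	go b.flush(3)
	n, err := waitForFlush(b.flushed, time.Second)
	return err == nil && n == 3
}

func main() {
	fmt.Println("flush observed:", flushIsObserved())
}
```

The timeout only bounds the failure case, so a passing test returns as soon as the flush happens rather than always waiting the full interval.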

cortze added 4 commits January 16, 2025 17:46

add localstack s3 support for local testing

be0d884

add first working draft of a s3 trace submitter + tests

7508c93

WIP - extesion of hermes cmd to s3 config

ebdec2b

update dependencies adding parquet-go and s3

68e8573

dennis-tra self-requested a review January 17, 2025 07:53

dennis-tra reviewed Jan 17, 2025
