-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Use gRPC bidirectional streaming for source transformer #2071
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2071 +/- ##
==========================================
+ Coverage 64.44% 64.49% +0.04%
==========================================
Files 324 324
Lines 30502 30579 +77
==========================================
+ Hits 19658 19721 +63
- Misses 9819 9820 +1
- Partials 1025 1038 +13 ☔ View full report in Codecov by Sentry. |
00b3445
to
1b9a06d
Compare
@@ -25,13 +25,13 @@ import ( | |||
// SourceTransformApplier applies the source transform on the read message and gives back a new message. Any UserError will be retried here, while | |||
// InternalErr can be returned and could be retried by the callee. | |||
type SourceTransformApplier interface { | |||
ApplyTransform(ctx context.Context, message *isb.ReadMessage) ([]*isb.WriteMessage, error) | |||
ApplyTransform(ctx context.Context, messages []*isb.ReadMessage) ([]isb.ReadWriteMessagePair, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please help me understand the need for this signature change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we run the below func in goroutines and call ApplyTransform
within it:
concurrentApplyTransformer(ctx context.Context, readMessagePair <-chan *isb.ReadWriteMessagePair)
It retrieves message from the readMessagePair
channel, apply transformer, and store the WriteMessages
in the same element.
With bidrectional streaming, the concurrency is moved to the server side. It might be difficult to have a function signature that takes a message and returns the output since we are not sure of the result order from the stream. Instead, we send all messages, wait for all output and aggregate the result in the new implementation.
func (df *DataForward) concurrentApplyTransformer(ctx context.Context, readMessagePair <-chan *isb.ReadWriteMessagePair) { | ||
for message := range readMessagePair { | ||
start := time.Now() | ||
metrics.SourceTransformerReadMessagesCount.With(map[string]string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we need this metrics?
} | ||
taggedMessages = append(taggedMessages, taggedMessage) | ||
if !tracker.IsEmpty() { | ||
return nil, errors.New("transform response for all requests were not received from UDF") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we dump the tracker state at this point if you think it could help triaging later on?
rust/monovertex/src/config.rs
Outdated
/// # Returns | ||
/// A string representing the `OnFailureStrategy` enum variant. | ||
fn to_string(&self) -> String { | ||
impl Display for OnFailureStrategy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you :)
rust/monovertex/src/transformer.rs
Outdated
pub(crate) async fn new(client: SourceTransformClient<Channel>) -> Result<Self> { | ||
Ok(Self { client }) | ||
pub(crate) async fn new(mut client: SourceTransformClient<Channel>) -> Result<Self> { | ||
let (read_tx, read_rx) = mpsc::channel(config().batch_size as usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: referencing config
directly can make testing difficult if you want to change the value for testing.
rust/monovertex/src/transformer.rs
Outdated
let transform_fn_with_timeout = tokio::time::timeout( | ||
Duration::from_secs(3), | ||
client.source_transform_fn(Request::new(read_stream)), | ||
); | ||
let Ok(resp_stream) = transform_fn_with_timeout.await else { | ||
return Err(Error::TransformerError( | ||
"connection to transformer gRPC server timed out".to_string(), | ||
)); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only 3 sec timeout? will this whole thing become flaky in a busy system?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we create the stream at the beginning and we don't create multiple streams, this should succeed immediately. I added the timeout because the tests were stuck when there was some issue with the server (server started,but there was a mismatch in the protobuf files on the client and the server). I will remove the timeouts.
rust/monovertex/src/transformer.rs
Outdated
} | ||
|
||
Ok(Some(messages)) | ||
let sender_task = tokio::time::timeout(Duration::from_millis(2), sender); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please document your thought on why you did this approach.
pkg/udf/rpc/tracker.go
Outdated
@@ -6,68 +6,68 @@ import ( | |||
"github.com/numaproj/numaflow/pkg/isb" | |||
) | |||
|
|||
// tracker is used to store a key value pair for string and *isb.ReadMessage | |||
// Tracker is used to store a key value pair for string and *isb.ReadMessage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this file to pkg/isb/
and rename to message_tracker.go
, also rename the struck name?
rust/monovertex/src/transformer.rs
Outdated
|
||
use crate::config::config; | ||
use crate::error::{Error, Result}; | ||
use crate::message::{Message, Offset}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this Message
will be moved to a separate package that will be shared by both MonoVertex and Pipeline.
rust/monovertex/src/transformer.rs
Outdated
}; | ||
for (i, result) in resp.results.into_iter().enumerate() { | ||
if result.tags.iter().any(|x| x == DROP) { | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: metrics
for { | ||
select { | ||
case <-ctx.Done(): | ||
return nil, fmt.Errorf("waiting for transformer gRPC server to be ready: %w", context.Cause(ctx)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, fmt.Errorf("waiting for transformer gRPC server to be ready: %w", context.Cause(ctx)) | |
return nil, fmt.Errorf("failed waiting for transformer gRPC server to be ready: %w", ctx.Err()) |
context.Cause(ctx)
returns same error obj as ctx.Err()
. let's change it to be consistent with other sdk clients.
|
||
c.stream, err = c.grpcClt.SourceTransformFn(ctx) | ||
if err != nil { | ||
return nil, fmt.Errorf("creating a gRPC stream for source transform: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, fmt.Errorf("creating a gRPC stream for source transform: %w", err) | |
return nil, fmt.Errorf("failed creating a gRPC stream for source transform: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen somewhere (Go blog / talks), to not include "failed" in an error message, since error
type itself indicates that it failed. If the error gets wrapped again on the caller's side, it could result in a final message of below pattern:
Failed to create user: Failed to create new account: Failed to INSERT new account record: database not connected
Compared to:
Failed to create user: creating new account: INSERT new account record: database not connected
return c, nil | ||
} | ||
func doHandshake(stream transformpb.SourceTransform_SourceTransformFnClient) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func doHandshake(stream transformpb.SourceTransform_SourceTransformFnClient) error { | |
func doHandshake(stream transformpb.SourceTransform_SourceTransformFnClient) error { |
}, | ||
} | ||
if err := stream.Send(handshakeReq); err != nil { | ||
return fmt.Errorf("sending handshake request for source tansform: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return fmt.Errorf("sending handshake request for source tansform: %w", err) | |
return fmt.Errorf("failed sending handshake request for source tansform: %w", err) |
|
||
handshakeResp, err := stream.Recv() | ||
if err != nil { | ||
return fmt.Errorf("receiving handshake response from source transform stream: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return fmt.Errorf("receiving handshake response from source transform stream: %w", err) | |
return fmt.Errorf("failed receiving handshake response from source transform stream: %w", err) |
Tags: result.Tags, | ||
} | ||
taggedMessages = append(taggedMessages, taggedMessage) | ||
if !tracker.IsEmpty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the current implementation, this if statement should never hold true, right? because we only break the loop when messageCount is 0.
pkg/udf/rpc/tracker.go
Outdated
m: make(map[string]*isb.ReadMessage), | ||
lock: sync.RWMutex{}, | ||
} | ||
} | ||
|
||
// addRequest add a new entry for a given message to the tracker. | ||
// AddRequest add a new entry for a given message to the Tracker. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming: please be concise. if we are saving messages, use AddMessage instead of Request.
pkg/udf/rpc/tracker.go
Outdated
// to indicate if it does not exist | ||
func (t *tracker) getRequest(id string) (*isb.ReadMessage, bool) { | ||
func (t *Tracker) GetRequest(id string) (*isb.ReadMessage, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we simply return *isb.RM and use nil to indicate message doesn't exist?
pkg/udf/rpc/tracker.go
Outdated
return &tracker{ | ||
// NewTracker initializes a new instance of a Tracker | ||
func NewTracker() *Tracker { | ||
return &Tracker{ | ||
m: make(map[string]*isb.ReadMessage), | ||
lock: sync.RWMutex{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a fundamental question. With bidirectional stream, are we trying to improve performance? If so, is having this tracker with RWMutex making introducing extra io latency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutex operations (unless there is too much contention) are much faster than network calls (nanoseconds vs micro/milli seconds). Unary calls has some overhead. As per the docs:
Use streaming RPCs when handling a long-lived logical flow of data from the client-to-server, server-to-client, or in both directions. Streams can avoid continuous RPC initiation, which includes connection load balancing at the client-side, starting a new HTTP/2 request at the transport layer, and invoking a user-defined method handler on the server side.
pkg/udf/rpc/tracker.go
Outdated
// removeRequest will remove the entry for a given id | ||
func (t *tracker) removeRequest(id string) { | ||
// RemoveRequest will remove the entry for a given id | ||
func (t *Tracker) RemoveRequest(id string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove can be merged with Get as one method.
Signed-off-by: Sreekanth <[email protected]>
Signed-off-by: Sreekanth <[email protected]>
8cad234
to
8c96ca7
Compare
Signed-off-by: Sreekanth <[email protected]>
Signed-off-by: Yashash H L <[email protected]>
Signed-off-by: Sreekanth <[email protected]>
Signed-off-by: Yashash H L <[email protected]>
Signed-off-by: Yashash H L <[email protected]>
chore: fixes and minor refactor
Signed-off-by: Sreekanth <[email protected]>
Signed-off-by: Sreekanth <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Part of #2046
Related PRs:
numaproj/numaflow-go#147
numaproj/numaflow-rs#91