[Fix][Producer]: handle TopicNotFound/TopicTerminated/ProducerBlockedQuotaExceededException/ProducerFenced when reconnecting #1134
Conversation
We also need to handle the errors when creating the producer here:
pulsar-client-go/pulsar/producer_partition.go
Lines 194 to 198 in 1b1dd23
if err != nil {
	p.batchFlushTicker.Stop()
	logger.WithError(err).Error("Failed to create producer at newPartitionProducer")
	return nil, err
}
But this could be considered a separate issue and fixed in a separate PR.
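For illustration, such handling might look roughly like this (a hypothetical sketch extending the quoted fragment and reusing the client's typed *Error; not part of this PR):

if err != nil {
	p.batchFlushTicker.Stop()
	logger.WithError(err).Error("Failed to create producer at newPartitionProducer")
	// Hypothetical: inspect the typed error so non-retryable cases such as
	// TopicNotFound or ProducerFenced can be reported distinctly to callers.
	if pe, ok := err.(*Error); ok &&
		(pe.Result() == TopicNotFound || pe.Result() == ProducerFenced) {
		logger.Error("non-retryable error; producer creation will not be retried")
	}
	return nil, err
}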
LGTM
Hmm, in …
If we don't introduce new states, then I think it's OK. I'm OK with not introducing new producer states.
We need to investigate it further. Please submit an issue for it. Thanks!
Overall LGTM. Could you add some tests?
Sure, could you please tell me how to trigger a ProducerFenced error?
You could create two producers, both with ProducerAccessModeWaitForExclusive.
@RobertIndie Hmm, this PR is about reconnecting, not producer creation. I think it is difficult to simulate the case where a producer's connection is closed while another producer connects to the same topic exclusively at the same time, and then the closed producer, on reconnecting, receives a ProducerFenced error.
@gunli
pulsar-client-go/pulsar/producer_partition.go
Lines 369 to 373 in 1b1dd23
You could refer to this approach:
pulsar-client-go/pulsar/producer_test.go
Line 1283 in ec846ff
I know that, but the timing is difficult when we call …
Could you try using this?
pulsar-client-go/pulsar/producer.go
Line 69 in 50015d3
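(Presumably this points at the producer access mode option; a minimal usage sketch, assuming the referenced line is ProducerAccessModeWaitForExclusive, the mode used in the test further below:)

// Sketch: a producer that waits for exclusive access instead of failing fast.
producer, err := client.CreateProducer(ProducerOptions{
	Topic:              topicName,
	ProducerAccessMode: ProducerAccessModeWaitForExclusive,
})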
I see, I will try that later, thank you.
@RobertIndie I have tried that but failed: reconnecting is too fast, so the second producer has no chance to get connected. I also failed to simulate TopicNotFound, because when there is an active producer, deleting the topic is denied by the server. I have pushed the test cases but commented them out; you can check them out.

func TestTopicNotFound(t *testing.T) {
	client, err := NewClient(ClientOptions{
		URL: serviceURL,
	})
	assert.NoError(t, err)
	defer client.Close()

	topicName := newTopicName()
	producer, err := client.CreateProducer(ProducerOptions{
		Topic:       topicName,
		SendTimeout: 2 * time.Second,
	})
	assert.Nil(t, err)
	defer producer.Close()

	afterCh := time.After(5 * time.Second)
	topicNotFoundChan := make(chan bool)
	go func() {
		for {
			_, err := producer.Send(context.Background(), &ProducerMessage{
				Payload: make([]byte, 1024),
			})
			if err != nil {
				// TopicNotFound from the server or errProducerClosed from the
				// client both indicate that the deletion took effect. Use the
				// comma-ok assertion so unrelated error types cannot panic.
				e, ok := err.(*Error)
				if (ok && e.result == TopicNotFound) || err == errProducerClosed {
					topicNotFoundChan <- true
				} else {
					topicNotFoundChan <- false
				}
			}
			time.Sleep(1 * time.Millisecond)
		}
	}()

	deleteURL := adminURL + "/admin/v2/persistent/public/default/" + topicName
	log.Info(deleteURL)
	makeHTTPCall(t, http.MethodDelete, deleteURL, "")

	for {
		select {
		case d := <-topicNotFoundChan:
			assert.Equal(t, d, true)
			return
		case <-afterCh:
			assert.Fail(t, "Time is up. Topic should have been deleted by now")
			return
		}
	}
}
func TestProducerFenced(t *testing.T) {
	client, err := NewClient(ClientOptions{
		URL: serviceURL,
	})
	assert.NoError(t, err)
	defer client.Close()

	topicName := newTopicName()
	consumer, err := client.Subscribe(ConsumerOptions{
		Topic:            topicName,
		SubscriptionName: "producer_fenced_sub",
	})
	assert.Nil(t, err)
	defer consumer.Close() // subscribe but do nothing

	// create the first producer exclusively
	producer1, err := client.CreateProducer(ProducerOptions{
		Topic:                   topicName,
		SendTimeout:             2 * time.Second,
		ProducerAccessMode:      ProducerAccessModeWaitForExclusive,
		BatchingMaxMessages:     2,
		BatchingMaxSize:         200,
		BatchingMaxPublishDelay: 1 * time.Second,
	})
	assert.Nil(t, err)
	defer producer1.Close()

	go func() {
		// create the second producer, waiting for exclusive access
		fmt.Println("create the second producer wait for exclusive...")
		producer2, err := client.CreateProducer(ProducerOptions{
			Topic:              topicName,
			SendTimeout:        2 * time.Second,
			ProducerAccessMode: ProducerAccessModeWaitForExclusive,
		})
		assert.Nil(t, err)
		defer producer2.Close()
		fmt.Println("the second producer is ready")
		// keep producer2 alive
		time.Sleep(30 * time.Second)
	}()

	time.Sleep(3 * time.Second)
	afterCh := time.After(10 * time.Second)
	producerFencedChan := make(chan bool)
	go func() {
		for {
			producer1.SendAsync(context.Background(),
				&ProducerMessage{Payload: make([]byte, 100)},
				func(id MessageID, producerMessage *ProducerMessage, err error) {
					if err != nil {
						fmt.Println(err)
						// ProducerFenced from the server or errProducerClosed
						// from the client both indicate the fencing took effect.
						e, ok := err.(*Error)
						if (ok && e.result == ProducerFenced) || err == errProducerClosed {
							producerFencedChan <- true
						} else {
							producerFencedChan <- false
						}
					}
				},
			)
			time.Sleep(1 * time.Millisecond)
		}
	}()

	// trigger reconnecting by forcing the partition producers' connections closed
	doneChan := make(chan bool)
	go func() {
		ticker := time.NewTicker(1 * time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-doneChan:
				return
			case <-ticker.C:
				fmt.Println("close connections...")
				producers := producer1.(*producer).producers
				for i := 0; i < len(producers); i++ {
					partitionProducerImp := producers[i].(*partitionProducer)
					partitionProducerImp.ConnectionClosed()
				}
			}
		}
	}()

	for {
		select {
		case d := <-producerFencedChan:
			assert.Equal(t, d, true)
			doneChan <- true
			return
		case <-afterCh:
			assert.Fail(t, "Time is up. Producer should have been fenced by now")
			doneChan <- true
			return
		}
	}
}
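(As an aside, the server's refusal to delete a topic that has an active producer can usually be bypassed with the admin API's force flag; a hypothetical tweak to TestTopicNotFound above, not verified against this test setup:)

// Hypothetical: force-delete the topic even while a producer is connected.
deleteURL := adminURL + "/admin/v2/persistent/public/default/" + topicName + "?force=true"
makeHTTPCall(t, http.MethodDelete, deleteURL, "")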
@RobertIndie The CI failed, but I can't find the root cause from the logs. Would you please check it out?
There is a data race issue in the CI: https://github.com/apache/pulsar-client-go/actions/runs/6978661724/job/19165650715?pr=1134#step:5:9630
@gunli Could you take a look?
@RobertIndie I have pushed a commit to fix it, PTAL and run the CI.
Fixes #1128
Master Issue: #1128
Motivation
In the Java client, when we get TopicNotFound/TopicTerminated/ProducerBlockedQuotaExceededException/ProducerFenced, we fail the pending messages and close the producer. But in the Go client, we forgot to handle ProducerBlockedQuotaExceededException/ProducerFenced, and in #1128 we just call sr.done() when we should actually call failPendingMessages().
https://github.com/apache/pulsar-client-go/pull/1128/files
https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L1663
Modifications
- Fix the typo errMsgTopicNotFount → errMsgTopicNotFound.
- Handle ProducerBlockedQuotaExceededException/ProducerFenced when reconnecting, and call failPendingMessages() instead of only sr.done().
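A minimal sketch of the intended reconnect-path behavior (illustrative only: failPendingMessages and the Result constants follow the PR description, while the close call and surrounding control flow are assumptions, not the exact diff):

// Inside the partition producer's reconnect loop: treat these server
// errors as fatal instead of retrying forever.
if pe, ok := err.(*Error); ok {
	switch pe.Result() {
	case TopicNotFound, TopicTerminated,
		ProducerBlockedQuotaExceededException, ProducerFenced:
		p.failPendingMessages(err) // fail callers waiting on pending sends
		p.Close()                  // stop reconnecting; the producer is unusable
		return
	}
}
// otherwise: back off and retry the connection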
Verifying this change
- This change is a trivial rework / code cleanup without any test coverage.
- This change is already covered by existing tests.
- This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
If yes was chosen, please highlight the changes.

Documentation