[BUG] Data race if the provider set few times #307

erka · 2024-11-26T19:01:34Z

Observed behavior

I have several tests with a similar setup: the provider is created, then configured, the tests are executed, and openfeature.Shutdown() is called for cleanup. However, when these tests are run with the race detector enabled, they consistently fail.

Expected Behavior

No response

Steps to reproduce

package openfeature_test

import (
	"context"
	"sync"
	"testing"

	of "github.com/open-feature/go-sdk/openfeature"
)

func TestDataRace(t *testing.T) {
	of.SetEvaluationContext(of.NewTargetlessEvaluationContext(map[string]any{}))

	err := of.SetProviderAndWait(newRaceProvider())
	if err != nil {
		t.Fatal(err)
	}

	of.Shutdown()

	err = of.SetProviderAndWait(newRaceProvider())
	if err != nil {
		t.Fatal(err)
	}
}

func newRaceProvider() of.FeatureProvider {
	return &raceProvider{
		state: of.NotReadyState,
	}
}

var (
	_ of.FeatureProvider = (*raceProvider)(nil) // ensure implements FeatureProvider
	_ of.StateHandler    = (*raceProvider)(nil) // ensure implements StateHandler
)

type raceProvider struct {
	state of.State
	mu    sync.RWMutex
}

func (p *raceProvider) Metadata() of.Metadata {
	return of.Metadata{
		Name: "racing",
	}
}

func (p *raceProvider) Status() of.State {
	p.mu.RLock()
	defer p.mu.RUnlock()

	return p.state
}

func (p *raceProvider) Init(evalCtx of.EvaluationContext) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.state = of.ReadyState
	return nil
}

func (p *raceProvider) Shutdown() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.state = of.NotReadyState
}

func (p *raceProvider) BooleanEvaluation(ctx context.Context, flag string, defaultValue bool, evalCtx of.FlattenedContext) of.BoolResolutionDetail {
	return of.BoolResolutionDetail{}
}

func (p *raceProvider) StringEvaluation(ctx context.Context, flag string, defaultValue string, evalCtx of.FlattenedContext) of.StringResolutionDetail {
	return of.StringResolutionDetail{}
}

func (p *raceProvider) FloatEvaluation(ctx context.Context, flag string, defaultValue float64, evalCtx of.FlattenedContext) of.FloatResolutionDetail {
	return of.FloatResolutionDetail{}
}

func (p *raceProvider) IntEvaluation(ctx context.Context, flag string, defaultValue int64, evalCtx of.FlattenedContext) of.IntResolutionDetail {
	return of.IntResolutionDetail{}
}

func (p *raceProvider) ObjectEvaluation(ctx context.Context, flag string, defaultValue interface{}, evalCtx of.FlattenedContext) of.InterfaceResolutionDetail {
	return of.InterfaceResolutionDetail{}
}

func (p *raceProvider) Hooks() []of.Hook {
	return []of.Hook{}
}

Create a file with test and run go test -race ./...

The text was updated successfully, but these errors were encountered:

beeme1mr · 2024-11-26T19:28:06Z

Hey @erka, could you please try the test provider and see if that resolves your issue?

erka · 2024-11-26T21:22:23Z

@beeme1mr I am trying to implement the provider and write some e2e tests for it. Using another provider doesn't help much. The issue is related to the providers which implement StateHandler interface.

beeme1mr · 2024-11-26T21:31:36Z

In the next release, stage management moves to the provider. This is a non-breaking change but may address this problem. FYI @toddbaert

toddbaert · 2024-11-28T14:02:22Z

There have been substantial changes to the SDK recently like @beeme1mr said, which are yet unreleased.

The changes are significant enough that there's a chance this bug could be resolved. I would have liked to see the released already but there's a few more issues we have to iron out with it, namely the one you pointed out here: #296 (review)

erka · 2024-12-10T20:40:39Z

Just keep this alive, the release 1.14.0 doesn't fix the issue. go-sdk uses reflect.DeepEqual() to compare provider(s) internally and that doesn't play very well if provider has sync.Mutex probably.

toddbaert · 2024-12-11T19:05:27Z

Just keep this alive, the release 1.14.0 doesn't fix the issue. go-sdk uses reflect.DeepEqual() to compare provider(s) internally and that doesn't play very well if provider has sync.Mutex probably.

Thanks for the update @erka ! If you have some ideas for a fix, I'm open to that. Otherwise, we have some pressing issues internally so I probably won't be able to dig into this until next week.

🙏

blkt · 2024-12-13T13:01:17Z

Hi @toddbaert, we're having the same issue with our project.
After looking into this, the problem seems to be that this invocation of Shutdown happens while the subsequent SetProviderAndWait is running.
https://github.com/open-feature/go-sdk/blob/main/openfeature/openfeature_api.go#L254-L256

I think that just running v.Shutdown() should solve the problem.

cc @rdimitrov

warber · 2024-12-14T21:19:44Z

I think that just running v.Shutdown() should solve the problem.

@blkt i gave it a quick shot and it seems to fix the problem. At least all tests pass, even with the race detector enabled. Cool.

go-sdk uses reflect.DeepEqual() to compare provider(s) internally and that doesn't play very well if provider has sync.Mutex probably.

@erka yea using reflect.DeepEqual() to compare provider implementations is a problem...
Here are just some ideas:

Require the feature flag implementation to include a UniqueName() or Id() method that can be used to internally identify providers. Also not an ideal solution.
Serialize provider implementations into a format like JSON, which naturally excludes unexported fields and requires omitting problematic field types like sync.Mutex. However, this approach is also far from ideal.
Avoid directly using the FeatureProvider implementation within the SDK. Instead, create a providerReference wrapper during registration, assigning it a self-generated identifier. Subsequently, the rest of the code should work with providerReference values, enabling easy comparison using their IDs. This approach seems like the most promising. The effort strongly depends on how easy it is to change the code to not use FeatureProvider implementations directly.

erka · 2024-12-16T21:27:54Z

@warber We could add a kind field to providerReference and set it to reflect.TypeOf(FeatureProvider).Kind(). If the kind is ptr, we can perform a direct comparison of FeatureProvider. For struct, fallback to reflect.DeepEqual.

toddbaert · 2024-12-23T20:12:54Z

@warber We could add a kind field to providerReference and set it to reflect.TypeOf(FeatureProvider).Kind(). If the kind is ptr, we can perform a direct comparison of FeatureProvider. For struct, fallback to reflect.DeepEqual.

I'll give this a shot this week unless I see a PR from somebody else first.

toddbaert · 2024-12-24T17:49:41Z

My findings so far:

In addition to just doing v.Shutdown() as @blkt suggested here, another simple (but inadequate, IMO) solution is to RLock the api mutex during shutdown:

	go func(forShutdown StateHandler) {
		api.mu.RLock()
		defer api.mu.RUnlock()
		forShutdown.Shutdown()
	}(v)

This is probably a bit better than just running shutdown synchronously, but it still means that we won't be able to register providers until the old provider is shut down, which is bad, we want the shutdown not to block anything - it's for side effects such as flushing telemetry, etc. This solution also hides the real issue, IMO...
The root problem of this entire issue, as far as I can tell, is that we are using DeepEqual, which is a problem specifically because it reads unexported fields (in this case the state of @erka 's provider), bypassing the mutex in that provider and causing the race (this is confirmed by the trace in the DATA RACE warning). I think this is a more general issue and we should not do it, or we risk a lot of this sort of issue.
I think the only real solution is to find a safe way to compare providers without using DeepEqual at all, so that provider's unexported fields are never read (perhaps the way @warber suggested in their last point here)

toddbaert · 2024-12-31T19:14:07Z

I think the only real solution is to find a safe way to compare providers without using DeepEqual at all, so that provider's unexported fields are never read (perhaps the way @warber suggested in their last point here)

I'm going to give this a shot. Somebody stop me if you think I'm barking up the wrong tree.

erka added bug Something isn't working Needs Triage labels Nov 26, 2024

blkt mentioned this issue Dec 13, 2024

build(deps): bump github.com/open-feature/go-sdk from 1.13.1 to 1.14.0 mindersec/minder#5175

Open

toddbaert self-assigned this Dec 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Data race if the provider set few times #307

[BUG] Data race if the provider set few times #307

erka commented Nov 26, 2024

beeme1mr commented Nov 26, 2024

erka commented Nov 26, 2024

beeme1mr commented Nov 26, 2024

toddbaert commented Nov 28, 2024

erka commented Dec 10, 2024

toddbaert commented Dec 11, 2024

blkt commented Dec 13, 2024 •

edited

Loading

warber commented Dec 14, 2024 •

edited

Loading

erka commented Dec 16, 2024 •

edited

Loading

toddbaert commented Dec 23, 2024

toddbaert commented Dec 24, 2024 •

edited

Loading

toddbaert commented Dec 31, 2024 •

edited

Loading

[BUG] Data race if the provider set few times #307

[BUG] Data race if the provider set few times #307

Comments

erka commented Nov 26, 2024

Observed behavior

Expected Behavior

Steps to reproduce

beeme1mr commented Nov 26, 2024

erka commented Nov 26, 2024

beeme1mr commented Nov 26, 2024

toddbaert commented Nov 28, 2024

erka commented Dec 10, 2024

toddbaert commented Dec 11, 2024

blkt commented Dec 13, 2024 • edited Loading

warber commented Dec 14, 2024 • edited Loading

erka commented Dec 16, 2024 • edited Loading

toddbaert commented Dec 23, 2024

toddbaert commented Dec 24, 2024 • edited Loading

toddbaert commented Dec 31, 2024 • edited Loading

blkt commented Dec 13, 2024 •

edited

Loading

warber commented Dec 14, 2024 •

edited

Loading

erka commented Dec 16, 2024 •

edited

Loading

toddbaert commented Dec 24, 2024 •

edited

Loading

toddbaert commented Dec 31, 2024 •

edited

Loading