Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add read/write lock in the fault handler #4309

Merged
merged 3 commits into from
Sep 5, 2024
Merged

Conversation

xxx0624
Copy link
Contributor

@xxx0624 xxx0624 commented Aug 26, 2024

Summary

This PR is to add a field / RWMutex in the fault handler which can help to avoid race condition where multiple clients would manipulate the same network resource at the same time.

Implementation details

Testing

New tests cover the changes:
No. Existing unit test will cover this.

Description for the changelog

Feature: Add RWMutex in the fault handler.

Additional Information

Does this PR include breaking model changes? If so, Have you added transformation functions?

Does this PR include the addition of new environment variables in the README?

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@xxx0624 xxx0624 force-pushed the fis-rwlock branch 6 times, most recently from af3afb4 to 5589a96 Compare August 28, 2024 00:17
@xxx0624 xxx0624 marked this pull request as ready for review August 29, 2024 18:34
@xxx0624 xxx0624 requested a review from a team as a code owner August 29, 2024 18:34
mye956
mye956 previously approved these changes Aug 29, 2024
mye956
mye956 previously approved these changes Sep 4, 2024
@@ -161,7 +161,7 @@ func (s *TMDSAgentState) getTaskMetadata(v3EndpointID string, includeTags bool)
Path: task.GetNetworkNamespace(),
NetworkInterfaces: []*tmdsv4.NetworkInterface{
{
DeviceName: "",
DeviceName: "ethx",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some context:
The DeviceName is required for network latency / network packet loss faults. We are exposing this information (and other task network config) via AgentState to facilitate the fault injection handler to start/stop/check network fault.

Why we need a non-empty value here which is fake: because in this PR, we introduce a new check (link) to see if the deviceName is empty. So if that's empty, these unit test will fail.

@@ -416,11 +473,17 @@ func (h *FaultHandler) CheckNetworkPacketLoss() func(http.ResponseWriter, *http.

// Obtain the task metadata via the endpoint container ID
// TODO: Will be using the returned task metadata in a future PR
_, err = validateTaskMetadata(w, h.AgentState, requestType, r)
taskMetadata, err := validateTaskMetadata(w, h.AgentState, requestType, r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this in all check methods? Since we're only calling tc q and check if "kind":"netem" exists, we don't need to know the network namespace path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for awsvpc mode, all commands should be running in the given task network namespace. So we also need this check here.

@xxx0624 xxx0624 merged commit 9ddccf9 into aws:dev Sep 5, 2024
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants