Add read/write lock in the fault handler #4309

xxx0624 · 2024-08-26T21:04:01Z

Summary

This PR adds an RWMutex field to the fault handler to avoid race conditions where multiple clients manipulate the same network resource at the same time.

Implementation details

Add a read/write lock to the FaultHandler.
Add a new method to create the fault handler.
Add more validation in validateTaskNetworkConfig to make sure each required field has a non-empty value.
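
For readers unfamiliar with the pattern, here is a minimal sketch of how a read/write lock can guard a fault handler. It is not the code from this PR: the field names, the `NewFaultHandler` constructor, and the `Start`/`Stop`/`Check` methods are hypothetical, and the in-memory map stands in for the real network resource. The assumption is that start and stop requests mutate state and take the write lock, while check requests only read and can share the read lock.

```go
package main

import (
	"errors"
	"sync"
)

// FaultHandler is a simplified stand-in for the handler in this PR.
// All field and method names here are hypothetical.
type FaultHandler struct {
	mu      sync.RWMutex
	running map[string]bool // fault type -> whether it is currently injected
}

// NewFaultHandler is an assumed constructor that initializes the shared state.
func NewFaultHandler() *FaultHandler {
	return &FaultHandler{running: make(map[string]bool)}
}

// Start mutates shared state, so it holds the exclusive write lock.
func (h *FaultHandler) Start(fault string) error {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.running[fault] {
		return errors.New("fault already running: " + fault)
	}
	h.running[fault] = true
	return nil
}

// Stop also mutates shared state and holds the write lock.
func (h *FaultHandler) Stop(fault string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	delete(h.running, fault)
}

// Check only reads, so concurrent checks can share the read lock.
func (h *FaultHandler) Check(fault string) bool {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.running[fault]
}
```

With this shape, two clients that try to start the same fault at once are serialized by the write lock, and the second one gets an error instead of racing on the resource.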

Testing

New tests cover the changes:
No. Existing unit tests cover this.

Description for the changelog

Feature: Add RWMutex in the fault handler.

Additional Information

Does this PR include breaking model changes? If so, have you added transformation functions?

Does this PR include the addition of new environment variables in the README?

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

amogh09 · 2024-09-05T16:16:47Z

agent/handlers/v4/tmdsstate.go

@@ -161,7 +161,7 @@ func (s *TMDSAgentState) getTaskMetadata(v3EndpointID string, includeTags bool)
 					Path: task.GetNetworkNamespace(),
 					NetworkInterfaces: []*tmdsv4.NetworkInterface{
 						{
-							DeviceName: "",
+							DeviceName: "ethx",


Why this change?

Some context:
The DeviceName is required for the network latency and network packet loss faults. We expose this information (and the rest of the task network config) via AgentState so that the fault injection handler can start, stop, and check network faults.

Why we need a fake but non-empty value here: this PR introduces a new check that rejects an empty deviceName, so with an empty value these unit tests would fail.
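
To illustrate the kind of check being described, here is a rough sketch of a non-empty validation. The types and the `validateNamespace` helper are hypothetical, simplified stand-ins rather than the actual tmdsv4 structs or the real validateTaskNetworkConfig. Only the DeviceName check is mentioned in this thread; the Path check is an assumption added for illustration.

```go
package main

import "errors"

// NetworkInterface and NetworkNamespace are hypothetical, simplified
// stand-ins for the task network config types used by the handler.
type NetworkInterface struct {
	DeviceName string
}

type NetworkNamespace struct {
	Path              string
	NetworkInterfaces []*NetworkInterface
}

// validateNamespace returns an error when a required field is missing or empty.
func validateNamespace(ns *NetworkNamespace) error {
	if ns == nil {
		return errors.New("no network namespace in task network config")
	}
	// Assumption: the namespace path is treated as required as well.
	if ns.Path == "" {
		return errors.New("empty network namespace path")
	}
	if len(ns.NetworkInterfaces) == 0 || ns.NetworkInterfaces[0] == nil {
		return errors.New("no network interface in task network config")
	}
	// This is the check that makes an empty DeviceName in test data fail.
	if ns.NetworkInterfaces[0].DeviceName == "" {
		return errors.New("empty device name in task network config")
	}
	return nil
}
```

Under a check like this, test fixtures need a placeholder such as "ethx" to get past validation even though no real device is involved.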

tshan2001 · 2024-09-05T17:13:30Z

ecs-agent/tmds/handlers/fault/v1/handlers/handlers.go

@@ -416,11 +473,17 @@ func (h *FaultHandler) CheckNetworkPacketLoss() func(http.ResponseWriter, *http.

 		// Obtain the task metadata via the endpoint container ID
 		// TODO: Will be using the returned task metadata in a future PR
-		_, err = validateTaskMetadata(w, h.AgentState, requestType, r)
+		taskMetadata, err := validateTaskMetadata(w, h.AgentState, requestType, r)


Do we need this in all check methods? Since we're only calling tc q and checking whether "kind":"netem" exists, we don't need to know the network namespace path.

For awsvpc mode, all commands should run in the given task network namespace, so we need this check here as well.
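
As a rough illustration of the "kind":"netem" check discussed above, the sketch below parses captured tc output and looks for a netem qdisc. It assumes the output is a JSON array of qdisc objects that each carry a `kind` field; the `qdisc` type and the `hasNetem` helper are hypothetical names, and running the command inside the task network namespace is left out.

```go
package main

import "encoding/json"

// qdisc models only the field this sketch needs; other fields in the
// JSON output are ignored by the decoder.
type qdisc struct {
	Kind string `json:"kind"`
}

// hasNetem reports whether any qdisc in the JSON output has kind "netem".
// It returns an error if the input is not a JSON array of objects.
func hasNetem(tcJSON []byte) (bool, error) {
	var qdiscs []qdisc
	if err := json.Unmarshal(tcJSON, &qdiscs); err != nil {
		return false, err
	}
	for _, q := range qdiscs {
		if q.Kind == "netem" {
			return true, nil
		}
	}
	return false, nil
}
```

Decoding the output instead of searching for a substring avoids false positives if the text "netem" appears in some other field.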

xxx0624 force-pushed the fis-rwlock branch 6 times, most recently from af3afb4 to 5589a96 August 28, 2024 00:17

xxx0624 marked this pull request as ready for review August 29, 2024 18:34

xxx0624 requested a review from a team as a code owner August 29, 2024 18:34

xxx0624 requested review from amogh09, mye956 and tshan2001 August 29, 2024 18:35

xxx0624 added the bot/test label Aug 29, 2024

amazon-ecs-bot removed the bot/test label Aug 29, 2024

mye956 reviewed Aug 29, 2024

mye956 previously approved these changes Aug 29, 2024

amogh09 reviewed Sep 3, 2024

xxx0624 dismissed mye956’s stale review via fb15c09 September 3, 2024 19:13

xxx0624 force-pushed the fis-rwlock branch 3 times, most recently from 0f2cd58 to 02cb576 September 3, 2024 19:19

xxx0624 added the bot/test label Sep 3, 2024

amazon-ecs-bot removed the bot/test label Sep 3, 2024

mye956 previously approved these changes Sep 4, 2024

amogh09 reviewed Sep 5, 2024

xxx0624 added 3 commits September 5, 2024 16:59

Add read/write lock in the fault injection handler

66464f3

Remove ENI ID which is used for RWMutex

8c868c0

Add a comment

c58e734

xxx0624 dismissed mye956’s stale review via c58e734 September 5, 2024 17:00

xxx0624 force-pushed the fis-rwlock branch from 02cb576 to c58e734 September 5, 2024 17:00

tshan2001 reviewed Sep 5, 2024

amogh09 approved these changes Sep 5, 2024

xxx0624 added the bot/test label Sep 5, 2024

amazon-ecs-bot removed the bot/test label Sep 5, 2024

tshan2001 approved these changes Sep 5, 2024

mye956 approved these changes Sep 5, 2024

mye956 mentioned this pull request Sep 5, 2024

Update agentstate to include task network namespace and default interface name to populate task network config #4315

Merged

xxx0624 merged commit 9ddccf9 into aws:dev Sep 5, 2024
40 checks passed
