Adding start and stop black hole port fault implementation #4355

Merged
merged 1 commit into from
Sep 24, 2024

mye956

@mye956 mye956 commented Sep 19, 2024

Summary

This PR adds the start and stop implementations of the network black hole port fault injection to the ecs-agent directory. Both work by running iptables commands via os/exec.

Implementation details

This PR adds two new functions, startNetworkBlackholePort() and stopNetworkBlackHolePort(), to ecs-agent/tmds/handlers/fault/v1/handlers/handlers.go.

  • startNetworkBlackholePort(): Starts (injects) a new network black hole port fault for the traffic type, protocol, and port number passed in the request body. It is called from StartNetworkBlackholePort(). The general workflow is as follows:
    1. Checks via checkNetworkBlackHolePort() that no chain is already running for the specified protocol and port number
    2. Creates a new chain via iptables -N <chain> (the chain name takes the form "<trafficType>-<protocol>-<port>", for example egress-tcp-1234)
    3. Appends a new rule to the newly created chain via iptables -A <chain> -p <protocol> --dport <port> -j DROP
    4. Inserts a jump to the newly created chain into the built-in INPUT or OUTPUT chain
  • stopNetworkBlackHolePort(): Stops the network black hole port fault for the traffic type, protocol, and port number passed in the request body. It is called from StopNetworkBlackHolePort(). The general workflow is as follows:
    1. Checks via checkNetworkBlackHolePort() that a chain is running for the specified protocol and port number
    2. Clears all rules within the specific chain via iptables -F <chain>
    3. Removes the jump to the specific chain from the built-in INPUT or OUTPUT chain via iptables -D <INPUT/OUTPUT> -j <chain>
    4. Deletes the specific chain via iptables -X <chain>
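
The two workflows above can be sketched as plain command lists. This is a minimal illustration, not the code from the PR: the helper names (chainName, startCommands, stopCommands, inNetns) are hypothetical, the chain name format is inferred from the egress-tcp-1234 chain seen in the test logs, and the -I flag used to insert the jump is an assumption because the description does not show that command. The nsenter prefix mirrors the command strings in the AWSVPC logs.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// chainName joins the request fields into the custom chain name,
// e.g. chainName("egress", "tcp", 1234) gives "egress-tcp-1234".
func chainName(trafficType, protocol string, port int) string {
	return fmt.Sprintf("%s-%s-%d", trafficType, protocol, port)
}

// startCommands lists the iptables calls for starting the fault.
// builtin is the built-in chain (INPUT or OUTPUT) that receives the jump.
func startCommands(builtin, chain, protocol string, port int) [][]string {
	return [][]string{
		{"iptables", "-N", chain},
		{"iptables", "-A", chain, "-p", protocol, "--dport", strconv.Itoa(port), "-j", "DROP"},
		// Assumption: the PR text does not show this command; -I is one way to insert the jump.
		{"iptables", "-I", builtin, "-j", chain},
	}
}

// stopCommands lists the iptables calls for stopping the fault, in the
// order given in the description: flush, unlink from the built-in chain, delete.
func stopCommands(builtin, chain string) [][]string {
	return [][]string{
		{"iptables", "-F", chain},
		{"iptables", "-D", builtin, "-j", chain},
		{"iptables", "-X", chain},
	}
}

// inNetns prefixes a command with nsenter so it runs in the given network
// namespace. Treating an empty path as "host namespace" is an assumption.
func inNetns(netnsPath string, cmd []string) []string {
	if netnsPath == "" {
		return cmd
	}
	return append([]string{"nsenter", "--net=" + netnsPath}, cmd...)
}

func main() {
	chain := chainName("egress", "tcp", 1234)
	for _, cmd := range startCommands("OUTPUT", chain, "tcp", 1234) {
		fmt.Println(strings.Join(inNetns("/host/proc/25803/ns/net", cmd), " "))
	}
	for _, cmd := range stopCommands("OUTPUT", chain) {
		fmt.Println(strings.Join(inNetns("", cmd), " "))
	}
}
```

Keeping each command as a slice of arguments means it can be passed to os/exec without shell quoting.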

Similar to CheckNetworkBlackHolePort(), the StartNetworkBlackholePort() and StopNetworkBlackHolePort() handler functions perform the following checks before responding to the request:

  • If either startNetworkBlackholePort() or stopNetworkBlackHolePort() takes too long to finish, we respond with a 500 and a "request timed out" error message.
  • If any of the iptables commands in startNetworkBlackholePort() or stopNetworkBlackHolePort() fails, we respond with a 500 and the standard error output of the failing iptables command.

Testing

  • Added new unit test cases to generateStartBlackHolePortFaultTestCases and generateStopBlackHolePortFaultTestCases with mock exec expectation calls. Existing test cases now also set the correct mock exec expectation calls.
  • Renamed generateNetworkBlackHolePortTestCases to generateCommonNetworkBlackHolePortTestCases.
  • Refactored the existing tests in agent/handlers/task_server_setup_test.go so that they only test whether we can make successful requests to each of the fault injection TMDS endpoints. The deleted tests are already covered in ecs-agent/tmds/handlers/fault/v1/handlers/handlers_test.go.

Manual Testing:
Hooked up the fault injection handlers so that they register on TMDS server startup, then ran an AWSVPC task that calls all three black hole port (BHP) endpoints in order (start -> check status -> stop).

level=debug time=2024-09-20T00:56:51Z msg="Handling http request" method="PUT" from="169.254.172.2:42200"
level=info time=2024-09-20T00:56:51Z msg="Received new request for request type: start network-blackhole-port" request="{\"Protocol\":\"tcp\",\"TrafficType\":\"egress\",\"Port\":1234}" requestType="start network-blackhole-port" tmdsEndpointContainerID="f4645575-7c7f-49b9-b605-38854d1f1775"
level=info time=2024-09-20T00:56:51Z msg="[INFO] Black hole port fault is not running" netns="/host/proc/25803/ns/net" command="nsenter --net=/host/proc/25803/ns/net iptables -C egress-tcp-1234 -p tcp --dport 1234 -j DROP" output="iptables: Bad rule (does a matching rule exist in that chain?).\n" exitCode=1
level=info time=2024-09-20T00:56:51Z msg="[INFO] Attempting to start network black hole port fault" netns="/host/proc/25803/ns/net" chain="egress-tcp-1234"
level=info time=2024-09-20T00:56:51Z msg="Successfully started fault" requestType="start network-blackhole-port" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}" response="{\"Status\":\"running\"}"
level=debug time=2024-09-20T00:57:00Z msg="Storage stats not reported for container" module=utils_unix.go
level=debug time=2024-09-20T00:57:01Z msg="Handling http request" method="GET" from="169.254.172.2:59142"
level=info time=2024-09-20T00:57:01Z msg="Received new request for request type: check status network-blackhole-port" requestType="check status network-blackhole-port" tmdsEndpointContainerID="f4645575-7c7f-49b9-b605-38854d1f1775" request="{\"Protocol\":\"tcp\",\"TrafficType\":\"egress\",\"Port\":1234}"
level=debug time=2024-09-20T00:57:01Z msg="Successfully parsed fault request payload" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}"
level=info time=2024-09-20T00:57:01Z msg="[INFO] Black hole port fault has been found running" netns="/host/proc/25803/ns/net" command="nsenter --net=/host/proc/25803/ns/net iptables -C egress-tcp-1234 -p tcp --dport 1234 -j DROP" output=""
level=info time=2024-09-20T00:57:01Z msg="[INFO] Successfully checked status for fault" requestType="check status network-blackhole-port" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}" response="{\"Status\":\"running\"}"
level=debug time=2024-09-20T00:57:05Z msg="Received message of type: HeartbeatMessage"
level=debug time=2024-09-20T00:57:05Z msg="ACS activity occurred"
level=debug time=2024-09-20T00:57:05Z msg="Sending response to ACS" Name="heartbeat message responder" Response={
  MessageId: "fd8a0b80-f7e0-41a9-82bd-8d20450c03fa"
}
level=debug time=2024-09-20T00:58:01Z msg="Handling http request" method="DELETE" from="169.254.172.2:52668"
level=info time=2024-09-20T00:58:01Z msg="Received new request for request type: stop network-blackhole-port" request="{\"Protocol\":\"tcp\",\"TrafficType\":\"egress\",\"Port\":1234}" requestType="stop network-blackhole-port" tmdsEndpointContainerID="f4645575-7c7f-49b9-b605-38854d1f1775"
level=debug time=2024-09-20T00:58:01Z msg="Successfully parsed fault request payload" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}"
level=info time=2024-09-20T00:58:01Z msg="[INFO] Black hole port fault has been found running" netns="/host/proc/25803/ns/net" command="nsenter --net=/host/proc/25803/ns/net iptables -C egress-tcp-1234 -p tcp --dport 1234 -j DROP" output=""
level=info time=2024-09-20T00:58:01Z msg="[INFO] Attempting to stop network black hole port fault" netns="/host/proc/25803/ns/net" chain="egress-tcp-1234"
level=info time=2024-09-20T00:58:01Z msg="Successfully stopped fault" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}" response="{\"Status\":\"stopped\"}" requestType="stop network-blackhole-port"

Corresponding iptables output in task ENI/network namespace

[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo nsenter --net=/proc/25803/ns/net iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
egress-tcp-1234  all  --  0.0.0.0/0            0.0.0.0/0           

Chain egress-tcp-1234 (1 references)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:1234
[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:51678
DROP       all  -- !127.0.0.0/8          127.0.0.0/8          ! ctstate RELATED,ESTABLISHED,DNAT

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER-ISOLATION-STAGE-1  all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           
[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo nsenter --net=/proc/25803/ns/net iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:51678
DROP       all  -- !127.0.0.0/8          127.0.0.0/8          ! ctstate RELATED,ESTABLISHED,DNAT

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER-ISOLATION-STAGE-1  all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Same test but using Host mode task

level=debug time=2024-09-20T01:01:33Z msg="Handling http request" method="PUT" from="172.31.25.237:38452"
level=info time=2024-09-20T01:01:33Z msg="Received new request for request type: start network-blackhole-port" request="{\"Protocol\":\"tcp\",\"TrafficType\":\"egress\",\"Port\":1234}" requestType="start network-blackhole-port" tmdsEndpointContainerID="64ca83af-f51b-4e66-acff-d0e3c29c1afc"
level=info time=2024-09-20T01:01:33Z msg="[INFO] Black hole port fault is not running" netns="host" command="iptables -C egress-tcp-1234 -p tcp --dport 1234 -j DROP" output="iptables: Bad rule (does a matching rule exist in that chain?).\n" exitCode=1
level=info time=2024-09-20T01:01:33Z msg="[INFO] Attempting to start network black hole port fault" netns="host" chain="egress-tcp-1234"
level=info time=2024-09-20T01:01:33Z msg="Successfully started fault" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}" response="{\"Status\":\"running\"}" requestType="start network-blackhole-port"
level=debug time=2024-09-20T01:01:38Z msg="Handling http request" method="HEAD" from="127.0.0.1:59468"
level=debug time=2024-09-20T01:01:43Z msg="Handling http request" method="GET" from="172.31.25.237:54894"
level=info time=2024-09-20T01:01:43Z msg="Received new request for request type: check status network-blackhole-port" request="{\"Protocol\":\"tcp\",\"TrafficType\":\"egress\",\"Port\":1234}" requestType="check status network-blackhole-port" tmdsEndpointContainerID="64ca83af-f51b-4e66-acff-d0e3c29c1afc"
level=debug time=2024-09-20T01:01:43Z msg="Successfully parsed fault request payload" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}"
level=info time=2024-09-20T01:01:43Z msg="[INFO] Black hole port fault has been found running" netns="host" command="iptables -C egress-tcp-1234 -p tcp --dport 1234 -j DROP" output=""
level=info time=2024-09-20T01:01:43Z msg="[INFO] Successfully checked status for fault" requestType="check status network-blackhole-port" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}" response="{\"Status\":\"running\"}"
level=debug time=2024-09-20T01:01:53Z msg="Handling http request" method="DELETE" from="172.31.25.237:52062"
level=info time=2024-09-20T01:01:53Z msg="Received new request for request type: stop network-blackhole-port" tmdsEndpointContainerID="64ca83af-f51b-4e66-acff-d0e3c29c1afc" request="{\"Protocol\":\"tcp\",\"TrafficType\":\"egress\",\"Port\":1234}" requestType="stop network-blackhole-port"
level=debug time=2024-09-20T01:01:53Z msg="Successfully parsed fault request payload" request="{\"Port\":1234,\"Protocol\":\"tcp\",\"TrafficType\":\"egress\"}"
level=info time=2024-09-20T01:01:53Z msg="[INFO] Black hole port fault has been found running" netns="host" command="iptables -C egress-tcp-1234 -p tcp --dport 1234 -j DROP" output=""

Corresponding iptables on host network namespace

[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:51678
DROP       all  -- !127.0.0.0/8          127.0.0.0/8          ! ctstate RELATED,ESTABLISHED,DNAT

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER-ISOLATION-STAGE-1  all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           
[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:51678
DROP       all  -- !127.0.0.0/8          127.0.0.0/8          ! ctstate RELATED,ESTABLISHED,DNAT

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER-ISOLATION-STAGE-1  all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
egress-tcp-1234  all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain egress-tcp-1234 (1 references)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:1234
[ec2-user@ip-172-31-25-237 amazon-ecs-agent]$ sudo iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:51678
DROP       all  -- !127.0.0.0/8          127.0.0.0/8          ! ctstate RELATED,ESTABLISHED,DNAT

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  0.0.0.0/0            0.0.0.0/0           
DOCKER-ISOLATION-STAGE-1  all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  0.0.0.0/0            0.0.0.0/0           
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  0.0.0.0/0            0.0.0.0/0           

New tests cover the changes: yes

Description for the changelog

Feature: Adding start and stop network black hole port fault implementation

Additional Information

Does this PR include breaking model changes? If so, Have you added transformation functions?

Does this PR include the addition of new environment variables in the README?

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@mye956 mye956 force-pushed the bhp-startstop branch 4 times, most recently from 9fe4569 to da7e8d8 Compare September 20, 2024 00:13
@mye956 mye956 marked this pull request as ready for review September 20, 2024 00:14
@mye956 mye956 requested a review from a team as a code owner September 20, 2024 00:14
@mye956 mye956 changed the title from "[WIP]Adding start and stop black hole port fault implementation" to "Adding start and stop black hole port fault implementation" Sep 20, 2024
ecs-agent/tmds/handlers/fault/v1/handlers/handlers.go
cmdOutput, cmdErr := h.stopNetworkBlackHolePort(ctxWithTimeout, aws.StringValue(request.Protocol), port, chainName,
taskMetadata.TaskNetworkConfig.NetworkMode, taskMetadata.TaskNetworkConfig.NetworkNamespaces[0].Path, insertTable)

if err := ctx.Err(); err == context.DeadlineExceeded {
Contributor

How about err != DeadlineExceeded

Contributor Author

Hmm, I was hoping to check whether the context timeout has been reached here. All other errors will still be internal errors, but the error response will include the specific error.

@mye956 mye956 force-pushed the bhp-startstop branch 2 times, most recently from b2326b7 to e9709f0 Compare September 23, 2024 21:36
tshan2001 previously approved these changes Sep 23, 2024
@mye956 mye956 force-pushed the bhp-startstop branch 2 times, most recently from 5470641 to cff31c8 Compare September 24, 2024 16:43
@mye956 mye956 merged commit 08341f7 into aws:dev Sep 24, 2024
40 checks passed
4 participants