-
Notifications
You must be signed in to change notification settings - Fork 618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add check network packet loss implementation #4322
Conversation
f89a1d0
to
588af68
Compare
5b987b0
to
e521383
Compare
ce5220f
to
912072f
Compare
// runExecCommand wraps around the execwrapper, providing a convenient way of running any Linux command | ||
// and getting the result in both stdout and stderr. | ||
func (h *FaultHandler) runExecCommand(ctx context.Context, linuxCommandString string) ([]byte, error) { | ||
cmdExec := h.osExecWrapper.CommandContext(ctx, "/bin/sh", "-c", linuxCommandString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure if the /bin/sh
path exists within the agent container. Will need to double check this. Is it possible to make the call without this just like with the standard os/exec
go library? (e.g. exec.Command("ls -al")
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I just pushed a new revision that removes "/bin/sh" from the command. Also re-ran all the manual testing to make sure that it still works.
} | ||
// Log the command output to better help us debug. | ||
logger.Info(fmt.Sprintf("%s command result: %s", tcCheckInjectionCommandComposed, string(cmdOutput[:]))) | ||
var outputUnmarshalled []map[string]interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Instead of a parsing/unmarshaling the output into a map[string]interface{}
, I wonder if we can parse it into a struct and then get only the needed json keys from the output? (if it's not possible/output is not deterministic then please disregard).
Just ran tc -j q show dev eth0 parent 1:1
locally and it seems to output the following:
[{
"kind": "netem",
"handle": "10:",
"parent": "1:1",
"options": {
"limit": 1000,
"loss-random": {
"loss": 0.5,
"correlation": 0
},
"ecn": false,
"gap": 0
}
}]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tc -j q show dev eth0
does not always have all the fields. For example, when running on a host that doesn't have fault injected, it can have the following outout:
{"kind":"pfifo_fast","handle":"0:","parent":":1","options":{"bands":3,"priomap":[1,2,2,2,1,2,0,0,1,1,1,1,1,1,1,1],"multiqueue":false}}]
There's no loss-random field here. Also, when specifying parent 1:1
, the output can also be []
.
Unmarshalling it into a map[string]interface{}
and indexing the json struct iteratively would allow us to better catch the nil
when parsing it
// and getting the result in both stdout and stderr. | ||
func (h *FaultHandler) runExecCommand(ctx context.Context, linuxCommandString string) ([]byte, error) { | ||
commandArray := strings.Split(linuxCommandString, " ") | ||
cmdExec := h.osExecWrapper.CommandContext(ctx, commandArray[0], commandArray[1:]...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Should there be a cmdExec.Run()
call somewhere in this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmdExec.CombinedOutput()
has output signature ([]byte, error)
, which is the same as the output of this method. We return the result of cmdExec.CombinedOutput()
directly. CombinedOutput() is a generic method in os/exec. It runs the command and gives the stdout and stderr result at the same time (whereas Output() will only give stdout when exit code was 0, otherwise stdout will be nil)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see now
// The command above gives the output of "tc q show dev {INTERFACE} parent 1:1" in json format. | ||
// We will then unmarshall the json string and evaluate the fields of it. | ||
tcCheckInjectionCommandComposed := nsenterPrefix + fmt.Sprintf(tcCheckInjectionCommandString, interfaceName) | ||
fmt.Println(tcCheckInjectionCommandComposed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use logger instead.
fmt.Println(tcCheckInjectionCommandComposed) | ||
cmdOutput, err := h.runExecCommand(ctx, tcCheckInjectionCommandComposed) | ||
if err != nil { | ||
return false, errors.New("failed to check network-packet-loss-fault: " + string(cmdOutput[:]) + err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the cmdOutput be nil if the err is not empty?
ipSources := request.Sources | ||
// If task's network mode is awsvpc, we need to run nsenter to access the task's network namespace. | ||
nsenterPrefix := "" | ||
if networkMode == "awsvpc" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - we have const var for awsvpc
in ECS ie
func New(networkMode string, netNSs ...*NetworkNamespace) (*TaskNetworkConfig, error) { |
var responseBody types.NetworkFaultInjectionResponse | ||
var stringToBeLogged string | ||
var httpStatusCode int | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we fire a metric here if the err is not empty? If this happens, it means there is something wrong in our DP and not customer/client issue.
logger.Info(fmt.Sprintf("%s command result: %s", tcCheckIPCommandComposed, string(cmdOutput[:]))) | ||
allIPAddressesInRequestExist := true | ||
for _, ipAddress := range ipSources { | ||
ipAddressInHex, err := convertIPAddressToHex(*ipAddress) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use aws.StringValue()
instead? for the *ipAddress
invalidNetworkMode = "invalid" | ||
tcLatencyFaultExistsCommandOutput = `[{"kind":"netem","handle":"10:","parent":"1:1","options":{"limit":1000,"delay":{"delay":0.1,"jitter":0,"correlation":0},"ecn":false,"gap":0}}]` | ||
tcLossFaultExistsCommandOutput = `[{"kind":"netem","handle":"10:","dev":"eth0","parent":"1:1","options":{"limit":1000,"loss-random":{"loss":0.06,"correlation":0},"ecn":false,"gap":0}}]` | ||
tcLossFaultDoesNotExistCommandOutput = `[{"kind":"dummyname"}]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably one more test case without kind
Output() ([]byte, error) | ||
CombinedOutput() ([]byte, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add descriptions for these (and existing) methods to let users know what to expect.
ipSources := request.Sources | ||
// If task's network mode is awsvpc, we need to run nsenter to access the task's network namespace. | ||
nsenterPrefix := "" | ||
if networkMode == "awsvpc" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using an existing constant?
amazon-ecs-agent/ecs-agent/api/ecs/model/ecs/api.go
Lines 25843 to 25844 in 78a2bf0
// NetworkModeAwsvpc is a NetworkMode enum value | |
NetworkModeAwsvpc = "awsvpc" |
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) | ||
defer cancel() | ||
|
||
interfaceName := taskMetadata.TaskNetworkConfig.NetworkNamespaces[0].NetworkInterfaces[0].DeviceName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably out of scope for this PR but will Agent always populate device name in TaskResponse
returned by GetTaskMetadata
method? I think we would want to restrict that to fault injection use case only since computing device name is costly, at least for host network mode tasks.
// The command above gives the output of "tc q show dev {INTERFACE} parent 1:1" in json format. | ||
// We will then unmarshall the json string and evaluate the fields of it. | ||
tcCheckInjectionCommandComposed := nsenterPrefix + fmt.Sprintf(tcCheckInjectionCommandString, interfaceName) | ||
fmt.Println(tcCheckInjectionCommandComposed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably don't want this
var stringToBeLogged string | ||
var httpStatusCode int | ||
if err != nil { | ||
responseBody = types.NewNetworkFaultInjectionErrorResponse(err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err
contains the underlying error returned by the Linux commands. Should we return a more deterministic error to the caller instead?
if lossRandom := options.(map[string]interface{})["loss-random"]; lossRandom != nil { | ||
if loss := lossRandom.(map[string]interface{})["loss"]; loss != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO Since the output is coming from a command I think we should be a bit more defensive in our parsing. Currently if any of these type assertions fail then Agent will panic. An example output is [{"kind":"netem","options":{"loss-random":5}}]
.
How about checking the type assertion success using something like below?
if lossRandom, ok := options.(map[string]interface{}); ok {}
Summary
Adding implementation for
CheckNetworkPacketLoss()
and corresponding unit tests. Also updating theos/exec
wrapper to include 2 more helper methods to facilitate mocks in unit testing.Implementation details
After the TMDS server receives the request to check for network packet loss, the following Linux command will be executed:
The output format will be a byte array of a json string. The output will then be unmarshalled and we will check whether the following exists:
Testing
Unit tests for the TMDS package was run.
New tests cover the changes:
2 unhappy test cases were added:
Manual Testing
Launched a Fargate Instance with the changes. Launched a task with ecs-exec enabled.
Description for the changelog
Add check network packet loss implementation
Additional Information
Does this PR include breaking model changes? If so, Have you added transformation functions?
Does this PR include the addition of new environment variables in the README?
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.