Collect volume metrics and volume clean up for EBS-backed tasks #3929

Merged: 1 commit merged into aws:dev on Oct 9, 2023

mye956 (Contributor) commented Sep 26, 2023

Summary

This PR adds volume metrics collection for EBS-backed tasks via the CSI driver, as well as volume clean up for EBS-backed tasks.

Implementation details

Changes made to collect Volume metrics in EBS-backed Tasks

Implemented getEBSVolumeMetrics() and fetchEBSVolumeMetrics(), which collect volume metrics for each EBS-backed task by calling NodeGetVolumeStats on the CSI driver. getEBSVolumeMetrics() is invoked from GetInstanceMetrics(), alongside the collection of the other task and instance metrics.

IsEBSTaskAttachEnabled() has also been implemented; it scans a task's list of volumes for any EBS volume configuration. Although it is called inside a check in AddTask(), the result is temporarily always false for all tasks (i.e. every task is treated as not EBS-backed) so that the CSI driver functionality is not invoked. This will be addressed in another PR.
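The volume scan and the temporary "always false" behavior can be pictured with the following minimal Go sketch. Everything in it is an assumption made for illustration: the `task` and `taskVolume` types, the `"ebs"` type tag, and the `gateOpen` parameter are hypothetical and do not reflect the agent's actual types or the way the PR disables the check.

```go
package main

import "fmt"

// ebsVolumeType is a hypothetical type tag marking an EBS volume configuration.
const ebsVolumeType = "ebs"

// taskVolume and task are simplified, hypothetical stand-ins for the agent's types.
type taskVolume struct {
	Name string
	Type string
}

type task struct {
	Volumes []taskVolume
}

// hasEBSVolume scans the task's volume list and reports whether any entry is
// an EBS volume configuration.
func (t *task) hasEBSVolume() bool {
	if t == nil {
		return false
	}
	for _, v := range t.Volumes {
		if v.Type == ebsVolumeType {
			return true
		}
	}
	return false
}

// isEBSTaskAttachEnabled combines the scan with a gate. While gateOpen is
// false, every task is treated as not EBS-backed, so callers never reach the
// CSI driver code path.
func isEBSTaskAttachEnabled(t *task, gateOpen bool) bool {
	return gateOpen && t.hasEBSVolume()
}

func main() {
	t := &task{Volumes: []taskVolume{
		{Name: "scratch", Type: "docker"},
		{Name: "data", Type: ebsVolumeType},
	}}
	fmt.Println("has EBS volume:", t.hasEBSVolume())
	fmt.Println("attach enabled (gate closed):", isEBSTaskAttachEnabled(t, false))
	fmt.Println("attach enabled (gate open):", isEBSTaskAttachEnabled(t, true))
}
```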

List of files changed:

  • agent/api/task/task.go: Added functionality to IsEBSTaskAttachEnabled()
  • agent/engine/docker_task_engine.go: Temporarily assumes that no task is EBS-backed; this will be fixed in a future PR
  • agent/stats/engine.go
  • agent/stats/engine_test.go
  • agent/stats/engine_unix.go: Calls the CSI driver to run NodeGetVolumeStats on every EBS volume attached to an EBS-backed task
  • agent/stats/engine_unix_test.go
  • agent/stats/engine_windows.go
  • ecs-agent/csiclient/csi_client.go

Changes made to clean up EBS-backed tasks

Implemented UnstageVolumes, which unmounts the host mount point of every EBS volume attached to an EBS-backed task. This happens during task clean up.

List of files changed:

  • agent/engine/task_manager.go

Testing

Added a new unit test:

TestFetchEBSVolumeMetrics: Tests that the new fetchEBSVolumeMetrics function works as intended. TODO: add an unhappy-path test case as well.

TODO: Add a unit test for NodeUnstage in a follow-up PR.

Manual testing:
We manually mounted and staged an EBS volume on a test EC2 instance with the agent running, then invoked getEBSVolumeMetrics.
Agent logs to get volume metrics

level=debug time=2023-09-26T04:14:54Z msg="Fetching EBS volume metrics..."
level=debug time=2023-09-26T04:14:54Z msg="Is an ebs volume configuration"
level=debug time=2023-09-26T04:14:54Z msg="Found volume usage" UsedBytes=108707840 TotalBytes=10726932480
level=debug time=2023-09-26T04:14:54Z msg="Ignore inodes key"
level=debug time=2023-09-26T04:14:54Z msg="EBS TACS Metrics collected! UsedBytes: 1.0870784e+08 TotalBytes: 1.072693248e+10"
level=debug time=2023-09-26T04:14:54Z msg="sent telemetry message" module=engine.go

CSI driver container NodeGetVolumeStats logs

I1009 18:49:17.146637       1 node.go:303] "NodeGetVolumeStats: called" args={"volume_id":"vol-0f00d7ebcc1c0993f","volume_path":"/mnt/ecs/ebs/mocktaskID_vol-0f00d7ebcc1c0993f"}
I1009 18:49:37.145296       1 node.go:303] "NodeGetVolumeStats: called" args={"volume_id":"vol-0f00d7ebcc1c0993f","volume_path":"/mnt/ecs/ebs/mocktaskID_vol-0f00d7ebcc1c0993f"}

Agent logs to unstage volume

level=debug time=2023-10-09T18:17:16Z msg="No more tasks could be started at this moment, waiting"
level=debug time=2023-10-09T18:17:16Z msg="Successfully unstaged volume" Task="web-app:2 arn:aws:ecs:us-west-2:113424923516:task/evm-test-gamma/20b60545e5214c43818e73aa8f95550a, TaskStatus: (STOPPED->STOPPED) N Containers: 1, N ENIs 0"

CSI driver container NodeUnstageVolume logs

I1009 18:49:39.828644       1 node.go:253] "NodeUnstageVolume: called" args={"volume_id":"vol-0f00d7ebcc1c0993f","staging_target_path":"/mnt/ecs/ebs/mocktaskID_vol-0f00d7ebcc1c0993f"}
I1009 18:49:39.828876       1 node.go:293] "NodeUnstageVolume: unmounting" target="/mnt/ecs/ebs/mocktaskID_vol-0f00d7ebcc1c0993f"
I1009 18:49:39.828894       1 mount_helper_common.go:93] unmounting "/mnt/ecs/ebs/mocktaskID_vol-0f00d7ebcc1c0993f" (corruptedMount: false, mounterCanSkipMountPointChecks: true)
I1009 18:49:39.828908       1 mount_linux.go:360] Unmounting /mnt/ecs/ebs/mocktaskID_vol-0f00d7ebcc1c0993f
I1009 18:49:39.855136       1 mount_helper_common.go:150] Warning: deleting path "/mnt/ecs/ebs/mocktaskID_vol-0f00d7ebcc1c0993f"
I1009 18:49:39.855207       1 node.go:298] "NodeUnStageVolume: successfully unstaged volume" volumeID="vol-0f00d7ebcc1c0993f" target="/mnt/ecs/ebs/mocktaskID_vol-0f00d7ebcc1c0993f"
I1009 18:49:39.855226       1 node.go:268] "NodeUnStageVolume: volume operation finished" volumeID="vol-0f00d7ebcc1c0993f"

New tests cover the changes: Yes

Description for the changelog

  • Collect volume metrics for EBS-backed tasks and clean up EBS-backed tasks

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

mye956 force-pushed the ebstacs branch 15 times, most recently from 0152e0d to 071d151, on September 27, 2023 21:32
mye956 marked this pull request as ready for review on September 27, 2023 21:32
mye956 requested a review from a team as a code owner on September 27, 2023 21:32
mye956 changed the title from "[WIP] Collect volume metrics for EBS-backed tasks" to "Collect volume metrics for EBS-backed tasks" on Sep 28, 2023
xxx0624 previously approved these changes Sep 29, 2023
fierlion previously approved these changes Oct 2, 2023
mye956 force-pushed the ebstacs branch 2 times, most recently from 896cb0f to 21af0e0, on October 4, 2023 23:10
mye956 changed the title from "Collect volume metrics for EBS-backed tasks" to "Collect volume metrics and volume clean up for EBS-backed tasks" on Oct 4, 2023
amogh09 (Contributor) commented Oct 6, 2023

Can we please fill out the "Testing" section in the PR description?

mye956 force-pushed the ebstacs branch 3 times, most recently from b959f00 to 194e2f0, on October 7, 2023 00:45
mye956 added the bot/test label on Oct 7, 2023
mye956 added the bot/test label on Oct 9, 2023
mye956 merged commit f70a1a1 into aws:dev on Oct 9, 2023
7 checks passed
5 participants