DAOS-14654 test: simplify ior_per_rank.py #13346
LGTM. No errors found by checkpatch.
Test stage Build on Leap 15.4 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13346/1/execution/node/384/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13346/1/execution/node/340/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13346/1/execution/node/335/log
Test stage Build RPM on Leap 15.4 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13346/1/execution/node/332/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13346/1/execution/node/408/log
Test-tag: test_ior_per_rank
Skip-unit-tests: true
Skip-fault-injection-test: true

- Only run with transfer size 1M
- Reduce stonewall to 15s

Required-githooks: true
Signed-off-by: Dalton Bohning <[email protected]>
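The two changes above shrink each per-rank run. As a rough illustration only, assuming the settings end up as IOR's standard `-t` (transfer size) and `-D` (stonewall deadline, in seconds) options, a single run would look something like the command assembled below. `build_ior_args`, the `-b` block size, and the `DFS` API choice are hypothetical placeholders; the real test builds its command through `IorCommand` from its YAML config.

```python
from typing import List


def build_ior_args(transfer_size: str = "1M",
                   block_size: str = "16G",
                   stonewall_seconds: int = 15,
                   api: str = "DFS") -> List[str]:
    """Assemble a hypothetical IOR write+read command line.

    block_size and api are illustrative placeholders, not values taken from the test.
    """
    return [
        "ior",
        "-a", api,                     # I/O backend (assumed DFS here)
        "-w", "-r",                    # run a write phase, then a read phase
        "-t", transfer_size,           # single transfer size (previously 1M and 256B)
        "-b", block_size,              # per-process block size (placeholder)
        "-D", str(stonewall_seconds),  # stonewall deadline in seconds
    ]


print(" ".join(build_ior_args()))
```

Dropping the 256B case halves the number of IOR runs per rank, and the shorter stonewall caps how long each remaining phase can take.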
Branch force-pushed from 9039c46 to 5a96c4b.
LGTM. No errors found by checkpatch.
Test-tag: test_ior_per_rank Skip-unit-tests: true Skip-fault-injection-test: true Required-githooks: true Signed-off-by: Dalton Bohning <[email protected]>
LGTM. No errors found by checkpatch.
```diff
-transfer_sizes:
-  - 1M
-  - 256B
+transfer_size: 1M
```
Please update the wiki page: https://daosio.atlassian.net/wiki/spaces/DAOS/pages/11136040981/Running+Rack+Group+Level+Tests
It still references 256B.
Will do. I'll update the wiki after this merges.
```python
good_node = self.server_managers[0].get_host(rank)
if ((good_node not in self.good_nodes)
        and (good_node not in self.failed_nodes)):
    self.good_nodes.append(good_node)
```
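The snippet above records a host as good only if it is not already listed as good or failed. Because one host can serve several ranks, one hedged way to make the outcome independent of rank order is to collect the failed hosts first and then derive the good ones. The sketch below assumes a hypothetical `rank_to_host` mapping and a set of failed ranks; it is an illustration, not the test's actual code.

```python
from typing import Dict, List, Set, Tuple


def classify_hosts(rank_to_host: Dict[int, str],
                   failed_ranks: Set[int]) -> Tuple[List[str], List[str]]:
    """Split hosts into (good, failed); a host with any failed rank counts as failed."""
    # Pass 1: any host with at least one failed rank is failed.
    failed_nodes: List[str] = []
    for rank in sorted(rank_to_host):
        host = rank_to_host[rank]
        if rank in failed_ranks and host not in failed_nodes:
            failed_nodes.append(host)

    # Pass 2: same dedupe rule as the snippet above, applied after failures are known.
    good_nodes: List[str] = []
    for rank in sorted(rank_to_host):
        host = rank_to_host[rank]
        if host not in good_nodes and host not in failed_nodes:
            good_nodes.append(host)
    return good_nodes, failed_nodes


good, failed = classify_hosts({0: "host-a", 1: "host-a", 2: "host-b", 3: "host-c"}, {1})
print(good, failed)
```

Here `host-a` ends up failed even though its rank 0 passed, because rank 1 on the same host did not.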
Can we print the failed nodes' performance? The output lists the failed nodes but not their performance, so someone currently has to go through job.log to find the write/read numbers.
Yeah, I might as well while I'm here. Will push an update.
Added better reporting, but I'll need to verify that it works.
@@ -33,52 +33,46 @@ def execute_ior_per_rank(self, rank):

```python
# create the pool on specified rank.
self.add_pool(connect=False, target_list=[rank])
```
I noticed that creating the container outside of IOR doesn't cause pool destroy failures. I think I tried something like this:

```python
self.add_pool(connect=False, target_list=[rank])
self.add_container(self.pool)
...
```

For the IOR commands, create_cont is always set to False:

```python
try:
    self.ior_cmd.transfer_size.update(transfer_size)
    self.ior_cmd.flags.update(self.write_flags)
    dfs_out = self.run_ior_with_pool(create_cont=False, fail_on_warning=self.log.info)
    dfs_perf_write = IorCommand.get_ior_metrics(dfs_out)
    self.ior_cmd.flags.update(self.read_flags)
    dfs_out = self.run_ior_with_pool(create_cont=False, fail_on_warning=self.log.info)
    # ... (rest of the try block elided)
```
Done this way, I didn't see pool_destroy problems at large scale. It looks like letting IOR create a container for each rank, then destroying those containers and the pool, causes issues in large-scale testing.
Ah! This seems to be another issue with pydaos handling. If we let `run_ior_with_pool` create the container, it calls `pool.connect()`. I'll have the test create the container here, which I think is a better workflow anyway. Thanks for finding that!
Test-tag: test_ior_per_rank Test-repeat: 2 Skip-unit-tests: true Skip-fault-injection-test: true Required-githooks: true Signed-off-by: Dalton Bohning <[email protected]>
LGTM. No errors found by checkpatch.
Required-githooks: true
Test-tag: test_ior_per_rank Test-repeat: 2 Skip-unit-tests: true Skip-fault-injection-test: true Required-githooks: true Signed-off-by: Dalton Bohning <[email protected]>
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13346/5/testReport/
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13346/5/testReport/
@rpadma2 This is an example of what failures look like now. It gives the host, rank, bandwidth, and percent difference.
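The new failure output reports the host, rank, bandwidth, and percent difference. A minimal sketch of that kind of report follows, assuming the percent difference is measured against the median bandwidth across all ranks and that anything more than a configurable percentage below it counts as a failure. `find_slow_ranks`, the median reference, the MiB/s unit, and the 10% default are all hypothetical; the test may compute its reference and threshold differently.

```python
from statistics import median
from typing import Dict, List, Tuple


def find_slow_ranks(results: Dict[int, Tuple[str, float]],
                    max_percent_below: float = 10.0) -> List[str]:
    """Return one report line per rank whose bandwidth is too far below the median.

    results maps rank -> (host, bandwidth in MiB/s).
    """
    reference = median(bw for _, bw in results.values())
    lines: List[str] = []
    for rank in sorted(results):
        host, bandwidth = results[rank]
        # Signed percent difference from the reference; negative means slower.
        percent_diff = (bandwidth - reference) / reference * 100.0
        if percent_diff < -max_percent_below:
            lines.append(f"{host} rank {rank}: {bandwidth:.2f} MiB/s ({percent_diff:+.1f}%)")
    return lines


report = find_slow_ranks({
    0: ("host-a", 1000.0),
    1: ("host-a", 980.0),
    2: ("host-b", 700.0),
    3: ("host-c", 1010.0),
})
for line in report:
    print(line)
```

With a median of 990 MiB/s, only rank 2 falls more than 10% below the reference, so it is the single line reported. Using the median rather than the mean keeps one very slow rank from dragging the reference down.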
Test-tag: test_ior_per_rank Test-repeat: 2 Skip-unit-tests: true Skip-fault-injection-test: true Required-githooks: true Signed-off-by: Dalton Bohning <[email protected]>
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13346/7/testReport/
Tests passed, except for the MD on SSD stage, which doesn't have any coverage outside of PRs.
@rpadma2 Do the current changes look okay to you?
Test-tag: test_ior_per_rank Test-repeat: 2 Skip-unit-tests: true Skip-fault-injection-test: true Required-githooks: true
Test-tag: test_ior_per_rank Test-repeat: 2 Skip-unit-tests: true Skip-fault-injection-test: true Signed-off-by: Dalton Bohning <[email protected]>
Test-tag: test_ior_per_rank Test-repeat: 2 Skip-unit-tests: true Skip-fault-injection-test: true Required-githooks: true Signed-off-by: Dalton Bohning <[email protected]>
Required-githooks: true
Test-tag: test_ior_per_rank Test-repeat: 2 Skip-unit-tests: true Skip-fault-injection-test: true Required-githooks: true Signed-off-by: Dalton Bohning <[email protected]>
- Only run with transfer size 1M
- Reduce stonewall to 15s

Required-githooks: true
Signed-off-by: Dalton Bohning <[email protected]>

- Only run with transfer size 1M
- Reduce stonewall to 15s

Signed-off-by: Dalton Bohning <[email protected]>
Test-tag: test_ior_per_rank
Skip-unit-tests: true
Skip-fault-injection-test: true
Required-githooks: true
Before requesting gatekeeper:
- Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.

Gatekeeper: