DAOS-12950 test: auto-determine test tags #13315

Closed · wants to merge 8 commits
Changes from 4 commits
11 changes: 11 additions & 0 deletions .github/workflows/linting.yml
@@ -40,3 +40,14 @@ jobs:
          ref: ${{ github.event.pull_request.head.sha }}
      - name: Check DAOS logging macro use.
        run: ./utils/cq/d_logging_check.py --github src

  ftest-tags:
    name: Ftest tag check
    runs-on: ubuntu-22.04
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
Contributor:

Normal behaviour is to check out a merge commit with the target branch; this code will check out the branch itself.

The difference shows up when annotating against line numbers: with a merge commit they might be wrong, although of course with a branch checkout you're not testing what would be landed. Unless you're annotating the PR against specific lines in files you probably don't want this; if you're just failing the build with logging, or annotating against files (which I think is done against "line 0"), then you don't want to specify a ref here.

Contributor Author:

Thanks. The ftest lint just fails with logging + exit(1). No specific lines or files are annotated. So I think I want the default behavior.
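As a rough sketch of that failure mode (not the actual tags.py code, just the pattern being described): the lint step prints its findings and exits non-zero, so no checkout ref is needed for per-line annotations.

import sys


def fail_lint(errors):
    """Sketch only: log each problem and exit(1); no per-line GitHub annotations."""
    for err in errors:
        print(f'ERROR: {err}')
    if errors:
        sys.exit(1)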

      - name: Check DAOS ftest tags.
        run: ./src/tests/ftest/tags.py lint
Contributor:

This needs to be protected against the case where this check has landed on the target branch but a PR branch doesn't include it yet. I can send you code for this.

Contributor Author:

I don't follow. Do GHA for PRs use the master workflows?

Contributor Author:

Added a check

2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -326,7 +326,7 @@ pipeline {
stage('Get Commit Message') {
steps {
script {
env.COMMIT_MESSAGE = sh(script: 'git show -s --format=%B',
env.COMMIT_MESSAGE = sh(script: 'ci/get_commit_message.py --target origin/' + target_branch,
Contributor Author:

The reason for doing it this way is to avoid needing to modify pipeline-lib.

returnStdout: true).trim()
Map pragmas = [:]
// can't use eachLine() here: https://issues.jenkins.io/browse/JENKINS-46988/
127 changes: 127 additions & 0 deletions ci/get_commit_message.py
Contributor:

I like the idea of an API to query the test tags from something, be that a commit message, a string, a target sha, etc. I think that's what this file is, but if that's the case then it needs to be better documented and probably a clearer interface. I'm also sceptical if re is required or just split(':') would suffice.

Contributor Author (@daltonbohning, Nov 20, 2023):

> I'm also sceptical if re is required or just split(':') would suffice.

To rely on just split(':') we would need logic to loop over each line and match lines starting with Test-tag: and Features: (case-insensitively), which basically means writing a simplified version of re ourselves. I don't think that's the way to go.

Contributor:

There's a function to do it in https://github.com/daos-stack/daos/pull/8483/files but it's really very simple.

Contributor Author:

For reference, this is with re:

import re

features = re.findall(r'Features: (.*)', message, flags=re.MULTILINE | re.IGNORECASE)
test_tags = re.findall(r'Test-tag: (.*)', message, flags=re.MULTILINE | re.IGNORECASE)

And this is manually:

features = []
test_tags = []
for line in message.lower().splitlines():
    if line.startswith('features:'):
        features.append(line.split(':', maxsplit=1)[1])
    if line.startswith('test-tag:'):
        test_tags.append(line.split(':', maxsplit=1)[1])

Personally, I think re is more readable and more extensible.

@@ -0,0 +1,127 @@
#!/usr/bin/env python3
Contributor Author:

TODO: add copyright

"""Get the latest commit message with some modifications."""

import importlib.util
import os
import re
import subprocess # nosec
from argparse import ArgumentParser

PARENT_DIR = os.path.dirname(__file__)


# Dynamically load the tags
tags_path = os.path.realpath(
    os.path.join(PARENT_DIR, '..', 'src', 'tests', 'ftest', 'tags.py'))
tags_spec = importlib.util.spec_from_file_location('tags', tags_path)
tags = importlib.util.module_from_spec(tags_spec)
tags_spec.loader.exec_module(tags)
Comment on lines +13 to +18
Contributor Author:

An alternative is to use sys.path.
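A minimal sketch of that alternative (paths are assumptions based on the layout above, not code from this PR):

import os
import sys

# Make src/tests/ftest importable, then import tags the normal way.
FTEST_DIR = os.path.realpath(
    os.path.join(os.path.dirname(__file__), '..', 'src', 'tests', 'ftest'))
sys.path.insert(0, FTEST_DIR)

import tags  # noqa: E402 - import after sys.path manipulation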



def git_commit_message():
    """Get the latest git commit message.

    Returns:
        str: the commit message
    """
    result = subprocess.run(
        ['git', 'show', '-s', '--format=%B'],
        stdout=subprocess.PIPE, check=True, cwd=PARENT_DIR)
    return result.stdout.decode().rstrip('\n')


def git_root_dir():
    """Get the git root directory.

    Returns:
        str: the root directory path
    """
    result = subprocess.run(
        ['git', 'rev-parse', '--show-toplevel'],
        stdout=subprocess.PIPE, check=True, cwd=PARENT_DIR)
    return result.stdout.decode().rstrip('\n')


def git_files_changed(target):
    """Get a list of files from git diff.

    Args:
        target (str): target branch or commit.

    Returns:
        list: absolute paths of modified files
    """
    git_root = git_root_dir()
    result = subprocess.run(
        ['git', 'diff', target, '--name-only', '--relative'],
        stdout=subprocess.PIPE, cwd=git_root, check=True)
    return [os.path.join(git_root, path) for path in result.stdout.decode().split('\n') if path]
Comment on lines +54 to +58
Contributor Author:

This root dir logic is so you could technically call this from anywhere in the repo.

Comment on lines +54 to +58
Contributor Author:

This isn't correct in the case where the branch isn't up-to-date with the target. It needs logic similar to find_base.sh.
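One possible direction, sketched against plain git (the helper and its wiring are assumptions, not code from this PR): diff against the merge base instead of the target tip, similar to what find_base.sh does.

import subprocess  # nosec


def git_merge_base(target, cwd='.'):
    """Assumed helper: return the merge base of HEAD and the target branch."""
    result = subprocess.run(
        ['git', 'merge-base', 'HEAD', target],
        stdout=subprocess.PIPE, check=True, cwd=cwd)
    return result.stdout.decode().strip()


# e.g. git_files_changed(git_merge_base('origin/master')) instead of git_files_changed(target)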



def modify_commit_message_pragmas(commit_message, target):
    """Modify the commit message pragmas.

    TODO: if a commit already has Test-tag, do not overwrite. Just comment the suggested.

    Args:
        commit_message (str): the original commit message
        target (str): where the current branch is intended to be merged

    Returns:
        str: the modified commit message
    """
    modified_files = git_files_changed(target)

    rec_tags = tags.files_to_tags(modified_files)
    commit_tags = set()

    # Extract all "Features" and "Test-tag"
Contributor Author:

pipeline-lib doesn't support both Features and Test-tag, so this combines them into one.

    feature_tags = re.findall(
        r'^Features:(.*)$', commit_message, flags=re.MULTILINE | re.IGNORECASE)
    if feature_tags:
        for _tags in feature_tags:
            commit_tags.update(filter(None, _tags.split(' ')))
        commit_message = re.sub(
            r'^Features:.*$', '', commit_message, flags=re.MULTILINE | re.IGNORECASE)
    test_tags = re.findall(
        r'^Test-tag:(.*)$', commit_message, flags=re.MULTILINE | re.IGNORECASE)
    if test_tags:
        for _tags in test_tags:
            commit_tags.update(filter(None, _tags.split(' ')))
        commit_message = re.sub(
            r'^Test-tag:.*$', '', commit_message, flags=re.MULTILINE | re.IGNORECASE)

    # Put "Test-tag" after the title
Contributor:

I think common use (even outside of our team) is that pragmas (e.g. GH's Fixes:) go at the end of a commit message, before the SoB.

Contributor Author:

Is that a requirement? It's much easier to insert after the title because we don't have to look for a SoB. And personally, I prefer pragmas up front in my own commit messages so they are clearly visible. Though this modified commit message would only be in Jenkins, not the actual commit.

Contributor:

I've always put them at the end. Put the human-readable bit at the beginning where it can be seen and the machine-readable bit at the end out of the way.

Like Dalton says, however, this doesn't appear to matter, but it does highlight that the way this code communicates with ftest is something we might be able to improve.
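For reference, a sketch of the end-of-message placement being discussed (hypothetical helper, not part of this PR): scan for the first Signed-off-by trailer and insert the pragma just before it, appending if there is none.

def insert_before_signoff(lines, pragma):
    """Hypothetical helper: place a pragma line before the first Signed-off-by trailer."""
    for idx, line in enumerate(lines):
        if line.lower().startswith('signed-off-by:'):
            return lines[:idx] + [pragma, ''] + lines[idx:]
    # No trailer found: append at the end instead.
    return lines + ['', pragma]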

    commit_message_split = commit_message.splitlines()
    commit_message_split.insert(1, '')
    if test_tags:
        # Keep the commit tags but leave a comment with the recommended
        commit_message_split.insert(2, f'Test-tag: {" ".join(sorted(commit_tags))}')
        commit_message_split.insert(3, '# Auto-recommended Test-tag')
        commit_message_split.insert(4, f'# Test-tag: {" ".join(sorted(rec_tags))}')
        commit_message_split.insert(5, '')
    else:
        # If there were "Feature" tags, merge into the recommended.
        # Otherwise, just use the recommended.
        rec_tags.update(commit_tags)
        commit_message_split.insert(2, '# Auto-recommended Test-tag')
        commit_message_split.insert(3, f'Test-tag: {" ".join(sorted(rec_tags))}')
        commit_message_split.insert(4, '')
Comment on lines +97 to +109
Contributor Author:

Still need to shake out the details, but the idea is:

  1. Features and Test-tag are not both supported. If the commit has both, merge them as a convenience. I've seen people try to specify both.
  2. If the commit has Test-tag, just leave a comment with the recommended tags. This allows people to override for whatever reason, e.g. when iterating on a broken test and not wanting to run everything for every commit.

The end result is that the modified commit message never has Features, only Test-tag.
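As a purely illustrative example (title and tags invented, not taken from this PR), a message carrying both Features: dfuse and Test-tag: test_nvme_object would come out of the Test-tag branch above with the two sets merged onto one line and the recommendation left as a comment, roughly:

DAOS-12950 test: example title

Test-tag: dfuse test_nvme_object
# Auto-recommended Test-tag
# Test-tag: daily_regression dfuse
...rest of the original message...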

Contributor:

When you say "supported", I assume this is all our code; is there some logic that we own in launch.py that extracts this and passes it to the avocado command line?

I'm not saying this is wrong or requesting changes, but just trying to understand the requirements and limitations.

Contributor Author:

No. pipeline-lib extracts the commit pragmas and converts Features and Test-tag to a single line that gets passed to launch.py.

Contributor:

Well, pipeline-lib is also within our control, but certainly harder to modify than here.

Contributor:

Why is it harder to modify? It even tests the modifications being made with all currently supported branches to try to prevent unexpected regressions.

Contributor:

> It's a separate repo, so I'd have to first update pipeline lib and verify it's working

Yes. Always have to verify changes being made work as intended. But this isn't really any different than if it were on the same project.

> and backwards compatible with other branches.

pipeline-lib's CI does that for you. It tests pipeline-lib against daos' master, release/2.4, weekly-testing, and weekly-2.4-testing branches. We might want to think about adding the new daily* branches in there, but just those 4 branches is pretty good coverage. You can even define a Test-tag: that you want to be used during all of that testing.

> And I would still need a daos PR to verify expected behavior of passing both pragmas.

Again, that sounds like it's just a normal process of testing/validating one's changes, regardless of which repo they are in.

Anyway, just hoping to clear up any misconceptions on the perceived difficulty of updating pipeline-lib.

Contributor Author:

> Yes. Always have to verify changes being made work as intended. But this isn't really any different than if it were on the same project.

  1. By using just the daos repo, I can run and debug some code locally to verify it's working. Testing pipeline-lib code locally is virtually impossible.
  2. I would need TWO PRs - one in pipeline-lib, and then one in daos that uses the pipeline-lib one. And every time I update the pipeline-lib PR, I would have to retrigger my daos PR.

> And I would still need a daos PR to verify expected behavior of passing both pragmas.

> Again, that sounds like it's just a normal process of testing/validating one's changes, regardless of which repo they are in.

I disagree. If I have a daos PR, I only need ONE PR, and I only need to verify my changes on that PR for that ONE branch. With pipeline-lib, I would have to verify all supported branches. Sure, pipeline-lib has automated steps for some of this, but it's still more complex than just ONE daos PR. And I would still have to manipulate the Test-tag and Features commit pragmas for those auto-spawned jobs to verify it works. AND I would still need a separate daos PR to verify it works in "PR mode", since pipeline-lib has different behavior depending on whether it's a timed build, PR, upstream, etc. AND it takes much longer to wait for testing from 4 branches than for one branch.

Another benefit to having just a daos PR is that the change is atomic. One PR, one commit - instead of co-dependent PRs in both daos and pipeline-lib.

In all, I find pipeline-lib incredibly cumbersome to work with.

Contributor:

> In all, I find pipeline-lib incredibly cumbersome to work with.

Not disagreeing that it's certainly more cumbersome than having everything in a single repo on a single branch. But then when you do branch you have to start duplicating changes across branches. The exact reason libraries were invented.

That said, being a library is not really even our primary reason for pipeline-lib (as much as we do benefit from it being a common library to many projects and branches and greatly reducing duplication). Its primary reason for existence is Jenkins itself and its limitations. We've had to move soooo much code out of a daos branch's Jenkinsfile into pipeline-lib just because there is an upper limit to the size of Jenkinsfile.

Contributor Author:

> But then when you do branch you have to start duplicating changes across branches. The exact reason libraries were invented.

But proper libraries are versioned, and you can specify which version you want to use. With pipeline-lib, all daos branches are forced to use the same version, which goes back to the previous problem of having to make sure all active branches still work.
I don't think putting logic in the daos repo is duplicating code - it's just versioned for different branches. We don't want to do the exact same thing for every branch.

Contributor Author:

> there is an upper limit to the size of Jenkinsfile

FWIW, I don't think it's the Jenkinsfile itself that has a size limit; it's the pipeline object that is limited in size. But either way, this is an external Python script and doesn't affect that.

    return os.linesep.join(commit_message_split)


def main():
    parser = ArgumentParser()
    parser.add_argument(
        "--target",
        help="if given, used to modify commit pragmas")
    args = parser.parse_args()

    if args.target:
        print(modify_commit_message_pragmas(git_commit_message(), args.target))
    else:
        print(git_commit_message())
Comment on lines +115 to +123
Contributor Author:

Making --target optional is just in case we need a failsafe if something goes wrong in the tag logic.
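For reference, rough usage matching the argparse definition above (branch name illustrative):

ci/get_commit_message.py                          # print the raw commit message unchanged
ci/get_commit_message.py --target origin/master   # print the message with the pragmas rewritten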



if __name__ == '__main__':
    main()
8 changes: 7 additions & 1 deletion debian/changelog
Contributor:

I don't know why this file is needed in the RPM, but I'd back out this change for now to avoid conflicts.

Contributor Author:

Generally, it would be useful to have this utility available to probe for things like "what tags do I need to run this test".
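A rough sketch of that kind of probe, reusing the files_to_tags() helper that get_commit_message.py already calls (path and printed output are illustrative):

import tags  # src/tests/ftest/tags.py, assuming it is importable from wherever it gets installed

recommended = tags.files_to_tags(['src/tests/ftest/dfuse/mu_mount.py'])
print(' '.join(sorted(recommended)))  # e.g. a dfuse-related tag set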

@@ -1,4 +1,10 @@
daos (2.5.100-10) unstable; urgency=medium
daos (2.5.100-11) unstable; urgency=medium
[ Dalton Bohning ]
* Add tags.py

-- Dalton Bohning <[email protected]> Wed, 08 Nov 2023 12:00:00 -0500

daos (2.5.100-10) unstable; urgency=medium
[ Phillip Henderson ]
* Move verify_perms.py location

1 change: 1 addition & 0 deletions src/container/srv_container.c
@@ -6,6 +6,7 @@
/**
* \file
*
* dummy change
* ds_cont: Container Operations
*
* This file contains the server API methods and the RPC handlers that are both
1 change: 1 addition & 0 deletions src/tests/ftest/datamover/dst_create.yaml
@@ -1,3 +1,4 @@
# Dummy change
hosts:
test_servers: 1
test_clients: 1
2 changes: 2 additions & 0 deletions src/tests/ftest/dfuse/mu_mount.py
@@ -10,6 +10,8 @@
from dfuse_test_base import DfuseTestBase
from run_utils import command_as_user, run_remote

# Dummy change


class DfuseMUMount(DfuseTestBase):
"""Verifies multi-user dfuse mounting"""
2 changes: 2 additions & 0 deletions src/tests/ftest/dfuse/mu_perms.py
@@ -13,6 +13,8 @@
from run_utils import command_as_user, run_remote
from user_utils import get_chown_command

# Dummy change


class DfuseMUPerms(DfuseTestBase):
"""Verify dfuse multi-user basic permissions."""
10 changes: 5 additions & 5 deletions src/tests/ftest/nvme/object.py
@@ -64,7 +64,7 @@ def container_read(container, array_size=None):
# read written objects and verify
container.read_objects()

def test_runner(self, namespace, record_size, array_size, thread_per_size=4):
def run_test(self, namespace, record_size, array_size, thread_per_size=4):
"""Perform simultaneous writes of varying record size to a container.
Args:
@@ -143,7 +143,7 @@ def test_nvme_object_single_pool(self):
:avocado: tags=NvmeObject,test_nvme_object_single_pool
"""
# perform multiple object writes to a single pool
self.test_runner("/run/pool_1/*", self.record_size[:-1], 0, self.array_size)
self.run_test("/run/pool_1/*", self.record_size[:-1], 0, self.array_size)
report_errors(self, self.errors)

@avocado.fail_on(DaosApiError)
@@ -165,7 +165,7 @@ def test_nvme_object_multiple_pools(self):
:avocado: tags=NvmeObject,test_nvme_object_multiple_pools
"""
# thread to perform simultaneous object writes to multiple pools
runner_manager = ThreadManager(self.test_runner, self.get_remaining_time() - 30)
runner_manager = ThreadManager(self.run_test, self.get_remaining_time() - 30)
runner_manager.add(
namespace='/run/pool_1/*', record_size=self.record_size, array_size=self.array_size)
runner_manager.add(
@@ -178,6 +178,6 @@ def test_nvme_object_multiple_pools(self):
self.errors.append(result.result)
report_errors(self, self.errors)

# run the test_runner after cleaning up all the pools for large nvme_pool size
self.test_runner("/run/pool_3/*", self.record_size, self.array_size)
# run again after cleaning up all the pools for large nvme_pool size
self.run_test("/run/pool_3/*", self.record_size, self.array_size)
report_errors(self, self.errors)