generate DRA job configs from a Jinja template #34010

Merged
merged 4 commits into from
Jan 9, 2025

Conversation

bart0sh
Contributor

@bart0sh bart0sh commented Dec 19, 2024

  • Implemented job config generation
  • Added make rules to generate and verify the generated jobs
  • Generated DRA canary jobs

/cc @pohly @kannon92 @SergeyKanzhelev @haircommander
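
As a rough illustration of the verify rule mentioned above (not the actual make rule or script from this PR), a check could regenerate the job configs in memory and diff them against the committed files. The function name `verify_generated` and the sample YAML are hypothetical.

```python
import difflib


def verify_generated(committed: str, regenerated: str, name: str = "dra-canary.yaml") -> list[str]:
    """Return unified-diff lines; an empty list means the committed file is up to date."""
    return list(
        difflib.unified_diff(
            committed.splitlines(keepends=True),
            regenerated.splitlines(keepends=True),
            fromfile=f"{name} (committed)",
            tofile=f"{name} (regenerated)",
        )
    )


# Hypothetical file contents, for illustration only.
fresh = "periodics:\n- name: job-a\n"
stale = "periodics:\n- name: job-b\n"

up_to_date = verify_generated(fresh, fresh)
drift = verify_generated(stale, fresh)
print("".join(drift))
```

A verify target would fail when the diff is non-empty, and an update target would simply overwrite the committed file with the regenerated text.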

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/config Issues or PRs related to code in /config size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/jobs sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Dec 19, 2024
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 7c3a83c to 2f75bbd Compare December 19, 2024 13:37
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch 3 times, most recently from d030275 to c5999e8 Compare December 19, 2024 14:13
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch 3 times, most recently from 3259e4d to 499379c Compare December 20, 2024 12:38
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 20, 2024
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 499379c to 2e1e253 Compare December 20, 2024 15:08
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 20, 2024
Contributor

@pohly pohly left a comment

Looks very promising.

How to solve indentation was my biggest concern when thinking about how to use Jinja. I am not sure whether this is addressed here (I need to check the test results).
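
To make the indentation concern concrete, here is a minimal stdlib-only sketch: a multi-line block has to be shifted to the depth of the YAML key it is substituted under, otherwise the output is invalid. It uses `textwrap.indent` as a stand-in; in a Jinja template the built-in `indent` filter would play this role. The template text and the helper `render_args` are hypothetical and not taken from this PR.

```python
import textwrap

# Hypothetical fragment of a job spec; {args} is filled with a YAML list.
TEMPLATE = "spec:\n  containers:\n  - args:\n{args}\n"


def render_args(args: list[str], depth: int) -> str:
    """Render args as a YAML list indented by `depth` spaces."""
    block = "\n".join(f"- {arg}" for arg in args)
    return textwrap.indent(block, " " * depth)


rendered = TEMPLATE.format(args=render_args(["--node-tests=true", "--provider=gce"], 4))
print(rendered)
```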

@bart0sh
Contributor Author

bart0sh commented Dec 20, 2024

@pohly @kannon92 @SergeyKanzhelev @haircommander

> Looks very promising.

Thank you. After addressing the review comments, I'm going to remove the -pull and -ci YAMLs from this PR so that we test only -canary.
It would be great if SIG-Node folks could look at this and confirm that this approach is at least acceptable.

I personally like it. Using it would allow us to

  • have a presubmit job for every periodic
  • keep them synchronized
  • easily generate canary jobs for testing purposes (e.g. kubetest2)
  • make fewer mistakes, as job configs are generated automatically
  • do less typing and copy-pasting :)
    etc.
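
The synchronization point can be sketched as follows: every job definition is expanded once per job kind, so a periodic cannot exist without its presubmit and canary counterparts. This is only an assumption-laden illustration. `string.Template` stands in for the Jinja template, and the prefixes, the `test_args` field and the job definition are made up.

```python
from string import Template

# Stand-in for the Jinja template; string.Template keeps the sketch stdlib-only.
JOB = Template("- name: $prefix-$base\n  test_args: $args\n")

# Assumed mapping of top-level config key to job name prefix (illustrative only).
PREFIXES = {"periodics": "ci", "presubmits": "pull", "canary": "canary"}

# Hypothetical job definitions: base name -> shared test arguments.
DEFINITIONS = {"node-e2e-dra": "--timeout=1h"}


def generate(definitions: dict[str, str], prefixes: dict[str, str]) -> dict[str, str]:
    """Expand every definition into one entry per job kind."""
    return {
        kind: "".join(
            JOB.substitute(prefix=prefix, base=base, args=args)
            for base, args in definitions.items()
        )
        for kind, prefix in prefixes.items()
    }


configs = generate(DEFINITIONS, PREFIXES)
for kind, text in configs.items():
    print(f"{kind}:\n{text}")
```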

WDYT guys?

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch 2 times, most recently from 4234630 to 28eda1b Compare December 20, 2024 21:33
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from b1ba970 to 17e5c8a Compare January 8, 2025 17:13
Review thread on hack/generate-jobs.py (outdated, resolved)
@bart0sh
Contributor Author

bart0sh commented Jan 8, 2025

@pohly good news - all new -canary jobs succeeded for my test PR.

I triggered -pull jobs to check if both run the same set of test cases.

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 17e5c8a to 985c885 Compare January 8, 2025 21:26
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2025
@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from 985c885 to bdac7d1 Compare January 8, 2025 21:32
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 8, 2025
@bart0sh
Contributor Author

bart0sh commented Jan 8, 2025

@pohly I've also fixed the inconsistent naming of the crio jobs: *-kubernetes-node-e2e-cgrpv1-crio-dra -> *-kubernetes-node-e2e-crio-cgrpv1-dra. The names of the new jobs now match those of the current jobs.
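
One way to keep such names consistent is to assemble them in a single place with a fixed component order. The helper below is a hypothetical sketch, not code from hack/generate-jobs.py, and the `ci` prefix is an assumed value for the `*` placeholder.

```python
def job_name(prefix: str, runtime: str, cgroup: str) -> str:
    """Build a job name with the runtime always placed before the cgroup version."""
    return "-".join([prefix, "kubernetes-node-e2e", runtime, cgroup, "dra"])


print(job_name("ci", "crio", "cgrpv1"))  # ci-kubernetes-node-e2e-crio-cgrpv1-dra
```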

@bart0sh bart0sh force-pushed the PR060-generate-job-configs branch from bdac7d1 to 15ae942 Compare January 9, 2025 14:23
Contributor

@pohly pohly left a comment

/lgtm
/assign @dims

Can you perhaps have a look and approve? You have some context on the usage of Jinja for kops.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 9, 2025
@bart0sh
Contributor Author

bart0sh commented Jan 9, 2025

@dims I presented this solution at the SIG-Node CI meeting yesterday and received generally positive feedback. There are some concerns that the solution may work for a small set of similar jobs but become harder to maintain as the number and variety of jobs increase. However, we can't confirm or refute this without trying, so I proposed trying it out with the DRA jobs and then gradually expanding it to the rest of the SIG-Node jobs, and people agreed.

/cc @kannon92 @SergeyKanzhelev @haircommander

@dims
Member

dims commented Jan 9, 2025

@bart0sh thanks for doing this! it will definitely help us going forward.

/approve
/lgtm

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bart0sh, dims


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 9, 2025
@pohly
Contributor

pohly commented Jan 9, 2025

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 9, 2025
@k8s-ci-robot k8s-ci-robot merged commit c94cb9d into kubernetes:master Jan 9, 2025
8 checks passed
@k8s-ci-robot
Contributor

@bart0sh: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key dra-canary.yaml using file config/jobs/kubernetes/sig-node/dra-canary.yaml
  • key dynamic-resource-allocation.yaml using file ``
  • key dra-ci.yaml using file config/jobs/kubernetes/sig-node/dra-ci.yaml
  • key dra-presubmit.yaml using file config/jobs/kubernetes/sig-node/dra-presubmit.yaml
  • key sig-node-presubmit.yaml using file config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml


@@ -145,7 +137,7 @@ periodics:
  - '--node-test-args=--feature-gates=DynamicResourceAllocation=true --service-feature-gates=DynamicResourceAllocation=true --runtime-config=api/beta=true --container-runtime-endpoint=unix:///var/run/crio/crio.sock --container-runtime-process-name=/usr/local/bin/crio --container-runtime-pid-file= --kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/crio.service --kubelet-cgroups=/system.slice/kubelet.service" --extra-log="{\"name\": \"crio.log\", \"journalctl\": [\"-u\", \"crio\"]}"'
  - --node-tests=true
  - --provider=gce
- - '--test_args=--timeout=1h --label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky"'
+ - '--test_args=--timeout=1h --label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky && !Slow"'
Contributor

This seemed okay, but it was actually an intentional difference between periodic and presubmit: in a periodic job it's okay to run longer, but presubmits should be fast.

This is relevant, for example, for kubernetes/kubernetes#129543, because raising the limit potentially affects a slow test.

=> #34113
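
A generator can preserve this kind of intentional difference by deriving the label filter from the job kind instead of sharing one string. The sketch below assumes that only presubmits exclude `Slow` tests, which is my reading of the comment above. The function names are hypothetical and this is not the change made in #34113.

```python
# Filter shared by both job kinds, copied from the diff above.
BASE_FILTER = (
    "Feature: containsAny DynamicResourceAllocation"
    " && Feature: isSubsetOf { Beta, DynamicResourceAllocation }"
    " && !Flaky"
)


def label_filter(kind: str) -> str:
    """Presubmits additionally skip Slow tests; periodics keep them (assumption)."""
    if kind == "presubmit":
        return BASE_FILTER + " && !Slow"
    return BASE_FILTER


def build_test_args(kind: str) -> str:
    """Assemble the --test_args value for the given job kind."""
    return f'--test_args=--timeout=1h --label-filter="{label_filter(kind)}"'


for kind in ("periodic", "presubmit"):
    print(build_test_args(kind))
```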

@rifelpet
Member

👋🏻 from the Kops side, I've been following along to see how you end up integrating the job generation scripts into CI. I'd like to do the same with kops' build_jobs.py.

Do you all have a preference on if/how I integrate our scripts?

I was thinking of adding new run-in-python-container.sh executions to hack/make-rules/verify/generated-jobs.sh and hack/make-rules/update/generated-jobs.sh.

I can also hold off until you're confident you don't need to revert.

@bart0sh
Contributor Author

bart0sh commented Jan 10, 2025

@rifelpet The way you've proposed looks good to me. You can modify the scripts. All DRA generated jobs are "green" now.

Labels
  • approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
  • area/config (Issues or PRs related to code in /config)
  • area/jobs
  • area/testgrid
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • lgtm ("Looks good to me", indicates that a PR is ready to be merged.)
  • sig/node (Categorizes an issue or PR as relevant to SIG Node.)
  • sig/testing (Categorizes an issue or PR as relevant to SIG Testing.)
  • size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)
6 participants