Change to new Aventer Mesos URL #4290

Merged
merged 11 commits into from
Dec 3, 2022

Conversation

adamnovak
Member

@adamnovak adamnovak commented Nov 29, 2022

This ought to fix #4289.

Changelog Entry

To be copied to the draft changelog by merger:

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passes tests.
  • Make sure the PR has been reviewed since its last modification. If not, review it.
  • Merge with the GitHub "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

@adamnovak
Member Author

I'm also glomming a fix for #4282 onto here since I am messing about with deb dependencies anyway.

@adamnovak adamnovak mentioned this pull request Nov 29, 2022
19 tasks
@adamnovak
Member Author

adamnovak commented Nov 30, 2022

OK, I had to adjust https://github.com/vgteam/dind to do some cgroup surgery on cgroups v2 machines. It now moves all the processes in the container out of the container's root cgroup and into another cgroup, so that new containers can be created under the container's root cgroup. This is needed because, under cgroups v2, processes and child cgroups with limits on them can't coexist in the same cgroup.
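
For illustration, the cgroup move described above can be sketched in Python. This is not the actual change made in vgteam/dind; the function name `evacuate_root_cgroup` and the child cgroup name `init` are assumptions chosen for the example, and the sketch only relies on the standard cgroups v2 files `cgroup.procs`, `cgroup.controllers`, and `cgroup.subtree_control`.

```python
from pathlib import Path
from typing import List


def evacuate_root_cgroup(root: Path, child_name: str = "init") -> List[int]:
    """Move every process in `root` into a child cgroup; return the PIDs moved.

    Hypothetical helper: `child_name` is an arbitrary choice for this sketch.
    """
    child = root / child_name
    child.mkdir(exist_ok=True)

    pids = [int(token) for token in (root / "cgroup.procs").read_text().split()]
    moved = []
    for pid in pids:
        try:
            # Write one PID per open/write/close cycle, as cgroupfs expects.
            (child / "cgroup.procs").write_text(f"{pid}\n")
            moved.append(pid)
        except OSError:
            # The process may have exited already, or may not be movable.
            continue

    # With no processes left directly in the root, its controllers can be
    # enabled for child cgroups (for example, for nested containers).
    controllers = (root / "cgroup.controllers").read_text().split()
    if controllers:
        (root / "cgroup.subtree_control").write_text(
            " ".join(f"+{name}" for name in controllers) + "\n"
        )
    return moved
```

Inside a container this would presumably be called as `evacuate_root_cgroup(Path("/sys/fs/cgroup"))` with sufficient privileges, before starting the inner Docker daemon.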

Then I rebuilt the Toil and vg CI prebake containers.

I also had to upgrade Singularity to one that works with cgroups v2, which meant I had to upgrade the base Ubuntu image to one with a new enough libc for the newer Singularity builds.

@adamnovak adamnovak force-pushed the issues/4289-fix-apt-source branch from ed6504b to 6b5a591 Compare November 30, 2022 21:32
@adamnovak
Member Author

adamnovak commented Nov 30, 2022

Now I have Singularity working if the Kubernetes pod is privileged, or if it has an unconfined AppArmor profile applied, but if not I get:

cd
singularity build -s -F ./sandbox docker://ubuntu:20.04
'singularity' -d -v 'exec' '-w' '-u' '-B' '' '--pwd' '/mnt' './sandbox' ls -lah /
VERBOSE [U=0,P=532]        user_namespace_init()         Create user namespace
VERBOSE [U=0,P=532]        create_namespace()            Create user namespace
VERBOSE [U=65534,P=532]    init()                        Spawn master process
DEBUG   [U=65534,P=532]    setup_userns_mappings()       Write deny to setgroups file
DEBUG   [U=65534,P=532]    setup_userns_mappings()       Write to GID map
DEBUG   [U=65534,P=532]    setup_userns_mappings()       Write to UID map
DEBUG   [U=0,P=551]        set_parent_death_signal()     Set parent death signal to 9
VERBOSE [U=0,P=551]        create_namespace()            Create mount namespace
ERROR   [U=0,P=551]        shared_mount_namespace_init() Failed to set mount propagation: Permission denied

This is related to apptainer/singularity#5857 and to https://gitlab.nrp-nautilus.io/prp/nautilus-cluster/-/issues/570

The default Ubuntu AppArmor profiles on the new host nodes break Singularity (at least on cgroups v2). I think I have to turn AppArmor off on the nodes because we can't just annotate all the pods. I don't think the old RPM-based base AMI has AppArmor on?
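
One way to confirm which AppArmor profile is confining a process inside a pod is to read its label from `/proc/self/attr/current`. The following is a minimal sketch, not something taken from this PR; the helper names `parse_apparmor_label` and `current_apparmor_label` are made up for the example, and it assumes the usual label format of either `unconfined` or `profile-name (mode)`.

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_apparmor_label(label: str) -> Tuple[str, Optional[str]]:
    """Split a label like 'docker-default (enforce)' into (profile, mode)."""
    label = label.replace("\x00", "").strip()
    if label.endswith(")") and " (" in label:
        profile, mode = label[:-1].rsplit(" (", 1)
        return profile, mode
    # Labels such as 'unconfined' carry no mode.
    return label, None


def current_apparmor_label(
    path: str = "/proc/self/attr/current",
) -> Optional[Tuple[str, Optional[str]]]:
    """Return the parsed label of this process, or None if it can't be read."""
    try:
        return parse_apparmor_label(Path(path).read_text())
    except OSError:
        # AppArmor may be disabled or the file may be absent on this system.
        return None
```

A result like `("docker-default", "enforce")` would suggest the pod is running under Docker's default profile rather than unconfined.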

@adamnovak
Member Author

I've modified the ASG templates for the cluster, but the nodes need to scale away and come back again before Singularity will work again, I think.

@adamnovak
Member Author

Now I have set up AppArmor so it can allow Singularity to start in Docker containers: adamnovak/gi-kubernetes-autoscaling-config@b81b482

It might still not work under Toil, at which point I can try blanket-allowing "mount".

Comment on lines +164 to +169
# CPU affinity may limit the size
affinity_size: Union[float, int] = float('inf')
if hasattr(os, 'sched_getaffinity'):
    try:
        logger.debug('CPU affinity available')
        affinity_size = len(os.sched_getaffinity(0))
Contributor

Consider using len(psutil.Process().cpu_affinity()), which is supported on Linux, Windows, and FreeBSD (and we already use psutil).

https://psutil.readthedocs.io/en/latest/index.html#psutil.Process.cpu_affinity
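
A rough sketch of how the two approaches could be combined, assuming psutil may or may not be importable and that `cpu_affinity()` is missing on platforms where psutil does not support it. The function name `usable_cpu_count` is hypothetical and this is not the code merged in the PR.

```python
import os
from typing import Optional


def usable_cpu_count() -> Optional[int]:
    """Return how many CPUs this process may run on, best effort."""
    try:
        import psutil  # optional here; the reviewer notes Toil already uses it

        # Not available on every platform, in which case AttributeError is raised.
        return len(psutil.Process().cpu_affinity())
    except (ImportError, AttributeError):
        pass

    if hasattr(os, "sched_getaffinity"):
        # Linux-only standard library fallback, as in the diff under review.
        return len(os.sched_getaffinity(0))

    # Last resort: total CPU count, which ignores affinity (and may be None).
    return os.cpu_count()
```

The ordering is a design choice for the sketch: psutil covers more platforms, while the standard library call keeps working when psutil is absent.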

Member

@DailyDreaming DailyDreaming left a comment

LGTM. Thank you for this!

@@ -83,7 +84,7 @@ def heredoc(s):
motd = ''.join(l + '\\n\\\n' for l in motd.splitlines())

print(heredoc('''
-FROM ubuntu:20.04
+FROM ubuntu:22.04
Member

Nice...

@adamnovak
Member Author

OK, using this and a general mount allow rule for AppArmor for Docker containers, I am able to run:

kubectl run --limits='cpu=4000m,memory=4Gi,ephemeral-storage=16Gi' toil-test --rm -i --tty --overrides='{"spec": {"affinity": {"nodeAffinity": {"requiredDuringSchedulingIgnoredDuringExecution": {"nodeSelectorTerms": [{"matchExpressions": [{"key": "kubernetes.io/hostname", "operator": "NotIn", "values": ["ip-172-31-51-233.us-west-2.compute.internal"]}]}]}}}}}' --image quay.io/ucsc_cgl/toil:5.8.0a1-faf4ce788afa2937c987b88877c69143bf84f67d-py3.9 --command -- bash -c "cd ; wget https://github.com/common-workflow-language/cwl-v1.2/raw/v1.2.0/tests/tmap-job.json ; wget https://github.com/common-workflow-language/cwl-v1.2/raw/v1.2.0/tests/tmap-tool.cwl ; wget https://github.com/common-workflow-language/cwl-v1.2/raw/v1.2.0/tests/reads.fastq ; wget https://github.com/common-workflow-language/cwl-v1.2/raw/v1.2.0/tests/args.py ; export SINGULARITY_DOCKER_HUB_MIRROR=http://docker-registry.toil ; toil-cwl-runner --clean=always --logDebug --statusWait=10 --disableCaching --retryCount=0 --setEnv=SINGULARITY_DOCKER_HUB_MIRROR --batchSystem=single_machine --singularity --outdir=test-out tmap-tool.cwl tmap-job.json"

and show that I can run Singularity containers under Toil CWL workflows from within a Kubernetes pod on one of the new worker nodes.
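
The `--overrides` JSON in the command above is easy to mistype by hand. As a small sketch, it could be generated like this; the helper name `avoid_nodes_overrides` is invented for the example, and the structure simply mirrors the node anti-affinity override used in the command.

```python
import json
import shlex
from typing import List


def avoid_nodes_overrides(hostnames: List[str]) -> str:
    """Build pod spec overrides that keep the pod off the given nodes."""
    overrides = {
        "spec": {
            "affinity": {
                "nodeAffinity": {
                    "requiredDuringSchedulingIgnoredDuringExecution": {
                        "nodeSelectorTerms": [
                            {
                                "matchExpressions": [
                                    {
                                        "key": "kubernetes.io/hostname",
                                        "operator": "NotIn",
                                        "values": list(hostnames),
                                    }
                                ]
                            }
                        ]
                    }
                }
            }
        }
    }
    return json.dumps(overrides)


if __name__ == "__main__":
    # Hostname copied from the command in this comment.
    nodes = ["ip-172-31-51-233.us-west-2.compute.internal"]
    # shlex.quote makes the JSON safe to paste into a shell command line.
    print("--overrides=" + shlex.quote(avoid_nodes_overrides(nodes)))
```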

Hopefully the Cactus and other integration tests are similarly happy with the move.

Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aventer APT source is broken again
Docker in the Toil appliance container does not know about --gpus
3 participants