bot sync meeting 2024 01 12

Kenneth Hoste edited this page Jan 20, 2024 · 2 revisions

Bot sync meeting - Fri 12 Jan 2024 (09:00 CET)

  • present: Lara, Kenneth, Richard, Thomas
  • goal: high-level discussion of development priorities for 2024 (maybe 1st half of 2024)
  • loose list of topics
    • see open issues https://github.com/EESSI/eessi-bot-software-layer/issues
    • deployment improvements
      • single staging PR (issue #192)
      • upload metadata and tarball to different directories (cf. approach in NESSI)
        • => create issue with more info
      • add issue comment id to metadata
      • add additional metadata (which?)
        • which bot created the tarball
      • support for deleting/overwriting something that is already there (issue #147)
        • example: deploying new compat layer
        • => create issue for this
        • metadata that comes with the tarball should specify which actions should be taken as part of the ingest
          • paths to remove
          • list of tarballs to ingest => single staging PR!
          "payload": {
              "remove": ["path1", "path2", ...],
              "add": [{
                  "filename": "eessi-2023.06-software-linux-x86_64-intel-skylake_avx512-1704485930.tar.gz",
                  "size": "601491765",
                  "ctime": "Fri Jan  5 20:20:16 UTC 2024",
                  "sha256sum": "ded54f4555bf0411f8dc61e2a9b8f78b321533b978231543ed9d00a61c20ae2f",
                  "url": "https://software.eessi.io-2023.06.s3.amazonaws.com/2023.06/software/linux/x86_64/intel/skylake_avx512/1704485930/eessi-2023.06-software-linux-x86_64-intel-skylake_avx512-1704485930.tar.gz"
              }]
          }
        • ingest script will need to be updated accordingly as well
          • should work based on metadata files in S3 bucket (not tarballs)
          • cf. approach in NESSI
        • revise directory structure on S3 bucket
          • EESSI: target/tarball+metadata
          • NESSI:
            • tarballs/target/example.tgz
            • new/target/metadata.txt
            • other directories for staging/approved/ingested/etc.
      • dealing with multiple bots that deploy
        • Stratum-0 should be enhanced to know when it can open a staging PR
          • only when tarballs for all CPU targets have arrived
      • bot deploy implementation is specific to EESSI software-layer (issue #113?)
      • support for running a bot/pre-deploy.sh script, only trigger deploy when that script has exit code 0
        • => open issue on this
          • a more general approach could be to add support to configure the bot with a particular workflow
    • job manager crashes
      • see issues #193, #191, #142
      • related: event handler crash (issue #160)
      • use retry approach for problems talking to GitHub
      • catch exceptions (like we do in PyGHee/event handler)
        • report back in GitHub what went wrong, by means of notification
    • sync NESSI/EESSI bot codes
      • bot implementation of NESSI bot has diverged
        • w.r.t. deployment procedure (extra update in GitHub comment)
        • extra info in metadata file
        • uploading of tarball is done in a different subdirectory of the bucket
      • => should open PR with changes made in NESSI bot to EESSI bot to have a clear view of the divergence
    • support for cancelling a job (issue #190)
      • only people with build permissions
      • or only person who triggered build + select group of "admins"
      • need to make sure that job manager properly cleans up when job gets cancelled (or disappears due to some other problem)
    • support for building on top of EESSI/NESSI (local FS, second CVMFS (restricted repo))
      • requires update in eessi_container.sh script to bind-mount stuff for multiple repos
    • "control center" (issue #96)
      • first step could be to use first comment created by each bot to let it provide a status overview
      • could be a step towards having a more general web interface that provides an overview of all build jobs/open PRs
    • make bot less "chatty" (issue #159)
    • bot should warn when something that was built successfully before is being rebuilt
      • maybe it should even refuse to rebuild, unless a (new) rebuild command is used instead (issue #92)
      • => need to open issue on this
    • implement extra test phase in bot
      • overlay should not be writable in this case!
      • still need to bind mount contents of build tarball
  • support GitLab (issue #194)
    • what information we need: mapping between GH (GitHub) and GL (GitLab)
    • how to update PyGHee
  • plan the work
    • milestone for bug fix release 0.2.1
      • fix crashes of job manager (+ event handler)
    • milestone for release 0.2.2: sync EESSI/NESSI code (see differences first)
      • open PR [Thomas]
    • milestone for minor release 0.3.0
      • improve deploy phase
      • => open missing issues [Thomas]
    • next sync meeting Wed, Feb 7, 10:00
      • use weekly support meetings to report on status
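
The idea discussed above, that the metadata accompanying a tarball specifies which actions the ingest should take (paths to remove, tarballs to add), could be sketched roughly as follows. This is only a minimal illustration in Python, assuming the metadata is a JSON document with a top-level "payload" object as in the example in the notes. The function names `plan_ingest` and `verify_tarball` are hypothetical and do not exist in the bot or in the ingest script.

```python
import hashlib
import json


def plan_ingest(metadata_text):
    """Turn a metadata document into an ordered list of (action, target) steps.

    Assumption: the metadata is JSON with a "payload" object that holds a
    "remove" list of paths and an "add" list of tarball descriptions.
    """
    payload = json.loads(metadata_text)["payload"]
    # Removals are planned first, so that a replacement (for example a new
    # compat layer) does not collide with what is already in place.
    steps = [("remove", path) for path in payload.get("remove", [])]
    steps += [("add", entry["url"]) for entry in payload.get("add", [])]
    return steps


def verify_tarball(data, entry):
    """Check downloaded tarball bytes against the size and checksum in the metadata."""
    # "size" is stored as a string in the example metadata, so convert it.
    if len(data) != int(entry["size"]):
        return False
    return hashlib.sha256(data).hexdigest() == entry["sha256sum"]


if __name__ == "__main__":
    # Made-up example data, not a real EESSI tarball.
    data = b"example tarball contents"
    entry = {
        "filename": "example.tar.gz",
        "size": str(len(data)),
        "sha256sum": hashlib.sha256(data).hexdigest(),
        "url": "https://example.org/example.tar.gz",
    }
    metadata = json.dumps({"payload": {"remove": ["path1"], "add": [entry]}})
    for action, target in plan_ingest(metadata):
        print(action, target)
    print("checksum ok:", verify_tarball(data, entry))
```

Working from a plan like this would let the ingest script operate on the metadata files alone, and a single staging PR could list all planned steps before anything is changed.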