-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Grab bag of bug fixes and improvements #419
base: master
Are you sure you want to change the base?
Conversation
Switch from running both Mypy and Pyright to just Pyright. I can't bear to keep accommodating Mypy's limitations compared to Pyright. I only introduced Pyright¹ in addition to (as opposed to replacing) existing Mypy usage because Pyright had functionality Mypy didn't that I wanted to use. Mypy at the time also supported some things Pyright didn't at the time, so I used both instead of switching. Over time though, Pyright became a lot more sophisticated than Mypy, and most of Mypy's complaints became false-positives or overly-conservative because of various limitations/choices. I've made comments (in code, in commit messages, on GitHub issues, etc.) for a while now that yeah, we probably should drop Mypy at this point, but it was always low-priority. And I knew I'd get fed up enough at some point to JFDI, and lo and behold, here we are! Type ignore comments which were only necessary for Mypy are removed, and the remaining ignore comments are converted to Pyright-specific syntax so they can target specific rules. Mypy-specific workarounds with cast() and TYPE_CHECKING are removed. The most recent version of Pyright is required for tests as it contains a fix² to type inference that without which a cast() is still necessary in one spot due to type inference changing after removal of Mypy-specific workarounds. Reverts the following commits wholesale - "runner.docker: Use os.getuid()/getgid() when present instead of conditioning on os.name" (23f618e) - "dev: Work around some mypy limitation when type checking a lambda" (31d708f) - "dev: Ignore for mypy's sake some superclass calls" (d2fe958) as part of the changes. Drops from dev dependencies a few types-* packages that were only necessary for Mypy. ¹ In "dev: Add (optional) type checking with Pyright to our tests" (e0faaf5) ² <microsoft/pyright#9988>
Allows for tests that require config and other app data and also isolates tests from the local user's personal config and data.
40e4a7a
to
b3913e1
Compare
…("nextstrain.cli") The prose name is more meaningful to users than the internal package name.
…e. bundles Python) This is a very useful bit of information to know, but previously it was only discernible by knowing what to look for (e.g. the Python executable path) in the --verbose output.
…owser locally The previous approach of late-loading of nextstrain.cli.command.view only worked when running an explicit set of tests/*.py files—pytest's discovery process for doctests and pytest_*() functions means it loads all of nextstrain.cli too early otherwise—and most of the time a new tab or two was popped open because the environment modification was too late. This was annoying. The previous approach also predated our devel/pytest wrapper to reliably set env by insisting on a common entrypoint to tests. With that now present, the easiest and more reliable thing is to move the setting of BROWSER there before pytest is ever exec-ed.
Good manners and standard practice. Almost essential for writing readable Cram tests too.
Recently when working on this codebase I've though it's not clear enough that "UserError" doesn't mean an error the user's committed. It means an error intended for the user to see. Describe it as such for the benefit of other developers.
Improves readability, it seems to me, having spent a bit more time going thru --help recently.
The query parts of such endpoints were not being correctly joined with additional authentication parameters. Noticed in passing. No impacted users that I know of.
…rdcoded /nextstrain Brings this older code up to consistency with newer code and further centralizes mount_point() as the logic for where named volumes end up.
…not our presumed extraction path. This is a nuanced change: normally the two paths are identical, but zipfile.extract() does sometimes munge extraction paths for security or platform compatibility and thus make the two paths different.
The latest image, nextstrain/base:build-20250304T041009Z, provides a mechanism in the entrypoint to support bundling of overlays in the workdir ZIP archive by way of upwards-traversing archive member paths.¹ For example, an Augur overlay is bundled into the workdir ZIP archive with member paths starting with ../augur/ and ends up overwriting files in the image's /nextstrain/augur/ since the AWS Batch workdir is always /nextstrain/build/. Extending overlay support to AWS Batch has been very low priority and something I thought was unlikely to ever happen. However, in the course of working on AWS Batch support for `nextstrain run`, it turned out to be easiest/most straightforward/most minimal changes to bundle the pathogen source directory with the working analysis directory in the workdir ZIP archive, i.e. as a "pathogen" overlay. This naturally led to supporting overlays more generally, which I've done here. One caveat compared to overlays in runtimes with the concept of volume mounts (Docker, Singularity) is that any files in the image that do not exist in the overlaid files will remain present since nothing removes them. This is potentially problematic and will be annoying if run into but most of the time should be a non-issue. It is also solvable if we care to exert the effort and extra code to do so. I don't right now. ¹ <nextstrain/docker-base#241>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skimmed over most of the code, but it was the last commit that I was most interested in!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extending overlay support to AWS Batch has been very low priority and something I thought was unlikely to ever happen.
I've wanted this! Awesome that it's happening
One caveat compared to overlays in runtimes with the concept of volume mounts (Docker, Singularity) is that any files in the image that do not exist in the overlaid files will remain present since nothing removes them. This is potentially problematic and will be annoying if run into but most of the time should be a non-issue. It is also solvable if we care to exert the effort and extra code to do so. I don't right now.
This'll be a nightmare to debug if it ever occurs. If I understand the code (I don't spend much time in this codebase) the behaviour of --augur
is going to differ if it's run locally in docker (which swaps the entire augur
out) vs aws-batch (which overlays any zipped up ../augur
contents), right? Since --augur
is already special cased I think it'd be sensible to remove the (containers original) augur
directory if we see ../augur
in the zip file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've wanted this! Awesome that it's happening
\o/
This'll be a nightmare to debug if it ever occurs. If I understand […]
Your understanding is correct. I think "nightmare" is overstating it—a stack trace, for example, will implicate a file that shouldn't be there—but yes, it'd certainly be aggravating.
Note that --augur
is just one example of these development overlays, although probably the most useful one. There's also --auspice
, --fauna
, and (ha) --sacra
. (I've personally used --augur
and --auspice
about evenly.)
Additional handling of these overlays in the AWS Batch entrypoint is possible, like I noted, but I don't really want to spend more time on those bits now. The overlay options have other caveats that present tripping hazards too, e.g. for Augur, it's just source code that's overlaid, not dependencies; and for Auspice you need to make sure to build it locally first. So a bit of "here be dragons" isn't unprecedented, and these are (AFAIK) rarely-used options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if anything comes from this discussion we should at least remove --sacra
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a little background, it's very very rare I run a workflow using the release version of augur + the master/main
repo branch, as that's what we have the automated stuff for in my eyes. Almost every time I'm running workflows it's because i've modified the workflow, or modified augur, or (surprisingly often) both. If we can get nextstrain run ...
working as I hope so I can farm out a job to AWS with my modified augur code & modified repo and then download the assets as a separate siloed working directory I'd save a whole lot of build time and my laptop would be happier. Is #407 at the point where this can be done / tested out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good reminder that although I often focus on the expected/anticipated external users of nextstrain run
, we have internal uses for it that have different focuses. Internally we care mainly about the workdir isolation and mechanics in combination with offloading the work to AWS Batch, not the pathogen source/version management.
You can totally test out #407 for your purposes!
There is not currently a supported way to use nextstrain setup
to make an "editable install" (to use Python/Pip's lingo) of a self-managed local pathogen repo. It's on the list, but I wasn't focusing on it at first. Sounds like it should be soon/next up. In the meantime, a workaround is to do something like this:
$ ln -sv ~/nextstrain/measles ~/.nextstrain/pathogens/measles/local="$(echo -n local | base32)"
$ nextstrain run measles@local …
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good reminder that although I often focus on the expected/anticipated external users
Yeah, this'll always be a tension in the group - imaging a "happy path" for "most people" vs the way some of us run things. For this work I don't mind symlinking things.
P.S. base32
not available on MacOS out of the box.
All related of varying degrees to my
nextstrain run
and workflows-as-programs work in #407. Split out to be reviewed/merged (and probably released) separately. One big grab bag, as no one seemed to mind.Checklist