Give users SSH access to builders #6

Merged: 2 commits merged into fedora-copr:main on Nov 2, 2023

Conversation

FrostyX (Member) commented on Sep 16, 2023

FrostyX changed the title from "Allow users to SSH access builders" to "Give users SSH access to builders" on Sep 16, 2023
> Obviously, Copr resources aren't unlimited so we need to allocate them
> wisely.
>
> - Only 1 running instance per user

Member: This number should be configurable. I can imagine that for the internal Copr we could make this number higher.


Member: Also, if we allow Packit to provide this to its users (in the future), all the Packit users will share this limit. So, I would have a configurable instance limit and, ideally, a user-level limit as well.
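A minimal sketch of how such limits could be checked, assuming hypothetical backend configuration options (`max_ssh_builders` and `max_ssh_builders_per_user` do not exist in Copr today):

```python
# Hypothetical option names; not part of the current Copr configuration.
DEFAULTS = {
    "max_ssh_builders": 10,          # global cap across all users
    "max_ssh_builders_per_user": 1,  # could be raised for the internal Copr
}

def can_keep_builder(config, running_by_user, username):
    """Decide whether another builder may be kept alive for `username`."""
    limit_total = config.get("max_ssh_builders", DEFAULTS["max_ssh_builders"])
    limit_user = config.get("max_ssh_builders_per_user",
                            DEFAULTS["max_ssh_builders_per_user"])
    if sum(running_by_user.values()) >= limit_total:
        return False
    return running_by_user.get(username, 0) < limit_user
```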

> rules, instance type selection, etc., and a button to actually spawn
> the instance. The instance would be fresh and it would be up to the
> user to reproduce the failure manually. Once the builder spawns, the
> page will show instructions on how to connect to it.

Member: I like this one. "Resubmit and keep for ssh"


praiskup (Member) commented on Sep 25, 2023: I like this one as well. Edit: I changed my mind. The consequence is that we can't guarantee the same kind of VM (in AWS or on HW?).

> able to specify at least an architecture. But that is IMHO not good
> enough. For example, we have x86_64 builders on our own hypervisors and in
> Amazon AWS. Historically, we had issues that appeared only on builders
> from one of them.

Member: I do not care about issues that happen only in some pool. It may happen, but I guess this is very rare. Let them choose only the architecture. Or, to be precise, one chroot.

> but try it in an anonymous window, it doesn't work.
>
> Anyway, Copr already fetches some user information from FAS so I think
> we can assume we will be able to fetch the SSH keys as well.

Member: +1. And if it means that they need to log off and log in again to make this feature start working, that is a cheap price.


Member: I requested this against noggin/fasjson: fedora-infra/fasjson#580


FrostyX (Member, Author): 👍
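For illustration, a sketch of fetching the keys over the FASJSON REST API, assuming an `sshpubkeys` attribute becomes available once fedora-infra/fasjson#580 is resolved; the field name is a guess, and the Kerberos/GSSAPI authentication FASJSON requires is omitted here for brevity:

```python
import requests

FASJSON_URL = "https://fasjson.fedoraproject.org/v1"

def get_ssh_keys(username):
    """Fetch a user's public SSH keys from FASJSON.

    Assumes FASJSON will expose an `sshpubkeys` field for users
    (fedora-infra/fasjson#580); authentication is left out of this sketch.
    """
    response = requests.get(f"{FASJSON_URL}/users/{username}/", timeout=30)
    response.raise_for_status()
    return response.json()["result"].get("sshpubkeys", [])
```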


> - It is not possible to override a status of an already finished build
> - Users cannot take a build from the queue (maybe not even theirs) and
>   process it

Member: Nod, we will know at the beginning that this build failed, and we will give SSH access to the user. In this case we have to ignore the result.

> to 30 hours at most. That means the lifespan needs to be at
> least 30 hours.
> - I propose 48 hours because we want to give users enough time for
>   debugging after the build finishes

Member: I would give half an hour or one hour max by default, with the possibility to prolong it to 2 days as you propose.


Member: +1 for having a really limited lifespan by default. Most people won't need much; half an hour looks fine to me. Not sure if you have a median build time or some other metric we can compare this to.

> debugging after the build finishes
> - Alternatively, we can do a 24-hour lifespan and add an "extend"
>   button to the Copr WebUI or add a `copr-builder-extend` command
>   to the instance.

Member: Ah, here -> but I think 24 hours by default is too much.
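To make the numbers concrete, a sketch of the deadline handling discussed above: a short default lifespan that the proposed `copr-builder-extend` command (or an "extend" button) could push further, but never past the two-day hard limit. All names and values are illustrative, not existing Copr code:

```python
from datetime import datetime, timedelta

# Values from the discussion above; both would be configurable in practice.
DEFAULT_LIFESPAN = timedelta(minutes=30)
MAX_LIFESPAN = timedelta(hours=48)

def initial_deadline(started_at):
    """Deadline given to every kept-alive builder by default."""
    return started_at + DEFAULT_LIFESPAN

def extended_deadline(started_at, requested):
    """Extend the deadline, but never past MAX_LIFESPAN from the start."""
    return min(datetime.now() + requested, started_at + MAX_LIFESPAN)
```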

> instance is already running. We don't want to kill it and spawn a
> new one. We probably need to print it was not possible to spawn a
> new instance into the builder-live/backend log
> - Where to print the instructions on how to connect to the builder?

Member: The backend.log (the backend is responsible for (not) closing the resalloc ticket)?

> new instance into the builder-live/backend log
> - Where to print the instructions on how to connect to the builder?
> - If the build fails in multiple chroots, which one of those builders
>   should be kept alive? We can't keep them all.

Member: If we allow a user to have N such machines, the first N will be kept; the rest would just dump a message to the backend log: "You already have N machines ready for debugging, please release them first."

> should be kept alive? We can't keep them all.
> - Inability to select an instance type in advance so it may be hard
>   to debug builds where e.g. Amazon AWS Spot instance is needed
>   and nothing else.

Member: Hmm, we have the exact failing machine in hand, what am I missing?


> Since the workflow starts from a failed build, we could create a MOTD
> on the builder showing instructions on how to reproduce the particular
> build using `copr-rpmbuild`.

Member: I have faced the "why the heck does the build succeed??" situation many times, too. Could we have:

`--keep-builder-for-debugging=WHEN` (failed|succeeded, defaults to failed)
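A sketch of what the MOTD mentioned above could look like. `copr-rpmbuild` does take a build ID and a chroot, but treat the exact invocation and file handling as illustrative only:

```python
MOTD_TEMPLATE = """\
This builder was kept alive for debugging build #{build_id} ({chroot}).
It will be terminated automatically at {deadline}.

To reproduce the build manually, run:

    copr-rpmbuild --verbose --build-id {build_id} --chroot {chroot}
"""

def write_motd(build_id, chroot, deadline, path="/etc/motd"):
    """Drop reproduce-instructions into the builder's MOTD (sketch only)."""
    with open(path, "w") as motd:
        motd.write(MOTD_TEMPLATE.format(
            build_id=build_id, chroot=chroot, deadline=deadline))
```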


> Alternatively, we could allow users to specify a public URL to their
> SSH key, upload it, or copy-paste it to some Copr form, but I don't
> like any of these options.

Member: Agreed. We should also look into the LDAP possibilities for the Red Hat internal Copr. I think we could make a compromise here if automatically fetching the public keys turns out to be too complicated.

> could change the password after logging in, upload their SSH key,
> whatever they want.
>
> Personally, I prefer `Option 2 - Passwords`.

Member: It would have to be stored in some private per-user log, or in a *_private database table :-/ I personally prefer Option 1.


FrostyX (Member, Author): True, I didn't think of that. Agreed that we should go with Option 1 if it is doable. 👍


> We need to make sure that:
>
> - It is not possible to override a status of an already finished build

Member: Can you elaborate? This seems to come "for free"... (either we start a new VM to reproduce the previous failure, or we get the machine post-build, depending on the chosen approach).

> Personally, I prefer `Option 2 - Passwords`.
>
> ## Security

Member: We should make sure that the resalloc ticket ID is recycled immediately after the user finishes debugging.


Member: Hm, actually no - resalloc may re-use the resource with another ticket; the simple solution is to use a randomly generated unique `--sandbox` while taking the ticket.
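A sketch of that idea, shelling out to the resalloc client. In Copr the backend talks to resalloc through its own helpers, so take the exact invocation as illustrative; only the `--sandbox` trick itself comes from the comment above:

```python
import subprocess
import uuid

def take_debug_ticket(tag):
    """Take a resalloc ticket inside a one-off sandbox (sketch).

    The randomly generated sandbox ensures resalloc will not hand this
    resource to any other ticket while the user is debugging.
    """
    sandbox = "ssh-debug-{0}".format(uuid.uuid4())
    ticket = subprocess.check_output(
        ["resalloc", "ticket", "--tag", tag, "--sandbox", sandbox],
        text=True,
    ).strip()
    return ticket, sandbox
```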

praiskup (Member): Awesome proposal, thank you @FrostyX for moving this forward!

lachmanfrantisek left a comment:

This looks really handy!

As mentioned in today's meeting, it might be worth syncing with the Testing Farm folks since they have a very similar feature.

Comment on lines +141 to +143
> Alternatively, we could allow users to specify a public URL to their
> SSH key, upload it, or copy-paste it to some Copr form, but I don't
> like any of these options.


Being able to set custom SSH key(s) would allow Packit to provide this to its users. We can easily let people either configure the public key or take it from GitHub. The workflow I see here:

  1. Someone opens a PR.
  2. A build is triggered.
  3. The build fails; in the results (or as a comment), Packit suggests that the user try this feature.
  4. The user triggers it (e.g. via /packit rerun-with-ssh or some other comment).
  5. Packit uses the Copr Python client to use this feature and sets the user's public SSH key (it should be available via the GitHub API); see the sketch below.
  6. Packit adds a comment with the instructions on how to connect.
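A rough sketch of step 5, assuming the keys are taken from GitHub's public `https://github.com/<user>.keys` endpoint. The `keep_for_ssh` and `ssh_public_keys` build options do not exist in the Copr API and are purely hypothetical here:

```python
import requests
from copr.v3 import Client

def rerun_with_ssh(ownername, projectname, srpm_url, github_user):
    """Resubmit a build with SSH debugging requested (sketch).

    GitHub publishes users' public keys at https://github.com/<user>.keys;
    the `keep_for_ssh` and `ssh_public_keys` build options are hypothetical
    and only illustrate what Packit would need from Copr.
    """
    keys = requests.get(f"https://github.com/{github_user}.keys", timeout=30).text
    client = Client.create_from_config_file()
    return client.build_proxy.create_from_url(
        ownername, projectname, srpm_url,
        buildopts={"keep_for_ssh": True, "ssh_public_keys": keys},  # hypothetical
    )
```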

> config file on copr-frontend.
>
> ## Auth


I wonder if a PAM plugin (sssd?) would work to authenticate using Kerberos; we use it both in Fedora and internally.

> internal Copr by security initiatives.
>
> ## Security


Do you store any secrets on the builders?

FrostyX merged commit 6650cd1 into fedora-copr:main on Nov 2, 2023