
hostport registry: add NodeObservability hostnetwork ports #1245

Conversation

@alebedev87 (Contributor) commented Sep 20, 2022

NodeObservability Operator is an add-on operator (outside the OCP payload); it deploys an agent (a DaemonSet) on the nodes which need to be observed. The observation is currently limited to CRI-O and Kubelet profiling probes (standard golang pprof endpoints), which are triggered via the agent's HTTPS endpoint: Operator --HTTPS--> Agent on the node --UnixDomainSocket/HTTPS--> CRI-O/Kubelet pprof endpoint. The agent runs as a privileged container to be able to mount the unix domain socket of CRI-O. This also led to a dedicated SCC with quite wide permissions (almost cluster-admin).
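For context, here is a minimal sketch in Go of the current socket-based path. The socket location (/var/run/crio/crio.sock) and the pprof route are conventional assumptions for illustration, not details taken from this PR:

```go
// Sketch: fetch a CPU profile from CRI-O's pprof endpoint over its unix
// domain socket. Seeing this socket from a pod is what requires the
// hostPath mount and the privileged container described above.
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
)

func main() {
	// Conventional CRI-O socket path (an assumption for this sketch).
	const socketPath = "/var/run/crio/crio.sock"

	client := &http.Client{
		Transport: &http.Transport{
			// Route every request over the unix domain socket instead of TCP.
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", socketPath)
			},
		},
	}

	// The host in the URL is ignored by the dialer above; only the path
	// matters. /debug/pprof/profile is the standard golang pprof route.
	resp, err := client.Get("http://crio/debug/pprof/profile?seconds=30")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	io.Copy(os.Stdout, resp.Body) // raw pprof protobuf to stdout
}
```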

The idea behind this PR is to reduce the cluster-level permissions of NodeObservability Operator. This can be done by targeting the HTTPS CRI-O pprof endpoint (instead of the socket). This requires host network access, as the CRI-O HTTPS pprof endpoint is bound to the localhost address and cannot receive traffic from the pod network namespace. Enabling host network access for NodeObservability Operator results in 2 occupied host ports, which we would like to make visible by creating this PR.
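To make the trade-off concrete, here is a minimal sketch of the resulting pod spec using the upstream corev1 types; port 9740 (kube-rbac-proxy) is named in the note below, while the agent port value here is a hypothetical placeholder:

```go
// Sketch, not the operator's actual manifest: hostNetwork replaces
// privileged + hostPath, so the container joins the node's network
// namespace and can reach the localhost-bound CRI-O pprof endpoint.
package agent

import corev1 "k8s.io/api/core/v1"

// agentPort is a hypothetical placeholder; the PR text only names 9740.
const agentPort = 9000

func agentPodSpec() corev1.PodSpec {
	return corev1.PodSpec{
		// The pod shares the node's network namespace; no privileged flag
		// and no CRI-O socket hostPath mount are needed anymore.
		HostNetwork: true,
		Containers: []corev1.Container{{
			Name: "node-observability-agent",
			Ports: []corev1.ContainerPort{
				// With hostNetwork, each containerPort occupies the node's
				// port directly -- hence the 2 entries in the registry.
				{Name: "agent", ContainerPort: agentPort},
				{Name: "kube-rbac-proxy", ContainerPort: 9740},
			},
		}},
	}
}
```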

Note: port 9740 can potentially be removed in the future if kube-rbac-proxy gains support for proxying via unix domain socket.

Any ideas/remarks are welcome!

@openshift-ci bot requested review from s-urbaniak and sttts on September 20, 2022 14:26
@openshift-ci bot commented Sep 20, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zaneb for approval by writing /assign @zaneb in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@alebedev87 (Author) commented:

/retest

@danwinship (Contributor) commented:

> The idea behind this PR

well, this PR is just documenting a decision that was/will be made elsewhere...

> reduce the cluster level permissions of NodeObservability Operator

Being hostNetwork brings its own set of possible exploits, especially if the pod in question also has CAP_NET_ADMIN, which I assume your pod does? I wouldn't just assume that trading "can mount the host filesystem" for "is hostNetwork" would be a reduction in permissions...

@alebedev87 (Author) commented Sep 29, 2022

@danwinship : Thanks a lot for taking a look at the PR!

> > The idea behind this PR
>
> well, this PR is just documenting a decision that was/will be made elsewhere...

Yes, the decision is supposed to have been taken already; however, I wanted to have a discussion with OCP network experts, as we are also hesitant about this move. So I thought this could be well approached in the form of a PR.

> > reduce the cluster level permissions of NodeObservability Operator
>
> Being hostNetwork brings its own set of possible exploits. Especially if the pod in question also has CAP_NET_ADMIN, which I assume your pod does? I wouldn't just assume that trading "can mount host filesystem" for "is hostNetwork" would be a reduction in permissions...

That's exactly the question we asked ourselves. We are not quite sure that going hostNetwork is a less privileged "mode" than having a privileged container. However, the operator does create a dedicated SCC to allow this privileged container, which is an additional burden that could have been avoided by going hostNetwork.
As for CAP_NET_ADMIN: we don't use any additional capabilities, and I believe CAP_NET_ADMIN is not part of the default capabilities a container gets. Also, looking at the description of the net_admin capability, I don't see anything our container does from its list; a quick way to verify this from inside the container is sketched below.
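A plain-Go sketch (an illustration, not operator code) of that check: read the effective capability mask from /proc/self/status and test bit 12, which is CAP_NET_ADMIN's capability number in linux/capability.h:

```go
// Sketch: report whether the current process holds CAP_NET_ADMIN by
// decoding the CapEff hex mask from /proc/self/status.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	const capNetAdmin = 12 // CAP_NET_ADMIN in linux/capability.h

	f, err := os.Open("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "CapEff:") {
			continue
		}
		// CapEff is a 64-bit hex bitmask of effective capabilities.
		mask, err := strconv.ParseUint(strings.TrimSpace(strings.TrimPrefix(line, "CapEff:")), 16, 64)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("CAP_NET_ADMIN effective: %v\n", mask&(1<<capNetAdmin) != 0)
		return
	}
}
```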

Would you mind showcasing some potential (or maybe even the most common) exploits for a container which is using the host network?

@alebedev87 force-pushed the node-observability-host-port-registry branch from 80797f3 to dfc66e0 on September 29, 2022 15:23
@openshift-ci bot commented Sep 29, 2022

@alebedev87: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@alebedev87 (Author) commented:

@danwinship: I found a good article which tries to rank (loosely, but in an educated way) the most privileged options a container may have; it also provides examples of potential attack paths: Bad Pods: Kubernetes Pod Privilege Escalation.

Looking at that article, the choice between a "privileged + hostPath" container and a hostNetwork container is slightly biased towards the latter. Taking into account that the operator won't need any create/update/delete permissions on the SCC, I think that the least privileged option is indeed the hostNetwork container.

@danwinship (Contributor) commented:

oh, I missed that it was privileged + host filesystem vs not privileged + hostNetwork. "Not privileged" always wins.
/lgtm

@openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Oct 4, 2022
@alebedev87 (Author) commented:

/hold
Until openshift/node-observability-operator#71 is merged

@openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Oct 11, 2022
@openshift-bot commented:

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci bot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) on Nov 9, 2022
@alebedev87 (Author) commented:

/remove-lifecycle stale

@openshift-ci bot removed the lifecycle/stale label on Nov 14, 2022
@openshift-bot commented:

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci bot added the lifecycle/stale label on Dec 13, 2022
@openshift-merge-robot added the needs-rebase label (indicates that a PR cannot be merged because it has merge conflicts with HEAD) on Dec 13, 2022
@openshift-merge-robot (Contributor) commented:

@alebedev87: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot commented:

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci bot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label on Dec 20, 2022
@openshift-bot commented:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci bot closed this on Dec 28, 2022
@openshift-ci bot commented Dec 28, 2022

@openshift-bot: Closed this PR.

In response to this:

> Rotten enhancement proposals close after 7d of inactivity.
>
> See https://github.com/openshift/enhancements#life-cycle for details.
>
> Reopen the proposal by commenting /reopen.
> Mark the proposal as fresh by commenting /remove-lifecycle rotten.
> Exclude this proposal from closing again by commenting /lifecycle frozen.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
