issue with 4.21f ceos #10

Open
hyson007 opened this issue Mar 22, 2020 · 14 comments

@hyson007

hyson007 commented Mar 22, 2020

I get the error below when starting a pod:
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"Cli\": executable file not found in $PATH": unknown
According to Arista's recent cEOS-lab README, systemd.setenv arguments need to be passed along with /sbin/init, but class CEOS appears to pass only environment variables. (I tried concatenating them into self.command, but that didn't work.)

Create docker instances with the needed environment variables:
docker create --name=ceos1 --privileged -e INTFTYPE=eth -e ETBA=1 -e SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT=1 -e CEOS=1 -e EOS_PLATFORM=ceoslab -e container=docker -i -t ceosimage:4.21.0F /sbin/init systemd.setenv=INTFTYPE=eth systemd.setenv=ETBA=1 systemd.setenv=SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT=1 systemd.setenv=CEOS=1 systemd.setenv=EOS_PLATFORM=ceoslab systemd.setenv=container=docker

@hyson007
Author

Updating self.command in class CEOS to the following resolved the issue for me:

['/sbin/init', 'systemd.setenv=INTFTYPE=eth', 'systemd.setenv=ETBA=1', 'systemd.setenv=SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT=1', 'systemd.setenv=CEOS=1', 'systemd.setenv=EOS_PLATFORM=ceoslab', 'systemd.setenv=container=docker']
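
The list above repeats every environment variable as a systemd.setenv argument, so it could be derived from the environment rather than typed out by hand. A minimal sketch, assuming the variables live in a plain dict; `CEOS_ENV` and `build_ceos_command` are hypothetical names for illustration, not part of docker-topo or k8s-topo:

```python
# Environment variables copied from the docker create example in this thread.
CEOS_ENV = {
    "INTFTYPE": "eth",
    "ETBA": "1",
    "SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT": "1",
    "CEOS": "1",
    "EOS_PLATFORM": "ceoslab",
    "container": "docker",
}


def build_ceos_command(env, init="/sbin/init"):
    """Return init followed by one systemd.setenv=KEY=VALUE argument per variable.

    Hypothetical helper: it only mirrors the list posted in this thread.
    """
    return [init] + [f"systemd.setenv={key}={value}" for key, value in env.items()]


command = build_ceos_command(CEOS_ENV)
print(" ".join(command))
```

Deriving the arguments from the same dict that feeds the container environment keeps the two lists from drifting apart when a variable is added or renamed.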

@networkop
Owner

Sounds right. Do you want to open a pull request?

@hyson007
Author

There still seems to be an issue with dynamic routing protocols: I'm unable to bring up OSPF. I'm working with Arista TAC on case 199976 and will update here once I have more info.

sw-1(config-router-ospf)#end
sw-1#sh ip os nei

% Internal error
% To see the details of this error, run the command 'show error 1'
sw-1#sh ip os nei
! OSPF inactive
sw-1#sh ip os nei
! OSPF inactive
sw-1#sh ip os nei

% Internal error
% To see the details of this error, run the command 'show error 2'

@networkop
Owner

I think this is because you need to have at least one Ethernet interface in the up/up state.

@hyson007
Author

Nope, I do have L3 interfaces up/up and they can ping each other, but OSPF can't be brought up; show logging says rib is continuously crashing.
TAC was able to reproduce the issue, and OSPF works when they use an SVI. They say this bug is the cause:

BUG397410 affects all EOS versions.
Kernel interfaces are the interfaces on which the VMs are installed.

Our Engineering team is working on this bug fix.
As of now the work around would be to create an SVI, and have Ospf neighborship on a SVI instead of an ethernet interface.

However, the old version, 4.20.5F, doesn't seem to have this issue. I will update once I have more info.

@hyson007
Author

(I did encounter the scenario you mentioned, where no Ethernet interface shows up in cEOS; in that case I couldn't even enable 'ip routing'. This time it seems different: I can at least enable 'ip routing'.)

@vparames86

I updated self.command, but I'm still getting the issue with version 4.22.1F:

kubectl exec -it arista01-5f4dcbdf77-99h9x Cli
Defaulting container name to router.
Use 'kubectl describe pod/arista01-5f4dcbdf77-99h9x -n default' to see all of the containers in this pod.
OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: "Cli": executable file not found in $PATH": unknown
command terminated with exit code 126

@networkop
Owner

You can check that command here: https://github.com/networkop/docker-topo/blob/master/bin/docker-topo#L416

@vparames86

@networkop - I'm still getting this issue when running it in a k8s cluster. I have no issues when I launch it as a separate Docker container. I tried different Arista cEOS images, and all of them have the problem when launched in a k8s cluster. I can get to bash but not Cli. After logging in to bash I ran "ps -ef" to check the running processes, but nothing is running apart from init and my own shell. In the one I launched as a separate Docker container, I can see all the processes running.

bash-4.3# ps -ef
UID   PID  PPID  C  STIME  TTY    TIME      CMD
root    1     0  0  10:38  ?      00:00:00  /sbin/init systemd.setenv=INTFTYPE=eth systemd.setenv=ETBA=1 systemd.setenv=SKIP_ZEROTOUCH_BARRIER_IN_S
root    6     0  0  10:39  pts/0  00:00:00  bash
root   14     6  0  10:40  pts/0  00:00:00  ps -ef

kubectl describe pod arista05-bb8dcbf6b-mkn7m
Name:           arista05-bb8dcbf6b-mkn7m
Namespace:      default
Priority:       0
Node:           k8s-agentpool-24376997-vmss000004/10.240.0.125
Start Time:     Thu, 02 Apr 2020 03:38:06 -0700
Labels:         app=aristatopo03
                device=arista05
                pod-template-hash=bb8dcbf6b
Annotations:    kubernetes.io/psp: privileged
Status:         Running
IP:             10.240.0.127
IPs:
  IP:           10.240.0.127
Controlled By:  ReplicaSet/arista05-bb8dcbf6b
Containers:
  router:
    Container ID:  docker://285f718f4a04add8a9ce74fce60ad2ebea26081eaede75578ce5e9dd24603b82
    Image:         ccevirtnetpperegistry.azurecr.io/ceosimage:4.21.10M
    Image ID:      docker-pullable://ccevirtnetpperegistry.azurecr.io/ceosimage@sha256:9c1867f3e5f2e539f2a521f4ba443906ec2e6b5972cb6dc1b1e6faa902efe977
    Port:
    Host Port:
    Command:
      /sbin/init
      systemd.setenv=INTFTYPE=eth
      systemd.setenv=ETBA=1
      systemd.setenv=SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT=1
      systemd.setenv=CEOS=1
      systemd.setenv=container=docker
      systemd.setenv=EOS_PLATFORM=ceoslab
    State:          Running
      Started:      Thu, 02 Apr 2020 03:38:09 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:  2
    Requests:
      cpu:     1
      memory:  2Gi
    Environment:
      CEOS:                                 1
      EOS_PLATFORM:                         ceoslab
      container:                            docker
      ETBA:                                 1
      SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT:  1
      INTFTYPE:                             eth
    Mounts:
      /mnt/azure from startup-config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wrqkj (ro)
  nse-sidecar:
    Container ID:  docker://8bda0802eed6ee652aaf48cd2181f4d60a31c5972760181ba2b0f3204fac494b
    Image:         networkservicemesh/topology-sidecar-nse:master
    Image ID:      docker-pullable://networkservicemesh/topology-sidecar-nse@sha256:e7a949655cf3759e10fd777c7e9e367640276b871f465bc80c9aabdfd95bf1f7
    Port:
    Host Port:
    State:          Running
      Started:      Thu, 02 Apr 2020 03:38:10 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      networkservicemesh.io/socket:  1
    Requests:
      networkservicemesh.io/socket:  1
    Environment:
      ENDPOINT_NETWORK_SERVICE:  aristatopo03
      ENDPOINT_LABELS:           device=arista05
      IP_ADDRESS:                10.60.17.48/28
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wrqkj (ro)
  nsc-sidecar:
    Container ID:  docker://574988ff764754c5917a45b4f07a8f1de6fd0d3cbea4d6e97414de758083c67c
    Image:         networkservicemesh/topology-sidecar-nsc:master
    Image ID:      docker-pullable://networkservicemesh/topology-sidecar-nsc@sha256:2b953a76bb548da60313ba5d51973772b61e7096be62722067c019f2aae62934
    Port:
    Host Port:
    State:          Running
      Started:      Thu, 02 Apr 2020 03:38:11 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      networkservicemesh.io/socket:  1
    Requests:
      networkservicemesh.io/socket:  1
    Environment:
      NS_NETWORKSERVICEMESH_IO:  aristatopo03/eth1?link=net-84&peerif=eth2
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wrqkj (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            True
  ContainersReady  True
  PodScheduled     True
Volumes:
  startup-config-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  arista05-pvc
    ReadOnly:   false
  default-token-wrqkj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-wrqkj
    Optional:    false
QoS Class:       Burstable
Node-Selectors:
Tolerations:     networkservicemesh.io/socket:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age    From                                        Message
  ----     ------            ----   ----                                        -------
  Warning  FailedScheduling         default-scheduler                           running "VolumeBinding" filter plugin for pod "arista05-bb8dcbf6b-mkn7m": pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled                default-scheduler                           Successfully assigned default/arista05-bb8dcbf6b-mkn7m to k8s-agentpool-24376997-vmss000004
  Normal   Pulling           3m53s  kubelet, k8s-agentpool-24376997-vmss000004  Pulling image "ccevirtnetpperegistry.azurecr.io/ceosimage:4.21.10M"
  Normal   Pulled            3m52s  kubelet, k8s-agentpool-24376997-vmss000004  Successfully pulled image "ccevirtnetpperegistry.azurecr.io/ceosimage:4.21.10M"
  Normal   Created           3m51s  kubelet, k8s-agentpool-24376997-vmss000004  Created container router
  Normal   Started           3m51s  kubelet, k8s-agentpool-24376997-vmss000004  Started container router
  Normal   Pulling           3m51s  kubelet, k8s-agentpool-24376997-vmss000004  Pulling image "networkservicemesh/topology-sidecar-nse:master"
  Normal   Pulled            3m50s  kubelet, k8s-agentpool-24376997-vmss000004  Successfully pulled image "networkservicemesh/topology-sidecar-nse:master"
  Normal   Created           3m50s  kubelet, k8s-agentpool-24376997-vmss000004  Created container nse-sidecar
  Normal   Started           3m50s  kubelet, k8s-agentpool-24376997-vmss000004  Started container nse-sidecar
  Normal   Pulling           3m50s  kubelet, k8s-agentpool-24376997-vmss000004  Pulling image "networkservicemesh/topology-sidecar-nsc:master"
  Normal   Pulled            3m49s  kubelet, k8s-agentpool-24376997-vmss000004  Successfully pulled image "networkservicemesh/topology-sidecar-nsc:master"
  Normal   Created           3m49s  kubelet, k8s-agentpool-24376997-vmss000004  Created container nsc-sidecar
  Normal   Started           3m49s  kubelet, k8s-agentpool-24376997-vmss000004  Started container nsc-sidecar

@networkop
Owner

I can't see where the error is. @vparames86 can you try launching it as a standalone pod, i.e. outside of k8s-topo?

@vparames86

vparames86 commented Apr 2, 2020

@networkop - Even the standalone pod doesn't seem to work for me. This is the YAML I used. I tried putting all the arguments under command, and also tried keeping only /sbin/init under command and putting the rest under args, but neither seems to work.

apiVersion: v1
kind: Pod
metadata:
  name: arista101
  namespace: default
spec:
  containers:
  - command: ["/sbin/init"]
    args: ["systemd.setenv=INTFTYPE=eth", "systemd.setenv=ETBA=1", "systemd.setenv=SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT=1", "systemd.setenv=CEOS=1", "systemd.setenv=container=docker", "systemd.setenv=EOS_PLATFORM=ceoslab"]
    env:
    - name: CEOS
      value: "1"
    - name: EOS_PLATFORM
      value: ceoslab
    - name: container
      value: docker
    - name: ETBA
      value: "1"
    - name: SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT
      value: "1"
    - name: INTFTYPE
      value: eth
    image: ccevirtnetpperegistry.azurecr.io/ceosimage:4.21.10M
    imagePullPolicy: Always
    name: router
    resources:
      limits:
        cpu: "2"
      requests:
        cpu: "1"
        memory: 2Gi
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
  imagePullSecrets:
  - name: ipevirtnetppereg

Could you please share a pod.yaml that works for you?

@networkop
Owner

This one worked for me:

apiVersion: v1
kind: Pod
metadata:
  name: ceos
spec:
  containers:
  - image: ceos:4.23.2F
    name: ceos
    securityContext:
        privileged: true
        capabilities:
            add:
            - NET_ADMIN
    command: 
    - "/sbin/init"
    args:
    - "systemd.setenv=INTFTYPE=eth"
    - "systemd.setenv=ETBA=1" 
    - "systemd.setenv=SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT=1"
    - "systemd.setenv=CEOS=1"
    - "systemd.setenv=container=docker"
    - "systemd.setenv=EOS_PLATFORM=ceoslab"
    env: 
    - name: CEOS
      value: "1"
    - name: EOS_PLATFORM
      value: "ceoslab"
    - name: container
      value: docker
    - name: SKIP_ZEROTOUCH_BARRIER_IN_SYSDBINIT
      value: "1"
    - name: INTFTYPE
      value: eth

@vparames86

Ah, sec_context = client.V1SecurityContext(privileged=True) is missing from the create_nsm function. That is most probably the issue.

@vparames86

@networkop - This worked, thanks for your help.
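
The visible difference between the failing and the working manifest in this thread is the privileged flag in the container's securityContext. A rough sketch of a check for that, assuming the manifest has already been loaded into a plain Python dict; `unprivileged_containers` is a hypothetical helper for illustration and is not part of k8s-topo or the Kubernetes client:

```python
def unprivileged_containers(pod_manifest):
    """Return the names of containers that do not set securityContext.privileged to true.

    Hypothetical helper: it inspects a pod manifest held as a plain dict.
    """
    containers = pod_manifest.get("spec", {}).get("containers", [])
    return [
        container.get("name", "<unnamed>")
        for container in containers
        if not (container.get("securityContext") or {}).get("privileged", False)
    ]


# Trimmed-down versions of the two manifests posted in this thread.
failing_pod = {
    "spec": {
        "containers": [
            {
                "name": "router",
                "securityContext": {"capabilities": {"add": ["NET_ADMIN"]}},
            }
        ]
    }
}
working_pod = {
    "spec": {
        "containers": [
            {
                "name": "ceos",
                "securityContext": {
                    "privileged": True,
                    "capabilities": {"add": ["NET_ADMIN"]},
                },
            }
        ]
    }
}

print(unprivileged_containers(failing_pod))
print(unprivileged_containers(working_pod))
```

A check like this only flags the missing field; whether a given cluster allows privileged pods at all depends on its own admission policy.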
