Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set up a basic automated e2e smoke test #102

Merged
merged 14 commits into from
Jun 28, 2023
Merged

Set up a basic automated e2e smoke test #102

merged 14 commits into from
Jun 28, 2023

Conversation

qobilidop
Copy link
Member

@qobilidop qobilidop commented May 30, 2023

Summary

This PR sets up a basic automated e2e smoke test:

  • Automated means the whole testing is runnable with a single command.
  • e2e means it exercises the whole P4 DPDK stack including the p4c compiler (with DPDK backend), DPDK software switch, TDI runtime API, and some other libraries provided in this repo.

Background

In our current experimental usage of the P4 DPDK stack, we sometimes identify issues that could be caught early by a simple e2e smoke test. We think adding these tests to this public repo, and setting up GitHub CI to run them automatically, could be a useful addition to the existing unit tests.

Demo

Build a Docker image with the P4 DPDK stack (can take up to hours):

$ ./tools/docker_build.sh

Run the smoke test in a container:

$ ./tools/docker_run_smoke_test.sh
INFO:root:Run command:
    p4c-dpdk --arch pna --p4runtime-files e2e-test/suites/pna/basicfwd/p4c_gen/p4info.txt --bf-rt-schema e2e-test/suites/pna/basicfwd/p4c_gen/bfrt.json --tdi e2e-test/suites/pna/basicfwd/p4c_gen/tdi.json --context e2e-test/suites/pna/basicfwd/p4c_gen/context.json -o e2e-test/suites/pna/basicfwd/p4c_gen/config.spec e2e-test/suites/pna/basicfwd/main.p4

INFO:root:Using SDE: /home/sde
INFO:root:Using SDE_INSTALL: /home/sde/install
Node Pages Size Total
0    16    2Mb    32Mb
1    16    2Mb    32Mb

Hugepages mounted on /dev/hugepages
INFO:root:Run command:
    sudo -E PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin LD_LIBRARY_PATH=/home/sde/install/lib:/home/sde/install/lib64:/home/sde/install/lib/x86_64-linux-gnu /home/sde/install/bin/bf_switchd --install-dir /home/sde/install --conf-file /home/work/e2e-test/suites/pna/basicfwd/conf_bf_switchd.json

INFO:root:PID: 34
INFO:root:Run command:
    /home/sde/install/bin/bfshell -f /home/work/e2e-test/suites/pna/basicfwd/cmds_bfshell.py

INFO:root:PID: 50
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
.
Sent 1 packets.
..
Sent 1 packets.
.
----------------------------------------------------------------------
Ran 2 tests in 2.321s

OK

For more details, see e2e-test/README.md.

Current status and further plans

The test added in this PR is very basic, but shall serve as a good starting point for test organization.

@lzhzero and I will keep working on adding more tests in follow-up PRs, and reporting issues if we encountered any, until we get a fully GitHub CI automated representative sets of e2e tests.

@qobilidop qobilidop marked this pull request as draft May 30, 2023 23:44
@qobilidop
Copy link
Member Author

qobilidop commented Jun 3, 2023

Update: This issue is fixed later in this PR. It is related to p4lang/tdi#107. I updated TDI dependency at the time, and couldn't find the enable command. So I simply skipped the enable command below, which resulted in the error. Later, I reverted TDI to its original version in this repo, and added the enable command back, and it works!

Original comment below:

I'm encountering a new issue that could use some help.

Short description

The add_with_xxx command fails with:

TdiTableError: Error: table_entry_add failed on table pipe.MainControl.forwarding. [Object not found]

Relevant logs shown in bf_switchd:

MatchSpec:
priority = 0
num_valid_match_bits:48
num_match_bytes:6
match_value_bits: 0x0000c0a80166
match_mask_bits: 0xffffffffffff
ActionSpec:
action_bmap:01
num_valid_action_data_bits:64
num_action_data_bytes:8
action_data: 0x0000000200000002
get_table_id failed at 693
get_table_id failed at 714 end
2023-06-02 23:54:05.700968 BF_PIPE ERROR - dpdk table id get for table forwarding failed
2023-06-02 23:54:05.701086 BF_PIPE ERROR - not able get table metadata for table forwarding
2023-06-02 23:54:05.701134 BF_PIPE ERROR - dal_table_ent_add failed

How to reproduce?

Use the latest commit in this PR. Follow the instructions in e2e-test/README.md.

For the convenience of a quick look, below I copied the bfshell commands file I used:

tdi_python

# Add ports

for port_id in range(4):
    tdi.port.port.add(
        DEV_PORT=port_id,
        PORT_TYPE="BF_DPDK_TAP",
        PORT_DIR="PM_PORT_DIR_DEFAULT",
        PORT_IN_ID=port_id,
        PORT_OUT_ID =port_id,
        PIPE_IN="pipe",
        PIPE_OUT="pipe",
        MEMPOOL="MEMPOOL0",
        PORT_NAME=f"TAP{port_id}",
        MTU=1500
    )

# Add entries

from netaddr import IPAddress

control = tdi.main.pipe.MainControl
table = control.forwarding
table.add_with_mark_and_forward(
    dst_addr=IPAddress('192.168.1.101'),
    marker=0x01,
    port=1
)
table.add_with_mark_and_forward(
    dst_addr=IPAddress('192.168.1.102'),
    marker=0x02,
    port=2
)

The error occurs on the last two table.add_with_mark_and_forward calls.

@jafingerhut
Copy link
Contributor

Are you able to create at least one simple test that passes, that is not blocked by the issue you have found and reported in this comment? #102 (comment)

If so, it might be nice to focus on creating that first simple passing test first, and file a separate issue for the bug you have found.

@qobilidop
Copy link
Member Author

Thanks Andy for your suggestion! I'll try to bypass this and come to a working test first.

@qobilidop qobilidop changed the title (WIP) Set up a basic e2e smoke test Set up a basic automated e2e smoke test Jun 7, 2023
@qobilidop qobilidop marked this pull request as ready for review June 7, 2023 17:47
@jafingerhut
Copy link
Contributor

@satish153 @swaroopsarma @jamescchoi Can you suggest someone who can (a) review this? and (b) do whatever approval is required for the Github "1 workflow awaiting approval" message I see on this conversation?

@fruffy
Copy link

fruffy commented Jun 15, 2023

I am interested in running this script as part of CI. How much time is spent on setting up dependencies? Realistically, do we require a docker container to run this as part of CI?

@qobilidop
Copy link
Member Author

Thanks @fruffy for offering help! That would be great!

I just tried building the Docker image again. It takes 45 min on my computer, and the final image is 4.78 GB. I feel the building time might be too long, and the image size too large to be usable in CI.

Without a Docker container, as long as all the dependencies are installed, the testing scripts shall work. Maybe that's an easier solution on the CI side.

I still think providing a Dockerfile is useful as a reproducible environment to try things out locally, even if it's not used in the CI. I feel the best middle ground might be what's been done in the p4c repo, that the Dockerfile is simply invoking the ci-build.sh script, which is also used in CI to build required dependencies.

One challenge I faced in setting up dependencies is how to best bring in p4c. Building p4-dpdk-target itself doesn't require p4c, but in the e2e tests p4c is needed. My current solution is to use the public p4lang/p4c image as the base image, so that p4c is automatically available. This is probably not ideal, because the p4c version is not fixed, but sometimes that might matter. If we do want to fix p4c version in testing, then it seems building p4c from source is the only option?

I'd like to continue with more detailed discussions on this and some other issues, but I feel getting them all resolved in this PR might be too much. So, I'm proposing limiting the scope of this PR to a single working test that is automated locally, and expect more work to be done before it is fully automated in CI. Hopefully this could also make reviewing this PR easier.

Copy link
Collaborator

@swaroopsarma swaroopsarma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apart from return status removing in dal_init.c, rest all looks fine

qobilidop added 10 commits June 15, 2023 12:04
The final image is 5.3GB. That seems too large. There should be room for optimization.
The testing scripts are not finished yet. But I ran into some issues that might need to be resolved before I can move on.
TDI is updated to the latest commit (on Mar 1, 2023). DPDK is updated to the latest release (v23.03).
There are still lots of issues to solve before this smoke test becomes meaningful enough. But this could be a good starting point.
This brings back the enable() method that seems to be a crucial step in setting up the pipeline. But this also results in some errors. Will try to fix them.
This is a temporary solution though.
@swaroopsarma
Copy link
Collaborator

Just rebased the pull request

@fruffy
Copy link

fruffy commented Jun 16, 2023

Without a Docker container, as long as all the dependencies are installed, the testing scripts shall work. Maybe that's an easier solution on the CI side.

Sound great! 45 is also tolerable and we can likely accelerate this.

One challenge I faced in setting up dependencies is how to best bring in p4c. Building p4-dpdk-target itself doesn't require p4c, but in the e2e tests p4c is needed. My current solution is to use the public p4lang/p4c image as the base image, so that p4c is automatically available. This is probably not ideal, because the p4c version is not fixed, but sometimes that might matter. If we do want to fix p4c version in testing, then it seems building p4c from source is the only option?

Hmm, if we use ccache, building p4c from scratch can take around 5-8 minutes. We might consider that. We can also disable all back ends except the DPDK back end.

I'd like to continue with more detailed discussions on this and some other issues, but I feel getting them all resolved in this PR might be too much. So, I'm proposing limiting the scope of this PR to a single working test that is automated locally, and expect more work to be done before it is fully automated in CI. Hopefully this could also make reviewing this PR easier.

Yes, this was not a request, mostly curiosity. It will take a bit of time until we get to an automated testing run.

@jafingerhut
Copy link
Contributor

@swaroopsarma Can you take another look at this to see if it looks good to you now? If I understand correctly, with the latest commit (number 11) on this PR, there are only new log messages output in the dal_init.c code, but no other functional changes.

@qobilidop
Copy link
Member Author

@jafingerhut Thanks for the ping and sorry about the premature pushes. I added 2 more commits and now they should be ready for another review.

Also thank you @swaroopsarma for your previous review. Wish to hear more from you soon.

@chrispsommers
Copy link

This is probably not ideal, because the p4c version is not fixed, but sometimes that might matter. If we do want to fix p4c version in testing, then it seems building p4c from source is the only option?

HI @qobilidop This is easy to solve, you can docker pull a specific SHA of any docker image instead of using e.g. :latest or :stable, e.g. here is the version of p4c:latest on 2022-08-19. You can navigate dockerhub and look at the tag versions, see example then get the SHA from there. You can pull or run this image directly using @sha<sha code>: instead of the normal tag, for example:

docker pull p4lang/p4c@sha256:f334bfa3f3f5fc92c3935dc88dc2dbd35a17b1d332fcb98041d5a660c666b2dc

This is how you can pin a version. Just document in any scripts what the SHA corresponds to, e.g. # This is same as p4c:latest on 2023-06-23

BTW you can also use this to build your own dockerfiles using a specific base image, I do this in the SONiC-DASH project. Here's an example Dockerfile which I base on a specific p4c docker imaghe. I then make a new docker image which is smaller by throwing away unneeded backends. Then I publish this smaller custom p4c image to another container registry which a CI pipeline or user dev environment can pull down. I saved a few hundred MB this way. https://github.com/sonic-net/DASH/blob/main/dash-pipeline/dockerfiles/Dockerfile.p4c-bmv2

@chrispsommers
Copy link

To avoid long docker build times, why don't you automate the docker build and publish workflow similar to many other p4lang repos, e.g. https://github.com/p4lang/p4c/blob/main/.github/workflows/ci-container-image.yml ?

Everything gets built and published to Dockerhub daily. This will save hours, and perhaps attract more users because the initial user experience can be a deal-breaker. Strive for 5 Minutes TFD (Tine for First Dopamine).

@qobilidop
Copy link
Member Author

Thanks @chrispsommers! Those are great suggestions. I didn't realize it could be that easy. The example Dockerfile you linked is especially useful. I see there is also a Dockerfile.p4c-dpdk. I'll learn from those examples and improve the Dockerfile in this PR.

As to publishing Docker images using GitHub CI automatically, I completely agree with you that this would be a great thing to have. I'm just not sure if I am the right person to set it up. Because it seems this would require more permissions (both on GitHub and DockerHub) than I currently have. I'm willing to do the work, but it might be easier for someone already having those permissions to set this up in a follow-up PR. @swaroopsarma @jafingerhut @fruffy What do you think?

@chrispsommers
Copy link

I'm just not sure if I am the right person to set it up. Because it seems this would require more permissions (both on GitHub and DockerHub) than I currently have. I'm willing to do the work, but it might be easier for someone already having those permissions to set this up in a follow-up PR. @swaroopsarma @jafingerhut @fruffy What do you think?

@qobilidop while developing it in your forked dev branch, you can take a few temporary measures to develop it as far as possible w/o involving others, e.g.:

  • comment out the final login/publish steps in the .yml CI file, this way you can debug the build process itself
  • change the Docker repo to a personal one on Dockerhub, I did this for DASH and we actually used it for several months in production until we migrated the docker registry to ACR.

Actually puling the trigger to publish to p4lang/p4-dpdk docker registry would probably require the p4-dpdk account admin to install the dockerhub account credentials in the repo secrets (unless they are global for all or p4lang GitHub site).

@qobilidop
Copy link
Member Author

Got it. I'll try these initial steps then. Thanks again @chrispsommers!

@qobilidop qobilidop marked this pull request as draft June 23, 2023 19:37
@chrispsommers
Copy link

One more piece of advice - any changes you make to this branch will trigger CI actions in the p4lang/p4-dpdk GitHub pipeline, because the PR is already opened. If you want to try out a new CI script or other potentially messy experiments without sharing with the entire world, you can make a scratch branch off of it in your workspace (git checkout -b some-name) and push to your own repo. There you can do messy experiments while you're trying CI actions, etc. in relative privacy. You must enable Git Actions in your repo, they're off by default. Then you can merge back to the PR branch and the same actions will get triggered in the PR branch. I wish I'd thought of this before a few long, embarrassing sessions. :)

@fruffy
Copy link

fruffy commented Jun 23, 2023

Thanks @chrispsommers! Those are great suggestions. I didn't realize it could be that easy. The example Dockerfile you linked is especially useful. I see there is also a Dockerfile.p4c-dpdk. I'll learn from those examples and improve the Dockerfile in this PR.

As to publishing Docker images using GitHub CI automatically, I completely agree with you that this would be a great thing to have. I'm just not sure if I am the right person to set it up. Because it seems this would require more permissions (both on GitHub and DockerHub) than I currently have. I'm willing to do the work, but it might be easier for someone already having those permissions to set this up in a follow-up PR. @swaroopsarma @jafingerhut @fruffy What do you think?

Iiirc there is no formal process. The docker images are maintained on a voluntary basis. I believe @antoninbas has access to the p4lang Dockerhub.

Size is reduced from 4.78GB to 1GB! Thanks @chrispsommers for the suggestion.
@qobilidop qobilidop marked this pull request as ready for review June 26, 2023 23:09
@qobilidop
Copy link
Member Author

@chrispsommers I've adopted the multi-stage builds technique from the examples you linked to. It helps reducing docker image size from 4.78GB to 1GB, and building time from 45min to 13min. Very impressive!

For the docker image CI setup, I still plan to do the work, but after second thoughts, I again propose to leave it to a follow-up PR, so we can leave this already-long PR more focused (and potentially easier to review). I can work on this in my fork without waiting for this PR to be merged. So that's not a blocker.

@swaroopsarma I've marked this PR as ready for review again. The main change needing your attention (based on your last review comments) is the diff in dal_init.c. I have come up with a new workaround, and would like to know if it is good enough this time.

@jafingerhut @fruffy The bmv2 ptf testing work you mentioned in another meeting sounds pretty relevant here. I'm interested to learn how it works if you can share a link, and see if I can help with integrating with it in a follow-up PR.

Thanks everyone!

@fruffy
Copy link

fruffy commented Jun 27, 2023

@jafingerhut @fruffy The bmv2 ptf testing work you mentioned in another meeting sounds pretty relevant here. I'm interested to learn how it works if you can share a link, and see if I can help with integrating with it in a follow-up PR.

I believe Andy did a writeup here. The main advantage is the direct use of Protobuf messages. I do not think this setup requires the shell. Correct me if I am wrong, Andy.

I am also working with a student at NYU (@Hoooao) on an automated setup for p4c, which uses P4Testgen to generate PTF tests for the SoftNIC and runs them. We were thinking of using your automated workflow and then combining it with the PTF tests setup. But this is still work in progress.

@chrispsommers
Copy link

I am also working with a student at NYU (@Hoooao) on an automated setup for p4c, which uses P4Testgen to generate PTF tests for the SoftNIC and runs them. We were thinking of using your automated workflow and then combining it with the PTF tests setup. But this is still work in progress.

@fruffy Interesting coincidence, I'd just messaged @jafingerhut about a proposal to include p4tools in all nightly p4c docker build/publish CI runs. It would make using p4testgen more convenient, just pull a pre-built docker. I'll file a PR shortly.

@qobilidop
Copy link
Member Author

Thanks @fruffy for the information. That's great to know! This kind of test infra surely will be very useful. I look forward to you and @Hoooao's work. I'll read Andy's writeup for my own learning then.

@chrispsommers
Copy link

@fruffy I've submitted PR p4lang/p4c#4049 to incorporate p4tools (including p4testgen) into Dockerfile builds which will eliminate the need to build from scratch unless you're developing the backend itself. Consumers can just docker pull p4lang/p4c and use it. It needs a review.

@jafingerhut
Copy link
Contributor

I believe Andy did a writeup here. The main advantage is the direct use of Protobuf messages. I do not think this setup requires the shell. Correct me if I am wrong, Andy.

My PTF tests use a Python library called p4runtime-shell just out of personal choice: https://github.com/p4lang/p4runtime-shell

Any method of creating+sending, or receiving+parsing, the required P4Runtime Protobuf messages in Python will work.

Copy link
Collaborator

@swaroopsarma swaroopsarma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@swaroopsarma
Copy link
Collaborator

@qobilidop and @jafingerhut can we merge this PR ?

@qobilidop
Copy link
Member Author

I'm okay with merging it. And thanks for reviewing!

@jafingerhut
Copy link
Contributor

@swaroopsarma I have no objections to merging it. Please do not count on me for a technical review of these changes -- that would be a job best requested of others who know this repo contents better than I do.

I am not sure if you were asking me to do perform the merge, but if you were, I do not have permissions on this repository to do that.

@swaroopsarma swaroopsarma merged commit b5c7770 into p4lang:main Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants