Set up a basic automated e2e smoke test #102

qobilidop · 2023-05-30T23:43:34Z

Summary

This PR sets up a basic automated e2e smoke test:

Automated means the whole testing is runnable with a single command.
e2e means it exercises the whole P4 DPDK stack including the p4c compiler (with DPDK backend), DPDK software switch, TDI runtime API, and some other libraries provided in this repo.

Background

In our current experimental usage of the P4 DPDK stack, we sometimes identify issues that could be caught early by a simple e2e smoke test. We think adding these tests to this public repo, and setting up GitHub CI to run them automatically, could be a useful addition to the existing unit tests.

Demo

Build a Docker image with the P4 DPDK stack (can take up to hours):

$ ./tools/docker_build.sh

Run the smoke test in a container:

$ ./tools/docker_run_smoke_test.sh
INFO:root:Run command:
    p4c-dpdk --arch pna --p4runtime-files e2e-test/suites/pna/basicfwd/p4c_gen/p4info.txt --bf-rt-schema e2e-test/suites/pna/basicfwd/p4c_gen/bfrt.json --tdi e2e-test/suites/pna/basicfwd/p4c_gen/tdi.json --context e2e-test/suites/pna/basicfwd/p4c_gen/context.json -o e2e-test/suites/pna/basicfwd/p4c_gen/config.spec e2e-test/suites/pna/basicfwd/main.p4

INFO:root:Using SDE: /home/sde
INFO:root:Using SDE_INSTALL: /home/sde/install
Node Pages Size Total
0    16    2Mb    32Mb
1    16    2Mb    32Mb

Hugepages mounted on /dev/hugepages
INFO:root:Run command:
    sudo -E PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin LD_LIBRARY_PATH=/home/sde/install/lib:/home/sde/install/lib64:/home/sde/install/lib/x86_64-linux-gnu /home/sde/install/bin/bf_switchd --install-dir /home/sde/install --conf-file /home/work/e2e-test/suites/pna/basicfwd/conf_bf_switchd.json

INFO:root:PID: 34
INFO:root:Run command:
    /home/sde/install/bin/bfshell -f /home/work/e2e-test/suites/pna/basicfwd/cmds_bfshell.py

INFO:root:PID: 50
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
.
Sent 1 packets.
..
Sent 1 packets.
.
----------------------------------------------------------------------
Ran 2 tests in 2.321s

OK

For more details, see e2e-test/README.md.

Current status and further plans

The test added in this PR is very basic, but shall serve as a good starting point for test organization.

@lzhzero and I will keep working on adding more tests in follow-up PRs, and reporting issues if we encountered any, until we get a fully GitHub CI automated representative sets of e2e tests.

qobilidop · 2023-06-03T00:09:32Z

Update: This issue is fixed later in this PR. It is related to p4lang/tdi#107. I updated TDI dependency at the time, and couldn't find the enable command. So I simply skipped the enable command below, which resulted in the error. Later, I reverted TDI to its original version in this repo, and added the enable command back, and it works!

Original comment below:

I'm encountering a new issue that could use some help.

Short description

The add_with_xxx command fails with:

TdiTableError: Error: table_entry_add failed on table pipe.MainControl.forwarding. [Object not found]

Relevant logs shown in bf_switchd:

MatchSpec:
priority = 0
num_valid_match_bits:48
num_match_bytes:6
match_value_bits: 0x0000c0a80166
match_mask_bits: 0xffffffffffff
ActionSpec:
action_bmap:01
num_valid_action_data_bits:64
num_action_data_bytes:8
action_data: 0x0000000200000002
get_table_id failed at 693
get_table_id failed at 714 end
2023-06-02 23:54:05.700968 BF_PIPE ERROR - dpdk table id get for table forwarding failed
2023-06-02 23:54:05.701086 BF_PIPE ERROR - not able get table metadata for table forwarding
2023-06-02 23:54:05.701134 BF_PIPE ERROR - dal_table_ent_add failed

How to reproduce?

Use the latest commit in this PR. Follow the instructions in e2e-test/README.md.

For the convenience of a quick look, below I copied the bfshell commands file I used:

tdi_python

# Add ports

for port_id in range(4):
    tdi.port.port.add(
        DEV_PORT=port_id,
        PORT_TYPE="BF_DPDK_TAP",
        PORT_DIR="PM_PORT_DIR_DEFAULT",
        PORT_IN_ID=port_id,
        PORT_OUT_ID =port_id,
        PIPE_IN="pipe",
        PIPE_OUT="pipe",
        MEMPOOL="MEMPOOL0",
        PORT_NAME=f"TAP{port_id}",
        MTU=1500
    )

# Add entries

from netaddr import IPAddress

control = tdi.main.pipe.MainControl
table = control.forwarding
table.add_with_mark_and_forward(
    dst_addr=IPAddress('192.168.1.101'),
    marker=0x01,
    port=1
)
table.add_with_mark_and_forward(
    dst_addr=IPAddress('192.168.1.102'),
    marker=0x02,
    port=2
)

The error occurs on the last two table.add_with_mark_and_forward calls.

jafingerhut · 2023-06-04T20:04:42Z

Are you able to create at least one simple test that passes, that is not blocked by the issue you have found and reported in this comment? #102 (comment)

If so, it might be nice to focus on creating that first simple passing test first, and file a separate issue for the bug you have found.

qobilidop · 2023-06-05T17:27:40Z

Thanks Andy for your suggestion! I'll try to bypass this and come to a working test first.

jafingerhut · 2023-06-07T22:52:57Z

@satish153 @swaroopsarma @jamescchoi Can you suggest someone who can (a) review this? and (b) do whatever approval is required for the Github "1 workflow awaiting approval" message I see on this conversation?

fruffy · 2023-06-15T12:58:13Z

I am interested in running this script as part of CI. How much time is spent on setting up dependencies? Realistically, do we require a docker container to run this as part of CI?

qobilidop · 2023-06-15T18:29:31Z

Thanks @fruffy for offering help! That would be great!

I just tried building the Docker image again. It takes 45 min on my computer, and the final image is 4.78 GB. I feel the building time might be too long, and the image size too large to be usable in CI.

Without a Docker container, as long as all the dependencies are installed, the testing scripts shall work. Maybe that's an easier solution on the CI side.

I still think providing a Dockerfile is useful as a reproducible environment to try things out locally, even if it's not used in the CI. I feel the best middle ground might be what's been done in the p4c repo, that the Dockerfile is simply invoking the ci-build.sh script, which is also used in CI to build required dependencies.

One challenge I faced in setting up dependencies is how to best bring in p4c. Building p4-dpdk-target itself doesn't require p4c, but in the e2e tests p4c is needed. My current solution is to use the public p4lang/p4c image as the base image, so that p4c is automatically available. This is probably not ideal, because the p4c version is not fixed, but sometimes that might matter. If we do want to fix p4c version in testing, then it seems building p4c from source is the only option?

I'd like to continue with more detailed discussions on this and some other issues, but I feel getting them all resolved in this PR might be too much. So, I'm proposing limiting the scope of this PR to a single working test that is automated locally, and expect more work to be done before it is fully automated in CI. Hopefully this could also make reviewing this PR easier.

src/pipe_mgr/shared/dal/dpdk/dal_init.c

swaroopsarma

apart from return status removing in dal_init.c, rest all looks fine

The final image is 5.3GB. That seems too large. There should be room for optimization.

The testing scripts are not finished yet. But I ran into some issues that might need to be resolved before I can move on.

TDI is updated to the latest commit (on Mar 1, 2023). DPDK is updated to the latest release (v23.03).

There are still lots of issues to solve before this smoke test becomes meaningful enough. But this could be a good starting point.

This brings back the enable() method that seems to be a crucial step in setting up the pipeline. But this also results in some errors. Will try to fix them.

This is a temporary solution though.

It works this time!

swaroopsarma · 2023-06-15T19:05:00Z

Just rebased the pull request

fruffy · 2023-06-16T12:14:23Z

Without a Docker container, as long as all the dependencies are installed, the testing scripts shall work. Maybe that's an easier solution on the CI side.

Sound great! 45 is also tolerable and we can likely accelerate this.

One challenge I faced in setting up dependencies is how to best bring in p4c. Building p4-dpdk-target itself doesn't require p4c, but in the e2e tests p4c is needed. My current solution is to use the public p4lang/p4c image as the base image, so that p4c is automatically available. This is probably not ideal, because the p4c version is not fixed, but sometimes that might matter. If we do want to fix p4c version in testing, then it seems building p4c from source is the only option?

Hmm, if we use ccache, building p4c from scratch can take around 5-8 minutes. We might consider that. We can also disable all back ends except the DPDK back end.

I'd like to continue with more detailed discussions on this and some other issues, but I feel getting them all resolved in this PR might be too much. So, I'm proposing limiting the scope of this PR to a single working test that is automated locally, and expect more work to be done before it is fully automated in CI. Hopefully this could also make reviewing this PR easier.

Yes, this was not a request, mostly curiosity. It will take a bit of time until we get to an automated testing run.

jafingerhut · 2023-06-22T18:15:25Z

@swaroopsarma Can you take another look at this to see if it looks good to you now? If I understand correctly, with the latest commit (number 11) on this PR, there are only new log messages output in the dal_init.c code, but no other functional changes.

qobilidop · 2023-06-22T23:37:11Z

@jafingerhut Thanks for the ping and sorry about the premature pushes. I added 2 more commits and now they should be ready for another review.

Also thank you @swaroopsarma for your previous review. Wish to hear more from you soon.

chrispsommers · 2023-06-23T18:26:13Z

This is probably not ideal, because the p4c version is not fixed, but sometimes that might matter. If we do want to fix p4c version in testing, then it seems building p4c from source is the only option?

HI @qobilidop This is easy to solve, you can docker pull a specific SHA of any docker image instead of using e.g. :latest or :stable, e.g. here is the version of p4c:latest on 2022-08-19. You can navigate dockerhub and look at the tag versions, see example then get the SHA from there. You can pull or run this image directly using @sha<sha code>: instead of the normal tag, for example:

docker pull p4lang/p4c@sha256:f334bfa3f3f5fc92c3935dc88dc2dbd35a17b1d332fcb98041d5a660c666b2dc

This is how you can pin a version. Just document in any scripts what the SHA corresponds to, e.g. # This is same as p4c:latest on 2023-06-23

BTW you can also use this to build your own dockerfiles using a specific base image, I do this in the SONiC-DASH project. Here's an example Dockerfile which I base on a specific p4c docker imaghe. I then make a new docker image which is smaller by throwing away unneeded backends. Then I publish this smaller custom p4c image to another container registry which a CI pipeline or user dev environment can pull down. I saved a few hundred MB this way. https://github.com/sonic-net/DASH/blob/main/dash-pipeline/dockerfiles/Dockerfile.p4c-bmv2

chrispsommers · 2023-06-23T18:32:05Z

To avoid long docker build times, why don't you automate the docker build and publish workflow similar to many other p4lang repos, e.g. https://github.com/p4lang/p4c/blob/main/.github/workflows/ci-container-image.yml ?

Everything gets built and published to Dockerhub daily. This will save hours, and perhaps attract more users because the initial user experience can be a deal-breaker. Strive for 5 Minutes TFD (Tine for First Dopamine).

qobilidop · 2023-06-23T19:13:45Z

Thanks @chrispsommers! Those are great suggestions. I didn't realize it could be that easy. The example Dockerfile you linked is especially useful. I see there is also a Dockerfile.p4c-dpdk. I'll learn from those examples and improve the Dockerfile in this PR.

As to publishing Docker images using GitHub CI automatically, I completely agree with you that this would be a great thing to have. I'm just not sure if I am the right person to set it up. Because it seems this would require more permissions (both on GitHub and DockerHub) than I currently have. I'm willing to do the work, but it might be easier for someone already having those permissions to set this up in a follow-up PR. @swaroopsarma @jafingerhut @fruffy What do you think?

chrispsommers · 2023-06-23T19:33:55Z

I'm just not sure if I am the right person to set it up. Because it seems this would require more permissions (both on GitHub and DockerHub) than I currently have. I'm willing to do the work, but it might be easier for someone already having those permissions to set this up in a follow-up PR. @swaroopsarma @jafingerhut @fruffy What do you think?

@qobilidop while developing it in your forked dev branch, you can take a few temporary measures to develop it as far as possible w/o involving others, e.g.:

comment out the final login/publish steps in the .yml CI file, this way you can debug the build process itself
change the Docker repo to a personal one on Dockerhub, I did this for DASH and we actually used it for several months in production until we migrated the docker registry to ACR.

Actually puling the trigger to publish to p4lang/p4-dpdk docker registry would probably require the p4-dpdk account admin to install the dockerhub account credentials in the repo secrets (unless they are global for all or p4lang GitHub site).

qobilidop · 2023-06-23T19:36:57Z

Got it. I'll try these initial steps then. Thanks again @chrispsommers!

chrispsommers · 2023-06-23T19:44:37Z

One more piece of advice - any changes you make to this branch will trigger CI actions in the p4lang/p4-dpdk GitHub pipeline, because the PR is already opened. If you want to try out a new CI script or other potentially messy experiments without sharing with the entire world, you can make a scratch branch off of it in your workspace (git checkout -b some-name) and push to your own repo. There you can do messy experiments while you're trying CI actions, etc. in relative privacy. You must enable Git Actions in your repo, they're off by default. Then you can merge back to the PR branch and the same actions will get triggered in the PR branch. I wish I'd thought of this before a few long, embarrassing sessions. :)

fruffy · 2023-06-23T20:10:38Z

Thanks @chrispsommers! Those are great suggestions. I didn't realize it could be that easy. The example Dockerfile you linked is especially useful. I see there is also a Dockerfile.p4c-dpdk. I'll learn from those examples and improve the Dockerfile in this PR.

As to publishing Docker images using GitHub CI automatically, I completely agree with you that this would be a great thing to have. I'm just not sure if I am the right person to set it up. Because it seems this would require more permissions (both on GitHub and DockerHub) than I currently have. I'm willing to do the work, but it might be easier for someone already having those permissions to set this up in a follow-up PR. @swaroopsarma @jafingerhut @fruffy What do you think?

Iiirc there is no formal process. The docker images are maintained on a voluntary basis. I believe @antoninbas has access to the p4lang Dockerhub.

@chrispsommers

Size is reduced from 4.78GB to 1GB! Thanks @chrispsommers for the suggestion.

qobilidop · 2023-06-26T23:09:30Z

@chrispsommers I've adopted the multi-stage builds technique from the examples you linked to. It helps reducing docker image size from 4.78GB to 1GB, and building time from 45min to 13min. Very impressive!

For the docker image CI setup, I still plan to do the work, but after second thoughts, I again propose to leave it to a follow-up PR, so we can leave this already-long PR more focused (and potentially easier to review). I can work on this in my fork without waiting for this PR to be merged. So that's not a blocker.

@swaroopsarma I've marked this PR as ready for review again. The main change needing your attention (based on your last review comments) is the diff in dal_init.c. I have come up with a new workaround, and would like to know if it is good enough this time.

@jafingerhut @fruffy The bmv2 ptf testing work you mentioned in another meeting sounds pretty relevant here. I'm interested to learn how it works if you can share a link, and see if I can help with integrating with it in a follow-up PR.

Thanks everyone!

fruffy · 2023-06-27T00:43:29Z

@jafingerhut @fruffy The bmv2 ptf testing work you mentioned in another meeting sounds pretty relevant here. I'm interested to learn how it works if you can share a link, and see if I can help with integrating with it in a follow-up PR.

I believe Andy did a writeup here. The main advantage is the direct use of Protobuf messages. I do not think this setup requires the shell. Correct me if I am wrong, Andy.

I am also working with a student at NYU (@Hoooao) on an automated setup for p4c, which uses P4Testgen to generate PTF tests for the SoftNIC and runs them. We were thinking of using your automated workflow and then combining it with the PTF tests setup. But this is still work in progress.

chrispsommers · 2023-06-27T00:50:06Z

I am also working with a student at NYU (@Hoooao) on an automated setup for p4c, which uses P4Testgen to generate PTF tests for the SoftNIC and runs them. We were thinking of using your automated workflow and then combining it with the PTF tests setup. But this is still work in progress.

@fruffy Interesting coincidence, I'd just messaged @jafingerhut about a proposal to include p4tools in all nightly p4c docker build/publish CI runs. It would make using p4testgen more convenient, just pull a pre-built docker. I'll file a PR shortly.

qobilidop · 2023-06-27T01:00:19Z

Thanks @fruffy for the information. That's great to know! This kind of test infra surely will be very useful. I look forward to you and @Hoooao's work. I'll read Andy's writeup for my own learning then.

chrispsommers · 2023-06-27T01:10:25Z

@fruffy I've submitted PR p4lang/p4c#4049 to incorporate p4tools (including p4testgen) into Dockerfile builds which will eliminate the need to build from scratch unless you're developing the backend itself. Consumers can just docker pull p4lang/p4c and use it. It needs a review.

jafingerhut · 2023-06-27T01:43:51Z

I believe Andy did a writeup here. The main advantage is the direct use of Protobuf messages. I do not think this setup requires the shell. Correct me if I am wrong, Andy.

My PTF tests use a Python library called p4runtime-shell just out of personal choice: https://github.com/p4lang/p4runtime-shell

Any method of creating+sending, or receiving+parsing, the required P4Runtime Protobuf messages in Python will work.

src/pipe_mgr/shared/dal/dpdk/dal_init.c

swaroopsarma

looks good

swaroopsarma · 2023-06-28T05:56:09Z

@qobilidop and @jafingerhut can we merge this PR ?

qobilidop · 2023-06-28T06:00:21Z

I'm okay with merging it. And thanks for reviewing!

jafingerhut · 2023-06-28T06:12:06Z

@swaroopsarma I have no objections to merging it. Please do not count on me for a technical review of these changes -- that would be a job best requested of others who know this repo contents better than I do.

I am not sure if you were asking me to do perform the merge, but if you were, I do not have permissions on this repository to do that.

qobilidop marked this pull request as draft May 30, 2023 23:44

qobilidop changed the title ~~(WIP) Set up a basic e2e smoke test~~ Set up a basic automated e2e smoke test Jun 7, 2023

qobilidop marked this pull request as ready for review June 7, 2023 17:47

swaroopsarma reviewed Jun 15, 2023

View reviewed changes

src/pipe_mgr/shared/dal/dpdk/dal_init.c Show resolved Hide resolved

swaroopsarma reviewed Jun 15, 2023

View reviewed changes

qobilidop added 10 commits June 15, 2023 12:04

Add a Dockerfile for a reproducible environment

4a66f2a

The final image is 5.3GB. That seems too large. There should be room for optimization.

Add a simple testing pipeline.

3f84d18

The testing scripts are not finished yet. But I ran into some issues that might need to be resolved before I can move on.

Update TDI and DPDK submodules.

8aa4e43

TDI is updated to the latest commit (on Mar 1, 2023). DPDK is updated to the latest release (v23.03).

Convert ad-hoc shell scripts into reusable Python scripts

f52793c

Add the bfshell command

610dd88

Set up a basic automated smoke test

a056ba2

There are still lots of issues to solve before this smoke test becomes meaningful enough. But this could be a good starting point.

Switch back to original versions of DPDK and TDI in this repo

85a4ab2

This brings back the enable() method that seems to be a crucial step in setting up the pipeline. But this also results in some errors. Will try to fix them.

Make the test case pass

7f36e98

This is a temporary solution though.

General refinement

97f680c

Add forwarding table back to the P4 pipeline

e86daa3

It works this time!

Log command and returned status

f8b9480

qobilidop added 2 commits June 22, 2023 20:15

Use zlog-cfg file to configure log level instead of hard coding

ecdbba4

Switch to checking file existence as a temporary workaround

b1bb5a5

qobilidop marked this pull request as draft June 23, 2023 19:37

Use multi-stage build to reduce Docker image size

24a215b

Size is reduced from 4.78GB to 1GB! Thanks @chrispsommers for the suggestion.

qobilidop marked this pull request as ready for review June 26, 2023 23:09

lzhzero reviewed Jun 27, 2023

View reviewed changes

src/pipe_mgr/shared/dal/dpdk/dal_init.c Show resolved Hide resolved

swaroopsarma approved these changes Jun 27, 2023

View reviewed changes

swaroopsarma merged commit b5c7770 into p4lang:main Jun 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set up a basic automated e2e smoke test #102

Set up a basic automated e2e smoke test #102

qobilidop commented May 30, 2023 •

edited

Loading

qobilidop commented Jun 3, 2023 •

edited

Loading

jafingerhut commented Jun 4, 2023

qobilidop commented Jun 5, 2023

jafingerhut commented Jun 7, 2023

fruffy commented Jun 15, 2023 •

edited

Loading

qobilidop commented Jun 15, 2023

swaroopsarma left a comment

swaroopsarma commented Jun 15, 2023

fruffy commented Jun 16, 2023 •

edited

Loading

jafingerhut commented Jun 22, 2023

qobilidop commented Jun 22, 2023

chrispsommers commented Jun 23, 2023

chrispsommers commented Jun 23, 2023

qobilidop commented Jun 23, 2023

chrispsommers commented Jun 23, 2023

qobilidop commented Jun 23, 2023

chrispsommers commented Jun 23, 2023

fruffy commented Jun 23, 2023

qobilidop commented Jun 26, 2023

fruffy commented Jun 27, 2023

chrispsommers commented Jun 27, 2023

qobilidop commented Jun 27, 2023

chrispsommers commented Jun 27, 2023

jafingerhut commented Jun 27, 2023

swaroopsarma left a comment

swaroopsarma commented Jun 28, 2023

qobilidop commented Jun 28, 2023

jafingerhut commented Jun 28, 2023

Set up a basic automated e2e smoke test #102

Set up a basic automated e2e smoke test #102

Conversation

qobilidop commented May 30, 2023 • edited Loading

Summary

Background

Demo

Current status and further plans

qobilidop commented Jun 3, 2023 • edited Loading

Short description

How to reproduce?

jafingerhut commented Jun 4, 2023

qobilidop commented Jun 5, 2023

jafingerhut commented Jun 7, 2023

fruffy commented Jun 15, 2023 • edited Loading

qobilidop commented Jun 15, 2023

swaroopsarma left a comment

Choose a reason for hiding this comment

swaroopsarma commented Jun 15, 2023

fruffy commented Jun 16, 2023 • edited Loading

jafingerhut commented Jun 22, 2023

qobilidop commented Jun 22, 2023

chrispsommers commented Jun 23, 2023

chrispsommers commented Jun 23, 2023

qobilidop commented Jun 23, 2023

chrispsommers commented Jun 23, 2023

qobilidop commented Jun 23, 2023

chrispsommers commented Jun 23, 2023

fruffy commented Jun 23, 2023

qobilidop commented Jun 26, 2023

fruffy commented Jun 27, 2023

chrispsommers commented Jun 27, 2023

qobilidop commented Jun 27, 2023

chrispsommers commented Jun 27, 2023

jafingerhut commented Jun 27, 2023

swaroopsarma left a comment

Choose a reason for hiding this comment

swaroopsarma commented Jun 28, 2023

qobilidop commented Jun 28, 2023

jafingerhut commented Jun 28, 2023

qobilidop commented May 30, 2023 •

edited

Loading

qobilidop commented Jun 3, 2023 •

edited

Loading

fruffy commented Jun 15, 2023 •

edited

Loading

fruffy commented Jun 16, 2023 •

edited

Loading