Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

netdev CI testing #6666

Open
wants to merge 118 commits into
base: bpf-next_base
Choose a base branch
from
Open

Conversation

kuba-moo
Copy link
Contributor

Reusable PR for hooking netdev CI to BPF testing.

@kernel-patches-daemon-bpf kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 3 times, most recently from 4f22ee0 to 8a9a8e0 Compare March 28, 2024 04:46
@kuba-moo kuba-moo force-pushed the to-test branch 11 times, most recently from 64c403f to 8da1f58 Compare March 29, 2024 00:01
@kernel-patches-daemon-bpf kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 3 times, most recently from 78ebb17 to 9325308 Compare March 29, 2024 02:14
@kuba-moo kuba-moo force-pushed the to-test branch 6 times, most recently from c8c7b2f to a71aae6 Compare March 29, 2024 18:01
@kuba-moo kuba-moo force-pushed the to-test branch 2 times, most recently from d8feb00 to b16a6b9 Compare March 30, 2024 00:01
@kuba-moo kuba-moo force-pushed the to-test branch 2 times, most recently from 4164329 to c5cecb3 Compare March 30, 2024 06:00
mmhal and others added 29 commits February 3, 2025 22:00
During socket release, sock_orphan() is called without considering that it
sets sk->sk_wq to NULL. Later, if SO_LINGER is enabled, this leads to a
null pointer dereferenced in virtio_transport_wait_close().

Orphan the socket only after transport release.

Partially reverts the 'Fixes:' commit.

KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
 lock_acquire+0x19e/0x500
 _raw_spin_lock_irqsave+0x47/0x70
 add_wait_queue+0x46/0x230
 virtio_transport_release+0x4e7/0x7f0
 __vsock_release+0xfd/0x490
 vsock_release+0x90/0x120
 __sock_release+0xa3/0x250
 sock_close+0x14/0x20
 __fput+0x35e/0xa90
 __x64_sys_close+0x78/0xd0
 do_syscall_64+0x93/0x1b0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=9d55b199192a4be7d02c
Fixes: fcdd224 ("vsock: Keep the binding until socket destruction")
Signed-off-by: Michal Luczaj <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Explicitly close() a TCP_ESTABLISHED (connectible) socket with SO_LINGER
enabled. May trigger a null pointer dereference.

Signed-off-by: Michal Luczaj <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
NPAR 1.2 adds a transparent VLAN tag for all packets between the NIC
and the switch.  Because of that, RX VLAN acceleration cannot be
supported for any additional host configured VLANs.  The driver has
to acknowledge that it can support no RX VLAN acceleration and
set the NPAR 1.2 supported flag when registering with the FW.
Otherwise, the FW call will fail and the driver will abort on these
NPAR 1.2 NICs with this error:

bnxt_en 0000:26:00.0 (unnamed net_device) (uninitialized): hwrm req_type 0x1d seq id 0xb error 0x2

Reviewed-by: Somnath Kotur <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Add a new bnxt_hwrm_cp_ring_alloc_p5() function to handle allocating
one completion ring on P5_PLUS chips.  This simplifies the existing code
and will be useful later in the series.

Reviewed-by: Ajit Khaparde <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Add a new bnxt_hwrm_tx_ring_alloc() function to handle allocating
a transmit ring.  This will be useful later in the series.

Reviewed-by: Ajit Khaparde <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Add a wrapper routine to free L2 completion rings.  This will be
useful later in the series.

Reviewed-by: Kalesh AP <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Somnath Kotur <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Modify bnxt_free_tx_rings() to free the skbs per TX ring.
This will be useful later in the series.

Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Somnath Kotur <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
There is some common code for setting up RX and RX AGG ring allocation
parameters for P5_PLUS chips.  Refactor the logic into a new function.

Reviewed-by: Hongguang Gao <[email protected]>
Reviewed-by: Ajit Khaparde <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Newer firmware can use the NQ ring ID associated with each RX/RX AGG
ring to enable PCIe Steering Tags on P5_PLUS chips.  When allocating
RX/RX AGG rings, pass along NQ ring ID for the firmware to use.  This
information helps optimize DMA writes by directing them to the cache
closer to the CPU consuming the data, potentially improving the
processing speed.  This change is backward-compatible with older
firmware, which will simply disregard the information.

Reviewed-by: Hongguang Gao <[email protected]>
Reviewed-by: Ajit Khaparde <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Andy Gospodarek <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
In order to program the correct Steering Tag during an IRQ affinity
change, we need to free/re-allocate the RX completion ring during
queue_restart.  If TPH is enabled, call FW to free the Rx completion
ring and clear the ring entries in queue_stop().  Re-allocate it in
queue_start() if TPH is enabled.  Note that TPH mode is not enabled
in this patch and will be enabled later in the patch series.

While modifying bnxt_queue_start(), remove the unnecessary zeroing of
rxr->rx_next_cons.  It gets overwritten by the clone in
bnxt_queue_start().

Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Somnath Kotur <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
In order to use queue_stop/queue_start to support the new Steering
Tags, we need to free the TX ring and TX completion ring if it is a
combined channel with TX/RX sharing the same NAPI.  Otherwise
TX completions will not have the updated Steering Tag.  If TPH is
not enabled, we just stop the TX ring without freeing the TX/TX cmpl
rings.  With that we can now add napi_disable() and napi_enable()
during queue_stop()/ queue_start().  This will guarantee that NAPI
will stop processing the completion entries in case there are
additional pending entries in the completion rings after queue_stop().

There could be some NQEs sitting unprocessed while NAPI is disabled
thereby leaving the NQ unarmed.  Explicitly re-arm the NQ after
napi_enable() in queue start so that NAPI will resume properly.

Error handling in bnxt_queue_start() requires a reset.  If a TX
ring cannot be allocated or initialized properly, it will cause
TX timeout.  The reset will also free any partially allocated
rings.

Reviewed-by: Ajit Khaparde <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Signed-off-by: Somnath Kotur <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Add TPH support to the Broadcom BNXT device driver. This allows the
driver to utilize TPH functions for retrieving and configuring Steering
Tags when changing interrupt affinity. With compatible NIC firmware,
network traffic will be tagged correctly with Steering Tags, resulting
in significant memory bandwidth savings and other advantages as
demonstrated by real network benchmarks on TPH-capable platforms.

Co-developed-by: Somnath Kotur <[email protected]>
Signed-off-by: Somnath Kotur <[email protected]>
Co-developed-by: Wei Huang <[email protected]>
Signed-off-by: Wei Huang <[email protected]>
Signed-off-by: Manoj Panicker <[email protected]>
Reviewed-by: Ajit Khaparde <[email protected]>
Reviewed-by: Andy Gospodarek <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Expected behaviour:
In case we reach scheduler's limit, pfifo_tail_enqueue() will drop a
packet in scheduler's queue and decrease scheduler's qlen by one.
Then, pfifo_tail_enqueue() enqueue new packet and increase
scheduler's qlen by one. Finally, pfifo_tail_enqueue() return
`NET_XMIT_CN` status code.

Weird behaviour:
In case we set `sch->limit == 0` and trigger pfifo_tail_enqueue() on a
scheduler that has no packet, the 'drop a packet' step will do nothing.
This means the scheduler's qlen still has value equal 0.
Then, we continue to enqueue new packet and increase scheduler's qlen by
one. In summary, we can leverage pfifo_tail_enqueue() to increase qlen by
one and return `NET_XMIT_CN` status code.

The problem is:
Let's say we have two qdiscs: Qdisc_A and Qdisc_B.
 - Qdisc_A's type must have '->graft()' function to create parent/child relationship.
   Let's say Qdisc_A's type is `hfsc`. Enqueue packet to this qdisc will trigger `hfsc_enqueue`.
 - Qdisc_B's type is pfifo_head_drop. Enqueue packet to this qdisc will trigger `pfifo_tail_enqueue`.
 - Qdisc_B is configured to have `sch->limit == 0`.
 - Qdisc_A is configured to route the enqueued's packet to Qdisc_B.

Enqueue packet through Qdisc_A will lead to:
 - hfsc_enqueue(Qdisc_A) -> pfifo_tail_enqueue(Qdisc_B)
 - Qdisc_B->q.qlen += 1
 - pfifo_tail_enqueue() return `NET_XMIT_CN`
 - hfsc_enqueue() check for `NET_XMIT_SUCCESS` and see `NET_XMIT_CN` => hfsc_enqueue() don't increase qlen of Qdisc_A.

The whole process lead to a situation where Qdisc_A->q.qlen == 0 and Qdisc_B->q.qlen == 1.
Replace 'hfsc' with other type (for example: 'drr') still lead to the same problem.
This violate the design where parent's qlen should equal to the sum of its childrens'qlen.

Bug impact: This issue can be used for user->kernel privilege escalation when it is reachable.

Fixes: f70f90672a2c ("sched: add head drop fifo queue")
Reported-by: Quang Le <[email protected]>
Signed-off-by: Quang Le <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
…limit==0

When limit == 0, pfifo_tail_enqueue() must drop new packet and
increase dropped packets count of the qdisc.

All test results:

1..16
ok 1 a519 - Add bfifo qdisc with system default parameters on egress
ok 2 585c - Add pfifo qdisc with system default parameters on egress
ok 3 a86e - Add bfifo qdisc with system default parameters on egress with handle of maximum value
ok 4 9ac8 - Add bfifo qdisc on egress with queue size of 3000 bytes
ok 5 f4e6 - Add pfifo qdisc on egress with queue size of 3000 packets
ok 6 b1b1 - Add bfifo qdisc with system default parameters on egress with invalid handle exceeding maximum value
ok 7 8d5e - Add bfifo qdisc on egress with unsupported argument
ok 8 7787 - Add pfifo qdisc on egress with unsupported argument
ok 9 c4b6 - Replace bfifo qdisc on egress with new queue size
ok 10 3df6 - Replace pfifo qdisc on egress with new queue size
ok 11 7a67 - Add bfifo qdisc on egress with queue size in invalid format
ok 12 1298 - Add duplicate bfifo qdisc on egress
ok 13 45a0 - Delete nonexistent bfifo qdisc
ok 14 972b - Add prio qdisc on egress with invalid format for handles
ok 15 4d39 - Delete bfifo qdisc twice
ok 16 d774 - Check pfifo_head_drop qdisc enqueue behaviour when limit == 0

Signed-off-by: Quang Le <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
qdisc_tree_reduce_backlog() notifies parent qdisc only if child
qdisc becomes empty, therefore we need to reduce the backlog of the
child qdisc before calling it. Otherwise it would miss the opportunity
to call cops->qlen_notify(), in the case of DRR, it resulted in UAF
since DRR uses ->qlen_notify() to maintain its active list.

Fixes: f8d4bc4 ("net/sched: netem: account for backlog updates from child qdisc")
Cc: Martin Ottens <[email protected]>
Reported-by: Mingi Cho <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Integrate the test case provided by Mingi Cho into TDC.

All test results:

1..4
ok 1 ca5e - Check class delete notification for ffff:
ok 2 e4b7 - Check class delete notification for root ffff:
ok 3 33a9 - Check ingress is not searchable on backlog update
ok 4 a4b9 - Test class qlen notification

Cc: Mingi Cho <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Add read only access to the 32-entry MAC address TCAM via debugfs.
BMC filtering shares the same table so this is quite useful
to access during debug. See next commit for an example output.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
…en adding unicast addrs

I realized when we were adding unicast addresses we were enabling
promiscous mode. I did a bit of digging and realized we had overlooked
setting the driver private flag to indicate we supported unicast filtering.

Example below shows the table with 00deadbeef01 as the main NIC address,
and 5 additional addresses in the 00deadbeefX0 format.

  # cat $dbgfs/mac_addr
  Idx S TCAM Bitmap       Addr/Mask
  ----------------------------------
  00  0 00000000,00000000 000000000000
                          000000000000
  01  0 00000000,00000000 000000000000
                          000000000000
  02  0 00000000,00000000 000000000000
                          000000000000
  ...
  24  0 00000000,00000000 000000000000
                          000000000000
  25  1 00100000,00000000 00deadbeef50
                          000000000000
  26  1 00100000,00000000 00deadbeef40
                          000000000000
  27  1 00100000,00000000 00deadbeef30
                          000000000000
  28  1 00100000,00000000 00deadbeef20
                          000000000000
  29  1 00100000,00000000 00deadbeef10
                          000000000000
  30  1 00100000,00000000 00deadbeef01
                          000000000000
  31  0 00000000,00000000 000000000000
                          000000000000

Before rule 31 would be active. With this change it correctly sticks
to just the unicast filters.

Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Remove unused flexible-array member `buf` and, with this, fix the following
warnings:
drivers/net/ethernet/aquantia/atlantic/aq_hw.h:197:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/ethernet/aquantia/atlantic/hw_atl/../aq_hw.h:197:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Suggested-by: Igor Russkikh <[email protected]>
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

Move the conflicting declaration to the end of the structure. Notice
that `struct ethtool_dump` is a flexible structure --a structure that
contains a flexible-array member.

Fix the following warning:
./drivers/net/ethernet/chelsio/cxgb4/cxgb4.h:1215:29: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

So, in order to avoid ending up with a flexible-array member in the
middle of other structs, we use the `struct_group_tagged()` helper
to create a new tagged `struct mlx5e_umr_wqe_hdr`. This structure
groups together all the members of the flexible `struct mlx5e_umr_wqe`
except the flexible array.

As a result, the array is effectively separated from the rest of the
members without modifying the memory layout of the flexible structure.
We then change the type of the middle struct member currently causing
trouble from `struct mlx5e_umr_wqe` to `struct mlx5e_umr_wqe_hdr`.

We also want to ensure that when new members need to be added to the
flexible structure, they are always included within the newly created
tagged struct. For this, we use `static_assert()`. This ensures that the
memory layout for both the flexible structure and the new tagged struct
is the same after any changes.

This approach avoids having to implement `struct mlx5e_umr_wqe_hdr` as
a completely separate structure, thus preventing having to maintain two
independent but basically identical structures, closing the door to
potential bugs in the future.

We also use `container_of()` whenever we need to retrieve a pointer to
the flexible structure, through which we can access the flexible-array
member, if necessary.

So, with these changes, fix 124 of the following warnings:

drivers/net/ethernet/mellanox/mlx5/core/en.h:664:48: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
tc_actions.sh keeps hanging the forwarding tests.

sdf@: tdc & tdc-dbg started intermittenly failing around Sep 25th

Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
It won't show up in 6.8 as kmemleak did not report per-cpu allocation
leaks. But even with the latest kernel, it's probabilistic, some data
somewhere may look like a pointer and not be reported (I couldn't
reproduce it on arm64).

It turns out to be a false positive. The __percpu pointers are
referenced from node_data[]. The latter is populated in
alloc_node_data() and kmemleak registers the pg_data_t object from the
memblock allocation. However, due to an incorrect pfn range check
introduced by commit 84c3262 ("mm: kmemleak: check physical address
when scan"), we ignore the node_data[0] allocation. Some printks in
alloc_node_data() show:

	nd_pa = 0x3ffda140
	nd_size = 0x4ec0
	min_low_pfn = 0x0
	max_low_pfn = 0x3ffdf
	nd_pa + nd_size == 0x3ffdf000

So the "PHYS_PFN(nd_pa + nd_size) >= max_low_pfn" check in kmemleak is
true and the whole pg_data_t object ignored (not scanned). The __percpu
pointers won't be detected. The fix is simple:

Signed-off-by: NipaLocal <nipa@local>
On Wed, Jan 29, 2025 at 08:13:02AM +0100, Greg Kroah-Hartman wrote:

> > Both are needed, actually.  Slightly longer term I would rather
> > split full_proxy_{read,write,lseek}() into short and full variant,
> > getting rid of the "check which pointer is non-NULL" and killed
> > the two remaining users of debugfs_real_fops() outside of
> > fs/debugfs/file.c; then we could union these ->..._fops pointers,
> > but until then they need to be initialized.
> >
> > And yes, ->methods obviously needs to be initialized.
> >
> > Al, bloody embarrassed ;-/
>
> No worries, want to send a patch to fix both of these up so we can fix
> up Linus's tree now?

[PATCH] Fix the missing initializations in __debugfs_file_get()

both method table pointers in debugfs_fsdata need to be initialized,
obviously, and calculating the bitmap of present methods would also
go better if we start with initialized state.

Fixes: 41a0ecc "debugfs: get rid of dynamically allocation proxy_ops"
Fucked-up-by: Al Viro <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Disable tests we don't care about, we use alltests in kunit.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
# Conflicts:
#	drivers/firmware/cirrus/Kconfig
#	lib/Kconfig.debug
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.