Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf readRawSample fails when bpf_perf_event_output submit buffer larger than 64k (appending skb) #1633

Open
alban opened this issue Dec 19, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@alban
Copy link
Contributor

alban commented Dec 19, 2024

Describe the bug

Perf events of type PERF_RECORD_SAMPLE contains a header with the 16-bit size, read in readRecord:

// perfEventHeader must match 'struct perf_event_header` in <linux/perf_event.h>.
type perfEventHeader struct {
	Type uint32
	Misc uint16
	Size uint16
}

It is followed by another 32-bit size field, read in readRawSample:

// This must match 'struct perf_event_sample in kernel sources.
type perfEventSample struct {
	Size uint32
}

Unfortunately, it is possible for a bpf program to submit a record larger than 64kB by appending a small event with a 65507-byte UDP packet:

SEC("socket1")
int my_socket_filter(struct __sk_buff *skb)
{
        __u64 skb_len = skb->len;
        bpf_perf_event_output(skb, &events, skb_len << 32 | BPF_F_CURRENT_CPU,
                              &event, sizeof(event));
}

The kernel calculates the header size in perf_sample_data_size. This returns a u32. Then, it just casts it in the u16 field in perf_prepare_header. See also bpf_event_output and perf_sample_save_raw_data.

This means that perfEventHeader->Size has a small number due to integer overflow. And perfEventSample->Size has a >64kB number.

Unfortunately, cilium/ebpf does not notice the inconsistency and I get errors such as:

Error reading perf ring buffer: read sample: unexpected EOF

How to reproduce

Generate a large UDP packet:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main() {
    int max_packet_size = 65507;
    int sockfd;
    struct sockaddr_in servaddr;

    if (getenv("max_packet_size") != NULL) {
        max_packet_size = atoi(getenv("max_packet_size"));
    }

    // Create a UDP socket
    if ((sockfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
        perror("socket creation failed");
        exit(EXIT_FAILURE);
    }

    memset(&servaddr, 0, sizeof(servaddr));

    // Fill server information
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(5353); // Change the port number as needed
    servaddr.sin_addr.s_addr = inet_addr("127.0.0.1"); // Change the IP address as needed

    // Create a large buffer to hold the packet data
    char *buffer = (char *)malloc(max_packet_size);
    if (buffer == NULL) {
        perror("malloc failed");
        exit(EXIT_FAILURE);
    }
    memset(buffer, 'A', max_packet_size - 1);
    buffer[max_packet_size - 1] = '\0';

    // Send the UDP packet
    int n = sendto(sockfd, buffer, max_packet_size,
                   MSG_CONFIRM, (const struct sockaddr *)&servaddr,
                   sizeof(servaddr));
    if (n < 0) {
        perror("sendto failed");
        exit(EXIT_FAILURE);
    }

    free(buffer);
    close(sockfd);
    return 0;
}

Attach a socket filter using cilium/ebpf and read from the perf ring buffer:

SEC("socket1")
int my_socket_filter(struct __sk_buff *skb)
{
        __u64 skb_len = skb->len;
        bpf_perf_event_output(skb, &events, skb_len << 32 | BPF_F_CURRENT_CPU,
                              &event, sizeof(event));
}

Version information

github.com/cilium/ebpf v0.16.0 => github.com/cilium/ebpf v0.16.1-0.20241023154409-d3c63ab2edcb

@florianl
Copy link
Contributor

florianl commented Jan 3, 2025

Thanks @alban for the detailed report 🙏
With your provided information I will try to reproduce it. Do you copy all the max_packet_size bytes into event in my_socket_filter()? And it is not clear to me, why skb_len << 32 | BPF_F_CURRENT_CPU is set as flags to bpf_perf_event_output(). I would expect skb_len << 32 | BPF_F_INDEX_MASK or BPF_F_CURRENT_CPU here instead.

@alban
Copy link
Contributor Author

alban commented Jan 6, 2025

Thanks @florianl for looking into this.

tl;dr: maybe the issues are in the kernel and in my code (and not in cilium/ebpf).

The bpf program itself does not copy any bytes from the packet into event: this is done by the bpf_perf_event_output helper:

We can make use of the passed in skb context that we have in the helper anyway, and use some of the reserved flag bits as a length argument.

torvalds/linux@555c8a8

And it is not clear to me, why skb_len << 32 | BPF_F_CURRENT_CPU is set as flags to bpf_perf_event_output(). I would expect skb_len << 32 | BPF_F_INDEX_MASK or BPF_F_CURRENT_CPU here instead.

BPF_F_INDEX_MASK and BPF_F_CURRENT_CPU have the same values, so your change would have no effect:

/* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and
 * BPF_FUNC_perf_event_read_value flags.
 */
enum {
        BPF_F_INDEX_MASK                = 0xffffffffULL,
        BPF_F_CURRENT_CPU               = BPF_F_INDEX_MASK,
/* BPF_FUNC_perf_event_output for sk_buff input context. */
        BPF_F_CTXLEN_MASK               = (0xfffffULL << 32),
};

AFAIU, the semantics of the u64 flags argument to bpf_perf_event_output is as follows:

---
title: "bpf_perf_event_output(u64 flags)"
---
packet-beta
0-19: "reserved (12 bits)"
20-31: "ctx size (20 bits)"
32-63: "index in map, or BPF_F_CURRENT_CPU (-1) to select the current CPU (32 bits)"
Loading

So my code is meant to say:

  1. Automatically select the index in the map for the current CPU
  2. Append the full network packet to the perf event

There might be several things to address:

On the other hand, I noticed the following code pattern in cilium/ebpf:
https://github.com/cilium/ebpf/blob/v0.17.0/examples/fentry/main.go#L88-L89

        for {
                record, err := rd.Read()
                if err != nil {
                        if errors.Is(err, ringbuf.ErrClosed) {
                                log.Println("received signal, exiting..")
                                return
                        }
                        log.Printf("reading from reader: %s", err)
                        continue
                }

So the error is just ignored, and we "continue" to the next iteration of the loop.

Whereas my code has the following pattern:
https://github.com/inspektor-gadget/inspektor-gadget/blob/v0.36.0/pkg/operators/ebpf/tracer.go#L170-L174

        for {
                rec, err := t.perfReader.Read()
                if err != nil {
                        return err
                }

So the error is not ignored and we stop iterating the loop. I have not yet tested if the userspace reader recovers if we replace the return by a continue.

This is probably why it stopped working (and not due a bug in cilium/ebpf).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants