
refactor: Add firecracker init logic to connector-init #890

Open: wants to merge 17 commits into master from feature/new_task_runtime
Conversation

@jshearer (Contributor) commented on Jan 20, 2023

Introduce support for running tasks (currently captures and materializations, and eventually derivations) in Firecracker, and wire up flowctl-go to be able to use it. This is currently gated behind a hard-coded flag because we can't run it in prod yet.

Outside the VM

Firecracker is a virtual machine monitor designed for Linux, so fundamentally the two things you need to run a VM are a kernel and a userspace.

Kernel

The kernel is provided as a bundled vmlinux.bin file, which is the output of the kernel build process. I've included a fairly recent kernel build in the assets/ folder, as well as a config file that can be passed to https://github.com/anyfiddle/firecracker-kernel-builder to build it from scratch.
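
For reference, when driving Firecracker directly over its API socket, the kernel is handed to it as the "boot source". A minimal sketch, assuming an illustrative socket path and boot args (flow-firecracker does this through firec rather than raw curl):

# Sketch: point Firecracker at the kernel via its API socket.
# Socket path and boot args here are illustrative.
curl --unix-socket /tmp/firecracker.socket -i \
    -X PUT 'http://localhost/boot-source' \
    -H 'Content-Type: application/json' \
    -d '{
          "kernel_image_path": "./assets/vmlinux.bin",
          "boot_args": "console=ttyS0 reboot=k panic=1 pci=off"
        }'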

Filesystems

We mount two filesystems into the VM: the root filesystem, which holds the init program and any required config, and the main filesystem, which holds the unpacked image with the connector entrypoint and anything else it expects to exist.

At the moment, we use the equivalent of the following command to generate a tar file of the specified image:

docker export --output=connector-image.tar $(docker create ghcr.io/connector-image)

We then use virt-make-fs to turn that tar file into an ext4-formatted disk image which we can pass to Firecracker as a mountable drive. This is a fairly wasteful process, especially if we start many VMs from the same image, or even from images that share many of their layers. In the future we should run containerd and use its ability to check out ("lease") filesystems built from a set of layers. I left this out of scope for the initial work, as it's mainly a performance optimization.
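
Concretely, the current image-to-disk pipeline is roughly the following sketch (image name and sizing are illustrative):

# Export the image's flattened filesystem to a tar file.
CONTAINER_ID=$(docker create ghcr.io/connector-image)
docker export --output=connector-image.tar "$CONTAINER_ID"
docker rm "$CONTAINER_ID"

# virt-make-fs (from libguestfs) turns the tar into an ext4 image that
# Firecracker can attach as a block device; --size adds headroom for writes.
virt-make-fs --format=raw --type=ext4 --size=+512M \
    connector-image.tar connector-image.ext4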

Networking

Firecracker deals with networking by attaching to (or creating) a virtual TUN/TAP network device on the host, which acts as a mirror to/from eth0 inside the guest. We have a few goals for guest networking:

  • The guest needs public internet access, since captures and materializations need to talk to their respective databases
  • The guest should not be able to directly dial the host for security isolation reasons
  • The host should be able to dial ports on the guest in order to talk to the GRPC service that connector-init exposes
  • We don't need to worry about public connector networking here, since that's implemented over the GRPC service through connector-init

We decided to leverage the existing ecosystem of CNI plugins to configure networking here: not only does this allow for clear and powerful configuration of NAT, firewalling, IP address allocation, and port mapping, it also supports easy cleanup when a VM is shut down. As an additional layer, all networking config is done inside a network namespace, which is torn down once the VM exits.
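
As a sketch of what that looks like, here is a hypothetical conflist using the plugins this PR relies on, driven through cnitool (network name, namespace name, and paths are illustrative; see the cnitool section below):

# Hypothetical CNI network config combining the plugins we use.
cat > /etc/cni/net.d/flow-vm.conflist <<'EOF'
{
  "cniVersion": "0.4.0",
  "name": "flow-vm",
  "plugins": [
    { "type": "ptp", "ipam": { "type": "host-local", "subnet": "192.168.200.0/24" } },
    { "type": "firewall" },
    { "type": "tc-redirect-tap" }
  ]
}
EOF

# Create a network namespace for the VM and wire it up.
ip netns add flow-vm-0
CNI_PATH=/opt/cni/bin NETCONFPATH=/etc/cni/net.d \
    cnitool add flow-vm /var/run/netns/flow-vm-0

# ...run the VM...

# Teardown is the symmetric operation.
CNI_PATH=/opt/cni/bin NETCONFPATH=/etc/cni/net.d \
    cnitool del flow-vm /var/run/netns/flow-vm-0
ip netns del flow-vm-0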

Inside the VM

Part of a regular Linux userspace is the init program, which is responsible for setting up things like special mounts and then executing the actual entrypoint.

As it turns out, connector-init was already written with this purpose in mind: it exposes a GRPC API to invoke the real connector entrypoint, stream back its output, and so on. So, to use it as the init program for our Firecracker VM, I needed to teach it all of the Linux boot-time setup that a regular init would do (sketched after this list):

  • Set up and mount the main filesystem containing the connector entrypoint and associated files
  • Set up a bunch of special devices under /dev, /proc, /sys
  • Set up networking so that the VM knows where its gateway is and all that
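
As a rough sketch, that boot-time work is the moral equivalent of the following shell (connector-init makes the corresponding syscalls directly; device names, mount points, and addresses are illustrative):

# Special filesystems a minimal init is expected to mount.
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t devtmpfs devtmpfs /dev

# Mount the main filesystem holding the unpacked connector image.
mount -t ext4 /dev/vdb /mnt/image

# Bring up eth0 and point the default route at the host-side gateway.
ip addr add 192.168.200.2/24 dev eth0
ip link set eth0 up
ip route add default via 192.168.200.1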

flow-firecracker

To coordinate all of the setup and teardown described above, I wrote the flow-firecracker binary. It takes a (Docker) image name, along with a kernel, an init program, and other options, runs everything inside a Firecracker VM, and tears it all down at the end.

firecracker-runtime 0.0.0

USAGE:
    flow-firecracker [OPTIONS] --init-program <INIT_PROGRAM> --kernel <KERNEL_PATH> --image-name <IMAGE_NAME> [-- <INIT_ARGS>...]

ARGS:
    <INIT_ARGS>...    Args to pass to the init program

OPTIONS:
        --attach
            Attach to VM stdout/stderr. If `only-vm-logs` is not set, then VM output will be logged
            as normal log messages [env: ATTACH=]

        --cni-path <CNI_PATH>
            Path to a directory containing the CNI plugins needed to set up firecracker networking.
            Currently these are: ptp, host-local, firewall, and tc-redirect-tap [env: CNI_PATH=]
            [default: /opt/cni/bin]

        --cpus <CPUS>
            Number of virtual CPU cores [env: CPU_CORE_COUNT=] [default: 1]

        --env <ENV_VAR>
            Environment variables to set inside the running VM

        --firecracker-path <FIRECRACKER_PATH>
            Path to the firecracker binary. If not specified, PATH will be searched [env:
            FIRECRACKER_PATH=]

    -h, --help
            Print help information

        --image-name <IMAGE_NAME>
            The name of the image to build and run, as understood by a docker-like registry e.g
            `hello-world`, `quay.io/podman/hello` [env: IMAGE_NAME=]

        --init-program <INIT_PROGRAM>
            Path to a built `flow-connector-init` binary to inject as the init program [env:
            INIT_PROGRAM=]

        --kernel <KERNEL_PATH>
            Path to an uncompressed linux kernel build [env: KERNEL=]

        --log-format <LOG_FORMAT>
            Log format [env: LOG_FORMAT=] [default: default] [possible values: default, json]

        --memory <MEM_SIZE_MB>
            Memory size in mb [env: MEMORY_SIZE_MB=] [default: 1024]

    -p, --publish <PORT_MAPPING>
            Ports to expose from the guest to the host, in the format of:
            8080:80 - Map TCP port 80 in the guest to port 8080 on the host.
            8080:80/udp - Map UDP port 80 in the guest to port 8080 on the host.

        --raw-vm-logs
            Stream raw VM stdout/stderr without wrapping with tracing [env: RAW_VM_LOGS=]

        --subnet <SUBNET>
            Allocate and assign VMs IPs from this range [env: SUBNET=] [default: 192.168.200.0/24]

    -V, --version
            Print version information

flowctl-go

The first "real" use-case to test firecracker end-to-end is using it instead of Docker to run capture/materialization connectors; derivations require more integration work and will come later. In order to support this, there is now runInFirecracker in go/connector/driver.go that can be switched when you want to test running in firecracker.

Hacks/Future work

firec

The best Rust crate for driving Firecracker is firec, and it's... pretty bad. Still worth using, but a bunch of features are left unimplemented, and rather than watching the API socket to figure out when firecracker is up, it just... waits 10 seconds. 🤦 I've had to fork firec to make a few critical things work, too.

Despite Firecracker being written in Rust, the canonical client library is written in Go: https://github.com/firecracker-microvm/firecracker-go-sdk. Ideally we'd write a corresponding Rust client library and publish it.

cnitool

A very similar situation to the above. CNI specifies that plugins are just binaries that take input via environment variables and stdin and write their results to stdout. That said, there is a good bit of "client" magic that goes into invoking them, and, shocker of shockers, it's all written in Go. Fortunately they offer a binary called cnitool which does what we need, so in this PR I just shell out to it; in the future it would be super neat to have a proper CNI client library in Rust.

Firecracker Requirements:

  • CNI plugins in /opt/cni/bin
  • cnitool binary on PATH
  • firecracker and jailer binaries on PATH
  • virt-make-fs on PATH (from libguestfs; on Ubuntu: sudo apt-get install libguestfs-tools)
    • Note: this requirement will go away when we switch to containerd
  • Must be run as root:
    • Creating filesystem images using mount needs root
    • All of the various things CNI networking does need root, or at least CAP_NET_ADMIN
    • jailer needs all sorts of permissions involving cgroups, network namespaces, mounts, etc.
    • firecracker needs access to /dev/kvm
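
A quick preflight sketch for checking a host against these requirements (purely illustrative):

# Illustrative preflight checks for the requirements above.
test -d /opt/cni/bin || echo "missing CNI plugins in /opt/cni/bin"
for bin in cnitool firecracker jailer virt-make-fs; do
    command -v "$bin" >/dev/null || echo "missing $bin on PATH"
done
test -r /dev/kvm && test -w /dev/kvm || echo "no access to /dev/kvm (is KVM enabled?)"
[ "$(id -u)" -eq 0 ] || echo "not running as root"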


@jshearer force-pushed the feature/new_task_runtime branch from ce00215 to b1ebf2b on February 1, 2023
@jshearer force-pushed the feature/new_task_runtime branch from 88d7cbb to b8a68ec on February 6, 2023
@jshearer marked this pull request as ready for review on February 8, 2023