
refactor: Add firecracker init logic to connector-init #890

Open: wants to merge 17 commits into master from feature/new_task_runtime
Conversation

@jshearer (Contributor) commented on Jan 20, 2023

Introduce support for running tasks (currently captures and materializations, and eventually derivations) in Firecracker, and wire up flowctl-go to be able to use it. This is currently gated behind a hard-coded flag because we can't run it in prod yet.

Outside the VM

Firecracker is a virtual machine monitor designed for Linux, so fundamentally the two things you need to run a VM are a kernel and a userspace.

Kernel

The kernel is provided as a bundled vmlinux.bin file, which is the output of the kernel build process. I've included a fairly recent kernel build in the assets/ folder, as well as a config file that can be passed to https://github.com/anyfiddle/firecracker-kernel-builder to build it from scratch.
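
For reference, when driving Firecracker directly over its API socket, the kernel is handed to it as the "boot source". A minimal sketch, assuming an illustrative socket path and boot args (flow-firecracker does this through firec rather than raw curl):

# Sketch: point Firecracker at the kernel via its API socket.
# Socket path and boot args here are illustrative.
curl --unix-socket /tmp/firecracker.socket -i \
    -X PUT 'http://localhost/boot-source' \
    -H 'Content-Type: application/json' \
    -d '{
          "kernel_image_path": "./assets/vmlinux.bin",
          "boot_args": "console=ttyS0 reboot=k panic=1 pci=off"
        }'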

Filesystems

We mount two filesystems into the VM: the root filesystem, which holds the init program and any required config, and the main filesystem, which holds the unpacked image with the connector entrypoint and anything else it expects to exist.

At the moment, we use the equivalent of the following command to generate a tar file of the specified image:

docker export --output=connector-image.tar $(docker create ghcr.io/connector-image)

We then use virt-make-fs to turn that tar file into an ext4-formatted disk image which we can pass to Firecracker as a mountable drive. This is a fairly wasteful process, especially if we start many VMs from the same image, or even from images that share many of their layers. In the future we should run containerd and use its ability to check out ("lease") filesystems built from a set of layers. I left this out of scope for the initial work, as it's mainly a performance optimization.
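
Concretely, the current image-to-disk pipeline is roughly the following sketch (image name and sizing are illustrative):

# Export the image's flattened filesystem to a tar file.
CONTAINER_ID=$(docker create ghcr.io/connector-image)
docker export --output=connector-image.tar "$CONTAINER_ID"
docker rm "$CONTAINER_ID"

# virt-make-fs (from libguestfs) turns the tar into an ext4 image that
# Firecracker can attach as a block device; --size adds headroom for writes.
virt-make-fs --format=raw --type=ext4 --size=+512M \
    connector-image.tar connector-image.ext4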

Networking

Firecracker deals with networking by attaching to (or creating) a virtual TUN/TAP network device on the host, which acts as a mirror to/from eth0 inside the guest. We have a few goals for guest networking:

  • The guest needs public internet access, since captures and materializations need to talk to their respective databases
  • The guest should not be able to directly dial the host for security isolation reasons
  • The host should be able to dial ports on the guest in order to talk to the GRPC service that connector-init exposes
  • We don't need to worry about public connector networking here, since that's implemented over the GRPC service through connector-init

We decided to leverage the existing ecosystem of CNI plugins to configure networking here: not only does this allow for clear and powerful configuration of NAT, firewalling, IP address allocation, and port mapping, it also supports easy cleanup when a VM is shut down. As an additional layer, all networking config is done inside a network namespace, which is torn down once the VM exits.
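
As a sketch of what that looks like, here is a hypothetical conflist using the plugins this PR relies on, driven through cnitool (network name, namespace name, and paths are illustrative; see the cnitool section below):

# Hypothetical CNI network config combining the plugins we use.
cat > /etc/cni/net.d/flow-vm.conflist <<'EOF'
{
  "cniVersion": "0.4.0",
  "name": "flow-vm",
  "plugins": [
    { "type": "ptp", "ipam": { "type": "host-local", "subnet": "192.168.200.0/24" } },
    { "type": "firewall" },
    { "type": "tc-redirect-tap" }
  ]
}
EOF

# Create a network namespace for the VM and wire it up.
ip netns add flow-vm-0
CNI_PATH=/opt/cni/bin NETCONFPATH=/etc/cni/net.d \
    cnitool add flow-vm /var/run/netns/flow-vm-0

# ...run the VM...

# Teardown is the symmetric operation.
CNI_PATH=/opt/cni/bin NETCONFPATH=/etc/cni/net.d \
    cnitool del flow-vm /var/run/netns/flow-vm-0
ip netns del flow-vm-0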

Inside the VM

Part of a regular Linux userspace is the init program, which is responsible for setting up things like special mounts and then executing the actual entrypoint.

As it turns out, connector-init was already written with this purpose in mind: it exposes a GRPC API to invoke the real connector entrypoint, stream back its output, and so on. So, to use it as the init program for our Firecracker VM, I needed to teach it all of the Linux boot-time setup that a regular init would do (sketched after this list):

  • Set up and mount the main filesystem containing the connector entrypoint and associated files
  • Set up a bunch of special devices under /dev, /proc, /sys
  • Set up networking so that the VM knows where its gateway is and all that
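
As a rough sketch, that boot-time work is the moral equivalent of the following shell (connector-init makes the corresponding syscalls directly; device names, mount points, and addresses are illustrative):

# Special filesystems a minimal init is expected to mount.
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t devtmpfs devtmpfs /dev

# Mount the main filesystem holding the unpacked connector image.
mount -t ext4 /dev/vdb /mnt/image

# Bring up eth0 and point the default route at the host-side gateway.
ip addr add 192.168.200.2/24 dev eth0
ip link set eth0 up
ip route add default via 192.168.200.1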

flow-firecracker

To coordinate all of the setup and teardown described above, I wrote the flow-firecracker binary. It takes a (Docker) image name, along with a kernel, an init program, and other options, runs everything inside a Firecracker VM, and tears it all down at the end.

firecracker-runtime 0.0.0

USAGE:
    flow-firecracker [OPTIONS] --init-program <INIT_PROGRAM> --kernel <KERNEL_PATH> --image-name <IMAGE_NAME> [-- <INIT_ARGS>...]

ARGS:
    <INIT_ARGS>...    Args to pass to the init program

OPTIONS:
        --attach
            Attach to VM stdout/stderr. If `only-vm-logs` is not set, then VM output will be logged
            as normal log messages [env: ATTACH=]

        --cni-path <CNI_PATH>
            Path to a directory containing the CNI plugins needed to set up firecracker networking.
            Currently these are: ptp, host-local, firewall, and tc-redirect-tap [env: CNI_PATH=]
            [default: /opt/cni/bin]

        --cpus <CPUS>
            Number of virtual CPU cores [env: CPU_CORE_COUNT=] [default: 1]

        --env <ENV_VAR>
            Environment variables to set inside the running VM

        --firecracker-path <FIRECRACKER_PATH>
            Path to the firecracker binary. If not specified, PATH will be searched [env:
            FIRECRACKER_PATH=]

    -h, --help
            Print help information

        --image-name <IMAGE_NAME>
            The name of the image to build and run, as understood by a docker-like registry e.g
            `hello-world`, `quay.io/podman/hello` [env: IMAGE_NAME=]

        --init-program <INIT_PROGRAM>
            Path to a built `flow-connector-init` binary to inject as the init program [env:
            INIT_PROGRAM=]

        --kernel <KERNEL_PATH>
            Path to an uncompressed linux kernel build [env: KERNEL=]

        --log-format <LOG_FORMAT>
            Log format [env: LOG_FORMAT=] [default: default] [possible values: default, json]

        --memory <MEM_SIZE_MB>
            Memory size in mb [env: MEMORY_SIZE_MB=] [default: 1024]

    -p, --publish <PORT_MAPPING>
            Ports to expose from the guest to the host, in the format of:
            8080:80 - Map TCP port 80 in the guest to port 8080 on the host.
            8080:80/udp - Map UDP port 80 in the guest to port 8080 on the host.

        --raw-vm-logs
            Stream raw VM stdout/stderr without wrapping with tracing [env: RAW_VM_LOGS=]

        --subnet <SUBNET>
            Allocate and assign VMs IPs from this range [env: SUBNET=] [default: 192.168.200.0/24]

    -V, --version
            Print version information

flowctl-go

The first "real" use-case to test firecracker end-to-end is using it instead of Docker to run capture/materialization connectors; derivations require more integration work and will come later. In order to support this, there is now runInFirecracker in go/connector/driver.go that can be switched when you want to test running in firecracker.

Hacks/Future work

firec

The best Rust crate for driving Firecracker is firec, and it's... pretty bad. Still worth using, but a bunch of features are left unimplemented, and rather than watching the API socket to figure out when firecracker is up, it just... waits 10 seconds. 🤦 I've had to fork firec to make a few critical things work, too.

Despite Firecracker being written in Rust, the canonical client library is written in Go: https://github.com/firecracker-microvm/firecracker-go-sdk. Ideally we'd write a corresponding Rust client library and publish it.

cnitool

A very similar situation to the above. CNI specifies that plugins are just binaries that take input via environment variables and stdin and write their results to stdout. That said, there is a good bit of "client" magic that goes into invoking them, and, shocker of shockers, it's all written in Go. Fortunately they offer a binary called cnitool which does what we need, so in this PR I just shell out to it; in the future it would be super neat to have a proper CNI client library in Rust.

Firecracker Requirements:

  • CNI plugins in /opt/cni/bin
  • cnitool binary on PATH
  • firecracker and jailer binaries on PATH
  • virt-make-fs on PATH (from libguestfs; on Ubuntu: sudo apt-get install libguestfs-tools)
    • Note: this requirement will go away when we switch to containerd
  • Must be run as root:
    • Creating filesystem images using mount needs root
    • All of the various things CNI networking does need root, or at least CAP_NET_ADMIN
    • jailer needs all sorts of permissions involving cgroups, network namespaces, mounts, etc.
    • firecracker needs access to /dev/kvm
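
A quick preflight sketch for checking a host against these requirements (purely illustrative):

# Illustrative preflight checks for the requirements above.
test -d /opt/cni/bin || echo "missing CNI plugins in /opt/cni/bin"
for bin in cnitool firecracker jailer virt-make-fs; do
    command -v "$bin" >/dev/null || echo "missing $bin on PATH"
done
test -r /dev/kvm && test -w /dev/kvm || echo "no access to /dev/kvm (is KVM enabled?)"
[ "$(id -u)" -eq 0 ] || echo "not running as root"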


@jshearer force-pushed the feature/new_task_runtime branch from ce00215 to b1ebf2b on February 1, 2023
@jshearer force-pushed the feature/new_task_runtime branch from 88d7cbb to b8a68ec on February 6, 2023
@jshearer marked this pull request as ready for review on February 8, 2023