added feature synth-doc #99

Open · wants to merge 11 commits into master
191 changes: 191 additions & 0 deletions docs/blog/2021-08-09-macro.md
@@ -0,0 +1,191 @@
---
title: Complex Procedural Rust Macros
author: Andre Bogus
author_title: Chief Rustacean
author_url: https://github.com/getsynth
author_image_url: https://avatars.githubusercontent.com/u/4200835?v=4
tags: [rust, macros]
description: In this post, we write a procedural macro that generates code to bind functions and types, including arbitrarily many impl blocks, to a scripting language. The problems encountered and techniques learned can be applied to other tasks that require complex compile-time analysis spanning multiple macro invocations.
image: https://live.staticflickr.com/2785/4317096083_24697db960_b.jpg
hide_table_of_contents: false
---

I recently wrote the most complex procedural Rust macro I’ve ever attempted. This post outlines the problems I encountered and how I overcame them.

## The Background

With synth, we are building a declarative command-line test data generator. For now, the specification that declares what test data to build is just JSON that gets deserialised into our data structures using `serde_json`. This was a quick and easy way to configure our application without much overhead in terms of code. For example:

```json synth
{
    "type": "array",
    "length": {
        "type": "number",
        "constant": 3
    },
    "content": {
        "type": "object",
        "id": {
            "type": "number",
            "id": {}
        },
        "name": {
            "type": "string",
            "faker": {
                "generator": "name"
            }
        },
        "email": {
            "type": "string",
            "faker": {
                "generator": "ascii_email"
            }
        }
    }
}
```
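To illustrate the deserialisation step, here is a sketch with deliberately simplified stand-ins for synth's real schema types (the names and shape are assumptions for illustration, not the actual synth types):

```rust
use serde::Deserialize;

// A simplified stand-in for synth's real schema types.
#[derive(Deserialize, Debug)]
#[serde(tag = "type", rename_all = "lowercase")]
enum Content {
    Array {
        length: Box<Content>,
        content: Box<Content>,
    },
    Number {
        #[serde(default)]
        constant: Option<u64>,
    },
    // further variants (object, string, ...) elided
}

fn parse_spec(json: &str) -> serde_json::Result<Content> {
    serde_json::from_str(json)
}
```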

However, it’s also not very nice to write (for example, JSON has no comments, no formulas, etc.), so we wanted to bind our specification to a scripting language. Our end goal is to extend the language (both in terms of builtin functions and syntax) to make the configuration really elegant. After testing and benchmarking different runtimes, our choice fell on [koto](https://github.com/koto-lang/koto), a nice little scripting language built foremost for live coding.

Unfortunately, koto has only a very bare interface for binding external Rust code. Since we are talking about a rather large number of types to include, it was clear from the start that we would want to generate the code that binds to koto.

## Early Beginnings

So I started with a somewhat simple macro to wrangle koto types (e.g. maps) into our Rust types. However, I soon found that while the marshalling overhead would have been fine for an initial setup phase, it was too costly for recurrent calls into koto (for example, filter functions called once per row). Thus I changed my approach to bind functions first, then extended that to types and impl blocks.

I found what I then thought was a genius technique: generating functions that would call each other, daisy-chaining a series of code blocks into one that could then be invoked via another proc macro, `bindlang_main!()`:

```rust
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_quote, Block, Expr, Item};

static FN_NUMBER: AtomicUsize = AtomicUsize::new(0);

// `fn_ident` and `ident` are small helpers (not shown) that construct
// identifiers from a number or a string, respectively
fn next_fn(mut b: Block, arg: &Expr) -> Item {
    let number = FN_NUMBER.fetch_add(1, SeqCst);
    let this_fn = fn_ident(number);
    if let Some(n) = number.checked_sub(1) {
        // daisy-chain: append a call to the previously generated function
        let last_fn = fn_ident(n);
        b.stmts.push(parse_quote! { #last_fn(#arg); });
    }
    parse_quote! { fn #this_fn(#arg) { #b } }
}

#[proc_macro]
pub fn bindlang_main(arg: TokenStream) -> TokenStream {
    let arg = ident(arg.to_string());
    TokenStream::from(if let Some(n) = FN_NUMBER.load(SeqCst).checked_sub(1) {
        // call the head of the chain, which transitively runs all blocks
        let last_fn = fn_ident(n);
        quote! { #last_fn(#arg) }
    } else {
        proc_macro2::TokenStream::default()
    })
}
```
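To make the daisy-chaining concrete, this is roughly the shape of the generated code after two macro invocations (all names here are illustrative, and `Vm` stands in for the scripting engine's state type):

```rust
struct Vm;

fn __bindlang_0(vm: &mut Vm) {
    // bindings registered by the first macro invocation
}

fn __bindlang_1(vm: &mut Vm) {
    // bindings registered by the second macro invocation
    __bindlang_0(vm); // daisy-chained call to the previous function
}

// `bindlang_main!(vm)` then expands to a call to the head of the chain:
// __bindlang_1(vm)
```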

I also wrote a derive macro to implement the marshalling traits. This worked well for a small example entirely contained within one module, but failed once the code was spread across multiple modules: the functions were no longer in the same scope and therefore stopped finding each other.

Worse, I needed a number of pre-defined maps of functions for method dispatch on our external types within koto. A type in Rust can have an arbitrary number of impl blocks, but I needed exactly *one* table, and I couldn’t simply daisy-chain those.

It was clear I needed a different solution. After thinking long and hard, I concluded that I needed to pull all the code together into one scope, emitted by the `bindlang_main!()` macro. My idea was to create a `HashMap` of `syn::Item`s to be quoted together into one `TokenStream`. A lazy static `Arc<Mutex<Context>>` would collect the information from multiple attribute invocations:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use proc_macro::TokenStream;
use syn::{parse_macro_input, Item};

// `MethodSig` is our own plain-data description of a method signature
#[derive(Default)]
struct Context {
    bare_fns: Vec<MethodSig>,
    modules: HashMap<String, Vec<MethodSig>>,
    vtables: HashMap<String, Vec<MethodSig>>,
    types: HashMap<String, String>,
}

lazy_static::lazy_static! {
    static ref CONTEXT: Arc<Mutex<Context>> = Arc::new(Mutex::new(Context::default()));
}

#[proc_macro_attribute]
pub fn bindlang(_attrs: TokenStream, code: TokenStream) -> TokenStream {
    let code_cloned = code.clone();
    let input = parse_macro_input!(code_cloned as Item);
    // evaluate `input` here, and store the information in the context:
    // CONTEXT.lock().unwrap()...
    code
}
```

This was when I found out that none of `syn`'s types are `Send`, so they cannot be stored in a `Mutex` inside a `static`. My first attempt to circumvent this was storing everything as `String`s and using `syn::parse_str` to get the items back out. This failed because of macro hygiene: each identifier in Rust proc macros has an identity, and two identifiers resulting from two macro expansions get different identities, even if their textual representation is the same.

I also found that a `proc_macro_derive` has no way to see the `#[derive(..)]` attribute of the type it is invoked on, and I wanted to also bind derived trait implementations (at least for `Default`, because some types have no other constructors). So I removed the derive and moved the implementation into the `#[bindlang]` attribute macro, which now works on types, impl blocks and fns.

**Beware: this relies on the (unspecified, but currently working) top-down order of macro expansion!**

## Dirty Tricks Avoided

There is a `Span::mixed_site()` constructor that yields semi-hygienic identifiers (like those of `macro_rules!` macros). However, this looked risky (macro hygiene is there to protect us, so we had better have a good reason to override it), so I took the data-oriented approach: collect the information needed to generate the code in the `lazy_static`, and walk it within `bindlang_main!()`. I still tried to generate the trait impls for marshalling directly in the attribute macro, but this again ran into macro hygiene trouble, because I could not recreate the virtual dispatch table identifiers. After moving this part into the main macro, too, the macro finally expanded successfully.
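For reference, the dirty trick I avoided would have looked something like this (the naming scheme is made up for illustration):

```rust
use proc_macro2::{Ident, Span};

// A mixed-site identifier resolves the way `macro_rules!` identifiers do,
// so two separately generated pieces of code that both construct
// `__vtable_Foo` like this could actually name the same item.
fn vtable_ident(type_name: &str) -> Ident {
    Ident::new(&format!("__vtable_{}", type_name), Span::mixed_site())
}
```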

Except it didn’t compile successfully.

I had forgotten to `use` the items I was generating code for in the macro, and koto requires all external types to implement `Display`. So I added the imports as macro arguments and added the `Display` impls, only to be met with a type inference error within the macro invocation. Clearly I needed some type annotations, but the error message only pointed at the whole macro invocation, which was pretty unhelpful.

## Expanding Our Vision

My solution for debugging those was to `cargo expand` the code, comment out the macro invocation, and copy the expanded portion in its place, so that my IDE would pinpoint the error for me. I had to manually un-expand some `format!` invocations so the code would resolve correctly, and finally found where I needed more type annotations. With those added, the code finally compiled. Whew!

I then extended the bindings to also cover trait impls and `Option`s, while my colleague Christos changed the code that marshals Rust values into koto values to mangle `Result::Err(_)` into koto’s runtime errors. Remembering that implicit constructors (structs and enum variants) are also useful, I added support for binding those where public. There was another error where intermediate generated code wouldn't parse, but some `eprintln!` debugging helped pinpoint the offending piece of code.
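A minimal sketch of that error mangling, with stand-in types because the real marshalling machinery is koto-specific (`Value` and `RuntimeError` here are assumptions, not koto's actual API):

```rust
use std::fmt::Display;

// Stand-ins for koto's runtime types.
enum Value {
    Str(String),
    // ... further variants elided
}
struct RuntimeError(String);

// `Ok` values get marshalled into koto values; `Err` values are mangled
// into koto-style runtime errors carrying the error's `Display` output.
fn marshal_result<T: Into<Value>, E: Display>(
    r: Result<T, E>,
) -> Result<Value, RuntimeError> {
    r.map(Into::into).map_err(|e| RuntimeError(e.to_string()))
}
```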

When trying to bind functions taking a non-`self` reference argument (e.g. `fn from(value: &Value) -> Self`), I found that the bindings would not work, because my `FromValue` implementation could not hand out references: a function in Rust cannot return a borrow into a value that lives only within it. It took me a while to remember that I [blogged](https://llogiq.github.io/2015/08/19/closure.html) about the solution in 2015. Closures to the rescue! The basic idea is a function that takes a closure with a reference argument and returns the result of that closure:

```rust
pub trait RefFromValue {
    fn ref_from_value<R, F: Fn(&Self) -> R>(
        key_path: &KeyPath<'_>,
        value: &Value,
        f: F,
    ) -> Result<R, RuntimeError>;
}
```
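For example, an implementation for `str` (whose borrow is `&str`) might look roughly like this; `Value::Str` and the `type_error` helper are assumptions about the surrounding crate, not its actual API:

```rust
impl RefFromValue for str {
    fn ref_from_value<R, F: Fn(&Self) -> R>(
        key_path: &KeyPath<'_>,
        value: &Value,
        f: F,
    ) -> Result<R, RuntimeError> {
        match value {
            // hand the borrowed `&str` to the closure; the borrow never
            // needs to outlive this function call
            Value::Str(s) => Ok(f(s)),
            _ => Err(type_error(key_path, "a string", value)),
        }
    }
}
```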

Having this in a separate trait allows us to distinguish types where the borrow isn't `&T`, e.g. for `&str`. We also gain a bit of simplicity by using unified function call syntax (`MyType::my_fn(..)` instead of `v0.my_fn(..)`). This also meant I had to nest the argument parsing: I did this by creating the innermost `Expr` and wrapping it in argument extractors in reverse argument order:

```rust
let mut expr: Expr = parse_quote! { #path(#(#inner_idents),*) };
for (i, ((a, v), mode)) in idents.iter().zip(args.iter()).enumerate().rev() {
    expr = if mode.is_ref() {
        parse_quote! {
            ::lang_bindings::RefFromValue::ref_from_value(
                &::lang_bindings::KeyPath::Index(#i, None),
                #a,
                |#v| #expr
            )?
        }
    } else {
        parse_quote! {
            match (::lang_bindings::FromValue::from_value(
                &::lang_bindings::KeyPath::Index(#i, None),
                #a
            )?) {
                (#v) => #expr
            }
        }
    };
}
```

Note that the `match` in the `else` branch is there to introduce a binding without requiring a `Block`, a common macro trick. Now all that was left to do was to add `#[bindlang]` attributes to our `Namespace` and its contents, and to write a lot of `Display` implementations, because koto requires those for all `ExternalValue` implementors.
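Here is the match trick in isolation, outside of any macro:

```rust
// A block with a `let` is a statement list...
let y = {
    let x = 20 + 1;
    x * 2
};

// ...while `match` introduces the same binding as a single expression,
// which nests cleanly when a macro wraps generated code around it:
let z = match (20 + 1) { (x) => x * 2 };

assert_eq!(y, z);
```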

In conclusion, our test configuration should now look something like this:

```
synth.Namespace({
    synth.Name("users"): synth.Content.Array(
        {
            synth.Name("id"): synth.Content.Number(NumberContent.Id(schema.Id)),
            synth.Name("name"): synth.Content.String(StringContent.Faker("firstname", ["EN"])),
            synth.Name("email"): synth.Content.String(StringContent.Faker("email", ["EN"])),
        },
        synth.Content.Number(NumberContent.Constant(10))
    )
})
```

That's only the beginning: I want to introduce a few coercions, a custom koto prelude and perhaps some syntactic sugar to make this even easier to both read and write.

## The Takeaway

Macros that collect state for later use are possible (well, as long as the expansion order stays as it is) and useful, especially where code is strewn across various blocks (or even files or modules). However, if generated code relies on other generated code, it had best be emitted in one go; otherwise module visibility and macro hygiene conspire to make life hard for the macro author. And should the expansion order ever change in a way that breaks the macro, I can turn it into a standalone crate called from `build.rs`, thanks to `proc_macro2` being decoupled from the actual proc-macro machinery.
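That fallback could look roughly like this, with `bindlang_core::generate()` as a hypothetical entry point that returns the bindings as a `proc_macro2::TokenStream`:

```rust
// build.rs -- a sketch, not the current implementation
use std::{env, fs, path::Path};

fn main() {
    // run the same token-generation logic outside the compiler...
    let tokens: proc_macro2::TokenStream = bindlang_core::generate();
    // ...and write it where the crate can pull it back in via
    // include!(concat!(env!("OUT_DIR"), "/bindings.rs"))
    let out = Path::new(&env::var("OUT_DIR").unwrap()).join("bindings.rs");
    fs::write(&out, tokens.to_string()).unwrap();
}
```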
5 changes: 4 additions & 1 deletion docs/docusaurus.config.js
@@ -33,7 +33,7 @@ module.exports = {
  ],
  themeConfig: {
    fathomAnalytics: {
-     siteId: 'ASXTKXUJ',
+     siteId: isTargetVercel() ? 'QRVYRJEG' : 'ASXTKXUJ',
    },
    algolia: {
      apiKey: 'b0583a1f7732cee4e8c80f4a86adf57c',
@@ -99,6 +99,9 @@ module.exports = {
      ],
      copyright: `Copyright © ${new Date().getFullYear()} OpenQuery.`,
    },
+   prism: {
+     additionalLanguages: ['rust'],
+   },
  },
  presets: [
    [
2 changes: 1 addition & 1 deletion synth/Cargo.toml
@@ -62,4 +62,4 @@ mongodb = {version = "1.2.1", features = ["sync"] , default-features = false}

sqlx = { version = "0.5.5", features = [ "postgres", "mysql", "runtime-async-std-native-tls", "decimal", "chrono" ] }

beau_collector = "0.2.1"
beau_collector = "0.2.1"
4 changes: 4 additions & 0 deletions synth/src/cli/export.rs
@@ -12,6 +12,8 @@ use async_std::task;
use serde_json::Value;
use synth_core::{Name, Namespace};

+use super::synth_doc::DocExportStrategy;

pub trait ExportStrategy {
    fn export(self, params: ExportParams) -> Result<()>;
}
@@ -29,6 +31,7 @@ pub enum SomeExportStrategy {
    FromPostgres(PostgresExportStrategy),
    FromMongo(MongoExportStrategy),
    FromMySql(MySqlExportStrategy),
+   DocExportStrategy(DocExportStrategy)
}

impl ExportStrategy for SomeExportStrategy {
@@ -38,6 +41,7 @@ impl ExportStrategy for SomeExportStrategy {
            SomeExportStrategy::FromPostgres(pes) => pes.export(params),
            SomeExportStrategy::FromMongo(mes) => mes.export(params),
            SomeExportStrategy::FromMySql(mes) => mes.export(params),
+           SomeExportStrategy::DocExportStrategy(doc) => doc.export(params),
        }
    }
}
34 changes: 33 additions & 1 deletion synth/src/cli/mod.rs
@@ -7,6 +7,7 @@ mod postgres;
mod stdf;
mod store;
mod telemetry;
+mod synth_doc;

use crate::cli::export::SomeExportStrategy;
use crate::cli::export::{ExportParams, ExportStrategy};
@@ -24,6 +25,8 @@ use crate::utils::{version, META_OS};
use rand::RngCore;
use synth_core::Name;

+use self::synth_doc::DocExportStrategy;

pub struct Cli {
    store: Store,
    args: Args,
@@ -130,6 +133,11 @@ impl Cli {
                    }
                }
            }
+           Args::Doc { ref namespace } => {
+               let to = SomeExportStrategy::DocExportStrategy(DocExportStrategy::new(namespace.clone()).context("could not open the namespace")?);
+               self.doc(namespace.clone(), to, 1)?;
+               Ok(())
+           }
        }
    }

@@ -267,6 +275,25 @@ impl Cli {
            .export(params)
            .with_context(|| format!("At namespace {:?}", ns_path))
    }
+   fn doc(&self, ns_path: PathBuf, to: SomeExportStrategy, seed: u64) -> Result<()> {
+       if !self.workspace_initialised() {
+           return Err(anyhow!(
+               "Workspace has not been initialised. To initialise the workspace run `synth init`."
+           ));
+       };
+       let namespace = self
+           .store
+           .get_ns(ns_path)
+           .context("Unable to open the namespace")?;
+       let params = ExportParams {
+           namespace,
+           collection_name: None,
+           target: 2,
+           seed,
+       };
+       to.export(params).context("Error generating doc")?;
+       Ok(())
+   }
}

#[derive(StructOpt)]
@@ -277,7 +304,7 @@ pub enum Args {
        #[structopt(parse(from_os_str), help = "name of directory to initialize")]
        init_path: Option<PathBuf>,
    },
-   #[structopt(about = "Generate data from a namespace")]
+   #[structopt(about = "Generate data from a namespace", alias = "gen")]
    Generate {
        #[structopt(
            help = "the namespace directory from which to generate",
@@ -322,6 +349,11 @@ pub enum Args {
        )]
        from: Option<SomeImportStrategy>,
    },
#[structopt(about = "Write documentation for namespace")]
Doc{
#[structopt(about = "namespace to be documented", parse(from_os_str))]
namespace: PathBuf
},
#[structopt(about = "Toggle anonymous usage data collection")]
Telemetry(TelemetryCommand),
}