Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

treeshake imports based on guest content #117

Merged
merged 9 commits into from
Jul 15, 2024

Conversation

karthik2804
Copy link
Contributor

@karthik2804 karthik2804 commented Jun 18, 2024

This is a WIP as I have not tested it much yet. I am not sure how transitive imports will be passed down. Since we only pass in the imports directly present in the guest content.

Copy link
Member

@tschneidereit tschneidereit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be excellent to have this!

One thing that concerns me a bit is that we're relying on a different lexer here to find the list of used imports. If that lexer disagrees with SpiderMonkey's for whatever reason, that could lead to really bizarre and hard to debug issues.

And I could immediately see that happening in practice, too: we support top-level await, which means code could dynamically decide to import component interfaces, which the lexer wouldn't be able to see.

We could alternatively consider making StarlingMonkey emit the required information itself: it could write out a file containing a list of imports as recorded during parsing/execution. That'd require changing StarlingMonkey to be able to receive the information that it should do so, and where to store the resulting file, but that'd all be doable, I think. @guybedford, do you see any blockers with that approach?

crates/spidermonkey-embedding-splicer/src/lib.rs Outdated Show resolved Hide resolved
@guybedford
Copy link
Collaborator

And I could immediately see that happening in practice, too: we support top-level await, which means code could dynamically decide to import component interfaces, which the lexer wouldn't be able to see.

We do not currently support dynamic import() currently in StarlingMonkey, since we do not register a JS::SetModuleDynamicImportHook hook. If and when we do I agree this question definitely comes up.

Note that ComponentizeJS performs source transforms using the same lexer already, as it has to perform a replacement of the world imports with the generated bindings modules.

Separately, we don't currently support transitive imports at all - which we probably should as well. But since this is a missing feature, again could be a follow-up to this PR in supporting the transitive rewriting and not just for the top-level source.

If we did want to move more of this process into StarlingMonkey, what would be needed is a full loader API - the ability to define source transforms and resolver hooks at the engine level. One problem here is how to communicate the source transform without causing the code for the source transform to be linked in unnecessarily to the final build. So a solution would likely need to be found for that communication problem (eg via a Wizer-time dedicated "resolve" / "loadSource" host call technique).

@tschneidereit
Copy link
Member

Thank you for the excellent context, @guybedford! With all that understood, I think it's more than fine to land this with the external lexer, and eventually look into moving more of this and other things into StarlingMonkey itself.

@karthik2804
Copy link
Contributor Author

I made an attempt to update this PR based on #116 (comment). I am not entirely sure if I am on the right track. I am also not sure how to debug the failing test (my suspicion is that it has something to do with transitive imports in the wit level). Some guidance here would be really appreciated. Thanks!

@karthik2804
Copy link
Contributor Author

I have updated the PR to fix the failing test and also addressed the comment.

@karthik2804 karthik2804 marked this pull request as ready for review July 10, 2024 18:01
@karthik2804 karthik2804 requested a review from guybedford July 10, 2024 18:01
@guybedford
Copy link
Collaborator

I tested this out with a simple scenario and it seems to work well!

I've added an update here to properly handle export aliases. It doesn't fully handle the alias deduping logic yet which is supposed to only support aliases when they don't conflict. To handle that we should move the alias conflict detection logic to a preprocessing stage that is not related to the esm_bindgen like we currently rely on for populate_export_aliases.

It could be nice to even handle treeshaking for function import bindings off of an interface ie import { name } from 'pkg:name@v/iface' to support knowing only name is used from iface, but that can be a follow-up separately.

This code looks good to land to me, please test it out once more there @karthik2804 and let me know if you are happy to release.

We will need a test that verifies the export and import tree-shaking in this PR to land it though. Just let me know if you want to discuss how to set up the wiring for that if you want more feedback on that.

Copy link
Collaborator

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need a simple test to land this.

Comment on lines +389 to +391
// TODO: move populate_export_aliases to a preprocessing
// step that doesn't require esm_bindgen, so that we can
// do alias deduping here as well.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may bring in possibly unused interfaces in some alias conflict cases, which would require the refactoring as discussed around alias preprcessing. I'm okay to land this without that to start.

@guybedford
Copy link
Collaborator

guybedford commented Jul 12, 2024

I've pushed up a simple failing test case here to verify that imports and exports get treeshaking applied. In the process I discovered two remaining issues:

  1. When treeshaking, we are going to change the offset indices of the generated functions. We therefore need to maintain an offset diff table that can be applied to all function offset lookup functions (specifically in the splicer import lookup function). We should specifically test the case where a middle import is removed that the second import in world order doesn't bindgen to the wrong function.
  2. The current treeshaking work does not support filtering resources at all - and as a result all resource bindgen is still being performed. We need to more carefully consider and handle a proper resource tree-shaking ability, similarly to how Jco dynamically pulls in resources as they are needed during function bindgen (see how create_resource_fn_map is called per function generation in https://github.com/bytecodealliance/jco/blob/main/crates/js-component-bindgen/src/transpile_bindgen.rs#L1755).

There is therefore still quite some work to do here. Sorry I can't be of more help.

@guybedford
Copy link
Collaborator

An alternative to (1) would be to carefully ensure that in all bindgen, exports ordering is defined to be the ordering after treeshaking. Either way though requires careful review of the ordering.

test/cases/smoke/test.js Outdated Show resolved Hide resolved
@karthik2804
Copy link
Contributor Author

@guybedford can you walk me through the 2 remaining issues a little more? I do not fully grasp it. My apologies if I am being slow.

@guybedford
Copy link
Collaborator

@karthik2804 I took another look at this today and was able to resolve most of the remaining concerns I had:

  • It turned out we just needed to move the world merge to before encoding the component, as opposed to before the bindgen. I think this should work out correctly now!
  • With this model imports and exports are now the world imports and exports only, BEFORE merging with the engine world. I think this makes sense because you can always analyze the output component to get the merged imports and exports. Perhaps we should call them, guestImports and guestExports in the API?
  • I feel more confident that ordering is not an issue in bindgen now, given that all imports and exports lists in the JS and Rust code correspond to this same list that is the original world after treeshaking, but before the engine merge.
  • Resource treeshaking seems fine to do as a follow-up further, where we walk resources as functions are used to only pull in what is used by functions as opposed to all resources. This seems like it could be a nice refactoring to the resource bindgen as well as discussed in https://github.com/bytecodealliance/jco/blob/main/crates/js-component-bindgen/src/transpile_bindgen.rs#L1755. I have no problem deferring that work, and we can create an issue to track it further as needed.

The tests for imports treeshaking are now fully passing.

There is only one final thing then that we need to discuss now and that is the exports treeshaking, which still has a failing test.

Strictly speaking, I believe exports are supposed to be mandatory for component generation - that is a component can only be encoded to a world when it provides all of the expected exports. Not providing an export is typically seen as an error.

Therefore I'd like to understand more about exports treeshaking and how it might be used in this case? And then if we want to support this exports treeshaking feature then we must also add support for removing unused exports from the world itself before encoding the component.

@karthik2804
Copy link
Contributor Author

@guybedford That sounds exciting!

On the note of exports tree shaking - the original intent of the PR started with only the goal of imports tree shaking. The reason we started analyzing the guest exports was to implement what #116 took a stab at. (i.e) retain the fetch event if the guest does not export an explicit incomingHandler. I do not think we need to tree shake exports otherwise. I fully agree that not having all the exports defined in a world should be an error. I do not see an issue with retaining even unused exports as the output component will still be compliant with the target world.

Comment on lines +152 to +154
for (k, _) in &engine_resolve.worlds[engine_world].imports {
guest_imports.push(engine_resolve.name_world_key(k));
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was taking a stab at this afternoon and found that this returned all the imports provided in the wit definition and not just the ones actively used by the engine who's list I was able to get with the following code snippet.

// Componentize the Starlingmonkey engine so that we can pull in the imports that are actually used.
let bytes = include_bytes!("../../../StarlingMonkey/host-apis/wasi-0.2.0/preview1-adapter-release/wasi_snapshot_preview1.wasm");
let component = wit_component::ComponentEncoder::default()
    .adapter("wasi_snapshot_preview1", bytes)
    .map_err(|e| e.to_string())?
    .module(&engine)
    .map_err(|e| e.to_string())?
    .encode()
    .map_err(|e| e.to_string())?;
let decoded = wit_component::decode(&component).unwrap();
let resolve = decoded.resolve();
let packages = decoded.package();
let world_id = resolve.select_world(packages, None).unwrap();
let world = &resolve.worlds[world_id];
// merge the imports actually used by the engine with the imports from the guest content.
for (import_key, _) in world.imports.iter() {
    guest_imports.push(resolve.name_world_key(import_key));
}

@guybedford
Copy link
Collaborator

@karthik2804 okay, in that case, let's remove the exports treeshaking entirely for now then. As for the imports listing, we should probably have two lists like imports and guestImports to use in the test. Your approach is correct but I think ideally we should avoid unnecessary processing, by finding another approach or via precomputing the list of engine exports.

@karthik2804
Copy link
Contributor Author

We could pre-compute the list of engine exports as a part of the build step possibly? run it between the build of Starlingmonkey and before the splicer is built?

@karthik2804
Copy link
Contributor Author

Also with removing the exports tree-shaking, would we still need #116? cc: @tschneidereit

@guybedford
Copy link
Collaborator

I've updated the exports treeshaking to instead provide a missing guest export error and added a test for that.

For the incoming handler, we probably do still want to support that case, as that is about the engine itself?

I'm happy to just document for now that imports is the guest imports and that the component should be analyzed directly to get the full component imports. The actual used guest imports seems more useful to me? Does that make sense?

If so, let's land and release.

@karthik2804
Copy link
Contributor Author

That makes sense to me! I really appreciate you guiding me and helping take this over the line! Thanks!

@guybedford
Copy link
Collaborator

Ok will do. This is a fantastic feature to have, thanks for working on it! Hopeful we can get to interface analysis for cases like import { fn } from 'interface' knowing only fn is used from the given interface, and also resource tree-shaking in due course. But this is a great start.

@guybedford guybedford merged commit 68eb86e into bytecodealliance:main Jul 15, 2024
2 checks passed
@guybedford
Copy link
Collaborator

Released in 0.10.0.

@karthik2804 karthik2804 deleted the treeshake_imports branch July 16, 2024 06:13
@karthik2804
Copy link
Contributor Author

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants