treeshake imports based on guest content #117

karthik2804 · 2024-06-18T11:51:40Z

This is a WIP as I have not tested it much yet. I am not sure how transitive imports will be passed down. Since we only pass in the imports directly present in the guest content.

tschneidereit

It'll be excellent to have this!

One thing that concerns me a bit is that we're relying on a different lexer here to find the list of used imports. If that lexer disagrees with SpiderMonkey's for whatever reason, that could lead to really bizarre and hard to debug issues.

And I could immediately see that happening in practice, too: we support top-level await, which means code could dynamically decide to import component interfaces, which the lexer wouldn't be able to see.

We could alternatively consider making StarlingMonkey emit the required information itself: it could write out a file containing a list of imports as recorded during parsing/execution. That'd require changing StarlingMonkey to be able to receive the information that it should do so, and where to store the resulting file, but that'd all be doable, I think. @guybedford, do you see any blockers with that approach?

crates/spidermonkey-embedding-splicer/src/lib.rs

guybedford · 2024-06-18T17:45:47Z

And I could immediately see that happening in practice, too: we support top-level await, which means code could dynamically decide to import component interfaces, which the lexer wouldn't be able to see.

We do not currently support dynamic import() currently in StarlingMonkey, since we do not register a JS::SetModuleDynamicImportHook hook. If and when we do I agree this question definitely comes up.

Note that ComponentizeJS performs source transforms using the same lexer already, as it has to perform a replacement of the world imports with the generated bindings modules.

Separately, we don't currently support transitive imports at all - which we probably should as well. But since this is a missing feature, again could be a follow-up to this PR in supporting the transitive rewriting and not just for the top-level source.

If we did want to move more of this process into StarlingMonkey, what would be needed is a full loader API - the ability to define source transforms and resolver hooks at the engine level. One problem here is how to communicate the source transform without causing the code for the source transform to be linked in unnecessarily to the final build. So a solution would likely need to be found for that communication problem (eg via a Wizer-time dedicated "resolve" / "loadSource" host call technique).

tschneidereit · 2024-06-22T11:21:41Z

Thank you for the excellent context, @guybedford! With all that understood, I think it's more than fine to land this with the external lexer, and eventually look into moving more of this and other things into StarlingMonkey itself.

Signed-off-by: karthik2804 <[email protected]>

karthik2804 · 2024-06-30T16:17:34Z

I made an attempt to update this PR based on #116 (comment). I am not entirely sure if I am on the right track. I am also not sure how to debug the failing test (my suspicion is that it has something to do with transitive imports in the wit level). Some guidance here would be really appreciated. Thanks!

crates/spidermonkey-embedding-splicer/src/bindgen.rs

Signed-off-by: karthik2804 <[email protected]>

karthik2804 · 2024-07-10T17:36:29Z

I have updated the PR to fix the failing test and also addressed the comment.

crates/spidermonkey-embedding-splicer/src/bindgen.rs

guybedford · 2024-07-12T22:24:38Z

I tested this out with a simple scenario and it seems to work well!

I've added an update here to properly handle export aliases. It doesn't fully handle the alias deduping logic yet which is supposed to only support aliases when they don't conflict. To handle that we should move the alias conflict detection logic to a preprocessing stage that is not related to the esm_bindgen like we currently rely on for populate_export_aliases.

It could be nice to even handle treeshaking for function import bindings off of an interface ie import { name } from 'pkg:name@v/iface' to support knowing only name is used from iface, but that can be a follow-up separately.

This code looks good to land to me, please test it out once more there @karthik2804 and let me know if you are happy to release.

We will need a test that verifies the export and import tree-shaking in this PR to land it though. Just let me know if you want to discuss how to set up the wiring for that if you want more feedback on that.

guybedford

We just need a simple test to land this.

guybedford · 2024-07-12T22:26:12Z

crates/spidermonkey-embedding-splicer/src/bindgen.rs

+                            // TODO: move populate_export_aliases to a preprocessing
+                            // step that doesn't require esm_bindgen, so that we can
+                            // do alias deduping here as well.


This may bring in possibly unused interfaces in some alias conflict cases, which would require the refactoring as discussed around alias preprcessing. I'm okay to land this without that to start.

guybedford · 2024-07-12T23:40:09Z

I've pushed up a simple failing test case here to verify that imports and exports get treeshaking applied. In the process I discovered two remaining issues:

When treeshaking, we are going to change the offset indices of the generated functions. We therefore need to maintain an offset diff table that can be applied to all function offset lookup functions (specifically in the splicer import lookup function). We should specifically test the case where a middle import is removed that the second import in world order doesn't bindgen to the wrong function.
The current treeshaking work does not support filtering resources at all - and as a result all resource bindgen is still being performed. We need to more carefully consider and handle a proper resource tree-shaking ability, similarly to how Jco dynamically pulls in resources as they are needed during function bindgen (see how create_resource_fn_map is called per function generation in https://github.com/bytecodealliance/jco/blob/main/crates/js-component-bindgen/src/transpile_bindgen.rs#L1755).

There is therefore still quite some work to do here. Sorry I can't be of more help.

crates/spidermonkey-embedding-splicer/src/lib.rs

guybedford · 2024-07-13T00:09:24Z

An alternative to (1) would be to carefully ensure that in all bindgen, exports ordering is defined to be the ordering after treeshaking. Either way though requires careful review of the ordering.

test/cases/smoke/test.js

karthik2804 · 2024-07-15T17:39:36Z

@guybedford can you walk me through the 2 remaining issues a little more? I do not fully grasp it. My apologies if I am being slow.

guybedford · 2024-07-15T22:32:39Z

@karthik2804 I took another look at this today and was able to resolve most of the remaining concerns I had:

It turned out we just needed to move the world merge to before encoding the component, as opposed to before the bindgen. I think this should work out correctly now!
With this model imports and exports are now the world imports and exports only, BEFORE merging with the engine world. I think this makes sense because you can always analyze the output component to get the merged imports and exports. Perhaps we should call them, guestImports and guestExports in the API?
I feel more confident that ordering is not an issue in bindgen now, given that all imports and exports lists in the JS and Rust code correspond to this same list that is the original world after treeshaking, but before the engine merge.
Resource treeshaking seems fine to do as a follow-up further, where we walk resources as functions are used to only pull in what is used by functions as opposed to all resources. This seems like it could be a nice refactoring to the resource bindgen as well as discussed in https://github.com/bytecodealliance/jco/blob/main/crates/js-component-bindgen/src/transpile_bindgen.rs#L1755. I have no problem deferring that work, and we can create an issue to track it further as needed.

The tests for imports treeshaking are now fully passing.

There is only one final thing then that we need to discuss now and that is the exports treeshaking, which still has a failing test.

Strictly speaking, I believe exports are supposed to be mandatory for component generation - that is a component can only be encoded to a world when it provides all of the expected exports. Not providing an export is typically seen as an error.

Therefore I'd like to understand more about exports treeshaking and how it might be used in this case? And then if we want to support this exports treeshaking feature then we must also add support for removing unused exports from the world itself before encoding the component.

karthik2804 · 2024-07-15T22:40:36Z

@guybedford That sounds exciting!

On the note of exports tree shaking - the original intent of the PR started with only the goal of imports tree shaking. The reason we started analyzing the guest exports was to implement what #116 took a stab at. (i.e) retain the fetch event if the guest does not export an explicit incomingHandler. I do not think we need to tree shake exports otherwise. I fully agree that not having all the exports defined in a world should be an error. I do not see an issue with retaining even unused exports as the output component will still be compliant with the target world.

karthik2804 · 2024-07-15T22:43:34Z

crates/spidermonkey-embedding-splicer/src/lib.rs

+            for (k, _) in &engine_resolve.worlds[engine_world].imports {
+                guest_imports.push(engine_resolve.name_world_key(k));
+            }


I was taking a stab at this afternoon and found that this returned all the imports provided in the wit definition and not just the ones actively used by the engine who's list I was able to get with the following code snippet.

// Componentize the Starlingmonkey engine so that we can pull in the imports that are actually used. let bytes = include_bytes!("../../../StarlingMonkey/host-apis/wasi-0.2.0/preview1-adapter-release/wasi_snapshot_preview1.wasm"); let component = wit_component::ComponentEncoder::default() .adapter("wasi_snapshot_preview1", bytes) .map_err(|e| e.to_string())? .module(&engine) .map_err(|e| e.to_string())? .encode() .map_err(|e| e.to_string())?; let decoded = wit_component::decode(&component).unwrap(); let resolve = decoded.resolve(); let packages = decoded.package(); let world_id = resolve.select_world(packages, None).unwrap(); let world = &resolve.worlds[world_id]; // merge the imports actually used by the engine with the imports from the guest content. for (import_key, _) in world.imports.iter() { guest_imports.push(resolve.name_world_key(import_key)); }

guybedford · 2024-07-15T22:48:09Z

@karthik2804 okay, in that case, let's remove the exports treeshaking entirely for now then. As for the imports listing, we should probably have two lists like imports and guestImports to use in the test. Your approach is correct but I think ideally we should avoid unnecessary processing, by finding another approach or via precomputing the list of engine exports.

karthik2804 · 2024-07-15T22:54:18Z

We could pre-compute the list of engine exports as a part of the build step possibly? run it between the build of Starlingmonkey and before the splicer is built?

karthik2804 · 2024-07-15T22:54:58Z

Also with removing the exports tree-shaking, would we still need #116? cc: @tschneidereit

guybedford · 2024-07-15T23:02:05Z

I've updated the exports treeshaking to instead provide a missing guest export error and added a test for that.

For the incoming handler, we probably do still want to support that case, as that is about the engine itself?

I'm happy to just document for now that imports is the guest imports and that the component should be analyzed directly to get the full component imports. The actual used guest imports seems more useful to me? Does that make sense?

If so, let's land and release.

karthik2804 · 2024-07-15T23:04:34Z

That makes sense to me! I really appreciate you guiding me and helping take this over the line! Thanks!

guybedford · 2024-07-15T23:06:38Z

Ok will do. This is a fantastic feature to have, thanks for working on it! Hopeful we can get to interface analysis for cases like import { fn } from 'interface' knowing only fn is used from the given interface, and also resource tree-shaking in due course. But this is a great start.

guybedford · 2024-07-16T00:52:04Z

Released in 0.10.0.

karthik2804 · 2024-07-16T06:13:20Z

Thank you!

tschneidereit reviewed Jun 18, 2024

View reviewed changes

crates/spidermonkey-embedding-splicer/src/lib.rs Outdated Show resolved Hide resolved

tschneidereit mentioned this pull request Jun 28, 2024

add option to retain fetchEvent handler #116

Closed

karthik2804 added 2 commits June 30, 2024 10:27

treeshake imports based on guest content

3faee41

Signed-off-by: karthik2804 <[email protected]>

use export to maybe retain fetchEvent

ff30f7d

Signed-off-by: karthik2804 <[email protected]>

karthik2804 force-pushed the treeshake_imports branch from 486735b to ff30f7d Compare June 30, 2024 16:00

tschneidereit reviewed Jul 1, 2024

View reviewed changes

crates/spidermonkey-embedding-splicer/src/bindgen.rs Outdated Show resolved Hide resolved

karthik2804 added 2 commits July 10, 2024 19:09

merge engine and guest content imports

220068f

Signed-off-by: karthik2804 <[email protected]>

only filter for incomingHandler and run in exports

d1be716

Signed-off-by: karthik2804 <[email protected]>

karthik2804 commented Jul 10, 2024

View reviewed changes

crates/spidermonkey-embedding-splicer/src/bindgen.rs Outdated Show resolved Hide resolved

karthik2804 marked this pull request as ready for review July 10, 2024 18:01

karthik2804 requested a review from guybedford July 10, 2024 18:01

handle export aliases

45bdc55

guybedford requested changes Jul 12, 2024

View reviewed changes

failing test case

7c5cb0e

guybedford reviewed Jul 13, 2024

View reviewed changes

crates/spidermonkey-embedding-splicer/src/lib.rs Outdated Show resolved Hide resolved

karthik2804 commented Jul 15, 2024

View reviewed changes

test/cases/smoke/test.js Outdated Show resolved Hide resolved

guybedford force-pushed the treeshake_imports branch from 6bdbcef to 067d072 Compare July 15, 2024 22:24

get imports test passing

6f70bb3

guybedford force-pushed the treeshake_imports branch from 067d072 to 6f70bb3 Compare July 15, 2024 22:42

karthik2804 commented Jul 15, 2024

View reviewed changes

update imports docs, remove exports test

641a992

provide exports errors instead of exports treeshaking

8aeb98e

guybedford approved these changes Jul 15, 2024

View reviewed changes

guybedford merged commit 68eb86e into bytecodealliance:main Jul 15, 2024
2 checks passed

karthik2804 deleted the treeshake_imports branch July 16, 2024 06:13

karthik2804 mentioned this pull request Jul 16, 2024

Question - Should componentize-js tree shake unused imports? #104

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

treeshake imports based on guest content #117

treeshake imports based on guest content #117

karthik2804 commented Jun 18, 2024 •

edited

Loading

tschneidereit left a comment

guybedford commented Jun 18, 2024

tschneidereit commented Jun 22, 2024

karthik2804 commented Jun 30, 2024

karthik2804 commented Jul 10, 2024

guybedford commented Jul 12, 2024

guybedford left a comment

guybedford Jul 12, 2024

guybedford commented Jul 12, 2024 •

edited

Loading

guybedford commented Jul 13, 2024

karthik2804 commented Jul 15, 2024

guybedford commented Jul 15, 2024

karthik2804 commented Jul 15, 2024

karthik2804 Jul 15, 2024

guybedford commented Jul 15, 2024

karthik2804 commented Jul 15, 2024

karthik2804 commented Jul 15, 2024

guybedford commented Jul 15, 2024

karthik2804 commented Jul 15, 2024

guybedford commented Jul 15, 2024

guybedford commented Jul 16, 2024

karthik2804 commented Jul 16, 2024

treeshake imports based on guest content #117

treeshake imports based on guest content #117

Conversation

karthik2804 commented Jun 18, 2024 • edited Loading

tschneidereit left a comment

Choose a reason for hiding this comment

guybedford commented Jun 18, 2024

tschneidereit commented Jun 22, 2024

karthik2804 commented Jun 30, 2024

karthik2804 commented Jul 10, 2024

guybedford commented Jul 12, 2024

guybedford left a comment

Choose a reason for hiding this comment

guybedford Jul 12, 2024

Choose a reason for hiding this comment

guybedford commented Jul 12, 2024 • edited Loading

guybedford commented Jul 13, 2024

karthik2804 commented Jul 15, 2024

guybedford commented Jul 15, 2024

karthik2804 commented Jul 15, 2024

karthik2804 Jul 15, 2024

Choose a reason for hiding this comment

guybedford commented Jul 15, 2024

karthik2804 commented Jul 15, 2024

karthik2804 commented Jul 15, 2024

guybedford commented Jul 15, 2024

karthik2804 commented Jul 15, 2024

guybedford commented Jul 15, 2024

guybedford commented Jul 16, 2024

karthik2804 commented Jul 16, 2024

karthik2804 commented Jun 18, 2024 •

edited

Loading

guybedford commented Jul 12, 2024 •

edited

Loading