Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reviews for Stage 2.7 #26

Open
6 tasks done
guybedford opened this issue Nov 14, 2024 · 22 comments
Open
6 tasks done

Reviews for Stage 2.7 #26

guybedford opened this issue Nov 14, 2024 · 22 comments

Comments

@guybedford
Copy link
Collaborator

guybedford commented Nov 14, 2024

Issue to track Stage 2 reviewers feedback.

Spec: https://tc39.es/proposal-esm-phase-imports/

Reviewers:

Editors:

@ljharb
Copy link
Member

ljharb commented Nov 14, 2024

to clarify, reviewers are within stage 2, they're just a prerequisite for stage 2.7.

@guybedford guybedford changed the title Stage 2.7 Reviews Reviews for Stage 2.7 Nov 14, 2024
@michaelficarra
Copy link
Member

  1. EvaluateImportCall step 9.b.i: don't say "is not equal to", just say "is not"; see https://github.com/tc39/ecma262/wiki/Editorial-Conventions#comparisons.
  2. GetModuleSourceModuleRecord: wouldn't it be more appropriate to name this parameter specifier? Isn't it still conceptually a specifier?
  3. GetModuleSourceModuleRecord step 5: "is equal to" -> "is"
  4. Source Text Module Records [[SourceText]] field: "ECMAScript source text" is probably a better type. Also I wouldn't include the optimisation note for hosts.
  5. Source Text Module Records [[ModuleSource]] field: like the above rows, we should describe when it's expected to be empty
  6. ModuleSourcesEqual step 2: "is equal to" -> "is"
  7. The ModuleSource constructor: I don't think this is needed: "will throw an error when invoked, where support for dynamic construction may be added in future"
  8. Where is ModuleSourcesEqual called? Why is it on abstract module records and not just source text module records (or better yet, just an AO) if we know it'll only be called on them?
  9. "For Module Records that implement a normal completion" -> "For Module Records that return a normal completion"
  10. I don't love the not-a-source enum name, but we can bikeshed that during integration

@guybedford guybedford mentioned this issue Nov 22, 2024
8 tasks
@michaelficarra
Copy link
Member

LGTM

@syg
Copy link

syg commented Nov 25, 2024

My normative review:

  • Why does EvaluateImportCall get a ModuleSource without consulting the phase argument? That doesn't seem right. If it's intentional then I don't understand what step 9 is doing at all.

My editorial review:

  • This is layered on top of source phase imports but I don't see necessary changes needed to that spec draft (e.g. EvaluateImportCall callsites).
  • GetModuleSource Step 3, the call to OrdinaryObjectCreate needs to pass « [[SourceTextModuleRecord]] » as an additional internal slot
  • ModuleSourcesEqual is a weird AO; is it needed? It seems like it's only used by embedder specs? If so, can the embedder compare identity of the value in [[SourceText]] instead of calling this AO?

@guybedford
Copy link
Collaborator Author

Why does EvaluateImportCall get a ModuleSource without consulting the phase argument? That doesn't seem right. If it's intentional then I don't understand what step 9 is doing at all.

Will do my best to explain further, and happy to discuss more. The import(moduleSource) use case is the ability to import the module instance at the key of a module source.

This is the required primitive for module expressions since a module expression would allow both import(module { export var p = 5 }) and postMessage, while having its JS object representation be the module source object. In turn we would also enable import(wasmModule) with the same behaviour, and in particular for source-phase-imported Wasm.

The registry effectively consists of a key - (module source, module Instance) 1 - 1 "canonical" source / instance pair relation. We get the pair today since a module record just is both things. This is the phasing model - url -> source -> instance all building off the same registry key. Since the module source associates directly with its URL, this allows them to be used as a proxy for their canonical instance, hence being supported in dynamic import() in a well-defined way as an import capability for the module in a sense.

Supporting new Worker(modSource), postMessage(modSource) and import(modSource) in this proposal get us the basic primitives to then follow-up with the module harmony features from there, and including in the Wasm world the ability to have module handles as well. The most pressing next step right now though being module expressions and module declarations.

I can work on more clearly motivating the above if it would help. The module full primitive goal is a crucial one to get right here though, we have to solve all of representation, postMessage and registry identity together, it's no use solving one without the other.

This is layered on top of source phase imports but I don't see necessary changes needed to that spec draft (e.g. EvaluateImportCall callsites).

Can you clarify what you mean by this? The latest source phase spec it is based to is the one here - tc39/ecma262#3492, not the version in the spec repo, which is now out of date. Does that help?

GetModuleSource Step 3, the call to OrdinaryObjectCreate needs to pass « [[SourceTextModuleRecord]] » as an additional internal slot

Thanks, added in #39.

ModuleSourcesEqual is a weird AO; is it needed? It seems like it's only used by embedder specs? If so, can the embedder compare identity of the value in [[SourceText]] instead of calling this AO?

The benefit of defining this was that Wasm could define it in the same way for WebAssembly module records. I found myself writing this out in HTML explicitly, and it felt unnecessary there. Formally defining the meaning for equality for module sources needs to be done somewhere - either in HTML or ECMA-262. @michaelficarra also brought this one up though, maybe we should just cut it until it's needed to be called directly from ECMA-262 though?

@syg
Copy link

syg commented Nov 26, 2024

Oh I see, I completely misunderstood the use case then. I'll give it a second read tomorrow.

@nicolo-ribaudo
Copy link
Member

nicolo-ribaudo commented Nov 26, 2024

  1. In EvaluateImportCall step 9.b, I'd prefer to swap steps i and ii: first we "read" the arguments, and then we do validation.
  2. In EvaluateImportCall step 9.b.ii, we need to validate that options is an object before reading "with" from it. See steps 10.d.i.
  3. (Not really a review comment: I wish GetModuleSourceName of Source Text Modules could return "JavaScript Module Source")
  4. I don't understand the change in step 1 of FinishLoadingImportedModule. Why if we are importing a string specifier we don't want anymore to add it to the loaded map?

I finished going through the spec -- consider my review as ✔ when 1/2/4 are solved (or explicitly not).

@guybedford
Copy link
Collaborator Author

Added #40 for points 1 & 2 above.

For (3), yeah that's what we mean...

I don't understand the change in step 1 of FinishLoadingImportedModule. Why if we are importing a string specifier we don't want anymore to add it to the loaded map?

In the case where you do import(moduleSource) we can't add anything to the map, therefore we gate step one of FinishLoadingImportedModule on the module request being a string, and not a module source (which results in the module request being passed as a module record). That is, import(moduleSource) is a module request for a module record, not a module request record, and that is reflected in the types of HostLoadImportedModule and FinishLoadingImportedModule. Let me know if I'm missing something here though.

@syg
Copy link

syg commented Nov 26, 2024

Will do my best to explain further, and happy to discuss more. The import(moduleSource) use case is the ability to import the module instance at the key of a module source.

Thanks for the explanation, the use case makes sense to me.

Step 9.b.i of EvaluateImportCall doesn't seem to allow import(moduleSourceObj) where options is undefined. Is that intentional?

The latest source phase spec it is based to is the one here - tc39/ecma262#3492, not the version in the spec repo, which is now out of date. Does that help?

Ack. That helps and addresses my previous question.

With the explanation, editorially LGTM.

@guybedford
Copy link
Collaborator Author

Thanks, added #42.

@legendecas
Copy link
Member

Editorial: In EvaluateImportCall 10.a Let moduleSource be GetModuleSourceModuleRecord(specifier), the moduleSource is a Module Record, can we name it module to not confuse it with ModuleSource?

@bakkot
Copy link
Contributor

bakkot commented Nov 30, 2024

Apologies for the delay. Here's my editorial comments:

  • In EvaluateImportCall, it's weird to have the following sequence: "let attributes be options.with; check module.[[Realm]]; check attributes != undefined". Seems like it would make more sense to consolidate the "reading options.with" and the "checking that options.with is undefined" steps; i.e., to pull 10.b.iii up to after 10.b.i.3
    • This is maybe contrary to @nicolo-ribaudo's suggestion above, but I disagree with that suggestion. We generally validate as we read options from options bags and similar. Compare reading/validating the "alphabet" and "omitPadding" options in Uint8Array.prototype.toBase64.
  • In the intro of GetModuleSourceModuleRecord, the sentence beginning "Provides" should begin "It provides". We use complete sentence in the prose parts of the spec, as a rule.
  • Ditto the sentence in the intro of ModuleSourcesEqual beginning "Checks".
  • The Agoric folks have requested that all new "hidden" (non-global) intrinsics have a documented way to reach them, which I think would include %ModuleSource%. This can just be in the readme but you might also want to call it out when presenting.
  • This introduces ModuleSourcesEqual but does not use it, as far as I can tell. I see the other editors have noticed this as well. Personally I'd just leave this out; we don't usually define machinery for use by hosts. If it's essential to have it, it should be explicitly noted that it is not used in this spec but is expected to be used by hosts, and under what circumstances we expect that.
  • s/@@toStringTag/%Symbol.toStringTag%/ throughout, as of Editorial: Convert @@ notation to intrinsics notation for well-known Symbols ecma262#1314
  • The description of GetModuleSourceName says "... to be provided in the strongly branded return value of the @@toStringTag getter on %AbstractModuleSource%". But the return value of that getter is a String; strings, being primitives, cannot be branded. I think just "to be used as the return value of the @@toStringTag getter on %AbstractModuleSource%" would be fine.
  • I was very confused about what "GetModuleSourceName" did because I thought it was returning the name of a specific module source object, and I didn't know what that would be. I would much prefer to call it "GetModuleSourceKind".
  • Is there a reason [[ModuleSource]] needs to be initially ~empty~? It would not be observable if it were populated eagerly, would it? If so, I'd prefer that, for simplicity. You can add a note to engines saying that it need not be manifested until observed, if you like, though engines are generally aware of that sort of thing already. ([[ImportMeta]] works the same way, but that's "observable" in the sense that it calls a host hook when first manifested, which could in principle do something observable. Manifesting [[ModuleSource]] doesn't call any host hooks.)

@guybedford
Copy link
Collaborator Author

guybedford commented Nov 30, 2024

In EvaluateImportCall, it's weird to have the following sequence: "let attributes be options.with; check module.[[Realm]]; check attributes != undefined". Seems like it would make more sense to consolidate the "reading options.with" and the "checking that options.with is undefined" steps; i.e., to pull 10.b.iii up to after 10.b.i.3

Moved the realm check to come after the options validation.

In the intro of GetModuleSourceModuleRecord, the sentence beginning "Provides" should begin "It provides". We use complete sentence in the prose parts of the spec, as a rule.
Ditto the sentence in the intro of ModuleSourcesEqual beginning "Checks".

Updated in #44.

The Agoric folks have requested that all new "hidden" (non-global) intrinsics have a documented way to reach them, which I think would include %ModuleSource%. This can just be in the readme but you might also want to call it out when presenting.

Added a readme section in #44. This uses the same path source phase imports already introduced for Abstract Module Record.

This introduces ModuleSourcesEqual but does not use it, as far as I can tell. I see the other editors have noticed this as well. Personally I'd just leave this out; we don't usually define machinery for use by hosts. If it's essential to have it, it should be explicitly noted that it is not used in this spec but is expected to be used by hosts, and under what circumstances we expect that.

I've added a note to this section to more clearly define it. If we "DCE" this function, we effectively have the [[SourceText]] field itself then having no usage. So actually I've come around to this function being important to define the meaning of this field. Hopefully the note help clarifies but further feedback in the PR very welcome.

s/@@toStringTag/%Symbol.toStringTag%/ throughout, as of tc39/ecma262#1314

The description of GetModuleSourceName says "... to be provided in the strongly branded return value of the @@toStringTag getter on %AbstractModuleSource%". But the return value of that getter is a String; strings, being primitives, cannot be branded. I think just "to be used as the return value of the @@toStringTag getter on %AbstractModuleSource%" would be fine.

I was very confused about what "GetModuleSourceName" did because I thought it was returning the name of a specific module source object, and I didn't know what that would be. I would much prefer to call it "GetModuleSourceKind".

Updated all of these in #44 as well.

Is there a reason [[ModuleSource]] needs to be initially empty? It would not be observable if it were populated eagerly, would it? If so, I'd prefer that, for simplicity. You can add a note to engines saying that it need not be manifested until observed, if you like, though engines are generally aware of that sort of thing already. ([[ImportMeta]] works the same way, but that's "observable" in the sense that it calls a host hook when first manifested, which could in principle do something observable. Manifesting [[ModuleSource]] doesn't call any host hooks.)

In reality, [[ImportMeta]] is always allocated eagerly in all JS engines, and quite frankly this is a pretty bad state of affairs.

While the ideal world is one in which implementers fine and implement these optimizations, I don't think that should preclude writing host calls in a way that avoids unnecessary work where possible. As you say the calling time itself could be deemed to be observable by definitions of observability.

For these reasons I think we should stick with a clear lazy generation module, especially since most modules in the module system will never have their source phases imported, and we shouldn't risk engines slowing down the module system.

@guybedford
Copy link
Collaborator Author

To follow up on this one -

Manifesting [[ModuleSource]] doesn't call any host hooks.

For WebAssembly modules, it calls a third-party defined concrete method though.

@guybedford
Copy link
Collaborator Author

That said, the Wasm implementation is very simple:

Let record be this WebAssembly Module Record.
Let module be record.[[WebAssemblyModule]].
Return module.

So maybe I'm overstating this one, and it could be an option.

I guess I'm unsure about what benefit there is to an earlier initialization? Does it make the spec significantly simpler?

@guybedford
Copy link
Collaborator Author

Considering this further, I think refactoring out GetModuleSource() may be viable editorially, and we can discuss that further, but I don't think that is a change we should make as part of Stage 2.7 here, and that rather that should be an upstream editorial Stage 3 change in the source phase imports specification itself. I've posted tc39/proposal-source-phase-imports#68 to track this one going forward.

@bakkot
Copy link
Contributor

bakkot commented Nov 30, 2024

I guess I'm unsure about what benefit there is to an earlier initialization? Does it make the spec significantly simpler?

We don't usually spec unobservable optimisations, so when reading an algorithm which contains extra machinery like this one expects it to have observable consequences. The fact that it actually doesn't is then surprising.

@kriskowal
Copy link
Member

Thank you for carrying this forward. This is my review.

  1. In 28.1 Module Source Objects,

    Module Source Objects represent modules in their source import phase, which are not linked, instantiated or executed.

    Please consider text that more clearly indicates that Module Source Objects close over only the immutable data that can be inferred from parsing the source text, and is associated with a Module Record in their source import phase, which is not necessarily linked, nor instantiated, nor executed. Although I concede that it is unnecessary for us to refactor 262 to have a single Module Record and multiplicity of Module Source Record types instead of conflating the Module Record and each concrete Module Source Record, I would like to see that occur before Stage 3 and will be satisfied for now if the prose of the specification makes clear that module sources may one day be multiply instantiated and instantiated in a variety of “cohorts”.

    For Shared Structs, @nicolo-ribaudo has already convinced me that new Module and new Evaluators will need to accept a “cohort” token to ensure that each Shared Struct has a unique prototype for every [identity of cohort, identity of module source, and ordinal within the text of that source].
    I am now convinced that this “cohort” token will be necessary to disassociate a ModuleSource instance with its corresponding Module instance (or beneath the spec fiction, its Module Source Record with its Module Record).
    It is our position in Hardened JavaScript that a ModuleSource instance should be a powerless capability.
    I am willing to fall back to a position where a ModuleSource instance is a powerless capability when transferred between cohorts, where each cohort is scoped to a Compartment in our current formulation.
    I expect this behavior to emerge in the future from the use of the same cohort token between new Evaluators for that compartment and every new Module of the same compartment, e.g., new Evaluators({ cohort, globals, importHook, importMetaHook }) and new Module(source, { cohort, globals, importHook, importMetaHook }).

    For the record, I found it dissatisfying that structuredClone(source) can disassociate the module source from the module instance for module sources constructed by new ModuleSource but not for module sources obtained from the host with import source. I am satisfied by cohort association.

    For my fellow object capability enthusiasts, the crux of this issue is that import(source) in one cohort must not produce a module instance from another cohort, because we wish to be able to use a module source as a powerless data object.
    Without a specific cohort, obtaining a module source from the host, like import fsSource from './fs.js' produces a capability within its cohort to get the corresponding host fs module via import(fsSource).
    An explicit cohort is necessary to divorce the fsSource from the fs instance when it passes into another cohort.
    Likewise, the cohort would ensure that the prototypes of shared structs from one cohort are separate instances from those in another cohort.
    I’m already convinced we need cohort and this is just another reason.

  2. The name of the [[SourceText]] internal slot leaves the reader to worry that there might or even could be an accessor that reveals the source text to the user. The ModuleSourcesEqual function is framed around the simplifying assumption that [[SourceText]] is the source test. I understand this is not your intention. Consider renaming this [[SourceIdentity]] to make clear that it may be a proxy for the equivalence of source strings, like a hash.

    Please consider adding a non-normative note that the spirit of providing this facility to host implementations of 262 is to provide a mechanism for ensuring tolerable misbehavior in the event that a program forwards a module source from one host agent to another via a host defined behavior like postMessage, that have both directly observed different versions of the same source at the same URL will create a new instance for each version of the text. I agree that this is the least dissatisfying resolution to this continuity error and that scar tissue will inevitably have to be produced in userland to eliminate problems for rolling deployment of an online application.

These are my only qualms with the specification as written and I would like an opportunity to review again and will make myself available tomorrow, on Tuesday.

@guybedford
Copy link
Collaborator Author

guybedford commented Dec 3, 2024

I've put together a PR with various wording adjustments to try to clarify these points in https://github.com/tc39/proposal-esm-phase-imports/pull/45/files.

For module source objects I used the following:

Module Source Objects represent the immutable data associated with a module record, distinct from linking, instantiation or execution state.

Please let me know how that sounds.

For (2) I thought this might be possible but of course it turns out to be a harder ask unfortunately - in order to specify postMessage inside of HTML we need to be able to call serialize and deserialize, and in order to that we need data types on top of which we can do this. The ECMAScript Parse Node is certainly not a serializable construct (although in reality hosts should serialize at this level).

So if we make the stringishness of [[SourceText]] no longer a feature, I simply don't know how we could define deserialization since it would need to not only define serialization and deserialization of this new unique key, but also look up the opaque key in a map shared between agents to get a shared ECMAScript Parse Node back (deserializing a module source in HTML is currently defined in terms of parsing the source text).

We already have some spec text notes that this field is only used for equality and serialization, and no other features. The impliciation being hosts may optimize further. Not explicitly calling the source text a string while having this note should also discourage any "accessor" string use within ECMA-262 while ensuring clear serialization and deserialization definitions between specs.

Let's perhaps continue the discussion in the PR. An alternative may be to add this as a Stage 3 exploration.

@nicolo-ribaudo
Copy link
Member

Re (1), my expectation is that when defining the host hook that goes from a module source to a module record in the web integration we'd do it in a way that is equivalent to having a per-realm (or per-compartment, in a world where compartments exist) source->module mapping. When we call dynamic import of a module source in a realm that hasn't seen it before, the host hook would give us a new Module Record and add it to its cache/map.

For the record, I found it dissatisfying that structuredClone(source) can disassociate the module source from the module instance for module sources constructed by new ModuleSource but not for module sources obtained from the host with import source. I am satisfied by cohort association.

I have not seen the details of the structuredClone integration, but I would actually expect it to not disassociate in both cases. It's similar to how structuredClone(sharedArrayBuffer) does not disassociate the underlying memory.


Re (2), I'm not sure about what the normative suggestion is (or is it just editorial?), but equality cannot be based just on the source text. For example, on the web it must also take into account the URL (two modules with the same contents at two different URLs are two different modules). Also note that Source Text Module Records already have a slot containing their source code, just as a parse node and not as a string.

In the draft HTML integration I have seen, ModuleSourcesEqual is only called on sources that are already guaranteed to have the same URL (assuming that equality is defined by the (url, source) pair) so it's fine if ModuleSourcesEqual only compares the string contents, but maybe for clarity we could just remove ModuleSourcesEqual and implement it in the relevant HTML place to avoid possible confusion? Even though I see that Guy already pushed back against that idea because HTML would need to hard-code equality for Wasm.

@kriskowal
Copy link
Member

I've put together a PR with various wording adjustments to try to clarify these points in https://github.com/tc39/proposal-esm-phase-imports/pull/45/files.

Thank you, this addresses the bulk of my concerns. I believe this ensures that all observable behavior emerging from specification is consistent, even on hosts that do not capture the literal [[SourceText]] and do not make module sources transferrable via mechanisms like postMessage. I am thinking specifically of XS, on which import source would work and bind source to a ModuleSource instance, even in the case that the source is backed by byte code and the source text is not in evidence. I invoke @phoddie since this is a Moddable concern. On such platforms, the host would never have reason to consult ModuleSourcesEqual so the absence of a [[SourceText]] internal slot would not be observable.

I intend to approve advancement to Stage 2.7 contingent on #45.

@guybedford
Copy link
Collaborator Author

I am thinking specifically of XS, on which import source would work and bind source to a ModuleSource instance, even in the case that the source is backed by byte code and the source text is not in evidence.

Yes, there is no requirement to maintain the source text, and any proxy for [[SourceText]] that offers serialization and identity may be used such as a hash or global indexing scheme.

On such platforms, the host would never have reason to consult ModuleSourcesEqual so the absence of a [[SourceText]] internal slot would not be observable.

While this is true today, when there exist multiple cohorts, depending on how this is specified, ModuleSourcesEqual may still be necessary if we want to support identity unification for cross-cohort source usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants