Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the module structure of the namespace module #6291

Merged
merged 115 commits into from
Jan 17, 2025

Conversation

jjcnn
Copy link
Contributor

@jjcnn jjcnn commented Jul 22, 2024

Description

Fixes #5498.

This PR contains a major refactoring of the namespace module, which is responsible for keeping track of the symbol bindings in the program. The refactoring is intended to reduce memory allocation during compilation, and also fixes a number of minor bugs related to path resolution.

Advice to the reviewer:

I recommend reviewing this PR in 3 steps:

  1. Review the PR description. If something in the description is unclear, then leave a review comment, and I will try to address the issue. The code changes will be significantly easier to review once you get the overall idea of what is happening.
  2. Review the files sway-core/src/semantic_analysis/namespace/root.rs, sway-core/src/semantic_analysis/namespace/namespace.rs and sway-core/src/language/call_path.rs. These files contain the major datastructure changes, and if anything in those files needs to be addressed, then there is limited value in reviewing the rest of the code changes.
  3. Review the remaining files. These code changes are all consequences of the changes in step 2, so it shouldn't take too much time, even though there are quite a few changes.

Remaining issues

No remaining issues except for what reviwers bring up.

Breaking changes

As illustrated by one of the test cases the PR makes paths to external libraries such as ::core::codec::* illegal. This is a bugfix rather than a change in behavior (the correct path is core::codec::*), but may require updates of existing programs.

Additionally, the visibility check for modules was broken, so some modules declared using mod x would have incorrectly been considered public even though it should have been declared pub mod x. Again, this is a bugfix rather than a change of behavior, but may require updates of existing programs.

Changes made by this PR

Representation of the dependency graph (main file: sway-core/src/semantic_analysis/namespace/root.rs)

Previous: External libraries were represented as submodules of every module in the program. This meant that every module in every external package was cloned for every module in the program, and recursively so, since external libraries could also have dependencies that required clones for every module. In particular, std has core as a dependency:

                       core:

                       "" (lib.sw)
          /         /     |         \      \	
   primitives  raw_ptr  raw_slice  slice   ...


                        std:

                          "" (lib.sw)
          /         /        |         \                      \	
       core    constants    vec     primitive_conversions    ...
    / / | \ \       |        |         |        |     \
      ...         core      core      core     b256   ...
               / / | \ \  / / | \ \ / / | \ \   | 
                 ...        ...       ...     core
                                            / / | \ \
					       ...  


                      External library:


                          "" (lib.sw)
          /         /           |              \
       core        std         submodule        ...
    / / | \ \     /   \       /       \    \
      ...       core  ...   core      std    ...
             / / | \ \   / / | \ \    /   \   
                 ...        ...     core   ... 
                                  / / | \ \
				   	       

                         Main program:


                          "" (main.sw)
          /         /           |                        \
       core        std     external library                 submodule
    / / | \ \     /   \       /       \    \               /  |       \
      ...       core  ...   core      std    submodule  core  std    external library
             / / | \ \   / / | \ \    /   \      /  \    / \  /   \      /       |     \
                 ...        ...     core   ...  ... ...  ...  core ...  core    std    ...
                                  / / | \ \                   / \        / \    / \ 
                                                              ...        ... core  ...
                                                                              / \
                                                                              ...

Changes in this PR: External libraries are now represented only at the package root. The root module is given the same name as the package so that paths beginning with the package name can be resolved as a module path. If multiple libraries depend on the same external library, then they each contain a clone (this can be resolved a at later stage by introducing a global table of libraries).

                       core:

                       core (lib.sw)
          /         /     |         \      \	
   primitives  raw_ptr  raw_slice  slice   ...


                        std:

                     std (lib.sw)    ====== externals =====>   core
                                                             / / | \ \
           /      |          |                \                 ...
      constants  vec   primitive_conversions    ...             
                           /     \
                        b256     ...


                   External library:

     external_lib (lib.sw)  === externals ===>   core
        /          \                          / / | \ \
    submodule      ...                           ...
       |
      ...                                        std    == externals => core
                                              / / | \ \              / / | \ \
                                                 ...                    ...       				   	       


                     Main program:

    main_program (main.sw)  === externals ===>  core
      /           \                          / / | \ \
  submodule      ...                            ...
      |
     ...                                        std      == externals => core
                                             / / | \ \                / / | \ \
                                                ...                      ...       				   	       

                                            extenral_lib == externals => core
                                             / / | \ \                / / | \ \
                                                ...                      ...       				   	       

                                                                         std      == externals => core  
                                                                      / / | \ \                / / | \ \
                                                                         ...                      ...       			

Reasoning: This change dramatically reduces the amount of cloning of external libraries. The cloning is not eliminated completely, since multiple libraries may have the same library as a dependency (e.g., core is used as a dependency for both the program and std, so if the program depends on std, then there will be two clones of core in the tree). The remaining cloning can be eliminated by introducing a global table of libraries, which should be doable without making significant changes to the path resolution algorithm.

Callpath representation (main file: sway-core/src/language/call_path.rs)

Previous: Paths to externals were considered relative to current module. This worked because all externals existed a submodules of every module. Paths inside the current package were ambiguous, and were either interpreted relative to current module or relative to the root module. The root module did not have a name, so the empty path was used to indicate the root module.

Now: All full paths start with the name of the package, which is also the name of the root module. If the first identifier is not the name of the current package, then we look in the externals of that package, and if the name isn't found there we throw an error. Note that the name of an external is defined in Forc.toml, and may be different from the name used inside the external library (e.g., a library may call itself X, but a project may use X as an external under the name Y). This means that any path that escapes from a library must have its package name changed.

Reasoning: This is all a conseqence of the change to the representation of the dependency graph. We need to be able to the rely on the package name in order to resolve the path within the correct pacakge. Additionally, if a path escapes from a dependency to a dependent, then the package name must be changed to what the package is called in the dependent.

CallPath.callpath_type (main file: sway-core/src/language/call_path.rs)

Previous: The CallPath struct contained a flag is_absolute, which, if set, was used to mean two different things. For parsed paths it was used to signal that the path started with ::, i.e., that the path should be resolved relative to the current package root. For paths generated during typechecking the flag was used to indicate that the path was a full path, i.e., that it should either be resolved relative to the current package root or that it was a path to an external library.

Now: is_absolute has been replaced with callpath_type, which has 3 possible values: RelativeToPackageRoot, which indicates that the path started with ::, Full, which indicates that the path starts with the package name, and Ambiguous, which indicates that the path should either be resolved relative to the current module, or that it is a full path referring to an external library.

Reasoning: A path has 3 different states, so a boolean flag is not sufficient to represent all states.

Path representation (main file: sway-core/src/language/call_path.rs)

Previous: Full paths always pointed to the module containing the declaration of the thing that the path resolves to.

Now: Full paths point to a module where the suffix is bound. This may not be the module containing the declaration of the thing the suffix is bound to, because the binding may be the result of an import. The trait map is special in that the path to the actual declaration is used as a key in the map, so for traits we use the canonical path, which point to the module where the declaration resides.

Reasoning: The previous definition of full paths required a duplication of the symbol resolution algorithm in order to construct the full path, and it was getting very difficult to maintain. In the vast majority of cases we only need the full path to indicate somewhere where the symbol can be resolved, and don't actually need to know where the declaration resides. In the few cases where we do need to know the path to the declaration (the trait map uses these paths as keys) we can generate the canonical path by first converting to the full path, and then resolving the path and using the path stored with the binding in the lexical scope, since this is the path to the actual declaration. Additionally, the previous solution was only able to generate full paths for paths where the suffix is an identifier - for other suffix types we could not generate a full path. This has now been changed so that we can generate full paths for all paths, although canonical paths are still restricted to paths with an identifier as the suffix. This also moves us closer to a situation where we can represent semantic (canonical) paths separately from syntactic callpaths.

Namespace.init removed (main file: sway-core/src/semantic_analysis/namespace/namespace.rs)

Previous: The Namespace struct contained an init module which was used as a template for new modules, a clone of which was created every time the collection pass or the typechecker entered a submodule. This template contained all external libraries as submodules, as well as (for contract pacakges) a (new) declaration of CONTRACT_ID.

Now: The init module has been removed. When the collection pass or the typechecker enters a new submodule, Namespace creates a new Module (if none exists yet for the relevant submodule), and populates it with implicit imports (core::prelude::* and std::prelude::*, and for contract packages CONTRACT_ID).

Reasoning: The current solution makes it significantly clearer what is in a new Module object. Additionally, the declaration of CONTRACT_ID is no longer duplicated in each module - it is now declared once in the root module, and imported everywhere else.

namespace encapsulation (main file: sway-core/src/semantic_analysis/namespace/namespace.rs)

Previous: The Namespace struct contained methods to extract the underlying datastructures in the namespace module.

Now: An attempt has been made to encapsulate the namespace module a little bit more, so that changes and lookups to the datastructures in the namespace module are done by calling methods on the Namespace struct rather than letting the caller extract the datastructures and manipulating them directly.

Reasoning: When the datastructures are manipulated in many different places it makes it very hard to track down all the places that need to be changed when making changes to the datastructure. The current solution is not perfect, but it does encapsulate things a bit better than before.

Minor changes and improvements

A few minor improvements have been made:

  • A compilation unit (core, std, etc.) is now referred to as a package. (Note: The Root struct should probably be renamed to Package for consistency)
  • The visibility check for path resolution in type_resolve.rs had to be updated because path resolution was changed. The current version is not perfect, but it does improve the faulty check we had before.
  • TyModule so far contained a clone of the Namespace struct in the state it was in when typechecking of the relevant module was done. The Namespace object was never used except for the one in the root module, so I have moved the Namespace clone to TyProgram, thus eliminating quite a bit of unnecessary cloning (the Namespace struct owns the current package Root object, and thus cloning the Namespace struct clones the entire dependency graph). For error handling purposes in sway-lsp we save the current namespace in TypeCheckFailed rather than in TyModule, which also significantly reduces memory allocation during compilation.

Updated tests

A number of tests have been updated for various reasons:

  • Improved or changed error messages
  • Module visibility was not checked correctly, so some tests had to have certain submodules made public.
  • In once case, core::str_slice was referenced using the path ::core::str_slice. This is invalid, but since core used to be treated as a submodule of every module in the program, the path resolved anyway. (This also shows that the invalid paths in issue Fix performance and semantic problem when creating Modules and make Module::name mandatory  #5498 are now actually treated as invalid - I have not added regression tests of that issue, since it seems unlikely that we will reintroduce the bug).

Unresolved issues

  • The Namespace struct is cloned before the symbol collection phase. This is expensive, since the Namespace struct at that point contains the entire dependency graph (minus the modules of the current package). Ideally we would like the symbol collection phase and the typechecker to use the same Namespace struct, but at the moment the Module struct is unable to perform updates from parsed to typechecked declarations, so this isn't possible. The cloning happens in sway-core/src/lib.rs and sway-core/src/semantic_analysis/namespace/contract_helpers.rs. Once we represent externals as a global table we should be able to change the Root.externals map to only use references into the global table, which will make the cloning step much cheaper.

  • contract_helpers.rs throws a regular compiler error if typechecking fails, but that should be an ICE, since it is generated code that is failing to typecheck.

  • Paths may in some cases escape from external packages to the current package. If this happens across multiple packages, then this may result in the path no longer being valid. For instance, if A depends on B, which in turn depends on C, then a path may escape from C via B into A, but since A does not have C as a direct dependency, then the path can no longer be resolved. This again can be fixed once we introduce a global table of externals.

  • The visibility check in type_resolve.rs is insufficient, because it doesn't handle enum variants and associated types correctly.

Checklist

  • I have linked to any relevant issues.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation where relevant (API docs, the reference, and the Sway book).
  • I have added tests that prove my fix is effective or that my feature works.
  • I have added (or requested a maintainer to add) the necessary Breaking* or New Feature labels where relevant.
  • I have done my best to ensure that my PR adheres to the Fuel Labs Code Review Standards.
  • I have requested a review from the relevant team or maintainers.

Copy link

codspeed-hq bot commented Sep 5, 2024

CodSpeed Performance Report

Merging #6291 will improve performances by 22.73%

Comparing jjcnn/move-external-modules-to-root (2feb7e2) with master (37235f2)

Summary

⚡ 4 improvements
✅ 18 untouched benchmarks

Benchmarks breakdown

Benchmark master jjcnn/move-external-modules-to-root Change
compile 5.5 s 4.8 s +15.44%
did_change_with_caching 527.6 ms 475.1 ms +11.06%
completion 23.7 ms 20.9 ms +13.36%
hover 4.3 ms 3.5 ms +22.73%

@jjcnn jjcnn added the breaking May cause existing user code to break. Requires a minor or major release. label Jan 15, 2025
sdankel
sdankel previously approved these changes Jan 15, 2025
Copy link
Member

@sdankel sdankel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tooling changes look good, thanks!

JoshuaBatty
JoshuaBatty previously approved these changes Jan 15, 2025
Copy link
Member

@JoshuaBatty JoshuaBatty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, tooling changes looks good.

@JoshuaBatty JoshuaBatty requested a review from a team January 15, 2025 22:51
@ironcev
Copy link
Member

ironcev commented Jan 16, 2025

I highly appreciate the effort put into the comprehensive PR description! 🙏 ❤️

IGI-111
IGI-111 previously approved these changes Jan 16, 2025
Copy link
Contributor

@IGI-111 IGI-111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a significant improvement. Thanks for that, and for putting in the effort to make it easier to review.

After going through the PR I'm convinced that the remaining issues (such as not being able to import through dependencies) are high priority follow ups.

I don't think this is a breaking change, though we may break some codebases that rely on the buggy behavior, so we'll want to communicate it, but I don't see that as an obstacle to merging it.

@jjcnn jjcnn added breaking May cause existing user code to break. Requires a minor or major release. and removed breaking May cause existing user code to break. Requires a minor or major release. labels Jan 16, 2025
@jjcnn
Copy link
Contributor Author

jjcnn commented Jan 16, 2025

After going through the PR I'm convinced that the remaining issues (such as not being able to import through dependencies) are high priority follow ups.

Imports work fine I think, as does importing via reexports.

The potential problem is something like this:

  • Library A declares a trait T in its root module. The (full and canonical) path to T is A::T.
  • Library B has A a dependency. In its root module it reexports A::T, thus making T visible to any of B's dependents through the (full but not canonical) path B::T.
  • Program C has B as a dependency, but not A. It imports T via B using the path B::T, and implements it for a type. The impl is inserted into the trait map using the canonical path to T, which is A::T. However, A::T is not a meaningful path when seen from C, because C does not have A as a dependency.

I think of this as a bug with how we (I) implemented reexports, but to be honest I'm not sure how much of a problem it is. There's a risk of a name clash, if C has a dependency called A which does not refer to the same library as the one B depends on, but I'm not even sure the package manager can handle the situation where two different libraries in the dependency graph have the same name.

@IGI-111
Copy link
Contributor

IGI-111 commented Jan 16, 2025

Hmm, that still seems worthy of a prompt fix, but it's indeed not as bad as I thought.

tritao
tritao previously approved these changes Jan 16, 2025
@tritao
Copy link
Contributor

tritao commented Jan 16, 2025

Nice stuff, left just a minor issue, looks like some pretty decent gains here too 👍

@tritao tritao merged commit 2b2ce51 into master Jan 17, 2025
41 checks passed
@tritao tritao deleted the jjcnn/move-external-modules-to-root branch January 17, 2025 00:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking May cause existing user code to break. Requires a minor or major release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix performance and semantic problem when creating Modules and make Module::name mandatory
6 participants