-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor the module structure of the namespace module #6291
Conversation
…ate-namespace-module
…ernal-modules-to-root
…ernal-modules-to-root
CodSpeed Performance ReportMerging #6291 will improve performances by 22.73%Comparing Summary
Benchmarks breakdown
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tooling changes look good, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, tooling changes looks good.
I highly appreciate the effort put into the comprehensive PR description! 🙏 ❤️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a significant improvement. Thanks for that, and for putting in the effort to make it easier to review.
After going through the PR I'm convinced that the remaining issues (such as not being able to import through dependencies) are high priority follow ups.
I don't think this is a breaking change, though we may break some codebases that rely on the buggy behavior, so we'll want to communicate it, but I don't see that as an obstacle to merging it.
Imports work fine I think, as does importing via reexports. The potential problem is something like this:
I think of this as a bug with how we (I) implemented reexports, but to be honest I'm not sure how much of a problem it is. There's a risk of a name clash, if |
Hmm, that still seems worthy of a prompt fix, but it's indeed not as bad as I thought. |
Nice stuff, left just a minor issue, looks like some pretty decent gains here too 👍 |
…Labs/sway into jjcnn/move-external-modules-to-root
2feb7e2
Description
Fixes #5498.
This PR contains a major refactoring of the namespace module, which is responsible for keeping track of the symbol bindings in the program. The refactoring is intended to reduce memory allocation during compilation, and also fixes a number of minor bugs related to path resolution.
Advice to the reviewer:
I recommend reviewing this PR in 3 steps:
sway-core/src/semantic_analysis/namespace/root.rs
,sway-core/src/semantic_analysis/namespace/namespace.rs
andsway-core/src/language/call_path.rs
. These files contain the major datastructure changes, and if anything in those files needs to be addressed, then there is limited value in reviewing the rest of the code changes.Remaining issues
No remaining issues except for what reviwers bring up.
Breaking changes
As illustrated by one of the test cases the PR makes paths to external libraries such as
::core::codec::*
illegal. This is a bugfix rather than a change in behavior (the correct path iscore::codec::*
), but may require updates of existing programs.Additionally, the visibility check for modules was broken, so some modules declared using
mod x
would have incorrectly been considered public even though it should have been declaredpub mod x
. Again, this is a bugfix rather than a change of behavior, but may require updates of existing programs.Changes made by this PR
Representation of the dependency graph (main file:
sway-core/src/semantic_analysis/namespace/root.rs
)Previous: External libraries were represented as submodules of every module in the program. This meant that every module in every external package was cloned for every module in the program, and recursively so, since external libraries could also have dependencies that required clones for every module. In particular,
std
hascore
as a dependency:Changes in this PR: External libraries are now represented only at the package root. The root module is given the same name as the package so that paths beginning with the package name can be resolved as a module path. If multiple libraries depend on the same external library, then they each contain a clone (this can be resolved a at later stage by introducing a global table of libraries).
Reasoning: This change dramatically reduces the amount of cloning of external libraries. The cloning is not eliminated completely, since multiple libraries may have the same library as a dependency (e.g.,
core
is used as a dependency for both the program andstd
, so if the program depends onstd
, then there will be two clones ofcore
in the tree). The remaining cloning can be eliminated by introducing a global table of libraries, which should be doable without making significant changes to the path resolution algorithm.Callpath representation (main file: sway-core/src/language/call_path.rs)
Previous: Paths to externals were considered relative to current module. This worked because all externals existed a submodules of every module. Paths inside the current package were ambiguous, and were either interpreted relative to current module or relative to the root module. The root module did not have a name, so the empty path was used to indicate the root module.
Now: All full paths start with the name of the package, which is also the name of the root module. If the first identifier is not the name of the current package, then we look in the externals of that package, and if the name isn't found there we throw an error. Note that the name of an external is defined in Forc.toml, and may be different from the name used inside the external library (e.g., a library may call itself
X
, but a project may useX
as an external under the nameY
). This means that any path that escapes from a library must have its package name changed.Reasoning: This is all a conseqence of the change to the representation of the dependency graph. We need to be able to the rely on the package name in order to resolve the path within the correct pacakge. Additionally, if a path escapes from a dependency to a dependent, then the package name must be changed to what the package is called in the dependent.
CallPath.callpath_type (main file: sway-core/src/language/call_path.rs)
Previous: The CallPath struct contained a flag
is_absolute
, which, if set, was used to mean two different things. For parsed paths it was used to signal that the path started with::
, i.e., that the path should be resolved relative to the current package root. For paths generated during typechecking the flag was used to indicate that the path was a full path, i.e., that it should either be resolved relative to the current package root or that it was a path to an external library.Now:
is_absolute
has been replaced withcallpath_type
, which has 3 possible values:RelativeToPackageRoot
, which indicates that the path started with::
,Full
, which indicates that the path starts with the package name, andAmbiguous
, which indicates that the path should either be resolved relative to the current module, or that it is a full path referring to an external library.Reasoning: A path has 3 different states, so a boolean flag is not sufficient to represent all states.
Path representation (main file: sway-core/src/language/call_path.rs)
Previous: Full paths always pointed to the module containing the declaration of the thing that the path resolves to.
Now: Full paths point to a module where the suffix is bound. This may not be the module containing the declaration of the thing the suffix is bound to, because the binding may be the result of an import. The trait map is special in that the path to the actual declaration is used as a key in the map, so for traits we use the canonical path, which point to the module where the declaration resides.
Reasoning: The previous definition of full paths required a duplication of the symbol resolution algorithm in order to construct the full path, and it was getting very difficult to maintain. In the vast majority of cases we only need the full path to indicate somewhere where the symbol can be resolved, and don't actually need to know where the declaration resides. In the few cases where we do need to know the path to the declaration (the trait map uses these paths as keys) we can generate the canonical path by first converting to the full path, and then resolving the path and using the path stored with the binding in the lexical scope, since this is the path to the actual declaration. Additionally, the previous solution was only able to generate full paths for paths where the suffix is an identifier - for other suffix types we could not generate a full path. This has now been changed so that we can generate full paths for all paths, although canonical paths are still restricted to paths with an identifier as the suffix. This also moves us closer to a situation where we can represent semantic (canonical) paths separately from syntactic callpaths.
Namespace.init removed (main file: sway-core/src/semantic_analysis/namespace/namespace.rs)
Previous: The
Namespace
struct contained aninit
module which was used as a template for new modules, a clone of which was created every time the collection pass or the typechecker entered a submodule. This template contained all external libraries as submodules, as well as (for contract pacakges) a (new) declaration ofCONTRACT_ID
.Now: The
init
module has been removed. When the collection pass or the typechecker enters a new submodule,Namespace
creates a newModule
(if none exists yet for the relevant submodule), and populates it with implicit imports (core::prelude::*
andstd::prelude::*
, and for contract packagesCONTRACT_ID
).Reasoning: The current solution makes it significantly clearer what is in a new
Module
object. Additionally, the declaration ofCONTRACT_ID
is no longer duplicated in each module - it is now declared once in the root module, and imported everywhere else.namespace encapsulation (main file: sway-core/src/semantic_analysis/namespace/namespace.rs)
Previous: The
Namespace
struct contained methods to extract the underlying datastructures in thenamespace
module.Now: An attempt has been made to encapsulate the
namespace
module a little bit more, so that changes and lookups to the datastructures in thenamespace
module are done by calling methods on the Namespace struct rather than letting the caller extract the datastructures and manipulating them directly.Reasoning: When the datastructures are manipulated in many different places it makes it very hard to track down all the places that need to be changed when making changes to the datastructure. The current solution is not perfect, but it does encapsulate things a bit better than before.
Minor changes and improvements
A few minor improvements have been made:
core
,std
, etc.) is now referred to as a package. (Note: TheRoot
struct should probably be renamed toPackage
for consistency)type_resolve.rs
had to be updated because path resolution was changed. The current version is not perfect, but it does improve the faulty check we had before.TyModule
so far contained a clone of theNamespace
struct in the state it was in when typechecking of the relevant module was done. TheNamespace
object was never used except for the one in the root module, so I have moved theNamespace
clone toTyProgram
, thus eliminating quite a bit of unnecessary cloning (the Namespace struct owns the current packageRoot
object, and thus cloning theNamespace
struct clones the entire dependency graph). For error handling purposes in sway-lsp we save the current namespace inTypeCheckFailed
rather than inTyModule
, which also significantly reduces memory allocation during compilation.Updated tests
A number of tests have been updated for various reasons:
core::str_slice
was referenced using the path::core::str_slice
. This is invalid, but sincecore
used to be treated as a submodule of every module in the program, the path resolved anyway. (This also shows that the invalid paths in issue Fix performance and semantic problem when creatingModule
s and makeModule::name
mandatory #5498 are now actually treated as invalid - I have not added regression tests of that issue, since it seems unlikely that we will reintroduce the bug).Unresolved issues
The
Namespace
struct is cloned before the symbol collection phase. This is expensive, since theNamespace
struct at that point contains the entire dependency graph (minus the modules of the current package). Ideally we would like the symbol collection phase and the typechecker to use the sameNamespace
struct, but at the moment the Module struct is unable to perform updates from parsed to typechecked declarations, so this isn't possible. The cloning happens insway-core/src/lib.rs
andsway-core/src/semantic_analysis/namespace/contract_helpers.rs
. Once we represent externals as a global table we should be able to change theRoot.externals
map to only use references into the global table, which will make the cloning step much cheaper.contract_helpers.rs
throws a regular compiler error if typechecking fails, but that should be an ICE, since it is generated code that is failing to typecheck.Paths may in some cases escape from external packages to the current package. If this happens across multiple packages, then this may result in the path no longer being valid. For instance, if A depends on B, which in turn depends on C, then a path may escape from C via B into A, but since A does not have C as a direct dependency, then the path can no longer be resolved. This again can be fixed once we introduce a global table of externals.
The visibility check in
type_resolve.rs
is insufficient, because it doesn't handle enum variants and associated types correctly.Checklist
Breaking*
orNew Feature
labels where relevant.