-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcontributing
140 lines (79 loc) · 10.1 KB
/
contributing
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
### CONTRIBUTING GUIDE ###
This is a guide through the source code of technetium, as well as guidelines for making contributions.
-- Part 1: Where to find things --
The technetium source code is made up of several different crates that are given a specific job in the pipeline of processing technetium source code. We'll go through these crates one at a time. You can skip parts of this section if this isn't what you're looking for
- core -
The core crate (crates/core/main.rs) provides the command line interface, sets up some small global context, and calls into each of the other crates in turn. This crate uses `clap` for command line argument parsing, and also sets up the `log` crate for verbose logging.
- lexer -
The lexer crate (crates/lexer) is called by the main crate, and has the job of converting the raw source code input into a stream of `lexer::Tok` tokens. It defines the error types that will be used for the lexing and compiling stage of the script execution.
- compile -
The compile crate (crates/compile) is also called by the main crate, and has the job of consuming a token stream and turning it into bytecode. It does this by first parsing the program into an `compile::ast::StatementList` abstract syntax tree, then by creating a `compile::CompileManager` that keeps track of state in the compilation process, and passing the AST to the CompileManager. The CompileManager will then contain all the required parts for execution on the runtime.
- runtime -
The runtime (crates/runtime) has the job of defining the bytecode, and defining how to run the bytecode. It also includes all the standard types and functions that are part of technetium. This is the most involved part of the codebase.
There is one more crate included, which is kind of unrelated to the others:
- mlrefcell -
This is a copy of `std::cell::RefCell`, except an `mlrefcell::MLRefCell` can be Mutably Locked (hence ML). It's really a quite trivial change to RefCell, but it's useful to the technetium runtime because of technetium's "immutable locking" feature.
- miscellaneous -
The documentation is generated using Sphinx. Install Sphinx into your PATH, then run `build_docs.sh` to build the documentation whos source is located in `docsource`. Then, the `docs` folder will contain a git repository that can be commited and pushed, which will update the Github Pages site.
All tests are either in the source code of files themselves (somewhat rare), or in the `tests` directory. The `tests` directory contains several files that all have random golden tests in them, that run the output `tech` binary against several simple scripts and their expected outputs. The division between these files is not clear, and will probably be dissolved in the future.
There's a script in `scripts/tcmake`, that just acts as an alias to `tech -r`. This is kind of pointless, but it would be great if in the future this would be added to the install location with `cargo install`. I can't figure out how to do this though.
-- Part 2: How does the runtime work? --
This will be an evolving section detailing how the core of the technetium runtime is structured.
- Dynamic Dispatching -
All technetium objects are passed in rust code as the opaque type `ObjectRef`. This is an opaque `Box` pointer to an implementer of the `Object` trait.
The `Object` trait provides some standard functions, such as a function for getting the name of the type of an object, and a function for cloning an object.
Anything that implements the `Object` trait should be of the form `ObjectCell<T>` for some type `T`. ObjectCells can:
* Have multiple owners
* Be turned into mutable internal references with checks at runtime
* Be locked, so that further mutations cause a runtime error
Downcasting `ObjectRef`s into concrete types is done using Rust's dynamic dispatch (using the Any trait). For convenience, since dispatching this way is terribly repetitive, there is a `downcast!` macro in `runtime` that helps with this process.
- Standard Library -
All the functions in the standard library are structs that implement the `Object` trait. These are found in the `runtime::standard` module. There are convenience macros for defining these functions, which make it relatively easy to add new functions to the standard library. These Objects are all then registered in `runtime::standard::mod.rs`.
-- Part 3: Desirable Contributions --
If you notice problems with technetium, make a GitHub issue! If you want to, you can check the projects I have listed below and suggest changes / feedback as well:
- minor projects -
There are some smaller things that used to be on my todo list which I want to commit to changing in the long run. They vary in scope, but are all relatively small:
1. Functions for serializing and persisting variables. Right now, the `stale` function makes use of persistence in the `.tcmake` subdirectory of a project. Users of the programming language could be able to similarly make use of this persistence between script runs, ideally very easily. I don't know exactly what this design would look like, but it would probably just involve several standard library functions for storing and retrieving values from string keys, as well as a serialization mechanism for `Object`'s.
2. Documentation needs to be added for the `dict()` conversion function. See the tests for how this function works (hint: it's similar to Python)
3. The following code stack overflows:
```
l = [1]
l.push(l)
println(l)
```
The equivalent code actually works in python though! The CPython source code that's responsible is here: https://github.com/python/cpython/blob/e5fe509054183bed9aef42c92da8407d339e8af8/Objects/listobject.c#L373 . Something similar could be hypothetically implemented in technetium.
4. Lock errors don't show when an object had been locked, which might be very confusing for the user. Take this example:
```
a = {2}
b = {3}
a.add(b)
b.add(5)
println(a)
```
5. An interactive mode should be added to the language. Statements could be compiled on the fly, and contexts could be reused so that things work out. `isatty` would need to be used, etc.
6. Integer literals, and a lot of standard library functions, expect integers to be within the range of an `i64`, even though in most cases internally ints are stored as arbitrary sized integers. This is unintuitive, and could possibly be changed.
7. The documentation overall could use more examples
8. Windows support. Technetium should *work* on windows, but the shell operator doesn't... I should figure out what it should do, and make it do that.
9. unicode string literals ("\u{something}").
10. A round float function?
11. Format strings that have parsing errors only report the first parsing error. This might be annoying in unusual cases.
12. Incorperate ripgrep, so it's easy to search for patterns. Maybe having a "rg" stdlib function?
- major projects -
There are several short term "major" projects that I am working on now that I feel are fit for contributions:
1. Macros for DRY. Right now, I repeat myself too much. In particular, the `downcast!` macro is relatively new to the source code, so I am converting its use throughout the `runtime` crate. I am always looking for new macros and ways to make my code less repetitive.
2. Miscellaneous documentation. Some areas of the Rust codebase are somewhat lacking in documentation. I'm always looking to improve on this. This is especially true with respect to all the weird state-holders (MemoryManager, CompileManager, etc.) that are not easy to understand.
3. This document. I could always expand the runtime section of this document. This is important!
3.5 More "warn" and "debug" level logging throughout the codebase, especially in the code for the runtime.
4. Error Handling. This is a language feature that I've wanted to add for a while now. This will also probably be the next language feature in technetium. There needs to be a try / catch system in technetium. Here are some sub-tasks that need to be completed first:
[ ] Errors as first class objects. This shouldn't be too hard, just add an `Object` wrapping the `RuntimeError` struct. What's more difficult is figuring out exactly what methods / attributes these errors should have, and properly documenting them.
[ ] try / catch in grammar. Messing with the grammar for the language is annoying, because it's really, *really* easy to make the parser not compile. try / catch is totally possible in the grammer (in fact, it's somewhat trivial compared to things like named arguments to functions), but it'll just take some work.
[ ] try / catch in the AST. Self explanatory. Not much work at all.
[ ] try / catch in the bytecode. This is more interesting! I think just adding a `try` op that contains a code reference for where the `catch` is is the way to go. Shouldn't be too bad.
[ ] Putting it all together.
5. Named arguments to functions. I don't know what to do about this. The more I think about it, the more I think "well you could just pass a dictionary as the last argument to your function that has all the options you need". And then I think "but rust doesn't have named arguments and that always makes me sad". Named arguments will be very difficult to implement, since they'll change the way functions are called in technetium, but it's do-able, and figuring out the scope of this change is a good idea.
6. Generators and comprehensions. These go together, and would be neat!
7. Modules, i.e. multi file programs. This is a large project, that wouldn't necessarily be that useful to technetium specifically.
8. Reference cycle detection. Same as before, a major project, wouldn't be that useful. Right now, memory is leaked if there ever is a reference cycle. You'd have to try pretty hard to make this happen as it is... There's another case when memory is leaked, and that is when some child context references a variable in a parent context (so the variable is captured). A completely new way to manage memory might be helpful. Maybe using slot arenas?
9. Optimization. Code runs pretty slow right now...
10. Related, the `debug` instruction in the bytecode was a bad idea. The debug information should be stored in a table outside of the code itself, for performance reasons.
11. Optimizers that run basic passes on the bytecode could be useful.