Tolk v0.8: preparation for structures; indexed access var.0
#1503
Now that Tolk v0.7 has introduced an AST-based semantic kernel, we are starting work towards structures with automatic packing to/from cells. This will take several steps (each publicly released); this is the first one.
Notable changes in Tolk v0.8
- Syntax `tensorVar.0` and `tupleVar.0`, both for reading and writing
- Allow `cell`, `slice`, etc. to be valid identifiers
Syntax `tensorVar.0` and `tupleVar.0`
A brief reminder of what tensors and tuples are in FunC/Tolk.
A tensor of N parts is actually N distinct variables on a TVM stack.
A tuple of N parts is a single variable on a TVM stack. Tuples can be typed and untyped:
Currently, the only thing you can do with tensors or typed tuples is unpack them into separate variables. Once unpacked, these variables are copies: modifying them won't modify the original tensor/tuple.
Since Tolk v0.8, you can access tensors/tuples by indices without unpacking them
Use `tensorVar.{i}` to access the i-th component of a tensor. Modifying it will change the tensor.
Use `tupleVar.{i}` to access the i-th element of a tuple (does asm INDEX under the hood). Modifying it will change the tuple (does SETINDEX under the hood).
It also works for untyped tuples, though the compiler can't guarantee index correctness.
It works for nesting: `var.{i}.{j}`. It works for nested tensors, nested tuples, and tuples nested into tensors. It works for `mutate`. It works for globals.
Just a couple of examples:
Nested tuples will even work for writing:
So, the compiler is smart enough to handle all cases. Even these:
Also, the compiler can now detect "one variable modified twice in the same expression", which results in a compilation error.
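To picture what indexed reads and writes on tuples mean at the TVM level, here is a minimal Python model. It is only a sketch: immutable Python tuples stand in for TVM tuples, and the helper names `index`, `set_index` and `write_path` are hypothetical, not part of Tolk or its compiler. The point it illustrates is that a write replaces the tuple value held by the variable, and that a nested write has to read the parent tuple, update it, and write it back.

```python
def index(t, i):
    """Model of INDEX: read the i-th element of a tuple."""
    return t[i]


def set_index(t, i, value):
    """Model of SETINDEX: return a new tuple with element i replaced."""
    if not 0 <= i < len(t):
        raise IndexError("tuple index out of range")
    return t[:i] + (value,) + t[i + 1:]


def write_path(t, path, value):
    """Model of `t.p0.p1...pn = value` for nested tuples.

    The last element on the path is only written, never read.
    Every parent on the way is read, updated, and written back.
    """
    head, rest = path[0], path[1:]
    if not rest:
        return set_index(t, head, value)
    inner = index(t, head)  # read the parent tuple
    return set_index(t, head, write_path(inner, rest, value))


t = (1, (2, 3))
t2 = write_path(t, (1, 0), 9)  # models `t.1.0 = 9`
```

In this model, "modifying the tuple" means rebinding the variable to the tuple returned by `write_path`; the old value `t` is left untouched.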
Why is this essential?
In the future, we'll have structures, declared like this:
Structures will be stored like tensors on a stack:
It means that `obj.{field}` is exactly the same as `tensorVar.{i}`:
Same goes for nested objects:
Possibly, a structure will have an annotation that changes its layout from a tensor to a tuple. Then accessing/modifying fields of such an object will result in INDEX / SETINDEX, exactly as is done for tuples now. Note that global tensors (and, in the future, global objects) are actually stored as TVM tuples.
So, implementing all the above is a direct step towards structures.
Allow `cell`, `slice`, etc. to be valid identifiers
Previously, `int` / `cell` / `builder` / `slice` / `tuple` were keywords, and variables could not have such names.
Now, these names are allowed:
In both TypeScript and Rust, names of types are also valid identifiers (`var number = ...` in TS is okay). Moreover, with the introduction of structures, this code should reasonably be valid, right?
As a consequence, struct fields will also be allowed to be named `cell`, `slice`, or any other type that exists now or will be added later.
Implementation details: Ops on a stack refactoring
The necessary first step is to allow direct access to tensor vars. In FunC (and in Tolk before), tensor vars were represented as a single var in terms of Ops:
Now, every tensor of N stack slots is represented as N IR vars, and the later IR analysis has been changed accordingly. This became possible because all types are now inferred in advance.
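As a rough illustration of "a tensor of N stack slots is N IR vars", the Python sketch below computes how many slots a type occupies and which slots belong to the i-th component of a tensor. The classes `Prim`, `Tensor`, `TypedTuple` and the functions `width` and `slot_range` are hypothetical names invented for this sketch; the compiler's real data structures are written in C++ and differ. The only assumptions taken from the description are that a primitive or a tuple takes one slot and a tensor takes the sum of its components' slots.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Prim:
    name: str  # e.g. "int", "cell"; always one stack slot


@dataclass(frozen=True)
class Tensor:
    items: Tuple["Type", ...]  # components are laid out side by side


@dataclass(frozen=True)
class TypedTuple:
    items: Tuple["Type", ...]  # a TVM tuple is a single stack slot


Type = Union[Prim, Tensor, TypedTuple]


def width(t: Type) -> int:
    """Number of stack slots a value of this type occupies."""
    if isinstance(t, Tensor):
        return sum(width(x) for x in t.items)
    return 1


def slot_range(t: Tensor, i: int) -> range:
    """Slots of the i-th component: offset is the sum of earlier widths."""
    offset = sum(width(x) for x in t.items[:i])
    return range(offset, offset + width(t.items[i]))


INT = Prim("int")
# models the type (int, (int, int), [int, int], int)
example = Tensor((INT, Tensor((INT, INT)), TypedTuple((INT, INT)), INT))
```

Here `width(example)` is 5: the nested tensor contributes two slots, while the typed tuple contributes one. Nested access falls out of applying `slot_range` again to the component type.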
`TmpVar` now represents a single stack slot. `LocalVarData` now contains an array of stack slots (1 for primitives, N for tensors).
Stack comments in Fift output have also changed. First, they contain tensor components now:
Results in (pay attention to comments):
It's very handy, especially once objects and fields are implemented.
Next, I've decided to use the notation `'0` instead of `_0` (which FunC has always used) in stack comments. It doesn't get confused with identifiers and indices:
Implementation details: indexed access and non-trivial lvalues
Having refactored the LET Op above, making `tensorVar.0` work for reading and writing becomes quite trivial. It's just accessing stack slots by offset, depending on inferred types. Every `TypeData` can calculate its own width on the stack, so accessing the i-th component is just accessing W[i] slots at an offset equal to the sum of W[0..i-1].
Nesting `tensorVar.0.1.2` works automatically. Holding tuples inside tensors at IR level makes no difference if we can handle tuple vars in general.
Making `tupleVar.0` work on writing is not so trivial:
To achieve this, a special `LValContext` was introduced. Its purpose is to handle non-primitive lvalues. At IR level, a usual local variable exists, but when it changes, something non-trivial should happen.
- `globalVar = 9` actually does `Const '5 = 9` + `Let '6 = '5` + `SetGlob "globVar" = '6`
- `tupleVar.0 = 9` actually does `Const '5 = 9` + `Let '6 = '5` + `Const '7 = 0` + `Call tupleSetAt('4, '6, '7)`
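The two lowerings above share a shape: assign to an ordinary temporary first, then emit a follow-up action that depends on the kind of lvalue. The toy Python emitter below sketches that idea. Everything in it (`ToyIR`, `assign_const`, the target encoding as tuples) is a hypothetical illustration and not the compiler's actual API; slot numbering simply starts from zero, so the numbers differ from the ones quoted above.

```python
from itertools import count


class ToyIR:
    """Collects textual ops and hands out fresh stack-slot names ('0, '1, ...)."""

    def __init__(self):
        self.ops = []
        self._ids = count()

    def tmp(self):
        return f"'{next(self._ids)}"

    def emit(self, op):
        self.ops.append(op)


def assign_const(ir, target, value):
    """Lower `target = value` for a constant right-hand side.

    target is one of (hypothetical encoding):
      ("local", slot)            plain variable, nothing extra to do
      ("global", name)           must be written back with SetGlob
      ("tuple_at", slot, index)  must be written back with tupleSetAt
    """
    const = ir.tmp()
    ir.emit(f"Const {const} = {value}")
    var = ir.tmp()
    ir.emit(f"Let {var} = {const}")  # the "usual local variable" part

    kind = target[0]
    if kind == "global":
        ir.emit(f'SetGlob "{target[1]}" = {var}')
    elif kind == "tuple_at":
        idx = ir.tmp()
        ir.emit(f"Const {idx} = {target[2]}")
        ir.emit(f"Call tupleSetAt({target[1]}, {var}, {idx})")
    return ir.ops


global_ops = assign_const(ToyIR(), ("global", "globalVar"), 9)
tuple_ops = assign_const(ToyIR(), ("tuple_at", "'t", 0), 9)
```

The first two ops are identical in both cases; only the trailing "on change" action differs, which is the part a context object for lvalues needs to track.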
Of course, mixing globals with tuples should also be supported. To achieve this, treat `tupleObj` inside `tupleObj.i` as "rvalue inside lvalue". For instance, `globalTuple.0 = 9` reads the global (like an rvalue), assigns 9 to a tmp var, modifies the tuple, and writes the global.
Nested tuples are handled with care. Remember that `t.0 = rhs` should NOT read the 0-th item, only write it. But a nested `t.0.1 = rhs` does read the `t.0` tuple (still not `t.0.1`) and updates `t.0` afterwards. It's also done using the same `LValContext`, where `t.0.1` is lval, and `t.0` is "rval inside lval".
A challenging thing is handling "unique" parts, to be read/updated only once.
- for `f(mutate globalTensor.0, mutate globalTensor.1)`, `globalTensor` should be read/written once
- for `(t.0.0, t.0.1) = rhs` (`t` is `[[int, int]]`), `t.0` should be read/updated once
Detecting such "common parts" is done by calculating hashes of the AST nodes of every lvalue and every "rvalue inside lvalue" in `LValContext`.
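A minimal Python sketch of the "common parts" idea follows. It is a simplification under stated assumptions: each lvalue is modeled as a path tuple such as `("t", 0, 1)` for `t.0.1`, Python's built-in hashing of tuples stands in for hashing AST nodes, and the function name `plan_writes` is hypothetical. It does not distinguish locals from globals or tensors from tuples, which the real compiler must do.

```python
def plan_writes(lvalues):
    """Return the shared parent paths that should be read/updated only once.

    Raises ValueError when the same location would be written twice,
    either directly or because one target is a parent of another.
    """
    written = set()
    parents = {}  # parent path -> number of lvalues passing through it

    for path in lvalues:
        if path in written:
            raise ValueError(f"{path} is written twice in one expression")
        written.add(path)
        # every proper prefix is an "rvalue inside lvalue"
        for n in range(1, len(path)):
            parent = path[:n]
            parents[parent] = parents.get(parent, 0) + 1

    for path in lvalues:
        if path in parents:
            raise ValueError(f"{path} is written both directly and via a nested part")

    # dict keys are unique, so each shared parent appears exactly once
    return list(parents)


# models (t.0.0, t.0.1) = rhs
nested = plan_writes([("t", 0, 0), ("t", 0, 1)])
# models f(mutate globalTensor.0, mutate globalTensor.1)
mutated = plan_writes([("globalTensor", 0), ("globalTensor", 1)])
```

Because parents are keyed by their path, `("t", 0)` shows up once in `nested` even though two lvalues pass through it, and `("globalTensor",)` shows up once in `mutated`. The same bookkeeping rejects targets such as `[("a",), ("a",)]`, where one location is written twice.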
By the way, this automatically makes it possible to detect and report "multiple writes inside expression", like `(a, a) = rhs` / `[t.0, (t.0.1, c)] = rhs`.
Internals: built-in `__expect_type()` for testing purposes
Currently, the Tolk tester framework can test various "output" of the compiler: pass input and check output, validate fif codegen, etc. But it cannot test compiler internals and AST representation.
I've added the ability to define special functions that check/expose internal compiler state. The first (and currently the only one) is:
Such a call is treated specially during compilation: compilation fails if the expression doesn't have the requested type.
It's intended to be used in tests only. Not present in stdlib.
Related pull requests