notes.txt

=======================================
03/17/2024

Fought this for a long time.  Problem is that var uses can't tell if they're in
the definition (recursively) or not, at parse-time.  So there's no breadcrumb
left-over from the Parsers' use to say if any given use should be "Fresh" or
"NonGen".

In my example below, when I parse A's expression and find "B", I don't know
(yet) if A and B are intertwined or not.  Likewise, when parsing D's expression
and finding B, I don't know if D and B are intertwined.

Pondering building a variant Parser that makes an AST for stack frames (like
EXE), then discovers mut-let-rec sets, perhaps reordering the Frame AST to
topo sort mut-let-rec, THEN walks the AST for a SoN.


=======================================
03/17/2024

BUG: $dyn not counted in StructNode._nargs, but it IS an argument

ForwardRef states
- not scoped; var might promote to outer scopes
- scoped; var is defined in this scope, but mid-def
- self-defined; var is defined, but not MUTUALLY defined, e.g. is_even/is_odd examples
- mut-defined; all mutual let-rec vars are defined.

FRefs are tied to their scope/struct/frame as the direct input for that var name.
They have a defined edge, null till self-defined.
They have more mutual dependence edges to other FRefs.  These extra FRef edges
are cycle-detected as self-def gets set.


Example:
A = { x -> rand ? B(x) : x };
D = {   -> rand ? B(1) : C(2) };
C = { x -> A(x) };
B = { x -> C(x) };

Parse token A
- Set A FRef, scoped, not self-defined.  Scope:{A-scoped}
- Parse A expr
- Discover B; insert B FRef, not-scoped  Scope:{A-scoped, B-not-scoped}
- Use a FRef; add edge A-FRef to B-FRef  Scope:{A-scoped(B), B-not-scoped}
- Finish A expr parse
- Change A-FRef to self-defined; has a B-FRef edge so NOT mut-defined (yet); Scope: {A-self(B),B-not}

Parse token D
- Set D FRef, scoped, not self-defined.  Scope:{A-self(B), B-not, D-scoped}
- Parse D expr
- Use B FRef, add edge D-FRef to B-FRef. Scope:{A-self(B), B-not, D-scoped(B)}
- Discover C; insert C FRef, not-scoped  Scope:{A-self(B), B-not, D-scoped(B), C-not}
- Use C FRef, add edge D-FRef to C-FRef. Scope:{A-self(B), B-not, D-scoped(B,C), C-not}
- Finish D expr parse
- Change D-FRef to self-defined; Scope:{A-self(B), B-not, D-self(B,C), C-not}

Parse token C
- Change C FRef to scoped: Scope:{A-self(B), B-not, D-self(B,C), C-scoped}
- Parse C expr
- Use A FRef, add edge C-FRef to A-FRef. Scope:{A-self(B), B-not, D-self(B,C), C-scoped(A)}
- Finish C expr parse
- Change C-FRef to self-defined: Scope:{A-self(B), B-not, D-self(B,C), C-self(A)}

Parse token B
- Change B FRef to scoped: Scope:{A-self(B), B-scoped, D-self(B,C), C-self(A)}
- Parse B expr
- Use C FRef, add edge B-FRef to C-FRef.  Scope: {A-self(B), B-scoped(C), D-self(B,C), C-self(A)}
- Finish B expr parse
- Change B-FRef to self-defined: Scope: {A-self(B), B-self(C), D-self(B,C), C-self(A)}
- - Walk B->C->A, all self/mutual.
- - Flip all to Mutual.
- - Walk backwards from {A,B,C} looking for not-mut-FRefs.
- - If so, cycle-find again on D: D->B(mut), D->C(mut) - Flip all D to mutual.
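
A rough Java sketch of the FRef state walk above, to check the A/D/C/B trace by
machine.  Everything here (class FRefDemo, beginDef/use/endDef) is a made-up
name for illustration, not the AA source.  It also cheats: instead of walking
backwards from the new mutual set, it just rescans every self-defined FRef
until nothing flips.

```java
import java.util.*;

// Hypothetical model of the ForwardRef states; not the real AA classes.
class FRefDemo {
  enum State { NOT_SCOPED, SCOPED, SELF_DEFINED, MUT_DEFINED }

  final Map<String, State> state = new LinkedHashMap<>();
  final Map<String, Set<String>> edges = new HashMap<>();  // def -> FRefs it uses

  // "Parse token X": X becomes scoped, i.e. mid-definition.
  void beginDef(String name) {
    state.put(name, State.SCOPED);
    edges.putIfAbsent(name, new LinkedHashSet<>());
  }

  // A use of 'used' while defining 'def'; unknown names start as not-scoped.
  void use(String def, String used) {
    state.putIfAbsent(used, State.NOT_SCOPED);
    edges.putIfAbsent(used, new LinkedHashSet<>());
    edges.get(def).add(used);
  }

  // "Finish X expr parse": X is self-defined, then promote whatever is closed.
  void endDef(String name) {
    state.put(name, State.SELF_DEFINED);
    boolean progress = true;
    while (progress) {                    // rescan until nothing flips
      progress = false;
      for (String n : state.keySet())
        if (state.get(n) == State.SELF_DEFINED && closed(n, new HashSet<>())) {
          state.put(n, State.MUT_DEFINED);
          progress = true;
        }
    }
  }

  // True if everything reachable from n is at least self-defined.
  private boolean closed(String n, Set<String> seen) {
    if (!seen.add(n)) return true;        // already visited on this walk
    State s = state.get(n);
    if (s == State.NOT_SCOPED || s == State.SCOPED) return false;
    for (String m : edges.get(n))
      if (!closed(m, seen)) return false;
    return true;
  }
}
```

Feeding it A, D, C, B in parse order leaves A, D and C stuck at self-defined
until B finishes, then all four flip together, matching the trace.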


=======================================
03/12/2024

Thinking that not inserting Fresh during parse doesn't work, because lost NONGEN inputs.

Thinking instead of removing extra FRESH, but this probably requires a global pre-pass.

See testRecur12.aa.sav, TestParse.jig for examples.

So...

Parsing A= definition.  Find forward ref B.  Go up scope, find mid-def of A,
add "B" as a dependence.

Parsing D= def.  Find fref B; go up scope; find mid-def of D, add "B" as a
dependence.  Repeat for fref C; D ends up with deps {B,C}

Parsing C= def.  Find fref A; add "A" as a dep.

Parsing B= def.  Find fref C; add "C" as a dep.

End of struct, topo-sort by deps and find cycle.  Declare {A,B,C} mutual let-rec.
Revisit Fresh loads inserted by FREF resolve, and kill extra Fresh.

Leave the Fresh loads in for D, which is not part of the cycle.


=======================================
03/10/2024

When defining a top-level frame, it is defined in parts and each part may (or
may not) be "let/in" complete.

Example, hidden/assumed top-level display/frame:
  fact = {.../*fact is mid-def, NOT FRESH here */... };
  x = e0; // fact is complete here, but x is not
  fact(x) // fact uses let-polymorphism here

Same Example, explicit top-level frame:
  FILE_SCOPE = @{
    fact = {.../*fact is mid-def, NOT FRESH here */... };
    x = e0; // fact is complete here, but x is not
    fact(x) // fact uses let-polymorphism here, but FILE_SCOPE is not complete
    ...     // FILE_SCOPE carries on with more variables
  }
  
Indeed FILE_SCOPE carries on to the program end, adding vars.
Same is true for all nested scopes.

BUG: Currently finding FILE_SCOPE in the non-gen set, which recursively
     includes 'fact' when fact is being used: `fact(x)` which blows its
     polymorphic flavor.

So from a typing perspective looking at something like:
  FILE_SCOPE_CURRENT = @{
    active, in-progress defs.
    includes e.g. is_even until is_odd defines
  };
  FILE_SCOPE_DONE = @{
    fact = {...fact...};
    x = e0;
    blah = fact(x);
    .. more added
  }

Or maybe TOP/FILE_SCOPE NOT in non-gen, but instead their parts go in non-gen
on a case by case basis.

FILE_SCOPE = @{
  x = e0; // forward-ref set is empty
  fact = { ...fref(fact)...}; // remove fact from fref-set, so empty, so done.
  ...fact(3)... // fact is done, not in non-gen so FRESH
  is_even = { ...fref(is_odd)... }; // fref(is_odd, not done);
  blah = {...baz...}; // fref is empty, so done
  ...blah... // Done, not-in-nongen, so FRESH
  is_odd  = { ...is_even/not done so NOT FRESH...}// picks up is_even fref set, then removes is_odd, so both are done
  ...
}


Issue is building up a topo DAG-sort of sets, names are not-fresh in the same
initial create, and ARE fresh in later uses.
Set: {A,B,C} // A calls B calls C calls A
Set: {D,E}   // These can poly call A,B,C but not each other, D,E
Set: {F,G}   // These can poly call A,B,C but not each other, F,G
Set: {H,I}   // These can poly call any of the above...

Theory: late-add Fresh???
For each Closure Store, track f-ref names.
At end of closure, build SCCs and topo-sort DAGs.
Within SCC, replace frefs with direct Loads, no fresh.
Not-replaced frefs replace with a Load/Fresh pair.


Nasty example; listing fref-sets:
FILE: @{
  A = {B};
  H = {I,A,D};
  D = {E,C};
  C = {A}
  I = {H,E}
  E = {D,C}
  B = {C}
}
During parse:
  If outer scope, assume Load/Def always complete here.
  If current scope, ask if Load/Def is complete & flag it.
          If not complete, flag Load as unknown.  Add to fref-set for current scope def.
  When current def completes, gather & reset fref-set.          
After struct finishes, walk fref-sets, make SCCs.
  Walk by SCC, all Loads in same SCC not-complete-not-fresh
  Loads not in current SCC as Fresh

More Nasty Example:
Done parsing A = {B    }; keep fref-set.  A not defined (yet), missing B.
Done parsing H = {I,A,D}; keep fref-set.  H not defined (yet), missing I,D.
Done parsing D = {E,C  }; keep fref-set.  D not defined (yet), missing E,C.
Done parsing C = {A    }; keep fref-set.  C not defined (yet), has A, but A missing B.
Done parsing I = {H,E  }; keep fref-set.  I not defined (yet), has I,H,A,D, missing E.
Done parsing E = {D,C  }; keep fref-set.  E not defined (yet), has everything except B.
Done parsing B = {C    }; C uses A, A uses B.  Declare mut-rec {A,B,C}.
Don't see any shortcuts here.

I still think, in the end, I'm gonna visit all fields as part of the "promote"
logic, and have to do a topo-sort with SCC discovery.  All fref uses get broken
down into uses from inside the SCC and uses outside the SCC.  
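
Sketch of the SCC split on the nasty example, in Java.  This is NOT Tarjan; it
is the dumb version (two names share an SCC when each reaches the other), which
is plenty for checking the expected sets by hand.  Class and method names are
made up for illustration.

```java
import java.util.*;

// Sketch only: SCCs via mutual reachability, a simple stand-in for Tarjan.
class FrefSccDemo {
  // Names reachable from 'start' by following fref-sets.
  static Set<String> reach(Map<String, List<String>> frefs, String start) {
    Set<String> seen = new TreeSet<>();
    Deque<String> work = new ArrayDeque<>(frefs.get(start));
    while (!work.isEmpty()) {
      String n = work.pop();
      if (seen.add(n)) work.addAll(frefs.get(n));
    }
    return seen;
  }

  // The SCC holding 'name': every m where name reaches m and m reaches name.
  static Set<String> scc(Map<String, List<String>> frefs, String name) {
    Set<String> out = new TreeSet<>();
    out.add(name);
    for (String m : reach(frefs, name))
      if (reach(frefs, m).contains(name)) out.add(m);
    return out;
  }

  // A load of 'used' inside the def of 'def' gets a Fresh only across SCCs.
  static boolean needsFresh(Map<String, List<String>> frefs, String def, String used) {
    return !scc(frefs, def).contains(used);
  }

  // The fref-sets from the nasty example, in parse order.
  static Map<String, List<String>> nasty() {
    Map<String, List<String>> f = new LinkedHashMap<>();
    f.put("A", Arrays.asList("B"));
    f.put("H", Arrays.asList("I", "A", "D"));
    f.put("D", Arrays.asList("E", "C"));
    f.put("C", Arrays.asList("A"));
    f.put("I", Arrays.asList("H", "E"));
    f.put("E", Arrays.asList("D", "C"));
    f.put("B", Arrays.asList("C"));
    return f;
  }
}
```

On the nasty example this gives {A,B,C}, {D,E} and {H,I}; the load of B inside
A stays a plain Load, while the load of A inside H gets the Load/Fresh pair.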


=======================================
12/25/2023

Pondering a "reboot" of AA Node class.

The Type classes seem to be stable; complex but very usable with strong
theoretical foundations.

The TVar classes seem to be stable; complex but very usable with strong
theoretical foundations.

Node:

- Needs better IR printing, and moved out of Node (too large).
  See Simple Node printing for a better 1-line per node print.

- Folding the edge Ary<Node>s back into Node - for easier debugging experience.
  It's 1 layer less, 1 click less every time I chase node edges.

- Lots of stuff in the Node class proper, I just want a chance to filter it down.

- Drop the opcode as not really helpful.  Just add Yet Another Virtual call.

- Stick with the recursive peepholes during parsing; no worklist.

- Post-parse, pre-combo & post-combo, normal worklist.

- A bunch of Node bits seem suspicious - not sure I need them, or would be
  better off if moved to somewhere else.  Also, there's a lot of walkers, makes
  me wonder if I should use a generic walker more often.

- Looking at the walkers, I have a generic walker but it's too hard to use.  I
  want cheap&easy input/output walkers.  See TypeStruct for such walkers; I
  pull one from a list, walk, return it.  Have to be careful with leaks on
  early exit.

- Ponder a NodeUtils for lots of stuff that's only sorta tangential to Node

- Ponder a nicer way for Parse to handle keeping partially built nodes around
- - addDef(null) use as a in-progress flag; delDef(null) to unwind
- - But still call keep/unkeep so can assert more.
- - try/resources/autoclose to wrap assertion that add/del is balanced?
- - OR: `int bal; assert bal = pushBalancedKeeps();`
- -     `assert popBalancedKeeps(bal);`
- - `try(BalanceCheck b = BalanceCheck.get(); ) { ... }`
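
Possible shape for the try/autoclose idea, as a sketch.  BalanceCheck is the
name from the bullet above; the Keeps counter and BalanceDemo are stand-ins I
made up, since the real keep/unkeep lives on Node.

```java
// Hypothetical stand-in for the keep/unkeep bookkeeping on Node.
class Keeps {
  static int depth;                       // count of outstanding keep() calls
  static void keep()   { depth++; }
  static void unkeep() { depth--; }
}

// Records the keep depth on open, complains on close if it changed.
class BalanceCheck implements AutoCloseable {
  private final int start = Keeps.depth;  // depth when the region opened
  static BalanceCheck get() { return new BalanceCheck(); }
  @Override public void close() {
    if (Keeps.depth != start)
      throw new IllegalStateException("unbalanced keeps: " + start + " -> " + Keeps.depth);
  }
}

class BalanceDemo {
  // Returns true if the body left keep/unkeep balanced.
  static boolean balanced(Runnable body) {
    try (BalanceCheck b = BalanceCheck.get()) {
      body.run();
      return true;
    } catch (IllegalStateException e) {   // thrown by close() on imbalance
      return false;
    }
  }
}
```

Throwing from close() keeps the check alive even without -ea; the
`assert popBalancedKeeps(bal)` variant would vanish when asserts are off.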


Example for problem walkers; walkerr_def:
- Passed some "Env" for collecting stuff
- visit self-check
- walk just the defs, pre-order
- - skip if def is null, XCTRL, or a FunPtr
- - pre-order walk
- POST WORK:
- - revisit defs, check no ALL on defs
- - adderr(Env) collects errors in Env
- - updates each ErrMsg order

Issues for generic pre-order walker: it's not a pre-order walk!  Must have a
special cutout during walking, which means calling a generic walker with
specialized per-edge-walk, which makes it much less useful.


=======================================
12/12/2023

Overloads (finally) appear to be typing/running in the EXE branch.
General theory:

- Dyn(amic) Loads, or DynLoads, take a dynamic offset instead of using a fixed
  constant offset.  The offset will come from a DynTable - and thus a dynamic
  load requires 2 loads instead of 1.
- The DynTable is strongly typed and derived during normal H-M typing.
- DynTable entries are either a mapping from a DynLoad to a field label
  (string, which at compile time turns into a constant) OR the map is from an
  Apply/Call site to a new DynTable used for this call.
- All Lambdas are passed an extra DynTable argument.
- All Apply/Calls pass this argument, which they load from the extant DynTable
  at a call-site specific offset.
- All DynLoads load at a fixed offset from the table to get their field offset, then load from ptr+offset.
- So the runtime cost is a load-per-call, an extra reg argument, and a 2nd load-per-DynLoad.
- The type of a DynTable is the TVDynTable, which is similar to a TVStruct, in
  that it has a variable number of fields, but uses Node IDs instead of labels.  
- DynTables can be empty, and will be if no DynLoad is involved.  They can have
  nested DynTables, which can be recursively empty.  For recursive functions
  (e.g. 'factorial'), the nested DynTable can refer to itself.
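
Toy Java model of the runtime side, assuming a squaring function whose `_*_`
load is a DynLoad.  All names and node ids here are invented for the sketch;
the real DynTable is typed by TVDynTable and uses offsets, not string labels
in a HashMap.

```java
import java.util.*;

// Hypothetical DynTable: keyed by node id, like the notes describe.
class DynTable {
  final Map<Integer, String> labels = new HashMap<>();   // DynLoad id -> field label
  final Map<Integer, DynTable> calls = new HashMap<>();  // call-site id -> callee table
}

class DynLoadDemo {
  static final int DYNLOAD_MUL = 7;               // made-up id of the DynLoad in sq
  static final int CALL_INT = 11, CALL_FLT = 12;  // made-up call-site ids

  // sq = { x -> x*x }: picks the _*_ overload through the extra table argument.
  static String sq(DynTable dyn, Map<String, String> mulOfX) {
    String label = dyn.labels.get(DYNLOAD_MUL);   // load 1: label from the table
    return mulOfX.get(label);                     // load 2: field at that label
  }

  // A call loads the callee's table from its own table, then passes it along.
  static String callSq(DynTable caller, int callSite, Map<String, String> mulOfX) {
    return sq(caller.calls.get(callSite), mulOfX);
  }

  // Caller table with one nested table per call site.
  static DynTable top() {
    DynTable top = new DynTable();
    DynTable forInt = new DynTable(); forInt.labels.put(DYNLOAD_MUL, "0");
    DynTable forFlt = new DynTable(); forFlt.labels.put(DYNLOAD_MUL, "1");
    top.calls.put(CALL_INT, forInt);
    top.calls.put(CALL_FLT, forFlt);
    return top;
  }

  // The _*_ overload tuple for an int or flt receiver.
  static Map<String, String> mul(String clz) {
    Map<String, String> m = new HashMap<>();
    m.put("0", "{ " + clz + " int -> " + (clz.equals("int") ? "int" : "flt") + " }");
    m.put("1", "{ " + clz + " flt -> flt }");
    return m;
  }
}
```

The cost shows up as described: one table load per call, one extra argument,
and a second load inside the DynLoad.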


=======================================
10/21/2023

DynLoad has a result TV3, call it TV_dyn.
This TV_dyn maps back to the DynLoad TVPtr input, which can load() a TVStruct called TV_match

During a fresh_unify, if the TV_dyn freshes against 'that',
we need to do a trial_resolve(outie,'that',TV_match,test).
This instance of TV_dyn made by Fresh gets its own match.

I think we actually need to push forward with more mappings if fresh_unify.
So during fresh_unify, if see TV_dyn has a mapping, give mapping to 'that' as
well.  Also toss on delayed_resolve worklist.

For the map TV_dyn -> TV_match, since TV_dyn can unify, might try having `unify`
update this mapping.  We can assume the mapping is always correct because
`unify` corrects it.

If TV_dyn is deep, and innards change, and not already resolved, we need
to re-trial resolve.  This means changes on the inside should force a
re-trial.

So here's a simpler algo:

DynLoad puts TV_dyn on the "to be resolved" list, and tags TV_dyn with the
TV_match list.  Fresh_unify copies the list, unify merges the list.  Anything
listed goes on the "to be resolved" list.  The TBR contains the match pattern
(varies by fresh unify) and the tagged TV3.  
A DelayResolve is a pair (TV3,TVStruct).  Each pair resolves independently.
I need dup-removal, with simple O(n^2) search.

Once the main worklist is done, we trial_resolve the "to be resolved" list.
If any progress, we go back to combo.

Same concept can happen for delay_fresh.  Suppose TV_fresh fresh_unifies with
'that'.  Every TVExpanding inside of TV_fresh may later expand, and requires
another round of fresh.  So TVExpanding carries a list of FreshNodes; if they
expand those FreshNodes go on the to-be-re-Freshed list.  Once main worklist is
done, we go back and do more fresh_unify.

Action items: drop existing DELAY_RESOLVE lists.
* Global DelayResolve pair list (TV3,TVStruct)
* DynLoad puts a pair on.
* Fresh of a TV_dyn, puts a 'that' as a pair on: (that,TVStruct,Fresh)
* I assume both TV3 and TVStruct update during main worklist, but do not track
* At end of work, I recheck resolve.
* Resolved TV3s can be removed forever, always resolved.
* Progress restarts main worklist.
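
Sketch of the global pair list from the action items, in Java.  T stands in
for TV3 and S for the TVStruct match; the trialResolve predicate is a
placeholder for whatever trial_resolve ends up looking like.

```java
import java.util.*;
import java.util.function.BiPredicate;

// Hypothetical DelayResolve list; T plays the TV3, S plays the TVStruct.
class DelayResolve<T, S> {
  private static final class Pair<T, S> {
    final T tv; final S match;
    Pair(T tv, S match) { this.tv = tv; this.match = match; }
  }

  private final List<Pair<T, S>> pairs = new ArrayList<>();

  // Add a pair unless the same (tv,match) is already listed.
  // Dup-removal is the simple O(n^2) scan, by identity.
  void add(T tv, S match) {
    for (Pair<T, S> p : pairs)
      if (p.tv == tv && p.match == match) return;
    pairs.add(new Pair<>(tv, match));
  }

  // Run once the main worklist is done.  Resolved pairs are removed forever.
  // Returns true on any progress, meaning the main worklist restarts.
  boolean recheck(BiPredicate<T, S> trialResolve) {
    return pairs.removeIf(p -> trialResolve.test(p.tv, p.match));
  }

  int size() { return pairs.size(); }
}
```

The driver loop is then: drain main worklist, call recheck, repeat while it
returns true.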

Ponder simplifying delay_fresh; this broke in HM and I don't remember how.
Possibly a self-cycle fresh/unify where a TV3 appeared both on the LHS and RHS
at the same time.

Theory: I can make delay_fresh *shallow*, each TV3 in LHS maps to a TV3 in RHS;
changes in LHS need to re-fresh in RHS.  Theory: just call fresh on changed LHS
and mapped RHS.  Cycles might be an issue!


=======================================
10/1/2023

Splitting normal field Load from a DynLoad (Dynamic Load) node.  DynLoad will
take a field offset input (much like an array load), but the input will contain
a mapping from TV3s to field labels.  The address to a DynLoad, same as a field
Load, will have a TVPtr to a TVStruct.  Once exactly one of the offset TV3s
trial-unifies with the DynLoad self, the field is chosen.  If this doesn't
happen during the compile, the DynLoad will do a dynamic lookup.  The lookup
math can be reduced (same as a vtable) to looking up the field offset, at a
fixed offset in the vfield array.

vfields are produced trivially by New (unsure) and by Fresh.


=======================================
09/13/2023

Summary

Introducing a new concept of dynamic loads DynLoad and a vtable for DynLoads.
DynLoads dynamically compute their field offset from the vtable.  Essentially
the vtable is a mapping from DynLoad to field label (string/offset).

DynLoads are written as "._" and come as the default during operator expansion.
DynLoads use the HM field resolution and may resolve to different fields along
different execution paths.  These different paths always come about from Fresh
nodes and fresh uses of identifiers (generally inside functions but I can make
function-free examples).

The execution semantics for a DynLoad just do a self-lookup in the vtable
input, get a field offset and load at the address plus offset.  In the worst
case this is a fixed offset load from the vtable, then an add and load.

The vtable starts as an empty mapping, and each time it passes by a Fresh the
Parser inserts a DynLoadMap which updates the vtable for the Fresh.  The Fresh,
during Combo/HM, clones its input type and this clones unresolved field types
and these are already present in TVStruct.  During Combo resolution each unique
instance of an unresolved field is always tied to a Fresh and gets resolved
independently.  This resolution is put in the Fresh and DynLoadMap - so the
vtable changes to include a {DynLoad -> label} mapping.

The vtable edge gradually picks up a complete {DynLoad -> label} mapping for
all paths, and each DynLoad can then compute its proper label during runtime.
DynLoads with only constant labels can convert to Loads (the DynLoad dies via
node death).  Liveness can also be used to kill { DynLoad -> label } dead
mappings, which can kill some DynLoadMaps (DynLoadMap dies via Reduce).


Example:

// Using a var assignment here, so a later use of 'foo' clones via Fresh
foo = { x y z ->
  // Crazy expression produces a collection of functions, all different sigs
  (
    { int flt str -> ... },
    { flt str int -> ... },
    { str int flt -> ... }
  )._(x,y,z);              <<-- DynLoad of random expression
}

// A FRESH on FOO for every pattern
( // Gather results
  foo( 1, 2.2, "abc"),  // FRESH on FOO
  foo( "red", 2, 1.1),
  foo( 3.3, "blu", 5)
)


So really - it's a vtable-per-FRESH.
Crazy concept
- extra vtable edge.
- R/M/W by FRESH, to extend/replace with self
- passed as extra parm to funcs.
- - scoped basically
- merged at Phis
- used by DynLoad to find field edge....
- can be dead if no DynLoad uses.
- Can be partially dead if a Fresh's choices not used by some DynLoad
- So really it's a collection of edges - 1 per Fresh, kept in parallel to all other things, like memory edges.
- DynLoad inserts an unresolved hook in the TVStruct on address input.
- - TVStruct has normal fields
- - TVStruct also  has    <  -  ,DynLoad> -> <TV3 mapping,String>
- - TVStruct Fresh adds a <Fresh,DynLoad> -> <TV3 mapping,String>
- - Resolving fills in the String
- vtable is a mapping <DynLoad> -> String
- - Parse: carry mapping conservative (all DynLoad) -> (unresolved String)
- - Extra parm, like memory, tracks everywhere (blech)
- - Iter: DynLoadMap cannot Value update without Combo; can do Liveness at some point
- Combo computes per-Fresh DynLoad->String map
- Combo updates each Fresh with mapping
- - As fields resolved can be monotonic
- - Combo treats unresolved as high, can type-flow on the vtable mapping edge
- - vtable edge falls from <all DynLoad> -> (unresolved) to <most DynLoads> -> <Strings>
- - liveness per DynLoad
- - Combo a DynLoad getting a constant string no longer keeps vtable edge alive
- - Can thus make some Fresh DynLoad settings go dead.
- So parallel to Fresh is a DynLoadMap
- Which updates the vtable via the Fresh
- And can go dead independently from other things


BROKEN EXAMPLE:

Example, stacked dynamic dispatch

foo = { x y -> 
  (
    { int int -> ... },
    { int flt -> ... },
    { flt int -> ... },
    { flt flt -> ... },
  )._(x,y);              <<-- DynLoad of random expression, forces inlining?
}

baz = { x -> pred ? foo(x,2) : foo(x,2.2) }

bar = { -> pred ? baz(2) : baz(2.2) }


=======================================
09/11/2023

Final result: each Fresh instance of an unresolved field might require a
different label.

Either this should not type OR I can type it, but require inlining for an
efficient solution.

These can be inlined, making a specific instance per-use and thus type.
But this requires *inlining* to allow *typing*.

    `sq = { x -> x*x }; (sq(2),sq(2.2))`

Here I need to resolve a load from operator _*_.  The standard set is
  ( { int int -> int }, { int flt -> flt } )  // Integer MUL
  ( { flt int -> flt }, { flt flt -> flt } )  // Float   MUL
`sq(2  )` needs field `0` to get `{ int int -> int }`
`sq(2.2)` needs field `1` to get `{ flt flt -> flt }`

This example types fine, except I cannot choose between fields `0` and `1`.
Stalled on having a dynamic field load resolution; one that does not depend
on code cloning in *every* case.


So here's a "bad" good solution-
- At every Fresh usage, I tag the value with a unique v-table (unique-per-Fresh).
- At the usage site of a tagged value, I have a constant lookup.
  This gets me the field offset.
- So at the `sq( 2 )` place, I take `sq` with vtable[0]=0;
- So at the `sq(2.2)` place, I take `sq` with vtable[0]=4;
- Since args are not uniform, I wrap.
- Inside of `sq`, I use the vtable of `x` to lookup `_*_` at a fixed offset.
- At the one dynamic field, I lookup vtable[0], get a field (0 or 1) or field offset (0 or 4)
- I load from tuple+offset, getting either {int int -> int} or {flt flt -> flt}.

Again for `fact`:

      `fact = { x -> x <= 1 ? x : x*fact(x-1) }; (fact(2),fact(2.2))`

Here the inside `fact` has no Fresh, so this is identical to the `sq` solution.


General rule...
- if I can type at all, the unresolved field becomes resolved at some level
- last LET before it becomes unresolved takes a vtable, indexed by each usage site.
- eg ( z = ... (x = ... ( y = ... struct._ ... ))); ( ...x..y..z.., ...x..y..z.. )
- - if x & y types still have unresolved, but z does not, then x gets vtable.
- - Or maybe y gets vtable from x.

Big Picture Answer right now
- Allow mismatched field offset
- I think there's an "acceptable" fast solution
- More inlining might reduce cost to icache-bloat
- Expect to inline for true primitives anyways


=======================================
08/19/2023

About to go to BM, so dumping some state here.

Still failing to type: "fact = { x -> x <= 1 ? x : x*fact(x-1) }; (fact(2), fact(2.2))".

Issue is the FRESH type of "fact" pushes its internal Resolving fields (for the opers <=, -, *)
into the pre-FRESH types.  These pre-FRESH types are already set to either INT CLZ or FLT CLZ
which then forces the internal field resolves.

This is correct for "_<=_" because I match ({A int:1 -> A}, ...) vs both
({int int -> int},{int flt -> flt})
({flt int -> int},{flt flt -> flt})
and field 0 works for both.
Same reason works for "_-_".

Fails for "x*fact(x-1)" because the match is: ({A A -> A}, ...) vs
({int int -> int},{int flt -> flt})
({flt int -> int},{flt flt -> flt})
Which requires field 0 for INTs and field 1 for FLTs.

I've known all along I need to clone `fact` for primitives.
Issue right now is: I'm getting a broken type instead of a failed typing.


Theory:

fresh_unify does NOT push its &123 Resolving fields into `that`?
Might be I never resolve inside of unify, directly.
Instead, I attempt resolve for a particular copy a TVStruct.
If it works (yes or no, not maybe), I record the answer so faster in future -
and I record the selected field in the Load.  There might be several unrelated
field choices (e.g. `fact` will demand the `_*_` pick different fields).

If the Load has only 1 pick, we take it and Done!
If the Load has several picks, I need to fail out as ambiguous (and demand more cloning).

"Demand more cloning" - in "Iter" pass, if a fcn receives mixed primitives &
ref args, clone for arg types!!!


=======================================
06/25/2023

More on overloads
( A B ) vs pattern ( 7 ) // Ambiguous, either A or B could become int or flt
( @{name:str, ... } @{ age=A } ) -vs- @{ age=B } // Ambiguous, first struct could pick up age, 2nd struct A & B could fail later
( @{name:str      } @{ age=A } ) -vs- @{ age=B } // Ambiguous, first struct is a clear miss  , 2nd struct A & B could fail later
( @{name:str      } @{ age=A } ) -vs- @{ age=A } // OK, A & A cannot miss
( @{name:str, ... } @{ age=A } ) -vs- @{ age=A } // Ambiguous, first struct could pick up age=A

So each match has the following 3 choices
- hard no , something structural is wrong
- hard yes, all parts match, even leaf-for-leaf.  No open struct in pattern.
- maybe   , all parts match, except leaves.  Leaves might expand later and fail.

Actions

- Hard-no: would be nice to be efficient and cut this out of the match choices.
  Can't change the container (directly), but might add side-data?
  Note: same match and same pattern can each be used in other contexts.
- 0-Hard-yes & 1+maybes: Wait.
- 1-Hard-yes & 1+maybes: Wait.
- 1-Hard-yes & 0-maybes: Resolve it!
- 2-Hard-yes & 0+maybes: Unify yes's with ambi error.  Maybes become yes's also unify.
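
The action table as a tiny Java function, mostly to pin down the corner cases.
Enum and method names are made up.  The notes do not say what happens with
zero hard-yes and zero maybes (every choice a hard-no); NO_MATCH below is my
assumption for that case.

```java
// Sketch of the action table; names are illustrative only.
class OverloadChoice {
  enum Action { WAIT, RESOLVE, AMBIGUOUS, NO_MATCH }

  static Action decide(int hardYes, int maybes) {
    if (hardYes >= 2) return Action.AMBIGUOUS;              // 2+ hard-yes: ambi error
    if (hardYes == 1 && maybes == 0) return Action.RESOLVE; // exactly one choice left
    if (hardYes == 0 && maybes == 0) return Action.NO_MATCH; // assumption, not in notes
    return Action.WAIT;                                     // 0 or 1 hard-yes, 1+ maybes
  }
}
```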

End of pass#1, Apply-Lift goes conservative.

End of pass#2, flip HM_AMBI.
- Between passes, 1-Hard-yes & 1+maybes: Resolve ALL of them, at once!  This
  can cascade resolves, and in turn break other delayed resolves.
- Otherwise SAME.
- We can argue a repeat loop here, if any progress.

End of pass#3
- 0-Hard-yes & 1+maybes: Left around.  Actual resolve has to wait until module is linked against other modules.

Argues for a pass# change in Combo:

while( progress ) {
  - Normal unify
    - 0-Hard-yes & 1+maybes: Wait.
    - 1-Hard-yes & 1+maybes: Wait.
    - 1-Hard-yes & 0-maybes: Resolve it!
    - 2-Hard-yes & 0+maybes: Unify yes's with ambi error.  Maybes become yes's also unify.
      Specifically, TVErr on the Match YESes, since 2+ hard yes's MUST be ambiguous against any pattern.
      
  - Resolve all 1-hard-yes + some-maybes, all at once.  May cause errors.  Report progress.
    Done in parallel to remove an ordering ambiguity.
}
Apply-quits-lifting-leaves.
Repeat above loop.


THIS MIGHT NOT WORK....  UNWOUND.
#######################
#Might also invert TVErr -
#normal TVar has a null TVErr field.
#when error, fills in the _err field, but leaves the TVar alone.
#Another TVar in error, uses the same TVErr field.
#TVErr still points back at the TVars.
#Removes the as_struct/as_ptr problems.


THIS MIGHT BE WORKING!!!
#And I can kill TVNil since unify TVPtr & TVBase?
# Still not checking polymorphic nil on map:
     map( [A ] {A ->B} -> [B] ) vs
     map( [A?] {A?->B} -> [B] )


----------------------
ACTION ITEMS:
*- Add TypeFunPtr BOUND vs UNBOUND check.
*- error to meet/join BOUND and UNBOUND
- Make decent defaults for BOUND/UNBOUND
*- Add TVLambda BOUND vs UNBOUND check.
*- error to unify BOUND/UNBOUND.
*- Load/Field are brainless about overloads
- - PONDER FOLDING LOAD/FIELD BACK TOGETHER
*- Lambda/Func in parser: if called from field-assign, no-bind; else bind.
*- TVErr inversion: UNWIND?
*- Back to Prim-Overs not ptrs


- Bind knows opers
- if no TFP/TVLambda, then NO-OP.
- No oper, if   BOUND, then NO-OP.
-          if UNBOUND, then BIND a TFP/TVLambda
- Yes oper,
- - Must be a TMP/TVPtr
- - To a TypeStruct/TVStruct
- - ForAll fields
- - - Must be UNBOUND LAMBDA;
- - - Then BIND it.


Load/Field/Bind-vs-prims & overloads
- Need a TypeFunPtr which can tell *bound* from *unbound*.
- - DSP=ANY === UNBOUND        The Combo initial BOUND display is ANY.
- - DSP=  all else === BOUND.  The Combo initial BOUND display is XSCALAR.
- - typerror to meet/join BOUND and UNBOUND!!!!
- - tverr    to unify     BOUND and UNBOUND!!!!
- Bind XFER applies binding to unbound
- Bind shallow fresh unifies allowing a binding.
- - V123:( unbound Varg0 Varg1 -> Vres )  produces
- - V456:( Vdsp    Varg0 Varg1 -> Vres )

- Fold together Load/Field/Bind
- - if Load becomes a TVLambda/TFP -
- - - If unbound, bind it.
- - - If   bound, do not change binding.
- - if Load becomes a TVStruct/TMP
- - - Repeat for all fields
- - - - if TVLambda/TFP, assert UNbound; bind them
- - if Load is other, no binding

User writes "1._+_._(2)"
- 1 is TVPtr of int klass
- ._+_ is Load/Field/Bind, is oper, goes deep binding
- ._   is Load/Field/Bind, but pre-bound, no more binding.

Normal ".x"
- .x is Load/Field/Bind, does the 1-deep bound/unbound check and binds as needed

User writes "1._+_.0(2)"
- 1 is TVPtr of int klass
- ._+_ is Load/Field/Bind, is oper, goes deep binding
- .0   is Load/Field/Bind, but pre-bound, no more binding.


Back to Prims-Overs-as-Tuples
Patterns allowed
"1+2"   - Load(1)[@{INT}] -> Field _+_ [(choices)] -> Field _ [lambda] -> Call w/dsp(1), arg(2), and lambda

"1._+_" - Load(1)[@{INT}] -> Field _+_ [(choices)] -> Bind_over [(bound_choices)]

"x.y" - Load(ptr) [struct         ] -> Field y [field       ] -> Bind(ptr) [lambda ? bind_lambda : field]
"x.y" - Load(str)![(bound_choices)] -> Field y [bound_choice] -> Bind(str)![bound_choice]


Still bare lambdas get Bind, so maybe not fold Load/Field with Bind.
( { a -> b } 123 ) // No load, so pre-bind to the current display.

@{ str = { -> ..._pretty_print_self... } } // Lambda is stored.  No bind.

// Making sure binding rules are sane for scopes/structs/classes:
cnt:=0;
sq = { x -> cnt++; x*x }; // Lambda is stored; bind to current stack display

..{...    // Some scopes later
  sq(5);  // sq is load/field from up-scope display; pre-bound when used.
..}       // Close later scopes


Base = :@{
  cnt := 0;
  vcall = { cnt++; }
  sq = { x -> vcall(); x*x; }  // unbound since store in struct, dsp is typed as self-struct.
}

Child = Base:@{
  vcall = { cnt += 2; }
}

b:Base = rand ? Base() : Child();
b.sq(s); // Increments cnt by 1 or 2?  Binds sq here.


----------------------
Independent of Prims wanting to bind 2 layers down, I have a theory problem with early/late binding

sq = { x -> x*x }; // In closure scope, early bind.
z = @{
  x = rnd             // Should HM Type-Error, mixing early and late-bound functions
    ? sq              // Early bound
    : { x -> x*x*x; } // Late bound
}
q = z.x; // Should I bind z into x or not???
// Since z.x is a load, I always insert a Bind
// Load should start as "unknown binding"
//  - And falls to "early bind" or "late-bind".
// I can imagine starting the Lambda there as a no-dsp/late-bind.
// Then perhaps late-bind.
// Then later, IF unifies sq-dsp with z-as-dsp & errors (badly).

z0.x0 = z .x ; // Bind?
z1.x1 = z0.x0; // Bind?
z2.x2 = z1.x1; // Bind?
z2.x2(args);   // Bind?  Then call...

// I claim...
Bind happens exactly once on all paths
meet/unify of bound & unbound is an error state.
- DSP goes to ALL or TVErr of the blend
So DSP is either both bound or both unbound.
Bind can use liveness to keep the fp unbound (if display is dead, still do not bind)


=======================================
05/27/2023

Back to TVMem issues -

From prior ACTION items
- Dropped TVMem, but thinking about bringing it back
- TVStructs track a CLZ field (in slot 0 and called ".", if it exists)
- TVStruct fields can be pinned, and float "up" to lowest CLZ position

// Fact is typed as a function taking an 'x' and returning an 'x'.
// 'x' is typed as anything with operators {<=,-,*}
fact = { x -> x <= 1 ? x : x*fact(x-1) };
(fact(0),fact(1),fact(2))

Middle of Combo
Have an unresolved call.
What is the "shape" of memory after such a Call?

TWO BUGS:
ONE: TVNil does not support TVClz operators.  1st fact(0) call does not resolve internally.
Reports back [[_any_]] memory until it resolves.

TWO:
There's a Load after the Call asking about Memory.
Flow-side has no mem info until Call resolves.
Call cannot resolve without operators to resolve.
Load can't find 'fact', so 2nd call never makes.
Since Load doesn't load, Field ".fact" doesn't fall past ANY and never unifies.
Cannot unify ANY Nodes, since those are dead, shouldn't type.

FIXES:
- Flow side, unknown calls cannot futz with final fields.
- Flow side, pass final fields thru, other fields stay HIGH (this is Combo) in case they get set to some high thing.
- This never settles out, unless TV3 forces memory shape via as_flow

TVNIL HAS NO OPERS
- Replace TVNil with a TVPtr-struct, unknown alias (works for deep-ptrs, fails for shallow-ptrs+track-mem)
- TVPtr is may-nil.
- No more TVNil.
- TVPtr picks up oper requests.
- Optionally: use TVBase (of nil).  Picks up opers same way.

*ACTION: try to pass final field flow thru - Working


=======================================
05/15/2023

*ACTION: DROP TVMem
*ACTION: TVStruct always slot 0 is CLZ, ALWAYS
*ACTION: TVStruct, bring back pinned fields.
 ACTION: When unify drops an un-pinned field, it instead "pushes up" to the next
         open clazz struct and only deletes if it hits CLZCLZ

Don't have Mem right yet.
Let-assign with Mem, then Fresh on use.
Goal is poly-morphic memory.
Might be the same as just using pointers.
But to allow assigns.

It's the magic-updates after a Store to some other memory.  One option: no TV3
updates after a Store.  Aliased pointers all unify, at least on stored fields.
Ex: A = @{ x=?, y="abc" };
    B = @{ x=?, z=3.14  };
    P = rand ? A : B; // Alias A & B, but no unify of field parts
    P.x = 16; // Force A.x and B.x to unify with int.

fun= { A B ->
  A.x = 17;      // do A & B alias?
  B.y = "abc";
}


fun = { fx x ->
  x
  ? fx( fun(fx, x - 1 ) )  // Notice oper _-_ on free 'x'
  : 1
};
fun(2._*_._, 99)

Type for 'x' allows e.g. 'Matrix', as long as 'Matrix - 1' yields another 'Matrix'.
Type for 'x' must be a ptr (or int or flt), with class type kept in memory.
So Fresh has to track Memory along with ID (makes a Fresh copy of Memory)?
Not sure I like this... 


At the parm mem:     Parm:Mem            @{ _-_ ^= (&123={ [[]] FOO 1 -> [[]] FOO }, ...), ... } // Unpinned unresolved field can float up
At the parm:4:       Parm        *[alias]
At the klaz load:    Load        *[alias]@{ _-_ ^= (&123={ [[]] FOO 1 -> [[]] FOO }, ...), ... } // Unpinned unresolved field can float up
At the oper load:    Field ._-_          @{ _-_ ^= (&123={ [[]] FOO 1 -> [[]] FOO }, ...), ... } // Unpinned unresolved field can float up
At the resolve  :    Field ._                      (&123={ [[]] FOO 1 -> [[]] FOO }, ...)
At the oper call:    CALL FOO 1                          { [[]] FOO 1 -> [[]] FOO }

So at the call-site for 'fun' we make a fresh copy, and unify
FRESH VAL:*[unk_alias]
OTHER VAL: Base:99 --> *[PINT]
FRESH MEM: [[ unk_alias = @{ _-_ ^= (&123={ [[ ]] FOO 1   -> [[ ]] FOO }, ...            ), ...                 } ]]
OTHER MEM: [[ PINT      = @{ _-_  = (   0={ [MEM] int int -> [MEM] int },1={int flt->flt}), other opers, closed } ]]


=======================================
05/12/2023

Got pretty far since last notes.
   "fun = { fx x -> x ? fx( fun(fx,x-1) ) : 1 }; fun(2._*_._, 99)"
New notes:

"x" needs to be typed as having an operator "_-_" to compute "x-1" - but at
which class hierarchy level?

Which leads me to needing to infer class structure!

So current theory:

- All structs have a clazz in a field, which itself is a TVStruct.
  "@{ . = @{CLZ}, ... }"
  - Put the class with field name "." always in slot 0.
  - Put a display uptick with field name as "^" in slot 1.  Optional.
  - Structs in AA without further clzz names go straight to CLZ_CLZ clazz.
  - The "." field is a TVStruct, not a TVPtr.  Sharing will keep costs down?
  - - Can think about final-field (no leaf) optimizations for Fresh.

  
- Field unifies insert a field into a struct.
  - If the field is in the current struct, unify to it
  - Else
  - - If the struct is closed, Field searches the super-class chain
  - - If the struct is open, Field inserts there (so first open clazz up the super-clazz chain)
  - - If the super-clazz is null, go back down to the bottom and unify with miss_field
  - Invariant: Following super-chain, from the Field struct to below placement
    is always closed.  May be open at placement struct.
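
Sketch of the placement rule above, in Python rather than AA.  `Struct`,
`place_field` and `MISS` are made-up names for this note, not the real TVStruct
API; field contents are plain strings and nothing is actually unified.

```python
class Struct:
    """Toy stand-in for a TVStruct: named fields, open/closed flag, super-clazz link."""
    def __init__(self, fields, is_open, clz=None):
        self.fields = dict(fields)
        self.is_open = is_open
        self.clz = clz          # super-clazz; None above the top clazz

MISS = "miss_field"             # placeholder for the missing-field error type

def place_field(struct, label, tvar):
    """Return (struct, value) where a Field load of `label` lands."""
    s = struct
    while s is not None:
        if label in s.fields:       # present: unify with the existing field
            return s, s.fields[label]
        if s.is_open:               # first open struct up the chain: insert here
            s.fields[label] = tvar
            return s, tvar
        s = s.clz                   # closed: keep searching the super-clazz
    # Ran off the top: go back to the bottom struct and record the miss
    struct.fields[label] = MISS
    return struct, MISS
```

With a chain inst(closed) -> open clazz -> INT_CLZ(closed) -> CLZ_CLZ, a load of
"_-_" from inst lands in the open clazz, and everything below it stays closed.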

- Base/prims act "as if" closed no-field struct with parent @{INT_CLZ} field.
- INT_CLZ/FLT_CLZ/STR_CLZ has e.g. CLZ_CLZ as parent.  No fields in CLZ_CLZ.

This leaves things like open clazzes (not declared) in-between some lower and higher class, which
might inject a shadowing def, and this is OK.


Can drop Type string clz.  Not being used.
Can drop TVClz, using just a tree in TVStruct/TVPtr.


=======================================
2/09/2023

Fresh is horribly wrong.  Can't track just FunNodes, because still have to flag
Fresh unrelated var uses inside of a let-def:
   z = ....;     // let z = ... in 
   x = ...z...;  //   let x = ...Fresh(z,x)... // z is fresh inside of x's def, with x in nongen set

Theory:
- track nongen Nodes in Scope
- when starting a fcn scope, all parms from DSP_IDX on go in nongen set (so can find from Scope/Ctrl?)
- when starting an assignment store, first pre-assign with ForwardRef, and push these on Scope.
- These FREF's collapse on assignment

Theory#2: Fresh copies all Scope nongen inputs.
- End result: every fresh has a complete copy of all active nongen values.
- Theory#2a: I'll be choked-up with Fresh, and need heavy cleanup rules.

Theory#3:
- Fresh points to a Scope with nongen list
Theory#4:
- Scopes nest (they do), so Fresh can follow the nongen list
Theory#4a: I'll be choked-up with Fresh, and need heavy Scope cleanup rules


Scopes:
 0    1    2    3    4    5     6     7   | 8...
CTL  MEM  REZ  PTR  STK  XCTL  XMEM  XVAL | nongen....

Theory: dangerous to merge Scopes and nongen tracking...
Better to replicate at every Fresh (optimize later if we see huge redundant Fresh lists).
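
Rough Python sketch of "replicate at every Fresh", in plain H-M terms: each
fresh() call is handed its own nongen list.  `TVar`, `TFun`, `occurs_in` and
`fresh` are made up for this note, and the sketch assumes acyclic types (the
real TVars are cyclic and would need a full visit map).

```python
import itertools

_ids = itertools.count()

class TVar:
    """Type-variable leaf."""
    def __init__(self):
        self.uid = next(_ids)

class TFun:
    """Function type: args -> ret."""
    def __init__(self, args, ret):
        self.args, self.ret = list(args), ret

def occurs_in(tv, t):
    """True if leaf `tv` appears anywhere inside type `t`."""
    if t is tv:
        return True
    if isinstance(t, TFun):
        return any(occurs_in(tv, a) for a in t.args) or occurs_in(tv, t.ret)
    return False

def fresh(t, nongen, copies=None):
    """Copy `t`; leaves reachable from the nongen list stay shared, others are cloned."""
    copies = {} if copies is None else copies
    if isinstance(t, TVar):
        if any(occurs_in(t, n) for n in nongen):
            return t                      # nongen: do not clone
        if t not in copies:
            copies[t] = TVar()            # same leaf always maps to the same clone
        return copies[t]
    return TFun([fresh(a, nongen, copies) for a in t.args],
                fresh(t.ret, nongen, copies))
```

For `{ a b -> a }` with `b` in the nongen list, the copy shares `b` and gets one
new leaf standing in for both uses of `a`.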

--------------------
Problem#next:

Structs in FunNode/scope().stk()/Frame - first nargs arguments are ParmNodes
and NONGEN.  Next set of fields are in the post-Let-Def set, and not NONGEN.
Need to split structnode inputs.


=======================================
1/25/2023

What is the type of { x -> 1+x } ?
It's an overload type with a pending resolve:
   : ( { 1 int -> int }, { 1 flt -> flt } )._
Notice the trailing resolve: "._"   
Notice the add functions are already bound.

What is the type of { x -> x+1 } ?
It's a little more generic:
  : A:@{ clz = @{ _+_ = ( { A 1 -> B } ) } }
Notice it's a normal clz-based overload, this SHOULD unify with int/flt just fine.
Notice it works with any overloaded _+_ operator.

What is the type of { x y -> x+y } ?
  : A:@{ clz = @{ _+_ = ( { A B -> C }* ... ) } }
Notice the 2nd argument can have many choices.... so
Notice the '* ...' which means we can clone the display and nargs, but there
can be any number of other choices.  Similar to an open struct, except all
the functions added are limited to nargs and display.


I can imagine allowing any function types in an overload, as long as uses are unambiguous.
TV types add a bind-to-an-overload, which awaits resolution; basically a pair of display & lambda-choices.
TV types add a 'resolve field' instead of doing it in the Field.unify.  Works thru the above Bind.

So now:

{ x -> 1+x } :  ( { 1 int -> int }, { 1 flt -> flt } )._
{ x -> x+1 } :  ( bind A A:@{ clz = @{ _+_ = ( { A B -> C }... )._ } } )


=======================================
1/22/2023

Final resolution
- Fields have a clz flag
- - non-clz Fields load as normal from Structs
- - clz Fields find the Env.PROTOS("clz") and load from there, but require the
    input to type as a specific class first.  Once determined an edge is added
    to verify nothing changes, and to help the incremental Combo.
- Binds have an over flag
- - non-over Binds expect a TFP and a display, and bind.  The TFP must not be
    bound already.  The display can go dead, and the Bind becomes a no-op.
    Bind also handles non-TFPs as a no-op.    
- - over Binds expect an overload (simple tuple), and Binds to all the
    functions in the tuple as a normal Bind above.  All the functions are
    required to have the same arity (and after Bind, the same display).

- Normal operator calls use a clz Field, then an overload Field, and call
  with the loaded function.  No bind is required.
- Normal Field loads do not bind, and cannot load an unbound TFP.
- Explicit Oper field loads DO over-Bind.

- Anonymous functions early-bind to the current display.


=======================================
1/11/2023

Late-Bind vs Early-Bind
-----------------------

Every lambda is bound *once* (the `self` or `this` argument is curried).

Requires H-M unification to decide to bind-on-load or not.
  Parser in general cannot tell, except for opers.
  Means we insert Binds-after-Fields, but some Binds are removed by Combo.
  Means TypeFunPtrs need a way to distinguish:
  - unknown bound: ANY
  -   known bound: Anything else


Example non-obvious late-bind usage:
    pred = e1; @{ fld = e0; fun = pred ? { x -> x+fld } : { x -> x*fld }; }  
Here we are selecting between 2 fcns, directly inside the Struct lexical scope.
Since inside the struct scope, both are late-bound, and in the end `fun` field
late-binds-on-load.  If `pred` is a constant, `fun` field can move to the CLZ.
Similarly, if `e0` is a constant, `fld` can move to the CLZ.

Example non-obvious early-bind usage:
  { var -> 
    fun = { x -> x+var };
    @{ fld = fun; }
  }
`fun` is early bound in the closure; `fld` holds an early-bind non-local
function, and does not bind-on-load, and stays in the instance.

Example:
@{ // Outermost top-most display frame
  var=e0;
  fun = { x -> x+var }  // late bind to top-most struct
  @{ fld = fun; }  // usage bind-on-load
}


Example wrong-bind usage:
  bad:=0;     // Types as an unbound_inner_function
  @{ fun = (  // Assigned an inner unbound function
       // Inner struct
       @{ inner=e0;
          dummy = {unbound_inner -> bad:=unbound_inner}( {e->e+inner} )
       };
       bad
     )
   }; 


Opers always late-bind directly, perfectly.
- bind after uni/bin ops, and before the overload.

Fields late-bind directly, approximately.
- bind after normal field loads (drop bind if field is not an unbound function).
- Can be pre-bound if anon fcn rule triggers.  Combo sorts this out.

Anon Fcn:
- if enclosing lex scope is a struct, LATE BIND else EARLY BIND.

Error to:
- FP2DSP an unbound fcn.
- Example @{ f0=...; f1 = ...map( { e -> e+f0} ) }
- - map call passes anon fcn, which expects to be late-bound because enclosing lex scope is struct.
- - internal to map, late-bound fcn is called without binding.
- - Can add syntax to anon fcns in struct scope to early bind.
BindFP must be monotonic
- if input has a display, then this is noop, flow passes display thru, and unifies straight thru
- if input has  !display, then this binds, flow sets display, and unifies display with the TFP display.
- Always unifies self with TFP.
- Since unify cannot be undone, must delay unification until we can tell has-vs-hasnot display.
  - TFP.DISPLAY  DISPLAY
  -   NO_DSP      ANY    - UNKN Pass along no-dsp .
  -   NO_DSP      XXX    - BIND Pass along XXX dsp.  Unify TFP.DSP and DSP.
  -  HAS_DSP      ANY    - NOOP Pass along has-dsp.
  -  HAS_DSP      XXX    - NOOP Pass along meetdsp.  Assert never NO_DSP/XXX -> HAS_DSP/XXX
- Requires unbound TFPs to use ANY display.
- Phis merging bound and unbound can report an error.
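
The four-row table as a tiny Python function, as a sanity check.  Modelling
NO_DSP as None, the string "ANY" for the unknown display input, and the toy
`meet` are all assumptions of this sketch, not the real lattice.

```python
ANY = "ANY"        # display input not known yet
NO_DSP = None      # TFP carries no display yet

def meet(d0, d1):
    """Toy meet: equal displays stay, anything else falls to 'ALL'."""
    return d0 if d0 == d1 else "ALL"

def bind_fp(tfp_dsp, dsp):
    """Return (action, display passed along) for one row of the table."""
    if tfp_dsp is NO_DSP:
        if dsp == ANY:
            return "UNKN", NO_DSP          # cannot tell yet; delay unification
        return "BIND", dsp                 # bind: TFP.DSP unifies with DSP
    if dsp == ANY:
        return "NOOP", tfp_dsp             # already bound, pass it thru
    return "NOOP", meet(tfp_dsp, dsp)      # already bound, pass the meet
```

As DISPLAY falls from ANY to XXX, the NO_DSP rows only move UNKN -> BIND and the
HAS_DSP rows stay NOOP, so no row moves from BIND to NOOP.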


H-M knows unbound from bound TVLambdas.  Use null display arg for NOT YET BOUND
- BindFP expects unbound display in, makes a new TVLambda out with same args, but with display
- FP2DSP expects bound display in.  Can be dead.
- Call doesn't care about either the fcn arg or the display arg.
- Error to unify bound & unbound fcns.
- Need to print the null display, handle NPEs as needed


CLZs for H-M, especially INT/FLT CLZ:
-------------------------------------
- TVBase INT/FLT supports TVField uses, indirects to CLZ.
- Oper Fields indirect to "." CLZ field.
- Constant final fields move to the CLZ, post-Combo.
- Unbound functions are constants, so are "abc", 17, 3.14, etc.
- Structs and tuples of constants are constants, e.g. overloads of unbound functions.
- Pretty sure this is a FLOW property, so fully determined e.g. after Combo.

- Later, can loosen this to include "invariant not constant".
- Example invariant:
    "pred = e1 ; for {idx} { ary[idx] = @{ fld=e0; fun = pred ? {x->x+fld} : {x->x*fld}}; }"
  Since pred does not vary in program, all `fun` fields are the same, so can move to CLZ.
- Example varying:
    "for {idx} {pred=f(idx); ary[idx] = @{ fld=e0; fun = pred ? {x->x+fld} : {x->x*fld}}; }"
  Since `pred` varies by index, the `fun` field varies by index, so cannot move to the CLZ.


Lazy Fresh:
-----------
- Theory sez every "use" is an Id in the pure H-M, which requires a Fresh of its input.
- "using" the INT CLZ therefore requires a Fresh, for every int, of the entire INT CLZ.
- Can delay Fresh past "projections" or "refinements", i.e. Field loads.  Since the
  projection/field slices its input, only need the Fresh on the slice.
- Rule: push down Fresh as much as possible, shrinks the amount of "fresh" TVars.  


New NEW Simplify:
-----------------
- Old NEW: Struct/New(old mem)/MProj(makes newmem)/DProj(makes ptr)
- New NEW: Struct/New(makes ptr)/Store(old mem)(makes newmem)
- New takes in only a Struct, produces a Ptr & Alias
- Store can store the New & Struct.


--- Should not need to do this optimization / bulk xform:
Use a large XFORM to move early-bind to late-bind.
Has to wait until the Parser quits making Loads.
Probably waits until Combo finalizes Aliases.
- This optimization shrinks object size (can move late-bind fcns in final fields to a CLAZZ)
- Need to be able to find all Loads of a particular alias/New/Struct (label)/Bind/FPtr in constant time.
- - From Struct(label), forwards find Bind/Fptr.
- - Backwards find New(ptr) alias;
- - find all Loads using this alias
- - If Loads have more aliases, must repeat across all aliases
- - find Field with label
- - Bind-after-Field, and remove Bind-after-Fptr


=======================================
12/27/2022

A little more on field/bindfp/clz handling.

"1 + 2" - expands to bind(1,int._+_)._(2)
x.fld - expands to a Field load from struct x...
but if fld misses in 'x' and hits in 'x_clazz' then
the expansion is:
  "bind(x,clazz.fld)"

=======================================
11/18/2022

Main code base:
probably need to drop Unresolved.
Use 'Field' with unresolved label for forward-refs.
Upon seeing a f-ref, add var in scope, pointing to empty tuple.
When using the f-ref, insert unresolved Field in the code, reading from the tuple.
When finally def'ing, insert function in the tuple.
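
Python sketch of the f-ref-via-tuple idea (a list stands in for the tuple;
`scope`, `use` and `define` are made-up names).  The point is only that a use
can be emitted before the def, because the read of slot 0 happens at call time.

```python
scope = {}

def use(name):
    """Return a thunk reading slot 0 of the var's tuple; makes a f-ref tuple if undefined."""
    cell = scope.setdefault(name, [])      # empty tuple marks an unresolved f-ref
    return lambda *args: cell[0](*args)    # the 'unresolved Field' load, done at call time

def define(name, fun):
    """Insert the function into the (possibly pre-existing) tuple."""
    scope.setdefault(name, []).append(fun)

is_odd_ref = use("is_odd")                 # forward ref: is_odd not defined yet
define("is_even", lambda n: True if n == 0 else is_odd_ref(n - 1))
define("is_odd", lambda n: False if n == 0 else scope["is_even"][0](n - 1))
```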

Of course, loading from unresolved is not used in normal code.
So the tuple has to be flagged as a fref tuple?


=======================================
11/16/2022

Overloads solved (H-M resolves unknown field, but user syntax for unknown field).
- TODO: now do this in main.

Pending: redo the deps tracking ala Bobby E's suggestion

Pending: TVar moved to a hard class hierarchy instead of a soft class with string edges


Leaning towards wrappers being ptrs-to-struct
*[6]int:{$17}
TMP -> OBJ with clz "int:" and def TypeInt:17 and no fields.

1 + 2 turns into:
*[6]int:{$1}
  Load [6] to get an instanceof 'int:'
int:{$1}  
  Field ._+_  ; this peeks the empty struct; fails; looks again at 'int:' PROTOS for _+_, succeeds; loads an Overload
&[ {17 ^=int:int64 int:int64 -> int:int64}
   {18 ^=int:int64 flt:flt64 -> flt:flt64} ]
  Field from PROTOS also specializes the 'this':
&[ {17 ^=int:1 int:int64 -> int:int64}
   {18 ^=int:1 flt:flt64 -> flt:flt64} ]
  Field ._ ; requires overload resolution
  Field .0 ; specialization complete
{ int:1 int:int64 -> int:int64}
  Adds the remaining argument and function call
({17 ^=int:1 int:int64 -> int:int64}  int:2)

Pre-optimization Nodes:
  int:1          // A ConNode with type 'int:1' which is a TMP -> OBJ with clz "int:" and def TypeInt:1 and no fields.
  Load           // Converts TMP to a int:TS
  Field ._+_     // Class load vs int:TS; adds disp
  fun = Field .0 // Overload resolution
  int:2          // Arg2
  Call(fun,int:2)

Post-optimization
  fun=FunPtr(17,int:1)
  Call(fun,int:2)

After inlining:
  int:1
  int:2
  $17$add_ii(int:1,int:2)

After prim folds:
  int:3
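
The "1 + 2" walk above, as a runnable Python toy.  The `PROTOS` table layout,
`clz_of`, `field_load`, `resolve` and `binop` are invented for this sketch, and
the overload is picked from the runtime class of the argument, which is only a
stand-in for the H-M resolution the notes actually rely on.

```python
# Hypothetical PROTOS: clazz name -> { oper -> overload of (fidx, arg clazz, code) }
PROTOS = {
    "int:": {"_+_": ((17, "int:", lambda a, b: a + b),
                     (18, "flt:", lambda a, b: float(a) + b))},
}

def clz_of(v):
    return "int:" if isinstance(v, int) else "flt:"

def field_load(this, label):
    """The instance has no fields, so the load misses, retries in PROTOS, and binds `this`."""
    overload = PROTOS[clz_of(this)][label]
    return tuple((fidx, argclz, lambda b, f=f: f(this, b))
                 for fidx, argclz, f in overload)

def resolve(overload, arg):
    """Stand-in for Field ._ : pick the one entry whose formal matches the arg clazz."""
    hits = [e for e in overload if e[1] == clz_of(arg)]
    assert len(hits) == 1, "ambiguous or unresolved overload"
    return hits[0]

def binop(lhs, label, rhs):
    """Load the oper from the lhs clazz, resolve the overload, then call with rhs."""
    fidx, _, bound = resolve(field_load(lhs, label), rhs)
    return fidx, bound(rhs)
```

Here `binop(1, "_+_", 2)` picks fidx 17 and folds to 3, matching the
"After prim folds" step.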

--------------

xnil behavior:
0 + 2

*[8]nil:{$1}
  Load [8] to get an instanceof 'nil:'
nil:{$nil}  
  Field ._+_  ; this peeks the empty struct; fails; looks again at 'nil:' PROTOS for _+_, succeeds; loads an Overload
&[ {15 ^=nil int:int64 -> int:int64}
   {16 ^=nil flt:flt64 -> flt:flt64} ]
  Field from PROTOS also specializes the 'this'; no change for nil
&[ {15 ^=nil int:int64 -> int:int64}
   {16 ^=nil flt:flt64 -> flt:flt64} ]
  Field ._ ; requires overload resolution
  Field .0 ; specialization complete
{ nil int:int64 -> int:int64}
  Adds the remaining argument and function call
({ ^=nil int:int64 -> int:int64}  int:2)

Post-optimization
  fun=FunPtr(15,nil)
  Call(fun,int:2)

After inlining:
  nil
  int:2
  $15$add_0i(nil,int:2)

After prim folds:
  int:2


-------------------------------------

What happens if loading from xnil, then nil falls to e.g. TypeInt or TypeStr ?

xnil   -> Load ->    // Loads from the default class struct, which int,flt, etc all load from.  
:@{_+_ = &[{xnil x:A -> x:A}] } -> Field._+_ ->   // int/flt class add same field with *more* overloads.  Needs to resolve combined overloads.
&[ {0 x:A -> x:A } ] -> Field.0 ->
{ 0 x:A -> x:A } -> Call int.2 ->
int.2


A:@{
  label = &[ fun1 fun2 ]
}

B:A:@{
  label = &[ fun3 fun4 ]
}

Here B-isa-A.  Requires matching fields have a isa-relationship in their contents.


=======================================
9/24/2022

Solving the mis-mashed Overloads

Same Overload can be merged at If, or Arg, matching field to field.
Fresh of an Overload keeps the basic overload type, but has fresh contents.
Mis-mashed overloads have to resolve at that program point.

Impl: all Over fields have a unique field name.  Matching field names can unify like normal.
Missing fields accumulate.  Resolution happens once per Over (per set of fields).
After resolving, find() can collapse.  If multi-Overs have merged, they can resolve
independently.  Once resolved, their resolved parts also unify.  The whole Over cannot
be collapsed until all parts resolve.

Impl:

a "multi-over", a mover:
- args are unique-named T2.is_over's.  Use the T2 over._uid as unique name.
- maybe with a tag field to mark a mover.
- unify 2 movers:
- - matching fields (overs), field by field
- - unmatched just accumulate, like an open struct
- unify mover+other
- - short-cut '&&' in each over, already resolved
- - trial-unify each over
- all overs resolved to '&&'
- - then self resolves to '&&'
- check for nesting movers
- - flatten into top multi-over
- - check for self flatten (cycle), replace with simple leaf.  Makes ambiguous.


overs:
- fields named 0,1,2,3; but unique within a mover; uses the _uid in the mover
- only expect 2 overs to unify field by field, never an over and something else
- except trial-unify from mover
- - record '&&' field per-over; 3 states; null/Missing: not resolved;
    err-w-self: error (no resolves); other-child: resolved child
- never see a field isa mover, since mover flattens
- if field is an over, use its resolve, or declare "unify success" (which typically stalls)

over1 = &[ a b c ]     // T2 mover &[ over1( a, b, c    ) ]&
over2 = &[ x y over1 ] // T2 mover &[ over2( x, y, over1) ]&

// T2 mover &[ over2( 0:x, 1:y, 2:over1/*if over1 is resolved, use that, else unifies/delay*/) over1(0:a, 1:b, 2:c) ]&


=======================================
9/22/2022

Troubles making TypeUnion a lattice, with constants and nil.
Thinking:
  drop TypeUnion
  
Overload makes a TypeStruct with tuple-named fields; no display (no scope).
Need a Load/with-unknown (HM only knows) field to get correct field.

Overloads are numbered (or clz?  or alias?), based on alloc site.  
Meet with same number/clz/alias uses per-element.
Meet with unrelated blows out to e.g. &[ANY].  Awaits HM selection per-use.

Illegal to mix unrelated Overloads (ambiguous), unless can extract per-use sanely.

// Here 2 unrelated Overloads MEET.  Must resolve prior to MEET, and the Phi has e.g. "int" type.
    over = pred ? &[1 0x123 "red" ] : &[2 3.14 0x456 ]; // Unrelated Overloads MEET
    x = over & 3;  // resolves 'over' as 'int'; otherwise over is 2 unrelated unions, and resolve happens across all


// Mixing related Overloads MEET field by field.  Require same ALIAS#
  "color = { num str -> &[ num str ] };                       // Single Overload alias#
   rgb = pred ? (color 0x123 "red") : (color 0x456 "blue");   // Mixing same overload is OK
   (pair (cat "light" rgb) (- rgb 0x010101))"                 // at use sites, extracts correct part; GCP checks for Overload, checks for HM resolution, extracts part; use uses JOIN 

// Mixing UNrelated and UNresolved Overloads just returns a MEET of JOIN-ALL-FIELDS.
// IN AA, every Use asks: resolve-now, or post-pone?
  "bits = pred ? &[ 0x123 "red" ] : &[ 3.14 0x456 @{x=1} ]; // Mixed overload, resolves here
   (dec bits)" // Only uses integer


So Overloads with same alias meet part-by-part; same for Apply arguments (Just Another Phi).
At other uses, check for HM resolve.  Unresolved, uses JOIN.  Resolved uses chosen part.

To allow overload _+_ oper and _*_ oper to be mixed, all must be made by the same function.
  make_bin_op = { ii if fi ff -> &[ ii if fi ff] }
  add_prims = (make_bin_op _i+i_ _i+f_ _f+i_ _f+f_); // overload of widening add prims
  sub_prims = (make_bin_op _i-i_ _i-f_ _f-i_ _f-f_); // overload of widening sub prims


=======================================
8/22/2022

Recursive types seem sane, modulo unroll/re-roll or O(n log n) minimization.
Overloads seem sane, not tested on larger programs.

Probably need true row-polymorphism (a type-var with a set of labels/types
representing all the unnamed labels in an open struct).  Means I can make
a struct of e.g. a subclass or extra fields, and keep the extras.  Example:
  { point delta -> @{ x=point.x+delta, y=point.y+delta, point... } }
Adds delta to fields {x,y}, and copies the remaining fields into the result.
This can be used to type-safe update-in-place, or classic side-effects.
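
Value-level analogue in Python, only to show what the "point..." part does at
runtime.  Python dicts carry no static types, so this says nothing about how
the row variable would be typed; `shift` is a made-up name.

```python
def shift(point, delta):
    """Add delta to x and y; copy every other field thru unchanged (the 'point...' part)."""
    return {**point, "x": point["x"] + delta, "y": point["y"] + delta}
```

A 3-D or colored point keeps its extra fields in the result, which is exactly
what a plain open-struct type would lose.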

Also, would like progress on main AA.

Decision:
Progress on main AA!
Row-polymorphism for another time.


=======================================
7/18/2022

Recursive types (both Data and Functions) and Structs


CNC: My haskell attempt: 
  import Data.Typeable
  
  -- Define a cyclic data wrapper type
  data FWrap = FWrap {flt::Float, mul::FWrap -> FWrap}
  fwrap = (\x -> FWrap { flt=x, mul=(\y -> fwrap(x * (flt y)))})
  
  main = do
    putStrLn (show( flt (fwrap(1.2)) ))
    putStrLn (show(typeOf(fwrap)))


Struct / Fields -
- Structs create a record or struct, fields use a record.
- Structs create a set of labels (or fields).  The set is "closed" in that all
  the labels involved are directly mentioned, and no others are present.  These
  fields are all available.
- Fields demand (or require) a particular label, but do not care if others are
  present; they make an "open" struct or a "partial" struct.

- Unifying two structs intersects the available fields:
    "(if rand @{ x=1.2; y="abc"} @{ x=2.3; z=17 })"  // is typed: @{ x=flt }
  Since the fields "y" and "z" are not available on both arms, they are not
  available in the result.

- Unifying two field refs unions the required fields:
    "(if rand rec.x rec.y)" // is typed: @{ x=A; y=A; ... }
  Since "x" and "y" are both mentioned, both are required.  Since they are
  field references, other fields are allowed and the result is open.  Their
  internal types are recursively unified.
  
- Unifying a closed and an open struct - I "lose" the required fields and keep
  the list of available fields, even the unused ones.
    "s0 = @{x=1.2; y="abc"}; (s0,s0.x)"  // is typed: (@{x=1.2,y="abc"},1.2)    
- If some required fields are not present, they are added to the closed side as
  an error type.
      "s0 = @{x=1.2; y="abc"}; (s0,s0.z)"  // is typed: (@{x=1.2,y="abc",z=E:[Error: missing field 'z']},E)

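The three struct-unify cases above, sketched in Python.  A struct is modelled
as a (fields, is_open) pair and `unify_structs` is a made-up name; field
contents are just labels here and are NOT recursively unified, which the real
thing has to do.

```python
def unify_structs(a, b):
    """a, b: (fields dict, is_open).  Returns the unified (fields, is_open)."""
    (fa, oa), (fb, ob) = a, b
    if not oa and not ob:                       # closed + closed: intersect available
        return {k: fa[k] for k in fa if k in fb}, False
    if oa and ob:                               # open + open: union required
        return {**fa, **fb}, True
    closed, opened = (fa, fb) if ob else (fb, fa)
    out = dict(closed)                          # closed + open: keep available fields
    for k in opened:
        if k not in closed:                     # required but not available
            out[k] = f"Error: missing field '{k}'"
    return out, False
```

The closed side wins on openness in the mixed case, so the result lists
available fields (plus any error fields), not required ones.
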
Recursive Types:
- Everybody loves them with a type-name in the middle, and fails when anonymous:
    "List A =: @{ next=List; value=A }?" // AA version of a normal "List" type
    "f = (f f)" // "Occurs check" fail in Haskell (Scala?), and the original H-M algorithm
                // AA type: A:{A -> B}

  Without the type-name in the middle, the normal H-M algo fails.  Modifying it
  to succeed is straight-forward, if tedious.  However, now we have a
  type-equivalence problem: are "unrolled" versions of a type equal to the
  re-rolled?  See https://en.wikipedia.org/wiki/Recursive_data_type, esp the
  sections on iso-recursive vs equi-recursive.  The Lattice types in AA are
  equi-recursive and use the O(n log n) algorithm to minimize, and cheap
  reference equality after that.

  Right now, the H-M types are iso-recursive and require a cycle-aware
  equivalence check.
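
One possible shape for a cycle-aware check, sketched in Python.  `Node` and
`cyc_equals` are made up for this note.  The sketch assumes any pair already
being compared is equal, so it terminates on cycles; as written it also treats
an unrolled copy as equal to the rolled type, which may be more permissive than
the check the H-M side actually wants.

```python
class Node:
    """Toy type node: a constructor name plus ordered children; children may form cycles."""
    def __init__(self, name, kids=()):
        self.name, self.kids = name, list(kids)

def cyc_equals(a, b, assumed=None):
    """Structural equality that treats an in-progress pair as equal, so cycles terminate."""
    assumed = set() if assumed is None else assumed
    if a is b or (id(a), id(b)) in assumed:
        return True
    if a.name != b.name or len(a.kids) != len(b.kids):
        return False
    assumed.add((id(a), id(b)))
    return all(cyc_equals(x, y, assumed) for x, y in zip(a.kids, b.kids))
```

Leaving a pair in `assumed` after a failed comparison is harmless here, since
any single mismatch already makes the whole conjunction false.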


Recursive Types & Structs
        // Define a function 'fwrap' taking one float argument
	fwrap = { ff ->
          // ...and returning a structure.
	  @{
             // The structure has a float field 'f'
             f = ff;
             // ...and a function field 'mul'.  The mul function takes a struct
             // with required field 'f' and calls fwrap after multiplying.
	     mul = { y ->
                      (fwrap (f* ff y.f))
             }
             // Both 'f' and 'mul' fields are available in the result
	   }
	};
        // Type a sample wrapped float.
	(fwrap 1.2)
        
Haskell cannot type the Haskell version because "occurs check".
AA types it fine as:
         A:*@{
           f = flt64;
           mul = { *@{f=flt64;...} -> A }  // Note the struct is cyclic on itself
         }
Extending the fwrap example:
        con12 = (fwrap 1.2);
        (con12.mul con12)  // Compute 1.2*1.2
AA types as:
        A:*@{
          f = flt64;
          mul = { B:*@{    // Hey! We've inlined (nearly) the same type....
                    f = flt64;
                    mul = { *@{f=flt64;...}
                            -> B
                    }
                  }
                  -> A
          }
        }

As the float expression goes deeper, the type notes that the deepest leaf is
only used for its payload (requires field 'f') and can have any other
fields "...".  The in-between and outer types all use the closed struct syntax,
because those types are made here, and not passed-in.  Hence they list the
available fields, not required.  The type ends up 'counting' the number of
layers between the outer available-field types, and the leaf required-field
type.  End result: minimal type is as large as the deepest expression.  Yuck.

-------------------
         
Haskell CAN type this version, which uses an explicit named data type:
Haskell:  data FWrap = FWrap {ff::Float, mul:: FWrap -> FWrap }
Main AA:       FWrap =:     @{ff :flt  ; mul:{ FWrap -> FWrap } };

Haskell:  fwrap = (\x ->  FWrap{ff=x, mul=(\y ->  fwrap (x * (ff y))) } )
Main AA:  fwrap = { x ->  FWrap(   x,     { y ->  fwrap (x  * y.ff) } ) }
Core AA:  fwrap = { x -> (FWrap    x      { y -> (fwrap (f* x y.ff))} ) }

Haskell learnings: user 'rounds up' the inner type to a more specific (less
general) type: FWrap.  Now everything is just {FWrap->FWrap} and the types do
not unroll endlessly.


=======================================
7/8/2022

Overloads in HM mostly working.

- Overload entries are named (ordered).
- Unification of overloads is by name (order).
- No good AA syntax for naming, just ordering.
- Mostly fine for the primitives.

- GCP still needs to handle the sum-of-products for FIDXS.
- Currently mixing unions of joins gives me the BitsFun.EMPTY set,
  which means GCP reports a REALLY bad answer for Apply results.


=======================================
6/28/2022

Much progress on primitives and HM.

Decision:

User visible prims are actually structs, with _def field being the actual
primitive.  The struct holds final fields, typically holding function pointers,
all pre-bound on the display to this struct.  This is semantics only, as it's
wildly inefficient to replicate all the fields all the time (at least it was
for Types).  So their presence (or absence) needs to be hidden behind an
abstraction layer.

User sees: int64
Actual type: SA:@{_def=int64; int: -> Vleaf; _&_ = { *SA *SA -> *SA }; _+_ = ... }


User sees: 3.
Actual type: S3:@{_def=int:3; int: -> Vleaf; _&_ = { *S3 *SA -> *SA }; _+_ = ... }

Implementation type: @{_def=int:3; _clz="int:"; ... } 

When iterating the fields of the impl_type, walk the existing fields normally.
Then get the prototype (Env.PROTOS.get("int:")), and walk those fields, just
the non-'$' fields.  Every field that is a FUNCTION (or could be?) gets its
first arg pre-bound to the self S3 type.
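
Python sketch of that field walk.  The dict-shaped `PROTOS`, the impl struct
layout and `fields_of` are assumptions for illustration; `functools.partial`
plays the role of pre-binding the first arg to self.

```python
from functools import partial

# Hypothetical prototype table; '$' holds the prim type and is skipped by the walk
PROTOS = {"int:": {"$": "int64",
                   "_+_": lambda self, y: self["_def"] + y,
                   "-_": lambda self: -self["_def"]}}

def fields_of(impl):
    """Yield (name, value): own fields first, then non-'$' prototype fields, self pre-bound."""
    for name, val in impl.items():
        yield name, val
    for name, val in PROTOS[impl["_clz"]].items():
        if name.startswith("$"):
            continue
        yield name, partial(val, impl) if callable(val) else val
```

The prototype is walked lazily per use, so the fields are never actually
replicated into each instance.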

The INT Prototype, partially thru typing:
V428:@{
  $ = int:int64;
  !_   = { V7 -> %int:int1 };
  -_   = { V11 -> %int:int64 };
  _!=_ = V37;
  _%_  = { V47 V49 -> %int:int64 };
  _&&_ = { V61 V58 -> %V51 };
  _&_  = { V64 V66 -> %int:int64 };
  _*_  = V17;
  _+_  = V67;
  _-_  = V76;
  _/_  = V29;
  _<=_ = V85;
  _<_  = V94;
  _==_ = V103;
  _>=_ = V112;
  _>_  = V121;
  _|_  = { V131 V133 -> %int:int64 };
  _||_ = { V53 V54 -> X59>>%V52 }
}


Issue: finding a clz field is expensive.

Idea; pull out the clz field as the "_is_obj" flag.  String.  null for
not-is-obj.  Not-null for is_obj.  Empty string for plain Object.  Otherwise
e.g. "int:" for int-clazz.  Does not play nice with unification.

Idea; use a specific " clz0" field, with leading space.  Contents are the clazz
as a TV2 constant string base.  Plays nice with normal unification.  Can have "
clz1", " clz2" etc fields for subclazzes.


=======================================
3/14/2022

Looking at primitives & prototypes.

Theory: Value-types are a Thing.
Value types are directly a TypeStruct instead of a TMP to a TS.
Load from a Value type does the lookup in the TS fields first,
then extracts a class name from the TS and repeats in the class.
Always as-if all class members are part of every value TS.

Non-value-types are a TMP, and a Load peeks thru the TMP first
(plus a NPE check, plus any aliasing issues).

Value constructors wrap args in a TS without using a NewNode: so need a StructNode
again (which a NewNode might wrap?)


User int/flt are value-types.  Are represented as a named TS with a "_val"
field, and a prototype for the operators.  The "as_fun" expansion loads from
the _val fields and inlines the constructor.


So now I'm thinking Load/Store are used for memory ops and I always *know* so
can strongly require Load/Store address be a TMP.  Load returns a TS not the
field value (and Store takes a TS).  This feeds into a FieldNode which peels
out the named field from a value-type (a plain TS).  Store node uses a
StructNode to construct a whole TS, which then updates TypeMem.

Parser for ints/flts inlines the correct StructNode with args.  So Integer
StructNode takes 1 input in *slot 0* (no ctrl,mem inputs) and produces a named
TS:  "int:@{_val=...}".  In the name of expediency rename "_val" to "i".


So now I'm thinking to fold NewNode and MrgProj together, and feed a StructNode
into a New/Mrg Proj.

Summary:
- Move field handling only, no alias ==>> StructNode
- Keep alias handling ==>> NewNode
- Move memory merge ==>> NewNode
- Nuke MrgProj

Most things become StructNode / value-objects.
Need a syntax for allocation!!!
E.g.  @{ x,y } // Point-like value struct, no   allocation; typed as a TS
E.g. &@{ x,y } // Point-like       struct, with allocation; typed as a TMP
E.g.   ( 1,2 ) // tuple value
E.g. & ( 1,2 ) // tuple allocated

Allows '->' as a post-fix operator converting a ptr to a value.
Allows '*'  as a pre -fix operator converting a ptr to a value.

" ptr = &@{x,y};   ptr->x "
" ptr = &@{x,y}; (*ptr).x "
" obj =  @{x,y};   obj .x "
" ptr:*@{x,y} = ..." Var 'ptr' is typed as a "pointer to a struct with x,y fields"

---------------

StructNode - gathers inputs and has both ordered and named access.  Builds a TS.  No memory, no alias, no allocation
FieldNode  - Selects a field from a TypeStruct.  Glorified ProjNode, since the input does not have to be a StructNode (can be Parm, Phi, etc).
FieldUpdateNode - Takes a TypeStruct (eg StructNode), a field name, a Scalar, and produces a new TypeStruct.

NewNode - takes a TypeStruct (e.g. StructNode), assigns an alias and produces it in memory.
LoadNode - takes a Memory and an alias from a TMP and produces a TypeStruct.
StoreNode - takes a Memory and an alias from a TMP and a TypeStruct and produces an updated Memory.
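
The six nodes as pure Python functions, to check the split between value ops
and memory ops.  Function names mirror the node names but are otherwise made
up; memory is a dict from alias to TS, passed and returned functionally, and
the fresh-alias choice is a toy.

```python
def struct_node(**fields):             # builds a value TS; no memory, no alias
    return dict(fields)

def field_node(ts, name):              # selects one field from a value TS
    return ts[name]

def field_update_node(ts, name, val):  # returns a new TS; the input is untouched
    return {**ts, name: val}

def new_node(mem, ts):                 # assigns a fresh alias; returns (new mem, ptr)
    alias = len(mem)                   # toy alias numbering, assumes aliases 0..n-1
    return {**mem, alias: ts}, alias

def load_node(mem, ptr):               # memory + alias -> whole TS, not a field value
    return mem[ptr]

def store_node(mem, ptr, ts):          # memory + alias + TS -> updated memory
    return {**mem, ptr: ts}
```

So `ptr.x = 16` becomes Load, FieldUpdate, Store, and only Load/Store ever see
a TMP.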

---------------

This raises the question of StructNode liveness, which is a deep/rich liveness
on 1 alias which I do not have.
Some Options:
- use simple-liveness.  All fields are either alive or dead.  Loses out on per-field killing.
- use deep-liveness on e.g. alias#1.  Requires deep-live on non-mem nodes.
- redo live, more engineering (in for a penny, in for a pound)

Redo live
- Type.ANY,Type.ALL - everybody handles
- TypeStruct.xxx - Struct, Load, Store handles.  Others pass-thru (e.g. Phi, Parm, Ret)
- - Also FunPtr makes a TS with fields "^" and "fun"
- TypeMem.xxx - Load, Store handles.  Others pass-thru.
Drop TypeMem.ALIVE,DEAD.  Drop LNO_DSP and complex friends.
Drop basic_alive.  Drop all_live.


=======================================
3/13/2022

Got HM+GCP core theory stable.

=======================================
1/1/2022

More type decisions:

- TFP can have no display (code ptr only), 1-shot binds to a 'self'.  'self'
  can be a closure, or an enclosing struct.  Basically implies TFP curry the
  display, and there are binding syntax steps which bind the display.
- - Loading a fcn ptr from a Value type binds the value as the display.
- - Loading a fcn ptr from a static method binds the curr env as display.
- - Loading a fcn ptr from a prototype does NOT bind; gives a code ptr.
- - If not bound, the 1st arg given is always the display.
- Drop TFS.  Formal args are just a TS where needed for call resolution.
- - Otherwise use AssertNode to force types.
- Drop TypeObj, TypeStr
- TMP pts to a TS
- TypeAry becomes a Scalar, suitable as a field in a TS.
- - TypeAry can be fixed-size
- - Exactly 1 TypeAry field can appear in TS, as the endless extent after a TS.

- After parsing prims, all ref parses pick up the internal prim name 'ref:'
  which includes the == and != opers.

- Currently lacking syntax to make an anonymous value type.  Maybe this is OK.
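
Sketch of the display-binding rules for fcn ptrs listed above, assuming a
function pointer is just a code ptr plus an optional display.  FunPtr,
load_from_value and load_from_prototype are hypothetical names; the
static-method case (binding the current env) is left out.

```python
class FunPtr:
    """Code pointer plus an optional bound display (hypothetical TFP model)."""
    def __init__(self, code, display=None, bound=False):
        self.code, self.display, self.bound = code, display, bound

    def bind(self, display):
        # 1-shot bind: an already-bound pointer cannot be re-bound
        assert not self.bound, "display already bound"
        return FunPtr(self.code, display, True)

    def __call__(self, *args):
        if self.bound:
            return self.code(self.display, *args)
        # Not bound: the 1st arg given is the display
        return self.code(*args)

def load_from_value(value, name):
    # Loading a fcn ptr from a value binds that value as the display
    return value["proto"][name].bind(value)

def load_from_prototype(proto, name):
    # Loading from a prototype does NOT bind; gives a bare code ptr
    return proto[name]

proto = {"get_x": FunPtr(lambda self: self["x"])}
pt = {"proto": proto, "x": 3}
```

Calling load_from_value(pt, "get_x")() and load_from_prototype(proto,
"get_x")(pt) reach the same code; only who supplies the display differs.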


=======================================
12/24/2021

issue:
  need to resolve based on *types* not *graph-shape*.
  Graph-shape is an optimization, but the default base behavior needs to work with *just types*.

Which means at a Call fdx, I may see a TFP with multiple fidxs.
Need to map the fidxs to formals.
- Some fidxs refer to Vals (Value type constructors).  Formals from prototype.
- Some fidxs refer to Fun/FunPtrs.  Formals from Parms.

FunNode.FUNS:
- Combo inits nongen set.  One pass, no init needed?  Prims shoulda reset at the top-level reset.
- FunNode.<init> stuffs on FIDXS, but need FunPtr.  Move the insert to FunPtr create.
- RetNode.clear for dead?
- Call inlines single-fidx single-callee FunNode.
- Call wires all fidxs to FunNodes.
- Call inlines ValNode.
- Call error reporting
- Fun, Ret, FunPtr, Val name by fidx
- Node.con probably should allow no-disp FunPtrNode, default ValNode constructor
- - These should be directly in FunNode.FUNS, so Node.con does not need to create.
- Scope finds ValFunNode for escapes.


=======================================
12/18/2021

"ints as structs" looks good in theory, wildly inefficient in making nodes & types.
Seems to work fine doing math.

Idea: same semantics as "ints as structs", same overhead as "primitive ints".
"Named:" types have a class type with the same name somewhere.  (Fully qualified :'s to keep uniqueness).
Named types can be in ConNodes and value-flowed.
LoadNodes support them as the address;
- the memory can be dropped
- the fld is 100% matched (or error)
- The contents are always foldable
- - "_val" or similar just strips the "name:" off and gets the primitive
- - ".hash" or "._+_" or similar just loads from the Class
"Class" is just a NewObj, found from a global flat table of NewObjs.  Force keep-alive until Combo/GCP.
No memory on a "Class"; it's the immutable part of a primitive.

Build "classes" via normal parsing of named types; detect "value type" via e.g.
all-final-fields, or specific type keyword?

Instance-field loads make a TFP, with the display being the named prim.
int = : @{
  _val=$$java_int;
  _+_ = { y:int -> $$prim_add_int(_val,y._val) }
  _+_ = { y:flt -> $$prim_add_flt($$prim_flt(_val),y._val) }
}
flt = : @{
  _val=$$java_flt;
  _+_ = { y:flt -> $$prim_add_flt(_val,y._val) }
  _+_ = { y:int -> $$prim_add_flt(_val,$$prim_flt(y._val)) }
}
1.2+3 // Parse makes: (flt:1.2)._+_(int:3)  ==>> call flt._+_ with display flt:1.2 and arg int:3
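
Sketch of the "named prim + Class" dispatch above, in Python.  A named value is
modeled as a (name, _val) pair and the Class is a flat global table of
overloads keyed by the argument's type name.  CLASSES, named and call_op are
hypothetical; the real overload resolution is not shown in these notes.

```python
# Hypothetical class table: one immutable "class" per named primitive, holding
# the overloads of each operator keyed by the argument's type name.
CLASSES = {
    "int": {"_+_": {"int": lambda v, y: ("int", v + y),
                    "flt": lambda v, y: ("flt", float(v) + y)}},
    "flt": {"_+_": {"flt": lambda v, y: ("flt", v + y),
                    "int": lambda v, y: ("flt", v + float(y))}},
}

def named(x):
    # Wrap a Python number as a (name, _val) pair, like int:3 or flt:1.2
    return ("flt" if isinstance(x, float) else "int", x)

def call_op(lhs, op, rhs):
    # "x op y" becomes "x._op_(y)": load op from x's class, x is the display
    (lname, lval), (rname, rval) = lhs, rhs
    overloads = CLASSES[lname][op]
    if rname not in overloads:
        raise TypeError(f"no {lname}.{op} overload for {rname}")
    return overloads[rname](lval, rval)

result = call_op(named(1.5), "_+_", named(3))   # (flt:1.5)._+_(int:3)
```

No memory is involved: the class table is immutable and the load of "_+_" is
always foldable once the name on the left operand is known.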


=======================================
12/5/2021

Getting better testing alive.

Upon the mode-flip to GCP - a bunch of things can LIFT.

- DEFMEM goes "dead", unused.  DEFMEM had-been-used to give minimal "shape" to unknown memory.
- This means i lift, until Combo... then i can lift some MORE.
- Can I get rid of DEFMEM?  Or nuke it immediately after Parse?
- Problem may have been solving the loss of Parser precision after an unknown call - which
  was just some primitive:
    - "x+y"
    - assumption is the unknown call "_+_(x,y)" side-effects "x".  At least until call resolution & inlining.
- Making a (nested) fcn-call needs a state-of-memory as sharp as the parser already knows,
  so the parser can do name-lookup.  Using e.g. a TypeMem.ALLMEM memory argument is very weak.
  But maybe legit?  No (real) folding is possible until Combo.


- Unresolved goes "high".  But probably need to leave them low?
- Unresolved has no plan in the HMT+GCP world.
- I DO have a plan where I drop unresolved, and just convert all to a simple field-load of some choices,
- plus a full-arg choice selector.  Require all these choices to be unambiguous overloads,
- resolvable to a single-target (or not resolved at-all if in error).
- Better answer: Unresolved *demands* choices are unambiguous, and supports selecting the correct one.
- Drop extra arg to value() calls, only used for Unresolved being high/low.

- Operator lookup uses an op table to decide precedence, arity, arg placement.
- Replaces "x op y" with "x._op_(y)" using the instanceof syntax calls
  (so "x" is the Display) in the code impl.
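
Sketch of the op-table rewrite, assuming a plain precedence-climbing parser.
The table contents and the names OPS, tokenize, parse and rewrite are made up
for illustration; only arity-2 infix ops are handled here.

```python
import re

# Hypothetical op table: token -> (precedence, arity).  Higher binds tighter.
OPS = {"+": (1, 2), "-": (1, 2), "*": (2, 2), "/": (2, 2)}

def tokenize(src):
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", src)

def parse_atom(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse(tokens)
        tokens.pop(0)                         # drop the closing ")"
        return f"({inner})"
    return tok

def parse(tokens, min_prec=1):
    """Precedence climbing; emits call form with the left operand as display."""
    lhs = parse_atom(tokens)
    while tokens and tokens[0] in OPS and OPS[tokens[0]][0] >= min_prec:
        op = tokens.pop(0)
        rhs = parse(tokens, OPS[op][0] + 1)   # +1 makes binops left-associative
        lhs = f"{lhs}._{op}_({rhs})"          # "x op y" ==>> "x._op_(y)"
    return lhs

def rewrite(src):
    return parse(tokenize(src))
```

For example rewrite("x+y*z") gives "x._+_(y._*_(z))", so precedence is settled
by the table before any normal field lookup of "_+_" or "_*_" happens.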
  
- Struct type decls split into a static (final-field inline assigns) and dynamic part.

- All calls of the form "x.final_field(args)" use "x" as the display inside the code for final_field.

- Swap && || to the lazy-args model

- Drop TypeLive

- WorkNode + PriorityWork queues

- Drop 'keep' for a Parser "tmp" to keep-alive, and allow any mutations as long
  as i can keep a pointer to the replacement.
- Simplify the Iter loop, dropping the 'x' keeper.
- Add KeepNode
- Scopes drop the internal keep(); hook onto the Env KEEP_ALIVE
- New pattern: use GVN.init instead of GVN.xform when dealing with sets of Node setups
- invariant: NOT { on_worklist, elocked, calling iter }


=======================================
12/4/2021

Holy Moly: prototype "core" HMT+GCP passes in all modes, with 32 rseeds, and
produces the desired improved results.

Now looking at integrating back into the main AA code base.

Along the way, re-thinking primitives-as-full-objects.
Still require the good cost-model.

But would like, e.g.

// integer as a struct type
int =: @{
  // has a '_val' field which is a TypeInt.INT64
  _val:$$com.cliffc.aa.type.TypeInt#INT64;

  // All final fields in the type constructor moved to a "class" object.
  

  // Hidden final field, which is a type-flag.  Demands that the "this"
  // object fully inline, leaving its mutable fields as bare scalars.  
  _default_must_object_inline = 1; 

  // Unary operators
  -_   = {       -> $$com.cliffc.aa.node.PrimNode$MinusI64(_val) }
  !_   = {       -> $$com.cliffc.aa.node.PrimNode$Not     (_val) };
  // Binary operators  
  _-_  = { y:int -> $$com.cliffc.aa.node.PrimNode$SubI64(_val,y._val) };
  _*_  = { y:int -> $$com.cliffc.aa.node.PrimNode$MulI64(_val,y._val) };
  _/_  = { y:int -> $$com.cliffc.aa.node.PrimNode$DivI64(_val,y._val) };
  _%_  = { y:int -> $$com.cliffc.aa.node.PrimNode$ModI64(_val,y._val) };
  _&_  = { y:int -> $$com.cliffc.aa.node.PrimNode$AndI64(_val,y._val) };
  _|_  = { y:int -> $$com.cliffc.aa.node.PrimNode$OrI64 (_val,y._val) };
  // Binary relations
  _<_  = { y:int -> $$com.cliffc.aa.node.PrimNode$LT_I64(_val,y._val) };
  _<=_ = { y:int -> $$com.cliffc.aa.node.PrimNode$LE_I64(_val,y._val) };
  _>_  = { y:int -> $$com.cliffc.aa.node.PrimNode$GT_I64(_val,y._val) };
  _>=_ = { y:int -> $$com.cliffc.aa.node.PrimNode$GE_I64(_val,y._val) };
  _==_ = { y:int -> $$com.cliffc.aa.node.PrimNode$EQ_I64(_val,y._val) };
  _!=_ = { y:int -> $$com.cliffc.aa.node.PrimNode$NE_I64(_val,y._val) };
  // Non-operators
  str = { -> $$com.cliffc.aa.node.NewStrNode$ConvertI64Str(_val) };
  // Hashable
  hash= { -> . }
  eq  = { y:int -> . == y }

}


=======================================
11/29/2021

And now i need a monotonic Apply-Lift for the combo HM+GCP.
Failed previously for being not-monotonic.
Theory:
- if a T2 appears on the input of an Apply, and a GCP is structurally the same,
- THEN if the T2 appears on the output, we can assert the input & output are the
  same flow type, and JOIN the input flow to the output flow.

- TO be monotonic, if any T2 leaf might join with any other, we must assume they will
  before the HMTs become stable.
  
- If the GCPs are high after stability ... we'll become unstable.  The falling
  GCP will trigger an IF to unify both sides which will split the T2s.  But
  which T2 do we split?

- Haven't thot thru similar bases unifying like leaves.
- Haven't thot thru diff    bases unifying like leaves - and being errors.

- Periodically what starts as a T2 leaf (so can join) becomes structure.  We
  lose the joining.  Assume GCP follows the T2.  Should be no (effective) join
  lossage: GCP should be XSCALAR until it follows.

- Keep result as XSCALAR, until T2 and GCP are in-sync structurally
- Or we stop the extra-lift, which means quit pre-joining all T2s.
- - If GCP is still high here, "we have a situation"


=======================================

Pondering time for the major tech-debt cleanup

- drop retype memory, Call assume fcn body has problem mem types always, can inline without type check
- drop escapes logic, is used to help rescue (in small code) precision for recursive struct read/build.  Rely on HM instead
- Several tests fail because cannot bypass constructors over recursive fcns, due to recursive aliasing.
- - Figure it out!!!
- - Build tuples & structs AFTER all initial fields are gathered.
- - Functions in tuples take a display (self), so can be built ahead of the struct NewObj


- Insert a int/flt Class with fields for _+_,_*_,_-_,_/_ and hash and eq, for instance
- Create the int/flt Class at _prims.aa time.
- Parser can't do int/flt constants until these classes exist, then short-cuts to them.
- - int = :@{ _val; _+_ = { x:int y:int -> java_i64_add(x,y):int };
                    _+_ = { x:int y:flt -> java_i64_add(flt(x),y):int };
                    flt = { x:int -> java_int2flt(x):flt }
                    ...
            };
    flt = :{ .... }
    

CallEpi Lift:

- If it CAN unify, it Might.  So... a "compatible" test.  If the output HM "is
compatible" with ANY input, it MIGHT (eventually) be equal to any of them.
Take an output TVar, and JOIN against ALL compatible input TVars, at any depth.
Compatible means: "test-unify says progress, or both are same"

UGH:  lift any part of output from any part of input....
Need to walk & part-out all of (iTV,iType).
Can pre-JOIN the iTypes from all the compatible iTVs?

"Obvious correct but slow" version:
- For all inputs, look at (iTV,iType).
- - For all prior incompatible iTVs,
- - - see if this iTV is compatible with any: if so, join again.
- - - else add a new incompatible (iTV,iType).
- - Test&Set recursive; return;
- - Recursive walk iTV parts
-
- Start with a top-level TVar & Type for Output: (oTV,oType)
- For all incompatible iTV
- - if oTV is compatible to iTV
- - - Find singular iTV
- oType &= iType // join; may add structure to oType
- Test&Set recursive; return oType
- Recursive walk (oTV,oType)
- - Build up oType parts from recursive sub-parts

Faster?
- return if oType is already ~Scalar or above; cannot lift any more.
- return if iType is already Scalar or below; will not lift.


=======================================
10/4/2021

Still wrestling with theory.
"f = { x -> x ? (f(x.next),x.val*x.val) : 0 }; f"

3 interlocking optimistic passes being combined, with lots of bonus extra bits:
At end of pass1:
- X IS ~nScalar (passed-in no-nil instead of more correct ~Scalar to force IF to trigger one side)
- So x.v is ANY                // Loading from ~Scalar
- So x.v MIGHT be a constant   // ANY can fall to e.g. 3.14
- So x.v inputs are DEAD.      // If we fall to a constant, no need to Load
- So Cast does not unify       // Cast-not-nil is maybe-dead, and Dead do not unify & force bad unifications.
- So X does not merge @{.n} and @{.v}, and is DEAD.
- So no H-M resolution for X.

Start of pass2:
- H-M for X is still a Leaf, so X falls to Scalar.
- Function body resolves; X's H-M type is "T=@{n=T, v=int}?"
Which tries to Lift X from Scalar, but Too Late.

Observation:
Const IF, no unify on dead branch    : value->HM    cross point.
Base types in HM                     : value->HM    cross point.
HM lifts post-call & loads           : HM   ->value cross point
Escaping fcn entry point lifts via HM: HM   ->value cross point.
Live-is-dead, so no unification      : live ->HM    cross point.
Value-maybe-con, so args are dead    : value->live  cross point.

Of these 3 interlocking optimistic passes, if I drop value-maybe-con, I remove
live from the interlock.  Now the liveness can run first and feed into the
interlocked value<->HM pass.


=======================================
9/28/2021

Still wrestling with theory.
"f = { x -> x ? (f(x.next),x.val*x.val) : 0 }; f"

Returns a self-recursive function.
During Combo

- Phase1, no calls from external world.
  Hence 'f' is never called.
  Hence 'x' is ~Scalar.
  Hence '?:' does not execute either branch.
  Hence no H-M action, everything remains a Leaf.
  Hence no binding on 'x's type from H-M.
- Phase2, turn on external calls.
  Since 'x' is not bound by H-M, falls to Scalar.
  Both '?:' branches execute.
  Hence 'x' as a nScalar is used for x.next and x.val
  Hence 'x.next' reports an Err.
  
  THEORY: NO ASYMPTOTIC GAIN from doing this.  Just drop Scalar values in on
  escaping functions - CEProj->Scope goes ~Ctrl->Ctrl and ALL_PARM is always
  Scalar.

  Still need to deal with Err terms.  Basically claim no Err outputs from
  Nodes.  Means no asking 'err' during Iter/GCP.  Issues like Loads peeking
  thru 2nd-final-Store.

  Ugh, think i need to still keep these.  Means, e.g., when x==Scalar, 'x.next'
  is in-error although I can return a Scalar loaded value instead of 'ALL'.
  During error report might have to filter (only report if error-out and not
  error-in).
  
  
- During Phase2, H-M computes x=: A:@{next=A?,val=Real}
  Hence H-M lifts f() result to a tuple T:(T?,Real)
- Phase3, Iter with HM
  Lift all flow types to HM type, lather, rinse, repeat normal iter.
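
For reference, the test function transliterated to Python, to make the
expected shapes concrete.  Node is a hypothetical stand-in for A:@{next=A?,
val=Real}, with None playing the role of nil.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    next: Optional["Node"]   # next=A?
    val: float               # val=Real

def f(x):
    # f = { x -> x ? (f(x.next), x.val*x.val) : 0 }
    return (f(x.next), x.val * x.val) if x is not None else 0

lst = Node(Node(None, 3), 2)
out = f(lst)
```

The result nests the same way the input list does: a pair whose first part is
either 0 (nil) or another such pair, which is the T:(T?,Real) shape.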
  

=======================================
9/12/2021

Combining HM & forward-flow.


- display arg comes from the FDX.
- GCP TypeFunPtr fed into slot "^".
- Drop FP2DISP; drop extra FDX input to Call.
- Call peels out TypeMemPtr disp for output value()
- Call DOES NOT INCLUDE TFP IN OUTPUT TYPE, get it from fdx() directly.
- HMT for the function includes Display field
- Use fixed arg names, since names not available at Call sites.  TypeStruct.ARGS[2]="^", "a", "b", ....
- - Replace (""+i).intern() with CallNode.argstr(i) { if( i<26 ) return ARGS[i]; else big_name(i)};
- Explicit pass display (as we do now)
- Fresh incoming args when moving into nested display
- Display Loads are ... Field Loads, no FreshNode
- - Drop FreshNode from parser after Loads.
- - Add FreshNode in parser before getting 1st display; nested display loads are just loads.
- CallEpi unifies arguments as before; display unifies to display, etc.
- Ponder moving ConNode ALL_CTRL, ALL_PARM to private Node class


// Both AA and HM Syntax, at once:
dsp0 = @{  // PRIM scope
  PRIMS...
 
  dsp1 = @{     // File scope
    ^ = dsp0;   // Fresh record prior display
    
    // <{} # dsp> is syntax for combined standard HM function type and also the display being carried.
    
    // So, "is_even is a function/display tuple that...."
    is_even = <{ dsp_extern n ->
      // make the local display (during Parser.func())
      dsp = @{            // This is a LET syntax defining the local display
         ^ = dsp_extern;  // fresh of dsp_extern
         n = n;           // fresh of n
      };
      
      // iter() folds all the FreshNodes of the local display, so parser might as well.
      
      // The final result can be in or out of the display
      (if dsp.n 0 (dsp.^.is_odd  dsp.^ (dsp.^.^.dec dsp.n)))
    } # dsp1>;
    
    // "is_odd is a function/display tuple that...."
    is_odd  = <{ dsp_extern n ->
      // make the local display (during Parser.func())
      dsp = @{
         ^ = dsp_extern;  // fresh of dsp_extern
         n = n;           // fresh of n
      };
      (if dsp.n 1 (dsp.^.is_even dsp.^ (dsp.^.^.dec dsp.n)))
    } # dsp1 >;
  };
};
dsp0.dsp1.is_even
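
The same display layout as a Python sketch, with dicts for displays and "^" as
the up-link.  Assumption: the if tests n against zero (the note leaves the zero
test implicit); the base-case constants 0 and 1 are copied from above.

```python
# PRIM scope: the outermost display holds the primitives.
dsp0 = {"dec": lambda n: n - 1}

# File scope: "^" links to the prior display.
dsp1 = {"^": dsp0}

def is_even(dsp_extern, n):
    dsp = {"^": dsp_extern, "n": n}     # local display built at function entry
    if dsp["n"] == 0:
        return 0
    up = dsp["^"]                       # enclosing (file-scope) display
    return up["is_odd"](up, up["^"]["dec"](dsp["n"]))

def is_odd(dsp_extern, n):
    dsp = {"^": dsp_extern, "n": n}
    if dsp["n"] == 0:
        return 1
    up = dsp["^"]
    return up["is_even"](up, up["^"]["dec"](dsp["n"]))

dsp1["is_even"] = is_even
dsp1["is_odd"] = is_odd
dsp0["dsp1"] = dsp1

r = dsp0["dsp1"]["is_even"](dsp1, 4)
```

Every name lookup is a plain field load off some display; the display is passed
explicitly as the first argument at each call.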


ALSO hacking primitive lookups to be normal lookups.
Before other parsing, parse fixed string/file for prims
All operators must be final-assigned, so can bypass the display lookup.

_+_ = { x y -> $$AddI64I64(x,y) }  // java classForName.  Required to be a binop FunPtr
_+_ = $$AddF64F64 // Allowed to combine/overload.
!_  = uniop prefix
str = $$Int2Str; // can be normal name

Later;
- have token '+' in binop position.
- replace with _+_ and do NORMAL lookup.

Drop {+} syntax.  User can just write _+_.


=======================================
6/14/2021

Up to dealing with memory and arrays in the Type vars.
Missing a grammar to type the state of memory.

Grammar:
 *  type = tcon | tvar | tfun[?] | tstruct[?] | ttuple[?] // Types are a tcon or a tfun or a tstruct or a type variable.  A trailing ? means 'nilable'
 *  tcon = int, int[1,8,16,32,64], flt, flt[32,64], real, str[?]
 *  tfun = {[[type]* ->]? type }   // Function types mirror func declarations
 *  ttuple = ( [[type],]* )        // Tuple types are just a list of optional types;
                                   // the count of commas dictates the length, zero commas is zero length.
                                   // Tuples are always final.
 *  tmod = := | = | ==             // Field assignment modifier; ':=' or '' is r/w, '=' is final, '==' is r/o
 *  tstruct = @{ [id [tmod [type?]],]*}  // Struct types are field names with optional types.  Spaces not allowed
 *  tvar = id                      // Type variable lookup
// Nested type var, for repeats in the same type expression.
// If 'id' is used and not an external tvar, then it's treated as a free TVar.
// If 'id' is assigned, then it shadows as a free TVar.
// If 'id' is not top-level, then it never escapes the local type expression.
 *  tvar= id=type
// Arrays, no length.  Layout is taken from elem.
 *  tary = []elem
// Memory types can be followed with a tmod; this covers the entire object.
 *  tary = []=elem
 *  tstruct = @{ [id [tmod [type?]],]*}=
 *  ttuple = ( [[type],]* )=

// Some common primitives typed
Array-load :  {[]} : { []==elem int      -> elem } = ...  // array is read/only
Array-store:  {[]=}: { []  elem int elem -> elem } = ...  // array is read/write
int-add    :  {+}  : { int int -> int } = ... // no memory effects
flt-add    :  {+}  : { flt flt -> flt } = ... // no memory effects
            sys.cp : { []==elem int []elem int int -> 0 } = ... // src array is read/only, dst array is read/write; nil return


// Some more syntax, for actually doing the arg lineup & typing in 'aa'
{+} : { int int -> int } = { x y -> $Prim.AddI64.make(x,y) } // no memory effect on type

// Ponder oper fcn name using '_' as placeholders for args.
// Also doing the special FFI syntax to talk to java.
{_[_]} : { elem[]== int -> elem } = { ary idx -> $LValueRead.make($ctrl,$mem,ary,idx) } // read-only memory effect on type
{[_]} : { int -> [] } = { ary idx -> $NewAry.make($ctrl,$mem,ary,idx) } // write-only memory effect on return type
{_[_]=_} : { elem[] int elem -> elem } = { ary idx elem -> $LValueWrite.make($ctrl,$mem,ary,idx,elem) } // since no '==', updates memory

  // Variations on typing arrays in aa code
  [n] -- untyped array of size 'n'
  [n]:flt[] -- flt array; "new float[n]".  Preferred; asserts result is typed as flt[]
  [n]:flt -- special case syntax, ":flt" binds to "[]", and implies "flt[]".  Ambiguous with the below.
  ary[n]:flt -- does an array lookup, and types the result as 'flt'.  Ambiguous with the above.
  [n]:[flt] -- NO: another syntax choice, can be confused as claiming the indices have to be of type flt.
  [:flt n] -- NO: a little odd; unambiguous grammar
  ary[:flt n] -- NO: unambiguous, loading from an array of float.
  flt[n] -- NO: injects a type without a ':'

Now need type var syntax for exposed free tvars.
Have not decided between these two syntaxes:
  Tree A = : @{ left : Tree A ? , value : A, right : Tree A ? }
  Tree<A>= : @{ left : Tree<A>? , value : A, right : Tree<A>? }


=======================================
6/8/2021

Still combining value & live.

Value lattice: Top/Any, con(1,2,3), Bot/All
Live  lattice: Dead, Live
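
The two lattices as meet functions, as a minimal Python sketch.  Assumption:
Dead is the optimistic (high) end of the live lattice, so meeting with Live
gives Live.  The names are mine.

```python
TOP, BOT = "Top/Any", "Bot/All"

def meet_value(a, b):
    """Meet on the flat constant lattice: Top above every con, Bot below."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOT      # two different cons fall to Bot

DEAD, LIVE = False, True

def meet_live(a, b):
    # Anything live makes the result live
    return a or b
```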

xfer-function for primitives:
  // Issue is: for most primitives, it makes no difference
  // Prop live first, then values
  if all uses dead :  (top    ,dead)
  else:
    if any defs Top:  (top    ,dead)
    if any defs bot:  (bot    ,live)
    if all defs cons: (op(con),dead)
  if Stop : always live
  if Start: always true

  // Prop values first then live
  if all uses dead or con-not-Stop:
    if any defs Top: (top     ,dead)
    if any defs Bot: ( ?      ,dead)
    if all defs cons:(op(cons),dead)
  else some uses alive:
    if any defs Top: (top     ,live)
    if any defs Bot: (bot     ,live)
    if all defs cons:(op(cons),live)
  
 true     true     true     top
Start  -> ID1  ->  ID2  -> Stop
 dead     live     live    live


Example where value<->live impact each other around Loads of fcn ptrs.
Issue is a fat-pointer (so requires a fat-live) with a #fidx constant
and a display ptr.  Basically, a dead linked-list load.

  (mem0,x0) = New*12(fld)      // New object, returns memory & pointer
  fib0 = fptr(x0,#47)          // Construct TypeFunPtr from a display pointer and code constant
  mem1 = st.fld(mem0,x0,fib0)  // Store into display
  call(*47)(mem1,x0)           // Top-level fib call

  fib() {                      // Function entry
    mem2= parm(mem1,mem2)      // Memory  Parm from entry and loop-back
    x1  = parm(x0  , x2 )      // Display Parm from entry and loop-back
    
    fib1 = x1.fib;             // Load TypeFunPtr
    x2 = fp2disp(fib1)         // Extract display pointer
    fib = fp2fun(fib1)         // Extract code ptr
    call(*fib)(mem2,x2)        // Recursive (looping) call
    cepi  ...    
  }

// Start GCP
  (mem0,x0) = New*12(fld)      // mem0: (fld=T )/(fld=X)  x0: *12/X -- Starting, so computes value despite dead
  fib0 = fptr(x0,#47)          // T / X
  mem1 = st.fld(mem0,x0,fib0)  // T / X
  call(*47)(mem1,x0)           // 

  fib() {                      // 
    mem2= parm(mem1,mem2)      // T / X
    x1  = parm(x0  , x2 )      // T / X
    
    fib1 = ld.fld(mem2,x1)     // T / X
    fib = fp2fdx(fib1)         // T / X
    x2 = fp2disp(fib1)         // T / X
    call(*fib)(mem2,x2)        // T / X
    cepi  ...                  // T / L -- Stopping, so computes live despite Top inputs
  }

// Some GCP progress on live, some backwards flow
  (mem0,x0) = New*12(fld)      // mem0: (fld=T )/(fld=X)  x0: *12/X -- Starting, so computes value despite dead
  fib0 = fptr(x0,#47)          // T / X
  mem1 = st.fld(mem0,x0,fib0)  // T / X
  call(*47)(mem1,x0)           // 

  fib() {                      // 
    mem2= parm(mem1,mem2)      // T / X
    x1  = parm(x0  , x2 )      // T / X
    
    fib1 = ld.fld(mem2,x1);    // T / X         // Still dead; both users may-be-constant
    fdx = fp2fdx(fib1)         // T / L         // Backwards from call; may-be-constant, so inputs are unused
    x2 = fp2disp(fib1)         // T / X         // Backwards from call
    call(*fdx)(mem2,x2)        // (T/L,T/X,T/X) // Call special: live-use requires code-ptr, but not memory nor display
    cepi  ...                  // T / L
  }

// Some GCP progress on value, some forwards flow
  (mem0,x0) = New*12(fld)      // mem0: (fld=T/X)  x0: *12/X -- Starting, so computes value despite dead
  fib0 = fptr(x0,#47)          // 12:(fdx=#47/X,dsp=*12/X)
  mem1 = st.fld(mem0,x0,fib0)  // 12:(fld=(fdx=#47/X,dsp=*12/X)/X)
  call(*47)(mem1,x0)           // 

  fib() {                      // 
    mem2= parm(mem1,mem2)      // 12:(fld=(fdx=#47/X,dsp=*12/X)/X)
    x1  = parm(x0  , x2 )      // *12/X
    
    fib1 = ld.fld(mem2,x1);    // (fdx=#47/L,dsp=*12/X)
    fdx = fp2fdx(fib1)         // #47 / L       // 
    x2 = fp2disp(fib1)         // *12 / X       // 
    call(*fdx)(mem2,x2)        // (#47/ L, (fld=(fdx=#47/X,dsp=*12/X)/X), *12/X) // Call special: live-use requires code-ptr, but not memory nor display
    cepi  ...                  // T / L
  }

// Turns out, this is the standard forwards-flow of values with constants
// stopping liveness... but liveness not impacting values.  Simple ordering
// of 2 unrelated global opts.


=======================================
6/6/2021

Optimistically combining values & liveness failed.
Example straight-line graph during GCP:

 true     top      top     top
Start  -> ID1  ->  ID2  -> Stop
 dead     dead     dead    live

Since ID2 is passed and computes ANY, which may_be_constant, ID1 is not alive.
Since ID1 is DEAD, ANY is passed to ID2.
The answer is stable, but wrong: the 'true' should pass from Start to Stop thru the IDs unchanged.

I allow value to impact live: constant values need no inputs, so their inputs are dead.
But never can live impact value, or I get versions of the above example.
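
Small Python model of the Start/ID1/ID2/Stop chain, to replay the failure.  The
transfer rules are my reading of the text above: a def is live only if its user
is live and the user's value is not may-be-constant; in the broken variant a
dead node also computes ANY.  solve and may_be_con are hypothetical names.

```python
TOP, BOT = "top", "bot"
CHAIN = ["Start", "ID1", "ID2", "Stop"]

def may_be_con(v):
    return v != BOT               # top or a constant: inputs are not needed

def solve(live_impacts_value):
    val = {n: TOP for n in CHAIN}
    live = {n: False for n in CHAIN}
    changed = True
    while changed:                # iterate to a fixed point
        changed = False
        for i, n in enumerate(CHAIN):
            # Liveness: Stop is always live; otherwise live only if the single
            # user is live and actually needs its input.
            if n == "Stop":
                new_live = True
            else:
                user = CHAIN[i + 1]
                new_live = live[user] and not may_be_con(val[user])
            # Value: Start always computes 'true'; IDs and Stop copy the input.
            if n == "Start":
                new_val = "true"
            elif live_impacts_value and not new_live:
                new_val = TOP     # the broken rule: a dead node computes ANY
            else:
                new_val = val[CHAIN[i - 1]]
            if (new_val, new_live) != (val[n], live[n]):
                val[n], live[n] = new_val, new_live
                changed = True
    return val, live

bad_val, bad_live = solve(live_impacts_value=True)
good_val, good_live = solve(live_impacts_value=False)
```

With live impacting value the solver stops at the stable-but-wrong answer
(Stop stays top); with value computed independently of live, 'true' reaches
Stop and the IDs are still dead because a constant needs no inputs.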


=======================================
5/12/2021

Fun:   gather of control
Parms: Lambda args def-spot.  Feeds into New directly, and uses inside the Lambda are never fresh.
- The Env pushes a VStack of the parms.  These are all the not-fresh vars.
Load: Drop the 'make fresh on removal'.  Do nothing 'Fresh'.
- In parser, when doing name lookup, insert a Fresh after Load.
FunPtr: is the value of a Lambda, not the variable being defined.  No fresh issues, but behaves like a Lambda.
Stmt:  Behaves like a Let; use push nongen inside the statement def; pop nongen afterwards.
- If after ifex() not a fref, no push/pop nongen.
- If after ifex() was a fref, pop nongen
Fact: If fref, push nongen (once; if not already pushed).
Func:  Behaves like a Lambda; push nongen scope; add vars for each parm; all are 'popped' when Env is popped


Fun                            
  Parm
  ...
  Ret
FunPtr                    Lambda: make a Fun of Parms->Ret

Call (FunPtr Args...)     Gather Args
CallEpi                   Apply: Fun+Args unify to FunPtr


=======================================
4/14/2021

Back to HM algo.  Missed 'fresh' a bunch; 'occurs_in' was not right.

Thinking:
- Cut Parse back to the nubbins; no 'xform' and no 'inline'.
- Confirm get HM answer correct from HM test cases.
- Supporting mutual-letrec.
- Loads from Display just like HM.Ident; and needs 'nongen' set.
- Stores to Display, very similar to '?:' just use same tvars.
- Load/Store from Object/Array use memory/pair-like defs.  No recursive defs?

- Need to build a nongen set during Parse.  This is almost exactly the
  env._scope.stk() display, except during mid-def of recursive and mutually
  recursive functions.
- - WithOUT recursion, current Display is like a nested set of 'let x= ... in ...'.

- May have to look at Unresolved as a case of Haskell Type-classes.
- - {+} {int,int->int} {dbl,dbl->dbl} {flt,flt->flt}?  {str,str->str}
- - {==} {int,int->bool} {dbl,dbl->bool}, etc....
- - Haskell TypeClasses are declared interfaces, not duck-typing.

- Once TV2 produces correct answers, start turning on XFORMS.

- Besides turning OFF XFORMS, what else can i keep?
- Can i keep NewObj for Displays & add-in the needed let-rec support?
- - Add in support for ordered sets of tvars, used in the nongen tests?


=======================================
2/23/2021

Back to HM algo.  Missed the 'fresh' notion in main code: let Name = Body in Use;
If   Name is in Use  then use a Fresh let result;
else Name is in Body and  use the     let result.
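
Minimal Python sketch of that Fresh rule, assuming a bare union-find type
variable and a nongen list.  TVar, unify and fresh here are toy versions
written for this note (no occurs check, no error recovery), not the HM.java
code.

```python
class TVar:
    """Type node with union-find; names starting with '?' are leaf variables."""
    def __init__(self, name, args=()):
        self.name, self.args, self.link = name, tuple(args), None

    def find(self):
        t = self
        while t.link is not None:
            t = t.link
        return t

def is_leaf(t):
    return not t.args and t.name.startswith("?")

def unify(a, b):
    a, b = a.find(), b.find()
    if a is b:
        return
    if is_leaf(a):
        a.link = b
    elif is_leaf(b):
        b.link = a
    else:
        assert a.name == b.name and len(a.args) == len(b.args), "type mismatch"
        for x, y in zip(a.args, b.args):
            unify(x, y)

def fresh(t, nongen, seen=None):
    """Copy t, sharing any variable found in nongen instead of cloning it."""
    seen = {} if seen is None else seen
    t = t.find()
    if any(t is n.find() for n in nongen):
        return t                              # mid-definition: not generalized
    if t not in seen:
        seen[t] = TVar(t.name)                # register first, then fill args
        seen[t].args = tuple(fresh(a, nongen, seen) for a in t.args)
    return seen[t]

A = TVar("?A")
ident = TVar("->", [A, A])                    # id : A -> A
INT, STR = TVar("int"), TVar("str")

# Name is in Use: each use gets a Fresh copy, so id works at int AND str
use1 = fresh(ident, nongen=[])
unify(use1.args[0], INT)
use2 = fresh(ident, nongen=[])
unify(use2.args[0], STR)
```

Passing nongen=[A] instead models a use inside Body: the copy shares A, so
unifying the argument with int pins the definition itself.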

In main code; Names are use-def edges (SSA).
Typically want FunPtrNodes to deliver Fresh,
except when recursively self-defining.

So when looking at a FunPtrNode, tvar behaves "as if" wrapped in a Fresh.
Here's maybe the bug: At a normal Call/Cepi with a FunPtr, I do the
fresh-unify.  But if the FunPtr is passed along (and merged with others), I
lose the Fresh notion.  All unification points (args to functions, Phi nodes),
need to honor the Fresh notion.

Also, lots of replicated code because TMulti does not support sparse edges (ala
TMem aliases), nor named fields (TObj).  Replace TMulti with a sparse array
indexed by Object (either String or Alias index/Integer).  Drop TFun, TMem,
TArg, TRet.  Add a String for a structural-type, for asserts.  Add a Fresh wrapper.
Normal unify splits into fresh/not-fresh as HM does.
Drop unify_lift & debug until i can run all tests except HM15.

Can i get rid of TNil & TVDead?
Fold TMulti and TVar?
In HM, using _updates; in TVar its called _deps, and should only take CallEpiNodes.
Still keep _ns for debugging; but only for debugging.

Keep new TVar counts under control? not getting big in HM.  Obvious: add a
string per alloc site, & histogram allocs.  Keep with TVar, and on unification
also histogram its death (no GC, so cannot recycle, without refcnt).


Make a TVar2/TV2; and clone HM.java behavior as much as possible.


=======================================
2/16/2021

Mid-integration with H-M.  New situation with Memory.

H-M assumes all args are immutable, visible, and the return type is a function
of args (and the fdx).

Flow-types assume values are immutable but memory is incrementally updated
(mutable), with new aliases in-function and also field updates.

Integration pretends memory is both an arg-in and a return-out.  It is split by
escaping aliases; non-escapes are NOT arg-ins; escapes ARE arg-ins.  The
return-out memory is a function of the arg-ins, including escape-in memory.

Precision on the flow side is at the alias level, but needs to be at the FIELD
level.  Precision on the H-M side is at the field level.

Bug#1 is: MemSplit & MProj did not unify.
Bug#2 is: body of fcn never lifts Parm:mem/Split/Join/MrgProj to TMem... so MrgProj never reports new-in-fcn TObj.
----

Want a TLazyMem: has a prior TVar, a BitsAlias and a TObj.  Unifies the TObj at
all BitsAlias in the prior TVar - once it becomes a TMem.  Can just stay apart
if prior is a TLazyMem, buggy if anything else (e.g. TFun).  Two TLazyMems can
unify TObjs if both BitsAliases are the same.  A TLazyMem and a TMem can make a
new TMem by copying the old TMem except at aliases.

Want all Nodes to produce an initial Txxx that is sharper, and not require
'unify' except once (per original H-M, but more times with partial-eval
modifying program shape).

Want a TLazyObj: has a prior TVar, a field & a TVar.  Merges fields.  But
fields do not flow forever, nor get huge in number.  So just make TObj allow
for lazy new field to appear?  eg. a Load/StoreNode has incoming memory
as a plain TVar (not TMem), or incoming obj as a plain TVar (not TObj).
Will force output as a TLazyMem with a TLazyObj (plus named field).

For a FCN, the input mem parm will be a TLazyMem w/no aliases except what
appears inside the FCN.  Fresh-unify will unify with the TLazyMems, and require
the not-mentioned aliases be the same before & after the FCN.  E.g. a
TFun = { [CTL:TVar MEM:TLazyMem[exceptions] ARG3:TVar  ARG4:TVar] -> 
         [CTL:TVar MEM:TLazyMem[exceptions] REZ :TVar] }
Forces the not-exception memory aliases to be the same.


----

H-M alternative: split memory by aliases (and by field within alias), as
explicit top-level TVar args into a Call, or top-level TVar returns.  New-in-
call is treated as a top-level input (of UNUSED) and a top-level return.  All
side-effects are modeled as part of the memory-return.

Aliases that are neither esc-in nor esc-out are not incoming args nor can force
fresh-unify anything.

CallEpi are the Apply return-points, and unify "as expected": defaults to a
TRet which has a TLazyMem to-be-unified with the return ala fresh-unify.

Why a TLazyMem instead of eagerly chasing all TObj/aliases down and flattening
into a TMem?  Because we *never* answer the question of unrelated aliases in an
unrelated function.  It's the variable never asked for.

Which brings me to: TMem becomes lazy by-default always.

TLazyMem: has a default, has a "you asked for it once before, so i got the
short-cut answer right now", has a "exceptions, not the default".  The
effect is: "whatever the default is, unless looking at exceptions".
When fresh-unify, the default can bottom out at "no default given" which
might be a plain-tvar.  The Parm:mem & Ret:mem can have the same default,
which fresh-unify uses to force before-mem and after-mem to be the same.
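
A minimal Java sketch of the "default unless exception" lookup, assuming types
are plain Strings and each lazy memory layers one exception alias over a prior.
LazyMemSketch, noDefault, except and at are hypothetical names for
illustration, not the real classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TLazyMem: "whatever the default is, unless
// looking at the exception".  Types are plain Strings, not real TObj/TVar.
// Aliases are assumed to be non-negative.
final class LazyMemSketch {
  private final LazyMemSketch prior; // the default; null means "no default given"
  private final int alias;           // the one exception alias, or -1 for none
  private final String tobj;         // type at the exception alias
  private final Map<Integer,String> cache = new HashMap<>(); // "asked once before" short-cut

  private LazyMemSketch(LazyMemSketch prior, int alias, String tobj) {
    this.prior = prior; this.alias = alias; this.tobj = tobj;
  }

  // Bottom of the chain: no default given, so every alias is a plain TVar.
  static LazyMemSketch noDefault() { return new LazyMemSketch(null, -1, null); }

  // Layer one exception over this memory; nothing is flattened eagerly.
  LazyMemSketch except(int alias, String tobj) { return new LazyMemSketch(this, alias, tobj); }

  // Answer only the alias actually asked for.
  String at(int a) {
    if (a == alias) return tobj;         // an exception, not the default
    if (prior == null) return "TVar";    // default bottomed out
    String t = cache.get(a);
    if (t == null) { t = prior.at(a); cache.put(a, t); } // chase the default once
    return t;
  }
}
```

In this sketch a Parm:mem and a Ret:mem built over the same noDefault() prior
answer identically at every alias that is not an exception, which is the
property fresh-unify would lean on.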


=======================================
2/2/2021

'remove_ambi' situation: no high fdxs, so no *optimistic selection* fdxs.  so
double-loop in GCP not needed; no remove-ambi.

But concept of needing remove-ambi still valid: optimistic selection of choice
overloads.

2 kinds of overloads: (1) from type-specialization, with a clear monotonically
improving choice, and (2) unrelated primitives (+:int vs +:str) with conflicts
and (2a) unrelated primitives with cost (+:int vs +:flt).

Current issue: during GCP, Call.live_use is declaring unwired fdxs as "will be
wired", so Args "might-be-live on the to-be-wired fdx".  Never wired, because
call is in-error.  "good_call" does not distinguish between "some args high, so
waiting for them to fall" from "bad nargs or bad args".  During GCP want to
distinguish between "CG still building" from "CG is built, but errors so no
longer want to wait for CG"

Call old resolve code helped here.
GCP args falling from high.
FDXs have some overloads; Scalar->Int type-specialization, or
same-name overloads +:Int-vs-Flt or +:Int-vs-Str.
Ex ARG lattice: TOP->Int/Other->Scalar.
Ex FDX lattice: TOP->50/54->[50,54], where [50] allows Scalar and [54] allows Int.
Want CEProj to never hit CTRL on paths that will never take.
[Arg:Top,FDX:Top  ] -> 50:X,54:X
[Arg:Int,FDX:54   ] -> 50:X,54:Ctrl
[Arg:Int,FDX:50,54] -> 50:X,54:Ctrl
[Arg:Str,FDX:50,54] -> 50:Ctrl,54:X
[Arg:Int,FDX:50   ] -> 50:X,54:X // ODDLY, staying HIGH waiting for 54 to appear.  If it does NOT appear, needs to fall.

This is the remove_ambi situation: call stays high (disallows all FDXs) until
no-more-progress.  Then GCP calls remove-ambi, which starts allowing FDXs that
are available, but can be filtered.  Which means no FDXs until either
remove-ambi, or i can tell this FDX will never be overloaded.  Or Unresolved
puts out high FDXs (which fail at Phis, but can all fall down) (high FDXs is
what i was doing).  Call resolve allows no high FDXs until resolve-ambi, then
the call is flagged to 'fall this way', or can prove exactly 1 high FDX will
ever pass.


=======================================
1/31/2021

Maybe a breakthru on Call+resolve+Unresolved.

FunNode has a TypeFunSig.  This can FALL during ITER (lift during GCP?) as arg
uses die (during GCP all args start not-live so all TFS can be super-low).

A FunNode validates each path independently, and sets a bit-per-path.  (Part of
FunNode value?  Add a CProj on every wired Call path?).  Valid means that all
args are "isa" the TFS.  Invalid paths mean the call+fun is in-error, normal
during Iter.  They still push results thru (despite being invalid), since the
TFS might FALL to drop the incoming arg requirement; dead args can be Err.

Parms merge all paths, including in-error paths & overload paths.  If the arg
fails to be "isa" the formal, Parm drops to ALL.  "The poison spreads".  This
is monotonic; if an error arg dies the path remains valid & the merge is
correct (and the FunNode no longer is in-error).

Calls just pass along the FDXs, no resolution filtering.
CallEpis wire all FDXs unconditionally.

Overloads then might wire to all options.  Most options would be in-error.  As
args lift during Iter, we can unwire overloads.  We ask the FunNode if this
overload path is valid.  Also, if FDXs lift to fewer, we can unwire.

Ambiguous error if more than 1 overload remains.

We keep all paths during Iter even with BAD args (because TFS might lift).  We
also keep HIGH args, but these do not pass along much info.  Post GCP#1 we have
all wired+extras (might require GCP#1 to resolve totally unknown FPTRs, so no
guarantees out of Iter#1)

THIS IS WORKING OUT.

Think I need a Call Graph edge; a CEProj between Fun and Call, so individual
edges can be turned on and off.  GCP needs optimistic edge discovery to resolve
cyclic overload fidxs.  Theory is i've wired a call to {+} to all 3 {+}s;
during GCP i discover that (a) all start high, so ok, (b) one arg falls to 1,
which flows to {+}:int, which goes around again.  Need to not allow {+}:str to fire
until all args are not-above-center() and also isa_formals.

CEProj.value: XCTRL while Call is ANY/XCTRL; FIDXS are high; ARGS are high;
Then and only then, look at the exactly trailing FUN, get _sig, and check args
being 'isa'; allow CTRL if all are OK.
Parm.value: ignore _sig isa; just use the CEProj state.


=======================================
1/28/2021

Maybe a breakthru on Unresolved....

Have a way to specify that "this collection of fcns are ambiguous", maybe
always true based on name alone (e.g. {+}).  Then any BitsFun is a collection
of bits (as normal), but some of them may be ambiguous with some others (but
not all).  They can be both high and low BitsFuns, as normal (and mixed with
other bits).  Normal forward flow rules, except at a Call.  Call filters out
BAD arg choices as normal.

LEAST_COST, during Iter same as Opto: can only be used once no more other
lifting can happen.  Standard problem: BAD args can go dead and lift to ANY,
so cannot prematurely remove a FDX with BAD args.

Need to expand on the TestMonotonicCall.  Add ANY args.
Can I ever filter out FDXs?

When an arg dies, a FunSig LOWERS to ALL (allowing even an Err argument).
Previously illegal args can become legal.
Iter: Args keep lifting.
Unresolved makes a low set of FunPtrs.
LEAST_COST tosses out the higher cost, once it's clear there is a lower cost legal FDX.
Can be run at any time.

i.e. Call.value: Keeps all FDXs until there is a lower cost one; keeps BAD ones
(in case ARG goes dead), keeps LOW (in case lifts).  Keeps HIGH (might as well)

Parm filters by valid args, keeps at ALL until valid (can keep at ALL until all
Parms are valid; dead args same as ALL, always valid).

Opto: Args keep falling.
Unresolved starts with a high set?

Illegal FDX

A FunSig starts high (all args dead, all calls allowed? not monotonic?) - Separate question.


=======================================
1/11/2021

Restructure of GVN in progress.

=======================================
1/6/2021

Got incremental H-M working.
Got decent evidence of 'monotonicity' of unify.
For lattice, TOP=a shared TVar (all part of it), BOT=independent TVar.
H-M starts @ BOT, and unifies towards One-Shared-TVar.


Thinking of reworking GVN iter, to get me more structure & control over it.
Separate worklists for dead (instead of recursive kill0), for strict removal
(eg, CSE, folding +0, st/ld), flowing {value,live,unify}, strict same edge/node
counts but lowers something, increases edge/nodes (but maybe adds parallelism),
larger simple xforms, inlining (maybe sorted by utility).

Separate ideal_xxx calls for each of these, along with separate worklist_xxxs.

A way to gather neighbors for each ideal_xxx or value/live?  For a
value/live/unify, return the new value plus also a "neighbors used to make
this".  Store with each Node the reverse-neighbor list?

How about:
  old.add_work_flow (nnn);  // called iff value/live changes, adds to value/live worklist
  old.add_work_unify(nnn);  // called iff unify changes, adds to unify worklist
  old.add_work_reduce();    // 


=======================================
Notes from 12/28/2020

Back around to basic theory.  Progress since last time: pushing H-M through
most code, but then up against some basic stuff.

MEMORY: Suppose I treat a H-M "pair" as memory, extending the 2 elements to N.
As long as the index is a fixed constant (e.g. car,cdr on pairs), the H-M
updates are fairly obvious: A "car!" or "cdr!" forces unification of the
before-and-after.

Example: pair (X,Y), "car! (X,Y) Z" results in (Z,Y) and unification of {X,Z}.
If I extend car/cdr to a "store" with an unknown index, everything the index
touches must unify.

Example with a 4-wide "pair": (A,B,C,D), and a ptr with range 1-2, and then "st
(A,B,C,D) ptr Q" --> (A,B&Q,C&Q,D) (maybe updates B or C) and forces
unification of {B,Q,C}.  Later during "iter" ptr lifts from 1-2 to the constant
1, and the store result lifts to (A,B&Q,C,D) and we should LOSE THE UNIFICATION
of C and Q.  This is not monotonic!
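
A small Java sketch of the store example, assuming a plain union-find over
names stands in for TVar unification.  StoreUnifySketch and its methods are
hypothetical.  It shows that re-solving with the narrower ptr gives a smaller
equivalence class than the one already built, and a union-find cannot un-union.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical union-find over TVar names; not the real TVar machinery.
final class StoreUnifySketch {
  private final Map<String,String> parent = new HashMap<>();

  String find(String x) {
    String p = parent.getOrDefault(x, x);
    if (p.equals(x)) return x;
    String r = find(p);
    parent.put(x, r);            // path compression
    return r;
  }

  void union(String a, String b) {
    a = find(a); b = find(b);
    if (!a.equals(b)) parent.put(a, b);
  }

  boolean same(String a, String b) { return find(a).equals(find(b)); }

  // "st tuple ptr val", where ptr may be any index in [lo,hi]:
  // everything the index can touch must unify with the stored value.
  void store(String[] tuple, int lo, int hi, String val) {
    for (int i = lo; i <= hi; i++) union(tuple[i], val);
  }
}
```

With tuple (A,B,C,D), store(tuple,1,2,"Q") puts B, C and Q in one class, while
a fresh solve with store(tuple,1,1,"Q") leaves C separate from Q.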

Same basic problem as dead-code on a test: (P ? A : B) unifies A and B, but if
P becomes a constant this unification should be lost - and is not monotonic.

This leads me back to Click thesis chpt3 - a need to express unification rules
as equations, in order to get the basic theory right, and get back to some kind
of monotonic solution.

---Attempting to make a pair-of-nodes Unification Lattice:

Using CCP from Click thesis directly, including jargon Lattice_c (for integer
constants) and Lattice_u (for (un)reachable), shortcut: Lc and Lu.

Lattice_unified (Lf): two Nodes are unified or not.  This is a 2-D bitwise
relation.  We hope it becomes an equivalence class, with a distinguished
Leader.  (As of 12/28/2020, seeing the need for a Leader & Followers).

(X<=>Y) are unified if:
      (X===Y)         +        // Reflexive
      Y:=X            +        // Y is copied from X, includes Loads & Stores & Projections of Tuples.
                               // And Y is marked as a Follower
      (X_isa_Phi)&(path#1 reachable)&(X[1]===Y) + // Phi and reachable path are unified
      VX=TOP          +        // Either is Value TOP
      VY=TOP          +        // Either is Value TOP
      VX=VY=c0        +        // All compute the same constant
      ....

Lattice_f does not hold onto the notion of Functions or Pairs or Oper.
In H-M, Nodes have a constructed T expression which is shared with other Nodes:
 - A unique numbered TVar; the same TVar can appear many times in a T expression
 - A TMulti fixed-length tuple of TVars, indexed by constants
 - A TArg, which is a TMulti of length 3+, [Ctrl TMem Arg1 Arg2...]
 - A TRet, which is a TArg of length 3
 - A TFun, which is a TMulti of a TArg and a TRet
 - A TMem, which is a TMulti of length #aliases (program constant) of TObjs
 - A TObj, which is a TMulti indexed by field name (one-to-one mapping from field name to field index), which hold TVars
 - A TAry, which is a TMulti with a single element TVar (someday maybe the size).

Typical equation-solving gives a unique TVar structure to *a Node*, but H-M
solving gives a shared TVar structure, but this is for efficiency and not
required.  

Each H-M unification step either (1) succeeds and maybe grows the 'leaves' or
(2) fails, and the program is in-error.  Success adds constraints to a T
expression; if the sub-parts are shared then some other Nodes' T expression
also changes.  If we do not share, then instead Lf is a per-Node T expression
which monotonically grows.  This means we are computing a T expression
per-Node.

Given a T-expression & a Value type, we can force alignment by doing a JOIN
on Leader only (Followers get Values flowing from Leader).
Which turns into, we can apply the H-M "fresh_unify" notion in the old Type
system at Calls.


Need to make a Lattice_f: define a 'meet' and 'dual'?
Assume all lattice elements are 'below center' for now.
A 'TVar' is essentially a 'lattice bottom' and can lift to any more specific type.
Phi xfer function = { ctrl_1:Type.CTRL val_1:Type hm_1:TVar
                      ctrl_2:Type.CTRL val_2:Type hm_2:TVar ->

                      if( ctrl_1==X && ctrl_2==X )
                        return [Type.ANY, plain TVar];
                      if( ctrl_1==CTRL && ctrl_2==X )
                        return [val_1,hm_1] // same value type, same hm as LHS
                      if( ctrl_1==ctrl_2==CTRL )
                        return [val_1 MEET  val_2,
                                hm_1  UNIFY hm_2 ];
                    }


Need to define unify:
- Given 2 complex structs with equal name&arg.length
  - it's the structural recursive answer
- Given a TVar & anything else, it's the other thing
- Can be a base type
  - Given 2 base types, use TYPE system.
- Otherwise it's a TError
This implies a lattice: plain TVar at TOP, complex in the middle, TError at bottom.
Dual can be just replicated 'above' & ignored till GCP.

Exploring this "lattice" with shared TVars -
  { x -> x } MEET { 3 -> y } ==> { 3 -> 3 }

  { a -> { b -> c } }  MEET { { d -> e } -> f }  ==> { {d -> e} -> {b -> c} }

  { [a] {a->b} -> [b] } MEET { [int] {c->flt} -> d} ==> { [a:int] {a:c:int->b:flt} -> d:[b:flt]}

  { bool a a a -> a } MEET { b b b b -> b } ==> all bools

  { x -> x } { (a,b) -> (d,e) } ==> { (ad,be) -> (ad,be) }
LOOKs like lattice to me!!!
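
A rough Java sketch of the unify rules, to check the first two examples by
hand.  Assumptions: TVars are shared through a union-find forward pointer,
functions take one argument, and two base types unify only when they are equal
(the real rule would ask the TYPE system for a MEET).  UnifySketch, T, var,
base, fun and unify are hypothetical names.

```java
// Hypothetical mini-unifier; illustration names only.
final class UnifySketch {
  static final class T {
    final boolean isVar;  // plain TVar?
    final String name;    // TVar label, base-type name, or "->" for a function
    final T[] args;       // children of a complex struct; empty for leaves
    T uf;                 // union-find forward pointer; null when a leader
    T(boolean isVar, String name, T... args) {
      this.isVar = isVar; this.name = name; this.args = args;
    }
    T find() { T t = this; while (t.uf != null) t = t.uf; return t; }
    @Override public String toString() { // acyclic T expressions only
      T t = find();
      return t.args.length == 0 ? t.name : "{ " + t.args[0] + " -> " + t.args[1] + " }";
    }
  }

  static T var (String n)     { return new T(true , n); }
  static T base(String n)     { return new T(false, n); }
  static T fun (T arg, T ret) { return new T(false, "->", arg, ret); }

  // Returns false where the notes would produce a TError.
  static boolean unify(T a, T b) {
    a = a.find(); b = b.find();
    if (a == b) return true;
    if (a.isVar) { a.uf = b; return true; }  // TVar & anything: the other thing
    if (b.isVar) { b.uf = a; return true; }
    if (!a.name.equals(b.name) || a.args.length != b.args.length)
      return false;                          // mismatched shapes or base types
    a.uf = b;                                // union first, so cyclic shapes terminate
    boolean ok = true;
    for (int i = 0; i < a.args.length; i++)  // structural recursive answer
      ok &= unify(a.args[i], b.args[i]);
    return ok;
  }
}
```

Unifying fun(x,x) with fun(base("3"),y) leaves both sides printing as
"{ 3 -> 3 }", and fun(a,fun(b,c)) with fun(fun(d,e),f) prints as
"{ { d -> e } -> { b -> c } }".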

Ignoring shared TVars, this could be a refinement of Type system.  Assuming
IMMUTABLE, with marked shared (un-interesting differences are always
hash-consed, shared, immutable).  Shares are unique per expression, and must be
top-level marked, or else cannot tell shares here from shares there, and we
might get cross-TVar sharing (which is buggy).

Summary: need a top-level THM which delimits TVar sharing.  Internally plain
TVars are null; shared ones unique numbered from 1.  Still need to look at
Apply / Call+CallEpi.

Lambda/FunPtr xfer function = { arg:Type tvarg:TVar ret:Type tvret:TVar ->
                                return [TypeFunPtr(arg,ret),
                                        TFun(tvarg,tvret)];
                              }

Weirdness at Let,Apply, because of UNIFY.

ALSO: could be a bug with nongen missing contents of memory

======================================= Notes from 11/23/2020

Blending H-M & Flow/Lattice Theory.

Adding H-M constraints "looks like" making two Nodes structurally-equal (both
are primitives, or both function-pointers, or both structures with same fields,
etc).  Thus when H-M U-Fs two TVars together, their matching Nodes should have
structurally-equal types - which I can make by doing a JOIN.

If, during Iter & types lift, I declare some code dead, it no longer U-Fs
sensibly.  i.e., what does "structurally-equal to dead" mean.  Example: "P=0; P
? A : B", might start with A & B unified, but then A goes dead and B keeps
being unified with A... and A might have had structure, which then got forced
onto B.  Dropping the U-F of A needs to drop the structural unification.  More
Example: "A = (C0,C0); B = (D0,E0)".  Unifying A & B forces equivalence
between D0 & E0.  When A goes dead, want to allow D0 & E0 to once again be
separate.

During GCP, if something is currently dead, it no longer JOINS, so CEPI types
drop.  If something goes live, it does JOIN, so CEPI types lift.  Not monotonic.
Cannot use JOIN.

Conflict is: More H-M U-F "lifts" in that it forces equiv structural types,
which maps naturally to my JOIN.  But dead-code breaks a Union effect (or else
should allow Union to DEAD), which "lowers" resulting types.  This is UNSTABLE
& OSCILLATES.  If U-F with DEAD keeps the JOIN to ANY, then always fail to type
with simple DCE.

This is because I don't have a theory on the "best solution" or "fixed-point"
solution when H-M discovers structure-equivalence.  Current best theory is
every time H-M unifies, I've discovered more struct-eqv, which can be used to
lift types.  Lifting types can lift a {?:} op to one side, which then U-F with
DEAD & removes the structural equivalence between sides.

// still no fail here...
b:int = 1
c:flt = 2.3
dup = {x->(x,x)}
while(1) {
  P ? (dup 2) : (b,c); // Unifies (2,b) and (2,c); forcing (_:nint,_:nScalar)
}

BECAUSE PHI DOES NOT IMPLY STRUCT-EQV ON ARGS, UNLIKE H-M {if/else} OPERATOR!
Because H-M does not ask the forward-flow question, it only gathers constraints
that have to hold to type.  When flow gives an Error, H-M might claim things
have to be equal.  More like an Assert.

Keep DEAD & JOIN anyways.  Good progress, back to the is_even/is_odd test.

Now: Lacking a way to tell unify progress.
Still want a unify-fresh variant for performance & progress.

Now: After some progress work, figured out I call fresh() in the wrong place.
Each *use* of a FunPtr gets a new fresh(), which only updates if the
FunPtr basic type changes shape.  This *should* be the same as calling
fresh() at every use (only use in CallEpi), except progress is wrong.

Ok, sorted out more progress.


=======================================
Notes from 11/16/2020

Got HM unification "working" on Sea of Nodes, meaning doing unification but not
using the info to do any improvements.  Also, no distinction between Lambda and
Let.  Actions:

Make a "unify" call for GVN.  Set to NOP by default.
Call it during iter().  Can do Ideal() for now.
Run "standard" unification, like i do in the constructors now.
Add a "progress" - which forces re-unification.
If a Call directly uses a FunPtr, get a fresh copy for unification.
Update the fresh copy if FunPtr makes "progress", which means it unifies with anything.

Eventually, once the TVars are correct for toy example, use the TVars to
improve CEpi return types after Call.


======================================= Notes from 11/10/2020

Branch HM has a Hindley-Milner-like thing, set for a Sea-of-Nodes (or at least
SSA), a worklist-style algo with monotonicity properties (for including in the
iter() pass), and recursive types (no occurs-check).  Missing minification of
recursive types, like TypeStruct does.  Also, the TypeVar relation changes
dynamically (at the moment), which precludes using the immutable persistent
Types I do now.

Concept: TVars map to a Node, and a collection of U-F'd TVars map to a
collection of Nodes.  The collection, as a whole, is a MEET, but individual
Nodes keep their Type (JOIN vs the whole).

Concept: This is the SESE principle at work: args into a Call have a Type
relationship amongst themselves (and the CEpi), but the CEpi result is lifted
from the merged Ret value.  Types in a Fun/Ret are merged & approximated,
but sharpened again at a CEpi.  The TVars map those relationships.

Concept: TVars have structure like Types: TStructs, TArys, TFuns.  We can argue
TMPs, TFPs, TMems as well.  The difference is that they all bottom out at
TVars, which bottom out at a Node.


Ponder: making TypeVars immutable, so can hash-cons.

Ponder: Need syntax for TypeVars in the type sub-language.  Classically single capital letters.

Ponder: Simplistic in execution, then optimize.  No hash-cons.  Entire cut-n-
paste clone of type/Type*.java, including the cyclic stuff in TypeStruct.

Ponder: Test code for TypeVar, including cyclic unification.

Ponder:
Every Node has a TVar, which can be "sharpened" via U-F.
Every TVar also is a U-F, including "Shape" TVars.
The "base" TVar is either tied to a Node (hence also Type).
Shape TVars: TStruct, TAry, TFun.

Shape TStruct - a map from field names & numbers to TVars.
  Sometimes it's just a field name from a Load/Store, sometimes a number.
  Sometimes we can union a field name and a number (e.g. a constructor with ordered field declarations).

Shape TAry - just a TVar for the size & elements.  Size is optional.  Maybe use same impl as TStruct, with eg field names "#" and "[]".

Shape TMemPtr: A ptr to TStruct/TAry (or both, as a TObj).  

Shape TMem - A map from TMemPtr to TStruct/TAry/TObj.

Shape TFun - Straight from H-M: a collection of args and a return.


=======================================
Notes from 09/07/2020

Short-circuit evals.

Plan A: Thunk the RHS expr & pass the thunk to all binops.
Prims like '*' unthunk it.
Prims like '&&' do short-circuit eval & unthunk on demand.
Allows user to define new short-circuit operators.
Means expr() will thunk all term()s independently, except perhaps the first.
Then precedence determines the order of eval.

Example: a && b || c && d
Assume a,b,c,d have side effects.
if( a() ) {
  if( b() ) {
    returns b
  }
}
if( c() ) {
  if( d() ) {
    returns d()
  }
}
returns 0

During expr parsing, once a thunking operator is found, thunk all remaining terms.
NEEDS: a way to 'thunk' a term(), where a thunk is a no-arg function with all callers known.
When combining terms, if RHS is a thunk & operator is no-thunk, then "dethunk" it.
NEEDS: a way to 'de-thunk' a thunk; simple inlining.
Inside the thunking operator, expect an IF around an eval'd thunk.
NEEDS: a way to 'de-thunk' in the graph, expected AFTER operator inlining.
After an op return, needs a way to 'thunk' it.
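
Plan A can be mocked up in plain Java, assuming IntSupplier stands in for a
no-arg thunk and 0 is false.  ThunkSketch, term, and, or, mul and eval are
hypothetical names; the trace list only exists to show which terms ran.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical sketch: every binop receives its RHS as a thunk.
final class ThunkSketch {
  static final List<String> trace = new ArrayList<>();

  // A term with a visible side effect: records its name, then yields v.
  static IntSupplier term(String name, int v) {
    return () -> { trace.add(name); return v; };
  }

  // Thunking operators: unthunk the RHS only on demand.
  static int and(int lhs, IntSupplier rhs) { return lhs == 0 ? 0   : rhs.getAsInt(); }
  static int or (int lhs, IntSupplier rhs) { return lhs != 0 ? lhs : rhs.getAsInt(); }

  // A no-thunk operator like '*' always unthunks ("dethunk" by simple inlining).
  static int mul(int lhs, IntSupplier rhs) { return lhs * rhs.getAsInt(); }

  // a && b || c && d, with && binding tighter than ||.
  // Only the first term is evaluated eagerly; the rest stay thunked.
  static int eval(IntSupplier a, IntSupplier b, IntSupplier c, IntSupplier d) {
    return or(and(a.getAsInt(), b), () -> and(c.getAsInt(), d));
  }
}
```

With a=1, b=2 only a and b run and the result is 2; with a=0, the terms a, c
and d run and the result is d's value.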


=======================================
Notes from 09/01/2020

REPL- Lots of cleanup/improvements already.
Think I'm needing a use-driven inline/clone.
FunNode with REPL usage & unknown caller & REPL-use Call clone to remove unknown.
Call has to resolve to clone.

Thinking-
- Change of plans.  Drop back-flow of REPL.  Use new cloning strategy: clone
  always if using REPL & has a default input & dropping the default changes sig
  & found from other resolved calls.
- Ponder nested cloning (ugh, tree-shaped cloning or risk endless cloning for
  ever-more-minor variants), where a Fun with multi-inputs can split?
- Cleanup & keep merged live state.  Means I can reverse-flow some flags in the
  future.
  
- Clone always has a default input; limit to sig improvements, although even
  minor sig improvements might be good?
- GCP/Opto.  Both defaults & clones get a chance to resolve.  Some/(many?) of
  the clones will never get resolved-to; get cleaned up.
- Each new-line, some new Calls appear, resolve to some Funs.  
- Each new-line, visit all FunNodes, clone if sig varies when dropping default.

=======================================
Notes from 08/29/2020

REPL-

- Want no impact on error; thinking about giant assert of copy-all Nodes &
  verify after cleanup that all nodes are back the same.
- Might be OK right now, just copy/reset top-level Display.
- All funcs kept conservative with the default input.  Prevents typing.
- Ponder cloning aggressively without the default, on all call-sites LIVE from
  current exit value (as opposed to live-for-all-time).  'nuther flavor of LIVE?
- Like LIVE>>ESCP>>REPL?
  LIVE: Regular alive for future, but not needed now for result.
  ESCP: Regular alive for future, but not needed now for result.  Exits local scope .
  REPL: Regular alive for future, and YES needed now for result.  Exits  all  scopes.
  Or just walk backwards once?

- Rename ESCAPE to 4 chars: ESCP.
- TestREPL calls REPL directly.

=======================================
Notes from 08/12/2020

Yanked priv/public notion.  Failed when calling many fcns, returning the same alias many times.

Back to Basics!
Standard Call-Graph Flow from Rice.
Live & Value both entirely symmetric, so symmetric handling.

Add to Ret a MOD-OUT set (union of NEW/STORED).  Ignores read   aliases.  Forward flow.
Add to Fun a READ-IN set (union of     LOADED).  Ignores stored aliases.  Reverse flow.

Recursively expand MOD-OUT at CEPI.value from union of RETs plus local.
Recursively expand READ-IN at CALL.live  from union of FUNs plus local.

Call.value: just capture from above
Cepi.live : just capture from below


Cepi.value:
  for all rets, take pre.call or post.ret, based on ret MOD-OUT.
  meet results.

Symmetry
Call.live:
  for all funs, take pre.cepi or post.fun, based on fun READ-SET.
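
A Java sketch of the Cepi.value rule only, assuming memory is an array of type
names indexed by alias, and a toy MEET where ANY is the identity and any two
different types fall to ALL.  ModOutSketch, Ret, meet and cepiValue are
hypothetical; Call.live would be the mirror image using READ-IN sets.

```java
// Hypothetical sketch of "take pre.call or post.ret, based on ret MOD-OUT".
final class ModOutSketch {
  // One wired return path: which aliases it may NEW/STORE, and its memory.
  static final class Ret {
    final boolean[] modOut;  // MOD-OUT set, indexed by alias
    final String[] postMem;  // post-return memory, indexed by alias
    Ret(boolean[] modOut, String[] postMem) { this.modOut = modOut; this.postMem = postMem; }
  }

  // Toy lattice: ANY on top, ALL on the bottom, everything else unrelated.
  static String meet(String a, String b) {
    if (a.equals("ANY")) return b;
    if (b.equals("ANY")) return a;
    return a.equals(b) ? a : "ALL";
  }

  static String[] cepiValue(String[] preCall, Ret... rets) {
    String[] out = new String[preCall.length];
    for (int alias = 0; alias < preCall.length; alias++) {
      String t = "ANY";                       // no return path seen yet
      for (Ret r : rets) {
        // Untouched aliases bypass the callee and keep the pre-call type.
        String s = r.modOut[alias] ? r.postMem[alias] : preCall[alias];
        t = meet(t, s);                       // meet results over all rets
      }
      out[alias] = t;
    }
    return out;
  }
}
```

An alias no callee modifies keeps its pre-call type exactly, whatever the
merged memory inside the function bodies looks like.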

How does this handle Recursion?


=======================================
Notes from 08/05/2020

Trying to solve: New / Call / St=; problem is Call blows aliasing on New if recursive.
Code: "foo = { ptr -> ...@{ x=foo(ptr.fld); y=foo(); ... } }"

In general, want to solve the problem of swapping a New & Call.

(1) Via flow, but recursive return New is confused with an entry of prior New.
Easier to see with a loop:   "prev=0; for{P}{ v=f(i); prev=@{_next=prev,_val=v; }}"
So thinking:
Keep a private & public memory in TypeMem, per-alias.
Each ptr is either public or private.
New makes a private ptr.
MrgProj crushes all internal private ptrs to public, tosses away any prior
private memory, replaces it with a new private memory from New.
Private ptrs lose private at all Phis (e.g. ptr-meet is ALWAYS public).
Storing & Loading private ptrs is allowed, and private-ptrs can be in memory.

(2) Via graph, "ptr.fld" confuses with New, so cannot swap New around until
after GCP sorts out 'ptr'.  Next problem: cannot swap New until after GCP, but
the swap is required to solve the final-store of "y=".  So need to clone
"foo={ptr->...} for a variant of ptr with fld.  "foo={ptr:@{fld} -> ...}".

IDEA: Ptrs have 3-value: public, private, both.
New ptr is private.
Possible looping points (Loop-phi, Fun) sets to public, clearing private.
Other Phi merges as normal (so can be pub+priv).
TypeMem has public & private variants of all types.
Stores update one or the other or both; always precise in private, always a MEET in public.
Loads from either merge, otherwise just load from one.
MrgNode resets private mem to the New, passes-thru public.
Fun/Loop/Phis all preserve private memory.
Calls do local-esc analysis for CallEpi.
CallEpi keeps pre-call private memory if not escaping, else uses post-call private memory.

META-IDEA: partial unroll of graph in flow, to get the precision without cloning.


=======================================
Notes from 07/31/2020

- bring back "MemMerge"; really need private-vs-public *memory* and pointers.
- pointers i can declare private via graph shape: direct ptr to DProj-then-NewNode.
- for memory, i need a node which MEETS public & private memory, unlike Join
  which knows it has absolute independence and does a STOMP.
- This MemMerge can really replace the MProj after a New, and so be a MrgProj.
- MrgProj inherits from MProj, adds a public-memory edge and MEETS it.
- MrgProj can flip unrelated public memory to its other side, similar to Join,
  pushing other ops into the "NewNode/MrgProj" region, effectively growing the
  known-private-memory region.  Goal is to handle a version of MAP which NEWs
  before the recursive call, but to keep the known private version after the
  MAP to allow private updates... and supports proper inlining.  Cloning a New
  makes two children which should properly be monotonic for the "off brand"
  memory.
- New no longer takes memory, yes takes control.
- Mrg has a ProjNode like input in slot 0, and a Memory like input in slot 1.

=======================================
Notes from 06/29/2020

- Factory allocations blow all LHS/RHS choices.
- Always "go right" which means no split/join.
- Drop MemMerge
- StartMem is always ~use
- DefMem starts use, lifts per-alias as they appear.
  Eagerly updated as New is updated.
- New takes in mem (not control) and Meets with itself along alias.
- - New produces MProj not OProj
- New tracks simple local escape knowledge.
- Store takes in mem and produces mem, and meets with correct aliases
  unless a single (no children) alias, and then can stomp.
- Parallel Stores bypassing requires a parallel MemJoin/MemSplit.  Still exact (no
  overlaps nor parent/child).  Lazy added.


- use: possibly allocated to worst possible
- obj: alloc as something
- @{low open }: adding  fields; all unknown fields are 'all'
- @{low close}: no more fields; all unknown fields are 'any'
- ~@{}: high, discovery?
- ~obj: high, discovery? of struct-vs-array
- ~use: never allocated

PONDER:
Drop TypeStruct._open, use TypeStruct._use instead


=======================================
Notes from 06/13/2020

Action items
- add global-ptr-use types (used as adr, stored into mem, merged at phi, call arg)
- - meet/dual
- - value props
* - drop NewNode.escapes, DefMemNode.CAPTURED
* add struct-field-complete bit, drop BitsAlias.nflds
- add global-ptr-use live types (subclass of TypeObj, tracking reverse flow props)
* - add global-ptr-use ESCAPE type
- - live-use props
* - live-use ESCAPE props
*- OBJ: never-alloc; low-struct: alloc, still adding fields; high-struct: closed, no more fields; XOBJ: alloc, uninit; UNUSE: unused/dead
* make Call/CallEpi dumb on memory and ptrs
- add Split/Join in parser around all calls & memory uses
- - todo: start optimizing split/joins
* Store/Load do not bypass Call/CallEpi (but yes bypass Split/Join)
* CEPI takes a default RetNode value, same as FunNode takes a default Caller.
* Default FunNode caller knows about default Display memory for parsing.
* Wire when known.
- Set value equal to the default RetNode or default Parm; this allows OOO types
  right up until we remove the default.  At that time, the types must be in alignment.


Live values
- are TypeMem
- for simple numbers, use TypeLive sentinels for live/dead in slot 1
- for pointers, have use types (used by call, store-val, return, etc).
  The live-pointer-value does NOT carry alias info, nor used fields???
- For memory, uses include TypeStruct fields.


=======================================
Notes from 06/11/2020

Want to get away from non-local graph updates, e.g. DefMemNode.CAPTURED &
NewNode._no_escape.  So move these ideas into the graph flow.

Forwards: ptrs are stored (or not), hence get mixed into memory & thus into
unknown OBJ fields - or not.  A forwards flow property.  Might add into
that a ptr is ld/st address, is used as a call arg, is phi mixed.

Reverse: Ptr values are not just basic-live, but stored (or not), used as an
address, a call arg, etc.  Used by a ScopeNode gives a worst-case user,
so Parser does normal keep-alive.

Doing both of these as a Type means they just flow = no non-local graph issues.

Need an un-init memory value (ISUSED?), an OBJ memory value, an XOBJ and a UNUSED.
Want to track un-init so at split/join can precisely split aliases.

Want to precise-split memory.  No more mixed aliases, and no more mem-merge,
and no more memory meets from MemMerge.  (Imprecise stores can do meets).
Use MemSplit/MemJoin.

Want Split/Join around calls for non-escapes.  Want split/join around all
NewNodes as a single-alias precise memory split.  Want split/join around
all stores (and loads), using the ptr-alias.  This gives me a zillion
split/joins (instead of memmerges) and each one is an exact split.  Can
be obviously optimized (widened) out to the ScopeNode.

This gives the notion of having a Split/Join around each tiny memory piece
in the Parser, and optimize the crap out of it later (including optimizing
in the Parser).  But get it right first, optimize later.

Need a Split tech that replaces the Call/CallEpi - so a Split variant that
takes in call args, does a full reaching analysis and splits memory based
on what reaches.  Call/CallEpi get "stupid".  Inlining a Call gets trivial
again.

Need a split-tech that takes a single ptr value and splits on it.

Join always does SESE regions, left is the Split, right is whatever - and
never do the memories overlap.  So just parallel-joins the memory.

When memory is split, label the not-available side as e.g. XOBJ.  Note that
un-init always "goes right", so 1st time creation can happen anywhere.

A little thinking on the monotonicity problem:
  Split
    Call
      Fun...Funs
        Body...Bodys
      Ret...Rets
    CEpi
  Join
Looking for the optimistic lift during GCP; if I do not find it, then I can
pre-compute - there is no phase-ordering problem.  Optimistic: ptr arrives at
Split but is not in memory, does not escape into Call, Fun, Bodys.  Memory
contents are not modified, and Join takes un-mod memory directly from Split.
If instead at Split, I make ptr go-Right, then into Body, which then escapes,
which then forces a go-Right and mods memory.  When a ptr arrives at a Split,
I need to decide if the memory goes Left or goes Right (or both or neither?).
If I make it go one side, and later the analysis goes the Other Way - then i
cannot drop the side it already went - loses monotonicity - so instead it
must "go both".
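
A small sketch of the "go both" rule, assuming the routing decision at a Split
is kept as a set of sides that can only grow.  Side and update are hypothetical
names; LEFT means "straight from Split to Join" and RIGHT means "into the
Call", following the left/right convention above.

```python
from enum import Flag

class Side(Flag):
    NEITHER = 0
    LEFT = 1      # bypass: Join takes memory directly from the Split
    RIGHT = 2     # memory goes into the Call/Fun/Body
    BOTH = 3

def update(current, wanted):
    """Monotone update: a side that was already taken is never dropped."""
    return current | wanted

route = Side.NEITHER
route = update(route, Side.LEFT)    # optimistic: ptr does not escape into the Call
route = update(route, Side.RIGHT)   # later the analysis finds an escape in the Body
```

The second update cannot retract LEFT, so the result is BOTH rather than a
sideways flip from LEFT to RIGHT.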


=======================================
Notes from 06/05/2020

Working on escaping aliases.

Root issue is non-monotonic behavior on escapes.  During GCP, assume ptr does
not reach a call arg, does not escape, so memory slice not modified, so ptr
does not reach call.  But if later DOES reach call, then slice is passed in to
call, and modified.  So sometimes post-call uses the pre-call slice, and
sometimes the post-call slice.  These have to be monotonic.

E.g., alias#15 arrives at a Call, and otherwise is escaping (stored someplace
into memory).  But at this call site, not used at any arg, and not reachable
from any available ptr.  Thinking maybe this is a non-issue now.

When can memory slice bypass a call?  If it *ever* escapes (per NewNode), safe
to assume call hammers it - even if not available from reachable ptrs.

Ok, coming around to "not caring" - if alias escapes (via NewNode escape which
limits to *storing* or Phi which stores or a Call), then thread thru all Calls.
Goal: no escape simple recursive display ptrs.  They are not call args, and not
Phi and not store-vals and not returns.  So pass around matching memory slice.

Action: drop Call-escapes.  Keep pass-in / pre-call memory.  CallEpi acts like
a MemJoin (or i insert a MemJoin/MemSplit).  Split criteria is all aliases
based on NewNode escape notion.

This gives-up a per-call slice-around notion!  Future Work!  Can obviously
improve (to using per-call per-alias smarts), but can fix displays simpler.
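
Rough sketch of the Action above, assuming memory is a dict from alias# to
contents and the escape set comes from the NewNode escape notion.  The function
name call_epi_memory is hypothetical, and aliases newly allocated by the callee
are ignored here.

```python
def call_epi_memory(pre_call, post_call, escaped):
    """CallEpi acting like a MemJoin: escaped aliases take the callee result,
    every other alias keeps its pre-call slice."""
    return {a: (post_call[a] if a in escaped else pre_call[a])
            for a in pre_call}

pre = {15: "old#15", 17: "display"}
post = {15: "new#15", 17: "whatever the callee saw"}
after = call_epi_memory(pre, post, escaped={15})
```

The split criterion is a fixed set per alias, not a per-call decision, so the
choice of slice cannot flip during GCP.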


=======================================
Notes from 05/18/2020

PC dead, a little behind on laptop.

Thinking through the issues with splitting memory from pointers, at type-check
places (call-sites with formals vs actuals, and TypeNodes).  Thinking about all
the places i add code to do partial correctness & forward progress - makes
things very complex.

Can't make out-of-bounds ParmNodes put out their formal value for pointers,
because their pointer aliases (from several Parms) will alias their formals.
e.g. Parm x is typed *[2]->@{x:int}, and Parm y is typed *[2]->@{x:str}.  The
result has @{x:obj}.  Why not merge memory of the formals?  Because formal
memory might LIFT from actual.  I can MEET formal memory and actual memory
also.  This is lower than formal memory; same for the Parms - I cannot lift
them to formals, because the Parm:mem will be BELOW formal memory (being the
meet of actual & formal), and even if i do not meet actual - still the meet of
formals is low.

Other theory: dump out ANY/ALL for every bad result.  Any body with an ALL
input always produces an ALL (except for region/phi which can ignore some
inputs).  Error finding can ignore nodes with ALL inputs - they will be
in-error, but they are not the root error.  Much simpler logic.  CON: cannot
start doing type-specialization, until the types show up.  Means e.g.
typed-functions cannot do primitive spec until GCP proves their input types.

Surely means i need to do type-spec on MEMORY contents, not just normal args.
But still how?  2 Parm:arg come in with ptr types, but totally unknown aliases.
Their formal memories will get merged.  Kinda want to type-spec on alias#s so i
can sharpen each Parm:ptr independently.  This gets me to the notion of having
an alias# that is for a formal, and not related to an allocation site.  I can
use this alias# for all allocation sites that meet the formal spec, in addition
to the normal alias#.  Or I can just note which alloc alias#s match which
"interface alias#s".  At a Parm:arg, if the actuals are OOB the formals, i use
the "iface#" (and Parm:mem uses the iface# for formal memory).  This means an
iface# can hit a Parm:arg.  Usual story: if Parm:arg is OOB, output the formal-
including the formal iface#.

How do i "lift"-only a Parm:ptr?  If it has iface#11, then gets actual memory
12 which is IN-bounds with actual Parm:mem - flipping it from #11 to #12 is
a sideways move???  Not monotonic?

Maybe iface#s are always "below" actual alias#s?  So a meet with any IFACE#
beats all ALIAS#s?  Otherwise IFACE#s simply meet like alias#s.  Still need a
hierarchy of iface#s - or else, if any Parm has an iface# come in, it must
produce its own iface# out.  This means all recursive/loops will only be using
iface#s.  This means i can make forward progress on the loop body - but must
always use GCP to clear out.
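
One possible reading of "iface#s are always below alias#s", as a sketch only.
Assumptions: alias#s are ints, iface#s are strings like "i11", and a set is
canonical when it holds only one kind.  None of this encoding is from the real
Bits classes.

```python
def is_iface(x):
    return isinstance(x, str)

def meet(a, b):
    """Meet of two alias sets: any iface# beats all alias#s, otherwise union."""
    u = frozenset(a) | frozenset(b)
    ifaces = frozenset(x for x in u if is_iface(x))
    return ifaces if ifaces else u

def isa(a, b):
    """a is at or above b in this ordering when meeting them yields b."""
    return meet(a, b) == frozenset(b)

m1 = meet({12}, {13})        # plain alias#s meet as a union
m2 = meet({"i11"}, {12})     # the iface# wins
```

Under this ordering, replacing iface#11 with alias#12 is a lift and not a
sideways move, since their meet is iface#11.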


PLAN A: All OOB types produce an ALL.  Most nodes when receive an ALL, produce
an ALL.  All nodes when receive a bad type, produce an ALL.  "Broken graph"
produces the same type (freeze in place).  Errors don't count if a node gets an
ALL (not the root error cause).
PRO: Much simpler.
CON: Cannot make progress without valid types, especially for recursive fcns.
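
Sketch of the Plan A poison rule, assuming node values are plain Python values
and a Python TypeError stands in for "bad type".  value and is_root_error are
hypothetical helpers; Region/Phi, which may ignore some inputs, are left out.

```python
ALL = "ALL"

def value(op, inputs):
    """Any ALL input poisons the result; a bad type also produces ALL."""
    if any(i == ALL for i in inputs):
        return ALL
    try:
        return op(*inputs)
    except TypeError:
        return ALL

def is_root_error(inputs, result):
    """Only report nodes that made an ALL out of non-ALL inputs."""
    return result == ALL and all(i != ALL for i in inputs)

ok = value(lambda x, y: x + y, [1, 2])
bad = value(lambda x, y: x + y, [1, "s"])   # int + str is a bad type
down = value(lambda x: x * 2, [bad])        # poisoned by its input
```

Here only the add of 1 and "s" counts as an error; the node below it is
in-error but is not the root cause.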

PLAN B: Same as now: produce valid value out always.  For Formal pointers,
invent an "iface#" like an alias# but not related to a New.  For Parm:ptr, if
OOB, produce this iface#.  For Parm:mem, JOIN the formal memory with actual,
using this iface#.  Lifts the produced memory "as if" the formal, so can make
progress before GCP.  At any Parm:arg, if any value is an iface# or not any
alias#, or alias#+memory is OOB to formal - then produce the local iface#
always.  Similar to poison ALL, except can make forward progress.


=======================================
Notes from 03/26/2020

Wiring: purpose is to shortcut args into parm-meets.
With unknown_caller, is optional since already pessimistic parm-meet.
Right now inlining (which removes unknown_caller) requires wiring.

ITER:

Never wire choices (can disappear).  UnknownNode reports choices in iter.
Means not flowing thru primitives in recursive functions?!?  Correct.
Yes wire constant-and-multi.
Always Wire as long as nargs==args; bad args might go dead so always valid.

CHANGES: CallEpi ideal does not bail out early... rolls thru the inline checks
& then wires.  Can bail for choices, dead-from-below, or mal-formed Call.
Since inline requires arg checks & wire does not: ponder flipping order.  Wire
first.  Then inline if also args are good.  Only inlines wired.

CHANGE: Call must pass CTRL if args *might* be valid to that call target, even
if not valid to other call targets.  Ponder adding an assert-args shortcut
check.  Drop ParmNode bounds, since ctrl-not-on if args fail check.
CHANGE: Call cannot check args, since valid to some targets and not others.
CHANGE: assert-args DOES error check.  


GCP:
Always wire as long as nargs==args; bad args might stay dead so always valid.
Wire if no choice: (down to 1 func, or multi).
Above-center choices never wires, needs to settle down (resolve).


=======================================
Notes from 03/24/2020
MAIN ISSUE

Add to Function Default Memory, a Merge with the incoming Display.  Always
guarantees minimal display.  The Loads from external displays can optimize away
- or at least see a Phi/Parm of 2 Merges with the same memory at the Loads'
alias.

Very similar: split Memory into Display memory and Heap memory.  Force the
split to all things.  The local-var-loads can have a pre-sharpened memory.
Bigger change, and (perhaps) an easier guarantee.  "Should be the same".

---

Other bug: cannot add type-annot after a function call, and no error.
Easy parse grammar bug to fix.  See TODO in Parse main comment.


=======================================
Notes from 02/27/2020

A lot of troubles arise because need dead bad code to actually get removed from
the graph.  Currently no notion of removing dead fields, so built a lot of
complex flow pushing 'not so near' neighbors on worklists to propagate enough
info to let fields go dead.

Add a reverse-flow 'dead' notion, only useful for fields and during OPTIMISTIC
analysis.  Useful for fields because I'm not slicing separate nodes per-field,
so there's no per-field notion of deleting dead nodes.  Useful during opto()
because a dead field does not need to be computed, so its inputs are also dead,
recursively.  There's a classic feedback path here, with monotonically
improving results.

The base iter/opto algo can work in both directions (and iter() totally does
now), just not in the base implementation: several ideal() calls push nodes
from the reverse direction.

TypeStruct:
*Remove 'clean' per struct, no need after this.
*Add a 'dead'  bit to TypeStruct fields.  This is a reverse-flow field.
*Add a 'clean' bit to TypeStruct fields.  This is a forward-flow field.
*Defaults to 'alive'.  Alive if any use is alive.  Dead otherwise.
*Roll-up 'dead'  bit to all Types as well - for use during opto.


*Node: Reverse flow alias_uses can be removed.
*Remove 'not so near' add-to-worklist.

*CallNode: Remove filter memory into FunNode.  Just pass it all.  It will be
*discovered 'clean' in the function and the original value used at CallEpi.

*Parse: Do not clear out user-struct closure field; it will go dead.

*Opto init: defaults to dead.  Exit Scope is alive, and thus its defs are alive,
*recursively.  A Node is dead if all using nodes are marked dead.  Only
*interesting during opto, because during iter() dead nodes are deleted.

*Dead Nodes only compute their startype.
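
Sketch of the opto-init liveness rule above at whole-node granularity; the
per-field 'dead' bit would run the same walk over (node, field) pairs.  The
graph encoding (a dict from node name to its input names) and compute_alive are
assumptions for illustration.

```python
def compute_alive(defs, exit_scope):
    """Everything defaults to dead; the exit Scope is alive and so are its
    defs, recursively."""
    alive = set()
    work = [exit_scope]
    while work:
        n = work.pop()
        if n in alive:
            continue
        alive.add(n)
        work.extend(defs.get(n, ()))
    return alive

defs = {
    "Scope": ["Ret"],
    "Ret": ["Add"],
    "Add": ["x", "y"],
    "Unused": ["x"],      # uses x, but nothing alive uses it
}
alive = compute_alive(defs, "Scope")
```

"Unused" never gets marked, which matches "a Node is dead if all using nodes
are marked dead".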

SESE Call/Fun precision improvements.  Ret starts with all memory as dead, and
this pushes uphill to Fun.  Call gets a set of alive memory from CallEpi and
from Fun - but Fun/Ret does NOT get alive memory from all CallEpis.  Removes
the classic merge approx on function exit.

SESE Call/Fun precision improvements: Fun starts with all memory fields as clean.
On exit, can be used to improve precision of merged memory results passed to
callers: CallEpi on clean fields takes Call-input type.


New map_closure
Merge Parm_mem & New map_closure
... ld map_closure.x
Call [allmem+map_closure is available]
  Fun: 
  Parm mem: [allmem+map_closure]
  ... allmem is unchanged; map_closure is also clean?
  Ret 
Call_Epi: takes from pre-call memory for map_closure
... ld map_closure: mem from Call_epi, can bypass clean, gets to Merge, bypasses...


=======================================
Notes from 11/4/2019

Reached a point where need to split by aliases across phis ... during parsing,
to keep precision enough for the nScalar tests.  Experimenting with running
iter during Parse.  Works surprisingly well.


=======================================
Notes from 11/2/2019

Missing an execution model for full closures.  Ignoring type-inference or exact
syntax or even semantics, want to actually execute w/closures to try tiny
examples on lifetime management.


---
Trying the impl...
Need to load '-' from starting Scope; scope pts to:
  ctrl,mem,New
Missed; needs to point to:

CTRL (start ctrl)
 |  (start memory)  (primitives, stored as funptrs into closure)
 |    XMEM            New
 |      \        [OProj,DProj]
  \      \          /
   \-    Scope    -/

Normal 'fact' lookup turns into:
 - find Scope
 - Issue Load for field against memory & address from Scope.

Normal 'stmt' update inserts a Store:

some               (closure#17)
ctrl                  New#17
  |    some_mem   [OProj#17,DProj#17]
  |    [all-17]     |           /
  |      |         .... (any number of stores, or Phis)
   \     \         /          / 
         Scope   -/ (#17)  --/ (the ptr-to-#17)


Does Scope need "all the other memories"?  Or just the parser?
Or is the parser using the Scope memory exactly for that...
The stack of Scopes gives me a stack of memory... which is supposed
to be serialized (except for implicit parallelism from unaliased closures).
Which makes me suspicious that in fact can be aliased.

   > (inc, get) = { cnt=0; ({cnt++},{cnt}) }()

   Fun ParmMem[all-#17]
      0
      |
     New#17 (cnt)
     [OProj,DProj]
                \
get = Fun-Parm   \  <<-- requires #17 here on mem parm
            |    | 
             Load|  <<-- since uses #17
           Ret   |
                 |
inc = Fun-Parm   |
            |    | 
            |Load|
            |    \ +1
            |     |
             Store
             /
          Ret
     (inc,get) <<-- closure memory always escapes on Ret, but can go dead later
     Ret

Called from Top-level:
  Call
  CallEpi
  [ctrl,mem+#17,(inc,get)]
  

=======================================
Notes from 11/1/2019


Pondering making NewNode a single scalar field only.  Returns a TMP with single
alias#, somehow attached to the user-notion of an allocation site (plus its
clones when inlined).  Flattens the alias# space to remove the field-level
alias.  Does not help arrays?  Allows fields to die independently.  Means
I do not have to figure out any field-level opts, since the graph does it.
NO: Does not help arrays... still need 2 levels of aliasing.


  Call fcn-ptr,args
  CallEpi: wired Funs
  [Ctrl,AllMem,Val]

Only 1 single "phat" memory in any network slice.

Rules for MemMerge: All inputs are unaliased - but may share alias# if on right
is a NewNode.  Far left is the "phat" memory (includes alias#1) and others are
input in first alias# order? (so I can find easily, beats array by alias# I
think).  Never same alias twice (unless a NewNode).

Rules for StoreNode (which I highly suspect are not yet proper): Output Mem is
same alias as Input Mem.  No bypassing phat-Store-by-phat-Store based on
different alias# without direct replacement... because leaves 2 parallel "phat"
memory.  Instead, request memory split and the independent alias#s float
about.  If stores are on different skinny alias, then already bypassed.

"Request Memory Split" - if this Node expects to use some sub-part of memory,
but is given phat memory, pass the request "up hill".  If this Node expects to
"root" "phat" memory, then insert the split: AliasProj's based on users;
AliasProj's need some quick way to assert unrelated alias#s - all alias# splits
listed on 'phat' memory AliasProj perhaps?  Or on the 'phat' producer?

Means: can ask a using node for the set of aliases it uses (without regard to
which input edge), and can insert graph widening Nodes.  Maybe do not need the
field-level nodes because field aliasing info is "perfect".

Phi with "Request memory split" - Shatter "phat" phi into alias Phis, but not
for "has unknown callers".  Can further shatter alias#phis into field#phis.

NewNode-as-closure; during Parsing can add_fld.  But cannot del_fld, even as it
goes out of scope - because of closure can have live uses.  Out-of-scope means
the variable lookup quits succeeding.

Can we have a reachability-analysis for each TypeMem, based on the reaching
TMP+fields, assuming all are read allows max reachability in alias class which
allows max reading ptrs, recursively?
If an alias#+fld doesn't appear in the max-reach, then it's not needed in the TypeMem?
Can be different if accessed from different TMPs?  If there is only one, can canonicalize!
Can be ~Scalared in the type, does not show in the used-aliases-on-ask.
Can be recorded as part of the canonicalization?


-------------------------------------
Example needed for updating closure fields directly, bleah.


[Ctrl,AllMem,Val]
 |
 |   NewNode [TStruct,TMP]
  \  [OProj#18,DProj]
   \  |          |   3.1415
   MemMerge[All] /  /
      |         /  /
      Store[#18]
      |  <type is All,18>


Store, direct to MemMerge, cannot bypass on #18 alone, can only bypass if
address pts to prior generator of address.  Works in this case:

[Ctrl,AllMem,Val]
 |
 |   NewNode [TStruct,TMP]
 |   [OProj#18,DProj]    3.1415
 |    |         /       /
 |    Store[#18.z]     /
 |    |  <type is 18>
 |    |
 MemMerge[All]
     |       


Store, direct to OProj,DProj,NewNode, and NO other same-field uses of OProj can
fold; but want independent field folding, so request field split/merge.
Multiple stores will stack back-to-back and serialize.  Probably do not need
THIS level of precision, since field-name-alias is perfect.

[Ctrl,AllMem,Val]
 |
 |   NewNode [TStruct,TMP]
 |   [OProj#18,DProj]
 |   [x][y][z]   |    3.1415
 |    |  |  |    |   /
 |    |  | Store[#18.z]
 |    |  |  |
 |   [FldMerge]
 MemMerge[All]
     |       


Store of a FldProj - must be matching field and alias (or error).
Can "peek" thru for opts.

[Ctrl,AllMem,Val]  3.1415
 |                /
 |   NewNode [TStruct,TMP]
 |   [OProj#18,DProj]
 |   [x][y][z] 
 |    |  |  |  
 |    |  |  |
 |   [FldMerge]
 MemMerge[All]
     |       

"Junk" FldSplit/FldMerge rejoins:

[Ctrl,AllMem,Val]  3.1415
 |                /
 |   NewNode [TStruct,TMP]
 |   [OProj#18,DProj]
 |       |    
 MemMerge[All]
     |       


-------------------------------------


=================================================================================

Hidden variable 'cnt' inside outer closure.
Return two functions in a tuple, one increments cnt, the other gets it.
    > (inc, get) = { cnt=0; ({cnt++;0},{cnt}) }()
    > inc()
    0
    > get()
    1
    > inc()
    0
    > get()
    2
The outer anon fcn returns and exits, but the storage for 'cnt' remains.
'inc' and 'get' can read & write 'cnt', but 'cnt' is otherwise private.

Every ScopeNode turns into a NewNode with variable mappings via TypeStruct,
which grows as new var names appear.  Every fcn call passes in a display with
all parent scopes (the Env).  All var refs become lds/sts against the NewNode/
ScopeNode.  Standard ld/st ops apply, and a NewNode goes dead the normal way-
no other uses.  Last "normal" use goes away when fcn exits, but display based
uses from nested fcns (i.e., a REAL closure usage) might keep alive.

Can I do this without going the ld/st route?  What's so special about threading
memory thru-out?  Or even threading just the NewNode, no aliasing issues... i
think.  In the above inc/get I can call it 3 times, get 3 unrelated counters.
Pass the fcns along, and get them inlined.  So inc1 bumps cnt1, inc2 bumps
cnt2, and inlined side-by-side.  So cnt1,cnt2 memory ops come from the same
anon fcn.  Can call in a loop, have millions of ctrs from the same anon fcn -
which must therefore be the same alias, therefore ld/st required.
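
The inc/get example written out in Python as a sketch, with the scope's NewNode
modeled as an explicit dict (the "display") so every var ref is a visible load
or store.  make_counter and display are illustrative names only.

```python
def make_counter():
    display = {"cnt": 0}                       # stands in for the scope's NewNode
    def inc():
        display["cnt"] = display["cnt"] + 1    # Load, add, Store
        return 0
    def get():
        return display["cnt"]                  # Load
    return inc, get

inc1, get1 = make_counter()
inc2, get2 = make_counter()
inc1()
inc1()
inc2()
```

Both displays come from the same allocation site, so they would share one
alias#, yet get1() and get2() see different counts.  That is why the accesses
have to be real lds/sts against memory.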


Plan B:

Keep ScopeNode, but remove most everything from it.  NewNode makes a Struct
which includes finalness and field names.  But need to allow more fields like
Scope does.  At Scope exit, NewNode allowed to be dead.

NewNode produces a Tuple of TMP+field for every field.  Each ProjNode can go
dead independently, matching dead field goes to XSCALAR.  When all proj fields
die, NewNode goes to XMEM (even with MemMerge use).

Phat memory usage "forgets" fields.  To remove single unused fields, need to
explode out of phat memory.

More precise memory handling: 2 layer split/join:

AliasProj - can follow any whole memory.  Slices out a set of disjoint aliases.
FieldProj - can follow any single alias.  Slices out a set of disjoint fields.
FldMerge - collects complete field updates to form a complete alias type.
MemMerge - collects complete alias updates to form total memory.

NewNode - produces alias# that is further exactly not any other instance of the
same alias#; can be followed by FldProj.
MemMerge - can accept a NewNode input that overlaps with same alias#; NewNode
is now "confused".
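
Sketch of the 2 layer split/join, assuming whole memory is a dict from alias#
to a struct and a struct is a dict from field name to value.  The function
names mirror the node names but are otherwise made up.

```python
def alias_proj(mem, aliases):
    """Slice a set of disjoint aliases out of whole memory."""
    return {a: mem[a] for a in aliases}

def field_proj(obj, fields):
    """Slice a set of disjoint fields out of a single alias."""
    return {f: obj[f] for f in fields}

def merge_disjoint(*parts):
    """Collect slices that must not overlap into one complete piece."""
    out = {}
    for part in parts:
        overlap = out.keys() & part.keys()
        assert not overlap, f"overlapping slices: {overlap}"
        out.update(part)
    return out

fld_merge = merge_disjoint   # fields  -> one complete alias
mem_merge = merge_disjoint   # aliases -> total memory

mem = {1: {"len": 0}, 18: {"x": 1, "y": 2, "z": 0}}
rest = alias_proj(mem, {1})
obj18 = alias_proj(mem, {18})[18]
xy = field_proj(obj18, {"x", "y"})
z = {"z": 3.1415}            # Store[#18.z] replaces the z slice
new_mem = mem_merge(rest, {18: fld_merge(xy, z)})
```

The store only touches the z slice; x, y and alias#1 pass around it untouched.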


Looking for a model where individual fields can go dead.
Looking for a model where pre-wired calls can wire without memory (pure)
or read-only memory (const).

Graph rewrite opts: skinny memory reads from a phat memory: explodes it iff
progress.  skinny write forces parser explosion & rejoin.  ScopeNode mem slot
pts to a phat memory or a MemMerge, which pts to many FldMerges.  Leaves it
exploded as parse rolls forward until sees a usage of phat memory.  Then leaves
the Mem/Fld Merge in the graph, and starts anew after def of phat memory.

Escaped ptrs: if at a phat memory usage we can see no instances of TMP alias#
in the memory or values, we can declare "not escaped", and now remove an alias#
from phat node usage.  To see field escapes, need a backwards prop of field
usages.  Currently thinking has no way to detect lack-of-usage except via (lack
of) graph node edges.  Have to "explode" in the graph all phat into alias#s
into fields, and push the "inflated" graph all about, then do DCE.  Note:
cannot remove dead field if ptr escapes at all, because later parser might use
field.  Strictly ok after removing unknown callers.

FunNode with mem Parm: can skip mem, if mem is not used (pure fcn, common on
many operators).  Can cast TypeObj._news to a limit set, and then only takes
that memory alias set and bypass the rest.  If purely reading memory, still
take in that alias, but RetNode pts to the ParmNode directly.  The cast-to-str
PrimNodes which alloc a new Str do not take memory, but the RetNode produces a
brand new alias which needs to fold into a post-call MemMerge.

Pure: RetNode has a null memory (no pre-call split, no post-call merge).  Parm is missing.
Read subset:  RetNode can be equal to Parm, with subset in Parm type.
Write subset: RetNode & Parm has some (not all) alias#s, but not equal to Parm.
All: RetNode has phat, so does Parm - and not equal to Parm.
New: RetNode can include MORE alias#s than the Parm.  Needs a MemMerge.
New factory: Parm is missing.  RetNode takes in some aliases.

Plan B2:  No!

Only keep all memory pre-exploded at the alias/field level.  Leads to huge
count of graph nodes, esp for unrelated chunks of code that just "pass thru".


---
For closures, all local vars actually talk to the scope-local NewNode, which
can grow fields for a time.  Stops growing fields at scope exit.  Local var
uses do Load & Store, which collapse against any NewNode including the scope-
local one.  Scope-local NewNode available for inner scopes - this is the
closure usage, and therefore serialize later stores against inner fun calls.
Requires inner calls always take outer scope NewNode, and later optimize.

MemMerge only used before calls to flatten everything.  Wired calls can switch
to using some aliases instead of all, with the alternate aliases going around the
call.

Calls not wired take all of memory, including scope-local News.  Can optimize
against non-escaping aliases.  Wired calls can be more exact.

Funs & Mem Parm - split into separate aliases by usage (not defs).  Pass-thru
memories can be optimized by wired calls: direct from Ret/MemMerge/PhatMemParm.
Bypass in the CallEpi Ideal.  Flag the bypassed aliases in the MemMerge... but
somewhere else, perhaps the FunNode, for better assertions.

Root Scope becomes Root New.  Fields are primitive names.  All "final", except
can replace a prim funptr value with a Unresolved of the same name.


=================================================================================
Old notes from 7/6/2019

Bits-split fail attempts

- Plan A: split 6 into 9,10; remove all 6's via a read-barrier-before-use.
  Fails because do not want read-barrier before equality checks.  i.e., I like
  defining   "isa = x -> meet(x)==x" as
  opposed to "isa = x -> meet(x)==rd_bar(x)"

- Plan B: Visit all types in Nodes&GVN and replace 6s with 9,10s.  Too hard to
  track them all.

- Plan C: Split 6 into 12,13.  Canonicalize even/odd pairs back to parent.
  Numbers grow fast (by powers of 2), but manageable in a long.  Comment: "Plan
  C fails, unwinding.  Cannot do even/odd bits-split pairs, because i need to
  be able to walk the expanded bits, but i do not track how much expansion was
  done. Really needs an explicit tree structure. Unwinding the even-odd
  bit-pairs notion."

- Plan D, explicit tree structure.  Didn't write up the failure, only that it
  got complicated.

- Plan A2: 6 "becomes" 6,7 everywhere instantly.  For the local users, this is
  great.  Reset logic between tests is insane (must reset all of Bits,
  BitsAlias, BitsRPC, BitsFun and all TypeMems).  The setup (clinit) is also
  insane, because splitting happens during the clinit so must be ordered
  extremely carefully.  Must be careful to track whether something is a single
  alias#num, or a BitsAlias collection.  Collections split over time and grow,
  but the single number does not.
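
For reference, Plan C's even/odd numbering as a sketch (function names are made
up).  Walking up to the parent is easy; walking down needs a depth that Plan C
never recorded, which is the failure quoted above.

```python
def split(n):
    """Split alias n into its even/odd child pair."""
    return 2 * n, 2 * n + 1

def parent(n):
    """Canonicalize a child back to its parent."""
    return n >> 1

def leaves(n, depth):
    """Expanded bits under n after `depth` levels of splitting.
    Plan C has no record of `depth`, so this walk cannot be done."""
    out = [n]
    for _ in range(depth):
        out = [c for m in out for c in split(m)]
    return out

kids = split(6)
one_level = leaves(6, 1)
two_levels = leaves(6, 2)
```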


Now pondering a D2 to avoid the horrible reset in A2.

Explicit trees again.

Tree nodes have an alias# and a type; they are invariant, hashconsed & shared.
They only point "up" to the root.  A "split" call requires a parent, and makes
a hash-consed child.  All children of a Tree node are given unique dense alias
numbers which are unique across the tree.

After the reset, the same node will hand out the same numbers every time, so
Bits collections do not need to be reset.  This can be implemented with a "side
doubling array of ints".  This side array is not part of the Tree Node...
!!!Hey, cheap structure that splits-the-same after reset, so does not need to reset Bits!!!


Still thinking TypeMem might be like a tuple (so no any/all choice)???  Drop it
for now...

TypeMem has all the tree leafs (and so does not need the interior???).
No... all "open" tree nodes have unknown future splits, and need a type for
them.  So TypeMem has the interior nodes; unless i declare "closed" at a level,
and then canonicalization demands I collapse this.  But "closed" not useful for
a long time; only Parse constant syntax strings are closed right now, and
NewNodes making singletons.  Funs and RPCs only closed/singleton if I disallow
cloning for inlining.

Brings me around to: do i need explicit trees or not.  Maybe not: a tree of
numbers only.  If i ask for a new child of a "tree middle node", i get a new
alias#, extend the lazily growing tree.  If parent is from a TypeMem, then
i have its type for the child initial type.  Given a child#, and a TypeMem,
I can lookup the type in the TypeMem by walking the tree structure....

---------------
(1) Drop the "becomes", horrible reset logic.

(2) Keep explicit numbers-only tree.  Array-of-ints for parents.  Array-of-
    Array-of-ints for children.  This is a bare-bones tree structure with dense
    #s for every node that does not change between test resets.

    Array-of-ints for child lengths.  This is reset between tests; the initial
    part matches the 1st init, but the later part is just zeros.  Any child
    reporting a zero is actually lazily filled in by CNT++.

(3) TypeMem maps #s to Types (via NBHML? vs array?).  Missing values just do a
    tree-based lookup.  Still have an above/below notion based on #1.

(4) No "closed" types (yet).  So no need to canonicalize Bits beyond the
    1-vs-many bit patterns.
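
A sketch of points (2) and (3), under some simplifying assumptions: alias#1 is
the root, reset zeros all child lengths (instead of keeping the 1st-init part),
and TypeMem is a plain dict.  AliasTree, split, reset and lookup are
hypothetical names.

```python
class AliasTree:
    def __init__(self):
        self.parent = [0, 0]       # parent[alias#]; index 0 is unused
        self.children = [[], []]   # children[alias#]; survives resets
        self.child_len = [0, 0]    # children handed out so far; reset per test
        self.cnt = 2               # next unused alias# (CNT)

    def split(self, par):
        """Return the next child of par, creating it lazily via CNT++."""
        k = self.child_len[par]
        kids = self.children[par]
        if k == len(kids):         # never handed out before
            kids.append(self.cnt)
            self.cnt += 1
            self.parent.append(par)
            self.children.append([])
            self.child_len.append(0)
        self.child_len[par] = k + 1
        return kids[k]

    def reset(self):
        """Between tests: forget the lengths, keep the numbers."""
        self.child_len = [0] * len(self.child_len)

def lookup(tree, typemem, alias):
    """A missing alias# takes the type of its nearest mapped ancestor."""
    while alias not in typemem:
        assert alias > 1, "root alias must have a type"
        alias = tree.parent[alias]
    return typemem[alias]

tree = AliasTree()
first = [tree.split(1), tree.split(1), tree.split(2)]
tree.reset()
again = [tree.split(1), tree.split(1), tree.split(2)]
typemem = {1: "obj", 2: "@{x:int}"}
```

The same splits hand out the same numbers after a reset, so Bits collections
built in an earlier test stay valid.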