diff --git a/14-compiler/.gitignore b/14-compiler/.gitignore
new file mode 100644
index 0000000..a55c7be
--- /dev/null
+++ b/14-compiler/.gitignore
@@ -0,0 +1,2 @@
+*.pdf
+target/
diff --git a/14-compiler/01-motivation.tex b/14-compiler/01-motivation.tex
new file mode 100644
index 0000000..c3d48da
--- /dev/null
+++ b/14-compiler/01-motivation.tex
@@ -0,0 +1,47 @@
+\documentclass[index.tex]{subfiles}
+
+\begin{document}
+\section{Motivation}
+\label{motivation}
+Right now, \Hazel{} has a simple tree-walk evaluator based on environments, but no compiler that
+can output fast, optimized executables. This PHI proposes such a compiler.
+
+\subsection{Objectives}
+\label{objectives}
+There are a number of criteria we would like the compiler to satisfy:
+%
+\begin{itemize}
+  \item Compiled programs should match the semantics and output of the existing evaluator. This
+    means the compiler should
+    %
+    \begin{itemize}
+      \item support incomplete programs, i.e. programs with holes.
+      \item produce programs that have the same final result as an evaluated program.
+    \end{itemize}
+
+  \item Compiled programs should execute faster than evaluated ones.
+  \item Complete \Hazel{} programs should execute as normal functional programs; they should require
+    no extra machinery related to holes.
+\end{itemize}
+%
+In addition, it may be interesting to explore interoperability with the evaluator and live
+programming environment:
+%
+\begin{itemize}
+  \item Dynamic compilation with hand-off between evaluator and compiler.
+  \item Incremental compilation, to reduce the overhead of compilation in a live environment.
+  \item Incremental execution, via fill-and-resume-like behaviour.
+\end{itemize}
+
+\subsection{Challenges}
+\label{challenges}
+The presence of holes poses fundamental challenges for compilers, as \emph{indeterminate results}
+produced by \Hazel's semantics are not values.
Hence, we are concerned with
+%
+\begin{itemize}
+  \item Speed- and space-efficient representations for holes and indeterminate results, as well as
+    performant operations on them.
+  \item Static analyses for optimizing incomplete programs.
+  \item Incrementality and liveness; see \cref{objectives} above.
+\end{itemize}
+\end{document}
diff --git a/14-compiler/02-approach.tex b/14-compiler/02-approach.tex
new file mode 100644
index 0000000..5ad5eb4
--- /dev/null
+++ b/14-compiler/02-approach.tex
@@ -0,0 +1,81 @@
+\documentclass[index.tex]{subfiles}
+
+\begin{document}
+\section{Approach}
+\label{sec:approach}
+This section outlines the approaches we take towards various aspects of compilation and execution
+related to holes.
+
+\subsection{Runtime representation}
+\label{sec:runtime-representation}
+We represent indeterminate results as syntax trees in the runtime, and operations applied to
+indeterminate results merely accumulate a larger syntax tree.
+%
+\begin{example}
+  In the program $1 + (5 * 6 + \SyEHole{1}{})$, $5 * 6$ produces an ordinary number $30$, but the
+  subsequent $+ \SyEHole{1}{}$ gives an indeterminate syntax tree with root $+$, left child $30$, and
+  right child $\SyEHole{1}{}$. The final result is an indeterminate syntax tree with root $+$, left
+  child $1$, and the previous tree as its right child.
+\end{example}
+%
+\noindent This necessitates dynamic checks before each operation to determine whether operands are
+values or indeterminate results; therefore, there must be some runtime data that discriminates
+between values and indeterminate results.
+
+\subsection{Casting}
+\label{sec:casting}
+
+\subsubsection{Embedding/projection}
+In a first pass, we adopt a type-indexed embedding/projection pairs approach \cite{benton2005,
+new2018}: casting $x$ from type $\tau$ to the hole type \emph{embeds} $x$, together with type
+information about $\tau$, into a proxy; casting to a type $\tau'$ \emph{projects} the proxy,
+dynamically checking whether $\tau = \tau'$.
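To make the runtime discrimination and the embedding/projection casts concrete, here is a minimal Rust sketch; all names (`Ty`, `Res`, `embed`, `project`, `add`) are hypothetical illustrations under a drastically simplified value representation, not Hazel's actual runtime.

```rust
// Hypothetical sketch: a runtime result is either an ordinary value or an
// indeterminate syntax tree; casts to/from the hole type use tagged proxies.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Int,
    Bool,
}

#[derive(Clone, Debug, PartialEq)]
enum Res {
    // Ordinary values.
    Int(i64),
    Bool(bool),
    // A value embedded into the hole type, tagged with its source type.
    Proxy(Ty, Box<Res>),
    // Indeterminate results: syntax trees accumulated at runtime.
    EmptyHole(u32),
    BinOp(&'static str, Box<Res>, Box<Res>),
}

// Embedding: casting from type `t` to the hole type wraps the result in a
// proxy recording `t`.
fn embed(t: Ty, r: Res) -> Res {
    Res::Proxy(t, Box::new(r))
}

// Projection: casting from the hole type to `target` dynamically checks the
// recorded type; `None` models a failed cast.
fn project(target: &Ty, r: Res) -> Option<Res> {
    match r {
        Res::Proxy(t, v) if &t == target => Some(*v),
        Res::Proxy(_, _) => None,
        other => Some(other),
    }
}

// Operations dispatch on the discriminator: values compute, while
// indeterminate operands just accumulate a larger syntax tree.
fn add(r1: Res, r2: Res) -> Res {
    match (r1, r2) {
        (Res::Int(a), Res::Int(b)) => Res::Int(a + b),
        (a, b) => Res::BinOp("+", Box::new(a), Box::new(b)),
    }
}
```

Under this sketch, the example above becomes `add(Int(1), add(Int(30), EmptyHole(1)))`, which yields the nested `BinOp` tree rather than a number.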
+
+\subsubsection{Coercions}
+In the future, a coercion-based approach should be adopted, as it offers space-efficiency
+guarantees \cite{herman2010, kuhlenschmidt2019}. A ``coercion calculus with holes'' is probably a
+whole research topic in itself.
+
+\subsubsection{Static analysis}
+As with determining whether expressions may be indeterminate at runtime
+(\Cref{sec:completeness-analysis}), it might be possible to statically determine where casts can
+appear. This might lend itself to some optimizations, particularly when casts appear as scrutinees
+of pattern matching (\Cref{sec:pattern-matching}).
+
+\subsection{Pattern matching}
+\label{sec:pattern-matching}
+It should be possible to compile pattern matching with holes into ordinary functional pattern
+matching:
+\begin{itemize}
+  \item \emph{Hole patterns} may be compiled into wildcard patterns that immediately stop and return
+    the entire $\textsf{match}$ expression as an indeterminate result.
+  \item In the presence of casts, types must be matched on as data. This is something to consider
+    when designing the runtime representation of indeterminate results
+    (\Cref{sec:runtime-representation}).
+\end{itemize}
+
+\subsection{Completeness analysis}
+\label{sec:completeness-analysis}
+Since we want complete (i.e. hole-free) portions of a program to run as ordinary functional
+programs, without the machinery necessary for handling holes, we perform a static
+\emph{completeness analysis} that determines whether an expression is guaranteed to be hole-free.
+To do this, we define a notion of \emph{completeness}:
+%
+\begin{definition}[name=Completeness, label=completeness]
+  An expression may be (1) \emph{necessarily complete}, i.e. it must be a value at runtime; (2)
+  \emph{necessarily incomplete}, i.e. it must be an indeterminate result at runtime; or (3)
+  \emph{indeterminately incomplete}, i.e. it may be either a value or an indeterminate result at
+  runtime.
+\end{definition} + +\subsubsection{Function-local completeness} +A basic function-local analysis which treats function parameters as indeterminately +incomplete by default may be implemented in a single pass over the expression tree. See +\Cref{fig:lir-completeness-analysis-local} for the formalization. + +\subsubsection{CFA-based completeness} +A more complex analysis may use control-flow analysis techniques on higher-order functions +\cite{shivers1991, nielson1999}. Other papers to consider reading: \textcite{vardoulaskis2011} (but +apparently $2^{n}$ complexity), \textcite{gilray2016}. + +\end{document} diff --git a/14-compiler/03-01-middle-ir.tex b/14-compiler/03-01-middle-ir.tex new file mode 100644 index 0000000..67599c3 --- /dev/null +++ b/14-compiler/03-01-middle-ir.tex @@ -0,0 +1,10 @@ +\documentclass[index.tex]{subfiles} + +\begin{document} +\subsection{Middle intermediate representation (MIR)} +\label{sec:mir} + +\subsection{Sequentialization} +\label{sec:sequentialization} + +\end{document} diff --git a/14-compiler/03-02-lower-ir.tex b/14-compiler/03-02-lower-ir.tex new file mode 100644 index 0000000..56831c7 --- /dev/null +++ b/14-compiler/03-02-lower-ir.tex @@ -0,0 +1,428 @@ +\documentclass[index.tex]{subfiles} + +\begin{document} +\newcommand{\SyWrap}{\ensuremath{\textsf{\textbf{{wrap}}}}} +\newcommand{\SyCaseComplete}{\ensuremath{\textsf{\textbf{casecomplete}}}} + +\newcommand{\NKMName}{\ensuremath{\textsf{Completeness}}} +\newcommand{\NKMV}{\ensuremath{k}} +\newcommand{\KMName}{\ensuremath{\textsf{}}} +\newcommand{\KMV}{\ensuremath{\overline{k}}} +\newcommand{\TCMName}{\ensuremath{\textsf{Type}}} +\newcommand{\TCMV}{\ensuremath{\tau}} +\newcommand{\TMName}{\ensuremath{\textsf{}}} +\newcommand{\TMV}{\ensuremath{\overline{\tau}}} +\newcommand{\CMName}{\ensuremath{\textsf{Composite}}} +\newcommand{\CMV}{\ensuremath{c}} +\newcommand{\IMName}{\ensuremath{\textsf{Immediate}}} +\newcommand{\IMV}{\ensuremath{i}} +\newcommand{\HIMV}{\ensuremath{u}} 
+\newcommand{\HEMV}{\ensuremath{\SyHoleEnv}} + +\newcommand{\KNC}{\ensuremath{\pie{0.3ex}{360}}} +\newcommand{\KNI}{\ensuremath{\pie{0.3ex}{0}}} +\newcommand{\KII}{\ensuremath{\pie{0.3ex}{180}}} + +\newcommand{\TCHole}{\ensuremath{\SyEHole{}{}}} +\newcommand{\TCInt}{\ensuremath{\SyTInt}} +\newcommand{\TCFloat}{\ensuremath{\SyTFloat}} +\newcommand{\TCBool}{\ensuremath{\SyTBool}} +\newcommand{\TCPair}[2]{\ensuremath{#1 \SyProd #2}} +\newcommand{\TCFun}[2]{\ensuremath{#1 \SyArrow #2}} + +\newcommand{\TIntNC}{\ensuremath{\TMk{\TCInt}{\KNC}}} +\newcommand{\TIntNI}{\ensuremath{\TMk{\TCInt}{\KNI}}} +\newcommand{\TFloatNC}{\ensuremath{\TMk{\TCFloat}{\KNC}}} +\newcommand{\TFloatNI}{\ensuremath{\TMk{\TCFloat}{\KNI}}} +\newcommand{\TBoolNC}{\ensuremath{\TMk{\TCBool}{\KNC}}} +\newcommand{\TBoolNI}{\ensuremath{\TMk{\TCBool}{\KNI}}} +\newcommand{\TPairNC}[2]{\ensuremath{\TMk{\TCPair{#1}{#2}}{\KNC}}} +\newcommand{\TPairNI}[2]{\ensuremath{\TMk{\TCPair{#1}{#2}}{\KNI}}} +\newcommand{\TFunNC}[2]{\ensuremath{\TMk{(\TCFun{#1}{#2})}{\KNC}}} +\newcommand{\TFunNI}[2]{\ensuremath{\TMk{(\TCFun{#1}{#2})}{\KNI}}} +\newcommand{\TMk}[2]{\ensuremath{#1[#2]}} + +\newcommand{\IIntMV}{\ensuremath{\underline{n}}} +\newcommand{\IFloatMV}{\ensuremath{\underline{f}}} +\newcommand{\IBoolMV}{\ensuremath{\underline{b}}} +\newcommand{\ITrue}{\ensuremath{\SyTrue}} +\newcommand{\IFalse}{\ensuremath{\SyFalse}} +\newcommand{\IEHole}[2]{\ensuremath{\SyEHole{#1}{#2}}} + +\newcommand{\CPlus}[2]{\ensuremath{#1 \SyPlus #2}} +\newcommand{\CTimes}[2]{\ensuremath{#1 \SyTimes #2}} +\newcommand{\CFPlus}[2]{\ensuremath{#1 \SyFPlus #2}} +\newcommand{\CFTimes}[2]{\ensuremath{#1 \SyFTimes #2}} + +\newcommand{\CAnd}[2]{\ensuremath{#1 \SyAnd #2}} +\newcommand{\COr}[2]{\ensuremath{#1 \SyOr #2}} + +\newcommand{\CLam}[3]{\ensuremath{\SyFun #1 : #2 \SyDot~ #3}} +\newcommand{\CAp}[2]{\ensuremath{#1 ~#2}} + +\newcommand{\CLet}[2]{\ensuremath{\SyLet~ #1 = #2}} +\newcommand{\CIn}{\ensuremath{~\SyIn~}} 
+\newcommand{\CInWith}[1]{\ensuremath{~\SyIn~ #1}} +\newcommand{\CReturn}[1]{\ensuremath{\SyReturn~ #1}} + +\newcommand{\CPair}[2]{\ensuremath{(#1, #2)}} +\newcommand{\CProjL}[1]{\ensuremath{\SyProjL~ #1}} +\newcommand{\CProjR}[1]{\ensuremath{\SyProjR~ #1}} + +\newcommand{\CWrapIntoNI}[1]{\ensuremath{\SyWrap^{\KNI}~ #1}} +\newcommand{\CWrapIntoII}[1]{\ensuremath{\SyWrap^{\KII}~ #1}} +\newcommand{\CEmbed}[1]{\ensuremath{\SyEmbed~ #1}} +\newcommand{\CProj}[2]{\ensuremath{\SyProj[#2]~ #1}} +\newcommand{\CCaseCompleteWith}[1]{\ensuremath{\SyCaseComplete~ #1 ~\SyWith}} +\newcommand{\CCaseCompleteBr}[2]{\ensuremath{\SyBar~ #1 \SyArrow #2}} + +\subsection{Lower intermediate representation (LIR)} +\label{sec:lir} + +\begin{note} + The semantics of this representation are not totally thought out; wouldn't be surprised if there's + some big inconsistency in this approach\ldots +\end{note} + +In the lower intermediate representation (LIR), whose syntax is partially given in +\cref{fig:lir-syntax}, completeness (see \Cref{sec:lir-completeness}) is made explicit in the type +system (see \Cref{fig:lir-ta-imm,fig:lir-ta-comp}). + +\begin{figure}[htb!] 
+ \[\begin{array}{rccl} + \NKMName & \NKMV & \Coloneqq & \KNC \mid \KNI \\ + \KMName & \KMV & \Coloneqq & \NKMV \mid \KII \\ + \TCMName & \TCMV & \Coloneqq & \TCHole \mid \TCInt \mid \TCFloat \mid \TCBool + \mid \TCPair{\TCMV}{\TCMV} + \mid \TCFun{\TMk{\TCMV}{\NKMV}}{\TMV} \\ + \TMName & \TMV & \Coloneqq & \TMk{\TCMV}{\KMV} \\ + \IMName & \IMV & \Coloneqq & x \mid \IEHole{\HIMV}{\HEMV} + \mid \IIntMV \mid \IFloatMV \mid \IBoolMV + \mid \CLam{x}{\TMk{\TCMV}{\NKMV}}{\CMV} \\ + \CMName & \CMV & \Coloneqq & \IMV \\ + & & \mid & \CPlus{\IMV}{\IMV} + \mid \CTimes{\IMV}{\IMV} + \mid \CFPlus{\IMV}{\IMV} + \mid \CFTimes{\IMV}{\IMV} + \mid \CAnd{\IMV}{\IMV} + \mid \COr{\IMV}{\IMV} \\ + & & \mid & \CAp{\IMV}{\IMV} \\ + & & \mid & \CPair{\IMV}{\IMV} + \mid \CProjL{\IMV} + \mid \CProjR{\IMV} \\ + & & \mid & \CLet{x}{\CMV} \CInWith{\CMV} \\ + & & \mid & \CWrapIntoNI{\IMV} + \mid \CWrapIntoII{\IMV} \\ + & & \mid & \CEmbed{\IMV} + \mid \CProj{\IMV}{\TCMV} \\ + & & \mid & \CCaseCompleteWith{\IMV} + ~\CCaseCompleteBr{x}{\CMV} + ~\CCaseCompleteBr{x'}{\CMV} + \end{array}\] + % + \caption{LIR syntax} + \label{fig:lir-syntax} +\end{figure} + +\subsubsection{Completeness} +\label{sec:lir-completeness} +Completeness (see \Cref{completeness}) is denoted by the metavariable $\KMV$, with +``necessarily'' cases given by $\NKMV$: $\KNC =$ necessary completeness, $\KNI =$ +necessary incompleteness, and $\KII =$ indeterminate incompleteness. They are encoded in types, +which are given by the metavariable $\TMV$. +% +A type $\TMk{\TCMV}{\KMV}$ is thus a ``concrete'' type $\TCMV$ (need a +better name for this) parameterized by completeness $\KMV$. + +\subsubsection{Immediates} +\label{sec:lir-immediates} +Similar to the high intermediate representation, a distinction is made between \emph{immediates} +(given by $\IMV$) and \emph{composites} (computations, given by $\CMV$). Immediates +are either literals or variables. + +\begin{figure}[htb!] 
\judgbox{\ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV}{\TMV}}
+  $\IMV$ is assigned type $\TMV$ \\
+
+  \begin{mathpar}
+    \judgment{
+      \holeHasTypeCtx{\holeCtx}{\HIMV}{\TCMV}{\ctx'} \\
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\sigma}{\ctx'}
+    }{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IEHole{\HIMV}{\sigma}}{\TMk{\TCMV}{\KNI}}
+    }{TAEHole} \and
+
+    \judgment{
+      \inCtx{\ctx}{x}{\TMV}
+    }{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{x}{\TMV}
+    }{TAVar} \\
+
+    \judgment{ }{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IIntMV}{\TIntNC}
+    }{TAIntLit} \and
+
+    \judgment{ }{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IFloatMV}{\TFloatNC}
+    }{TAFloatLit} \and
+
+    \judgment{ }{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IBoolMV}{\TBoolNC}
+    }{TABoolLit} \\
+
+    \judgment{
+      \ctxAssignType{\andCtx{\extendCtx{\ctx}{x}{\TMk{\TCMV}{\NKMV}}}{\holeCtx}}{\CMV}{\TMV}
+    }{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CLam{x}{\TMk{\TCMV}{\NKMV}}{\CMV}}{\TFunNC{\TMk{\TCMV}{\NKMV}}{\TMV}}
+    }{TALam}
+  \end{mathpar}
+  %
+  \caption{Type assignment for LIR immediates}
+  \label{fig:lir-ta-imm}
+\end{figure}
+
+\Cref{fig:lir-ta-imm} gives the type assignment rules for immediates, which are quite
+straightforward. The rule for holes $\IEHole{\HIMV}{\HEMV}$ is the same as
+that of the internal \Hazel{} language.
+
+\subsubsection{Composites}
+\label{sec:lir-composites}
+See \cref{fig:lir-syntax-comp} for a brief overview of the composite forms. Note the operations that
+are polymorphic over necessary completeness (e.g. integer addition).
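Among the composite forms below, $\SyCaseComplete$ ultimately amounts to a runtime branch on the value/indeterminate discriminator. A minimal Rust sketch of that check follows; the `Res` type and all names are hypothetical illustrations, not the actual LIR runtime.

```rust
// Hypothetical sketch of the dynamic check behind the casecomplete form;
// the result type and names are illustrative, not the actual LIR runtime.
#[derive(Clone, Debug, PartialEq)]
enum Res {
    Int(i64),
    EmptyHole(u32),
    BinOp(&'static str, Box<Res>, Box<Res>),
}

// The runtime discriminator between values and indeterminate results.
fn is_value(r: &Res) -> bool {
    matches!(r, Res::Int(_))
}

// `casecomplete i with | x -> c | x' -> c'`: take the first branch if the
// scrutinee is a value, the second if it is an indeterminate result.
fn case_complete<T>(
    r: Res,
    on_value: impl FnOnce(Res) -> T,
    on_indet: impl FnOnce(Res) -> T,
) -> T {
    if is_value(&r) {
        on_value(r)
    } else {
        on_indet(r)
    }
}
```

In the two branches, the bound variable carries a refined completeness ($\KNC$ in the first, $\KNI$ in the second), which is what lets complete code paths proceed without hole machinery.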
+
+\begin{table}
+  \begin{center}
+    \begin{tabular}{rl}
+      \textbf{Form} & \textbf{Description} \\
+
+      $\IMV$
+      & immediates are also composites \\
+
+      $\CPlus{\IMV}{\IMV}$, $\CTimes{\IMV}{\IMV}$
+      & addition, multiplication of two $\TCInt$ immediates \\
+
+      $\CFPlus{\IMV}{\IMV}$, $\CFTimes{\IMV}{\IMV}$
+      & addition, multiplication of two $\TCFloat$ immediates \\
+
+      $\CAnd{\IMV}{\IMV}$, $\COr{\IMV}{\IMV}$
+      & logical and, logical or of two $\TCBool$ immediates \\
+
+      $\CPair{\IMV}{\IMV}$
+      & pair construction \\
+
+      $\CProjL{\IMV}$, $\CProjR{\IMV}$
+      & left and right pair projection \\
+
+      $\CAp{\IMV}{\IMV}$
+      & function application \\
+
+      $\CWrapIntoNI{\IMV}$
+      & wrapping of a value into an indet. result \\
+
+      $\CWrapIntoII{\IMV}$
+      & wrapping of an immediate into an indeterminately incomplete
+        result \\
+
+      $\CEmbed{\IMV}$
+      & embedding of an immediate into a cast proxy \\
+
+      $\CProj{\IMV}{\TCMV}$
+      & projection of a proxy to a target type $\TCMV$; dynamic type check that may fail \\
+
+      $\CLet{x}{\CMV} \CIn \CMV$
+      & variable binding \\
+
+      $\CCaseCompleteWith{\IMV} ~\cdots$
+      & dynamic check; first branch taken if value, second branch if indet. result
+    \end{tabular}
+  \end{center}
+  %
+  \caption{Overview of LIR composite forms}
+  \label{fig:lir-syntax-comp}
+\end{table}
+
+The syntax and type assignment rules (see \Cref{fig:lir-ta-comp}) require serious computations to
+operate on either necessarily complete or necessarily incomplete immediates; indeterminately
+incomplete results may be introduced only by wrapping (\textsc{\small TAWrapIntoII}) or by
+projecting a cast proxy (\textsc{\small TAProj}), and eliminated only by $\SyCaseComplete$
+(\textsc{\small TACaseComplete}).
+
+\begin{figure}[htb!]
+ \judgbox{\ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CMV}{\TMV}} + $\CMV$ is assigned type $\TMV$ \\ + + Integers: + \begin{mathpar} + \judgment{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_1}{\TMk{\TCInt}{\NKMV}} \\ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_2}{\TMk{\TCInt}{\NKMV}} + }{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CPlus{\IMV_1}{\IMV_2}}{\TMk{\TCInt}{\NKMV}} + }{TAPlus} \and + + \judgment{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_1}{\TMk{\TCInt}{\NKMV}} \\ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_2}{\TMk{\TCInt}{\NKMV}} + }{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CTimes{\IMV_1}{\IMV_2}}{\TMk{\TCInt}{\NKMV}} + }{TATimes} + \end{mathpar} \\ + % + Floats: + \begin{mathpar} + \judgment{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_1}{\TMk{\TCFloat}{\NKMV}} \\ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_2}{\TMk{\TCFloat}{\NKMV}} + }{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CFPlus{\IMV_1}{\IMV_2}}{\TMk{\TCFloat}{\NKMV}} + }{TAFPlus} \and + + \judgment{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_1}{\TMk{\TCFloat}{\NKMV}} \\ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_2}{\TMk{\TCFloat}{\NKMV}} + }{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CFTimes{\IMV_1}{\IMV_2}}{\TMk{\TCFloat}{\NKMV}} + }{TAFTimes} + \end{mathpar} \\ + % + Booleans: + \begin{mathpar} + \judgment{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_1}{\TMk{\TCBool}{\NKMV}} \\ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_2}{\TMk{\TCBool}{\NKMV}} + }{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CAnd{\IMV_1}{\IMV_2}}{\TMk{\TCBool}{\NKMV}} + }{TAAnd} \and + + \judgment{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_1}{\TMk{\TCBool}{\NKMV}} \\ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_2}{\TMk{\TCBool}{\NKMV}} + }{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\COr{\IMV_1}{\IMV_2}}{\TMk{\TCBool}{\NKMV}} + }{TAOr} + \end{mathpar} \\ + % + Pairs: + \begin{mathpar} + \judgment{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_1}{\TMk{\TCMV}{\NKMV}} \\ + 
\ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_2}{\TMk{\TCMV'}{\NKMV}}
+    }{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CPair{\IMV_1}{\IMV_2}}{\TMk{\TCPair{\TCMV}{\TCMV'}}{\NKMV}}
+    }{TAPair} \\
+
+    \judgment{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV}{\TMk{\TCPair{\TCMV}{\TCMV'}}{\NKMV}}
+    }{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CProjL{\IMV}}{\TMk{\TCMV}{\NKMV}}
+    }{TAFst} \and
+
+    \judgment{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV}{\TMk{\TCPair{\TCMV}{\TCMV'}}{\NKMV}}
+    }{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CProjR{\IMV}}{\TMk{\TCMV'}{\NKMV}}
+    }{TASnd}
+  \end{mathpar} \\
+  %
+  Application:
+  \begin{mathpar}
+    \judgment{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_1}{\TMk{(\TCFun{\TMk{\TCMV}{\NKMV}}{\TMV})}{\KNC}} \\
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_2}{\TMk{\TCMV}{\NKMV}}
+    }{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CAp{\IMV_1}{\IMV_2}}{\TMV}
+    }{TAAp1} \and
+
+    \judgment{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_1}{\TMk{(\TCFun{\TMk{\TCMV_1}{\NKMV_1}}{\TMk{\TCMV_2}{\NKMV_2}})}{\KNI}} \\
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV_2}{\TMk{\TCMV_1}{\NKMV_1}}
+    }{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CAp{\IMV_1}{\IMV_2}}{\TMk{\TCMV_2}{\KNI}}
+    }{TAAp2}
+  \end{mathpar} \\
+  %
+  \caption{Type assignment for LIR composites}
+  \label{fig:lir-ta-comp}
+\end{figure}
+
+\begin{figure} \ContinuedFloat
+  \judgbox{\ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CMV}{\TMV}}
+  $\CMV$ is assigned type $\TMV$ \\
+
+  Wrapping:
+  \begin{mathpar}
+    \judgment{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV}{\TMk{\TCMV}{\KNC}}
+    }{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CWrapIntoNI{\IMV}}{\TMk{\TCMV}{\KNI}}
+    }{TAWrapIntoNI} \and
+
+    \judgment{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV}{\TMk{\TCMV}{\NKMV}}
+    }{
+      \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CWrapIntoII{\IMV}}{\TMk{\TCMV}{\KII}}
+    }{TAWrapIntoII}
+  \end{mathpar} \\
+  %
+  Casting:
+  \begin{mathpar}
+    \judgment{
\ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV}{\TMk{\TCMV}{\NKMV}} + }{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CEmbed{\IMV}}{\TMk{\TCHole}{\KNI}} + }{TAEmbed} \and + + \judgment{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV}{\TMk{\TCMV}{\KNI}} + }{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CProj{\IMV}{\TCMV'}}{\TMk{\TCMV'}{\KII}} + }{TAProj} + \end{mathpar} \\ + % + Binding: + \begin{mathpar} + \judgment{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CMV}{\TMV} \\ + \ctxAssignType{\andCtx{\extendCtx{\ctx}{x}{\TMV}}{\holeCtx}}{\CMV'}{\TMV'} + }{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\CLet{x}{\CMV} \CIn \CMV'}{\TMV'} + }{TALet} + \end{mathpar} \\ + % + $\KII$ elimination: + \begin{mathpar} + \judgment{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IMV}{\TMk{\TCMV}{\KII}} \\\\ + \ctxAssignType{\andCtx{\extendCtx{\ctx}{x}{\TMk{\TCMV}{\KNC}}}{\holeCtx}}{\CMV}{\TMV} \\ + \ctxAssignType{\andCtx{\extendCtx{\ctx}{x'}{\TMk{\TCMV}{\KNI}}}{\holeCtx}}{\CMV'}{\TMV} + }{ + \ctxAssignType{\andCtx{\ctx}{\holeCtx}}{ + \CCaseCompleteWith{\IMV} + ~\CCaseCompleteBr{x}{\CMV} + ~\CCaseCompleteBr{x'}{\CMV'} + }{\TMV} + }{TACaseComplete} + \end{mathpar} + % + \caption{Type assignment for LIR composites (cont.)} + \label{fig:lir-ta-comp-cont} +\end{figure} + +\subsubsection{Operational semantics} +\label{sec:lir-operational-semantics} +We attempt to give a small-step operational semantics for the lower intermediate representation. + +\begin{figure}[htb!] + \caption{Small-step operational semantics for LIR} + \label{fig:lir-ssos} +\end{figure} + +\subsection{Explicitization} +\label{sec:explicitization} + +\begin{figure}[htb!] 
+ \caption{LIR function-local completeness analysis} + \label{fig:lir-completeness-analysis-local} +\end{figure} + +\end{document} diff --git a/14-compiler/03-03-coercion.tex b/14-compiler/03-03-coercion.tex new file mode 100644 index 0000000..a89ea28 --- /dev/null +++ b/14-compiler/03-03-coercion.tex @@ -0,0 +1,488 @@ +\documentclass[index.tex]{subfiles} + +\begin{document} +\newcommand{\coerce}{\ensuremath{\textsf{coerce}}} +\newcommand{\coerces}[2]{#1 \rightsquigarrow #2} +\newcommand{\ctxCoercesType}[4]{\ensuremath{\ctxAssignType{#1}{#2}{\coerces{#3}{#4}}}} + +\newcommand{\convertsInto}[2]{\ensuremath{#1 \hookrightarrow #2}} +\newcommand{\ctxConvertsInto}[3]{\ensuremath{\withCtx{#1}{\convertsInto{#2}{#3}}}} + +\newcommand{\normalizes}[3]{\ensuremath{\CCompose{#1}{#2} \rightarrowtail #3}} + +\newcommand{\isFinal}[1]{\ensuremath{#1 ~\textsf{final}}} +\newcommand{\isVal}[1]{\ensuremath{#1 ~\textsf{val}}} +\newcommand{\isBoxedVal}[1]{\ensuremath{#1 ~\textsf{boxedval}}} +\newcommand{\isIndet}[1]{\ensuremath{#1 ~\textsf{indet}}} + +\newcommand{\optPremise}[1]{\ensuremath{\textcolor{red}{[}#1\textcolor{red}{]}}} +\newcommand{\transition}[2]{\ensuremath{#1 \longrightarrow #2}} +\newcommand{\markAt}[3]{\ensuremath{#1 = #2\{#3\}}} +\newcommand{\stepsTo}[2]{\ensuremath{#1 \mapsto #2}} + +\newcommand{\EVMName}{\textsf{EvalCtx}} +\newcommand{\EVMV}{\ensuremath{\mathcal{E}}} +\newcommand{\EVEmpty}{\ensuremath{\circ}} +\newcommand{\EVApL}[2]{\ensuremath{#1(#2)}} +\newcommand{\EVApR}[2]{\ensuremath{#1(#2)}} +\newcommand{\EVHole}[3]{\ensuremath{\SyNEHole{#1}{#2}{#3}}} +\newcommand{\EVCast}[2]{\ensuremath{\IECast{#1}{#2}}} + +\newcommand{\TMName}{\textsf{Type}} +\newcommand{\TMV}{\ensuremath{\tau}} +\newcommand{\TBase}{\ensuremath{b}} +\newcommand{\THole}{\ensuremath{\SyEHole{}{}}} +\newcommand{\TArrow}[2]{\ensuremath{#1 \to #2}} + +\newcommand{\CMName}{\textsf{Coercion}} +\newcommand{\CMV}{\ensuremath{c}} + +\newcommand{\CId}{\ensuremath{\textsf{id}}} 
+\newcommand{\CFail}{\ensuremath{\textsf{fail}}} +\newcommand{\CEmb}[1]{\ensuremath{#1!}} +\newcommand{\CProj}[1]{\ensuremath{#1?}} +\newcommand{\CFun}[2]{\ensuremath{\textsf{fun} ~#1 ~#2}} +\newcommand{\CCompose}[2]{\ensuremath{#1; #2}} + +\newcommand{\DTMName}{\textsf{Dynamic Tag}} +\newcommand{\DTMV}{\ensuremath{g}} +\newcommand{\DTBase}{\ensuremath{\TBase}} +\newcommand{\DTFun}{\ensuremath{\textsf{fun}}} + +\newcommand{\IEMName}{\textsf{Internal Expression}} +\newcommand{\IEMV}{\ensuremath{\prescript{c}{}{d}}} +\newcommand{\IEConst}{\ensuremath{c}} +\newcommand{\IELam}[3]{\ensuremath{\lambda #1 : #2. #3}} +\newcommand{\IEAp}[2]{\ensuremath{#1(#2)}} +\newcommand{\IECast}[2]{\ensuremath{#1 \langle #2 \rangle}} +\newcommand{\IEEHole}[2]{\ensuremath{\SyNEHole{}{#1}{#2}}} +\newcommand{\IENEHole}[3]{\ensuremath{\SyNEHole{#1}{#2}{#3}}} + +\newcommand{\IEHoleNum}{\ensuremath{\SyHoleNum}} +\newcommand{\IEHoleEnv}{\ensuremath{\SyHoleEnv}} + +\newcommand{\IEMVO}{\ensuremath{d}} +\newcommand{\IECastO}[3]{\ensuremath{\IECast{#1}{#2 \Rightarrow #3}}} + +\subsection{Coercion calculus for holes} +\label{sec:coercion} +\begin{warning} + This is a first draft and, as such, the definitions here probably don't actually work out. +\end{warning} +% +This section contains experiments with a coercion calculus for \Hazelnut{}. Specifically, we give an +augmented internal expression language where casts are given by coercions and a conversion to it +from the original internal language. Refer to the \HazelnutLive{} paper \cite{omar2019} for +definitions (notably of expressions and types) not reproduced here. + +\subsubsection{Coercion language} +\label{sec:coercion-language} +Following \textcite{herman2010}, \cref{fig:coercion-coerce} defines the partial function $\coerce : +\TMName \times \TMName \to \CMName$, which gives coercions between two consistent types. + +\begin{figure}[htb!] 
+ \newcommand{\coerceRowIs}[3]{\ensuremath{\coerce(#1, #2) &=& #3}} + \[\begin{array}{rcl} + \coerceRowIs{\TMV}{\TMV}{\CId} \\ + \coerceRowIs{\TBase}{\THole}{\CEmb{\DTBase}} \\ + \coerceRowIs{\THole}{\TBase}{\CProj{\DTBase}} \\ + \coerceRowIs{\TArrow{\TMV_1}{\TMV_2}}{\TArrow{\TMV_1'}{\TMV_2'}}{\CFun{\coerce(\TMV_1', \TMV_1)}{\coerce(\TMV_2, \TMV_2')}} \\ + \coerceRowIs{\THole}{\TArrow{\TMV_1}{\TMV_2}}{\CCompose{\CProj{\DTFun}}{\coerce(\TArrow{\THole}{\THole}, \TArrow{\TMV_1}{\TMV_2})}} \\ + \coerceRowIs{\TArrow{\TMV_1}{\TMV_2}}{\THole}{\CCompose{\coerce(\TArrow{\TMV_1}{\TMV_2}, \TArrow{\THole}{\THole})}{\CEmb{\DTFun}}} + \end{array}\] + % + \caption{Definition of $\coerce$} + \label{fig:coercion-coerce} +\end{figure} + +\Cref{fig:coercion-syntax} gives the syntax of coercions and shows how they fit into expressions, +eliding most of the expression forms, and \cref{fig:coercion-types} gives rules for type +assignment. Note that the forms and typing, as well as most other constructs and lemmas, are +borrowed from \textcite{herman2010}. There are a few key things to note: +% +\begin{itemize} + \item In agreement with \HazelnutLive's semantics, $\CFail$ coercions (which replace the explicit + failed cast form) must coerce two \emph{inconsistent ground} types. (Maybe this is already the + case for those fail coercions?) + + \item The ``dynamic tags'' given by metavariable $\DTMV$ correspond to \emph{ground} types, and + $\TBase$ refers to base types. + + \item $\coerce$ produces compositions of casts that individually cast between type holes and + \emph{ground} types. (This was also true for the coercions of \textcite{herman2010} under our + definition of ground, I believe?). +\end{itemize} + +\begin{figure}[htb!] 
+ \[\begin{array}{rrcl} + \CMName & \CMV & \Coloneqq & \CId \mid \CFail + \mid \CEmb{\DTMV} \mid \CProj{\DTMV} + \mid \CFun{\CMV}{\CMV} \mid \CCompose{\CMV}{\CMV} \\ + \DTMName & \DTMV & \Coloneqq & \DTBase \mid \DTFun \\ + \IEMName & \IEMV & \Coloneqq & \cdots \mid \IECast{\IEMV}{\CMV} + \end{array}\] + % + \caption{Coercion language} + \label{fig:coercion-syntax} +\end{figure} + +\begin{figure}[htb!] + $\judgbox{\ctxCoercesType{}{\CMV}{\TMV}{\TMV'}}$ $\CMV$ coerces expressions of type $\TMV$ to $\TMV'$ + + \begin{mathpar} + \judgment{ }{ + \ctxCoercesType{}{\CId}{\TMV}{\TMV} + }{CId} + + \judgment{ + \isGround{\TMV} \\ + \isGround{\TMV'} \\ + \isNotConsistent{\TMV}{\TMV'} + }{ + \ctxCoercesType{}{\CFail}{\TMV}{\TMV'} + }{CFail} + + \judgment{ }{ + \ctxCoercesType{}{\CEmb{\DTBase}}{\TBase}{\THole} + }{CB!} + + \judgment{ }{ + \ctxCoercesType{}{\CProj{\DTBase}}{\THole}{\TBase} + }{CB?} + + \judgment{ }{ + \ctxCoercesType{}{\CEmb{\DTFun}}{(\TArrow{\THole}{\THole})}{\THole} + }{CFun!} + + \judgment{ }{ + \ctxCoercesType{}{\CProj{\DTFun}}{\THole}{(\TArrow{\THole}{\THole})} + }{CFun?} + + \judgment{ + \ctxCoercesType{}{\CMV_1}{\TMV_1'}{\TMV_1} \\ + \ctxCoercesType{}{\CMV_2}{\TMV_2}{\TMV_2'} + }{ + \ctxCoercesType{}{\CFun{\CMV_1}{\CMV_2}}{\TArrow{\TMV_1}{\TMV_2}}{\TArrow{\TMV_1'}{\TMV_2'}} + }{CFun} + + \judgment{ + \ctxCoercesType{}{\CMV_1}{\TMV}{\TMV_1} \\ + \ctxCoercesType{}{\CMV_2}{\TMV_1}{\TMV_2} + }{ + \ctxCoercesType{}{\CCompose{\CMV_1}{\CMV_2}}{\TMV}{\TMV_2} + }{CCompose} + + \judgment{ + \ctxCoercesType{}{\CMV}{\TMV}{\TMV'} \\ + \ctxAssignType{\ctx}{\IEMV}{\TMV} + }{ + \ctxAssignType{\ctx}{\IECast{\IEMV}{\CMV}}{\TMV'} + }{TACoercion} + \end{mathpar} + % + \caption{Coercion type assignment} + \label{fig:coercion-types} +\end{figure} + +We then have \cref{lem:coercion-well-typed}, which corresponds to Lemma 1 \cite{herman2010}. 
+% +\begin{lemmat}[name=Well-typed coercions, label=lem:coercion-well-typed] \ + \begin{enumerate} + \item $\isConsistent{\TMV}{\TMV'}$ if and only if $\coerce(\TMV, \TMV')$ is defined. + \item If $\CMV = \coerce(\TMV, \TMV')$, then $\ctxCoercesType{}{\CMV}{\TMV}{\TMV'}$. + \end{enumerate} +\end{lemmat} + +\subsubsection{Cast conversion} +\label{sec:coercion-conversion} +Given the $\coerce$ function (\Cref{fig:coercion-coerce}), conversion from the casts of +$\HazelnutLive$ to coercions is straightforward. \Cref{fig:coercion-conversion} gives rules for +conversion from the original internal language (of which we have elided all rules except those for +handling casts). + +\begin{figure}[htb!] + $\judgbox{\ctxConvertsInto{\andCtx{\ctx}{\holeCtx}}{\IEMVO}{\IEMV}}$ $\IEMVO$ is converted into $\IEMV$ + + \begin{mathpar} + \judgment{ + \ctxConvertsInto{\andCtx{\ctx}{\holeCtx}}{\IEMVO}{\IEMV} \\ + \CMV = \coerce(\TMV, \TMV') + }{ + \ctxConvertsInto{\andCtx{\ctx}{\holeCtx}}{\IECastO{\IEMVO}{\TMV}{\TMV'}}{\IECast{\IEMV}{\CMV}} + }{CoCast} + \end{mathpar} + % + \caption{Cast conversion rules} + \label{fig:coercion-conversion} +\end{figure} + +\begin{theorem}[name=Cast conversion type preservation, label=thm:coercion-type-preservation] + If $\ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IEMV}{\TMV}$ and + $\ctxConvertsInto{\andCtx{\ctx}{\holeCtx}}{\IEMV}{\IEMV'}$, then + $\ctxAssignType{\andCtx{\ctx}{\holeCtx}}{\IEMV'}{\TMV}$. +\end{theorem} +% +Importantly, this conversion is type-preserving (\Cref{thm:coercion-type-preservation}). As we will +see in the following section, the usage of $\coerce$ also removes the need for the original ITGround +and ITExpand instruction transition rules. + +\subsubsection{Dynamic semantics} +\label{sec:coercion-dynamics} +Following $\HazelnutLive$, we give a complete dynamic semantics, first defining the judgments +$\isFinal{\IEMV}$, $\isBoxedVal{\IEMV}$, and $\isIndet{\IEMV}$ in \cref{fig:coercion-final-forms}. + +\begin{figure}[htb!] 
\begin{multicols}{2}
+    $\judgbox{\isFinal{\IEMV}}$ $\IEMV$ is final
+
+    \begin{mathpar}
+      \judgment{
+        \isBoxedVal{\IEMV}
+      }{
+        \isFinal{\IEMV}
+      }{FBoxedVal}
+
+      \judgment{
+        \isIndet{\IEMV}
+      }{
+        \isFinal{\IEMV}
+      }{FIndet}
+    \end{mathpar} \\
+
+    $\judgbox{\isVal{\IEMV}}$ $\IEMV$ is a value
+
+    \begin{mathpar}
+      \judgment{ }{
+        \isVal{\IEConst}
+      }{VConst}
+
+      \judgment{ }{
+        \isVal{\IELam{x}{\TMV}{\IEMV}}
+      }{VLam}
+    \end{mathpar} \\
+  \end{multicols}
+
+  $\judgbox{\isBoxedVal{\IEMV}}$ $\IEMV$ is a boxed value
+
+  \begin{mathpar}
+    \judgment{
+      \isVal{\IEMV}
+    }{
+      \isBoxedVal{\IEMV}
+    }{BVVal}
+
+    \judgment{
+      \isBoxedVal{\IEMV}
+    }{
+      \isBoxedVal{\IECast{\IEMV}{\CFun{\CMV_1}{\CMV_2}}}
+    }{BVArrCast}
+
+    \judgment{
+      \isBoxedVal{\IEMV}
+    }{
+      \isBoxedVal{\IECast{\IEMV}{\CEmb{\DTMV}}}
+    }{BVHoleCast}
+  \end{mathpar} \\
+
+  $\judgbox{\isIndet{\IEMV}}$ $\IEMV$ is indeterminate
+
+  \begin{mathpar}
+    \judgment{
+    }{
+      \isIndet{\IEEHole{\IEHoleNum}{\IEHoleEnv}}
+    }{IEHole}
+
+    \judgment{
+      \isFinal{\IEMV}
+    }{
+      \isIndet{\IENEHole{\IEMV}{\IEHoleNum}{\IEHoleEnv}}
+    }{INEHole}
+
+    \judgment{
+      \IEMV_1 \neq \IECast{\IEMV_1'}{\CFun{\CMV_1}{\CMV_2}} \\
+      \isIndet{\IEMV_1} \\
+      \isFinal{\IEMV_2}
+    }{
+      \isIndet{\IEAp{\IEMV_1}{\IEMV_2}}
+    }{IAp}
+
+    \judgment{
+      \isIndet{\IEMV} \\
+      % \isGround{\TMV}
+    }{
+      \isIndet{\IECast{\IEMV}{\CEmb{\DTMV}}}
+    }{ICastGroundHole}
+
+    \judgment{
+      \IEMV \neq \IECast{\IEMV'}{\CEmb{\DTMV}} \\
+      \isIndet{\IEMV} \\
+      % \isGround{\TMV}
+    }{
+      \isIndet{\IECast{\IEMV}{\CProj{\DTMV}}}
+    }{ICastHoleGround}
+
+    \judgment{
+      % \TArrow{\TMV_1}{\TMV_2} \neq \TArrow{\TMV_1'}{\TMV_2'} \\
+      \isIndet{\IEMV}
+    }{
+      \isIndet{\IECast{\IEMV}{\CFun{\CMV_1}{\CMV_2}}}
+    }{ICastArr}
+  \end{mathpar}
+  %
+  \caption{Final forms}
+  \label{fig:coercion-final-forms}
+\end{figure}
+
+\Cref{fig:coercion-normalization} gives the rules for \emph{coercion normalization}, which is given
+by the judgment $\normalizes{\CMV_1}{\CMV_2}{\CMV}$.
+
+\begin{figure}[htb!]
+ $\judgbox{\normalizes{\CMV_1}{\CMV_2}{\CMV}}$ $\CCompose{\CMV_1}{\CMV_2}$ normalizes to $\CMV$ + % + \newcommand{\normalizesRow}[3]{\ensuremath{\CCompose{#1}{#2} &\rightarrowtail& #3}} + \begin{mathpar} + \judgment{ }{ + \normalizes{\CId}{\CMV}{\CMV} + }{NId1} + + \judgment{ }{ + \normalizes{\CMV}{\CId}{\CMV} + }{NId2} + + \judgment{ }{ + \normalizes{\CFail}{\CMV}{\CFail} + }{NFail1} + + \judgment{ }{ + \normalizes{\CMV}{\CFail}{\CFail} + }{NFail2} + + \judgment{ }{ + \normalizes{\CEmb{\DTMV}}{\CProj{\DTMV}}{\CId} + }{NEmbProj1} + + \judgment{ + \DTMV \neq \DTMV' + }{ + \normalizes{\CEmb{\DTMV}}{\CProj{\DTMV'}}{\CFail} + }{NEmbProj2} + + \judgment{ + \normalizes{\CMV_1'}{\CMV_1}{\CMV_1''} \\ + \normalizes{\CMV_2}{\CMV_2'}{\CMV_2''} + }{ + \normalizes{\CFun{\CMV_1}{\CMV_2}}{\CFun{\CMV_1'}{\CMV_2'}}{\CFun{\CMV_1''}{\CMV_2''}} + }{NFun} + + \judgment{ + \normalizes{\CMV_2}{\CMV_3}{\CMV} + }{ + \normalizes{(\CCompose{\CMV_1}{\CMV_2})}{\CMV_3}{\CCompose{\CMV_1}{\CMV}} + }{NAssoc} + \end{mathpar} + % + \caption{Coercion normalization} + \label{fig:coercion-normalization} +\end{figure} + +\Cref{fig:coercion-instruction-transitions,fig:coercion-contextual-dynamics} define the transition +rules. The rules are analogous to the original \HazelnutLive{} ones, but ITCastSucceed and +ITCastFail are subsumed by ITCastCompose. + +\begin{figure}[htb!] 
+ $\judgbox{\transition{\IEMV}{\IEMV'}}$ $\IEMV$ takes an instruction transition to $\IEMV'$ + + \begin{mathpar} + \judgment{ + \optPremise{\isFinal{\IEMV_2}} + }{ + \transition{\IEAp{(\IELam{x}{\TMV}{\IEMV_1})}{\IEMV_2}}{\subst{\IEMV_2}{x}{\IEMV_1}} + }{ITLam} + + \judgment{ + \optPremise{\isFinal{\IEMV_1}} \\ + \optPremise{\isFinal{\IEMV_2}} \\ + % \TArrow{\TMV_1}{\TMV_2} \neq \TArrow{\TMV_1'}{\TMV_2'} + }{ + \transition{\IEAp{\IECast{\IEMV_1}{\CFun{\CMV_1}{\CMV_2}}}{\IEMV_2}}{\IECast{(\IEAp{\IEMV_1}{\IECast{\IEMV_2}{\CMV_1}})}{\CMV_2}} + }{ITApCast} + + \judgment{ + \optPremise{\isFinal{\IEMV}} + }{ + \transition{\IECast{\IEMV}{\CId}}{\IEMV} + }{ITCastId} + + \judgment{ + \optPremise{\isFinal{\IEMV}} \\ + % \isGround{\TMV} \\ + \normalizes{\CMV_1}{\CMV_2}{\CMV} + }{ + \transition{\IECast{\IEMV}{\CCompose{\CMV_1}{\CMV_2}}}{\IECast{\IEMV}{\CMV}} + }{ITCastCompose} + \end{mathpar} + % + \caption{Instruction transitions} + \label{fig:coercion-instruction-transitions} +\end{figure} + +\begin{figure}[htb!] 
+  \[\begin{array}{rrcl}
+    \EVMName & \EVMV & \Coloneqq & \EVEmpty \mid \EVApL{\EVMV}{\IEMV} \mid \EVApR{\IEMV}{\EVMV}
+      \mid \EVHole{\EVMV}{\IEHoleNum}{\IEHoleEnv} \mid \EVCast{\EVMV}{\CMV}
+  \end{array}\]
+
+  $\judgbox{\markAt{\IEMV}{\EVMV}{\IEMV'}}$ $\IEMV$ is obtained by placing $\IEMV'$ at the mark
+  in $\EVMV$
+
+  \begin{mathpar}
+    \judgment{ }{
+      \markAt{\IEMV}{\EVEmpty}{\IEMV}
+    }{FHOuter}
+
+    \judgment{
+      \markAt{\IEMV_1}{\EVMV}{\IEMV_1'}
+    }{
+      \markAt{\IEAp{\IEMV_1}{\IEMV_2}}{\EVApL{\EVMV}{\IEMV_2}}{\IEMV_1'}
+    }{FHAp1}
+
+    \judgment{
+      \optPremise{\isFinal{\IEMV_1}} \\
+      \markAt{\IEMV_2}{\EVMV}{\IEMV_2'}
+    }{
+      \markAt{\IEAp{\IEMV_1}{\IEMV_2}}{\EVApR{\IEMV_1}{\EVMV}}{\IEMV_2'}
+    }{FHAp2}
+
+    \judgment{
+      \markAt{\IEMV}{\EVMV}{\IEMV'}
+    }{
+      \markAt{\IENEHole{\IEMV}{\IEHoleNum}{\IEHoleEnv}}{\EVHole{\EVMV}{\IEHoleNum}{\IEHoleEnv}}{\IEMV'}
+    }{FHNEHoleInside}
+
+    \judgment{
+      \markAt{\IEMV}{\EVMV}{\IEMV'}
+    }{
+      \markAt{\IECast{\IEMV}{\CMV}}{\EVCast{\EVMV}{\CMV}}{\IEMV'}
+    }{FHCastInside}
+  \end{mathpar} \\
+
+  $\judgbox{\stepsTo{\IEMV}{\IEMV'}}$ $\IEMV$ steps to $\IEMV'$
+
+  \begin{mathpar}
+    \judgment{
+      \markAt{\IEMV}{\EVMV}{\IEMV_0} \\
+      \transition{\IEMV_0}{\IEMV_0'} \\
+      \markAt{\IEMV'}{\EVMV}{\IEMV_0'}
+    }{
+      \stepsTo{\IEMV}{\IEMV'}
+    }{Step}
+  \end{mathpar}
+  %
+  \caption{Evaluation contexts and steps}
+  \label{fig:coercion-contextual-dynamics}
+\end{figure}
+
+Of course, metatheorems governing the evaluation of internal expressions in \HazelnutLive{} ought
+also to hold for the coercion-based internal expressions.
+
+\end{document}
diff --git a/14-compiler/03-formalization.tex b/14-compiler/03-formalization.tex
new file mode 100644
index 0000000..a45d7b6
--- /dev/null
+++ b/14-compiler/03-formalization.tex
@@ -0,0 +1,41 @@
+\documentclass[index.tex]{subfiles}
+
+\begin{document}
+\section{Formalizations}
+\label{formalization}
+We aim to formulate a calculus for the compilation of functional programs with holes.
This
+formalization is very much in progress and likely does not reflect the current status of the
+implementation. Ideally, once the calculus is stable, we would like to prove that these
+translations preserve the semantics of the source language.
+
+We also experiment with designing a coercion calculus for \Hazelnut{}. For this, see
+\Cref{sec:coercion}.
+
+\subsection{Overview}
+Compilation begins with the \Hazel{} internal language and passes through a number of stages.
+
+\subsubsection{Transformation}
+The implementation begins with a first pass that cleans up the internal language structure to make
+it easier to compile; this pass is arguably unnecessary, and its processing should probably be
+folded into a subsequent pass.
+
+\subsubsection{Sequentialization}
+\emph{Sequentialization} (see \Cref{sec:sequentialization}, currently called ``linearization'' in
+the codebase) produces a sequential form resembling monadic normal form \cite{danvy2003}. This form
+is called the \emph{middle intermediate representation} (see \Cref{sec:mir}).
+
+\begin{thought}
+  Maybe in this phase we should compile evaluation logic concerning casts into explicit operations?
+  This would significantly change how casts should be represented.
+\end{thought}
+
+\subsubsection{Explicitization}
+\emph{Explicitization} (see \Cref{sec:explicitization}) performs completeness analysis (see
+\Cref{sec:completeness-analysis}) on the middle intermediate representation and produces a
+\emph{lower intermediate representation} (see \Cref{sec:lir}).
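To make the sequentialization step concrete, the following is a minimal sketch over a hypothetical
expression language (a Python model, \emph{not} the actual \Hazel{} internal representation): every
compound subterm is bound to a fresh temporary, which is roughly the shape the middle intermediate
representation takes. Holes here are modeled as opaque atoms; the real pass must additionally
thread hole environments and cast information.

```python
# Minimal sketch of sequentialization into monadic/A-normal form over a
# hypothetical expression language (tuples for compound terms, ints and
# strings for atoms). Not Hazel's actual IR.
import itertools

fresh = (f"t{i}" for i in itertools.count())

def seq(e, bindings):
    """Return an atom for e, appending let-bindings for compound subterms."""
    if isinstance(e, (int, str)):      # constants, variables, and holes are atoms
        return e
    op, l, r = e                       # compound term, e.g. ("+", e1, e2)
    la = seq(l, bindings)
    ra = seq(r, bindings)
    t = next(fresh)
    bindings.append((t, (op, la, ra)))  # name the intermediate result
    return t

def sequentialize(e):
    bindings = []
    atom = seq(e, bindings)
    return bindings, atom

# 1 + (5 * 6 + hole), the running example from the approach section
binds, result = sequentialize(("+", 1, ("+", ("*", 5, 6), "hole")))
# binds: [('t0', ('*', 5, 6)), ('t1', ('+', 't0', 'hole')), ('t2', ('+', 1, 't1'))]
```

The resulting binding list makes evaluation order explicit, so the dynamic checks for indeterminate
operands attach to individual, named operations rather than arbitrary subtrees.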
+
+\subfile{03-01-middle-ir}
+\subfile{03-02-lower-ir}
+\subfile{03-03-coercion}
+
+\end{document}
diff --git a/14-compiler/04-implementation.tex b/14-compiler/04-implementation.tex
new file mode 100644
index 0000000..b429bb4
--- /dev/null
+++ b/14-compiler/04-implementation.tex
@@ -0,0 +1,7 @@
+\documentclass[index.tex]{subfiles}
+
+\begin{document}
+\section{Implementation}
+\label{sec:implementation}
+
+\end{document}
diff --git a/14-compiler/Makefile b/14-compiler/Makefile
new file mode 100644
index 0000000..3d23f12
--- /dev/null
+++ b/14-compiler/Makefile
@@ -0,0 +1,17 @@
+LATEXMK ?= latexmk
+DOT ?= dot
+
+.PHONY: all
+all : index.pdf assets/pipeline.svg
+
+%.pdf : %.tex
+	$(LATEXMK) $*.tex
+	mv target/$*.pdf $*.pdf
+
+assets/%.svg : %.dot
+	$(DOT) $^ -Tsvg -o $@
+
+.PHONY : clean
+clean :
+	rm -rf target
+	rm -f *.pdf
diff --git a/14-compiler/assets/hole-contexts-code.png b/14-compiler/assets/hole-contexts-code.png
new file mode 100644
index 0000000..2f43217
Binary files /dev/null and b/14-compiler/assets/hole-contexts-code.png differ
diff --git a/14-compiler/assets/hole-contexts.svg b/14-compiler/assets/hole-contexts.svg
new file mode 100644
index 0000000..576c7b2
--- /dev/null
+++ b/14-compiler/assets/hole-contexts.svg
@@ -0,0 +1,175 @@
+<!-- "hole contexts spaghetti stack example" diagram (SVG markup elided) -->
diff --git a/14-compiler/assets/pipeline.svg b/14-compiler/assets/pipeline.svg
new file mode 100644
index 0000000..5056c3d
--- /dev/null
+++ b/14-compiler/assets/pipeline.svg
@@ -0,0 +1,184 @@
+<!-- "hazelc pipeline" diagram, generated from pipeline.dot (SVG markup elided) -->
diff --git a/14-compiler/index.tex b/14-compiler/index.tex
new file mode 100644
index 0000000..ec9d890
--- /dev/null
+++ b/14-compiler/index.tex
@@ -0,0 +1,48 @@
+\documentclass{report}
+\setcounter{secnumdepth}{3}
+
+\usepackage[margin=1in]{geometry}
+
+\input{preamble/preamble}
+\addbibresource{references.bib}
+
+\begin{document}
+\setcounter{chapter}{13}
+\unchapter{Compiler}
+
+\section{Contents}
+The compiler project explores \Hazel{} and typed holes in a compilation context.
The
+current implementation can be found under the
+\href{https://github.com/hazelgrove/hazel/tree/hazelc}{\texttt{hazelc}} branch.
+
+\tableofcontents
+
+\subsection{See also}
+\label{sec:see-also}
+It may also be worthwhile to reference the following, though they may be outdated to varying
+degrees:
+%
+\begin{itemize}
+  \item The \href{https://github.com/hazelgrove/hazelc-icfp22-src}{extended abstract},
+    \href{https://github.com/mirryi/hazelc-icfp22-poster}{poster}, and
+    \href{https://github.com/mirryi/hazelc-icfp22-slides}{talk slides} submitted to and presented at
+    the ICFP 2022 Student Research Competition.
+\end{itemize}
+
+\subsection{Further design}
+\label{sec:further-design}
+Further design is likely possible in most areas, particularly:
+\begin{itemize}
+  \item Runtime representation of indeterminate results (see \Cref{sec:runtime-representation})
+  \item Cast handling (see \Cref{sec:casting})
+  \item Pattern matching (see \Cref{sec:pattern-matching})
+\end{itemize}
+
+\subfile{01-motivation}
+\subfile{02-approach}
+\subfile{03-formalization}
+\subfile{04-implementation}
+
+\section{References}
+\printbibliography[heading=none]
+
+\end{document}
diff --git a/14-compiler/latexmkrc b/14-compiler/latexmkrc
new file mode 100644
index 0000000..a48eb88
--- /dev/null
+++ b/14-compiler/latexmkrc
@@ -0,0 +1,4 @@
+$out_dir = 'target';
+
+$pdf_mode = 1;
+$pdflatex = 'pdflatex -interaction=nonstopmode -shell-escape';
diff --git a/14-compiler/notes/embedded-interpreters/assets/untitled.png b/14-compiler/notes/embedded-interpreters/assets/untitled.png
new file mode 100644
index 0000000..55363f7
Binary files /dev/null and b/14-compiler/notes/embedded-interpreters/assets/untitled.png differ
diff --git a/14-compiler/notes/embedded-interpreters/embedded-interpreters.md b/14-compiler/notes/embedded-interpreters/embedded-interpreters.md
new file mode 100644
index 0000000..05cea66
--- /dev/null
+++ b/14-compiler/notes/embedded-interpreters/embedded-interpreters.md
@@ -0,0
+1,25 @@
+# Embedded Interpreters
+
+Author: Nick Benton
+
+## Main Contributions
+
+- The paper introduces the type-directed embedding technique for embedding interpreters into a **statically-typed functional language**.
+- It discusses how this technique extends to languages with recursive types and basic meta-programming features.
+- The paper also shows how the method combines with Filinski’s continuation-based monadic reflection operations to define an extended version of the CBV monadic translation, enabling values to be mapped bi-directionally between the levels of an interpreter for a functional language parameterized by an arbitrary monad.
+
+## Takeaways
+
+### Popular Ways of Embedding
+
+- For cases like Jython, Rhino, and JScript (on the .NET VM), the glue between the interpreting and interpreted languages is provided by the underlying execution engine’s support for dynamic types and reflection.
+- For cases like some DSLs in Scheme, the distinction between languages is blurred by the host language’s features, such as higher-order functions and powerful macro facilities.
+- In these cases (termed “shallow embedding”), there are many drawbacks...
+  - The interpreted language is largely restricted to the host language’s syntax and semantics; e.g., we cannot embed a dynamically-typed language into a host language with only static binding this way.
+  - It is hard to produce informative error messages.
+
+### Embedding/Projection Pairs
+
+![Untitled](assets/untitled.png)
+
+### To be continued...
diff --git a/14-compiler/notes/gradual-typing/gradual-typing.md b/14-compiler/notes/gradual-typing/gradual-typing.md
new file mode 100644
index 0000000..bdb485a
--- /dev/null
+++ b/14-compiler/notes/gradual-typing/gradual-typing.md
@@ -0,0 +1,9 @@
+# Annotated Bibliography
+
+* Gradual Typing for Functional Languages [[link](http://schemeworkshop.org/2006/13-siek.pdf)]
+...
+
+* Gradual Type Theory [[link](https://arxiv.org/abs/1811.02440)]
+
+
+* Interpretations of the GTLC [[link](https://dl.acm.org/doi/10.1145/2661103.2661112)]
\ No newline at end of file
diff --git a/14-compiler/notes/grift/grift.md b/14-compiler/notes/grift/grift.md
new file mode 100644
index 0000000..bb2d323
--- /dev/null
+++ b/14-compiler/notes/grift/grift.md
@@ -0,0 +1,52 @@
+# Grift: Toward Efficient Gradual Typing for Structural Types via Coercions
+
+Author: Andre Kuhlenschmidt, Deyaaeldeen Almahallawi, Jeremy G. Siek
+Score /5: ⭐️⭐️⭐️⭐️⭐️
+Status: Reading
+Type: Paper
+
+### Notes:
+
+- Type-based casts add one proxy to the cast chain on each recursive call, causing catastrophic $O(n)$ extra time overhead (see the quicksort example: $O(n^2) → O(n^3)$).
+- On the contrary, coercions compress the casts, which may incur extra time cost on each step (**when and how exactly?**), but the compressed casts remain constant-size afterwards.
+- **The core design innovation:** on function types, coercions merge proxies to achieve space efficiency.

```c
// pseudo-code for the proxy-merging logic:
if (TAG(v) != PROXY) {
    // first cast: wrap the closure in a proxy carrying the coercion
    build_proxy(v, c);
} else {
    // already proxied: compose the coercions instead of stacking a
    // second proxy
    crcn c2 = compose(p->crcn, c);
    build_proxy(p->wrapper, p->closure, c2);
}
```

+- Grift performs no general-purpose global optimizations such as constant folding; these miscellaneous optimizations are done after compiling to C, by clang.
+- One local optimization: eliminate useless casts in untyped code regions by not allocating proxies for them.
+- Data representation:
+    - $\text{Int/Bool/Char}$: stored as-is in **64 bits**.
+    - $\text{Function}$: pointer to either a flat closure or a **proxy closure**, indicated by the lowest bit of the pointer.
+        - flat: omitted here
+        - proxy: for functions that **have been cast**; consists of
+            1. ptr to a wrapper function
+            2. ptr to the original closure
+            3. ptr to a coercion (**what?**)
+    - $\text{Dyn}$: a 64-bit integer, with the 3 lowest bits indicating the type that has been **injected**, and 61 bits that are either an inlined value (for Int/Bool) or a ptr to a pair of 64-bit chunks holding one value and one type.
+    - $\text{Coercion}$: 64 bits; the lowest 3 bits indicate the kind of coercion (projection, injection, sequence, etc.), and the remaining 61 bits store a ptr to a structure as below (except the identity coercion, which leaves them unused):
+        - $T?^p$ ($\text{Dyn} → T$): 2 x 64 bits, the first a ptr to $T$ and the second a ptr to the blame label (a quoted String).
+        - $T!$ ($T → \text{Dyn}$): 64 bits, a ptr to $T$.
+        - $c_1, ..., c_n -> d$: (n + 2) x 64 bits; the 1st stores the second tag, the 2nd stores the coercion on the return value, and the remaining n blocks each store a coercion for one argument.
+
+### Questions:
+
+- How do coercions accomplish $O(n)$ space complexity, and why is $size(c) ≤ 5*(2^h - 1)$, where $c$ is a coercion and $h$ is the height of a coercion (how is height defined?)
+
+> the height of a coercion is bounded by the height of the types in its source program
+>
+- What is the formal definition of a “proxy”? How is it constructed?
+- What is the definition of the “normal form” of coercions?
+- For (mutable) reference coercions, why apply c when reading and d when writing?
+- What exactly does `UNTAG(obj v)` do in the coercion-application function `coerce(obj v, crcn c)`?
+- What does the following point mean:
+
+> Because the coercion implementation distinguishes between regular closures and proxy closures, one might expect closure call sites to branch on the type of closure being applied. However, this is not the case because Grift ensures that the memory layout of a proxy closure is compatible with the regular closure calling convention.
\ No newline at end of file diff --git a/14-compiler/notes/space-efficient/assets/untitled-00.png b/14-compiler/notes/space-efficient/assets/untitled-00.png new file mode 100644 index 0000000..32ebf71 Binary files /dev/null and b/14-compiler/notes/space-efficient/assets/untitled-00.png differ diff --git a/14-compiler/notes/space-efficient/assets/untitled-01.png b/14-compiler/notes/space-efficient/assets/untitled-01.png new file mode 100644 index 0000000..4306022 Binary files /dev/null and b/14-compiler/notes/space-efficient/assets/untitled-01.png differ diff --git a/14-compiler/notes/space-efficient/assets/untitled-02.png b/14-compiler/notes/space-efficient/assets/untitled-02.png new file mode 100644 index 0000000..ee44340 Binary files /dev/null and b/14-compiler/notes/space-efficient/assets/untitled-02.png differ diff --git a/14-compiler/notes/space-efficient/assets/untitled-03.png b/14-compiler/notes/space-efficient/assets/untitled-03.png new file mode 100644 index 0000000..2ecb02e Binary files /dev/null and b/14-compiler/notes/space-efficient/assets/untitled-03.png differ diff --git a/14-compiler/notes/space-efficient/assets/untitled-04.png b/14-compiler/notes/space-efficient/assets/untitled-04.png new file mode 100644 index 0000000..36d0279 Binary files /dev/null and b/14-compiler/notes/space-efficient/assets/untitled-04.png differ diff --git a/14-compiler/notes/space-efficient/space-efficient.md b/14-compiler/notes/space-efficient/space-efficient.md new file mode 100644 index 0000000..1f05961 --- /dev/null +++ b/14-compiler/notes/space-efficient/space-efficient.md @@ -0,0 +1,22 @@ +# Space-Efficient Gradual Typing + +Author: David Herman + +### Main Contribution + +- the paper defines the coercion language and its typing rules. 
+- They implement their typing and evaluation rules in a system that elaborates a source program (GTLC) into a target language with **explicit type casts** (that is, an expression may be placed anywhere its type is consistent), where casts are represented with **coercions** instead of proxy-based implementations.
+- They prove the space-consumption bounds of this implementation.
+- With additional coercion normalization rules (on the Fail coercion), this implementation can reveal ill-typing bugs at a function’s definition site instead of at its application site.
+
+### Takeaways
+
+![Untitled](assets/untitled-00.png)
+
+![Untitled](assets/untitled-01.png)
+
+![Untitled](assets/untitled-02.png)
+
+![Untitled](assets/untitled-03.png)
+
+![Untitled](assets/untitled-04.png)
diff --git a/14-compiler/pipeline.dot b/14-compiler/pipeline.dot
new file mode 100644
index 0000000..124caff
--- /dev/null
+++ b/14-compiler/pipeline.dot
@@ -0,0 +1,88 @@
+digraph pipeline {
+    label = "hazelc pipeline"
+    newrank = true;
+
+    graph [ fontname = "Fira Code", fontsize = 10.0 ];
+    node [ fontname = "Fira Code", fontsize = 10.0, ];
+    edge [ fontname = "Fira Code", fontsize = 9.0 , arrowhead="vee", arrowsize = 0.5 ];
+
+    subgraph cluster_hazelcore {
+        label = "hazelcore";
+        rank = same;
+
+        graph [ colorscheme = ylgnbu9, fontcolor = 8 ];
+        node [ colorscheme = ylgnbu9, fontcolor = 8 ];
+        edge [ colorscheme = ylgnbu9, color = 7 ];
+
+        style = rounded;
+        bgcolor = 1;
+        color = 1;
+
+        node [ shape = plaintext, ];
+        elaborate [ label = "elaborate" ];
+
+        node [ shape = box, style = filled, color = 2, fillcolor = 2 ];
+        uhexp [ label = "uhexp" ];
+        dhexp [ label = "dhexp" ];
+
+        uhexp -> elaborate -> dhexp;
+    }
+
+    subgraph cluster_compiler {
+        label = "compiler";
+        labelloc = b;
+
+        graph [ colorscheme = ylorbr9, fontcolor = 9 ];
+        node [ colorscheme = ylorbr9, fontcolor = 9 ];
+        edge [ colorscheme = ylorbr9, color = 7 ];
+
+        style = rounded;
+        bgcolor = 2;
+        color = 2;
+
+        subgraph cluster_middle_end {
+            label = 
"middle-end" + rank = same; + + style = rounded; + bgcolor = 3; + + node [ shape = plaintext ]; + transform [ label = "transform" ]; + linearize [ label = "linearize" ]; + + node [ shape = box, style = filled, color = 4, fillcolor = 4 ]; + hir [ label = "hir" ]; + anf [ label = "anf" ]; + + transform -> hir -> linearize -> anf -> anf; + } + + subgraph cluster_codegen { + label = "codegen" + rank = same; + + style = rounded; + bgcolor = 3; + + node [ shape = plaintext ]; + codegen [ label = "codegen" ]; + print [ label = "print" ]; + grainc [ label = "grainc" ]; + + node [ shape = box, style = filled, color = 4, fillcolor = 4 ]; + gir [ label = "gir" ]; + grain [ label = "grain" ]; + + codegen -> gir -> print -> grain -> grainc + } + + anf -> codegen; + } + + node [ shape = box, style = filled, colorscheme = bupu9, color = 3, fillcolor = 3 ]; + wasm [ label = "wasm" ]; + + dhexp -> transform; + grainc -> wasm; +} diff --git a/14-compiler/preamble/math.tex b/14-compiler/preamble/math.tex new file mode 100644 index 0000000..6598329 --- /dev/null +++ b/14-compiler/preamble/math.tex @@ -0,0 +1,32 @@ +\declaretheorem[name=Theorem, parent=chapter]{theorem} +\declaretheorem[name=Lemma, parent=theorem]{lemma} +\declaretheorem[name=Lemma, sibling=theorem]{lemmat} +\declaretheorem[name=Corollary, parent=theorem]{corollary} +\declaretheorem[name=Axiom, parent=chapter]{axiom} +\declaretheorem[name=Algorithm, parent=chapter, style=definition]{algorithm} + +\declaretheorem[name=Proposition, sibling=lemma]{proposition} +\declaretheorem[name=Property, parent=chapter]{property} +\declaretheorem[name=Claim, numbered=no, style=remark]{claim} + +\declaretheorem[name=Definition, parent=chapter, style=definition]{definition} +\declaretheorem[name=Definition, numbered=no, style=definition]{definition*} +\declaretheorem[name=Example, parent=chapter, style=definition]{example} + +\declaretheorem[name=Base, numbered=no, style=remark]{base} +\declaretheorem[name=Inductive Step, numbered=no, 
style=remark]{induction} + +\declaretheorem[name=Approach, numbered=no, style=remark]{approach} +\declaretheorem[name=Idea, numbered=no, style=remark]{idea} +\declaretheorem[name=Note, numbered=no, style=remark]{note} +\declaretheorem[name=Remark, numbered=no, style=remark]{remark} +\declaretheorem[name=Fact, numbered=no, style=remark]{fact} +\declaretheorem[name=Punchline, numbered=no, style=remark]{punchline} +\declaretheorem[name=Generalization, numbered=no, style=remark]{generalization} +\declaretheorem[name=Tip, numbered=no, style=remark]{tip} +\declaretheorem[name=Hint, numbered=no, style=remark]{hint} +\declaretheorem[name=Thought, numbered=no, style=remark]{thought} +\declaretheorem[name=Warning, numbered=no, style=remark]{warning} + +\declaretheorem[name=Question, numbered=no, style=definition]{question} +\declaretheorem[name=Solution, numbered=no, style=definition]{solution} diff --git a/14-compiler/preamble/preamble.tex b/14-compiler/preamble/preamble.tex new file mode 100644 index 0000000..4322c42 --- /dev/null +++ b/14-compiler/preamble/preamble.tex @@ -0,0 +1,62 @@ +\usepackage{iftex} +\ifPDFTeX + \usepackage{libertinust1math} + \usepackage{libertine} +\else + \usepackage{libertinus} +\fi +\usepackage[scaled=0.8]{FiraMono} + +\usepackage{xcolor} + +\usepackage{amsmath} +\usepackage{amssymb} +\usepackage{amsfonts} +\usepackage{amsthm} +\usepackage{thmtools} +\usepackage{mathtools} +\usepackage{mathpartir} +\usepackage{stmaryrd} +\usepackage{bm} + +\usepackage{multicol} + +\usepackage{tikz} + +\usepackage{caption} +\usepackage{booktabs} +\usepackage{enumitem} +\setlist{itemsep=1pt} + +\usepackage{placeins} +\let\Oldsection\section +\renewcommand{\section}{\FloatBarrier\Oldsection} +\let\Oldsubsection\subsection +\renewcommand{\subsection}{\FloatBarrier\Oldsubsection} +\let\Oldsubsubsection\subsubsection +\renewcommand{\subsubsection}{\FloatBarrier\Oldsubsubsection} + +\usepackage{biblatex} + +\usepackage[colorlinks=true, linkcolor=purple]{hyperref} 
+\usepackage[capitalise, noabbrev]{cleveref}
+
+\usepackage{subfiles}
+
+\numberwithin{figure}{chapter}
+\numberwithin{table}{chapter}
+
+\makeatletter
+\renewcommand\tableofcontents{\@starttoc{toc}}
+\makeatother
+
+\makeatletter
+\newcommand{\unchapter}[1]{%
+  \begingroup
+  \let\@makechapterhead\@gobble
+  \chapter{#1}
+  \endgroup
+}
+\makeatother
+
+\input{preamble/math}
+\input{preamble/symbols}
diff --git a/14-compiler/preamble/symbols.tex b/14-compiler/preamble/symbols.tex
new file mode 100644
index 0000000..e7f8f86
--- /dev/null
+++ b/14-compiler/preamble/symbols.tex
@@ -0,0 +1,115 @@
+\newcommand{\Hazel}{\textsf{Hazel}}
+\newcommand{\Hazelnut}{\textsf{Hazelnut}}
+\newcommand{\HazelnutLive}{\textsf{Hazelnut Live}}
+
+%
+% Utilities
+%
+
+% Circular pie
+\newcommand{\pie}[2]{%
+  \begin{tikzpicture}
+    \draw (0,0) circle (#1); \fill[rotate=90] (#1,0) arc (0:#2:#1) -- (0,0) -- cycle;
+  \end{tikzpicture}%
+}
+
+% Spacing
+\newcommand{\Indent}{\ensuremath{\hspace{10pt}}}
+
+%
+% Judgments
+%
+\newcommand{\judgment}[3]{\inferrule{#1}{#2}~~{\textsc{\small [#3]}}}
+\newcommand{\judgbox}[1]{\noindent \fbox{$#1$}}
+
+% Consistency
+\newcommand{\isConsistent}[2]{\ensuremath{#1 \sim #2}}
+\newcommand{\isNotConsistent}[2]{\ensuremath{#1 \nsim #2}}
+
+% Ground
+\newcommand{\isGround}[1]{\ensuremath{#1 ~\textsf{ground}}}
+
+% Typing contexts
+\newcommand{\andCtx}[2]{\ensuremath{#1, #2}}
+\newcommand{\ctx}{\ensuremath{\Gamma}}
+\newcommand{\extendCtx}[3]{\ensuremath{#1 ; ~\assignType{#2}{#3}}}
+\DeclareMathOperator{\dom}{dom}
+\newcommand{\inCtx}[3]{\ensuremath{#2, #3 \in #1}}
+\newcommand{\notInCtx}[2]{\ensuremath{#2 \not\in \dom(#1)}}
+\newcommand{\withCtx}[2]{\ensuremath{#1 \vdash #2}}
+
+% Typing judgments
+\newcommand{\assignType}[2]{\ensuremath{#1 : #2}}
+\newcommand{\ctxAssignType}[3]{\ensuremath{\withCtx{#1}{\assignType{#2}{#3}}}}
+\newcommand{\synType}[2]{\ensuremath{#1 \Rightarrow #2}}
+\newcommand{\ctxSynType}[3]{\ensuremath{\withCtx{#1}{\synType{#2}{#3}}}}
+\newcommand{\anaType}[2]{\ensuremath{#1 \Leftarrow #2}} +\newcommand{\ctxAnaType}[3]{\ensuremath{\withCtx{#1}{\anaType{#2}{#3}}}} + +% Hole contexts +\newcommand{\holeCtx}{\ensuremath{\Delta}} +\newcommand{\holeHasTypeCtx}[4]{\ensuremath{#2 :: #3[#4] \in #1}} + +% Substitution +\newcommand{\subst}[3]{\ensuremath{[#1 / #2] #3}} + +% +% Symbols +% + +% Punctuation +\newcommand{\SyDArrow}{\ensuremath{\Rightarrow}} +\newcommand{\SyDNArrow}{\ensuremath{\nRightarrow}} +\newcommand{\SyArrow}{\ensuremath{\to}} +\newcommand{\SyBar}{\ensuremath{\vert}} +\newcommand{\SyColon}{\ensuremath{\textsf{\textbf{{:}}}}} +\newcommand{\SyDot}{\ensuremath{.}} +\newcommand{\SyProd}{\ensuremath{\times}} + +% Keywords +\newcommand{\SyWith}{\ensuremath{\textsf{\textbf{{with}}}}} +\newcommand{\SyMatch}{\ensuremath{\textsf{\textbf{{match}}}}} +\newcommand{\SyReturn}{\ensuremath{\textsf{\textbf{return}}}} + +% Holes +\definecolor{lavender}{RGB}{162,85,162} +\newcommand{\SyNEHole}[3]{\ensuremath{\textcolor{lavender}{\bm{\llparenthesis}}#1\textcolor{lavender}{\bm{\rrparenthesis}^{#2}_{#3}}}} +\newcommand{\SyEHole}[2]{\ensuremath{\SyNEHole{}{#1}{#2}}} +\newcommand{\SyHoleNum}{\ensuremath{u}} +\newcommand{\SyHoleEnv}{\ensuremath{\sigma}} + +% Casting +\newcommand{\SyCastL}{\ensuremath{\langle}} +\newcommand{\SyCastR}{\ensuremath{\rangle}} + +% Integers +\newcommand{\SyTInt}{\ensuremath{\textsf{Int}}} +\newcommand{\SyPlus}{\ensuremath{+}} +\newcommand{\SyTimes}{\ensuremath{*}} + +% Floats +\newcommand{\SyTFloat}{\ensuremath{\textsf{Float}}} +\newcommand{\SyFPlus}{\ensuremath{+^{\cdot}}} +\newcommand{\SyFTimes}{\ensuremath{*^{\cdot}}} + +% Booleans +\newcommand{\SyTBool}{\ensuremath{\textsf{Bool}}} +\newcommand{\SyTrue}{\ensuremath{\textsf{true}}} +\newcommand{\SyFalse}{\ensuremath{\textsf{false}}} +\newcommand{\SyAnd}{\ensuremath{\land}} +\newcommand{\SyOr}{\ensuremath{\lor}} + +% Pairs +\newcommand{\SyProjL}{\ensuremath{\textsf{\textbf{fst}}}} 
+\newcommand{\SyProjR}{\ensuremath{\textsf{\textbf{snd}}}} + +% Functions +\newcommand{\SyFun}{\ensuremath{\lambda}} + +% Let +\newcommand{\SyLet}{\ensuremath{\textsf{\textbf{{let}}}}} +\newcommand{\SyIn}{\ensuremath{\textsf{\textbf{{in}}}}} + +% Casts +\newcommand{\SyEmbed}{\ensuremath{\textsf{\textbf{emb}}}} +\newcommand{\SyProj}{\ensuremath{\textsf{\textbf{proj}}}} diff --git a/14-compiler/references.bib b/14-compiler/references.bib new file mode 100644 index 0000000..0d608b1 --- /dev/null +++ b/14-compiler/references.bib @@ -0,0 +1,317 @@ +@article{benton2005, + title = {Embedded interpreters}, + volume = {15}, + DOI = {10.1017/S0956796804005398}, + number = {4}, + journal = {Journal of Functional Programming}, + publisher = {Cambridge University Press}, + author = {Benton, Nick}, + year = {2005}, + pages = {503–542}, +} + +@inproceedings{danvy2003, + title = {A New One-Pass Transformation into Monadic Normal Form}, + author = {Olivier Danvy}, + booktitle = {CC}, + year = {2003}, +} + +@inproceedings{gilray2016, + author = {Gilray, Thomas and Lyde, Steven and Adams, Michael D. and Might, Matthew and Van Horn, + David}, + title = {Pushdown Control-Flow Analysis for Free}, + year = {2016}, + isbn = {9781450335492}, + publisher = {Association for Computing Machinery}, + address = {New York, NY, USA}, + url = {https://doi.org/10.1145/2837614.2837631}, + doi = {10.1145/2837614.2837631}, + abstract = {Traditional control-flow analysis (CFA) for higher-order languages introduces spurious + connections between callers and callees, and different invocations of a function may + pollute each other's return flows. Recently, three distinct approaches have been + published that provide perfect call-stack precision in a computable manner: CFA2, PDCFA + , and AAC. Unfortunately, implementing CFA2 and PDCFA requires significant engineering + effort. Furthermore, all three are computationally expensive. 
For a monovariant + analysis, CFA2 is in O(2^n), PDCFA is in O(n^6), and AAC is in O(n^8). In this paper, + we describe a new technique that builds on these but is both straightforward to + implement and computationally inexpensive. The crucial insight is an unusual + state-dependent allocation strategy for the addresses of continuations. Our technique + imposes only a constant-factor overhead on the underlying analysis and costs only + O(n^3) in the monovariant case. We present the intuitions behind this development, + benchmarks demonstrating its efficacy, and a proof of the precision of this analysis.}, + booktitle = {Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of + Programming Languages}, + pages = {691–704}, + numpages = {14}, + keywords = {Static analysis, Control-flow analysis, Abstract interpretation, Pushdown analysis, + Store-allocated continuations}, + location = {St. Petersburg, FL, USA}, + series = {POPL '16}, +} + +@misc{grain2022, + author = {Grain}, + title = {Grain: A strongly-typed functional programming language for the modern web.}, + note = {Retrieved June 08, 2022 from \url{https://grain-lang.org}}, + urldate = {2022-06-08}, + year = {2022}, +} + +@inproceedings{haas2017, + address = {Barcelona Spain}, + title = {Bringing the web up to speed with {WebAssembly}}, + isbn = {978-1-4503-4988-8}, + url = {https://dl.acm.org/doi/10.1145/3062341.3062363}, + doi = {10.1145/3062341.3062363}, + language = {en}, + urldate = {2022-06-09}, + booktitle = {Proceedings of the 38th {ACM} {SIGPLAN} {Conference} on {Programming} {Language} { + Design} and {Implementation}}, + publisher = {ACM}, + author = {Haas, Andreas and Rossberg, Andreas and Schuff, Derek L. and Titzer, Ben L. 
and Holman,
+            Michael and Gohman, Dan and Wagner, Luke and Zakai, Alon and Bastien, JF},
+  month = jun,
+  year = {2017},
+  pages = {185--200},
+}
+
+@article{herman2010,
+  title = {Space-efficient gradual typing},
+  author = {Herman, David and Tomb, Aaron and Flanagan, Cormac},
+  journal = {Higher-Order and Symbolic Computation},
+  volume = {23},
+  number = {2},
+  pages = {167--189},
+  year = {2010},
+  publisher = {Springer},
+}
+
+@inproceedings{kuhlenschmidt2019,
+  author = {Kuhlenschmidt, Andre and Almahallawi, Deyaaeldeen and Siek, Jeremy G.},
+  title = {Toward Efficient Gradual Typing for Structural Types via Coercions},
+  year = {2019},
+  isbn = {9781450367127},
+  publisher = {Association for Computing Machinery},
+  address = {New York, NY, USA},
+  url = {https://doi.org/10.1145/3314221.3314627},
+  doi = {10.1145/3314221.3314627},
+  abstract = {Gradual typing combines static and dynamic typing in the same program. Siek et al.
+              (2015) describe five criteria for gradually typed languages, including type soundness
+              and the gradual guarantee. A significant number of languages have been developed in
+              academia and industry that support some of these criteria (TypeScript, Typed Racket,
+              Safe TypeScript, Transient Reticulated Python, Thorn, etc.) but relatively few support
+              all the criteria (Nom, Gradualtalk, Guarded Reticulated Python). Of those that do, only
+              Nom does so efficiently. The Nom experiment shows that one can achieve efficient
+              gradual typing in languages with only nominal types, but many languages have structural
+              types: function types, tuples, record and object types, generics, etc. In this paper we
+              present a compiler, named Grift, that addresses the difficult challenge of efficient
+              gradual typing for structural types. The input language includes a selection of
+              difficult features: first-class functions, mutable arrays, and recursive types. 
We
+              show that a close-to-the-metal implementation of run-time casts inspired by Henglein's
+              coercions eliminates all of the catastrophic slowdowns without introducing significant
+              average-case overhead. As a result, Grift exhibits lower overheads than those of Typed
+              Racket.},
+  booktitle = {Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and
+               Implementation},
+  pages = {517--532},
+  numpages = {16},
+  keywords = {compilation, efficiency, gradual typing},
+  location = {Phoenix, AZ, USA},
+  series = {PLDI 2019},
+}
+
+@article{new2018,
+  author = {New, Max S. and Ahmed, Amal},
+  title = {Graduality from Embedding-Projection Pairs},
+  year = {2018},
+  issue_date = {September 2018},
+  publisher = {Association for Computing Machinery},
+  address = {New York, NY, USA},
+  volume = {2},
+  number = {ICFP},
+  url = {https://doi.org/10.1145/3236768},
+  doi = {10.1145/3236768},
+  abstract = {Gradually typed languages allow statically typed and dynamically typed code to
+              interact while maintaining benefits of both styles. The key to reasoning about these
+              mixed programs is Siek-Vitousek-Cimini-Boyland’s (dynamic) gradual guarantee, which
+              says that giving components of a program more precise types only adds runtime type
+              checking, and does not otherwise change behavior. In this paper, we give a semantic
+              reformulation of the gradual guarantee called graduality. We change the name to promote
+              the analogy that graduality is to gradual typing what parametricity is to polymorphism.
+              Each gives a local-to-global, syntactic-to-semantic reasoning principle that is
+              formulated in terms of a kind of observational approximation. Utilizing the analogy, we
+              develop a novel logical relation for proving graduality. We show that
+              embedding-projection pairs (ep pairs) are to graduality what relations are to
+              parametricity. 
We argue that casts between two types where one is “more dynamic” (less + precise) than the other necessarily form an ep pair, and we use this to cleanly prove + the graduality cases for casts from the ep-pair property. To construct ep pairs, we + give an analysis of the type dynamism relation—also known as type precision or naive + subtyping—that interprets the rules for type dynamism as compositional constructions on + ep pairs, analogous to the coercion interpretation of subtyping. }, + journal = {Proc. ACM Program. Lang.}, + month = {Jul}, + articleno = {73}, + numpages = {30}, + keywords = {gradual typing, logical relations, dynamic gradual guarantee, observational error + approximation}, +} + +@inbook{nielson1999, + author = "Nielson, Flemming and Nielson, Hanne Riis and Hankin, Chris", + title = "Constraint Based Analysis", + bookTitle = "Principles of Program Analysis", + year = "1999", + publisher = "Springer Berlin Heidelberg", + address = "Berlin, Heidelberg", + pages = "141--209", + abstract = "In this chapter we present the technique of Constraint Based Analysis using a simple + functional language, FUN. We begin by presenting an abstract specification of a Control + Flow Analysis and then study its theoretical properties: it is correct with respect to + a Structural Operational Semantics and it can be used to analyse all programs. This + specification of the analysis does not immediately lend itself to an efficient + algorithm for computing a solution so we proceed by developing first a syntax directed + specification and then a constraint based formulation and finally we show how the + constraints can be solved. 
We conclude by illustrating how the precision of the + analysis can be improved by combining it with Data Flow Analysis and by incorporating + context information thereby linking up with the development of the previous chapter.", + isbn = "978-3-662-03811-6", + doi = "10.1007/978-3-662-03811-6_3", + url = "https://doi.org/10.1007/978-3-662-03811-6_3", +} + +@article{omar2017, + author = {Omar, Cyrus and Voysey, Ian and Hilton, Michael and Aldrich, Jonathan and Hammer, + Matthew A.}, + title = {Hazelnut: A Bidirectionally Typed Structure Editor Calculus}, + year = {2017}, + issue_date = {January 2017}, + publisher = {Association for Computing Machinery}, + address = {New York, NY, USA}, + volume = {52}, + number = {1}, + issn = {0362-1340}, + url = {https://doi.org/10.1145/3093333.3009900}, + doi = {10.1145/3093333.3009900}, + abstract = {Structure editors allow programmers to edit the tree structure of a program directly. + This can have cognitive benefits, particularly for novice and end-user programmers. It + also simplifies matters for tool designers, because they do not need to contend with + malformed program text. This paper introduces Hazelnut, a structure editor based on a + small bidirectionally typed lambda calculus extended with holes and a cursor. Hazelnut + goes one step beyond syntactic well-formedness: its edit actions operate over + statically meaningful incomplete terms. Naively, this would force the programmer to + construct terms in a rigid "outside-in" manner. To avoid this problem, the action + semantics automatically places terms assigned a type that is inconsistent with the + expected type inside a hole. This meaningfully defers the type consistency check until + the term inside the hole is finished. Hazelnut is not intended as an end-user tool + itself. Instead, it serves as a foundational account of typed structure editing. 
To
+              that end, we describe how Hazelnut's rich metatheory, which we have mechanized using
+              the Agda proof assistant, serves as a guide when we extend the calculus to include
+              binary sum types. We also discuss various interpretations of holes, and in so doing
+              reveal connections with gradual typing and contextual modal type theory, the
+              Curry-Howard interpretation of contextual modal logic. Finally, we discuss how
+              Hazelnut's semantics lends itself to implementation as an event-based functional
+              reactive program. Our simple reference implementation is written using js_of_ocaml.},
+  journal = {SIGPLAN Not.},
+  month = {Jan},
+  pages = {86--99},
+  numpages = {14},
+  keywords = {gradual typing, structure editors, mechanized metatheory, bidirectional type systems},
+}
+
+@article{omar2019,
+  author = {Omar, Cyrus and Voysey, Ian and Chugh, Ravi and Hammer, Matthew A.},
+  title = {Live Functional Programming with Typed Holes},
+  year = {2019},
+  issue_date = {January 2019},
+  publisher = {Association for Computing Machinery},
+  address = {New York, NY, USA},
+  volume = {3},
+  number = {POPL},
+  url = {https://doi.org/10.1145/3290327},
+  doi = {10.1145/3290327},
+  abstract = {Live programming environments aim to provide programmers (and sometimes audiences)
+              with continuous feedback about a program's dynamic behavior as it is being edited. The
+              problem is that programming languages typically assign dynamic meaning only to programs
+              that are complete, i.e. syntactically well-formed and free of type errors.
+              Consequently, live feedback presented to the programmer exhibits temporal or
+              perceptive gaps. This paper confronts this "gap problem" from type-theoretic first
+              principles by developing a dynamic semantics for incomplete functional programs,
+              starting from the static semantics for incomplete functional programs developed in
+              recent work on Hazelnut. 
We + model incomplete functional programs as expressions with holes, with empty holes + standing for missing expressions or types, and non-empty holes operating as membranes + around static and dynamic type inconsistencies. Rather than aborting when evaluation + encounters any of these holes as in some existing systems, evaluation proceeds around + holes, tracking the closure around each hole instance as it flows through the remainder + of the program. Editor services can use the information in these hole closures to help + the programmer develop and confirm their mental model of the behavior of the complete + portions of the program as they decide how to fill the remaining holes. Hole closures + also enable a fill-and-resume operation that avoids the need to restart evaluation + after edits that amount to hole filling. Formally, the semantics borrows machinery from + both gradual type theory (which supplies the basis for handling unfilled type holes) + and contextual modal type theory (which supplies a logical basis for hole closures), + combining these and developing additional machinery necessary to continue evaluation + past holes while maintaining type safety. We have mechanized the metatheory of the core + calculus, called Hazelnut Live, using the Agda proof assistant. We have also + implemented these ideas into the Hazel programming environment. The implementation + inserts holes automatically, following the Hazelnut edit action calculus, to guarantee + that every editor state has some (possibly incomplete) type. Taken together with this + paper's type safety property, the result is a proof-of-concept live programming + environment where rich dynamic feedback is truly available without gaps, i.e. for every + reachable editor state. }, + journal = {Proc. ACM Program. 
Lang.},
+  month = {Jan},
+  articleno = {14},
+  numpages = {32},
+  keywords = {live programming, typed holes, contextual modal type theory, structured editing,
+              gradual typing},
+}
+
+@inproceedings{sabry1993,
+  author = {Amr Sabry and Matthias Felleisen},
+  title = {Reasoning about Programs in Continuation-Passing Style},
+  booktitle = {Lisp and Symbolic Computation},
+  year = {1993},
+  pages = {288--298},
+}
+
+@phdthesis{shivers1991,
+  author = {Shivers, Olin},
+  title = {Control-Flow Analysis of Higher-Order Languages or Taming Lambda},
+  year = {1991},
+  school = {Carnegie Mellon University},
+  address = {USA},
+  abstract = {Programs written in powerful, higher-order languages like Scheme, ML, and Common Lisp
+              should run as fast as their FORTRAN and C counterparts. They should, but they don't. A
+              major reason is the level of optimisation applied to these two classes of languages.
+              Many FORTRAN and C compilers employ an arsenal of sophisticated global optimisations
+              that depend upon data-flow analysis: common-subexpression elimination, loop-invariant
+              detection, induction-variable elimination, and many, many more. Compilers for
+              higher-order languages do not provide these optimisations. Without them, Scheme, LISP
+              and ML compilers are doomed to produce code that runs slower than their FORTRAN and C
+              counterparts. The problem is the lack of an explicit control-flow graph at compile
+              time, something which traditional data-flow analysis techniques require. In this
+              dissertation, I present a technique for recovering the control-flow graph of a Scheme
+              program at compile time. I give examples of how this information can be used to perform
+              several data-flow analysis optimisations, including copy propagation,
+              induction-variable elimination, useless-variable elimination, and type recovery. The
+              analysis is defined in terms of a non-standard semantic interpretation. 
The
+              denotational semantics is carefully developed, and several theorems establishing the
+              correctness of the semantics and the implementing algorithms are proven.},
+  note = {UMI Order No. GAX91-26964},
+}
+
+@article{vardoulaskis2011,
+  author = {Dimitrios Vardoulakis and Olin Shivers},
+  title = {{CFA2:} a Context-Free Approach to Control-Flow Analysis},
+  journal = {Log. Methods Comput. Sci.},
+  volume = {7},
+  number = {2},
+  year = {2011},
+  url = {https://doi.org/10.2168/LMCS-7(2:3)2011},
+  doi = {10.2168/LMCS-7(2:3)2011},
+}