This is a port of various GNU utilities to the WebAssembly “architecture”, which many web browsers support through a JavaScript API. The compiler used is GCC, which means C and C++ programs (and, in theory, programs in other languages) can be compiled to WebAssembly code that can then be run in a web browser or JavaScript shell.
The first warning is that it’s quite slow, partly because of the design of the architecture.
The second warning is that there are legal issues with using GCC to compile software not covered by the GNU General Public License on virtual/emulated architectures. I’m not a legal expert, but the exception in COPYING.RUNTIME seems very clear to me:
Notwithstanding that, Target Code does not include data in any format that is used as a compiler intermediate representation, or used for producing a compiler intermediate representation.
WebAssembly bytecode is used for producing a compiler intermediate representation. That’s what it’s for.
The stages of building a WebAssembly program are as follows:
GCC invocation:
The GCC cross-compiler, wasm32-virtual-wasm32-gcc, is invoked to produce ELF objects and ultimately to link those into an ELF executable.
GCC translates C/C++ code to assembler code.
This uses the standard GCC frontend, including full optimization, and a custom GCC backend that translates GCC’s internal RTL expressions into wasm32 code.
GNU as translates assembler code into an ELF object file.
We use the ELF format for intermediate files containing WebAssembly code, binary data, and debugging information in the DWARF format.
GNU ld links the ELF object files into an ELF executable.
During this stage, special “LEB” relocations are used to patch little-endian base-128 addresses into the WebAssembly source code stored in the ELF files.
bin/wasmify-wasm32 invocation:
Afterwards, wasmify-wasm32 is run on the ELF executable to produce a WebAssembly module file.
GNU ld creates a temporary ELF object with WebAssembly code from the ELF executable.
GNU ld turns this temporary ELF object file into a valid WebAssembly module.
bin/wasmrewrite rewrites LEB-128 integers to have minimal length and shortens some instruction sequences.
dyninfo/dyninfo.pl extracts dynamic symbol information from the ELF executable and attaches it to the WebAssembly module as a custom WebAssembly section called “dyninfo”.
common/bin/wasm2s reads a WebAssembly module and produces assembler code that can be modified and reassembled into WebAssembly using wasm32-virtual-wasm32-gcc and wasmify.
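The LEB-128 shortening performed in the wasmrewrite step can be illustrated with a minimal sketch. This is not the code of bin/wasmrewrite; the function names are hypothetical, and only unsigned LEB128 is handled. It decodes a (possibly padded) integer and re-encodes it with the fewest bytes.

```python
def decode_uleb128(data, offset=0):
    """Decode an unsigned LEB128 integer; return (value, offset after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):  # high bit clear marks the last byte
            return result, offset


def encode_uleb128_minimal(value):
    """Encode a non-negative integer as unsigned LEB128 with no padding."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def shrink_uleb128(data, offset=0):
    """Return (minimal encoding, offset after the original encoding)."""
    value, end = decode_uleb128(data, offset)
    return encode_uleb128_minimal(value), end
```

A real rewriter would also have to adjust every size prefix that covers the shortened bytes (function bodies and sections are length-prefixed in the WebAssembly binary format); the sketch leaves that out.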
The main (meta) repository is at https://github.com/pipcet/asmjs. It includes sub-repositories for binutils/GDB, GCC, glibc, and some example applications.
The Makefile rules in this repository require manual intervention to rebuild things after changes; this is because otherwise, a tiny change in binutils or gcc would force a complete rebuild and slow down development too much.
The WebAssembly assembler language used is based on a direct encoding of the opcode bytestream into mnemonic opcodes; this means expressions will generally be encoded using Forth-like stack-machine order. The WebAssembly standard has moved towards such an encoding, as well.
The wasm32 target uses a variant of the ELF format for intermediate files, even though the files ultimately processed by the web browser or JavaScript shell are in WebAssembly format.
The wasm32 target requires a little-endian VM, and the ELF format is little-endian.
The machine identifier used for the ELF files is 0x4157 (“WA” in little-endian notation).
Currently, WebAssembly allows only for 32-bit addresses, and the wasm32 target uses the 32-bit ELF format.
The entry point of the program is not specified by the relevant field of the ELF header but by the global symbol __entry. This is because ld -Obinary provides no way of extracting the entry point address.
Data sections contain binary data in 32-bit little-endian format. They use standard ELF relocations for pointers to data or code.
https://github.com/kripken/emscripten/raw/master/docs/paper.pdf
http://webassembly.github.io/
https://github.com/sunfishcode/wasm-reference-manual/blob/master/WebAssembly.md
https://webassembly.github.io/spec/core/bikeshed/
The wasm32 target port uses a VM stack: a stack in the wasm32 “heap” array buffer, in addition to the normal WebAssembly stack. The WebAssembly stack’s layout is specific to the JavaScript engine in use and not interesting to us.
During normal operation (function calls that exit normally), space on the VM stack is reserved but nothing is actually written there; when a non-local exit is about to be performed (or certain other conditions are met), each function whose state is recorded on the JavaScript stack writes its state to the VM stack and returns to its caller.
When execution is resumed, only the innermost function is called again at first, and control briefly returns to JavaScript when it exits. The functions being called restore the state in registers and on the JavaScript stack based on the contents of the VM stack before continuing to execute translated JavaScript code.
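The save-and-resume scheme described above can be modelled with a toy simulation. This is only an illustration of the idea, not the code the backend generates: the function names, the UNWIND sentinel, and the frame layout are all hypothetical, and the VM stack is a plain Python list rather than a region of the heap.

```python
UNWIND = object()  # sentinel: "my state is saved, keep returning to the caller"


def leaf(vm_stack, frame=None, child_result=None):
    if frame is None:
        # First entry: a non-local exit is needed, so write state and return.
        vm_stack.append({"fn": "leaf", "x": 21})
        return UNWIND
    # Resumed: restore the local from the saved frame and finish normally.
    return frame["x"] * 2


def middle(vm_stack, frame=None, child_result=None):
    if frame is None:
        local = 1
        result = leaf(vm_stack)
        if result is UNWIND:
            # The callee is unwinding: save our own state and return as well.
            vm_stack.append({"fn": "middle", "local": local})
            return UNWIND
    else:
        # Resumed: restore state; the callee's result is handed in by the driver.
        local = frame["local"]
        result = child_result
    return result + local


FUNCTIONS = {"leaf": leaf, "middle": middle}


def run_with_resume():
    """Driver standing in for the JavaScript side."""
    vm_stack = []
    result = middle(vm_stack)
    if result is not UNWIND:
        return result
    saved = list(vm_stack)  # frames were pushed innermost first
    vm_stack.clear()
    result = None
    for frame in saved:
        # Call the innermost function first; control returns to the driver
        # each time a resumed function exits.
        result = FUNCTIONS[frame["fn"]](vm_stack, frame, result)
    return result
```

In this model nothing is written to the VM stack unless an unwind actually happens, which mirrors the "reserved but not written" behaviour during normal calls.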
wasm64 support is severely outdated (and simulates 64-bit operations as 32-bit ones anyway; the wasm MVP will probably not contain 64-bit support).
The wasm target uses a conventional assembler approach: the wasm opcodes are used as though they were assembly instructions.
Notation is in RPN order: child nodes of the AST are described first, then their parent node. This can also be read as instructions for a stack machine.
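As an illustration of the RPN ordering, the following sketch walks a small expression tree in post-order and then runs the resulting list on a tiny stack machine. The mnemonics are the ones from the standard WebAssembly text format and are used here as an assumption; the assembler syntax of this port may differ. The tuple-based tree representation is hypothetical.

```python
def to_rpn(node):
    """Flatten an AST given as (opcode, child, child, ...) into RPN order."""
    op, *children = node
    out = []
    for child in children:
        out.extend(to_rpn(child))  # children are emitted first...
    out.append(op)                 # ...then the parent node
    return out


def run_stack_machine(instructions):
    """Evaluate a tiny subset of i32 instructions on an explicit stack."""
    stack = []
    for insn in instructions:
        if insn.startswith("i32.const "):
            stack.append(int(insn.split()[1]))
        elif insn == "i32.add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFFFFFFFF)
        elif insn == "i32.mul":
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & 0xFFFFFFFF)
        else:
            raise ValueError("unsupported instruction: " + insn)
    return stack.pop()


# (1 + 2) * 3 as a tree: the multiplication is the root node.
TREE = ("i32.mul",
        ("i32.add", ("i32.const 1",), ("i32.const 2",)),
        ("i32.const 3",))
```

Calling to_rpn(TREE) lists the two constants and the addition before the final multiplication, which is exactly the order in which a stack machine consumes them.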
Immediate arguments follow the instruction opcode, with the exception of an immediate argument specifying the value type for a block, loop, or if block; that type is specified in brackets following the mnemonic, with [] for a void type, [i] for i32, [l] for i64, [f] for f32, and [d] for f64.
For wasm, we use an id of 0x4157, which corresponds to “WA” if formatted in little-endian mode.
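The correspondence between 0x4157 and “WA” can be checked with a short sketch. The helper names are hypothetical; the only facts relied on are that e_machine is a 16-bit field at byte offset 18 of the ELF header and that the files are little-endian.

```python
import struct

E_MACHINE_OFFSET = 18  # e_ident (16 bytes) + e_type (2 bytes)


def machine_id_as_text(machine):
    """Show the 16-bit machine id as the bytes stored in a little-endian file."""
    return struct.pack("<H", machine).decode("ascii")


def read_e_machine(header):
    """Read the e_machine field from the start of a little-endian ELF file."""
    return struct.unpack_from("<H", header, E_MACHINE_OFFSET)[0]
```

Here machine_id_as_text(0x4157) yields "WA": the low byte 0x57 (“W”) is stored first, followed by 0x41 (“A”).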
An extra relocation is provided for LEB128 constants.
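One way to picture such a relocation is a fixed-width LEB128 field that is patched once the final address is known. The sketch below assumes a padded five-byte unsigned field, which is an illustrative choice and not a statement about the relocation types this port actually defines; the function names are hypothetical.

```python
def encode_uleb128_padded(value, width=5):
    """Encode value as unsigned LEB128 using exactly `width` bytes."""
    if value < 0 or value >= 1 << (7 * width):
        raise ValueError("value does not fit in the padded field")
    out = bytearray()
    for i in range(width):
        byte = value & 0x7F
        value >>= 7
        if i < width - 1:
            byte |= 0x80  # continuation bit on every byte except the last
        out.append(byte)
    return bytes(out)


def decode_uleb128(data, offset=0):
    """Decode an unsigned LEB128 integer; return (value, offset after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, offset


def apply_leb_relocation(code, offset, address, width=5):
    """Patch a resolved address into a bytearray at a fixed-width LEB128 slot."""
    code[offset:offset + width] = encode_uleb128_padded(address, width)
```

Because the field width is fixed, patching never moves the surrounding bytes, so the linker can resolve addresses without re-laying out the code. The padding can then be removed in a later pass.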
The wasm32 backend uses a number of dummy ELF sections whose only purpose is to allocate positions in some index space.
Dynamic linking is supported (in fact, it’s currently the only thing that works) using the WebAssembly module format for libraries and executables.
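The dynamic symbol information travels in the “dyninfo” custom section mentioned earlier. The sketch below shows how a custom section is framed in the WebAssembly binary format (section id 0, byte length, then a length-prefixed name followed by the payload). The payload layout used by dyninfo/dyninfo.pl is not described here, so the payload bytes in the sketch are a placeholder, and the function names are hypothetical.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def custom_section(name, payload):
    """Frame payload as a WebAssembly custom section with the given name."""
    name_bytes = name.encode("utf-8")
    body = encode_uleb128(len(name_bytes)) + name_bytes + payload
    return b"\x00" + encode_uleb128(len(body)) + body  # id 0 = custom section


def append_custom_section(module, name, payload):
    """Append a custom section to the bytes of an existing module."""
    if module[:4] != b"\x00asm":
        raise ValueError("not a WebAssembly module")
    return module + custom_section(name, payload)


EMPTY_MODULE = b"\x00asm\x01\x00\x00\x00"   # magic number + version 1
PLACEHOLDER_PAYLOAD = b"\x01\x02"           # hypothetical, not the real layout
```

Engines ignore custom sections they do not recognise, which is what makes them a convenient place for tool-specific metadata of this kind.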