Skip to content

Latest commit

 

History

History
150 lines (96 loc) · 7.31 KB

wasm32.org

File metadata and controls

150 lines (96 loc) · 7.31 KB

Overview

This is a port of various GNU utilities to work with or on the WebAssembly “architecture”, supported using a JavaScript API by many Web browsers; the compiler used is GCC, which means C and C++ programs (and, in theory, programs in other languages) can be compiled to WebAssembly code that can then be run in a web browser or JavaScript shell.

The first warning is it’s quite slow, partly because of the design of the architecture.

The second warning is that there are legal issues with using GCC to compile software not covered by the GNU General Public License on virtual/emulated architectures. I’m not a legal expert, but the exception in COPYING.RUNTIME seems very clear to me:

Notwithstanding that, Target Code does not include data in any format that is used as a compiler intermediate representation, or used for producing a compiler intermediate representation.

WebAssembly bytecode is used for producing a compiler intermediate representation. That’s what it’s for.

Pipeline

The stages of building a WebAssembly program are as follows:

GCC

GCC invocation:

The GCC cross-compiler, wasm32-virtual-wasm32-gcc is invoked to produce ELF objects and ultimately to link those into an ELF executable.

cc1

GCC translates C/C++ code to assembler code.

This uses the standard GCC frontend, including full optimization, and a custom GCC backend that translates GCC’s internal RTL expressions into wasm32 code.

as

GNU as translates assembler code into an ELF object file.

We use the ELF format for intermediate files containing WebAssembly code, binary data, and debugging information in the DWARF format.

ld

GNU ld links the ELF object files into an ELF executable.

During this stage, special “LEB” relocations are used to patch little-endian base-128 addresses into the WebAssembly source code stored in the ELF files.

wasmify

bin/wasmify-wasm32 invocation:

Afterwards, wasmify-wasm32 is run on the ELF executable to produce a WebAssembly module file.

wasm32.cpp-lds.lds

GNU ld creates a temporary ELF object with WebAssembly code from the ELF executable.

wasm32-wasmify.lds

GNU ld turns this temporary ELF object file into a valid WebAssembly module.

wasmrewrite

bin/wasmrewrite rewrites LEB-128 integers to have minimal length and shortens some instruction sequences.

dyninfo

dyninfo/dyninfo.pl extracts dynamic symbol information from the ELF executable and attaches it to the WebAssembly module as a custom WebAssembly section called “dyninfo”.

wasm2s

common/bin/wasm2s reads a WebAssembly module and produces assembler code that can be modified and reassembled into WebAssembly using wasm32-virtual-wasm32-gcc and wasmify.

Repositories

main repository

The main (meta) repository is at https://github.com/pipcet/asmjs. It includes sub-repositories for binutils/GDB, GCC, glibc, and some example applications.

The Makefile rules in this repository require manual intervention to rebuild things after changes; this is because otherwise, a tiny change in binutils or gcc would force a complete rebuild and slow down development too much.

Design

Assembler language

The WebAssembly assembler language used is based on a direct encoding of the opcode bytestream into mnemonic opcodes; this means expressions will generally be encoded using Forth-like stack-machine order. The WebAssembly standard has moved towards such an encoding, as well.

ELF format

The wasm32 target uses a variant of the ELF format for intermediate files, even though the files ultimately processed by the web browser or JavaScript shell are in WebAssembly format.

endianness

The wasm32 target requires a little-endian VM, and the ELF format is little-endian.

machine identifier

The machine identifier used for the ELF files is 0xXXXX (“WA” in XXX little-endian notation).

32-bit

Currently, WebAssembly allows only for 32-bit addresses, and the wasm32 target uses the 32-bit ELF format.

entry point

The entry point of the program is not specified by the relevant field of the ELF header but by the global symbol __entry. This is because ld -Obinary provides no way of extracting the entry point address.

section contents

data sections

Data sections contain binary data in 32-bit little-endian format. They use standard ELF relocations for pointers to data or code.

Links

Emscripten

http://emscripten.org

Relooper algorithm

https://github.com/kripken/emscripten/raw/master/docs/paper.pdf

asm.js standard

http://asmjs.org

WebAssembly

http://webassembly.github.io/ https://github.com/sunfishcode/wasm-reference-manual/blob/master/WebAssembly.md https://webassembly.github.io/spec/core/bikeshed/

Stack layout

The wasm32 target port uses the VM stack, a stack in the wasm32 “heap” array buffer in addition to the normal WebAssembly stack. The WebAssembly stack’s layout is specific to the JavaScript engine in use and not interesting to us.

During normal operation (function calls that exit normally), space on the VM stack is reserved but nothing is actually written there; when a non-local exit is about to be performed (or certain other conditions are met), each function whose state is recorded on the JavaScript stack writes its state to the VM stack and returns to its caller.

When execution is resumed, only the innermost function is called again at first, and control briefly returns to JavaScript when it exits. The functions being called restore the state in registers and on the JavaScript stack based on the contents of the VM stack before continuing to execute translated JavaScript code.

wasm64 support is severely outdated (and simulates 64-bit operations as 32-bit ones anyway; the wasm MVP will probably not contain 64-bit support).

assembly language

The wasm target uses a conventional assembler approach: the wasm opcodes are used as though they were assembly instructions.

Notation is in RPN order: child nodes of the AST are described first, then their parent node. This can also be read as instructions for a stack machine.

Immediate arguments follow the instruction opcode, with the exception of an immediate argument specifying the value type for a block, loop, or if block; that type is specified in brackets following the mnemonic, with [] for a void type, [i] for i32, [l] for i64, [f] for f32, and [d] for f64.

ELF format

machine identification

For wasm, we use an id of 0x4157, which corresponds to “WA” if formatted in little-endian mode.

relocations

An extra relocation is provided for LEB128 constants.

dummy sections

The wasm32 backend uses a number of dummy ELF sections whose only purpose it is to allocate positions in some index space.

Dynamic Linking

Dynamic linking is supported (in fact, it’s currently the only thing that works) using the WebAssembly module format for libraries and executables.

To-Do List

new mnemonics (local.set rather than set_local etc)

opcode escape (FC)

use non-signalling truncation opcodes

look at WebAssembly 64 state

drop asmjs?

look at multi-threading

look at signalling, if anything like that exists

rename “dyninfo” to “dyninfo.json”

replace hard-coded “/home/pip” path in wasm32.h

set the offset in memory access instructions to -4096, so as to catch NULL pointers.

clean up interpreter mess

hunt down “asmjs” references, such as in the linker script