
Elbrus architecture

Overview

Elbrus 2000 (Elbrus or e2k for short) is a SPARC-inspired VLIW architecture developed by the Moscow Center for SPARC Technology (MCST).

Elbrus machine code is organized into very long instruction words (VLIW), which consist of multiple so-called syllables that are executed together.

References

Several useful documents about Elbrus are available on the internet, albeit mostly in Russian.

Memory organization

Most operations in Elbrus code either:

  • Take the values of one or more registers, compute a function, and write the result to another register, or
  • Load a value from memory into a register or store a value from a register into memory.

Register file, RF (Регистровый файл, РгФ)

The 256 general-purpose registers of the Register File (RF/РгФ) are divided into two categories:

  • 224 registers belong to the procedure stack and are accessed through a register window. They become available or unavailable as procedures are called and return. (See also elbrus-prog chapter 9.3.1.1)
  • 32 registers are global registers. They are available during the whole runtime of a program.
32-bit 64-bit description
%g0 %dg0 Global register (0-31)
%r0 %dr0 Procedure stack register, relative to start of current window
%b[0] %db[0] Mobile base registers, relative to the start of the current window, plus BR

TODO: the last eight global registers are designated as a rotatable area

Changing the register window

The procedure stack contains the parameters and local data of procedures. Its top area is held in the register file (RF). When the register file overflows or underflows, its contents are automatically swapped out to or in from memory. Calling a new procedure allocates a window on the procedure stack, which may overlap with the calling procedure's window.

Procedure chain stack (стек связующей информации)

The procedure chain stack holds return addresses and related linkage information. It can only be manipulated by the operating system and the hardware. Its top area is stored in the CF (chain file) registers.

On this stack the following information is encoded in two quad words:

  • return address
  • compilation unit index descriptor (CUIR)
  • window base (wbs) in the register file
  • presence of real 80 (?)
  • predicate file
  • user stack descriptor
  • rotatable area base
  • processor status register

On overflow or underflow of the chain file, its contents are automatically swapped in/out of memory.

Predicate file, PF (Предикатный файл, ПФ)

Comparison operations produce one-bit results (true or false) that can be stored in the predicate registers.

Predicates can be used in conditional control transfers (jumps/calls), or in the conditional execution of individual operations.

There are 32 predicate registers in the predicate file, which appear as %pred0 to %pred31 in assembly code.

Special purpose registers

Special purpose registers can be read using the rrs and rrd operations, and written using the rws and rwd operations.

Name Description
CUIR compilation unit index register, индекс дескрипторов модуля компиляции
PSHTP procedure stack hardware top pointer
PSP procedure stack pointer - contains the virtual base address of the procedure stack.
WD window descriptor - contains the base and the size of the current procedure's window into the procedure stack.
PCSHTP procedure chain stack hardware top pointer
USBR user stack base pointer, РгБСП
USD user stack descriptor, ДСП

Regular Instructions

Elbrus wide instructions (широкая команда, ШК) consist of a header syllable and zero or more additional syllables. Wide instructions are 8-byte aligned and up to 16 words (64 bytes) long.

Syllables

Abbreviation Description
HS Header syllable - it encodes length and structure of a wide instruction
SS Stubs syllable - short operations that take only a few bits to encode
ALS Arithmetic logic channel syllable
CS Control syllable
ALES Arithmetic logic extension channel semi-syllable. Each ALES extends the corresponding ALS. ALES2 and ALES5 are only available on Elbrus v4 and higher.
AAS Array access semi-syllable
LTS Literal syllable - literals to be used as operands
PLS Predicate logic syllable - processing of boolean values
CDS Conditional syllable - specifies which operations are to be executed under which condition

The first syllable is the header syllable, which is always present. The presence of the other syllables depends on the purpose of the instruction. Syllables occur in the following order:

  • HS
  • SS
  • ALS0, ALS1, ALS2, ALS3, ALS4, ALS5
  • CS0
  • ALES2, ALES5
  • CS1
  • ALES0, ALES1, ALES3, ALES4
  • AAS0, AAS1, AAS2, AAS3, AAS4, AAS5
  • LTS3, LTS2, LTS1, LTS0
  • PLS2, PLS1, PLS0
  • CDS2, CDS1, CDS0

Syllable packing

Semi-syllables ALES and AAS are a half-word (2 bytes) long. All other syllables are one word (4 bytes) long.

Syllables SS, ALS* and CS0 occur as indicated in the header syllable, in the order described above. They are packed: for example, if the header bits indicate the presence of ALS0 and ALS2 but neither SS nor ALS1, then ALS0 follows directly after HS and ALS2 follows directly after ALS0.

If the presence of ALES2 or ALES5 is indicated, a whole word is allocated for them, whether one or both are present. The first one present occupies the more significant half of the word, and the second (if any) is encoded in the less significant half. For example, when looking at the syllables as bytes, if both ALES2 and ALES5 are present, the first two bytes of the little-endian word contain ALES5 and the last two bytes contain ALES2. If only ALES5 is present, the first two bytes are empty and the last two bytes contain ALES5.

CS1 may follow right after the previously described syllables.
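The packing rules for the syllables in front of the middle pointer can be restated as a small Python sketch. The function name `front_layout` and its input (a set of syllable names standing in for the header bits) are hypothetical. The sketch only encodes the rules above: HS takes the first word, present SS/ALS*/CS0 syllables follow without gaps, one whole word is reserved for ALES2/ALES5 when either is present (first present one in the high half), and CS1 comes last. Offsets are in bytes from the start of the wide instruction, assuming little-endian words as in the example.

```python
def front_layout(present):
    """Return byte offsets of the syllables that precede the middle pointer.

    `present` is a set of names such as "SS", "ALS0".."ALS5", "CS0",
    "ALES2", "ALES5", "CS1" (a hypothetical stand-in for the header bits).
    """
    offsets = {"HS": 0}
    pos = 4  # HS always occupies the first 4-byte word
    # SS, ALS0..ALS5 and CS0 are packed: absent syllables take no space.
    for name in ["SS"] + ["ALS%d" % i for i in range(6)] + ["CS0"]:
        if name in present:
            offsets[name] = pos
            pos += 4
    # ALES2/ALES5 share one word whenever at least one of them is present.
    ales = [name for name in ("ALES2", "ALES5") if name in present]
    if ales:
        offsets[ales[0]] = pos + 2   # first present -> high half (bytes 2..3)
        if len(ales) == 2:
            offsets[ales[1]] = pos   # second -> low half (bytes 0..1)
        pos += 4
    if "CS1" in present:
        offsets["CS1"] = pos
    return offsets
```

For the example with ALS0 and ALS2 only, `front_layout({"ALS0", "ALS2"})` places ALS0 at byte 4 and ALS2 at byte 8.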

ALES{0,1,3,4} and AAS* start at the word indicated by the "middle pointer" from the header syllable. Their ordering is the same as for ALES2 and ALES5 (high half first, low half second) but they are all packed. This means that any two syllables of ALES{0,1,3,4} and AAS{0,1} may share a word. ALES* may not share a word with AAS{2,3,4,5} because presence of the latter implies presence of AAS0 and/or AAS1. For example, if ALES0, ALES1, ALES4, AAS0 and AAS2 are indicated, then they are encoded as ALES1, ALES0, AAS0, ALES4, two bytes left empty, and finally AAS2.
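The same "high half first" rule, applied to a single run of half-words, reproduces the example above. The following Python sketch (hypothetical name `middle_layout`) assigns present syllables to consecutive half-word slots in the order ALES0, ALES1, ALES3, ALES4, AAS0 to AAS5. It assumes the middle pointer counts the words after HS, so the region starts at byte 4 * (1 + middle pointer); that conversion is an assumption, not something stated above.

```python
# Order in which the half-word syllables after the middle pointer are packed.
MIDDLE_ORDER = ["ALES0", "ALES1", "ALES3", "ALES4",
                "AAS0", "AAS1", "AAS2", "AAS3", "AAS4", "AAS5"]


def middle_layout(present, middle_pointer):
    """Byte offsets of ALES{0,1,3,4} and AAS* within the wide instruction.

    ASSUMPTION: the middle pointer counts the words following HS, so this
    region starts at byte 4 * (1 + middle_pointer).
    """
    base = 4 * (1 + middle_pointer)
    offsets = {}
    slot = 0  # index of the next free half-word
    for name in MIDDLE_ORDER:
        if name in present:
            word, second = divmod(slot, 2)
            # The first half-word of a word goes to the high half (bytes 2..3
            # of the little-endian word), the second to the low half (bytes 0..1).
            offsets[name] = base + 4 * word + (0 if second else 2)
            slot += 1
    return offsets
```

With ALES0, ALES1, ALES4, AAS0 and AAS2 present, the byte order relative to the region start is ALES1 (+0), ALES0 (+2), AAS0 (+4), ALES4 (+6), two empty bytes (+8), AAS2 (+10), matching the example. Because AAS2 to AAS5 always come with AAS0 or AAS1, this sequential packing never puts them in a word with an ALES.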

LTS*, PLS* and CDS* are decoded starting from the end of the wide command. CDS* and PLS* are not indicated by individual flags but by their count; for example, there cannot be a PLS2 without a PLS0 and a PLS1. LTS take up any remaining words between the other syllables. For example, if five words remain in the wide command after the AAS and two CDS and one PLS are indicated, then two words are left for LTS. They would be encoded as LTS1, LTS0, PLS0, CDS1, CDS0.
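The allocation from the end can be written as a short Python sketch (hypothetical name `tail_layout`). It takes the number of words left after the AAS and the PLS/CDS counts from the header, and returns the syllable names in memory order. Raising an error when more than four words are left for LTS is a choice made here, since only LTS0 to LTS3 exist and the behaviour in that case is not documented.

```python
def tail_layout(remaining_words, n_pls, n_cds):
    """Names of the words left after the AAS, in memory order.

    LTS*, PLS* and CDS* are allocated from the end of the wide instruction:
    CDS0 is the last word, then CDS1, ..., then the PLS, then the LTS.
    """
    n_lts = remaining_words - n_pls - n_cds
    if n_lts < 0:
        raise ValueError("more PLS/CDS indicated than words available")
    if n_lts > 4:
        # Only LTS0..LTS3 exist; what the hardware does here is unknown.
        raise ValueError("more than four words left for LTS")
    lts = ["LTS%d" % i for i in reversed(range(n_lts))]
    pls = ["PLS%d" % i for i in reversed(range(n_pls))]
    cds = ["CDS%d" % i for i in reversed(range(n_cds))]
    return lts + pls + cds
```

`tail_layout(5, 1, 2)` yields the example above: LTS1, LTS0, PLS0, CDS1, CDS0.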

We do not know what happens if more syllables are indicated than there is space for, or if syllables are encoded so that they overlap.

HS - Header syllable

Bit Name Description
31 ALS5 arithmetic-logic syllable 5 presence
30 ALS4 arithmetic-logic syllable 4 presence
29 ALS3 arithmetic-logic syllable 3 presence
28 ALS2 arithmetic-logic syllable 2 presence
27 ALS1 arithmetic-logic syllable 1 presence
26 ALS0 arithmetic-logic syllable 0 presence
25 ALES5 arithmetic-logic extension syllable 5 presence
24 ALES4 arithmetic-logic extension syllable 4 presence
23 ALES3 arithmetic-logic extension syllable 3 presence
22 ALES2 arithmetic-logic extension syllable 2 presence
21 ALES1 arithmetic-logic extension syllable 1 presence
20 ALES0 arithmetic-logic extension syllable 0 presence
19:18 PLS number of predicate logic syllables
17:16 CDS number of conditional execution syllables
15 CS1 control syllable 1 presence
14 CS0 control syllable 0 presence
13 set_mark
12 SS stub syllable presence
11 -- unused
10 loop_mode
9:7 nop
6:4 Length of the instruction minus 8 bytes, in multiples of 8 bytes: length = (value + 1) * 8 bytes
3:0 Number of words occupied by SS, ALS, CS, ALES2, ALES5 - called "middle pointer"
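As a reading aid, the header table can be turned into a field extractor. This Python sketch (hypothetical name `decode_hs`) only splits the 32-bit word along the bit ranges listed above; it does not interpret set_mark, loop_mode or nop.

```python
def decode_hs(hs):
    """Split a 32-bit header syllable into the fields of the table above."""
    return {
        "als": [i for i in range(6) if (hs >> (26 + i)) & 1],   # ALS0..ALS5
        "ales": [i for i in range(6) if (hs >> (20 + i)) & 1],  # ALES0..ALES5
        "pls_count": (hs >> 18) & 0x3,
        "cds_count": (hs >> 16) & 0x3,
        "cs1": bool((hs >> 15) & 1),
        "cs0": bool((hs >> 14) & 1),
        "set_mark": bool((hs >> 13) & 1),
        "ss": bool((hs >> 12) & 1),
        "loop_mode": bool((hs >> 10) & 1),
        "nop": (hs >> 7) & 0x7,
        # Bits 6:4 hold (length / 8) - 1.
        "length_bytes": (((hs >> 4) & 0x7) + 1) * 8,
        "middle_pointer": hs & 0xf,
    }
```

For instance, a header with bits 26, 28 and 12 set, 2 in bits 6:4 and 3 in bits 3:0 describes a 24-byte instruction with SS, ALS0 and ALS2 and a middle pointer of 3.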

SS - Stubs syllable

Stubs syllable format 1 - SF1
Bit Name Description
31:30 ipd instruction prefetch depth
29 eap end array prefetch
28 bap begin array prefetch
27 srp
26 vfdi
25 crp (?)
24 abgi
23 abgd
22 abnf
21 abnt
20 type type is 0 for SF1
19 abpf
18 abpt
17 alcf
16 alct
15 array access syllable 0 and 2 presence
14 array access syllable 0 and 3 presence
13 array access syllable 1 and 4 presence
12 array access syllable 1 and 5 presence
11:10 ctop ctpr number used in control transfer (ct) instructions
9 ?
8:0 ctcond condition code for control transfers (ct)
Stubs syllable format 2 - SF2
Bit Name Description
31:30 ipd instruction prefetch depth
29:28 encodes invts and flushts, see below
27 srp (?)
26 encodes invts and flushts, see below
25 crp (?)
20 type type is 1 for SF2
4:0 pred predicate number
(ss >> 27 & 6) | (ss >> 26 & 1) Description
2 invts
3 flushts
6 invts ? %predN
7 invts ? ~ %predN
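The selector expression in the first column can be evaluated directly. This Python sketch (hypothetical name `decode_sf2_ts`) checks the type bit, builds the 3-bit selector from bits 29:28 and 26, and renders the invts/flushts forms listed above. Returning None for the unlisted selector values is a choice made here.

```python
def decode_sf2_ts(ss):
    """Render the invts/flushts part of an SF2 stubs syllable, or None."""
    if ((ss >> 20) & 1) != 1:
        raise ValueError("type bit is 0: this is an SF1 stubs syllable")
    # Bits 29:28 land in selector bits 2:1, bit 26 in selector bit 0.
    sel = ((ss >> 27) & 6) | ((ss >> 26) & 1)
    pred = ss & 0x1f
    if sel == 2:
        return "invts"
    if sel == 3:
        return "flushts"
    if sel == 6:
        return "invts ? %%pred%d" % pred
    if sel == 7:
        return "invts ? ~ %%pred%d" % pred
    return None  # selector values not listed in the table
```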
ct condition codes

The condition code in the stubs syllable controls under which conditions a control transfer operation is executed.

Bit description
4:0 Predicate number (from pred0 to pred31)
8:5 Condition type
Type syntax description
0 -- never
1 always
2 ? %pred0 if predicate is true
3 ? ~ %pred0 if predicate is false
4 ? #LOOP_END
5 ? #NOT_LOOP_END
6 ? %pred0 || #LOOP_END
7 ? ~ %pred0 && #NOT_LOOP_END
8 (TODO, depends on syllable)
9 (TODO, depends on syllable)
10 (reserved)
11 (reserved)
12 (reserved)
13 (reserved)
14 ? ~ %pred0 || #LOOP_END
15 ? %pred0 && #NOT_LOOP_END
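The condition table maps naturally onto a lookup keyed by the type in bits 8:5, with the predicate number from bits 4:0 substituted for the %pred0 placeholder. In the Python sketch below (hypothetical name `format_ctcond`), type 0 (never) and the TODO/reserved types return None.

```python
# Assembler suffix per condition type (bits 8:5); {p} is the predicate.
_CT_SUFFIX = {
    1: "",                           # always: no condition suffix
    2: "? {p}",
    3: "? ~ {p}",
    4: "? #LOOP_END",
    5: "? #NOT_LOOP_END",
    6: "? {p} || #LOOP_END",
    7: "? ~ {p} && #NOT_LOOP_END",
    14: "? ~ {p} || #LOOP_END",
    15: "? {p} && #NOT_LOOP_END",
}


def format_ctcond(ctcond):
    """Render the 9-bit ctcond field as an assembler condition suffix.

    Returns None for type 0 (never) and for the TODO/reserved types.
    """
    pred = ctcond & 0x1f
    ctype = (ctcond >> 5) & 0xf
    template = _CT_SUFFIX.get(ctype)
    if template is None:
        return None
    return template.format(p="%%pred%d" % pred)
```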

#LOOP_END and #NOT_LOOP_END are sometimes spelled as %LOOP_END and %NOT_LOOP_END.

ALS - Arithmetic-logical syllables

Bit Description
31 Speculative mode
30:24 Opcode
23:16 Operand src1, or opcode extension
15:8 Operand src2
7:0 Operand src3, dst, or cmp opcode extension
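A field splitter for the ALS layout above is shown below as a Python sketch (hypothetical name `decode_als`). It returns raw bytes only, because whether a field is an operand or an opcode extension depends on the ALOPF. For example, the word 0x90808182 has the speculative bit set and opcode 0x10.

```python
def decode_als(als):
    """Split a 32-bit ALS into raw fields (their meaning depends on the ALOPF)."""
    return {
        "speculative": bool((als >> 31) & 1),
        "opcode": (als >> 24) & 0x7f,
        "src1": (als >> 16) & 0xff,   # or opcode extension
        "src2": (als >> 8) & 0xff,
        "src3_dst": als & 0xff,       # or src3 / cmp opcode extension
    }
```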

See chapter 'Arithmetic-logical operations' for more information on the operands.

ALES - Arithmetic-logical extension syllables

Bit Description
15:8 Opcode2
7:0 src3 (in ALEF1) or opcode extension 2 or cmp opcode extension (in ALEF2)

CS - Control syllables

CS0 and CS1 encode different operations.

Syllable pattern name description
CS0, CS1 0xxxxxxx set* setwd/setbn/setbp/settr
CS1 1xxxxxxx vfrpsz vfrpsz + setwd/setbn/setbp/settr
CS0 2xxxxxxx puttsd puttsd with a multiple-of-8 parameter relative to the start of the current instruction
CS1 200000xx setei
CS1 28000000 setsft
CS0, CS1 300000xx wait wait for specified kinds of operations to complete
CS0 4xxxxxxx disp prepare a relative jump in ctpr1
CS0 5xxxxxxx ldisp prepare an array prefetch program (?) in ctpr1
CS0 6xxxxxxx sdisp prepare a system call in ctpr1
CS0 70000000 return prepare to return from procedure in ctpr1
CS0 8xxxxxxx+ -- disp/ldisp/sdisp/return with ctpr2
CS0 cxxxxxxx+ -- disp/ldisp/sdisp/return with ctpr3
CS1 6xxxx000 setmas Set memory address specifier for load and store operations
set*

The set* operation sets several parameters related to register windows. Most bits are encoded in the CS0 syllable itself, but some are also read from the LTS0 syllable.

According to ldis, setwd is always performed, but settr, setbn, and setbp have to be enabled by setting the corresponding bits in CS0.

Syl. bit name description
CS1 28 enable vfrpsz
CS 27 enable settr
CS 26 enable setbn
CS 25 enable setbp
CS 22:18 setbp psz=x
CS 17:12 setbn rcur=x
CS 11:6 setbn rsz=x
CS 5:0 setbn rbs=x
LTS0 16:12 vfrpsz rpsz=x
LTS0 11:5 setwd wsz=x
LTS0 4 setwd nfx=x
LTS0 3 setwd dbl=x
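The bit table above can be read back into a list of operations. This Python sketch (hypothetical name `decode_set`) follows the ldis behaviour described above, with setwd always emitted and the others gated by their enable bits. The output text format is made up for illustration, and settr is reported without operands because they are not described here.

```python
def decode_set(cs, lts0, is_cs1):
    """List the register-window operations encoded by a set* syllable.

    `cs` is the control syllable, `lts0` the LTS0 word, and `is_cs1`
    says whether `cs` is CS1 (only there does bit 28 enable vfrpsz).
    """
    ops = []
    if is_cs1 and (cs >> 28) & 1:
        ops.append("vfrpsz rpsz=%d" % ((lts0 >> 12) & 0x1f))
    # According to ldis, setwd is always performed.
    ops.append("setwd wsz=%d, nfx=%d, dbl=%d"
               % ((lts0 >> 5) & 0x7f, (lts0 >> 4) & 1, (lts0 >> 3) & 1))
    if (cs >> 26) & 1:
        ops.append("setbn rsz=%d, rbs=%d, rcur=%d"
                   % ((cs >> 6) & 0x3f, cs & 0x3f, (cs >> 12) & 0x3f))
    if (cs >> 25) & 1:
        ops.append("setbp psz=%d" % ((cs >> 18) & 0x1f))
    if (cs >> 27) & 1:
        ops.append("settr")  # operands of settr are not described here
    return ops
```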
wait
Bit name description
5 ma_c wait for all previous memory access operations to complete
4 fl_c wait for all previous cache flush operations to complete
3 ls_c wait for all previous load operations to complete
2 st_c wait for all previous store operations to complete
1 all_e wait for all previous operations to issue all possible exceptions
0 all_c wait for all previous operations to complete
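Since each wait condition is a single bit, decoding is a mask test per bit. A minimal Python sketch (hypothetical names `WAIT_FLAGS`, `decode_wait`), taking the whole 300000xx syllable:

```python
WAIT_FLAGS = ["all_c", "all_e", "st_c", "ls_c", "fl_c", "ma_c"]  # bits 0..5


def decode_wait(cs):
    """Return the names of the wait conditions set in a wait syllable."""
    return [name for bit, name in enumerate(WAIT_FLAGS) if (cs >> bit) & 1]
```

For example, `decode_wait(0x30000021)` reports all_c and ma_c.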
disp/ldisp/sdisp/return

The disp operation prepares a jump to a different location by using one of the control transfer preparation registers (ctpr1 to ctpr3).

bit description
31:30 can be 1, 2, or 3 for ctpr1, ctpr2, or ctpr3 respectively
29:28 can be 0, 1, 2, or 3, for disp, ldisp, sdisp, or return respectively
27:0 offset or system call number

For disp and ldisp, the offset is relative to the start of the current instruction and is counted in multiples of eight bytes. For example, for an instruction at 0x1000 with CS0=40000042, the target is 0x1000 + 0x42 * 8 = 0x1210, which gives disp %ctpr1, 0x1210.

ldisp is only allowed with ctpr2.

For sdisp, the system call number is not shifted. CS0=6000001a is sdisp %ctpr1, 0x1a.

The return operation doesn't take an offset. The offset field should be zero in this case.
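The three fields above are enough to render these operations. The Python sketch below (hypothetical name `format_ctp`) reproduces the disp and sdisp examples. One assumption goes beyond the text: the 28-bit offset is treated as a signed two's-complement value so that backward jumps are possible. It also does not enforce the ctpr2 restriction on ldisp.

```python
OPS = ["disp", "ldisp", "sdisp", "return"]  # bits 29:28


def format_ctp(cs0, ip):
    """Render a CS0 control transfer preparation at instruction address `ip`."""
    ctpr = (cs0 >> 30) & 3
    if ctpr == 0:
        raise ValueError("bits 31:30 are 0: not disp/ldisp/sdisp/return")
    op = OPS[(cs0 >> 28) & 3]
    field = cs0 & 0x0fffffff
    if op == "return":
        return "return %%ctpr%d" % ctpr
    if op == "sdisp":
        # The system call number is used as-is, not shifted.
        return "sdisp %%ctpr%d, %#x" % (ctpr, field)
    # ASSUMPTION: the offset is a signed 28-bit count of 8-byte units.
    if field & 0x08000000:
        field -= 0x10000000
    return "%s %%ctpr%d, %#x" % (op, ctpr, ip + field * 8)
```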

setmas (setting the memory address specifier)

Memory address specifiers control multiple aspects of load and store operations. Their 7-bit format is described elsewhere.

The MAS can be independently specified for load and store operations, in CS1:

CS1 bits description
27:21 MAS for load operations
20:14 MAS for store operations
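Extracting both MAS values is two shifts and masks. A minimal Python sketch (hypothetical name `decode_setmas`) that also checks the 6xxxx000 pattern from the control syllable table:

```python
def decode_setmas(cs1):
    """Extract the two 7-bit MAS fields from a setmas CS1 (pattern 6xxxx000)."""
    if (cs1 >> 28) != 0x6 or cs1 & 0xfff:
        raise ValueError("not a setmas syllable")
    return {"load_mas": (cs1 >> 21) & 0x7f, "store_mas": (cs1 >> 14) & 0x7f}
```

For example, 0x60614000 sets the load MAS to 3 and the store MAS to 5.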

Array Prefetch Instructions

Array prefetch instructions are run asynchronously on the array access unit. They are always 16 bytes long. To assemble array prefetch instructions, the mnemonic fapb is used. To call an array prefetch program, load its address with ldisp to %ctpr2 (no need to call or ct). Even though array prefetch instructions should only ever be called by ldisp and are not processed using the same facilities as regular instructions, they always seem to be terminated by a regular branch instruction. The maximum length of an array prefetch program is 32 instructions.

Arithmetic-logical operations

ALU operations are generally identified by several aspects:

  • The opcode field in the ALS
  • If a corresponding ALES exists, the opcode2 field in the ALES
  • Opcode extension, opcode extension 2, and cmp opcode extension, depending on the opcode
  • The ALUs in which the operation can be performed. Sometimes the same opcode can mean different operations in different ALUs (numbered from 0 to 5)

The format of an arithmetic-logical operation (ALOPF) is determined by opcode, channel, and presence of an ALES. The presence and location of additional identifying criteria of an operation as well as operands depend on the ALOPF.

Other variations:

  • Some operations require two ALS
  • Some operations require a Memory Address Specifier (MAS) in CS1
  • Some operations have predicates. Some operations require additional data from CDS.
  • ALOPF1, ALOPF2, ALOPF3, ALOPF7, and ALOPF8 require no ALES; all others seem to require an ALES.

Operands and other fields

Field encoded in comment
opcode als[30:24]
opcode2 ales[15:8]
opcode extension als[23:16]
opcode extension 2 ales[7:0]
cmp opcode extension als[7:5] or ales[7:0]
src1 als[23:16] source operand 1
src2 als[15:8] source operand 2 - can encode access to literal syllables (LTS)
src3 als[7:0] or ales[7:0] source operand 3 - for ALOPF3 and ALOPF13 it is in ALS, for ALOPF21 it is in ALES
dst als[7:0], or als[4:0] for predicate registers destination register

src1 encoding

Pattern Range Description
0xxx xxxx 00-7f Rotatable area procedure stack register
10xx xxxx 80-bf procedure stack register
110x xxxx c0-df constant between 0 and 31
111x xxxx e0-ff global register
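A small formatter makes the src1 ranges concrete. This Python sketch (hypothetical name `format_src1`) uses the 32-bit register names. It assumes the register or constant number sits in the low bits of each range (7 bits for %b, 6 for %r, 5 for constants and %g), which the table implies but does not state.

```python
def format_src1(code):
    """Name the operand selected by an 8-bit src1 field (32-bit names).

    ASSUMPTION: the number is taken from the low bits of each range.
    """
    if code < 0x80:
        return "%%b[%d]" % code          # rotatable area register
    if code < 0xc0:
        return "%%r%d" % (code & 0x3f)   # procedure stack register
    if code < 0xe0:
        return str(code & 0x1f)          # constant 0..31
    return "%%g%d" % (code & 0x1f)       # global register
```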

src2 encoding

src2 that are not status register numbers are encoded as follows:

Pattern Range Description
0xxx xxxx 00-7f Rotatable area procedure stack register
10xx xxxx 80-bf procedure stack register
1100 xxxx c0-cf constant between 0 and 15
1101 000x d0-d1 reference to 16 bit literal semi-syllable, low half of LTS0 or LTS1
1101 010x d4-d5 reference to 16 bit literal semi-syllable, high half of LTS0 or LTS1
1101 10xx d8-db reference to 32 bit literal syllable LTS0, LTS1, LTS2, or LTS3
1101 11xx dc-de reference to 64 bit literal syllable pair LTS1:LTS0, LTS2:LTS1, or LTS3:LTS2
111x xxxx e0-ff global register

Literal half-syllables are sign-extended on access. Thus, values 0-0x7fff and 0xffff8000-0xffffffff (-0x8000 to -1) can be encoded in a literal half-syllable.
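The literal references and the sign extension can be checked with a Python sketch (hypothetical name `src2_literal`). It takes the LTS words as a list with lts[0] being LTS0 and returns an unsigned bit pattern. It assumes the first-named syllable of a pair such as LTS1:LTS0 supplies the high 32 bits. Register codes are rejected because they are not constants.

```python
def src2_literal(code, lts):
    """Return the constant selected by a src2 code as an unsigned bit pattern.

    `lts` lists the LTS words, lts[0] being LTS0. ASSUMPTION: in a pair
    such as LTS1:LTS0 the first named syllable holds the high 32 bits.
    """
    def sext16(half):
        # Sign-extend a 16-bit half-syllable to a 32-bit pattern.
        return (half - 0x10000 if half & 0x8000 else half) & 0xffffffff

    if 0xc0 <= code <= 0xcf:
        return code & 0xf                                   # constant 0..15
    if code in (0xd0, 0xd1):
        return sext16(lts[code - 0xd0] & 0xffff)            # low half
    if code in (0xd4, 0xd5):
        return sext16((lts[code - 0xd4] >> 16) & 0xffff)    # high half
    if 0xd8 <= code <= 0xdb:
        return lts[code - 0xd8] & 0xffffffff                # whole LTS
    if 0xdc <= code <= 0xde:
        i = code - 0xdc
        return ((lts[i + 1] & 0xffffffff) << 32) | (lts[i] & 0xffffffff)
    raise ValueError("src2 code %#x is not a constant or literal" % code)
```

For example, a low half of 0x8000 in LTS0 (code d0) reads as 0xffff8000, that is, -0x8000.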

src3 encoding

Pattern Range Description
0xxx xxxx 00-7f Rotatable area procedure stack register
10xx xxxx 80-bf procedure stack register
111x xxxx e0-ff global register

dst encoding

dst that are not predicate register numbers or status register numbers are encoded as follows:

Pattern Range Description
0xxx xxxx 00-7f Rotatable area procedure stack register
10xx xxxx 80-bf procedure stack register
1100 1101 cd %tst
1100 1110 ce %tc
1100 1111 cf %tcd
1101 0001 d1 %ctpr1
1101 0010 d2 %ctpr2
1101 0011 d3 %ctpr3
1101 1110 de %empty.lo
1101 1111 df %empty.hi
111x xxxx e0-ff global register
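The special destination codes fit a lookup table next to the register ranges. As with src1, this Python sketch (hypothetical names `SPECIAL_DST`, `format_dst`) uses 32-bit register names and assumes register numbers sit in the low bits. Codes not in the table are rejected.

```python
# Special destination codes from the table above.
SPECIAL_DST = {
    0xcd: "%tst", 0xce: "%tc", 0xcf: "%tcd",
    0xd1: "%ctpr1", 0xd2: "%ctpr2", 0xd3: "%ctpr3",
    0xde: "%empty.lo", 0xdf: "%empty.hi",
}


def format_dst(code):
    """Name a dst code that is neither a predicate nor a status register.

    ASSUMPTION: register numbers sit in the low bits, as for src1.
    """
    if code < 0x80:
        return "%%b[%d]" % code
    if code < 0xc0:
        return "%%r%d" % (code & 0x3f)
    if code >= 0xe0:
        return "%%g%d" % (code & 0x1f)
    if code in SPECIAL_DST:
        return SPECIAL_DST[code]
    raise ValueError("dst code %#x is not listed" % code)
```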

opcode2 values

Opcode2 Name
0x01 EXT
0x02 EXT1
0x03 EXT2
0x04 FLB
0x05 FLH
0x06 FLW
0x07 FLD
0x08 ICMB0
0x09 ICMB1
0x0a ICMB2
0x0b ICMB3
0x0c FCMB0
0x0d FCMB1
0x0e PFCMB0
0x0f PFCMB1
0x10 LCMBD0
0x11 LCMBD1
0x12 LCMBQ0
0x13 LCMBQ1
0x16 QPFCMB0
0x17 QPFCMB1

Arithmetic-logical operation formats (ALOPF)

Several operand formats are defined.

Format Has ALES? src1 src2 src3 dst opcode ext opcode ext 2 cmp opcode ext Example Comment
1 x x x adds, ld{b,h,w,d}
2 x x x movx, popcnts
3 x x als[7:0] st{b,h,w,d}
7 x x x als[7:5] cmposb dst is a predicate register
8 x x als[7:5] cctopo dst is a predicate register
11 x x x x x muls
11 (with literal) x x x x psllqh These opcodes require a literal in ales[7:0]
12 x x x x x fsqrts Opcode pshufh is special as it requires a literal in ales[7:0].
13 x x x als[7:0] x stq
15 x x x x rws, rwd dst is a status register; opcode2 is EXT; opcode extension 2 is 0xc0
16 x x x x rrs, rrd src2 is a status register; opcode2 is EXT; opcode extension 2 is 0xc0
17 x x x x ales[7:0] pcmpeqbop dst is a predicate register; opcode2 is EXT1
21 x x x ales[7:0] x incs_fb
22 x x x x x movtq opcode2 is EXT; ALES opcode extension is 0xc0

For the locations of operands where none is explicitly specified here, see table 'Operands and other fields'.

TODO: ALOPF5, ALOPF6, ALOPF9, ALOPF10, ALOPF19

NOTE: ALOPF9 and ALOPF10 have a 16 bit opcode extension

List of operations

The following tables are grouped by opcode2 and sorted by opcode.

Short operations (without ALES)

Opcode ALUs name ALS[23:16] ALS[15:8] ALS[7:0] data width description
0x00 all ands src1 src2 dst 32 bits Compute bit-wise AND of src1 and src2, store result in dst
0x01 all andd src1 src2 dst 64 bits Compute bit-wise AND of src1 and src2, store result in dst
0x10 all adds src1 src2 dst 32 bits Compute the sum of src1 and src2, store result in dst
0x11 all addd src1 src2 dst 64 bits Compute the sum of src1 and src2, store result in dst
0x24 25 stb src1 src2 src3 8 bits store 8-bit value from src3 to address at src1+src2
0x25 25 sth src1 src2 src3 16 bits store 16-bit value from src3 to address at src1+src2
0x26 25 stw src1 src2 src3 32 bits store 32-bit value from src3 to address at src1+src2
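0x26 0134 bitrevs 0xc0 src2 dst 32 bits Reverse the order of the bits in src2, store result in dst
0x27 25 std src1 src2 src3 64 bits store 64-bit value from src3 to address at src1+src2
0x27 0134 bitrevd 0xc0 src2 dst 64 bits Reverse the order of the bits in src2, store result in dst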
0x64 0235 ldb src1 src2 dst 8 bits load 8-bit value from address at src1+src2, store into dst
0x65 0235 ldh src1 src2 dst 16 bits load 16-bit value from address at src1+src2, store into dst
0x66 0235 ldw src1 src2 dst 32 bits load 32-bit value from address at src1+src2, store into dst
0x67 0235 ldd src1 src2 dst 64 bits load 64-bit value from address at src1+src2, store into dst
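Because the same opcode can name different operations in different ALUs (0x26 is stw in ALUs 2 and 5 but bitrevs in ALUs 0, 1, 3 and 4), a decoder has to key on both opcode and channel. The Python sketch below (hypothetical names `SHORT_OPS`, `short_op_name`) transcribes the table above. It does not check the 0xc0 opcode extension that bitrevs/bitrevd require in src1.

```python
ALL = "012345"
# (mnemonic, ALU channels) per opcode, transcribed from the table above.
SHORT_OPS = {
    0x00: [("ands", ALL)],
    0x01: [("andd", ALL)],
    0x10: [("adds", ALL)],
    0x11: [("addd", ALL)],
    0x24: [("stb", "25")],
    0x25: [("sth", "25")],
    0x26: [("stw", "25"), ("bitrevs", "0134")],
    0x27: [("std", "25"), ("bitrevd", "0134")],
    0x64: [("ldb", "0235")],
    0x65: [("ldh", "0235")],
    0x66: [("ldw", "0235")],
    0x67: [("ldd", "0235")],
}


def short_op_name(opcode, alu):
    """Mnemonic of a short (ALES-less) operation in ALU channel `alu`, or None."""
    for name, channels in SHORT_OPS.get(opcode, []):
        if str(alu) in channels:
            return name
    return None
```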

EXT (opcode2 = 1)

Opcode ALUs name ALS[23:16] ALS[15:8] ALS[7:0] ALES[7:0] data width description
0x58 0 getsp 0xec src2 dst unused 32 -> 64 Add src2 to user stack pointer, store in user stack pointer and dst