This document is a literate Cryptol document. This means that if you install
Cryptol from the website you can run cryptol ZUC1_6.md
in your terminal and all of the
definitions will be typecheck, and the test cases can be run.
All text in this document is directly from the ZUC 1.6 specification.
ZUC is a word-oriented stream cipher. It takes a 128-bit initial key and a 128-bit initial vector (IV) as input, and outputs a keystream of 32-bit words (where each 32-bit word is hence called a key-word). This keystream can be used for encryption/decryption.
The execution of ZUC has two stages: initialization stage and working stage. In the first stage, a key/IV initialization is performed, i.e., the cipher is clocked without producing output (see section 3.6.1). The second stage is a working stage. In this stage, with every clock pulse, it produces a 32-bit word of output (see section 3.6.2).
module Primitive::Symmetric::Cipher::Stream::ZUC1_6 where
In this document, integers are represented as decimal numbers unless specified otherwise. We use the
prefix 0x
to indicate hexadecimal numbers, and the prefix 0b
to indicate a number in binary
representation. [Note: the original document uses the subscript 2
to represent binary numbers].
Example 1: Integer a can be written in different representations:
property example1 = a == 0x499602D2 // hexadecimal representation
/\ a == 0b01001001100101100000001011010010 // binary representation
where
a = 1234567890 // decimal representation
In this document, all data variables are presented with the most significant bit(byte) on the left hand side and the least significant bit(byte) on the right hand side.
Example 2:
property example2 = most_sig == take`{1} a // the leftmost bit
/\ least_sig == (a ! 0) // the rightmost bit
where
a = 0b1001001100101100000001011010010
most_sig = 1
least_sig = 0
+ |
The addition of two integers |
ab |
The product of integers a and b |
= |
The assignment operator |
mod |
The modulo operator of integers |
⊕ |
The bit-wise exclusive-OR operation of integers |
⊞ |
The modulo 2^32 addition |
a||b |
The concatenation of strings a and b |
a_H |
The leftmost 16 bits of integer a |
a_L |
The rightmost 16 bits of integer a |
a <<<_n k |
The k-bit cyclic shift of the n bit register a to the left |
a >> 1 |
The 1-bit right shift of integer a |
(a1,a2,...an) → (b1,b2,...bn) |
The assignment of the values of ai to bi in parallel |
Example 3: For any two strings a and b, the presentation of the string c created by the concatenation of a and b also follows the rules defined in section 2.2 i.e., the most significant digits are on the left hand side and the least significant digits are on the right hand side. For instance,
property example3 = a # b == 0x12345678
where
a = 0x1234
b = 0x5678
Example 4:
H : {n} (fin n, n >= 16)
=> [n] -> [16]
H a = take`{16} a
L : {n} (fin n, n >= 16)
=> [n] -> [16]
L a = drop`{back=16} a
property example4 = ah == H a
/\ al == L a
where
a = 0b1001001100101100000001011010010
ah = 0b1001001100101100
al = 0b0000001011010010
Example 5:
property example5 = a >> 1 == 0b01100100110010110000000101101001
where
a = 0b11001001100101100000001011010010
Example 6: Let a0, a1, ... a15, b0, b1, ... b15 be all integer variables. Then
example6 : [16]Integer -> Bit
property example6 as = and [ as@i == bs@i | i <- [0..15] ]
where
bs = as
ZUC has three logical layers, see Fig. 1 in the original document. The top layer is a linear feedback shift register (LFSR) of 16 stages, the middle layer is for bit-reorganization (BR), and the bottom layer is a nonlinear function F.
The linear feedback shift register (LFSR) has 16 of 31-bit cells (s0,s1,...,s15). Each cell si
(0<=i<=15) is restricted to take values from the following set:
{1,2,3,...,2^31-1}
type LFSR = [16][31]
The LFSR has 2 modes of operations: the initialization mode and the working mode.
In the initialization mode, the LFSR receives a 31-bit inputs word u, which is obtained by
removing the rightmost bit from the 32-bit output W of the nonlinear function F, i.e., u=W>>1
.
More specifically, the initialization mode works as follows:
LFSRWithInitialisationMode : [31] -> LFSR -> LFSR
LFSRWithInitialisationMode u s = s@@[1..15] # [s16] //step4
where
v = step1 s //step1
vu = add v u
s16 = if vu != 0 then vu //step2
else 2^^31 - 1 // step3
In the working mode, the LFSR does not receive any input, and it works as follows:
LFSRWithWorkMode : LFSR -> LFSR
LFSRWithWorkMode s = s@@[1..15] # [s16] // step3
where
v = step1 s //step1
s16 = if v != 0 then v // step1
else 2^^31 - 1 // step2
Informative note: Since the multiplication of a 31-bit string s by 2^i over GF(2^31-1) can be implemented by a cyclic shift of s to the left by i bits, only addition modulo 2^31-1 is needed in step 1 of the above function. More precisely, step 1 of the function LFSRWithInitialisationMode can be implemented by
step1 : LFSR -> [31]
step1 s = v
where
v = foldl add 0 [ ((s@15) <<< 15)
, ((s@13) <<< 17)
, ((s@10) <<< 21)
, ((s@4) <<< 20)
, ((s@0) <<< 8)
, (s@0) ]
and the same implementation is needed for step 1 of the function LFSRWithWorkMode.
Informative note: For two elements a, b over GF(2^31-1), the computation of v=a+b mod (2^31-1)
can be done by (1) compute v=a+b
; and (2) if the carry bit is 1, then set v=v+1
. Alternatively,
(and better if the implementation should resist possible timing attacks): (1) compute w=a+b
,
where w is a 32-bit value; and (2) set v = (least significant 31 bits of w)+(most significant bit of w)
.
add : [31] -> [31] -> [31]
add a b = v'
where
v : [32]
v = ((zero # a) + (zero # b))
v' = if v@0 then (drop v) + 1
else drop v
addAlt : [31] -> [31] -> [31]
addAlt a b = v
where
w : [32]
w = (zero # a) + (zero # b)
v = (drop w) + (zero # [w@0])
property addEquiv a b = add a b == addAlt a b
The middle layer of the algorithm is the bit-reorganization. It extracts 128 bits from the cells of the LFSR and forms 4 of 32-bit words, where the first three words will be used by the nonlinear function F in the bottom layer, and the last word will be involved in producing the keystream,
Let s0,s2,s5,s7,s9,s11,s14,s15 by 8 cells of LFSR as in section 3.2. The the bit-reorganization forms 4 of 32-bit words X0,X1,X2,X3 from the above cells as follows:
Bitreorganization : LFSR -> [4][32]
Bitreorganization s = [X0, X1, X2, X3]
where
X0 = H (s@15) # L (s@14)
X1 = L (s@11) # H (s@9)
X2 = L (s@7) # H (s@5)
X3 = L (s@2) # H (s@0)
Note: That s_i are 31-bit integers, so s_iH means bits 30..15 and not 31..16 of s_i, for 0<=i<=15.
The nonlinear function F has 2 of 32-bit memory cells R1 and R2. Let the inputs to F be X0,X1 and X2, which come from the outputs of the bit-reorganization (see section 3.3), then the function F outputs a 32-bit word W. The detailed process of F is as follows:
F : [2][32] -> [3][32] -> ([32], [2][32])
F [R1, R2] [X0, X1, X2] = (W, [R1', R2'])
where
W = (X0 ^ R1) + R2
W1 = R1 + X1
W2 = R2 ^ X2
R1' = S (L1 (L W1 # H W2))
R2' = S (L2 (L W2 # H W1))
where S is a 32x32 S-box, see section 3.4.1, L1 and L2 are linear transforms as defined in section 3.4.2.
The 32x32 S-box S is composed of 4 juxtaposed 8x8 S-boxes, i.e., S=(S0,S1,S2,S3), where S0=S2,S1=S3. The definitions of S0 and S1 can be found in table 3.1 and table 3.2 respectively.
S0 : [8] -> [8]
S0 x = S0Table@h@l
where
[h, l] = split x
S1: [8] -> [8]
S1 x = S1Table@h@l
where
[h, l] = split x
S2 : [8] -> [8]
S2 = S0
S3 : [8] -> [8]
S3 = S1
Let x be an 8-bit input to S0 (or S1). Write x into two hexadecimal digits as x=h||l
, then
the entry at the intersection of the h-th row and the l-th column in table 3.1 (or table 3.2) is
the output of S0 (or S1).
Example 7:
property example7 = S0 0x12 == 0xF9
/\ S1 0x34 == 0xC0
Let the 32-bit input X and the 32-bit output Y of the S-box S be as follows:
S : [32] -> [32]
S X = y0 # y1 # y2 # y3
where
[x0, x1, x2, x3] = split X
y0 = S0 x0
y1 = S1 x1
y2 = S2 x2
y3 = S3 x3
Example 8:
property example8 = S X == Y
where
X = 0x12345678
Y = 0xF9C05A4E
Table 3.1. The S-box S0
S0Table : [16][16][8]
S0Table =
[ [0x3E, 0x72, 0x5B, 0x47, 0xCA, 0xE0, 0x00, 0x33, 0x04, 0xD1, 0x54, 0x98, 0x09, 0xB9, 0x6D, 0xCB]
, [0x7B, 0x1B, 0xF9, 0x32, 0xAF, 0x9D, 0x6A, 0xA5, 0xB8, 0x2D, 0xFC, 0x1D, 0x08, 0x53, 0x03, 0x90]
, [0x4D, 0x4E, 0x84, 0x99, 0xE4, 0xCE, 0xD9, 0x91, 0xDD, 0xB6, 0x85, 0x48, 0x8B, 0x29, 0x6E, 0xAC]
, [0xCD, 0xC1, 0xF8, 0x1E, 0x73, 0x43, 0x69, 0xC6, 0xB5, 0xBD, 0xFD, 0x39, 0x63, 0x20, 0xD4, 0x38]
, [0x76, 0x7D, 0xB2, 0xA7, 0xCF, 0xED, 0x57, 0xC5, 0xF3, 0x2C, 0xBB, 0x14, 0x21, 0x06, 0x55, 0x9B]
, [0xE3, 0xEF, 0x5E, 0x31, 0x4F, 0x7F, 0x5A, 0xA4, 0x0D, 0x82, 0x51, 0x49, 0x5F, 0xBA, 0x58, 0x1C]
, [0x4A, 0x16, 0xD5, 0x17, 0xA8, 0x92, 0x24, 0x1F, 0x8C, 0xFF, 0xD8, 0xAE, 0x2E, 0x01, 0xD3, 0xAD]
, [0x3B, 0x4B, 0xDA, 0x46, 0xEB, 0xC9, 0xDE, 0x9A, 0x8F, 0x87, 0xD7, 0x3A, 0x80, 0x6F, 0x2F, 0xC8]
, [0xB1, 0xB4, 0x37, 0xF7, 0x0A, 0x22, 0x13, 0x28, 0x7C, 0xCC, 0x3C, 0x89, 0xC7, 0xC3, 0x96, 0x56]
, [0x07, 0xBF, 0x7E, 0xF0, 0x0B, 0x2B, 0x97, 0x52, 0x35, 0x41, 0x79, 0x61, 0xA6, 0x4C, 0x10, 0xFE]
, [0xBC, 0x26, 0x95, 0x88, 0x8A, 0xB0, 0xA3, 0xFB, 0xC0, 0x18, 0x94, 0xF2, 0xE1, 0xE5, 0xE9, 0x5D]
, [0xD0, 0xDC, 0x11, 0x66, 0x64, 0x5C, 0xEC, 0x59, 0x42, 0x75, 0x12, 0xF5, 0x74, 0x9C, 0xAA, 0x23]
, [0x0E, 0x86, 0xAB, 0xBE, 0x2A, 0x02, 0xE7, 0x67, 0xE6, 0x44, 0xA2, 0x6C, 0xC2, 0x93, 0x9F, 0xF1]
, [0xF6, 0xFA, 0x36, 0xD2, 0x50, 0x68, 0x9E, 0x62, 0x71, 0x15, 0x3D, 0xD6, 0x40, 0xC4, 0xE2, 0x0F]
, [0x8E, 0x83, 0x77, 0x6B, 0x25, 0x05, 0x3F, 0x0C, 0x30, 0xEA, 0x70, 0xB7, 0xA1, 0xE8, 0xA9, 0x65]
, [0x8D, 0x27, 0x1A, 0xDB, 0x81, 0xB3, 0xA0, 0xF4, 0x45, 0x7A, 0x19, 0xDF, 0xEE, 0x78, 0x34, 0x60]
]
Table 3.2. The S-box S1
S1Table : [16][16][8]
S1Table =
[ [0x55, 0xC2, 0x63, 0x71, 0x3B, 0xC8, 0x47, 0x86, 0x9F, 0x3C, 0xDA, 0x5B, 0x29, 0xAA, 0xFD, 0x77]
, [0x8C, 0xC5, 0x94, 0x0C, 0xA6, 0x1A, 0x13, 0x00, 0xE3, 0xA8, 0x16, 0x72, 0x40, 0xF9, 0xF8, 0x42]
, [0x44, 0x26, 0x68, 0x96, 0x81, 0xD9, 0x45, 0x3E, 0x10, 0x76, 0xC6, 0xA7, 0x8B, 0x39, 0x43, 0xE1]
, [0x3A, 0xB5, 0x56, 0x2A, 0xC0, 0x6D, 0xB3, 0x05, 0x22, 0x66, 0xBF, 0xDC, 0x0B, 0xFA, 0x62, 0x48]
, [0xDD, 0x20, 0x11, 0x06, 0x36, 0xC9, 0xC1, 0xCF, 0xF6, 0x27, 0x52, 0xBB, 0x69, 0xF5, 0xD4, 0x87]
, [0x7F, 0x84, 0x4C, 0xD2, 0x9C, 0x57, 0xA4, 0xBC, 0x4F, 0x9A, 0xDF, 0xFE, 0xD6, 0x8D, 0x7A, 0xEB]
, [0x2B, 0x53, 0xD8, 0x5C, 0xA1, 0x14, 0x17, 0xFB, 0x23, 0xD5, 0x7D, 0x30, 0x67, 0x73, 0x08, 0x09]
, [0xEE, 0xB7, 0x70, 0x3F, 0x61, 0xB2, 0x19, 0x8E, 0x4E, 0xE5, 0x4B, 0x93, 0x8F, 0x5D, 0xDB, 0xA9]
, [0xAD, 0xF1, 0xAE, 0x2E, 0xCB, 0x0D, 0xFC, 0xF4, 0x2D, 0x46, 0x6E, 0x1D, 0x97, 0xE8, 0xD1, 0xE9]
, [0x4D, 0x37, 0xA5, 0x75, 0x5E, 0x83, 0x9E, 0xAB, 0x82, 0x9D, 0xB9, 0x1C, 0xE0, 0xCD, 0x49, 0x89]
, [0x01, 0xB6, 0xBD, 0x58, 0x24, 0xA2, 0x5F, 0x38, 0x78, 0x99, 0x15, 0x90, 0x50, 0xB8, 0x95, 0xE4]
, [0xD0, 0x91, 0xC7, 0xCE, 0xED, 0x0F, 0xB4, 0x6F, 0xA0, 0xCC, 0xF0, 0x02, 0x4A, 0x79, 0xC3, 0xDE]
, [0xA3, 0xEF, 0xEA, 0x51, 0xE6, 0x6B, 0x18, 0xEC, 0x1B, 0x2C, 0x80, 0xF7, 0x74, 0xE7, 0xFF, 0x21]
, [0x5A, 0x6A, 0x54, 0x1E, 0x41, 0x31, 0x92, 0x35, 0xC4, 0x33, 0x07, 0x0A, 0xBA, 0x7E, 0x0E, 0x34]
, [0x88, 0xB1, 0x98, 0x7C, 0xF3, 0x3D, 0x60, 0x6C, 0x7B, 0xCA, 0xD3, 0x1F, 0x32, 0x65, 0x04, 0x28]
, [0x64, 0xBE, 0x85, 0x9B, 0x2F, 0x59, 0x8A, 0xD7, 0xB0, 0x25, 0xAC, 0xAF, 0x12, 0x03, 0xE2, 0xF2]
]
Both L1 and L2 are linear transforms from 32-bit words to 32-bit words, and are defined as follows:
L1 : [32] -> [32]
L1 X = X ^ (X <<< 2) ^ (X <<< 10) ^ (X <<< 18) ^ (X <<< 24)
L2 : [32] -> [32]
L2 X = X ^ (X <<< 8) ^ (X <<< 14) ^ (X <<< 22) ^ (X <<< 30)
The key loading procedure will expand the initial key and the initial vector into 16 of 31-bit integers as the initial state of the LFSR. Let the 128-bit initial key k and the 128-bit initial vector iv be
k=k0||k1||k2||...||k15
and
iv=iv0||iv1||iv2||...||iv15
respectively, where k_i and iv_i, 0<=i<=15, are all bytes. Then k and iv are loaded to the cells s0,s1,...,s15 of LFSR as follows:
D : [16][15]
D = [ 0b100010011010111, 0b010011010111100, 0b110001001101011, 0b001001101011110
, 0b101011110001001, 0b011010111100010, 0b111000100110101, 0b000100110101111
, 0b100110101111000, 0b010111100010011, 0b110101111000100, 0b001101011110001
, 0b101111000100110, 0b011110001001101, 0b111100010011010, 0b100011110101100
]
KeyLoad : [128] -> [128] -> LFSR
KeyLoad k iv = s
where
k' = split`{16} k
iv' = split`{16} iv
s = [ ki # di # ivi | ki <- k' | di <- D | ivi <- iv' ]
The execution of ZUC has two stages: the initialization stage and the working stage.
ZUC : [128] -> [128] -> [inf][32]
ZUC key iv = WorkStage initZuc
where
initZuc = InitializeStage key iv
During the initialization stage, the algorithm calls the key loading procedure (see section 3.5) to load the 128-bit initial key k and the 128-bit initial vector iv into the LFSR, and set the 32-bit memory cells R1 and R2 to be all 0. The the cipher runs the following operations 32 times:
InitializeStage : [128] -> [128] -> (LFSR, [2][32])
InitializeStage k iv = last s'
where
s = KeyLoad k iv
R0 = zero : [2][32]
s' = [ (s, R0)] # [ InitializeStep si Ri | (si, Ri) <- s' | i <- [1..32] ]
InitializeStep : LFSR -> [2][32] -> (LFSR, [2][32])
InitializeStep s Rs = (s', Rs')
where
[X0, X1, X2, _] = Bitreorganization s
(w, Rs') = F Rs [X0, X1, X2]
s' = LFSRWithInitialisationMode (drop (w >> 1)) s
After the initialization stage, the algorithm moves in the working stage. At the working stage, the algorithm executes the following operations once, and discards the output W of F:
WorkStep1 : LFSR -> [2][32] -> (LFSR, [2][32])
WorkStep1 s Rs = (s', Rs')
where
[X0, X1, X2, _] = Bitreorganization s
(_, Rs') = F Rs [X0, X1, X2]
s' = LFSRWithWorkMode s
Then the algorithm goes into the stage of producing keystream, i.e., for each iteration, the following operations are executed once, and a 32-bit word Z is produced as an output:
WorkStep2 : LFSR -> [2][32] -> ([32], LFSR, [2][32])
WorkStep2 s Rs = (Z, s', Rs')
where
[X0, X1, X2, X3] = Bitreorganization s
(w, Rs') = F Rs [X0, X1, X2]
Z = w ^ X3
s' = LFSRWithWorkMode s
WorkStage : (LFSR, [2][32]) -> [inf][32]
WorkStage (s, Rs) = zs
where
(s', Rs') = WorkStep1 s Rs
z0 = zero : [32]
keystream = [ (z0, s', Rs') ]
# [ WorkStep2 si Ri | (_, si, Ri) <- keystream | i <- [1...]]
zs = [ (keystream@i).0 | i <- [1...] ]