#Old Style Intel 8080 Architecture
####Data Path Units – Write a paragraph describing the Data Path Unit architecture for each processor. How do the DPUs compare one to another?
The Intel 8080 is mainly an 8-bit processor, however, it can do some 16-bit operations. The 8080 has seven 8-bit registers, where six of them are sometimes combined to form 3 16-bit registers. It has a 16-bit stack pointer and a 16-bit program counter. The 8080 has 5 internal flag bits: Sign, Zero, Parity, Carry and Auxiliary carry. The 8080 encoded all instructions in a single bype (including register-numbers, but excluding immediates) for simplicity. Sometimes instructions are followed by a few bytes for an immediate operand, memory address, or port number.
Strengths:
- hailed as the first truly usable microprocessor
- impressive architecture that set the groundwork for the x86 series
Weaknesses:
- required +5 V, −5 V and +12 V supplies
####Instruction Format Comparisons – Pick two instructions: Pick a branch instruction and one other instruction. Compare the assembly and machine code format for both of these instructions among all 4 processors.
- beq instruction (J-type):
- assembly: JNZ HHHLLL
- machine: C2 LL HH
- operation: Jump if not zero/equal to LLL,HHH
- add instruction (R-type):
- assembly: ANA B
- machine: A0
- operation: [A]+[B]
#Modern General Purpose MIPS Architecture
####Data Path Units – Write a paragraph describing the Data Path Unit architecture for each processor. How do the DPUs compare one to another?
Major Components:
- PC (program counter)
- Instruction Memory
- Register File
- ALU (arithmetic and logic unit)
- memory
- Control Unit
A general MIPS architecture contains a program counter and 32 registers. The datapath operates on words of data and uses a 32-bit path. The processor starts by fetching instruction from memory. The instruction is then sent to the register file, in which bits 25-21 are connected to the address input of one of the register file read ports. Instruction bits 15 down to 0 are sign-extended (or zero-extended by later implementation and depending on the instruction type) to 32 bits. An ALU is ready to receive two operands, a SrcA, from the register file, and SrcB, which comes from the sign-extended immediate. An ALU Control determines the operation to be used. The outcome of this ALU is ALUResult, is then sent to the Data Memory. From here, the data is read from data memory onto a ReadData bus, which is written back to the destination register in the register file. During this process and cycling, the processor must compute the address of the next instruction, thus another adder is implemented to increment the program counter by 4. Another bus known as WriteData, is also sent to Data Memory if there is a store operation. As a result, ReadData is ignored. Adding basic instructions such as add, sub, an, or, and slt require datapath enhancements involving more muxes to determine which type of instruction is being used and which should be entered into the ALU. A memory to register mux is also available to determine whether or not ReadData or ALUResult is active. Other instructions yet again, such as branch and jump require additional datapath enhancements as well. A control unit computes the control signals based on the opcode and funct fields of the instruction bits. Based on the instruction specified, this CU controls the muxes within the datapath. Within the control unit contains the main decoder and ALU decoder.
Strengths:
- Easy to implement and contains simple control unit.
- Completes operation in one cycle and does not require any nonarchitectural state.
- Exectures an entire instruction in one cycle
Weaknesses:
- Requires long clock cycle to support slowest instruction (lw - load word)
- Requires three adders (expensive)
- Separate instruction and data memoreis (unrealistic)
####Instruction Format Comparisons – Pick two instructions: Pick a branch instruction and one other instruction. Compare the assembly and machine code format for both of these instructions among all 4 processors.
- beq instruction (J-type):
- assembly: beq rs, rt, label
- machine: [opcode (6 bits)][RS (5 bits)][RT (5 bits)][immediate (16 bits)]
- operation: if ([rs]==[rt]) PC=JTA (jump target address)
- add instruction (R-type):
- assembly: add rd, rs, rt
- machine: [op (6 bits)][rs (5 bits)][rt (5 bits)][rd (5 bits)][shamt (5 bits)][funct (6 bits)] op, shamt = 0, funct = 100000
- operation: [rd]=[rs]+[rt]
#Modern Application Specific Nvidia Fermi Architecture
####Data Path Units – Write a paragraph describing the Data Path Unit architecture for each processor. How do the DPUs compare one to another?
Fermi is a GPU architecture created by Nvidia and was used as the primary microarchitecture for their GeForce 400 & GeForce 500 series of GPU’s. From what can be gathered from a multitude of resources across the internet that all communicate similarly vague information on the DPU of Fermi microarchitecture, the DPU is as follows: instructions are loaded onto an instruction cache, where they are then passed on to two warp schedulers. From the warp schedulers, a dispatch unit sends the information to a register file, where the data can then flow to four options: either block of 16 cores, a single block of 16 ld/store units (load/store), or to a single block of special function units. Data can then be sent to shared memory.
Strengths:
- Improved double-precision performance
- ECC surpport (allows GPU computing users to safely deploy large numbers of GPUs in datacenter installations, and also ensure data-sensitive applications like medical imaging and financial options pricing are protected from memory errors)
- Streaming Multiprocessor
- High-performance cores
Weaknesses:
- High-power consumption (244 Watts) compared to Kepler (other microarchitecture used in GPU’s
- Loud
- Doesn’t permit double-precision instructions to be paired with other instructions
####Instruction Format Comparisons – Pick two instructions: Pick a branch instruction and one other instruction. Compare the assembly and machine code format for both of these instructions among all 4 processors.
-
BRA instruction (beq comparison):
- instruction usage: BRA(.U)(.LMT) const mem addr/(-)0xabcd/!labelName;
- machine code: 1110 011110 1110 000000 000000 000000000000000000000000 00000000 000010
-
IADD instruction (add comparison):
- instruction usage: IADD(.SAT)(.X) reg0, (-)reg1, (-)composite operand;
- machine code: 1100 000000 1110 000000 000000 0000000000000000000000 0000000000 010010
#ARM
####Data Path Units - Write a paragraph describing the Data Path Unit architecture for each processor. How do the DPUs compare one to another?
Family of instruction set architectures based on RISC architecture. This kind of processor requires fewer transistors than the processors of most personal computers. This allows the device to be light, portable, and battery powered. For this reason, ARM processors are used in most mobile devices. These devices range from iPods, Blackberry, Game Boy Advanced and most mobile phones. ARM is the primary hardware environment devices on the iOS, Android, Windows Phone, and Blackberry operating systems. Today, 95% of smartphones use chips based on ARM processors.
ARM has 37 total registers. 6 32-bit status registers and 31 32-bit general purpose registers. Of the general purpose registers, only 16 are visible at any one time. The other registers are used for exception processing. Almost every ARM instruction has a conditional execution feature called the predicate. This is a 4 bit code which indicates under what circumstances this instruction is to be executed.
https://github.com/michaellindahl/bugfree-adventure/blob/master/ARM.png
Strengths:
- Fewer transitors provide several benefites
- Low power consumption
- Light weight
- Most popular 32-bit architecture
- Best MIPS to Watts and MIPS to $ ratios
- Pipelined
Weaknesses:
- Cannot execute x86 code without conversion
- Generally less code density than CISC architectures
####Instruction Format Comparisons – Pick two instructions: Pick a branch instruction and one other instruction. Compare the assembly and machine code format for both of these instructions among all 4 processors.
-
Add:
- Machine code: Condition OP code Rn Rd Other info Rm
- Rn, Rd, Rm are 4 bit registers
- OP code is 4 bit
- Condition is the 4-bit predicate
- Op code: 0100
- example in assembly language ADD R0, R0, R3
-
Branch:
- 24-bit signed word offset allows for branch distance of 32 MB
- Machine code: Conditional Op Code Offset-24 bit
- example in assembly language BGT LOOP (<-- jump address)