Victor edited this page Dec 19, 2022 · 22 revisions

Clock speed

  • NTSC: 23.01136 MHz
  • PAL: 22.801467 MHz

ABI

The SuperH ABI passes the first four parameters in r4 to r7 and any further parameters on the stack. Registers r8 to r14, pr, macl and mach MUST be preserved if you use them, normally by pushing them on the stack (r15 is the stack pointer). r0 to r7 may be changed freely without saving. A result of 4 bytes or less is returned in r0; a result larger than 4 bytes and up to 8 bytes is returned in r0 and r1.
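As a rough illustration of the rules above, here is a minimal C sketch. The function names are made up for this example, and the register assignments in the comments are only what the ABI description implies for an SH-2 compiler; on any other host the code simply computes the same values.

```c
#include <stdint.h>

/* Hypothetical example function with six parameters.
 * Under the ABI described above, an SH-2 compiler would pass
 *   a, b, c, d  in r4, r5, r6, r7
 *   e, f        on the stack
 * and return the 4-byte result in r0. */
int32_t sum6(int32_t a, int32_t b, int32_t c, int32_t d, int32_t e, int32_t f)
{
    return a + b + c + d + e + f;
}

/* Hypothetical example function with an 8-byte result.
 * Under the ABI described above, the result comes back in r0 and r1. */
int64_t widen_mul(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}
```

Keeping hot functions to four parameters or fewer avoids the stack traffic for the fifth and later arguments.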

The SH-2 and 32X hardware is all big-endian.

Clock cycles

The SH2 has a five-stage pipeline: each instruction takes (with a few exceptions) five cycles to complete. However, a new instruction can enter the pipe on every cycle, so after the five cycles needed for the first instruction, each further instruction completes one cycle after the previous one, for an effective cycle count of one. Conditional branching can cause the pipe to be flushed, which costs four more cycles. See the pipeline section of the Hitachi SH2 Programming Manual (section 7) for details. In general, though, you can count most instructions as one cycle long, as long as the code is cached and makes no outside memory fetches/stores.
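The rule of thumb above can be written down as a back-of-the-envelope estimate. This is only a sketch under the simplifications stated in the text (every instruction is a one-cycle instruction, code is cached, no outside memory access); the function name is made up for this example.

```c
#include <stdint.h>

enum {
    SH2_PIPELINE_STAGES = 5, /* cycles for the first instruction to complete */
    SH2_FLUSH_PENALTY   = 4  /* extra cycles when a branch flushes the pipe */
};

/* Rough cycle estimate for a straight run of instructions.
 * instructions: number of instructions executed
 * flushes:      number of times the pipe was flushed by a branch */
uint32_t sh2_estimate_cycles(uint32_t instructions, uint32_t flushes)
{
    if (instructions == 0) {
        return 0;
    }
    /* Five cycles for the first instruction, one for each one after it. */
    return SH2_PIPELINE_STAGES + (instructions - 1) + flushes * SH2_FLUSH_PENALTY;
}
```

For a long loop the fixed start-up cost disappears into the total, which is why counting one cycle per instruction is usually good enough.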

Memory access

The 32X hardware manual lists how many cycles it takes to read or write the various blocks in the SH2 address map. For example, reading SDRAM takes 12 cycles because it is a burst read, while a write takes only 2 cycles because writes are not burst. A burst read fetches 8 words (one cache line) in one go of 12 cycles, or 1.5 cycles per word on average, which is the fastest that non-cache memory can be read. However, reading a single uncached word still performs a burst read: 8 words are read in 12 cycles and the other 7 are thrown away. Reading an uncached word from SDRAM is therefore the slowest thing you can do on the SH-2s. Keeping these burst reads in mind is one of the keys to designing fast code for the 32X.
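To make the difference concrete, here is a small sketch that counts SDRAM bus cycles using only the figures quoted above (12 cycles per burst read, 8 words per burst). The function names are made up for this example, and the model ignores everything except the SDRAM reads themselves.

```c
#include <stdint.h>

enum {
    SDRAM_BURST_READ_CYCLES = 12, /* one burst read */
    SDRAM_WORDS_PER_BURST   = 8   /* words fetched per burst (one cache line) */
};

/* Sequential read through the cache: each burst fills one line
 * and all 8 words of that line are used. */
uint32_t sdram_cached_read_cycles(uint32_t words)
{
    uint32_t bursts = (words + SDRAM_WORDS_PER_BURST - 1) / SDRAM_WORDS_PER_BURST;
    return bursts * SDRAM_BURST_READ_CYCLES;
}

/* Uncached read: every single word pays for a full burst,
 * and the other 7 words of that burst are discarded. */
uint32_t sdram_uncached_read_cycles(uint32_t words)
{
    return words * SDRAM_BURST_READ_CYCLES;
}
```

Reading 8 consecutive words costs 12 cycles through the cache but 96 cycles uncached, an eightfold difference in this simple model.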

The division unit (DIVU)

The hardware division unit can work in parallel with the rest of the CPU.

When a read or write instruction that accesses the division unit is issued while the unit is operating, that instruction is extended until the operation ends. Instructions that do not access the division unit are not held up, so they can be processed in parallel with the division.
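One way to reason about this overlap is the simple model sketched below. It is an assumption-laden illustration, not a description of the hardware: the function name is made up, and the division latency is a parameter that you would take from the hardware manual rather than a value stated here.

```c
#include <stdint.h>

/* Cycles the CPU would sit waiting when it reads the result.
 * divide_cycles:      how long the division unit needs (assumed input,
 *                     look it up in the hardware manual)
 * independent_cycles: cycles of other work, not touching the division
 *                     unit, done between starting the division and
 *                     reading the result */
uint32_t divu_stall_cycles(uint32_t divide_cycles, uint32_t independent_cycles)
{
    if (independent_cycles >= divide_cycles) {
        return 0; /* the division finished before the result was read */
    }
    return divide_cycles - independent_cycles;
}
```

In practice this means starting the division as early as possible and reading the quotient as late as possible, with unrelated work in between.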

For 64-bit by 32-bit division, the quotient is accessible from two registers: DVDNT and DVDNTL.

The state of the divider can't be saved and restored, so make sure that no function called from an interrupt handler uses the divider.

DMA in 16-byte mode

The SH2's DMA controller can take advantage of the SDRAM burst mode described under Memory access when it is put in 16-byte mode. To get the best speed from DMA, place the source data on 16-byte boundaries and use the 16-byte transfer size.

For a 16-byte transfer, the address is incremented by +16 regardless of the SM1 and SM0 values.
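A couple of small helpers can catch misaligned buffers before a 16-byte transfer is set up. This is a sketch only: the helper names are made up for this example, and counting the transfer in whole 16-byte units is an assumption of the sketch rather than something stated above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { DMA16_UNIT = 16 }; /* bytes moved per 16-byte transfer */

/* True if the address lies on a 16-byte boundary. */
bool dma16_is_aligned(const void *p)
{
    return ((uintptr_t)p % DMA16_UNIT) == 0;
}

/* Number of 16-byte units needed to cover `bytes` bytes
 * (assumes the transfer is counted in whole units). */
size_t dma16_units(size_t bytes)
{
    return (bytes + DMA16_UNIT - 1) / DMA16_UNIT;
}
```

Checking `dma16_is_aligned()` on the source pointer in a debug build is a cheap way to notice when a buffer has drifted off a 16-byte boundary.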

CPU cache bus width

The internal cache bus width isn't specified directly, but two statements in the HW manual let you assume that it either IS 32 bits or is fast enough not to matter. First, the manual says it takes one cycle to fetch data for the CPU regardless of the size requested. Second, it says the cache data bus uses four longwords to fill the cache, and that the cache data bus is what the CPU reads to get its data. The cache data bus is therefore 32 bits wide.

Internal I/O Register Access Cycles

32X Technical Bulletin #32 - SH2 Internal IO Register Access Cycles - [1994-12-08]

Module Name         Minimum Number of Cycles
BSC                 3
DMAC                3
DIV                 3
UBC                 3
INTC                4
MDC (CCR, SBYCR)    4
FRT                 11
WDT                 11
SCI                 11

Access to the internal I/O is done in the following sequence:

  1. A wait occurs if the bus is determined to be busy 1 cycle after the internal I/O access begins.
  2. Internal I/O access occurs after the bus master completes the use of the bus.
  3. After access to the internal I/O is completed, bus access is enabled for the other bus master on hold.

Therefore, the access time to the internal I/O = wait time + minimum number of cycles.
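The formula and the table from the bulletin can be combined into a small lookup, sketched below. The enum and function names are made up for this example; the cycle counts are the ones listed in the table, and the wait time is whatever the bus happens to impose at the moment of access.

```c
#include <stdint.h>

/* Internal I/O modules, in the same order as the bulletin's table. */
typedef enum {
    SH2_IO_BSC,
    SH2_IO_DMAC,
    SH2_IO_DIV,
    SH2_IO_UBC,
    SH2_IO_INTC,
    SH2_IO_MDC,  /* CCR, SBYCR */
    SH2_IO_FRT,
    SH2_IO_WDT,
    SH2_IO_SCI,
    SH2_IO_COUNT
} sh2_io_module;

/* Minimum number of cycles per module, indexed by sh2_io_module. */
static const uint8_t sh2_io_min_cycles[SH2_IO_COUNT] = {
    3,  /* BSC  */
    3,  /* DMAC */
    3,  /* DIV  */
    3,  /* UBC  */
    4,  /* INTC */
    4,  /* MDC  */
    11, /* FRT  */
    11, /* WDT  */
    11  /* SCI  */
};

/* Access time = wait time + minimum number of cycles. */
uint32_t sh2_io_access_cycles(sh2_io_module module, uint32_t wait_cycles)
{
    return wait_cycles + sh2_io_min_cycles[module];
}
```

The FRT, WDT and SCI registers stand out at 11 cycles each, so polling them in a tight loop is noticeably more expensive than polling the DMAC or the division unit.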

Bus Masters

DMA via DMAC

In cycle-steal mode, the bus is released after each access. In burst mode, the bus is released after one burst is completed.

Bus Request

For example, when the slave side has the bus right, the master side's internal I/O access is held in a wait state until the slave side releases the bus right.
