The F21 CPU is based on the MuP21, but it improves on the MuP21 in many ways: a CPU clock 5 times faster, deeper stacks, more instructions, more branch instruction modes, more SRAM addressing, and more coprocessors on chip. In addition to MuP21's video output, F21 offers video I/O, analog I/O, serial network I/O, and a parallel I/O port on chip. F21 has a transistor count of about 15,000, versus about 7,000 for MuP21.
The F21 CPU contains an on-chip DATA STACK with 18 levels, an on-chip RETURN STACK with 17 levels, a memory addressing register A, and a program counter. Instructions are 5 bits wide, and with 4 instructions packed into a 20-bit word the CPU can run at up to four times the speed of memory access (when executing sequential stack-based instructions), executing each instruction in 2 nanoseconds. Internally the CPU operates at 500 MIPS, but memory access timing limits the actual throughput of the F21 CPU to about 333 MIPS in ROM, 200 MIPS in SRAM, and 100 MIPS in DRAM. Memory bandwidth is shared by the CPU and the I/O processors, so video or other coprocessor operation further reduces CPU access to memory.
CPU Instructions are based on Forth:
JMP    unconditional jump ( 3 types, 10 bit, 14 bit, home page)
T0     branch if TOP of stack =0 ( DUP IF ) ( 3 types)
C0     branch if CARRY bit not set ( 3 types)
CALL   subroutine call ( 3 types)
RET    subroutine return
#      literal, place immediate word into top of stack
@A+    place memory contents pointed to by register A in TOP, increment A
@R+    place memory contents pointed to by register R in TOP, increment R
@A     place memory contents pointed to by register A in TOP of stack
!A+    store TOP of stack into memory pointed to by A, increment A
!R+    store TOP of stack into memory pointed to by R, increment R
!A     store TOP of stack into memory pointed to by A
COM    complement TOP of stack
AND    AND TOP of stack with NEXT and leave result in TOP
-OR    EXCLUSIVE OR TOP of stack with NEXT and leave result in TOP
+      ADD TOP of stack to NEXT and leave result in TOP
2*     left shift TOP
2/     right shift TOP
+*     ADD TOP of stack to NEXT and leave result in TOP, NEXT unchanged (perform add only if the least significant bit of T = 1)
A      copy A to TOP of stack
A!     move TOP of stack to A
DUP    duplicate TOP of stack
DROP   discard TOP of stack
OVER   duplicate the second item to the TOP of Data stack
PUSH   TOP of DATA stack to TOP of RETURN stack
POP    TOP of RETURN stack to TOP of DATA stack
NOP    No CPU operation
A register (T) acts as the top of the data stack. All data are placed
in T; its prior contents are pushed onto S.
The ALU acts upon T and S, leaves its result in T and pops S for
binary operations (+ -or and).
A register (A) is used to address data.
A program counter (P) is used to address instructions.
The return stack stores subroutine return addresses (and occasional data).
A configuration register (C) specifies timing and addressing options.
F21 uses a physical bus that represents the number or address 00000 with positive logic on even bits and negative logic on odd bits. This means that the package pins show AAAAA for the number or address 00000: alternate bits on the pins are complemented. To determine the pattern for a number, -or (exclusive-or) it with 0AAAAA. Thus, the number 00100F has the pattern 0ABAA5 on the pins. The ALU acts upon numbers; addresses are numbers. The configuration register stores patterns; the package pins display patterns.
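The number-to-pattern conversion described above can be sketched in a few lines of Python. This is only an illustration: the function names `to_pattern` and `to_number` and the 21-bit mask are assumptions made for the example, not part of the F21 documentation.

```python
# Hypothetical helpers illustrating the number <-> pin-pattern conversion.
PATTERN_MASK = 0x0AAAAA  # constant given in the text
WORD_MASK = 0x1FFFFF     # assumption: values are kept to 21 bits


def to_pattern(number: int) -> int:
    """Exclusive-or a number with 0AAAAA to get the pattern seen on the pins."""
    return (number ^ PATTERN_MASK) & WORD_MASK


def to_number(pattern: int) -> int:
    """Exclusive-or is its own inverse, so the same operation recovers the number."""
    return (pattern ^ PATTERN_MASK) & WORD_MASK


if __name__ == "__main__":
    # The example from the text: number 00100F has pattern 0ABAA5.
    print(f"{to_pattern(0x00100F):06X}")
```

Because the conversion is a single exclusive-or, applying it twice returns the original value, which is why one constant serves in both directions.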
    bit           20 ...15 ...10 ....5 ....0
                     slot0 slot1 slot2 slot3
    10-bit jump      slot0 jump  aa aaaa aaaa
         address   p pppp pppp ppaa aaaa aaaa   (p from P register)
    14-bit jump      jump  0aa aaaa aaaa aaaa
         address   p pppp ppaa aaaa aaaa aaaa
    home jump        jump  1aa aaaa aaaa aaaa
         address   c 0c00 00aa aaaa aaaa aaaa   (c is C17)

The contents of slots 1-3 must be complemented, whether instructions or address. The third jump format, the home page jump, facilitates jumping from DRAM into SRAM, since the home page location may be set to DRAM or SRAM by setting memory configuration register bit 17 (c17). A full 21-bit jump requires pushing an address into R and executing the ; instruction.
If the configuration register bit c17 is set, the home page address becomes address 140000, which is high speed SRAM; if it is not set, the home page address is 0 in DRAM. The 10-bit jumps are faster than offpage jumps in DRAM and free the first instruction slot for use by another opcode. Single cell branch instructions can cover a range of 16k words in DRAM and 8k words in SRAM, and subroutine returns move freely between SRAM and DRAM since the return stack is 21 bits wide.
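As a rough illustration of the three jump formats, the following Python sketch forms the 21-bit destination address from the address field and either P or C17. It works on decoded numbers only and ignores the complementing of slots 1-3. The function names are hypothetical, and the source does not say whether `p` is the address of the jump word or the already incremented P, so the caller supplies whichever value applies.

```python
# Hypothetical sketch of F21 jump target formation (numbers, not pin patterns).
ADDR_MASK = 0x1FFFFF  # 21-bit addresses


def jump10(p: int, a: int) -> int:
    """10-bit jump: upper 11 bits from P, lower 10 bits from the instruction."""
    return (p & ADDR_MASK & ~0x3FF) | (a & 0x3FF)


def jump14(p: int, a: int) -> int:
    """14-bit jump: upper 7 bits from P, lower 14 bits from the instruction."""
    return (p & ADDR_MASK & ~0x3FFF) | (a & 0x3FFF)


def jump_home(c17: bool, a: int) -> int:
    """Home page jump: C17 is copied into address bits 20 and 18."""
    c = 1 if c17 else 0
    return (c << 20) | (c << 18) | (a & 0x3FFF)


if __name__ == "__main__":
    print(f"{jump_home(True, 0):06X}")    # home page in SRAM
    print(f"{jump_home(False, 0):06X}")   # home page in DRAM
    print(f"{jump10(0x140ABC, 0x001):06X}")
```

With C17 set, `jump_home(True, 0)` gives 140000, matching the SRAM home page address stated in the text.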
The 27 instruction codes are:

    00 else  unconditional jump           08 @R+   fetch, address in R, increment R
    01 T=0   jump if T0-19 zero           09 @A+   fetch, address in A, increment A
    02 call  push P+1 to R, jump          0A #     fetch 20-bit in-line literal
    03 C=0   jump if T20 zero             0B @A    fetch, address in A
    04                                    0C !R+   store, address in R, increment R
    05                                    0D !A+   store, address in A, increment A
    06 ret   pop P from R                 0E
    07                                    0F !A    store, address in A

    10 com   complement T                 18 pop   pop R, push into T
    11 2*    shift T, 0 to T0             19 A@    push A into T
    12 2/    shift T, T20 to T19          1A dup   push T into T
    13 +*    add S to T if T0 one         1B over  push S into T
    14 -or   exclusive-or S to T          1C push  pop T, push into R
    15 and   and S to T                   1D A!    pop T into A
    16                                    1E nop
    17 +     add S to T                   1F drop  pop T

    Code Name  Description                         As Forth (where A is a variable)
    00   else  unconditional jump                  ELSE
    01   T=0   jump if T0-19 zero                  DUP IF
    02   call  push P+1 to R, jump                 :
    03   C=0   jump if T20 zero                    CARRY? IF
    04
    05
    06   ret   pop P from R                        ;
    07
    08   @R+   fetch, address in R, increment R    R @ R> 1+ >R
    09   @A+   fetch, address in A, increment A    A @ @ 1 A +!
    0A   #     fetch 20-bit in-line literal        LIT
    0B   @A    fetch, address in A                 A @ @
    0C   !R+   store, address in R, increment R    R ! R> 1+ >R
    0D   !A+   store, address in A, increment A    A @ ! 1 A +!
    0E
    0F   !A    store, address in A                 A @ !
    10   com   complement T                        -1 XOR
    11   2*    shift T, 0 to T0                    2*
    12   2/    shift T, T20 to T19                 2/
    13   +*    add S to T if T0 one                DUP 1 AND IF OVER + THEN
    14   -or   exclusive-or S to T                 XOR
    15   and   and S to T                          AND
    16
    17   +     add S to T                          +
    18   pop   pop R, push into T                  R>
    19   A@    push A into T                       A @
    1A   dup   push T into T                       DUP
    1B   over  push S into T                       OVER
    1C   push  pop T, push into R                  >R
    1D   A!    pop T into A                        A !
    1E   nop                                       NOP
    1F   drop  pop T                               DROP

Forth macros

    A! @A                 @
    A! !A                 !
    dup dup -or com       -1
    dup dup -or           0
    over com and -or      OR
    A! push A@ pop        SWAP
    # (com) push ;        long_jump
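The macro equivalences listed above can be checked with a very small stack model. The Python sketch below is an illustration only, not an F21 simulator: the function `run`, the dictionary used as memory, and the 21-bit width of T are assumptions, and stack depth limits, jumps, and timing are ignored.

```python
# Hypothetical minimal model of a few F21 opcodes, used to check the macros.
MASK = 0x1FFFFF  # assumption: T is 21 bits wide (T0 through T20)


def run(program: str, data, memory=None):
    """Run a space-separated opcode string; return the data stack (last item is T)."""
    d = list(data)                       # data stack, d[-1] plays the role of T
    r = []                               # return stack
    a = 0                                # address register A
    mem = {} if memory is None else memory
    for op in program.split():
        if op == "dup":
            d.append(d[-1])
        elif op == "over":
            d.append(d[-2])
        elif op == "drop":
            d.pop()
        elif op == "com":
            d[-1] ^= MASK
        elif op == "and":
            t = d.pop()
            d[-1] &= t
        elif op == "-or":
            t = d.pop()
            d[-1] ^= t
        elif op == "+":
            t = d.pop()
            d[-1] = (d[-1] + t) & MASK
        elif op == "push":
            r.append(d.pop())
        elif op == "pop":
            d.append(r.pop())
        elif op == "A!":
            a = d.pop()
        elif op == "A@":
            d.append(a)
        elif op == "@A":
            d.append(mem.get(a, 0))
        elif op == "!A":
            mem[a] = d.pop()
        else:
            raise ValueError(f"opcode not modelled: {op}")
    return d


if __name__ == "__main__":
    print(run("over com and -or", [0b1100, 0b1010]))  # inclusive OR of the two items
    print(run("A! push A@ pop", [1, 2]))              # SWAP
```

Tracing `over com and -or` by hand shows why it works: with S = b and T = a, the sequence computes b -or (a and not b), which equals the inclusive OR of a and b. Note that the SWAP macro overwrites A, so A must be saved by any caller that still needs it.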
At least 3 stack positions must be available (reserved) for interrupts, 2 on the data stack and 1 on the return stack, because a useful interrupt service routine will need at least two data and one return stack positions. Register A must be saved and restored in the interrupt service routine.
The cause of the interrupt is in configuration register bits 2 through 0 (C2-0). The configuration register (C) must be read to determine the interrupt source. The interrupt is cleared when C is rewritten, which may only occur once. It is intended that this code be executed at the end of interrupt processing (say for C0):
    A 015554 # com A!   ( pattern 1 1110 0000 0000 0000 0--1)
    @A !A A! ;

The address bits A2-0 specify the interrupt(s) to be cleared. @A !A A! ; must all be in the same word. Another interrupt may occur immediately.
Interrupts are edge-triggered. If one is repeated before being cleared, it's lost.
                         RAM                  DRAM        ROM
    Memory speed in ns   10   12   15   25    40   140    4
    MIPS                 200  180  160  115   80   27     333

With 1 instruction accessing data in the same memory: 110 40 150 MIPS
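The memory speed and MIPS figures above are close to what a simple model predicts: four instructions per word fetched, with each word costing the memory access time plus a fixed overhead. The Python sketch below is an empirical fit to the listed numbers, not a formula taken from the F21 documentation. The 10 ns overhead for RAM and DRAM and the 8 ns overhead for ROM (four instructions at 2 ns each) are assumptions chosen because they reproduce the figures to within rounding.

```python
# Hypothetical curve fit to the listed throughput figures (not an official formula).
INSTRUCTIONS_PER_WORD = 4

# (memory speed in ns, listed MIPS) for the RAM and DRAM columns
LISTED = [(10, 200), (12, 180), (15, 160), (25, 115), (40, 80), (140, 27)]


def estimated_mips(access_ns: float, overhead_ns: float = 10.0) -> float:
    """Estimate MIPS as 4 instructions per (access time + assumed overhead)."""
    return INSTRUCTIONS_PER_WORD * 1000.0 / (access_ns + overhead_ns)


if __name__ == "__main__":
    for ns, listed in LISTED:
        print(f"{ns:4d} ns  listed {listed:4d}  estimated {estimated_mips(ns):6.1f}")
    # ROM at 4 ns matches the listed 333 MIPS only with an 8 ns overhead.
    print(f"   4 ns  listed  333  estimated {estimated_mips(4, 8.0):6.1f}")
```

The fit says nothing about the second row of figures (one instruction per word accessing data in the same memory), which would need an additional memory cycle in the model.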