P21Forth 1.02 Assembler
The P21Forth system offers the programmer the ability to write executable routines with the ANS compliant high level compiler, using colon and other high level defining words. Alternatively, P21Forth offers a built-in Forth assembler for the MuP21. Since the assembly language of the MuP21 is based on Forth, it is easy to learn and use.
To define a new word in assembler, use the Forth word CODE. Like colon, CODE takes the name of the new word from the input stream. Words written in assembler normally end with next, which returns control to the next Forth word, and the definition is closed with END-CODE.
The following sequence:

   CODE MYCODE ( -- ) next END-CODE

will define a new word in assembler called MYCODE, which does nothing except pass control to the next Forth word. It shows how CODE, next, and END-CODE are used.
The MuP21 microprocessor has two small on-chip hardware stacks: the data stack is 6 cells deep, and the return stack is 4 cells deep. There is also a register used for memory addressing, called the A register.
MuP21 accesses memory in 20-bit cells, which can contain data or instructions. MuP21 has only 24 instructions, so each can be represented in 5 bits, and a 20-bit cell can therefore hold up to four instructions. As a result, the CPU can execute instructions up to four times faster than it can access memory. For clarity, MuP21 assembler routines are normally written to show how the instructions are packed into words in memory.
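The capacity arithmetic can be illustrated with a simplified Python model. It assumes each slot is a plain 5-bit field with slot 0 in the most significant bits; the real MuP21 encoding interleaves and scrambles the slot bits (compare the MASK table in ASM.FOX), so this sketch only shows why four 5-bit opcodes fit in one 20-bit cell. The opcode numbers come from the instruction table and spell out the first cell of DUP (a push a! @+).

```python
# Simplified model: four 5-bit opcodes packed into one 20-bit cell.
# Assumption: slot 0 occupies bits 19-15 and slot 3 bits 4-0. The actual
# MuP21 bit layout is interleaved and scrambled; this is illustration only.

SLOT_BITS = 5
SLOTS_PER_CELL = 4
CELL_MASK = (1 << (SLOT_BITS * SLOTS_PER_CELL)) - 1  # 0xFFFFF, 20 bits

# Opcodes from the instruction table: A, PUSH, A!, @A+
A, PUSH, A_STORE, FETCH_PLUS = 0x19, 0x1C, 0x1D, 0x09


def pack(opcodes):
    """Pack up to four 5-bit opcodes into a 20-bit cell, slot 0 first."""
    if len(opcodes) > SLOTS_PER_CELL:
        raise ValueError("a cell holds at most four instructions")
    cell = 0
    for slot, op in enumerate(opcodes):
        if not 0 <= op < (1 << SLOT_BITS):
            raise ValueError(f"opcode {op:#x} does not fit in 5 bits")
        shift = SLOT_BITS * (SLOTS_PER_CELL - 1 - slot)
        cell |= op << shift
    return cell


def unpack(cell):
    """Recover the four 5-bit fields of a 20-bit cell, slot 0 first."""
    return [(cell >> (SLOT_BITS * (SLOTS_PER_CELL - 1 - slot))) & 0x1F
            for slot in range(SLOTS_PER_CELL)]


# 24 distinct instructions need 5 bits each (2**5 = 32 >= 24).
assert (24 - 1).bit_length() == SLOT_BITS

cell = pack([A, PUSH, A_STORE, FETCH_PLUS])
print(f"{cell:05X}", unpack(cell))
```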
Since P21Forth must support stacks larger than the hardware stacks provided on MuP21, it maintains its stacks in memory, as is done on more conventional processors.
At the start and end of every word in P21Forth, three MuP21 registers must hold specific values. The A register always holds the Forth interpreter pointer (IP). The top of the data stack register (T) always holds the data stack pointer (SP). The top of the return stack register (R) always holds the return stack pointer (RP). Assembler words in P21Forth may use these registers internally, but they must restore these values before passing control to the next word.
The data and return stacks in memory in P21Forth are designed to grow upward: each time an item is added to a stack, its pointer is incremented, and each time an item is removed, its pointer is decremented.
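A minimal Python sketch of a stack kept in memory that grows upward. It assumes, as the DUP definition that follows implies (it fetches the top item from the address in SP before incrementing), that the stack pointer addresses the current top item. The class name, the empty-stack convention, and the base address are hypothetical.

```python
# Sketch of an upward-growing stack held in memory, as P21Forth keeps its
# data and return stacks. Assumption: the pointer addresses the current
# top item, so a push increments first and then stores.

class MemoryStack:
    def __init__(self, memory, base):
        self.memory = memory      # dict: address -> 20-bit cell
        self.pointer = base - 1   # hypothetical convention: empty stack sits just below base

    def push(self, value):
        self.pointer += 1                          # adding an item increments the pointer
        self.memory[self.pointer] = value & 0xFFFFF  # memory cells are 20 bits

    def pop(self):
        value = self.memory[self.pointer]
        self.pointer -= 1                          # removing an item decrements the pointer
        return value

    def top(self):
        return self.memory[self.pointer]


memory = {}
stack = MemoryStack(memory, base=0x100)   # hypothetical base address
stack.push(7)
stack.push(9)
print(hex(stack.pointer), stack.pop(), hex(stack.pointer))
```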
The P21Forth word DUP does two things. It duplicates the top item on the Forth data stack in memory, incrementing the data stack pointer in the process, and then advances to the next word in Forth. Here is a definition to do this:
\ CODE DUP ( n -- n n )
\ at the start of this word A=IP, T=SP, and R=RP
\ ( n -- n n ) is a stack diagram showing an item being duplicated
CODE DUP ( n -- n n )  \ create a new word in assembler called DUP
   a push a! @+        \ these four instructions assemble one MuP21 memory cell
\ a push gets the IP from the A register and stores it on the
\ on-chip return stack. Then a! moves SP into the A register.
\ @+ fetches the top item from the P21Forth stack in memory,
\ places it on the top of the MuP21 hardware stack, and
\ increments the A register (SP).
   ! a pop nop
\ the data stack pointer (SP) has been incremented, so the !
\ instruction stores a copy of what was the top of the memory
\ stack into the new top of that stack. The a instruction then gets
\ a copy of the new data stack pointer and places it on the
\ top of the MuP21 hardware data stack, where SP should be left.
\ pop gets the IP from the hardware return stack, and
\ then the a! instruction puts the IP back into the A register.
   a! next             \ go to the next Forth word
END-CODE               \ end this assembler definition

The comments are, of course, not needed for the definition to work. The example is meant to show the three registers that must hold IP, SP, and RP at the start and end of each word written in assembler.
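To check the register bookkeeping in DUP, the following Python sketch steps through its instructions one at a time. It is a model, not an emulator: the hardware stacks are unbounded Python lists, memory is a dict, and the class, method names, and addresses are hypothetical. The final assertions show A back to IP, T holding the incremented SP, and R still holding RP.

```python
# Register-level model of the DUP definition. Assumptions: the hardware
# stacks are modeled as unbounded Python lists (the real ones are 6 and 4
# cells deep), and memory is a dict of 20-bit cells.

class MuP21:
    def __init__(self, ip, sp, rp, memory):
        self.a = ip            # A register holds IP
        self.data = [sp]       # T (top of hardware data stack) holds SP
        self.ret = [rp]        # R (top of hardware return stack) holds RP
        self.mem = memory

    # Each method mirrors one instruction used in DUP.
    def a_(self):          # a : copy A onto the data stack
        self.data.append(self.a)

    def push(self):        # push : move T to the return stack
        self.ret.append(self.data.pop())

    def a_store(self):     # a! : move T into A
        self.a = self.data.pop()

    def fetch_plus(self):  # @+ : fetch the cell at A onto the data stack, increment A
        self.data.append(self.mem[self.a])
        self.a += 1

    def store(self):       # ! : pop T and store it at the address in A
        self.mem[self.a] = self.data.pop()

    def pop(self):         # pop : move the top of the return stack to the data stack
        self.data.append(self.ret.pop())


def run_dup(cpu):
    """Execute the instructions of DUP, up to (not including) next."""
    for step in (cpu.a_, cpu.push, cpu.a_store, cpu.fetch_plus,   # cell 1
                 cpu.store, cpu.a_, cpu.pop):                     # cell 2 (nop omitted)
        step()
    cpu.a_store()                                                 # cell 3: a! before next


mem = {0x200: 42}                     # hypothetical SP = 0x200 with 42 on top
cpu = MuP21(ip=0x050, sp=0x200, rp=0x300, memory=mem)
run_dup(cpu)
print(hex(cpu.a), hex(cpu.data[-1]), hex(cpu.ret[-1]), mem)
```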
Opcode  Name      Function

Transfer Instructions
00      JUMP      Jump to the 10-bit address in the lower 10 bits of the current word. Must be the first or second instruction in a word.
01      ;'        Subroutine return (pop the address from the top of the return stack and jump to it).
02      T=0       Jump if T=0.
03      C=0       Jump if carry is reset.
04      CALL      Subroutine call (push the address of the next location in memory to the return stack, and jump to the 10-bit address in the lower 10 bits of the current word).
05      reserved
06      reserved
07      reserved

Memory Access Instructions
08      reserved
09      @A+       Fetch a value from memory pointed to by the A register, place it on the top of the data stack, and increment A.
0A      #         Fetch the next cell from memory as a literal and place it on the top of the data stack.
0B      @A        Fetch a value from memory pointed to by the A register and place it on the top of the data stack.
0C      reserved
0D      !A+       Remove the item on the top of the data stack, store it into memory pointed to by the A register, and increment A.
0E      reserved
0F      !A        Remove the item on the top of the data stack and store it into memory pointed to by the A register.

ALU Instructions
10      COM       Complement all 21 bits in T (top of data stack).
11      2*        Shift T left 1 bit (the bottom bit becomes 0).
12      2/        Shift T right 1 bit (the top two bits remain unchanged).
13      +*        If the least significant bit of T is 1, add the second item on the data stack to the top item without removing the second item.
14      XOR       Remove the top two items from the data stack and replace them with their bitwise exclusive OR.
15      AND       Remove the top two items from the data stack and replace them with their bitwise AND.
16      reserved
17      +         Remove the top two items from the data stack and replace them with their sum.

Register Instructions
18      POP       Move one item from the return stack to the data stack.
19      A         Copy the contents of the A register to the top of the data stack.
1A      DUP       Push a copy of the top of the data stack onto the data stack.
1B      OVER      Copy the second item on the data stack and make it the new top of the data stack.
1C      PUSH      Move one item from the data stack to the return stack.
1D      A!        Move the top of the data stack into the A register.
1E      NOP       Null operation (delay 10ns).
1F      DROP      Discard the item on the top of the data stack.

The P21Forth assembler provides structured flow control. IF, ELSE, and THEN can be used just as they would be in high level Forth code. Note, however, that IF in the assembler does not remove the flag from the data stack, as the standard high level IF does. Chuck Moore has also introduced a similar operation, -IF, which compiles a C=0 instruction and therefore tests for carry: -IF executes the code that follows if carry is set, and jumps to the ELSE or THEN if carry is not set. BEGIN, WHILE, UNTIL, and REPEAT are also supported in the assembler, along with Chuck's -UNTIL, which compiles a C=0 and loops until there is carry.
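The conditions behind these structures can be sketched in Python. This assumes T is a 21-bit value whose top bit (bit 20) is the carry, as described in the section on math; the function names are hypothetical, and the Python loop stands in for assembled code rather than showing what the assembler emits. Note that `if_taken` only reads the stack, matching the assembler IF that leaves its flag in place.

```python
# Sketch of the conditions tested by the assembler's IF, -IF and -UNTIL.
# Assumptions: T is a 21-bit value and "carry" is its top bit (bit 20).

T_MASK = 0x1FFFFF
CARRY = 0x100000


def if_taken(data_stack):
    """Assembler IF (a T=0 jump): body runs when T is nonzero. The flag stays."""
    return (data_stack[-1] & T_MASK) != 0      # stack is only read, not popped


def minus_if_taken(data_stack):
    """-IF (a C=0 jump): body runs when the carry bit of T is set."""
    return (data_stack[-1] & CARRY) != 0


def minus_until(t, body):
    """BEGIN body -UNTIL: run body at least once, repeat until carry is set.
    Returns the final T and the number of passes."""
    passes = 0
    while True:
        t = body(t) & T_MASK
        passes += 1
        if t & CARRY:          # C=0 no longer jumps back once carry is set
            return t, passes


stack = [5]
print(if_taken(stack), stack)                   # the flag is still on the stack
print(minus_until(0xFFFF0, lambda t: t + 1))    # counts up into bit 20
```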
The word next assembles three opcodes that advance to the next Forth word; this is known as the Forth inner interpreter. next assembles @A+ PUSH ; (the return instruction is named ;' in the assembler source). @A+ fetches the address of the next Forth word from the cell pointed to by the A register (IP) and increments IP. The PUSH ; sequence then pushes that address onto the return stack and `returns' to it, executing the next word.
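The effect of next can be modeled in a few lines of Python. In this sketch, memory is a dict of cells, a thread is a run of code-word addresses, and each code word is a Python function at a hypothetical address; the loop in `run` stands in for every code word ending with its own copy of next.

```python
# Sketch of the inner interpreter that next implements.

def next_(ip, memory):
    """@A+ PUSH ; : fetch the word address at IP, bump IP, go there."""
    target = memory[ip]   # @A+ fetches the cell addressed by A (IP) ...
    ip += 1               # ... and increments A
    return ip, target     # PUSH ; transfers control to target


def run(memory, code_words, ip, steps):
    executed = []
    for _ in range(steps):
        ip, target = next_(ip, memory)
        code_words[target]()           # execute the assembler word found there
        executed.append(target)
    return ip, executed


log = []
code_words = {0x010: lambda: log.append("DUP"),    # hypothetical addresses
              0x020: lambda: log.append("DROP")}
memory = {0x100: 0x010, 0x101: 0x020, 0x102: 0x010}   # a thread of three words
ip, executed = run(memory, code_words, ip=0x100, steps=3)
print(hex(ip), log)
```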
MuP21 is designed to match the hardware of DRAM chips, which have 1K pages. Two addresses are on the same page if they have the same upper ten bits. A jump or call instruction holds only a 10-bit address in the lower 10 bits of its word, so it cannot reach a different page directly. Care should therefore be taken to ensure that words written in assembler do not contain jumps or calls that are expected to go to a different page. To reach an off-page location, put the full address on the stack and use a sequence like PUSH ; instead.
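The page rule can be sketched in Python. This assumes 20-bit DRAM addresses and that a direct jump lands on the page of the word containing it; the complemented 10-bit field mirrors the `3FF AND 3FF XOR` sequence used by JPI and THEN in ASM.FOX. The function names are hypothetical.

```python
# Sketch of the 1K-page rule for direct jumps and calls.

PAGE_BITS = 10
LOW_MASK = (1 << PAGE_BITS) - 1   # 0x3FF


def page(addr):
    return (addr & 0xFFFFF) >> PAGE_BITS       # the upper ten bits


def same_page(a, b):
    return page(a) == page(b)


def encode_jump_field(target):
    """Low 10 bits as ASM.FOX stores them: target AND 3FF, then XOR 3FF."""
    return (target & LOW_MASK) ^ LOW_MASK


def jump_destination(jump_word_addr, field):
    """Where a jump in the word at jump_word_addr lands (assumed same page)."""
    return (jump_word_addr & ~LOW_MASK & 0xFFFFF) | (field ^ LOW_MASK)


def check_direct_jump(jump_word_addr, target):
    """Raise if target cannot be reached with a direct JUMP or CALL."""
    if not same_page(jump_word_addr, target):
        raise ValueError(f"{target:05X} is off page; use PUSH ; instead")
    return encode_jump_field(target)


field = check_direct_jump(0x12410, 0x12500)
print(f"{field:03X}", f"{jump_destination(0x12410, field):05X}")
```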
There are several things to remember when coding math on the MuP21 microprocessor in assembler. Items read from memory are only 20 bits wide, but the CPU registers and math operations are 21 bits wide. The most significant bit is both the carry bit and a valid address bit. If the most significant bit (carry) is set in an address used for a memory reference, SRAM is addressed: addresses 0-FFFFF are in DRAM, and addresses 100000 and up are in SRAM.
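A small Python sketch of the bank selection described above. It models only the DRAM/SRAM split determined by bit 20, not the real address decoder; the function name is hypothetical.

```python
# Sketch of how bit 20 (the carry bit) selects the memory bank.

SRAM_BIT = 0x100000


def memory_bank(addr):
    addr &= 0x1FFFFF                    # addresses come from 21-bit registers
    return "SRAM" if addr & SRAM_BIT else "DRAM"


for addr in (0x00000, 0xFFFFF, 0x100000, 0x1FFFFF):
    print(f"{addr:06X} -> {memory_bank(addr)}")
```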
This means that if you load a 20-bit -1 (FFFFF) from memory and add 1 to it, you get 100000, which is not the same as 0. If you add 1 to a 21-bit -1 (1FFFFF), the result is 0 because carry is reset. Since a 21-bit negative number (one with bit 20 set) cannot be stored in memory directly, it is complemented with COM before it is stored; after it is fetched, COM is used again to restore the lower 20 bits and set the carry bit. Since -1 is often used to decrement numbers (MuP21 has no autodecrement instructions), there is a faster way to generate a 21-bit -1 than to load a literal 0 and execute a COM instruction: the sequence DUP DUP XOR COM. It does, however, use one extra location on the data stack.
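The arithmetic above can be checked with a Python model of 21-bit registers and 20-bit memory cells. It assumes registers wrap at 21 bits and that carry is simply bit 20 of the result; the helper names are hypothetical stand-ins for +, COM, and a memory store.

```python
# Model of MuP21's 21-bit arithmetic and the COM trick for storing 21-bit
# negative numbers in 20-bit memory.

T_MASK = 0x1FFFFF     # 21-bit registers
CELL_MASK = 0xFFFFF   # 20-bit memory cells


def add(a, b):        # +
    return (a + b) & T_MASK


def com(t):           # COM : complement all 21 bits
    return ~t & T_MASK


def store(t):         # a memory cell keeps only the low 20 bits
    return t & CELL_MASK


# A 20-bit -1 loaded from memory is FFFFF; adding 1 sets bit 20 instead of giving 0.
print(hex(add(0xFFFFF, 1)))          # 0x100000
# A 21-bit -1 wraps to 0 and clears carry.
print(hex(add(0x1FFFFF, 1)))         # 0x0

# Storing 1FFFFF directly loses bit 20; storing its complement does not.
stored = store(com(0x1FFFFF))       # COM, then store: 0 fits in 20 bits
print(hex(com(stored)))             # fetch, then COM: 0x1fffff again

# DUP DUP XOR COM builds a 21-bit -1 from whatever is on top of the stack.
x = 0x12345
print(hex(com(x ^ x)))               # 0x1fffff
```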
MuP21 uses a ripple carry mechanism in the + and +* instructions. The carry in an add moves upward through eight bits in the time of a single instruction. This means that the result of adding 1 to 1 is ready in one instruction time, as is the result of adding 127 to 127. But adding 1 to -1 requires the carry to move through 20 bits, which takes longer than one instruction time. To compensate, a NOP or two may be needed before the + or +* instruction. No NOP is needed if the + or +* is the first instruction in a word of DRAM: the extra delay needed to fetch that word from DRAM ensures there is sufficient time for a correct result from the addition.
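The "eight bits per instruction time" rule can be turned into a rough estimate of how many NOPs an addition needs. This Python sketch assumes the delay depends only on the longest run of bits a carry must ripple through; it is a simplified model, not a circuit-level timing analysis, and it ignores the case where + is the first instruction in a DRAM word.

```python
# Rough timing model of the ripple-carry adder.

import math

WIDTH = 21
BITS_PER_INSTRUCTION_TIME = 8


def longest_carry_chain(a, b, width=WIDTH):
    """Number of bit positions the longest carry ripples through."""
    carry = 0
    run = 0
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x and y:                  # carry generated at this bit
            carry, run = 1, 1
        elif (x ^ y) and carry:      # carry propagated one more bit
            run += 1
        else:                        # carry absorbed, or none present
            carry, run = 0, 0
        longest = max(longest, run)
    return longest


def nops_needed(a, b):
    """Extra NOPs before + so the sum settles, under this model."""
    chain = longest_carry_chain(a, b)
    times = max(1, math.ceil(chain / BITS_PER_INSTRUCTION_TIME))
    return times - 1


for a, b in ((1, 1), (127, 127), (1, 0xFFFFF)):
    print(a, hex(b), longest_carry_chain(a, b), nops_needed(a, b))
```

Under this model, 1+1 and 127+127 need no NOP, while 1 + FFFFF has a 20-bit carry chain and needs two NOPs, in line with "a NOP or two" above.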
The amount and nature of memory access will generally be the limiting factor in the execution speed of MuP21 programs. DRAMs can access memory on the same page in about 55ns, but an access to a different page takes 150ns. It is therefore very important to keep critical routines within one page of memory and, if possible, to let them manipulate data on the same page as the code. This is why the default data and return stacks in P21Forth are on the same page of memory as the most frequently used words in the Forth kernel.
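To see why page locality matters, here is a back-of-the-envelope Python model using the 55ns and 150ns figures above. It assumes every access costs exactly one of those two values, with the first access counted as a page change; the function name and addresses are hypothetical.

```python
# Back-of-the-envelope DRAM timing: about 55 ns for an access on the same
# 1K page as the previous one, 150 ns otherwise.

SAME_PAGE_NS = 55
PAGE_CHANGE_NS = 150


def access_time_ns(addresses):
    total = 0
    previous_page = None
    for addr in addresses:
        page = (addr & 0xFFFFF) >> 10          # upper ten bits select the page
        total += SAME_PAGE_NS if page == previous_page else PAGE_CHANGE_NS
        previous_page = page
    return total


# Code and stack on one page versus a stack on another page.
same = [0x00400, 0x00401, 0x00450, 0x00402]
split = [0x00400, 0x08450, 0x00401, 0x08451]
print(access_time_ns(same), access_time_ns(split))
```

In this model the same four accesses cost 315ns when code and data share a page and 600ns when they alternate between pages.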
Chuck's code and Dr. Ting's code in the OK Operating System and the code in the OKAD application are very good examples of techniques to get the most speed from MuP21 assembler.
\ ASM.FOX Chuck Moore's 20 bit assembler for MuP21
\ modified for P21Forth Jeff Fox 10/6/94
HEX
VOCABULARY ASM                  \ create the wordlist for the assembler
: ASSEMBLER ALSO ASM ;          \ ASSEMBLER adds ASM to the search order
ASSEMBLER DEFINITIONS
: END-CODE                      \ get out of assembler
   PREVIOUS DEFINITIONS ;       \ and put definitions wherever that is
VARIABLE HI                     \ pointer to current slot
VARIABLE HW                     \ pointer to current word under assembly
: ALIGN ( -- )                  \ 0 1 2 3 .. 4
   4 HI ! ;                     \ force slot pointer to overflow
: ORG ( a -- )                  \ ORG to an address
   DUP . CR DP ! ALIGN ;        \ DP is the eForth CODE POINTER H in OK
CREATE MASK ( -- a )            \ 4 masks for 4 slots, scrambled bits
   AA800 , 55400 , 32A , D5 ,   \ 1 CELL per mask on MuP21
: P, ( n -- )                   \ compile pattern
   AAAAA XOR , ;                \ patterns must be XORed with AAAAA
: #, ( n -- )                   \ compile number
   , ;                          \ numbers are normal on MuP21
: ,W ( mask -- )                \ OR masked bits into word
   HW @ @ OR HW @ ! ;
: ,I ( inst -- )                \ assemble instruction in one slot
   HI @                         \ check slot pointer in HI
   4 AND                        \ overflow?
   IF                           \ so align slot pointer
      0 HI !                    \ clear HI
      DP @ HW !                 \ point HW to current location of DP
      0 ,
   THEN                         \ move to next clear location
   HI @                         \ HI points to current slot 0-3
   MASK +                       \ add offset to start of MASK table
   @ AND                        \ AND in the mask bits
   ,W                           \ assemble instruction to current slot
   1 HI +! ;                    \ bump slot pointer by 1 CELL
: INST ( n -- )                 \ defining word
   CREATE ,
   DOES> @ ,I ;                 \ Chuck's CONSTANT DOES> is not ANSI
6A82A INST COM                  \ com com com com
55956 INST NOP                  \ nop nop nop nop
: JPI ( n -- )                  \ assembler jump instruction
   CREATE ,                     \ Chuck's CONSTANT DOES> isn't ANSI
   DOES> @ ( -- a )
   BEGIN HI @ 2 AND             \ skip slots 2 and 3
   WHILE NOP                    \ by assembling NOPs
   REPEAT                       \ then
   ,I                           \ assemble the branch instruction
   3FF AND 3FF XOR ,W ALIGN ;   \ assemble 10 bit address in slots 2 & 3
: BEGIN ( -- n )                \ start a loop structure, leave addr
   BEGIN HI @ 4 AND             \ check for word boundary
   0= WHILE NOP                 \ assemble NOPs if needed
   REPEAT DP @ ;
: # ( n -- )                    \ assemble a literal
   99BE6 ,I , ;                 \ assemble the literal instruction, then n
: -# ( n -- )                   \ assemble a 21-bit negative
   FFFFF XOR # COM ;            \ complement, then assemble literal & add COM
: P ( n -- )                    \ assemble a pattern as literal
   AAAAA XOR # ;
: -P ( n -- )                   \ assemble a complement pattern literal
   55555 XOR # ;
AAAAA JPI JUMP
A9AA6 JPI T=0
A96A5 JPI C=0
A6A9A JPI CALL
A9AA6 JPI UNTIL
A96A5 JPI -UNTIL
: IF 3FF T=0 HW @ ;
: -IF 3FF C=0 HW @ ;
: SKIP 3FF JUMP HW @ ;
: THEN DUP >R >R BEGIN 3FF AND 3FF XOR R> @ XOR R> ! ;
: ELSE SKIP SWAP THEN ;
: WHILE IF SWAP ;
: REPEAT JUMP THEN ;
9A7E9 INST @A+
997E5 INST @A
967D9 INST !A+
957D5 INST !A
6A429 INST 2*
69826 INST 2/
69425 INST +*
6681A INST XOR
66419 INST AND
65415 INST +
5A96A INST POP
5A569 INST A
59966 INST DUP
59565 INST OVER
5695A INST PUSH
56559 INST A!
55555 INST DROP
AA6A9 INST ;'
: next                          \ next macro in eForth assembler
   @A+ PUSH ;' ALIGN ;          \ compiles @A+ PUSH ;' and ALIGNs
PREVIOUS DEFINITIONS ALSO ASM   \ CODE is a Forth word
: CODE ( -- )
   HERE HEAD, REVEAL            \ create header in eForth for HERE
   HERE HW ! ALIGN              \ start assembly at HERE
   ASSEMBLER DEFINITIONS ;      \ further definitions go into ASM
PREVIOUS
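The slot-filling logic in ASM.FOX (ALIGN, ,I, ,W and the JPI jump) can be modeled in Python to check what a cell looks like after assembly. The MASK table and the instruction patterns are copied from the listing; the class and method names are hypothetical, and memory is a Python list rather than the eForth dictionary. Because the four masks are disjoint and together cover all 20 bits, assembling one instruction into all four slots reproduces its INST value, which is what comments such as \ com com com com indicate.

```python
# Python model of the slot-filling logic in ASM.FOX.

MASK = [0xAA800, 0x55400, 0x32A, 0xD5]    # bits belonging to slots 0..3

INST = {  # patterns from the INST definitions in the listing
    "COM": 0x6A82A, "NOP": 0x55956, "DROP": 0x55555, "@A+": 0x9A7E9,
    "PUSH": 0x5695A, ";'": 0xAA6A9, "A": 0x5A569, "A!": 0x56559,
}
JUMP = 0xAAAAA                            # AAAAA JPI JUMP


class Assembler:
    def __init__(self):
        self.cells = []   # assembled memory, like the dictionary at DP
        self.hi = 4       # slot pointer; 4 means "start a new word" (ALIGN)
        self.hw = None    # index of the word under assembly

    def align(self):                    # ALIGN: 4 HI !
        self.hi = 4

    def or_word(self, bits):            # ,W: OR bits into the current word
        self.cells[self.hw] |= bits

    def inst(self, pattern):            # ,I: assemble one instruction
        if self.hi & 4:                 # slot pointer overflowed
            self.hi = 0
            self.hw = len(self.cells)   # DP @ HW !
            self.cells.append(0)        # 0 ,
        self.or_word(pattern & MASK[self.hi])
        self.hi += 1

    def assemble(self, names):
        for name in names:
            self.inst(INST[name])
        return self

    def jump(self, pattern, addr):      # JPI: branch in slot 0 or 1
        while self.hi & 2:              # pad slots 2 and 3 with NOPs
            self.inst(INST["NOP"])
        self.inst(pattern)
        self.or_word((addr & 0x3FF) ^ 0x3FF)   # 3FF AND 3FF XOR ,W
        self.align()

    def next_(self):                    # next: @A+ PUSH ;' ALIGN
        self.assemble(["@A+", "PUSH", ";'"])
        self.align()


asm = Assembler().assemble(["COM"] * 4)
print(f"{asm.cells[0]:05X}")
```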