Preface
This html document is a transcript of a presentation that I did for the Silicon Valley Chapter of the Forth Interest Group back in December 2000 on the history of Forth and aha, a system that I had begun working on. I was told by some old-time members of the group that although they had been coming to monthly SVFIG meetings for years, my presentation helped them connect the dots and understand more deeply many things that they had not understood over the years. I hope that this transcript will help other people who have not had the benefit of attending monthly SVFIG meetings to understand the evolution of the ideas in Forth.
Jeff Fox
June 2002
12/16/00 SVFIG
a History of Forth and aha
(video)
All right, well I'd like to thank all of you for
giving me the opportunity to talk about this in front of the group. It is
a chance for me to practice explaining it and
perhaps get some new ideas from some of you.
I need to go into the history of this project a little so that you will
understand what brought me to this point and what my goals and objectives were.
Most of you saw the
`F21 in a Mouse'
demo that I did down here some time
ago where I had taken the 68hc05 out of an old mouse and replaced it
with one of my chips (F21), then wrote a mouse driver, an event handler,
a desktop, and an application, a simple crude application as a demonstration.
I put it into the mouse and do demos with it from time to time.
That environment led me to realize some of the limitations that I was
faced with and some of the problems that I needed to solve
to move that environment forward.
Figure 1. Forth time phases.

Design     Edit          Compile            Run
                         (ICE)              (ICE)
Problem    Ascii         Interpret          Execute
Analysis   Files/Blocks  Compile            Interpret
                         Dictionary Build   Compile
                         Dictionary Search
Chuck Moore often talks about the concept of Forth having four different time spaces: the design time, the edit time, the compile time and the run time, and how each of these is more expensive than the one before it. You would like to move things from run time to compile time if you can, move things from compile time to edit time if you can, and from edit time to design time if you can, rather than going the other way and waiting to do all these things in your application at run time. That was one of the things that influenced me. I came to the conclusion that I was getting a little tired of the limitations of machineForth. I had been doing this for ten years now and I was very happy with machineForth, but I felt that there were certain limitations, so I decided to spend some time in the design phase and really look at the problems. As Chuck says, solutions do not require large amounts of code; they require small amounts of well thought out code. So I began to look at the problems that I was facing.
Now in this diagram (Figure 1.) I have tried to diagram the traditional view of what Forth is: in the design phase, you have a problem, you analyze it, you try to figure out how you are going to deal with your problem and solve it. That involves all the problems that you have, such as the complexities of hardware, the complexities of operating systems, the complexities of the Forth that you are using, and the complexity of the problem that you are trying to solve. Once you have done that you take your design and you move to the editing phase. In this phase you incrementally factor your problem into the most logical small groups so that it expresses the problem in the most eloquent, clear, and effective way to solve that problem.
Then you move to the edit phase and you edit the code. Then from the edit phase you move to the compile phase and compile it. Sometimes in compile phase you have to go back to the edit phase and edit the code again and then compile it again and then run it. Then sometimes you have to go back to the edit phase again and then compile your code again and test it incrementally.
Furthermore what we did in these phases was quite traditional.
If you look at the history of Forth (Figure 2.)
Chuck Moore did the first early Forths sometime around 1968
and then he wrote the first commercial Forths and Forth Inc.
was formed in the early seventies and they wrote more Forths.
Later in the seventies the
Forth Interest Group popularized Forth and distributed FIG-Forth.
Later came F83. These were not official standards in a formal sense,
but in practice they were standards.
Not like ANS Forth,
ANS Forth
was the
first meta-Forth definition. The first definition of Forth designed to describe
a wide range of implementations, and a wide range of machines. To not
specify implementation details but to specify things that would be portable.
The word CELL, for instance, was key to this because these earlier Forths were
tied to a particular word size machine. So if you look back
at this legacy code from FIG-Forth and from Forth-83 you see a lot of the word
two-times ( 2* ). And two-times sometimes means multiply something
by two and sometimes it means convert from cell
units to addressing units. So when you convert this code from
FIG-Forth to ANS Forth
you have to identify which two-times means two-times
and where it really means CELLS and you replace it with the word CELLS.
And if you are on a
machine which has cell addressing then CELLS is one-times which
is an immediate word that compiles nothing. It becomes two-times
on a sixteen-bit machine with byte addressing, it becomes four-times
on a thirty-two bit machine with byte addressing.
And now you can write portable code. You also have words like
CELL+ and CELL- , and again you go through your old Forths and
figure out where you were really referring to cells. Then you
can have portable code that will run on 16-bit, or 32-bit, or
8-bit, or 20-bit machines, whatever.
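(A small ANS Forth sketch may make the difference concrete; the table name below is just for illustration, it is not from the talk. The portable version scales its index with CELLS, where the legacy habit hard-wired 2* for a 16-bit byte-addressed machine:)

    CREATE TABLE  10 CELLS ALLOT     \ room for ten cells

    \ Portable: CELLS supplies the right scale factor on any machine.
    : TABLE!  ( x n -- )  CELLS TABLE + ! ;
    : TABLE@  ( n -- x )  CELLS TABLE + @ ;

    \ The FIG-Forth / Forth-83 habit, tied to 16-bit byte addressing:
    \ : TABLE@  ( n -- x )  2* TABLE + @ ;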
However up to this point this was basically the picture. You
analyze your problem in the design phase. Then you go to an
editor in the edit phase. The source code was in ascii
string format and you store it in either files or blocks.
Then you move to the compiler phase where Chuck Moore talks
about ICE; Interpret, Compile, Execute.
Chuck criticizes many Forth programmers for not having enough
interpretation in their application and for too much compilation
and execution. And if you analyze his code and you look at the
metrics you notice that it's almost half interpreted code and half
compiled code. In the middle of a definition he drops out of
compilation mode and goes into interpretation mode, classically
with a bracket, computes something during that phase of compilation,
then goes back into compilation phase and compiles the result of
that computation.
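(A minimal ANS Forth example of that style: the multiplication inside the brackets happens once, while the word is being compiled, and only the result is compiled as a literal.)

    \ 80 25 * is computed at compile time; only 2000 is compiled.
    : SCREEN-CELLS  ( -- n )  [ 80 25 * ] LITERAL ;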
So your compiler interprets your ascii text, compiles your ascii
text, and can execute the code that you have compiled. And
to do this it builds a dictionary. The dictionary is
typically a linked list of names of your compiled words,
of strings, of data structures. Everything in Forth becomes a word
and this dictionary is the container for that structure.
Of course it doesn't have to be a linked list, it could use a
hash table or have other mechanisms to link your dictionary
together but traditionally it was a linked list.
It gets built at compile time and
it gets searched at compile time whenever you encounter a word
in the source, and it finds out if it should execute it or compile it
or whatever is appropriate. When you run your application it
executes your compiled code. It can also interpret more source
code at execution time and in some cases it may take advantage
of the fact that the Forth compiler is, in most cases, still
there because you have simply extended Forth towards an application
specific language.
But the Forth compiler is still there so your application can
still compile. And that's a big advantage for some things because
you can take certain problems and from within the Forth compiler
generate executable code from some specifications at run time.
It's a very powerful feature and some Forth programmers take
advantage of that today.
So I was in the design phase looking at my problem in F21 in a
Mouse. But before I get to that let me go through history a
little bit more. Around the same time as ANS Forth Chuck Moore
began working on Forth hardware. And as you all know, the first
Forth hardware that he did was the Novix chip and for the Novix
chip he wrote a version of Forth that he called cmForth, which
is on one of his license plates. And cmForth was a little bit
different than traditional Forths. It compiled native code,
it compiled optimized native code. It did not use IMMEDIATE
words with an IMMEDIATE bit (in a dictionary header) the way
Forth had traditionally been done. It used a MACRO wordlist
and a FORTH wordlist and IMMEDIATE words went into the MACRO
wordlist which gets searched first. He did this to simplify
the design and make it run faster.
He wrote into it the optimizations that matched the chip.
Well, as
John's
(John Rible) talk about hardware today
showed,
hardware is strange, it works in strange ways. The details
of how to optimize code on the Novix chip were very specific
to that chip. It was an unusual chip, it had three separate
data busses, one for the data stack, one for the return stack,
and one for main memory. And this allowed it to operate on
all three of those environments in a single clock cycle. So
it could take items from the data stack, combine them, send
them to the ALU, take something off the return stack,
manipulate the return stack, and operate in the main memory
space all in the same machine cycle. In a single opcode it
could execute up to five traditional Forth high level language
words in a single clock cycle. It was remarkably small and
simple and only used 4000 gates. And I was very impressed by
the way it could outperform Intel's fastest chip at the time
on many problems, especially real-time problems.
Because the
solutions needed by Intel, such as deeper pipelines and
memory caches, became very complex. When you added in that there
might be interrupt code running in the background it became
very non-deterministic. And for real-time problems
if you profiled the execution rates it was not the few
times that it runs a little bit slower than normal that got
you, it was those times that it executed your routine a hundred
times slower than normal that made it difficult to solve certain
real-time problems. So I was impressed by the way that Novix,
which was a hundred times smaller and much simpler, was able to
outperform these large chips that Intel was making.
And I got a Forthkit III from Computer Cowboys and asked
Chuck Moore a lot of questions about it. I was pleased with
how open and helpful he was and that he didn't treat me like
some lower person just because I didn't know very much about it.
I didn't know anything about his chip or anything about cmForth and he was
very helpful. And I very much liked the little manual that came
with it. It was full of exercises. It reminded me of the lab
notebook that I had in my college electronics physics class,
where we went through a lot of experiments. It was full
of experiments that you could do and code that he had written
to do each little experiment. It was wonderfully educational.
(note to self: I should publish a copy on the net sometime.)
I was also very impressed by the fact that cmForth was very small.
The total system source including the metacompiler was about
thirty screens. The total system was about 6K words and the
compiler itself was about 2K words. And it could compile itself
incredibly fast. Unless of course you were loading the source
code through a 9600 baud serial link from your host computer
which is what I was doing.
But Chuck was not altogether happy with cmForth. He felt it
was just too complex and too specific for his chip;
it had all this bizarre optimization that only matched his chip.
He wanted to design a chip that would not require this kind of
complex optimization to get good clean native code. He wanted
a compiler that could produce native code in a simple way
that would run on a simple chip. So that set him down the road
to the MISC chips, starting with the original sh-boom
and then later the p20 design that evolved into Dr. Ting's
P21, the F21, and the i21, P32, P8, v21, this whole
list of chips that he did for different clients. But what
was important about these was that he developed a new
version of Forth for these chips called machineforth that
departed more radically from traditional Forth.
Oh, I'm sorry. I skipped OK. This is an important phase that
Chuck went through. He decided that he liked the ideal of
minimal source code and as Chuck often says the minimal
amount of source code was zero. So he said I'll try that.
He said I'm going to do sourceless programming. He designed
a system in which the code was going to be represented by
itself. He had already decided that this was the correct
way to design a chip. You don't design a chip by providing
a text description and handing it to someone else's compiler,
which you have no control over, to compile your chip for you and
produce something that you don't understand and that is
sub-optimal. As he said many times in those days, "The
map is not the territory."
You want to deal with a chip? You describe the chip precisely.
He designed an environment where he would design the chip at
the component, sub-component, level, not in a high level
description. He decided to do the same thing with his software
and he called it OK. Originally he called it threeForth, 3/4th,
he also has a license plate 3Forth, and the idea is that it is part of a
Forth system. It isn't a complete Forth because it doesn't have an
interpreter, it doesn't have a compiler, it doesn't have source code,
but it is still basically Forth.
It's sort of like Forth. It's structured like Forth internally
but he's the compiler and there is no source code. The code
is the description of itself. This became OK, from the Forth
command prompt. And he implemented this on a variety of machines
implementing it each time without source code.
He began by bootstrapping something into a debugger to bootstrap
his activity. Then he would write a hex editor and then an
opcode editor and then he would write a screen editor, a
graphics editor, then a font editor. Then he would use those
as the OK system. OK was the minimal GUI. I don't know,
something like six bytes of code. It was basically a menu
controlled function for each screen in his application,
so he had a graphics environment, but it was
menu based: each graphic screen had a menu of associated commands that
appeared on the screen and a table of eight functions which were
indexed into by his keyboard input.
And he also experimented with what types of keyboards were most
effective for him to enter one of eight key commands. You all
remember those days of three key keyboards and keyboards glued
to wood and keyboards glued to rocks and keyboards attached to
his fingers, that he could operate like this (motioning as with
castanets). And he tried all kinds of different things.
Eventually he became dissatisfied with it. He declared that
sourceless programming was a dead end. The reason he decided
that was that what he had originally intended was that the code
could represent itself: you could just decompile it if you wanted
to see the source. But he found that decompiling
was not a trivial problem. It was complicated by the fact that
different source code sequences could compile to produce the same
object code. Furthermore data, which was generated at compile time
by interpretation of source code, resulted
in just binary data. You could not decompile it back into how
it was defined. So this ambiguity left him dissatisfied with OK.
Furthermore he wanted to make it easier to port his CAD system
which had been implemented in OK. John Rible was involved in
trying to unravel this large mass of object code with no source
and produce source code so that it could be compiled, or assembled
actually, and produce a new version of OK and OKAD.
So now that he had these tools working and was designing chips
he designed a new type of Forth chip that was simpler conceptually
than the Novix chip. The optimizations required for this chip
were relatively trivial and the compiler to produce that code
became known as machineForth.
This is one version of machineForth, the original version of
machineForth, and this is it. This (holding up a half page of code)
is the optimizing native code compiler for the chip that he designed.
In fact this is a cross compiler written for Dr. Ting's P21 and
it's written in F-PC which is a 16-bit Forth so it's about twice
as complicated as it needs to be since you have to use doubles
to compile 21-bit numbers. Furthermore because it is a cross
compiler and because he did some odd things inside of the silicon,
which he called patterns and numbers, you have to do some
conversions when you cross compile from another system. Now of
course these conversions are nothing new to anyone who has been
involved in this kind of thing before.
The first time that I
converted an application from an embedded Motorola chip to an
embedded Intel chip it didn't work and I said, why isn't this
working? And then I realized that one machine had positive
logic and the other machine had negative logic. That is to
say if I took the same ROM and put it into the other
machine then all of the data statements were now inverted
because an 0FF was now a 00 because a logical one on the data bus
represented five volts on one machine and zero volts on the other
machine. So when moving a ROM from Motorola to Intel I not only
had to change all the code, I had to change a macro for all the
data statements to exclusive-or them with 0FF.
So when I encountered
the concept that if I developed the code on a PC then I would have to
exclusive-or all the data statements with 0AAAAA this didn't seem
too strange to me. (a little laughter) However the first time I
heard Chuck explain this in front of a Forth meeting it confused
the Hell out of me. I said, if he can confuse me, and I've been
doing this now for a while, I'm sure that everybody else in the
room is really confused. And in fact he made it sound like if
anybody who wanted to use the chip, even to add two numbers together,
had to exclusive-or the arguments or exclusive-or the result and
in any event it seemed nightmarishly complicated. And this just
wasn't the case, if you wanted to add two numbers, you put two
numbers on the stack and you add them.
That's all there was to it. But still, if you're a cross compiler
writer working on a PC, which has a different bus than this chip,
you have to take that into account. You had to exclusive-or
addresses. You had to exclusive-or numbers. And because
Chuck was working at the hardware level he was thinking of these
things as bit patterns; they were represented one way in hardware and in
software they were represented another way, and he had to switch
back and forth thinking of them as numbers or thinking of them
as patterns. And he'd get up in front of the Forth Interest
Group and talk about this and since for the most part none of
you have to deal with being inside of a chip in a CAD system
and outside of the chip it just was very confusing for everyone.
Eventually he learned not to talk about that. And said that it
was probably a mistake to have ever used that terminology in public
because it really confused people.
So for many years I was using various versions of machineForth,
cross compilers. I wrote a version of machineForth that ran
natively on P21 and it was much simpler and much smaller and
more trivial. As you can see from a page of code it is not
a complicated or complex program. And for the most part I was
very happy with it, but it had its limitations.
One thing about cross compiling is that it makes it difficult to move
through the interactive loop of edit, compile, test, edit,
compile, test, because you have to edit, compile, download,
test. Sometimes you burn a ROM in the process. So it takes
time. And we experimented with tethered Forths so that you
could speed up the process, but still it was a slow process.
Furthermore the chip is relatively small, it doesn't need
to be big because everything, everything in the software is
so compact. John talked about the advantage of these tiny
opcodes but you really don't understand it unless you have
been in there and worked with these tiny 5-bit opcodes
where you can put four opcodes into a 20-bit word in memory.
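(A sketch of the packing in ANS Forth; the slot order and opcode values are illustrative only, not F21's actual encoding:)

    \ Pack four 5-bit opcodes into a single 20-bit memory word.
    : PACK4  ( op0 op1 op2 op3 -- word )
       >R >R >R                 \ set op1 op2 op3 aside
       5 LSHIFT R> OR           \ slot 1
       5 LSHIFT R> OR           \ slot 2
       5 LSHIFT R> OR ;         \ slot 3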
When he designed P21, Chuck only used instructions that
had access to a 1K page. He did this because the
RAM chips had 1K pages and everything ran fast if you
kept it on one page. So he said, "Well most programs fit into 1K
so all I need is 1K branches." I thought, "What planet is he from
that he thinks that programs fit in 1K." Well I learned that he
was from planet Forth and that programs did fit in 1K. After
ten years almost all of them, every well written program, fit
in 1K for the most part.
It all depends I guess on what you call a program. People sometimes
refer to a program as a collection of different programs that
get loaded at different times. And he refers to his OKAD system
as being about 20 separate programs, that's the way people
implement it when they write twenty ten megabyte programs and
put them together. And he had twenty 1K programs which he put
together so he demonstrated that his programs do fit in about 1K.
When I worked at iTV we had people who were programming in ANS Forth
and people who were programming in machineForth and for the most part
the machineForth people always kept things under 1K. And when I say
to people that programs fit in 1K they think I am from outer space
and I have to say, "Well, ours is a word-addressed machine; 1K words
means two and a half KBytes." So that's the first translation that
you have to make. The second translation that you have to make is
that 1K words means up to 4K Forth opcodes. So if you compare that
to a 32-bit Forth where words are 32-bits that's 16 KBytes. Now
if you say to a Forther that most programs fit in 16 KBytes they
don't look at you like you were from outer space. They say, "Well
yeah, that's the way we have been doing it for twenty years." But
when you say that they fit in 1K they don't quite grasp what you
mean. But 1K words on our machine might be the equivalent of
16K bytes for other people.
Meanwhile I was happily working away in machineForth all
these years, writing things for iTV and for myself and my chip,
seeing how remarkably small it all was, and bragging about the F21 in
a mouse demo, which many people think is Windows when they
see it running, and which has only 600 words of code total. The boot code,
the video drivers, the video coprocessor code, the event handler,
the mouse driver, the desktop, the application; if you add them
all up, 600 words. So yes, the compiled code is remarkably small.
The source code was small too, yes, smaller than in other Forths,
but still I was dealing with this conceptually: source in ascii
files or blocks, interpret, compile, build a dictionary, search
the dictionary, create an executable, go execute it. So I had
remarkably small object code but the source code was still on a
conventional scale.
iTV for instance had a fairly complex application, an operating
system, compressed files that decompress when they come out of
ROM, flash file system, a GUI with windowing and fonts and colors
and font attributes, an Internet browser, a TCP/IP stack,
a whole list of Internet protocols, graphics decode, graphics
display, email, (a Forth command line and remote line) and the
entire image that they had was about 200K words of object code
and data.
Michael Montvelishsky did a profile of it down here
a few years ago and I found it quite striking. It was totally
obvious which modules were written in machine Forth and which
modules were written in ANS Forth. He would say
this module is 1K, and this module is 1K, and this module is 2K,
and this module is 50K, and this module is 70K, and you could
tell which was which.
But the point was that the total source was bigger than the ROM
on F21. So you couldn't compile the code on F21 and have
object code. You just simply could not fit the source code
onto the machine. You had to have a separate machine to build
it.
But here I have this lovely desktop that looks like Windows
with 600 words of code. I don't have the problem of downloading
files from the Internet, storing them in the flash, and those
sorts of things in this environment, so I would like to put
source code into the flash and be able to do things in a
more sophisticated way. So I wrote down all of the features
that were my implementation goals.
I wanted to be able to have isomorphism in Forth hardware and
software so they matched tightly. Isomorphism in the operating
system, GUI, debugger, editor and applications. All integrated
in the same way. Isomorphism using distributed processing with
symmetric multiprocessing and the network interface that I designed
for my chip and the software designed to do this, a hardware accelerated
GUI using the video accelerator built into a few transistors in Chuck's
design, tokenized source code, a tokenized source code database manager
which would allow source code debugging and virtually instantaneous
compilation. The GUIs that I had written for myself and for iTV were
based on the Smalltalk model but in Forth,
and I wanted to do something like that here.
I expanded this, but I also wanted it to have a Logo like property of
simultaneously being able to edit the source code and instantly have
the compiled code changed so that a child could browse the system,
modify a word and instantly they have new executable code.
I wanted a highly optimized heuristic environment ideal for
artificial intelligence, rule based expert systems, neural net
simulations and genetic algorithms. That's what the hardware
was designed to do, that's what the hardware was optimized to do,
and I wanted software optimized for that purpose.
I wanted it to be a scalable environment where you could plug more
than one chip together and have it be as big as you want. And I
wanted it to be extremely simple, compact, and fast. Those were
my design goals.
Meanwhile I was kind of jealous of Chuck because he had left
MachineForth and was now doing weird new stuff in
ColorForth.
ColorForth is what he describes as putting information into the
white space of your source code for the purpose of making the
source code more clear, the source code smaller, and making
the compiler faster.
It does this in many ways. He has talked about how it works
and the evolution of his colorForth over the years. And he
changes it from time to time. The last explanation, last
July, explained how he was now using Huffman encoding for
his character strings and how he picked out the most advantageous
way to represent each character in a minimum number of bits and
pack them together. And it was very clear that he was
thinking about his hardware. His hardware is word oriented,
it is not byte oriented. He doesn't process bytes. He doesn't
have bytes in his source code, he doesn't have bytes in OK,
he doesn't have bytes in OKAD. He's got words, and all of
his code reads words and his hardware reads words.
Furthermore he pointed out that Forth itself is word oriented. And he
doesn't mean words as in CELLS there, he means words as in
Forth is words and spaces and that that is what makes it a
distinct language. It doesn't have a great deal of syntax.
It doesn't have a lot of rules about how this word has to be
in the context of all of these rules or that you have to
formulate your expressions if you are going to use this
keyword in this way. Forth kind of gets around that, it's got
words and spaces.
A word is like an object, it can contain anything, source code,
executable code, data, links, pointers, whatever. Listening
to Chuck's reasoning, why he was doing things the way he was
doing them in colorForth, it was very obvious that he was
designing code to run on the chips that he designed. Well,
I was writing code to run on the chips that he designed but
I was still stuck in this traditional concept (Figure 1.), which I
describe as the thirty year old concept that everybody in
Forth uses and nobody ever questions.
This (Figure 1.) is the definition of Forth until Chuck comes
down here and gives presentations that George (Perry) calls
(jokingly) heresy and reminds us that it is alright
to be a heretic and question these thirty year old ideas. So
I looked at what I was doing that came from this and tried to break
out of these old ideas. So I looked at
what I was doing that came from this (ANS),
what I was doing that came from this (cmForth),
what I was doing that came from this (OK),
what I was doing that came from this (machineForth),
that got me up to this phase, and what Chuck was doing
in colorForth that was different and why it was superior
and why I would like to do some of that. And I looked at
my design goals, which were to write code that was smaller,
faster, and simpler, and I took all of these parts and
started trying to see how I could fit them together and
I said, "Aha!"
So I named the project aha, a word designed to express
an epiphany, an understanding of the nature of something
that suddenly appears before you. And what I did in aha
was to refactor this picture, as Chuck had refactored it.
In colorForth Chuck had taken some of what a compiler was
doing and moved it over here into the editor. As a result
of that his compiler was simpler. Well his editor was
more complicated but hey, editors are not complicated
things. Traditionally people in Forth write their own
editors to do exactly what they want. I have written
them, many people have written them, some of you have
written your own editors. It is generally speaking
part of the Forth style since we had these integrated
development environments with compiler and editor all
rolled together.
By moving functions that used to be in the compiler into
the editor Chuck could now find errors at edit time that
would normally hang around and wait for compile time.
By compile time they are not as fresh in your mind as
when you wrote them. When the compiler pops one up you
have to look at it and figure it out all over again
and think through it. But with Chuck these errors would
get caught at edit time to speed up the development cycle.
Furthermore, it sped up the compiler because now the editor
could package things in a way more advantageous than just
ascii files or ascii blocks. For one thing he was thinking
about hardware that was word oriented. It's not byte oriented;
he doesn't want to pull the source in one byte at a time and chew
it up one byte at a time because that is not very efficient on his
machines. He wants to pull in a word at a time, so he began
packing his source code into (Huffman) packed counted strings
that are word oriented, cell oriented. So he said, "We
need to recognize that Forth is word oriented and take
advantage of that." His way of taking advantage of that
was to use cells and pack strings and counts into this
representation here (source). The compiler looks at it and
says, ok, this is a counted string, and it can skip over it if
it is a comment, or it can read it in knowing how long it is;
it doesn't have to count through it looking for a space to see
where it ends. Furthermore he used source tokens that could
say, this (next item) is a binary number.
Now you go up to this point (cmForth) and the concept is
that for a number first you search the dictionary, and you
know that it's not there because numbers are not in the
dictionary, but you search it anyway because that is how it
works in the old compiler and that's its beauty, it's simple.
But when you get a number first you search the dictionary,
well that's not a big problem for Chuck, he's only got a
few hundred words in his dictionary, but when you start
dealing with Forths with 20,000 words in their dictionary
that's a lot of overhead. So you determine that it's not
in the dictionary and then you try to convert it into a
number. You take the string, and then you parse it as a
string, and then you perform this operation on it to convert
the string to a binary number and if it fails you have an
error which you couldn't catch at edit time. You only caught
that at compile time so you go back to edit time and re-edit
your source code because the compiler, parsing your strings
and converting your strings into numbers, determined that this
was a problem.
So I saw that Chuck was getting this advantage. He moved
functions out of this more expensive time phase (compile time)
to this less expensive time phase (edit phase) passing some
complexity from the compiler to the editor. And I said, I'd
like to do that. I want to have packed strings and counted
strings and identify numbers but there were other good ideas
here. There were other good ideas in cmForth and OK but
basically I am sticking with machineForth. I had this big
pile of ideas and when I looked at how they could all fit
together in a new way and had the `aha' realization I came up
with a new concept for this picture.
I said, well let's do this, let's move the dictionary build
and dictionary search into edit time as well.
How do you do that?
Those were two of the mechanisms that I came up with. In fact,
I came up with the concept that instead of using an ascii file
or block I would use something much more like a database management
system for the source code. It's in records, and the records
are packaged in little packets. And the packets have little
tokens that tell the compiler, or the editor, what's in them.
And I determined that what I wanted was five different kinds
of packets for my source code. Now, why would I want to do this?
I had the realization that I had been doing a lot of sourceless
programming, because I had these tools. I had simulators
that looked like an old-fashioned monitor program where you edit the
machine code. You can decompile machine code and see what it
represents. You see the names of the words, you change a name and it changes
the bits for you. You don't have to remember which bits are which
opcodes, you just edit it. But it did have limitations.
One version had a dictionary link so that it would show the
names, the names for subroutine calls .... noise...
OK, the limitation was that with binary data you had no idea what it
was supposed to mean, except that there is some binary data
there; well, if you want to change it you have to go
way back to here (edit source), rethink all this, and
do all this to get back,
and that was the limitation. But for a lot of code that I
was doing it wasn't ambiguous, it did decompile completely,
and a very good way to represent that code was with tokens,
very small tokens, 5-bit tokens that happened to be the
opcodes.
So my first record type is Opcode Tokens, 5-bit tokens. In
other words I'm representing one class of words in Forth
by 5-bit numbers.
It is a very attractive idea. I don't lose anything and I
compress Forth words down from
being five character long strings, they average five characters in
length if you count the space character that delimits them,
to 5-bits which is a 6/1 compression.
Now the next type of token is Function Token. And these work
like a traditional tokenized Forth. You take a token, index
into a table, and you execute something. So, these words
(holding up the half-page machineForth listing) with the
exceptions of those that are Forth opcodes, words like IF,
THEN, WHILE, BEGIN, UNTIL, and words that are macros in this
optimizing compiler are represented also by only a few bits. They
are not opcodes but
they are source. Opcode tokens are source also. These tokens
are very small
like the opcode tokens (5 to 6-bits) and they simply go to a
lookup table in memory. You pull them out of a word, you pull
out a few bits, you index into a table, and you execute the
same function that gets done in machineForth.
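(The mechanism is the same table dispatch that conventional tokenized Forths use; a sketch in ANS Forth. In aha the table entries would be the compiler's macro words, IF, THEN, BEGIN and so on; the entries below are arbitrary stand-ins:)

    \ A function token is a few bits used to index a table of
    \ execution tokens, which are then executed.
    CREATE FUNCTION-TABLE
       ' DUP ,  ' DROP ,  ' SWAP ,  ' OVER ,

    : DO-FUNCTION  ( token -- )
       CELLS FUNCTION-TABLE + @ EXECUTE ;

    \ e.g.  5 0 DO-FUNCTION  executes DUP, leaving 5 5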
But in machineForth it works this way. You take the name, you
search the dictionary, find it, and then execute the code. That's
what the compiler had to do. Whereas now I don't have to do a dictionary
search for Opcode Tokens. I don't have to do a dictionary search
for Function Tokens. So now that just leaves me with defined
words to deal with.
Well I wanted to do what Chuck was doing in colorForth; pack those
words, count them, store them packed with a count to speed up
processing. And I said, you know, I can pack them into something
that looks just like a Forth dictionary inside of the source code.
That just has to happen in the editor. The editor processes
these things and builds the dictionary. However the dictionary
has a CFA field (Code Field Address slot) that is not initialized,
it's empty, it's got a zero but it's there, and I get compression that
way.
When used in conjunction with the next type of token, the
String Pointer Token, it becomes really advantageous, because
the String Pointer works like this:
when I encounter that word in the source code the next time,
I look in the dictionary, I find the word, and I represent it
as a token, which is a pointer to the CFA field in the dictionary.
So I do the dictionary build and the dictionary search in the editor, and
I convert the words into this type of token (String
Pointer Token). The first time you see a word defined,
Chuck does that with a
red word, and what we do with a name that
follows colon ( : ) or a name that follows VARIABLE or a name that
follows CREATE, is to create a packed counted string in the
dictionary. Any time I encounter that word in the future
I package it in a String Pointer Token which represents it as
a pointer into the dictionary. So I compress the strings
down into a cell or less than a cell if I want to limit myself
to less than a million definable words. So I get a lot of compression.
But the real advantage of this is that the compiler doesn't have to look
them up in the dictionary. The compiler says, ok, this is a pointer
into a dictionary that already exists, it's a pointer to a CFA.
When the compiler compiles the object code for a definition it sets
the CFA field in the dictionary, and when it encounters the
name of a defined word it just fetches the CFA.
Now when it encounters the defined word in the following source code
all it has to do is take the representation, which is a pointer into
the dictionary, take one fetch, get the CFA, turn it into the
appropriate type of subroutine call based on current and destination
addresses and compile it into the object code. So there is no
dictionary search. So by using these four types of tokens I completely eliminate
dictionary searching at compile time.
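(A sketch of the shape of that mechanism in ANS Forth; the field layout and word names here are illustrative only, not aha's actual format, which packs everything into cells on a 20-bit machine:)

    \ The editor creates a dictionary entry with an empty CFA slot
    \ followed by the packed counted name.
    : NEW-ENTRY  ( c-addr u -- cfa-slot )
       ALIGN HERE >R  0 ,                   \ the empty CFA slot
       DUP C, HERE SWAP DUP ALLOT MOVE      \ the counted name string
       R> ;

    \ When the compiler compiles the definition it fills the slot in.
    : DEFINE!   ( code-addr cfa-slot -- )  ! ;

    \ A later use of the word is a String Pointer Token: one fetch
    \ gives the code address, which is then compiled as a call.
    : REFERENCE  ( cfa-slot -- code-addr )  @ ;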
Now I have a fifth type of record, a binary record. From the compiler's
point of view it is analogous to an opcode record,
but it's got a different flag bit in the data record header
that tells the compiler (and editor) what it is. I also have
packed counted strings for comments. I can put as
many comments as I want in there and the compiler says, ok, this
is a comment, and at the same time it knows how long the comment
is, so it adds that to the pointer and moves past it and it
doesn't slow down the compiler.
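(In ANS Forth terms the skip is a single step, something like the sketch below; byte counts are used here for simplicity, where aha packs its strings into cells:)

    \ A counted comment is skipped without scanning for a delimiter.
    : SKIP-COMMENT  ( addr -- addr' )  COUNT + ;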
So I can speed up the compiler by an incredible amount by doing
all this. I coded up the function to compile these (Opcode Tokens),
and this (Binary Records), and this (Compressed Binary Records)
record type number five, compressed binary records. If you are
going to represent a thousand zeros in memory you don't want
a thousand statements in source code, you want to say repeat
this zero one thousand times. Much smaller in source code
and much faster, especially in an environment that is limited by
memory operations. The CPU is running at 500 Mhz and the
memory is much slower. It's much better if you can put it
on the stack and then just write it out to memory instead
of reading it from memory, writing it to memory, reading it from
memory, writing it to memory. Again, this speeds things up.
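(The compressed binary record is essentially run-length encoding; a sketch of the expansion in ANS Forth, with illustrative names that are not aha's:)

    \ A compressed binary record says "repeat this value n times"
    \ instead of carrying n copies in the source.
    : EXPAND-RUN  ( value n dest -- )
       SWAP 0 ?DO  2DUP !  CELL+  LOOP  2DROP ;

    CREATE ZEROS  1000 CELLS ALLOT
    \ 0 1000 ZEROS EXPAND-RUN   lays down a thousand zero cells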
So again, the first code that I wrote was the code to process
this type of token and this type of token and this type of token.
I've already got the code that processes these types of tokens
(Function Tokens), I mean I have the code in machineForth,
you saw it right there, that does all the things that these
functions do and it's already written. All I need to do is
convert it to use a table lookup by pulling these tokens out.
The code to process these types of tokens was about five or six
words of object code, several lines of source code. And I benchmarked it at
compiling a maximum of 120 million Forth words per second.
The way I get that
number is that using fast SRAM it takes 30ns to compile
one of these opcode tokens which represents up to four
source words so I am roughly compiling Forth words of that
type at a rate of up to 120 million per second. When I looked at
how many operations I need to pull out a function token,
AND out one of them, index into the table and execute some
code and it is much slower. Way down here at around only
two million per second. Defined words are in between. You
only get one per word so there are more memory operations
involved. So it is somewhere in between say 20 million per
second. I am just pulling that out of the air right now
so I could even use 2 million per second.
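(That opcode-token rate falls out of the fact that compiling an opcode record is just a cell-for-cell copy into code space, roughly one memory read and one write per cell, that is, per four packed opcodes. A sketch, using HERE and ALLOT to stand in for aha's code pointer:)

    : COMPILE-OPCODES  ( addr n -- )      \ n cells of packed opcode tokens
       CELLS HERE OVER ALLOT SWAP MOVE ;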
It depends on what your application is as to how much compression,
and how much speed you get, but you have to remember
that I am talking about things that are six hundred words
of object code for this complete (mouse demo) system. Ten or twenty
words of source code for this much of the compiler. (Function
and string token processing had not been coded at that time.)
So if you take these numbers with an average of ten million
words per second and figure out how long it takes to compile
ten words, you end up with microsecond compile times.
Now I showed all this stuff to Chuck and he thought it was
very clever. And he said, "That's just in time compiling
but it's so fast that you could compile interrupt service
routines from source code." I thought it was a very funny
idea and I suppose I could. I don't intend to, but what I
can do is now on this little chip I can have a whole bunch
of different applications stored in ROM (FLASH) in this
packeted format and I can pull them out and compile them
as I need to.
The machine doesn't have relocatable code therefore if I am going
to put six applications into the ROM and put six icons on
the desktop I was previously forced to put all six
into fixed locations in memory at all times because the
act of compiling is what ties them to a particular location
in memory, and if you know you might want to run all six of these
you have to put them at fixed locations. But this scheme
allows me to compile them in microseconds into a location
in memory. So they are effectively relocatable now.
I can put more applications into memory, I can put a whole
bunch of applications in that ROM (FLASH) and pull anything
that the user points to and clicks on the desktop and it
will come out of ROM and get compiled in a few microseconds.
You have an application now, it pops up, it runs the
way we all expect a GUI to operate.
What I effectively did, by moving these three functions
over here (Figure 3.), was refactor things. Now I have a
compiler that's incredibly fast and incredibly small.
And I have an editor which can enforce certain rules about how
the machine code should work. It can optimize the code at edit
time and package it into the database system. It does everything
that my editor does now. It does some of the things that the
machineForth compiler does now. I say it does, but it doesn't
exist yet so it doesn't do anything yet. But the editor will
take some of the functions from a traditional compiler and pass
them upstream in time to this less expensive time phase.
It also allows me to build a source code browser - debugger -
editor - optimizing compiler that will work like Logo and
Smalltalk and Forth all packed together.
The records allow changes to the source code to be localized.
You don't have to compile the whole application, you just
re-compile the packet that you're changing. The editor
can do these things between keystrokes. That's no problem.
And it allows me to build what I wanted, something that looks
like Logo and Smalltalk and Forth but is much smaller and
cleaner and simpler.
And I was really very happy. I wanted to make my code a
little cleaner, a little faster and a little more compact
and make the compiler a little smaller and a little faster.
But in fact I had made it a lot more compact, a lot smaller,
and a lot faster and I was very happy with that.
Now a final note. This is related. I got an email from a
young man in Australia named
Sean Pringle
who had written a stand-alone PC Forth system that he called
Enth as in this is
the n'th Forth system that someone has written. And it was
an ANS Forth based environment with boot code and drivers for
DMA, for the interrupt controller, keyboard and VGA drivers.
Similar to Chuck's stand-alone colorForth that he has been
doing but it is written in assembler and uses VGA instead of
the latest ATI 3D video accelerator card that Chuck has now ported
to, and when Chuck chose to change his video drivers to this
card he locked a lot of people out. Of course the fact that
he is using his own character set, that he is using his own
keyboard arrangement, that he's writing it in colorForth,
and that it's designed to run OKAD already locks almost
everyone out from even being able to get in there and play
with it.
So I was pleased to hear about Sean's doing this but he had
not only written Enth, which is a stand-alone ANS Forth for
the Pentium, but he also wrote Flux which he had much more
fun doing, which is a flavor of colorForth. It uses tokens
like Chuck, color tokens, but it has the same boot code and
driver code for the Pentium as Enth and all that. So we
started having some
discussions about it
and I explained
to him what I was doing in aha with packed counted strings,
building a dictionary in the editor and how I used dictionary
pointers in the source code for defined words and that
he could do that in his colorForth and steal this idea
from aha.
I was very pleased that he came back in two days and
said, "I implemented it and it works nicely." (laughter) He is
currently on vacation; he has
a web page about Flux but hasn't done any documentation yet
on Enth. I hope to get a copy of Enth and a copy of Flux available at the
UltraTechnology website so that people can get them and
play with them. There are a lot of people who have expressed
interest in playing with either a stand-alone boot Forth
for the Pentium, or a colorForth, and doing it with
Chuck's code is problematic (at this time).
I got a copy of one of Chuck's colorForths and I thought
that it might boot up on that machine (pointing), because it does have
an ATI video card from before he switched to the latest 3D ATI
video card, but it wouldn't boot up. You can't read the
disk with anything unless you write a utility to pull in
the first track and store it in a file as a binary image
and then you can try to figure out what all those bits
mean and I wasn't about to do that and I don't think
anyone else is either.
So the idea that you could get assembler source code
and do something with it, a stand-alone Forth on a Pentium,
and have ANS or colorForth, I found very attractive. I encouraged Sean.
He was worried about talking about this in public. (he
had lurked in comp.lang.forth) He was worried that
people would jump all over him and call him an idiot
and make fun of him. And you know, he was just a college
kid so he was really worried about that and I said,
"I won't do that. I will support you. I'll encourage
you, I'll try to help you. I'll be happy to post information at my
site and we can talk about it in the MISC mail list.
I know that there's people there who won't
jump all over you for having these ideas."
Since then I have learned of a few other colorForths.
Terry Loveall has a colorForth that he's written that
boots up from floppy under DOS so it is a little bit
like Flux. It doesn't have this function from aha
to eliminate dictionary searches. And by the way, this
is the only mechanism that Sean is using. So his Flux
is much simpler than my aha. I am willing to go the
extra distance of adding in the complexity of Opcode
Tokens and Function Tokens because they compress the
source code down several times more. Because I've
got this environment where there is such a tight
connection between the Forth that Chuck wrote and
the chip that Chuck designed that they just fit
together like that.
So I'm willing to do it, especially since I can compile
Forth source code at a hundred million Forth words
per second by doing it. So I'm willing to live with a
little extra complexity in the aha compiler and the
aha editor to do that. And he (Sean) doesn't need to
do that. And he has written his Flux to look very
much like Chuck's colorForth in the sense that it's using a
machineForth virtual machine model. It is kind of
designed to run on Chuck's chips even though it's
implemented on a Pentium. And that's what Chuck is
doing. Chuck is writing for the Pentium but he is
using the machineForth virtual machine model from
his chips. I've also learned of another colorForth from
James Hague.
When I get to the stage of writing an editor I plan to write
several editors. The first editor will be a machineForth
editor. It will look like what I have been doing except
that it will be able to have comments and the editor will
compress the code into these DBMS records to be processed
by the compiler. And I can write an editor that displays
the same thing as colorForth because that's an editor
function not a compiler function.
I can even display it as ANS Forth if I want, for people
who need to see it that way, and have a third editor. After
all, these editors are just little tiny programs that I can
pull in and execute with the same tiny little
compiler.
So the colorForth picture now is that I am aware of five
people working on colorForth, Chuck Moore, Terry Loveall,
Sean Pringle, James Hague and myself. Four of them have
working colorForths at the moment and I don't. I'm just
leaving the design phase for the edit phase (bootstrapping),
writing the first code, and I am not going to get to writing
this code (editor) until after I write this code (compiler)
so I don't have a colorForth yet but I've made provisions
to do what Chuck is doing and some other things in aha.
So it combines stuff from traditional Forths with ideas
from cmForth, from OK, from machineForth, and
from colorForth and adds some ideas of my own. And that's
the story on aha.
Any questions? (laughter)
Figure 2. Forth evolution timeline.
Aha 00
colorforth 95
machineforth 92
OK 87
ANSforth 86
cmforth 85
F83 83
FIGforth 70s
Commercial Forth 70s
Early Forth 68
Figure 3. Aha time phases.
Design Edit Compile Run
(ICE) (ICE)
Problem x-ascii Interpret Execute
Analysis x-Files/Blocks (some) Compile Interpret
<-------------- (some) Compile Compile
<-------------- Dictionary Build
<-------------- Dictionary Search
dbms
Figure 4. Aha DBMS Record Types.
1 Opcode Tokens
2 Function Tokens
3 Packed Counted Strings
4 String Pointer Tokens
5 Binary Records
Compressed Binary Records