Forth-Linda
Linda is a coordination language that can be combined with a standard computational programming language to provide multiprocessor coordination and communication using a tuple space model. A tuple space is a global, associative, object memory that holds tuples and can be implemented on any type of multiprocessor or computer network. A tuple is a sequence of typed fields, where each expression has a value or a potential value. Linda was conceived by Dave Gelernter as a graduate student at the State University of New York at Stonybrook in 1982. Linda was first implemented by Dave Gelernter and James Philbin in 1985.
This document will describe the implementation of Forth-Linda on a simulated multiprocessor where multiple copies of Forth are run in MS-DOS using Desqview, and the implementation of Forth-Linda on the F20 multiprocessor computer system which is under development by Foxware.
In either implementation there is a master processor which manages the network, or simulated network, and multiple worker processors which process active tuples given to them by the master processor.
Computationally intensive programs are broken up into parts that are then taken from the tuple space and executed by worker processors. The more worker processors that are available the faster the active tuples can be distributed and processed. From the perspective of the software the number of worker processors is irrelevant. This approach provides a course grained form of parallelism. Only high level application operations should be implemented using active tuples so as not to create a high overhead of message passing in the tuple space.
The great advantage of Forth-Linda is that load balancing in a multiprocessor is automatic. With Forth-Linda a program works the same way with one hundred worker processors as it does with one worker processor, only one hundred times faster! The programmer need not tailor the application to load specific parts of the algorithm onto specific processors and deal with the details of interprocessor communication. Although a small performance gain may be possible by such custom tailoring of an application to a specific configuration of processors, the designer sacrifices efficiency and portability to a different configuration of processors. The master processor must, of course, know how many worker processors are available, but an application need not!
In implementations of Linda on heterogeneous computer networks the number of worker processors on the network can continuously change, and the master processor may even need to move from one machine to another. However in the F20 multiprocessor or the MS-DOS multiprocessor simulation the number of worker processors remains constant once the system is configured. Heterogeneous computer implementations of Linda must also deal with different types of processors, with different amounts of memory, and different network communication links. In the F20 and MS-DOS implementations all worker processors and their network links are identical and thus the configuration and active tuple assignment functions of the master processor are much simpler than in other Linda implementations.
In the F20 multiprocessor the master processor loads Forth on each worker processor via the Network Interface Unit. It also assigns each NIU a LAN address in the process of configuring the network. Thus on the F20 the master processor will always know how many worker processors there are in the system. In the MS-DOS multiprocessor simulation the number of worker processors will be fixed for a given implementation.
At the lower levels Forth will operate in the normal manner by passing arguments on the stack, and also using variables and constants. All normal Forth operations on the stack as well as variables and constants are local to a worker processor. At higher levels Forth words are executed via the Linda "eval" function. These Forth words can execute on any processor in the system, and must communicate with other high level Forth-Linda words via passive data tuples in the tuple space.
When a worker processor is available for evaluation of active tuples it sends a request to the master processor to get a an active tuple. The master processor then removes an active tuple from tuple space and sends it to a worker processor. The worker processor executes the active tuple. In the process of evaluating a tuple a worker processor can write new active or passive tuples to the tuple space. It may also read or read and remove passive data tuples from the tuple space. When the worker finishes processing the active tuple it writes a passive data tuple back to the tuple space with the final results of the operation and then requests a new active tuple from the tuple space.
The master processor must implement the following functions:
Assigning of active tuples to worker processors that request work.
Matching the tuple space I/O requests from the worker processors with the tuples in the tuple space, and performing all I/O operations on the tuple space.
And on the F20 computer, network management, including giving permission to each worker processor to access the tuple space via the network.
The "rd(", and "rm(" operations from a worker processor use a tuple template. The template is matched against tuples in the tuple space by the master processor. The master associates tuples by matching tuple field types and values to find the appropriate tuple for an operation.
After an active tuple has been assigned to a worker processor for computation the master processor removes the tuple from tuple space. When the worker processor has completed the computation it returns a passive tuple with all fields evaluated to the tuple space with the out operation.
Each worker processor will be assigned a buffer addresses for all communication with the tuple space via the network and master processor. The first word of the buffer contains a flag that indicates when a buffer I/O operation is complete. The remaining area of the buffer contains tuples to be read from or written to tuple space.
In the case of the network simulation unused video text pages are used as the communications buffer since this is one area of memory that is available to all Desqview windows Segment 0B900H is the second text page of CGA or VGA video hardware, and is available to all Forth programs running under Desqview Each worker version of Forth will have its own address in this space through which communication with the master processor and the tuple space will take place.
In the case of the F20 multiprocessor the master processor will load Forth into each worker processor on the network. Each worker will be assigned a specific buffer in the master's memory address space for communication from the worker to the master, and each worker will have a unique address on the LAN and will use the same memory address in the worker address space for communication from the master to each worker.
The use of flags and message buffers, along with the fact that all operations on the actual memory in the tuple space are performed by the master processor ensures that all tuple operations are atomic.
In the MS-DOS multiprocessor simulation the master processor is continuously monitoring the message buffers from the worker processors, each a version of Forth in a Desqview window. However in the F20 multiprocessor computer only one processor may write data to the LAN at one time. In the F20 multiprocessor system the master processor gives permission to each of the worker processors in the token ring, one at a time, to write to the master processor's message buffers. The master processor then performs the actual operations on the tuple space just as it does in the MS- DOS Desqview simulation.
Specification of Tuples in Forth-Linda
The master processor must be able to search all tuples to read or remove tuples from the tuple space. The Forth dictionary is not suitable for storing tuples because tuples do not need to begin with a string, and multiple tuples may begin with the same string. Additionally Forth is not a strongly typed language, and field types must be represented within tuples. For the sake of simplicity in this first implementation of Forth-Linda there will be only two types of fields, integer and string. More than one integer field may represent double integers or even larger data arrays.
Linda normally specifies tuples in the form (f1, ...,fn) where the tuple is contained within parentheses and the fields are separated by commas. Also string fields are normally enclosed in quotation marks. The following are published examples of tuple specifications from C-Linda implementations:
The parentheses characters will be used in Forth-Linda to contain tuple specifications as they are in C-Linda. Strings will be contained within quotes, however as in normal Forth syntax the fields will be separated with spaces.
The following are examples of tuple specifications in Forth-Linda:
Active tuples contain executable Forth words and any arguments that they take. Active tuples are never matched by the master processor, but are simply passed to worker processors for execution. Passive tuples requested from the tuple space by worker processors are specified by the above syntax and are matched against the actual tuples in the tuple space by the master processor. To match a tuple must contain the same number of fields, and have fields of the same type and value. When a field with a potential value is specified then any value in that field is considered a match.
If the command eval( sqrt ) is executed in Forth-Linda by a worker processor the result is that the active tuple eval( sqrt ) is written to the tuple space. This tuple is then removed from the tuple space and assigned to a worker processor for execution. The worker processor executes the Forth word in the active tuple, which in this case, read an integer value, perform a square root, and write the result back to the tuple space as an integer value in a passive tuple of form ( 2 ).
If the command rm( "person" 23 ?x ) is executed in Forth-Linda by a worker processor the worker processor requests that the read operation be performed on the tuple space. The master processor matches the fields in this tuple template with the tuples in the tuple space. The tuple must have three fields, first a string field of length 7 and value "person", the second field must be an integer of value 23, and the third field must be an integer. If the master processor finds such a tuple it removes this tuple from tuple space and passes the value in the third field to the worker processor into the variable "x". If the master processor does not find a matching tuple in the tuple space it will wait and keep trying to match the template to the tuple space until it can find a matching passive tuple to complete the operation.
All tuple fields in the tuple space will contain actual values. Forth-Linda eval( and add( operations will contain actual values in their field expressions. But Forth-Linda rd( and rm( operations will always contain some potential values in their tuple field templates. When these tuple operations are complete the actual values from the tuple space are then transferred to Forth variables or to the Forth stack in the worker processors.
Internal Representation of Tuples
Tuples are essentially treated as long counted strings by all the Forth-Linda words that manipulate them. The first byte of a tuple contains the length of the tuple in words. This limits the size of a packed tuple to 510 bytes in the MSDOS implementation. The tuple space is simply all the current tuples as a very long string. The byte at the first location in tuple space contains either zero (the end of tuple space), or the length in words of that tuple. In addition the tuple also contains the following optional fields:
The tuple space itself is a Forth data array with a pointer to the current top of space for faster access. Internally these operations among others are needed for manipulation of the tuple space:
These five operations are sufficient to make Forth applications run on an unspecified number of processors. Programs are structured so that sections that can be executed in parallel are executed from within eval( expressions. This allows them to be executed on the first available worker processor. These active tuples must communicate with other active tuples by reading, writing, and deleting passive tuples from the tuple space with the add(, rd(,and rm( Forth-Linda tuple words.
These words provide a facility for shared cpu resources, and a global associative memory space available to all processors in the system. Read and remove operations are done by associating matching properties of the data tuples. Since all operations on the tuple space are performed by the master processor they are atomic and the operations on the tuple space are easily monitored for debugging.
In the MS-DOS multiprocessor simulation up to 8 worker processors are supported automatically. A WORKER processor window needs only to be opened and have its worker number assigned with SETWORKER. At that point it is ready to request active tuples from the tuple space, and in the process of executing those active tuples it can add, read, and remove passive data tuples from the tuple space.
In the MS-DOS multiprocessor simulation the MASTER window shares the screen with seven WORKER windows. In this simulation the MASTER window constantly displays the seven flag words from the seven WORKER communication buffers. The values can be see changing from FFFF ( requesting active tuple ) to 0 ( ready ) with an occasional 1 ( posting an active tuple to the worker ) or FFFE ( master assigning an active tuple ) or 2 ( posting passive tuple ) or 3 ( reading passive tuple ).
In the F20 computer network the master processor sends out commands on the LAN through the NIU (Network Interface Unit) to load Forth and set the LAN address for each processor on the network when it boots the LAN. Just as the WORKER windows need only be opened in the MS-DOS simulation, the WORKER processors need only be loaded, and assigned a worker number to become part of the working hypercomputer. In the F20 if a worker processor crashes it can be reloaded at any time and restarted by the master processor. Worker processors do not even need a ROM since they will actually boot from the data loaded by the NIU.
Jeff Fox 8/91
Copyright 1991 Foxware