%File: PLES1.TRA (c) 09/24/80 The Soft Warehouse % MATHTRACE: FALSE $ RFFIRST: 'RFFIRST $ MOVD (PRINT, #PRINT) $ FUNCTION PRINT (EX1), WHEN ATOM (EX1), #PRINT (EX1) EXIT, #PRINT (LPAR), PRINT (FIRST (EX1)), #PRINT (" . "), PRINT (REST (EX1)), #PRINT (RPAR), EX1, ENDFUN $ MOVD (PRINTLINE, #PRINTLINE) $ FUNCTION PRINTLINE (EX1), PRINT (EX1), NEWLINE (), EX1, ENDFUN $ ECHO: TRUE $ % This is the first of a sequence of interactive lessons about muSIMP programming. It presumes as a minimum that you have read section 8 of the muSIMP/muMATH Reference Manual. Also the first part of CLES1.ARI should be completed since it explains the mechanics of how to interact with the lessons. The file TRACE.MUS should be loaded. muSIMP supplies a number of built-in functions and operators. The calculator-mode lessons introduced a few of these, such as RDS, RECLAIM, +, *, etc. These progamming-mode lessons introduce more built-in functions and operators, but more important, the lessons reveal how to supplement the built-in functions and operators with definitions of your own, thus extending muSIMP itself. It is important to realize that, until the last programming-mode lessons, we will not deal with muMATH. Instead we deal first with muSIMP, the language in which muMATH is written. The illustrative examples for these first few lessons are utterly different from muMATH, because we want to suggest a few of the many other applications for which muSIMP is especially well suited, and because we want these lessons to be comprehensible regardless of math training level. Data is what programs operate upon. The most primitive UNSTRUCTURED muSIMP data are integers and names, collectively called ATOMS to suggest their indivisibility by ordinary means. Some programs must distinguish these two types of atoms, so there are two corresponding RECOGNIZER functions: % INTEGER (X76#) ; NAME (X76#) ; EG: -3271 $ INTEGER (EG) ; NAME (EG) ; % Do you suppose that "137", " ", "", and "X + 3", with the quotation marks included, are integers, names, or invalid? Find out for yourself% RDS: FALSE $ % Data can be stored in the computer's memory. The location at which a data item is stored is called its ADDRESS. An address is analogous to a street address on the outside of a mailbox. The data stored there is analogous to mail inside the mailbox. As with a row of mailboxes, the contents of computer memory can change over time. There are useful programs which deal only with unstructured data, but the most interesting applications involve aggregates of primitive data elements. One way to make an aggregate of 2 data elements is to use a structural data element called a NODE, which stores the addresses of the 2 data elements. Thus, a node is "data" consisting of addresses of 2 other data items. For example, suppose that we wish to represent the aggregate consisting of the name BILBO and his age 31. We could store the name BILBO beginning at location 7, the number 31 beginning at location 2, and the node beginning at location 4. Then, begining at location 4, there would be stored the addresses 7 and 2, as illustrated in the following diagram: Address: 1 2 3 4 5 6 7 +-------+-------+-------+-------+-------+-------+-------+-- Contents: | | 31 | | 7 | 2 | | BILBO | +-------+-------+-------+-------+-------+-------+-------+-- Is that clear? The specific placement of data within memory is managed auto- matically, so all we are concerned about is the specific name and number values and the connectivity of the aggregates. Thus, for our purposes it is best to suppress the irrelevant distracting detail associated with the specific addresses. The following diagram is one helpful way to portray only what we are concerned about: +----+----+ | / | \ | +-/--+--\-+ / \ BILBO 31 This imagery suggests the word "pointers" for the addresses stored in nodes. If you have seen one bisected box you have seen them all, so to reduce the clutter and thus emphasize the essential features, we henceforth represent such nodes by a mere vertex in our diagrams, giving schematics such as /\ / \ BILBO 31 Although most muSIMP programs use such aggregates internally, many muSIMP programs are designed to read and print data in whatever specialized notation is most suitable for the application. For example, muMATH uses operator and functional notation. Suppose however that we want to specify such aggregates directly in input and output. How can we do it? If we have a nice graphics terminal, then then we conveniently could use diagrams such as the above. Most of us do not have nice graphics terminals, so we must use another external representation. For this purpose muSIMP uses a representation consisting of the first data item, followed by the second data item, separated by a dot and spaces, all enclosed in a pair of matching parentheses. For example: (BILBO . 31) We call this representation of a node a DOTTED PAIR. However, this is a diferent use of parentheses and periods from how they are otherwise used in muSIMP input, so we must preceed the dotted pair by the single- quote prefix operator to indicate to the parser that we are using dotted-pair notation rather than the usual operator or functional notation: '(BILBO . 31) Moreover, we must use an ampersand rather than a semicolon as the expression-terminator in order to inform the driver to print the expression as a dotted pair rather than attempt to print it using operator and functional notation. We say "attempt" because not all dotted pairs are appropriate for operator or functional printing, as we will explain in the last lessons. Here then is an example of dotted- pair input and printing: % '(78 . TROMBONES) & % Try a few of your own, and note what happens when you forget the single-quote or use a semicolon rather than an ampersand: % RDS: FALSE $ % What about when we want to represent an aggregate of more than two atomic data elements? For example, what if we want to include BILBO's last name, BAGGINS? Well, we can let one of the pointers of a node point to another node, whose first pointer points to BILBO and whose other pointer points to BAGGINS. For example: /\ / \ /\ 31 / \ BILBO BAGGINS We can input this as a dotted pair nested within a dotted pair: % '((BILBO . BAGGINS) . 31) & % Note that we only quote the outermost dotted pair. Now suppose that we want to also include BILBO's species, structured as follows: /\ / \ /\ HOBBIT / \ /\ 31 / \ BILBO BAGGINS How would you input that? Remember, your input must be terminated by an ampersand. % RDS: FALSE $ % We would input it as: % EG: '(((BILBO . BAGGINS) . 31) . HOBBIT) & % An alternative structure for this information is the one corresponding to the input '((BILBO . BAGGINS) . (31 . HOBBIT)). On a piece of scratch paper, sketch the corresponding diagram, then hold it close to my face so I can check it. ----- / O^O \ \ \-/ / \---/ % RDS: FALSE $ % My eyes must be getting bad, I couldn't see it. Oh well... Since either element of a dotted pair can be a dotted pair, they can be used to represent arbitrary "binary tree structures". Moreover, although perhaps unprintable using pure dotted-pair notation, linked networks of binary nodes can be used to represent any data-structure whatsoever. In order to do anything interesting with data aggregates, a program must be able to extract their parts. Accordingly, there are a pair of SELECTOR functions named FIRST and REST which respectively return the left and right pointers in a node. For example: % REST (EG) & FIRST (EG) & FIRST (FIRST (EG)) & REST (FIRST (EG)) & % See if you can extract BILBO and BAGGINS from EG, using nested compositions of FIRST and/or REST: % RDS: FALSE $ % Our answers are: % FIRST (FIRST (FIRST (EG))) & REST (FIRST (FIRST (EG))) & % Deeply nested function invocations become difficult to type and read, so let's define our first muSIMP function named FFFIRST, so that FFFIRST (EG) could be used as shorthand for the first of the above two examples and for any analogous example thereafter: % FUNCTION FFFIRST (U), FIRST (FIRST (FIRST (U))) ENDFUN & % If you are not using a hard-copy terminal, jot down this function definition and all subsequent ones for reference later in the lesson. Despite the word ENDFUN, the fun has just begun: Now that FFFIRST is defined, we can apply it at any subsequent time during the dialogue. For example: % FFFIRST (EG) & FFFIRST ('(((BIG . MAC) . CATSUP) . (FRENCH . FRIES))) & % Using the definition of FFFIRST as a model, define a function named RFFIRST which extracts the REST of the FIRST of the FIRST of its argument, then test RFFIRST on EG: % RDS: FALSE $ % Our solution is: % FUNCTION RFFIRST (FOO), REST (FIRST (FIRST (FOO))), ENDFUN & RFFIRST (EG) & % The name FOO in the definition is called a PARAMETER, whereas EG where the function is applied is an example of an ARGUMENT. We can use any name for a parameter -- even a name which has been bound to a value or even the same name as an argument. The name is merely used as a "dummy variable" to help indicate what to do to an argument when the function is subsequently applied. A function definition is like a recipe. It is filed away, without actually being EXECUTED until applied to actual arguments. As another simple example, since an atom is defined as being either a name or an integer, it is convenient to have a recognizer function for atoms, so that we do not have to test separately for names and atoms when we do not care which type of atom is involved. We could define this recognizer as follows: FUNCTION ATOM (U), NAME (U) OR NUMBER (U) ENDFUN & Actually, ATOM is already built-into muSIMP, but the example provides a good opportunity to introduce the built-in infix OR operator, which returns FALSE if both of its operands are FALSE, returning TRUE otherwise. Try out ATOM on the examples -5, X and EG % RDS: FALSE $ % Analogous to OR, there is a built-in infix AND operator which returns FALSE if either operand is FALSE, returning TRUE otherwise. There is also a built-in prefix NOT operator which returns TRUE if its operand is FALSE, returning FALSE otherwise. Knowing this, see if you can define a recognizer named NODE, which returns TRUE if its argument is a node, returning FALSE otherwise: % RDS: FALSE $ % In programming there is rarely, if ever, one unique solution, but ours is: % FUNCTION NODE (U), NOT ATOM (U) ENDFUN & NODE (EG) & NODE (5) & % So much for trivial exercises. Now let's write a function which counts the number of atoms in its argument. We will count each instance of each atom, even if some atoms occur more than once. At first this may seem like a formidable task, because a tree can be arbitrarily branched. How can we anticipate ahead of time all of these possibilities. Well, let's procrastinate by disposing of the most trivial cases even though we can't yet see the whole solution: If the argument is an atom, then there is exactly 1 atom in it. So much for trivial cases. We haven't yet solved the whole problem, but it builds our self-confidence to make progress, so that is a good psychological reason for first disposing of the easy cases. Also, with the easy cases out of the way, we can turn our full intellectual powers on the harder cases, unfettered by any distractions to trivial loose ends. We are left with the case where we know we have a node. Perhaps we could somehow subdivide the problem into smaller cases? Let's see ... Nodes have a FIRST part and a REST part, so perhaps that provides the natural subdivision. Hmmm ... If we knew the number of atoms for the left part and the number for the right part, clearly the number for the whole aggregate is merely their sum. But how can we find out the number of atoms in these parts? Why not RECURSIVELY use the very function we are defining to determine these two contributions! It may sound like cheating to refer to the function we are defining from with the definition itself, but remembering that the definition is not actually APPLIED until sometime after its definition is complete, perhaps it will work. We are working in a highly interactive environment, so the quickest way to resolve questions about muSIMP is to try it and see! Here then is a formal muSIMP function definition corresponding the the above informal English "algorithm": % FUNCTION #ATOMS (U), WHEN ATOM (U), 1 EXIT, #ATOMS (FIRST(U)) + #ATOMS (REST(U)) ENDFUN & % Here we introduce 2 new concepts: The BODY of a function definition can consist of a sequence of one or more expressions separated by commas. A CONDITIONAL-EXIT is an expression consisting of a sequence of one or more expressions nested between the matching pair of words WHEN and EXIT. When a function definition is APPLIED, the expressions in its body are evaluated sequentially, until perhaps a conditional exit causes an exit from the procedure or until the delimiter named ENDFUN is reached. For a conditional exit, the first expression after the word WHEN is evaluated. If the value is FALSE, then evaluation proceeds to the point immediately following the matching delimiter named EXIT. Otherwise, evaluation proceeds sequentially through the remaining expressions in the conditional exit, if any, exactly as if the body of the conditional exit replaced that of the function. The value of a conditional exit is that of the last expression evaluated therein, and the value returned by a function is that of the last expression evaluated therein when the function is applied. Thus, #ATOMS immediately returns the value 1 whenever the argument is an atom, and otherwise the function breaks the problems into two parts which are necessarily smaller, hence closer to being atoms. Let's test it, starting with trivial cases first: % #ATOMS (FOO) & #ATOMS (5) & EG & #ATOMS (EG) & % It looks promising, but it is still perhaps mysterious how muSIMP and #ATOMS keep track of all of these recursive function invocations. Since the trace package is supposedly loaded, to trace the execution of #ATOMS, we merely issue the command: % TRACE (#ATOMS) & % Now every time #ATOMS is entered, it prints its name and argument values, whereas every time it is exited, it prints its name followed by an equal sign, followed by the returned value. Moreover, the trace is indented in a manner which allows corresponding entries and exits to be visually associated. Watch: % #ATOMS (FOO) & EG & #ATOMS (EG) & % Try a few examples of your own, until these new ideas begin to gel: % RDS: FALSE $ UNTRACE (#ATOMS) & #ATOMS (FOO) & % Here is a function which counts only the number of integers in its argument: % FUNCTION #INTEGERS (U), WHEN INTEGER (U), 1 EXIT, WHEN NAME (U), 0 EXIT, #INTEGERS (FIRST(U)) + #INTEGERS (REST(U)) ENDFUN $ EG & #INTEGERS (EG) ; % Now, using it as a model, try writing a function named #NAMES, which returns the number of names in its argument. If your first syntactically accepted attempt fails any test, try using TRACE to reveal the reason why: % RDS: FALSE $ % Our solution is ... On second thought, we won't give you our solution. Consequently, if you were lazy and didn't try, you had better try now, because the examples will get steadily harder now. % RDS: FALSE $ % The HEIGHT of an atom is 1, and the HEIGHT of a node is 1 more than the maximum of the two heights of its FIRST and REST parts. Accordingly, let's first write a function named MAX, which returns the maximum of its two integer arguments. There is a built-in infix integer comparator named ">", so here is a hint: FUNCTION MAX (INT1, INT2), WHEN INT1 > INT2, ... EXIT, ... ENDFUN $ Enter such a definition, with appropriate substitutions for the missing portions, then test your function to make sure it works correctly: % RDS: FALSE $ % Now, with the help of our friend MAX, see if you can write a function named HEIGHT, which returns the height of its argument: % RDS: FALSE $ % Our solution is: % FUNCTION HEIGHT (U), WHEN ATOM (U), 1 EXIT, 1 + MAX (HEIGHT(FIRST(U)), HEIGHT(REST(U))) ENDFUN $ % This brings us to the end of the first programming-mode lessons. It may be a good idea to review this lesson before proceeding to lesson PLES2.TRA. % ECHO: FALSE $ MOVD (#PRINT, PRINT) $ MOVD (#PRINTLINE, PRINTLINE) $ RDS () $