%File: PLES5.TRA (c) 09/24/80 The Soft Warehouse % MATHTRACE: FALSE $ ECHO: TRUE $ % This is the fifth in a series of muSIMP programming lessons. In the previous lesson our original version of REVERSE, called REVLIS, required time proportional to n^2, where n is the length of the first argument. We then showed how a collection variable or a loop could yield a much faster technique using time proportional only to n. Now, let's consider the speed of some of the other set functions that we defined: Whether iterative or recursive, MEMBER can require a number of equality comparisons equal to the length of its second argument. Whether defined iteratively or recursively, SUBSET, EQSET, UNION, and INTERSECTION all require a membership test for each element of one argument in the list which is the other argument. Thus, these definitions can all consume computation time which grows as the product of the lengths of the two arguments. By similar reasoning, the one- argument functions ISSET and MAKESET are seen to require time proportional to the square of the length of their argument. Data-base applications and others can involve thousands of set operations on sets having thousands of elements, so it is worthwhile to seek methods for which the computation time grows more slowly with set size. In muSIMP, every name has an associated PROPERTY LIST which is immediately accessible in an amount of time that is independent of the total number of names in use. Provided the elements of the sets are all names, this permits techniques for the above set operations requiring time proportional merely to the length of the one set or to the sum of the lengths of the two sets. A property list is a list of dotted pairs. The first of each dotted pair is an expression called the KEY or INDICATOR, and the rest of each dotted pair is an expression called the associated INFORMATION. For example, in a meteorological data-base application, the name HONOLULU might have the property list ((RAIN . 2), (HUMIDITY . 40), (TEMPERATURE, 58, 96)) The function used in the form GET (name, key) returns the information which is dotted with the value of "key" on the property list of the value of "name", returning FALSE if no such key occurred on the property list. A command of the form PUT (name, key, information) causes the value of "key" dotted with the value of "information" to be put on the property list of the value of "name". PUT returns the value of "information". Using property lists, the basic technique for accomplishing our various operations on two sets of names is: 1. For each name in one of the two sets of names, store TRUE under the key SEEN. 2. For each name in the other set, check to determine whether or not the name has already been seen, and act accordingly. 3. For each name in the first set, remove the property SEEN so that we won't invalidate subsequent set operations which utilize any of the same elements. A simpler variant of this idea is applicable to the one-argument functions named ISSET and MAKESET. As an example, here is UNION defined using this technique together with the applicative style: % FUNCTION UNION (SET1, SET2), MARK (SET1), UNMARK (SET1, UNIONAUX (SET2)) ENDFUN $ FUNCTION MARK (SET1), WHEN EMPTY (SET1), EXIT, PUT (FIRST(SET1), 'SEEN, TRUE), MARK (REST (SET1)) ENDFUN $ FUNCTION UNIONAUX (SET2), WHEN EMPTY (SET2), SET1 EXIT, WHEN GET (FIRST(SET2), 'SEEN), UNIONAUX (REST(SET2)) EXIT, ADJOIN (FIRST(SET2), UNIONAUX(REST(SET2))) ENDFUN $ FUNCTION UNMARK (SET1, ANS), WHEN EMPTY (SET1), ANS EXIT, PUT (FIRST(SET1), 'SEEN, FALSE), UNMARK (REST(SET1), ANS) ENDFUN $ UNION ('(A, B, C, D), '(F, A, E, C)) & % Each time any function is invoked, the outside values of its parameter names, if any, are "stacked" away to be restored later, just prior to return from that invocation. If a function refers to a variable which is not among its parameters, then the most recent value of the variable on the stack is used. Thus, when UNIONAUX is invoked from within UNION, SET1 in the definition of UNIONAUX refers to the argument value associated with that parameter of UNION. This treatment is called "dynamic binding", and a reference such as to SET1 in UNIONAUX is called a "fluid reference". We could have avoided this by making SET1 be an argument and a parameter to UNIONAUX, but that would have made the program slightly slower and more bulky. However, fluid variables make programs much harder to debug and maintain, especially if assignments are made to them in functions other than the ones which establish them. Consequently, we recommend generally avoiding fluid variables. The only reason we used one here is to introduce the concept to issue this advice. Values assigned at the top-level of muSIMP, outside all function definitions, are called GLOBAL values. Examples are the initial values of muSIMP control variables such as RDS and ECHO, or of muMATH control variables such as PBRCH or PWREXPD. Reference to a global value from within a function definition is not quite as confusing as reference to a fluid value, and it is indeed onerous to create numerous long lists of parameters in order to pass such environmental control values through a long sequence of function definitions for use deep within. The property-list technique for set operations is one which we think is more naturally implemented using the Von Neumann programming style. Try to write such a version of UNION: % RDS: FALSE $ % Now, using either style, write an INTERSECTION function using the property-list technique: % RDS: FALSE $ % Taking the FIRST and/or REST of an atom is generally not necessary, but it does in fact have a well-defined value. The FIRST cell of an atom points to the atom's value, while the REST cell points to the property list associated with the atom. For example: % WEATHER: 'FOUL $ PUT ('WEATHER, 'TEMPERATURE, -3) $ PUT ('WEATHER, 'WIND, '((NORTH . WEST), 30)) $ FIRST (WEATHER) & REST ('WEATHER) & % Integer atoms also have FIRST and REST cells. The FIRST cell of an integer normally points to the integer itself. The REST cell is used to determine the sign of the number: if FALSE the integer is non-negative, if TRUE the integer is negative. % FIRST (7) & REST (7) & NINE: 9 $ PUT (NINE, 'TESTING, '(1, 2, 3)) & GET (NINE, 'TESTING) & GET (9, 'TESTING) & % All muSIMP data objects (i.e. nodes, names, and integers) have a FIRST cell and a REST cell which can only point to valid muSIMP data objects. Thus, misuse of these selectors cannot accidently give access to non-data objects such as machine language code, stack, print names, etc. This closed pointer universe guarantees the integrity of muSIMP from possible excursions into the unknown. It is common practice to use EMPTY to test for the end condition as a function proceeds down a list. If such a function is inadvertently given a non-list (i.e. a Non-FALSE atom or a structure whose final REST cell points to a Non-FALSE atom), the function will use the FIRST cell of that atom (i.e. its Value cell) as an element of the list and the REST cell of the atom (i.e. its Property List cell) as the REST of the list. Generally the Property List is a well defined list so the EMPTY test will ultimately cause termination with no ill affects. We prefer to have non-list arguments give more predictable results confined to the argument. Thus, our internal implementations of MEMBER, REVERSE, and any other functions ordinarily applied to lists use ATOM rather than EMPTY as the termination test. This is slightly faster too, so you may wish to generally avoid EMPTY in favor of ATOM. Alternatively, you can redefine EMPTY to print and return an error message when given a nonFALSE atom: % FUNCTION EMPTY (LIS), WHEN ATOM (LIS), WHEN LIS EQ FALSE EXIT, PRINT ("*** Warning: EMPTY given nonlist ") EXIT ENDFUN $ EMPTY (5) $ % This is our first example illustrating the fact that conditional exits can be nested arbitrarily deep. The same is true of loops or blocks. This example also illustrates the PRINT function, which prints its one argument the same way that expressions terminated with an ampersand are printed. There is an analogous function named PRTMATH which prints its one argument the same way that expressions terminated with a semicolon are printed. When functions are called with fewer actual arguments than the function has formal arguments, the remaining formal arguments are assigned the value FALSE. This provides a convenient mechanism for automatically inserting default values for these extra arguments. When an argument evaluates to FALSE, the function can assign the appropriate default value. For example, if the user omits the drive as the third argument of RDS, that function uses the currently logged in drive (i.e. the drive indicated by the last operating system prompt given before entering muSIMP). There are instances where it is desirable to permit a function to have an arbitrary number of arguments. This is accomplished by making the formal parameter list of a function definition be an atom or non- list rather than a list. The arguments are passed to the function as a single list of argument values, from which the function can extract the values. For example, it is convenient to have a function named MAX which returns the largest of one or more argument values. We can implement this as follows: % FUNCTION MAX ARGLIS, MAXAUX (FIRST(ARGLIS), REST(ARGLIS)) ENDFUN $ FUNCTION MAXAUX (BIGGEST, UNTRIED), WHEN EMPTY (UNTRIED), BIGGEST EXIT, WHEN BIGGEST > FIRST(UNTRIED), MAXAUX (BIGGEST, REST(UNTRIED)) EXIT, MAXAUX (FIRST(UNTRIED), REST(UNTRIED)) ENDFUN $ MAX (7) ; MAX (3, 8, -2) ; % This collection of arguments into a list is called NOSPREAD, to distinguish from the SPREAD brand of peanut butter. Now, suppose that for some reason we already have a list of integers such as % NUMBLIS: '(18, 3, 7, 91, 12, 2) $ % and we want to find their maximum. The expression MAX (NUMBLIS) will not work, because MAX is designed for numeric arguments, not for a list of numbers. We could of course extract the elements and feed them individually to MAX, but this is awkward, especially if we are referring to MAX inside a function and we do not know ahead of time how many integers are in NUMBLIS. Fortunately there is a convenient function named APPLY, which applies the function whose name is the value of its first argument to the argument list which is the value of its second argument. Consequently, we need merely write % APPLY ('MAX, NUMBLIS) & % APPLY works on either SPREAD or NOSPREAD functions. Why don't you try out a few examples: % RDS: FALSE $ % A function written in muSIMP is stored internally in a very compact form called D-code (see Section 13.9 of the muMATH Reference Manual). In order to retrieve the definition for use as data, the function GETD (GET Definition) of one argument can be used to decompile the definition and return it is as a linked list. If GETD is given the name of a primitively defined machine language routine instead, the physical memory address of the routine is returned. Finally, if its argument is not a defined function, GETD returns returns FALSE. The following examples show the result of all three types of arguments: % GETD ('UNION) & GETD ('FIRST) & GETD ('FOO) & % Since function definitions can be converted into lists and then recompiled back into D-code, a muSIMP program can actuately be made to modify muSIMP functions! In fact, this is exactly what the TRACE and UNTRACE commands do to the traced function. Other examples which could use this feature include a muSIMP function editor and pretty printer, a cross reference program, and even a compiler all of which could be written in muSIMP. This is the end of programming lesson 5. These lessons should have provided you with sufficient knowledge to be able to use the muSIMP Section of the Reference Manual to achieve any desired muSIMP programming goal. % ECHO: FALSE $ RDS () $