%File: PLES3.TRA (c) 09/24/80 The Soft Warehouse % MATHTRACE: FALSE $ ECHO: TRUE $ % This is the third in a series of muSIMP programming lessons. Often, it is most natural to represent a data aggregate as a sequence or LIST of items rather than as a general binary tree. For example, such a sequence is quite natural for the elements of a vector or of a set. We can represent such a sequence in terms of nodes by having all of the FIRST cells point to the data elements, using the REST cells to link the sequence together. The last linkage node can have a REST cell which is FALSE to indicate that there are no further linkage nodes: /\ / \ item1 /\ / \ item2 . . /\ / \ itemN FALSE When this diagram is rotated 45 degrees in the counter-clockwise direction, it looks like a clothes line with the successive data elements suspended from it. +-------+-- - - - ---+----- FALSE | | | item1 item2 itemN This latter diagram more clearly suggests a sequence or list of items. The simple regularity of the structure permits correspondingly simple function definitions for processing such structures. Moreover, the linear structure suggests an external printed representation which is far more readable than dotted pairs. In response to an ampersand terminator, muSIMP prints the above aggregate in the more natural LIST notation: (item1 item2 ... itemN) rather than the equivalent dot notation (item1 . (item2 . ... (itemN . FALSE) ... )) Conversely, the reader accepts list notation as an alternative input form to dot notation. Naturally, any of the items in a list can themselves be either lists or more general dotted pairs. The printer uses list notation as much as possible. Thus, a structure of the form /\ / \ item1 /\ / \ item2 . . /\ / \ itemN atom where "atom" is not the atom FALSE, is printed in a mixed notation as (item1 item2 ... (itemN . atom)) The muSIMP read routines will correctly read such mixed notation. You may wonder why you never noticed list notation being output in PLES1 and PLES2. At the beginning of those lessons the function PRINT was redefined so that it printed expression entirely in dot notation. It is important to fully understand the connection between dotted pairs and lists, so take 5 minutes or so to type in some lists, nested lists, nested dotted pairs, and mixtures, noting carefully how they print. % RDS: FALSE $ % Did your examples include: % '() & % Is that surprising? Since FALSE is used to signal the end of the list, FALSE and the empty list must be equivalent. % FALSE EQ '() & % Clearly functions which successively process each element of a list must somehow determine when the end of the list has been reached. This TERMINAL CASE is easily achieved by an equality test for the name FALSE. Since the need for this test is so pervasive in muSIMP, the empty list recognizer EMPTY is written in machine language for efficiency reasons. However, it could be defined using an EQ test as follows: FUNCTION EMPTY (LIS), LIS EQ '(), ENDFUN; Using EMPTY, see if you can define a function named #ITEMS, which returns the number of (top-level) items in its list argument. For example, #ITEMS ('(FROG, (FRUIT . BAT), NEWT)) should yield 3. Here is an incomplete solution. All you have to do is enter it with the portions marked "..." appropriately filled. FUNCTION #ITEMS (LIS), WHEN EMPTY (LIS), ... EXIT, 1 + #ITEMS ( ... ) ENDFUN $ % RDS: FALSE $ % Actually, there is already a built-in function called LENGTH, which returns the length of a list. It is somewhat more general in that it returns the number of characters necessary for printing when given an atom. Note that with lists it is typical to recur only on the REST of the list, whereas with general binary trees it is typical to recur on both the FIRST and the REST. So far, the examples and exercises have been relatively isolated ones. Now we will focus on writing a collection of functions which together provide a significant applications package: A list provides a natural representation for a set. For example, (MANGO, (CHOCOLATE . FUDGE), (ALFALFA, SPROUTS)) can represent a set of three foods. Using this representation, let's write functions which test set membership and form unions, intersections, etc. First, write a function named ISIN, which returns TRUE if its first argument is in the list which is its second argument, returning FALSE otherwise: % RDS: FALSE $ % Our solution is: % FUNCTION ISIN (U, LIS), WHEN EMPTY (LIS), FALSE EXIT, WHEN U = FIRST (LIS), EXIT, ISIN (U, REST(LIS)) ENDFUN $ ISIN ('FROG, '(SALAMANDER NEWT TOAD)) ; % Actually, there is already a built-in version of ISIN called MEMBER. A set contains no duplicates, so we really should have a recognizer function named ISSET, which returns TRUE if its list argument contains no duplicates, returning FALSE otherwise. Try to write such a function: % RDS: FALSE $ % Here is a hint, in case you gave up: FUNCTION SET (LIS), WHEN ... EXIT, WHEN MEMBER (FIRST(LIS), ... ), FALSE EXIT, SET ( ... ) ENDFUN; % RDS: FALSE $ % In case it isn't clear by now, a rule of this game is that you are free (and encouraged) to use any functions we have already discussed, whether they are built-in, previous examples, or previous exercises. That is one reason it is adviseable for you to actually do the exercises. Now write a function named SUBSET, which returns TRUE if the set which is its first argument is a subset of that which is its second argument. (Remember that every set is a subset of itself and the empty set is a subset of every set.) % RDS: FALSE $ % Here is a hint, in case you gave up or had a less compact solution: FUNCTION SUBSET (SET1, SET2), WHEN ... EXIT, WHEN MEMBER (FIRST(SET1), ...), SUBSET( ...) EXIT ENDFUN; % RDS: FALSE $ % Two sets are equal if and only if they contain the same elements. However, the elements need not occur in the same order. Write a corresponding comparator function named EQSET: % RDS: FALSE $ % Ah yes, a hint perhaps?: FUNCTION EQSET (SET1, SET2), ... ENDFUN; % RDS: FALSE $ % Do you think that's not much of a hint? Well, the body of the function really can be written with one modest line, so try harder: % RDS: FALSE $ % Remember the rules of the game: You are encouraged to use any function discussed previously: % FUNCTION EQSET (SET1, SET2), SUBSET (SET1, SET2) AND SUBSET (SET2, SET1) ENDFUN; % Our examples so far have merely analyzed sets. We can use ADJOIN to construct lists, just as we used ADJOIN to construct binary trees. As an example of this, write a function named MAKESET, which returns a copy of its list argument, except without duplicates if there are any: % RDS: FALSE $ % If you need a hint, here is one, but it is all you will get: FUNCTION MAKESET (LIS) WHEN ..., '() EXIT, WHEN MEMBER ( ... ), ... EXIT, ADJOIN ( ... ) ENDFUN; % RDS: FALSE $ % Let's see if your solution works correctly: % MAKESET ('(FROG, FROG, FROG)) & % If there is a duplicate in the answer, then back to the computer terminal: % RDS: FALSE $ % (It helps to think of nasty test cases BEFORE you start programming). Now for the crowning glory of our set package: The UNION of two sets is defined as the set of all elements which are in either (perhaps both) sets. Give it a try: % RDS: FALSE $ % A hint perhaps? Well, the function body can be written in 3 lines, each of which begins just like the corresponding line in our hint for MAKESET. % RDS: FALSE $ % Here is our solution: % FUNCTION UNION (SET1, SET2), WHEN EMPTY (SET1), SET2 EXIT, WHEN MEMBER (FIRST(SET1), SET2), UNION (REST(SET1), SET2) EXIT, ADJOIN (FIRST(SET1), UNION (REST(SET1), SET2)) ENDFUN $ UNION ('(DOG, CAT, 5, RAT), '(-5, CAT, PIG, DOG))& % The intersection of two sets is the set of all elements which are in both sets. Using our definition of UNION as inspiration, write a corresponding function for the intersection: % RDS: FALSE $ % So far, our set algebra package has been developed in a so-called BOTTOM-UP manner, with the most primitive functions defined first, and with the more sophisticated functions defined in terms of them. The opposite approach is TOP-DOWN, where we define the most comprehensive functions in terms of more primitive ones, then we define those more primitive ones in terms of still more primitive ones, until no undefined functions remain. As an example of the top-down attitude, let's write a SYMMETRIC DIFFERENCE function for our set-algebra package. The symmetric difference of two sets is the set of all elements which are in exactly one of the two sets. This is in contrast to the ordinary diference of two sets, which is all of the elements that are in the first set but not the second. However, if an ordinary difference function was available, we could write the symmetric difference as the union of the ordinary difference between set1 and set2, with the ordinary difference between set2 and set1. We have already written UNION, but an ordinary set difference is not yet available. Nevertheless, let's bravely proceed to write the symmetric difference in terms of the ordinary difference, then we will worry about how to write the latter: % FUNCTION SYMDIF (SET1, SET2), UNION (ORDDIF (SET1, SET2), ORDDIF (SET2, SET1)) ENDFUN $ % Now you try to write ORDDIF. It may help you to know that it can be written very similarly to UNION: % RDS: FALSE $ % Some programmers are initially uncomfortable with the top-down approach because it makes them nervous to refer to undefined functions: there are obvious loose ends during the writing process. However, it is not necessary to understand how an auxiliary function can be written before daring to refer to it. All that is necessary is that the duty relegated to the auxiliary function be somehow more elementary than the overall duty performed by the function which refers to it. There are necessarily loose ends during the writing of a program in any sequential order. With the bottom-up approach, the loose ends are neither written nor referred to until lower-level functions have been written. Unfortunately, as such hidden loose ends are revealed they often make apparent the need to completely reorganize and rewrite all subordinate functions into a more suitable organization. In contrast, the obvious loose ends during a top-down development provide invaluable clues about how to organize the remaining functions. Moreover, any subsequent changes tend to be easier, because communication between the functions is more localized, more independent, and more hierarchial. For example, we know that in the definition of SYMDIF we are taking the union of two DISJOINT sets, because from the definition of ORDDIF it is clear that ORDDIF (SET1, SET2) and ORDDIF (SET2, SET1) cannot have elements in common. Hence it would be more efficient merely to append the second ordinary set difference to the first ordinary set difference, or vice-versa. Unfortunately, ADJOIN does not accomplish the desired effect. For example, ADJOIN ('(5, 9), '(3, 7)) yields ((5, 9), 3, 7) rather than the desired (5, 9, 3, 7). What we must do is ADJOIN 9 to (3, 7), then adjoin 5 to that result. See if you can generalize this process into a function named APPEND, which returns a list consisting of the list which is its first argument appended onto the beginning of the list which is its second argument:% RDS: FALSE $ % How about: % FUNCTION APPEND (LIS1, LIS2), WHEN EMPTY (LIS1), LIS2 EXIT, ADJOIN (FIRST(LIS1), APPEND (REST(LIS1), LIS2)) ENDFUN $ % You may not be getting tired, but my circuits are weary, so let's bring this lesson to a close. % ECHO: FALSE $ RDS () $