%File:  PLES5.TRA  (c)            09/24/80          The Soft Warehouse %


MATHTRACE: FALSE $
ECHO: TRUE $

%   This is the fifth in a series of muSIMP programming lessons.

    In  the  previous lesson our original  version  of  REVERSE,  called 
REVLIS,  required time proportional to n^2, where n is the length of the 
first  argument.   We  then showed how a collection variable or  a  loop 
could  yield a much faster technique using time proportional only to  n.  
Now, let's consider the speed of some of the other set functions that we 
defined:

    Whether  iterative  or  recursive,  MEMBER can require a  number  of 
equality  comparisons  equal  to  the length  of  its  second  argument.  
Whether defined iteratively or recursively,  SUBSET,  EQSET,  UNION, and 
INTERSECTION  all  require  a membership test for each  element  of  one 
argument  in  the  list  which  is  the  other  argument.   Thus,  these 
definitions can all consume computation time which grows as the  product 
of  the lengths of the two arguments.   By similar reasoning,  the  one-
argument   functions  ISSET  and  MAKESET  are  seen  to  require   time 
proportional  to the square of the length of their argument.   Data-base 
applications and others can involve thousands of set operations on  sets 
having  thousands of elements,  so it is worthwhile to seek methods  for 
which the computation time grows more slowly with set size.

    In  muSIMP,  every  name  has an associated PROPERTY LIST  which  is 
immediately  accessible in an amount of time that is independent of  the 
total number of names in use.  Provided the elements of the sets are all 
names,  this  permits techniques for the above set operations  requiring 
time  proportional merely to the length of the one set or to the sum  of 
the lengths of the two sets.

    A property list is a list of dotted pairs.  The first of each dotted 
pair is an expression called the KEY or INDICATOR,  and the rest of each 
dotted  pair  is an expression called the associated  INFORMATION.   For 
example,  in a meteorological data-base application,  the name  HONOLULU 
might have the property list 

          ((RAIN . 2), (HUMIDITY . 40), (TEMPERATURE, 58, 96))

    The  function  used  in  the form   GET  (name,  key)   returns  the 
information which is dotted with the value of "key" on the property list 
of  the value of "name",  returning FALSE if no such key occurred on the 
property list.

    A  command  of the form  PUT (name,  key,  information)  causes  the 
value  of "key" dotted with the value of "information" to be put on  the 
property  list  of  the  value of "name".   PUT  returns  the  value  of 
"information".

    Using  property  lists,  the basic technique for  accomplishing  our 
various operations on two sets of names is:

        1.   For each name in one of the two sets of  names,  store 
    TRUE under the key SEEN.

        2.   For  each  name in the other set,  check to  determine 
    whether  or  not  the  name has  already  been  seen,  and  act 
    accordingly.

        3.   For  each name in the first set,  remove the  property 
    SEEN  so  that  we won't invalidate subsequent  set  operations 
    which utilize any of the same elements.


    A  simpler  variant of this idea is applicable to  the  one-argument 
functions named ISSET and MAKESET.

    As  an example,  here is UNION defined using this technique together 
with the applicative style:  %

FUNCTION UNION (SET1, SET2),
  MARK (SET1),
  UNMARK (SET1, UNIONAUX (SET2))   ENDFUN $
FUNCTION MARK (SET1),
  WHEN EMPTY (SET1),  EXIT,
  PUT (FIRST(SET1), 'SEEN, TRUE),
  MARK (REST (SET1))   ENDFUN $
FUNCTION UNIONAUX (SET2),
  WHEN EMPTY (SET2),  SET1  EXIT,
  WHEN GET (FIRST(SET2), 'SEEN),  UNIONAUX (REST(SET2))  EXIT,
  ADJOIN (FIRST(SET2), UNIONAUX(REST(SET2)))   ENDFUN $
FUNCTION UNMARK (SET1, ANS),
  WHEN EMPTY (SET1),  ANS  EXIT,
  PUT (FIRST(SET1), 'SEEN, FALSE),
  UNMARK (REST(SET1), ANS)   ENDFUN $
UNION ('(A, B, C, D), '(F, A, E, C)) &
%    Each  time  any function is invoked,  the  outside  values  of  its 
parameter names,  if any,  are "stacked" away to be restored later, just 
prior  to  return  from  that invocation.   If a function  refers  to  a 
variable which is not among its parameters,  then the most recent  value 
of  the variable on the stack is used.   Thus,  when UNIONAUX is invoked 
from  within  UNION,  SET1 in the definition of UNIONAUX refers  to  the 
argument value associated with that parameter of UNION.   This treatment 
is called "dynamic binding", and a reference such as to SET1 in UNIONAUX 
is  called a "fluid reference".   We could have avoided this  by  making 
SET1  be  an argument and a parameter to UNIONAUX,  but that would  have 
made  the  program  slightly slower  and  more  bulky.   However,  fluid 
variables make programs much harder to debug and maintain, especially if 
assignments  are  made to them in functions other than  the  ones  which 
establish  them.   Consequently,  we recommend generally avoiding  fluid 
variables.  The only reason we used one here is to introduce the concept 
to issue this advice.

    Values  assigned  at the top-level of muSIMP,  outside all  function 
definitions,  are called GLOBAL values.  Examples are the initial values 
of  muSIMP control variables such as RDS and ECHO,  or of muMATH control 
variables  such as PBRCH or PWREXPD.   Reference to a global value  from 
within a function definition is not quite as confusing as reference to a 
fluid value,  and it is indeed onerous to create numerous long lists  of 
parameters  in order to pass such environmental control values through a 
long sequence of function definitions for use deep within.  

    The property-list technique for set operations is one which we think 
is  more naturally implemented using the Von Neumann programming  style.  
Try to write such a version of UNION: %		RDS: FALSE $
%   Now,  using either style,  write an INTERSECTION function using  the 
property-list technique:  %			RDS: FALSE $
%  Taking the FIRST and/or REST of an atom is generally  not  necessary, 
but  it  does in fact have a well-defined value.   The FIRST cell of  an 
atom  points  to the atom's value,  while the REST cell  points  to  the 
property list associated with the atom.  For example: %

WEATHER: 'FOUL $
PUT ('WEATHER, 'TEMPERATURE, -3) $
PUT ('WEATHER, 'WIND, '((NORTH . WEST), 30)) $
FIRST (WEATHER) &
REST ('WEATHER) &
%   Integer atoms also have FIRST and REST cells.   The FIRST cell of an 
integer normally points to the integer itself.  The REST cell is used to 
determine the sign of the number:  if FALSE the integer is non-negative, 
if TRUE the integer is negative. %

FIRST (7) &
REST (7) &
NINE: 9 $
PUT (NINE, 'TESTING, '(1, 2, 3)) &
GET (NINE, 'TESTING) &
GET (9, 'TESTING) &
%   All muSIMP data objects (i.e.  nodes,  names,  and integers) have  a 
FIRST  cell  and a REST cell which can only point to valid  muSIMP  data 
objects.   Thus, misuse of these selectors cannot accidently give access 
to non-data objects such as machine language code,  stack,  print names, 
etc.   This  closed pointer universe guarantees the integrity of  muSIMP 
from possible excursions into the unknown.

    It is common practice to use EMPTY to test for the end condition  as 
a  function proceeds down a list.   If such a function is  inadvertently 
given a non-list (i.e.  a Non-FALSE atom or a structure whose final REST 
cell  points to a Non-FALSE atom),  the function will use the FIRST cell 
of  that atom (i.e.  its Value cell) as an element of the list  and  the 
REST  cell of the atom (i.e.  its Property List cell) as the REST of the 
list.   Generally  the Property List is a well defined list so the EMPTY 
test will ultimately cause termination with no ill affects.

       We  prefer  to  have non-list  arguments  give  more  predictable 
results confined to the argument.  Thus, our internal implementations of 
MEMBER, REVERSE, and any other functions ordinarily applied to lists use 
ATOM rather than EMPTY as the termination test.  This is slightly faster 
too,  so  you  may  wish  to generally avoid EMPTY  in  favor  of  ATOM.  
Alternatively,  you  can  redefine  EMPTY to print and return  an  error 
message when given a nonFALSE atom: %


FUNCTION EMPTY (LIS),
  WHEN ATOM (LIS),
    WHEN LIS EQ FALSE EXIT,
    PRINT ("*** Warning: EMPTY given nonlist ")  EXIT
ENDFUN $
EMPTY (5) $
%    This  is our first example illustrating the fact  that  conditional 
exits  can  be nested arbitrarily deep.   The same is true of  loops  or 
blocks.   This example also illustrates the PRINT function, which prints 
its  one  argument  the same way that  expressions  terminated  with  an 
ampersand  are  printed.   There is an analogous function named  PRTMATH 
which  prints its one argument the same way that expressions  terminated 
with a semicolon are printed.

    When  functions  are  called with fewer actual  arguments  than  the 
function  has  formal  arguments,  the remaining  formal  arguments  are 
assigned  the  value FALSE.   This provides a convenient  mechanism  for 
automatically inserting default values for these extra arguments.   When 
an argument evaluates to FALSE,  the function can assign the appropriate 
default  value.   For example,  if the user omits the drive as the third 
argument of RDS,  that function uses the currently logged in drive (i.e. 
the  drive  indicated by the last operating system prompt  given  before 
entering muSIMP).

     There  are instances where it is desirable to permit a function  to 
have  an arbitrary number of arguments.   This is accomplished by making 
the  formal parameter list of a function definition be an atom  or  non-
list rather than a list.   The arguments are passed to the function as a 
single  list  of argument values,  from which the function  can  extract 
the values.   For example, it is convenient to have a function named MAX 
which  returns  the  largest of one or more  argument  values.   We  can 
implement this as follows:  %

FUNCTION MAX ARGLIS,
  MAXAUX (FIRST(ARGLIS), REST(ARGLIS))
ENDFUN $
FUNCTION MAXAUX (BIGGEST, UNTRIED),
  WHEN EMPTY (UNTRIED),  BIGGEST  EXIT,
  WHEN BIGGEST > FIRST(UNTRIED), MAXAUX (BIGGEST, REST(UNTRIED)) EXIT,
  MAXAUX (FIRST(UNTRIED), REST(UNTRIED))
ENDFUN $
MAX (7) ;
MAX (3, 8, -2) ;

%    This  collection of arguments into a list is  called  NOSPREAD,  to 
distinguish from the SPREAD brand of peanut butter.

    Now, suppose that for some reason we already have a list of integers 
such as  %

NUMBLIS: '(18, 3, 7, 91, 12, 2) $
%    and we want to find their maximum.   The expression  MAX  (NUMBLIS) 
will not work,  because MAX is designed for numeric arguments, not for a 
list of numbers.   We could of course extract the elements and feed them 
individually to MAX, but this is awkward, especially if we are referring 
to  MAX  inside  a function and we do not know ahead of  time  how  many 
integers  are  in NUMBLIS.   Fortunately there is a convenient  function 
named APPLY,  which applies the function whose name is the value of  its 
first  argument  to the argument list which is the value of  its  second 
argument.  Consequently, we need merely write %

APPLY ('MAX, NUMBLIS) &
%    APPLY works on either SPREAD or NOSPREAD functions.   Why don't you 
try out a few examples:  %		RDS: FALSE $
%   A function written in muSIMP is stored internally in a very  compact 
form  called  D-code (see Section 13.9 of the muMATH Reference  Manual).  
In order to retrieve the definition for use as data,  the function  GETD 
(GET Definition) of one argument can be used to decompile the definition 
and  return  it is as a linked list.   If GETD is given the  name  of  a 
primitively  defined  machine  language routine  instead,  the  physical 
memory address of the routine is returned.   Finally, if its argument is 
not  a  defined function,  GETD returns returns  FALSE.   The  following 
examples show the result of all three types of arguments:  %

GETD ('UNION) &  GETD ('FIRST) &  GETD ('FOO) &
%   Since  function  definitions can be converted into  lists  and  then 
recompiled  back into D-code,  a muSIMP program can actuately be made to 
modify muSIMP functions!   In fact,  this is exactly what the TRACE  and 
UNTRACE commands do to the traced function.   Other examples which could 
use this feature include a muSIMP function editor and pretty printer,  a 
cross  reference  program,  and even a compiler all of  which  could  be 
written in muSIMP.

    This is the end of programming lesson 5.   These lessons should have 
provided  you  with sufficient knowledge to be able to  use  the  muSIMP 
Section   of  the  Reference  Manual  to  achieve  any  desired   muSIMP 
programming goal. %

ECHO: FALSE $
RDS () $