%File:  PLES1.TRA  (c)  	09/24/80	    The Soft Warehouse %


MATHTRACE: FALSE $  RFFIRST: 'RFFIRST $

MOVD (PRINT, #PRINT) $
FUNCTION PRINT (EX1),
  WHEN ATOM (EX1), #PRINT (EX1) EXIT,
  #PRINT (LPAR),  PRINT (FIRST (EX1)),  #PRINT (" . "),
  PRINT (REST (EX1)),  #PRINT (RPAR),  EX1,
ENDFUN $

MOVD (PRINTLINE, #PRINTLINE) $
FUNCTION PRINTLINE (EX1),
  PRINT (EX1), NEWLINE (), EX1,
ENDFUN $
ECHO: TRUE $

%    This is the first of a sequence of interactive lessons about muSIMP 
programming.   It presumes as a minimum that you have read section 8  of 
the  muSIMP/muMATH Reference Manual.   Also the first part of  CLES1.ARI 
should  be completed since it explains the mechanics of how to  interact 
with the lessons.  The file TRACE.MUS should be loaded.

    muSIMP  supplies a number of built-in functions and operators.   The 
calculator-mode lessons introduced a few of these, such as RDS, RECLAIM, 
+,  *,  etc.   These  progamming-mode  lessons introduce  more  built-in 
functions and operators,  but more important,  the lessons reveal how to 
supplement the built-in functions and operators with definitions of your 
own, thus extending muSIMP itself.

    It  is important to realize that,  until the  last  programming-mode 
lessons,  we  will  not deal with muMATH.   Instead we deal  first  with 
muSIMP,  the  language  in which muMATH is  written.   The  illustrative 
examples  for these first few lessons are utterly different from muMATH, 
because  we  want to suggest a few of the many  other  applications  for 
which  muSIMP  is  especially well suited,  and because  we  want  these 
lessons to be comprehensible regardless of math training level.

    Data is what programs operate upon.  The most primitive UNSTRUCTURED 
muSIMP data are integers and names, collectively called ATOMS to suggest 
their indivisibility by ordinary means.   Some programs must distinguish 
these  two  types of atoms,  so there are two  corresponding  RECOGNIZER 
functions: %

INTEGER (X76#) ;
NAME (X76#) ;
EG: -3271 $
INTEGER (EG) ;
NAME (EG) ;
%   Do you suppose that "137",  " ", "", and "X + 3", with the quotation 
marks included, are integers, names, or invalid?  Find out for yourself%  
RDS: FALSE $
%  Data can be stored in the computer's memory.   The  location at which 
a data item is stored is called its ADDRESS.  An address is analogous to 
a street address on the outside of a mailbox.   The data stored there is 
analogous to mail inside the mailbox.   As with a row of mailboxes,  the 
contents of computer memory can change over time.

    There  are  useful programs which deal only with unstructured  data, 
but  the most interesting applications involve aggregates  of  primitive 
data  elements.   One way to make an aggregate of 2 data elements is  to 
use a structural data element called a NODE,  which stores the addresses 
of the 2 data elements.   Thus, a node is "data" consisting of addresses 
of 2 other data items.

    For  example,  suppose  that  we  wish to  represent  the  aggregate 
consisting  of the name BILBO and his age 31.   We could store the  name 
BILBO  beginning at location 7,  the number 31 beginning at location  2, 
and  the node beginning at location 4.   Then,  begining at location  4, 
there  would  be stored the addresses 7 and 2,  as  illustrated  in  the 
following diagram:

  Address:	1	2	3	4	5	6	7
	    +-------+-------+-------+-------+-------+-------+-------+--
  Contents: |	    |  31   |	    |	7   |	2   |	    | BILBO |
	    +-------+-------+-------+-------+-------+-------+-------+--

    Is that clear?

    The  specific  placement  of data within  memory  is  managed  auto-
matically, so all we are concerned about is the specific name and number 
values and the connectivity of the aggregates.   Thus,  for our purposes 
it is best to suppress the irrelevant distracting detail associated with 
the  specific  addresses.   The following diagram is one helpful way  to 
portray only what we are concerned about:

	+----+----+
	|  / | \  |
	+-/--+--\-+
	 /	 \
      BILBO	  31

    This  imagery suggests the word "pointers" for the addresses  stored 
in nodes.

    If  you  have seen one bisected box you have seen them  all,  so  to 
reduce  the  clutter  and  thus emphasize  the  essential  features,  we 
henceforth represent such nodes by a mere vertex in our diagrams, giving 
schematics such as

	/\
       /  \
    BILBO  31

    Although most muSIMP programs use such aggregates  internally,  many 
muSIMP  programs  are  designed  to  read and  print  data  in  whatever 
specialized notation is most suitable for the application.  For example, 
muMATH uses operator and functional notation.   Suppose however that  we 
want  to specify such aggregates directly in input and output.   How can 
we  do  it?   If  we  have  a  nice  graphics  terminal,  then  then  we 
conveniently  could use diagrams such as the above.   Most of us do  not 
have  nice  graphics  terminals,   so  we  must  use  another   external 
representation.    For   this  purpose  muSIMP  uses  a   representation 
consisting  of  the first data item,  followed by the second data  item, 
separated  by  a  dot and spaces,  all enclosed in a  pair  of  matching 
parentheses.  For example:

	(BILBO . 31)

    We call this representation of a node a DOTTED PAIR.   However, this 
is a diferent use of parentheses and periods from how they are otherwise 
used in muSIMP input,  so we must preceed the dotted pair by the single-
quote  prefix  operator  to  indicate to the parser that  we  are  using 
dotted-pair  notation  rather  than the  usual  operator  or  functional 
notation:

	'(BILBO . 31)

    Moreover,  we  must use an ampersand rather than a semicolon as  the 
expression-terminator  in  order  to  inform the  driver  to  print  the 
expression  as  a  dotted  pair rather than attempt to  print  it  using 
operator  and  functional notation.   We say "attempt" because  not  all 
dotted pairs are appropriate for operator or functional printing,  as we 
will  explain in the last lessons.   Here then is an example of  dotted-
pair input and printing:  %

'(78 . TROMBONES) &
%   Try  a few of your own,  and note what happens when you  forget  the 
single-quote or use a semicolon rather than an ampersand:  %
RDS: FALSE $
%    What about when we want to represent an aggregate of more than  two 
atomic data elements?   For example,  what if we want to include BILBO's 
last  name,  BAGGINS?   Well,  we can let one of the pointers of a  node 
point  to another node,  whose first pointer points to BILBO  and  whose 
other pointer points to BAGGINS.  For example:

		/\
	       /  \
	      /\   31
	     /  \
	 BILBO  BAGGINS

    We can input this as a dotted pair nested within a dotted pair:  %

'((BILBO . BAGGINS) . 31) &
%  Note that we only quote the outermost dotted pair.   Now suppose that 
we want to also include BILBO's species, structured as follows:
                /\
	       /  \
	      /\  HOBBIT
	     /  \
	    /\   31
	   /  \
       BILBO  BAGGINS

    How would you input that?
    Remember, your input must be terminated by an ampersand.
%  RDS: FALSE $
%  We would input it as: %

EG: '(((BILBO . BAGGINS) . 31) . HOBBIT) &

%    An   alternative  structure  for  this  information  is   the   one 
corresponding to the input  
                 '((BILBO . BAGGINS) . (31 . HOBBIT)).  
On a piece of scratch paper, sketch the corresponding diagram, then hold 
it close to my face so I can check it.

	-----
       / O^O \
       \ \-/ /
	\---/		% RDS: FALSE $
%   My eyes must be getting bad, I couldn't see it.  Oh well...

    Since either element of a dotted pair can be a dotted pair, they can 
be  used  to represent arbitrary "binary  tree  structures".   Moreover, 
although  perhaps  unprintable using pure dotted-pair  notation,  linked 
networks  of  binary nodes can be used to represent  any  data-structure 
whatsoever.

    In order to do anything interesting with data aggregates,  a program 
must be able to extract their parts.   Accordingly,  there are a pair of 
SELECTOR  functions named FIRST and REST which respectively  return  the 
left and right pointers in a node.  For example:  %

REST (EG) &
FIRST (EG) &
FIRST (FIRST (EG)) &
REST (FIRST (EG)) &
%   See  if  you can extract BILBO and BAGGINS  from  EG,  using  nested 
compositions of FIRST and/or REST:  %  RDS: FALSE $
%   Our answers are:  %
FIRST (FIRST (FIRST (EG))) &
REST (FIRST (FIRST (EG))) &
%    Deeply  nested function invocations become difficult  to  type  and 
read,  so let's define our first muSIMP function named FFFIRST,  so that 
FFFIRST  (EG) could be used as shorthand for the first of the above  two 
examples and for any analogous example thereafter:  %

FUNCTION FFFIRST (U),
  FIRST (FIRST (FIRST (U)))
ENDFUN &
%    If you are not using a hard-copy terminal,  jot down this  function 
definition and all subsequent ones for reference later in the lesson.

    Despite  the word ENDFUN,  the fun has just begun:  Now that FFFIRST 
is defined,  we can apply it at any subsequent time during the dialogue. 
For example:  %
FFFIRST (EG) &
FFFIRST ('(((BIG . MAC) . CATSUP) . (FRENCH . FRIES))) &
%   Using the definition of FFFIRST as a model,  define a function named 
RFFIRST  which  extracts  the  REST of the FIRST of  the  FIRST  of  its 
argument, then test RFFIRST on EG:  %  RDS: FALSE $
%   Our solution is:  %

FUNCTION RFFIRST (FOO),
  REST (FIRST (FIRST (FOO))),
ENDFUN &
RFFIRST (EG) &
%    The  name FOO in the definition is called a PARAMETER,  whereas  EG 
where the function is applied is an example of an ARGUMENT.   We can use 
any name for a parameter -- even a name which has been bound to a  value 
or  even  the same name as an argument.   The name is merely used  as  a 
"dummy  variable"  to help indicate what to do to an argument  when  the 
function  is  subsequently  applied.   A function definition is  like  a 
recipe.  It is filed away, without actually being EXECUTED until applied 
to actual arguments.

    As another simple example,  since an atom is defined as being either 
a name or an integer, it is convenient to have a recognizer function for 
atoms,  so  that we do not have to test separately for names  and  atoms 
when  we  do not care which type of atom is involved.   We could  define 
this recognizer as follows:

FUNCTION ATOM (U),
  NAME (U)  OR  NUMBER (U)
ENDFUN &

    Actually,  ATOM  is  already  built-into  muSIMP,  but  the  example 
provides a good opportunity to introduce the built-in infix OR operator, 
which  returns FALSE if both of its operands are FALSE,  returning  TRUE 
otherwise.  Try out ATOM on the examples -5, X and EG %     RDS: FALSE $
%    Analogous  to  OR,  there is a built-in infix  AND  operator  which 
returns  FALSE  if either operand is FALSE,  returning  TRUE  otherwise.  
There  is also a built-in prefix NOT operator which returns TRUE if  its 
operand is FALSE,  returning FALSE otherwise.   Knowing this, see if you 
can define a recognizer named NODE,  which returns TRUE if its  argument 
is a node, returning FALSE otherwise:  %  RDS:  FALSE $
%   In programming there is rarely,  if ever,  one unique solution,  but 
ours is:  %

FUNCTION NODE (U),
  NOT ATOM (U)
ENDFUN &
NODE (EG) &
NODE (5) &
%    So  much for trivial exercises.   Now let's write a function  which 
counts the number of atoms in its argument.  We will count each instance 
of each atom, even if some atoms occur more than once.

    At first this may seem like a formidable task, because a tree can be 
arbitrarily branched.   How can we anticipate ahead of time all of these 
possibilities.   Well,  let's  procrastinate  by disposing of  the  most 
trivial  cases even though we can't yet see the whole solution:  If  the 
argument is an atom, then there is exactly 1 atom in it.

    So much for trivial cases.  We haven't yet solved the whole problem, 
but  it builds our self-confidence to make progress,  so that is a  good 
psychological reason for first disposing of the easy cases.   Also, with 
the easy cases out of the way,  we can turn our full intellectual powers 
on  the  harder cases,  unfettered by any distractions to trivial  loose 
ends.

    We are left with the case where we know we have a node.   Perhaps we 
could somehow subdivide the problem into smaller cases?

    Let's see ...  Nodes have a FIRST part and a REST part,  so  perhaps 
that provides the natural subdivision.  Hmmm ...

    If  we knew the number of atoms for the left part and the number for 
the  right part,  clearly the number for the whole aggregate  is  merely 
their sum.   But how can we find out the number of atoms in these parts?  
Why  not RECURSIVELY use the very function we are defining to  determine 
these two contributions!

    It  may sound like cheating to refer to the function we are defining 
from with the definition itself,  but remembering that the definition is 
not  actually APPLIED until sometime after its definition  is  complete, 
perhaps  it  will  work.    We  are  working  in  a  highly  interactive 
environment, so the quickest way to resolve questions about muSIMP is to 
try  it  and  see!   Here then is a formal  muSIMP  function  definition 
corresponding the the above informal English "algorithm":  %

FUNCTION #ATOMS (U),
  WHEN ATOM (U),  1  EXIT,
  #ATOMS (FIRST(U))  +  #ATOMS (REST(U))
ENDFUN &
%   Here we introduce 2 new concepts:  The BODY of a function definition 
can  consist  of  a  sequence of one or more  expressions  separated  by 
commas.  A CONDITIONAL-EXIT is an expression consisting of a sequence of 
one  or more expressions nested between the matching pair of words  WHEN 
and EXIT.  When a function definition is APPLIED, the expressions in its 
body are evaluated sequentially, until perhaps a conditional exit causes 
an  exit  from  the procedure or until the  delimiter  named  ENDFUN  is 
reached.   For  a conditional exit,  the first expression after the word 
WHEN is evaluated.   If the value is FALSE,  then evaluation proceeds to 
the  point  immediately  following the matching  delimiter  named  EXIT.  
Otherwise,   evaluation  proceeds  sequentially  through  the  remaining 
expressions in the conditional exit,  if any,  exactly as if the body of 
the  conditional  exit replaced that of the function.   The value  of  a 
conditional exit is that of the last expression evaluated  therein,  and 
the  value  returned  by  a  function is that  of  the  last  expression 
evaluated therein when the function is applied.

    Thus,  #ATOMS immediately returns the value 1 whenever the  argument 
is  an  atom,  and otherwise the function breaks the problems  into  two 
parts which are necessarily smaller, hence closer to being atoms.  Let's 
test it, starting with trivial cases first:  %

#ATOMS (FOO) &
#ATOMS (5) &
EG &
#ATOMS (EG) &
%    It looks promising,  but it is still perhaps mysterious how  muSIMP 
and  #ATOMS  keep track of all of these recursive function  invocations.  
Since the trace package is supposedly loaded,  to trace the execution of 
#ATOMS, we merely issue the command:  %

TRACE (#ATOMS) &
%    Now every time #ATOMS is entered,  it prints its name and  argument 
values,  whereas every time it is exited, it prints its name followed by 
an equal sign,  followed by the returned value.   Moreover, the trace is 
indented in a manner which allows corresponding entries and exits to  be 
visually associated.  Watch: %

#ATOMS (FOO) &
EG &
#ATOMS (EG) &
%   Try a few examples of your own,  until these new ideas begin to gel: 
%  RDS: FALSE $

UNTRACE (#ATOMS) &
#ATOMS (FOO) &
%    Here is a function which counts only the number of integers in  its 
argument:  %

FUNCTION #INTEGERS (U),
  WHEN INTEGER (U),  1  EXIT,
  WHEN NAME (U),  0  EXIT,
  #INTEGERS (FIRST(U))  +  #INTEGERS (REST(U))
ENDFUN $
EG &
#INTEGERS (EG) ;
%   Now, using it as a model, try writing a function named #NAMES, which 
returns   the  number  of  names  in  its  argument.    If  your   first 
syntactically accepted attempt fails any test, try using TRACE to reveal 
the reason why:  %  RDS:  FALSE $
%   Our solution is ...

    On second thought, we won't give you our solution.  Consequently, if 
you  were  lazy  and didn't try,  you had better try  now,  because  the 
examples will get steadily harder now.  %  RDS:  FALSE $ 
%   The HEIGHT of an atom is 1,  and the HEIGHT of a node is 1 more than 
the   maximum  of  the  two  heights  of  its  FIRST  and  REST   parts.  
Accordingly,  let's first write a function named MAX,  which returns the 
maximum of its two integer arguments.  There is a built-in infix integer 
comparator named ">", so here is a hint:

FUNCTION MAX (INT1, INT2),
  WHEN INT1 > INT2,  ...  EXIT,
  ...
ENDFUN $

Enter such a definition,  with appropriate substitutions for the missing 
portions,  then  test your function to make sure it works correctly:   %
RDS: FALSE $
%    Now,  with  the  help of our friend MAX,  see if you  can  write  a 
function named HEIGHT, which returns the height of its argument: %
RDS: FALSE $
%  Our solution is: %

FUNCTION HEIGHT (U),
  WHEN ATOM (U),  1  EXIT,
  1 + MAX (HEIGHT(FIRST(U)), HEIGHT(REST(U)))
ENDFUN $
%   This brings us to the end of the first programming-mode lessons.  It 
may  be  a good idea to review this lesson before proceeding  to  lesson 
PLES2.TRA.  %

ECHO: FALSE $
MOVD (#PRINT, PRINT) $  MOVD (#PRINTLINE, PRINTLINE) $
RDS () $
