Pattern matching is one of the two basic operations of the Refal machine, so we must look into it in more detail. Let us first recall some basic definitions.
An expression which includes neither activation brackets nor free variables is referred to as an object expression. An expression which may include free variables yet does not include activation brackets is a pattern expression, or just a pattern. We write the operation of matching as : where is a pattern, and an object expression. Here, and in the following, capital `script' letters, as well as italicized letters, are used as metasymbols standing for Refal objects.
A matching operation either succeeds or fails.
: succeeds
if there is a substitution S for the free variables
in such that when it is performed, becomes identical to
. If it
succeeds, we say that
can be recognized as (a particular case
of) , or equivalently, that can be mapped on
. The free
variables in then take the values assigned by the substitution
. If matching fails, the values of the variables in are
undefined. There may be more than one substitution transforming
into
. To make the result of matching unique we use the
following convention, which means that the matching is performed
from left to right:
Left-to-right matching. If there is more than one way of
assigning values to free variables in the pattern to achieve
matching then the one is chosen in which the leftmost e-variable
takes the shortest value. If this does not resolve ambiguity
then the same selection is done on the next e-variable from the
left, etc.
With this convention, matching in Refal becomes a unique operation. It is only the result of matching that the user, strictly speaking, must anticipate when writing out Refal sentences. You can skip the rest of this section and still be able to program successfully in Refal. However, it helps to think about matching as a specific algorithm. Therefore, we shall define the exact algorithm for matching an object expression to a pattern , as it is implemented in Refal-5.
Entries of symbols, brackets, and variables will be referred
to as elements of expressions. Gaps between elements will be
called nodes. We define matching
: as a process of mapping,
or projecting, elements and nodes of the pattern onto the elements
and nodes of the object expression
. A graphic representation
of a successful mapping is given in Fig. 2.1. Here nodes are represented
by signs o.
Figure 2.1 Matching
:
as mapping on
. Here
is 'A'((2'B'))'B'
and
is 'A'(e1 t2)s3
.
The following obvious requirements must be observed at every stage of matching:
It is assumed that at the beginning of the matching the boundary nodes of are mapped on the boundary nodes of . The mapping process is described by the following six rules. At every stage of mapping the Rules 1-4 determine the element to be mapped next; thus each element of gets its unique mapping number.
In Fig. 2.1 the matching proceeds as follows. In the beginning there are two rigid elements with one end mapped: 'A' and s3 . In accordance with Rule 3, we map 'A' , and this element gets mapping number 1. Numbers 2 and 3 will be the left and the right parentheses, by Rules 3 and 1. Inside the parentheses we now move from right to left, because t2 is a rigid element which can be mapped while the value of e1 cannot yet be determined. At the next stage we find that e1 is a closed variable whose projection we need not examine in order to assign it a value; whatever is between the two nodes goes to it (in fact, the value of e1 happens to be empty). The mapping of s3 completes the matching. Putting mapping numbers over the elements of a pattern gives a good idea of the algorithm involved:
1 2 5 4 3 6 'A' ( e1 t2 ) s3
Consider another pattern, with mapping numbers over the elements:
1 6 7 8 2 9 10 11 4 5 3 ( e1 '+' e2 ) e3 '+' e4 ( e5 )
Here e1 and e3 are open variables while e5 , e2 , and e4 are closed. Parentheses in Refal are used to jump from one part of expression to another. The first four steps of mapping break our expression into three sub-expressions. The last one passes immediately to e5 . In the first sub-expression we have to find the first '+' . The initial value of e1 becomes empty. If the next (i.e., the first) symbol is '+' , the variable e2 takes the rest of the sub-expression (i.e., the whole of it), and we proceed to the mapping number 9, the variable e3 . If the first symbol is not '+' , the value of e1 is lengthened and becomes equal to that symbol; the second symbol is then compared with '+' , etc. When (and if) the '+' number 7 is found, we go on looking for the '+' number 10.
When a free variable V from takes a value W, we shall write this as the substitution WV. In contrast to the usual notation, the variable is on the right side. The reason: it belongs to the right side of the matching operation : , and its value is a sub-expression of the left side.
Let the pattern above be mapped on the object expression:
(Apples + Peaches + Plums) Cost $45 + 4% (Tax)(Note that we wrote this expression without quotes -- the way it would be written for the function Input . As mentioned before we can do this when we deal with object expressions. If it were part of a Refal program, we would have to quote all characters.) Using our notation of assignments, the result of the matching is:
Tax <- e5 Apples <- e1 Peaches + Plums <- e2 Cost $45 <- e3 4% <- e4
Consider the matching:
('METASYSTEM INDEX')'XYZ' : (e1 sX e2) e3 sX e4The mapping numbers are as follows:
1 3 4 5 2 6 7 8 ( e1 sX e2 ) e3 sX e4
After the two sub-expressions are separated, e1
becomes empty, and
sX
is projected onto 'M'
. Then e3
becomes empty,
and sX
number 7
must be mapped at the beginning of the second sub-expression.
However, this is impossible because sX
would then have the
value 'X'
, while it already has the value 'M'
. Thus a deadlock
situation is declared. The last open variable is e3
, so we lengthen
it, and it becomes 'X'
. But this does not help, and we keep
lengthening e3
until it becomes 'XYZ'
and cannot be lengthened
any more. Then we lengthen e1
, and the loop over 'XYZ'
repeats. Finally we come to a successful matching with:
'METAS' <- e1 'Y' <- sX 'STEM_INDEX' <- e2 'X' <- e3 'Z' <- e4
It should be noted that the order in which rigid elements of a pattern are mapped does not affect the ultimate result of matching (neither the succeed/fail status nor the values of variables); we have fixed a certain order (namely, from left to right) for a complete definition of the matching algorithm only. But the order in which open variables are mapped does make a difference. In the example above, if we chose to map e3 first:
1 6 7 8 2 3 4 5 ( e1 sX e2 ) e3 sX e4the result would be a completely different assignment of values to variables:
<- e3 'X' <- sX 'YZ' <- e4 'METASYSTEM_INDE' <- e1 <- e2
In the pattern:
e1'+'e2(('**')e1 t3)it may seem at first glance that e1 is an open variable. But it is not. One entry of e1 is closed, the other repeated:
9 10 11 2 3 5 6 4 8 7 1 e1 '+' e2 ( ( '*' '*' ) e1 t3 )Exercise 2.5 Put mapping numbers and indicate the type (closed, open, repeated) of each entry of an e-variable in the following patterns:
(a) eA (t2)(Sunday)(s.Day s.Night) (b) (e.Word) e1 ((e.Word) e.Translation) e2 (c) (e1'+'e2'+'e3)e1 '+' e2(e3)Exercise 2.6 Find the results of these matches:
(a) 'diffident' : eA t1 t1 eB (b) 'diffident' : e1 sX e2 sX e3 (c) '(Texas)' : (e.State) (d) A B C D : e1 e2 e3 D (e) A(A B)(C)((C))D : e1 eX eX e2
It should be noted that in the practice of Refal programming we seldom use very complicated patterns.