2.5 The Algorithm of Pattern Matching

Pattern matching is one of the two basic operations of the Refal machine, so we must look into it in more detail. Let us first recall some basic definitions.

An expression which includes neither activation brackets nor free variables is referred to as an object expression. An expression which may include free variables yet does not include activation brackets is a pattern expression, or just a pattern. We write the operation of matching as E : P where P is a pattern, and E an object expression. Here, and in the following, capital `script' letters, as well as italicized letters, are used as metasymbols standing for Refal objects.

A matching operation either succeeds or fails. E : P succeeds if there is a substitution S for the free variables in P such that when it is performed, P becomes identical to E . If it succeeds, we say that E can be recognized as (a particular case of) P, or equivalently, that P can be mapped on E . The free variables in P then take the values assigned by the substitution . If matching fails, the values of the variables in P are undefined. There may be more than one substitution transforming P into E . To make the result of matching unique we use the following convention, which means that the matching is performed from left to right:

Left-to-right matching. If there is more than one way of assigning values to free variables in the pattern to achieve matching then the one is chosen in which the leftmost e-variable takes the shortest value. If this does not resolve ambiguity then the same selection is done on the next e-variable from the left, etc.

With this convention, matching in Refal becomes a unique operation. It is only the result of matching that the user, strictly speaking, must anticipate when writing out Refal sentences. You can skip the rest of this section and still be able to program successfully in Refal. However, it helps to think about matching as a specific algorithm. Therefore, we shall define the exact algorithm for matching an object expression E to a pattern P, as it is implemented in Refal-5.

Entries of symbols, brackets, and variables will be referred to as elements of expressions. Gaps between elements will be called nodes. We define matching E :P as a process of mapping, or projecting, elements and nodes of the pattern P onto the elements and nodes of the object expression E . A graphic representation of a successful mapping is given in Fig. 2.1. Here nodes are represented by signs o.

(A + B) * (B - C)
Figure 2.1 Matching E : P as mapping P on E . Here E is 'A'((2'B'))'B' and P is 'A'(e1 t2)s3 .

The following obvious requirements must be observed at every stage of matching:


General requirements to mapping P on E (matching E : P)


  1. If a node N2 is positioned in P to the right of a node N1, then the projection of N2 in E can either coincide with, or be positioned to the right of, the projection of N1 (projection lines cannot cross).
  2. Projections of brackets and symbols must be identical to themselves.
  3. Projections of variables must meet the syntax requirements of their values; i.e., to be symbols, terms, or arbitrary expressions for s-, t-, and e-variables, respectively. Different entries of the same variable must have equal projections.


It is assumed that at the beginning of the matching the boundary nodes of P are mapped on the boundary nodes of E . The mapping process is described by the following six rules. At every stage of mapping the Rules 1-4 determine the element to be mapped next; thus each element of P gets its unique mapping number.


Rules of mapping


  1. After a bracket is mapped, its pairing bracket is mapped next.
  2. If, as a result of previous steps, both ends of an entry of an e-variable are already mapped, but this variable has no value yet (no other entry of it has been mapped), then this variable is mapped next. Such entries are called closed e-variables. Two closed e-variables can appear at the same time; in this case, the one on the left is mapped first.
  3. An entry of a variable which already has a value is a repeated entry. Brackets, symbols, s-variables, t-variables, and repeated entries of e-variables in P are rigid elements. If one of the ends of a rigid element is mapped, the projection of the other end is unique. If Rules 1 and 2 are not applicable, and there are some rigid elements with one end projected, the leftmost of these is chosen. If it is possible to map this element without contradicting the general requirements 1-3 above, then it is mapped, and the process goes on. Otherwise, a deadlock situation is stated.
  4. If Rules 1-3 are not applicable, and there are some e-variables with the left end mapped, then the leftmost one is chosen. It is called an open e-variable. Initially it gets an empty value, i.e., its right end is projected on the same node as the left one. Other values may be assigned to open variables through lengthening (see Rule 6).
  5. If all elements of P are mapped, the matching process is successfully completed.
  6. In a deadlock situation, the process comes back to the last open e-variable (i.e., the one with the maximal projection number) and its value is lengthened; i.e., the projection of its right end in E is moved one term to the right. Thereafter the process is resumed. If the variable cannot be lengthened (because of the General requirements 1-3), the preceding open variable is lengthened, etc. If there is no open variable to be lengthened, the matching process fails.

In Fig. 2.1 the matching proceeds as follows. In the beginning there are two rigid elements with one end mapped: 'A' and s3 . In accordance with Rule 3, we map 'A' , and this element gets mapping number 1. Numbers 2 and 3 will be the left and the right parentheses, by Rules 3 and 1. Inside the parentheses we now move from right to left, because t2 is a rigid element which can be mapped while the value of e1 cannot yet be determined. At the next stage we find that e1 is a closed variable whose projection we need not examine in order to assign it a value; whatever is between the two nodes goes to it (in fact, the value of e1 happens to be empty). The mapping of s3 completes the matching. Putting mapping numbers over the elements of a pattern gives a good idea of the algorithm involved:

  1   2   5   4   3   6
 'A'  (   e1  t2  )   s3

Consider another pattern, with mapping numbers over the elements:

  1   6   7   8   2   9   10  11  4   5   3
  (   e1 '+'  e2  )   e3  '+' e4  (   e5  )

Here e1 and e3 are open variables while e5 , e2 , and e4 are closed. Parentheses in Refal are used to jump from one part of expression to another. The first four steps of mapping break our expression into three sub-expressions. The last one passes immediately to e5 . In the first sub-expression we have to find the first '+' . The initial value of e1 becomes empty. If the next (i.e., the first) symbol is '+' , the variable e2 takes the rest of the sub-expression (i.e., the whole of it), and we proceed to the mapping number 9, the variable e3 . If the first symbol is not '+' , the value of e1 is lengthened and becomes equal to that symbol; the second symbol is then compared with '+' , etc. When (and if) the '+' number 7 is found, we go on looking for the '+' number 10.

When a free variable V from P takes a value W, we shall write this as the substitution W<-V. In contrast to the usual notation, the variable is on the right side. The reason: it belongs to the right side of the matching operation E : P, and its value is a sub-expression of the left side.

Let the pattern above be mapped on the object expression:

  (Apples + Peaches + Plums) Cost $45 + 4% (Tax) 
(Note that we wrote this expression without quotes -- the way it would be written for the function Input . As mentioned before we can do this when we deal with object expressions. If it were part of a Refal program, we would have to quote all characters.) Using our notation of assignments, the result of the matching is:
              Tax <- e5
           Apples <- e1
  Peaches + Plums <- e2
         Cost $45 <- e3
               4% <- e4

Consider the matching:

  ('METASYSTEM INDEX')'XYZ' : (e1 sX e2) e3 sX e4 
The mapping numbers are as follows:
  1    3    4    5    2    6    7    8              
  (    e1   sX   e2   )    e3   sX   e4  

After the two sub-expressions are separated, e1 becomes empty, and sX is projected onto 'M' . Then e3 becomes empty, and sX number 7 must be mapped at the beginning of the second sub-expression. However, this is impossible because sX would then have the value 'X' , while it already has the value 'M' . Thus a deadlock situation is declared. The last open variable is e3 , so we lengthen it, and it becomes 'X' . But this does not help, and we keep lengthening e3 until it becomes 'XYZ' and cannot be lengthened any more. Then we lengthen e1 , and the loop over 'XYZ' repeats. Finally we come to a successful matching with:

       'METAS' <- e1
           'Y' <- sX
  'STEM_INDEX' <- e2
           'X' <- e3
           'Z' <- e4

It should be noted that the order in which rigid elements of a pattern are mapped does not affect the ultimate result of matching (neither the succeed/fail status nor the values of variables); we have fixed a certain order (namely, from left to right) for a complete definition of the matching algorithm only. But the order in which open variables are mapped does make a difference. In the example above, if we chose to map e3 first:

   1    6    7    8    2    3    4    5              
   (    e1   sX   e2   )    e3   sX   e4  
the result would be a completely different assignment of values to variables:
                    <- e3
                'X' <- sX
               'YZ' <- e4
  'METASYSTEM_INDE' <- e1
                    <- e2

In the pattern:

  e1'+'e2(('**')e1 t3)
it may seem at first glance that e1 is an open variable. But it is not. One entry of e1 is closed, the other repeated:
  9   10  11  2   3   5   6   4   8   7   1
  e1  '+' e2  (   (  '*' '*'  )   e1  t3  ) 
Exercise 2.5 Put mapping numbers and indicate the type (closed, open, repeated) of each entry of an e-variable in the following patterns:
(a) eA (t2)(Sunday)(s.Day s.Night)
(b) (e.Word) e1 ((e.Word) e.Translation) e2
(c) (e1'+'e2'+'e3)e1 '+' e2(e3)
Exercise 2.6 Find the results of these matches:
(a) 'diffident' : eA t1 t1 eB
(b) 'diffident' : e1 sX e2 sX e3
(c) '(Texas)' : (e.State)
(d) A B C D : e1 e2 e3 D
(e) A(A B)(C)((C))D : e1 eX eX e2

It should be noted that in the practice of Refal programming we seldom use very complicated patterns.