BibTool Manual
Reference Manual
Field Manipulation
Field Rewriting

Field Rewriting

Field modifications can be used to optimize or normalize the appearance of a BibTeX database. The powerful facility of regular expression matching is used for this purpose, as we have already seen in section Normalization.

The resource rewrite.rule can be used to specify rewrite rules. The general form is as follows:

rewrite.rule { field1 ... fieldn # pattern # replacement_text}

field1 ... fieldn is a list of field names. The rewrite rule is only applied to those fields which have one of those names. If no field name is given then the rewrite rule is applied to all fields.

rewrite.rule { pattern # replacement_text}

Next there is the separator '#'. This separator is optional. It can also be the equality sign '='.
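
As a sketch of what the optional separator means in practice, the following three rules should be equivalent spellings of one and the same rule according to the description above. The field name note and the strings draft and preliminary are made up for illustration, and the example assumes the usual resource file convention that a percent sign starts a comment.

```
% separator written as '#'
rewrite.rule { note # "draft" # "preliminary" }
% separator written as '='
rewrite.rule { note = "draft" = "preliminary" }
% separator omitted
rewrite.rule { note "draft" "preliminary" }
```

Only one of the three lines is needed in a real resource file. Writing the separator explicitly tends to make longer rules easier to read.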

pattern is a regular expression enclosed in double quotes ("). This pattern is matched against substrings of the field value, including the delimiters. If a match is found then the matching substring is replaced by the replacement text, or the whole field is deleted if no replacement text is given.

replacement_text is the string to be inserted for the matching substring of the field value. The backslash '\' is used as escape character. '\n' is replaced by the nth matching group of pattern. n is a single digit (1-9). Otherwise the character following the backslash is inserted (see footnote 1). Thus it is possible to have double quotes inside the replacement text.
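
A minimal sketch of reusing matching groups follows. It assumes that groups are written as \( and \) in the pattern (the Emacs-style syntax of the GNU regex library, which is not stated in this section), and the pages field is chosen only for illustration. The rule is meant to turn a page range written with a single hyphen into one written with a double hyphen:

```
rewrite.rule { pages # "\([0-9]+\)-\([0-9]+\)" # "\1--\2" }
```

Here \1 stands for the first number and \2 for the second. Under these assumptions a value such as {12-34} would become {12--34}, and the result no longer matches the pattern, so the rule is not applied a second time.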

Other specials are

If no replacement text is given then the whole field is deleted. In fact the instruction delete.field is only an alias for a corresponding rewrite rule with an empty replacement text. This behaviour is illustrated in the following abstract examples:

rewrite.rule {field # pattern }

rewrite.rule {pattern}

More concretely, the rewrite rule

rewrite.rule { time # "^{}$" }

deletes the time field if the value of the field is empty and enclosed in curly braces. This is checked with the anchored regular expression ^{}$. The hat ^ matches the beginning of the value and the dollar $ matches its end. Since nothing lies in between except the field delimiters, the rule is applied only to time fields with empty contents.

This can be generalized to the following rewrite rule which deletes all empty fields using the same mechanism and just omitting the specification of a field name:

rewrite.rule { "^{}$" }

Note that for a similar kind of rule for double quotes as field delimiters you need to quote these characters with backslashes:

rewrite.rule { "^\"\"$" }

The replacement text may contain field formatting instructions as described in section Formatting Fields. These field formatting instructions are replaced by their respective values. Thus we can again exploit the time stamp example from above. The following rewrite rule updates an existing time stamp without adding one if none is present:

rewrite.rule { time ".*" = "%3s($mon) %s($day), %2d($year)"}

The pattern .* matches any sequence of arbitrary characters, so the whole old contents of the field is a match. In this example the value is not reused in the replacement text. Thus the old contents is completely replaced by the new one.

Usually the matching is done case-insensitively. This means that any upper case letter matches its lower case counterpart and vice versa. This behavior is controlled by the boolean resource rewrite.case.sensitive which is on by default. Changing this resource influences only rewrite rules specified later.

rewrite.case.sensitive = off
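
Because the setting affects only rules specified after it, the order of lines in a resource file matters. The following sketch illustrates this with two made-up rules on a note field. The field name and the strings are assumptions for illustration, as is the use of a percent sign to start a comment.

```
rewrite.case.sensitive = off
% specified while the resource is off: letter case is ignored for this rule
rewrite.rule { note # "in press" # "forthcoming" }

rewrite.case.sensitive = on
% specified while the resource is on: only the exact spelling TR matches
rewrite.rule { note # "TR" # "Technical Report" }
```

Switching the resource back on before the second rule leaves the first rule unaffected, since that rule was already specified under the earlier setting.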

A problem occurs e.g. when a string is replaced by a string containing the original one. To avoid infinite recursion in such cases the numeric resource rewrite.limit controls the number of applications of each rewrite rule. If the number given in rewrite.limit is not negative and this limit is exceeded then a warning is printed and further applications of this rule are stopped. A negative value of the resource rewrite.limit indicates that no limitation should be used.
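
The situation can be sketched with a rule whose replacement text contains the string it replaces. The publisher field, the strings, and the limit of 5 are arbitrary choices for illustration, and the percent sign is assumed to start a comment.

```
% allow at most 5 applications of each rewrite rule
rewrite.limit = 5
% the replacement contains "Springer" again, so the rule matches its own output
rewrite.rule { publisher # "Springer" # "Springer-Verlag" }
```

With a non-negative limit, the description above implies that a warning is printed and the rule is abandoned once the limit is exceeded. A pattern that cannot match its own replacement avoids the repeated application in the first place.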

Next we will investigate some concrete examples. Note that in these examples the character '␣' denotes a single space. It is used to highlight places where spaces have to be used which would be hard to recognize otherwise.


1 Future releases may use backslash followed by letters for special purposes. It is not safe to rely on escaping letters.



© 1999 Gerd Neugebauer