BibTool Manual

Normalization

BibTool can be used to normalize the appearance of BibTeX databases. As an example, consider the different forms of delimiters for fields. BibTeX allows either braces or double quotes, and it may be desirable to use only one style. The rewriting facility of BibTool can be applied for this purpose. The following call converts fields delimited by double quotes into fields delimited by braces:

bibtool -- 'rewrite.rule={"^\"\([^#]*\)\"$" "{\1}"}' -o out.bib

Since this looks rather cryptic, we will take a closer look at the example. First, note that the outer single quotes are there because the UN*X shell (csh) treats some characters specially, and we want to keep the shell from interpreting them inside the rewrite rule. A similar quoting mechanism may be required for other command-line interpreters.

The rewrite rule is applied to every field. The first string, called the pattern, is enclosed in double quotes and is matched against the contents of the field. If a match is found, the matching substring is replaced by the replacement text given in the second string.
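To make the mechanism concrete, here is a minimal Python sketch that emulates applying this one rewrite rule to every field of an entry. It is not BibTool's implementation: it assumes an entry held as a dict from field names to raw values that still carry their delimiters, the field values are made up, and Python's `re` module stands in for BibTool's Emacs-style matcher. In Python syntax the group `\(...\)` is written `(...)` and the escaped quotes `\"` become plain `"`.

```python
import re

# The rule from the example, translated to Python regex syntax (assumption:
# Emacs \( \) groups become ( ), and \" becomes a plain double quote).
PATTERN = re.compile(r'^"([^#]*)"$')
REPLACEMENT = r"{\1}"


def rewrite_fields(fields: dict[str, str]) -> dict[str, str]:
    """Apply the pattern/replacement pair to every field value.

    Values that do not match the pattern are returned unchanged.
    """
    return {name: PATTERN.sub(REPLACEMENT, value) for name, value in fields.items()}


# Hypothetical entry; each value still includes its BibTeX delimiters.
entry = {
    "title": '"A Short Note"',
    "year": "1999",
    "author": '"A. U. Thor" # " and " # "S. O. Meone"',
}

for name, value in rewrite_fields(entry).items():
    print(f"{name} = {value}")
```

Only `title` is rewritten: `year` has no quotes, and the `#` signs in `author` keep the pattern from matching.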

The pattern is a regular expression like those used in Emacs. Its first character is the hat (^), which anchors the match at the beginning of the field value. Its last character is the dollar sign ($), which anchors the match at the end of the field value. Thus only matches that cover the complete field value are considered.
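The effect of the two anchors can be checked with a small Python sketch. Python's `re` module is used as a stand-in for BibTool's matcher, and the brace-delimited field value is made up. Without `^` and `$`, the pattern also matches a quoted phrase inside the value and silently strips its quotes; with the anchors, only a value that is quoted as a whole is rewritten.

```python
import re

# Python spellings of the pattern with and without the anchors.
ANCHORED = re.compile(r'^"([^#]*)"$')
UNANCHORED = re.compile(r'"([^#]*)"')

# Hypothetical brace-delimited value that merely contains a quoted phrase.
value = '{He said "hi" there}'

# Anchored: the value does not start and end with a quote, so nothing changes.
print(ANCHORED.sub(r"{\1}", value))    # {He said "hi" there}
# Unanchored: the inner "hi" matches and loses its quotes.
print(UNANCHORED.sub(r"{\1}", value))  # {He said {hi} there}
```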

Since we want to find fields whose values are enclosed in double quotes, the quotes are placed directly after the hat and directly before the dollar. To prevent them from being misinterpreted as the end of the pattern string, they have to be escaped with a backslash (\).

Next come the parentheses \(...\). They instruct BibTool to store the substring matched by the enclosed part of the pattern in a register. Since this is the first such group, register number 1 is used.
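In Python's `re` module, which this sketch uses in place of BibTool's matcher, the group `\(...\)` is written `(...)`, and register 1 corresponds to `match.group(1)`. The field value is made up for illustration.

```python
import re

# Python spelling of ^\"\([^#]*\)\"$ : the group ( ) plays the role of \( \).
pattern = re.compile(r'^"([^#]*)"$')

match = pattern.match('"A Short Note"')
if match:
    # Register 1 = group 1: the value without its delimiting quotes.
    print(match.group(1))  # A Short Note
```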

Now we have to specify the contents of the string between the quotes. For this purpose we use a character class, written as [...]. Since the first character in this class specification is a hat, the class consists of all characters except those given after the hat. Thus every character except the hash sign (#) is allowed.

The star (*) after the character class indicates that any number of characters from this class, including none, may occur.
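The behavior of `[^#]*` on its own can be checked with a short Python sketch, again with Python's `re` standing in for BibTool's matcher. `re.fullmatch` is used so that the whole string must consist of characters from the class; the sample strings are made up. The empty string also matches, because the star allows zero repetitions.

```python
import re

# Any number (including zero) of characters other than '#'.
no_hash = re.compile(r"[^#]*")

samples = ["A. U. Thor", "", 'A. U. Thor" # " and']
for s in samples:
    # fullmatch succeeds only if every character belongs to the class.
    print(repr(s), bool(no_hash.fullmatch(s)))
```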

We use this more complicated construction with a character class to avoid the wrong result that would arise when the rewrite rule is applied to a concatenated field value such as the following:

  author = "A. U. Thor" # " and " # "S. O. Meone"

Such fields are left unchanged by the rewrite rule given above, since the hash signs prevent a match. We could have used the point (.) instead of the character class, since the point matches any character. But this would have led to the syntactically wrong result:

  author = {A. U. Thor" # " and " # "S. O. Meone}
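The difference between the two patterns can be reproduced with a Python sketch that applies both variants to the concatenated value from the example. Python's `re` module stands in for BibTool's matcher here.

```python
import re

value = '"A. U. Thor" # " and " # "S. O. Meone"'

# Rule with the character class: the '#' signs prevent a match.
safe = re.sub(r'^"([^#]*)"$', r"{\1}", value)
# Rule with the point: '.*' runs from the first quote to the last one.
unsafe = re.sub(r'^"(.*)"$', r"{\1}", value)

print(safe)    # "A. U. Thor" # " and " # "S. O. Meone"
print(unsafe)  # {A. U. Thor" # " and " # "S. O. Meone}
```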

This completes the discussion of the pattern. The remaining part of the rewrite rule is the replacement text. Here we only have to note that the substring \1 is not copied verbatim but replaced by the contents of the first register. This register contains the field value without the delimiting double quotes, so the replacement text {\1} encloses it in braces instead.
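Python's `Match.expand` performs the same kind of template substitution, so it can serve as a stand-in for how the replacement text is filled in. This is a sketch with a made-up field value, not BibTool's own code. Note that in Python the template must be a raw string (or written as `"{\\1}"`), because a plain `"\1"` denotes the control character with code 1.

```python
import re

match = re.match(r'^"([^#]*)"$', '"A Short Note"')
if match:
    # \1 in the template is replaced by group 1; the braces are copied as-is.
    print(match.expand(r"{\1}"))  # {A Short Note}
```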

Thus we have a solution to our initial problem that is conservative: it sometimes fails to convert a field, for instance a concatenated value, but it never produces a wrong result.

© 1999 Gerd Neugebauer