Regular Expression Matching

BibTool Manual

Reference Manual

BibTool makes use of the GNU regular expression library, so this manual includes a short excursion into regular expressions. Several examples of their application can also be found in other sections of this manual.

A concise description of regular expressions is contained in the document "Regex", which is included in the BibTool distribution as the file regex-0.12/doc/regex.texi. In case of doubt, that documentation takes precedence. Another good source of information is [Fri97].

The remainder of this section gives a short description of regular expressions. It is meant as a reminder rather than a tutorial or explanation.

Ordinary characters

match only themselves or their upper- or lower-case counterparts. Any character not mentioned as special is an ordinary character. Letters and digits, among others, are ordinary characters.

E.g. the regular expression abc matches the string abc.
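As a rough illustration, the behaviour of ordinary characters can be tried out in Python. This is only a sketch: Python's re module is a stand-in here, not the GNU library that BibTool uses, and the flag re.IGNORECASE is an assumption made to imitate the case folding described above. The helper name matches is hypothetical.

```python
import re

# Assumption: re.IGNORECASE imitates the case folding described above.
# Python's re module is only a stand-in for the GNU regex library.
PATTERN = re.compile(r"abc", re.IGNORECASE)


def matches(text: str) -> bool:
    """Return True if PATTERN matches the whole of text."""
    return PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("abc", "ABC", "aBc", "abd"):
        print(candidate, matches(candidate))
```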

The period

(.) matches any single character.

E.g. the regular expression a.c matches the string abc but it does not match the string abbc.

The star

(*) is used to denote any number of repetitions of the preceding regular expression. If no regular expression precedes the star then it is an ordinary character.

E.g. the regular expression ab*c matches any string which starts with an a, followed by an arbitrary number of b, and ends with a c. Thus it matches ac and abbbc. But it does not match the string abcc as a whole, since the second c is left over.

The plus

(+) is used to denote any number of repetitions of the preceding regular expression, but at least one. Thus it is the same as the star operator except that the empty string does not match. If no regular expression precedes the plus then it is an ordinary character.

E.g. the regular expression ab+c matches any string which starts with an a, followed by one or more b, and ends with a c. Thus it matches abbbc. But it does not match the string ac.

The question mark

(?) is used to denote an optional regular expression. The preceding regular expression matches zero or one times. If no regular expression precedes the question mark then it is an ordinary character.

E.g. the regular expression ab?c matches any string which starts with an a, followed by at most one b, and ends with a c. Thus it matches ac and abc. But it does not match the string abbc.
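The examples given so far for the period, the star, the plus, and the question mark can be checked with a short script. The sketch below uses Python's re module as a stand-in for the GNU library, and it assumes that "matches" means the pattern covers the string from its first to its last character, which is why re.fullmatch is used. The names EXAMPLES, whole_match, and examples_hold are hypothetical.

```python
import re

# (pattern, strings expected to match, strings expected not to match),
# taken from the examples above.
EXAMPLES = [
    (r"a.c", ["abc"], ["abbc"]),
    (r"ab*c", ["ac", "abbbc"], ["abcc"]),
    (r"ab+c", ["abbbc"], ["ac"]),
    (r"ab?c", ["abc"], ["abbc"]),
]


def whole_match(pattern: str, text: str) -> bool:
    """Return True if pattern matches text from first to last character."""
    return re.fullmatch(pattern, text) is not None


def examples_hold() -> bool:
    """Check every positive and negative example listed in EXAMPLES."""
    for pattern, accepted, rejected in EXAMPLES:
        if not all(whole_match(pattern, s) for s in accepted):
            return False
        if any(whole_match(pattern, s) for s in rejected):
            return False
    return True


if __name__ == "__main__":
    print(examples_hold())
```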

The bar

(|) separates two regular expressions. The combined regular expression matches a string if one of the alternatives separated by the bar does.

E.g. the regular expression abc|def matches the string abc and the string def.

Parentheses

(()) can be used to group regular expressions. A group is enclosed in parentheses. It matches a string if the enclosed regular expression does.

E.g. the regular expression a(b|d)c matches the strings abc and adc.
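The interplay of the bar and parentheses can be sketched as follows, again with Python's re module standing in for the GNU library and with whole-string matching assumed. The helper whole_match is hypothetical. The last pattern shows why the parentheses matter: without them the bar separates everything to its left from everything to its right.

```python
import re


def whole_match(pattern: str, text: str) -> bool:
    """Return True if pattern matches text from first to last character."""
    return re.fullmatch(pattern, text) is not None


ALTERNATION = r"abc|def"   # either abc or def
GROUPED = r"a(b|d)c"       # a, then b or d, then c
UNGROUPED = r"ab|dc"       # either ab or dc, since the bar splits the whole pattern

if __name__ == "__main__":
    for pattern, text in [
        (ALTERNATION, "abc"),
        (ALTERNATION, "def"),
        (GROUPED, "abc"),
        (GROUPED, "adc"),
        (UNGROUPED, "abc"),
    ]:
        print(pattern, text, whole_match(pattern, text))
```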

The dollar

($) matches the empty string at the end of the string. It can be used to anchor a regular expression at the end. If the dollar is not at the end of the regular expression then it is an ordinary character.

E.g. the regular expression abc$ matches the string aaaabc but does not match the string abcdef.

The hat

(^) matches the empty string at the beginning of the string. It can be used to anchor a regular expression at the beginning. If the hat is not at the beginning of the regular expression then it is an ordinary character. There is one additional context in which the hat has a special meaning. This context is the list operator described below.

E.g. the regular expression ^abc matches the string abcccc but does not match the string aaaabc.
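Anchors only make a difference when the pattern is searched for anywhere in the string, so the following sketch uses re.search instead of a whole-string match. Python's re module is a stand-in for the GNU library and differs in details: for instance, it does not treat a misplaced dollar or hat as an ordinary character, and its dollar also matches just before a trailing newline. The helper found is hypothetical.

```python
import re


def found(pattern: str, text: str) -> bool:
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    # Without an anchor the pattern may match anywhere in the string.
    print(found(r"abc", "aaaabc"))
    # The dollar ties the match to the end of the string.
    print(found(r"abc$", "aaaabc"), found(r"abc$", "abcdef"))
    # The hat ties the match to the beginning of the string.
    print(found(r"^abc", "abcccc"), found(r"^abc", "aaaabc"))
```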

The brackets

([]) are used to denote a list of characters. If the first character of the list is the hat (^) then the list matches any character not contained in the list. Otherwise it matches any character contained in the list.

E.g. the regular expression [abc] matches the single-letter strings a, b, and c. It does not match d.

The regular expression [^abc] matches any single-character string other than a, b, or c.
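A character list and its negation can be compared side by side. This is a minimal sketch using Python's re module as a stand-in for the GNU library, with whole-string matching assumed. The helper whole_match is hypothetical.

```python
import re


def whole_match(pattern: str, text: str) -> bool:
    """Return True if pattern matches text from first to last character."""
    return re.fullmatch(pattern, text) is not None


LISTED = r"[abc]"       # exactly one of a, b, c
NOT_LISTED = r"[^abc]"  # exactly one character other than a, b, c

if __name__ == "__main__":
    for text in ("a", "b", "c", "d"):
        print(text, whole_match(LISTED, text), whole_match(NOT_LISTED, text))
```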

The backslash

(\) is used for several purposes. Primarily it can be used to quote any special character. Thus if a special character is preceded by the backslash then it is treated as if it were an ordinary character.

If the backslash is followed by a digit d then this construct matches the same text that was matched by the d-th group.

E.g. the regular expression (an)\1as matches the string ananas since the first group matches an.
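Quoting and back-references can be sketched as follows. Python's re module is used as a stand-in for the GNU library, and whole-string matching is assumed. The helper whole_match is hypothetical. Raw string literals (r"...") keep Python itself from interpreting the backslash before the regular expression engine sees it.

```python
import re


def whole_match(pattern: str, text: str) -> bool:
    """Return True if pattern matches text from first to last character."""
    return re.fullmatch(pattern, text) is not None


QUOTED_PERIOD = r"a\.c"     # the period is quoted, so it matches only a literal period
BACK_REFERENCE = r"(an)\1as"  # \1 repeats whatever the first group matched

if __name__ == "__main__":
    print(whole_match(QUOTED_PERIOD, "a.c"), whole_match(QUOTED_PERIOD, "abc"))
    result = re.fullmatch(BACK_REFERENCE, "ananas")
    # group(1) holds the text matched by the first parenthesised group.
    print(result is not None and result.group(1))
```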

If the backslash is followed by the character n then this is equivalent to entering a newline.

If the backslash is followed by the character t then this is equivalent to entering a single TAB character.
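The two escapes for newline and TAB can be tried in the same way. The sketch below uses Python's re module as a stand-in for the GNU library. The raw string literals ensure that the backslash sequences reach the regular expression engine unchanged, while the subject strings are ordinary literals containing a real newline or TAB. The helper found is hypothetical.

```python
import re


def found(pattern: str, text: str) -> bool:
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None


NEWLINE_PATTERN = r"a\nb"  # a, newline, b
TAB_PATTERN = r"a\tb"      # a, TAB, b

if __name__ == "__main__":
    print(found(NEWLINE_PATTERN, "a\nb"), found(NEWLINE_PATTERN, "a b"))
    print(found(TAB_PATTERN, "a\tb"), found(TAB_PATTERN, "a b"))
```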
