Parsing and Pretty Printing

BibTool Manual

Reference Manual

The first and simplest task BibTool performs on BibTeX files is parsing and pretty printing. This is not superfluous, since BibTeX is rather pedantic about the syntax it accepts. Thus I decided to be generous and to correct as many errors as I can.

Each input file is parsed and stored in an internal representation. BibTeX simply ignores any characters between entries. BibTool stores comments and attaches them to the entry immediately following them. Normally, anything between entries is simply discarded and a warning is printed. The boolean resource pass.comments can be used to change this behavior.

pass.comments = on

If this resource is on then the characters between entries are directly passed to the output file. This transfer starts with the first non-space character after the end of an entry.
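To make the two behaviors concrete, here is a minimal Python sketch. It is not BibTool's actual code. The functions `split_entries` and `emit` are hypothetical. The sketch assumes that every entry starts with `@` and is delimited by balanced braces, and that no `@` occurs in the text between entries. Each entry is paired with the text preceding it, which mirrors how comments are attached to the following entry.

```python
import sys


def split_entries(text: str) -> list[tuple[str, str]]:
    """Pair each entry with the text that precedes it.

    Simplifying assumptions (not BibTool's parser): every entry starts
    with '@', its body is delimited by balanced braces, and no '@' occurs
    between entries. The last pair holds the trailing text and an empty entry.
    """
    pairs, pos = [], 0
    while (start := text.find("@", pos)) >= 0:
        i = text.index("{", start)  # raises ValueError if no brace follows
        depth = 0
        while True:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    break
            i += 1
        pairs.append((text[pos:start], text[start:i + 1]))
        pos = i + 1
    pairs.append((text[pos:], ""))
    return pairs


def emit(text: str, pass_comments: bool = False) -> str:
    """Re-emit the entries and either pass or discard the text between them."""
    out = []
    for gap, entry in split_entries(text):
        gap = gap.lstrip()  # the transfer starts at the first non-space character
        if gap:
            if pass_comments:
                out.append(gap)
            else:
                print(f"warning: discarding {gap!r}", file=sys.stderr)
        if entry:
            out.append(entry + "\n")
    return "".join(out)
```

With `pass_comments=True` the text between entries survives with its leading whitespace removed. With the default, it is dropped and a warning goes to standard error.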

The standard BibTeX styles support a limited number of entry types. Those are predefined in BibTool. Additional entry types can be defined using the resource new.entry.type as in

new.entry.type = {Anthology}

This option can also be used to redefine the appearance of entry types which are already defined. Suppose we have defined Anthology as above. Afterwards we can redefine this entry type to be printed in upper case with the following option:

new.entry.type = {ANTHOLOGY}

Each undefined entry type leads to an error message.
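These rules can be pictured as a case-insensitive table that maps entry types to their print names. The following Python sketch assumes exactly that. The class `EntryTypes` is hypothetical, and the capitalization of the predefined names is illustrative rather than taken from BibTool.

```python
# The standard BibTeX entry types; the capitalization is illustrative only.
STANDARD_TYPES = [
    "Article", "Book", "Booklet", "Conference", "InBook", "InCollection",
    "InProceedings", "Manual", "MastersThesis", "Misc", "PhDThesis",
    "Proceedings", "TechReport", "Unpublished",
]


class EntryTypes:
    """Hypothetical registry mapping entry types to their print names."""

    def __init__(self) -> None:
        # Keys are folded to lower case; the value is the name used for printing.
        self._print_name = {t.lower(): t for t in STANDARD_TYPES}

    def define(self, name: str) -> None:
        """Model of new.entry.type: add a type or change how it is printed."""
        self._print_name[name.lower()] = name

    def print_name(self, name: str) -> str:
        """Return the print name, or fail for an undefined entry type."""
        try:
            return self._print_name[name.lower()]
        except KeyError:
            raise ValueError(f"undefined entry type: {name}") from None
```

Defining `Anthology` and then `ANTHOLOGY` leaves a single table entry whose print name is the later spelling.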

When a database is printed, entries of the same kind are printed together; e.g. all normal entries are printed en bloc. The order of the entry types is determined by the resource print.entry.types. Its value is a string in which each character represents a kind of entry to be printed. If a letter is missing, that part of the database is omitted. The following letters are recognized; uppercase letters are folded to their lowercase counterparts unless they are listed explicitly:

a: The aliases of the database.
c: The comments of the database which are not attached to an entry.
i: The includes of the database.
m: The modifies of the database.
n: The normal entries of the database.
p: The preambles of the database.
$: The strings (macros) of the database.
S: The strings of the database which are used in other entries.
s: The strings of the database; the resource print.all.strings determines whether all strings or only the used ones are printed.

The following setting prints only the preambles and the normal entries. This can be desirable if the macros are printed into a separate file.

print.entry.types = {pn}
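The following Python sketch shows one way a print.entry.types value could be interpreted. The dictionary `PARTS` and the function `print_order` are hypothetical. Silently ignoring characters that are not in the list above is an assumption of the sketch.

```python
# Letters from the list above and the part of the database each one selects.
PARTS = {
    "a": "aliases",
    "c": "comments",
    "i": "includes",
    "m": "modifies",
    "n": "normal entries",
    "p": "preambles",
    "$": "strings",
    "S": "used strings",
    "s": "strings (subject to print.all.strings)",
}


def print_order(spec: str) -> list[str]:
    """Translate a print.entry.types value into the sequence of parts printed."""
    order = []
    for ch in spec:
        if ch not in PARTS:
            ch = ch.lower()  # fold uppercase letters that are not listed explicitly
        if ch in PARTS:  # assumption: any other character is ignored
            order.append(PARTS[ch])
    return order
```

With the value used above, `print_order("pn")` yields the preambles followed by the normal entries. `S` keeps its own meaning because it is listed explicitly.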

The internal representation is printed in a format which can be adjusted by certain options. Those options are available through resource files or by specifying resources on the command line.

print.line.length: This numeric resource specifies the desired width of the lines. Lines that turn out to be longer are split at spaces where possible and continued on the next line. The value defaults to 77.
print.indent: This numeric resource specifies the indentation of normal items, i.e. items in entries which are not strings or comments. The value defaults to 2.
print.align: This numeric resource specifies the column at which the '=' in non-comment and non-string entries is aligned. This value defaults to 18.
print.align.key: This numeric resource specifies the column at which the key of non-comment and non-string entries is aligned. This value defaults to 18.
print.align.string: This numeric resource specifies the column at which the '=' in string entries is aligned. This value defaults to 18.
print.align.preamble: This numeric resource specifies the column at which preamble entries are aligned. This value defaults to 11.
print.align.comment: This numeric resource specifies the column at which comment entries are aligned.

The resource values described above are illustrated by the following examples. First we look at a string entry.

@STRING{macro   = "This is a rather long replacement text which exceeds one
                  line"}
                  |                                                        |
                  | print.align.string                   print.line.length |

Next we look at an unpublished entry. It has a rather long list of authors and a long title. It shows how the lines are broken.

                  | print.align.key
                  | 
@Unpublished{     unpublished-key,
  author        = "First A. U. Thor and Seco N. D. Author and Third A. Uthor
                  and others",
  title         = "This is a rather long title of an unpublished entry which
                  exceeds one line",
  note          = "Some useless comment"
}
  |               |                                                        |
  | print.indent  | print.align                          print.line.length |
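The layout in these examples can be reproduced with a simple greedy line breaker. The following Python sketch is an approximation, not BibTool's algorithm, and it rests on these assumptions:

- The function `layout` is hypothetical, and columns are counted from 0.
- The caller passes the prefix already indented (print.indent).
- `align` plays the role of print.align or print.align.string, and `line_length` the role of print.line.length.
- A line may reach exactly `line_length` characters.

With these choices it reproduces the example lines above. print.align.key is not modeled.

```python
def layout(prefix: str, value: str, suffix: str = "",
           align: int = 18, line_length: int = 77) -> list[str]:
    """Lay out prefix = "value"suffix roughly as in the examples above.

    Hypothetical model: the prefix is padded so that "= " ends just before
    column `align`, where the quoted value starts. The value is broken
    greedily at spaces; continuation lines are indented to `align`.
    """
    head = prefix.ljust(align - 2) + "= "
    words = ('"' + value + '"' + suffix).split()
    lines, current = [], head + words[0]
    for word in words[1:]:
        # Assumption: a line may be exactly line_length characters long.
        if len(current) + 1 + len(word) <= line_length:
            current += " " + word
        else:
            lines.append(current)
            current = " " * align + word
    lines.append(current)
    return lines
```

For example, `layout("  author", ..., suffix=",")` produces the two author lines of the unpublished entry, and `layout("@STRING{macro", ..., suffix="}")` produces the string entry.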

When macros are expanded, the delimiters of field values are normalized, i.e. only one style is used. The default is to use braces; the alternative is to use double quotes. This behavior is controlled by the boolean resource print.braces. If this resource is on, braces are used; otherwise double quotes are used. It can be changed as in

print.braces = OFF

Braces are the recommended delimiters for the whole entry. For compatibility with Scribe, parentheses may also be used as entry delimiters. This is achieved with the boolean resource print.parentheses, which is initially off. It can be set as in the following instruction:

print.parentheses = ON
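The two resources act at different levels: print.braces chooses the delimiters of field values, and print.parentheses chooses those of the whole entry. The Python sketch below illustrates this with hypothetical functions. When a value contains a double quote outside braces, the sketch falls back to braces. That fallback is the sketch's own assumption, motivated by the BibTeX rule that such an unprotected quote would end a quote-delimited value.

```python
def has_top_level_quote(text: str) -> bool:
    """True if text contains a '"' that is not enclosed in braces."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == '"' and depth == 0:
            return True
    return False


def delimit_value(text: str, braces: bool = True) -> str:
    """Hypothetical model of print.braces: choose the delimiters of a field value."""
    if braces or has_top_level_quote(text):
        # Assumption: fall back to braces, since a bare '"' would end a quoted value.
        return "{" + text + "}"
    return '"' + text + '"'


def delimit_entry(entry_type: str, body: str, parentheses: bool = False) -> str:
    """Hypothetical model of print.parentheses: choose the delimiters of a whole entry."""
    open_, close = ("(", ")") if parentheses else ("{", "}")
    return f"@{entry_type}{open_}{body}{close}"
```

A quote protected by braces, as in `{"}`, does not force the fallback.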

The field names of an entry are usually printed in lower case. This can be changed with the resource new.field.type. Its argument is an equation with the name of a field to the left of the '=' sign and its print name to the right. Both should contain only allowed characters.

new.field.type { author = AUTHOR }

This feature can be used to rewrite the field types. Thus it is completely legal to have a replacement text that differs from the original field name:

new.field.type { OPTauthor = Author }
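Here is a minimal Python model of new.field.type. It assumes that the field name on the left of the equation is matched case-insensitively. The class `FieldNames` is hypothetical.

```python
class FieldNames:
    """Hypothetical model of new.field.type: maps field names to print names."""

    def __init__(self) -> None:
        self._print_name: dict[str, str] = {}

    def define(self, name: str, print_name: str) -> None:
        # Assumption: the left-hand side is matched case-insensitively.
        self._print_name[name.lower()] = print_name

    def print_name(self, name: str) -> str:
        # Without a definition, field names are printed in lower case.
        return self._print_name.get(name.lower(), name.lower())
```

The second definition above maps `OPTauthor` to `Author`, so the printed field name no longer matches the one read.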

BibTeX treats string names case-insensitively. BibTool normalizes string names before printing. By default they are translated to lower case. Two other translations are currently supported: translation to upper case and translation to capitalized case, i.e. the first letter in upper case and the others in lower case.

The translation is controlled by the resource symbol.type. The value is one of the strings lower, upper, and cased. The resource can be set as in

symbol.type = upper

Macro names are passed through the same normalization apparatus as field types. Thus you can force a rewriting of macro names with the method described above. Be careful when choosing macro names that are also used as field types.
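The following Python sketch shows the three symbol.type settings together with an explicit rewriting step for macro names. The functions are hypothetical. Letting an explicit rewrite take precedence over the symbol.type rule is an assumption of the sketch.

```python
def normalize_symbol(name: str, symbol_type: str = "lower") -> str:
    """Hypothetical model of symbol.type: normalize a macro (string) name."""
    if symbol_type == "lower":
        return name.lower()
    if symbol_type == "upper":
        return name.upper()
    if symbol_type == "cased":
        return name.capitalize()  # first letter upper case, the rest lower case
    raise ValueError(f"unknown symbol.type: {symbol_type}")


def print_symbol(name: str, rewrites: dict[str, str], symbol_type: str = "lower") -> str:
    """Apply an explicit rewrite if one exists (assumed to win), else symbol.type."""
    return rewrites.get(name.lower(), normalize_symbol(name, symbol_type))
```

Note that `cased` lowers everything after the first letter, so a name such as `IEEEtrans` becomes `Ieeetrans`.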

The reference key is usually translated to lower case unless a new key is generated (see section Key Generation). In that case the chosen format determines the case of the key. Sometimes it can be desirable to preserve the case of the key as given (even though BibTeX does not mind). This can be achieved with the boolean resource preserve.key.case. By default it is turned off (for backward compatibility and because of the memory this feature uses). You can turn it on as in

preserve.key.case = on

If it is turned on, the keys are recorded as they are read and used when the entries are printed. Internal comparisons are always performed case-insensitively; this is not influenced by the resource preserve.key.case. In particular, this holds for sorting, which does not distinguish case.
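The interplay between printing and comparison can be sketched in a few lines of Python. The functions are hypothetical and ignore generated keys. They only show that preserve.key.case affects the printed form, while comparisons and sorting always ignore case.

```python
def print_key(key: str, preserve_key_case: bool = False) -> str:
    """Key as printed: lower case unless preserve.key.case is on."""
    return key if preserve_key_case else key.lower()


def same_key(a: str, b: str) -> bool:
    """Comparisons ignore case, regardless of preserve.key.case."""
    return a.lower() == b.lower()


def sorted_keys(keys: list[str], preserve_key_case: bool = False) -> list[str]:
    """Sort case-insensitively, then render each key for printing."""
    return [print_key(k, preserve_key_case) for k in sorted(keys, key=str.lower)]
```

The sort order is the same with and without preserve.key.case; only the spelling of the printed keys changes.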


¹ This is mainly obsolete now since comments do not have to follow any syntactic restriction.
