NAME
ast - assembler for transputers
SYNOPSIS
ast [options] [FILE]
DESCRIPTION
ast produces a LIT format object file from the transputer
assembly language program contained in FILE. If FILE is
omitted, ast reads the program from stdin.
OPTIONS
--help show usage summary
-o FILE, --output FILE
output to FILE instead of a.out.
-p N, --maxpasses N
limit the number of offset optimization passes to
N. ast runs the offset optimizer repeatedly over the
internal representation of the program until further
optimization proves impossible or this limit is
reached. The default limit is 8; increasing it may
produce slightly tighter code.
-E, --expr-print-long
affects the verbosity of debugging dumps requested
with --trace or --dump. Giving this option once
causes elements and noname symbols referred to by
ELEMENT and NAME expressions to be prefixed with
their memory addresses. Giving this option twice
results in every expression being prefixed with its
memory address.
-t SPEC, --trace SPEC
enable debugging output to stdout. SPEC is a
string of the form KEY[+|-KEY][+|-KEY]... where
+KEY turns on a given tracing facility and -KEY
turns it off. KEY may be any of the following:
all, lex, point, opt, opt2, reduce, misc, expr.
all turns on all trace facilities; for a
description of the other facilities refer to the
ast source.
-d SPEC, --dump SPEC
enable debugging dumps of the internal representation
of the program being assembled at certain points.
SPEC is parsed in the same way as for --trace;
valid dump keys are all, parse, opt, canon, unroll,
merge, needs.
INFORMAL DESCRIPTION OF THE INPUT LANGUAGE
The language that ast takes as input is mostly what you
would expect a transputer assembly language to be. You
might want to take a look at one of the assembly language
files in the TTOOLS distribution, or examine the output
of ``t800-gcc -O -S foo.c'' to get a feel for it.
NAMES
Names, or symbols, are an essential part of any assembly
language. They serve to represent values that may not yet
be defined. For example, you need a name to code a forward
reference:
j mylabel
cmdfoo
cmdbar
mylabel:
cmdfoobar
In this example, the name mylabel represents the address
of the command cmdfoobar in the transputer's memory.
DIGITAL LABELS
In addition to names, ast supports digital labels, a
feature commonly found in UNIX assemblers. Digital labels
are often handy when you need a short-range label which
you would hate to invent a name for. A digital label is a
single digit (0..9) followed by a colon. References to
digital labels are written as a digit followed by the
letter 'f' or 'b', which denote forward and backward
references, respectively.
For example, here is a loop which cycles until cmdfoo
leaves zero in Areg:
1:
cmdfoo
cj 1f
cmdbar
j 1b
1:
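To make the resolution rule concrete, here is a sketch in C. It assumes that every statement has an index in source order and that the digital label definitions have been collected into an array sorted by that index. struct dlabel and resolve_dlabel are illustrative names and are not part of ast. A 'b' reference picks the nearest earlier definition of the same digit, and an 'f' reference picks the nearest later one.

```c
#include <assert.h>
#include <stddef.h>

/* A digital label definition such as "1:" (hypothetical representation;
   ast's own data structures may differ). */
struct dlabel {
    int digit; /* 0..9 */
    int pos;   /* statement index where the label is defined */
};

/* Resolve a reference like "1b" or "1f" that appears at statement index
   ref_pos.  defs[] must be sorted by pos.  Returns the index into defs[]
   of the referenced definition, or -1 if there is none. */
int resolve_dlabel(const struct dlabel *defs, size_t ndefs,
                   int digit, char dir, int ref_pos)
{
    assert(dir == 'b' || dir == 'f');
    if (dir == 'b') {
        /* backward: the nearest definition before the reference */
        for (size_t i = ndefs; i-- > 0; )
            if (defs[i].digit == digit && defs[i].pos < ref_pos)
                return (int)i;
    } else {
        /* forward: the nearest definition after the reference */
        for (size_t i = 0; i < ndefs; i++)
            if (defs[i].digit == digit && defs[i].pos > ref_pos)
                return (int)i;
    }
    return -1;
}
```

Number the statements of the loop above from 0. The two definitions of 1 are then at indices 0 and 5. cj 1f at index 2 resolves to the definition at 5, and j 1b at index 4 resolves to the one at 0.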
ASSIGNMENTS
Another way to define a name is an explicit assignment.
For example:
mysize = mylabel - bottom
wordsize = 0x4
mysize_in_words = (mysize + (wordsize-1)) / wordsize
The right-hand side of an assignment may contain an
arbitrary expression. Expressions use the operators and
the constant notation of the C programming language. Apart
from names, constants, operators, and digital label
references, an expression may contain a dot ('.'), which
stands for the current value of the program counter, the
assembly point.
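The last assignment in the example above is the usual idiom for rounding a byte count up to a whole number of words. Here is the same arithmetic in C, for illustration only. The function name is made up, and the test values are arbitrary because mylabel and bottom are not defined in the example.

```c
#include <assert.h>

/* Round-up division, as in
       mysize_in_words = (mysize + (wordsize-1)) / wordsize
   valid for a nonnegative size and a positive word size. */
long size_in_words(long size, long wordsize)
{
    assert(size >= 0 && wordsize > 0);
    return (size + (wordsize - 1)) / wordsize;
}
```

With wordsize = 4, sizes of 10 and 12 bytes both give 3 words, and 13 bytes gives 4.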
COMMANDS
The syntax of commands on transputers is very simple
compared to that of traditional CISC processors. Consider
the following piece of code:
ldc fp_array + 5*word_size
fpldnlsn
fpuabs
ldlp 1
fpstnlsn
j end
ldc is a direct command. All direct commands take an
arbitrary expression as an argument; the assembler
computes the expression and, if necessary, assembles the
appropriate sequence of pfix/nfix instructions before the
direct command.
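On the transputer, every instruction byte holds a 4-bit function code in the high nibble and 4 operand bits in the low nibble. pfix and nfix accumulate further operand bits for the instruction that follows. The following C sketch shows that standard prefixing scheme, assuming a 32-bit word. The function codes are those of the transputer instruction set. The names encode_direct and F_* are illustrative and do not come from ast's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Direct function codes (high nibble of an instruction byte). */
enum { F_J = 0x0, F_PFIX = 0x2, F_LDC = 0x4, F_NFIX = 0x6,
       F_CALL = 0x9, F_CJ = 0xA, F_OPR = 0xF };

/* Write the bytes of direct function fn with operand e to out[],
   preceded by whatever pfix/nfix prefixes are needed.
   Returns the number of bytes written (1..8). */
size_t encode_direct(int fn, int32_t e, uint8_t *out)
{
    assert(fn >= 0 && fn <= 15);
    size_t n = 0;
    if (e >= 16)          /* positive: prefix the higher nibbles */
        n = encode_direct(F_PFIX, e >> 4, out);
    else if (e < 0)       /* negative: nfix with the complemented bits */
        n = encode_direct(F_NFIX, (~e) >> 4, out);
    out[n] = (uint8_t)((fn << 4) | (e & 0xF));
    return n + 1;
}
```

For example, ldc 0x123 comes out as the three bytes 0x21 0x22 0x43 (pfix 1; pfix 2; ldc 3). ldc -1 comes out as 0x60 0x4F (nfix 0; ldc 15).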
Three direct commands (j, cj, and call) treat their
argument specially: ast implicitly subtracts the address
of the end of the command being assembled from the
argument. The reason is that the transputer hardware
interprets the argument of these commands as an offset
from the end of the command to the destination
label/function. This special handling compensates for
that, so that the commands look natural, for example:
call _printf
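The operand of j, cj, or call depends on where the command ends. Where the command ends in turn depends on how many prefix bytes that operand needs, so the size has to be found by trying candidates. Here is a C sketch, assuming the target address is already known and that a shorter encoding may be padded on the left with pfix 0, as this page describes for .slot. jump_size and direct_len are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a direct command with operand e, prefixes included. */
static int direct_len(int64_t e)
{
    if (e >= 0 && e < 16)
        return 1;
    if (e >= 16)
        return direct_len(e >> 4) + 1;
    return direct_len((~e) >> 4) + 1;   /* nfix chain for negatives */
}

/* Size of "j target" assembled at address addr.  The hardware operand is
   target - (addr + size), so the size appears on both sides: take the
   smallest size whose operand fits, padding a shorter encoding with
   leading pfix 0 no-ops. */
int jump_size(int64_t addr, int64_t target)
{
    for (int n = 1; n <= 8; n++)
        if (direct_len(target - (addr + n)) <= n)
            return n;
    assert(!"jump offset does not fit in 32 bits");
    return -1;
}
```

Take a j at address 0 to a label at 17. A one-byte command would need operand 16, which does not fit in one nibble. At two bytes the operand is 15, so the command becomes pfix 0; j 15.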
fpldnlsn is an indirect command. Indirect commands take no
immediate operand. Actually, they are assembled as
opr indirect_opcode
ast chooses the indirect_opcode based on the mnemonic of
the indirect command. You might want to use opr directly
if you ever need an indirect command whose mnemonic ast
does not know, but better yet, submit a patch to get your
new command added to ast's instruction table. See
ttools*/ast/scan.l.
fpuabs, like all fpu* commands, is a doubly-indirect
floating point unit command. It is doubly indirect because
it is actually coded as
ldc fpu_opcode; fpentry
where fpentry is an indirect command. ast hides the ldc,
but bear in mind that the ldc *is* there, so you should
have one word of the integer register stack unoccupied
before any doubly-indirect command.
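A C sketch of the encoding shows both levels of indirection. The opcode values for fpentry (0xAB) and fpuabs (0x0B) are assumptions taken from the T800 instruction set and should be checked against ast's instruction table. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { F_PFIX = 0x2, F_LDC = 0x4, F_NFIX = 0x6, F_OPR = 0xF };

/* ASSUMED T800 operation codes; verify against ast's instruction table. */
enum { OP_FPENTRY = 0xAB, FPU_ABS = 0x0B };

/* Direct command with pfix/nfix prefixes; returns bytes written. */
static size_t encode_direct(int fn, int32_t e, uint8_t *out)
{
    size_t n = 0;
    if (e >= 16)
        n = encode_direct(F_PFIX, e >> 4, out);
    else if (e < 0)
        n = encode_direct(F_NFIX, (~e) >> 4, out);
    out[n] = (uint8_t)((fn << 4) | (e & 0xF));
    return n + 1;
}

/* An indirect command is assembled as "opr opcode". */
size_t encode_indirect(int opcode, uint8_t *out)
{
    assert(opcode >= 0);
    return encode_direct(F_OPR, opcode, out);
}

/* A doubly-indirect FPU command is "ldc fpu_opcode; fpentry".  The
   hidden ldc is what occupies a slot on the integer register stack. */
size_t encode_fpu(int fpu_opcode, uint8_t *out)
{
    size_t n = encode_direct(F_LDC, fpu_opcode, out);
    return n + encode_indirect(OP_FPENTRY, out + n);
}
```

Under those assumptions, fpuabs comes out as 0x4B 0x2A 0xFB, that is ldc 11; pfix 10; opr 11.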
SEGMENTS
ast supports the notion of a segment as a contiguous
portion of the program in the transputer's memory. The
segments are logical: they do not necessarily imply the
presence of any kind of segment registers in the hardware.
Rather, they reflect the fact that a program may consist
of several pieces scattered in the transputer's memory.
ast puts no limitation on the number of segments in a
program, and allows the startup code programmer to choose
whatever names he likes for them. (By the way, segment
names never conflict with regular names; they live in a
separate namespace.) Switching from segment to segment in
a program is accomplished with segment switch directives,
like this:
cmdfoo
.text
cmdbar
cmdfoobar
.data
.byte "that's it\n", 0
Everything emitted between the start of the file and the
first segment switch directive (cmdfoo in this example)
goes into the head segment. The only other knowledge ast
has about specific segment names is:
the text segment uses the default filler of 0x20
(pfix 0) for the .align directive;
bss is the segment into which the .comm directive
emits.
Other than that, ast makes no assumptions about specific
names of segments.
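Here is a C sketch of the catchall rule for segment switches and of the text segment's special filler. The directive list is the one given in the DIRECTIVES section of this page, not one taken from ast's scanner. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Directive names documented on this page (without the leading dot). */
static const char *const directives[] = {
    "word", "half", "byte", "ascii", "align", "globl", "comm", "need", "slot",
};

/* Returns 1 if ".name" is a documented directive, 0 if it would be
   taken as a switch to the segment called "name". */
int is_directive(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof directives / sizeof directives[0]; i++)
        if (strcmp(name, directives[i]) == 0)
            return 1;
    return 0;
}

/* Default .align filler: 0x20 (pfix 0) in text, 0 in any other segment. */
uint8_t default_filler(const char *segment)
{
    return strcmp(segment, "text") == 0 ? 0x20 : 0x00;
}
```

A misspelled .aligm would not be rejected. It would be treated as a switch to a new segment named aligm.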
FRAGMENTS
The notion of a fragment is probably the most nonstandard
feature of TTOOLS. Fragments are atomic (indivisible)
constituents of the program from the viewpoint of the
TTOOLS linker, lit.
As you may recall, traditional linkers consider object
files to be atomic, indivisible constituents of the
program being linked. The traditional behaviour sometimes
turns out awkward; for example, when writing a large
library, programmers have to put every function in a
separate file to avoid linking in unnecessary functions.
One known workaround for this problem is to enclose every
function of a large file in #ifdef's:
...
#ifdef L_foo
foo ()
{
}
#endif
#ifdef L_bar
bar ()
{
}
#endif
...
and then compile the file N times, where N is the number
of functions in the file, each time with the appropriate
L_* symbol defined.
TTOOLS addresses the problem of omitting unused code by
considering every global function or data item in a
program as a separate fragment which the linker can link
in or omit depending on whether the fragment is needed in
the ultimate file or not. The assembler puts a need list
for every fragment into the LIT format object file, so
that the linker can build a dependency graph to separate
needed fragments from unneeded ones. See lit(1) for a
more detailed description of how the linker decides
whether a fragment is needed or not; here we are mostly
concerned with how the assembler determines fragment
boundaries and fragment dependencies.
The rule for boundaries is simple: when the assembler
encounters a label which was previously declared global (a
public label), it considers it the start of a new fragment.
The program counter ('.') is set equal to the fragment
name at this point.
"What if I want to put a global label in the middle of a
fragment?" No problem. Using a double colon (::) for a
label causes it not to start a new fragment even if the
name is declared global. Example:
.globl fragment1
.globl fragment2
.globl middlelab
fragment1:
cmdfoo
middlelab:: // this label does NOT start a fragment
cmdbar
fragment2:
cmdfoobar
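The boundary rule fits in a small C sketch. Assume each label carries two flags: whether a .globl for it was seen before the label, and whether it was written with a double colon. struct label and assign_fragments are hypothetical, and fragment 0 stands for the noname default fragment.

```c
#include <assert.h>
#include <stdbool.h>

/* A label as the assembler sees it (hypothetical representation). */
struct label {
    const char *name;
    bool declared_global; /* a .globl for it appeared before the label */
    bool double_colon;    /* written "name::" instead of "name:" */
};

/* A label opens a new fragment only if it is already known to be
   global and is written with a single colon. */
static bool starts_fragment(const struct label *l)
{
    return l->declared_global && !l->double_colon;
}

/* Give every label, in source order, the number of the fragment it lies
   in; 0 is the noname default fragment at the start of the segment. */
void assign_fragments(const struct label *ls, int n, int *frag)
{
    assert(n >= 0);
    int cur = 0;
    for (int i = 0; i < n; i++) {
        if (starts_fragment(&ls[i]))
            cur++;
        frag[i] = cur;
    }
}
```

In the example above, fragment1 and middlelab land in fragment 1, and fragment2 starts fragment 2.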
Every segment has its own current fragment. Any code
emitted between the start of a segment and the first
public label in the segment goes into a noname default
fragment. Note that noname symbols are considered
distinct by the linker, despite the fact that strcmp(3)
would return 0 for them.
The rule for fragment dependencies is this: if the
fragment foo is mentioned in an expression argument of any
command or data element of the fragment bar, then bar
needs foo. This natural rule is sufficient in most cases.
However, sometimes you may want to specify an "artificial"
need; you can do that using the .need directive, which
adds its argument to the need list of the current
fragment:
.globl foo
.globl bar
foo:
...
bar:
.need foo // must link in foo if bar is linked in
...
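On the linker side, the need lists form a directed graph, and the needed fragments are those reachable from some starting set. Here is a C sketch of that reachability walk. It assumes a fixed set of root fragments; how lit actually decides is described in lit(1). The types and size limits are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXFRAG 16
#define MAXNEED 8

/* Hypothetical in-memory form of a fragment and its need list. */
struct fragment {
    int nneeds;
    int needs[MAXNEED]; /* indices of the fragments this one needs */
};

/* Mark fragment i and everything it transitively needs. */
static void mark(const struct fragment *f, int i, bool *needed)
{
    if (needed[i])
        return;
    needed[i] = true;
    for (int k = 0; k < f[i].nneeds; k++)
        mark(f, f[i].needs[k], needed);
}

/* Compute the set of needed fragments, starting from the given roots. */
void compute_needed(const struct fragment *f, int nfrag,
                    const int *roots, int nroots, bool *needed)
{
    assert(nfrag <= MAXFRAG);
    for (int i = 0; i < nfrag; i++)
        needed[i] = false;
    for (int r = 0; r < nroots; r++)
        mark(f, roots[r], needed);
}
```

Suppose bar needs foo, as in the .need example. Then linking in bar pulls in foo, while a fragment that nothing needs is left out.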
"Great! Can I turn all this off? :-)" No. Although it
wouldn't be hard to implement an option which yields the
traditional behavior (treat all public labels as
nonfragments and add needs for the next and previous
segment to every segment's only fragment)... well, I'll do
it if you explain why you need it; mail me or do it
yourself.
DIRECTIVES
.word, .half, .byte
emit data 4, 2, or 1 byte wide, respectively, at
the current point. More than one datum may be
specified in one directive, separated by commas
(','). A repeater expression may be specified after
a datum in brackets ([]). Using a question mark
('?') for a datum causes an uninitialized datum of
the given width to be emitted. .byte additionally
allows a string in double quotes as a datum; the
string contents are parsed according to the rules of
the C language, except that the trailing zero is not
appended automatically. Examples:
.byte 8, "potatoes"
.word ?[3]
.word 0777, 0xfed, 0x55aa55aa[label2-label1]
.ascii is a synonym for .byte, recognized to mimic other
assemblers.
.align boundary [, filler]
advances the point to the nearest multiple of
boundary by emitting the necessary number of bytes.
A filler expression may be given to specify the
value of the padding bytes; if the filler is
omitted, the value 0x20 (pfix 0) is used if the
current segment is text, and 0 for any other
segment. boundary must be a power of two and a
constant expression; filler may be an arbitrary
expression. Examples:
.align 4, 0x20
.align 8, user_defined_filler
.globl name
declares name to be visible to the linker. If name
is also defined in this file, it is called a public
name, otherwise an external name. The .globl
directive may either precede or follow the name
definition, except when the definition is a label:
in this case, .globl must precede the label, or ast
will not be able to recognize that the label starts
a new fragment.
.comm name, size
reserve a common block size bytes long in the bss
segment, and place the label name at its start. The
name is automatically declared global, so it does
not need a separate .globl. Essentially this
directive creates a new fragment name in the bss
segment. The name also receives a special common
flag, which tells the linker to expect multiple
occurrences of this fragment in different files and
merge them silently, choosing the bigger size if
they are not equal (this maximization is not yet
implemented, if I recall correctly). This directive
fits poorly into the TTOOLS ideology, and was a pain
to implement, but you need it to assemble code
generated by C compilers.
.need name
add name to the need list of the fragment this
directive appears in. This directive is only
necessary when ast cannot figure out the dependency
itself, which is rare, probably only arising in
startup routines.
.slot width
this directive may be used to specify the exact
size in bytes of the next command emitted in the
current segment. By default, ast tries to minimize
the size of generated commands; this directive
cancels this behaviour for one command. If the
actually generated command turns out narrower than
width, it is padded with no-ops (pfix 0) on the
left. If it turns out wider than width, ast (or
lit, or a loader) will flag this as an error. width
must be a constant expression.
.segment, segment switch directives
anything that looks like a directive (starts with a
dot) and does not match the directives enumerated
above is considered a directive that makes segment
the current segment, creating it if necessary. This
catchall behaviour is rather error-prone, but that
is the cost of having both an unlimited number of
segments and traditionally looking segment switching
directives.
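Here is a C sketch of the byte-level arithmetic behind .word/.half/.byte, .align, and .slot. It assumes 32-bit values and the transputer's little-endian byte order. It does not model '?' data, and it ignores the fact that for j, cj, and call the operand itself would depend on the slot width. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* .word/.half/.byte: emit count copies (the [repeater]) of a datum
   width bytes wide, least significant byte first.  Returns bytes written. */
size_t emit_datum(uint32_t value, int width, uint32_t count, uint8_t *out)
{
    assert(width == 1 || width == 2 || width == 4);
    size_t n = 0;
    for (uint32_t c = 0; c < count; c++)
        for (int b = 0; b < width; b++)
            out[n++] = (uint8_t)(value >> (8 * b));
    return n;
}

/* .align: bytes needed to reach the next multiple of boundary from pc. */
uint32_t align_padding(uint32_t pc, uint32_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (boundary - (pc & (boundary - 1))) & (boundary - 1);
}

/* Direct command with pfix (0x2) / nfix (0x6) prefixes. */
static size_t encode_direct(int fn, int32_t e, uint8_t *out)
{
    size_t n = 0;
    if (e >= 16)
        n = encode_direct(0x2, e >> 4, out);
    else if (e < 0)
        n = encode_direct(0x6, (~e) >> 4, out);
    out[n] = (uint8_t)((fn << 4) | (e & 0xF));
    return n + 1;
}

/* .slot width: encode into exactly width bytes, left-padding with
   pfix 0 (0x20) no-ops.  Returns 0 if the command is too wide. */
size_t encode_slot(int fn, int32_t e, size_t width, uint8_t *out)
{
    assert(width >= 1);
    uint8_t tmp[8];
    size_t n = encode_direct(fn, e, tmp);
    if (n > width)
        return 0;                       /* an error for ast or lit */
    memset(out, 0x20, width - n);
    memcpy(out + (width - n), tmp, n);
    return width;
}
```

For example, .slot 3 followed by ldc 5 yields 0x20 0x20 0x45, while ldc 0x123 in a 2-byte slot does not fit.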
COMMENTS
ast recognizes C++-style comments of both flavors, that is
// this is a comment spanning up to the end of line
/* this is your ordinary embeddable comment */
MISCELLANEOUS
Newline characters in the input are treated as mere
whitespace, so you may write multiline commands if you
wish.
Semicolon (';') is considered an empty statement. You
might want to separate commands written on one line with
semicolons to improve readability and help ast recover
from syntax errors, if any. This is not required, though,
as the syntax of the language allows statement boundaries
to be recognized without any special separators, be they
semicolons or newlines.
Identifiers may contain any of the characters
[0-9a-zA-Z_.$@]; the leading character must be one of
[a-zA-Z_$@].
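Here is a C sketch of the identifier rule. It assumes the C locale, so that isalpha matches exactly [a-zA-Z]. is_identifier is a hypothetical helper, not ast's own scanner.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Characters that may start an identifier: [a-zA-Z_$@]. */
static bool id_start(int c)
{
    return isalpha(c) || c == '_' || c == '$' || c == '@';
}

/* True if s matches [a-zA-Z_$@][0-9a-zA-Z_.$@]* (in the C locale). */
bool is_identifier(const char *s)
{
    assert(s != NULL);
    if (!id_start((unsigned char)s[0]))
        return false;
    for (size_t i = 1; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!id_start(c) && !isdigit(c) && c != '.')
            return false;
    }
    return true;
}
```

Note that 1b and .text are rejected. The former is a digital label reference and the latter a directive.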
OFFSETS OPTIMIZATION AND OTHER INNER WORKINGS OF ast
The need for offset optimization is due to an interesting
feature of transputer hardware: the size of a command
depends on the value of its argument. A command may
require from one to eight bytes to encode, depending on
its argument's value. Choosing the minimal possible size
for every command is a goal of every transputer
programming system, because this gives more compact and
faster code.
ast uses the following procedure to minimize command
sizes. First, the entire input file is parsed and
translated into an internal form. The internal form looks
like a linked list of elements of three types: BLOCK,
DATA, and CMD.
The BLOCK element is a plain sequence of bytes, resulting
from translation of commands with constant arguments and
data with known values and sizes. The size of a BLOCK
element is always a constant.
The DATA and CMD elements result from translation of
commands or data whose arguments are not constants, but
rather expressions containing variable components, such as
names and sizes of elements. For every element, ast
computes the interval in which the future size of the
element will lie.
Once the internal form is built, ast traverses the element
list repeatedly, recomputing sizes of elements based on
the current estimates of their arguments' values. When an
element is found whose assigned size interval is wider
than necessary for the current estimate of its argument,
the interval is narrowed. This creates chances for other
elements, whose arguments depend on that interval, to be
narrowed too. The process continues, with the estimates
becoming better on every iteration, until no more shrinks
can be made.
Actually, the fact that ast cannot optimize further does
not mean that no further optimization is possible. At link
time, when external references of the program are
resolved, there will be new opportunities to optimize, and
lit can take them. So ast writes the element chains,
including the variable elements, to the output object
file; this is possible because the LIT object file format
has means to represent variable elements (see lit(5)).
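The narrowing iteration can be illustrated with a deliberately simplified model. It has only BLOCKs and jump-like CMDs, tracks only the upper end of each size interval, and has no DATA elements or external names. It is a sketch of the idea, not ast's actual data structures, and all names are hypothetical. Every jump starts at the maximum of 8 bytes and sizes only shrink, so in this model each estimate stays large enough for the final layout (a shorter encoding being padded with pfix 0). Stopping at the pass limit therefore still gives correct, if less tight, code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed-size BLOCKs and jump-like CMDs (j, cj, call) whose operand is
   target_address - end_of_command. */
enum kind { BLOCK, JUMP };
struct elem {
    enum kind kind;
    int size;    /* BLOCK: exact size; JUMP: current upper bound */
    int target;  /* JUMP only: index of the element jumped to */
};

/* Bytes needed for a direct command with operand e, prefixes included. */
static int direct_len(int64_t e)
{
    if (e >= 0 && e < 16) return 1;
    if (e >= 16) return direct_len(e >> 4) + 1;
    return direct_len((~e) >> 4) + 1;
}

/* Size jump i needs under the current address estimates. */
static int jump_need(const struct elem *e, const int64_t *addr, int i)
{
    int t = e[i].target;
    if (t > i)   /* forward: distance from our end ignores our own size */
        return direct_len(addr[t] - (addr[i] + e[i].size));
    for (int n = 1; n < 8; n++)   /* backward: operand grows with size */
        if (direct_len(addr[t] - (addr[i] + n)) <= n)
            return n;
    return 8;
}

/* Start every JUMP at 8 bytes and narrow until nothing changes or
   maxpasses is reached (compare ast's -p).  Returns passes made. */
int optimize(struct elem *e, int n, int maxpasses)
{
    int64_t addr[64];
    assert(n <= 64);
    for (int i = 0; i < n; i++)
        if (e[i].kind == JUMP)
            e[i].size = 8;
    int pass = 0;
    bool changed = true;
    while (changed && pass < maxpasses) {
        changed = false;
        pass++;
        int64_t pc = 0;
        for (int i = 0; i < n; i++) {   /* lay out with current sizes */
            addr[i] = pc;
            pc += e[i].size;
        }
        for (int i = 0; i < n; i++) {
            if (e[i].kind != JUMP)
                continue;
            int need = jump_need(e, addr, i);
            if (need < e[i].size) {     /* narrow the interval */
                e[i].size = need;
                changed = true;
            }
        }
    }
    return pass;
}
```

Consider a forward jump over a 20-byte block and a backward jump to the start. Both shrink from 8 to 2 bytes on the first pass, and the second pass confirms that nothing more can be gained.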
BUGS
There are always bugs, even if we fancy we have none :-(.
I would be grateful if you let me know of mine; so if you
find any, please submit a description of the bug and the
assembly language program that exposes it (preprocessed,
if the program requires a preprocessor: my preprocessor
and include files may differ from yours!) to
bug-ttools@botik.ru. Same for errors in this man page.
SEE ALSO
dast(1), lit(1), lit(5), litdump(1), ttools(1)
AUTHOR
ast is written by Yury Shevchuk (sizif@botik.ru)