NAME
       ast - assembler for transputers

SYNOPSIS
       ast [options] [FILE]

DESCRIPTION
       ast produces a LIT format object file from a transputer assembly
       language program contained in FILE. If FILE is omitted, ast takes
       the program from stdin.

OPTIONS
       --help
              show usage summary.

       -o FILE, --output FILE
              output to FILE instead of a.out.

       -p N, --maxpasses N
              limit the number of offset optimization passes to N. ast
              runs the offset optimizer repeatedly over the internal
              representation of the program until further optimization
              proves impossible or this limit is reached. The default
              limit is 8; increasing it might produce slightly tighter
              code.

       -E, --expr-print-long
              affects the verbosity of debugging dumps requested with
              --trace or --dump. Giving this option once causes elements
              and noname symbols referred to from ELEMENT and NAME
              expressions to be prefixed with their memory addresses.
              Giving this option twice results in every expression being
              prefixed with its memory address.

       -t SPEC, --trace SPEC
              enable debugging output to stdout. SPEC is a string of the
              form

                     KEY[+|-KEY][+|-KEY]...

              where +KEY turns on a certain tracing facility and -KEY
              turns the facility off. KEY may be any of the following:
              all, lex, point, opt, opt2, reduce, misc, expr. all turns
              on all trace facilities; for the description of the other
              facilities refer to the ast source.

       -d SPEC, --dump SPEC
              enable debugging dumps of the internal representation of
              the program being assembled at certain points. SPEC is
              parsed in the same way as for --trace; valid dump keys are
              all, parse, opt, canon, unroll, merge, needs.

INFORMAL DESCRIPTION OF THE INPUT LANGUAGE
       The language that ast takes as input is mostly what you would
       expect a transputer assembly language to be. You might want to
       take a look at one of the assembly language files present in the
       TTOOLS distribution, or examine the output of
       ``t800-gcc -O -S foo.c'' to get a feel for it.

NAMES
       Names, or symbols, are an essential part of any assembly language.
       They serve to represent values that may not be defined yet. For
       example, you need a name to code a forward reference:

                   j mylabel
                   cmdfoo
                   cmdbar
           mylabel:
                   cmdfoobar

       In this example, the name mylabel represents the address of the
       command cmdfoobar in the transputer's memory.

DIGITAL LABELS
       In addition to names, ast supports digital labels, a feature
       commonly found in UNIX assemblers. Digital labels are handy when
       you need a short-range label that you would hate to invent a name
       for. A digital label is a single digit (0..9) followed by a colon.
       References to digital labels are written as a digit followed by
       the letter 'f' or 'b', denoting a forward or a backward reference,
       respectively. For example, here is a loop which cycles until
       cmdfoo leaves zero in Areg:

           1:      cmdfoo
                   cj 1f
                   cmdbar
                   j 1b
           1:

ASSIGNMENTS
       Another way to define a name is an explicit assignment. For
       example:

           mysize = mylabel - bottom
           wordsize = 0x4
           mysize_in_words = (mysize + (wordsize-1)) / wordsize

       The right-hand side of an assignment may be an arbitrary
       expression. Expressions use the operators and constant notation
       of the C programming language. Apart from names, constants,
       operators, and digital label references, an expression may
       contain a dot ('.'), which stands for the current value of the
       program counter, the assembly point.

COMMANDS
       The syntax for commands on transputers is very simple compared to
       that on traditional CISC processors. Consider the following piece
       of code:

           ldc fp_array + 5*word_size
           fpldnlsn
           fpuabs
           ldlp 1
           fpstnlsn
           j end

       ldc is a direct command. All direct commands take an arbitrary
       expression as an argument; the assembler computes the expression
       and assembles the appropriate sequence of pfix/nfix instructions
       before the direct command, if necessary. Three direct commands
       (j, cj, and call) treat their argument specially: ast implicitly
       subtracts the address of the end of the command being assembled
       from the argument.
       The reason for this is that the transputer hardware interprets
       the argument of these commands as an offset from the end of the
       command to the destination label or function. The special
       handling compensates for this, so that the commands look natural,
       for example:

           call _printf

       fpldnlsn is an indirect command. Indirect commands take no
       immediate operand. Actually, they are assembled as

           opr indirect_opcode

       ast chooses the indirect_opcode based on the mnemonic of the
       indirect command. You might want to use opr directly if you ever
       need an indirect command which ast does not know a mnemonic for -
       but better yet, submit a patch to get your new command added to
       ast's instruction table. See ttools*/ast/scan.l.

       fpuabs, like all fpu* commands, is a doubly-indirect floating
       point unit command. It is doubly indirect because it is actually
       coded as

           ldc fpu_opcode; fpentry

       where fpentry is an indirect command. ast hides the ldc, but bear
       in mind that the ldc *is* there, so you should have one word of
       the integer register stack unoccupied before any doubly-indirect
       command.

SEGMENTS
       ast supports the notion of a segment as a contiguous portion of a
       program in the transputer's memory. The segments are logical:
       they do not necessarily imply the presence of some kind of
       segment registers in the hardware. Rather, they reflect the fact
       that a program may consist of several pieces scattered in the
       transputer's memory. ast puts no limitation on the number of
       segments in a program, and allows the startup code programmer to
       choose whatever names he likes for them. (By the way, segment
       names never conflict with regular names; they are in a separate
       namespace.) Switching from segment to segment in a program is
       accomplished with segment switch directives, like this:

           cmdfoo
           .text
           cmdbar
           cmdfoobar
           .data
           .byte "that's it\n", 0

       Everything emitted between the start of the file and the first
       segment switch directive (cmdfoo in this example) goes into the
       head segment.
       Other knowledge ast has about specific segment names is: the text
       segment uses the default filler of 0x20 (pfix 0) for the .align
       directive; bss is the segment that the .comm directive emits to.
       Other than that, ast makes no assumptions about specific names of
       segments.

FRAGMENTS
       The notion of a fragment is probably the most nonstandard feature
       of TTOOLS. Fragments are atomic (indivisible) constituents of the
       program from the viewpoint of the TTOOLS linker, lit.

       As you may recall, traditional linkers consider object files as
       the atomic, indivisible constituents of the program being linked.
       The traditional behaviour sometimes turns out awkward; for
       example, when writing a large library, programmers have to put
       every function in a separate file to avoid linking in unnecessary
       functions. One known workaround for this problem is to enclose
       every function in a large file in #ifdef's:

           ...
           #ifdef L_foo
           foo () { }
           #endif
           #ifdef L_bar
           bar () { }
           #endif
           ...

       and then compile the file N times, where N is the number of
       functions in the file, each time with the appropriate L_* symbol
       defined.

       TTOOLS addresses the problem of omitting unused code by
       considering every global function or data item in a program as a
       separate fragment, which the linker can link in or omit depending
       on whether the fragment is needed in the final file. The
       assembler puts a need list for every fragment in the LIT format
       object file, so that the linker can build a dependency graph to
       separate needed fragments from unneeded ones.

       See lit(1) for a more detailed description of how the linker
       decides whether a fragment is needed; here we are mostly
       concerned with how the assembler determines fragment boundaries
       and fragment dependencies.

       The rule for boundaries is simple: when the assembler encounters
       a label which was previously declared as global (a public label),
       it considers it the start of a new fragment. The program counter
       ('.') is set equal to the fragment name at this point.
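
       The role of the need lists can be pictured with a short Python
       sketch. It is only an illustration, not lit's actual algorithm
       (see lit(1) for that): the function name needed_fragments, the
       dictionary layout, and the idea of starting from a given set of
       root fragments are all assumptions made for this example.

```python
def needed_fragments(needs, roots):
    """Return the set of fragments reachable from roots.

    needs maps a fragment name to the list of fragment names on its
    need list (hypothetical layout, chosen for this sketch only).
    """
    seen = set()
    stack = list(roots)
    while stack:
        frag = stack.pop()
        if frag in seen:
            continue
        seen.add(frag)
        # Everything this fragment needs must be linked in as well.
        stack.extend(needs.get(frag, ()))
    return seen


if __name__ == "__main__":
    needs = {"main": ["foo"], "foo": ["bar"], "bar": [], "unused": ["foo"]}
    print(sorted(needed_fragments(needs, ["main"])))
```

       In this toy graph the fragment "unused" is never reached from
       "main", so a linker working this way would be free to omit it.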
       "What if I want to put a global label in the middle of a
       fragment?" No problem. Using a double colon (::) for a label
       causes it not to start a new fragment even if the name is
       declared global. Example:

           .globl fragment1
           .globl fragment2
           .globl middlelab
           fragment1:
                   cmdfoo
           middlelab::     // this label does NOT start a fragment
                   cmdbar
           fragment2:
                   cmdfoobar

       Every segment has its own current fragment. Any code emitted
       between the start of a segment and the first public label in the
       segment goes into a noname default fragment. Note that noname
       symbols are considered distinct by the linker, despite the fact
       that strcmp(3) would return 0 for them.

       The rule for fragment dependencies is this: if the fragment foo
       is mentioned in the expression argument of any command or data
       element of the fragment bar, then bar needs foo. This natural
       rule is sufficient in most cases. However, sometimes you may want
       to specify an "artificial" need; you can do that using the .need
       directive, which adds its argument to the need list of the
       current fragment:

           .globl foo
           .globl bar
           foo:
                   ...
           bar:
                   .need foo       // must link in foo if bar is linked in
                   ...

       "Great! Can I turn all this off? :-)" No. It wouldn't be hard to
       implement an option which yields the traditional behavior (treat
       all public labels as nonfragments and add needs for the next and
       previous segment to every segment's only fragment), and I'll do
       it if you explain why you need it; mail me or do it yourself.

DIRECTIVES
       .word, .half, .byte
              emit data of the width of 4, 2, or 1 byte respectively at
              the current point. More than one datum may be specified in
              one directive, separated by commas (','). A repeater
              expression may be specified after a datum in brackets
              ([]). Using a question mark ('?') for a datum causes an
              uninitialized datum of the said width to be emitted.
              .byte in addition allows a string in double quotes to be
              given as a datum; the string contents are parsed according
              to the rules of the C language, except that the trailing
              zero is not appended automatically. Examples:

                  .byte 8, "potatoes"
                  .word ?[3]
                  .word 0777, 0xfed, 0x55aa55aa[label2-label1]

       .ascii
              a synonym for .byte, recognized to mimic other assemblers.

       .align boundary [, filler]
              advances the point up to the nearest multiple of boundary
              by emitting the necessary number of bytes. A filler
              expression may be given to specify the value of the
              padding bytes; if the filler is omitted, the value 0x20
              (pfix 0) is used if the current segment is text, and 0 for
              any other segment. boundary must be a power of two, and
              also a constant expression; filler may be an arbitrary
              expression. Examples:

                  .align 4, 0x20
                  .align 8, user_defined_filler

       .globl name
              declares name to be visible to the linker. If name is also
              defined in this file, it is called a public name,
              otherwise an external name. The .globl directive may
              either precede or follow the name definition, except when
              the definition is a label thus named: in this case, .globl
              must precede the label, or ast will not be able to
              recognize that the label starts a new fragment.

       .comm name, size
              reserves a common block size bytes long in the bss
              segment, and places the label name at the start of it. The
              name is automatically declared global, so it does not need
              a separate .globl. Essentially this directive creates a
              new fragment name in the bss segment. The name also
              receives a special common flag, which tells the linker to
              expect multiple occurrences of this fragment in different
              files, and to merge them silently, choosing the bigger
              size if they are not equal (this maximization is not yet
              implemented, if I recall correctly). This directive fits
              poorly into the TTOOLS ideology, and was a pain to
              implement, but you need it to assemble code generated by C
              compilers.

       .need name
              adds name to the need list of the fragment this directive
              appears in. This directive is only necessary when ast
              cannot figure out the dependency itself, which is a rare
              case, probably only arising in startup routines.

       .slot width
              this directive may be used to specify the exact size in
              bytes for the next command emitted in the current segment.
              By default, ast tries to minimize the size of generated
              commands; this directive cancels this behaviour for one
              command. If the actual generated command turns out
              narrower than width, it is padded with no-ops (pfix 0)
              from the left. If the actual generated command turns out
              wider than width, ast (or lit, or a loader) will flag this
              as an error. width must be a constant expression.

       .segment, segment switch directives
              anything looking like a directive (starting with a dot)
              and not matching the directives enumerated above is
              considered a directive for setting up segment as the
              current segment, creating it if necessary. This catchall
              behaviour is rather error-prone, but that is the cost of
              having both an unlimited number of segments and
              traditional-looking segment switching directives.

COMMENTS
       ast recognizes C++-style comments of both flavors, that is

           // this is a comment spanning up to the end of line
           /* this is your ordinary embeddable comment */

MISCELLANEOUS
       Newline characters in the input are treated as mere whitespace,
       so you may write multiline commands if you wish.

       A semicolon (';') is considered an empty statement. You might
       want to separate commands written on one line with semicolons to
       improve readability and help ast recover from syntax errors, if
       any. This is not required, though, as the syntax of the language
       allows statement boundaries to be recognized without any special
       separators, be they semicolons or newlines.

       Identifiers may contain any of the characters [0-9a-zA-Z_.$@];
       the leading character must be one of [a-zA-Z_$@].
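
       The identifier rule above translates directly into a regular
       expression. The Python sketch below is only an illustration of
       the two character classes quoted in the text; the names IDENT_RE
       and is_identifier are made up for this example and have nothing
       to do with ast's own scanner.

```python
import re

# First character: [a-zA-Z_$@]; remaining characters: [0-9a-zA-Z_.$@].
# Both classes are copied from the man page text; the names here are
# hypothetical.
IDENT_RE = re.compile(r"[A-Za-z_$@][0-9A-Za-z_.$@]*")


def is_identifier(text: str) -> bool:
    """True if text is a valid identifier under the rule quoted above."""
    return IDENT_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("_printf", "my.label$1", "1f", ".text"):
        print(candidate, is_identifier(candidate))
```

       Under this rule "1f" and ".text" are rejected because of their
       leading character, which is consistent with digits introducing
       digital label references and dots introducing directives.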

OFFSETS OPTIMIZATION AND OTHER INNER WORKINGS OF ast
       The need for offset optimization is due to an interesting feature
       of transputer hardware: the dependence of command size on the
       value of the command's argument. A command may require from one
       to eight bytes to encode, depending on its argument's value.
       Choosing the minimal possible size for every command is an aim of
       every transputer programming system, because this gives more
       compact and faster code.

       ast uses the following procedure to minimize command sizes.
       First, the entire input file is parsed and translated into an
       internal form. The internal form looks like a linked list of
       elements of three types: BLOCK, DATA, and CMD.

       The BLOCK element is a plain sequence of bytes, resulting from
       translation of commands with constant arguments and data with
       known values and sizes. The size of a BLOCK element is always a
       constant.

       The DATA and CMD elements result from translation of commands or
       data whose arguments are not constants, but rather expressions
       containing variable components, such as names and sizes of
       elements. For every element, ast computes the interval in which
       the future size of the element will lie.

       Once the internal form is built, ast traverses the element list
       repeatedly, recomputing sizes of elements based on the estimates
       of the arguments' values. When an element is found whose assigned
       size interval is wider than necessary for the current estimate of
       the argument, the interval is narrowed. This creates chances for
       other elements, whose arguments depend on that interval, to be
       narrowed too. The process continues, with the estimates becoming
       better on every iteration, until no more shrinking is possible
       (or the --maxpasses limit is reached).

       Actually, the fact that ast cannot optimize further does not mean
       that no further optimization is possible. At link time, when
       external references of the program are resolved, there will be
       new opportunities to optimize, and lit can do that.
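
       The narrowing procedure described above can be pictured with a
       small Python sketch. It is an illustration under simplifying
       assumptions, not ast's actual code: the names encoded_size and
       relax and the tuple-based element model are hypothetical, every
       variable element is a jump whose operand is the distance from
       the end of the command to its target, and a single size estimate
       is tracked instead of an interval. The byte count per operand
       follows the usual transputer prefixing scheme (four operand bits
       per byte, with nfix used for negative values).

```python
MAX_CMD_SIZE = 8  # per the text: a command takes from one to eight bytes


def encoded_size(operand: int) -> int:
    """Bytes needed for a direct command with this operand (32-bit range)."""
    if operand < 0:
        # A negative operand always needs at least one nfix prefix.
        return 1 + encoded_size((~operand) >> 4)
    if operand < 16:
        return 1
    return 1 + encoded_size(operand >> 4)


def relax(elements, max_passes=8):
    """Shrink jump sizes until nothing changes or max_passes is reached.

    elements is a list of ("block", nbytes) or ("jump", target_index),
    where target_index is the element the jump's label sits in front of.
    Returns the final size of every element.
    """
    sizes = [arg if kind == "block" else MAX_CMD_SIZE
             for kind, arg in elements]
    for _ in range(max_passes):
        # Start address of every element under the current estimates.
        starts = [0]
        for size in sizes:
            starts.append(starts[-1] + size)
        changed = False
        for i, (kind, arg) in enumerate(elements):
            if kind != "jump":
                continue
            # Operand: target address minus the end of this command.
            needed = encoded_size(starts[arg] - starts[i + 1])
            if needed < sizes[i]:
                sizes[i] = needed
                changed = True
        if not changed:
            break
    return sizes


if __name__ == "__main__":
    program = [("jump", 3), ("jump", 3), ("block", 8), ("block", 1)]
    print(relax(program, max_passes=1))  # [2, 1, 8, 1]
    print(relax(program))                # [1, 1, 8, 1]
```

       In this example the first pass shrinks the two jumps to 2 and 1
       bytes; only on the second pass, once the second jump is known to
       be short, does the first jump fit in a single byte. This is the
       cascading effect the text describes.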
       So ast writes the element chains, including the variable
       elements, down to the output object file; this is possible thanks
       to the LIT object file format, which has the means to represent
       variable elements (see lit(5)).

BUGS
       There are always bugs, even if we fancy we have none :-(. I would
       be grateful if you let me know of mine; so if you find any,
       please submit a description of the bug and the assembly language
       program that exposes it (preprocessed, if the program requires a
       preprocessor - my preprocessor and include files may differ from
       yours!) to bug-ttools@botik.ru. Same for errors in this man page.

SEE ALSO
       dast(1), lit(1), lit(5), litdump(1), ttools(1)

AUTHOR
       ast is written by Yury Shevchuk (sizif@botik.ru)