Report on the 4-th International Russian-Indian exhibition-seminar "Mathematical modeling and visualization", Moscow, September 15-25, 1997

T-system: Programming environment
providing automatic dynamic parallelizing
on IP-network of Unix-computers

Sergei Abramov
Alexey Adamovich
Maxim Kovalenko

Program Systems Institute of Russian Academy of Sciences
Research Center for Multiprocessor Systems
Pereslavl-Zalessky, Russia, 152140

tel: +7(08535)98.031 fax: +7(08535)20.593 fax: +7(08535)98.278

[ In Russian: KOI8 | CP1251(MS-Windows) | CP866(DOS) | ISO8859-5 | Mac ]


1. Introduction

So far, the problem of implementation of parallel computations is mostly a problem of software rather than hardware.

This paper introduces a software system for automatic dynamic parallelizing of programs developed in RCMS of PSI RAS-the T-system. An example of application of T-system to a task of image rendering using the ray tracing method is also presented.

2. Computation model

We implement automatic dynamic parallelizing using a new model of organization of the process of computation:

[x,y,z] = G(a,b);

  • A call to function G results in creation of a new process, with several inputs (according to the number of the arguments of G) and several outputs (according to the number of the results of G). The outputs of the new process are tied with the appropriate variables (x, y, z) of  F by the "producer-consumer" relation.
  • Thus at the moment of the call the consumer variables receive unevaluated (not ready) values. The new process G will work to compute the function G and to replace the unevaluated values of all consumers by the appropriate results of function G.
  • All processes are created in the "ready to compute" state. A call of function G from the process F as described above does not make the process F not ready to compute.
  • The only event that can lead to the loss of readiness (to suspension) of a process is an attempt of the process to carry out any substantial operation with an unevaluated value (data transfers, for example copying, are not considered substantial operations and do not lead to suspension of processes).
  • If a process has attempted to carry out an operation with an unevaluated value, the process loses its readiness--becomes suspended. The readiness (ability to be evaluated) will be returned to the process (the process will be resumed) when the unevaluated value that was the reason of the suspension is replaced by a normal evaluated value as a result of work of some other process.
  • Thus, in every moment the state of computation can be represented as a network being computed; the nodes of the network are the processes and data structures, and arches represent the "producer-consumer" relations.
  • During the program execution the network being computed transforms itself: new nodes appear (call of function), nodes disappear (the last result needed by consumers has been computed), arches appear and disappear (allowable operations with unevaluated values.)
  • Autotransformation of an network being computed, as a model of organization of computations, opens an opportunity for automatic dynamic parallelizing of programs: for the majority of the applications the organization of the computation in this way provides sufficient number of processes ready to run, and they can be run in parallel.

    The model provides the only mechanism of synchronization of processes---the mechanism of suspending and resuming described above.

    The closest analogues to our approach are the Daisy/DSI project (Indiana Univ.), the reduction machine, and the MDA system (multiprocessor with dynamic architecture, Russia, 1978-1985 years).

    3. T-system

    The T-system is a software environment intended to implement the computation model described in the previous section on a multiprocessor platform.

    3.1. Hardware/software platform for T-system

    The current implementation of T-system runs on a "multicomputer" that is a TCP/IP network of IBM/PC clones running OS Linux.

    The nearest development plans include:

    3.2. The basic functions of T-system

    The main function of T-system is the support of the computation model "autotransformation of evaluation network." Within the framework of this function T-system implements the following basic data structures and operations:

    Each processor element of a multicomputer runs its own instance of T-system---the so-called co-executive. The user task is running under the management and with the support of all co-executives.

    Functions of T-system related to utilization of all the computational power of a multiprocessor:

    3.3. Programming in T-system

    Today, the following technique is used for programming applications under T-system: the programs are developed in the C language using a library of functions providing an interface to the T-system kernel. The library provides the programmer with all operations necessary for the considered computation model.

    The nearest development plans include:

    4. Rendering of the 3-D scene images using T-system

    We have chosen the problem of 3-D scene rendering using ray tracing as the real problem for demonstration of the automatic dynamic parallelizing abilities.

    4.1. Program structure

    The following algorithm is used at the top level of the program:

    [image] main(scene, full_screen_description) {
        [tree] = render_scene(scene, full_screen_description);
        [image] = flatten(tree);
    [image_tree] render_scene(scene, part_of_screen_description) {
    var part1, part2, tree, part_of_img;
        if (size_too_big(part_of_screen_description)) {
            [part1,part2] = split(part_of_screen_description);
            br_node = new(2);
            [br_node[0]] = render_scene(scene,part1);
            [br_node[1]] = render_scene(scene,part2);
            [image_tree] = br_node; // branch in a tree of image fragments
        } else {
            ... allocation of an array part_of_image and filling it by usual
                ray tracing algorithm
            image_tree = part_of_image; // leaf in a tree of image
                                        // fragments
    [image] flatten(tree) {
        ... recursive traversal of the tree and composing
            the whole image from its fragments.

    Pay attention to the fact that the information about the number of co-executives, and the sort of hardware (topology of the network, etc.) is not used in the program. This is one of the main properties of our approach:

    1. T-programs can run without rewriting and even recompilation on various computing installations with T-system that may differ widely from each other in the number of processors and the sort of network hardware.
    2. The programmer does not have to care about the placement of processes to processors, about the synchronization of processes, etc. The programmer only has to describe the problem in the functional style on the top level of program and to take care of large enough computational complexity of functions in the program (because it's the functions that are grains of parallel execution in T-system).

    4.2. The results of computing experiments

    The T-program of ray tracing was run on multicomputers consisting of one, two, three and four personal computers (IBM PC Pentium 166/32Mb/4Gb). We rendered the images of scenes of various complexities.
    The experiments show that T-system supports high level of utilization of the multiprocessor's computational power and low level of overhead.

    The basic results of experiments with calculation for three scenes

    Scene B2 Scene G Scene B3
    Scene B2 
    (3 lights, 91 spheres, 1 polygon)
    Scene G 
    (5 lights, 147 polygons)
    Scene B3 
    (3 lights, 820 spheres, 1 polygon)


    Tsec Computing time in seconds T% Computing time reduction
    number of processors scene B2 scene B3 scene G number of processors scene B2 scene B3 scene G
    1 278.1 2,785.9 1,295.2 1 100.0% 100.0% 100.0%
    2 148.0 1,443.5 671.9 2 53.2% 51.8% 51.8%
    3 100.0 976.8 450.4 3 35.9% 35.0% 34.7%
    4 77.9 747.6 345.6 4 28.0% 26.8% 26.6%
    SpeedUp Speedup factor Utilize Computational power utilization
    number of processors scene B2 scene B3 scene G number of processors scene B2 scene B3 scene G
    1 1.00 1.00 1.00 1 100.0% 100.0% 100.0%
    2 1.88 1.93 1.93 2 93.9% 96.5% 96.3%
    3 2.78 2.85 2.88 3 92.6% 95.0% 95.8%
    4 3.57 3.73 3.75 4 89.2% 93.1% 93.6%

    The results for two other scenes

    Scene B4 Scene G2
    Scene B4 
    (3 lights, 3781 spheres, 1 polygon)
    Scene G2 
    (5 lights, 1169 polygons)

    5. Invitation to collaboration

    We are ready to discuss any forms of collaboration with those interested in:

    6. Selected publications on T-system

    Most of the publications are FTP-available from Web-page: