This file contains various notes about the design of the compiler.

OUTLINE

The main job of the compiler is to translate Mercury into C, although it can also translate (subsets of) Mercury to some other languages (Goedel, the bytecode of a debugger currently under development, and in the future the Aditi Relational Language).

The top level of the compiler is in the file mercury_compile.m. The basic design is that compilation is broken into the following stages:

  1. parsing (source files -> HLDS)
  2. semantic analysis and error checking (HLDS -> annotated HLDS)
  3. high-level transformations (annotated HLDS -> annotated HLDS)
  4. code generation (annotated HLDS -> LLDS)
  5. low-level optimizations (LLDS -> LLDS)
  6. output C code (LLDS -> C)

Note that in reality the separation is not quite as simple as that. Although parsing is listed as step 1 and semantic analysis is listed as step 2, the last stage of parsing actually includes some semantic checks. And although optimization is listed as steps 3 and 5, it also occurs in steps 2, 4, and 6. For example, elimination of assignments to dead variables is done in mode analysis; middle-recursion optimization and the use of static constants for ground terms are done in code generation; and a few low-level optimizations are done in llds_out.m as we are spitting out the C code.


DETAILED DESIGN

(well, more detailed than the OUTLINE anyway ;-)

The action is co-ordinated from mercury_compile.m.

0. Option handling

The command-line options are defined in the module options.m. mercury_compile.m calls library/getopt.m, passing the predicates defined in options.m as arguments, to parse them. It then invokes handle_options.m to postprocess the option set. The results are stored in the io__state, using the type globals defined in globals.m.

1. Parsing

The result at this stage is the High Level Data Structure, which is defined in four files:

  1. hlds_data.m defines the parts of the HLDS concerned with function symbols, types, insts, modes and determinisms;
  2. hlds_goal.m defines the part of the HLDS concerned with the structure of goals, including the annotations on goals;
  3. hlds_pred.m defines the part of the HLDS concerning predicates and procedures;
  4. hlds_module.m defines the top-level parts of the HLDS, including the type module_info.

The module hlds_out.m contains predicates to dump the HLDS to a file. The module goal_util.m contains predicates for renaming variables in an HLDS goal.

2. Semantic analysis and error checking

Any pass which can report errors or warnings must be part of this stage, so that the compiler does the right thing for options such as `--halt-at-warn' (which turns warnings into errors) and `--error-check-only' (which makes the compiler only compile up to this stage).

implicit quantification
quantification.m handles implicit quantification and computes the set of non-local variables for each sub-goal
type checking
purity analysis
purity.m is responsible for purity checking, as well as defining the purity type and a few public operations on it. It also calls post_typecheck.m to complete the handling of predicate overloading for cases which typecheck.m is unable to handle, to check for unbound type variables, and to copy the clauses to the proc_infos in preparation for mode analysis.
mode analysis
indexing and determinism analysis
checking of unique modes (unique_modes.m)
unique_modes.m checks that non-backtrackable unique modes are not used in a context which might require backtracking. Note that what unique_modes.m does is quite similar to what modes.m does, and unique_modes.m calls lots of predicates defined in modes.m to do it.
checking typeclass instances (check_typeclass.m)
check_typeclass.m checks, for each instance declaration, that the types, modes and determinism of each predicate/function that is a method of the class are correct (i.e. that they match the typeclass declaration). This pass is performed at the end of semantic analysis because it needs mode and determinism information. In this pass, pred_ids and proc_ids are assigned to the methods for each instance. In addition, while checking that the superclasses of a class are satisfied by the instance declaration, a set of constraint_proofs is built up for the superclass constraints. These are used by polymorphism.m when generating the base_typeclass_info for the instance.
simplification (simplify.m)
simplify.m finds and exploits opportunities for simplifying the internal form of the program, both to optimize the code and to massage the code into a form the code generator will accept. It also warns the programmer about any constructs that are so simple that they should not have been included in the program in the first place. (That's why this pass needs to be part of semantic analysis: because it can report warnings.) simplify.m calls common.m which looks for (a) construction unifications that construct a term that is the same as one that already exists, or (b) repeated calls to a predicate with the same inputs, and replaces them with assignment unifications. simplify.m also attempts to partially evaluate calls to builtin procedures if the inputs are all constants (see const_prop.m).
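As a rough illustration of the first kind of opportunity that common.m looks for, the sketch below scans a straight-line list of construction unifications and replaces a repeated construction with an assignment from the variable that already holds the same term. It is written in Python purely for illustration; the tuple representation and the name eliminate_common_structs are assumptions of this sketch, not the data structures or predicates used in the compiler.

```python
# Hypothetical goal representation (an assumption of this sketch):
#   ("construct", var, functor, args)  means  var = functor(args...)
#   ("assign", var, other_var)         means  var = other_var


def eliminate_common_structs(goals):
    """Replace repeated constructions of the same term with assignments."""
    seen = {}    # (functor, args) -> variable that already holds the term
    result = []
    for goal in goals:
        if goal[0] == "construct":
            _, var, functor, args = goal
            key = (functor, tuple(args))
            if key in seen:
                # The same term already exists: just copy the old variable.
                result.append(("assign", var, seen[key]))
            else:
                seen[key] = var
                result.append(goal)
        else:
            result.append(goal)
    return result


goals = [
    ("construct", "X", "f", ["A", "B"]),
    ("construct", "Y", "g", ["A"]),
    ("construct", "Z", "f", ["A", "B"]),
]
optimized = eliminate_common_structs(goals)
```

In this example the third goal becomes an assignment of X to Z, since Z would otherwise be constructed as a second copy of f(A, B).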

3. High-level transformations

The first pass of this stage does tabling transformations (table_gen.m). This involves the insertion of several calls to tabling predicates defined in mercury_builtin.m and the addition of some scaffolding structure.

The next two passes of this stage are code simplifications.

To improve efficiency, the above two passes are actually combined into one: polymorphism.m calls lambda__transform_lambda directly.

The next pass is termination analysis. The various modules involved are:

Most of the remaining HLDS-to-HLDS transformations are optimizations:

The module transform.m contains stuff that is supposed to be useful for high-level optimizations (but which is not yet used).

Eventually we plan to make Mercury the programming language of the Aditi deductive database system. When this happens, we will need to be able to apply the magic set transformation, which is defined for predicates whose definitions are in disjunctive normal form. The module dnf.m translates definitions into DNF, introducing auxiliary predicates as necessary.
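To make "disjunctive normal form" concrete, here is a minimal Python sketch that turns a nested conjunction/disjunction into a flat list of disjuncts, each of which is a plain conjunction of atoms. Note that this sketch simply distributes conjunction over disjunction, which is not the approach described above: dnf.m introduces auxiliary predicates instead. The goal representation and the name to_dnf are assumptions made for this example only.

```python
from itertools import product

# Hypothetical goal representation (an assumption of this sketch):
#   ("atom", name), ("and", [goals]), ("or", [goals])


def to_dnf(goal):
    """Return a list of disjuncts, each a list of atom names."""
    kind = goal[0]
    if kind == "atom":
        return [[goal[1]]]
    if kind == "or":
        # The disjuncts of a disjunction are the disjuncts of its arms.
        return [conj for sub in goal[1] for conj in to_dnf(sub)]
    if kind == "and":
        # Pick one disjunct from each conjunct, in every possible way.
        parts = [to_dnf(sub) for sub in goal[1]]
        return [sum(choice, []) for choice in product(*parts)]
    raise ValueError("unknown goal kind: %r" % (kind,))


# p, (q ; r)   becomes   (p, q) ; (p, r)
body = ("and", [("atom", "p"), ("or", [("atom", "q"), ("atom", "r")])])
dnf_body = to_dnf(body)
```

Plain distribution can duplicate goals and blow up the size of the definition, which is one plausible reason for preferring auxiliary predicates.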

4. Code generation

pre-passes to annotate the HLDS
Before code generation there are a few more passes which annotate the HLDS with information used for code generation:
choosing registers for procedure arguments (arg_info.m)
Currently uses one of two simple algorithms, but we may add other algorithms later.
annotation of goals with liveness information (liveness.m)
This records the birth and death of each variable in the HLDS goal_info.
allocation of stack slots
This is done by live_vars.m, which works out which variables need to be saved on the stack when, and then uses graph_colour.m to determine a good allocation of variables to stack slots.
migration of builtins following branched structures
This transformation, which is performed by follow_code.m, improves the results of follow_vars.
allocating the follow vars (follow_vars.m)
Traverses backwards over the HLDS, annotating some goals with information about what locations variables will be needed in next. This allows us to generate more efficient code by putting variables in the right spot directly. This module is not called from mercury_compile.m; it is called from store_alloc.m.
allocating the store map (store_alloc.m)
Annotates each branched goal with variable location information so that we can generate correct code by putting variables in the same spot at the end of each branch.
code generation
For code generation itself, the main module is code_gen.m. It handles conjunctions and negations, but calls sub-modules to do most of the other work:

It also calls middle_rec.m to do middle recursion optimization.

The code generation modules make use of

code_info.m
The main data structure for the code generator
code_exprn.m
This defines the exprn_info type, which is a sub-component of the code_info data structure which holds the information about the contents of registers and the values/locations of variables.
exprn_aux.m
Various preds which use exprn_info
code_util.m
Some miscellaneous preds used for code generation
code_aux.m
Some miscellaneous preds which, unlike those in code_util, use code_info
continuation_info.m
For accurate garbage collection, collects information about each live value after calls, and saves information about procedures.
code generation for `pragma export' declarations (export.m)
This is handled separately from the other parts of code generation. mercury_compile.m calls the procedures `export__produce_header_file' and `export__get_pragma_exported_procs' to produce C code fragments which declare/define the C functions which are the interface stubs for procedures exported to C.
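Returning to the allocation of stack slots listed among the pre-passes above: the idea of using graph colouring for this can be sketched as follows. Variables that must be on the stack at the same time "interfere" and need different slots; variables that never interfere may share one. The Python below uses a simple greedy colouring, which is an assumption of this sketch and not necessarily the algorithm implemented in graph_colour.m; the function name and the input format are likewise hypothetical.

```python
def allocate_slots(live_sets):
    """Assign stack slot numbers so that variables live together differ.

    live_sets is a list of sets; each set holds the variables that must
    be saved on the stack at one particular point.
    """
    # Build the interference graph: an edge joins two variables that
    # appear together in some live set.
    interferes = {}
    for live in live_sets:
        for var in live:
            others = interferes.setdefault(var, set())
            others.update(v for v in live if v != var)

    # Greedy colouring: give each variable the lowest-numbered slot not
    # already used by one of its neighbours.
    slots = {}
    for var in sorted(interferes):
        taken = {slots[n] for n in interferes[var] if n in slots}
        slot = 1
        while slot in taken:
            slot += 1
        slots[var] = slot
    return slots


live_sets = [{"A", "B"}, {"B", "C"}, {"C"}]
slots = allocate_slots(live_sets)
```

Here A and C are never needed on the stack at the same time, so they can share a slot, and two slots suffice for three variables.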

The result of code generation is the Low Level Data Structure (llds.m). The code for each procedure is generated as a tree of code fragments which is then flattened (tree.m).
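The "tree of code fragments" idea can be illustrated with a small Python sketch. Joining two trees only creates a new node, so fragments can be combined cheaply during code generation, and the instructions are copied into a single list once at the end. The representation below and the name flatten are assumptions of this sketch; they are not taken from tree.m.

```python
# Hypothetical tree representation (an assumption of this sketch):
#   None                    is an empty tree
#   ("leaf", [instrs])      holds a list of instructions
#   ("node", left, right)   joins two sub-trees, left before right


def flatten(tree):
    """Return the instructions of the tree as one list, in order."""
    out = []
    stack = [tree]
    while stack:
        t = stack.pop()
        if t is None:
            continue
        if t[0] == "leaf":
            out.extend(t[1])
        else:
            # Push the right sub-tree first so the left one is popped first.
            stack.append(t[2])
            stack.append(t[1])
    return out


code_tree = ("node",
             ("leaf", ["incr_sp", "save r1"]),
             ("node", None, ("leaf", ["call p", "decr_sp"])))
instructions = flatten(code_tree)
```

The instruction names in the example are placeholders, not real LLDS instructions.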

5. Low-level optimization

The various LLDS-to-LLDS optimizations are invoked from optimize.m. They are:

Depending on which optimization flags are enabled, optimize.m may invoke many of these passes multiple times.

Some of the low-level optimization passes use basic_block.m, which defines predicates for converting sequences of instructions to basic block format and back, as well as opt_util.m, which contains miscellaneous predicates for LLDS-to-LLDS optimization.
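For readers unfamiliar with basic block format, the following Python sketch shows one common way of splitting an instruction sequence into basic blocks and joining the blocks back up again. A new block starts at each label, and a block ends after each jump. The instruction representation and the function names are assumptions of this sketch and do not reflect the interface of basic_block.m.

```python
# Hypothetical instruction representation (an assumption of this sketch):
#   ("label", name), ("goto", name), ("if_goto", name), ("other", text)


def to_basic_blocks(instrs):
    """Split a list of instructions into a list of basic blocks."""
    blocks = []
    current = []
    for instr in instrs:
        # A label can be a jump target, so it must start a new block.
        if instr[0] == "label" and current:
            blocks.append(current)
            current = []
        current.append(instr)
        # A jump ends the block it appears in.
        if instr[0] in ("goto", "if_goto"):
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def from_basic_blocks(blocks):
    """Concatenate basic blocks back into one instruction sequence."""
    return [instr for block in blocks for instr in block]


instrs = [
    ("label", "l1"),
    ("other", "r1 = r1 - 1"),
    ("if_goto", "l2"),
    ("other", "r2 = r2 + 1"),
    ("goto", "l1"),
    ("label", "l2"),
    ("other", "r3 = r2"),
]
blocks = to_basic_blocks(instrs)
round_trip = from_basic_blocks(blocks)
```

Converting to blocks and back yields the original sequence, so a pass can work on whichever form is more convenient.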

6. Output C code


BYTECODE

The Mercury compiler can translate Mercury programs into bytecode for interpretation by the Mercury debugger currently under development. The generation of bytecode happens after semantic checks have been completed.


MISCELLANEOUS

det_util.m:
This module contains utility predicates needed by the parts of the semantic analyzer and optimizer concerned with determinism.
special_pred.m, unify_proc.m:
These modules contain stuff for handling the special compiler-generated predicates which are generated for each type: unify/2, compare/3, and index/1 (used in the implementation of compare/3).
dependency_graph.m:
This contains predicates to compute the call graph for a module, and to print it out to a file. (The call graph file is used by the profiler.) The call graph may eventually also be used by det_analysis.m, inlining.m, and other parts of the compiler which could benefit from traversing the predicates in a module in a bottom-up or top-down fashion with respect to the call graph.
passes_aux.m:
Contains code to write progress messages, and higher-order code to traverse all the predicates defined in the current module and do something with each one.
opt_debug.m:
Utility routines for debugging the LLDS-to-LLDS optimizations.
error_util.m:
Utility routines for printing nicely formatted error messages.
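The remark under dependency_graph.m about traversing predicates bottom-up with respect to the call graph can be sketched as follows. Mutually recursive predicates have to be handled as a group, so the sketch computes the strongly connected components of the call graph with Tarjan's algorithm, which happens to produce them with callees before callers. This is an illustration in Python under assumed names (bottom_up_sccs, a dictionary-based call graph); it is not the code in dependency_graph.m.

```python
def bottom_up_sccs(calls):
    """Return groups of mutually recursive predicates, callees first.

    calls maps each predicate name to the list of names it calls.
    """
    index = {}        # order in which each predicate was first visited
    lowlink = {}      # smallest index reachable from each predicate
    on_stack = set()
    stack = []
    result = []
    counter = [0]

    def visit(p):
        index[p] = lowlink[p] = counter[0]
        counter[0] += 1
        stack.append(p)
        on_stack.add(p)
        for q in calls.get(p, ()):
            if q not in index:
                visit(q)
                lowlink[p] = min(lowlink[p], lowlink[q])
            elif q in on_stack:
                lowlink[p] = min(lowlink[p], index[q])
        if lowlink[p] == index[p]:
            # p is the root of a component: pop its members off the stack.
            scc = []
            while True:
                q = stack.pop()
                on_stack.discard(q)
                scc.append(q)
                if q == p:
                    break
            result.append(sorted(scc))

    for p in sorted(calls):
        if p not in index:
            visit(p)
    return result


calls = {
    "main": ["even", "print"],
    "even": ["odd"],
    "odd": ["even"],
    "print": [],
}
order = bottom_up_sccs(calls)
```

In the example, even and odd call each other and so form one group, and main comes last because it calls everything else. A top-down traversal would simply process the list in reverse.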

CURRENTLY USELESS

The following modules do not serve any function at the moment. Some of them are obsolete; others are work in progress. (For some of them it's hard to say which!)

lco.m:
This finds predicates whose implementations would benefit from last call optimization modulo constructor application. It does not apply the optimization and will not until the mode system is capable of expressing definite aliasing.
mercury_to_goedel.m:
This converts from item_list to Goedel source code. It works for simple programs, but doesn't handle various Mercury constructs such as lambda expressions, higher-order predicates, and functor overloading.
mercury_to_c.m:
The very incomplete beginnings of an alternate code generator. When finished, it will convert HLDS to high-level C code (without going via LLDS).

Last update was $Date: 1998-09-10 06:52:46 $ by $Author: stayl $@cs.mu.oz.au.