mirror of
https://github.com/Mercury-Language/mercury.git
synced 2025-12-16 22:35:41 +00:00
Estimated hours taken: 4 compiler/notes/COMPILER_DESIGN: Document the changes in the design of type-checking that were needed to implement overload resolution for predicates with the same name and arity that occur in different modules.
-----------------------------------------------------------------------------

This file contains various notes about the design of the compiler.

-----------------------------------------------------------------------------

OUTLINE

The top-level of the compiler is in the file mercury_compile.m.
The basic design is that compilation is broken into the following
stages:

    1. parsing (source files -> HLDS)
    2. semantic analysis and error checking (HLDS -> annotated HLDS)
    3. high-level transformations (annotated HLDS -> annotated HLDS)
    4. code generation (annotated HLDS -> LLDS)
    5. low-level optimizations (LLDS -> LLDS)
    6. output C code (LLDS -> C)

Note that in reality the separation is not quite as simple as that.
Although parsing is listed as step 1 and semantic analysis is listed
as step 2, the last stage of parsing actually includes some semantic checks.
And although optimization is listed as steps 3 and 5, it also occurs in
steps 2, 4, and 6. For example, elimination of assignments to dead
variables is done in mode analysis; middle-recursion optimization and
the use of static constants for ground terms is done in code
generation; and a few low-level optimizations are done in llds.m
as we are spitting out the C code.

-----------------------------------------------------------------------------

DETAILED DESIGN (well, more detailed than the OUTLINE anyway ;-)

The action is co-ordinated from mercury_compile.m.

0. Option handling

The command-line options are defined in the module options.m.
mercury_compile.pp calls library/getopt.m, passing the predicates
defined in options.m as arguments, to parse them. It then invokes
handle_options.m to postprocess the option set. The results are
stored in the io__state, using the type globals defined in globals.m.

1. Parsing

* lexical analysis (library/lexer.m)

* stage 1 parsing - convert strings to terms.

    library/parser.m contains the code to do this, while
    library/term.m and library/varset.m contain the term and varset
    data structures that result, and predicates for manipulating them.

* stage 2 parsing - convert terms to `items' (declarations, clauses, etc.)

    The result of this stage is a parse tree that has a one-to-one
    correspondence with the source code. Both the parse tree data
    structure and code to create it are in prog_io.m. The modules
    prog_out.m and mercury_to_mercury.m contain predicates for
    printing the parse tree.
    prog_util.m contains some utility predicates for manipulating
    the parse tree.

* imports and exports are handled at this point (modules.m)

    modules.m has the code to write out `.int', `.int2', `.d'
    and `.dep' files.

* expansion of equivalence types (prog_util.m)

    This is really part of type-checking, but is done
    on the item_list rather than on the HLDS because it
    turned out to be much easier to implement that way.

* simplification

    make_hlds.m transforms the code into superhomogeneous form,
    and at the same time converts the parse tree into the HLDS.
    make_hlds.m also calls make_tags.m which chooses the data
    representation for each discriminated union type by
    assigning tags to each functor.

The result at this stage is the High Level Data Structure,
which is defined in four files:

    - hlds_data.m defines the parts of the HLDS concerned with
      function symbols, types, insts, modes and determinisms;
    - hlds_goal.m defines the part of the HLDS concerned with the
      structure of goals, including the annotations on goals;
    - hlds_pred.m defines the part of the HLDS concerning
      predicates and procedures;
    - hlds_module.m defines the top-level parts of the HLDS,
      including the type module_info.

The module hlds_out.m contains predicates to dump the HLDS to a file.
The module goal_util.m contains predicates for renaming variables
in an HLDS goal.
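
As an illustration of the superhomogeneous form mentioned above, here is
a small sketch in Python. It is not taken from make_hlds.m: the term
representation, the V_n naming scheme and the function names are all
invented for the example. It only shows the core idea that nested terms
are flattened until every unification has the form X = Y or
X = f(Y1, ..., Yn) with variables as arguments. The additional
requirements that the real form places on clause heads and calls are
not modelled.

```python
from itertools import count


def flatten_term(var, term, fresh, out):
    """Append unifications to `out` so that `var = term` is expressed
    using only variables as functor arguments.

    A term is either a variable name (a str) or a pair
    (functor, [argument terms]). This representation is hypothetical.
    """
    if isinstance(term, str):
        out.append((var, term))                 # X = Y
        return
    functor, args = term
    arg_vars = []
    for arg in args:
        if isinstance(arg, str):
            arg_vars.append(arg)                # already a variable
        else:
            new_var = f"V_{next(fresh)}"        # introduce a fresh variable
            arg_vars.append(new_var)
            flatten_term(new_var, arg, fresh, out)
    out.append((var, (functor, arg_vars)))      # X = f(Y1, ..., Yn)


def superhomogeneous(var, term):
    """Return the list of flat unifications equivalent to `var = term`."""
    out = []
    flatten_term(var, term, count(1), out)
    return out
```

For example, superhomogeneous("X", ("f", [("g", ["A"]), "B"])) yields
the two unifications V_1 = g(A) and X = f(V_1, B).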

2. Semantic analysis and error checking

* implicit quantification

    quantification.m handles implicit quantification and computes
    the set of non-local variables for each sub-goal

* type checking

    - undef_types.m checks for undefined types.
    - typecheck.m handles type checking, overloading resolution &
      module name resolution, and almost fully qualifies all predicate
      and functor names. It sets the map(var, type) field in the
      pred_info. However, typecheck.m doesn't figure out the pred_id
      for function calls or calls to overloaded predicates; that can't
      be done in a single pass of typechecking, and so it is done
      later on in modes.m. When it has finished, typecheck.m calls
      clause_to_proc.m to make duplicate copies of the clauses for
      each different mode of a predicate; all later stages work on
      procedures, not predicates.
    - type_util.m contains utility predicates dealing with types
      that are used in a variety of different places within the compiler
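
The following Python sketch illustrates the general idea of resolving a
call to a predicate whose name and arity are defined in more than one
module. It is only an illustration under simplifying assumptions, not
the algorithm used by typecheck.m or modes.m: types are plain strings,
None stands for an argument whose type is not yet known, and the table
layout and the name resolve_overload are invented for the example.

```python
def resolve_overload(name, arg_types, pred_table):
    """Return the (module, name) candidates compatible with a call.

    pred_table maps (name, arity) to a list of
    (module, [declared argument types]). An entry of None in
    arg_types means the type of that argument is not yet known,
    so it is compatible with any declared type.
    """
    candidates = pred_table.get((name, len(arg_types)), [])
    matches = []
    for module, declared_types in candidates:
        compatible = all(
            known is None or known == declared
            for known, declared in zip(arg_types, declared_types)
        )
        if compatible:
            matches.append((module, name))
    return matches


# Two hypothetical modules that both define append/3.
PRED_TABLE = {
    ("append", 3): [
        ("list", ["list", "list", "list"]),
        ("string", ["string", "string", "string"]),
    ],
}
```

With one argument type known, resolve_overload("append",
["string", None, None], PRED_TABLE) leaves the single candidate
("string", "append"). With no types known, both candidates remain, and
the choice has to be deferred until more information is available,
which is the situation the text above describes for calls whose pred_id
is only determined later.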

* mode analysis

    - undef_modes.m checks for undefined insts and modes
    - modes.m is the main mode analysis module.
      It checks that the code is mode-correct, reordering it
      if necessary, and annotates each goal with a delta-instmap
      that specifies the changes in instantiatedness of each
      variable over that goal. It also converts higher-order
      pred terms into lambda expressions. It also converts function
      calls into predicate calls, and does the final step
      of figuring out which pred_id to use for a call to an
      overloaded predicate.
      It uses the following sub-modules:
        mode_info.m (the main data structure for mode analysis)
        delay_info.m (a sub-component of the mode_info data
            structure used for storing the information
            for scheduling: which goals are currently
            delayed, what variables they are delayed on, etc.)
        inst_match.m
            This contains the code for dealing with insts:
            abstractly unifying them, checking whether two
            insts match, etc.
        mode_errors.m
            This module contains all the code to
            print error messages for mode errors
    - mode_util.m contains miscellaneous useful predicates dealing
      with modes (many of these are used by lots of later stages
      of the compiler)
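
The reordering and delaying described above can be pictured with the
following Python sketch. It is a deliberately simplified model, not the
algorithm in modes.m or delay_info.m: each goal is reduced to a set of
input variables that must already be bound and a set of output
variables that it binds, and the name schedule and the goal
representation are assumptions made for the example.

```python
def schedule(goals, initially_bound):
    """Reorder goals so that each one runs only after its inputs are bound.

    goals is a list of (name, inputs, outputs) triples.
    Returns the goal names in a schedulable order, or raises
    ValueError if some goals can never be scheduled (the analogue
    of a mode error in this simplified model).
    """
    bound = set(initially_bound)
    delayed = list(goals)
    ordered = []
    progress = True
    while delayed and progress:
        progress = False
        still_delayed = []
        for goal in delayed:
            name, inputs, outputs = goal
            if set(inputs) <= bound:
                # All inputs are bound: schedule the goal now.
                ordered.append(name)
                bound |= set(outputs)
                progress = True
            else:
                # Keep the goal delayed until more variables are bound.
                still_delayed.append(goal)
        delayed = still_delayed
    if delayed:
        stuck = ", ".join(name for name, _, _ in delayed)
        raise ValueError("cannot schedule: " + stuck)
    return ordered
```

For example, given the conjunction q(Y, Z), p(X, Y) where only X is
bound on entry, q is delayed until p has bound Y, so
schedule([("q", ["Y"], ["Z"]), ("p", ["X"], ["Y"])], {"X"}) returns
["p", "q"].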

* indexing and determinism analysis

    switch detection (switch_detection.m)
    common goal hoisting (cse_detection.m)
        Note that if cse_detection.m modifies the code,
        it will re-run mode analysis and switch detection
    determinism analysis (det_analysis.m and det_report.m)

* checking of unique modes (unique_modes.m)

    unique_modes.m checks that non-backtrackable unique modes were
    not used in a context which might require backtracking.
    Note that what unique_modes.m does is quite similar to
    what modes.m does, and unique_modes calls lots of predicates
    defined in modes.m to do it.

3. High-level transformations

The first two passes of this stage are code simplifications.

* introduction of type_info arguments for polymorphic predicates
  (polymorphism.m)

* removal of lambda expressions (lambda.m)

    lambda.m converts lambda expressions into higher-order predicate
    terms referring to freshly introduced separate predicates.
    This pass needs to come after unique_modes.m to ensure that
    the modes we give to the introduced predicates are correct.
    It also needs to come after polymorphism.m since polymorphism.m
    doesn't handle higher-order predicate constants.

To improve efficiency, the above two passes are actually combined into
one - polymorphism.m calls lambda__transform_lambda directly.

The remaining HLDS-to-HLDS transformations are optimizations:

* specialization of higher-order predicates where the values of the
  higher-order arguments are known (higher_order.m)

* inlining (i.e. unfolding) of simple procedures (inlining.m)

* detection of common term constructions (common.m)

    common.m looks for construction unifications which
    construct a term that is the same as one that already exists,
    and replaces them with assignment unifications

* constraint propagation (constraint.m)

    Not yet working.

* issue warnings about unused arguments from predicates, and create
  specialized versions without them (unused_args.m); type_infos are
  often unused

* elimination of dead procedures (dead_proc_elim.m); inlining, higher-order
  specialization and the elimination of unused args can make procedures dead
  even if the user doesn't

* removal of excess assignment unifications (excess.m)

* migration of builtins following branched structures (follow_code.m)

The module transform.m contains stuff that is supposed to be useful
for high-level optimizations (but which is not yet used).

4. Code generation

* pre-passes to annotate the HLDS

    Before code generation there are a few more passes which
    annotate the HLDS with information used for code generation:

        choosing registers for procedure arguments (arg_info.m)
            Currently uses a very simple algorithm, but
            we may change this later.
        annotation of goals with liveness information (liveness.m)
            This records the birth and death of each variable
            in the HLDS goal_info.
        allocation of stack slots
            This is done by live_vars.m, which works
            out which variables need to be saved on the
            stack when, and then uses graph_colour.m to determine
            a good allocation of variables to stack slots.
        allocating the follow vars (follow_vars.m)
            Traverses backwards over the HLDS,
            annotating each call with the variable target
            locations for the following call, so we can
            generate efficient code by putting variables in
            the right spot.
        allocating the store map (store_alloc.m)
            Allocates locations for variables at the end of
            branched goals. Annotates the goal_info for
            each branched goal with the allocation of variable
            target locations, so that we can generate
            correct code by putting variables in the same
            spot in each branch.
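
To illustrate how graph colouring can be used for the stack slot
allocation described above, here is a Python sketch. The notes do not
say which colouring algorithm graph_colour.m uses, so the greedy
strategy below is an assumption, as are the input format and the name
allocate_slots. Two variables interfere if they must be on the stack at
the same time, and interfering variables must get different slots.

```python
def allocate_slots(live_sets):
    """Assign a stack slot number (starting at 1) to each variable.

    live_sets is an iterable of sets; each set holds the variables
    that need to be on the stack at the same program point.
    Variables that never appear in the same set may share a slot.
    """
    # Build the interference graph.
    interferes = {}
    for live in live_sets:
        for var in live:
            interferes.setdefault(var, set()).update(live - {var})

    # Greedy colouring: give each variable the lowest slot not
    # already taken by one of its neighbours.
    slots = {}
    for var in sorted(interferes):
        taken = {slots[n] for n in interferes[var] if n in slots}
        slot = 1
        while slot in taken:
            slot += 1
        slots[var] = slot
    return slots
```

For the live sets [{"A", "B"}, {"B", "C"}, {"D"}] this assigns slot 1
to A, C and D and slot 2 to B, so two stack slots suffice for four
variables. Greedy colouring does not always find the minimum number of
slots, but it never gives the same slot to two interfering variables.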

* code generation

    For code generation itself, the main module is code_gen.pp.
    It handles conjunctions and negations, but calls sub-modules
    to do most of the other work:

        ite_gen.m (if-then-elses)
        call_gen.m (predicate calls and also calls to
            out-of-line unification procedures)
        disj_gen.m (disjunctions)
        unify_gen.m (unifications)
        switch_gen.m (switches), which has sub-modules
            dense_switch.m
            string_switch.m
            tag_switch.m

    It also calls middle_rec.m to do middle recursion optimization.

    The code generation modules make use of
        code_info.m
            The main data structure for the code generator
        code_aux.m
            Various preds which use code_info
        code_exprn.m
            This defines the exprn_info type, which is
            a sub-component of the code_info data structure
            which holds the information about
            the contents of registers and
            the values/locations of variables.
        exprn_aux.m
            Various preds which use exprn_info
        code_util.m
            Some miscellaneous preds used for code generation

The result of code generation is the Low Level Data Structure (llds.m).
The code is generated as a tree of code fragments which is then
flattened (tree.m).
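
The idea of generating code as a tree of fragments and flattening it
afterwards can be sketched in Python as follows. The representation
used here is an assumption made for the example and is not the one
defined in tree.m: a tree is None (no code), a list of instructions (a
leaf), or a pair (left, right) meaning the code of left followed by the
code of right.

```python
def flatten(tree):
    """Return the instructions of a code tree in left-to-right order.

    A tree is None, a list of instructions, or a (left, right) pair.
    An explicit stack is used so that deeply nested trees do not
    exhaust the recursion limit.
    """
    instructions = []
    stack = [tree]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        if isinstance(node, tuple):
            left, right = node
            # Push right first so that left is processed first.
            stack.append(right)
            stack.append(left)
        else:
            instructions.extend(node)
    return instructions
```

For example, flatten((["a"], ((["b", "c"], None), ["d"]))) returns
["a", "b", "c", "d"]. One attraction of this style is that joining two
fragments only needs to build a pair, however long the fragments are;
the cost of producing a flat list is paid once at the end.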

5. Low-level optimization

The various LLDS-to-LLDS optimizations are invoked from optimize.m.
They are:

* optimization of jumps to jumps (jumpopt.m)

* elimination of duplicate code sequences (dupelim.m)

* optimization of stack frame allocation/deallocation (frameopt.m)

* dead code and dead label removal (labelopt.m)

* value numbering

    This is done by value_number.m, which has the following sub-modules:

        vn_block.m
        vn_cost.m
        vn_debug.m
        vn_flush.m
        vn_order.m
            atsort.m ("approximate topological sort" predicates)
        vn_table.m
        vn_temploc.m
        vn_type.m
        vn_util.m

    Several of these modules (and also frameopt, above) use livemap.m.

* peephole optimization (peephole.m)

Depending on which optimization flags are enabled,
optimize.m may invoke many of these passes multiple times.

Some of the low-level optimization passes use opt_util.m, which
contains miscellaneous predicates for LLDS-to-LLDS optimization.
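
As an example of what one of these passes does, the following Python
sketch shows the basic idea behind optimizing jumps to jumps. It is not
based on the code in jumpopt.m, and the instruction format (tuples such
as ("label", L) and ("goto", L)) is invented for the example; the real
LLDS instruction set is much richer.

```python
def jumpopt(instrs):
    """Redirect each goto whose target label is immediately followed
    by another goto, so that it jumps straight to the final target.
    """
    # Map each label to the target of the goto that directly follows it.
    forward = {}
    for instr, following in zip(instrs, instrs[1:]):
        if instr[0] == "label" and following[0] == "goto":
            forward[instr[1]] = following[1]

    def final_target(label):
        # Follow chains of jumps, stopping if a cycle is detected.
        seen = set()
        while label in forward and label not in seen:
            seen.add(label)
            label = forward[label]
        return label

    return [
        ("goto", final_target(instr[1])) if instr[0] == "goto" else instr
        for instr in instrs
    ]
```

Given the sequence goto L1; L1: goto L2; L2: r1 := 1, the first goto is
redirected to L2. The label L1 and the goto after it are left in place
by this sketch; once nothing refers to them any more, removing them is
the job of a dead code and dead label removal pass such as the one
listed above.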

6. Output C code

Final generation of C code is done in llds.m.

-----------------------------------------------------------------------------

MISCELLANEOUS

special_pred.m, unify_proc.m:
    These modules contain stuff for handling the special
    compiler-generated predicates which are generated for
    each type: unify/2, compare/3, index/1 (used in the
    implementation of compare/3), and also type_to_term/2
    and term_to_type/2 (but those last two are disabled
    at the moment).

dependency_graph.m:
    This contains predicates to compute the call graph for a
    module, and to print it out to a file.
    (The call graph file is used by the profiler.)
    The call graph may eventually also be used by det_analysis.m,
    inlining.m, and other parts of the compiler which need
    to traverse the predicates in a module bottom-up or top-down.

passes_aux.m:
    Contains write_progress_message, which is called
    from various places to write progress messages.

opt_debug.m:
    Utility routines for debugging the LLDS-to-LLDS optimizations.
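
The bottom-up traversal mentioned under dependency_graph.m above can be
illustrated with the following Python sketch. It is not derived from
dependency_graph.m; it uses Tarjan's strongly connected components
algorithm, which is one standard way to obtain such an order, and the
dictionary representation of the call graph and the name bottom_up_sccs
are assumptions made for the example. Mutually recursive predicates end
up in the same group, and each group appears after all the groups it
calls.

```python
def bottom_up_sccs(call_graph):
    """Return groups of mutually recursive predicates, callees first.

    call_graph maps each predicate name to the names it calls.
    Each group is a sorted list of predicate names.
    """
    index = {}        # visit number of each predicate
    low = {}          # lowest visit number reachable from it
    on_stack = set()
    stack = []
    groups = []

    def visit(pred):
        index[pred] = low[pred] = len(index)
        stack.append(pred)
        on_stack.add(pred)
        for callee in call_graph.get(pred, ()):
            if callee not in index:
                visit(callee)
                low[pred] = min(low[pred], low[callee])
            elif callee in on_stack:
                low[pred] = min(low[pred], index[callee])
        if low[pred] == index[pred]:
            # pred is the root of a group: pop the whole group.
            group = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                group.append(member)
                if member == pred:
                    break
            groups.append(sorted(group))

    for pred in sorted(call_graph):
        if pred not in index:
            visit(pred)
    return groups
```

For a call graph in which main calls even and print, and even and odd
call each other, the result is [["even", "odd"], ["print"], ["main"]].
Reversing the list gives a top-down order. The recursive formulation is
kept for brevity; very deep call chains would need an iterative
version.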

-----------------------------------------------------------------------------

CURRENTLY USELESS

The following modules do not serve any function at the moment.
Some of them are obsolete; others are work-in-progress.
(For some of them it's hard to say which!)

utils.m
    Higher-order predicates. Has suffered from software rot.
    Should go in library/{list,map,...}.m anyway.

swi_lib.m, swi_builtin.m:
    Support for SWI-Prolog. Incomplete.
    Should go in the library directory anyway.

no_builtin.m:
    An empty module, for debugging purposes.

nit_builtin.m:
    Support for `nit', the NU-Prolog incompetence tester.
    We don't use nit anymore - the Mercury compiler does
    a better job.

mercury_to_goedel.m:
    This converts from item_list to Goedel source code.
    It works for simple programs, but doesn't handle
    various Mercury constructs such as lambda expressions,
    higher-order predicates, and functor overloading.

mercury_to_c.m:
    The very incomplete beginnings of an alternate
    code generator. When finished, it will convert HLDS
    to high-level C code (without going via LLDS).

shapes.m, garbage_out.m:
    These two modules are used for the native garbage collector.

-----------------------------------------------------------------------------