mirror of
https://github.com/Mercury-Language/mercury.git
synced 2025-12-08 02:11:55 +00:00
Replacing item blocks file-kind-specific kinds of section markers with
file-kind-specific parse trees has several benefits.
- It allows us to encode the structural invariants of each kind of file
we read in within the type of its representation. This makes the detection
of any accidental violations of those invariants trivial.
- Since each file-kind-specific parse tree has separate lists for separate
kinds of items, code that wants to operate on one or a few kinds of items
can just operate on those kinds of items, without having to traverse
item blocks containing many other kinds of items as well. The most
important consequence of this is not the improved efficiency, though
that is nice, but the increased clarity of the code.
- The new design is much more flexible. For example, it should be possible
to record that e.g. an interface file we read in as a indirect dependency
(i.e. a file we read not because its module was imported by the module
we are compiling, but because its module was imported by *another* imported
module) should be used *only* for the purpose it was read in for. This should
avoid situations where deleting an import of A from a module, because it
is not needed anymore, leads the compiler to generate an error message
about a missing import of module B. This can happen if (a) module B
always *should* have been imported, since it is used, but (b) module A's
import of module B lead to module B's interface being available *without*
an import of B.
Specifically, this flexibility should enable us to establish each module's
.int file as the single source of truth about how values of each type
defined in that module should be represented. When compiling each source
file, this approach requires the compiler to read in that module's .int file
but using only the type_repn items from that .int file, and nothing else.
- By recording a single parse tree for each file we have read, instead of
a varying number of item blocks, it should be significantly easier to
derive the contents of .d files directly from the records of those
parse trees, *without* having to maintain a separate set of fields
in the module_and_imports structure for that purpose. We could also
trivially avoid any possibility of inconsistencies between these two
different sources of truth. (We currently fill in the fields used to
drive the generation of .d files using two different pieces of code,
one used for --generate-dependencies and one used for all other invocations,
and these two *definitely* generate inconsistent results, as the significant
differences in .d files between (a) just after an invocation of
--generate-dependencies and (b) just after any other compiler invocation
can witness.)
This change is big and therefore hard to review. Therefore in many files,
this change adds "XXX CLEANUP" comments to draw attention to places that
have issues that should be fixed, but whose fixes should come later, in
separate diffs.
compiler/module_imports.m:
The compiler uses the module_and_imports structure defined here
to go from a raw compilation unit (essentially a module to be compiled)
to an augmented compilation unit (a raw compilation unit together
with all the interface and optimization files its compilation needs).
We used to store the contents of both the source file and of
the interface and optimization files in the module_and_imports structure
as item blocks. This diff replaces all those item blocks with
file-kind-specific parse trees, for the reasons mentioned above.
Separate out the .int0 files of ancestors modules from the .intN
files for N>0 of directly imported modules. (Their item blocks
used to be stored in the same list.)
Maintain a database of the source, interface and optimization files
we have read in so far. We use it to avoid reading in interface files
if we have already read in a file for the same module that contains
strictly more information (either an interface file with a smaller
number as a suffix, or the source file itself).
Shorten some field names.
compiler/prog_item.m:
Define data structures for storing information about include_module,
import_module and use_module declarations, both in a form that allows
the representation of possibly erroneous code in actual source files,
and in checked-and-cleaned-up form which is guaranteed to be free
of the relevant kinds of errors. Add a block comment at the start
of the module about the need for this distinction.
Define parse_tree_module_src, a data structure for representing
the source code of a single module. This is different from the existing
parse_tree_src type, which represents the contents of a single source file
but which may contain *more* than one module, and also different from
a raw_compilation_unit, which is based on item blocks and is thus
unable to express to invariants such as "no clauses in the interface".
Modify the existing parse_tree_intN types to express the distinction
mentioned just above, and to unify them "culturally", i.e. if they
store the same information, make them store it using the same types.
Fix a mistake by allowing promises to appear in .opt files.
I originally ruled them out because the code that generates .opt files
does not have any code to write out promises, but some of the predicates
whose clauses it writes out have goal_type_promise, which means that
they originated as promises, and get written out as promises.
Split the existing pragma item kind into three item kinds, which have
different invariants applying to them.
- The decl (short for declarative) pragmas give the compiler some
information, such as that a predicate is obsolete or that we
want to type specialize some predicate or function, that is in effect
part of the module's interface. Decl pragmas may appear in module
interfaces, and the compiler may put them into interface files;
neither statement is true of the other two kinds of pragmas.
- The impl (short for implementation) pragmas are named so
precisely because they may appear only in implementation sections.
They give the compiler information that is private to that module.
Examples include foreign_decls, foreign_codes, foreign_procs,
and promises of clause equivalence, and requests for inlining,
tabling etc. These will never be put into interface files,
though some of them can affect the compilation of other modules
by being included in .opt files.
- The gen (short for generated) pragmas can never (legally) appear
in source files at all. They record the results of compiler
analyses e.g. about which arguments of a predicate are unused,
or what exceptions a function can throw, and accordingly they
should only ever occur in compiler-generated interface files.
Use the new type differences between the three kinds of pragmas
to encode the above invariants about which kinds of pragmas can appear
where into the various kinds of parse trees.
Make the augmented compilation unit, which is computed from
the final module_and_imports structure, likewise switch from
storing item blocks to storing the whole parse trees of the
files that went into its construction. With each such parse tree,
record *why* we read it, since this controls what permissions
the source module being compiled has for access to the entities
in the parse tree.
Simplify the contains_foreign_code type, since one of three
function symbols was equivalent to one possible use of another
function symbol.
Provide a way to record which method of which class a compiler-generated
predicate is for. (See hlds_pred.m below.)
Move the code of almost all utility operations to item_util.m
(which is imported by many fewer modules than prog_item.m),
keeping just the most "popular" ones.
compiler/item_util.m:
Move most of the previously-existing utility operations here from
prog_item.m, most in a pretty heavily modified form.
Add a whole bunch of other utility operations that are needed
in more than one other module.
compiler/convert_parse_tree.m:
Provide predicates to convert from raw compilation units to
parse_tree_module_srcs, and vice versa (though the reverse
shouldn't be needed much longer).
Update the conversion operations between the general parse_tree_int
and the specific parse_tree_intN forms for the changes in prog_item.m
mentioned above. In doing so, use a consistent approach, based on
new operations in item_util.m, to detect errors such as duplicate
include_module and import/use_module declarations in all kinds
of parse trees.
Enforce the invariants that the types of parse trees of various kinds
can now express in types, generating error messages for their violations.
Delete some utility operations that have been moved to item_util.m
because now they are also needed by other modules.
compiler/grab_modules.m:
Delete code that did tests on raw compilation units that are now done
when that raw compilation unit is converted to a parse_tree_module_src.
Use the results of the checks done during that conversion to decide
which modules are imported/used and in which module section.
Record a single reason for why we reading in each interface and
optimization file. The code of make_hlds_separate_items.m will use
this reason to set up the appropriate permissions for each item
in those files.
Use separate code for handling different kinds of interface and
optimization files. Using generic traversal code was acceptable economy
when we used the same data structure for every kind of interface file,
but now that we *can* express different invariants for different kinds
of interface and optimization file, we want to execute not just different
code for each kind of file, but the data structures we want to work on
are also of different types. Using file-kind-specific code is a bit
longer, but it is significantly simpler and more robust, and it is
*much* easier to read and understand.
Delete the code that separates the parts of the implementation section
that are exported to submodules, and the part that isn't, since that task
is now done in make_hlds_separate_items.m.
Pass a database of the files we have read through the relevant predicates.
Give some predicates more meaningful names.
compiler/notes/interface_files.html:
Note a problem with the current operation of grab_modules.
compiler/get_dependencies.m:
Add operations to gather implicit references to builtin modules
(which have to be made available even without an explicit import_module
or use_module declaration) in all kinds of parse trees. These have
more code overall, but will be at runtime, since we need only look at
the item kinds that may *have* such implicit references.
Add a mechanism to record the result of these gathering operations
in import_and_or_use_maps.
Give some types, function symbols, predicates and variables
more meaningful names.
compiler/make_hlds_separate_items.m:
When we stored the contents of the source module and the
interface and optimization files we read in to augment it
in the module_and_imports structure as a bunch of item blocks,
the job of this module was to separate out the different kinds of items
in the item blocks, returning a single list of each kind of item,
with each such item being packaged up with its status (which encodes
a set of permissions saying what the source module is allowed
to do with it).
Now that the module_and_imports structure stores this info in
file-kind-specific parse trees, all of which have separate lists
for each kind of item and none of which contain item blocks,
the job of this module has changed. Now its job is to convert
the reason why each file was read in into the (one or more) statuses
that apply to the different kinds of items stored in it, wrap up
each item with its status, and return the resulting overall list
of status/item pairs for each kind of item.
compiler/read_modules.m:
Add predicates that, when reading an interface file, return its contents
in the tightest possible file-kind-specific parse tree.
Refine the database of files we have read to allow us to store
more file-kind-specific parse trees.
Don't require that files in the database have associated timestamps,
since in some cases, we read files we can put into the database
*without* getting their timestamps.
Allow the database to record that an attempt to read a file failed.
compiler/split_parse_tree_src.m:
Rearchitect how this module separates out nested submodules from within
the main module in a file.
Another of the jobs of this module is to generate error messages for
when module A includes module B twice, whether via nesting or via
include_module declarations, with one special exception for the case
where A's interface contains nested submodule A.B's interface,
and A's implementation contains nested submodule A.B's implementation.
The problem ironically was that while it reported duplicate include_module
declarations as errors, split_parse_tree_src.m also *generated*
duplicate include_module declarations. Since it replaced each nested
submodule occurrence with an include_module declaration, in the scenario
above, it generated two include_module declarations for A.B. Even worse,
the interface incarnation of submodule A.B could contain
(the interface of) its own nested submodule A.B.C, while its
implementation incarnation could contain (the implementation section of)
A.B.C. Each occurrence of A.B.C would be its only occurrence in the
including part of its parent A.B, which means local tests for duplicates
do not work. (I found this out the hard way.)
The solution we now adopt adds include_module declarations to the
parents of any submodule only once the parse tree of the entire
file has been processed, since only then do we know all the
includer/included relationships among nested modules. Until then,
we just record such relationships in a database as we discover them,
reporting duplicates when needed (e.g. when A includes B twice
*in the same section*), but not reporting duplicates when not needed
(e.g. when A.B includes A.B.C in *different* sections).
compiler/prog_data.m:
Add a new type, pf_sym_name_and_arity, that exactly specifies
a predicate or function. It is a clone of the existing simple_call_id
type, but its name does NOT imply that the predicate or function
is being called.
Add XXXs that call for some other improvements in type names.
compiler/prog_data_foreign.m:
Give a type, and the operations on that type, a more specific name.
compiler/error_util.m:
Add an id field to all error_specs, which by convention should be
filled in with $pred. Print out the value in this field if the compiler
is invoked with the developer-only option --print-error-spec-id.
This allows a person debugging the compiler find out where in the code
an undesired error message is coming from significantly easier
than was previously possible.
Most of the modules that have changes only "to conform to the changes
above" will be for this change. In many cases, the updated code
will also simplify the creation of the affected error_specs.
Fix a bug that looked for a phase in only one kind of error_spec.
Add some utility operations needed by other parts of this change.
Delete a previously internal function that has been moved to
mdbcomp/prim_data.m to make it accessible in other modules as well.
compiler/Mercury.options:
Ask the compiler to warn about dead predicates in every module
touched by this change (at least in one its earlier versions).
compiler/add_foreign_enum.m:
Replace a check for an inappropriately placed foreign_enum declaration
with a sanity check, since with this diff, the error should be caught
earlier.
compiler/add_mutable_aux_preds.m:
Delete a check for an inappropriately placed mutable declaration,
since with this diff, the error should be caught earlier.
compiler/add_pragma.m:
Instead of adding pass2 and pass3 pragmas, add decl and impl and
generated pragmas.
Delete the tests for generated pragma occurring anywhere except
.opt files, since those tests are now done earlier.
Shorten some too-long predicate names.
compiler/comp_unit_interface.m:
Operate on as specific kinds of parse trees as the interface of this
module will allow. (We could operate on more specific parse trees
if we changed the interface, but that is future work).
Use the same predicates for handling duplicate include_module,
import_module and use_module declarations as everywhere else.
Delete the code of an experiment that shouldn't be needed anymore.
compiler/equiv_type.m:
Replace code that operated on item blocks with code that operates
on various kinds of parse trees.
Move a giant block of comments to the front, where it belongs.
compiler/hlds_module.m:
Add a field to the module_info that lets us avoid generating
misleading error messages above missing definitions of predicates
or functions when those definitions were present but were not
added to the HLDS because they had errors.
Give a field and its access predicates a more specific name.
Mark a spot where an existing type cannot express everything
it is supposed to.
compiler/hlds_pred.m:
For predicates which the compiler creates to represent a class method
(the virtual function, in OOP terms), record not just this fact,
but the id of the class and of the method. Using this extra info
in progress messages (with mmc -V) prevents the compiler from printing e.g.
% Checking typeclass constraints on class method
% Checking typeclass constraints on class method
% Checking typeclass constraints on class method
when checking three such predicates.
compiler/make.m:
Provide a slot in the make_info structure to allow the database
of the files we have read in to be passed around.
compiler/make_hlds_error.m:
Delete predicates that are needed in just one other module,
and have therefore been moved there.
compiler/make_hlds_passes.m:
Add decl, impl and generated pragma separately, instead of adding
pass2 and pass3 pragmas separately.
Do not generate error messages for clauses, initialises or finalises
in module interfaces, since with this diff, such errors should be
caught earlier.
compiler/mercury_compile_main.m:
compiler/recompilation.check.m:
Explicitly pass around the expanded database of parse trees
of files that have been read in.
compiler/module_qual.collect_mq_info.m:
compiler/module_qual.m:
compiler/module_qual.qualify_items.m:
Collect module qualification information, and do module qualification
respectively on parse trees of various kinds, not item blocks.
Take information about what the module may do with the contents
of each interface or optimization file from the record of why
we read that file, not from the section markers in item blocks.
Break up some too-large predicates by carving smaller ones out of them.
compiler/options.m:
Add an option to control whether errors and/or warnings detecting
when deciding what should go into a .intN file be printed,
thus (potentially) preventing the creation of that file.
Add commented-out documentation for a previously totally undocumented
option.
doc/user_guide.texi:
Document the new option.
NEWS:
Announce the new option.
Mention that we now generate warnings for unused import_module and
use_module declarations in the interface even if the module has
submodules.
compiler/write_module_interface_files.m:
Let the new option control whether we filter out any messages generated
when deciding what should go into a .intN file.
compiler/parse_item.m:
Delete actually_read_module_opt, since it is no longer needed;
its callers now call actually_read_module_{plain,trans}_opt instead.
Delete unneeded arguments from some predicates.
compiler/parse_module.m:
Delete some long unused predicates.
compiler/parse_pragma.m:
When parsing pragmas, wrap them up in the new decl, impl or generated
pragma kinds.
compiler/parse_tree_out.m:
Add predicates to write out each of the file-kind-specific parse trees.
compiler/parse_tree_out_pragma.m:
Add predicates to write out decl, impl and generated pragmas.
compiler/polymorphism.m:
Add a conditionally-enabled progress message, which can be useful
in tracking down problems.
compiler/prog_item_stats.m:
Conform NOT to the changes above beyond what is needed to let this module
compile. Let that work be done the next time the functionality of
this module is needed, by which time the affected data structures
maybe have changed further.
compiler/typecheck.m:
Fix a performance problem. With intermodule optimization, we read in
.opt files, some of which (e.g. list.opt and int.opt) contain promises.
These promises are read in as predicates with goal_type_promise,
but they do not have declarations of the types of their arguments
(since promises do not have declarations as such). Those argument types
therefore have to be inferred. That inference replaces the original
"I don't know" argument types with their actual types.
The performance problem is that when we change the recorded argument types
of a predicate, we require another loop over all the predicates in the
module, so that any calls to this predicate can be checked against
the updated types. This is as it should be for callable predicates,
but promises are not callable. So if all the *only* predicates whose
recorded argument types change during the first iteration to fixpoint
are promises, then a second iteration is not needed, yet we used to do it.
The fix is to replace the "Have the recorded types of this predicate
changed?" boolean flag with a bespoke enum that says "Did the checking
of this predicate discover a need for another iteration", and not
setting it when processing predicates whose type is goal_type_promise.
compiler/typecheck_errors.m:
Do not generate an error message for a predicate missing its clauses
is the clauses existed but were not added to the HLDS because they were
in the interface section.
When reporting on ambiguities (when a call can match more than one
predicate or function), sort the possible matches before reporting
them.
compiler/accumulator.m:
compiler/add_class.m:
compiler/add_clause.m:
compiler/add_foreign_proc.m:
compiler/add_mode.m:
compiler/add_pragma_tabling.m:
compiler/add_pragma_type_spec.m:
compiler/add_pred.m:
compiler/add_type.m:
compiler/canonicalize_interface.m:
compiler/check_for_missing_type_defns.m:
compiler/check_parse_tree_type_defns.m:
compiler/check_promise.m:
compiler/check_raw_comp_unit.m:
compiler/check_typeclass.m:
compiler/common.m:
compiler/compile_target_code.m:
compiler/compiler_util.m:
compiler/dead_proc_elim.m:
compiler/deps_map.m:
compiler/det_analysis.m:
compiler/det_report.m:
compiler/du_type_layout.m:
compiler/field_access.m:
compiler/find_module.m:
compiler/float_regs.m:
compiler/format_call.m:
compiler/goal_expr_to_goal.m:
compiler/handle_options.m:
compiler/hlds_out_module.m:
compiler/hlds_out_pred.m:
compiler/hlds_out_util.m:
compiler/inst_check.m:
compiler/intermod.m:
compiler/introduce_parallelism.m:
compiler/layout_out.m:
compiler/make.dependencies.m:
compiler/make.module_dep_file.m:
compiler/make_hlds_warn.m:
compiler/mark_tail_calls.m:
compiler/mercury_compile_llds_back_end.m:
compiler/ml_top_gen.m:
compiler/mmakefiles.m:
compiler/mode_errors.m:
compiler/mode_robdd.equiv_vars.m:
compiler/modes.m:
compiler/module_qual.qual_errors.m:
compiler/oisu_check.m:
compiler/old_type_constraints.m:
compiler/options_file.m:
compiler/parse_class.m:
compiler/parse_dcg_goal.m:
compiler/parse_goal.m:
compiler/parse_inst_mode_defn.m:
compiler/parse_inst_mode_name.m:
compiler/parse_mutable.m:
compiler/parse_sym_name.m:
compiler/parse_type_defn.m:
compiler/parse_type_name.m:
compiler/parse_type_repn.m:
compiler/parse_types.m:
compiler/parse_util.m:
compiler/parse_vars.m:
compiler/post_term_analysis.m:
compiler/post_typecheck.m:
compiler/prog_event.m:
compiler/prog_mode.m:
compiler/purity.m:
compiler/qual_info.m:
compiler/recompilation.version.m:
compiler/resolve_unify_functor.m:
compiler/simplify_goal.m:
compiler/simplify_goal_call.m:
compiler/simplify_goal_disj.m:
compiler/simplify_goal_ite.m:
compiler/simplify_proc.m:
compiler/state_var.m:
compiler/stratify.m:
compiler/style_checks.m:
compiler/superhomogeneous.m:
compiler/table_gen.m:
compiler/term_constr_errors.m:
compiler/term_errors.m:
compiler/termination.m:
compiler/trace_params.m:
compiler/unused_args.m:
compiler/unused_imports.m:
compiler/write_deps_file.m:
compiler/xml_documentation.m:
Conform to the changes above.
mdbcomp/prim_data.m:
Move a utility function on pred_or_funcs here from a compiler module,
to make it available to other compiler modules as well.
scripts/compare_s1s2_lib:
A new script that helped debug this diff, and may help debug
similar diffs the future. It can compare (a) .int* files, (b) .*opt
files, (c) .mh/.mih files or (d) .c files between the stage 1 and
stage 2 library directories. The reason for the restriction
to the library directory is that any problems affecting the
generation of any of these kinds of files are likely to manifest
themselves in the library directory, and if they do, the bootcheck
won't go on to compile any of the other stage 2 directories.
tests/debugger/breakpoints.a.m:
tests/debugger/breakpoints.b.m:
Move import_module declarations to the implementation section
when they are not used in the interface. Until now, the compiler
has ignored this, but this diff causes the compiler to generate
a warning for such misplaced import_module declarations even modules
that have submodules. The testing of such warnings is not the point
of the breakpoints test.
tests/invalid/Mercury.options:
Since the missing_interface_import test case tests error messages
generated during an invocation of mmc --make-interface, add the
new option that *allows* that invocation to generate error messages.
tests/invalid/ambiguous_overloading_error.err_exp:
tests/invalid/max_error_line_width.err_exp:
tests/warnings/ambiguous_overloading.exp:
Expect the updated error messages for ambiguity, in which
the possible matches are sorted.
tests/invalid/bad_finalise_decl.m:
tests/invalid/bad_initialise_decl.m:
Fix programming style.
tests/invalid/bad_item_in_interface.err_exp:
Expect an error message for a foreign_export_enum item in the interface,
where it should not be.
tests/invalid/errors.err_exp:
Expect the expanded wording of a warning message.
tests/invalid/foreign_enum_invalid.err_exp:
Expect a different wording for an error message. It is more "standard"
but slightly less informative.
tests/invalid_submodules/children2.m:
Move a badly placed import_module declaration, to avoid having
the message the compiler now generates for it from affecting the test.
tests/submodules/parent2.m:
Move a badly placed import_module declaration, to avoid having
the message the compiler now generates for it from affecting the test.
Update programming style.
60 lines
1.7 KiB
Bash
Executable File
60 lines
1.7 KiB
Bash
Executable File
#!/bin/sh
|
|
# Compare the stage1 and stage2 versions of every automatically-generated
|
|
# file of the specified kinds in the library directory. The possible kinds
|
|
# are interface files, optimization files, and C headers and sources.
|
|
#
|
|
# The intended use is debugging changes in the compiler that affect the
|
|
# generation of such files. The stage1 versions are generated by a known-good
|
|
# compiler and are known to work (if they didn't work, there would be
|
|
# no stage 1 compiler to generate stage 2), so if stage 2 won't build
|
|
# due to a problem with an interface file in the library, the differences
|
|
# output by this script should contain the culprit(s).
|
|
|
|
cd library
|
|
|
|
for filekind in "$@"
|
|
do
|
|
case "${filekind}" in
|
|
"int")
|
|
kind_exts="int3 int2 int int0"
|
|
;;
|
|
"opt")
|
|
kind_exts="opt trans_opt"
|
|
;;
|
|
"hdr")
|
|
kind_exts="mih mh"
|
|
;;
|
|
"c")
|
|
kind_exts="c"
|
|
;;
|
|
*)
|
|
echo "${filekind} is not a kind of file"
|
|
exit 1
|
|
;;
|
|
esac
|
|
|
|
for module in *.m
|
|
do
|
|
for ext in ${kind_exts}
|
|
do
|
|
m=`basename ${module} .m`
|
|
f="${m}.${ext}"
|
|
if test -f "${f}"
|
|
then
|
|
if test -f "../stage2/library/${f}"
|
|
then
|
|
diff -u "${f}" "../stage2/library/${f}" > .diff
|
|
if test -s .diff
|
|
then
|
|
echo "=== $f ==="
|
|
cat .diff
|
|
rm .diff
|
|
fi
|
|
else
|
|
echo "XXX ${f} not in stage2 XXX"
|
|
fi
|
|
fi
|
|
done
|
|
done
|
|
done
|