mirror of
https://github.com/Mercury-Language/mercury.git
synced 2026-04-23 21:33:49 +00:00
083d376e6598628362ee91c2da170febd83590f4
109 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
d49f6eab84 |
Add missing imports of parent modules.
These imports were missing from source files, but were included in imported modules' .int3 files. An upcoming change will delete these from those .int3 files. |
||
|
|
24b98fdafe |
Pack sub-word-sized ints and dummies in terms.
Previously, the only situation in which we could pack two or more arguments
of a term into a single word was when all those arguments are enums. This diff
changes that, so that the arguments can also be sub-word-sized integers
(signed or unsigned), or values of dummy types (which occupy zero bits).
This diff also records, for each argument of a function symbol, not just
whether, and if yes, how it is packed into a word, but also at *what offset*
that word is in the term's heap cell. It is more economical to compute this
once, when the representation of the type is being decided, than to compute
it over and over again when terms with that function symbol are being
constructed or deconstructed. However, for a transition period, we compute
these offsets at *both* times, to check the consistency of the new algorithm
for computing offsets that is run at "decide representation time" with
the old algorithms run at "generate code for a unification time".
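The layout computation described above can be sketched in C roughly as follows. This is a minimal illustration, not the code in du_type_layout.m: the `arg_layout` record, the `layout_args` function, the `extra_words` parameter, and the choice to place the first packed argument in the lowest bits are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of where one argument lives; not the compiler's type. */
typedef struct {
    size_t   arg_word;   /* word index relative to the first argument word */
    size_t   cell_word;  /* word index relative to the start of the heap cell */
    unsigned shift;      /* bit position of the argument within its word */
    unsigned bits;       /* number of bits occupied; 0 for a dummy argument */
    uint64_t mask;       /* (1 << bits) - 1, applied after shifting right */
} arg_layout;

static uint64_t mask_for(unsigned bits)
{
    return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/*
 * arg_bits[i] is the width of argument i: 0 for a dummy, word_bits for a
 * full-word argument, anything in between for an enum or sub-word integer.
 * extra_words counts the words that precede the arguments in the cell
 * (for example a remote secondary tag). Returns the number of argument
 * words used. Arguments are packed strictly in declaration order.
 */
size_t layout_args(const unsigned *arg_bits, size_t n, unsigned word_bits,
    size_t extra_words, arg_layout *out)
{
    size_t next_word = 0;   /* index of the word currently being filled */
    unsigned used = 0;      /* bits already taken in that word */

    assert(word_bits == 32 || word_bits == 64);
    for (size_t i = 0; i < n; i++) {
        unsigned b = arg_bits[i];
        assert(b <= word_bits);
        if (b > 0 && used + b > word_bits) {
            /* Does not fit: close the current word and start a new one. */
            next_word++;
            used = 0;
        }
        out[i].arg_word = next_word;
        out[i].cell_word = next_word + extra_words;
        out[i].shift = used;
        out[i].bits = b;
        out[i].mask = mask_for(b);
        used += b;          /* a dummy adds nothing */
    }
    return used > 0 ? next_word + 1 : next_word;
}
```

Computing this table once per function symbol means that code constructing or deconstructing a term only has to read `cell_word`, `shift` and `mask` rather than rederive them.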
compiler/du_type_layout.m:
Make the changes described above: pack sub-word-sized integers and
dummy values into argument words, if possible, and if the relevant
new option allows it. These options are temporary. If we find no problems
with the new packing algorithm in a few weeks, we should be able to
delete them.
Allow 64 bit ints and uints to be stored unboxed in two words
on 32 bit platforms, if the relevant new option allows it. Support
for this is not yet complete, but it makes sense to implement the
RTTI changes for both this change and the one described in the above
paragraph together.
For each packed argument, record not just its width, its shift and
the mask, but also the number of bits the argument takes. Previously,
we computed this on demand from the mask, but there is no real need
for that when simply storing this info is so cheap.
For all arguments, packed or not, record its offset, relative to both
the start of the arguments, and the start of the memory cell. (The two
are different if the arguments are preceded by either a remote secondary
tag, the typeinfos and/or typeclass_infos describing some existentially
typed arguments, or both.) The reason for this is given at the top.
Centralize the decision of the parameters of packing in one predicate.
If the option --inform-suboptimal-packing is given, print an informational
message whenever the code deciding type representations finds that
reordering the arguments of a function symbol would allow it to pack
the arguments of that function symbol into less space.
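The check behind --inform-suboptimal-packing can be illustrated with a short C sketch. It compares the number of words needed when arguments are packed in declaration order with the number needed after a reordering. The function names, the `MAX_ARGS` limit, and the use of a first-fit-decreasing heuristic to find a better order are assumptions for this example; the commit message does not say how the compiler searches for a better order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_ARGS 64   /* arbitrary limit for this sketch */

/* Words needed when arguments are packed strictly in the given order. */
size_t words_in_order(const unsigned *bits, size_t n, unsigned word_bits)
{
    size_t words = 0;
    unsigned free_bits = 0;   /* bits still free in the most recent word */

    for (size_t i = 0; i < n; i++) {
        assert(bits[i] <= word_bits);
        if (bits[i] == 0) {
            continue;                     /* dummies occupy no space */
        }
        if (bits[i] <= free_bits) {
            free_bits -= bits[i];         /* share the current word */
        } else {
            words++;                      /* start a new word */
            free_bits = word_bits - bits[i];
        }
    }
    return words;
}

static int cmp_desc(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a;
    unsigned y = *(const unsigned *)b;
    return (x < y) - (x > y);             /* larger widths sort first */
}

/* Words needed after reordering with first-fit decreasing (a heuristic). */
size_t words_reordered(const unsigned *bits, size_t n, unsigned word_bits)
{
    unsigned sorted[MAX_ARGS];
    unsigned free_bits[MAX_ARGS];
    size_t words = 0;

    assert(n <= MAX_ARGS);
    for (size_t i = 0; i < n; i++) {
        assert(bits[i] <= word_bits);
        sorted[i] = bits[i];
    }
    qsort(sorted, n, sizeof sorted[0], cmp_desc);
    for (size_t i = 0; i < n && sorted[i] > 0; i++) {
        size_t w = 0;
        while (w < words && free_bits[w] < sorted[i]) {
            w++;                          /* find the first word with room */
        }
        if (w == words) {
            free_bits[words++] = word_bits;
        }
        free_bits[w] -= sorted[i];
    }
    return words;
}

/* Nonzero if some reordering would need fewer words than the given order. */
int packing_is_suboptimal(const unsigned *bits, size_t n, unsigned word_bits)
{
    return words_reordered(bits, n, word_bits) < words_in_order(bits, n, word_bits);
}
```

For instance, on a 64 bit platform the widths 8, 64, 8 need three words in declaration order but only two once the two 8 bit arguments are adjacent, which is the situation the informational message is meant to flag.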
compiler/options.m:
Add the option --allow-packing-ints which controls whether
du_type_layout.m will attempt to pack {int,uint}{8,16,32} arguments
alongside enum arguments.
Add the option --allow-packing-dummies which controls whether
du_type_layout.m will optimize away (in other words, represent in 0 bits)
arguments of dummy types.
Add the option --allow-double-word-ints which controls whether
du_type_layout.m will store arguments of the types int64 and uint64
unboxed in two words on 32 bit platforms, the way it currently stores
double precision floats.
All three of those options are off by default, which preserves binary
compatibility with existing code. However, the first two are ready
to be switched on (the third is not).
All three options are intended to be present in the compiler
only until these changes are tested. Once we deem them sufficiently
tested, I will modify the compiler to always do the packing they control,
at which point we can delete these options. This is why they are not
documented.
Add the option --inform-suboptimal-packing, whose meaning is described
above.
doc/user_guide.texi:
Document --inform-suboptimal-packing.
compiler/prog_data.m:
For each argument of a function symbol in a type definition, use
a new type called arg_pos_width to record the extra information
mentioned above (offsets for all arguments, and number of bits
for packed arguments).
For each function symbol that has some existential type constraints,
record the extra information mentioned for parse_type_defn.m below.
compiler/hlds_data.m:
Include the position, as well as the width, in the representation
of the arguments of function symbols.
Previously, we used the integer 0 as a tag for dummies. Add a tag to
represent dummy values, since this gives more information to any code
that sees that tag.
compiler/ml_unify_gen.m:
compiler/unify_gen.m:
Handle the packing of dummy values, and of sub-word-sized ints and uints.
Compare the cell offset of each argument computed using existing
algorithms here with the cell offset recorded in the argument's
representation, and abort if they are different.
In some cases, restructure code a bit to make it possible.
For example, for tuples and closures, this means that instead of
simply recording that each tuple argument or closure element
is a full word, we must record its correct offset as well.
Handle the new dummy_tag.
Add prelim (not yet finished) support for double-word int64s/uint64s
on 32 bit platforms.
When packing the values of two or more variables (or constants) into a
single word in a memory cell, optimize away operations that are no-ops,
such as shifting anything by zero bits, shifting the constant zero
by any number of bits, and ORing anything with zero. This makes the
generated code easier to read. It is probably also faster for us
to do it here than to write out a bigger expression, have the C compiler
read in the bigger expression, and then later make the same optimization.
In ml_unify_gen.m, avoid the unnecessary use of a list of the argument
variables' types separate from the list of the argument variables
themselves; just look up the type of each argument variable when it is
processed.
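The removal of no-op operations when packing several values into one word can be sketched as a small expression builder in C. The `pack_operand` type and `build_pack_expr` function are hypothetical and stand in for the code generators' own data structures; only the three simplifications named above (shift by zero, shifted constant zero, OR with zero) are applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *var;      /* variable name, or NULL for a constant operand */
    unsigned long konst;  /* constant value, used only when var is NULL */
    unsigned shift;       /* left shift that places the operand in the word */
} pack_operand;

/*
 * Writes into buf a C expression that ORs together the shifted operands.
 * Returns 0 on success, -1 if a buffer is too small.
 */
int build_pack_expr(const pack_operand *ops, size_t n, char *buf, size_t buflen)
{
    size_t len = 0;

    assert(buf != NULL && buflen >= 2);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        char term[96];
        int w;

        if (ops[i].var == NULL && ops[i].konst == 0) {
            continue;   /* zero shifted by anything is zero; x | 0 is x */
        }
        if (ops[i].var != NULL && ops[i].shift == 0) {
            w = snprintf(term, sizeof term, "%s", ops[i].var);
        } else if (ops[i].var != NULL) {
            w = snprintf(term, sizeof term, "(%s << %u)",
                ops[i].var, ops[i].shift);
        } else if (ops[i].shift == 0) {
            w = snprintf(term, sizeof term, "%lu", ops[i].konst);
        } else {
            w = snprintf(term, sizeof term, "(%lu << %u)",
                ops[i].konst, ops[i].shift);
        }
        if (w < 0 || (size_t)w >= sizeof term) {
            return -1;
        }
        w = snprintf(buf + len, buflen - len, "%s%s",
            len > 0 ? " | " : "", term);
        if (w < 0 || (size_t)w >= buflen - len) {
            return -1;
        }
        len += (size_t)w;
    }
    if (len == 0) {
        strcpy(buf, "0");   /* every operand was a no-op */
    }
    return 0;
}
```

With operands `A` (shift 0), constant 0 (shift 8), `B` (shift 16) and constant 3 (shift 24), the sketch produces `A | (B << 16) | (3 << 24)` instead of the longer unsimplified expression.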
compiler/add_special_pred.m:
When creating special (unify and compare) predicates for tuples,
include the offsets in the representation of their arguments.
Delete an unused predicate.
compiler/llds.m:
Add a new way to create an rval: a cast. We use it to implement
the extraction of signed sub-word-sized integers from packed argument
words in terms. Masking the right N bits out of the packed word
leaves the other 32-N or 64-N bits as zeroes; a cast to int8_t,
int16_t or int32_t will copy the sign bit to these bits.
Likewise, when we pack signed int{8,16,32} values into words,
we cast them to their unsigned versions to throw away any sign-extension
bits in their original word-sized representations.
No similar change is needed for the MLDS, since that already had
a mechanism for casts.
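The role of the two casts can be shown with a small C sketch for an 8 bit signed field in a 64 bit word. The function names are hypothetical; the generated code would use the compiler's own macros and word type.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Packing: casting to uint8_t first discards the sign-extension bits that
 * a negative value carries in its word-sized representation, so the field
 * cannot overwrite its neighbours when it is shifted and ORed in.
 */
uint64_t pack_int8(uint64_t word, int8_t value, unsigned shift)
{
    assert(shift <= 56);
    return word | ((uint64_t)(uint8_t)value << shift);
}

/*
 * Unpacking: masking leaves the upper 56 bits zero, so the field would read
 * as a value in 0..255. Casting to int8_t and widening again copies the sign
 * bit into those upper bits. (Before C23 the narrowing conversion is
 * implementation-defined; mainstream compilers wrap modulo 256.)
 */
int64_t unpack_int8(uint64_t word, unsigned shift)
{
    uint64_t field = (word >> shift) & UINT64_C(0xFF);
    return (int64_t)(int8_t)field;
}
```

Packing -5 at bit 8 next to the byte 0xC8 gives 0xFBC8; masking alone would read the field back as 251, while the cast recovers -5.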
compiler/mlds.m:
Note a potential simplification in the MLDS.
compiler/builtin_lib_types.m:
Add functions to return the Mercury representation of the int64
and uint64 types.
compiler/foreign.m:
Export a specialized version of an existing predicate, to allow
ml_unify_gen.m to avoid the costs of the more general version.
compiler/hlds_out_module.m:
Always print the representations of all arguments, since the
inclusion of position information in those representation means that
the representations of even all-full-word-argument terms are of potential
interest when debugging term representations.
compiler/lco.m:
Do not try to apply LCO to arguments of dummy types. (We could optimize
them differently, by filling them in before they are "computed", but
that is a separate optimization, which is of *very* low priority.)
compiler/liveness.m:
Do not include variables of dummy types in resume points.
The reason for this is that the code that establishes a resume point
returns, for each such variable, a list of *lvals* where that variable
can be found. The new code in unify_gen.m will optimize away assignments
to values of dummy types, so there is *no* lval where they can be found.
We could allocate one, but doing so would be a pessimization. Instead,
we simply don't save and restore such values. When their value (which is
always 0) is needed, we can create them out of thin air.
compiler/ml_global_data.m:
Include the target language in the ml_global_data structure, to prevent
some of its users having to look it up in the module_info.
Add notes about specializing the implementation of arrays of
int64s/uint64s on 32 bit platforms.
compiler/check_typeclass.m:
compiler/ml_type_gen.m:
Add sanity checks of the new precomputed fields of exist_constraints.
Conform to the changes above.
compiler/mlds_to_c.m:
Add prelim (not yet finished) support for double-word int64s/uint64s
on 32 bit platforms.
Add notes about possible optimizations.
compiler/parse_type_defn.m:
When a function symbol in a type definition contains existential
arguments, precompute and store the set of constrained and unconstrained
type variables. The code in du_type_layout.m needs this information
to compute the number of slots occupied by typeinfos and typeclass_infos
in memory cells for this function symbol, and several other places
in the compiler do too. It is easier and faster to compute this
information just once, and this is the earliest time at which that can be done.
compiler/type_ctor_info.m:
Use the prerecorded information about existential types to simplify
the code here.
compiler/polymorphism.m:
Add an XXX about possibly using the extra info we now record in
exist_constraints to simplify the job of polymorphism.m.
compiler/pragma_c_gen.m:
compiler/var_locn.m:
Create the values of dummy variables from scratch, if needed.
compiler/rtti.m:
Replace a bool with a bespoke type.
compiler/rtti_out.m:
compiler/rtti_to_mlds.m:
When generating RTTI information for the LLDS and MLDS backends
respectively, record new kinds of arguments as needing special
treatment. These are int64s and uint64s stored unboxed in two words
on 32 bit platforms, {int,uint}{8,16,32} values packed into words,
and dummy arguments. Each of these has a special code: its own negative
value in the num_bits field of the argument.
Generate slightly better formatted output.
compiler/type_util.m:
Delete a predicate that isn't needed anymore.
compiler/opt_util.m:
Delete a function that hasn't been needed for a while.
Conform to the changes above.
compiler/arg_pack.m:
compiler/bytecode_gen.m:
compiler/call_gen.m:
compiler/code_util.m:
compiler/ctgc.selector.m:
compiler/dupelim.m:
compiler/dupproc.m:
compiler/equiv_type.m:
compiler/equiv_type_hlds.m:
compiler/erl_code_gen.m:
compiler/erl_rtti.m:
compiler/export.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_out_data.m:
compiler/middle_rec.m:
compiler/ml_closure_gen.m:
compiler/ml_switch_gen.m:
compiler/ml_top_gen.m:
compiler/module_qual.qualify_items.m:
compiler/opt_debug.m:
compiler/parse_tree_out.m:
compiler/peephole.m:
compiler/recompilation.usage.m:
compiler/resolve_unify_functor.m:
compiler/stack_layout.m:
compiler/structure_reuse.direct.choose_reuse.m:
compiler/switch_util.m:
compiler/typecheck.m:
compiler/unify_proc.m:
compiler/unused_imports.m:
compiler/xml_documentation.m:
Conform to the changes above.
compiler/llds_out_util.m:
Add a comment.
compiler/ml_code_util.m:
Factor out some common code.
runtime/mercury_type_info.h:
Allocate special values of the MR_arg_bits field of the MR_DuArgLocn type
to designate arguments as two word int64/uint64s, as sub-word-sized
arguments of types {int,uint}{8,16,32}, or as arguments of dummy types.
(We already had a special value for two word float arguments.)
Document the list of places that know about this code, so that they
can be updated if and when it changes.
library/construct.m:
Handle the construction of terms with two-word int64/uint64 arguments,
with packed {int,uint}{8,16,32} arguments, and with dummy arguments.
Factor out the code common to the sectag-present and sectag-absent cases,
to make it possible to do the above in just *one* place.
library/store.m:
Add an XXX to a place that I don't think handles two word arguments
correctly. (I think this is an old bug.)
runtime/mercury_deconstruct.c:
Handle the deconstruction of terms with two-word int64/uint64 arguments,
with packed {int,uint}{8,16,32} arguments, and with dummy arguments.
runtime/mercury_deep_copy_body.h:
Handle the copying of terms with two-word int64/uint64 arguments,
with packed {int,uint}{8,16,32} arguments, and with dummy arguments.
Give a macro a more descriptive name.
runtime/mercury_type_info.c:
Handle taking the size of terms with two-word int64/uint64 arguments,
with packed {int,uint}{8,16,32} arguments, and with dummy arguments.
runtime/mercury.h:
Put related definitions next to each other.
runtime/mercury_deconstruct.h:
runtime/mercury_ml_expand_body.h:
Fix indentation.
tests/hard_coded/construct_test.{m,exp}:
Add to this test case a test of the construction, via the library's
construct.m module, of terms containing packed sub-word-sized integers,
and packed dummies.
tests/hard_coded/deconstruct_arg.{m,exp}:
Convert the source code of this test case to state variable notation,
and update the line number references (in the names of predicates created
from lambda expressions) accordingly.
tests/hard_coded/uint64_ground_term.{m,exp}:
A new test case to check that uint64 values too large to be int64 values
can be stored in static structures.
tests/hard_coded/Mmakefile:
Enable the new test case.
|
||
|
|
15aa457e12 | Delete $module arg from calls to unexpected. | ||
|
|
4ebc3ffa04 |
Carve four modules out of prog_data.m.
The prog_data.m module is imported by most modules of the compiler; by
359 modules out of 488, to be exact. Yet it has many parts that most of
those 359 modules don't need. This diff puts those parts into four new
modules. The number of imports of these modules:
348 modules import prog_data.m
84 modules import prog_data_foreign.m
62 modules import prog_data_pragma.m
12 modules import prog_data_event.m
5 modules import prog_data_used_modules.m
compiler/prog_data_event.m:
compiler/prog_data_foreign.m:
compiler/prog_data_pragma.m:
compiler/prog_data_used_modules.m:
New modules. They contain the parts of the parse tree that deal
respectively with the specification of events and event sets,
interfacing to foreign languages, pragmas, and the sets of used
(i.e. not unused) modules.
compiler/prog_data.m:
Delete the stuff that is now in the new modules. Put the remaining parts
of the module into a logical order.
compiler/parse_tree.m:
compiler/notes/compiler_design.html:
Include and document the new modules.
compiler/globals.m:
Move a type here from prog_data.m, since this is where it belongs.
compiler/add_foreign_proc.m:
compiler/add_mutable_aux_preds.m:
compiler/add_pragma.m:
compiler/add_solver.m:
compiler/add_trail_ops.m:
compiler/call_gen.m:
compiler/code_gen.m:
compiler/code_loc_dep.m:
compiler/comp_unit_interface.m:
compiler/compile_target_code.m:
compiler/complexity.m:
compiler/continuation_info.m:
compiler/coverage_profiling.m:
compiler/ctgc.datastruct.m:
compiler/ctgc.livedata.m:
compiler/ctgc.selector.m:
compiler/deep_profiling.m:
compiler/dep_par_conj.m:
compiler/deps_map.m:
compiler/det_analysis.m:
compiler/det_report.m:
compiler/elds_to_erlang.m:
compiler/equiv_type.m:
compiler/erl_call_gen.m:
compiler/exception_analysis.m:
compiler/export.m:
compiler/fact_table.m:
compiler/foreign.m:
compiler/frameopt.m:
compiler/get_dependencies.m:
compiler/goal_form.m:
compiler/goal_util.m:
compiler/granularity.m:
compiler/hlds_goal.m:
compiler/hlds_module.m:
compiler/hlds_out_goal.m:
compiler/hlds_out_module.m:
compiler/hlds_out_pred.m:
compiler/hlds_pred.m:
compiler/inlining.m:
compiler/intermod.m:
compiler/ite_gen.m:
compiler/item_util.m:
compiler/jumpopt.m:
compiler/layout.m:
compiler/layout_out.m:
compiler/live_vars.m:
compiler/livemap.m:
compiler/llds.m:
compiler/llds_out_file.m:
compiler/llds_out_global.m:
compiler/llds_out_instr.m:
compiler/make.dependencies.m:
compiler/make.module_dep_file.m:
compiler/make_hlds.m:
compiler/make_hlds_warn.m:
compiler/mark_tail_calls.m:
compiler/mercury_compile_llds_back_end.m:
compiler/mercury_compile_main.m:
compiler/ml_call_gen.m:
compiler/ml_code_gen.m:
compiler/ml_code_util.m:
compiler/ml_foreign_proc_gen.m:
compiler/ml_proc_gen.m:
compiler/ml_tailcall.m:
compiler/ml_unify_gen.m:
compiler/mlds.m:
compiler/mlds_to_c.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
compiler/modecheck_goal.m:
compiler/module_imports.m:
compiler/module_qual.m:
compiler/module_qual.qualify_items.m:
compiler/modules.m:
compiler/opt_debug.m:
compiler/par_conj_gen.m:
compiler/parse_pragma.m:
compiler/parse_tree_out_info.m:
compiler/parse_tree_out_pragma.m:
compiler/pd_cost.m:
compiler/polymorphism.m:
compiler/pragma_c_gen.m:
compiler/proc_gen.m:
compiler/prog_ctgc.m:
compiler/prog_event.m:
compiler/prog_foreign.m:
compiler/prog_item.m:
compiler/prog_out.m:
compiler/prog_util.m:
compiler/purity.m:
compiler/rbmm.points_to_analysis.m:
compiler/rbmm.points_to_graph.m:
compiler/simplify_goal_call.m:
compiler/simplify_goal_scope.m:
compiler/simplify_proc.m:
compiler/smm_common.m:
compiler/stack_layout.m:
compiler/structure_reuse.analysis.m:
compiler/structure_reuse.direct.choose_reuse.m:
compiler/structure_reuse.direct.detect_garbage.m:
compiler/structure_reuse.domain.m:
compiler/structure_reuse.indirect.m:
compiler/structure_sharing.analysis.m:
compiler/structure_sharing.domain.m:
compiler/table_gen.m:
compiler/tabling_analysis.m:
compiler/term_constr_build.m:
compiler/term_constr_initial.m:
compiler/term_constr_main.m:
compiler/term_constr_main_types.m:
compiler/term_constr_pass2.m:
compiler/term_constr_util.m:
compiler/term_errors.m:
compiler/term_pass1.m:
compiler/term_pass2.m:
compiler/term_traversal.m:
compiler/term_util.m:
compiler/termination.m:
compiler/trace_gen.m:
compiler/trailing_analysis.m:
compiler/type_constraints.m:
compiler/typecheck.m:
compiler/unique_modes.m:
compiler/unused_args.m:
compiler/unused_imports.m:
compiler/use_local_vars.m:
compiler/write_deps_file.m:
Conform to the changes above.
|
||
|
|
ae17b527c9 |
Convert (C->T;E) to (if C then T else E).
In goal_util.m, process all goal types in tests, not just some. Some other simplifications. |
||
|
|
1ea38d9595 |
Clean up some more compiler modules.
compiler/equiv_type.m:
Don't export a predicate that does not need to be exported.
compiler/hlds_data.m:
compiler/polymorphism.m:
mdbcomp/goal_path.m:
Put knowledge of the goal_id to hand head constraints in only one place:
mdbcomp/goal_path.m.
compiler/goal_path.m:
Allocate goal_ids using counters.
compiler/foreign.m:
Delete an unused predicate.
compiler/ite_gen.m:
Factor out some common code.
compiler/equiv_type_hlds.m:
compiler/error_util.m:
compiler/exception_analysis.m:
compiler/global_data.m:
compiler/globals.m:
compiler/goal_expr_to_goal.m:
compiler/goal_form.m:
compiler/granularity.m:
compiler/hlds_args.m:
compiler/hlds_out_util.m:
compiler/interval.m:
compiler/java_names.m:
compiler/jumpopt.m:
compiler/labelopt.m:
compiler/lambda.m:
compiler/lco.m:
compiler/live_vars.m:
compiler/livemap.m:
compiler/modecheck_conj.m:
compiler/type_constraints.m:
Minor style cleanups.
|
||
|
|
4d38590690 |
Construct partially instantiated direct arg functor values.
Construction unifications of partially instantiated values involving direct
argument functors (where the single argument is free) did not generate any code
in both low-level and high-level backends. Incorrect behaviour could result if
the program tried to deconstruct the value at run-time. Also, in the LLDS
backend, such a construction unification did not enter the variable into the
var_state_map, leading to a compiler abort when the variable is looked up.
compiler/ml_unify_gen.m:
Generate code for constructions of a direct arg functor with free argument.
This amounts to assigning a variable to a tagged null pointer.
compiler/llds.m:
Add an rval option `mkword_hole', which is like `mkword' but the pointer
to be tagged is unspecified.
compiler/unify_gen.m:
Assign a variable to an `mkword_hole' rval, for a construction unification
of a direct arg functor with a free argument.
Reassign the variable to an `mkword' rval when the argument becomes bound
in a later unification.
compiler/code_info.m:
compiler/var_locn.m:
Add a predicate to reassign a variable from a `mkword_hole' expression to
a `mkword' expression.
compiler/llds_out_data.m:
Write out `mkword_hole' values as a tagged null pointer in C code.
compiler/call_gen.m:
compiler/code_util.m:
compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_to_x86_64.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/peephole.m:
compiler/stack_layout.m:
Conform to addition of `mkword_hole'.
tests/hard_coded/Mmakefile:
tests/hard_coded/direct_arg_partial_inst.exp:
tests/hard_coded/direct_arg_partial_inst.m:
tests/hard_coded/direct_arg_partial_inst2.exp:
tests/hard_coded/direct_arg_partial_inst2.m:
Add test cases.
|
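The idea of a tagged null pointer that is later filled in can be sketched in C as below. The helper names mirror the `mkword' and `mkword_hole' rvals but are hypothetical C functions, not the runtime's macros. The two tag bits, and the assumption that a null pointer converts to the integer zero, are choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_BITS 2                     /* assumed number of primary tag bits */
#define TAG_MASK ((uintptr_t)((1u << TAG_BITS) - 1))

typedef uintptr_t word;

/* Combine a primary tag with a suitably aligned pointer. */
word mkword(unsigned tag, const void *ptr)
{
    assert(tag <= TAG_MASK);
    assert(((uintptr_t)ptr & TAG_MASK) == 0);
    return (uintptr_t)ptr | tag;
}

/* A direct arg functor whose argument is still free: tag plus null pointer. */
word mkword_hole(unsigned tag)
{
    return mkword(tag, NULL);
}

/* Once the argument becomes bound, rebuild the word with the real pointer. */
word fill_hole(word hole, const void *ptr)
{
    assert((hole & ~TAG_MASK) == 0);   /* must still be an unfilled hole */
    return mkword((unsigned)(hole & TAG_MASK), ptr);
}

unsigned tag_of(word w)
{
    return (unsigned)(w & TAG_MASK);
}

void *body_of(word w)
{
    return (void *)(w & ~TAG_MASK);
}
```

Because the hole already carries the functor's tag, code that only inspects the tag of the partially instantiated value behaves correctly even before the argument is bound.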
||
|
|
517fbac88e |
Add four LLDS instructions Paul will soon need to implement the loop control
Estimated hours taken: 8
Branches: main
Add four LLDS instructions Paul will soon need to implement the loop control
transformation.
compiler/llds.m:
Add the new instructions.
compiler/llds_out_instr.m:
Output the new instructions. Paul may want to change the code we generate.
compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_to_x86_64.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/peephole.m:
compiler/reassign.m:
compiler/use_local_vars.m:
Handle the new instructions.
In opt_util.m, fix two old bugs. First, the restore_maxfr instruction
behaved as if it updated hp, not maxfr. Second, the keep_assign instruction
wasn't being handled as an assignment operation.
In peephole.m, fix an old bug, in which assignments through mem_refs
were not considered to invalidate the cached value of an lval.
In use_local_vars, fix an old bug: the keep_assign instruction wasn't being
handled as an assignment operation. Assignments themselves weren't being
as optimized as they could be.
|
||
|
|
7c5fe1e988 |
Record the number of instructions in each basic block.
Estimated hours taken: 1
Branches: main
compiler/basic_block.m:
Record the number of instructions in each basic block.
compiler/use_local_vars.m:
Do not perform this quadratic optimization on basic blocks on which
it would take too long.
compiler/dupelim.m:
Don't try to detect duplicates in big blocks; the attempt is expensive,
and also very likely to fail. (Big blocks are unlikely to be duplicated;
the optimization was meant for redundant copies of the procedure epilogue.)
compiler/livemap.m:
Put a limit on the number of iterations done by the fixpoint algorithm.
|
||
|
|
295415090e |
Convert almost all remaining modules in the compiler to use
Estimated hours taken: 6
Branches: main
compiler/*.m:
Convert almost all remaining modules in the compiler to use
"$module, $pred" instead of "this_file" in error messages.
In a few cases, the old error message was misleading, since it
contained an incorrect, out-of-date or cut-and-pasted predicate name.
tests/invalid/unresolved_overloading.err_exp:
Update an expected output containing an updated error message.
|
||
|
|
7e26b55e74 |
Implement a new form of memory profiling, which tells the user what memory
Branches: main
Implement a new form of memory profiling, which tells the user what memory
is being retained during a program run. This is done by allocating an extra
word before each cell, which is used to "attribute" the cell to an
allocation site. The attribution, or "allocation id", is an address to an
MR_AllocSiteInfo structure generated by the Mercury compiler, giving the
procedure, filename and line number of the allocation, and the type
constructor and arity of the cell that it allocates.
The user must manually instrument the program with calls to
`benchmarking.report_memory_attribution', which forces a GC and summarises
the live objects on the heap using the attributions. The mprof tool is
extended with a new mode to parse and present that data.
Objects which are unattributed (e.g. by hand-written C code which hasn't
been updated) are still accounted for, but show up in profiles as "unknown".
Currently this profiling mode only works in conjunction with the Boehm
garbage collector, though in principle it can work with any memory allocator
for which we can access a list of the live objects. Since term size
profiling relies on the same technique of using an extra word per memory
cell, the two profiling modes are incompatible.
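The attribution scheme can be sketched in C as an allocator that reserves one extra word in front of each cell. The `alloc_site_info` struct is a simplified stand-in for MR_AllocSiteInfo, and `attributed_malloc`, `attribution_of` and `attributed_free` are hypothetical names; the real runtime uses its own macros and the Boehm collector rather than malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for MR_AllocSiteInfo; the fields are illustrative. */
typedef struct {
    const char *proc_name;
    const char *file_name;
    int         line_number;
    const char *type_name;
} alloc_site_info;

/*
 * Allocate num_words words plus one hidden word in front of them that
 * records the allocation site. The caller receives a pointer to the
 * first usable word, so ordinary code never sees the attribution word.
 */
void *attributed_malloc(size_t num_words, const alloc_site_info *site)
{
    void **block = malloc((num_words + 1) * sizeof(void *));
    if (block == NULL) {
        return NULL;
    }
    block[0] = (void *)site;   /* NULL here would mean "unknown" */
    return block + 1;
}

/* Recover the allocation site from the word just before the cell. */
const alloc_site_info *attribution_of(const void *cell)
{
    const void *const *words = cell;
    assert(cell != NULL);
    return words[-1];
}

/* Freeing must step back over the attribution word. */
void attributed_free(void *cell)
{
    if (cell != NULL) {
        free((void **)cell - 1);
    }
}
```

A heap walk after a collection can then call something like `attribution_of` on each live cell and accumulate cell and word counts per allocation site, which is the kind of summary mprof presents.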
The output from `mprof -s' looks like this:
------ [1] some label ------
cells words cumul procedure / type (location)
14150 38872 total
* 1949/ 13.8% 4872/ 12.5% 12.5% <predicate `parser.parse_rest/7' mode 0>
975/ 6.9% 1950/ 5.0% list.list/1 (parser.m:502)
487/ 3.4% 1948/ 5.0% term.term/1 (parser.m:501)
487/ 3.4% 974/ 2.5% term.const/0 (parser.m:501)
* 1424/ 10.1% 4272/ 11.0% 23.5% <predicate `parser.parse_simple_term_2/6' mode 0>
708/ 5.0% 2832/ 7.3% term.term/1 (parser.m:643)
708/ 5.0% 1416/ 3.6% term.const/0 (parser.m:643)
...
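The percentage columns in the sample appear to be each count divided by the figure on the `total` line, with the `cumul` column summing the word counts of the procedures listed so far. A small C helper, hypothetical and not part of mprof, reproduces the figures shown under that assumption:

```c
#include <assert.h>

/*
 * Returns part/total as a percentage expressed in tenths of a percent,
 * rounded to the nearest tenth. For example 138 means 13.8%.
 * The arithmetic assumes 1000 * part does not overflow unsigned long.
 */
unsigned long tenths_of_percent(unsigned long part, unsigned long total)
{
    assert(total > 0);
    return (1000UL * part + total / 2) / total;
}
```

With the totals above (14150 cells, 38872 words), 1949 cells come out as 13.8%, 4872 words as 12.5%, and the cumulative 4872 + 4272 words as 23.5%, matching the first two procedure lines.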
boehm_gc/alloc.c:
boehm_gc/include/gc.h:
boehm_gc/misc.c:
boehm_gc/reclaim.c:
Add a callback function to be called for every live object after a GC.
Add a function to write out the GC_size_map array.
compiler/layout.m:
Define the alloc_site_info type which is equivalent to the
MR_AllocSiteInfo C structure.
Add alloc_site_array as a kind of "layout" array.
compiler/llds.m:
Add allocation sites to `cfile' structure.
Replace TypeMsg argument (which was also for profiling) on `incr_hp'
instructions by an allocation site identifier.
Add a new foreign_proc_component for allocation site ids.
compiler/code_info.m:
compiler/global_data.m:
compiler/proc_gen.m:
Keep the set of allocation sites in the code_info and global_data
structures.
compiler/unify_gen.m:
Add allocation sites to LLDS allocation instructions.
compiler/layout_out.m:
compiler/llds_out_file.m:
compiler/llds_out_instr.m:
Output MR_AllocSiteInfo arrays in generated C files.
Output code to register the MR_AllocSiteInfo array with the Mercury
runtime.
Output allocation site ids for memory allocation instructions.
compiler/llds_out_util.m:
Add allocation sites to llds_out_info.
compiler/pragma_c_gen.m:
compiler/ml_foreign_proc_gen.m:
Generate a macro MR_ALLOC_ID which resolves to an allocation site
structure, for every foreign_proc whose C code contains the string
"MR_ALLOC_ID". This is to be used by hand-written C code which
allocates memory.
MR_PROC_LABELs are retained for backwards compatibility. Though
they were introduced for profiling, they seem to have been co-opted
for printf-debugging since then.
compiler/ml_global_data.m:
Add allocation site structures to the MLDS global data.
compiler/mlds.m:
compiler/ml_unify_gen.m:
Add allocation site id to `new_object' instruction.
compiler/mlds_to_c.m:
Output allocation site arrays and allocation ids in high-level C code.
Output a call to register the allocation site array with the Mercury
runtime.
Delete an unused predicate.
compiler/exprn_aux.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/mercury_compile_llds_back_end.m:
compiler/middle_rec.m:
compiler/ml_accurate_gc.m:
compiler/ml_elim_nested.m:
compiler/ml_optimize.m:
compiler/ml_util.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_gcc.m:
compiler/mlds_to_il.m:
compiler/mlds_to_java.m:
compiler/mlds_to_managed.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/use_local_vars.m:
compiler/var_locn.m:
Conform to changes.
compiler/pickle.m:
compiler/prog_event.m:
compiler/timestamp.m:
Conform to changes in memory allocation macros.
library/benchmarking.m:
Add the `report_memory_attribution' instrumentation predicates.
Conform to changes to MR_memprof_record.
library/array.m:
library/bit_buffer.m:
library/bitmap.m:
library/construct.m:
library/deconstruct.m:
library/dir.m:
library/io.m:
library/mutvar.m:
library/store.m:
library/string.m:
library/thread.semaphore.m:
library/version_array.m:
Use attributed memory allocation throughout the standard library so
that objects don't show up in the memory profile as "unknown".
Replace MR_PROC_LABEL by MR_ALLOC_ID.
mdbcomp/program_representation.m:
mdbcomp/rtti_access.m:
Replace MR_PROC_LABEL by MR_ALLOC_ID.
profiler/Mercury.options:
profiler/globals.m:
profiler/mercury_profile.m:
profiler/options.m:
profiler/output.m:
profiler/snapshots.m:
Add a new mode to `mprof' to parse and present the data from
`Prof.Snapshots' files.
Add options for the new profiling mode.
profiler/process_file.m:
Fix a typo.
runtime/mercury_conf_param.h:
#define MR_MPROF_PROFILE_MEMORY_ATTRIBUTION if memory profiling
is enabled and we are using Boehm GC.
runtime/mercury.h:
Make MR_new_object take an allocation id argument.
Conform to changes in memory allocation macros.
runtime/mercury_memory.c:
runtime/mercury_memory.h:
runtime/mercury_types.h:
Define MR_AllocSiteInfo.
Add memory allocation functions and macros which take into
account the additional word necessary for the new profiling mode.
These should be used in preference to the raw memory allocation
functions wherever possible so that objects do not show up in the
profile as "unknown".
Add analogues of realloc/free which take into account the offset
introduced by the attribution word.
Add function versions of the MR_new_object macros, which can't be
written in standard C. They are only used when necessary.
Add built-in allocation site ids, to be used in the runtime and
other hand-written code when context-specific ids are unavailable.
runtime/mercury_heap.h:
Make MR_tag_offset_incr_hp_msg and MR_tag_offset_incr_hp_atomic_msg
allocate an extra word when memory attribution is desired, and store
the allocation id there.
Similarly for MR_create{1,2,3}_msg.
Replace proclabel arguments in allocation macros by alloc_id
arguments.
Replace MR_hp_alloc_atomic by MR_hp_alloc_atomic_msg. It was only
used for boxing floats.
Conform to change to MR_new_object macro.
runtime/mercury_bootstrap.h:
Delete obsolete macro hp_alloc_atomic.
runtime/mercury_heap_profile.c:
runtime/mercury_heap_profile.h:
Add the code to summarise the live objects on the Boehm GC heap and
write out the data to `Prof.Snapshots', for display by mprof.
Don't store the procedure name in MR_memprof_record: the procedure
address is enough and faster to compare.
runtime/mercury_prof.c:
Finish and close the `Prof.Snapshots' file when the program
terminates.
Conform to changes in MR_memprof_record.
runtime/mercury_misc.h:
Add a macro to expand to the name of the allocation sites array
in LLDS grades.
runtime/mercury_bitmap.c:
runtime/mercury_bitmap.h:
Pass allocation id through bitmap allocation functions.
Delete unused function MR_string_to_bitmap.
runtime/mercury_string.h:
Add MR_make_aligned_string_copy_msg.
Make string allocation macros take allocation id arguments.
runtime/mercury.c:
runtime/mercury_array_macros.h:
runtime/mercury_context.c:
runtime/mercury_deconstruct.c:
runtime/mercury_deconstruct_macros.h:
runtime/mercury_dlist.c:
runtime/mercury_engine.c:
runtime/mercury_float.h:
runtime/mercury_hash_table.c:
runtime/mercury_ho_call.c:
runtime/mercury_label.c:
runtime/mercury_prof_mem.c:
runtime/mercury_stacks.c:
runtime/mercury_stm.c:
runtime/mercury_string.c:
runtime/mercury_thread.c:
runtime/mercury_trace_base.c:
runtime/mercury_trail.c:
runtime/mercury_type_desc.c:
runtime/mercury_type_info.c:
runtime/mercury_wsdeque.c:
Use attributed memory allocation throughout the runtime so that
objects don't show up in the profile as "unknown".
runtime/mercury_memory_zones.c:
Attribute memory zones to the Mercury runtime.
runtime/mercury_tabling.c:
runtime/mercury_tabling.h:
Use attributed memory allocation macros for tabling structures.
Delete unused MR_table_realloc_* and MR_table_copy_bytes macros.
runtime/mercury_deep_copy_body.h:
Try to retain the original attribution word when copying values.
runtime/mercury_ml_expand_body.h:
Conform to changes in memory allocation macros.
runtime/mercury_tags.h:
Replace proclabel arguments by alloc_id arguments in allocation macros.
runtime/mercury_wrapper.c:
If memory attribution is enabled, tell Boehm GC that pointers may be
displaced by an extra word.
trace/mercury_trace.c:
trace/mercury_trace_tables.c:
Conform to changes in memory allocation macros.
extras/net/tcp.m:
extras/solver_types/library/any_array.m:
extras/trailed_update/tr_array.m:
Conform to changes in memory allocation macros.
doc/user_guide.texi:
Document the new profiling mode.
doc/reference_manual.texi:
Update a commented out example.
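As an illustration only (this is not code from the change), the attribution
word scheme described above for the allocation macros can be sketched in C
as follows. The names alloc_site, alloc_attributed, site_of and
free_attributed are hypothetical, and malloc stands in for the collector's
allocator; the real macros live in the runtime headers listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an allocation site record. */
typedef struct {
    const char *site_name;
} alloc_site;

/*
 * Allocate num_words payload words plus one hidden leading word that
 * holds the allocation id. The caller receives a pointer to the payload,
 * i.e. a pointer displaced one word past the start of the real block.
 */
uintptr_t *alloc_attributed(size_t num_words, const alloc_site *site)
{
    uintptr_t *block = malloc((num_words + 1) * sizeof(uintptr_t));
    if (block == NULL) {
        return NULL;
    }
    block[0] = (uintptr_t) site;    /* the attribution word */
    return block + 1;               /* the displaced pointer */
}

/* Recover the allocation site from a payload pointer. */
const alloc_site *site_of(const uintptr_t *payload)
{
    assert(payload != NULL);
    return (const alloc_site *) payload[-1];
}

/* Release a block, given its payload pointer. */
void free_attributed(uintptr_t *payload)
{
    if (payload != NULL) {
        free(payload - 1);
    }
}
```

Because every pointer the program holds is one word past the start of its
block, a conservative collector has to be told that such displaced pointers
still refer to the block, which is what the mercury_wrapper.c entry above
is about.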
|
||
|
|
9ae7fe6b70 |
Change the argument ordering of predicates in the set module.
Branches: main
Change the argument ordering of predicates in the set module.
library/set.m:
Change predicate argument orders to match the versions
in the svset module.
Group function definitions with the corresponding predicates
rather than at the end of the file.
Delete Ralph's comments regarding the argument order in the
module interface: readers of the library reference guide are
unlikely to be interested in his opinion of the argument ordering
ten or so years ago.
Add extra modes for set.map/3 and set.map_fold/5.
library/svset.m:
library/eqvclass.m:
library/tree234.m:
library/varset.m:
browser/*.m:
compiler/*.m:
deep_profiler/*.m:
mdbcomp/trace_counts.m:
extras/moose/grammar.m:
extras/moose/lalr.m:
extras/moose/moose.m:
tests/hard_coded/bitset_tester.m:
Conform to the above change.
NEWS:
Announce the above changes. |
||
|
|
9f68c330f0 |
Change the argument order of many of the predicates in the map, bimap, and
Branches: main
Change the argument order of many of the predicates in the map, bimap, and
multi_map modules so they are more conducive to the use of state variable
notation, i.e. make the order the same as in the sv* modules.
Prepare for the deprecation of the sv{bimap,map,multi_map} modules by
removing their use throughout the system.
library/bimap.m:
library/map.m:
library/multi_map.m:
As above.
NEWS:
Announce the change.
Separate out the "highlights" from the "detailed listing" for
the post-11.01 NEWS.
Reorganise the announcement of the Unicode support.
benchmarks/*/*.m:
browser/*.m:
compiler/*.m:
deep_profiler/*.m:
extras/*/*.m:
mdbcomp/*.m:
profiler/*.m:
tests/*/*.m:
ssdb/*.m:
samples/*/*.m:
slice/*.m:
Conform to the above change.
Remove any dependencies on the sv{bimap,map,multi_map} modules.
|
||
|
|
322feaf217 |
Add more threadscope instrumentation.
This change introduces instrumentation that tracks sparks as well as parallel
conjunctions and their conjuncts. This should hopefully give us more
information to diagnose runtime performance issues.
As of this date the ThreadScope program hasn't been updated to read or
understand these new events.
runtime/mercury_threadscope.[ch]:
Added a function and types to register all the threadscope strings from an
array.
Add functions to post the new events (see below).
runtime/mercury_threadscope.c:
Added support for 5 new threadscope events.
Registering a string so that other messages may refer to a constant
string.
Marking the beginning and ends of parallel conjunctions.
Creating a spark for a parallel conjunct.
Finishing a parallel conjunct.
Re-arranged event IDs. I've started allocating IDs from 38 onwards for
general purposes and 100 onwards for Mercury-specific events, after talking
with Duncan Coutts.
Trimmed excess whitespace from the end of lines.
runtime/mercury_context.h:
Post a beginning parallel conjunction message when the sync term for the
parallel conjunction is initialized.
Post an event when creating a spark for a parallel conjunction.
Add an MR_spark_id field to the MR_Spark structure; these IDs identify
sparks to threadscope.
runtime/mercury_context.c:
Post threadscope messages when a spark is about to be executed.
Post a threadscope event when a parallel conjunct is completed.
Add a missing memory barrier.
runtime/mercury_wrapper.[ch]:
Create a global function pointer for the code that registers strings in the
threadscope string table, this is filled in by mkinit.
Call this function pointer immediately after setting up threadscope.
runtime/mercury_wsdeque.[ch]:
Modify MR_wsdeque_pop_bottom to return the spark pointer (which points onto
the queue) rather than returning a result through a pointer and a bool
indicating whether the operation was successful. This pointer is safe to
dereference until MR_wsdeque_push_bottom is used.
runtime/mercury_wsdeque.c:
Corrected a code comment.
runtime/mercury_engine.h:
Documented some of the fields of the engine structure that hadn't been
documented.
Add a next spark ID field to the engine structure.
Change the type of the engine ID field to MR_uint_least16_t.
compiler/llds.m:
Add a third field to the init_sync_term instruction that stores the index
into the threadscope string table of the static conjunction ID.
Add a field to the c_file structure containing the threadscope string
table.
compiler/layout.m:
Added a new layout array name for the threadscope string table.
compiler/layout_out.m:
Implement code to write out the threadscope string table.
compiler/llds_out_file.m:
Write out the threadscope string table when writing out the c_file.
compiler/par_conj_gen.m:
Create strings that statically identify parallel conjunctions for each
init_sync_term LLDS instruction. These strings are added to a table in the
!CodeInfo and the index of the string is added to the init_sync_term
instruction.
Add an extra instruction after a parallel conjunction to post the message
that the parallel conjunction has completed.
compiler/global_data.m:
Add fields to the global data structure to represent the threadscope string
table and its current size.
Add predicates to update and retrieve the table.
Handle merging of threadscope string tables in global data by allowing the
references to the strings to be remapped.
Refactored remapping code so that a caller such as proc_gen only needs to
call one remapping predicate after merging global data.
compiler/code_info.m:
Add a table of strings for use with threadscope to the code_info_persistent
type.
Modify the code_info_init to initialise the threadscope string table fields.
Add a predicate to get the string table and another to update it.
compiler/proc_gen.m:
Build the containing goal map before code generation for procedures with
parallel conjunctions in a parallel grade. par_conj_gen.m depends on this.
Conform to changes in code_info.m and global_data.m
compiler/llds_out_instr.m:
Write out the extra parameter in the init_sync_term instruction.
compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_to_x86_64.m:
compiler/mercury_compile_llds_back_end.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/peephole.m:
compiler/reassign.m:
compiler/use_local_vars.m:
Conform to changes in llds.m
compiler/opt_debug.m:
Conform to changes in layout.m
compiler/mercury_compile_llds_back_end.m:
Fix some trailing whitespace.
util/mkinit.c:
Build an initialisation function that registers all the strings in
threadscope string tables.
Correct the layout of a comment.
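As an illustration only, the pointer-returning interface described above for
MR_wsdeque_pop_bottom can be sketched in C like this. It is a single-threaded
toy with a fixed capacity and no work stealing; the names spark, deque,
deque_push_bottom and deque_pop_bottom are hypothetical and are not the
runtime's own definitions.

```c
#include <assert.h>
#include <stddef.h>

#define DEQUE_CAPACITY 8

/* Hypothetical, heavily simplified spark. */
typedef struct {
    int id;
} spark;

typedef struct {
    spark  slots[DEQUE_CAPACITY];
    size_t bottom;                  /* number of sparks currently stored */
} deque;

/* Push a spark at the bottom. Returns 0 on success, -1 if the deque is full. */
int deque_push_bottom(deque *dq, spark s)
{
    assert(dq != NULL);
    if (dq->bottom == DEQUE_CAPACITY) {
        return -1;
    }
    dq->slots[dq->bottom++] = s;
    return 0;
}

/*
 * Pop the bottom spark. Returns a pointer into the deque's own storage,
 * or NULL if the deque is empty. The pointer stays valid only until the
 * next push, which may overwrite the slot.
 */
spark *deque_pop_bottom(deque *dq)
{
    assert(dq != NULL);
    if (dq->bottom == 0) {
        return NULL;
    }
    dq->bottom--;
    return &dq->slots[dq->bottom];
}
```

Returning the pointer folds the success flag into the result (NULL means
empty) and avoids copying the spark out, at the cost of the lifetime rule
noted in the comment.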
|
||
|
|
1c3bc03415 |
Make the system compile with --warn-unused-imports.
Estimated hours taken: 2
Branches: main, release
Make the system compile with --warn-unused-imports.
browser/*.m:
library/*.m:
compiler/*.m:
Remove unnecessary imports as flagged by --warn-unused-imports.
In some files, do some minor cleanup along the way. |
||
|
|
8a28e40c9b |
Add the predicates sorry, unexpected and expect to library/error.m.
Estimated hours taken: 2
Branches: main
Add the predicates sorry, unexpected and expect to library/error.m.
compiler/compiler_util.m:
library/error.m:
Move the predicates sorry, unexpected and expect from compiler_util
to error.
Put the predicates in error.m into the same order as their
declarations.
compiler/*.m:
Change imports as needed.
compiler/lp.m:
compiler/lp_rational.m:
Change imports as needed, and some minor cleanups.
deep_profiler/*.m:
Switch to using the new library predicates, instead of calling error
directly. Some other minor cleanups.
NEWS:
Mention the new predicates in the standard library. |
||
|
|
9bdc5db590 |
Try to work around the Snow Leopard linker's performance problem with
Estimated hours taken: 20
Branches: main
Try to work around the Snow Leopard linker's performance problem with
debug grade object files by greatly reducing the number of symbols needed
to represent the debugger's data structures.
Specifically, this diff groups all label layouts in a module, each of which
previously had its own named global variable, into only a few (one to four)
global variables, each of which is an array. References to the old global
variables are replaced by references to slots in these arrays.
This same treatment could also be applied to other layout structures. However,
most layouts are label layouts, so doing just label layouts gets most of the
available benefit.
When the library and compiler are compiled in grade asm_fast.gc.debug,
this diff leads to about a 1.5% increase in the size of their generated C
source files (from 338 to 343 Mb), but a more significant reduction (about 17%)
in the size of the corresponding object files (from 155 to 128 Mb). This leads
to an overall reduction in disk requirements from 493 to 471 Mb (about 4.5%).
Since we generate the same code and data as before, with the data just being
arranged differently, the decrease in object file sizes is coming from the
reduction in relocation information, the information processed by the linker.
This should speed up the linker.
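As an illustration only, the grouping described above can be sketched in C
as follows: one array symbol per module holds all the label layouts, and a
reference names a slot number instead of a per-label global. The type
label_layout, its fields, the array contents and label_layout_slot are all
hypothetical and far simpler than the real layout structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified label layout. */
typedef struct {
    const char *label_name;
    int         port;
} label_layout;

/*
 * One exported symbol for all label layouts in the module, instead of
 * one named global variable per label.
 */
const label_layout module_label_layouts[] = {
    { "foo_entry", 0 },
    { "foo_exit",  1 },
    { "bar_entry", 0 },
};

#define NUM_LABEL_LAYOUTS \
    (sizeof(module_label_layouts) / sizeof(module_label_layouts[0]))

/* A reference to a label layout is now a slot number in the array. */
const label_layout *label_layout_slot(size_t slot)
{
    assert(slot < NUM_LABEL_LAYOUTS);
    return &module_label_layouts[slot];
}
```

The data is the same either way; what shrinks is the number of symbols, and
with it the relocation information the linker has to process.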
compiler/layout.m:
Make the change described above. We now define up to four arrays:
one each for label layouts with and without information about
variables, one for the layout structures of user events,
and one for the variable number lists of user events.
compiler/layout_out.m:
Generate the new arrays that the module being compiled needs.
Use purpose-specific types instead of booleans.
compiler/trace_gen.m:
Use a new field in foreign_proc_code instructions to record the
identity of any labels whose layout structures we want to refer to,
even though layout structures have not been generated yet. The labels
will be looked up in a map (generated together with the layout
structures) by llds_out.m.
compiler/llds.m:
Add this extra field to foreign_proc_code instructions.
Add the map (which is actually in two parts) to the c_file type,
which is the data structure representing the entire LLDS.
Also add to the c_file type some other data structures that previously
we used to hand around alongside it. Some of these data structures
used to commingle layout structures that we now separate.
compiler/stack_layout.m:
Generate array slots instead of separate structures for label layouts.
Return the different arrays separately.
compiler/llds_out.m:
Order the output of layout structures to require fewer forward
declarations. The forward declarations of the few arrays holding the
label layout structures replace a lot of the declarations previously
needed.
Include the information needed by layout_out.m in the llds_out_info,
and conform to the changes above.
As a side-effect of all these changes, we now generate proc layout
structures in the same order as the procedures' appearance in the HLDS,
which is the same as their order in the source code, modulo any
procedures added by the compiler itself (for lambdas, unification
predicates, etc).
compiler/code_info.m:
compiler/dupelim.m:
compiler/dup_proc.m:
compiler/exprn_aux.m:
compiler/frameopt.m:
compiler/global_data.m:
compiler/ite_gen.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_to_x86_64.m:
compiler/mercury_compile_llds_back_end.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/pragma_c_gen.m:
compiler/proc_gen.m:
compiler/reassign.m:
compiler/use_local_vars.m:
Conform to the changes above.
runtime/mercury_goto.h:
Add the macros used by the new code in layout_out.m and llds_out.m.
We need new macros because the old ones assumed that the
C preprocessor can construct the address of a label's layout structure
from the name of the label, which is obviously no longer possible.
Make even existing families of macros handle in bulk up to 10 labels,
up from the previous 8.
runtime/mercury_stack_layout.h:
Add macros for use by the new code in layout.m.
tests/debugger/*.{inp,exp}:
tests/debugger/declarative/*.{inp,exp}:
Update these test cases to account for the new (and better) order
of proc layout structures. Where inputs changed, this was to ensure
that we still select the same procedures from lists of procedures,
e.g. to put a breakpoint on.
|
||
|
|
e0ff2b1903 |
Implement conditional structure reuse for LLDS backends using Boehm GC.
Estimated hours taken: 15
Branches: main
Implement conditional structure reuse for LLDS backends using Boehm GC.
Verify at run time, just before reusing a dead cell, that the base address of
the cell was dynamically allocated. If not, fall back to allocating a new
object on the heap. This makes structure reuse safe without having to disable
static data.
In the simple case, the generated C code looks like this:
    MR_tag_reuse_or_alloc_heap(dest, tag, addr_of_reuse_cell,
        MR_tag_alloc_heap(dest, tag, count));
    ...assign fields...
If some of the fields are known to already have the correct values then we can
avoid assigning them. We need to handle both reuse and non-reuse cases:
    MR_tag_reuse_or_alloc_heap_flag(dest, flag_reg, tag, addr_of_reuse_cell,
        MR_tag_alloc_heap(dest, tag, count));
    /* flag_reg is non-zero iff reuse is possible */
    if (flag_reg) {
        goto skip;
    }
    ...assign fields which don't need to be assigned in reuse case...
skip:
    ...assign fields which must be assigned in both cases...
It may be that it is not worth the branch to avoid assigning known fields.
I haven't yet checked.
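As an illustration only, the run-time check described above can be sketched
as a plain C function. The names heap_bounds, in_heap and reuse_or_alloc are
hypothetical, the bounds are passed in explicitly rather than read from the
collector, and malloc stands in for the heap allocator; the real code uses
the macros added to runtime/mercury_heap.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical heap bounds, standing in for the collector's least and
 * greatest plausible heap addresses.
 */
typedef struct {
    uintptr_t least;
    uintptr_t greatest;
} heap_bounds;

/* Non-zero iff addr lies inside the dynamically allocated heap range. */
int in_heap(const heap_bounds *hb, const void *addr)
{
    uintptr_t a = (uintptr_t) addr;
    return a >= hb->least && a < hb->greatest;
}

/*
 * Reuse dead_cell if it was dynamically allocated; otherwise fall back
 * to allocating a fresh cell. *reused is set to 1 or 0 accordingly, so
 * the caller can skip assigning fields that already hold the right values.
 */
void *reuse_or_alloc(const heap_bounds *hb, void *dead_cell, size_t bytes,
    int *reused)
{
    assert(hb != NULL && reused != NULL);
    if (dead_cell != NULL && in_heap(hb, dead_cell)) {
        *reused = 1;
        return dead_cell;
    }
    *reused = 0;
    return malloc(bytes);
}
```

A cell that lives in static data fails the range check and so is never
overwritten, which is why static data no longer has to be disabled.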
compiler/llds.m:
Extend the `incr_hp' instruction to hold information for structure
reuse.
compiler/code_info.m:
Generate a label and pass it to `var_locn_assign_cell_to_var'. The
label is only needed for the type of code shown above.
compiler/var_locn.m:
Change the code generated for cell reuse. Rather than assigning the
dead cell's address to the target lval unconditionally, generate an
`incr_hp' instruction with the reuse field filled in.
Generate code that avoids filling in known fields if possible.
Abort if we see `construct_statically(_)' in
`var_locn_assign_dynamic_cell_to_var'.
runtime/mercury_heap.h:
runtime/mercury_conf_param.h:
Add a macro to check if an address is between
`GC_least_plausible_heap_addr' and `GC_greatest_plausible_heap_addr',
and is therefore in the heap.
Add macros to conditionally reuse a cell or otherwise fall back to
allocating a new object.
Make it possible to revert to unconditional structure reuse by
defining the C macro `MR_UNCONDITIONAL_STRUCTURE_REUSE'.
compiler/llds_out.m:
Call the new macros in `mercury_heap.h' for `incr_hp' instructions
with reuse information filled in.
compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_to_x86_64.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/reassign.m:
compiler/unify_gen.m:
compiler/use_local_vars.m:
Conform to the changed `incr_hp' instruction.
|
||
|
|
cc88711d63 |
Implement true multi-cons_id arm switches, i.e. switches in which we associate
Estimated hours taken: 40
Branches: main
Implement true multi-cons_id arm switches, i.e. switches in which we associate
more than one cons_id with a switch arm. Previously, for switches like this:
(
    X = a,
    goal1
;
    ( X = b
    ; X = c
    ),
    goal2
)
we duplicated goal2. With this diff, goal2 won't be duplicated. We still
duplicate goals when that is necessary, i.e. in cases in which the inner
disjunction contains code other than a functor test on the switched-on var,
like this:
(
    X = a,
    goal1
;
    (
        X = b,
        goalb
    ;
        X = c,
        goalc
    ),
    goal2
)
For now, true multi-cons_id arm switches are supported only by the LLDS
backend. Supporting them on the MLDS backend is trickier, because some MLDS
target languages (e.g. Java) don't support the concept at all. So when
compiling to MLDS, we still duplicate the goal in switch detection (although
we could delay the duplication to just before code generation, if we wanted.)
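As an illustration only, the effect on the generated code can be pictured
with a C jump table in which two cons_ids share one arm and an unreachable
cons_id has no target at all. Everything here (the cons_id enum, arm_goal1,
arm_goal2, arm_table, run_switch) is hypothetical; it is a sketch of the
idea, not what the LLDS backend emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cons_ids of the switched-on variable. */
typedef enum { CONS_A, CONS_B, CONS_C, CONS_D, NUM_CONS } cons_id;

typedef int (*arm_fn)(void);

int arm_goal1(void) { return 1; }
int arm_goal2(void) { return 2; }

/*
 * One table slot per cons_id. CONS_B and CONS_C point at the same arm,
 * so its code exists only once. CONS_D has no arm: NULL plays the role
 * of a missing label meaning "not reached".
 */
const arm_fn arm_table[NUM_CONS] = {
    [CONS_A] = arm_goal1,
    [CONS_B] = arm_goal2,
    [CONS_C] = arm_goal2,
    [CONS_D] = NULL,
};

/* Dispatch on x; returns -1 for a slot marked as not reached. */
int run_switch(cons_id x)
{
    assert((int) x >= 0 && (int) x < NUM_CONS);
    arm_fn arm = arm_table[x];
    return arm != NULL ? arm() : -1;
}
```

Duplicating the table entry is cheap; duplicating the arm's code, as the
old scheme did, is not.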
compiler/options.m:
Add an internal option that tells switch detection whether to look for
multi-cons_id switch arms.
compiler/handle_options.m:
Set this option based on the back end.
Add a version of the "trans" dump level that doesn't print unification
details.
compiler/hlds_goal.m:
Extend the representation of switch cases to allow more than one
cons_id for a switch arm.
Add a type for representing switches that also includes tag information
(for use by the backends).
compiler/hlds_data.m:
For du types, record whether it is possible to speed up tests for one
cons_id (e.g. cons) by testing for the other (nil) and negating the
result. Recording this information once is faster than having
unify_gen.m trying to compute it from scratch for every single
tag test.
Add a type for representing a cons_id together with its tag.
compiler/hlds_out.m:
Print out the cheaper_tag_test information for types, and possibly
several cons_ids for each switch arm.
Add some utility predicates for describing switch arms in terms of
which cons_ids they are for.
Replace some booleans with purpose-specific types.
Make hlds_out honor its documentation, and not print out detailed
information about unifications (e.g. uniqueness and static allocation)
unless the right character ('u') is present in the control string.
compiler/add_type.m:
Fill in the information about cheaper tag tests when adding a du type.
compiler/switch_detection.m:
Extend the switch detection algorithm to detect multi-cons_id switch
arms.
When entering a switch arm, update the instmap to reflect that the
switched-on variable can now be bound only to the cons_ids that this
switch arm is for. We now need to do this, because if the arm contains
another switch on the same variable, computing the can_fail field of
that switch correctly requires us to know this information.
(Obviously, an arm for a single cons_id is unlikely to have a switch on
the same variable, and for arms for several cons_ids, we previously
duplicated the arm and left the unification with the cons_id in each
copy, and this unification allowed the correct handling of any later
switches. However, the code of a multi-cons_id switch arm obviously
cannot have a unification with each cons_id in it, which is why
we now need to get the binding information from the switch itself.)
Replace some booleans with purpose-specific types, and give some
predicates better names.
compiler/instmap.m:
Provide predicates for recording that a switched-on variable has
one of several given cons_ids, for use at the starts of switch arms.
Give some predicates better names.
compiler/modes.m:
Provide predicates for updating the mode_info at the start of a
multi-cons_id switch arm.
compiler/det_report.m:
Handle multi-cons_id switch arms.
Update the instmap when entering each switch arm, since this is needed
to provide good (i.e. non-misleading) error messages when one switch on
a variable exists inside another switch on the same variable.
Since updating the instmap requires updating the module_info (since
the new inst may require a new entry in an inst table), thread the
det_info through as updateable state.
Replace some multi-clause predicate definitions with single clauses,
to make it easier to print the arguments in mdb.
Fix some misleading variable names.
compiler/det_analysis.m:
Update the instmap when entering each switch arm and thread the
det_info through as updateable state, since the predicates we call
in det_report.m require this.
compiler/det_util.m:
Handle multi-cons_id switch arms.
Rationalize the argument order of some access predicates.
compiler/switch_util.m:
Change the parts of this module that deal with string and tag switches
to optionally convert each arm to an arbitrary representation of the
arm. In the LLDS backend, the conversion process generates code for
the arm, and the arm's representation is the label at the start of
this code. This way, we can duplicate the label without duplicating
the code.
Add a new part of this module that associates each cons_id with its
tag, and (during the same pass) checks whether all the cons_ids are
integers, and if so what the min and max of these integers are (needed
for dense switches). This scan is needed because the old way of making
this test had single-cons_id switch arms as one of its basic
assumptions, and doing it while adding tags to each case reduces
the number of traversals required.
Give better names to some predicates.
compiler/switch_case.m:
New module to handle the tasks associated with managing multi-cons_id
switch arms, including representing them for switch_util.m.
compiler/ll_backend.m:
Include the new module.
compiler/notes/compiler_design.html:
Note the new module.
compiler/llds.m:
Change the computed goto instruction to take a list of maybe labels
instead of a list of labels, with any missing labels meaning "not
reached".
compiler/string_switch.m:
compiler/tag_switch.m:
Reorganize the way these modules work. We can't generate the code of
each arm in place anymore, since it is now possible for more than one
cons_id to call for the execution of the same code. Instead, in
string_switch.m, we generate the codes of all the arms all at once,
and construct the hash index afterwards. (This approach simplifies
the code significantly.)
In tag switches (unlike string switches), we can get locality benefits
if the code testing for a cons_id is close to the code for that
cons_id, so we still try to put them next to each other when such
a locality benefit is available.
In both modules, the new approach uses a utility predicate in
switch_case.m to actually generate the code of each switch arm,
eliminating several copies of the same code in the old versions of these
modules.
In tag_switch.m, don't create a local label that simply jumps to the
code address do_not_reached. Previously, we had to do this for
positions in jump tables that corresponded to cons_ids that the switch
variable could not be bound to. With the change to llds.m, we now
simply generate a "no" instead.
compiler/lookup_switch.m:
Get the info about int switch limits from our caller; don't compute it
here.
Give some variables better names.
compiler/dense_switch.m:
Generate the codes of the cases all at once, then assemble the table,
duplicating the labels as needed. This separation of concerns allows
significant simplifications.
Pack up all the information shared between the predicate that detects
whether a dense switch is appropriate and the predicate that actually
generates the dense switch.
Move some utility predicates to switch_util.
compiler/switch_gen.m:
Delete the code for tagging cons_ids, since that functionality is now
in switch_util.m.
The old version of this module could call the code generator to produce
(i.e. materialize) the switched-on variable repeatedly. We now produce
the variable once, and do the switch on the resulting rval.
compiler/unify_gen.m:
Use the information about cheaper tag tests in the type constructor's
entry in the HLDS type table, instead of trying to recompute it
every time.
Provide the predicates switch_gen.m now needs to perform tag tests
on rvals, as opposed to variables, and against possibly more than one
cons_id.
Allow the caller to provide the tag corresponding to the cons_id(s)
in tag tests, since when we are generating code for switches, the
required computations have already been done.
Factor out some code to make all this possible.
Give better names to some predicates.
compiler/code_info.m:
Provide some utility predicates for the new code in other modules.
Give better names to some existing predicates.
compiler/hlds_code_util.m:
Rationalize the argument order of some predicates.
Replace some multi-clause predicate definitions with single clauses,
to make it easier to print the arguments in mdb.
compiler/accumulator.m:
compiler/add_heap_ops.m:
compiler/add_pragma.m:
compiler/add_trail_ops.m:
compiler/assertion.m:
compiler/build_mode_constraints.m:
compiler/check_typeclass.m:
compiler/closure_analysis.m:
compiler/code_util.m:
compiler/constraint.m:
compiler/cse_detection.m:
compiler/dead_proc_elim.m:
compiler/deep_profiling.m:
compiler/deforest.m:
compiler/delay_construct.m:
compiler/delay_partial_inst.m:
compiler/dep_par_conj.m:
compiler/distance_granularity.m:
compiler/dupproc.m:
compiler/equiv_type_hlds.m:
compiler/erl_code_gen.m:
compiler/exception_analysis.m:
compiler/export.m:
compiler/follow_code.m:
compiler/follow_vars.m:
compiler/foreign.m:
compiler/format_call.m:
compiler/frameopt.m:
compiler/goal_form.m:
compiler/goal_path.m:
compiler/goal_util.m:
compiler/granularity.m:
compiler/hhf.m:
compiler/higher_order.m:
compiler/implicit_parallelism.m:
compiler/inlining.m:
compiler/inst_check.m:
compiler/intermod.m:
compiler/interval.m:
compiler/lambda.m:
compiler/lco.m:
compiler/live_vars.m:
compiler/livemap.m:
compiler/liveness.m:
compiler/llds_out.m:
compiler/llds_to_x86_64.m:
compiler/loop_inv.m:
compiler/make_hlds_warn.m:
compiler/mark_static_terms.m:
compiler/middle_rec.m:
compiler/ml_tag_switch.m:
compiler/ml_type_gen.m:
compiler/ml_unify_gen.m:
compiler/mode_constraints.m:
compiler/mode_errors.m:
compiler/mode_util.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/pd_cost.m:
compiler/pd_info.m:
compiler/pd_util.m:
compiler/peephole.m:
compiler/polymorphism.m:
compiler/post_term_analysis.m:
compiler/post_typecheck.m:
compiler/purity.m:
compiler/quantification.m:
compiler/rbmm.actual_region_arguments.m:
compiler/rbmm.add_rbmm_goal_infos.m:
compiler/rbmm.condition_renaming.m:
compiler/rbmm.execution_paths.m:
compiler/rbmm.points_to_analysis.m:
compiler/rbmm.region_transformation.m:
compiler/recompilation.usage.m:
compiler/saved_vars.m:
compiler/simplify.m:
compiler/size_prof.m:
compiler/ssdebug.m:
compiler/store_alloc.m:
compiler/stratify.m:
compiler/structure_reuse.direct.choose_reuse.m:
compiler/structure_reuse.indirect.m:
compiler/structure_reuse.lbu.m:
compiler/structure_reuse.lfu.m:
compiler/structure_reuse.versions.m:
compiler/structure_sharing.analysis.m:
compiler/table_gen.m:
compiler/tabling_analysis.m:
compiler/term_constr_build.m:
compiler/term_norm.m:
compiler/term_pass1.m:
compiler/term_traversal.m:
compiler/trailing_analysis.m:
compiler/transform_llds.m:
compiler/tupling.m:
compiler/type_ctor_info.m:
compiler/type_util.m:
compiler/unify_proc.m:
compiler/unique_modes.m:
compiler/unneeded_code.m:
compiler/untupling.m:
compiler/unused_args.m:
compiler/unused_imports.m:
compiler/xml_documentation.m:
Make the changes necessary to conform to the changes above, principally
to handle multi-cons_id arm switches.
compiler/ml_string_switch.m:
Make the changes necessary to conform to the changes above, principally
to handle multi-cons_id arm switches.
Give some predicates better names.
compiler/dependency_graph.m:
Make the changes necessary to conform to the changes above, principally
to handle multi-cons_id arm switches. Change the order of arguments
of some predicates to make this easier.
compiler/bytecode.m:
compiler/bytecode_data.m:
compiler/bytecode_gen.m:
Make the changes necessary to conform to the changes above, principally
to handle multi-cons_id arm switches. (The bytecode interpreter
has not been updated.)
compiler/prog_rep.m:
mdbcomp/program_representation.m:
Change the byte sequence representation of goals to allow switch arms
with more than one cons_id. compiler/prog_rep.m now writes out the
updated representation, while mdbcomp/program_representation.m reads in
the updated representation.
deep_profiler/mdbprof_procrep.m:
Conform to the updated program representation.
tools/binary:
Fix a bug: if the -D option was given, the stage 2 directory wasn't
being initialized.
Abort if users try to give that option more than once.
compiler/Mercury.options:
Work around bug #32 in Mantis.
|
||
|
|
672f77c4ec |
Add a new compiler option, --inform-ite-instead-of-switch.
Estimated hours taken: 20
Branches: main
Add a new compiler option, --inform-ite-instead-of-switch. If this is enabled,
the compiler will generate informational messages about if-then-elses that
it thinks should be converted to switches for the sake of program reliability.
Act on the output generated by this option.
compiler/simplify.m:
Implement the new option.
Fix an old bug that could cause us to generate warnings about code
that was OK in one duplicated copy but not in another (where a switch
arm's code is duplicated due to the case being selected for more than
one cons_id).
compiler/options.m:
Add the new option.
Add a way to test for the bug fix in simplify.
doc/user_guide.texi:
Document the new option.
NEWS:
Mention the new option.
library/*.m:
mdbcomp/*.m:
browser/*.m:
compiler/*.m:
deep_profiler/*.m:
Convert if-then-elses to switches at most of the sites suggested by the
new option. At the remaining sites, switching to switches would have
nontrivial downsides. This typically happens when the switched-on type
has many functors, and we treat one or two specially (e.g. cons/2 in
the cons_id type).
Perform misc cleanups in the vicinity of the if-then-else to switch
conversions.
In a few cases, improve the error messages generated.
compiler/accumulator.m:
compiler/hlds_goal.m:
(Rename and) move insts for particular kinds of goal from
accumulator.m to hlds_goal.m, to allow them to be used in other
modules. Using these insts allowed us to eliminate some if-then-elses
entirely.
compiler/exprn_aux.m:
Instead of fixing some if-then-elses, delete the predicates containing
them, since they aren't used, and (as pointed out by the new option)
would need considerable other fixing if they were ever needed again.
compiler/lp_rational.m:
Add prefixes to the names of the function symbols on some types,
since without those prefixes, it was hard to figure out what type
the switch corresponding to an old if-then-else was switching on.
tests/invalid/reserve_tag.err_exp:
Expect a new, improved error message. |
||
|
|
fa80b9a01a |
Make the parallel conjunction execution mechanism more efficient.
Branches: main Make the parallel conjunction execution mechanism more efficient. 1. Don't allocate sync terms on the heap. Sync terms are now allocated in the stack frame of the procedure call which originates a parallel conjunction. 2. Don't allocate individual sparks on the heap. Sparks are now stored in preallocated, growing arrays using an algorithm that doesn't use locks. 3. Don't have one mutex per sync term. Just use one mutex to protect concurrent accesses to all sync terms (it's is rarely needed anyway). This makes sync terms smaller and saves initialising a mutex for each parallel conjunction encountered. 4. We don't bother to acquire the global sync term lock if we know a parallel conjunction couldn't be executing in parallel. In a highly parallel program, the majority of parallel conjunctions will be executed sequentially so protecting the sync terms from concurrent accesses is unnecessary. par_fib(39) is ~8.4 times faster (user time) on my laptop (Linux 2.6, x86_64), which is ~3.5 as slow as sequential execution. configure.in: Update the configuration for a changed MR_SyncTerm structure. compiler/llds.m: Make the fork instruction take a second argument, which is the base stack slot of the sync term. Rename it to fork_new_child to match the macro name in the runtime. compiler/par_conj_gen.m: Change the generated code for parallel conjunctions to allocate sync terms on the stack and to pass the sync term to fork_new_child. compiler/dupelim.m: compiler/dupproc.m: compiler/exprn_aux.m: compiler/global_data.m: compiler/jumpopt.m: compiler/livemap.m: compiler/llds_out.m: compiler/llds_to_x86_64.m: compiler/middle_rec.m: compiler/opt_debug.m: compiler/opt_util.m: compiler/reassign.m: compiler/use_local_vars.m: Conform to the change in the fork instruction. compiler/liveness.m: compiler/proc_gen.m: Disable use of the parallel conjunction operator in the compiler as older versions of the compiler will generate code incompatible with the new runtime. 
runtime/mercury_context.c: runtime/mercury_context.h: Remove the next pointer field from MR_Spark as it's no longer needed. Remove the mutex from MR_SyncTerm. Add a field to record if a spark belonging to the sync term was scheduled globally, i.e. if the parallel conjunction might be executed in parallel. Define MR_SparkDeque and MR_SparkArray. Use MR_SparkDeques to hold per-context sparks and global sparks. Change the abstract machine instructions MR_init_sync_term, MR_fork_new_child, MR_join_and_continue as per the main change log. Use a preprocessor macro MR_LL_PARALLEL_CONJ as a shorthand for !MR_HIGHLEVEL_CODE && MR_THREAD_SAFE. Take the opportunity to clean things up a bit. runtime/mercury_wsdeque.c: runtime/mercury_wsdeque.h: New files containing an implementation of work-stealing deques. We don't do work stealing yet but we use the underlying data structure. runtime/mercury_atomic.c: runtime/mercury_atomic.h: New files to contain atomic operations. Currently it just contains compare-and-swap for gcc/x86_64, gcc/x86 and gcc-4.1. runtime/Mmakefile: Add the new files. runtime/mercury_engine.h: runtime/mercury_mm_own_stacks.c: runtime/mercury_wrapper.c: Conform to runtime changes. runtime/mercury_conf_param.h: Update an outdated comment. |
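Points 3 and 4 of the change log above (one shared lock for all sync terms, skipped when the conjunction cannot be running in parallel) can be sketched as follows. This is a minimal illustration, not the runtime's code: the `SyncTerm` struct, its field names, the spinlock, and the `lock_acquisitions` counter are all hypothetical stand-ins, and the unsynchronised read of the flag assumes it is set before any other engine can see the term.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for MR_SyncTerm: no per-term mutex, only a
** counter and a flag saying whether any of its sparks went global. */
typedef struct {
    int  remaining;           /* conjuncts that have not yet joined */
    bool scheduled_globally;  /* true if another engine may touch this term */
} SyncTerm;

/* A single lock shared by every sync term (here a simple spinlock). */
static atomic_flag sync_term_lock = ATOMIC_FLAG_INIT;
static int lock_acquisitions = 0;   /* instrumentation for this example only */

static void sync_term_init(SyncTerm *st, int num_conjuncts)
{
    assert(num_conjuncts > 0);
    st->remaining = num_conjuncts;
    st->scheduled_globally = false;
}

/* Record that one conjunct has finished; returns true for the last one. */
static bool sync_term_join(SyncTerm *st)
{
    bool last;

    if (!st->scheduled_globally) {
        /* Only the originating context can see this term: skip the lock. */
        st->remaining--;
        return st->remaining == 0;
    }

    while (atomic_flag_test_and_set(&sync_term_lock)) {
        /* spin until the shared lock is free */
    }
    lock_acquisitions++;
    st->remaining--;
    last = (st->remaining == 0);
    atomic_flag_clear(&sync_term_lock);
    return last;
}
```

With three conjuncts and no globally scheduled spark, the three joins complete without touching the lock. Once the flag is set, every join takes it.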
||
|
|
b48eaf8073 |
Add a first draft of the code generator support for region based memory
Estimated hours taken: 30 Branches: main Add a first draft of the code generator support for region based memory management. It is known to be incomplete; the missing parts are marked by XXXs. It may also be buggy; it will be tested after Quan adds the runtime support, i.e. the C macros invoked by the new LLDS instructions. However, the changes in this diff shouldn't affect non-RBMM operations. compiler/llds.m: Add five new LLDS instructions. Four are specific to RBMM operations. RBMM embeds three new stacks in compiler-reserved temp slots in procedure's usual Mercury stack frames, and the new LLDS instructions respectively (i) push those stack frames onto their respective stacks, (ii) fill some variable parts of those stack frames, (iii) fill fixed slots of those stack frames, and (iv) use the contents of and/or pop those stack frames. (The pushing and popping affect only the new embedded stacks, not the usual Mercury stacks.) The last instruction is a new variant of the old assign instruction. It has identical semantics, but restricts optimization. An assign (a) can be deleted if its target lval is not used, and (b) its target lval can be changed (e.g. to a temp register) as long as all the later instructions referring to that lval are changed to use the new lval instead. Neither is permitted for the new keep_assign instruction. This is required because in an earlier draft we used it to assign to stack variables (parts of the embedded stack frames) that aren't explicitly referred to in later LLDS code, but are nevertheless implicitly referred to by some instructions (specifically iv above). We now use a specialized instruction (iii above) for this (since the macro it invokes can refer to C structure names, this makes it easier to keep the compiler in sync with the runtime system), but given that keep_assign is already implemented, may be useful later and shouldn't cause appreciable slowdown of the compiler, this diff keeps it. 
Extend the type that describes the contents of lvals to allow it to describe
the new kinds of things we can now store in them.
Add types to manage and describe the new embedded stack frames, and some
utility functions. Change some existing utility functions to make all this
more conceptually consistent.
compiler/ite_gen.m:
Surround the code we generate for the condition of if-then-elses with
the code required to ensure that regions that are logically removed in
the condition aren't physically destroyed until we know that the
condition succeeds (since the region may still be needed in the else
branch), and to make sure that if the condition fails, all the memory
allocated since the entry into the condition is reclaimed instantly.
compiler/disj_gen.m:
Surround the code we generate for disjunctions with the code required
to ensure that regions that are logically removed in a disjunct aren't
physically destroyed if a later disjunct needs them, and to make sure
that at entry into a non-first disjunct, all the memory allocated since
the entry into the disjunction is reclaimed instantly.
compiler/commit_gen.m:
compiler/code_info.m:
The protection against destruction offered by a disjunction disappears
when a commit cuts away all later alternatives in that disjunct, so we
must undo that protection. We therefore surround the scope of a commit
goal with goals that achieve that objective.
Add some new utility predicates to code_info. Remove some old utility
functions that are now in llds.m.
compiler/continuation_info.m:
Extend the type that describes the contents of stack slots to allow it to
describe the new kinds of things we can now store in them.
Rename the function symbols of that type to eliminate some ambiguities.
compiler/code_gen.m:
Remember the set of variables live at the start of the goal (before the
pre_goal_update updates it), since the region operations need to know this.
Leave the lookup of AddTrailOps (and now AddRegionOps) to the specific
kinds of goals that need it (the most frequent goals, unify and call,
do not). Make both AddTrailOps and AddRegionOps use a self-explanatory
type instead of a boolean.
compiler/lookup_switch.m:
Conform to the change to AddTrailOps. Fix some misleading variable names.
compiler/options.m:
Add some options to control the number of stack slots needed for
various purposes. These have to correspond to the sizes of some C
structures in the runtime system. Eventually these will be constants,
but it is handy to keep them easily changeable while the C data
structures are still being worked on.
Add an option for optimizing away region ops wherever possible.
The intention is that these should be on all the time, but we will
want to turn them off for benchmarking.
compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/frameopt.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_out.m:
compiler/llds_to_x86_64.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/par_conj_gen.m:
compiler/reassign.m:
compiler/stack_layout.m:
compiler/stdlabel.m:
compiler/trace_gen.m:
compiler/use_local_vars.m:
Conform to the changes above, which mostly means handling the new
LLDS instructions. In some cases, factor out existing common code,
turn if-then-elses into switches, group common cases in switches,
rationalize argument orders or variable names, and/or put code in
execution order.
In reassign.m, fix some old oversights that could (in some unlikely
cases) cause bugs in the generated code.
compiler/pragma_c_gen.m:
Exploit the capabilities of code_info.m.
compiler/prog_type.m:
Add a utility predicate.
|
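The difference between assign and keep_assign described in this commit (an assign to an unused lval may be deleted, a keep_assign may not) can be illustrated with a toy dead-assignment pass. This is only a sketch under simplifying assumptions: the `Instr` type, the integer lval ids, and the straight-line "is it used later" test are hypothetical and far simpler than the real LLDS optimizers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { INSTR_ASSIGN, INSTR_KEEP_ASSIGN, INSTR_USE } InstrKind;

typedef struct {
    InstrKind kind;
    int lval;   /* target of an assignment, or the lval read by a use */
} Instr;

/* Is lval explicitly read by any instruction in instrs[from .. n-1]? */
static bool used_later(const Instr *instrs, size_t n, size_t from, int lval)
{
    for (size_t i = from; i < n; i++) {
        if (instrs[i].kind == INSTR_USE && instrs[i].lval == lval) {
            return true;
        }
    }
    return false;
}

/* Delete plain assigns whose target is never used later, compacting the
** array in place. keep_assign instructions always survive, because their
** targets may be referred to implicitly. Returns the new length. */
static size_t delete_dead_assigns(Instr *instrs, size_t n)
{
    size_t out = 0;

    assert(instrs != NULL);
    for (size_t i = 0; i < n; i++) {
        bool dead = instrs[i].kind == INSTR_ASSIGN
            && !used_later(instrs, n, i + 1, instrs[i].lval);
        if (!dead) {
            instrs[out++] = instrs[i];
        }
    }
    return out;
}
```

In a sequence such as assign(1), keep_assign(2), assign(3), use(3), only the first instruction is removed: lval 2 is never explicitly used either, but the keep_assign protects it.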
||
|
|
d4818a3ca4 |
Modify the code generator so that it recognizes construct_in_region and
Estimated hours taken: 35.
Branch: main.
Modify the code generator so that it recognizes construct_in_region and
generates suitable code when RBMM is used. The main
changes are in unify_gen.m. incr_hp is also changed to take one more
argument, a maybe region.
compiler/unify_gen.m:
Make it aware of HowToConstruct. This is the starting point of the
changes in the code generator so that it can generate code which
constructs terms in regions.
compiler/code_info.m:
compiler/var_locn.m:
Change in accordance with the introduction of how_to_construct in
unify_gen.m.
compiler/llds.m:
Add one extra argument to incr_hp for the region to construct terms
in.
compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_to_x86_64.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/par_conj_gen.m:
compiler/reassign.m:
compiler/use_local_vars.m:
Change to deal with the extra maybe region argument in incr_hp.
compiler/llds_out.m:
Modify so that when RBMM is used it generates a suitable call to
the region runtime for allocating terms in regions. The region
runtime (in C code) will be posted in another email.
compiler/hlds_data.m:
Fix a typo.
compiler/rbmm.interproc_region_lifetime.m:
Change to comply with coding standard.
|
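The extra "maybe region" argument to incr_hp can be pictured as an allocator that takes an optional region. The sketch below is purely illustrative: `Region`, `alloc_cell`, the fixed-size arrays, and the bump-pointer strategy are all assumptions made for the example, and the actual region runtime mentioned in the commit is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

#define AREA_WORDS 64   /* arbitrary capacity for the example */

/* Hypothetical region: a block of words with a bump pointer. */
typedef struct {
    size_t words[AREA_WORDS];
    size_t next;    /* index of the first free word */
} Region;

/* Stand-in for the ordinary heap, used when no region is given. */
static size_t heap[AREA_WORDS];
static size_t heap_next = 0;

/* Allocate num_words words. A NULL maybe_region plays the role of
** "no region argument": the cell then comes from the ordinary heap. */
static size_t *alloc_cell(Region *maybe_region, size_t num_words)
{
    size_t *cell;

    if (maybe_region != NULL) {
        assert(maybe_region->next + num_words <= AREA_WORDS);
        cell = &maybe_region->words[maybe_region->next];
        maybe_region->next += num_words;
    } else {
        assert(heap_next + num_words <= AREA_WORDS);
        cell = &heap[heap_next];
        heap_next += num_words;
    }
    return cell;
}
```

Keeping the region optional lets the same instruction serve both RBMM and non-RBMM code, which matches the commit's description of the other modules merely "dealing with" the extra argument.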
||
|
|
7b7dabb89a |
Extend this optimization to handle temporaries being both defined in
Estimated hours taken: 12
Branches: main
compiler/use_local_vars.m:
Extend this optimization to handle temporaries being both defined in
and used by foreign_proc_code instructions. This should eliminate
unnecessary accesses to the MR_fake_reg array, and thus speed up
programs that use foreign code a lot, including typeclass- and
tabling-intensive programs, since those features are implemented using
inline foreign code. I/O intensive programs should also benefit, but not
much, since the cost of the I/O itself overwhelms the cost of the
MR_fake_reg accesses.
Group together the LLDS instructions that are handled similarly.
Factor out some common code.
compiler/opt_util.m:
Allow for the fact that foreign_proc_codes can now refer to
temporaries.
compiler/opt_debug.m:
Print more useful information about foreign_proc_code components.
compiler/prog_data.m:
Rename the types and function symbols of the recently added
foreign_proc attributes to avoid clashing with the keywords
representing them in source code.
Add a new foreign_proc attribute, proc_may_duplicate that governs
whether the body of foreign code is allowed to be duplicated.
compiler/table_gen.m:
Include does_not_affect_liveness among the annotations for the
foreign_proc calls generated by this module. Some of these procedures
affect memory beyond their arguments, but that memory is in tables,
not in unlisted registers.
Allow some of the smaller code fragments generated by this module
to be duplicated.
compiler/inlining.m:
Respect the may_not_duplicate foreign_proc attribute.
compiler/pragma_c_gen.m:
Transmit any annotations about liveness from the HLDS to the LLDS,
since without does_not_affect_liveness annotations use_local_vars.m
cannot optimize foreign_proc_codes.
Transmit any annotations about may_duplicate from the HLDS to the LLDS,
since with them jumpopt can do a better job.
compiler/llds.m:
Use the new foreign_proc attribute instead of a boolean to represent
whether a foreign code fragment may be duplicated.
compiler/simplify.m:
Generate an error message if a may_duplicate or may_not_duplicate
attribute on a foreign_proc conflicts with a no_inline or inline pragma
(respectively) on the predicate it belongs to.
compiler/hlds_pred.m:
Fix some comment rot.
compiler/jumpopt.m:
compiler/livemap.m:
compiler/proc_gen.m:
compiler/trace_gen.m:
Conform to the changes above.
doc/reference_manual.texi:
Document the new foreign_proc attribute.
library/array.m:
library/builtin.m:
library/char.m:
library/dir.m:
library/float.m:
library/int.m:
library/io.m:
library/lexer.m:
library/math.m:
library/private_builtin.m:
library/string.m:
library/version_array.m:
Add does_not_affect_liveness annotations to the C foreign_procs that
deserve them.
configure.in:
Require the installed compiler to support does_not_affect_liveness.
tests/invalid/test_may_duplicate.{m,err_exp}:
Add a new test case to test the error checking code in simplify.m.
tests/invalid/Mmakefile:
Enable the new test case.
|
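The consistency check added to simplify.m in this commit (may_duplicate against no_inline, may_not_duplicate against inline) amounts to a small truth table. A sketch with hypothetical enum and function names, not the compiler's own types:

```c
#include <stdbool.h>

/* Hypothetical encodings of the foreign_proc attribute and the pragma. */
typedef enum {
    DUP_UNSPECIFIED,
    DUP_MAY_DUPLICATE,
    DUP_MAY_NOT_DUPLICATE
} DupAttr;

typedef enum {
    PRAGMA_NONE,
    PRAGMA_INLINE,
    PRAGMA_NO_INLINE
} InlinePragma;

/* A may_duplicate attribute conflicts with a no_inline pragma, and a
** may_not_duplicate attribute conflicts with an inline pragma. Every
** other combination is accepted. */
static bool dup_attr_conflicts(DupAttr attr, InlinePragma pragma)
{
    return (attr == DUP_MAY_DUPLICATE && pragma == PRAGMA_NO_INLINE)
        || (attr == DUP_MAY_NOT_DUPLICATE && pragma == PRAGMA_INLINE);
}
```

Only the two contradictory pairings are rejected. Leaving either side unspecified never produces an error in this sketch.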
||
|
|
e53d899ba0 |
Add some foreign_proc attributes that Ben needs for his work on the native
Estimated hours taken: 1 Branches: main Add some foreign_proc attributes that Ben needs for his work on the native garbage collector. One group of attributes (allocates_memory) is intended for optimization: they let the compiler figure out whether it needs to emit code to check the amount of available heap space before the foreign_proc. The second group (registers_roots) is intended for catching errors: foreign procs that do not register (or at least do not *assert* that they register) the roots they may hold. compiler/prog_data.m: Define the attributes. Standardize on spelling "does_not" instead of "doesnt". compiler/prog_io_pragma.m: Parse the attributes. compiler/*.m: Conform to the new spelling of attribute names. doc/reference_manual.texi: Add documentation for the attributes. Since they are for developers only at the moment, leave the documentation commented out. |
||
|
|
ba93a52fe7 |
This diff changes a few types from being defined as equivalent to a pair
Estimated hours taken: 10
Branches: main
This diff changes a few types from being defined as equivalent to a pair
to being discriminated union types with their own function symbol. This was
motivated by an error message (one of many, but the one that broke the camel's
back) about "-" being used in an ambiguous manner. It will reduce the number
of such messages in the future, and will make compiler data structures easier
to inspect in the debugger.
The most important type changed by far is hlds_goal, whose function symbol
is now "hlds_goal". Second and third in importance are llds.instruction
(function symbol "llds_instr") and prog_item.m's item_and_context (function
symbol "item_and_context"). There are some others as well.
In several places, I rearranged predicates to factor the deconstruction of
goals into hlds_goal_expr and hlds_goal_info out of each clause into a single
point. In many places, I changed variable names that used "Goal" to refer
to just hlds_goal_exprs to use "GoalExpr" instead. I also changed variable
names that used "Item" to refer to item_and_contexts to use "ItemAndContext"
instead. This should make reading such code less confusing.
I renamed some function symbols and predicates to avoid ambiguities.
I only made one algorithmic change (at least intentionally). In assertion.m,
comparing two goals for equality now ignores goal_infos for all kinds of goals,
whereas previously it ignored them for most kinds of goals, but for shorthand
goals it was insisting on them being equal. This seemed to me to be a bug.
Pete, can you confirm this?
|
||
|
|
d66ed699a1 |
Add fields to structures representing the C code itself that say whether
Estimated hours taken: 4
Branches: main
Add fields to structures representing the C code itself that say whether
or not the C code affects the liveness of lvals. This is intended as the basis
for future improvements in the optimization of such code.
Implement a new foreign_proc attribute that allows programmers to set
the value of this field.
Eliminate names referring to `pragma c_code' in the LLDS backend in favor
of names referring to foreign_procs.
compiler/llds.m:
Make the changes described above.
Consistently put the field containing C code last in the function
symbols that contain them.
compiler/prog_data.m:
Make the changes described above.
Rename some other function symbols to avoid ambiguity.
compiler/prog_io_pragma.m:
Parse the new foreign_proc attribute.
doc/reference_manual.texi:
Document the new attribute.
compiler/pragma_c_gen.m:
Rename the main predicates.
compiler/opt_util.m:
Change some predicates into functions, for more convenient invocation.
compiler/livemap.m:
Rename the predicates in this module to avoid ambiguity and the need
for module qualification.
compiler/*.m:
Conform to the changes above.
|
||
|
|
ecf1ee3117 |
Add a mechanism for growing the stacks on demand by adding new segments
Estimated hours taken: 20
Branches: main
Add a mechanism for growing the stacks on demand by adding new segments
to them. You can ask for the new mechanism via a new grade component, stseg
(short for "stack segments").
The mechanism works by adding a test to each increment of a stack pointer (sp
or maxfr). If the test indicates that we are about to run out of stack,
we allocate a new stack segment, allocate a placeholder frame on the new
segment, and then allocate the frame we wanted in the first place on top
of the placeholder. We also override succip to make it point to code that
will (1) release the new segment when the newly created stack frame returns,
and then (2) go to the place indicated by the original, overridden succip.
For leaf procedures on the det stack, we optimize away the check of the
stack pointer. We can do this because we reserve some space on each stack
for the use of such stack frames.
My intention is that doc/user_guide.texi and NEWS will be updated once we have
used the feature ourselves for a while and it seems to be stable.
runtime/mercury_grade.h:
Add the new grade component.
runtime/mercury_conf_param.h:
Document the new grade component, and the option used to debug stack
segments.
runtime/mercury_context.[ch]:
Add new fields to contexts to hold the list of previous segments of the
det and nondet stacks.
runtime/mercury_memory_zones.[ch]:
Include a threshold in all zones, for use in stack segments.
Set it when a zone is allocated.
Restore the previously #ifdef'd out function MR_unget_zone, for use
when freeing stack segments execution has fallen out of.
runtime/mercury_debug.[ch]:
When printing the offsets of pointers into the det and nondet stacks,
print the number of the segment the pointer points into (unless it is
the first, in which case we suppress this in the interest of brevity
and simplicity).
Make all the functions in this module take a FILE * as an input
argument; don't print to stdout by default.
runtime/mercury_stacks.[ch]: Modify the macros that allocate stack frames to invoke the code for adding new stack segments when we are about to run out of stack. Standardize on "nondet" over "nond" as the abbreviation referring to the nondet stack. Conform to the changes in mercury_debug.c. runtime/mercury_stack_trace.c: When traversing the stack, step over the placeholder stack frames at the bottoms of stack segments. Conform to the changes in mercury_debug.c. runtime/mercury_wrapper.[ch]: Make the default stack size small in grades that support stack segments. Standardize on "nondet" over "nond" as the abbreviation referring to the nondet stack. Conform to the changes in mercury_debug.c. runtime/mercury_memory.c: Standardize on "nondet" over "nond" as the abbreviation referring to the nondet stack. runtime/mercury_engine.[ch]: runtime/mercury_overflow.h: Standardize on "nondet" over "nond" as the abbreviation referring to the nondet stack. Convert these files to four-space indentation. runtime/mercury_minimal_model.c: trace/mercury_trace.c: trace/mercury_trace_util.c: Conform to the changes in mercury_debug.c. compiler/options.m: Add the new grade option for stack segments. compiler/compile_target_code.m: compiler/handle_options.m: Add the new grade component, and handle its exclusions with other grade components and optimizations. compiler/llds.m: Extend the incr_sp instruction to record whether the stack frame is for a leaf procedure. compiler/llds_out.m: Output the extended incr_sp instruction. compiler/proc_gen.m: Fill in the extra slot in incr_sp instructions. compiler/goal_util.m: Provide a predicate for testing whether a procedure body is a leaf. 
compiler/delay_slot.m: compiler/dupelim.m: compiler/dupproc.m: compiler/exprn_aux.m: compiler/frameopt.m: compiler/global_data.m: compiler/jumpopt.m: compiler/middle_rec.m: compiler/opt_debug.m: compiler/opt_util.m: compiler/peephole.m: compiler/reassign.m: compiler/use_local_vars.m: Conform to the change in llds.m. scripts/canonicate_grade.sh-subr: scripts/init_grade_options.sh-subr: scripts/parse_grade_options.sh-subr: scripts/final_grade_options.sh-subr: scripts/mgnuc.in: Handle the new grade component. Convert parse_grade_options.sh-subr to four-space indentation. Mmake.workspace: Fix an old bug that prevented bootcheck from working in the new grade: when computing the gc grade, use the workspace's version of ml (which in this case understands the new grade components), rather than the installed ml (which does not). (This was a devil to track down, because neither make --debug nor strace on make revealed how the installed ml was being invoked, and there was no explicit invocation in the Makefile either; the error message appeared to come out of thin air just before the completion of the stage 2 library. It turned out the invocation happened implicitly, as a result of expanding a make variable.) |
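The stack segment mechanism described in this commit (test on each stack pointer increment, new segment, placeholder frame, then the wanted frame) can be sketched roughly as below. Everything here is a simplified assumption: the `Segment` and `Stack` types, the tiny segment size, and releasing the segment directly in `pop_frame` rather than through an overridden succip are illustrative choices, not the runtime's implementation.

```c
#include <assert.h>
#include <stdlib.h>

#define SEGMENT_WORDS     8   /* tiny, so the example overflows quickly */
#define PLACEHOLDER_WORDS 1   /* size of the placeholder frame */

typedef struct Segment {
    struct Segment *prev;     /* previous segment, or NULL for the first */
    size_t used;              /* words in use: the stack pointer offset */
    long   words[SEGMENT_WORDS];
} Segment;

typedef struct {
    Segment *top;
    int num_segments;
} Stack;

static Segment *new_segment(Segment *prev)
{
    Segment *seg = malloc(sizeof *seg);
    assert(seg != NULL);
    seg->prev = prev;
    seg->used = 0;
    return seg;
}

static void stack_init(Stack *s)
{
    s->top = new_segment(NULL);
    s->num_segments = 1;
}

/* The analogue of incr_sp with the overflow test added. */
static long *push_frame(Stack *s, size_t frame_words)
{
    assert(frame_words + PLACEHOLDER_WORDS <= SEGMENT_WORDS);
    if (s->top->used + frame_words > SEGMENT_WORDS) {
        /* About to run out: add a segment and put a placeholder
        ** frame at its bottom before the frame we actually want. */
        s->top = new_segment(s->top);
        s->num_segments++;
        s->top->used = PLACEHOLDER_WORDS;
    }
    long *frame = &s->top->words[s->top->used];
    s->top->used += frame_words;
    return frame;
}

/* Pop a frame; when only the placeholder is left on a non-first
** segment, release that segment and continue on the previous one. */
static void pop_frame(Stack *s, size_t frame_words)
{
    assert(s->top->used >= frame_words);
    s->top->used -= frame_words;
    if (s->top->prev != NULL && s->top->used == PLACEHOLDER_WORDS) {
        Segment *dead = s->top;
        s->top = dead->prev;
        s->num_segments--;
        free(dead);
    }
}
```

With 8-word segments, pushing two 5-word frames forces the second onto a new segment above a 1-word placeholder. Popping it releases that segment again.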
||
|
|
e21193c283 |
Rename a bunch of predicates and function symbols to eliminate
Estimated hours taken: 6 Branches: main browser/*.m: compiler/*.m: Rename a bunch of predicates and function symbols to eliminate ambiguities. The only real change is factoring out some common code in the mlds and llds code generators, replacing them with single definitions in switch_util.m. |
||
|
|
712027f307 |
This patch changes the parallel execution mechanism in the low level backend.
Estimated hours taken: 100 Branches: main This patch changes the parallel execution mechanism in the low level backend. The main idea is that, even in programs with only moderate parallelism, we won't have enough processors to exploit it all. We should try to reduce the cost in the common case, i.e. when a parallel conjunction gets executed sequentially. This patch does two things along those lines: (1) Instead of unconditionally executing all parallel conjuncts (but the last) in separate Mercury contexts, we allow a context to continue execution of the next conjunct of a parallel conjunction if it has just finished executing the previous conjunct. This saves on allocating unnecessary contexts, which can be a big reduction in memory usage. We also try to execute conjuncts left-to-right so as to minimise the need to suspend contexts when there are dependencies between conjuncts. (2) Conjuncts that *are* executed in parallel still need separate contexts. We used to pass variable bindings to those conjuncts by flushing input variable values to stack slots and copying the procedure's stack frame to the new context. When the conjunct finished, we would copy new variable bindings back to stack slots in the original context. What happens now is that we don't do any copying back and forth. We introduce a new abstract machine register `parent_sp' which points to the location of the stack pointer at the time that a parallel conjunction began. In parallel conjuncts we refer to all stack slots via the `parent_sp' pointer, since we could be running on a different context altogether and `sp' would be pointing into a new detstack. Since parallel conjuncts now share the procedure's stack frame, we have to allocate stack slots such that all parallel conjuncts in a procedure that could be executing simultaneously have distinct sets of stack slots. We currently use the simplest possible strategy, i.e. don't allow variables in parallel conjuncts to reuse stack slots. 
Note: in effect parent_sp is a frame pointer which is only set for and used by the code of parallel conjuncts. We don't call it a frame pointer as it can be confused with "frame variables" which have to do with the nondet stack. compiler/code_info.m: Add functionality to keep track of how deep inside of nested parallel conjunctions the code generator is. Add functionality to acquire and release "persistent" temporary stack slots. Unlike normal temporary stack slots, these don't get implicitly released when the code generator's location-dependent state is reset. Conform to additions of `parent_sp' and parent stack variables. compiler/exprn_aux.m: Generalise the `substitute_lval_in_*' predicates by `transform_lval_in_*' predicates. Instead of performing a fixed substitution, these take a higher order predicate which performs some operation on each lval. Redefine the substitution predicates in terms of the transformation predicates. Conform to changes in `fork', `join_and_terminate' and `join_and_continue' instructions. Conform to additions of `parent_sp' and parent stack variables. Remove `substitute_rval_in_args' and `substitute_rval_in_arg' which were unused. compiler/live_vars.m: Introduce a new type `parallel_stackvars' which is threaded through `build_live_sets_in_goal'. We accumulate the sets of variables which are assigned stack slots in each parallel conjunct. At the end of processing a parallel conjunction, use this information to force variables which are assigned stack slots to have distinct slots. compiler/llds.m: Change the semantics of the `fork' instruction. It now takes a single argument: the label of the next conjunct after the current one. The instruction now "sparks" the next conjunct to be run, either in a different context (possibly in parallel, on another Mercury engine) or is queued to be executed in the current context after the current conjunct is finished. Change the semantics of the `join_and_continue' instruction. 
This instruction now serves to end all parallel conjuncts, not just the last one in a parallel conjunction. Remove the `join_and_terminate' instruction (no longer used). Add the new abstract machine register `parent_sp'. Introduce "parent stack slots", which are similar to normal stack slots but relative to the `parent_sp' register. compiler/par_conj_gen.m: Change the code generated for parallel conjunctions. That is: - use the new `fork' instruction at the beginning of a parallel conjunct; - use the `join_and_continue' instruction at the end of all parallel conjuncts; - keep track of how deep the code generator currently is in parallel conjunctions; - set and restore the `parent_sp' register when entering a non-nested parallel conjunction; - after generating the code of a parallel conjunct, replace all references to stack slots by parent stack slots; - remove code to copy back output variables when a parallel conjunct finishes. Update some comments. runtime/mercury_context.c: runtime/mercury_context.h: Add the type `MR_Spark'. Sparks are allocated on the heap and contain enough information to begin execution of a single parallel conjunct. Add globals `MR_spark_queue_head' and `MR_spark_queue_tail'. These are pointers to the start and end of a global queue of sparks. Idle engines can pick up work from this queue in the same way that they can pick up work from the global context queue (the "run queue"). Add new fields to the MR_Context structure. `MR_ctxt_parent_sp' is a saved copy of the `parent_sp' register for when the context is suspended. `MR_ctxt_spark_stack' is a stack of sparks that we decided not to put on the global spark queue. Update `MR_load_context' and `MR_save_context' to save and restore `MR_ctxt_parent_sp'. Add the counters `MR_num_idle_engines' and `MR_num_outstanding_contexts_and_sparks'. 
These are used to decide, when a `fork' instruction is reached, whether a spark should be put on the global spark queue (with potential for parallelism but also more overhead) or on the calling context's spark stack (no parallelism and less overhead). Rename `MR_init_context' to `MR_init_context_maybe_generator'. When initialising contexts, don't reset redzones of already allocated stacks. It seems to be unnecessary (and the reset implementation is buggy anyway, though it's fine on Linux). Rename `MR_schedule' to `MR_schedule_context'. Add new functions `MR_schedule_spark_globally' and `MR_schedule_spark_locally'. In `MR_do_runnext', add code for idle engines to get work from the global spark queue. Resuming contexts are prioritised over sparks. Rename `MR_fork_new_context' to `MR_fork_new_child'. Change the definitions of `MR_fork_new_child' and `MR_join_and_continue' as per the new behaviour of the `fork' and `join_and_continue' instructions. Delete `MR_join_and_terminate'. Add a new field `MR_st_orig_context' to the MR_SyncTerm structure to record which context originated the parallel conjunction instance represented by a MR_SyncTerm instance, and update `MR_init_sync_term'. This is needed by the new behaviour of `MR_join_and_continue'. Update some comments. runtime/mercury_engine.h: runtime/mercury_regs.c: runtime/mercury_regs.h: runtime/mercury_stacks.h: Add the abstract machine register `parent_sp' and code to copy it to and from the fake_reg array. Add a macro `MR_parent_sv' to access stack slots via `parent_sp'. Add `MR_eng_parent_sp' to the MercuryEngine structure. runtime/mercury_wrapper.c: runtime/mercury_wrapper.h: Add Mercury runtime option `--max-contexts-per-thread' which is saved in the global variable `MR_max_contexts_per_thread'. The number `MR_max_outstanding_contexts' is derived from this. It sets a soft limit on the number of sparks we put in the global spark queue, relative to the number of threads we are running. 
We don't want to put too many sparks on the global queue if there are
plenty of ready contexts or sparks already on the global queues, as they
are likely to result in new contexts being allocated.
When initially creating worker engines, wait until all the worker
engines have acknowledged that they are idle before continuing. This is
mainly so programs (especially benchmarks and test cases) with only a
few fork instructions near the beginning of the program don't execute
the forks before any worker engines are ready, resulting in no
parallelism.
runtime/mercury_engine.c:
runtime/mercury_thread.c:
Don't allocate a context at the time a Mercury engine is created. An
engine only needs a new context when it is about to pick up a spark.
configure.in:
compiler/options.m:
scripts/Mercury.config.in:
Update to reflect the extra field in MR_SyncTerm.
Add the option `--sync-term-size' and actually make use of the result
of the sync term size calculated during configuration.
compiler/code_util.m:
compiler/continuation_info.m:
compiler/dupelim.m:
compiler/dupproc.m:
compiler/global_data.m:
compiler/hlds_llds.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_out.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/reassign.m:
compiler/stack_layout.m:
compiler/use_local_vars.m:
compiler/var_locn.m:
Conform to changes in `fork', `join_and_terminate' and
`join_and_continue' instructions.
Conform to additions of `parent_sp' and parent stack variables.
XXX not sure about the changes in stack_layout.m
library/par_builtin.m:
Conform to changes in the runtime system.
|
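The commit says the idle-engine and outstanding-work counters decide whether a spark goes on the global queue or stays on the calling context's spark stack, but it does not spell out the exact test. The sketch below shows one plausible form of that decision. The condition itself, the `SchedulerState` struct, and the function name are assumptions made for illustration, not the runtime's code.

```c
/* Hypothetical snapshot of the counters named in the commit message. */
typedef struct {
    int num_idle_engines;
    int num_outstanding_contexts_and_sparks;
    int max_outstanding_contexts;   /* soft limit */
} SchedulerState;

typedef enum { SPARK_LOCAL, SPARK_GLOBAL } SparkPlacement;

/* Assumed policy: go global only if some engine is idle and the soft
** limit on outstanding work has not been reached. Otherwise keep the
** spark local, which costs less but offers no parallelism. */
static SparkPlacement choose_spark_placement(const SchedulerState *s)
{
    if (s->num_idle_engines > 0
        && s->num_outstanding_contexts_and_sparks
            < s->max_outstanding_contexts)
    {
        return SPARK_GLOBAL;
    }
    return SPARK_LOCAL;
}
```

Under this assumed policy, a program running on fully busy engines pays only the cheaper local cost at each fork, which is the common case the patch aims to make fast.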
||
|
|
4924dfb1c9 |
One of Hans Boehm's papers says that heap cells allocated by GC_MALLOC_ATOMIC
Estimated hours taken: 5
Branches: main
One of Hans Boehm's papers says that heap cells allocated by GC_MALLOC_ATOMIC
are grouped together into pages, and these pages aren't scanned during the
sweep phase of the garbage collector. I therefore modified the compiler
to use GC_MALLOC_ATOMIC instead of GC_MALLOC wherever possible, i.e. when
the cell being allocated is guaranteed not to have any pointer to GCable
memory inside it.
My first benchmarking run showed a speedup of 4.5% in asm_fast.gc:
EXTRA_MCFLAGS = --use-atomic-cells mercury_compile.01 average of 6 with ignore=1 18.30
EXTRA_MCFLAGS = --no-use-atomic-cells mercury_compile.02 average of 6 with ignore=1 19.17
However, later benchmarks, after the upgrade to version 7.0 of boehm_gc,
show a less favourable and more mixed picture, with e.g. a 4% speedup
in hlc.gc at -O3, a 3% slowdown in asm_fast.gc at -O4, and little effect
otherwise:
EXTRA_MCFLAGS = -O1 --use-atomic-cells GRADE = asm_fast.gc mercury_compile.01 average of 6 with ignore=1 23.30
EXTRA_MCFLAGS = -O1 --no-use-atomic-cells GRADE = asm_fast.gc mercury_compile.02 average of 6 with ignore=1 23.28
EXTRA_MCFLAGS = -O2 --use-atomic-cells GRADE = asm_fast.gc mercury_compile.03 average of 6 with ignore=1 18.51
EXTRA_MCFLAGS = -O2 --no-use-atomic-cells GRADE = asm_fast.gc mercury_compile.04 average of 6 with ignore=1 18.66
EXTRA_MCFLAGS = -O3 --use-atomic-cells GRADE = asm_fast.gc mercury_compile.05 average of 6 with ignore=1 18.44
EXTRA_MCFLAGS = -O3 --no-use-atomic-cells GRADE = asm_fast.gc mercury_compile.06 average of 6 with ignore=1 18.48
EXTRA_MCFLAGS = -O4 --use-atomic-cells GRADE = asm_fast.gc mercury_compile.07 average of 6 with ignore=1 18.28
EXTRA_MCFLAGS = -O4 --no-use-atomic-cells GRADE = asm_fast.gc mercury_compile.08 average of 6 with ignore=1 17.70
EXTRA_MCFLAGS = -O1 --use-atomic-cells GRADE = hlc.gc mercury_compile.09 average of 6 with ignore=1 24.78
EXTRA_MCFLAGS = -O1 --no-use-atomic-cells GRADE = hlc.gc mercury_compile.10 average of 6 with ignore=1 24.69
EXTRA_MCFLAGS = -O2 --use-atomic-cells GRADE = hlc.gc mercury_compile.11 average of 6 with ignore=1 19.36
EXTRA_MCFLAGS = -O2 --no-use-atomic-cells GRADE = hlc.gc mercury_compile.12 average of 6 with ignore=1 19.26
EXTRA_MCFLAGS = -O3 --use-atomic-cells GRADE = hlc.gc mercury_compile.13 average of 6 with ignore=1 18.64
EXTRA_MCFLAGS = -O3 --no-use-atomic-cells GRADE = hlc.gc mercury_compile.14 average of 6 with ignore=1 19.38
EXTRA_MCFLAGS = -O4 --use-atomic-cells GRADE = hlc.gc mercury_compile.15 average of 6 with ignore=1 19.39
EXTRA_MCFLAGS = -O4 --no-use-atomic-cells GRADE = hlc.gc mercury_compile.16 average of 6 with ignore=1 19.41
runtime/mercury_heap.h:
Define atomic equivalents of the few heap allocation macros
that didn't already have one. These macros are used by the LLDS backend.
runtime/mercury.h:
Define an atomic equivalent of the MR_new_object macro.
These macros are used by the MLDS backend.
Use MR_new_object_atomic instead of MR_new_object to box floats.
compiler/hlds_data.m:
compiler/llds.m:
compiler/mlds.m:
Modify the representations of the heap allocations constructs
to include a flag that says whether we should use the atomic variants
of the heap allocation macros.
compiler/llds_out.m:
compiler/mlds_to_c.m:
Respect this extra flag when emitting C code.
In mlds_to_c.m, also add some white space that makes the code easier
for humans to read.
compiler/type_util.m:
Add a mechanism for finding out whether we can put a value of a given
type into an atomic cell.
Put the definitions of functions and predicates in this module
in the same order as their declarations.
Turn some predicates into functions. Change the argument order of
some predicates to conform to our usual conventions.
compiler/unify_gen.m:
compiler/ml_unify_gen.m:
Use the new mechanism in type_util.m to generate code that creates
atomic heap cells if this is possible and is requested.
compiler/code_info.m: compiler/var_locn.m: Act on the information provided by unify_gen.m. compiler/options.m: doc/user_guide.texi: Add an option to control whether the compiler should try to use atomic cells. compiler/dupelim.m: compiler/dupproc.m: compiler/exprn_aux.m: compiler/higher_order.m: compiler/jumpopt.m: compiler/livemap.m: compiler/middle_rec.m: compiler/ml_code_util.m: compiler/ml_elim_nested.m: compiler/ml_optimize.m: compiler/ml_util.m: compiler/mlds_to_gcc.m: compiler/mlds_to_il.m: compiler/mlds_to_java.m: compiler/modecheck_unify.m: compiler/opt_debug.m: compiler/opt_util.m: compiler/par_conj_gen.m: compiler/polymorphism.m: compiler/reassign.m: compiler/size_prof.m: compiler/structure_sharing.domain.m: compiler/use_local_vars.m: Minor diffs to conform to the changes above. compiler/structure_reuse.direct.choose_reuse.m: Add an XXX comment about the interaction of the new capability with structure reuse. |
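The condition for using an atomic cell (no field may hold a pointer to GCable memory) can be pictured with a small C sketch. The field classification and the function name below are hypothetical and are not taken from type_util.m; the sketch only shows the shape of the test, not how the compiler classifies Mercury types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a cell's fields; not the compiler's own. */
enum field_kind {
    FIELD_INT,      /* integer or enum: never a pointer */
    FIELD_FLOAT,    /* unboxed float: never a pointer */
    FIELD_POINTER   /* may point to GC-managed memory */
};

/*
** Return true if a cell with these fields could be allocated with
** GC_MALLOC_ATOMIC, i.e. if no field may hold a pointer to GCable memory.
*/
bool
cell_may_be_atomic(const enum field_kind *fields, size_t num_fields)
{
    assert(fields != NULL || num_fields == 0);
    for (size_t i = 0; i < num_fields; i++) {
        if (fields[i] == FIELD_POINTER) {
            return false;
        }
    }
    return true;
}
```

Under this sketch, a cell holding only a boxed float qualifies, while a list cell (whose tail field is a pointer) does not.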
||
|
|
aeeedd2c13 |
Standardize formatting of comments at the beginning of modules.
compiler/*.m: Standardize formatting of comments at the beginning of modules. |
||
|
|
469f1dc09b |
This diff contains no algorithmic changes.
Estimated hours taken: 1.5
Branches: main
This diff contains no algorithmic changes.
compiler/llds.m:
compiler/mlds.m:
Rename some function symbols and field names to avoid ambiguities
with respect to language keywords.
compiler/*.m:
Conform to the changes in llds.m and mlds.m.
|
||
|
|
d5d5986472 |
Implement lookup switches in which a switch arm may contain more than one
Estimated hours taken: 40
Branches: main
Implement lookup switches in which a switch arm may contain more than one
solution, such as this code here:
p(d, "four", f1, 4.4).
p(e, "five", f2, 5.5).
p(e, "five2", f3(5), 55.5).
p(f, "six", f4("hex"), 6.6).
p(g, "seven", f5(77.7), 7.7).
p(g, "seven2", f1, 777.7).
p(g, "seven3", f2, 7777.7).
Such code occurs frequently in benchmark programs used to evaluate the
performance of tabled logic programming systems.
Change frameopt.m, which previously worked only on det and semidet code,
to also work for nondet code. For predicates such as the one above, frameopt
can now arrange for the predicate's nondet stack frame to be created only
when a switch arm that has more than one solution is selected.
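As a rough illustration of the idea, the facts above can be viewed as a table of solution rows plus, for each switched-on constant, the range of rows that belong to it. The C sketch below is an assumption made for illustration: the struct layout, the function names, and the restriction to the keys d to g and to the first two arguments are not taken from lookup_switch.m.

```c
#include <assert.h>
#include <stddef.h>

/* One row per solution of p/4; only the first two arguments are shown. */
struct solution_row {
    char        key;    /* the switched-on constant: 'd' .. 'g' */
    const char  *name;  /* the second argument of p/4 */
};

static const struct solution_row rows[] = {
    { 'd', "four" },
    { 'e', "five" }, { 'e', "five2" },
    { 'f', "six" },
    { 'g', "seven" }, { 'g', "seven2" }, { 'g', "seven3" }
};

/* For each key, the index of its first row and its number of rows. */
struct arm_range { size_t first; size_t count; };

static const struct arm_range arms[] = {
    { 0, 1 },   /* d */
    { 1, 2 },   /* e */
    { 3, 1 },   /* f */
    { 4, 3 }    /* g */
};

/* Return the range of solution rows for key, which must be in 'd'..'g'. */
struct arm_range
lookup_arm(char key)
{
    assert(key >= 'd' && key <= 'g');
    return arms[key - 'd'];
}

/* Return the second argument of the first solution for key. */
const char *
first_name(char key)
{
    return rows[lookup_arm(key).first].name;
}

/* A nondet stack frame is needed only to backtrack into later rows. */
int
arm_needs_nondet_frame(char key)
{
    return lookup_arm(key).count > 1;
}
```

In this picture, the arms for d and f never need a nondet stack frame, which is the case the frameopt change is meant to exploit.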
compiler/lookup_switch.m:
Extend the existing code for recognizing and implementing lookup
switches to recognize and implement them even if they are model_non.
compiler/lookup_util.m:
New module containing utility predicates useful for implementing
both lookup switches, and in the future, lookup disjunctions (i.e.
disjunctions that correspond to a nondet arm of a lookup switch).
compiler/ll_backend.m:
Include the new module.
compiler/notes/compiler_design.html:
Mention the new module.
compiler/global_data.m:
Move the job of filling in dummy slots to our caller, in this case
lookup_switch.m.
compiler/frameopt.m:
Generalize the existing code for delaying stack frame creation,
which worked only on predicates that live on the det stack, to work
also on predicates that live on the nondet stack. Without this,
predicates whose bodies are model_non lookup switches would create
a nondet stack frame before the switch is ever entered, which
is wasteful if the selected switch arm has at most one solution.
Since the structure of model_non predicates is more complex (you can
cause a branch to a label by storing its address in a redoip slot,
you can succeed from the frame without removing the frame), this
required considerable extra work. To make the new code debuggable,
record, for each basic block that needs a stack frame, *why* it
needs that stack frame.
compiler/opt_util.m:
Be more conservative about what refers to the stack. Export some
previously internal functionality for frameopt. Turn some predicates
into functions, and rename them to better reflect their purpose.
compiler/opt_debug.m:
Print much more information about pragma_c and call LLDS instructions.
compiler/prog_data.m:
Add an extra attribute to foreign_procs that says that the code
of the foreign_proc assumes the existence of a stack frame.
This is needed to avoid frameopt optimizing the stack frame away.
compiler/add_pragma.m:
When processing fact tables, we create foreign_procs that assume
the existence of the stack frame, so set the new attribute.
compiler/pragma_c_gen.m:
When processing foreign_procs, transmit the information in the
attribute to the generated LLDS code.
compiler/llds.m:
Rename the function symbols referring to the fixed slots in nondet
stack frames to make them clearer and to avoid overloading function
symbols such as curfr and succip.
Rename the function symbols of the call_model type to avoid overloading
the function symbols of the code_model type.
Add a new field to the c_procedure type giving the code_model of the
procedure, and give names to all the fields.
Describe the stack slots used by lookup switches to the debugger
and native gc.
compiler/options.m:
doc/user_guide.texi:
Add a new option, --debug-opt-pred-name, that does what the existing
--debug-opt-pred-id option does, but taking a user-friendly predicate
name rather than a pred_id as its argument.
compiler/handle_options.m:
Process --debug-opt-pred-name, and make --frameopt-comments imply
--auto-comments, since it is not useful without it.
Reformat some existing comments that were written in the days of
8-space indentation.
compiler/optimize.m:
Implement the new option.
Use the new field of the c_procedure type to try only the version
of frameopt appropriate for the code model of the current procedure.
Do a peephole pass after frameopt, since frameopt can generate code
sequences that peephole can optimize.
Make the mechanism for recording the process of optimizing procedure
bodies more easily usable by including the name of the optimization
that created a given version of the code in the name of the file
that contains that version of the code, and ensuring that all numbers
are two characters long, so that "vi procname*.opt*" looks at the
relevant files in the proper chronological sequence, instead of having
version 11 appear before version 2.
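The effect of two-character numbers on file ordering can be checked with a short C sketch. The "procname.optNN.passname" layout used here is an assumption for illustration only; the commit message does not spell out the actual file name format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
** Format a dump file name with a two-digit version number.
** The "procname.optNN.passname" layout is a hypothetical example.
*/
void
dump_file_name(char *buf, size_t buf_size, const char *proc_name,
    int version, const char *pass_name)
{
    assert(version >= 0 && version <= 99);
    snprintf(buf, buf_size, "%s.opt%02d.%s", proc_name, version, pass_name);
}

/* Return nonzero if a sorts before b in byte order (as in the C locale). */
int
name_sorts_before(const char *a, const char *b)
{
    return strcmp(a, b) < 0;
}
```

With zero padding, version 2 ("opt02") sorts before version 11 ("opt11"); without it, "opt2" would sort after "opt11" because '2' compares greater than '1'.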
compiler/peephole.m:
Add a new optimization pattern: a "mkframe, goto fail" pair (which
can be generated by frameopt) should be replaced by a simple "goto
redo".
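A minimal sketch of such a peephole rewrite, over a toy instruction set, might look like the following. The enum and the function name are hypothetical; real LLDS instructions carry operands and labels, and peephole.m is written in Mercury, not C.

```c
#include <assert.h>
#include <stddef.h>

/* A toy instruction set; the real LLDS instructions carry operands. */
enum instr { MKFRAME, GOTO_FAIL, GOTO_REDO, OTHER };

/*
** Rewrite every adjacent "MKFRAME, GOTO_FAIL" pair into a single GOTO_REDO,
** compacting the array in place. Returns the new instruction count.
*/
size_t
peephole_mkframe_fail(enum instr *code, size_t count)
{
    size_t out = 0;

    assert(code != NULL || count == 0);
    for (size_t in = 0; in < count; in++) {
        if (code[in] == MKFRAME && in + 1 < count
            && code[in + 1] == GOTO_FAIL)
        {
            code[out++] = GOTO_REDO;
            in++;   /* skip the GOTO_FAIL we just consumed */
        } else {
            code[out++] = code[in];
        }
    }
    return out;
}
```

A MKFRAME that is not immediately followed by GOTO_FAIL is left alone, so the rewrite only fires on the exact pair described above.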
compiler/code_gen.m:
Factor out some common code.
compiler/llds_out.m:
Ensure that C comments nested inside comment(_) LLDS instructions
aren't emitted as nested C comments, since C compilers cannot handle
these.
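One way to keep embedded comment delimiters from nesting is to break them up before the text is written out. The C sketch below inserts a space inside each delimiter; this is an assumed technique chosen for illustration, and the commit message does not say which approach llds_out.m actually takes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
** Copy text into buf so that it can sit inside a C comment: a space is
** inserted inside every comment opener and closer found in the text.
** Returns 0 on success, -1 if buf is too small.
*/
int
sanitize_comment_text(const char *text, char *buf, size_t buf_size)
{
    size_t out = 0;

    assert(text != NULL && buf != NULL);
    for (size_t in = 0; text[in] != '\0'; in++) {
        int splits_delim =
            (text[in] == '/' && text[in + 1] == '*') ||
            (text[in] == '*' && text[in + 1] == '/');

        /* Leave room for the characters we add plus the final NUL. */
        if (out + (splits_delim ? 2 : 1) >= buf_size) {
            return -1;
        }
        buf[out++] = text[in];
        if (splits_delim) {
            buf[out++] = ' ';
        }
    }
    buf[out] = '\0';
    return 0;
}
```

The sanitized text can then be wrapped in a single pair of comment delimiters without the inner text closing the comment early.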
compiler/code_info.m:
compiler/code_util.m:
compiler/continuation_info.m:
compiler/dupelim.m:
compiler/exprn_aux.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_out.m:
compiler/mercury_compile.m:
compiler/middle_rec.m:
compiler/ml_code_gen.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/peephole.m:
compiler/stack_layout.m:
compiler/transform_llds.m:
compiler/var_locn.m:
Conform to the change to prog_data.m, opt_util.m and/or llds.m.
compiler/handle_options.m:
Don't execute the code in stdlabel.m if doing so would cause a compiler
abort.
tests/hard_coded/dense_lookup_switch_non.{m,exp}:
New test case to exercise the new algorithm.
tests/hard_coded/Mmakefile:
Enable the new test case.
tests/hard_coded/cycles.m:
Make this test case conform to our coding convention.
|
||
|
|
483e861c12 |
Fix the same bug in all these three places: treat the rval arguments
Estimated hours taken: 0.5
Branches: main, release
compiler/livemap.m:
compiler/middle_rec.m:
compiler/var_locn.m:
Fix the same bug in all three of these places: treat the rval arguments
of mem_ref terms as rvals, instead of as the constants they used to be.
|
||
|
|
459847a064 |
Move the univ, maybe, pair and unit types from std_util into their own
Estimated hours taken: 18
Branches: main
Move the univ, maybe, pair and unit types from std_util into their own
modules. std_util still contains the general purpose higher-order programming
constructs.
library/std_util.m:
Move univ, maybe, pair and unit (plus any other related types
and procedures) into their own modules.
library/maybe.m:
New module. This contains the maybe and maybe_error types and
the associated procedures.
library/pair.m:
New module. This contains the pair type and associated procedures.
library/unit.m:
New module. This contains the types unit/0 and unit/1.
library/univ.m:
New module. This contains the univ type and associated procedures.
library/library.m:
Add the new modules.
library/private_builtin.m:
Update the declaration of the type_ctor_info struct for univ.
runtime/mercury.h:
Update the declaration for the type_ctor_info struct for univ.
runtime/mercury_mcpp.h:
runtime/mercury_hlc_types.h:
Update the definition of MR_Univ.
runtime/mercury_init.h:
Fix a comment: ML_type_name is now exported from type_desc.m.
compiler/mlds_to_il.m:
Update the name of the module that defines univs (which are handled
specially by the il code generator).
library/*.m:
compiler/*.m:
browser/*.m:
mdbcomp/*.m:
profiler/*.m:
deep_profiler/*.m:
Conform to the above changes. Import the new modules where they
are needed; don't import std_util where it isn't needed.
Fix formatting in lots of modules. Delete duplicate module imports.
tests/*:
Update the test suite to conform to the above changes.
|
||
|
|
be5b71861b |
Convert almost all the compiler modules to use . instead of __ as
Estimated hours taken: 6
Branches: main
compiler/*.m:
Convert almost all the compiler modules to use . instead of __ as
the module qualifier.
In some cases, change the names of predicates and types to make them
meaningful without the module qualifier. In particular, most of the
types that used to be referred to with an "mlds__" prefix have been
changed to have a "mlds_" prefix instead of changing the prefix to
"mlds.".
There are no algorithmic changes.
|
||
|
|
4d377aba26 |
Fix a bug reported by Greg Duck. The bug was that the compiler always generated
Estimated hours taken: 5
Branches: main
Fix a bug reported by Greg Duck. The bug was that the compiler always generated
a global variable of type MR_Word for each mutable, but could then assign
variables of other types to and from this global. If this other type is
float, the assignments discard the fractional part. If this other type
is something else, the result can be even worse.
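The loss of the fractional part can be reproduced in plain C. In the sketch below, Word is a stand-in for MR_Word and the two function names are hypothetical; the second function shows the boxing alternative, in which the word holds a pointer to a heap cell that contains the float.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef intptr_t Word;      /* stand-in for MR_Word; an assumption */

static Word global_mutable; /* the word-typed global variable */

/* Buggy: a C assignment converts the float to an integer. */
double
roundtrip_by_assignment(double value)
{
    global_mutable = (Word) value;
    return (double) global_mutable;
}

/* Boxed: the word holds a pointer to a heap cell containing the float. */
double
roundtrip_by_boxing(double value)
{
    double *cell = malloc(sizeof *cell);
    double result;

    assert(cell != NULL);
    *cell = value;
    global_mutable = (Word) cell;
    result = *(double *) global_mutable;
    free(cell);
    return result;
}
```

Passing 4.4 through the first function yields 4.0, while the boxed version returns 4.4 at the cost of a heap allocation.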
There are two ways to fix this discrepancy. One is to change the type of the
global, and the other is to change the type of the variables it is assigned
to and from. The former looks cleaner, but it would mean that every call
to the get predicate would require a boxing operation, and that can be
relatively slow, since (e.g. for floats) it may require allocating a heap cell.
This diff implements both solutions.
We use the second solution on the LLDS backend because of its better
performance. We have to use the first solution on the MLDS backend,
because on that backend the type of the mutable variable is reflected
in the signature of the getter and setter predicates (whereas on the
LLDS backend all arguments are always MR_Words).
compiler/options.m:
Add an internal-only option that controls whether we use the first
solution or the second.
compiler/handle_options.m:
Make the MLDS backend imply the first solution.
compiler/prog_data.m:
For each argument of a foreign_proc item, record whether we want to
keep it boxed in the foreign code.
Add a foreign_proc attribute that asks for every arg to be kept boxed.
We can attach this to the mutable implementation foreign_procs we write
out to .opt files. This attribute is deliberately undocumented since
users should never use it.
compiler/make_hlds_passes.m:
For each argument of the get and set foreign_procs we create for
mutables, record that we do want to keep it boxed.
Move the action of creating the foreign code for the mutable's
declaration and definition to the third pass, since during the second
pass we don't necessarily know yet what its foreign type is (we may not
have processed a foreign_type declaration affecting it). Move the code
for creating the foreign code here from prog_mutable, since it depends
on the HLDS (and prog_mutable.m is in the parse_tree package).
Hoist some error handling code to put it where it belongs,
and to avoid some errors being reported twice.
compiler/hlds_goal.m:
For each argument of a foreign_proc goal, record whether we want to
keep it boxed in the foreign code.
compiler/llds_out.m:
compiler/pragma_c_gen.m:
compiler/ml_code_gen.m:
compiler/ml_call_gen.m:
If a foreign_proc argument is noted as being kept boxed in the
foreign_proc code, then keep it that way.
compiler/prog_io_pragma.m:
Parse the new foreign_proc annotation.
compiler/simplify.m:
If a foreign_proc has the always_boxed annotation, attach this info
to each of its args. We do this here because simplify is guaranteed
to be executed before all the code that may inspect these arguments.
Since nothing ever deletes an always_boxed annotation of a foreign_proc
arg, the code that attaches the annotation is idempotent, so the fact
that the compiler executes simplify twice is not a problem.
compiler/*.m:
Minor changes to conform to the changes in data structures above.
compiler/prog_type.m:
Move a function definition from prog_mutable to prog_type, and
fix the lack of abstraction in its code.
compiler/prog_mutable.m:
Delete the code moved to make_hlds_passes.m and prog_type.m.
compiler/notes/compiler_design.html:
Make the documentation of prog_mutable.m easier to read in source.
tests/hard_coded/float_gv.{m,exp}:
An extended version of Greg's code as a new test case.
tests/hard_coded/Mmakefile:
Enable the new test case.
tests/hard_coded/sub-modules/non_word_mutable.{m,exp}:
tests/hard_coded/sub-modules/non_word_mutable.child.m:
A version of the float_gv test case in which the compiler-generated
get and set foreign_procs should end up in .opt files.
tests/hard_coded/sub-modules/Mmakefile:
tests/hard_coded/sub-modules/Mercury_options:
Enable the new test case, and make it execute with intermodule
optimization.
tests/invalid/bad_mutable.err_exp:
Expect the new output (in which an error is reported just once,
not twice).
|
||
|
|
45fdb6c451 |
Use expect/3 in place of require/2 throughout most of the
Estimated hours taken: 4
Branches: main
compiler/*.m:
Use expect/3 in place of require/2 throughout most of the
compiler.
Use unexpected/2 (or sorry/2) in place of error/1 in more
places.
Fix more dodgy assertion error messages.
s/map(prog_var, mer_type)/vartypes/ where the latter is meant.
|
||
|
|
f9fe8dcf61 |
Improve the error messages generated for determinism errors involving committed
Estimated hours taken: 8
Branches: main
Improve the error messages generated for determinism errors involving committed
choice contexts. Previously, we printed a message to the effect that e.g.
a cc pred is called in a context that requires all solutions, but we didn't say
*why* the context requires all solutions. We now keep track of all the goals
to the right that could fail, since it is these goals that may reject the first
solution of a committed choice goal.
The motivation for this diff was the fact that I found that locating the
failing goal can be very difficult if the conjunction to the right is
a couple of hundred lines long. This would have been a nontrivial problem,
since (a) unifications involving values of user-defined types are committed
choice goals, and (b) we can expect uses of user-defined types to increase.
compiler/det_analysis.m:
Keep track of goals to the right of the current goal that could fail,
and include them in the error representation if required.
compiler/det_report.m:
Include the list of failing goals to the right in the representations
of determinism errors involving committed choice goals.
Convert the last part of this module that wasn't using error_util
to use error_util. Make most parts of this module just construct
error message specifications; print those specifications (using
error_util) in only a few places.
compiler/hlds_out.m:
Add a function for use by the new code in det_report.m.
compiler/error_util.m:
Add a function for use by the new code in det_report.m.
compiler/error_util.m:
compiler/compiler_util.m:
Error_util is still changing reasonably often, and yet it is
included in lots of modules, most of which need only a few simple
non-parse-tree-related predicates from it (e.g. unexpected).
Move those predicates to a new module, compiler_util.m. This also
eliminates some undesirable dependencies from libs to parse_tree.
compiler/libs.m:
Include compiler_util.m.
compiler/notes/compiler_design.html:
Document compiler_util.m, and fix the documentation of some other
modules.
compiler/*.m:
Import compiler_util instead of or in addition to error_util.
To make this easier, consistently use . instead of __ for module
qualifying module names.
tests/invalid/det_errors_cc.{m,err_exp}:
Add this new test case to test the error messages for cc contexts.
tests/invalid/det_errors_deet.{m,err_exp}:
Add this new test case to test the error messages for unifications
inside function symbols.
tests/invalid/Mmakefile:
Add the new test cases.
tests/invalid/det_errors.err_exp:
tests/invalid/magicbox.err_exp:
Change the expected output to conform to the change in det_report.m,
which is now more consistent.
|
||
|
|
d609181cb9 |
Consider types of the form
Estimated hours taken: 30
Branches: main
Consider types of the form
:- type x ---> f.
to be dummy types, since they contain no information. Optimize them the same
way we currently optimize io.state and store.store.
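The claim that such a type carries no information follows from a simple count: telling apart n argument-free function symbols takes the smallest number of bits b with 2^b >= n, which is zero when n is 1. A small C sketch of that arithmetic (the function name is hypothetical):

```c
#include <assert.h>

/*
** Number of bits needed to tell apart num_constants argument-free
** function symbols: the smallest b with 2^b >= num_constants.
*/
unsigned
bits_to_distinguish(unsigned num_constants)
{
    unsigned bits = 0;

    assert(num_constants >= 1 && num_constants <= (1u << 31));
    while ((1u << bits) < num_constants) {
        bits++;
    }
    return bits;
}
```

For the type x above, which has the single function symbol f, the result is zero bits, so no storage is needed for its values.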
runtime/mercury_type_info.h:
Add a new type_ctor_rep for dummy types.
runtime/mercury_tabling.h:
Add a representation for "tabled" dummy types, which don't actually
have a level in the trie, so that the runtime system can handle that
fact.
runtime/mercury_ml_expand_body.h:
When deconstructing a value of a dummy type, ignore the actual value
(since it will contain garbage) and instead return the only possible
value of the type.
runtime/mercury_construct.c:
runtime/mercury_deconstruct.c:
runtime/mercury_deep_copy_body.c:
runtime/mercury_tabling.c:
runtime/mercury_unify_compare_body.h:
library/rtti_implementation.m:
Handle the type_ctor_rep of dummy types.
runtime/mercury_builtin_types.c:
Provide a place to record profiling information about unifications and
comparisons for dummy types.
runtime/mercury_mcpp.h:
java/runtime/TypeCtorRep.java:
library/private_builtin.m:
Add a new type_ctor_rep for dummy types, and fix some previous
discrepancies in type_ctor_reps.
mdbcomp/prim_data.m:
Move a bunch of predicates for manipulating special_pred_ids here from
the browser and compiler directories.
Rename the function symbols of the special_pred_id type to avoid the
need to parenthesize the old `initialise' function symbol.
Convert to four-space indentation.
mdbcomp/rtti_access.m:
Don't hardcode the names of special preds: use the predicates in
prim_data.m.
Convert to four-space indentation.
browser/declarative_execution.m:
Delete some predicates whose functionality is now in
mdbcomp/prim_data.m.
compiler/hlds_data.m:
Replace the part of du type that says whether a type is an enum, which
used to be a bool, with something that also says whether the type is a
dummy type.
Convert to four-space indentation.
compiler/make_tags.m:
Compute the value for the new field of du type definitions.
compiler/hlds_out.m:
Write out the new field of du type definitions.
compiler/rtti.m:
Modify the data structures we use to create type_ctor_infos to allow
for dummy types.
Convert to four-space indentation.
compiler/type_ctor_info.m:
Modify the code that generates type_ctor_infos to handle dummy types.
compiler/type_util.m:
Provide predicates for recognizing dummy types.
Convert to four-space indentation.
compiler/unify_proc.m:
Generate the unify and compare predicates of dummy types using a new
code scheme that avoids referencing arguments that contain garbage.
When generating code for unifying or comparing other types, ignore
any arguments of function symbols that are dummy types.
Don't use DCG style access predicates.
compiler/higher_order.m:
Specialize the unification and comparison of values of dummy types.
Break up an excessively large predicate, and factor out common code
from the conditions of a chain of if-then-elses.
compiler/llds.m:
For each input and output of a foreign_proc, include a field saying
whether the value is of a dummy type.
compiler/pragma_c_gen.m:
Fill in the new fields in foreign_proc arguments.
compiler/hlds_goal.m:
Rename some predicates for constructing unifications to avoid
unnecessary ad-hoc overloading. Clarify their documentation.
Rename a predicate to make clear the restriction on its use,
and document the restriction.
Add a predicate for creating simple tests.
Add a utility predicate for setting the context of a goal directly.
compiler/modules.m:
Include dummy types in interface files, even if they are private to the
module. This is necessary because with the MLDS backend, the generated
code inside the module and outside the module must agree whether a
function returning a value of the type returns a real value or a void
value, and this requires them to agree on whether the type is dummy
or not.
The impact on interface files is minimal, since very few types are
dummy types, and changing a type from a dummy type to a non-dummy type
or vice versa is an even rarer change.
compiler/hlds_pred.m:
Provide a representation in the compiler of the trie step for dummy
types.
compiler/layout_out.m:
Print the trie step for dummy types.
compiler/table_gen.m:
Don't table values of dummy types, and record the fact that we don't
by including a dummy trie step in the list of trie steps.
compiler/add_pragma.m:
compiler/add_special_pred.m:
compiler/add_type.m:
compiler/aditi_builtin_ops.m:
compiler/bytecode.m:
compiler/bytecode_gen.m:
compiler/code_gen.m:
compiler/code_info.m:
compiler/continuation_info.m:
compiler/cse_detection.m:
compiler/det_report.m:
compiler/exception_analysis.m:
compiler/inst_match.m:
compiler/livemap.m:
compiler/llds_out.m:
compiler/middle_rec.m:
compiler/ml_call_gen.m:
compiler/ml_closure_gen.m:
compiler/ml_code_gen.m:
compiler/ml_code_util.m:
compiler/ml_type_gen.m:
compiler/ml_unify_gen.m:
compiler/mlds_to_c.m:
compiler/mlds_to_gcc.m:
compiler/mlds_to_il.m:
compiler/modecheck_unify.m:
compiler/modes.m:
compiler/opt_util.m:
compiler/post_term_analysis.m:
compiler/post_typecheck.m:
compiler/qual_info.m:
compiler/rl.m:
compiler/rl_exprn.m:
compiler/rl_key.m:
compiler/rtti_out.m:
compiler/simplify.m:
compiler/size_prof.m:
compiler/term_constr_initial.m:
compiler/term_constr_util.m:
compiler/term_norm.m:
compiler/termination.m:
compiler/trace.m:
compiler/typecheck.m:
compiler/unify_gen.m:
Conform to the changes above.
compiler/export.m:
compiler/exprn_aux.m:
compiler/foreign.m:
compiler/polymorphism.m:
compiler/proc_label.m:
compiler/rtti_to_mlds.m:
compiler/special_pred.m:
compiler/stack_alloc.m:
compiler/stack_layout.m:
compiler/state_var.m:
compiler/switch_util.m:
compiler/trace_params.m:
Conform to the changes above.
Convert to four-space indentation.
compiler/mlds_to_java.m:
compiler/var_locn.m:
Conform to the changes above, which requires threading the module_info
through the module.
Convert to four-space indentation.
compiler/mercury_compile.m:
Pass the module_info to mlds_to_java.m.
compiler/ml_util.m:
compiler/polymorphism.m:
compiler/type_ctor_info.m:
compiler/type_util.m:
Delete some previously missed references to the temporary types used
to bootstrap the change to the type_info type's arity.
compiler/polymorphism.m:
Turn back on an optimization that avoids passing parameters (such as
type_infos) to foreign_procs if they are not actually referred to.
compiler/prog_data.m:
Convert to four-space indentation.
library/svvarset.m:
Add a missing predicate.
trace/mercury_trace.c:
Delete the unused function that used to check for dummy types.
tests/debugger/field_names.{m,inp,exp}:
Add to this test case a test of the handling of dummy types. Check that
their values can be printed out during normal execution, and that the
debugger doesn't consider them live nondummy variables, just as it
doesn't consider I/O states live nondummy variables.
|
||
|
|
753d9755ae |
When returning from det and semidet predicates, load the return address into a
Estimated hours taken: 3
Branches: main
When returning from det and semidet predicates, load the return address into a
local C variable instead of the succip abstract machine "register" before
popping the stack frame and returning. This gives the C compiler more freedom
to reorder instructions. This diff gets a 1.4% speed increase on the compiler.
runtime/mercury_stacks.h:
Provide a new macro, MR_decr_sp_and_return, to do the combined job
that its name describes.
compiler/llds.m:
Add a new LLDS instruction that corresponds to the new macro.
compiler/llds_out.m:
Output the new LLDS instruction.
compiler/peephole.m:
Add a predicate that looks for and exploits opportunities for using
the new instruction.
compiler/optimize.m:
Invoke the new peephole predicate as the next-to-last optimization
pass. (The last is wrapping up blocks created by --use-local-vars.)
compiler/*.m:
Minor changes to handle the new instruction.
|
||
|
|
1ed891b7b1 |
Introduce a mechanism for extending the det and nondet stacks when needed.
Estimated hours taken: 24
Branches: main
Introduce a mechanism for extending the det and nondet stacks when needed.
The mechanism takes the form of a new grade component, .exts ("extend stacks").
While the new mechanism may be useful in its own right, it is intended mainly
to support a new implementation of minimal model tabling, which will use a
separate Mercury context for each distinct subgoal. Each context has its own
det and nondet stack. Clearly, we can't have hundreds of contexts each with
megabyte sized det stacks. The intention is that the stacks of the subgoals
will start small, and be expanded when needed.
The runtime expansion of stacks doesn't work yet, but it is unnecessarily
hard to debug without an installed compiler that understands the new grade
component, which is why this diff will be committed before that is fixed.
compiler/handle_options.m:
compiler/options.m:
runtime/mercury_grade.h:
scripts/canonical_grade.sh-subr:
scripts/init_grade_options.sh-subr:
scripts/parse_grade_options.sh-subr:
scripts/mgnuc.in:
Handle the new grade component.
runtime/mercury_memory_zones.h:
Add MR_ prefixes to the names of the fields of the zone structure.
Record not just the actual size of each zone, which includes various
kinds of buffers, but also the desired size of the zone exclusive of
buffers.
Format the documentation of the zone structure fields more
comprehensibly.
runtime/mercury_memory_zones.c:
Instead of implementing memalign if it is not provided by the operating
system, implement a function that allows us to reallocate the returned
area of memory.
Provide a prototype implementation of memory zone extension. It doesn't
work yet.
Factor out the code for setting up redzones, since it is now needed
in more than one place.
Convert to four space indentation.
Make the debugging functions a bit more flexible.
runtime/mercury_wrapper.c:
Conform to the improved interface of the debugging functions.
runtime/mercury_overflow.h:
runtime/mercury_std.h:
Move a generally useful macro from mercury_overflow.h to mercury_std.h.
runtime/mercury_stacks.c:
Add functions to extend the stacks.
runtime/mercury_stacks.h:
Add the tests required to invoke the functions that extend the stacks.
Add the macros needed by the change to compiler/llds.m.
Convert to four space indentation.
runtime/mercury_conf.h.in:
Prepare for the use of the posix_memalign function, which is the
current replacement of the obsolete memalign library function.
We don't yet use it.
runtime/mercury_context.h:
Format the documentation of the context structure fields more
comprehensibly.
Put MR_ prefixes on the names of the fields of some structures
that didn't previously have them.
Conform to the new names of the fields of the zone structure.
runtime/mercury_context.c:
runtime/mercury_debug.c:
runtime/mercury_deep_copy.c:
runtime/mercury_engine.c:
runtime/mercury_memory_handlers.c:
library/benchmarking.m:
library/exception.m:
Conform to the new names of the fields of the zone structure.
In some cases, add missing MR_ prefixes to function names
and/or convert to four space indentation.
runtime/mercury_engine.h:
Add a new low level debug flag for debugging stack extensions.
Format the documentation of the engine structure fields more
comprehensibly.
Convert to four space indentation.
runtime/mercury_conf_param.h:
Document a new low level debug flag for debugging stack extensions.
compiler/compile_target_code.m:
compiler/handle_options.m:
compiler/options.m:
Handle the new grade component.
compiler/llds.m:
Add two new kinds of LLDS instructions, save_maxfr and restore_maxfr.
These are needed because the nondet stack may be relocated between
saving and the restoring of maxfr, and the saved maxfr may point to
the old stack. In .exts grades, these instructions will save not a
pointer but the offset of maxfr from the start of the nondet stack,
since offsets are not affected by the movement of the nondet stack.
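Why an offset survives relocation while a pointer does not can be shown with a small C sketch. The struct and function names are hypothetical, and realloc stands in for whatever mechanism the runtime uses to move the nondet stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef long Word;      /* stand-in for a stack slot */

struct stack {
    Word    *base;      /* start of the stack's memory */
    size_t  size;       /* capacity in words */
    Word    *maxfr;     /* points into the stack */
};

/* Save maxfr as an offset from the start of the stack. */
size_t
save_maxfr_offset(const struct stack *s)
{
    return (size_t) (s->maxfr - s->base);
}

/* Restore maxfr from a saved offset, relative to the current base. */
void
restore_maxfr_offset(struct stack *s, size_t offset)
{
    assert(offset < s->size);
    s->maxfr = s->base + offset;
}

/*
** Double the stack's capacity. realloc may move the memory, which would
** invalidate any previously saved pointers. Returns 0 on success.
*/
int
extend_stack(struct stack *s)
{
    size_t offset = save_maxfr_offset(s);
    Word *new_base = realloc(s->base, 2 * s->size * sizeof *new_base);

    if (new_base == NULL) {
        return -1;
    }
    s->base = new_base;
    s->size *= 2;
    s->maxfr = new_base + offset;
    return 0;
}
```

A saved offset stays valid across extend_stack, whereas a saved copy of the old maxfr pointer could refer to freed memory.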
compiler/code_info.m:
Use the new instructions where relevant. (Some more work may be
needed on this score; the relevant places are marked with XXX.)
compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_out.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/reassign.m:
compiler/use_local_vars.m:
Handle the new LLDS instructions.
tools/bootcheck:
Provide a mechanism for setting the initial stack sizes for a
bootcheck.
|
||
|
|
f0dbbcaa34 |
Generate better code for base relations such as the ones in the transitive
Estimated hours taken: 16
Branches: main
Generate better code for base relations such as the ones in the transitive
closure benchmarks in the paper on minimal model tabling. These improvements
yield speedups ranging from 5 to 25% on those benchmarks.
compiler/use_local_vars.m:
Make this optimization operate on extended basic blocks instead of
plain basic blocks. The greater length of extended basic blocks
allows the local variables to have the maximum scope possible.
The price is that the test for whether assignment to a given lvalue
can be avoided or not is now dependent on which of the constituent
basic blocks of the extended basic block contains the assignment,
and thus the test has to be evaluated once for each assignment
we try to optimize instead of once per block.
Don't allocate temporary variables if the optimization they are
intended for turns out not to be allowed. This change avoids having
declarations for unused temporary variables in the resulting C code.
If --auto-comments is set, insert use_local_vars.m's main data
structure, the livemap, into the generated LLDS code as a comment.
compiler/peephole.m:
Look for the pattern
mkframe(Size, Redoip)
<straight line instructions that don't use stack slots>
succeed
and optimize away the mkframe. This pattern always arises for
procedures that are actually semidet but are declared nondet (such
as the base relations in the tabling benchmarks), and may also arise
for semidet branches of nondet procedures.
compiler/llds.m:
Allow an existing peephole pattern to work better. The pattern is
mkframe(Size, do_fail)
<straight line instructions>
redoip(curfr) = Redoip
Previously, if some compiler-generated C code was among the straight
line instructions, the pattern couldn't be applied, since peephole.m
couldn't know whether it branched away through the redoip slot of
the frame. This diff adds an extra slot to the relevant pragma_c
component that tells peephole.m (actually, the predicate in opt_util.m
that peephole relies on) whether this is the case.
compiler/basic_block.m:
Provide functionality for merging basic blocks into extended basic
blocks.
compiler/dupelim.m:
Conform to the change in basic_block.m's interface.
Convert to four-space indentation, and fix departures from our style
guidelines.
compiler/opt_util.m:
Provide extra information now needed by use_local_vars.
Convert to four-space indentation, and fix departures from our style
guidelines.
compiler/opt_debug.m:
Show the user friendly versions of label names when dumping livemaps
and instructions. Shorten the dumped descriptions of registers
and stack slots. Dump instructions inside blocks.
compiler/frameopt.m:
Conform to the changes in opt_util and opt_debug's interfaces.
compiler/optimize.m:
Use the facilities of opt_debug instead of llds_out when dumping the
LLDS after each optimization, since these are now more compact and
thus reader friendly.
Print unmangled names when writing progress messages.
Put the dump files we generate with --opt-debug in a separate
subdirectory, since when compiling e.g. tree234.m, the process
can generate more than a thousand files. Give the dump files
minimally mangled names.
compiler/code_gen.m:
compiler/pragma_c_gen.m:
Convert to four-space indentation, and fix departures from our style
guidelines.
Conform to the change in llds.m.
compiler/code_info.m:
compiler/exprn_aux.m:
compiler/ite_gen.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_out.m:
compiler/middle_rec.m:
compiler/trace.m:
Conform to the change in llds.m.
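The mkframe/succeed pattern described for peephole.m can be illustrated with a small sketch. This is not the code in peephole.m: the instruction set, the `ToyInstr` type and the function name are hypothetical, and the sketch only detects the pattern rather than rewriting the instruction stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction set; the real LLDS is much richer. */
typedef enum { OP_MKFRAME, OP_ASSIGN, OP_GOTO, OP_SUCCEED } ToyOpcode;

typedef struct {
    ToyOpcode op;
    int uses_stack_slot;    /* nonzero if the instruction touches a stack slot */
} ToyInstr;

/*
 * If code[start] is a mkframe that is followed only by straight-line
 * instructions that do not use stack slots, and then by a succeed,
 * return the index of that succeed. Otherwise return -1.
 * In this toy model, OP_ASSIGN is the only straight-line instruction.
 */
long find_removable_mkframe(const ToyInstr *code, size_t n, size_t start)
{
    assert(code != NULL);
    if (start >= n || code[start].op != OP_MKFRAME) {
        return -1;
    }
    for (size_t i = start + 1; i < n; i++) {
        if (code[i].op == OP_SUCCEED) {
            return (long) i;
        }
        if (code[i].op != OP_ASSIGN || code[i].uses_stack_slot) {
            return -1;
        }
    }
    return -1;
}
```

A real pass would also have to decide what replaces the mkframe and the succeed once the frame is gone; the sketch deliberately leaves that out.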
||
|
|
6df9a05856 |
This diff cleans up a bunch of modules. It has no algorithmic changes
Estimated hours taken: 10
Branches: main
This diff cleans up a bunch of modules. It has no algorithmic changes
other than in the formatting of error messages.
compiler/error_util.m:
Delete the obsolete predicate append_punctuation, since the suffix
format component can now do more, and do it more easily.
compiler/goal_util.m:
compiler/hlds_goal.m:
compiler/hlds_llds.m:
compiler/instmap.m:
compiler/const_prop.m:
Change the argument order of some of the predicates exported by these
modules to make them easier to use with state variable syntax.
compiler/*.m:
Convert a bunch of these modules to four space indentation, and fix
departures from our coding style.
Conform to the changed argument order above.
Use suffixes instead of append_punctuation.
library/string.m:
Add string.foldl2.
tests/invalid/circ_*.err_exp:
tests/warnings/unused_args_*.exp:
Expect the updated error messages, which format sym_names consistently
the same way as other messages.
||
|
|
5b105a0968 |
Optimize higher order calls by providing variants of the relevant code that
Estimated hours taken: 6
Branches: main
Optimize higher order calls by providing variants of the relevant code that
are specialized to a given number of explicitly given arguments.
runtime/mercury_ho_call.[ch]:
Define variants of do_call_closure and do_call_class_method
specialized to 0, 1, 2 or 3 explicit input arguments. Apart from
not needing to be passed the number of explicit input arguments
in a register, these avoid some runtime tests and unroll loops.
Harmonize the variable names used in the do_call_closure and
do_call_class_method variants. Since they are near-copies of each
other, factor out their documentation. (Factoring out the code itself
would be possible, but would not make maintenance easier and would
make the code harder to read.)
Provide a mechanism to gather statistics about the numbers of hidden
and explicit arguments if the macro MR_DO_CALL_STATS is set.
compiler/options.m:
Add options that specify how many of these variants exist. These
provide the necessary synchronization between the runtime and
the compiler. They are not meant to be set from the command line,
even by implementors.
runtime/mercury_conf_params.h:
Document MR_DO_CALL_STATS.
runtime/mercury_wrapper.c:
If MR_DO_CALL_STATS is set, print the gathered statistics when
execution ends.
runtime/mercury_mm_own_stack.c:
Fix a typo that prevented the stage2 library from linking in jump.gc
grade.
compiler/llds.m:
Provide a way to represent the labels of the new specialized variants.
compiler/llds__out.m:
Output the labels of the new specialized variants if required.
Convert to four-space indentation.
compiler/call_gen.m:
Call the specialized variants of do_call_closure or
do_call_class_method if they are applicable.
code_info/follow_vars.m:
code_info/interval.m:
code_info/tupling.m:
Conform to the change in call_gen.m.
code_info/dupproc.m:
code_info/exprn_aux.m:
code_info/livemap.m:
code_info/opt_util.m:
Conform to the change in llds.m.
compiler/code_info.m:
Minor style cleanups.
tools/bootcheck:
Enable the collection of statistics from the compilation of stage 3
and the test cases, for use when the stage 2 is built with
MR_DO_CALL_STATS enabled.
tools/ho_call_stats:
A new script to analyze the statistics collected.
tools/makebatch:
Add a new option --save-stage2-on-no-compiler, which is a variant of
the existing option --save-stage2-on-error.
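The idea of specializing a higher order call on the number of explicit arguments can be sketched in plain C as follows. This is only an illustration under simplifying assumptions: the `ToyClosure` type and the `toy_` functions are hypothetical, arguments are passed in an array rather than in abstract machine registers, and none of this is the code in mercury_ho_call.c.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ARGS 16

typedef long (*ToyCode)(const long *args, size_t nargs);

/* A closure: code plus the hidden arguments captured at creation time. */
typedef struct {
    ToyCode code;
    size_t  num_hidden;
    long    hidden[TOY_MAX_ARGS];
} ToyClosure;

/* Copy the hidden arguments into place; return how many there were. */
static size_t toy_load_hidden(const ToyClosure *c, long *args)
{
    for (size_t i = 0; i < c->num_hidden; i++) {
        args[i] = c->hidden[i];
    }
    return c->num_hidden;
}

/* Generic variant: needs the explicit argument count and a loop over it. */
long toy_call_closure_generic(const ToyClosure *c,
    const long *explicit_args, size_t num_explicit)
{
    long args[TOY_MAX_ARGS];
    assert(c->num_hidden + num_explicit <= TOY_MAX_ARGS);
    size_t n = toy_load_hidden(c, args);
    for (size_t i = 0; i < num_explicit; i++) {
        args[n++] = explicit_args[i];
    }
    return c->code(args, n);
}

/* Variant specialized to two explicit arguments: no count, no loop. */
long toy_call_closure_2(const ToyClosure *c, long a1, long a2)
{
    long args[TOY_MAX_ARGS];
    assert(c->num_hidden + 2 <= TOY_MAX_ARGS);
    size_t n = toy_load_hidden(c, args);
    args[n] = a1;
    args[n + 1] = a2;
    return c->code(args, n + 2);
}

/* Example closure body: add up all the arguments. */
long toy_sum_args(const long *args, size_t nargs)
{
    long sum = 0;
    for (size_t i = 0; i < nargs; i++) {
        sum += args[i];
    }
    return sum;
}
```

Both variants must give the same answer for the same closure and arguments; the specialized one just does less bookkeeping, which is why a caller that knows the explicit argument count statically can pick it.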
||
|
|
3c60c0e485 |
Change a bunch of modules to import only one module per line, even
Estimated hours taken: 4
Branches: main
compiler/*.m:
Change a bunch of modules to import only one module per line, even
from the library.
compiler/mlds_to_il.m:
compiler/mlds_to_managed.m:
Convert these modules to our current coding style. Use state variables
where appropriate. Use predmode declarations where possible.
||
|
|
0786d5b00a |
Fix a code generator bug that manifested itself in profiling grades with
Estimated hours taken: 3
Branches: main
Fix a code generator bug that manifested itself in profiling grades with
(intermodule) inlining, the symptom being a compiler abort due to a stackvar
out of range (negative slot number). The problem was that we use the type of a
variable to decide whether the variable has a real stack slot or not, but
we were using the original type of an argument of a foreign_proc to decide
whether the argument should be passed or not. Inlining could cause an argument
whose original type is T (e.g. the arguments of cc_multi_equal) to be
instantiated, e.g. to io__state. Since the type of the variable is io__state,
it gets a dummy stack slot; since the original type of the argument is T,
we pass the variable and thus refer to the dummy stack slot.
The fix is to use the actual type of the variable to make the decision,
not the original type of the argument.
This fix fixes the failures of the deforest_cc_bug.m test case in deep
profiling grades and the tricky_try_store.m test case in profiling grades.
compiler/llds.m:
Add an extra argument to the pragma_c_{in,out}put function symbols
to hold the actual type of the variable being passed.
Add field names to those function symbols.
compiler/pragma_c_gen.m:
Fill in the new argument.
Clean up some code, mostly by factoring out duplicate code fragments.
compiler/llds_out.m:
Handle the new argument to decide whether to pass the argument.
compiler/exprn_aux.m:
compiler/livemap.m:
compiler/middle_rec.m:
compiler/opt_util.m:
Handle the new argument in pragma_c_{in,out}puts.
|
||
|
|
885fd4a387 |
Remove almost all dependencies by the modules of parse_tree.m on the modules
Estimated hours taken: 12
Branches: main
Remove almost all dependencies by the modules of parse_tree.m on the modules
of hlds.m. The only such dependencies remaining now are on type_util.m.
compiler/hlds_data.m:
compiler/prog_data.m:
Move the cons_id type from hlds_data to prog_data, since several parts
of the parse tree data structure depend on it (particularly insts).
Remove the need to import HLDS modules in prog_data.m by making the
cons_ids that refer to procedure ids refer to them via a new type
that contains shrouded pred_ids and proc_ids. Since pred_ids and
proc_ids are abstract types in hlds_data, add predicates to hlds_data
to shroud and unshroud them.
Also move some other types, e.g. mode_id and class_id, from hlds_data
to prog_data.
compiler/hlds_data.m:
compiler/prog_util.m:
Move predicates for manipulating cons_ids from hlds_data to prog_util.
compiler/inst.m:
compiler/prog_data.m:
Move the contents of inst.m to prog_data.m, since that is where it
belongs, and since doing so eliminates a circular dependency.
The separation doesn't serve any purpose any more, since we don't
need to import hlds_data.m anymore to get access to the cons_id type.
compiler/mode_util.m:
compiler/prog_mode.m:
compiler/parse_tree.m:
Move the predicates in mode_util that don't depend on the HLDS to a new
module prog_mode, which is part of parse_tree.m.
compiler/notes/compiler_design.m:
Mention prog_mode.m, and delete the mention of inst.m.
compiler/mercury_to_mercury.m:
compiler/hlds_out.m:
Move the predicates that depend on HLDS out of mercury_to_mercury.m
to hlds_out.m. Export from mercury_to_mercury.m the predicates needed
by the moved predicates.
compiler/hlds_out.m:
compiler/prog_out.m:
Move predicates for printing parts of the parse tree out of hlds_out.m
to prog_out.m, since mercury_to_mercury.m needs to use them.
compiler/purity.m:
compiler/prog_out.m:
Move predicates for printing purities from purity.m, which is part
of check_hlds.m, to prog_out.m, since mercury_to_mercury.m needs to use
them.
compiler/passes_aux.m:
compiler/prog_out.m:
Move some utility predicates (e.g. for printing progress messages) from
passes_aux.m to prog_out.m, since some predicates in submodules of
parse_tree.m need to use them.
compiler/foreign.m:
compiler/prog_data.m:
Move some types from foreign.m to prog_data.m to allow the elimination
of some dependencies on foreign.m from submodules of parse_tree.m.
compiler/*.m:
Conform to the changes above, mostly by updating lists of imported
modules and module qualifications. In some cases, also do some local
cleanups such as converting predicate declarations to predmode syntax
and fixing white space.
||
|
|
7bf0cd03af |
Reduce the overhead of all forms of tabling by eliminating in many cases
Estimated hours taken: 32
Branches: main
Reduce the overhead of all forms of tabling by eliminating in many cases
the overhead of transferring data across the C/Mercury boundary. These
involve lots of control transfers as well as assignments to and from
Mercury abstract machine registers, which are not real machine registers
on x86 machines. Benchmarking in Uppsala revealed this overhead to be
a real problem.
The way we do that is by changing the tabling transformation so that instead
of generating sequences of calls to predicates from library/table_builtin.m,
we generate sequences of calls to C macros from runtime/mercury_tabling_preds.h,
and emit the resulting code string as the body of a foreign_proc goal.
(The old transformation is still available via a new option,
--no-tabling-via-extra-args.)
Since the number of inputs and outputs of the resulting C code sequences
are not always fixed (they can depend on the number of input or output
arguments of predicate being transformed), implementing this required
adding to foreign_procs a new field that allows the specification of extra
arguments to be passed to and from the given foreign code fragment. For now,
this mechanism is implemented only by the C backends, since it is needed
only by the C backends. (We don't yet support tabling on other backends.)
To simplify the new implementation of the field on foreign_procs, consolidate
three existing fields into one. Each of these fields was a list with one
element per argument, so turning them into a single list with a combined record
per argument should also improve reliability, since it reduces the likelihood
of updates leaving the data structure inconsistent.
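The reliability argument for merging parallel per-argument lists into one list of records can be sketched as follows. The record layout is an assumption made for illustration: the field names (`var`, `name`, `mode`) and the `ToyForeignArgInfo` type are hypothetical and are not taken from hlds_goal.m.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_ARG_IN, TOY_ARG_OUT } ToyArgMode;

/*
 * One combined record per argument. With three parallel lists, an update
 * could change the length or order of one list and forget the others;
 * with a single array of records, that kind of inconsistency cannot arise.
 */
typedef struct {
    int         var;    /* variable number (hypothetical field) */
    const char *name;   /* name used in the foreign code (hypothetical field) */
    ToyArgMode  mode;   /* input or output (hypothetical field) */
} ToyForeignArgInfo;

/* Count the input arguments; every field of a record is always in step. */
size_t toy_count_inputs(const ToyForeignArgInfo *args, size_t n)
{
    assert(args != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (args[i].mode == TOY_ARG_IN) {
            count++;
        }
    }
    return count;
}
```

Adding an extra argument then means appending one record, rather than making three separate list updates that all have to agree.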
The goal paths of components of a tabled predicate depend on whether
--no-tabling-via-extra-args was specified. To enable the expected outputs
of the debugger test cases testing tabling, we add a new mdb command,
goal_paths, that controls whether goal paths are printed by the debugger
at events, and turn off the printing of events in the relevant test cases.
Also, prepare for a future change to optimize the trie structure for
user-defined types by handling type_infos (and once we support them,
typeclass_infos) specially.
compiler/table_gen.m:
Change the tabling transformation along the lines described above.
To allow us to factor out as much of the new code as possible,
we change the meaning of the call_table_tip variable for minimal
model subgoals: instead of the trie node at the end of the answer
table, it is now the subgoal reachable from it. This change
has no effect as yet, because we use call_table_tip variables
only to perform resets across retries in the debugger, and we
don't do retries across calls to minimal model tabled predicates.
Put predicates into logical groups.
library/table_builtin.m:
runtime/mercury_tabling_preds.h:
When the new transformations in table_gen.m generate foreign_procs
with variable numbers of arguments, the interfaces of those
foreign_procs often do not match the interfaces of the existing
library predicates at their core: they frequently have one more
or one fewer argument. To prevent any possible confusion, in such
cases we add a new variant of the predicate. These predicates
have the suffix _shortcut in their name. Their implementations
are dummy macros that do nothing; they serve merely as placeholders
before or after which the macros that actually do the work are
inserted.
Move the definitions of the lookup, save and restore predicates
into mercury_tabling_preds.h. Make the naming scheme of their
arguments more regular.
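The role of the do-nothing _shortcut macros as placeholders in a generated code string can be illustrated with a toy example. Everything here is hypothetical: the macro names are invented, and the "lookup" step just folds values into an integer key instead of walking a real tabling trie.

```c
#include <assert.h>

/* Placeholder macro: expands to a no-op and only marks a position. */
#define TOY_LOOKUP_SHORTCUT(node)       ((void) 0)

/* Working macro: one lookup step per input argument (toy key folding). */
#define TOY_LOOKUP_INT(node, value)     ((node) = (node) * 31 + (value))

/*
 * What a generated body for a two-input predicate might look like:
 * the placeholder comes first, and one working macro per input argument
 * is inserted after it. A predicate with three inputs would simply get
 * one more TOY_LOOKUP_INT line.
 */
long toy_generated_body_2(long a, long b)
{
    long node = 0;
    assert(a >= 0 && b >= 0);
    TOY_LOOKUP_SHORTCUT(node);
    TOY_LOOKUP_INT(node, a);
    TOY_LOOKUP_INT(node, b);
    return node;
}
```

Because the placeholder does no work, the number of working macros around it can vary with the number of arguments without the placeholder's own interface having to change.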
runtime/mercury_minimal_model.c:
runtime/mercury_tabling_preds.h:
Move the definition of a predicate from mercury_minimal_model.c
to mercury_tabling_preds.h, since the compiler now needs to be
able to generate an inlined version of it.
compiler/hlds_goal.m:
Replace the three existing fields describing the arguments of
foreign_procs with one, and add a new field describing the extra
arguments that may be inserted by table_gen.m.
Add utility predicates for processing the arguments of foreign_procs.
Change the order of some existing groups of declarations to make it
more logical.
compiler/hlds_pred.m:
runtime/mercury_stack_layout.h:
Extend the data structures recording the structure of tabling tries
to allow the representation of trie steps for type_infos and
typeclass_infos.
runtime/mercury_tabling_macros.c:
Fix a bug regarding the tabling of typeclass_infos, which is now
required for a clean compile.
compiler/pragma_c_gen.m:
compiler/ml_code_gen.m:
Modify the generation of code for foreign_procs to handle extra
arguments, and to conform to the new data structures for foreign_proc
arguments.
compiler/llds.m:
The tabling transformations can now generate significantly sized
foreign_procs bodies, which the LLDS code generator translates to
pragma_c instructions. Duplicating these by jump optimization
may lose more by worsening locality than it gains in avoiding jumps,
so we add an extra field to pragma_c instructions that tells jumpopt
not to duplicate code sequences containing such pragma_cs.
compiler/jumpopt.m:
Respect the new flag on pragma_cs.
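The effect of such a flag on a duplicating optimization can be sketched like this. The `ToyInstr` type, the `may_duplicate` field and the function name are hypothetical; the actual representation in llds.m and the logic in jumpopt.m may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction carrying a "may be duplicated" flag. */
typedef struct {
    const char *text;           /* printable form of the instruction */
    bool        may_duplicate;  /* false for large embedded C fragments */
} ToyInstr;

/*
 * A sequence may be copied to the site of a jump only if every
 * instruction in it allows duplication.
 */
bool toy_may_duplicate_sequence(const ToyInstr *instrs, size_t n)
{
    assert(instrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!instrs[i].may_duplicate) {
            return false;
        }
    }
    return true;
}
```

An optimizer built this way falls back to keeping the jump whenever the target sequence contains a flagged instruction, trading a jump for better code locality.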
compiler/goal_util.m:
Add a predicate to create foreign_procs with specified contents,
modelled on the existing predicate to create calls.
Change the order of the arguments of that existing predicate
to make it more logical.
compiler/polymorphism.m:
Conform to the new definition of foreign_procs. Try to simplify
the mechanism for generating the type_info and typeclass_info
arguments of foreign_proc goals, but it is not clear that this
code is even ever executed.
compiler/aditi_builtin_ops.m:
compiler/assertion.m:
compiler/bytecode_gen.m:
compiler/clause_to_proc.m:
compiler/code_gen.m:
compiler/code_info.m:
compiler/code_util.m:
compiler/constraint.m:
compiler/deep_profiling.m:
compiler/deforest.m:
compiler/delay_construct.m:
compiler/dependency_graph.m:
compiler/det_analysis.m:
compiler/det_report.m:
compiler/dnf.m:
compiler/dupelim.m:
compiler/equiv_type_hlds.m:
compiler/exprn_aux.m:
compiler/follow_code.m:
compiler/follow_vars.m:
compiler/frameopt.m:
compiler/goal_form.m:
compiler/goal_path.m:
compiler/higher_order.m:
compiler/hlds_module.m:
compiler/hlds_out.m:
compiler/inlining.m:
compiler/ite_gen.m:
compiler/layout_out.m:
compiler/livemap.m:
compiler/liveness.m:
compiler/llds_out.m:
compiler/loop_inv.m:
compiler/magic.m:
compiler/make_hlds.m:
compiler/mark_static_terms.m:
compiler/middle_rec.m:
compiler/modes.m:
compiler/modules.m:
compiler/opt_debug.m:
compiler/pd_cost.m:
compiler/prog_rep.m:
compiler/purity.m:
compiler/quantification.m:
compiler/reassign.m:
compiler/rl_exprn.m:
compiler/saved_vars.m:
compiler/simplify.m:
compiler/size_prof.m:
compiler/store_alloc.m:
compiler/stratify.m:
compiler/switch_detection.m:
compiler/term_pass1.m:
compiler/term_traversal.m:
compiler/termination.m:
compiler/trace.m:
compiler/typecheck.m:
compiler/unify_proc.m:
compiler/unique_modes.m:
compiler/unneeded_code.m:
compiler/unused_args.m:
compiler/use_local_vars.m:
Conform to the new definition of foreign_procs, pragma_cs and/or
table trie steps, or to changed argument orders.
compiler/add_heap_ops.m:
compiler/add_trail_ops.m:
compiler/cse_detection.m:
compiler/dead_proc_elim.m:
compiler/equiv_type.m:
compiler/intermod.m:
compiler/lambda.m:
compiler/lco.m:
compiler/module_util.m:
compiler/opt_util.m:
compiler/stack_opt.m:
compiler/trans_opt.m:
Conform to the new definition of foreign_procs.
Bring these modules up to date with our current code style guidelines,
using predmode declarations, state variable syntax and unification
expressions as appropriate.
compiler/mercury_compile.m:
Conform to the changed argument order of a predicate in trans_opt.m.
compiler/options.m:
Add the --no-tabling-via-extra-args option, but leave the
documentation commented out since the option is for developers only.
doc/user_guide.texi:
Document the --no-tabling-via-extra-args option, though leave the
documentation commented out since the option is for developers only.
doc/user_guide.texi:
doc/mdb_categories:
Document the new goal_paths mdb command.
trace/mercury_trace_internals.c:
Implement the new goal_paths mdb command.
tests/debugger/completion.exp:
Conform to the presence of the goal_paths mdb command.
tests/debugger/mdb_command_test.inp:
Test the existence of documentation for the goal_paths mdb command.
tests/debugger/print_table.{inp,exp*}:
tests/debugger/retry.{inp,exp*}:
Use the goal_paths command to avoid having the expected output
depend on the presence or absence of --tabling-via-extra-args.
tests/tabling/table_foreign_output.{m,exp}:
Add a new test case to test the save/restore of arguments of foreign
types.
tests/tabling/Mmakefile:
Enable the new test case.
tests/tabling/test_tabling:
Make this script more robust.
Add an option for testing only the standard model forms of tabling.
|