mirror of
https://github.com/Mercury-Language/mercury.git
synced 2026-04-21 20:33:55 +00:00
b357a3dadcdf29c8d8ca184d65f7634a1a94de36
30 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
13b6f03f46 |
Module qualify end_module declarations.
compiler/*.m:
Module qualify the end_module declarations. In some cases, add them.
compiler/table_gen.m:
Remove an unused predicate, and inline another in the only place
where it is used.
compiler/add_pragma.m:
Give some predicates more meaningful names.
|
||
|
|
3c16f614df |
Remove Jerome Tannier's old implicit parallelization transformation, since it
Estimated hours taken: 0.3
Branches: main, release
Remove Jerome Tannier's old implicit parallelization transformation, since it
is obsolete (due to the compiler aborts it generates), not useful even as
a baseline for comparisons, and a maintenance burden. Divide the remainder
of implicit_parallelism.m into two submodules.
compiler/implicit_parallelism.m:
Make this file a package containing no code. Add a comment about where
to find Jerome's code.
compiler/introduce_parallism.m:
The rest of implicit_parallelism.m from a week ago.
compiler/push_goals_together.m:
The transformation I recently added to implicit_parallelism.m.
compiler/options.m:
Remove the option calling for Jerome's transformation.
compiler/mercury_compile_middle_passes.m:
Conform to the changes above.
compiler/follow_code.m:
Remove some obsolete imports.
compiler/notes/compiler_design.html:
Document the new modules, as well as implicit_parallelism.m (which should
have already been listed, but wasn't.)
|
||
|
|
91314c58d8 |
Add support for pushing expensive goals in different conjunctions
Estimated hours taken: 20
Branches: main
compiler/implicit_parallelism.m:
Add support for pushing expensive goals in different conjunctions into
the same conjunction, so we can parallelize that conjunction. This
support is not yet tested, but Paul should now be able to test it.
compiler/follow_code.m:
compiler/goal_util.m:
Minor style improvements.
|
||
|
|
1c3bc03415 |
Make the system compile with --warn-unused-imports.
Estimated hours taken: 2
Branches: main, release
Make the system compile with --warn-unused-imports.
browser/*.m:
library/*.m:
compiler/*.m:
Remove unnecessary imports as flagged by --warn-unused-imports.
In some files, do some minor cleanup along the way.
|
||
|
|
0c42f810c2 |
Start working on the 'goal push' feedback.
This feedback information is part of automatic parallelisation feedback. It
describes cases where goals after a branch goal but in the same conjunction
should be pushed into the branches of the branching goal. This can allow the
pushed goal to be parallelised against goals that already exist in one or more
arms of the branch goal without parallelising the whole branch goal.
This change simply creates the data-structures within the feedback framework on
which this feature will be based.
mdbcomp/feedback.automatic_parallelism.m:
Introduce new push_goal structure that describes the transformation.
mdbcomp/feedback.m:
Incremented feedback format version number.
deep_profiler/mdprof_fb.automatic_parallelism.m:
compiler/implicit_parallelism.m:
Conform to changes in feedback.automatic_parallelism.m.
The code to generate or use this feedback has not been implemented, that
will come later.
|
||
|
|
a2cd0da5b3 |
The existing representation of goal_paths is suboptimal for several reasons.
Estimated hours taken: 80
Branches: main
The existing representation of goal_paths is suboptimal for several reasons.
- Sometimes we need forward goal paths (e.g. to look up goals), and
sometimes we need reverse goal paths (e.g. when computing goal paths in
the first place). We had two types for them, but
  - their names, goal_path and goal_path_consable, were not expressive, and
  - we could store only one of them in goal_infos.
- Testing whether goal A is a subgoal of goal B is quite error-prone using
either form of goal paths.
- Using a goal path as a key in a map, which several compiler passes want
to do, requires lots of expensive comparisons.
This diff replaces most uses of goal paths with goal ids. A goal id is an
integer, so it can be used as a key in faster maps, or even in arrays.
Every goal in the body of a procedure gets its id allocated in a depth
first search. Since we process each goal before we dive into its
descendants, the goal representing the whole body of a procedure always
gets goal id 0. The depth first traversal also builds up a map (the
containing goal map) that tells us the parent goal of every subgoal, with
the obvious exception of the root goal itself. From the containing goal
map, one can compute both reverse and forward goal paths. It can also
serve as the basis of an efficient test of whether the goal identified by
goal id A is an ancestor of another goal identified by goal id B. We don't
yet use this test, but I expect we will in the future.
mdbcomp/program_representation.m:
Add the goal_id type.
Replace the existing goal_path and goal_path_consable types with two new
types, forward_goal_path and reverse_goal_path. Since these now have
wrappers around the list of goal path steps that identify each kind of
goal path, it is now ok to expose their representations. This makes
several compiler passes easier to code.
Update the set of operations on goal paths to work on the new data
structures.
Add a couple of step types to represent lambdas and try goals. Their
omission prior to this would have been a bug for constraint-based mode
analysis, or any other compiler pass prior to the expansion out of lambda
and try goals that wanted to use goal paths to identify subgoals.
browser/declarative_tree.m:
mdbcomp/rtti_access.m:
mdbcomp/slice_and_dice.m:
mdbcomp/trace_counts.m:
slice/mcov.m:
deep_profiler/*.m:
Conform to the changes in goal path representation.
compiler/hlds_goal:
Replace the goal_path field with a goal_id field in the goal_info,
indicating that from now on, this should be used to identify goals.
Keep a reverse_goal_path field in the goal_info for use by RBMM and CTGC.
Those analyses were too hard to convert to using goal_ids, especially
since RBMM uses goal_paths to identify goals in multi-pass algorithms that
should be one-pass and should not NEED to identify any goals for later
processing.
compiler/goal_path:
Add predicates to fill in goal_ids, and update the predicates filling in
the now deprecated reverse goal path fields.
Add the operations needed by the rest of the compiler on goal ids and
containing goal maps.
Remove the option to set goal paths using "mode equivalent steps".
Constraint based mode analysis now uses goal ids, and can now do its own
equivalent optimization quite simply.
Move the goal_path module from the check_hlds package to the hlds package.
compiler/*.m:
Conform to the changes in goal path representation.
Most modules now use goal_ids to identify goals, and use a containing goal
map to convert the goal ids to goal paths when needed. However, the ctgc
and rbmm modules still use (reverse) goal paths.
library/digraph.m:
library/group.m:
library/injection.m:
library/pprint.m:
library/pretty_printer.m:
library/term_to_xml.m:
Minor style improvements.
|
||
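The goal id scheme described in the commit above can be illustrated with a minimal Python sketch. It is not the compiler's Mercury code: the `Goal` class, the integer step indices and every function name here are assumptions made only for illustration. It numbers goals in depth-first preorder (so the procedure body gets id 0), records each goal's parent in a containing goal map, and derives reverse paths, forward paths and an ancestor test from that map.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """Hypothetical goal tree node; goal_id is filled in by the traversal."""
    label: str
    children: list = field(default_factory=list)
    goal_id: int = -1


def allocate_goal_ids(root):
    """Assign ids in depth-first preorder and build the containing goal map.

    The returned map takes a goal id to (parent goal id, step), where step
    is the index of the goal among its parent's sub-goals. The root has no
    entry, mirroring the "obvious exception" in the commit message.
    """
    containing = {}
    next_id = 0

    def visit(goal, parent_id, step):
        nonlocal next_id
        goal.goal_id = next_id
        next_id += 1
        if parent_id is not None:
            containing[goal.goal_id] = (parent_id, step)
        # The goal itself is numbered before its descendants are visited.
        for index, child in enumerate(goal.children):
            visit(child, goal.goal_id, index)

    visit(root, None, None)
    return containing


def reverse_goal_path(goal_id, containing):
    """Steps from the goal up to the root, innermost step first."""
    steps = []
    while goal_id in containing:
        goal_id, step = containing[goal_id]
        steps.append(step)
    return steps


def forward_goal_path(goal_id, containing):
    """Steps from the root down to the goal."""
    return list(reversed(reverse_goal_path(goal_id, containing)))


def is_ancestor(a, b, containing):
    """True if goal a is a proper ancestor of goal b."""
    while b in containing:
        b, _ = containing[b]
        if b == a:
            return True
    return False
```

Because ids are plain integers, they can index a list or array directly, which is the property the commit message relies on for faster maps.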
|
|
91e60619b0 |
Remove the concept of 'partitions' from the candidate parallel conjunction
mdbcomp/feedback.automatic_parallelism.m:
Remove the concept of 'partitions' from the candidate parallel conjunction
type. We no longer divide conjunctions into partitions before
parallelising them.
mdbcomp/feedback.m:
Increment the feedback format version number.
compiler/implicit_parallelism.m:
Conform to changes in mdbcomp/feedback.automatic_parallelism.m.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Allow the non-atomic goals to be parallelised against one-another.
Modify the goal annotations used internally, many annotations used only for
calls are now used for any goal type.
Variable use information is now stored in a map from variable name to lazy
use data for every goal, not just for the arguments of calls.
Do not partition conjunctions before attempting to parallelise them.
Make the adjust_time_for_waits tolerate floating point errors more easily.
Format costs with commas and, in most cases, two decimal places.
deep_profiler/var_use_analysis.m:
Export a new predicate var_first_use that computes the first use of a
variable within a goal. This predicate uses a new typeclass to retrieve
coverage data from any goal that can implement the typeclass.
deep_profiler/measurements.m:
Added a new abstract type for measuring the cost of a goal, goal_cost_csq.
This is like cs_cost_csq except that it can represent trivial goals (which
don't have a call count).
deep_profiler/coverage.m:
Added deterministic versions of the get_coverage_* predicates.
deep_profiler/program_representation_utils.m:
Made initial_inst_map more generic in its type signature.
Add a new predicate, atomic_goal_is_call/2 which can be used instead of a
large switch on an atomic_goal_rep value.
deep_profiler/message.m:
Rename a message type to make it more general, this is required now that we
compute variable use information for arbitrary goals, not just calls.
library/list.m:
Add map3_foldl.
NEWS:
Announced change to list.m.
|
||
|
|
58211e2f2e |
Allow more than 2^15 vars in a procedure representation.
Estimated hours taken: 12
Branches: main
Allow more than 2^15 vars in a procedure representation.
mdbcomp/program_representation.m:
Allow a variable number to be represented by four bytes as well as two
and one. This means that we also have to represent the number of
variables in a procedure using a four-byte number, not a two-byte number.
Use four bytes to represent line numbers. Programs that overflow 16-bit
var numbers may also overflow 16-bit line numbers.
These require a change in the deep profiler data's binary compatibility
version number.
compiler/prog_rep.m:
Encode vars using four bytes if necessary.
Be consistent in using only signed 8-bit as well as signed 16-bit numbers.
compiler/implicit_parallelism.m:
Conform to the change in program_representation.m.
deep_profiler/profile.m:
deep_profiler/read_profile.m:
Add a compression flag to the set of flags read from the data file.
Put the flags into the profile_stats as a group, not one-by-one.
deep_profiler/canonical.m:
deep_profiler/create_report.m:
deep_profiler/dump.m:
deep_profiler/mdprof_feedback.m:
deep_profiler/old_html_format.m:
deep_profiler/old_query.m:
deep_profiler/query.m:
Conform to the change in profile.m.
runtime/mercury_deep_profiling.c:
Prepare for compression of profiling data files being implemented.
runtime/mercury_stack_layout.h:
Fix some documentation rot.
runtime/mercury_conf_param.h:
Add an implication between debug flags to make debugging easier.
|
||
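A rough Python sketch of the width selection described in the commit above follows. The byte layout is an assumption: the leading width byte, the big-endian order and the function names are invented for illustration and are not taken from prog_rep.m. What it does follow from the message is the choice between one, two and four signed bytes per variable number, and the four-byte variable count.

```python
import struct

# Assumed struct formats: big-endian signed integers of 1, 2 and 4 bytes.
_FORMATS = {1: ">b", 2: ">h", 4: ">i"}


def choose_width(max_var):
    """Pick the narrowest signed width that can hold max_var."""
    if max_var < 0:
        raise ValueError("variable numbers must be non-negative")
    if max_var < 2**7:
        return 1
    if max_var < 2**15:
        return 2
    if max_var < 2**31:
        return 4
    raise ValueError("variable number does not fit in four bytes")


def encode_vars(var_numbers):
    """Encode a procedure's variable numbers with one shared width.

    Layout (hypothetical): width byte, four-byte count, then the numbers.
    """
    if any(var < 0 for var in var_numbers):
        raise ValueError("variable numbers must be non-negative")
    width = choose_width(max(var_numbers, default=0))
    out = bytearray([width])
    out += struct.pack(">i", len(var_numbers))
    for var in var_numbers:
        out += struct.pack(_FORMATS[width], var)
    return bytes(out)


def decode_vars(data):
    """Inverse of encode_vars."""
    width = data[0]
    (count,) = struct.unpack_from(">i", data, 1)
    fmt = _FORMATS[width]
    return [struct.unpack_from(fmt, data, 5 + i * width)[0]
            for i in range(count)]
```

Using signed widths means a two-byte field tops out at 2^15 - 1, which matches the limit named in the commit title.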
|
|
d3011e03b0 |
Changes that make implicit parallelism easier to test.
compiler/implicit_parallelism.m:
The implicit parallelism transformation emits a warning if it cannot match
feedback data to the program being compiled. With the default
--halt-at-warn this aborts compilation which is impractical since the user
cannot easily control the compiler's ability to honour the feedback data.
For example, the internal representation of the program may be different
for profiling builds compared to release builds, even with similar
compilation options.
Therefore this warning is now informational and it does not cause
compilation to abort.
tools/speedtest:
Add a new command line option -1. This causes the speedtest script to run
the compiler against a single module only (typecheck.m). This is useful
for generating representative Deep.data files for automatic
parallelisation.
|
||
|
|
7425922921 |
Refactor mdbcomp/feedback.m
Move automatic parallelisation specific code to a new module
mdbcomp/feedback.automatic_parallelism.m.
mdbcomp/feedback.m:
mdbcomp/feedback.automatic_parallelism.m:
As above.
slice/Mmakefile:
deep_profiler/Mmakefile:
Copy the new file into the current working directory along with the other
mdbcomp files.
compiler/implicit_parallelism.m:
deep_profiler/mdprof_fb.automatic_parallelism.m:
deep_profiler/mdprof_feedback.m:
deep_profiler/measurements.m:
Import the new module to access code that used to be in feedback.m.
Remove unused module imports.
|
||
|
|
543fc6e342 |
Change the way the typechecker iterates over the predicates of the program.
Estimated hours taken: 12
Branches: main
Change the way the typechecker iterates over the predicates of the program.
We used to do it by looking up each predicate in the module_info,
typechecking it, and putting it back into the module_info. We now do it
by converting the predicate table into a list, iterating over the list
transforming each pred_info in it, and converting the updated list back
to a predicate table.
The original intention of this change was to allow different predicates
to be typechecked in parallel by removing a synchronization bottleneck:
the typechecking of a predicate now doesn't have to wait for the
typechecking of the previous predicate to generate the updated version of
the module_info. However, it turned out that the change is good for
sequential execution as well, improving the time on tools/speedtest from
11.33 seconds to 11.08 seconds, a speedup of 2.2%. On tools/speedtest -l,
which tests the compilation of more modules, the speedup is even better:
3.1% (from 32.63 to 31.60s).
compiler/typecheck.m:
Implement the above change.
compiler/hlds_module.m:
compiler/pred_table.m:
Add a new operation, setting the list of valid pred_ids, now needed by
typecheck.m, to both modules.
Make the names of the predicates for accessing the predicate table more
expressive, and make them conform to our naming conventions.
compiler/*.m:
Trivial changes to conform to the change in hlds_module.m.
library/assoc_list.m:
Add new predicates used by the new version of typecheck.m (at some time
in its development).
NEWS:
Mention the new predicates.
library/list.m:
Improve documentation that is now copied to assoc_list.m.
tools/speedtest:
Make the test command more easily configurable.
|
||
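The difference between the two iteration schemes in the commit above can be sketched in Python. This is an analogy only: a dict stands in for the predicate table, and the function names are hypothetical. In the first scheme each step reads and writes the shared table; in the second, each entry is transformed independently of the others, which is what removes the dependency between successive predicates.

```python
def typecheck_via_lookup(pred_table, typecheck):
    """Old scheme: look up each entry, transform it, put it back.

    Every step updates the shared table, so step N+1 starts from the
    table produced by step N.
    """
    for pred_id in list(pred_table):
        pred_table[pred_id] = typecheck(pred_table[pred_id])
    return pred_table


def typecheck_via_list(pred_table, typecheck):
    """New scheme: table -> list -> transformed list -> table.

    No step depends on the result of another, so the per-entry work
    could in principle be done in parallel. The input table is left
    untouched.
    """
    items = list(pred_table.items())
    updated = [(pred_id, typecheck(info)) for pred_id, info in items]
    return dict(updated)
```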
|
|
531c2d94ea |
Automatic Parallelisation Improvements.
Factor in all the costs of parallelisation into the parallel overlap estimation
algorithm. Previously only some costs were being taken into consideration.
Independent parallelisations are now generally preferred as they have fewer
overheads for similar parallelisations.
Generalised the branch and bound search algorithm into a new Mercury module.
mdbcomp/feedback.m:
Grouped candidate parallel conjunction parameters into a single type.
Added extra parameters:
future_signal_cost
future_wait_cost
context_wakeup_delay.
The first two replace locking cost, they are the costs of the signal and
wait calls for futures respectively. The third represents the length of
time for a context to begin executing after it has been placed on the run
queue. It is used to estimate the cost of blocking.
Refactored the parallel_exec_metrics type to make representing overheads easier.
Modify parallel_exec_metrics so that it can represent the cost of calling
signal in the left conjunct of any conjunct pair.
Modify parallel_exec_metrics so that it stores the parallel execution time
of the initial (leftmost) conjunct. This is necessary as the parallel
execution time includes the cost of the 'fork' call of the next conjunct.
Modify parallel_exec_metrics to record the cost of blocking for the
leftmost conjunct if it completes before the parallel conjunction completes
as a whole.
Increment the feedback file format version number.
compiler/implicit_parallelism.m:
Conform to changes in mdbcomp/feedback.m.
deep_profiler/branch_and_bound.m:
A generic branch and bound solver loop and utilities.
The modified branch and bound code includes a profiling facility.
deep_profiler/Mercury.options:
The new branch_and_bound module supports the debug_branch_and_bound trace
flag.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Generalise and move branch and bound code to branch_and_bound.m
Removed the candidate_parallel_conjunctions_opts type, we now use the
candidate_par_conjunctions_params type in its place.
Modify the code for parallelising conjunctions so that it works with lists
of goals rather than cords of goals.
Factor out the code that looks for the next costly call, this is now handled
by a preprocessing pass so that it has linear time rather than increasing
the complexity of the search code.
Documented some predicates in more detail.
deep_profiler/mdprof_feedback.m:
Conform to changes in deep_profiler/mdprof_fb.automatic_parallelism.m and
mdbcomp/feedback.m
Add command line support for the new candidate parallel conjunctions
feedback parameters.
|
||
|
|
c877dceb2b |
Refactor profiler feedback code for implicit parallelism.
This change mostly re-factors the goal representation used to feedback implicit
parallelism information to the compiler. The goal_rep datatype is now used
rather than the much simpler datatype. (goal_rep is the same type that is used
by the declarative debugger).
This makes it easier for the compiler to match HLDS goals against goals from
the implicit parallelism analysis and will probably help in the future if the
analysis wants the compiler to re-order goals.
It also makes it easier to pretty-print the feedback sent to the compiler in
more detail.
mdbcomp/feedback.m:
As above, redefine pard_goal as a type alias to
goal_rep(pard_goal_annotation).
Added a new type, candidate_par_conjunctions_proc, it represents candidate
parallelisations within a procedure along with shared information for the
procedure.
Add a new predicate, convert_candidate_par_conjunctions_proc.
Increment the feedback file format version number.
mdbcomp/program_representation.m:
XXX: See about refactoring bytecode in/out put into one place.
Add a new predicate transform_goal_rep for transforming a goal_rep
structure from one arbitrary annotation type to another.
Add extra predicates to aid in converting a prog_rep structure to and from
bytecode. This includes cut_byte/2 and can_fail_byte/2.
deep_profiler/program_representation_utils.m:
Export print_goal_to_strings/4 so that it can be used when printing the
feedback file reports.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Conform to changes in mdbcomp/feedback.m
Wrap some lines at 76 characters.
Improve explanations in comments.
Use the goal_rep pretty-printer to print the candidate parallel
conjunctions feedback report.
deep_profiler/mdprof_feedback.m:
Conform to changes in deep_profiler/mdprof_fb.automatic_parallelism.m
deep_profiler/program_representation_utils.m:
Modify print_goal_to_strings to print determinisms and annotations on
separate lines before each goal.
deep_profiler/display_report.m:
Modify pretty printing of coverage annotations so that they make sense
after modifying print_goal_to_strings/4.
compiler/implicit_parallelism.m:
Refactor goal matching code that compares HLDS goals to feedback goals.
Goal matching is now more accurate and can more easily support goal
re-ordering when parallelising code (this is not implemented yet).
The code that builds parallel conjunctions has also been refactored.
This pass now generates warnings if it is not able to parallelise
a candidate parallel conjunction in the feedback data.
Insert deeper and later parallelizations before shallower or earlier ones,
this makes it easier to continue to parallelise a procedure as its goal
tree changes due to parallelisation.
Silently ignore duplicate candidate parallel conjunctions.
Refuse to parallelise a procedure that has been parallelized explicitly.
compiler/prog_rep.m:
Re-factor the hlds_goal to bytecode transformation, this transformation now
goes via goal_rep. We use the hlds_goal to goal_rep portion of this
transformation in compiler/implicit_parallelism.m.
Add variable names prefixed with DCG_ to the list of those introduced by
the compiler.
compiler/goal_util.m:
Modify maybe_transform_goal_at_goal_path so that it returns a value that
can describe the different kinds of error that may be encountered.
Add a new predicate, maybe_transform_goal_at_goal_path_with_instmap. Given
a goal, goal path and initial inst map this predicate recurses the goal
structure following the goal path and maintaining the inst map. It then
uses a higher order value to transform the goal at its destination before
re-constructing the goal. It is different to
maybe_transform_goal_at_goal_path in that it passes the instmap to its
higher order argument, the instmap is correct for the state immediately
before executing the goal in question.
compiler/hlds_pred.m:
Include the procedure's varset in the information used to construct the
program representation data that is included in deep profiling builds.
compiler/instmap.m:
Add a useful function, apply_instmap_delta_sv. This is the same as
apply_instmap_delta except that its arguments are in a more convenient
order for state variable notation.
compiler/stack_layout.m:
Export compute_var_number_map for the use of implicit_parallelism.m and
prog_rep.m
compiler/error_util.m:
Add a new error phase, 'phase_auto_parallelism'. This is used for warnings
issued from the automatic parallelisation transformation.
compiler/deep_profiling.m:
Conform to changes in hlds_pred.m
compiler/mercury_compile_middle_passes.m:
Conform to changes in implicit_parallelism.m
compiler/type_constraints.m:
Conform to changes in goal_util.
|
||
|
|
452dcd116c |
mdprof_feedback improvements.
Add an option to mdprof_feedback to print the feedback report without modifying
it. This option also avoids reading and parsing a Deep.data file, this makes
it quick and convenient if you just wish to view the feedback report.
deep_profiler/mdprof_feedback.m:
As above.
These changes make it necessary for the feedback_info structure to store
the program's name that the feedback is generated for. mdprof_feedback now
also checks that the program names in the feedback file, and deep profiling
data match.
mdbcomp/feedback.m:
Store the name of the program in the feedback_info structure and provide
methods to query this.
read_or_create now takes a new parameter, the name of the program that
we're creating a feedback file for. Or if a feedback file already exists,
the name that is checked against the one in the existing feedback file.
init_feedback_file now takes a new parameter, the name of the program that
this feedback_info structure is for.
These changes haven't changed the format of the feedback file, it always
contained the program's name. Therefore the feedback file version number
has not been incremented.
compiler/globals.m:
The feedback field in the compiler's globals structure now has the type
maybe(feedback). If feedback data could not be read, or was not read,
empty feedback data is no longer used.
compiler/handle_options.m:
Conform to changes in mdbcomp/feedback.m and compiler/globals.m.
compiler/implicit_parallelism.m:
Conform to changes in compiler/globals.m
|
||
|
|
b7f0270f36 |
Implement the new algorithm for calculating how dependant parallel conjuncts'
executions overlap. This algorithm has also been generalised to handle cases
where there are more than two conjuncts in a parallel conjunction. A number of
other improvements have also been made.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Wrote dependant parallel conjunction overlap analysis algorithm (as above).
This algorithm introduced a new structure, parallel_execution_overlap.
This structure describes how dependant parallel executions overlap.
Use both sparking cost and sparking delay as costs of parallelisation.
Sparking cost is the cost from the perspective of the sparker, whereas
delay is the delay between creating the spark and actually beginning the
execution of the spark.
Handle pretty-printing of the candidate parallel conjunction structure.
Include variable identifiers as well as canonical names in the
pardgoal_type structure.
The inst_map_info structure has been modified to contain the sets of
consumed and produced variables separately, rather than simply containing a
set of all consumed and produced variables.
Improve the readability of messages printed by trace goals.
The search code no longer attempts to look up procedure bodies for code
whose module is "Mercury runtime".
Conjunctions that did not have a speedup due to parallelisation are now
printed out by a new trace goal.
deep_profiler/mdprof_feedback.m:
Include support for pretty printing the feedback information after
creating it. This is handled by the new --report command line option.
Include a new --implicit-parallelism-sparking-delay command line option.
This may be used to specify how long it takes an engine to steal a spark.
mdbcomp/feedback.m:
Export the sparking delay as part of the feedback information.
Create a new structure parallel_exec_metrics which contains many metrics
about parallel execution performance. This is exported for each candidate
parallel conjunction rather than only exporting the Speedup.
Create predicates for creating and querying the parallel_exec_metrics
structure.
Create a new predicate, get_all_feedback_data/2, this is used to retrieve
all the data for building the report in the mdprof_feedback tool.
Increment the feedback file format version number.
deep_profiler/message.m:
Improve the readability of the messages printed due to verbosity settings.
Export some predicates that can be used for managing indentation while
pretty-printing structures.
compiler/implicit_parallelism.m:
Conform to changes in feedback_data_candidate_parallel_conjunctions.
Add a pi_sparking_delay field to parallelism information.
deep_profiler/program_representation_utils.m:
Fix a bug in calc_inst_map_delta/3.
Correct a comment for inst_map_ground_vars/5.
deep_profiler/cliques.m:
Fixed a minor indentation issue.
deep_profiler/Mercury.options:
Document the new trace goal that enables printing of candidate parallel
conjunctions that do not result in a speedup.
|
||
|
|
3d6770a091 |
Refactor feedback parallelisation code.
These changes rename some poorly named types from inner_goal to pard_goal.
'pard' means 'parallelised'. This is explained in a comment near this type.
The candidate_par_conjunction type has been made polymorphic on the type that
it uses to represent individual goals. This is easier than using slightly
different candidate_par_conjunction types in different modules.
mdbcomp/feedback.m:
Changes to types as above.
Introduce predicates to convert candidate_par_conjunctions from one type to
another given a function to convert the type of goal used.
Increment the feedback file format version number.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Remove our alternative candidate_par_conjunction types in favor of the
polymorphic type in feedback.m
Rename the type inner_goal_internal to pard_goal_detail.
Rename occurrences of inner_goal or InnerGoal to pard_goal or PardGoal.
Use the generic conversion code in feedback.m to convert between different
types of candidate_par_conjunction.
Conform to changes in mdbcomp/feedback.m
compiler/implicit_parallelism.m:
Rename occurrences of inner_goal or InnerGoal to pard_goal or PardGoal.
Conform to changes in mdbcomp/feedback.m
|
||
|
|
25ed5e004d |
Add an option to mdprof_feedback to control whether the automatic
parallelisation feedback will recommend parallelising dependant conjunctions or
not. The compiler will now parallelise both independent and dependant
conjunctions.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Add a field to the candidate_parallel_conjunctions_opts structure to
represent whether we should parallelise dependant conjunctions.
Use this flag to determine if a dependant conjunction should be recommended
for parallelisation in innergoals_build_candidate_conjunction.
deep_profiler/mdprof_feedback.m:
Add the actual command line argument.
Update the --help message.
Conform to changes in mdprof_fb.automatic_parallelism.m.
compiler/implicit_parallelism.m:
Previously the compiler would not automatically parallelise dependant
conjunctions, this restriction has been removed as the control is now
available in the mdprof_feedback tool.
|
||
|
|
79c3f39a68 |
Implicit parallelism work.
The implicit parallelism algorithm, feedback file format and therefore compiler
have been updated. They now support parallelisation across other goals and, in
theory, parallelising three or more calls against one another. The algorithm
is far from complete and very much experimental, it has been tested on a
modified version of icfp_2000 where it improves the runtime. Note that
automatic parallelisation of dependant conjunctions is disabled for now.
mdbcomp/feedback.m:
Modify deep profiling feedback data, a candidate parallel conjunct now
contains a list of sequential conjunctions that contain other goals.
Previously only two calls to be parallelised against one another were
listed.
Document restrictions on the new candidate parallel conjunct structure that
can't be expressed by the type system.
Incremented the feedback file format number.
mdbcomp/program_representation.m:
Made a semidet predicate version of empty_goal_path.
Created maybe_search_var_name which returns its result wrapped in a maybe
structure, this is a deterministic alternative to search_var_name. It is
useful as an argument to list.map.
deep_profiler/mdprof_feedback.m:
When printing messages out to stderr also print the newlines between the
messages to stderr.
deep_profiler/measurements.m:
Re-arranged the order of arguments and added a four argument version for
sub_computation_parallelism.
Added a new function, some_parallelism/1, that initialises a parallelism
amount as.
deep_profiler/message.m:
Added extra messages.
Pretty-print program locations using the conventional string representation
for procedures and goal paths. Export the predicate that does this.
deep_profiler/program_representation_utils.m:
Export a predicate to format a procedure identifier nicely.
Add code for calculating and manipulating inst_map_delta objects similar to
those in the compiler.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Various code cleanups/simplifications.
Re-worked the parallelisation algorithm, it can now parallelise across
cheaper calls and (theoretically) handle parallel conjunctions with any
number of conjuncts.
Conform to new candidate parallel conjunction representation.
Internally use a structure similar to the candidate parallel conjunct
structure in feedback.m This makes the maybe_call_conjunct structure
obsolete, the old structure has been removed.
compiler/implicit_parallelism.m:
Updated implicit parallelism transformation to conform to the new feedback
file format.
compiler/goal_util.m:
Added goal_is_atomic/2
Modified create_conj_from_list to simply return the only goal in the list
when the list contains exactly one goal.
library/maybe.m:
Add a simple predicate (maybe_is_yes/2) that 'opens' a maybe and returns
the result or fails.
NEWS:
Announce maybe_is_yes/2
|
||
|
|
77a6a6c10c |
Implement several more changes that together speed up compilation time
Estimated hours taken: 16
Branches: main
Implement several more changes that together speed up compilation time
on training_cars_full by 12%, and also improve tools/speedtest -h by 7.2%
and tools/speedtest by 1.6%.
The first change is designed to eliminate the time that the compiler spends
constructing error messages that are then ignored. The working predicates of
prog_io_sym_name used to always return a single result, which either gave
a description of the thing being looked for, or an error message. However,
in many places, the caller did not consider not finding the thing being
looked for to be an error, and thus threw away the error message, keeping
only the "not found" indication. For each predicate with such callers,
this diff provides a parallel predicate that indicates "not found" simply
by failing. This allows us to eliminate the construction of the error
message, the preparation for the construction of the error message (usually
by describing the context), and the construction of the "ok" wrapper.
The second change is to specialize the handling of
from_ground_term_construct scopes in the termination analyzer. To make this
easier, I also cleaned up the infrastructure of the termination analyzer.
The third change is to avoid traversing from_ground_term_construct scopes
in quantification.m when finding the variables in a goal, since termination
analysis no longer needs the information it gathers.
The fourth change is to avoid traversing second and later conjuncts in
conjunctions twice. The first step in handling conjunctions is to call
implicitly_quantify_conj, which builds up a data structure that pairs each
conjunct with the variables that occur free in all the conjuncts following
it. However, after this was done and each conjunct was annotated with its
nonlocals, we used to compute the variables that occur free in the
conjunction as a whole from scratch. This diff changes the code so that we
now compute that set based on the information we gathered earlier, avoiding
a redundant traversal.
The fifth change is to create specialized, lower-arity versions of many of
the predicates in quantification.m. These versions are intended for
traversals that take place after the compiler has replaced lambda
expressions with references to separate procedures. These traversals do not
need to pass around arguments representing the variables occurring free in
the (now non-existent) lambda expressions.
compiler/prog_io_sym_name.m:
Make the first change described above.
Change some predicate names to adopt a consistent naming scheme in which
predicates that do the same job and differ only in how they handle errors
have names that differ only in a "try_" prefix.
Add some predicate versions that do common tests on the output of the base
versions. For example, try_parse_sym_name_and_no_args is a version of
try_parse_sym_name_and_args that insists on finding an empty argument list.
Remove the unused "error term" argument that we used to need a while ago.
Move some predicate definitions to make their order match the order of
their declarations.
Turn a predicate into a function for its caller's convenience.
compiler/term_constr_build.m:
Make the second change described above by modeling each
from_ground_term_construct scope as a single unification, assigning the
total size of the ground term to the variable being built.
compiler/term_constr_util.m:
Put the arguments of some predicates into a more standard order.
compiler/lp_rational.m:
Change the names of some function symbols to avoid both the use of graphic
characters that require quoting and clashes with other types.
Change the names of some predicates to make their purpose clear, and to
avoid ambiguity.
compiler/quantification.m:
Make the third, fourth and fifth changes described above.
compiler/*.m:
Conform to the changes above. |
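The "try_" naming scheme described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: parse_digit and try_parse_digit are hypothetical names invented for the example, not predicates from prog_io_sym_name.m. The point is that a caller who only cares about "found or not found" can call the semidet version and never pay for building a message or an "ok" wrapper.

```mercury
:- module try_prefix_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.
:- import_module maybe.
:- import_module string.

    % Hypothetical "try_" version: it reports "not found" simply by
    % failing, so no error message is ever constructed.
:- pred try_parse_digit(string::in, int::out) is semidet.

try_parse_digit(Str, N) :-
    string.length(Str) = 1,
    string.to_int(Str, N).

    % Hypothetical reporting version: always succeeds, and builds an
    % error message for callers that do want to report the problem.
:- pred parse_digit(string::in, maybe_error(int)::out) is det.

parse_digit(Str, Result) :-
    ( if try_parse_digit(Str, N) then
        Result = ok(N)
    else
        Result = error("expected a single digit, got `" ++ Str ++ "'")
    ).

main(!IO) :-
    % A caller that does not treat "not found" as an error.
    ( if try_parse_digit("xy", _) then
        io.write_string("found\n", !IO)
    else
        io.write_string("not found\n", !IO)
    ),
    % A caller that wants the message.
    parse_digit("xy", Result),
    (
        Result = ok(N),
        io.format("ok: %d\n", [i(N)], !IO)
    ;
        Result = error(Msg),
        io.format("error: %s\n", [s(Msg)], !IO)
    ).
```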
||
|
|
0bbb6d07fa |
Support implicit parallelism in the compiler.
Estimated hours taken: 20
Branches: main
Support implicit parallelism in the compiler. The compiler now uses the deep
profiler feedback information to build a parallel version of a program.
Changes have also been made to the feedback format for candidate parallel
conjunctions and the analysis that recommends opportunities for parallelism
to the compiler.
compiler/implicit_parallelism.m:
Mark Tannier's implementation as deprecated (it also crashes the compiler).
Introduce new implicit parallelism transformation.
apply_implicit_parallelism_transformation now returns maybe_error rather
than maybe so that errors can be described.
compiler/goal_util.m:
Add a predicate to transform a goal referenced by a goal path within a
larger goal structure and rebuild that structure.
compiler/mercury_compile.m:
Conform to changes in implicit_parallelism.m.
deep_profiler/mdprof_feedback.m:
Return a cord of warnings from many predicates; these warnings are used to
describe cases where parallelism might be profitable but it is not (yet)
possible to transform the code into parallel code.
Fix a bug whereby the wrong deep profiling statistic was used to calculate
the cost of a call.
Do not attempt to parallelise calls with other goals between them.
mdbcomp/feedback.m:
Remove the intermediate goals information from the candidate parallel
conjunctions feedback data.
mdbcomp/program_representation.m:
Provide an in-order alternative to the goal_path type so that operations on
the start of the goal path occur in constant time and goal_path itself
remains usable as a key in arrays because it doesn't use the cord type
internally.
library/cord.m:
Added a di/uo mode to cord.foldl_pred.
library/list.m:
Added list.find_index_of_match/4 to return the index of the first item in
a list that satisfies the predicate given in the first argument.
library/pqueue.m:
Added pqueue.length/1.
NEWS:
Announce standard library changes. |
||
|
|
5ad9a27793 |
Speed up the compiler's handling of code that constructs large ground terms
Estimated hours taken: 80
Branches: main
Speed up the compiler's handling of code that constructs large ground terms
by specializing the treatment of such code.
This diff reduces the compilation time for training_cars_full.m from 106.9
seconds to 30.3 seconds on alys, my laptop. The time on tools/speedtest
stays pretty much the same.
compiler/hlds_goal.m:
Record the classification of from_ground_term scopes as purely
constructing terms, purely deconstructing them, or something else.
Fix an old potential bug: variables inside the construct_how fields
of unifications weren't being renamed along with other variables.
This is a bug if any part of the compiler later looks at those
variables. (I am not sure whether or not this happens.)
compiler/superhomogeneous.m:
Provisionally mark newly constructed static terms as being
from_ground_term_construct. Mode checking will either confirm this
or change the scope kind.
compiler/options.m:
compiler/handle_options.m:
Add a new option, from_ground_term_threshold, that allows the user to
set the boundary between ground terms that get scopes and ground terms
that do not. I plan to experiment with different settings later.
compiler/modes.m:
Make this classification. For scopes that construct ground terms,
use a specialized algorithm that avoids quadratic behavior.
(It does not access the unify_inst_table, which is where the
factor of N other than the length of the goal list came from.)
The total size of the instmap_deltas, if printed out, still looks like
O(N^2) in size, but due to structure sharing it needs only O(N) memory.
For scopes that construct ground terms, set the determinism information
so that det_analysis.m doesn't have to traverse such scopes.
When handling disjunctions, check whether some nonlocals of the
disjunctions are constructed by from_ground_term_construct scopes.
For any such nonlocals, set their insts to just ground, throwing away
the precise information we have about exactly what function symbols
they and ALL their subterms are bound to. This is a HUGE win, since
it allows us to avoid spending a lot of time building a huge merge_inst
table, which later passes of the compiler (e.g. equiv_type_hlds) would
then have to spend similarly huge times traversing.
This approach does have a down side. If lots of arms of a disjunction
bind a nonlocal to a large ground term, but a few bind it to a SMALL
ground term, a term below the from_ground_term_threshold, this
optimization won't kick in. That could be one purpose of the new
option. It isn't documented yet; I will seek feedback about its
usefulness first.
compiler/modecheck_unify.m:
Handle the three different kinds of right hand sides separately.
This yields a small speedup, because now we don't test rhs_vars and
rhs_functors (the common right hand sides) for a special case
(goals containing "any" insts) that is applicable only to
rhs_lambda_goals.
compiler/unique_modes.m:
Don't traverse scopes that construct ground terms, since modes.m has
already done everything that needs to be done.
compiler/det_analysis.m:
Don't traverse scopes that construct ground terms, since modes.m has
already done the needed work.
compiler/instmap.m:
Add a new predicate for use by modes.m.
Many predicate names in this module were quite uninformative; give them
informative names.
compiler/polymorphism.m:
If this pass invalidates the from_ground_term_construct invariants,
then mark the relevant scope as from_ground_term_other.
Delete two unused access predicates.
compiler/equiv_type_hlds.m:
Don't traverse scopes that construct ground terms, since modes.m
ensures that their instmap deltas do not contain typed insts, and
thus the scope cannot contain types that need to be expanded.
Convert some predicates to single clauses.
compiler/goal_form.m:
compiler/goal_util.m:
In predicates that test goals for various properties, don't traverse
scopes that construct ground terms when the outcome of the test
is the same for all such scopes.
Convert some predicates to single clauses.
compiler/simplify.m:
Do not look for common structs in from_ground_term_construct scopes,
both because this speeds up the compiler, and because retaining
references to ground terms is in fact a pessimization, not an
optimization. This is because (a) those references need to be stored in
stack slots across calls, and (b) the C code generators ensure that
the cells representing ground terms will be shared as needed.
If all arms of a switch are from_ground_term_construct scopes,
do not merge the instmap_deltas from those arms, since this is
both time-consuming (even after the other changes in this diff)
and extremely unlikely to improve the instmap_delta.
Disable common_struct in from_ground_term_construct scopes,
since for these scopes, it is actually a pessimization.
Do not delete from_ground_term_construct scopes, since many
compiler passes can now use them.
Do some manual deforestation, break up some large predicates,
and give better names to some.
compiler/liveness.m:
Special-case the handling of from_ground_term_construct scopes. This
allows us to traverse them just once instead of three times, and this
traversal is simpler and faster than any of the three.
In some traversals, we were switching on the goal type twice; once
in e.g. detect_liveness_in_goal_2, and once by calling
goal_expr_has_subgoals. Eliminate the double switching by merging
the relevant predicates. (The double-switching structure was easier
to work with before we had multi-cons-id switches.)
compiler/typecheck.m:
Move a lookup after a test, so we don't have to do it if the test
fails.
Provide a specialized mode for a predicate. This should allow the
compiler to eliminate an argument and a test in the common case.
Note a possible chance for a speedup.
compiler/typecheck_info.m:
Don't apply empty substitutions to the types of a possibly very large
set of variables.
compiler/quantification.m:
Don't quantify from_ground_term_construct scopes. They are created
correctly quantified, and any compiler pass that invalidates that
quantification also removes the from_ground_term_construct mark.
Don't apply empty renamings to a possibly very large set of variables.
Move the code for handling scopes to its own predicate, to avoid
overwhelming the code that handles other kinds of goals. Even from
this, factor out the renaming code, since it is needed only for
some kinds of scopes.
Make some predicate names better reflect what the predicate does.
compiler/pd_cost.m:
For from_ground_term_construct scopes, instead of computing their cost
by adding up the costs of the goals inside, make their cost a constant,
since binding a variable to a static term takes constant time.
compiler/pd_info.m:
Add prefixes on field names to avoid ambiguities.
compiler/add_heap_ops.m:
compiler/add_trail_ops.m:
compiler/closure_analysis.m:
compiler/constraint.m:
compiler/cse_detection.m:
compiler/dead_proc_elim.m:
compiler/deep_profiling.m:
compiler/deforest.m:
compiler/delay_construct.m:
compiler/delay_partial_inst.m:
compiler/dep_par_conj.m:
compiler/distance_granularity.m:
compiler/exception_analysis.m:
compiler/follow_code.m:
compiler/follow_vars.m:
compiler/format_call.m:
compiler/granularity.m:
compiler/higher_order.m:
compiler/implicit_parallelism.m:
compiler/inlining.m:
compiler/interval.m:
compiler/lambda.m:
compiler/lco.m:
compiler/live_vars.m:
compiler/loop_inv.m:
compiler/middle_rec.m:
compiler/mode_util.m:
compiler/parallel_to_plain_conj.m:
compiler/saved_vars.m:
compiler/stm_expand.m:
compiler/store_alloc.m:
compiler/stratify.m:
compiler/structure_reuse.direct.detect_garbage.m:
compiler/structure_reuse.lbu.m:
compiler/structure_sharing.analysis.m:
compiler/switch_detection.m:
compiler/trailing_analysis.m:
compiler/term_pass1.m:
compiler/tupling.m:
compiler/unneeded_code.m:
compiler/untupling.m:
compiler/unused_args.m:
These passes have nothing to do in from_ground_term_construct scopes,
so don't traverse them.
In some modules (e.g. dead_proc_elim), some traversals had to be kept.
In loop_inv.m, replace a code structure that updated accumulators
with functions (which prevented the natural use of state variables),
that in lots of places reconstructed the term it had just
deconstructed, and obscured the identical handling of different kinds
of goals, with a structure based on predicates, state variables and
shared code for different goal types where possible.
In store_alloc.m, avoid some double switching on the same value.
In stratify.m, unneeded_code.m and unused_args.m, rename predicates
to avoid ambiguities.
compiler/goal_path.m:
compiler/goal_util.m:
compiler/implementation_defined_literals.m:
compiler/intermod.m:
compiler/mark_static_terms.m:
compiler/ml_code_gen.m:
compiler/mode_ordering.m:
compiler/ordering_mode_constraints.m:
compiler/prop_mode_constraints.m:
compiler/purity.m:
compiler/rbmm.actual_region_arguments.m:
compiler/rbmm.add_rbmm_goal_infos.m:
compiler/rbmm.condition_renaming.m:
compiler/rbmm.execution_path.m:
compiler/rbmm.region_transformation.m:
compiler/structure_reuse.direct.choose_reuse.m:
compiler/structure_reuse.indirect.m:
compiler/structure_reuse.lfu.m:
compiler/structure_reuse.versions.m:
compiler/term_constr_build.m:
compiler/term_traversal.m:
compiler/unused_imports.m:
Mark places where we cannot (yet) special case
from_ground_term_construct scopes.
In structure_reuse.lfu.m, turn nested if-then-elses into a switch.
compiler/size_prof.m:
Turn from_ground_term_construct scopes into from_ground_term_other
scopes, since in term size profiling grades, we need to attach sizes to
terms.
Give predicates better names.
compiler/*.m:
Minor changes to conform to the changes above.
compiler/make_hlds_passes.m:
With -S, print statistics after the third pass over items, since
this is the time-consuming one.
compiler/mercury_compile.m:
Conform to the new names of some predicates.
When declining to output a HLDS dump because it would be identical to
the previous dump, don't confuse the user either by being silent about
the decision, or by leaving an old dump lying around that could be
mistaken for a new one.
tools/binary:
tools/binary_step:
Bring these tools up to date.
compiler/Mmakefile:
Add an int3s target for use by the new code in the tools. The
Mmakefiles in the other directories with Mercury code already have
such a target.
compiler/notes/allocation.html:
Fix an out-of-date reference.
tests/debugger/polymorphic_ground_term.{m,inp,exp}:
New test case to check whether liveness.m handles typeinfo liveness
of ground terms correctly.
tests/debugger/Mmakefile:
Enable the new test case.
tests/debugger/polymorphic_output.{m,exp}:
Fix tab/space mixup.
|
||
|
|
6a6e81b9e3 |
Add a new structure to the feedback data type,
Estimated hours taken: 2
Branches: main
Add a new structure to the feedback data type,
candidate_parallel_conjunctions. This produces feedback information about
parallel conjunctions that may be parallelised. This data is not yet
collected by the mdprof_feedback tool, or used by the compiler.
Make changes to the feedback API and on-disk format. This makes it easier to
query the feedback_info structure for feedback data.
mdbcomp/feedback.m:
Introduce candidate_parallel_conjunctions feedback information.
Remove type arguments from feedback predicates.
Move feedback_type out of this module's interface.
Use a partially instantiated feedback_data data structure to retrieve
feedback data. A caller of get_feedback_data no longer needs to use a
switch to check that they received the correct data.
Remove keys from the on-disk format, removing the risk that some data could
be stored against an incorrect key.
Increment the feedback data file version number.
compiler/implicit_parallelism.m:
Conform to changes in mdbcomp/feedback.m.
compiler/options.m:
Added the --implicit-parallelisation-old compiler option; this will enable
the old implicit parallelism implementation.
deep_profiler/mdprof_feedback.m:
Added options for collecting the candidate_parallel_conjunctions feedback
data. |
||
|
|
01d145ab8f |
Introduce a feedback system that allows analysis tools to feed information
Estimated hours taken: 8
Branches: main
Introduce a feedback system that allows analysis tools to feed information
back into the compiler. This can be used with the deep profiler to improve
many optimizations. Tools update information in the feedback file rather
than clobbering existing unrelated information.
Modify the implicit parallelism work to make use of the new feedback system.
mdprof_feedback updates a feedback file and in the future will be able to
collect more information from the deep profiler.
mdbcomp/feedback.m:
Created a new module for the feedback system. Types representing feedback
information and predicates for reading and writing feedback files, and
manipulating feedback information, are defined here.
mdbcomp/mdbcomp.m:
Updated to include mdbcomp/feedback.m in this library.
mdbcomp/program_representation.m:
Created a new type to describe a call. This is used by the current implicit
parallelism implementation.
deep_profiler/mdprof_feedback.m:
Updated to use the new feedback system. The old feedback file code has been
removed.
The --program-name option has been added; a program name must be provided
to be included in the header of the feedback file.
Conform to changes in mdbcomp/program_representation.m.
compiler/globals.m:
Added feedback data to the globals structure.
Added predicates to get and set the feedback information stored in the
globals structure.
Modified predicates that create the globals structure.
compiler/handle_options.m:
Set feedback information in the globals structure when it is created in
postprocess_options.
Read feedback information in from file in check_option_values.
Code added to postprocess_options2 to check the usage of the
--implicit-parallelism option.
compiler/implicit_parallelism.m:
This module no longer reads the feedback file itself; this code has been
removed, as has the IO state.
Information from the feedback state is retrieved and used to control
implicit parallelism.
compiler/mercury_compile.m:
No longer checks options for implicit parallelization; this is now done in
compiler/handle_options.m.
Conform to changes in implicit_parallelism.m.
deep_profiler/Mmakefile:
slice/Mmakefile:
Modified to include mdbcomp/feedback.m for compilation in this directory. |
||
|
|
b000cb322e |
Provide compiler support for Software Transactional Memory through the new
Estimated hours taken: 80 by zs, and lots more by lmika
Branches: main
Provide compiler support for Software Transactional Memory through the new
atomic goal. This work was done by Leon Mika; I merely brought it up to
date, resolved conflicts, and cleaned up a few things. There are still
several aspects that are as yet incomplete.
library/ops.m:
Add the operators needed for the syntax of atomic scopes.
library/stm_builtin.m:
Add the builtin operations needed for the implementation of atomic goals.
compiler/hlds_goal.m:
Add a new HLDS goal type, which represents an atomic goal and its possible
fallbacks (in case an earlier goal throws an exception).
Rename the predicate goal_is_atomic as goal_expr_has_subgoals, since now
its old name would be misleading.
compiler/prog_data.m:
compiler/prog_item.m:
Add a parse tree representation of the new kind of goal.
compiler/prog_io_goal.m:
Parse the new kind of goal.
compiler/add_clause.m:
Translate atomic goals from parse tree form to HLDS.
compiler/typecheck.m:
compiler/typecheck_errors.m:
Do type checking of atomic goals.
compiler/modes.m:
Do mode checking of atomic goals, and determine whether they are nested
or not.
compiler/unique_modes.m:
Do unique mode checking of atomic goals.
compiler/stm_expand.m:
New module to expand atomic goals into sequences of simpler goals.
library/stm_builtin.m:
Add the primitives needed by the transformation.
Improve the existing debugging support.
mdbcomp/prim_data.m:
Add utility functions to allow stm_expand.m to refer to modules in the
library.
mdbcomp/program_representation.m:
Expand the goal_path type to allow the representation of components of
atomic goals.
compiler/notes/compiler_design.html:
Document the new module.
compiler/transform_hlds.m:
Include the new module in the compiler.
compiler/mercury_compile.m:
Invoke the STM transformation.
compiler/hlds_module.m:
Add an auxiliary counter used by the STM transformation.
compiler/hlds_pred.m:
Add a new predicate origin: the STM transformation.
compiler/modules.m:
Import the STM builtin module automatically if the module contains any
atomic goals.
compiler/assertion.m:
compiler/bytecode_gen.m:
compiler/clause_to_proc.m:
compiler/code_gen.m:
compiler/code_info.m:
compiler/code_util.m:
compiler/constraint.m:
compiler/cse_detection.m:
compiler/deep_profiling.m:
compiler/delay_construct.m:
compiler/delay_partial_inst.m:
compiler/dep_par_conj.m:
compiler/dependency_graph.m:
compiler/det_analysis.m:
compiler/det_report.m:
compiler/distance_granularity.m:
compiler/equiv_type_hlds.m:
compiler/erl_code_gen.m:
compiler/exception_analysis.m:
compiler/follow_code.m:
compiler/format_call.m:
compiler/goal_form.m:
compiler/goal_path.m:
compiler/goal_util.m:
compiler/granularity.m:
compiler/hlds_out.m:
compiler/implicit_parallelism.m:
compiler/inlining.m:
compiler/intermod.m:
compiler/lambda.m:
compiler/layout_out.m:
compiler/lco.m:
compiler/lookup_switch.m:
compiler/make_hlds_warn.m:
compiler/mark_static_terms.m:
compiler/mercury_to_mercury.m:
compiler/middle_rec.m:
compiler/ml_code_gen.m:
compiler/mode_constraint_robdd.m:
compiler/mode_constraints.m:
compiler/mode_errors.m:
compiler/mode_info.m:
compiler/mode_util.m:
compiler/ordering_mode_constraints.m:
compiler/pd_cost.m:
compiler/pd_util.m:
compiler/polymorphism.m:
compiler/post_typecheck.m:
compiler/prog_rep.m:
compiler/prog_type.m:
compiler/prop_mode_constraints.m:
compiler/rbmm.actual_region_arguments.m:
compiler/rbmm.add_rbmm_goal_infos.m:
compiler/rbmm.condition_renaming.m:
compiler/rbmm.execution_path.m:
compiler/rbmm.points_to_analysis.m:
compiler/rbmm.region_transformation.m:
compiler/saved_vars.m:
compiler/simplify.m:
compiler/size_prof.m:
compiler/smm_common.m:
compiler/structure_reuse.direct.choose_reuse.m:
compiler/structure_reuse.direct.detect_garbage.m:
compiler/structure_reuse.indirect.m:
compiler/structure_reuse.lbu.m:
compiler/structure_reuse.lfu.m:
compiler/structure_reuse.versions.m:
compiler/structure_sharing.analysis.m:
compiler/switch_detection.m:
compiler/unused_imports.m:
Conform to the changes above. Mostly this means handling the new kind
of goal.
compiler/add_heap_ops.m:
compiler/add_trail_ops.m:
compiler/build_mode_constraints.m:
compiler/closure_analysis.m:
compiler/dead_proc_elim.m:
compiler/deforest.m:
compiler/follow_vars.m:
compiler/higher_order.m:
compiler/live_vars.m:
compiler/liveness.m:
compiler/loop_inv.m:
compiler/module_qual.m:
compiler/prog_util.m:
compiler/purity.m:
compiler/quantification.m:
compiler/store_alloc.m:
compiler/stratify.m:
compiler/tabling_analysis.m:
compiler/term_constr_build.m:
compiler/term_pass1.m:
compiler/term_traversal.m:
compiler/trailing_analysis.m:
Conform to the changes above. Mostly this means handling the new kind
of goal.
Switch syntax from clauses to disj.
runtime/mercury_stm.[ch]:
Implement the primitives needed by the STM transformation.
Add more debugging support to the existing primitives.
library/term.m:
Generalize get_term_context to work on terms of all kinds. |
||
|
|
cc88711d63 |
Implement true multi-cons_id arm switches, i.e. switches in which we associate
Estimated hours taken: 40
Branches: main
Implement true multi-cons_id arm switches, i.e. switches in which we associate
more than one cons_id with a switch arm. Previously, for switches like this:
(
X = a,
goal1
;
( X = b
; X = c
),
goal2
)
we duplicated goal2. With this diff, goal2 won't be duplicated. We still
duplicate goals when that is necessary, i.e. in cases in which the inner
disjunction contains code other than a functor test on the switched-on var,
like this:
(
X = a,
goal1
;
(
X = b,
goalb
;
X = c,
goalc
),
goal2
)
For now, true multi-cons_id arm switches are supported only by the LLDS
backend. Supporting them on the MLDS backend is trickier, because some MLDS
target languages (e.g. Java) don't support the concept at all. So when
compiling to MLDS, we still duplicate the goal in switch detection (although
we could delay the duplication to just before code generation, if we wanted.)
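For illustration, a complete module containing a switch of the first form shown above might look like the following sketch. The module name, the type letter and the function describe are made up for this example; the only point is that the arm for b and c has a single body, written once in the source.

```mercury
:- module multi_cons_id_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module string.

    % Hypothetical type with three function symbols.
:- type letter
    --->    a
    ;       b
    ;       c.

:- func describe(letter) = string.

describe(X) = Str :-
    (
        X = a,
        Str = "first"
    ;
        % One arm for two cons_ids: the inner disjunction contains
        % nothing but functor tests on the switched-on variable.
        ( X = b
        ; X = c
        ),
        Str = "later"
    ).

main(!IO) :-
    io.write_string(describe(a) ++ "\n", !IO),
    io.write_string(describe(c) ++ "\n", !IO).
```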
compiler/options.m:
Add an internal option that tells switch detection whether to look for
multi-cons_id switch arms.
compiler/handle_options.m:
Set this option based on the back end.
Add a version of the "trans" dump level that doesn't print unification
details.
compiler/hlds_goal.m:
Extend the representation of switch cases to allow more than one
cons_id for a switch arm.
Add a type for representing switches that also includes tag information
(for use by the backends).
compiler/hlds_data.m:
For du types, record whether it is possible to speed up tests for one
cons_id (e.g. cons) by testing for the other (nil) and negating the
result. Recording this information once is faster than having
unify_gen.m trying to compute it from scratch for every single
tag test.
Add a type for representing a cons_id together with its tag.
compiler/hlds_out.m:
Print out the cheaper_tag_test information for types, and possibly
several cons_ids for each switch arm.
Add some utility predicates for describing switch arms in terms of
which cons_ids they are for.
Replace some booleans with purpose-specific types.
Make hlds_out honor its documentation, and not print out detailed
information about unifications (e.g. uniqueness and static allocation)
unless the right character ('u') is present in the control string.
compiler/add_type.m:
Fill in the information about cheaper tag tests when adding a du type.
compiler/switch_detection.m:
Extend the switch detection algorithm to detect multi-cons_id switch
arms.
When entering a switch arm, update the instmap to reflect that the
switched-on variable can now be bound only to the cons_ids that this
switch arm is for. We now need to do this, because if the arm contains
another switch on the same variable, computing the can_fail field of
that switch correctly requires us to know this information.
(Obviously, an arm for a single cons_id is unlikely to have a switch on
the same variable, and for arms for several cons_ids, we previously
duplicated the arm and left the unification with the cons_id in each
copy, and this unification allowed the correct handling of any later
switches. However, the code of a multi-cons_id switch arm obviously
cannot have a unification with each cons_id in it, which is why
we now need to get the binding information from the switch itself.)
Replace some booleans with purpose-specific types, and give some
predicates better names.
compiler/instmap.m:
Provide predicates for recording that a switched-on variable has
one of several given cons_ids, for use at the starts of switch arms.
Give some predicates better names.
compiler/modes.m:
Provide predicates for updating the mode_info at the start of a
multi-cons_id switch arm.
compiler/det_report.m:
Handle multi-cons_id switch arms.
Update the instmap when entering each switch arm, since this is needed
to provide good (i.e. non-misleading) error messages when one switch on
a variable exists inside another switch on the same variable.
Since updating the instmap requires updating the module_info (since
the new inst may require a new entry in an inst table), thread the
det_info through as updateable state.
Replace some multi-clause predicate definitions with single clauses,
to make it easier to print the arguments in mdb.
Fix some misleading variable names.
compiler/det_analysis.m:
Update the instmap when entering each switch arm and thread the
det_info through as updateable state, since the predicates we call
in det_report.m require this.
compiler/det_util.m:
Handle multi-cons_id switch arms.
Rationalize the argument order of some access predicates.
compiler/switch_util.m:
Change the parts of this module that deal with string and tag switches
to optionally convert each arm to an arbitrary representation of the
arm. In the LLDS backend, the conversion process generates code for
the arm, and the arm's representation is the label at the start of
this code. This way, we can duplicate the label without duplicating
the code.
Add a new part of this module that associates each cons_id with its
tag, and (during the same pass) checks whether all the cons_ids are
integers, and if so what are min and max of these integers (needed
for dense switches). This scan is needed because the old way of making
this test had single-cons_id switch arms as one of its basic
assumptions, and doing it while adding tags to each case reduces
the number of traversals required.
Give better names to some predicates.
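The single-pass scan described for switch_util.m (checking whether all tags are integers while tracking their minimum and maximum) can be sketched as a fold. This is a simplified illustration, not the code in switch_util.m: the types tag and int_range and the predicate scan_tag are hypothetical stand-ins.

```mercury
:- module tag_scan_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int.
:- import_module list.

    % Hypothetical stand-in for the tag of a cons_id.
:- type tag
    --->    int_tag(int)
    ;       other_tag.

    % What the scan knows about the tags seen so far.
:- type int_range
    --->    no_tags_seen
    ;       all_ints(int, int)      % smallest and largest so far
    ;       not_all_ints.

:- pred scan_tag(tag::in, int_range::in, int_range::out) is det.

scan_tag(Tag, !Range) :-
    (
        Tag = int_tag(N),
        (
            !.Range = no_tags_seen,
            !:Range = all_ints(N, N)
        ;
            !.Range = all_ints(Min, Max),
            !:Range = all_ints(int.min(N, Min), int.max(N, Max))
        ;
            % Once a non-integer tag has been seen, stay in this state.
            !.Range = not_all_ints
        )
    ;
        Tag = other_tag,
        !:Range = not_all_ints
    ).

main(!IO) :-
    Tags = [int_tag(4), int_tag(1), int_tag(9)],
    % One traversal yields both the "all integers?" answer and the limits.
    list.foldl(scan_tag, Tags, no_tags_seen, Range),
    io.write(Range, !IO),
    io.nl(!IO).
```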
compiler/switch_case.m:
New module to handle the tasks associated with managing multi-cons_id
switch arms, including representing them for switch_util.m.
compiler/ll_backend.m:
Include the new module.
compiler/notes/compiler_design.html:
Note the new module.
compiler/llds.m:
Change the computed goto instruction to take a list of maybe labels
instead of a list of labels, with any missing labels meaning "not
reached".
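As a rough sketch of the idea behind this change, a jump table can be modelled as a list of maybe labels in which "no" marks a slot that cannot be reached. The type label and the function describe_target below are hypothetical and are not taken from llds.m; note also that this sketch treats an out-of-range index the same way as a "no" slot, which is a simplification.

```mercury
:- module maybe_label_sketch.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.
:- import_module maybe.
:- import_module string.

    % Hypothetical stand-in for an LLDS label.
:- type label == string.

:- func describe_target(list(maybe(label)), int) = string.

describe_target(Targets, Index) = Desc :-
    ( if list.index0(Targets, Index, yes(Label)) then
        Desc = "goto " ++ Label
    else
        Desc = "not reached"
    ).

main(!IO) :-
    % Two slots share one label, so the arm's code exists only once;
    % the last slot has no label at all.
    Targets = [yes("arm_a"), yes("arm_bc"), yes("arm_bc"), no],
    io.write_string(describe_target(Targets, 2) ++ "\n", !IO),
    io.write_string(describe_target(Targets, 3) ++ "\n", !IO).
```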
compiler/string_switch.m:
compiler/tag_switch.m:
Reorganize the way these modules work. We can't generate the code of
each arm in place anymore, since it is now possible for more than one
cons_id to call for the execution of the same code. Instead, in
string_switch.m, we generate the codes of all the arms all at once,
and construct the hash index afterwards. (This approach simplifies
the code significantly.)
In tag switches (unlike string switches), we can get locality benefits
if the code testing for a cons_id is close to the code for that
cons_id, so we still try to put them next to each other when such
a locality benefit is available.
In both modules, the new approach uses a utility predicate in
switch_case.m to actually generate the code of each switch arm,
eliminating several copies of the same code in the old versions of these
modules.
In tag_switch.m, don't create a local label that simply jumps to the
code address do_not_reached. Previously, we had to do this for
positions in jump tables that corresponded to cons_ids that the switch
variable could not be bound to. With the change to llds.m, we now
simply generate a "no" instead.
compiler/lookup_switch.m:
Get the info about int switch limits from our caller; don't compute it
here.
Give some variables better names.
compiler/dense_switch.m:
Generate the codes of the cases all at once, then assemble the table,
duplicating the labels as needed. This separation of concerns allows
significant simplifications.
Pack up all the information shared between the predicate that detects
whether a dense switch is appropriate and the predicate that actually
generates the dense switch.
Move some utility predicates to switch_util.
compiler/switch_gen.m:
Delete the code for tagging cons_ids, since that functionality is now
in switch_util.m.
The old version of this module could call the code generator to produce
(i.e. materialize) the switched-on variable repeatedly. We now produce
the variable once, and do the switch on the resulting rval.
compiler/unify_gen.m:
Use the information about cheaper tag tests in the type constructor's
entry in the HLDS type table, instead of trying to recompute it
every time.
Provide the predicates switch_gen.m now needs to perform tag tests
on rvals, as opposed to variables, and against possibly more than one
cons_id.
Allow the caller to provide the tag corresponding to the cons_id(s)
in tag tests, since when we are generating code for switches, the
required computations have already been done.
Factor out some code to make all this possible.
Give better names to some predicates.
compiler/code_info.m:
Provide some utility predicates for the new code in other modules.
Give better names to some existing predicates.
compiler/hlds_code_util.m:
Rationalize the argument order of some predicates.
Replace some multi-clause predicate definitions with single clauses,
to make it easier to print the arguments in mdb.
compiler/accumulator.m:
compiler/add_heap_ops.m:
compiler/add_pragma.m:
compiler/add_trail_ops.m:
compiler/assertion.m:
compiler/build_mode_constraints.m:
compiler/check_typeclass.m:
compiler/closure_analysis.m:
compiler/code_util.m:
compiler/constraint.m:
compiler/cse_detection.m:
compiler/dead_proc_elim.m:
compiler/deep_profiling.m:
compiler/deforest.m:
compiler/delay_construct.m:
compiler/delay_partial_inst.m:
compiler/dep_par_conj.m:
compiler/distance_granularity.m:
compiler/dupproc.m:
compiler/equiv_type_hlds.m:
compiler/erl_code_gen.m:
compiler/exception_analysis.m:
compiler/export.m:
compiler/follow_code.m:
compiler/follow_vars.m:
compiler/foreign.m:
compiler/format_call.m:
compiler/frameopt.m:
compiler/goal_form.m:
compiler/goal_path.m:
compiler/goal_util.m:
compiler/granularity.m:
compiler/hhf.m:
compiler/higher_order.m:
compiler/implicit_parallelism.m:
compiler/inlining.m:
compiler/inst_check.m:
compiler/intermod.m:
compiler/interval.m:
compiler/lambda.m:
compiler/lco.m:
compiler/live_vars.m:
compiler/livemap.m:
compiler/liveness.m:
compiler/llds_out.m:
compiler/llds_to_x86_64.m:
compiler/loop_inv.m:
compiler/make_hlds_warn.m:
compiler/mark_static_terms.m:
compiler/middle_rec.m:
compiler/ml_tag_switch.m:
compiler/ml_type_gen.m:
compiler/ml_unify_gen.m:
compiler/mode_constraints.m:
compiler/mode_errors.m:
compiler/mode_util.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/pd_cost.m:
compiler/pd_info.m:
compiler/pd_util.m:
compiler/peephole.m:
compiler/polymorphism.m:
compiler/post_term_analysis.m:
compiler/post_typecheck.m:
compiler/purity.m:
compiler/quantification.m:
compiler/rbmm.actual_region_arguments.m:
compiler/rbmm.add_rbmm_goal_infos.m:
compiler/rbmm.condition_renaming.m:
compiler/rbmm.execution_paths.m:
compiler/rbmm.points_to_analysis.m:
compiler/rbmm.region_transformation.m:
compiler/recompilation.usage.m:
compiler/saved_vars.m:
compiler/simplify.m:
compiler/size_prof.m:
compiler/ssdebug.m:
compiler/store_alloc.m:
compiler/stratify.m:
compiler/structure_reuse.direct.choose_reuse.m:
compiler/structure_reuse.indirect.m:
compiler/structure_reuse.lbu.m:
compiler/structure_reuse.lfu.m:
compiler/structure_reuse.versions.m:
compiler/structure_sharing.analysis.m:
compiler/table_gen.m:
compiler/tabling_analysis.m:
compiler/term_constr_build.m:
compiler/term_norm.m:
compiler/term_pass1.m:
compiler/term_traversal.m:
compiler/trailing_analysis.m:
compiler/transform_llds.m:
compiler/tupling.m:
compiler/type_ctor_info.m:
compiler/type_util.m:
compiler/unify_proc.m:
compiler/unique_modes.m:
compiler/unneeded_code.m:
compiler/untupling.m:
compiler/unused_args.m:
compiler/unused_imports.m:
compiler/xml_documentation.m:
Make the changes necessary to conform to the changes above, principally
to handle multi-cons_id arm switches.
compiler/ml_string_switch.m:
Make the changes necessary to conform to the changes above, principally
to handle multi-cons_id arm switches.
Give some predicates better names.
compiler/dependency_graph.m:
Make the changes necessary to conform to the changes above, principally
to handle multi-cons_id arm switches. Change the order of arguments
of some predicates to make this easier.
compiler/bytecode.m:
compiler/bytecode_data.m:
compiler/bytecode_gen.m:
Make the changes necessary to conform to the changes above, principally
to handle multi-cons_id arm switches. (The bytecode interpreter
has not been updated.)
compiler/prog_rep.m:
mdbcomp/program_representation.m:
Change the byte sequence representation of goals to allow switch arms
with more than one cons_id. compiler/prog_rep.m now writes out the
updated representation, while mdbcomp/program_representation.m reads in
the updated representation.
deep_profiler/mdprof_procrep.m:
Conform to the updated program representation.
tools/binary:
Fix a bug: if the -D option was given, the stage 2 directory wasn't
being initialized.
Abort if users try to give that option more than once.
compiler/Mercury.options:
Work around bug #32 in Mantis.
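The dense_switch.m and tag_switch.m entries above describe generating each arm's code once, then assembling a jump table in which an arm's label is duplicated for every cons_id that selects it, with "no" in slots the switch variable cannot reach. A minimal Python sketch of that idea follows. It is only an illustration: `build_dense_table`, the label names and the use of `None` for "no" are all hypothetical, and this is not the compiler's actual Mercury code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: each case is (integer tags selecting the arm,
# the already-generated code of the arm, here just a string).
Case = Tuple[List[int], str]


def build_dense_table(
    cases: List[Case], lo: int, hi: int
) -> Tuple[Dict[str, str], List[Optional[str]]]:
    """Generate arm code once per case, then assemble the jump table.

    Returns (code of each arm keyed by label, table indexed by tag - lo).
    Slots that no case selects stay None, standing in for the "no" entry.
    """
    arm_code: Dict[str, str] = {}
    table: List[Optional[str]] = [None] * (hi - lo + 1)
    for index, (tags, code) in enumerate(cases):
        label = f"case_{index}"
        arm_code[label] = code          # the arm's code exists exactly once
        for tag in tags:
            if not lo <= tag <= hi:
                raise ValueError(f"tag {tag} outside {lo}..{hi}")
            table[tag - lo] = label     # the label is duplicated per tag
    return arm_code, table


if __name__ == "__main__":
    arms, jump_table = build_dense_table([([0, 2], "A"), ([3], "B")], 0, 4)
    print(arms)
    print(jump_table)
```

Keeping code generation separate from table assembly means a multi-cons_id arm costs one copy of its code plus one table slot per cons_id.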
|
||
|
|
672f77c4ec |
Add a new compiler option, --inform-ite-instead-of-switch.
Estimated hours taken: 20
Branches: main
Add a new compiler option, --inform-ite-instead-of-switch. If this is enabled,
the compiler will generate informational messages about if-then-elses that
it thinks should be converted to switches for the sake of program
reliability.
Act on the output generated by this option.
compiler/simplify.m:
Implement the new option.
Fix an old bug that could cause us to generate warnings about code
that was OK in one duplicated copy but not in another (where a switch
arm's code is duplicated due to the case being selected for more than
one cons_id).
compiler/options.m:
Add the new option.
Add a way to test for the bug fix in simplify.
doc/user_guide.texi:
Document the new option.
NEWS:
Mention the new option.
library/*.m:
mdbcomp/*.m:
browser/*.m:
compiler/*.m:
deep_profiler/*.m:
Convert if-then-elses to switches at most of the sites suggested by the
new option. At the remaining sites, switching to switches would have
nontrivial downsides. This typically happens when the switched-on type
has many functors, and we treat one or two specially (e.g. cons/2 in
the cons_id type).
Perform misc cleanups in the vicinity of the if-then-else to switch
conversions.
In a few cases, improve the error messages generated.
compiler/accumulator.m:
compiler/hlds_goal.m:
(Rename and) move insts for particular kinds of goal from
accumulator.m to hlds_goal.m, to allow them to be used in other
modules. Using these insts allowed us to eliminate some if-then-elses
entirely.
compiler/exprn_aux.m:
Instead of fixing some if-then-elses, delete the predicates containing
them, since they aren't used, and (as pointed out by the new option)
would need considerable other fixing if they were ever needed again.
compiler/lp_rational.m:
Add prefixes to the names of the function symbols on some types,
since without those prefixes, it was hard to figure out what type
the switch corresponding to an old if-then-else was switching on.
tests/invalid/reserve_tag.err_exp:
Expect a new, improved error message.
|
||
|
|
168f531867 |
Add new fields to the goal_info structure for region based memory management.
Estimated hours taken: 4
Branches: main
Add new fields to the goal_info structure for region based memory management.
The fields are currently unused, but (a) Quan will add the code to fill them
in, and then (b) I will modify the code generator to use the filled in fields.
compiler/hlds_goal.m:
Make the change described above.
Group all the procedures that access goal_info components together.
Some of the getters were predicates while some were functions, so
this diff changes them all to be functions. (The setters remain
predicates.)
compiler/*.m:
Trivial changes to conform to the change in hlds_goal.m.
In simplify.m, break up a huge (800+ line) predicate into smaller pieces.
|
||
|
|
9958d3883c |
Fix some formatting.
Estimated hours taken: 0
Branches: main
Fix some formatting.
compiler/distance_granularity.m:
compiler/exception_analysis.m:
compiler/implicit_parallelism.m:
compiler/inst_graph.m:
compiler/interval.m:
compiler/layout_out.m:
compiler/lp_rational.m:
compiler/make.program_target.m:
compiler/modules.m:
compiler/prog_data.m:
compiler/purity.m:
compiler/recompilation.check.m:
compiler/term_constr_data.m:
compiler/term_util.m:
compiler/xml_documentation.m:
deep_profiler/mdprof_cgi.m:
library/pqueue.m:
profiler/output.m:
Fix the positioning of commas.
s/[_|_]/[_ | _]/ in a spot.
|
||
|
|
b56885be93 |
Fix a bug that caused bootchecks with --optimize-constructor-last-call to fail.
Estimated hours taken: 12
Branches: main
Fix a bug that caused bootchecks with --optimize-constructor-last-call to fail.
The problem was not in lco.m, but in follow_code.m. In some cases,
(specifically, the LCMC version of insert_2 in sparse_bitset.m),
follow_code.m moved an impure goal (store_at_ref) into the arms of an
if-then-else without marking those arms, or the if-then-else, as impure.
The next pass, simplify, then deleted the entire if-then-else, since it
had no outputs. (The store_at_ref that originally appeared after the
if-then-else was the only consumer of its only output.)
The fix is to get follow_code.m to make branched control structures such as
if-then-elses, as well as their arms, semipure or impure if a goal being moved
into them is semipure or impure, or if they came from a semipure or impure
conjunction.
Improve the optimization of the LCMC version of sparse_bitset.insert_2, which
had a foreign_proc invocation of bits_per_int in it: replace such invocations
with a unification of the bits_per_int constant if not cross compiling.
Add a new option, --optimize-constructor-last-call-null. When set, LCMC will
assign NULLs to the fields not yet filled in, to prevent any junk that happens
to be there from being followed by the garbage collector's mark phase.
This diff also makes several other changes that helped me to track down
the bug above.
compiler/follow_code.m:
Make the fix described above.
Delete all the provisions for --prev-code; it won't be implemented.
Don't export a predicate that is not now used anywhere else.
compiler/simplify.m:
Make the optimization described above.
compiler/lco.m:
Make sure that the LCMC specialized procedure is a predicate, not a
function: having a function with the mode LCMC_insert_2(in, in) = in
looks wrong.
To avoid name collisions when a function and a predicate with the same
name and arity have LCMC applied to them, include the predicate vs
function status of the original procedure in the name of the new
procedure.
Update the sym_name of calls to LCMC variants, not just the pred_id,
because without that, the HLDS dump looks misleading.
compiler/pred_table.m:
Don't have optimizations like LCMC insert new predicates at the front
of the list of predicates. Maintain the list of predicates in the
module as a two part list, to allow efficient addition of new pred_ids
at the (logical) end without using O(N^2) algorithms. Having predicates
in chronological order makes it easier to look at HLDS dumps and
.c files.
compiler/hlds_module.m:
Make module_info_predids return a module_info that is physically
updated though logically unchanged.
compiler/options.m:
Add --optimize-constructor-last-call-null.
Make the options --dump-hlds-pred-id, --debug-opt-pred-id and
--debug-opt-pred-name into accumulating options, to allow the user
to specify more than one predicate to be dumped (e.g. insert_2 and
its LCMC variant).
Delete --prev-code.
doc/user_guide.texi:
Document the changes in options.m.
compiler/code_info.m:
Record the value of --optimize-constructor-last-call-null in the
code_info, to avoid lookup at every cell construction.
compiler/unify_gen.m:
compiler/var_locn.m:
When deciding whether a cell can be static or not, make sure that
we never make static a cell that has some fields initialized with
dummy zeros, to be filled in for real later.
compiler/hlds_out.m:
For goals that are semipure or impure, note this fact. This info was
lost when I changed the representation of impurity from markers to a
field.
mdbcomp/prim_data.m:
Rename some ambiguous function symbols.
compiler/intermod.m:
compiler/trans_opt.m:
Rename the main predicates (and some function symbols) of these modules
to avoid ambiguity and to make them more expressive.
compiler/llds.m:
Don't print line numbers for foreign_code fragments if the user has
specified --no-line-numbers.
compiler/make.dependencies.m:
compiler/mercury_to_mercury.m:
compiler/recompilation.usage.m:
Don't use io.write to write out information to files we may need to
parse again, because this is vulnerable to changes to the names of
function symbols (e.g. the one to mdbcomp/prim_data.m).
The compiler still contains some uses of io.write, but they are
for debugging. I added an item to the todo list of the one exception,
ilasm.m.
compiler/recompilation.m:
Rename a misleading function symbol name.
compiler/parse_tree.m:
Don't import recompilation.m here. It is not needed (all the components
of parse_tree that need recompilation.m already import it themselves),
and deleting the import avoids recompiling almost everything when
recompilation.m changes.
compiler/*.m:
Conform to the changes above.
compiler/*.m:
browser/*.m:
slice/*.m:
Conform to the change to mdbcomp.
library/sparse_bitset.m:
Use some better variable names.
|
||
|
|
7651d83206 |
This change adds two new passes to the compiler.
Estimated hours taken: 80
Branches: main
This change adds two new passes to the compiler. The first one,
implicit_parallelism, uses deep profiling feedback information, generated by
mdprof_feedback, to introduce parallel conjunctions where it could be
worthwhile. It deals with both independent and dependent parallelism.
The second new pass, distance_granularity, applies a transformation that
controls the granularity of parallelism for recursive procedures using the
distance metric.
This change also fixes a bug in mdprof_feedback regarding the construction of
the list of CSSs.
compiler/implicit_parallelism.m:
New module which uses the profiling feedback file generated by
mdprof_feedback to introduce parallel conjunctions where it could be
useful.
compiler/distance_granularity.m:
New module. A program transformation that implements granularity control
of parallel execution using the distance metric.
compiler/dep_par_conj.m:
Moved find_shared_variables into the interface (needed for
implicit_parallelism.m).
compiler/goal_util.m:
Add two new predicates: flatten_conj and create_conj.
compiler/hhf.m:
Delete flatten_conj and use the one in goal_util instead.
compiler/hlds_pred.m:
Add a predicate to set the arity of a predicate (needed for
distance_granularity).
compiler/mercury_compile.m:
Add the calls to apply implicit parallelism and to control granularity
using the distance metric.
compiler/options.m:
Add implicit-parallelism, feedback-file and distance-granularity options.
compiler/pred_table.m:
Add a predicate to get the next pred_id available (needed for
distance_granularity).
compiler/prog_util.m:
Extend the predicate make_pred_name and the type new_pred_id for
creating a predicate name for distance_granularity.
compiler/transform_hlds.m:
Include implicit_parallelism and distance_granularity.
deep_profiler/mdprof_feedback.m:
Rename distribution to measure.
Add handling of dump_stages and dump_options.
Insert elements into the list of CSSs in the correct order.
deep_profiler/dump.m:
Add "all" option to dump everything out of the Deep.data file.
doc/user_guide.texi:
Add the following options: --distance-granularity, --implicit-parallelism and
--feedback-file.
tests/par_conj/Mercury.options:
tests/par_conj/dg_fib.{m,exp}:
tests/par_conj/dg_fib_func.{m,exp}:
Add two test cases for the distance_granularity module: dg_fib and
dg_fib_func. As things are, we do not check whether the granularity
control transformation using the distance metric is applied correctly or
not. We only check the output of these test cases.
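For readers unfamiliar with distance-based granularity control, the Python sketch below shows one plausible reading of the idea, applied to a Fibonacci function like the dg_fib test case. It rests on assumptions: the recursive procedure gains an extra distance argument, the recursive calls run as a parallel conjunction only when that argument reaches zero (at which point it is reset), and otherwise they run sequentially with the argument decremented. The names `fib_dg`, `reset` and `stats` are hypothetical, and the sketch only counts where parallel conjunctions would be introduced; it does not run anything in parallel.

```python
from typing import Dict


def fib_dg(n: int, distance: int, reset: int, stats: Dict[str, int]) -> int:
    """Fibonacci with a distance counter deciding where to 'go parallel'.

    distance: recursion levels left before the next parallel conjunction.
    reset:    value the counter is reset to after a parallel conjunction.
    stats:    counts how many conjunctions were parallel vs. sequential.
    """
    if n < 2:
        return n
    if distance == 0:
        # Stands in for a parallel conjunction: both calls restart the counter.
        stats["parallel"] += 1
        a = fib_dg(n - 1, reset, reset, stats)
        b = fib_dg(n - 2, reset, reset, stats)
    else:
        # Stands in for a sequential conjunction: count down towards zero.
        stats["sequential"] += 1
        a = fib_dg(n - 1, distance - 1, reset, stats)
        b = fib_dg(n - 2, distance - 1, reset, stats)
    return a + b


if __name__ == "__main__":
    counts = {"parallel": 0, "sequential": 0}
    print(fib_dg(10, 0, 2, counts))  # prints 55
    print(counts)
```

A larger `reset` value yields fewer, coarser parallel tasks; a value of zero makes every recursive conjunction parallel.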
|