mirror of
https://github.com/Mercury-Language/mercury.git
synced 2026-04-20 11:54:02 +00:00
083d376e6598628362ee91c2da170febd83590f4
42 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
06f81f1cf0 |
Add end_module declarations ...
.. to modules which did not yet have them. |
||
|
|
9095985aa8 |
Fix more warnings from --warn-inconsistent-pred-order-clauses.
deep_profiler/*.m:
Fix inconsistencies between (a) the order in which functions and predicates
are declared, and (b) the order in which they are defined.
In most modules, either the order of the declarations or the order
of the definitions made sense, and I changed the other to match.
In some modules, neither made sense, so I changed *both* to an order
that *does* make sense (i.e. it has related predicates together).
In query.m, put the various commands in the same sensible order
as the code processing them.
In html_format.m, merge two exported functions together, since
they can't be used separately.
In some places, put dividers between groups of related
functions/predicates, to make the groups themselves more visible.
In some places, fix comments or programming style.
deep_profiler/DEEP_FLAGS.in:
Since all the modules in this directory are now free from any warnings
generated by --warn-inconsistent-pred-order-clauses, specify that option
by default in this directory to keep it that way.
|
||
|
|
59b0edacbe |
New module for calculating the overlap between the conjuncts of a
Estimated hours taken: 2 deep_profiler/autopar_calc_overlap.m: New module for calculating the overlap between the conjuncts of a parallelised conjunction. Its contents are taken from the old autopar_search_callgraph.m. deep_profiler/autopar_costs.m: New module for calculating the costs of goals. Its contents are taken from the old autopar_search_callgraph.m. deep_profiler/autopar_reports.m: New module for creating reports. Its contents are taken from the old autopar_search_callgraph.m. deep_profiler/autopar_search_goals.m: New module for searching goals for parallelizable conjunctions. Its contents are taken from the old autopar_search_callgraph.m. deep_profiler/autopar_search_callgraph.m: Remove the code moved to other modules. deep_profiler/mdprof_fb.automatic_parallelism.m: Add the new modules. deep_profiler/*.m: Remove unnecessary imports. Fix copyright years on the new modules. browser/*.m: compiler/*.m: mdbcomp/*.m: Remove unnecessary imports. library/Mercury.options: Make it possible to compile a whole workspace with --warn-unused-imports by turning that option off for type_desc.m (which has a necessary import that --warn-unused-imports thinks is unused). |
||
|
|
8091c166d6 |
Start dividing a big module with low cohesion into several smaller, high
Estimated hours taken: 3 Branches: main Start dividing a big module with low cohesion into several smaller, high cohesion modules. deep_profiler/mdprof_fb.automatic_parallelism.m: Distribute the contents of this module among five new submodules. Replace those contents with just the inclusions of the new submodules. Remove part of the old module implementing Jerome Tannier's algorithms. deep_profiler/autopar_types.m: The part of the old module defining generally useful types and the operations on them. deep_profiler/autopar_annotate.m: The part of the old module that annotated the representations of goals. deep_profiler/autopar_find_best_par.m: The part of the old module that performed the search for the best way to parallelize a conjunction. deep_profiler/autopar_search_callgraph.m: The rest of the old module; it still has low cohesion. I will chop it into smaller pieces over the next couple of days. deep_profiler/Mmakefile: Add a target for Mercury.modules, and build it before doing mmake depend. |
||
|
|
216cb7ef71 |
The push goals analysis for automatic parallelism throws an exception in the
case that it attempts to re-order goals. Rather than throwing an exception
this change pushes the goals as far as possible but no further before
evaluating whether the parallelisation is worthwhile. This allows us to push
the goals, but not as far as we'd like, and might still allow for profitable
parallelism.
deep_profiler/mdprof_fb.automatic_parallelism.m:
As above.
|
||
|
|
5435fa3667 |
Various fixes, mostly to measuring the overlap between dependent parallel
conjunctions. These fixes ensure that our implementation now matches the
algorithms in our paper.
Benchmarking can now begin for the paper.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Remove build_candidate_par_conjunction_maps since it's no longer called.
Fix a bug where candidate procedures were generated with no candidate
conjunctions in them.
Fix a bug where !ConjNum was not incremented in a loop; this caused
SparkDelay to be calculated incorrectly when calculating the cost of a
parallel conjunction.
Account for the cost of calling signal in the right place when calculating
the cost of a parallel conjunction.
Conform to changes in measurements.m.
deep_profiler/mdprof_feedback.m:
Add the command line option for the barrier cost during parallel execution.
deep_profiler/measurements.m:
The incomplete parallel exec metrics structure now tracks deadtime due to
futures explicitly. Previously it was calculated from other values.
Conform to the parallel execution time calculations in
mdprof_fb.automatic_parallelism.m. Each conjunct is delayed by:
SparkDelay * (ConjNum - 1) except for the first.
Fix signal costs, they're now stored with the conjunct that incurred them
rather than the one that waited on the variable. This also prevents them
from being counted more than once.
Added support for the new parallel execution overhead 'barrier cost'.
mdbcomp/feedback.automatic_parallelism.m:
Added support for the new parallel execution overhead 'barrier cost'.
Modified the parallel execution metrics so that different overheads are
accounted for separately.
Changed a comment so that it clarifies how the range of goals in the
push_goal type should be interpreted.
mdbcomp/feedback.m:
Increment feedback_version.
|
||
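The delay rule stated in the measurements.m entry above (each conjunct after the first is delayed by SparkDelay * (ConjNum - 1)) can be illustrated with a minimal Python sketch. This is not the profiler's Mercury code; the function name `conjunct_start_delay` and the sample spark delay of 10.0 are assumptions made for illustration, and conjuncts are assumed to be numbered from 1.

```python
def conjunct_start_delay(spark_delay: float, conj_num: int) -> float:
    """Delay before conjunct number conj_num (1-based) can start running.

    Hypothetical helper: the first conjunct is not delayed, and every
    later conjunct waits one more spark delay than the one before it.
    """
    if conj_num < 1:
        raise ValueError("conjunct numbers start at 1")
    return spark_delay * (conj_num - 1)


# The conjunct number must advance on every iteration; the commit above
# fixes a bug where it did not, which gave every conjunct the same delay.
delays = [conjunct_start_delay(10.0, conj_num) for conj_num in (1, 2, 3)]
```

With these sample values the first conjunct starts immediately and each later one starts one spark delay after its predecessor.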
|
|
9c98076fa8 |
When a recursive call is in a branching goal such that it is executed in some
but not all of the recursive cases we calculate its per-call cost from the
recursion depth of MaxDepth / Calls. This can make it a candidate for
parallelisation when only some iterations of a loop (in linear recursion) are
expensive and on average a single iteration is below the threshold for
automatic parallelism.
This is important when considering whether to push goals (such as recursive
calls) into branches so that they're next to goals that are candidates for
parallelisation - when, on average, the branching goal is not a candidate for
parallelisation. When pushing a goal and deciding whether a goal should be
pushed we must also do this calculation.
deep_profiler/analysis_utils.m:
The recursive call cost for different calls now begins from different
depths of recursion. The depth of recursion used is MaxDepth / Calls,
which means that a recursive call that is called in all cases still has a
depth of 1.0.
deep_profiler/mdprof_fb.automatic_parallelism.m:
When pushing a recursive call goal and detecting whether it is worthwhile to
push such a goal, test its per-call cost after adjusting it according
to the formula above for the new number of calls it will have in this
context.
Fix a module qualification for a call to fold; fold is in set.m, not list.m.
deep_profiler/measurements.m:
Add goal_cost_get_total.
|
||
|
|
65da8c1257 |
Cosmetic changes.
Estimated hours taken: 0.1 Cosmetic changes. |
||
|
|
efc1506ec0 |
Merge sets of push goals.
Estimated hours taken: 2 deep_profiler/mdprof_db.automatic_parallelism.m: Merge sets of push goals. mdbcomp/feedback.automatic_parallelism.m: Add back the field that Paul deleted, since it is useful. |
||
|
|
70235cc3b8 |
Changes to goal-push feedback that _almost_ make it work. One problem remains
that the recursive call looks cheap in cases where pushing goals is required.
deep_profiler/mdprof_fb.automatic_parallelism.m:
If pushing a goal and attempting a parallelisation fails then return the
single costly goals to our caller so that it can attempt to push and
parallelise these goals with its own.
Whether a goal is above the call site threshold or not no longer depends on
the goal type. This switch has been removed.
Add marks where pushes should be merged.
Return pushes from goal_get_conjunctions_worth_parallelising and fill in
the push goals list in the candidate procedure structure.
Pretty-print the goal push list for a candidate procedure.
mdbcomp/feedback.automatic_parallelism.m:
Remove the maybe push goal field from candidate conjunctions;
there is already a list of push goals in the candidate procedure structure.
mdbcomp/feedback.m:
Increment feedback_version.
|
||
|
|
0d7e0e98cc |
Make the interesting recursion depth for singly-recursive code 2. This has the
effect that, when trying to parallelise a loop, we assume the recursive call
will execute the recursive case once followed by the base case. If this
parallelisation is optimistic, then it is optimistic to parallelise the whole
loop.
deep_profiler/mdprof_fb.automatic_parallelism.m:
As above.
Track the containing goal map for a procedure's implicit parallelism
analysis.
deep_profiler/var_use_analysis.m:
Fix the checks for module boundaries; they were placed in the wrong places.
Handle recursive var use analysis by induction.
Move the checks for unbounded recursion in this code to places that make
more sense for the new analysis by induction.
Duplicate the variable use analysis to create a specific one for computing
variable use in the recursive and base cases.
Documented this module's trace flags.
deep_profiler/measurement_utils.m:
Fix the calculation of disjuncts of probabilities.
mdbcomp/mdbcomp.goal_path.m:
Add another version of create_goal_id_array that takes a default value for
each array slot.
mdbcomp/feedback.m:
Increment feedback_version to reflect Zoltan's push goals changes.
mdbcomp/feedback.automatic_parallelism.m:
Add a note asking people to increment feedback_version if they change any
structures here.
deep_profiler/Mercury.options:
Documented var_use_analysis' trace flags.
|
||
|
|
f3f3a6f0d3 |
Create candidate parallel conjunctions that require pushing goals.
Estimated hours taken: 6 Create candidate parallel conjunctions that require pushing goals. |
||
|
|
0761008d0a |
Detect places where a costly goal could be pushed next to another
Estimated hours taken: 8 Branches: main deep_profiler/mdprof_db.automatic_parallelism.m: Detect places where a costly goal could be pushed next to another costly goal inside an earlier conjunct. We don't yet do anything with this information. library/list.m: Add some more arity variants of map_foldl for mdprof_db.automatic_parallelism.m. Put these and the existing variants into a logical order. |
||
|
|
5f2087d1f3 |
Fix a cut-and-paste error.
Estimated hours taken: 0.1 Fix a cut-and-paste error. |
||
|
|
2070f42b24 |
Refactor goal annotations in the deep profiler.
Goal annotations have previously been attached to goals using type-polymorphism
and in some cases type classes. This has become clumsy as new annotations are
created. Using the goal_id code introduced recently, this change associates
annotations with goals by storing them in an array indexed by goal ids. Many
analyses have been updated to make use of this code. This code should also be
faster as less allocation is done when annotating a goal as the goal
representation does not have to be reconstructed.
mdbcomp/mdbcomp.goal_path.m:
Add predicates for working with goal attribute arrays. These are
polymorphic arrays that are indexed by goal id and can be used to associate
information with goals.
deep_profiler/report.m:
The procrep coverage info report now stores the coverage annotations in a
goal_attr_array.
deep_profiler/coverage.m:
The coverage analysis now returns its result in a goal_attr_array rather
than by annotating the goal directly.
The interface for the coverage module has changed; it now allows
programmers to pass a goal_rep to it directly. This makes it easier to
call from other analyses.
The coverage analysis no longer uses the calls_and_exits structure.
Instead it uses the cost_and_callees structure like many other analyses.
This also makes it easier to perform this annotation and others using only
a single call site map structure.
Moved add_coverage_point_to_map/5 from create_report.m to coverage.m.
deep_profiler/analysis_utils.m:
Made cost_and_callees structure polymorphic so that any type can be used to
represent the callees. (So that either static or dynamic callees can be
used).
Added the number of exit port counts to the cost_and_callees structure.
Added build_static_call_site_cost_and_callees_map/4.
Rename build_call_site_cost_and_callees_map/4 to
build_dynamic_call_site_cost_and_callees_map/4.
deep_profiler/var_use_analysis.m:
Update the var_use_analysis to use coverage information provided in a
goal_attr_array.
deep_profiler/recursion_patterns.m:
Update the recursion analysis to use coverage information provided in a
goal_attr_array.
deep_profiler/program_representation_utils.m:
Add label_goals/4 to label goals with goal ids and build a map of goal ids
to goal paths.
Update pretty printing functions to work with either annotation on the
goals themselves or provided by a higher order value. The higher order
argument maps nicely to the function goal_get_attribute/3 in goal_path.m.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Modify goal_annotate_with_instmap, it now returns the instmap annotations
in a goal_attr_array.
Conform to changes in:
program_representation_utils.m
coverage.m
var_use_analysis.m
deep_profiler/message.m:
Updated messages to more correctly express the problems that
mdprof_fb.automatic_parallelism.m may encounter.
deep_profiler/create_report.m:
Conform to changes in coverage.m.
Make use of code in analysis_utils.m to prepare call site maps for coverage
analysis.
deep_profiler/recursion_patterns.m:
deep_profiler/var_use_analysis.m:
Conform to changes in analysis_utils.m.
deep_profiler/display_report.m:
Conform to changes in program_representation_utils.m.
|
||
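The idea behind the goal attribute arrays described above (annotations kept in an array indexed by goal id, so the goal representation itself never has to be rebuilt) can be shown with a small Python sketch. The class and method names below are hypothetical and are not the Mercury predicates in mdbcomp/mdbcomp.goal_path.m; goal ids are assumed to be small non-negative integers.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class GoalAttrArray(Generic[T]):
    """Hypothetical stand-in for an array of annotations indexed by goal id."""

    def __init__(self, num_goals: int, default: T) -> None:
        # One slot per goal; every slot starts out holding the default value.
        self._slots: List[T] = [default] * num_goals

    def set(self, goal_id: int, value: T) -> None:
        self._slots[goal_id] = value

    def get(self, goal_id: int) -> T:
        return self._slots[goal_id]


# Two independent analyses annotate the same three goals without
# touching (or re-allocating) the goal representation.
coverage = GoalAttrArray(3, "unknown")
cost = GoalAttrArray(3, 0.0)
coverage.set(0, "covered")
cost.set(0, 125.5)
```

Because each analysis owns its own array, adding a new kind of annotation needs no change to the goal type, which is the clumsiness the commit message attributes to the older polymorphic approach.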
|
|
d43239d6a7 |
Move some of the goal path code from compiler/goal_path.m to the mdbcomp
library where it can be used by the deep profiler.
Also move the goal path code from program_representation.m to the new module,
goal_path.m in mdbcomp/
mdbcomp/goal_path.m:
New module containing goal path code.
mdbcomp/program_representation.m:
Original location of goal path code.
compiler/goal_path.m:
Move some of this goal_path code into mdbcomp/goal_path.m
mdbcomp/feedback.automatic_parallelisation.m:
mdbcomp/rtti_access.m:
mdbcomp/slice_and_dice.m:
mdbcomp/trace_counts.m:
browser/debugger_interface.m:
browser/declarative_execution.m:
browser/declarative_tree.m:
compiler/build_mode_constraints.m:
compiler/call_gen.m:
compiler/code_info.m:
compiler/continuation_info.m:
compiler/coverage_profiling.m:
compiler/deep_profiling.m:
compiler/format_call.m:
compiler/goal_path.m:
compiler/goal_util.m:
compiler/hlds_data.m:
compiler/hlds_goal.m:
compiler/hlds_out_goal.m:
compiler/hlds_out_pred.m:
compiler/hlds_pred.m:
compiler/interval.m:
compiler/introduce_parallelism.m:
compiler/layout_out.m:
compiler/llds.m:
compiler/mode_constraint_robdd.m:
compiler/mode_constraints.m:
compiler/mode_ordering.m:
compiler/ordering_mode_constraints.m:
compiler/polymorphism.m:
compiler/post_typecheck.m:
compiler/prog_rep.m:
compiler/prop_mode_constraints.m:
compiler/push_goals_together.m:
compiler/rbmm.condition_renaming.m:
compiler/smm_common.m:
compiler/stack_layout.m:
compiler/stack_opt.m:
compiler/trace_gen.m:
compiler/tupling.m:
compiler/type_constraints.m:
compiler/typecheck.m:
compiler/unify_gen.m:
compiler/unneeded_code.m:
deep_profiler/Mmakefile:
deep_profiler/analysis_utils.m:
deep_profiler/coverage.m:
deep_profiler/create_report.m:
deep_profiler/display_report.m:
deep_profiler/dump.m:
deep_profiler/mdprof_fb.automatic_parallelism.m:
deep_profiler/message.m:
deep_profiler/old_query.m:
deep_profiler/profile.m:
deep_profiler/program_representation_utils.m:
deep_profiler/read_profile.m:
deep_profiler/recursion_patterns.m:
deep_profiler/report.m:
deep_profiler/var_use_analysis.m:
slice/Mmakefile:
slice/mcov.m:
Conform to the move of the goal path code.
|
||
|
|
436d37cc8d |
Fix a regression that affected coverage profiling and automatic parallelism
analysis.
When instrumenting a program for deep and coverage profiling the compiler adds
goal path information to the program. The compiler also writes out a bytecode
representation of the program. The step_switch goal path step includes a field
for the number of functors of the type of the variable that is being switched
upon. When this is included in the deep profiling data (as it is now) and
these goal paths are used as the indexes in tables the deep profiler cannot
reconstruct matching goal paths from the program representation bytecode,
causing many important map lookups to fail or throw an exception.
We fix this not by preventing the compiler from writing out this information,
but by stripping it out of goal paths before the analysis tool uses those goal
paths.
mdbcomp/program_representation.m:
Add a new predicate rev_goal_path_remove_type_info/2 that removes
type-dependent information from goals.
deep_profiler/profile.m:
Change the goal_path string in the call_site_static type to contain a
reverse_goal_path rather than a string. This hides the abstraction of the
stripping of type-dependent information within read_profile.m.
deep_profiler/read_profile.m:
Strip type-dependent information from goal path strings as they are read in.
Conform to changes in profile.m
deep_profiler/report.m:
In the call site static dump information use a goal_path field rather than
a string.
deep_profiler/analysis_utils.m:
deep_profiler/create_report.m:
deep_profiler/dump.m:
deep_profiler/mdprof_fb.automatic_parallelism.m:
deep_profiler/old_query.m:
Conform to changes in profile.m.
deep_profiler/display_report.m:
Conform to changes in report.m.
|
||
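The fix described above (strip the type-dependent functor count from switch steps before goal paths are used as table keys) can be sketched in Python. The tuple encoding of goal path steps and the name `remove_type_info` are assumptions for illustration only; the real predicate is rev_goal_path_remove_type_info/2 in Mercury and works on Mercury's own goal path types.

```python
from typing import Tuple

# Hypothetical encoding of goal path steps:
#   ("conj", N)              the Nth conjunct
#   ("switch", N, functors)  the Nth switch arm; `functors` is the number of
#                            functors of the switched-on type, or None
Step = Tuple
RevGoalPath = Tuple[Step, ...]


def remove_type_info(rev_path: RevGoalPath) -> RevGoalPath:
    """Return the path with the functor count dropped from every switch step."""
    return tuple(
        (step[0], step[1], None) if step[0] == "switch" else step
        for step in rev_path
    )


# The same goal as seen in the profile data (with a functor count) and as
# reconstructed from the program representation bytecode (without one).
from_profile = (("switch", 1, 3), ("conj", 2))
from_bytecode = (("switch", 1, None), ("conj", 2))

costs = {remove_type_info(from_bytecode): 42.0}
looked_up = costs.get(remove_type_info(from_profile))
```

Without the normalisation step the two paths compare unequal, so the lookup would miss, which is the failure mode the commit message describes.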
|
|
31a6424ff3 |
Remove an XXX that is not needed anymore.
Estimated hours taken: 0.1 Branches: main, release deep_profiler/mdprof_fb.automatic_parallelism.m: Remove an XXX that is not needed anymore. |
||
|
|
0c42f810c2 |
Start working on the 'goal push' feedback.
This feedback information is part of automatic parallelisation feedback. It
describes cases where goals after a branch goal but in the same conjunction
should be pushed into the branches of the branching goal. This can allow the
pushed goal to be parallelised against goals that already exist in one or more
arms of the branch goal without parallelising the whole branch goal.
This change simply creates the data-structures within the feedback framework on
which this feature will be based.
mdbcomp/feedback.automatic_parallelism.m:
Introduce new push_goal structure that describes the transformation.
mdbcomp/feedback.m:
Incremented feedback format version number.
deep_profiler/mdprof_fb.automatic_parallelism.m:
compiler/implicit_parallelism.m:
Conform to changes in feedback.automatic_parallelism.m.
The code to generate or use this feedback has not been implemented, that
will come later.
|
||
|
|
a2cd0da5b3 |
The existing representation of goal_paths is suboptimal for several reasons.
Estimated hours taken: 80 Branches: main The existing representation of goal_paths is suboptimal for several reasons. - Sometimes we need forward goal paths (e.g. to look up goals), and sometimes we need reverse goal paths (e.g. when computing goal paths in the first place). We had two types for them, but - their names, goal_path and goal_path_consable, were not expressive, and - we could store only one of them in goal_infos. - Testing whether goal A is a subgoal of goal B is quite error-prone using either form of goal paths. - Using a goal path as a key in a map, which several compiler passes want to do, requires lots of expensive comparisons. This diff replaces most uses of goal paths with goal ids. A goal id is an integer, so it can be used as a key in faster maps, or even in arrays. Every goal in the body of a procedure gets its id allocated in a depth first search. Since we process each goal before we dive into its descendants, the goal representing the whole body of a procedure always gets goal id 0. The depth first traversal also builds up a map (the containing goal map) that tells us the parent goal of every subgoal, with the obvious exception of the root goal itself. From the containing goal map, one can compute both reverse and forward goal paths. It can also serve as the basis of an efficient test of whether the goal identified by goal id A is an ancestor of another goal identified by goal id B. We don't yet use this test, but I expect we will in the future. mdbcomp/program_representation.m: Add the goal_id type. Replace the existing goal_path and goal_path_consable types with two new types, forward_goal_path and reverse_goal_path. Since these now have wrappers around the list of goal path steps that identify each kind of goal path, it is now ok to expose their representations. This makes several compiler passes easier to code. Update the set of operations on goal paths to work on the new data structures.
Add a couple of step types to represent lambdas and try goals. Their omission prior to this would have been a bug for constraint-based mode analysis, or any other compiler pass prior to the expansion out of lambda and try goals that wanted to use goal paths to identify subgoals. browser/declarative_tree.m: mdbcomp/rtti_access.m: mdbcomp/slice_and_dice.m: mdbcomp/trace_counts.m: slice/mcov.m: deep_profiler/*.m: Conform to the changes in goal path representation. compiler/hlds_goal: Replace the goal_path field with a goal_id field in the goal_info, indicating that from now on, this should be used to identify goals. Keep a reverse_goal_path field in the goal_info for use by RBMM and CTGC. Those analyses were too hard to convert to using goal_ids, especially since RBMM uses goal_paths to identify goals in multi-pass algorithms that should be one-pass and should not NEED to identify any goals for later processing. compiler/goal_path: Add predicates to fill in goal_ids, and update the predicates filling in the now deprecated reverse goal path fields. Add the operations needed by the rest of the compiler on goal ids and containing goal maps. Remove the option to set goal paths using "mode equivalent steps". Constraint based mode analysis now uses goal ids, and can now do its own equivalent optimization quite simply. Move the goal_path module from the check_hlds package to the hlds package. compiler/*.m: Conform to the changes in goal path representation. Most modules now use goal_ids to identify goals, and use a containing goal map to convert the goal ids to goal paths when needed. However, the ctgc and rbmm modules still use (reverse) goal paths. library/digraph.m: library/group.m: library/injection.m: library/pprint.m: library/pretty_printer.m: library/term_to_xml.m: Minor style improvements. |
||
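The scheme described in the commit above (ids allocated in a depth first traversal, the root getting id 0, and a containing goal map from which paths and ancestor tests are derived) can be sketched in Python as follows. All names here (`Goal`, `fill_goal_ids`, the step strings such as "c2") are hypothetical and chosen for illustration; the actual Mercury types and predicates live in compiler/goal_path.m and mdbcomp/program_representation.m.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Goal:
    kind: str                      # e.g. "c" for conjunction, "d" for disjunction
    children: List["Goal"] = field(default_factory=list)
    goal_id: int = -1


# child goal id -> (parent goal id, step from the parent to the child)
ContainingGoalMap = Dict[int, Tuple[int, str]]


def fill_goal_ids(root: Goal) -> ContainingGoalMap:
    """Number goals in pre-order, so the root always gets id 0."""
    containing: ContainingGoalMap = {}
    next_id = 0

    def visit(goal: Goal, parent_id: Optional[int], step: str) -> None:
        nonlocal next_id
        goal.goal_id = next_id     # process the goal before its descendants
        next_id += 1
        if parent_id is not None:
            containing[goal.goal_id] = (parent_id, step)
        for index, child in enumerate(goal.children, start=1):
            visit(child, goal.goal_id, f"{goal.kind}{index}")

    visit(root, None, "")
    return containing


def reverse_goal_path(containing: ContainingGoalMap, goal_id: int) -> List[str]:
    """Steps from the goal up to the root (innermost step first)."""
    steps = []
    while goal_id in containing:
        goal_id, step = containing[goal_id]
        steps.append(step)
    return steps


def forward_goal_path(containing: ContainingGoalMap, goal_id: int) -> List[str]:
    return list(reversed(reverse_goal_path(containing, goal_id)))


def is_ancestor(containing: ContainingGoalMap, a: int, b: int) -> bool:
    """True if goal a is a strict ancestor of goal b."""
    while b in containing:
        b, _ = containing[b]
        if b == a:
            return True
    return False


# ( call, ( call ; call ) ): a conjunction whose second conjunct is a disjunction.
body = Goal("c", [Goal("call"), Goal("d", [Goal("call"), Goal("call")])])
containing_map = fill_goal_ids(body)
```

In this example the last disjunct receives id 4, and its forward path is the step into the second conjunct followed by the step into the second disjunct. The root has no entry in the map, matching the "obvious exception" the commit message mentions.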
|
|
8a28e40c9b |
Add the predicates sorry, unexpected and expect to library/error.m.
Estimated hours taken: 2 Branches: main Add the predicates sorry, unexpected and expect to library/error.m. compiler/compiler_util.m: library/error.m: Move the predicates sorry, unexpected and expect from compiler_util to error. Put the predicates in error.m into the same order as their declarations. compiler/*.m: Change imports as needed. compiler/lp.m: compiler/lp_rational.m: Change imports as needed, and some minor cleanups. deep_profiler/*.m: Switch to using the new library predicates, instead of calling error directly. Some other minor cleanups. NEWS: Mention the new predicates in the standard library. |
||
|
|
c6d041cbc5 |
Improve the efficiency of the algorithms that select the best parallelisation of
a conjunction. Now (by default) the search will stop creating choice points if
it has already created too many choice points.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Fix a large number of whitespace problems, such as trailing whitespace at
the end of lines.
Never attempt to parallelise goals that aren't det or cc_multi.
Remove the original greedy search; it's now an option in the branch and
bound search code. Note that the greedy search algorithm has changed and
sacrifices more solutions for runtime than before.
Note that there are bugs remaining in a few cases causing incorrect
parallel execution times to be calculated for dependent parallelisations.
deep_profiler/mdprof_feedback.m:
Conform to changes in mdbcomp/feedback.automatic_parallelism.m.
Update parsing of options for the choice of best parallelisation algorithm.
deep_profiler/branch_and_bound.m:
Allow branch and bound code to track how many 'alternatives' have been
created and alter the search in response to this.
Branch and bound code must now be impure as it may call these impure
predicates.
Flush the output stream in debugging trace goals for branch and bound.
deep_profiler/measurements.m:
Adjust the interface to the parallelisation metrics structure, so that it is
easier to use with the new parallelisation search code.
Changes to the goal costs code:
Rename zero_goal_cost to dead_goal_cost, it is the cost of goals that are
never executed.
Modify atomic_goal_cost to take as a parameter the number of calls made to
this goal.
add_goal_costs has been renamed to add_goal_costs_seq since it computes
the cost of a sequential conjunction of goals.
The goal_cost_csq type has changed to track the number of calls made to
trivial goals.
deep_profiler/message.m:
Added a notice message to be used when the candidate parallel conjunction
is not det or cc_multi.
mdbcomp/feedback.automatic_parallelism.m:
Modify the alternatives for 'best parallelisation algorithm'.
This type now represents the new ways of selecting complete vs greedy
algorithms.
mdbcomp/program_representation.m:
Add a multi-moded detism_components/3 predicate and refactor
detism_get_solutions/1 and detism_get_can_fail/1 to call it.
Add a multi-moded detism_committed_choice/2 predicate and a
committed_choice type.
Fix whitespace errors in this file.
library/array.m:
Modify fetch_items/4 to do bounds checking. This change helped me track
down a bug.
|
||
|
|
887a55f783 |
Make variable use analysis assume that the compiler cannot push signals or
waits for futures across module boundaries, which is usually true.
Add a new option to the feedback tool
--implicit-parallelism-intermodule-var-use. This option re-enables the old
behaviour.
Fix a number of bugs and improve the pretty-printing of candidate parallel
conjunctions.
deep_profiler/var_use_analysis.m:
Implement the new behaviour and allow it to be controlled.
Refactor some code to slightly reduce the number of arguments passed to
predicates.
deep_profiler/mdprof_feedback.m:
Implement the new command line option.
Conform to changes in feedback.automatic_parallelism.m.
deep_profiler/recursion_patterns.m:
Fixed a bug in the handling of can-fail switches.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Fix a bug in the calculation of dependency graphs. All goals are
represented by vertexes and dependencies are edges. The program failed to
create a vertex for a goal that had no edges.
Fix a crash when trying to compute variable use information for a goal that
is never called. This was triggered by providing the new variable use
information in the feedback format.
Using the extra feedback information improve the pretty-printing of
candidate parallelisations.
Conform to changes in feedback.automatic_parallelism.m
Conform to changes in var_use_analysis.m
mdbcomp/feedback.automatic_parallelism.m:
Add the new option to control intermodule variable use analysis.
Provided more information in the candidate parallel conjunctions feedback.
The costs of the goals before and after the parallel conjunction are
now provided.
The cost of every goal is now provided (not just calls)
Variable production and consumption times of the shared variables are
provided for each goal if the analysis evaluated them.
Modified convert_candidate_par_conjunctions_proc/3 and
convert_candidate_par_conjunction/3 to pass a reference to the current
parallel conjunction to their higher order argument.
mdbcomp/feedback.m:
Increment feedback file version number.
deep_profiler/program_representation_utils.m:
Improve the pretty-printing of goal representations, in particular, their
annotations.
deep_profiler/create_report.m:
Conform to changes in var_use_analysis.m.
deep_profiler/display_report.m:
Conform to changes in program_representation_utils.m.
library/lazy.m:
Added a new predicate, read_if_val(Lazy, Value), which is true if Lazy has
already been forced and produced Value.
(No update to NEWS necessary).
|
||
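The dependency graph bug fixed in the commit above (a goal with no dependencies got no vertex at all) is easy to reproduce in any graph built only from its edge list. A minimal Python sketch of the corrected construction follows; `build_dependency_graph` and the integer goal numbering are assumptions for illustration, not the profiler's Mercury code.

```python
from typing import Dict, Iterable, Set, Tuple


def build_dependency_graph(
    num_goals: int, dependencies: Iterable[Tuple[int, int]]
) -> Dict[int, Set[int]]:
    """Map each goal to the set of goals that depend on it.

    Every goal gets a vertex up front, so a goal that takes part in no
    dependency still appears in the graph with an empty edge set.
    """
    graph: Dict[int, Set[int]] = {goal: set() for goal in range(num_goals)}
    for producer, consumer in dependencies:
        graph[producer].add(consumer)
    return graph


# Goal 1 depends on goal 0; goal 2 is independent of both.
graph = build_dependency_graph(3, [(0, 1)])
```

Had the vertices been created only while walking the edge list, goal 2 would be missing from `graph`, and any later pass that iterates over vertices would silently skip it.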
|
|
91e60619b0 |
Remove the concept of 'partitions' from the candidate parallel conjunction
mdbcomp/feedback.automatic_parallelism.m:
Remove the concept of 'partitions' from the candidate parallel conjunction
type. We no longer divide conjunctions into partitions before
parallelising them.
mdbcomp/feedback.m:
Increment the feedback format version number.
compiler/implicit_parallelism.m:
Conform to changes in mdbcomp/feedback.automatic_parallelism.m.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Allow the non-atomic goals to be parallelised against one another.
Modify the goal annotations used internally, many annotations used only for
calls are now used for any goal type.
Variable use information is now stored in a map from variable name to lazy
use data for every goal, not just for the arguments of calls.
Do not partition conjunctions before attempting to parallelise them.
Make the adjust_time_for_waits tolerate floating point errors more easily.
Format costs with commas and, in most cases, two decimal places.
deep_profiler/var_use_analysis.m:
Export a new predicate var_first_use that computes the first use of a
variable within a goal. This predicate uses a new typeclass to retrieve
coverage data from any goal that can implement the typeclass.
deep_profiler/measurements.m:
Added a new abstract type for measuring the cost of a goal, goal_cost_csq.
This is like cs_cost_csq except that it can represent trivial goals (which
don't have a call count).
deep_profiler/coverage.m:
Added deterministic versions of the get_coverage_* predicates.
deep_profiler/program_representation_utils.m:
Made initial_inst_map more generic in its type signature.
Add a new predicate, atomic_goal_is_call/2 which can be used instead of a
large switch on an atomic_goal_rep value.
deep_profiler/message.m:
Rename a message type to make it more general, this is required now that we
compute variable use information for arbitrary goals, not just calls.
library/list.m:
Add map3_foldl.
NEWS:
Announced change to list.m.
|
||
|
|
41cb1d2f79 |
Variable use analysis now uses conservative assumptions for recursive call costs
when recursive call site cost information cannot be computed.
deep_profiler/var_use_analysis.m:
The var use analysis predicates now use instantiation state subtypes for
the recursion type information.
Only look for variable use in recursive calls if we know the cost of the
recursive call.
In call_var_first_use, use map.lookup to get the cost of a call site rather
than map.search. All call site costs are now guaranteed to be known since
this predicate is only valid for recursion types that provide this
information.
deep_profiler/recursion_patterns.m:
Fix bugs in the calculation of recursion data in semidet code.
Use instantiation states to support the subtypes used in
var_use_analysis.m.
deep_profiler/measurements.m:
Add a new abstract type, recursion_depth.
deep_profiler/analysis_utils.m:
deep_profiler/recursion_patterns.m:
deep_profiler/mdprof_fb.automatic_parallelism.m:
Conform to changes in measurements.m
|
||
|
|
881039cfed |
Correct problems in the automatic parallelism analysis.
This patch fixes various problems; the most significant is the calculation of
variable use information. The parallelisation analysis uses deep profiling
data, that is, profiling data attached to context information that refers
not just to the procedure but to the chain of calls leading to that
invocation of that procedure (modulo recursion). The variable use analysis did
not use deep profiling data, therefore comparing the time that a variable is
produced with a call to the time in total of that call was not sound, and
sometimes resulted in information that is not possible, such as a variable
being produced or consumed after the call that produces or consumes it has
exited.
This change-set updates the variable use analysis to use deep profiling data to
avoid these problems. At the same time it provides more accurate information
to the automatic parallelisation pass. This is possible because of an earlier
change that allowed the coverage data to use deep profiling data.
In its current state, the parallelisation analysis now finishes without errors
and computes meaningful results when analysing a profile of the mercury
compiler's execution.
deep_profiler/report.m:
The proc var use report is now a call site dynamic var use report.
1) It now uses deep profiling data.
2) It makes more sense from the caller's perspective, so it's now based
around a call site rather than a proc.
Add inst subtypes to the recursion_type type.
deep_profiler/query.m:
The proc var use query is now a call site dynamic var use query, see
report.m.
deep_profiler/var_use_analysis.m:
Fix a bug here and in mdprof_fb.automatic_parallelism.m: If a
variable is consumed by a call and appears in its argument list more than
once, take the earliest consumption time rather than the one for the
earliest argument.
Variable use analysis now uses recursion_patterns.m to correctly compute
the cost of recursive calls. It also uses 'deep' profiler data.
Only measure variable use relative to the entry into a procedure, rather
than either relative to the entry or exit. This allows us to simplify a
lot of code.
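The repeated-argument fix described above can be illustrated with a minimal
Python sketch. This is not the Mercury code from var_use_analysis.m; the
function name, the list-of-names argument representation and the example
times are all hypothetical.

```python
from typing import Optional, Sequence


def earliest_consumption_time(
    var: str,
    call_args: Sequence[str],
    arg_use_times: Sequence[float],
) -> Optional[float]:
    """Earliest time at which `var` is consumed by any argument of a call.

    call_args[i] is the variable passed as argument i (hypothetical
    representation); arg_use_times[i] is when the callee first uses
    argument i, measured from entry into the call.
    """
    times = [t for arg, t in zip(call_args, arg_use_times) if arg == var]
    # Take the minimum over every position where the variable occurs,
    # not just the time of the first (leftmost) occurrence.
    return min(times) if times else None


# X is passed as arguments 0 and 2; argument 2 is consumed sooner.
print(earliest_consumption_time("X", ["X", "Y", "X"], [40.0, 10.0, 5.0]))
```

Taking only the leftmost occurrence would report 40.0 here, which
overestimates how long the caller may delay producing X.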
deep_profiler/create_report.m:
The proc var use info report is now a call site dynamic var use info
report.
Move some utility code from here to the new analysis_utils.m module.
deep_profiler/display_report.m:
Conform to changes in report.m.
Improve the information displayed for variable first-use time
reports.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Conform to changes in report.m
Refactored the walk down the clique tree. This no longer uses the
clique reports from the deep profiling tool.
We now explore the same static procedure more than once. It may be best to
parallelise it in some contexts rather than others but for now we assume
that the benefits in some context are worth the costs without benefit in
the other contexts. This is better than reaching a context where it is
undesirable first and never visiting a case where parallelisation is
desirable.
Fix a bug in the calculation of how much parallelisation is used by
parallelisations in a clique's parents. This used to trigger an
assertion.
Don't try to parallelize anything in the "exception" module.
There's probably other builtin code we should skip over here.
Removed an overzealous assertion that was too easily triggered by the
inaccuracies of IEEE-754 arithmetic.
Compute variable use information lazily for each variable in each call. I
believe that this has made our implementation much faster as it no longer
computes information that is never used.
Refactor and move build_recursive_call_site_cost_map to the new
module analysis_utils.m where it can be used by other analyses.
Call site cost maps now use the cs_cost_csq type to store costs,
code in this module now conforms to this change.
Conform to changes in message.m
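The lazy computation of per-variable use information mentioned above can be
sketched in Python as a memoised thunk. This is only an illustration of the
idea; the Lazy class, expensive_var_use and the variable names are
hypothetical and are not the API of Mercury's lazy.m.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """A value that is computed at most once, on first demand."""

    def __init__(self, compute: Callable[[], T]) -> None:
        self._compute: Optional[Callable[[], T]] = compute
        self._forced = False
        self._value: Optional[T] = None

    def force(self) -> T:
        if not self._forced:
            assert self._compute is not None
            self._value = self._compute()
            self._forced = True
            self._compute = None  # drop the closure once evaluated
        return self._value  # type: ignore[return-value]


calls = []


def expensive_var_use(var: str) -> float:
    """Hypothetical stand-in for a costly variable use analysis."""
    calls.append(var)
    return 42.0


# One lazy cell per variable in a call; nothing is computed yet.
uses = {v: Lazy(lambda v=v: expensive_var_use(v)) for v in ["X", "Y", "Z"]}
uses["Y"].force()
uses["Y"].force()  # second force reuses the cached value
print(calls)
```

Only the variable whose use time is actually demanded triggers the analysis,
and it does so once.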
deep_profiler/recursion_patterns.m:
Export a new predicate, recursion_type_get_maybe_avg_max_depth/2. This
retrieves the average maximum recursion depth from recursion types that know
this information.
Move code that builds a call site cost map for a procedure to
analysis_utils.m where it can be used by other analyses.
deep_profiler/analysis_utils.m:
Added a new module containing various utility predicates for profile
analysis.
deep_profiler/coverage.m:
Added an extra utility predicate get_coverage_after/2.
deep_profiler/message.m:
Each message has a location that it refers to; a new location type has
been added: call_site_dynamic.
Added a new warning that can be used to describe when a call site's
argument's use time cannot be computed.
Added new predicates for printing out messages whose level is below a
certain threshold. These predicates can be called from io trace goals.
Message levels start at 0 and currently go to 4; more critical messages
have lower levels. The desired verbosity level is stored in a module local
mutable.
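A minimal Python sketch of the level-based filtering described above. The
names are hypothetical, the module-level variable merely stands in for the
module-local mutable, and treating a message at exactly the threshold as
printable is an assumption.

```python
import sys

# Hypothetical stand-in for the module-local mutable holding the verbosity.
_verbosity_level = 2


def set_verbosity_level(level: int) -> None:
    global _verbosity_level
    _verbosity_level = level


def should_print(message_level: int) -> bool:
    # Lower levels are more critical, so they pass the filter first.
    return message_level <= _verbosity_level


def log_message(message_level: int, text: str) -> bool:
    """Print `text` to stderr if its level passes the filter."""
    if should_print(message_level):
        print(text, file=sys.stderr)
        return True
    return False


log_message(0, "critical: always shown at this verbosity")
log_message(4, "debug: suppressed unless verbosity is raised")
```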
deep_profiler/mdprof_feedback.m:
Move the message printing code from here to message.m.
deep_profiler/old_html_format.m:
deep_profiler/old_query.m:
Conform to changes in query.m.
mdbcomp/feedback.automatic_parallelism.m:
Added a new function for computing the 'cpu time' of a parallel
computation.
library/lazy.m:
Moved lazy.m from extras to the standard library.
library/list.m:
Add a new predicate member_index0/3. Like member/2 except it also gives
the zero-based index of the current element within the list.
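As an illustration only (Python, not Mercury), the behaviour described for
member_index0/3 resembles enumerating a list and yielding each element with
its zero-based index. The generator below is a hypothetical analogue of the
nondeterministic predicate.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def member_index0(items: Iterable[T]) -> Iterator[Tuple[T, int]]:
    """Yield every element together with its zero-based index."""
    for index, elem in enumerate(items):
        yield elem, index


for elem, index in member_index0(["a", "b", "c"]):
    print(elem, index)
```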
library/maybe.m:
Add two new insts.
maybe_yes(I) for the maybe type's yes/1 constructor.
maybe_error_ok(I) for the maybe_error type's ok/1 constructor.
library/Mercury.options:
Add a work around for compiling lazy.m with intermodule optimisations.
NEWS:
Update news file for the addition of lazy.m and the member_index0 predicate
in list.m
deep_profiler/.cvsignore:
Ignore feedback.automatic_parallelism.m which is copied by Mmakefile from
the mdbcomp/ directory.
|
||
|
|
1793e3898b |
Updated the automatic parallelism analysis to use the new recursive call costs
analysis.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Use new clique recursion costs report to give the costs of recursive calls.
This is more accurate than the current method which is only accurate in some
less-common situations.
Refactored the walk through the program's call graph so that it fits more
neatly with the calculation of recursive calls. For instance it is
no longer necessary to know the cost of the call into the current clique.
Delete a number of predicates that are never called.
deep_profiler/message.m:
Added a new message type, warning_cannot_compute_cost_of_recursive_calls
since the new recursive call cost algorithm is incomplete.
deep_profiler/recursion_patterns.m:
Avoid a thrown exception when trying to retrieve the parent call site of the
initial clique.
Fix the calculation of recursion depth. Name some variables more clearly
to avoid similar issues.
deep_profiler/report.m:
Add a clarifying comment to the recursion_type data type to indicate that
costs are per-call.
mdbcomp/program_representation.m:
Added a new exported predicate goal_path_inside/3 like goal_path_inside/2
except that it also returns the goal path of the inner goal relative to the
outer goal.
Made goal_path_inside/2 more efficient by using list.remove_suffix rather
than list.append which creates a choice point whose second solution always
fails. (See the comment on list.append/3 in mode out, in, in.)
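The relative-path result of goal_path_inside/3 can be sketched in Python as
follows. Goal paths are modelled here as root-first lists of step strings,
which is an assumption made for illustration, as is treating a path as being
inside itself.

```python
from typing import List, Optional


def goal_path_inside(outer: List[str], inner: List[str]) -> Optional[List[str]]:
    """Return the path of `inner` relative to `outer`, or None.

    `inner` is considered inside `outer` when `outer` is a prefix of it.
    The check is a single deterministic comparison; no alternative
    splits of the path are ever tried.
    """
    if len(inner) >= len(outer) and inner[: len(outer)] == outer:
        return inner[len(outer):]
    return None


outer = ["c2", "d1"]
inner = ["c2", "d1", "c3", "s1"]
print(goal_path_inside(outer, inner))
print(goal_path_inside(inner, outer))
```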
|
||
|
|
067934f008 |
Start using the new dynamic coverage information in the deep profiler.
This patch separates the coverage annotated procedure report into two
reports, one with dynamic coverage data and one with static coverage data.
This restores the functionality of the static coverage report since my last
change, and provides access to the dynamic report via new controls only
visible to developers.
deep_profiler/query.m:
In the cmd data-type:
Rename deep_cmd_procrep_coverage constructor to
deep_cmd_static_procrep_coverage.
Add deep_cmd_procrep_dynamic_coverage.
In the preferences data-type:
Add a new field pref_developer_mode which indicates if developer-only
options are visible or not.
Add code to parse and print the new command and preference option.
deep_profiler/create_report.m:
Specialise create_procrep_coverage_report/3 into
create_{static,dynamic}_coverage_report/4.
Created a new exported function deep_get_maybe_procrep. This is useful for
getting a procedure representation from the deep data-structure in one
step.
deep_profiler/display.m:
Add a new display item, display_developer. This wraps another display
item but is only displayed when developer mode is active.
deep_profiler/display_report.m:
Add a control to the main screen that enables or disables developer mode.
This control has been placed at the bottom of the screen so that it's out
of the way.
Put the developer controls on the main screen into their own list (there's
only one at the moment).
For now the coverage-annotated procedure representation link on a (static)
procedure's page is not a developer option. Should this be a developer
option?
Added a link to the dynamic coverage annotated procedure representation
report from the dump proc dynamic report.
Added a link to the clique dump report from the clique report, the dynamic
coverage annotated procedure representation report can be accessed
transitively through this link.
Added a link to the variable use analysis report and proc static report to the
procedure report and static coverage annotated procedure representation
report.
deep_profiler/html_format.m:
Support the new display_developer item.
Refactor the item_to_html code for lists.
deep_profiler/profile.m:
Include a new field in the deep data-structure for storing coverage data
that is indexed by a proc_static_ptr. When dynamic coverage information is
used this field is populated by adding per ProcDynamic data for each static
procedure.
deep_profiler/startup.m:
Fill in the per ProcStatic coverage data when the deep profiler starts up.
deep_profiler/measurements.m:
Create a new data type static_coverage_info which represents per ProcStatic
coverage information.
Include functions that are used when calculating per ProcStatic coverage
information from per ProcDynamic coverage information.
deep_profiler/mdprof_cgi.m:
Remove a rarely used command line option rather than making it conform to changes
in query.m.
deep_profiler/old_html_format.m:
deep_profiler/old_query.m:
Conform to changes in query.m.
deep_profiler/mdprof_test.m:
deep_profiler/mdprof_fb.automatic_parallelism.m:
deep_profiler/recursion_patterns.m:
deep_profiler/var_use_analysis.m:
Conform to changes in create_report.
deep_profiler/array_util.m:
Add a new predicate, array_foldl3_from_1 to support propagation of coverage
information from proc dynamics to proc statics.
|
||
|
|
699663e5e2 |
Deep profiler enhancements.
Create two new deep profiler reports: the first calculates the costs of
recursive calls at any depth of recursion for a clique. The second summarises
the types of recursion seen in a program.
deep_profiler/query.m:
Introduce new cmd types.
Conform to changes of the cmd type.
Add a new preference: the maximum number of proc statics to display for
each recursion type.
Memoize the creation of the recursion types histogram report as it can take
a while to generate: 39 minutes on a Core 4 when generating a report for
the compiler itself.
deep_profiler/report.m:
Define the new report structures.
deep_profiler/create_report.m:
Handle creation of the new reports.
Export describe_proc and own_and_inherit_to_perf_row_data so that the
recursion_patterns module can use them.
Write a find_clique_first_and_other_procs predicate to find the first
procedure in a clique; it also returns a list of the remaining procedures.
This is exported so that recursion_patterns can use it but it belongs here
as it is generally used for creating reports.
Refactor the retrieval of the progrep data from the deep data structure.
Make a note about a change that could be made to speed up large analyses.
deep_profiler/profile.m:
Refactor the retrieval of the progrep data from the deep data structure.
deep_profiler/display_report.m:
Handle translation of the new reports to the display data type.
Link to this report from clique reports, and to clique reports from the new
clique recursion costs report.
Refactor the code that constructs the lists of related reports.
deep_profiler/recursion_patterns.m:
Create a new module to contain the recursion pattern analysis code.
deep_profiler/old_html_format.m:
deep_profiler/old_query.m:
Conform to changes in query.m
deep_profiler/mdprof_fb.automatic_parallelism.m:
Remove add_call_site_to_map; it is a duplicate of add_call_site_report_to_map.
Move add_call_site_report_to_map and proc_label_from_proc_desc to report.m.
deep_profiler/measurement_units.m:
Include functions to manipulate probabilities of conjoint and disjoint events.
Allow percentages to be greater than 100% or less than 0%; this can occur
legitimately in some cases.
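The commit does not spell out the formulas behind the probability functions.
As a hedged Python sketch, the standard rules are a product for the
conjunction of independent events and a sum for the union of mutually
exclusive (disjoint) events. The function names below are hypothetical, and
whether measurement_units.m uses exactly these definitions is an assumption.

```python
def prob_and_independent(p_a: float, p_b: float) -> float:
    """P(A and B), assuming A and B are independent."""
    return p_a * p_b


def prob_or_disjoint(p_a: float, p_b: float) -> float:
    """P(A or B), assuming A and B can never both occur."""
    return p_a + p_b


print(prob_and_independent(0.5, 0.4))
print(prob_or_disjoint(0.2, 0.3))
```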
deep_profiler/measurements.m:
deep_profiler/var_use_analysis.m:
Move weighted_average/3 from var_use_analysis.m to measurements.m where it
can be used globally.
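For reference, a weighted average in the usual sense can be sketched in
Python as below. The argument order and the handling of an all-zero weight
list are assumptions; the commit only names weighted_average/3.

```python
from typing import Sequence


def weighted_average(weights: Sequence[float], values: Sequence[float]) -> float:
    """Sum of weight * value divided by the sum of the weights."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        # Assumption: an empty or zero-weight input averages to 0.0.
        return 0.0
    return sum(w * v for w, v in zip(weights, values)) / total_weight


# A value seen three times counts three times as much as one seen once.
print(weighted_average([1.0, 3.0], [10.0, 20.0]))
```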
deep_profiler/mdprof_test.m:
Modify the mdprof_test program so that it can generate the recursion types
histogram report, this is useful for profiling and debugging.
|
||
|
|
7425922921 |
Refactor mdbcomp/feedback.m
Move automatic parallelisation specific code to a new module
mdbcomp/feedback.automatic_parallelism.m.
mdbcomp/feedback.m:
mdbcomp/feedback.automatic_parallelism.m:
As above.
slice/Mmakefile:
deep_profiler/Mmakefile:
Copy the new file into the current working directory along with the other
mdbcomp files.
compiler/implicit_parallelism.m:
deep_profiler/mdprof_fb.automatic_parallelism.m:
deep_profiler/mdprof_feedback.m:
deep_profiler/measurements.m:
Import the new module to access code that used to be in feedback.m.
Remove unused module imports.
|
||
|
|
1580ad27ae |
Spell dependent correctly (not dependant) in identifiers, comments and
strings.
mdbcomp/feedback.m:
deep_profiler/mdprof_fb.automatic_parallelism.m:
As above.
mdbcomp/feedback.m:
Increment the feedback version number.
|
||
|
|
f16e8118bd |
Implement a linear alternative to the exponential algorithm that determines how
best to parallelise a conjunction.
Made other performance improvements.
mdbcomp/feedback.m:
Add a field to the candidate_parallel_conjunction_params structure giving
the preference of algorithm.
Simplify the parallel exec metrics type here. It is now used only to
summarise information that has already been calculated. The original code
has been moved into deep_profiler/measurements.m
Add a field to the candidate_par_conjunction structure giving the index
within the conjunction of the first goal in the partition. This is used
for pretty-printing parallelisation reports.
Incremented the feedback format version number.
deep_profiler/measurements.m:
Move the original parallel exec metrics type and code here from
mdbcomp/feedback.m
deep_profiler/create_report.m:
Avoid a performance issue by memoizing create_proc_var_use_dump_report
which is called by the analysis for the same procedure (at different
dynamic call sites) many times. In simple cases this more than doubled the
execution time, in more complicated cases it should perform even better.
Conform to changes in coverage.m
deep_profiler/mdprof_fb.automatic_parallelism.m:
Implement the linear algorithm for parallelising a conjunction.
Since we don't do parallelism specialisation, don't try to parallelise the
same procedure more than once. This should avoid some performance problems
but I haven't tested it.
If it is impossible to generate an independent parallelisation generate a
dependent one and then report it as something we cannot parallelise. This
can help programmers write more independent code.
Use directed graphs rather than lookup maps to track dependencies. This
simplifies some code as the digraph standard library module already has
code to compute reverse graphs and transitive closures of the graphs.
Since there are now two parallelisation algorithms; code common to both of
them has been factored out.
The objective function used by the branch and bound search has been
modified to take into account the overheads of parallel execution. It is:
minimise(ParTime + ParOverheads X 2.0)
This way we allow the overheads to increase by 1csc provided that it
reduces ParTime by more than 2csc. (csc = call sequence counts)
When pretty-printing parallelisation reports print each goal in the
parallelised conjunction with its new goal path. This makes debugging
easier for large procedures.
Fix a bug where the goal path of scope goals was calculated incorrectly;
this led to a thrown exception in the coverage analysis code when it used
the goal path to look up the call site of a call.
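The objective function quoted above for the branch and bound search,
minimise(ParTime + ParOverheads X 2.0), can be sketched in Python as below.
The function and constant names and the candidate figures are hypothetical;
only the formula and the 2.0 weight come from the commit message.

```python
OVERHEAD_WEIGHT = 2.0  # weight applied to overheads in the commit message


def objective(par_time: float, par_overheads: float) -> float:
    """Score a candidate parallelisation; lower is better (units: csc)."""
    return par_time + par_overheads * OVERHEAD_WEIGHT


baseline = objective(par_time=100.0, par_overheads=10.0)

# One extra csc of overhead that saves 3 csc of ParTime is accepted...
saves_three = objective(par_time=97.0, par_overheads=11.0)
# ...but one extra csc of overhead that saves only 1 csc is rejected.
saves_one = objective(par_time=99.0, par_overheads=11.0)

print(baseline, saves_three, saves_one)
```

With this weighting, a candidate only wins if each additional csc of overhead
buys more than 2 csc of reduction in ParTime.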
deep_profiler/mdprof_feedback.m:
Support a new command line option for choosing which algorithm to use.
Additionally the linear algorithm will be used if the problem is above a
certain size and the exponential algorithm was chosen. This can be
configured including the fallback threshold.
Print the user's choice of algorithm as part of the candidate parallel
conjunctions report.
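The fallback behaviour described above can be sketched in Python as follows.
How the size of a problem is measured, the strictness of the comparison and
all names are assumptions; the commit only states that the linear algorithm
is used when the exponential one was chosen and the problem exceeds a
configurable size.

```python
def choose_algorithm(requested: str, problem_size: int,
                     fallback_threshold: int) -> str:
    """Pick the algorithm to run for one conjunction.

    `requested` is the user's choice, "exponential" or "linear".
    """
    if requested == "exponential" and problem_size > fallback_threshold:
        # Too large for the exponential search: fall back to linear.
        return "linear"
    return requested


print(choose_algorithm("exponential", problem_size=8, fallback_threshold=50))
print(choose_algorithm("exponential", problem_size=200, fallback_threshold=50))
print(choose_algorithm("linear", problem_size=8, fallback_threshold=50))
```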
deep_profiler/message.m:
Add an extra log message type for exceptions thrown during auto
parallelisation.
deep_profiler/program_representation_utils.m:
The goal_rep pretty printer now prints the goal path for each goal.
deep_profiler/coverage.m:
procrep_annotate_with_coverage now catches and returns exceptions in a
maybe_error result.
deep_profiler/cliques.m:
Copy predicates from the standard library into cliques.m to prevent the
lack of tail recursion from blowing the stack in some cases. (cliques.m is
compiled with --trace minimum).
deep_profiler/callgraph.m:
Copy list.foldl from the standard library into callgraph.m and transform it
so that it is less likely to smash the stack in non tail-recursive grades.
deep_profiler/read_profile.m:
Transform read_nodes so that it is less likely to smash the stack in non
tail-recursive grades.
deep_profiler/Mercury.options:
Removed old options that were used to work around a bug. The bug still
exists but the work-around moved into the compiler long ago.
|
||
|
|
531c2d94ea |
Automatic Parallelisation Improvements.
Factor in all the costs of parallelisation into the parallel overlap estimation
algorithm. Previously only some costs were being taken into consideration.
Independent parallelisations are now generally preferred as they have fewer
overheads for similar parallelisations.
Generalised the branch and bound search algorithm into a new Mercury module.
mdbcomp/feedback.m:
Grouped candidate parallel conjunction parameters into a single type.
Added extra parameters:
future_signal_cost
future_wait_cost
context_wakeup_delay.
The first two replace locking cost, they are the costs of the signal and
wait calls for futures respectively. The third represents the length of
time for a context to begin executing after it has been placed on the run
queue. It is used to estimate the cost of blocking.
Refactored the parallel_exec_metrics type to make representing overheads easier.
Modify parallel_exec_metrics so that it can represent the cost of calling
signal in the left conjunct of any conjunct pair.
Modify parallel_exec_metrics so that it stores the parallel execution time
of the initial (leftmost) conjunct. This is necessary as the parallel
execution time includes the cost of the 'fork' call of the next conjunct.
Modify parallel_exec_metrics to record the cost of blocking for the
leftmost conjunct if it completes before the parallel conjunction completes
as a whole.
Increment the feedback file format version number.
compiler/implicit_parallelism.m:
Conform to changes in mdbcomp/feedback.m.
deep_profiler/branch_and_bound.m:
A generic branch and bound solver loop and utilities.
The modified branch and bound code includes a profiling facility.
deep_profiler/Mercury.options:
The new branch_and_bound module supports the debug_branch_and_bound trace
flag.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Generalise and move branch and bound code to branch_and_bound.m
Removed the candidate_parallel_conjunctions_opts type, we now use the
candidate_par_conjunctions_params type in its place.
Modify the code for parallelising conjunctions so that it works with lists
of goals rather than cords of goals.
Factor out the code that looks for the next costly call; this is now handled
by a preprocessing pass so that it has linear time rather than increasing
the complexity of the search code.
Documented some predicates in more detail.
deep_profiler/mdprof_feedback.m:
Conform to changes in deep_profiler/mdprof_fb.automatic_parallelism.m and
mdbcomp/feedback.m
Add command line support for the new candidate parallel conjunctions
feedback parameters.
|
||
|
|
c877dceb2b |
Refactor profiler feedback code for implicit parallelism.
This change mostly re-factors the goal representation used to feed back implicit
parallelism information to the compiler. The goal_rep datatype is now used
rather than the much simpler datatype. (goal_rep is the same type that is used
by the declarative debugger).
This makes it easier for the compiler to match HLDS goals against goals from
the implicit parallelism analysis and will probably help in the future if the
analysis wants the compiler to re-order goals.
It also makes it easier to pretty-print the feedback sent to the compiler in
more detail.
mdbcomp/feedback.m:
As above, redefine pard_goal as a type alias to
goal_rep(pard_goal_annotation).
Added a new type, candidate_par_conjunctions_proc, it represents candidate
parallelisations within a procedure along with shared information for the
procedure.
Add a new predicate, convert_candidate_par_conjunctions_proc.
Increment the feedback file format version number.
mdbcomp/program_representation.m:
XXX: See about refactoring bytecode in/out put into one place.
Add a new predicate transform_goal_rep for transforming a goal_rep
structure from one arbitrary annotation type to another.
Add extra predicates to aid in converting a prog_rep structure to and from
bytecode. This includes cut_byte/2 and can_fail_byte/2.
deep_profiler/program_representation_utils.m:
Export print_goal_to_strings/4 so that it can be used when printing the
feedback file reports.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Conform to changes in mdbcomp/feedback.m
Wrap some lines at 76 characters.
Improve explanations in comments.
Use the goal_rep pretty-printer to print the candidate parallel
conjunctions feedback report.
deep_profiler/mdprof_feedback.m:
Conform to changes in deep_profiler/mdprof_fb.automatic_parallelism.m
deep_profiler/program_representation_utils.m:
Modify print_goal_to_strings to print determinisms and annotations on
separate lines before each goal.
deep_profiler/display_report.m:
Modify pretty printing of coverage annotations so that they make sense
after modifying print_goal_to_strings/4.
compiler/implicit_parallelism.m:
Refactor goal matching code that compares HLDS goals to feedback goals.
Goal matching is now more accurate and can more easily support goal
re-ordering when parallelising code (this is not implemented yet).
The code that builds parallel conjunctions has also been refactored.
This pass now generates warnings if it is not able to parallelise
a candidate parallel conjunction in the feedback data.
Insert deeper and later parallelizations before shallower or earlier ones;
this makes it easier to continue to parallelise a procedure as its goal
tree changes due to parallelisation.
Silently ignore duplicate candidate parallel conjunctions.
Refuse to parallelise a procedure that has been parallelized explicitly.
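The "deeper and later first" ordering described above can be illustrated
with a small Python sketch. Goal paths are modelled as tuples of conjunct
indices, and depth is given priority over position; both choices are
assumptions made for illustration, not the compiler's actual representation.

```python
from typing import List, Tuple

GoalPath = Tuple[int, ...]


def parallelisation_order(paths: List[GoalPath]) -> List[GoalPath]:
    """Order candidates so deeper, then later, goals are handled first.

    Rewriting a deep or late goal first leaves the paths of the
    shallower and earlier candidates untouched when their turn comes.
    """
    return sorted(paths, key=lambda p: (len(p), p), reverse=True)


candidates = [(2,), (1, 3), (1, 1), (4,)]
print(parallelisation_order(candidates))
```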
compiler/prog_rep.m:
Re-factor the hlds_goal to bytecode transformation, this transformation now
goes via goal_rep. We use the hlds_goal to goal_rep portion of this
transformation in compiler/implicit_parallelism.m.
Add variable names prefixed with DCG_ to the list of those introduced by
the compiler.
compiler/goal_util.m:
Modify maybe_transform_goal_at_goal_path so that it returns a value that
can describe the different kinds of error that may be encountered.
Add a new predicate, maybe_transform_goal_at_goal_path_with_instmap. Given
a goal, goal path and initial inst map this predicate recurses the goal
structure following the goal path and maintaining the inst map. It then
uses a higher order value to transform the goal at its destination before
re-constructing the goal. It is different to
maybe_transform_goal_at_goal_path in that it passes the instmap to its
higher order argument; the instmap is correct for the state immediately
before executing the goal in question.
compiler/hlds_pred.m:
Include the procedure's varset in the information used to construct the
program representation data that is included in deep profiling builds.
compiler/instmap.m:
Add a useful function, apply_instmap_delta_sv. This is the same as
apply_instmap_delta except that its arguments are in a more convenient
order for state variable notation.
compiler/stack_layout.m:
Export compute_var_number_map for the use of implicit_parallelism.m and
prog_rep.m
compiler/error_util.m:
Add a new error phase, 'phase_auto_parallelism'. This is used for warnings
issued from the automatic parallelisation transformation.
compiler/deep_profiling.m:
Conform to changes in hlds_pred.m
compiler/mercury_compile_middle_passes.m:
Conform to changes in implicit_parallelism.m
compiler/type_constraints.m:
Conform to changes in goal_util.
|
||
|
|
455f87ba38 |
Automatic parallelisation improvements.
The automatic parallelisation analysis now searches for the best way to
parallelise a conjunction when it's deciding if parallelisation is worthwhile.
This mostly affects unifications and other cheap goals between two calls being
parallelised; it decides which parallel conjunct each of these goals should be
placed in. The search used is a branch and bound search.
deep_profiler/mdprof_fb.automatic_parallelism.m:
As above.
find_costly_call now returns find_costly_call_result rather than using the
maybe type. Instantiation sub-typing is used to guarantee that the goal
returned is indeed a costly call.
Fix a bug in calculate_dependant_parallel_cost.
When printing the candidate parallel conjunction report include the goals
before and after a parallel conjunction.
Document the debug_parallel_conjunction_speedup trace flag.
Create a new type parallelise_dep_conjs rather than using a boolean to
represent this option.
Add a new trace flag, debug_branch_and_bound, which enables trace goals that
can be used to debug the branch and bound search for the best parallelisation.
deep_profiler/mdprof_feedback.m:
When printing the candidate parallel conjunction report remove some
vertical whitespace after the header of the report.
deep_profiler/Mercury.options:
Add a new trace flag, debug_branch_and_bound, which enables trace goals that
can be used to debug the branch and bound search for the best parallelisation.
This is commented out.
mdbcomp/feedback.m:
Include the goals before and after a parallel conjunction in the
candidate_par_conjunction type.
Modify the parallel_exec_metrics code so that it can handle the cost of the
goals before and after a parallel conjunction.
Increment the feedback file format version.
When reading a feedback file strip the program name of leading and trailing
whitespace.
When printing a message for the incorrect_program_name error enclose the
program names in quotes.
Conform to changes in mdprof_fb.automatic_parallelism.m
|
||
|
|
b7f0270f36 |
Implement the new algorithm for calculating how dependent parallel conjuncts'
executions overlap. This algorithm has also been generalised to handle cases
where there are more than two conjuncts in a parallel conjunction. A number of
other improvements have also been made.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Wrote dependent parallel conjunction overlap analysis algorithm (as above).
This algorithm introduced a new structure, parallel_execution_overlap.
This structure describes how dependent parallel executions overlap.
Use both sparking cost and sparking delay as costs of parallelisation.
Sparking cost is the cost from the perspective of the sparker, whereas
delay is the delay between creating the spark and actually beginning the
execution of the spark.
Handle pretty-printing of the candidate parallel conjunction structure.
Include variable identifiers as well as canonical names in the
pardgoal_type structure.
The inst_map_info structure has been modified to contain the sets of
consumed and produced variables separately, rather than simply containing a
set of all consumed and produced variables.
Improve the readability of messages printed by trace goals.
The search code no longer attempts to look up procedure bodies for code
whose module is "Mercury runtime".
Conjunctions that did not have a speedup due to parallelisation are now
printed out by a new trace goal.
deep_profiler/mdprof_feedback.m:
Include support for pretty printing the feedback information after
creating it. This is handled by the new --report command line option.
Include a new --implicit-parallelism-sparking-delay command line option.
This may be used to specify how long it takes an engine to steal a spark.
mdbcomp/feedback.m:
Export the sparking delay as part of the feedback information.
Create a new structure parallel_exec_metrics which contains many metrics
about parallel execution performance. This is exported for each candidate
parallel conjunction rather than only exporting the Speedup.
Create predicates for creating and querying the parallel_exec_metrics
structure.
Create a new predicate, get_all_feedback_data/2; this is used to retrieve
all the data for building the report in the mdprof_feedback tool.
Increment the feedback file format version number.
deep_profiler/message.m:
Improve the readability of the messages printed due to verbosity settings.
Export some predicates that can be used for managing indentation while
pretty-printing structures.
compiler/implicit_parallelism.m:
Conform to changes in feedback_data_candidate_parallel_conjunctions.
Add a pi_sparking_delay field to parallelism information.
deep_profiler/program_representation_utils.m:
Fix a bug in calc_inst_map_delta/3.
Correct a comment for inst_map_ground_vars/5.
deep_profiler/cliques.m:
Fixed a minor indentation issue.
deep_profiler/Mercury.options:
Document the new trace goal that enables printing of candidate parallel
conjunctions that do not result in a speedup.
|
||
|
|
3d6770a091 |
Refactor feedback parallelisation code.
These changes rename some poorly named types from inner_goal to pard_goal.
'pard' means 'parallelised'. This is explained in a comment near this type.
The candidate_par_conjunction type has been made polymorphic on the type that
it uses to represent individual goals. This is easier than using slightly
different candidate_par_conjunction types in different modules.
mdbcomp/feedback.m:
Changes to types as above.
Introduce predicates to convert candidate_par_conjunctions from one type to
another given a function to convert the type of goal used.
Increment the feedback file format version number.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Remove our alternative candidate_par_conjunction types in favor of the
polymorphic type in feedback.m
Rename the type inner_goal_internal to pard_goal_detail.
Rename occurrences inner_goal or InnerGoal to pard_goal or PardGoal.
Use the generic conversion code in feedback.m to convert between different
types of candidate_par_conjunction.
Conform to changes in mdbcomp/feedback.m
compiler/implicit_parallelism.m:
Rename occurrences inner_goal or InnerGoal to pard_goal or PardGoal.
Conform to changes in mdbcomp/feedback.m
|
||
|
|
25ed5e004d |
Add an option to mdprof_feedback to control whether the automatic
parallelisation feedback will recommend parallelising dependent conjunctions or
not. The compiler will now parallelise both independent and dependent
conjunctions.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Add a field to the candidate_parallel_conjunctions_opts structure to
represent whether we should parallelise dependent conjunctions.
Use this flag to determine if a dependent conjunction should be recommended
for parallelisation in innergoals_build_candidate_conjunction.
deep_profiler/mdprof_feedback.m:
Add the actual command line argument.
Update the --help message.
Conform to changes in mdprof_fb.automatic_parallelism.m.
compiler/implicit_parallelism.m:
Previously the compiler would not automatically parallelise dependent
conjunctions. This restriction has been removed, as the control is now
available in the mdprof_feedback tool.
|
||
|
|
79c3f39a68 |
Implicit parallelism work.
The implicit parallelism algorithm, feedback file format and therefore compiler
have been updated. They now support parallelisation across other goals and, in
theory, parallelising three or more calls against one another. The algorithm
is far from complete and very much experimental. It has been tested on a
modified version of icfp_2000, where it improves the runtime. Note that
automatic parallelisation of dependent conjunctions is disabled for now.
mdbcomp/feedback.m:
Modify deep profiling feedback data: a candidate parallel conjunct now
contains a list of sequential conjunctions that contain other goals.
Previously, only two calls to be parallelised against one another were
listed.
Document restrictions on the new candidate parallel conjunct structure that
can't be expressed by the type system.
Incremented the feedback file format number.
mdbcomp/program_representation.m:
Made a semidet predicate version of empty_goal_path.
Created maybe_search_var_name, which returns its result wrapped in a maybe
structure. This is a deterministic alternative to search_var_name. It is
useful as an argument to list.map.
deep_profiler/mdprof_feedback.m:
When printing messages out to stderr also print the newlines between the
messages to stderr.
deep_profiler/measurements.m:
Rearranged the order of arguments and added a four argument version for
sub_computation_parallelism.
Added a new function, some_parallelism/1, that initialises a parallelism
amount as.
deep_profiler/message.m:
Added extra messages.
Pretty-print program locations using the conventional string representation
for procedures and goal paths. Export the predicate that does this.
deep_profiler/program_representation_utils.m:
Export a predicate to format a procedure identifier nicely.
Add code for calculating and manipulating inst_map_delta objects similar to
those in the compiler.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Various code cleanups/simplifications.
Reworked the parallelisation algorithm. It can now parallelise across
cheaper calls and (theoretically) handle parallel conjunctions with any
number of conjuncts.
Conform to new candidate parallel conjunction representation.
Internally use a structure similar to the candidate parallel conjunct
structure in feedback.m. This makes the maybe_call_conjunct structure
obsolete, so the old structure has been removed.
compiler/implicit_parallelism.m:
Updated implicit parallelism transformation to conform to the new feedback
file format.
compiler/goal_util.m:
Added goal_is_atomic/2.
Modified create_conj_from_list to simply return the only goal in the list
when the list contains exactly one goal.
library/maybe.m:
Add a simple predicate (maybe_is_yes/2) that 'opens' a maybe and returns the result or
fails.
NEWS:
Announce maybe_is_yes/2
|
||
|
|
150213df77 |
Automatic parallelism search improvement.
Limit the automatic parallelism recursion through the program's clique graph so
that it does not consider cliques and their children if it can introduce enough
parallelism in their parent. This is incomplete work from earlier in the year
and will require fine tuning later. However it should be committed now to
avoid it getting lost, forgotten or diverging too far from the current version
and becoming hard to apply.
deep_profiler/measurements.m:
Introduce a parallelism amount type. This opaque type tracks the amount of
parallelism that is probably being exploited at a given point in the
program's execution. For now it's simply a single value representing the
'likely' parallelism available but it may become a tuple of largest,
smallest and likely amounts of parallelism.
Introduce no_parallelism/0, a function returning the representation of no
parallelism.
Introduce sub_computation_parallelism/3, which calculates the parallelism
currently exploited during a child's execution given the parallelism
exploited during the parent's and the probability that this child will be
executed in parallel.
Introduce exceeded_desired_parallelism/2, a semidet predicate that is true
when we have more likely parallelism than desired parallelism.
deep_profiler/measurement_units.m:
Introduce a probability type. This is an opaque type containing a floating
point value between 0.0 and 1.0 representing the probability of an event,
such as the probability that execution will enter a given branch.
Introduce functions for constructing probabilities such as certain/0,
impossible/0 and possible/1, as well as a function to return the
probability as a float between 0.0 and 1.0.
deep_profiler/mdprof_fb.automatic_parallelism.m:
Introduce two new types, candidate_par_conjunction_internal and
candidate_par_conjunct_internal. These are similar to
candidate_par_conjunction and candidate_par_conjunct, which are defined in
mdbcomp/feedback.m. These types are used internally and may be changed
without changing the feedback file format.
Modify the predicates that recurse through the clique graph. They now pass
the amount of parallelism already exploited and check it upon entering a
clique.
When recursing into a call, calculate the probability that the evaluation of
this call will overlap with the evaluation of another.
Corrected a spelling error that was repeated in many places.
|
||
|
|
e70295415d |
Various changes for automatic parallelism, the two major changes are:
Estimated hours taken: 20.
Branches: main
Various changes for automatic parallelism, the two major changes are:
Refactored some of the search for parallel conjunctions to use types that
describe the cost of a call site and the cost of a clique-procedure. These
new types make it harder for programmers to mistakenly compare values of
one type with values of the other.
Where possible, use the body of a clique to determine the cost of recursive
calls at the top level of recursion. This improves the accuracy of this
calculation significantly.
deep_profiler/mdprof_fb.automatic_parallelism.m:
As above.
deep_profiler/measurements.m:
New cost data types as above.
deep_profiler/coverage.m:
When coverage information completeness tests fail, print out the procedure
where the coverage information is incomplete.
deep_profiler/message.m:
Introduce a new warning used in the automatic parallelism analysis.
deep_profiler/profile.m:
Introduce a semidet version of deep_get_progrep_det.
mdbcomp/program_representation.m:
Introduce a predicate to return the goal_rep from inside a case_rep
structure. This can be used as higher order code to turn a case list into
a goal list for example.
deep_profiler/Mercury.options:
Keep a commented out MCFLAGS definition that can be used to enable
debugging output for the automatic parallelism analysis.
|
||
|
|
453a08caab |
Split up mdprof_feedback.m into three modules, The original, utility code for
Estimated hours taken: 1.5.
Branches: main
Split up mdprof_feedback.m into three modules: the original, utility code for
raising messages, and the code related to automatic parallelism.
deep_profiler/mdprof_fb.m:
deep_profiler/mdprof_fb.automatic_parallelism.m:
deep_profiler/mdprof_feedback.m:
deep_profiler/message.m:
As above.
|