Files
mercury/compiler/delay_construct.m
Peter Wang b86f973fa9 Allow the use of Mercury abstract machine float registers for passing
Branches: main

Allow the use of Mercury abstract machine float registers for passing
double-precision float arguments in higher order calls.

In of itself this is not so useful for typical Mercury code.  However, as
all non-local procedures are potentially the targets of higher order calls,
without this change first order calls to non-local procedures could not use
float registers either.  That is the actual motivation for this change.

The basic mechanism is straightforward.  As before, do_call_closure_* is
invoked to place the closure's hidden arguments into r1, ..., rN, and extra
input arguments shifted into rN+1, etc.  With float registers, extra input
arguments may also be in f1, f2, etc. and the closure may also have hidden
float arguments.  Optimising for calls, we order the closure's hidden
arguments so that all float register arguments come after all regular
register arguments in the vector.  Having the arguments out of order does
complicate code which needs to deconstruct closures, but that is not so
important.

Polymorphism complicates things.  A closure with type pred(float) may be
passed to a procedure expecting pred(T).  Due to the `float' argument type,
the closure expects its argument in a float register.  But when passed to the
procedure, the polymorphic argument type means it would be called with the
argument in a regular register.

Higher-order insts already contain information about the calling convention,
without which a higher-order term cannot be called.  We extend higher-order
insts to include information about the register class required for each
argument.  For example, we can distinguish between:

	pred(in) is semidet /* arg regs: [reg_f] */
and
	pred(in) is semidet /* arg regs: [reg_r] */

Using this information, we can create a wrapper around a higher-order
variable if it appears in a context requiring a different calling convention.
We do this in a new HLDS pass, called float_regs.m.

Note: Mercury code has a tendency to lose insts for higher-order terms, then
"recover" them by hacky means.  The float_regs pass depends on higher-order
insts; it is impossible to create a wrapper for a procedure without knowing
how to call it.  The float_regs pass will report errors which we otherwise
accepted, due to higher-order insts being unavailable.  It should be possible
for the user to adjust the code to satisfy the pass, though the user may not
understand why it should be necessary.  In most cases, it probably really
*is* unnecessary.  We may be able to make the float_regs pass more tolerant
of missing higher-order insts in the future.

Class method calls do not use float registers because I didn't want to deal
with them yet.


compiler/options.m:
compiler/handle_options.m:
	Always enable float registers in low-level C grades when floats are
	wider than a word.

compiler/make_hlds_passes.m:
	Always allow double word floats to be stored unboxed in cells on C
	grades.

compiler/hlds_goal.m:
	Add an extra field to `generic_call' which gives the register class
	to use for each argument.  This is set by the float_regs pass.

compiler/prog_data.m:
	Add an extra field to `pred_inst_info' which records the register class
	to use for each argument.  This is set by the float_regs pass.

compiler/hlds_pred.m:
	Add a field to `proc_sub_info' which lists the headvars which must be
	passed via regular registers despite their types.

	Add a field to `pred_sub_info' to record the original unsubstituted
	argument types for instance method predicates.

compiler/check_typeclass.m:
	In the pred_info of an instance method predicate, record the original
	argument types before substituting the type variables for the instance.

compiler/float_regs.m:
compiler/transform_hlds.m:
	Add the new HLDS pass.

compiler/mercury_compile_middle_passes.m:
	Run the new pass if float registers are enabled.

compiler/lambda.m:
	Export the predicate to produce a predicate from a lambda.
	This is reused by float_regs.m to create wrapper closures.

	Add an argument to `expand_lambda' to set the reg_r_headvars field on
	the newly created procedure.

	Delete some unused fields from `lambda_info'.

compiler/arg_info.m:
	Make `generate_proc_arg_info' no longer always use regular registers
	for calls to exported procedures.  Do always use regular registers for
	class methods calls.

	Add a version of `make_arg_infos' which takes an explicit list of
	argument registers.  Rename the previous version.

	Add `generic_call_arg_reg_types' to return the argument registers
	for a generic call.

	Add a version of `compute_in_and_out_vars' which additionally separates
	arguments for float and regular registers.

compiler/call_gen.m:
	Use float registers for argument passing in higher-order calls, as
	directed by the new field in `generic_call'.

compiler/code_util.m:
	Add a function to encode the number of regular and float register
	arguments when making a higher-order call.

compiler/llds.m:
	Say that the `do_call_closure_N' functions only work for zero float
	register arguments.

compiler/follow_vars.m:
compiler/interval.m:
	Account for the use of float registers by generic call goals in these
	passes.

compiler/unify_gen.m:
	Move float register arguments to the end of a closure's hidden
	arguments vector, after regular register arguments.

	Count hidden regular and float register arguments separately, but
	encode them in the same word in the closure.  This is preferable to
	using two words because it reduces the differences between grades
	with and without float registers present.

	Disable generating code which creates a closure from an existing
	closure, if float registers exist.  That code does not understand the
	reordered hidden arguments vector yet.

compiler/continuation_info.m:
	Replace an argument's type_info in the closure layout if the argument
	is a float *and* is passed via a regular register, when floats are
	normally passed via float registers.  Instead, give it the type_info
	for `private_builtin.float_box'.

compiler/builtin_lib_types.m:
	Add function to return the type of `private_builtin.float_box/0'.

compiler/hlds_out_goal.m:
compiler/hlds_out_pred.m:
compiler/mercury_to_mercury.m:
	Dump the new fields added to `generic_call', `pred_inst_info' and
	`proc_sub_info'.

compiler/prog_type.m:
	Add helper predicate.

compiler/*.m:
	Conform to changes.

library/private_builtin.m:
	Add a type `float_box'.

runtime/mercury_ho_call.h:
	Describe the modified closure representation.

	Rename the field which counts the number of hidden arguments to prevent
	it being used incorrectly, as it now encodes two numbers (potentially).

	Add macros to unpack the encoded field.

runtime/mercury_ho_call.c:
	Update the description of how higher-order calls work.

	Update code which extracts closure arguments to take account the
	arguments being reordered in the hidden arguments vector.

runtime/mercury_deep_copy.c:
runtime/mercury_deep_copy_body.h:
runtime/mercury_layout_util.c:
runtime/mercury_ml_expand_body.h:
	Update code which extracts closure arguments to take account the
	arguments being reordered in the hidden arguments vector.

runtime/mercury_type_info.c:
runtime/mercury_type_info.h:
	Add helper function.

tools/make_spec_ho_call:
	Update the generated do_call_closure_* functions to place float
	register arguments.

tests/hard_coded/Mercury.options:
tests/hard_coded/Mmakefile:
tests/hard_coded/ho_float_reg.exp:
tests/hard_coded/ho_float_reg.m:
	Add new test case.

tests/hard_coded/copy_pred.exp:
tests/hard_coded/copy_pred.m:
tests/hard_coded/deconstruct_arg.exp:
tests/hard_coded/deconstruct_arg.exp2:
tests/hard_coded/deconstruct_arg.m:
	Extend test cases with float arguments in closures.

tests/debugger/higher_order.exp2:
	Add alternative output, changed due to closure wrapping.

tests/hard_coded/ho_univ_to_type.m:
	Adjust test case so that the float_regs pass does not report errors
	about missing higher-order insts.

compiler/notes/compiler_design.html:
	Describe the new module.

	Delete a duplicated paragraph.

compiler/notes/todo.html:
TODO:
	Delete one hundred billion year old todos.
2012-02-13 00:11:57 +00:00

284 lines
12 KiB
Mathematica

%-----------------------------------------------------------------------------%
% vim: ft=mercury ts=4 sw=4 et
%-----------------------------------------------------------------------------%
% Copyright (C) 2001-2012 The University of Melbourne.
% This file may only be copied under the terms of the GNU General
% Public License - see the file COPYING in the Mercury distribution.
%-----------------------------------------------------------------------------%
%
% File: delay_construct.m.
% Author: zs.
%
% This module transforms sequences of goals in procedure bodies. It looks for
% a unification that constructs a ground term followed by primitive goals, at
% least one of which can fail, and none of which take the variable
% representing the cell as their input. Such code sequences cause the cell to
% be constructed even if the following goal would fail, which is wasteful.
% This module therefore reorders the sequence, moving the construction
% unification past all the semidet primitives it can.
%
% The reason we don't move the construction past calls or composite goals is
% that this may require storing the input arguments of the construction on the
% stack, which may cause a slowdown bigger than the speedup available from not
% having to construct the cell on some execution paths.
%
%-----------------------------------------------------------------------------%
:- module transform_hlds.delay_construct.
:- interface.
:- import_module hlds.hlds_module.
:- import_module hlds.hlds_pred.
%-----------------------------------------------------------------------------%
:- pred delay_construct_proc(module_info::in, pred_proc_id::in,
proc_info::in, proc_info::out) is det.
%-----------------------------------------------------------------------------%
%-----------------------------------------------------------------------------%
:- implementation.
:- import_module check_hlds.inst_match.
:- import_module hlds.hlds_goal.
:- import_module hlds.hlds_rtti.
:- import_module hlds.instmap.
:- import_module hlds.passes_aux.
:- import_module parse_tree.prog_data.
:- import_module parse_tree.set_of_var.
:- import_module bool.
:- import_module list.
:- import_module require.
:- import_module set.
%-----------------------------------------------------------------------------%
delay_construct_proc(ModuleInfo, proc(PredId, ProcId), !ProcInfo) :-
trace [io(!IO)] (
write_proc_progress_message("% Delaying construction unifications in ",
PredId, ProcId, ModuleInfo, !IO)
),
module_info_get_globals(ModuleInfo, Globals),
module_info_pred_info(ModuleInfo, PredId, PredInfo),
body_should_use_typeinfo_liveness(PredInfo, Globals, BodyTypeinfoLiveness),
proc_info_get_vartypes(!.ProcInfo, VarTypes),
proc_info_get_rtti_varmaps(!.ProcInfo, RttiVarMaps),
proc_info_get_initial_instmap(!.ProcInfo, ModuleInfo, InstMap0),
DelayInfo = delay_construct_info(ModuleInfo, BodyTypeinfoLiveness,
VarTypes, RttiVarMaps),
proc_info_get_goal(!.ProcInfo, Goal0),
delay_construct_in_goal(Goal0, InstMap0, DelayInfo, Goal),
proc_info_set_goal(Goal, !ProcInfo).
:- type delay_construct_info
---> delay_construct_info(
dci_module_info :: module_info,
dci_body_typeinfo_liveness :: bool,
dci_vartypes :: vartypes,
dci_rtti_varmaps :: rtti_varmaps
).
%-----------------------------------------------------------------------------%
:- pred delay_construct_in_goal(hlds_goal::in, instmap::in,
delay_construct_info::in, hlds_goal::out) is det.
delay_construct_in_goal(Goal0, InstMap0, DelayInfo, Goal) :-
Goal0 = hlds_goal(GoalExpr0, GoalInfo0),
(
GoalExpr0 = conj(ConjType, Goals0),
(
ConjType = plain_conj,
Detism = goal_info_get_determinism(GoalInfo0),
determinism_components(Detism, CanFail, MaxSoln),
(
% If the conjunction cannot fail, then its conjuncts cannot
% fail either, so we have no hope of pushing a construction
% past a failing goal.
%
% If the conjuntion contains goals that can succeed more than
% once, which is possible if MaxSoln is at_most_many or
% at_most_many_cc, then moving a construction to the right
% may increase the number of times the construction is
% executed. We are therefore careful to make sure
% delay_construct_in_conj doesn't move constructions
% across goals that succeed more than once. If the conjunction
% cannot succeed, i.e. MaxSoln is at_most_zero, there is no
% point in trying to speed it up.
CanFail = can_fail,
MaxSoln \= at_most_zero
->
delay_construct_in_conj(Goals0, InstMap0, DelayInfo, set.init,
[], Goals1)
;
Goals1 = Goals0
)
;
ConjType = parallel_conj,
Goals1 = Goals0
),
delay_construct_in_goals(Goals1, InstMap0, DelayInfo, Goals),
Goal = hlds_goal(conj(ConjType, Goals), GoalInfo0)
;
GoalExpr0 = disj(Goals0),
delay_construct_in_goals(Goals0, InstMap0, DelayInfo, Goals),
Goal = hlds_goal(disj(Goals), GoalInfo0)
;
GoalExpr0 = negation(NegGoal0),
delay_construct_in_goal(NegGoal0, InstMap0, DelayInfo, NegGoal),
Goal = hlds_goal(negation(NegGoal), GoalInfo0)
;
GoalExpr0 = switch(Var, CanFail, Cases0),
delay_construct_in_cases(Cases0, InstMap0, DelayInfo, Cases),
Goal = hlds_goal(switch(Var, CanFail, Cases), GoalInfo0)
;
GoalExpr0 = if_then_else(Vars, Cond0, Then0, Else0),
Cond0 = hlds_goal(_, CondInfo0),
CondInstMapDelta = goal_info_get_instmap_delta(CondInfo0),
instmap.apply_instmap_delta(InstMap0, CondInstMapDelta, InstMapThen),
delay_construct_in_goal(Cond0, InstMap0, DelayInfo, Cond),
delay_construct_in_goal(Then0, InstMapThen, DelayInfo, Then),
delay_construct_in_goal(Else0, InstMap0, DelayInfo, Else),
Goal = hlds_goal(if_then_else(Vars, Cond, Then, Else), GoalInfo0)
;
GoalExpr0 = scope(Reason, SubGoal0),
(
Reason = from_ground_term(_, FGT),
( FGT = from_ground_term_construct
; FGT = from_ground_term_deconstruct
)
->
Goal = Goal0
;
delay_construct_in_goal(SubGoal0, InstMap0, DelayInfo, SubGoal),
Goal = hlds_goal(scope(Reason, SubGoal), GoalInfo0)
)
;
( GoalExpr0 = generic_call(_, _, _, _, _)
; GoalExpr0 = plain_call(_, _, _, _, _, _)
; GoalExpr0 = unify(_, _, _, _, _)
; GoalExpr0 = call_foreign_proc(_, _, _, _, _, _, _)
),
Goal = Goal0
;
GoalExpr0 = shorthand(_),
% These should have been expanded out by now.
unexpected($module, $pred, "shorthand")
).
%-----------------------------------------------------------------------------%
% We maintain a list of delayed construction unifications that construct
% ground terms, and the set of variables they define.
%
% When we find other construction unifications, we add them to the list. It
% does not matter if they depend on other delayed construction unifications;
% when we put them back into the conjunction, we do so in the original order.
%
% There are several reasons why we may not be able to delay a construction
% unification past a conjunct. The conjunct may not be a primitive goal, or it
% may be impure; in either case, we must insert all the delayed construction
% unifications before it. The conjunct may also require the value of a
% variable defined by a construction unification. In such cases, we could drop
% before that goal only the construction unifications that define the
% variables needed by the conjunct, either directly or indirectly through the
% values required by some of those construction unifications. However,
% separating out this set of delayed constructions from the others would
% require somewhat complex code, and it is not clear that there would be any
% significant benefit. We therefore insert *all* the delayed constructions
% before a goal if the goal requires *any* of the variables bound by the
% constructions.
%
% The instmap we pass around is the one that we construct from the original
% conjunction order. At each point, it reflects the bindings made by the
% conjuncts so far *plus* the bindings made by the delayed goals.
:- pred delay_construct_in_conj(list(hlds_goal)::in, instmap::in,
delay_construct_info::in, set(prog_var)::in, list(hlds_goal)::in,
list(hlds_goal)::out) is det.
delay_construct_in_conj([], _, _, _, RevDelayedGoals, DelayedGoals) :-
list.reverse(RevDelayedGoals, DelayedGoals).
delay_construct_in_conj([Goal0 | Goals0], InstMap0, DelayInfo,
ConstructedVars0, RevDelayedGoals0, Goals) :-
Goal0 = hlds_goal(GoalExpr0, GoalInfo0),
InstMapDelta0 = goal_info_get_instmap_delta(GoalInfo0),
instmap.apply_instmap_delta(InstMap0, InstMapDelta0, InstMap1),
(
GoalExpr0 = unify(_, _, _, Unif, _),
Unif = construct(Var, _, Args, _, _, _, _),
Args = [_ | _], % We are constructing a cell, not a constant
instmap_lookup_var(InstMap0, Var, Inst0),
inst_is_free(DelayInfo ^ dci_module_info, Inst0),
instmap_lookup_var(InstMap1, Var, Inst1),
inst_is_ground(DelayInfo ^ dci_module_info, Inst1)
->
set.insert(Var, ConstructedVars0, ConstructedVars1),
RevDelayedGoals1 = [Goal0 | RevDelayedGoals0],
delay_construct_in_conj(Goals0, InstMap1, DelayInfo,
ConstructedVars1, RevDelayedGoals1, Goals)
;
Goal0 = hlds_goal(GoalExpr0, GoalInfo0),
delay_construct_skippable(GoalExpr0, GoalInfo0),
NonLocals = goal_info_get_nonlocals(GoalInfo0),
maybe_complete_with_typeinfo_vars(NonLocals,
DelayInfo ^ dci_body_typeinfo_liveness,
DelayInfo ^ dci_vartypes,
DelayInfo ^ dci_rtti_varmaps, CompletedNonLocals),
set_of_var.intersect(CompletedNonLocals,
set_to_bitset(ConstructedVars0), Intersection),
set_of_var.is_empty(Intersection),
goal_info_get_purity(GoalInfo0) = purity_pure
->
delay_construct_in_conj(Goals0, InstMap1, DelayInfo,
ConstructedVars0, RevDelayedGoals0, Goals1),
Goals = [Goal0 | Goals1]
;
list.reverse(RevDelayedGoals0, DelayedGoals),
delay_construct_in_conj(Goals0, InstMap1, DelayInfo,
set.init, [], Goals1),
list.append(DelayedGoals, [Goal0 | Goals1], Goals)
).
:- pred delay_construct_skippable(hlds_goal_expr::in, hlds_goal_info::in)
is semidet.
delay_construct_skippable(GoalExpr, GoalInfo) :-
(
GoalExpr = unify(_, _, _, _, _)
;
GoalExpr = plain_call(_, _, _, inline_builtin, _, _)
),
Detism = goal_info_get_determinism(GoalInfo),
determinism_components(Detism, _CanFail, MaxSoln),
MaxSoln \= at_most_many.
%-----------------------------------------------------------------------------%
:- pred delay_construct_in_goals(list(hlds_goal)::in, instmap::in,
delay_construct_info::in, list(hlds_goal)::out) is det.
delay_construct_in_goals([], _, _, []).
delay_construct_in_goals([Goal0 | Goals0], InstMap0, DelayInfo,
[Goal | Goals]) :-
delay_construct_in_goal(Goal0, InstMap0, DelayInfo, Goal),
delay_construct_in_goals(Goals0, InstMap0, DelayInfo, Goals).
:- pred delay_construct_in_cases(list(case)::in, instmap::in,
delay_construct_info::in, list(case)::out) is det.
delay_construct_in_cases([], _, _, []).
delay_construct_in_cases([Case0 | Cases0], InstMap0, DelayInfo,
[Case | Cases]) :-
Case0 = case(MainConsId, OtherConsIds, Goal0),
delay_construct_in_goal(Goal0, InstMap0, DelayInfo, Goal),
Case = case(MainConsId, OtherConsIds, Goal),
delay_construct_in_cases(Cases0, InstMap0, DelayInfo, Cases).
%-----------------------------------------------------------------------------%
:- end_module transform_hlds.delay_construct.
%-----------------------------------------------------------------------------%