bitfields with one bit per cpu to represent a set of cpus. These types
have an arbitrary width. However, the man page is misleading about how to
specify the size of a cpuset (bits or bytes) to the macros and functions that
manipulate it. This patch corrects this problem.
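For illustration, a minimal sketch of sizing a dynamically allocated CPU set correctly (Linux/glibc only). CPU_ALLOC, CPU_ALLOC_SIZE and the *_S macros are the glibc interfaces documented in CPU_SET(3); the helper count_marked_cpus is hypothetical and is not the code in mercury_context.c.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/*
 * Hypothetical helper: build a CPU set wide enough for `num_cpus' CPUs,
 * mark one CPU, and return how many CPUs are marked (-1 on allocation
 * failure). The *_S macros take the size of the set in BYTES, as
 * returned by CPU_ALLOC_SIZE(), not the number of CPUs (bits).
 */
int count_marked_cpus(int num_cpus, int cpu_to_mark)
{
    cpu_set_t   *set = CPU_ALLOC(num_cpus);
    size_t      set_size = CPU_ALLOC_SIZE(num_cpus);   /* bytes, not bits */
    int         count;

    if (set == NULL) {
        return -1;
    }
    CPU_ZERO_S(set_size, set);
    if (cpu_to_mark >= 0 && cpu_to_mark < num_cpus) {
        CPU_SET_S(cpu_to_mark, set_size, set);
    }
    count = CPU_COUNT_S(set_size, set);
    CPU_FREE(set);
    return count;
}
```

Passing num_cpus (a bit count) where set_size is expected would make CPU_ZERO_S and friends touch several times as much memory as was allocated.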
runtime/mercury_context.c:
Fix how the size of a CPU_SET is specified.
Fix MR_pin_thread_no_locking() so that it returns a valid result if
the thread could not be pinned or the loop (in this function) was never
executed.
Users can never set MR_num_processors, so remove the code that presumes
they can.
Branches: main
Add float registers to the Mercury abstract machine, implemented as an
array of MR_Float in the Mercury engine structure.
Float registers are only useful if a Mercury `float' is wider than a word
(i.e. when using double precision floats on 32-bit platforms) so we let them
exist only then. In other cases floats may simply be passed via the regular
registers, as before.
Currently, higher order calls still require the use of the regular registers
for all arguments. As all exported procedures are potentially the target of
higher order calls, exported procedures must use only the regular registers for
argument passing. This can lead to more (un)boxing than if floats were simply
always boxed. Until this is solved, float registers must be enabled explicitly
with the developer-only option `--use-float-registers'.
The other aspect of this change is using two consecutive stack slots to hold a
single double variable. Without that, the benefit of passing unboxed floats
via dedicated float registers would be largely eroded.
compiler/options.m:
Add developer option `--use-float-registers'.
compiler/handle_options.m:
Disable `--use-float-registers' if floats are not wider than words.
compiler/make_hlds_passes.m:
If `--use-float-registers' is in effect, enable a previous change that
allows float constructor arguments to be stored unboxed in structures.
compiler/hlds_llds.m:
Move `reg_type' here from llds.m, and add a `reg_f' option to it.
Add stack slot width to `stack_slot' type.
Add register type and stack slot width to `abs_locn' type.
Remember next available float register in `abs_follow_vars'.
compiler/hlds_pred.m:
Add register type to `arg_loc' type.
compiler/llds.m:
Add a new kind of lval: double-width stack slots.
These are used to hold double-precision floating point values only.
Record setting of `--use-float-registers' in exprn_opts.
Conform to addition of float registers and double stack slots.
compiler/code_info.m:
Make predicates take the register type as an argument,
where it can no longer be assumed.
Remember whether float registers are being used.
Remember max float register for calls to MR_trace.
Count double width stack slots as two slots.
compiler/arg_info.m:
Allocate float registers for procedure arguments when appropriate.
Delete unused predicates.
compiler/var_locn.m:
Make predicates working with registers either take the register type as
an argument, or handle both register types at once.
Select float registers for variables when appropriate.
compiler/call_gen.m:
Explicitly use regular registers for all higher-order calls,
which was implicit before.
compiler/pragma_c_gen.m:
Use float registers, when available, at the interface between Mercury
code and C foreign_procs.
compiler/export.m:
Whether a float rval needs to be boxed/unboxed when assigned to/from a
register depends on the register type.
compiler/fact_table.m:
Use float registers for arguments to predicates defined by fact tables.
compiler/stack_alloc.m:
Allocate two consecutive stack slots for float variables when
appropriate.
compiler/stack_layout.m:
Represent double-width stack slots in procedure layout structures.
Conform to changes.
compiler/store_alloc.m:
Allocate float registers (if they exist) for float variables.
compiler/use_local_vars.m:
Substitute float abstract machine registers with MR_Float local
variables.
compiler/llds_out_data.m:
compiler/llds_out_instr.m:
Output float registers and double stack slots.
compiler/code_util.m:
compiler/follow_vars.m:
Count float registers separately from regular registers.
compiler/layout.m:
compiler/layout_out.m:
compiler/trace_gen.m:
Remember the max used float register for calls to MR_trace().
compiler/builtin_lib_types.m:
Fix incorrect definition of float_type_ctor.
compiler/bytecode_gen.m:
compiler/continuation_info.m:
compiler/disj_gen.m:
compiler/dupelim.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/hlds_out_goal.m:
compiler/jumpopt.m:
compiler/llds_to_x86_64.m:
compiler/lookup_switch.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/par_conj_gen.m:
compiler/proc_gen.m:
compiler/string_switch.m:
compiler/tag_switch.m:
compiler/tupling.m:
compiler/x86_64_regs.m:
Conform to changes.
runtime/mercury_engine.h:
Add an array of fake float "registers" to the Mercury engine structure,
when MR_Float is wider than MR_Word.
runtime/mercury_regs.h:
Document float registers in the Mercury abstract machine.
Add macros to access float registers in the Mercury engine.
runtime/mercury_stack_layout.h:
Add new MR_LongLval cases to represent double-width stack slots.
MR_LONG_LVAL_TAGBITS had to be increased to accommodate the new cases,
which increases the number of integers in [0, 2^MR_LONG_LVAL_TAGBITS)
equal to 0 modulo 4. These are the new MR_LONG_LVAL_TYPE_CONS_n cases.
Add max float register field to MR_ExecTrace.
runtime/mercury_layout_util.c:
runtime/mercury_layout_util.h:
Extend MR_copy_regs_to_saved_regs and MR_copy_saved_regs_to_regs
for float registers.
Understand how to look up new kinds of MR_LongLval: MR_LONG_LVAL_TYPE_F
(previously unused), MR_LONG_LVAL_TYPE_DOUBLE_STACKVAR,
MR_LONG_LVAL_TYPE_DOUBLE_FRAMEVAR.
Conform to the new MR_LONG_LVAL_TYPE_CONS_n cases.
runtime/mercury_float.h:
Delete redundant #ifdef.
runtime/mercury_accurate_gc.c:
runtime/mercury_agc_debug.c:
Conform to changes (untested).
trace/mercury_trace.c:
trace/mercury_trace.h:
trace/mercury_trace_declarative.c:
trace/mercury_trace_external.c:
trace/mercury_trace_internal.c:
trace/mercury_trace_spy.c:
trace/mercury_trace_vars.c:
trace/mercury_trace_vars.h:
Handle float registers in the trace subsystem. This is mostly a matter
of saving/restoring them as with regular registers.
One bug occurred when the master context, in MR_lc_finish(), released the
contexts used by each of the slots. The release code attempts to save state
from the engine back into the context, which is necessary most of the time.
However, in this case it saved state from the engine running the master
context into other contexts, so that when they were reused they used an
invalid stack pointer.
Another bug was found in code with recursive parallel conjunctions. Each
context structure contains a pointer to a code location, which is used as the
value of the instruction pointer when the context is resumed. The
MR_join_and_continue operation for parallel conjunctions uses this resume
pointer to ensure that the master context of a parallel conjunction is resumed
only if it has become blocked and is ready to be resumed. However, the field
was never cleared, so when parallel conjunctions are nested it always contains
the same value, since they all have the same resume point. This caused the
master context to be resumed before it had fully blocked, and thus to be
resumed with an invalid state.
A potential bug was found where a field should have been volatile to prevent
the compiler from caching its value when doing so would not be safe.
Widen a couple of critical sections as they didn't quite protect against some
race conditions. This is another potential cause of bugs.
runtime/mercury_par_builtin.[ch]:
Make the master_context field of the loop control structure volatile so
that the compiler doesn't cache its value.
Make the last worker to finish take a lock earlier, to ensure that the
master context won't be left waiting forever.
Add a comment explaining why a context must not be saved before calling
MR_destroy_context().
Improve debugging code to print out the value of the stack or parent stack
pointer, depending on the code in question.
Make the lock in MR_lc_finish() wider, so that the lock is held when the
code checks to see if it should block.
runtime/mercury_context.c:
MR_destroy_context() no longer saves the context before releasing it.
MR_destroy_context() no longer sets the MR_ctxt_resume_owner_engine field
of the context since it's not currently used.
MR_join_and_continue(), the barrier for parallel conjunctions, now resets
the resume code pointer of the master context when it switches to it.
runtime/mercury_context.h:
Describe the reason why the context must be saved before it is
destroyed/released.
runtime/mercury_context.c:
runtime/mercury_engine.c:
Call MR_save_context() before calling MR_destroy_context().
Loop controlled code would sometimes find, while signaling a future,
that the future had already been signaled and had a different value.
The problem was that the MR_lc_join_and_terminate operation would only
save the context after marking the loop control slot (that owned the
context) as free. The save context code and the MR_lc_spawn_off code
both write the parent stack pointer field in the context structure. If
these (or other cases where the context is saved - such as when it
blocks on a future) race, then they can cause problems when the spawned
off computation uses its parent stack pointer.
runtime/mercury_par_builtin.h:
MR_lc_join_and_terminate now saves the context before calling
MR_lc_join.
+ Now pins threads intelligently on SMT systems by balancing threads among
cores.
+ Performs fewer migrations when pinning threads (if a thread's current
CPU is a valid CPU for pinning, then it is not migrated).
+ Handles cases where the user requests more threads than available CPUs.
+ Handles cases where the process is restricted to a subset of CPUs by its
environment (for instance, Linux cpuset(7)).
This is largely made possible by the hwloc library
(http://www.open-mpi.org/projects/hwloc/). However, hwloc is not required:
without it, the runtime system falls back to sched_setaffinity(), and is
simply less intelligent with respect to SMT.
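The "fewer migrations" rule can be sketched as a pure function. This is a hypothetical illustration, not the algorithm in mercury_context.c: it ignores the hwloc-based SMT balancing entirely, and the function name and the allowed/claimed arrays (indexed by CPU number) are assumptions.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: if the thread's current CPU is allowed and not yet
 * claimed by another thread, keep the thread there (no migration);
 * otherwise take the first allowed, unclaimed CPU. If every allowed CPU
 * is already claimed (more threads than CPUs), return -1 and leave the
 * thread unpinned.
 */
int choose_cpu_for_thread(int current_cpu, const bool allowed[],
    bool claimed[], int num_cpus)
{
    if (current_cpu >= 0 && current_cpu < num_cpus
        && allowed[current_cpu] && !claimed[current_cpu])
    {
        claimed[current_cpu] = true;
        return current_cpu;             /* no migration needed */
    }
    for (int cpu = 0; cpu < num_cpus; cpu++) {
        if (allowed[cpu] && !claimed[cpu]) {
            claimed[cpu] = true;
            return cpu;
        }
    }
    return -1;
}
```

A CPU excluded by the environment (for instance by cpuset(7)) simply has allowed[cpu] set to false, so it is never chosen.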
runtime/mercury_context.h:
runtime/mercury_context.c:
Do thread pinning either via hwloc or sched_setaffinity. Previously only
sched_setaffinity was used.
Update the thread-pinning algorithm as described above.
Include the general thread pinning code only if MR_HAVE_THREAD_PINNING is
defined.
Use a combination of sysconf and sched_getaffinity to detect the number of
processors when hwloc isn't available. This makes the runtime compatible
with Linux cpuset(7) when hwloc isn't available.
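A minimal sketch of combining sysconf and sched_getaffinity to count the processors the process may actually use (Linux/glibc only). The helper name is hypothetical, and using a fixed-size cpu_set_t assumes at most CPU_SETSIZE (1024) CPUs.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper: sysconf() reports the CPUs online in the whole
 * machine, but under Linux cpuset(7) or taskset(1) the process may be
 * restricted to fewer; sched_getaffinity() reports that restriction.
 * Falls back to the sysconf() value if the affinity mask cannot be read.
 */
long usable_processor_count(void)
{
    long        online = sysconf(_SC_NPROCESSORS_ONLN);
    cpu_set_t   mask;

    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        int allowed = CPU_COUNT(&mask);

        if (allowed > 0 && (online < 1 || allowed < online)) {
            return allowed;
        }
    }
    return online;
}
```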
configure.in:
Mmake.common.in:
Detect presence of the hwloc library.
configure.in:
Detect sched_getaffinity()
aclocal.m4:
acinclude.m4:
Move aclocal.m4 to acinclude.m4; the aclocal program will build aclocal.m4
from macros retrieved from the system and the contents of acinclude.m4.
Mmakefile:
Create a make target for aclocal.m4.
runtime/Mmakefile:
Link the runtime with libhwloc in low-level C parallel grades.
Include CFLAGS for libhwloc.
scripts/ml.in:
Link programs and libraries with libhwloc in low-level C parallel grades.
runtime/mercury_conf.h.in:
Define MR_HAVE_HWLOC when it is available.
Define MR_HAVE_SCHED_GETAFFINITY when it is available.
runtime/mercury_conf_param.h:
Define MR_HAVE_THREAD_PINNING if either hwloc or [sched_setaffinity and
sched_getaffinity] are available.
runtime/mercury_thread.c:
runtime/mercury_wrapper.c:
Only call MR_pin_thread and MR_pin_primordial_thread if
MR_HAVE_THREAD_PINNING is defined.
runtime/mercury_thread.h:
runtime/mercury_context.h:
Move the declaration of MR_pin_primordial_thread to mercury_context.h from
mercury_thread.h, since its definition is in mercury_context.c.
Require MR_HAVE_THREAD_PINNING for the declaration of
MR_pin_primordial_thread.
runtime/mercury_wrapper.c:
Conform to changes in mercury_context.h
INSTALL_CVS:
tools/test_mercury:
Run aclocal at the right times while testing Mercury.
This patch commits the code-generator parts of the loop control transformation.
It also makes corrections and changes to the source-to-source, runtime and
library parts of the transformation.
Preliminary results look good: loop-controlled right-recursive dependent code
performs as fast as independent right-recursive code, and it does so using the
minimum number of contexts (8 on apollo, an i7). Previously, when
transforming code by hand, we needed 32 contexts on a 4-core system (taura).
The reason for this is that we changed our design so that the master context
would become blocked if there was no free slot. This ensures that once a
worker finishes its current job, new work is either already available or can be
made available promptly.
compiler/par_conj_gen.m:
compiler/code_gen.m:
Generate code for the new loop_control scope.
compiler/llds_out_instr.m:
Write out the lc_spawn_off instruction correctly.
compiler/code_info.m:
Add support for storing out-of-line code in the code_info structure.
compiler/proc_gen.m:
After generating a procedure's body add any out-of-line code stored in the
code_info structure onto the end of the procedure (after the exit code).
compiler/par_loop_control.m:
Add missing parts to the loop control transformation:
+ Add the barrier in the base case.
+ Transform non-parallel recursive calls.
+ Add a join_and_terminate call to the end of the forked-off code.
Make minor corrections to comments.
runtime/mercury_par_builtin.h:
runtime/mercury_par_builtin.c:
MR_lc_wait_free_slot and MR_lc_spawn_off no-longer mangle the labels they
are passed.
Fix a typo that caused a bug.
Add debugging code.
library/par_builtin.m:
Store the value of LC in a stack slot during lc_wait_for_slot. This makes
sure it is available in the case that lc_wait_for_slot suspends the
context.
Remove the loop_control_slot type, we now use integers to represent the
position of a slot within a loop control structure.
runtime/mercury_par_builtin.[ch]:
Corrected some comments, cleaning up unclear prose and also correcting
content.
Use a hint for the next free slot in the loop control structure. This will
ensure that free slots are found more quickly.
Corrected a case where MR_fatal_error would have been called when there was no
error.
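The next-free-slot hint can be sketched as a search that starts at the hint and wraps around, so that a slot freed after the most recently handed-out one tends to be found on the first probe. The names are hypothetical, and the sketch omits the synchronisation with worker contexts that the real runtime code needs.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: search for a free slot starting at *hint and
 * wrapping around. On success, return the slot index and advance the
 * hint past it; if every slot is in use, return -1 and leave the hint
 * unchanged. The caller is responsible for marking the slot as in use.
 */
int find_free_slot(const bool slot_in_use[], int num_slots, int *hint)
{
    for (int i = 0; i < num_slots; i++) {
        int slot = (*hint + i) % num_slots;

        if (!slot_in_use[slot]) {
            *hint = (slot + 1) % num_slots;
            return slot;
        }
    }
    return -1;
}
```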
Estimated hours taken: 0.1
Branches: main
runtime/mercury_stack_layout.h:
trace/mercury_trace_declarative.c:
Add comments about my recent design decision about the representation
of goal paths.
Estimated hours taken: 0.1
Branches: main
runtime/mercury_stack_layout.h:
Remove a bunch of long-obsolete macros. Their job was to define
or to declare individual label layouts, but for a long time now
we have put all label layouts into arrays.
Estimated hours taken: 0.5
Branches: main
The first part of my post-commit review of Paul's loop control diff,
covering everything except the transformation.
compiler/goal_util.m:
Remove the new expand_plain_conj predicate Paul just added,
since it exactly duplicates the existing goal_to_conj_list.
compiler/par_loop_control.m:
Conform to the above.
runtime/mercury_par_builtin.h:
Fix a bug introduced by Paul's diff: the extendable array MUST be
the last slot in the MR_LoopControl structure.
Fix some of the documentation and the formatting.
runtime/mercury_par_builtin.c:
Fix some of the documentation and the formatting.
Add some XXXs.
remaining part is the code generation for code that is to be spawned off. It
must be handled in the code generator since it uses the parent stack pointer in
many cases.
I'm committing this now so that Zoltan can begin to review it while I work on
the code generator component.
compiler/par_loop_control.m:
This new file contains the source-to-source part of the parallel loop
control transformation.
compiler/transform_hlds.m:
Include the par_loop_control module within the transform_hlds module.
compiler/mercury_compile_middle_passes.m:
Call the loop control transformation at stage 206, after the dependent
parallel conjunction transformation.
Move the last call optimisation pass from stage 175 to 206, since running it
at stage 175 would most likely prevent loop control from working. Where both transformations
are applicable, the loop control transformation is preferred.
compiler/options.m:
Add new options for loop control.
compiler/handle_options.m:
Disable loop control if we're not in a grade that supports parallel
conjunctions.
Other tests that should have been testing for parallel conjunction support
but only tested parallel support have been fixed.
compiler/hlds_goal.m:
Add the feature_do_not_tailcall feature.
compiler/call_gen.m:
Mark LLDS call goals that may not have last call optimisation applied to
them if they have the feature_do_not_tailcall feature set in their HLDS
info.
compiler/goal_util.m:
Create a new predicate expand_plain_conj, which returns a list of the
subgoals of a plain conjunction, or returns the goal in a singleton list.
XXX: Could someone review the name of this predicate.
compiler/hlds_pred.m:
Add a symbol for the new transformation in the pred_transformation type.
Corrected a comment to match the arguments in the predicate it refers to.
compiler/prog_util.m:
Add support to make_pred_name for creating names for loop control
predicates.
compiler/dep_par_conj.m:
Fix grammar in a comment.
compiler/saved_vars.m:
Conform to the change in hlds_goal.m
compiler/layout_out.m:
Conform to the change in hlds_pred.m
runtime/mercury_par_builtin.[ch]:
Add support for lc_wait_free_slot/2, the blocking version of
lc_get_free_slot/2. This means that other loop control builtins have
changed, for instance, lc_join_and_terminate/2 must wake up a context
blocked in lc_wait_free_slot/2 after making the slot it was using free.
Use a spin lock in the loop control structure rather than a POSIX mutex.
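For readers unfamiliar with the technique, a minimal test-and-set spin lock written with C11 atomics. This is a generic sketch of the kind of lock that can replace a POSIX mutex around very short critical sections; it is not Mercury's own spin lock code, and the type and function names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical spin lock type: a single atomic flag. */
typedef struct {
    atomic_flag locked;
} spin_lock;

void spin_lock_init(spin_lock *lock)
{
    atomic_flag_clear(&lock->locked);
}

void spin_lock_acquire(spin_lock *lock)
{
    /* Busy-wait until this thread is the one that sets the flag. */
    while (atomic_flag_test_and_set_explicit(&lock->locked,
        memory_order_acquire))
    {
        /* spin */
    }
}

/* Returns 1 if the lock was acquired, 0 if it was already held. */
int spin_lock_try_acquire(spin_lock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked,
        memory_order_acquire);
}

void spin_lock_release(spin_lock *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

The trade-off is that a waiting thread burns CPU instead of sleeping, which pays off only when the lock is held for a few instructions.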
runtime/mercury_wrapper.[ch]:
Add support for a runtime variable, the number of contexts per loop control.
This can be controlled with a MERCURY_OPTIONS option.
mdbcomp/program_representation.m:
Include lc_wait_free_slot/2 in the list of external predicates.
mdbcomp/mdbcomp.goal_path.m:
Add two new predicates goal_path_remove_first/3 and goal_path_get_first/2.
library/par_builtin.m:
Add new builtins to support the loop control transformation:
lc_wait_free_slot/2 will block the context until a new slot is
available.
lc_default_num_contexts/1 will return the number of contexts to use, by
default, for a loop-controlled loop.
Add myself as an author of this module.
doc/user_guide.texi:
Document the runtime --num-contexts-per-lc-per-thread option. It is
currently commented out since it is not intended for users, at least for
now.
Document the loop control options for the compiler.
---
The change below was written by Zoltan; I reviewed it when I applied his diff to
my workspace.
Allow the compiler to mark calls in the LLDS as calls that cannot have last
call optimization applied to them. Paul will soon need this capability
in order to implement parallel conjunctions in which earlier conjuncts
are spawned off, and later conjuncts contain recursive calls, but the
earlier conjuncts need the stack frame.
compiler/llds.m:
Add a flag to det and semi calls. (Model_non calls have had a similar
flag for a long time, for a totally different reason.)
compiler/call_gen.m:
By default, say that det and semi calls may have LCO applied to them.
compiler/jumpopt.m:
Apply LCO to det and semi calls only if this flag allows it.
compiler/opt_debug.m:
Include the flag in debugging dumps.
Estimated hours taken: 2
Branches: main
Remove references to nondet foreign_proc from the definition of the data
structures that define stack layouts.
runtime/mercury_stack_layout.h:
Remove the trace ports that could occur in nondet foreign_procs
from the definition of the trace port type used in C code.
mdbcomp/prim_data.m:
Remove the trace ports that could occur in nondet foreign_procs
from the definition of the trace port type used in Mercury code.
compiler/layout_out.m:
compiler/stack_layout.m:
compiler/trace_params.m:
mdbcomp/trace_counts.m:
runtime/mercury_trace_base.h:
trace/mercury_trace_declarative.h:
Delete references to those ports.
runtime/mercury_stack_layout.h:
Update the binary compatibility version number for debuggable
executables, since the port number of user events has changed.
Estimated hours taken: 6
Branches: main
Reduce the size of the string tables in debuggable executables by encoding
variable names that fit a few standard templates, the most important of which
is STATE_VARIABLE_name_number.
The effect on the compiler is to reduce the string table size from about
3.1Mb to about 2.1Mb, which is about a 30% reduction.
compiler/stack_layout.m:
Look for the names fitting the patterns in variable names, and encode
them.
runtime/mercury_stack_layout.[ch]:
Add a function for looking up variable names, decoding them if needed.
Since goal paths cannot fit any of the patterns, access them without
using that function.
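To make the idea concrete, a hypothetical sketch of recognising the STATE_VARIABLE_name_number template, so that a table could store just the short base name plus a small integer instead of the full string. The function name is an assumption, and the actual encoding produced by stack_layout.m and decoded by mercury_stack_layout.c is not reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: if `var_name' has the form STATE_VARIABLE_name_N
 * (N made of decimal digits), copy `name' into `base', store N in
 * `*number' and return 1; otherwise return 0 and leave `*number' alone.
 */
int split_state_var_name(const char *var_name, char *base, size_t base_size,
    long *number)
{
    static const char   prefix[] = "STATE_VARIABLE_";
    size_t              prefix_len = sizeof(prefix) - 1;
    const char          *rest;
    const char          *last_underscore;
    char                *end;
    size_t              base_len;
    long                value;

    if (strncmp(var_name, prefix, prefix_len) != 0) {
        return 0;
    }
    rest = var_name + prefix_len;
    last_underscore = strrchr(rest, '_');
    if (last_underscore == NULL || last_underscore == rest) {
        return 0;       /* no "_N" suffix, or an empty name */
    }
    if (last_underscore[1] < '0' || last_underscore[1] > '9') {
        return 0;       /* suffix does not start with a digit */
    }
    value = strtol(last_underscore + 1, &end, 10);
    if (*end != '\0') {
        return 0;       /* suffix is not purely digits */
    }
    base_len = (size_t) (last_underscore - rest);
    if (base_len + 1 > base_size) {
        return 0;
    }
    memcpy(base, rest, base_len);
    base[base_len] = '\0';
    *number = value;
    return 1;
}
```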
mdbcomp/rtti_access.m:
Use the new function to retrieve variable names.
runtime/mercury_grade.h:
Increment the debugging compatibility version number, since debuggable
executables in which some modules were produced by a compiler without
this diff and some were produced by a compiler with this diff won't
work together.
Branches: 11.07, main
runtime/mercury.h:
Fix the condition protecting the definition of MR_GC_MALLOC_INLINE,
since we are calling the Boehm collector directly, we require
MR_BOEHM_GC to be defined, not just MR_CONSERVATIVE_GC.
MR_GC_MALLOC_ATOMIC does not exist; use GC_MALLOC_ATOMIC instead.
Add a couple of XXXs regarding the definition of MR_new_object_atomic
in the case where inline allocation is enabled.
Branches: main, 11.07
Avoid failures in the namespace cleanliness check in .par grade on MinGW.
*/RESERVED_MACRO_NAMES:
Add some macros automatically defined by GCC on MinGW.
Branches: main, 11.07
Avoid warnings about functions that don't return in the runtime
with MSVC.
Avoid a warning in the configure script with MSVC.
configure.in:
The cygpath tool is only required with MSVC when using
Cygwin as the build environment; don't emit an error message
about this on other systems, e.g. MinGW.
runtime/mercury_std.h:
Redefine MR_NO_RETURN so that it works with both GCC/Clang
and Visual C.
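One portable way to spell a "does not return" annotation is to put it before the declaration, since GCC and Clang accept __attribute__((noreturn)) there and Visual C requires __declspec(noreturn) there. The sketch below uses the hypothetical name MY_NO_RETURN; it is not the actual MR_NO_RETURN definition in mercury_std.h.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macro; goes in front of the function declaration. */
#if defined(__GNUC__) || defined(__clang__)
  #define MY_NO_RETURN  __attribute__((noreturn))
#elif defined(_MSC_VER)
  #define MY_NO_RETURN  __declspec(noreturn)
#else
  #define MY_NO_RETURN
#endif

MY_NO_RETURN void fatal_error(const char *message);

void fatal_error(const char *message)
{
    fprintf(stderr, "fatal error: %s\n", message);
    exit(EXIT_FAILURE);
}

/*
 * Without the annotation on fatal_error, compilers may warn that this
 * function can reach its end without returning a value.
 */
int checked_div(int a, int b)
{
    if (b != 0) {
        return a / b;
    }
    fatal_error("division by zero");
}
```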
runtime/mercury_misc.h:
runtime/mercury_engine.c:
Conform to the above change to MR_NO_RETURN.
runtime/mercury_bootstrap.h:
Delete the redefinition of NO_RETURN; any code that still
uses it is not going to work for a variety of other reasons.
Estimated hours taken: 1
Branches: main
Post-commit review of Paul's change introducing the loop control primitives.
It also updates some documentation Paul's update did not touch.
library/par_builtin.m:
runtime/mercury_atomic_ops.h:
runtime/mercury_context.h:
Fix formatting and grammar.
runtime/mercury_par_builtin.[ch]:
Use a variable length array in the loop control struct to store
the loop control slots. This setup needs one load to access a slot,
compared to two with the previous arrangement.
Fix formatting and grammar.
Add XXXs where relevant.
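A sketch of the layout described above, using a C99 flexible array member. Because the slots live in the same allocation as the header, reading a slot field needs a single load from an address computed from the structure pointer, instead of first loading a separate slots pointer. This is also why the array must be the last member of the structure. The type and field names are hypothetical, not those of MR_LoopControl.

```c
#include <stdlib.h>

/* Hypothetical slot and loop control types. */
typedef struct {
    int     is_free;
    void    *context;
} lc_slot;

typedef struct {
    int     num_slots;
    int     next_free_slot_hint;
    lc_slot slots[];            /* flexible array member: must come last */
} loop_control;

/* Allocate the header and all slots in one block; NULL on failure. */
loop_control *loop_control_new(int num_slots)
{
    loop_control    *lc = malloc(sizeof(loop_control)
                            + (size_t) num_slots * sizeof(lc_slot));

    if (lc == NULL) {
        return NULL;
    }
    lc->num_slots = num_slots;
    lc->next_free_slot_hint = 0;
    for (int i = 0; i < num_slots; i++) {
        lc->slots[i].is_free = 1;
        lc->slots[i].context = NULL;
    }
    return lc;
}
```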
runtime/mercury_par_builtin.h:
runtime/mercury_par_builtin.c:
Introduce loop control runtime code.
runtime/mercury_context.h:
Introduce a new macro to tune the size of contexts that are used as
workers by the loop control runtime. This is set to the same context size
as for sparks.
runtime/mercury_context.c:
Fixed a typo in a comment.
library/par_builtin.m:
Create predicate versions of the par builtin macros in the runtime. The only
primitive without a predicate version is MR_lc_spawn_off, which cannot be
expressed in Mercury and needs support from the LLDS stage in the compiler.
mdbcomp/program_representation.m:
Add par_builtin.lc_finish/1 as an externally defined predicate. This tells
the debugger not to expect any events for it.
An event described in our ThreadScope paper had not been added to the runtime
system. This event announces that an engine is attempting to find work in the
form of a local spark.
This change also introduces a hierarchy of events, where one event 'extends'
another existing event. We use this for Mercury's spark events which contain
spark IDs in their payloads. These extend GHC's spark events.
Other changes have been made to ensure that Mercury conforms with the
ghc-events library, which is used by the ThreadScope tool.
runtime/mercury_threadscope.h:
runtime/mercury_threadscope.c:
Add support for the LOOKING_FOR_LOCAL_SPARK event.
Re-number the CALLING_MAIN event to make it a Mercury-specific event.
Re-number the STRING event.
Rename the STRING event; it is now INTERN_STRING.
No-longer use the deprecated SPARK_RUN and SPARK_STEAL events, instead use
the new events and create Mercury specific events that extend these events.
The Mercury-specific SPARKING event has been renamed to SPARK_CREATE and
now extends the base SPARK_CREATE event.
Made a correction to a comment.
runtime/mercury_context.c:
Post the LOOKING_FOR_LOCAL_SPARK event.
Branches: main
Store double-precision `float' constructor arguments in unboxed form,
in high-level C grades on 32-bit platforms, i.e. `float' (and equivalent)
arguments may occupy two machine words.
As the C code generated by the MLDS back-end makes use of MR_Float variables
and parameters, float (un)boxing may be reduced substantially in many programs.
compiler/prog_data.m:
Add `double_word' as a new option for constructor argument widths,
only used for float arguments as yet.
compiler/make_hlds_passes.m:
Set constructor arguments to have `double_word' width if required,
and possible.
compiler/type_util.m:
Add helper predicate.
compiler/builtin_ops.m:
compiler/c_util.m:
compiler/llds.m:
Add two new binary operators used by the MLDS back-end.
compiler/arg_pack.m:
Handle `double_word' arguments.
compiler/ml_code_util.m:
Deciding whether or not a float constructor argument requires boxing
now depends on the width of the field.
compiler/ml_global_data.m:
When a float constant appears as an initialiser of a generic array
element, it is now always unboxed, irrespective of --unboxed-float.
compiler/ml_type_gen.m:
Take double-word arguments into account when generating structure
fields.
compiler/ml_unify_gen.m:
Handle double-word float constructor arguments in (de)constructions.
In some cases we break a float argument into its two words, thus
generating two assignment statements or two separate rvals.
Take double-word arguments into account when calculating field offsets.
compiler/mlds_to_c.m:
The new binary operators require no changes here.
As a special case, write `MR_float_from_dword_ptr(&X)' instead of
`MR_float_from_dword(X, Y)' when X, Y are consecutive words within a
field. The definition of `MR_float_from_dword_ptr' is more
straightforward, and gcc produces better code than if we use the more
general `MR_float_from_dword'.
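The underlying idea, storing a double across two consecutive 32-bit words and reading it back, can be sketched with memcpy, which avoids strict-aliasing problems and which compilers typically lower to plain moves. The helper names are hypothetical and only mirror the roles of MR_float_from_dword and MR_float_from_dword_ptr; they are not the definitions in mercury_float.h.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == 2 * sizeof(uint32_t),
    "this sketch assumes a double occupies exactly two 32-bit words");

/* Split a double into two consecutive words. */
void double_to_dword(double d, uint32_t words[2])
{
    memcpy(words, &d, sizeof d);
}

/* Pointer form: the two words are already adjacent in memory. */
double double_from_dword_ptr(const uint32_t *words)
{
    double d;

    memcpy(&d, words, sizeof d);
    return d;
}

/* General form: the two words may come from anywhere (e.g. two rvals). */
double double_from_dword(uint32_t w0, uint32_t w1)
{
    uint32_t words[2] = { w0, w1 };

    return double_from_dword_ptr(words);
}
```

The general form has to reassemble the words into adjacent storage first, which is one reason the pointer form can yield simpler code when the words are known to be consecutive.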
compiler/rtti_out.m:
For double-word arguments, generate MR_DuArgLocn structures with
MR_arg_bits set to -1.
compiler/rtti_to_mlds.m:
Handle double-word arguments in field offset calculation.
compiler/unify_gen.m:
Partially handle double_word arguments in LLDS back-end.
compiler/handle_options.m:
Set --unboxed-float when targeting Java, C# and Erlang.
compiler/structure_reuse.direct.choose_reuse.m:
Rename a predicate.
compiler/bytecode.m:
compiler/equiv_type.m:
compiler/equiv_type_hlds.m:
compiler/llds_to_x86_64.m:
compiler/mlds_to_gcc.m:
compiler/mlds_to_il.m:
compiler/opt_debug.m:
Conform to changes.
library/construct.m:
library/store.m:
Handle double-word constructor arguments.
runtime/mercury_conf.h.in:
Clarify what `MR_BOXED_FLOAT' now means.
runtime/mercury_float.h:
Add helper macros for converting between doubles and word/dwords.
runtime/mercury_deconstruct.c:
runtime/mercury_deconstruct.h:
Add a macro `MR_arg_value' and a helper function to extract a
constructor argument value. This replaces `MR_unpack_arg'.
runtime/mercury_type_info.h:
Remove `MR_unpack_arg'.
Document that MR_DuArgLocn.MR_arg_bits may be -1.
runtime/mercury_deconstruct_macros.h:
runtime/mercury_deep_copy_body.h:
runtime/mercury_ml_arg_body.h:
runtime/mercury_table_type_body.h:
runtime/mercury_tabling.c:
runtime/mercury_type_info.c:
Handle double-word constructor arguments.
tests/hard_coded/Mercury.options:
tests/hard_coded/Mmakefile:
tests/hard_coded/lco_double.exp:
tests/hard_coded/lco_double.m:
tests/hard_coded/pack_args_float.exp:
tests/hard_coded/pack_args_float.m:
Add test cases.
trace/mercury_trace_vars.c:
Conform to changes.
Branches: main, 11.07
Make none.par.gc bootstrap with clang (2.8.0) on Linux.
runtime/mercury_atomic_ops.h:
Define MR_ATOMIC_DEC_INT_BODY and MR_ATOMIC_DEC_AND_IS_ZERO_WORD_BODY
for clang - we use the same inline assembler definitions that are used
for GCC.
Branches: main, 11.07
Make hlc.par.gc bootstrap with clang on Linux.
runtime/mercury_atomic_ops.h:
Use the GCC definitions of MR_COMPARE_AND_SWAP_WORD_BODY and
MR_CPU_SFENCE with clang.
Branches: main
Support unboxed float fields in high-level C grades.
When the representation of `float' is no wider than a machine word, d.u.
functor arguments of type `float' (or equivalent) will be stored directly
within cells constructed for that functor, instead of a pointer to the box
containing the value. This was already so for low-level C grades.
compiler/mlds.m:
Add an option to mlds_type, equivalent to
`mlds_array_type(mlds_generic_type)' except that some elements are
known to be floats.
Update some comments.
compiler/ml_global_data.m:
Remember the `--unboxed-float' option in `ml_global_data'.
Special case generic arrays in `ml_gen_static_scalar_const_addr' and
`ml_gen_static_scalar_const_value'. Float literals cannot be used to
initialize an element of a generic array in C. If any appear, replace
the generic array type by an instance of
`mlds_mostly_generic_array_type' with float fields in the positions
which have float initializers.
compiler/ml_code_util.m:
Make `ml_must_box_field_type' and `ml_gen_box_const_rval' depend on the
`--unboxed-float' option.
Delete some now-misleading comments.
Delete an unused predicate.
compiler/mlds_to_c.m:
Update code that writes out scalar static data to handle
`mlds_mostly_generic_array_type'.
In one case, for `--high-level-data' only, output float constants by
their integer representation, so that they may be cast to pointer
types.
compiler/ml_unify_gen.m:
Rename some predicates for clarity.
compiler/ml_accurate_gc.m:
compiler/ml_lookup_switch.m:
compiler/ml_proc_gen.m:
compiler/ml_simplify_switch.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_gcc.m:
compiler/mlds_to_il.m:
compiler/mlds_to_java.m:
Conform to changes.
library/float.m:
Add hidden functions to return the integer representation of the bit
layout of floating point values.
library/exception.m:
Delete mention of MR_AVOID_MACROS.
runtime/mercury.c:
runtime/mercury.h:
Make MR_box_float/MR_unbox_float act like "casts" when MR_BOXED_FLOAT
is undefined, and only define them in high-level grades. I think they
should be replaced by MR_float_to_word/MR_word_to_float (which have
less confusing names when there is no boxing) but that would require
some header file reshuffling which I don't want to undertake yet.
Delete references to MR_AVOID_MACROS. Apparently it existed to support
the defunct gcc back-end but I cannot see it ever being defined.
runtime/mercury_conf_param.h:
MR_HIGHLEVEL_CODE no longer implies MR_BOXED_FLOAT.
Delete mention of MR_AVOID_MACROS.
runtime/mercury_float.h:
Fix a comment.
tests/hard_coded/Mmakefile:
tests/hard_coded/float_ground_term.exp:
tests/hard_coded/float_ground_term.m:
Add a test case.
Branches: main, 11.07
Respond to review comments from Paul.
runtime/mercury_conf_param.h:
Fix some spacing.
runtime/mercury_std.h:
Fix s/MR_GNUC/__GNUC__/ in a comment.
Branches: main, 11.07
runtime/mercury_getopt.c:
Don't use MR_GNUC here since we don't #include the usual
Mercury headers here and it will always be undefined.
Estimated hours taken: 12
Branches: main
Further improvements in the implementation of string switches, along with
some bug fixes.
If the chosen hash function does not yield any collisions for the strings
in the switch arms, then we can optimize away the table column that we would
otherwise need for open addressing. This was implemented in a previous diff.
For an ordinary (non-lookup) string switch, the hash table has two columns
in the presence of collisions and one column in their absence. Therefore if
doubling the number of slots in the table allows us to eliminate collisions,
the total size of the table is unchanged, though the corresponding array of
labels we have to put into the computed_goto instruction we generate doubles.
Thus the only cost of such doubling is an increase in "code" size, and
for small tables, the elimination of the open addressing loop may compensate
for this, at least partially.
For lookup string switches, doubling the table size this way has a bigger
space cost, but the elimination of the open addressing loop still brings
a useful speed boost.
We therefore now DO double the table size if this eliminates collisions.
In the library, compiler, etc. directories, this eliminates collisions in
19 of the 47 string switches that had collisions with the standard table size.
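A minimal C sketch of the table-size decision described above, assuming a
power-of-two table indexed by a masked hash. The hash function, the rule for
the standard size, and all names are illustrative assumptions, not the code
in switch_util.m.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 1024  /* upper bound on table size for this sketch */

/* Illustrative string hash (an assumption; not Mercury's hash functions). */
static unsigned hash_string(const char *s)
{
    unsigned h = 0;
    while (*s != '\0') {
        h = h * 31u + (unsigned char) *s++;
    }
    return h;
}

/* True if every string lands in a distinct slot of a table with
   table_size slots. table_size must be a power of two. */
static bool no_collisions(const char **strs, size_t n, size_t table_size)
{
    bool used[MAX_SLOTS] = { false };

    assert(table_size <= MAX_SLOTS && (table_size & (table_size - 1)) == 0);
    for (size_t i = 0; i < n; i++) {
        size_t slot = hash_string(strs[i]) & (table_size - 1);
        if (used[slot]) {
            return false;
        }
        used[slot] = true;
    }
    return true;
}

/* The "standard" size here (an assumption) is the smallest power of two
   >= 2n. If it has collisions but double that size has none, use the
   doubled size, so no open addressing column or loop is needed. */
static size_t choose_table_size(const char **strs, size_t n,
    bool *needs_open_addressing)
{
    size_t size = 1;

    while (size < 2 * n) {
        size <<= 1;
    }
    if (no_collisions(strs, n, size)) {
        *needs_open_addressing = false;
        return size;
    }
    if (no_collisions(strs, n, 2 * size)) {
        *needs_open_addressing = false;
        return 2 * size;
    }
    *needs_open_addressing = true;
    return size;
}
```

The sketch rehashes every string for the doubled size, which is why hash
computation has to be separable from generating code for the switch arms.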
compiler/switch_util.m:
Replace the separate sets of predicates we used to have for computing
hash maps (one for lookup switches and one for non-lookup switches)
with a single set that works for both.
Change this set to double the table size if this eliminates collisions.
This requires it to decide the table size, a task previously done
separately by each of its callers.
One version of this set had an old bug, which caused it to effectively
ignore the second and third string hash functions. This diff fixes it.
There were two bugs in my previous diff: the unneeded table column
was not being optimized away from several_soln lookup switches, and the
lookup code for one_soln lookup switches used the wrong column offset.
This diff fixes these too.
Since doubling the table size requires recalculating all the hash
values, decouple the computation of the hash values from generating
code for each switch arm, since the latter shouldn't be done more than
once.
Add a note on an old problem.
compiler/ml_string_switch.m:
compiler/string_switch.m:
Bring the code for generating code for the arms of string switches
here from switch_util.m.
tests/hard_coded/Mmakefile:
Enable the relevant test cases; they had not been enabled, which is
why the bugs mentioned above were not detected.
tests/hard_coded/string_hash.m:
Update this test case to test the correspondence of the compiler's
and the runtime's versions of not just the first hash function,
but also the second and third.
runtime/mercury_string.h:
Fix a typo in a comment.
Branches: main, 11.07
Avoid using the __GNUC__ macro in the runtime as a test for the presence of
gcc, since clang also defines that macro. Since clang doesn't support all
of the GNU C extensions, we can't use __GNUC__ without also checking
whether we are actually using clang.
runtime/mercury_conf_param.h:
Add three new macros, MR_CLANG, MR_GNUC and MR_MSVC that are defined
only when the C compiler is clang, gcc, or Visual C respectively.
(In particular, MR_GNUC will _not_ be defined when the C compiler
is clang.)
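One plausible way to define these macros, shown as a sketch; the actual
definitions in mercury_conf_param.h may differ. The essential point is to
test __clang__ before __GNUC__, because clang defines both. The function
c_compiler_name is hypothetical and exists only to show which macro is set.

```c
#include <assert.h>
#include <string.h>

/* clang also defines __GNUC__, so check for clang first. */
#if defined(__clang__)
  #define MR_CLANG 1
#elif defined(__GNUC__)
  #define MR_GNUC __GNUC__
#endif

#if defined(_MSC_VER)
  #define MR_MSVC _MSC_VER
#endif

/* Report which of the compiler macros is in effect. */
static const char *c_compiler_name(void)
{
#if defined(MR_CLANG)
    return "clang";
#elif defined(MR_GNUC)
    return "gcc";
#elif defined(MR_MSVC)
    return "msvc";
#else
    return "unknown";
#endif
}
```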
runtime/mercury.c:
runtime/mercury.h:
runtime/mercury_atomic_ops.c:
runtime/mercury_atomic_ops.h:
runtime/mercury_bitmap.h:
runtime/mercury_float.h:
runtime/mercury_getopt.c:
runtime/mercury_goto.h:
runtime/mercury_heap.h:
runtime/mercury_std.h:
Replace uses of the __GNUC__ and __clang__ macros with the above.
runtime/mercury_regs.h:
As above, also #include mercury_conf_param.h directly since
this file is #included by some of the tests in the configure
script.
Branches: main, 11.07
Fix minor problems in the runtime identified by Visual C.
runtime/mercury_memory_zones.c:
Fix a call to a function that no longer exists.
runtime/mercury_stack_trace.h:
Fix argument type mismatches between function prototypes
and definitions.
Branches: main, 11.07
Fix another Visual C runtime compilation problem.
runtime/mercury_heap_profile.c:
Avoid arithmetic with void pointers.
(That's a GNU extension.)
Branches: main, 11.07
Fix a runtime compilation error with Visual C.
runtime/mercury_memory_zones.c:
Don't interleave variable declarations and code.
(Doing so works in GNU C or C99, but not with VC9.)
Branches: main, 11.07
Fix a problem that was causing the runtime not to compile in high-level C
grades with clang.
runtime/mercury_std.h:
Define MR_STATIC_INLINE and friends for clang.
Add an XXX regarding the use of the C99 definitions for the above.
Branches: main
Pack consecutive enumeration arguments in discriminated union types into a
single word to reduce cell sizes. Argument packing is only enabled on C
back-ends with low-level data, and reordering arguments to improve
opportunities for packing is not yet attempted. The RTTI implementations for
other back-ends will need to be updated, but that is best left until after any
argument reordering change.
Modules that import abstract enumeration types are informed of their
representation by writing declarations of the form:
:- type foo where type_is_abstract_enum(NumBits).
into the interface file of the module that defines the type.
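A C sketch of the bit-level idea behind packing enumeration arguments into
one word, using a location descriptor of the kind MR_DuArgLocn is described
as holding in this change (the word within the cell, and which bits of that
word the argument occupies). The struct and function names are hypothetical;
the runtime's own definitions were added to mercury_type_info.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Word;

/* Hypothetical argument location descriptor. */
typedef struct {
    int      word_offset;   /* index of the word within the cell */
    unsigned shift;         /* position of the argument's lowest bit */
    unsigned bits;          /* width of the argument in bits */
} ArgLocn;

static Word mask_of(unsigned bits)
{
    assert(bits > 0 && bits < sizeof(Word) * 8);
    return ((Word) 1 << bits) - 1;
}

/* Store an enum value in its bit field without disturbing the others. */
static void pack_arg(Word *cell, ArgLocn locn, Word value)
{
    Word mask = mask_of(locn.bits);

    assert(value <= mask);
    cell[locn.word_offset] &= ~(mask << locn.shift);
    cell[locn.word_offset] |= value << locn.shift;
}

/* Extract an enum value from its bit field. */
static Word unpack_arg(const Word *cell, ArgLocn locn)
{
    return (cell[locn.word_offset] >> locn.shift) & mask_of(locn.bits);
}
```

For example, three consecutive enum arguments needing 2, 3 and 1 bits can
share word 0 at shifts 0, 2 and 5.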
compiler/prog_data.m:
Add an `arg_width' argument to constructor arguments.
Replace `is_solver_type' by `abstract_type_details', with an extra
option for abstract exported enumeration types.
compiler/handle_options.m:
compiler/options.m:
Add an internal option `--allow-argument-packing'.
compiler/make_hlds_passes.m:
Determine whether and how to pack enumeration arguments, updating the
`arg_width' fields of constructor arguments before constructors are
added to the HLDS.
compiler/mercury_to_mercury.m:
compiler/modules.m:
Write `where type_is_abstract_enum(NumBits)' to interface files
for abstract exported enumeration types.
compiler/prog_io_type_defn.m:
Parse `where type_is_abstract_enum(NumBits)' attributes on type
definitions.
compiler/arg_pack.m:
compiler/backend_libs.m:
Add a new module. This mainly contains a predicate which packs rvals
according to arg_widths, which is used by both LLDS and MLDS back-ends.
compiler/ml_unify_gen.m:
compiler/unify_gen.m:
Take argument packing into account when generating code for
constructions and deconstructions. Only a relatively small part of the
compiler actually needs to understand argument packing. The rest works
at the HLDS level with constructor arguments and variables, or at the
LLDS and MLDS levels with structure fields.
compiler/code_info.m:
compiler/var_locn.m:
Add assign_field_lval_expr_to_var and
var_locn_assign_field_lval_expr_to_var.
Allow more kinds of rvals in assign_cell_arg. I do not know why it was
previously restricted, except that the other kinds of rvals were not
encountered as cell arguments before.
compiler/mlds.m:
We can now rely on the compiler to pack arguments in the
mlds_decl_flags type instead of doing it manually. A slight downside
is that though the type is packed down to a single word cell, it will
still incur a memory allocation per cell. However, I did not notice
any difference in compiler speed.
compiler/rtti.m:
compiler/rtti_out.m:
Add and output a new field for MR_DuFunctorDesc instances, which, if
any arguments are packed, points to an array of MR_DuArgLocn. Each
array element describes the offset in the cell at which the argument's
value is held, and which bits of the word it occupies. In the more
common case where no arguments are packed, the new field is simply
null.
compiler/rtti_to_mlds.m:
Generate the new field in MR_DuFunctorDesc.
compiler/structure_reuse.direct.choose_reuse.m:
For now, prevent structure reuse from reusing a dead cell which has a
different constructor from the new cell. The code to determine whether a
dead cell will hold the arguments of a new cell with a different
constructor will need to be updated to account for argument packing.
compiler/type_ctor_info.m:
Bump RTTI version number.
Conform to changes.
compiler/add_type.m:
compiler/check_typeclass.m:
compiler/equiv_type.m:
compiler/equiv_type_hlds.m:
compiler/erl_rtti.m:
compiler/hlds_data.m:
compiler/hlds_out_module.m:
compiler/intermod.m:
compiler/make_tags.m:
compiler/mlds_to_gcc.m:
compiler/opt_debug.m:
compiler/prog_type.m:
compiler/recompilation.check.m:
compiler/recompilation.version.m:
compiler/special_pred.m:
compiler/type_constraints.m:
compiler/type_util.m:
compiler/unify_proc.m:
compiler/xml_documentation.m:
Conform to changes.
Reduce code duplication in classify_type_defn.
compiler/hlds_goal.m:
Clarify a comment.
library/construct.m:
Make `construct' pack arguments when necessary.
Remove an old RTTI version number check as recommended in
mercury_grade.h.
library/store.m:
Deal with packed arguments in this module.
runtime/mercury_grade.h:
Bump binary compatibility version number.
runtime/mercury_type_info.c:
runtime/mercury_type_info.h:
Bump RTTI version number.
Add MR_DuArgLocn structure definition.
Add a macro to unpack an argument as described by MR_DuArgLocn.
Add a function to determine a cell's size, since the number of
arguments is no longer correct.
runtime/mercury_deconstruct.c:
runtime/mercury_deconstruct.h:
runtime/mercury_deconstruct_macros.h:
runtime/mercury_ml_arg_body.h:
runtime/mercury_ml_expand_body.h:
Deal with packed arguments when deconstructing.
Remove an old RTTI version number check as recommended in
mercury_grade.h.
runtime/mercury_deep_copy_body.h:
Deal with packed arguments when copying.
runtime/mercury_table_type_body.h:
Deal with packed arguments in tabling.
runtime/mercury_dotnet.cs.in:
Add DuArgLocn field to DuFunctorDesc. Argument packing is not enabled
for the C# back-end yet so this is unused.
trace/mercury_trace_vars.c:
Deal with packed arguments in MR_select_specified_subterm,
used for the `hold' command.
java/runtime/DuArgLocn.java:
java/runtime/DuFunctorDesc.java:
Add DuArgLocn field to DuFunctorDesc. Argument packing is not enabled
for the Java back-end yet so this is unused.
extras/trailed_update/tr_store.m:
Deal with packed arguments in this module (untested).
extras/trailed_update/samples/interpreter.m:
extras/trailed_update/tr_array.m:
Conform to argument reordering in the array, map and other modules in
previous changes.
tests/hard_coded/Mercury.options:
tests/hard_coded/Mmakefile:
tests/hard_coded/lco_pack_args.exp:
tests/hard_coded/lco_pack_args.m:
tests/hard_coded/pack_args.exp:
tests/hard_coded/pack_args.m:
tests/hard_coded/pack_args_copy.exp:
tests/hard_coded/pack_args_copy.m:
tests/hard_coded/pack_args_intermod1.exp:
tests/hard_coded/pack_args_intermod1.m:
tests/hard_coded/pack_args_intermod2.m:
tests/hard_coded/pack_args_reuse.exp:
tests/hard_coded/pack_args_reuse.m:
tests/hard_coded/store_ref.exp:
tests/hard_coded/store_ref.m:
tests/invalid/Mmakefile:
tests/invalid/where_abstract_enum.err_exp:
tests/invalid/where_abstract_enum.m:
tests/tabling/Mmakefile:
tests/tabling/pack_args_memo.exp:
tests/tabling/pack_args_memo.m:
Add new test cases.
tests/hard_coded/deconstruct_arg.exp:
tests/hard_coded/deconstruct_arg.exp2:
tests/hard_coded/deconstruct_arg.m:
Add constructors with packed arguments to these cases.
tests/invalid/where_direct_arg.err_exp:
Update expected output.
Branches: main
The direct argument functor change added the constant MR_SECTAG_NONE_DIRECT_ARG
in some places but not others, breaking deconstruct on C# and Java back-ends.
compiler/mlds_to_gcc.m:
java/runtime/Sectag_Locn.java:
library/rtti_implementation.m:
runtime/mercury_dotnet.cs.in:
Add missing constants.
Firstly, this change allows the ThreadScope tool to read Mercury's .eventlog
files without aborting. This is fixed by making THREAD_START and THREAD_STOP
events consistent.
Secondly, this change implements the missing EVENT_SLEEPING event. This
ensures that the implementation matches the description in the ThreadScope
paper.
Thirdly, idle engines now try to run a suspended context before running a
spark.
runtime/mercury_threadscope.c:
Don't post THREAD_START or THREAD_STOP events when they wouldn't make
sense, i.e. when the thread is already stopped. This keeps the RTS code
simpler, since an engine may hang on to a context even when that context
is stopped; the RTS does this for caching.
Create a new event ENGINE_SLEEPING to be used when an engine goes to sleep.
runtime/mercury_context.c:
Add some missing calls to threadscope; these ensure that Mercury's eventlog
file maintains some invariants expected by the ThreadScope visualisation tool.
Modify how idle engines look for new work: now, in all cases, an idle
engine will attempt to resume a context first.
Avoid taking the lock on the global run queue of contexts if the run queue
pointer is NULL, indicating that the queue is empty.
Branches: main
Implement a type representation optimisation ("direct argument functors"),
where a functor with exactly one argument can be represented by a tagged
pointer to the argument value, which itself does not require the tag bits,
e.g.
:- type maybe_foo ---> yes(foo) ; no.
:- type foo ---> foo(int, int). % aligned pointer
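A C sketch of this representation for the maybe_foo example, assuming two
low tag bits, with `no' as tag 0 and `yes' as tag 1. The tag assignments and
names are illustrative, not the tags the compiler actually chooses. The point
is that yes(Foo) needs no cell of its own.

```c
#include <assert.h>
#include <stdint.h>

/* foo(int, int): a word-aligned cell, so its low pointer bits are zero. */
typedef struct {
    intptr_t a;
    intptr_t b;
} Foo;

typedef uintptr_t MaybeFoo;   /* representation of maybe_foo */

#define TAG_BITS  2                      /* assumes >= 4-byte alignment */
#define TAG_MASK  (((uintptr_t) 1 << TAG_BITS) - 1)
#define TAG_NO    ((uintptr_t) 0)        /* assumed tag for `no' */
#define TAG_YES   ((uintptr_t) 1)        /* assumed tag for `yes/1' */

/* `no' is a constant: just the tag, no cell. */
static MaybeFoo make_no(void)
{
    return TAG_NO;
}

/* `yes(Foo)': tag the pointer to the existing foo cell directly. */
static MaybeFoo make_yes(Foo *foo)
{
    assert(((uintptr_t) foo & TAG_MASK) == 0);
    return (uintptr_t) foo | TAG_YES;
}

static int is_yes(MaybeFoo m)
{
    return (m & TAG_MASK) == TAG_YES;
}

/* Deconstruct `yes(Foo)' by stripping the tag. */
static Foo *yes_arg(MaybeFoo m)
{
    assert(is_yes(m));
    return (Foo *) (m & ~TAG_MASK);
}
```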
To ensure that all modules which could construct or deconstruct the functor
agree on the type representation, I had planned to automatically output
extra information to .int files to notify importing modules about functors
using the optimised representation:
:- type maybe_foo ---> yes(foo) ; no
where direct_arg is [yes/1].
However, the compiler does not perform enough (or any) semantic analysis
while making interface files. The fallback solution is to only use the
optimised representation when all importing modules can be guaranteed to
import both the top-level type and the argument type, namely, when both
types are exported from the same module. We also allow certain built-in
argument types; currently this only includes tuples.
Non-exported types may also use the optimised representation, but when
intermodule optimisation is enabled, they may be written out to .opt files.
In that case, we *do* add direct_arg attributes to .opt files to ensure that importing
modules agree on the type representation. The attributes may also be added by
Mercury programmers to source files, which will be copied directly into .int
files without analysis. They will be checked when the module is actually
compiled.
This patch includes work by Zoltan, who independently implemented a version
of this change.
compiler/hlds_data.m:
Record the direct arg functors in hlds_du_type.
Add a new option to cons_tag.
Fix some comments.
compiler/prog_data.m:
compiler/prog_io_type_defn.m:
Parse and record `direct_arg' attributes on type definitions.
compiler/prog_io_pragma.m:
Issue an error if the `direct_arg' attribute is used with a foreign
type.
compiler/make_tags.m:
compiler/mercury_compile_front_end.m:
Add a pass to convert suitable functors to use the direct argument
representation. The argument type must have been added to the type
table, so we do this after all type definitions have been added.
Move code to compute cheaper_tag_test here.
compiler/ml_unify_gen.m:
compiler/unify_gen.m:
Generate different code to construct/deconstruct direct argument
functors.
compiler/intermod.m:
Write `direct_arg' attributes to .opt files for functors
using the direct argument representation.
compiler/mercury_to_mercury.m:
Write out `direct_arg' attributes.
compiler/rtti.m:
compiler/rtti_out.m:
compiler/rtti_to_mlds.m:
Add an option to the types which describe the location of secondary
tag options. The functors which can use the optimised representation
are a subset of those which require no secondary tag.
Output "MR_SECTAG_NONE_DIRECT_ARG" instead of "MR_SECTAG_NONE" in
RTTI structures when applicable.
compiler/add_pragma.m:
compiler/add_type.m:
compiler/bytecode_gen.m:
compiler/check_typeclass.m:
compiler/code_info.m:
compiler/equiv_type.m:
compiler/export.m:
compiler/foreign.m:
compiler/hlds_code_util.m:
compiler/hlds_out_module.m:
compiler/inst_check.m:
compiler/ml_proc_gen.m:
compiler/ml_switch_gen.m:
compiler/ml_tag_switch.m:
compiler/ml_type_gen.m:
compiler/module_qual.m:
compiler/modules.m:
compiler/post_term_analysis.m:
compiler/post_typecheck.m:
compiler/recompilation.check.m:
compiler/recompilation.usage.m:
compiler/recompilation.version.m:
compiler/simplify.m:
compiler/structure_reuse.direct.choose_reuse.m:
compiler/switch_gen.m:
compiler/switch_util.m:
compiler/tag_switch.m:
compiler/term_norm.m:
compiler/type_ctor_info.m:
compiler/type_util.m:
compiler/unify_proc.m:
compiler/unused_imports.m:
compiler/xml_documentation.m:
Conform to changes.
Bump RTTI version number.
doc/reference_manual.texi:
Add commented out documentation for `direct_arg' attributes.
library/construct.m:
Handle MR_SECTAG_NONE_DIRECT_ARG in construct.construct/3.
library/private_builtin.m:
Add MR_SECTAG_NONE_DIRECT_ARG constant for Java for consistency,
though it won't be used.
runtime/mercury_grade.h:
Bump binary compatibility version number.
runtime/mercury_type_info.h:
Bump RTTI version number.
Add MR_SECTAG_NONE_DIRECT_ARG.
runtime/mercury_deconstruct.c:
runtime/mercury_deep_copy_body.h:
runtime/mercury_ml_expand_body.h:
runtime/mercury_table_type_body.h:
runtime/mercury_term_size.c:
runtime/mercury_unify_compare_body.h:
Handle MR_SECTAG_NONE_DIRECT_ARG in RTTI code.
tests/debugger/Mmakefile:
tests/debugger/chooser_tag_test.exp:
tests/debugger/chooser_tag_test.inp:
tests/debugger/chooser_tag_test.m:
tests/hard_coded/Mercury.options:
tests/hard_coded/Mmakefile:
tests/hard_coded/construct_test.exp:
tests/hard_coded/construct_test.m:
tests/hard_coded/direct_arg_cyclic1.exp:
tests/hard_coded/direct_arg_cyclic1.m:
tests/hard_coded/direct_arg_cyclic2.m:
tests/hard_coded/direct_arg_cyclic3.m:
tests/hard_coded/direct_arg_intermod1.exp:
tests/hard_coded/direct_arg_intermod1.m:
tests/hard_coded/direct_arg_intermod2.m:
tests/hard_coded/direct_arg_intermod3.m:
tests/hard_coded/direct_arg_parent.exp:
tests/hard_coded/direct_arg_parent.m:
tests/hard_coded/direct_arg_sub.m:
tests/invalid/Mmakefile:
tests/invalid/where_direct_arg.err_exp:
tests/invalid/where_direct_arg.m:
tests/invalid/where_direct_arg2.err_exp:
tests/invalid/where_direct_arg2.m:
Add test cases.
tests/invalid/ee_invalid.err_exp:
Update expected output.
was re-used (as opposed to created from scratch), we would re-assign its ID, so
that it was easy to see when a new computation was started. This is no longer
necessary, and it prevents anyone using ThreadScope from understanding how
contexts are re-used.
This change also adds a new ThreadScope event that marks when a context is
released back to the free context pool.
runtime/mercury_context.c:
Only allocate new context IDs for new contexts (not re-used contexts).
Use the new release_context event.
Fix a spelling mistake.
runtime/mercury_threadscope.h:
runtime/mercury_threadscope.c:
Add support for the release_context event.
runtime/mercury_threadscope.h:
runtime/mercury_threadscope.c:
Add a second parameter for the NEW_FUTURE event. The parameter is the id of
the string that holds the future's name.
runtime/mercury_par_builtin.h:
In threadscope grades, use a two-argument version of the new_future macro.
library/par_builtin.m:
Conform to changes in mercury_par_builtin.h; new_future now takes two
arguments.
compiler/dep_par_conj.m:
Create a name variable for each future and pass it as a second parameter to
calls to new_future.
Thread a threadscope string table throughout this transformation so that
strings for variables can be collected.
compiler/hlds_module.m:
Add a threadscope string table to the module_info structure.
compiler/global_data.m:
global_data_init now takes the threadscope string table and its size as
parameters. This is necessary because the table may be non-empty before
the LLDS transformation begins.
compiler/mercury_compile_llds_back_end.m:
Conform to changes in global_data.m
mdbcomp/program_representation.m:
Disable the polymorphism transformation for new_future/2 rather than the
old new_future/1.
runtime/mercury_threadscope.h:
runtime/mercury_threadscope.c:
Fix some compilation problems.
Rename stop conjunction and stop conjunct events to use the word "end"
rather than "stop". The meaning is clearer and the name matches that used
in the threadscope paper.
runtime/mercury_context.h:
runtime/mercury_context.c:
Re-order some operations in the idle loop: try to resume an earlier
context before working on a local spark; this may lead to less blocking.
The RUN_CONTEXT event was posted from the load_context macro. Change
this to post the RUN_CONTEXT event explicitly.
Fix some over-long lines.
Conform to changes in mercury_threadscope.h.
runtime/mercury_thread.c:
Add an explicit call to post the RUN_CONTEXT event.
compiler/layout_out.m:
Add a missing output_layout_array_name call when writing out the
threadscope string table array.
compiler/par_conj_gen.m:
Conform to changes in runtime/mercury_threadscope.h
Branches: main
Fix a problem that was causing the namespace check to fail.
runtime/mercury_heap_profile.h:
Make sure that MR_STATIC_CODE_CONST is defined when doing
the namespace check.
Fix some formatting issues.
Branches: main
Implement a new form of memory profiling, which tells the user what memory
is being retained during a program run. This is done by allocating an extra
word before each cell, which is used to "attribute" the cell to an
allocation site. The attribution, or "allocation id", is the address of an
MR_AllocSiteInfo structure generated by the Mercury compiler, giving the
procedure, filename and line number of the allocation, and the type
constructor and arity of the cell that it allocates.
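A simplified C sketch of the extra-word scheme, using malloc rather than
Boehm GC. AllocSite only mirrors the fields listed above (procedure, file,
line, type constructor, arity); all names are hypothetical. Because callers
receive a pointer one word past the start of the block, the collector must
accept pointers displaced by one word, and free/realloc analogues must undo
the offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical analogue of MR_AllocSiteInfo. */
typedef struct {
    const char *proc;        /* allocating procedure */
    const char *file;
    int         line;
    const char *type_ctor;   /* type constructor of the cell */
    int         arity;
} AllocSite;

/* Allocate `words' words for the cell plus one hidden word in front of it
   that records the allocation site; return a pointer to the cell proper. */
static uintptr_t *alloc_attributed(size_t words, const AllocSite *site)
{
    uintptr_t *block = malloc((words + 1) * sizeof(uintptr_t));

    if (block == NULL) {
        return NULL;
    }
    block[0] = (uintptr_t) site;   /* a NULL site would show as "unknown" */
    return block + 1;
}

/* Read back the attribution word stored just before the cell. */
static const AllocSite *site_of(const uintptr_t *cell)
{
    assert(cell != NULL);
    return (const AllocSite *) cell[-1];
}

/* The free analogue must undo the one-word offset. */
static void free_attributed(uintptr_t *cell)
{
    if (cell != NULL) {
        free(cell - 1);
    }
}
```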
The user must manually instrument the program with calls to
`benchmarking.report_memory_attribution', which forces a GC and summarises
the live objects on the heap using the attributions. The mprof tool is
extended with a new mode to parse and present that data.
Objects which are unattributed (e.g. by hand-written C code which hasn't
been updated) are still accounted for, but show up in profiles as "unknown".
Currently this profiling mode only works in conjunction with the Boehm
garbage collector, though in principle it can work with any memory allocator
for which we can access a list of the live objects. Since term size
profiling relies on the same technique of using an extra word per memory
cell, the two profiling modes are incompatible.
The output from `mprof -s' looks like this:
------ [1] some label ------
cells words cumul procedure / type (location)
14150 38872 total
* 1949/ 13.8% 4872/ 12.5% 12.5% <predicate `parser.parse_rest/7' mode 0>
975/ 6.9% 1950/ 5.0% list.list/1 (parser.m:502)
487/ 3.4% 1948/ 5.0% term.term/1 (parser.m:501)
487/ 3.4% 974/ 2.5% term.const/0 (parser.m:501)
* 1424/ 10.1% 4272/ 11.0% 23.5% <predicate `parser.parse_simple_term_2/6' mode 0>
708/ 5.0% 2832/ 7.3% term.term/1 (parser.m:643)
708/ 5.0% 1416/ 3.6% term.const/0 (parser.m:643)
...
boehm_gc/alloc.c:
boehm_gc/include/gc.h:
boehm_gc/misc.c:
boehm_gc/reclaim.c:
Add a callback function to be called for every live object after a GC.
Add a function to write out the GC_size_map array.
compiler/layout.m:
Define the alloc_site_info type which is equivalent to the
MR_AllocSiteInfo C structure.
Add alloc_site_array as a kind of "layout" array.
compiler/llds.m:
Add allocation sites to `cfile' structure.
Replace TypeMsg argument (which was also for profiling) on `incr_hp'
instructions by an allocation site identifier.
Add a new foreign_proc_component for allocation site ids.
compiler/code_info.m:
compiler/global_data.m:
compiler/proc_gen.m:
Keep the set of allocation sites in the code_info and global_data
structures.
compiler/unify_gen.m:
Add allocation sites to LLDS allocation instructions.
compiler/layout_out.m:
compiler/llds_out_file.m:
compiler/llds_out_instr.m:
Output MR_AllocSiteInfo arrays in generated C files.
Output code to register the MR_AllocSiteInfo array with the Mercury
runtime.
Output allocation site ids for memory allocation instructions.
compiler/llds_out_util.m:
Add allocation sites to llds_out_info.
compiler/pragma_c_gen.m:
compiler/ml_foreign_proc_gen.m:
Generate a macro MR_ALLOC_ID which resolves to an allocation site
structure, for every foreign_proc whose C code contains the string
"MR_ALLOC_ID". This is to be used by hand-written C code which
allocates memory.
MR_PROC_LABELs are retained for backwards compatibility. Though
they were introduced for profiling, they seem to have been co-opted
for printf-debugging since then.
compiler/ml_global_data.m:
Add allocation site structures to the MLDS global data.
compiler/mlds.m:
compiler/ml_unify_gen.m:
Add allocation site id to `new_object' instruction.
compiler/mlds_to_c.m:
Output allocation site arrays and allocation ids in high-level C code.
Output a call to register the allocation site array with the Mercury
runtime.
Delete an unused predicate.
compiler/exprn_aux.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/mercury_compile_llds_back_end.m:
compiler/middle_rec.m:
compiler/ml_accurate_gc.m:
compiler/ml_elim_nested.m:
compiler/ml_optimize.m:
compiler/ml_util.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_gcc.m:
compiler/mlds_to_il.m:
compiler/mlds_to_java.m:
compiler/mlds_to_managed.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/use_local_vars.m:
compiler/var_locn.m:
Conform to changes.
compiler/pickle.m:
compiler/prog_event.m:
compiler/timestamp.m:
Conform to changes in memory allocation macros.
library/benchmarking.m:
Add the `report_memory_attribution' instrumentation predicates.
Conform to changes to MR_memprof_record.
library/array.m:
library/bit_buffer.m:
library/bitmap.m:
library/construct.m:
library/deconstruct.m:
library/dir.m:
library/io.m:
library/mutvar.m:
library/store.m:
library/string.m:
library/thread.semaphore.m:
library/version_array.m:
Use attributed memory allocation throughout the standard library so
that objects don't show up in the memory profile as "unknown".
Replace MR_PROC_LABEL by MR_ALLOC_ID.
mdbcomp/program_representation.m:
mdbcomp/rtti_access.m:
Replace MR_PROC_LABEL by MR_ALLOC_ID.
profiler/Mercury.options:
profiler/globals.m:
profiler/mercury_profile.m:
profiler/options.m:
profiler/output.m:
profiler/snapshots.m:
Add a new mode to `mprof' to parse and present the data from
`Prof.Snapshots' files.
Add options for the new profiling mode.
profiler/process_file.m:
Fix a typo.
runtime/mercury_conf_param.h:
#define MR_MPROF_PROFILE_MEMORY_ATTRIBUTION if memory profiling
is enabled and we are using Boehm GC.
runtime/mercury.h:
Make MR_new_object take an allocation id argument.
Conform to changes in memory allocation macros.
runtime/mercury_memory.c:
runtime/mercury_memory.h:
runtime/mercury_types.h:
Define MR_AllocSiteInfo.
Add memory allocation functions and macros which take into
account the additional word necessary for the new profiling mode.
These should be used in preference to the raw memory allocation
functions wherever possible so that objects do not show up in the
profile as "unknown".
Add analogues of realloc/free which take into account the offset
introduced by the attribution word.
Add function versions of the MR_new_object macros, which can't be
written in standard C. They are only used when necessary.
Add built-in allocation site ids, to be used in the runtime and
other hand-written code when context-specific ids are unavailable.
runtime/mercury_heap.h:
Make MR_tag_offset_incr_hp_msg and MR_tag_offset_incr_hp_atomic_msg
allocate an extra word when memory attribution is desired, and store
the allocation id there.
Similarly for MR_create{1,2,3}_msg.
Replace proclabel arguments in allocation macros by alloc_id
arguments.
Replace MR_hp_alloc_atomic by MR_hp_alloc_atomic_msg. It was only
used for boxing floats.
Conform to change to MR_new_object macro.
runtime/mercury_bootstrap.h:
Delete obsolete macro hp_alloc_atomic.
runtime/mercury_heap_profile.c:
runtime/mercury_heap_profile.h:
Add the code to summarise the live objects on the Boehm GC heap and
write out the data to `Prof.Snapshots', for display by mprof.
Don't store the procedure name in MR_memprof_record: the procedure
address is enough and faster to compare.
runtime/mercury_prof.c:
Finish and close the `Prof.Snapshots' file when the program
terminates.
Conform to changes in MR_memprof_record.
runtime/mercury_misc.h:
Add a macro to expand to the name of the allocation sites array
in LLDS grades.
runtime/mercury_bitmap.c:
runtime/mercury_bitmap.h:
Pass allocation id through bitmap allocation functions.
Delete unused function MR_string_to_bitmap.
runtime/mercury_string.h:
Add MR_make_aligned_string_copy_msg.
Make string allocation macros take allocation id arguments.
runtime/mercury.c:
runtime/mercury_array_macros.h:
runtime/mercury_context.c:
runtime/mercury_deconstruct.c:
runtime/mercury_deconstruct_macros.h:
runtime/mercury_dlist.c:
runtime/mercury_engine.c:
runtime/mercury_float.h:
runtime/mercury_hash_table.c:
runtime/mercury_ho_call.c:
runtime/mercury_label.c:
runtime/mercury_prof_mem.c:
runtime/mercury_stacks.c:
runtime/mercury_stm.c:
runtime/mercury_string.c:
runtime/mercury_thread.c:
runtime/mercury_trace_base.c:
runtime/mercury_trail.c:
runtime/mercury_type_desc.c:
runtime/mercury_type_info.c:
runtime/mercury_wsdeque.c:
Use attributed memory allocation throughout the runtime so that
objects don't show up in the profile as "unknown".
runtime/mercury_memory_zones.c:
Attribute memory zones to the Mercury runtime.
runtime/mercury_tabling.c:
runtime/mercury_tabling.h:
Use attributed memory allocation macros for tabling structures.
Delete unused MR_table_realloc_* and MR_table_copy_bytes macros.
runtime/mercury_deep_copy_body.h:
Try to retain the original attribution word when copying values.
runtime/mercury_ml_expand_body.h:
Conform to changes in memory allocation macros.
runtime/mercury_tags.h:
Replace proclabel arguments by alloc_id arguments in allocation macros.
runtime/mercury_wrapper.c:
If memory attribution is enabled, tell Boehm GC that pointers may be
displaced by an extra word.
trace/mercury_trace.c:
trace/mercury_trace_tables.c:
Conform to changes in memory allocation macros.
extras/net/tcp.m:
extras/solver_types/library/any_array.m:
extras/trailed_update/tr_array.m:
Conform to changes in memory allocation macros.
doc/user_guide.texi:
Document the new profiling mode.
doc/reference_manual.texi:
Update a commented out example.
future is available and, if it is, reads the value and unlocks the future. We
can avoid the locking operation in many cases by testing if the future is
available before taking the lock. If the future is not available then take
the lock and re-test to see if the future is available.
To make this safe, we now write the future's value before writing to the field
that says it's available; an 'sfence' instruction ensures that these two writes
are stored in the correct order.
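A C sketch of the check-then-lock read path, assuming one pthread mutex per
future. It swaps in C11 release/acquire atomics for the 'sfence' used by this
change, and omits suspending and waking contexts; the structure and function
names are hypothetical, not those in mercury_par_builtin.h.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    intptr_t        value;      /* kept next to the flag, as described above */
    atomic_int      signalled;  /* nonzero once value is valid */
} Future;

static void future_init(Future *f)
{
    pthread_mutex_init(&f->lock, NULL);
    f->value = 0;
    atomic_init(&f->signalled, 0);
}

/* Write the value first, then publish it with a release store, so a reader
   that observes signalled != 0 is guaranteed to see the value too. */
static void future_signal(Future *f, intptr_t v)
{
    pthread_mutex_lock(&f->lock);
    assert(atomic_load_explicit(&f->signalled, memory_order_relaxed) == 0);
    f->value = v;
    atomic_store_explicit(&f->signalled, 1, memory_order_release);
    pthread_mutex_unlock(&f->lock);
}

/* Fast path: test without the lock. Slow path: lock and re-test.
   Returns 1 and stores the value if available, 0 if the caller would
   have to suspend (not modelled here). */
static int future_try_get(Future *f, intptr_t *out)
{
    int ready;

    if (atomic_load_explicit(&f->signalled, memory_order_acquire)) {
        *out = f->value;
        return 1;
    }
    pthread_mutex_lock(&f->lock);
    ready = atomic_load_explicit(&f->signalled, memory_order_relaxed) != 0;
    if (ready) {
        *out = f->value;
    }
    pthread_mutex_unlock(&f->lock);
    return ready;
}
```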
runtime/mercury_par_builtin.h:
As above.
Also re-order the fields of the future structure, putting fut_value and
fut_signalled next to each other; they're more likely to be in the same
cache line this way.
library/Mmakefile:
Make par_builtin.o depend on mercury_par_builtin.h in the runtime.
Branches: main
Disable garbage collection during early runtime initialisation, when little or
no garbage is created anyway.
runtime/mercury_wrapper.c:
As above.