Commit Graph

85 Commits

Author SHA1 Message Date
Julien Fischer
b57cfb54a5 Update references to configure.in.
configure.ac:
compiler/notes/overall_design.html:
deep_profiler/conf.m:
runtime/mercury_context.h:
runtime/mercury_goto.h:
runtime/mercury_grade.h:
runtime/mercury_regs.h:
    As above -- the configure template has been named configure.ac
    for a long time now.
2020-10-25 14:45:35 +11:00
Julien Fischer
f60caca91c Use trail segments by default in trailing grades.
Until now, we have supported two variants of trailing grades, those that use a
fixed-size trail (.tr) and those that use trail segments (.trseg).  This change
removes support for fixed-size trails, and renames the .trseg grade component
to .tr. The .trseg grade now acts as a synonym for .tr; it is deprecated, since we
intend to eventually delete it.  Until then, the behavior of the old .tr grade
component should be available, though to developers only, by compiling the
whole system with EXTRA_CFLAGS = -DMR_USE_FIXED_SIZE_TRAIL.

runtime/mercury_conf_param.h:
    Delete the MR_TRAIL_SEGMENTS macro. Its effect is now implied by
    MR_USE_TRAIL, unless a new macro, MR_USE_FIXED_SIZE_TRAIL, is defined.
    Developers can use this new macro to disable trail segments, should the
    need for doing that arise.

runtime/mercury_grade.h:
    Add a new macro that defines a binary compatibility version number for
    trailing; use that in the grade part for trailing.

    Use "_trfix" or "_trseg" as the prefix of the trailing part of the
    MR_GRADE_VAR depending on if MR_USE_FIXED_SIZE_TRAIL is defined or
    not.
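
The conditional naming described above can be sketched with the preprocessor.
This is only an illustration: the EXAMPLE_* names are hypothetical, MR_USE_TRAIL
is defined by hand here to stand in for a trailing grade, and the real contents
of mercury_grade.h (including the binary compatibility version number) are not
reproduced.

```c
#include <string.h>

// Assumption for this sketch: we are building a trailing grade and the
// developer has NOT asked for a fixed-size trail.
#define MR_USE_TRAIL

#ifdef MR_USE_TRAIL
  #ifdef MR_USE_FIXED_SIZE_TRAIL
    #define EXAMPLE_GRADE_TRAIL_PART "_trfix"
  #else
    #define EXAMPLE_GRADE_TRAIL_PART "_trseg"
  #endif
#else
  #define EXAMPLE_GRADE_TRAIL_PART ""
#endif

// Return the trailing component that would be embedded in the grade string.
static const char *example_grade_trail_part(void)
{
    return EXAMPLE_GRADE_TRAIL_PART;
}
```

Compiling the same sketch with -DMR_USE_FIXED_SIZE_TRAIL would select the
"_trfix" prefix instead.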

runtime/mercury_trail.[ch]:
runtime/mercury_context.h:
    Enable trail segments by default, only disabling them if
    MR_USE_FIXED_SIZE_TRAIL is enabled.

runtime/mercury_wrapper.c:
trace/mercury_trace_cmd_developer.c:
    Conform to the above changes.

compiler/compile_target_code.m:
    Do not pass options for trail segments to the C compiler.

compiler/compute_grade.m:
    Treat trseg as a synonym for tr.

compiler/options.m:
    Deprecate --trail-segments.

grade_lib/grade_spec.m:
grade_lib/grade_string.m:
grade_lib/grade_structure.m:
grade_lib/grade_vars.m:
grade_lib/try_all_grade_structs.m:
grade_lib/var_value_names.m:
    Remove the trseg component.

scripts/canonical_grade.sh-subr:
scripts/init_grade_options.sh-subr:
scripts/mgnuc.in:
scripts/parse_grade_options.sh-subr:
    Remove support for the --trail-segments option.

doc/user_guide.texi:
    Update the documentation for the --trail-segments option.

    Comment out the documentation of the --trail-size and --trail-size-kwords
    runtime options; they are no longer useful to non-developers.

NEWS:
    Announce this change.
2020-02-18 13:07:24 +11:00
Mark Brown
d465fa53cb Update the COPYING.LIB file and references to it.
Discussion of these changes can be found on the Mercury developers
mailing list archives from June 2018.

COPYING.LIB:
    Add a special linking exception to the LGPL.

*:
    Update references to COPYING.LIB.

    Clean up some minor errors that have accumulated in copyright
    messages.
2018-06-09 17:43:12 +10:00
Peter Wang
f1a148477f Fix undefined reference to MR_thread_pinning.
This was possible when building against an old glibc where
MR_HAVE_SCHED_SETAFFINITY does not imply MR_HAVE_THREAD_PINNING.

runtime/mercury_context.h:
runtime/mercury_wrapper.c:
    Declare MR_thread_pinning and only use it when actually building
    with thread pinning support.
2018-05-23 17:52:39 +10:00
Peter Wang
4af9648908 Defer CPU detection in high-level C grades, and related cleanups.
library/thread.m:
    Make num_processors/3 call a function in the runtime instead of
    reading a global variable, so we can defer initialisation of the
    variable.

    Return `no' if unable to determine the number of CPUs instead of
    defaulting to 1.

runtime/mercury_context.c:
runtime/mercury_context.h:
    Add a function MR_get_num_processors().

    Hide MR_num_processors_detected variable.

    Rename MR_detect_num_processors() to
    MR_init_available_cpus_and_detect_num_processors()
    to reflect what it does.

    Add a function MR_free_available_cpus(), currently only implemented
    for the Linux CPU affinity API path.

    Call MR_init_available_cpus_and_detect_num_processors() at startup
    only in low-level C grades. The only reason to call it in high-level
    C grades was to set MR_num_processors_detected. It will now be set
    at the first call to MR_get_num_processors().
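
    A minimal sketch of deferring detection to the first call is shown
    below. The example_* names are hypothetical, detection is reduced to a
    single sysconf() call, and the sketch ignores thread safety; it is not
    the runtime's implementation.

```c
#include <unistd.h>

// Hypothetical cache: 0 means "not detected yet".
static long example_num_processors_detected = 0;

// Return the number of online processors, detecting it on first use.
// Returns -1 if the number cannot be determined.
long example_get_num_processors(void)
{
    if (example_num_processors_detected == 0) {
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        example_num_processors_detected = (n > 0) ? n : -1;
    }
    return example_num_processors_detected;
}
```

    Returning a distinct "unknown" value, rather than defaulting to 1,
    leaves the decision to the caller, in the same spirit as returning
    `no' from num_processors/3.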

    Free the available CPU data structures as soon as we have
    MR_num_processors_detected, unless required for thread pinning.

    Rename MR_setup_num_threads to MR_setup_num_ws_engines.
    Pass the number of processors detected as an argument to make
    the dependency explicit.

    Make MR_pin_primordial_thread return a CPU number as its comment
    suggests.

    Delete the global variable MR_primordial_thread_cpu as it is
    currently unused.

    Add a function MR_done_thread_pinning().

runtime/mercury_wrapper.c:
    Call MR_done_thread_pinning() to free up available CPU data
    structures after thread pinning is done with them.

    Add an unrelated XXX.
2018-05-23 17:52:39 +10:00
Peter Wang
fb682d9d71 Fix use of Linux CPU affinity API.
The main fix is to retry the sched_getaffinity() call in
MR_reset_available_cpus(). If that call failed because we provided a
cpuset that is too small for the kernel, MR_reset_available_cpus() would
set MR_available_cpus to NULL but leave MR_cpuset_size non-zero,
leading to a null pointer dereference soon after.

runtime/mercury_context.c:
runtime/mercury_context.h:
    Use a macro MR_HAVE_LINUX_CPU_AFFINITY_API if we have all the pieces
    we need for that API.

    Rename MR_num_processors to MR_num_processors_detected for clarity.

    Add a global variable MR_cpuset_num_cpus to record the size of the
    cpuset that we allocated.

    Change MR_cpuset_size to size_t to match the type in the underlying
    interface.

    Rename MR_available_cpus to MR_cpuset_available to clarify the
    relationship with MR_cpuset_size and MR_cpuset_num_cpus.

    In MR_reset_available_cpus(), retry sched_getaffinity() if it fails
    with EINVAL. It may fail because the kernel uses a larger CPU
    affinity mask than the cpuset we provided; then we must retry the
    call with a larger cpuset.
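
    The retry logic can be sketched independently of the Linux API. In the
    sketch below, the query callback is a hypothetical stand-in for a
    wrapper around sched_getaffinity(), and example_fake_kernel_query
    simulates a kernel whose mask needs at least 32 bytes. All names are
    illustrative. On failure the structure is left as {NULL, 0}, so the
    pointer and the size never disagree.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical query callback: fills buf (size bytes) and returns 0, or
// returns -1 with errno set to EINVAL when the buffer is too small.
typedef int (*example_query_fn)(size_t size, void *buf);

typedef struct {
    void   *mask;   // NULL when no mask is held
    size_t  size;   // size of mask in bytes; 0 when mask is NULL
} example_cpuset;

// Fetch the mask, doubling the buffer while the query reports EINVAL.
// Returns 0 on success and -1 on failure.
int example_fetch_mask(example_cpuset *cs, example_query_fn query,
    size_t initial_size, size_t max_size)
{
    size_t size = initial_size;

    cs->mask = NULL;
    cs->size = 0;
    while (size > 0 && size <= max_size) {
        void *buf = calloc(1, size);
        if (buf == NULL) {
            return -1;
        }
        if (query(size, buf) == 0) {
            cs->mask = buf;
            cs->size = size;
            return 0;
        }
        int err = errno;        // save errno before free() can touch it
        free(buf);
        if (err != EINVAL) {
            return -1;          // a different error: give up
        }
        size *= 2;              // buffer too small: retry with a larger one
    }
    return -1;
}

// A fake "kernel" for demonstration: it needs at least 32 bytes.
int example_fake_kernel_query(size_t size, void *buf)
{
    if (size < 32) {
        errno = EINVAL;
        return -1;
    }
    ((unsigned char *) buf)[0] = 0x0f;  // pretend CPUs 0-3 are available
    return 0;
}
```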

    Ensure MR_reset_available_cpus() does not leave MR_cpuset_*
    variables in an inconsistent state.

    Make MR_detect_num_processors() set MR_num_processors to 1 if the
    MR_cpuset_available set is somehow empty after
    MR_reset_available_cpus(). It should not happen, but the program
    should be able to continue anyway.

    In MR_pin_thread_no_locking(), loop over the entire
    MR_cpuset_available set to find a target CPU to pin to.
    The CPUs available to the thread are not necessarily numbered
    consecutively from the initial CPU onwards mod MR_num_processors.

    Fix cpuset allocation in MR_do_pin_thread to not assume that CPUs
    are numbered from 0 to MR_num_processors-1. Fix a memory leak.

    Clean up MR_current_cpu().
    Prevent a null pointer dereference in the hwloc path:
    hwloc_get_pu_obj_by_os_index always returns NULL on my system.

    Change some types from 'unsigned' to 'int' to match underlying APIs.

library/thread.m:
    Conform to variable renaming.

checkout sysconf(_SC_NPROCESSORS_ONLN); error
2018-05-23 17:52:21 +10:00
Zoltan Somogyi
53b573692a Convert C code to use // style comments.
runtime/*.[ch]:
trace/*.[chyl]:
    As above. In some places, improve comments, e.g. by expanding contractions
    such as "we've". Add #ifndef guards against double inclusion around
    the trace/*.h files that did not already have them.

tools/*:
    Make the corresponding changes in shell scripts that generate .[ch] files
    in the runtime.

tests/*:
    Conform to a slight change in the text of a message.
2016-07-14 13:57:35 +02:00
Zoltan Somogyi
67326f16e4 Fix style issues in the runtime.
Move all .h and .c files to four-space indentation without tabs,
if they weren't there already.

Use the same vim line for all .h and .c files.

Align all backslashes at the ends of lines in macro definitions.
Align close comment signs.

In some places, fix inconsistent indentation.

Fix a bunch of comments. Add XXXs to a few of them.
2016-07-09 12:14:00 +02:00
Paul Bone
0bd301b59e Add thread.num_processors/3
Add a predicate that retrieves the number of processors available to this
process if known.

library/thread.m:
    As above.

runtime/mercury_context.c:
    The existing code that determines the number of processors was only used
    on thread safe low-level C grades.  Make it also available for thread
    safe high-level C grades.

    Add a fall back (less accurate) method of determining the number of
    processors.

    Remove an out-of-date comment.

runtime/mercury_context.h:
    Export the number of available processors.

scripts/ml.in:
    Link to libhwloc (if configured) on thread safe high and low-level C
    grades.

NEWS:
    Announce new predicate.

configure.ac:
    Update a --help message.
2016-05-13 16:17:52 +10:00
Julien Fischer
787f8b2c6d Fix spelling and grammar in runtime comments.
runtime/*.[ch]:
    As above.
2015-09-03 15:43:35 +10:00
Peter Wang
29f2dcf213 Support dynamic creation of Mercury engines in low-level C parallel grades.
This change allows Mercury engines (each in a separate OS thread) to be
created and destroyed dynamically in low-level C grades.

We divide Mercury engines into two types:

    "Shared" engines may execute code from any Mercury thread.
    Shared engines may steal work from other shared engines, so are also
    called work-stealing engines; we do not have shared engines that
    refrain from work-stealing.

    "Exclusive" engines execute code only for a single Mercury thread.

Only exclusive engines may be created and destroyed dynamically so far.
This assumption could be lifted when and if the need should arise.

Exclusive engines are a means for the user to map a Mercury thread directly
to an OS thread.  Calls to blocking procedures on that thread will not block
progress in arbitrary other Mercury threads.  Foreign code which depends on
the OS thread-local state is usable when called from that thread.

We do not yet allow shared engines to steal parallel work from exclusive
engines.

runtime/mercury_wrapper.c:
runtime/mercury_wrapper.h:
	Rename MR_num_threads to MR_num_ws_engines.  It counts only
	work-stealing engines.  Move comment to the header file.

	Add MR_max_engines.  The default value is arbitrary.

	Add MERCURY_OPTIONS `--max-engines' option.

	Define MR_num_ws_engines and MR_max_engines only with
	MR_LL_PARALLEL_CONJ.

runtime/mercury_context.c:
runtime/mercury_context.h:
	Rename MR_num_idle_engines to MR_num_idle_ws_engines.
	It only counts idle work-stealing engines.

	Extend MR_spark_deques to MR_max_engines length.

	Extend engine_sleep_sync_data to MR_max_engines length.

	Add function to index engine_sleep_sync_data with optional bounds
	checking.

	Replace instances of MR_num_threads by MR_num_ws_engines or
	MR_max_engines as appropriate.

	Add MR_ctxt_exclusive_engine field.

	Rename existing MR_Context fields to remove the implication that the
	engine "owns" the context.  The new exclusive_engine field does
	imply a kind of ownership, hence potential confusion.

	Rename MR_SavedOwner, too.

	Make MR_find_ready_context respect MR_ctxt_exclusive_engine.

	Make MR_schedule_context respect MR_ctxt_exclusive_engine.

	Rename MR_try_wake_an_engine to MR_try_wake_ws_engine
	and restrict it to work-stealing engines.

	Rename MR_shutdown_all_engines to MR_shutdown_ws_engines
	and restrict it to work-stealing engines.

	Make try_wake_engine and try_notify_engine decrement
	MR_num_idle_ws_engines only for shared engines.

	In MR_do_idle, make exclusive engines bypass work-stealing
	and skip to the sleep state.

	In MR_do_sleep, make exclusive engines ignore work-stealing advice
	and abort the program if told to shut down.

	Assert that a context with an exclusive_engine really is only loaded
	by that engine.

	In MR_fork_new_child, make exclusive engines not attempt to wake
	work-stealing engines.  Its sparks cannot be stolen anyway.

	Make do_work_steal fail the attempt for exclusive engines.
	There is one call where this might happen.

	Add notes to MR_attempt_steal_spark.  Its behaviour is unchanged.

	Replace a call to MR_destroy_thread by MR_finalize_thread_engine.

	Delete MR_num_exited_engines.  It was unused.

runtime/mercury_thread.c:
runtime/mercury_thread.h:
	Delete MR_next_engine_id and MR_next_engine_id_lock.  We can no longer
	allocate engine ids by incrementing a counter.  Engine ids need to be
	reused as they act as indices into fixed-sized arrays.
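
	One way to allocate reusable ids is to claim the lowest free slot
	under a lock. The sketch below assumes that approach; the
	example_* names and the limit of 4 are hypothetical and do not
	describe the runtime's actual data structures.

```c
#include <pthread.h>
#include <stdbool.h>

#define EXAMPLE_MAX_ENGINES 4           // hypothetical fixed limit
#define EXAMPLE_ENGINE_ID_NONE (-1)

static bool example_slot_used[EXAMPLE_MAX_ENGINES];
static pthread_mutex_t example_slots_lock = PTHREAD_MUTEX_INITIALIZER;

// Claim the lowest free id, or return EXAMPLE_ENGINE_ID_NONE if every
// slot is in use.
int example_alloc_engine_id(void)
{
    int id = EXAMPLE_ENGINE_ID_NONE;

    pthread_mutex_lock(&example_slots_lock);
    for (int i = 0; i < EXAMPLE_MAX_ENGINES; i++) {
        if (!example_slot_used[i]) {
            example_slot_used[i] = true;
            id = i;
            break;
        }
    }
    pthread_mutex_unlock(&example_slots_lock);
    return id;
}

// Release an id so that a later engine can reuse the same array slot.
void example_free_engine_id(int id)
{
    pthread_mutex_lock(&example_slots_lock);
    example_slot_used[id] = false;
    pthread_mutex_unlock(&example_slots_lock);
}
```

	A plain incrementing counter would eventually hand out ids beyond
	the end of the fixed-size arrays, which is why freed ids have to
	be recycled.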

	Extend MR_all_engine_bases to MR_max_engines entries.

	Add MR_all_engine_bases_lock to protect MR_all_engine_bases.

	Add MR_highest_engine_id.

	Add MR_EngineType with the two options described.

	Split the main part of MR_init_engine into a new function which
	accepts an engine type.  MR_init_engine is used by generated code so
	maintain the interface.

	Factor out setup/shutdown for thread support.

	Make MR_finalize_thread_engine call the shutdown function.

	Specialise MR_create_thread into MR_create_worksteal_thread.
	The generic form was unused.

	Move thread pinning into MR_create_worksteal_thread as other threads
	do not require it.

	Delete MR_destroy_thread.  Its one caller can use
	MR_finalize_thread_engine.

	Delete declaration for non-existent variable
	MR_init_engine_array_lock.

runtime/mercury_engine.c:
runtime/mercury_engine.h:
	Add MR_eng_type field.

	Make MR_eng_spark_deque a pointer to separately-allocated memory.
	The reason is given in MR_attempt_steal_spark.

	Add MR_ENGINE_ID_NONE, a dummy value for MR_ctxt_exclusive_engine.

	Delete MR_eng_owner_thread which was obsoleted by engine ids
	before.

	Delete misplaced declaration of MR_all_engine_bases.

runtime/mercury_memory_zones.c:
	Replace MR_num_threads by appropriate counters (I hope).

runtime/mercury_memory_handlers.c:
runtime/mercury_par_builtin.h:
	Conform to changes.

runtime/mercury_threadscope.c:
	Conform to renaming (but it might be wrong).

library/thread.m:
	Add hidden predicate `spawn_native' for testing.
	The interface is subject to change.

	Share much of the code with the high-level C backend.

library/par_builtin.m:
	Delete `num_os_threads' as it is unused.

doc/user_guide.texi:
	Document MERCURY_OPTIONS `--max-engines' option.
2014-07-10 14:57:48 +10:00
Peter Wang
fdd8e9013e Remove unnecessary detection of sync term size during configure.
The size of a sync term must be kept synchronised between the runtime
and the compiler.  The size is "detected" during configure by running
a small program whose output would always be the same.  It doesn't work
for cross-compilation so a value is hard-coded in configure_mingw_cross,
which is liable to be missed if the sync term ever changes.

configure.ac:
	Hard code sync term size as it measures a number of detstack
	slots, which does not depend on architecture.

runtime/mercury_conf.h.in:
	Define MR_SYNC_TERM_SIZE.

runtime/mercury_std.h:
	Add MR_STATIC_ASSERT macro.

runtime/mercury_context.h:
	Check at compile time that the value of MR_SYNC_TERM_SIZE
	from configure.ac matches the real size.
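
	A compile-time check of this kind is commonly written with a
	typedef whose array size becomes negative when the condition is
	false. The sketch below assumes that technique; the macro, the
	structure and the size of 3 are hypothetical and are not the
	definitions in mercury_std.h or mercury_context.h.

```c
// Hypothetical compile-time assertion: the array size is -1, which is a
// compile error, when cond is false.
#define EXAMPLE_STATIC_ASSERT(name, cond) \
    typedef char example_static_assert_##name[(cond) ? 1 : -1]

// Hypothetical stand-ins: a size in word-sized slots, as it might be hard
// coded by configure, and the structure it is supposed to describe.
#define EXAMPLE_SYNC_TERM_SIZE 3

typedef struct {
    void *field_a;
    void *field_b;
    void *field_c;
} example_sync_term;

// Fails to compile if the structure and the hard-coded size ever diverge.
EXAMPLE_STATIC_ASSERT(sync_term_size,
    sizeof(example_sync_term) == EXAMPLE_SYNC_TERM_SIZE * sizeof(void *));
```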

tools/configure_mingw_cross:
	Delete mercury_cv_sync_term_size as no longer needed.
2014-03-25 12:34:01 +11:00
Paul Bone
e17a22caaa Parallel RTS changes for idle engines and notifications.
This redesign prevents at least one race condition that could previously
occur with scheduling of contexts.  This work actually represents an attempt
to design and check these algorithms, unlike my previous work, which was
nowhere near as thorough.

runtime/mercury_context.[ch]:
    The actions that an engine can undertake have been modified along with
    this.  Many have been renamed, and MR_ENGINE_ACTION_CONTEXT_ADVICE has
    been added; it is given to an engine to instruct that engine to check the
    context run queue.

    There are now two procedures for notifying/waking engines:
    try_wake_engine and try_notify_engine.  The former is used only when an
    engine is sleeping; the latter is used when the engine is idle but
    not sleeping.

    I've also modified how the MR_num_idle_engines counter is used.  It
    now counts the number of engines in the stealing or sleeping states.

    A lot of fprintfs have been added for thread debugging.  They use the
    MR_DEBUG_THREADS macro and --debug-threads runtime option.

    We now support polling and non-polling modes for work stealing.  Engines
    can either be notified when there is a new spark, which slows down spark
    creation, or they can wake up and poll other engines for sparks, which
    means that the system doesn't idle well.

    Remove MR_do_idle_clean_context which was a specialised entry point for
    the MR_do_idle code.  However there is not much difference between it
    and MR_do_idle, so it is not worth maintaining two procedures.

    Remove MR_do_idle_dirty_context; this logic has been re-implemented in
    MR_join_and_continue where it can be faster.

    Make MR_attempt_steal_spark skip stealing from the current engine.

    Add a memory fence to MR_join_and_continue to make sure that context
    data is saved before marking the context as runnable.
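
    The ordering requirement can be sketched with C11 atomics: a release
    fence between writing the saved data and setting the "runnable" flag,
    paired with an acquire fence on the reader's side. The example_* types
    and functions are hypothetical; the runtime uses its own fence
    primitives and context structure.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    int         saved_state;    // stands in for the saved context data
    atomic_bool runnable;       // set only after saved_state is complete
} example_context;

void example_context_init(example_context *ctxt)
{
    ctxt->saved_state = 0;
    atomic_init(&ctxt->runnable, false);
}

// Save the state, then publish the context as runnable.  The release
// fence keeps the store to saved_state ordered before the flag store.
void example_save_and_publish(example_context *ctxt, int state)
{
    ctxt->saved_state = state;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ctxt->runnable, true, memory_order_relaxed);
}

// Read the state only if the context has been published.  The acquire
// fence pairs with the release fence above.
bool example_try_resume(example_context *ctxt, int *state_out)
{
    if (!atomic_load_explicit(&ctxt->runnable, memory_order_relaxed)) {
        return false;
    }
    atomic_thread_fence(memory_order_acquire);
    *state_out = ctxt->saved_state;
    return true;
}
```

    Without the fence, another engine could observe the flag before the
    saved data and resume the context with stale state.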

    Modify the spark deque structure to reduce false-sharing.  (Very minor
    speed-up).
2012-10-18 06:12:03 +00:00
Paul Bone
79c5b951fb Modify how an idle engine with a dirty context gets new work.
These changes fix a couple of performance bugs and also modify the
algorithms in the RTS so that they match the ones in my thesis.

runtime/mercury_context.[ch]:
    An engine with a dirty context will no longer check for a runnable
    context first.  It first checks for a local spark; if the spark is not
    compatible, it puts the spark back on the local spark stack.  Then it
    saves the dirty context and jumps to MR_idle, the entry point for
    engines with no contexts.

    Remove the MR_MAYBE_TRAMPOLINE macro and expand out any case where it
    was previously used.

    We no longer execute a spark when an engine has a dirty incompatible
    context (previously we saved the old context then allocated a new
    one).  Therefore prepare_engine_for_spark() no longer needs the
    join_label parameter (which was used when saving a dirty context).
    Consequently, the same parameter has been removed from
    MR_do_steal_spark.

    If a work stealing thief loses a race then it retries until it wins or
    there is no work.
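
    The retry behaviour can be sketched with a compare-and-swap loop on the
    steal end of a deque. This is a deliberately simplified, hypothetical
    structure (fixed capacity, no wraparound, no owner-side pop) and is not
    the runtime's spark deque.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_DEQUE_CAPACITY 8

typedef struct {
    atomic_size_t top;      // index of the next spark a thief will take
    atomic_size_t bottom;   // one past the last filled slot
    int           sparks[EXAMPLE_DEQUE_CAPACITY];
} example_deque;

void example_deque_init(example_deque *dq)
{
    atomic_init(&dq->top, 0);
    atomic_init(&dq->bottom, 0);
}

// Owner-side push; returns false when the deque is full.
bool example_push(example_deque *dq, int spark)
{
    size_t b = atomic_load(&dq->bottom);
    if (b >= EXAMPLE_DEQUE_CAPACITY) {
        return false;
    }
    dq->sparks[b] = spark;
    atomic_store(&dq->bottom, b + 1);
    return true;
}

// Thief-side steal: retry after losing a race, stop only when a spark
// has been taken or the deque is empty.
bool example_steal(example_deque *dq, int *spark_out)
{
    for (;;) {
        size_t t = atomic_load(&dq->top);
        size_t b = atomic_load(&dq->bottom);
        if (t >= b) {
            return false;               // no work left
        }
        int spark = dq->sparks[t];
        if (atomic_compare_exchange_strong(&dq->top, &t, t + 1)) {
            *spark_out = spark;         // won the race
            return true;
        }
        // Another thief advanced top first: loop and try again.
    }
}
```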

    Use a mutex rather than a (binary) semaphore to protect the engine sleep
    sync structure.  This more directly reflects the intentions plus POSIX
    mutexes don't always make kernel calls but semaphores do.

    The MR_num_idle_engines global was not being updated correctly, in
    particular it showed that there were idle engines even when there
    weren't.  This caused an engine creating a spark to always attempt to
    notify other engines of the spark.  Fixing the use of
    MR_num_idle_engines improves performance by over a factor of 2x in the
    naive fibs micro benchmark.

    Refactor MR_join_and_continue to match the simpler structure in my
    thesis.

    Rename MR_destroy_context to MR_release_context, which gives a more
    accurate impression.

    Update some MR_assert calls that were incorrect.

runtime/mercury_engine.c:
runtime/mercury_par_builtin.c:
    Conform to MR_release_context.

library/thread.m:
    Conform to MR_release_context.

    Add a missing MR_save_context.
2012-08-06 02:11:24 +00:00
Paul Bone
75f961dedf Fix two bugs in the parallel runtime code.
One bug occurred when the master context, in MR_lc_finish(), released the
contexts used by each of the slots.  The release code attempts to save state
from the engine back into the context, which is necessary most of the time.
However, in this case it saved state from the engine running the master
context into other contexts, so that when they were re-used they used an
invalid stack pointer.

Another bug was found in code with recursive parallel conjunctions.  Each
context structure contains a pointer to a code location, which is used as the
value for the instruction pointer when a context is resumed.  The
MR_join_and_continue operation for parallel conjunctions uses this field to
ensure that the master context for a parallel conjunction is only resumed if it
has become blocked and is ready to be resumed.  However, the field was never
cleared, so it always contains the same value when parallel conjunctions are
nested, as they all have the same resume point.  This caused the master
context to be resumed before it had fully blocked, causing it to be resumed
with an invalid state.

A potential bug was found where a field should have been volatile to prevent
the compiler from caching its value when doing so would not be safe.

Widen a couple of critical sections as they didn't quite protect against some
race conditions.  This is another potential cause of bugs.

runtime/mercury_par_builtin.[ch]:
    Make the master_context field of the loop control structure volatile so
    that the compiler doesn't cache its value.

    Make the last worker to finish take a lock earlier, to ensure that the
    master context won't be left waiting forever.

    Add a comment explaining why a context must not be saved before calling
    MR_destroy_context().

    Improve debugging code to print out the value of the stack or parent stack
    pointer, depending on the code in question.

    Make the lock in MR_lc_finish() wider, so that the lock is held when the
    code checks to see if it should block.

runtime/mercury_context.c:
    MR_destroy_context() no longer saves the context before releasing it.

    MR_destroy_context() no longer sets the MR_ctxt_resume_owner_engine field
    of the context since it's not currently used.

    MR_join_and_continue(), the barrier for parallel conjunctions, now resets
    the resume code pointer of the master context when it switches to it.

runtime/mercury_context.h:
    Described the reason why the context must be saved before it is
    destroyed/released.

runtime/mercury_context.c:
runtime/mercury_engine.c:
    Call MR_save_context() before calling MR_destroy_context().
2011-10-16 03:34:40 +00:00
Paul Bone
a071eaba53 Improve thread pinning:
    + Now pins threads intelligently on SMT systems by balancing threads among
      cores.
    + Performs fewer migrations when pinning threads (if a thread's current
      CPU is a valid CPU for pinning, then it is not migrated).
    + Handles cases where the user requests more threads than available CPUs.
    + Handles cases where the process is restricted to a subset of CPUs by its
      environment (for instance, Linux cpuset(7)).
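
The "fewer migrations" rule can be sketched as a small selection function: keep
the thread's current CPU when it is permitted, otherwise scan the whole set of
permitted CPUs. The function and its parameters are hypothetical, the set is
modelled as a plain bool array, and the SMT-aware balancing among cores is not
shown.

```c
#include <stdbool.h>

// Choose a CPU to pin to.  Returns the current CPU if it is permitted,
// otherwise the first permitted CPU, or -1 if none is permitted.
// The permitted CPUs need not be numbered consecutively.
int example_choose_cpu(const bool *available, int num_cpus, int current_cpu)
{
    if (current_cpu >= 0 && current_cpu < num_cpus
        && available[current_cpu])
    {
        return current_cpu;     // already on a valid CPU: no migration
    }
    for (int cpu = 0; cpu < num_cpus; cpu++) {
        if (available[cpu]) {
            return cpu;
        }
    }
    return -1;
}
```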

This is largely made possible by the hwloc library
(http://www.open-mpi.org/projects/hwloc/).  However, hwloc is not required: the
runtime system will fall back to sched_setaffinity(), and will simply be less
intelligent with respect to SMT.

runtime/mercury_context.h:
runtime/mercury_context.c:
    Do thread pinning either via hwloc or sched_setaffinity.  Previously only
    sched_setaffinity was used.

    Update thread-pinning algorithm, this:

    Include the general thread pinning code only if MR_HAVE_THREAD_PINNING is
    defined.

    Use a combination of sysconf and sched_getaffinity to detect the number of
    processors when hwloc isn't available.  This makes the runtime compatible
    with Linux cpuset(7) when hwloc isn't available.

configure.in:
Mmake.common.in:
    Detect presence of the hwloc library.

configure.in:
    Detect sched_getaffinity()

aclocal.m4:
acinclude.m4:
    Move aclocal.m4 to acinclude.m4, the aclocal program will build aclocal.m4
    and retrieve macros from the system and the contents of acinclude.m4.

Mmakefile:
    Create a make target for aclocal.m4.

runtime/Mmakefile:
    Link the runtime with libhwloc in low-level C parallel grades.

    Include CFLAGS for libhwloc.

scripts/ml.in:
    Link programs and libraries with libhwloc in low-level C parallel grades.

runtime/mercury_conf.h.in:
    Define MR_HAVE_HWLOC when it is available.

    Define MR_HAVE_SCHED_GETAFFINITY when it is available.

runtime/mercury_conf_param.h:
    Define MR_HAVE_THREAD_PINNING if either hwloc or [sched_setaffinity and
    sched_getaffinity] are available.

runtime/mercury_thread.c:
runtime/mercury_wrapper.c:
    Only call MR_pin_thread and MR_pin_primordial_thread if
    MR_HAVE_THREAD_PINNING is defined.

runtime/mercury_thread.h:
runtime/mercury_context.h:
    Move the declaration of MR_pin_primordial_thread to mercury_context.h from
    mercury_thread.h since its definition is in mercury_context.c.

    Require MR_HAVE_THREAD_PINNING for the declaration of
    MR_pin_primordial_thread.

runtime/mercury_wrapper.c:
    Conform to changes in mercury_context.h

INSTALL_CVS:
tools/test_mercury:
    Run aclocal at the right times while testing Mercury.
2011-10-13 02:42:21 +00:00
Paul Bone
ea9eb7a654 Introduce loop control runtime code.
runtime/mercury_par_builtin.h:
runtime/mercury_par_builtin.c:
    Introduce loop control runtime code.

runtime/mercury_context.h:
    Introduce a new macro to tune the size of contexts that are used as
    workers by the loop control runtime.  This is set to the same context size
    as for sparks.

runtime/mercury_context.c:
    Fixed a typo in a comment.

library/par_builtin.m:
    Create predicate versions of the par builtin macros runtime code.  The only
    primitive without a predicate version is MR_lc_spawn_off which cannot be
    expressed in Mercury and needs support from the LLDS stage in the compiler.

mdbcomp/program_representation.m:
    Add par_builtin.lc_finish/1 as an externally defined predicate.  This tells
    the debugger not to expect any events for it.
2011-09-12 04:51:17 +00:00
Paul Bone
987d2e31e3 Fix ThreadScope support since my recent work stealing changes.
runtime/mercury_threadscope.h:
runtime/mercury_threadscope.c:
    Fix some compilation problems.

    Rename stop conjunction and stop conjunct events to use the word "end"
    rather than "stop".  The meaning is clearer and the name matches that used
    in the threadscope paper.

runtime/mercury_context.h:
runtime/mercury_context.c:
    Re-order some operations in the idle loop: try to resume an earlier
    context before working on a local spark; this may lead to less blocking.

    The RUN_CONTEXT event was posted from the load_context macro.  Change
    this to post the RUN_CONTEXT event explicitly.

    Fix some over-long lines.

    Conform to changes in mercury_threadscope.h.

runtime/mercury_thread.c:
    Add an explicit call to post the RUN_CONTEXT event.

compiler/layout_out.m:
    Add a missing output_layout_array_name call when writing out the
    threadscope string table array.

compiler/par_conj_gen.m:
    Conform to changes in runtime/mercury_threadscope.h
2011-05-24 04:16:48 +00:00
Paul Bone
52aea7f18b Undo a previous change to thread pinning.
I enabled thread pinning by default in a previous commit except where -P is
explicitly specified.  But this left the programmer with no way to specify
-P and have thread pinning.

This change reverts my previous change.

runtime/mercury_context.h:
runtime/mercury_context.c:
runtime/mercury_wrapper.c:
doc/user_guide.texi:
    As above.
2011-04-15 11:13:45 +00:00
Paul Bone
f1779bd1e8 Improve work stealing.
Spark deques have been associated with contexts so far.
This is a problem for the following reasons:

    The work stealing code must take a lock to access the resizeable array of
    work stealing deques.  This adds global contention that can be avoided if
    this array has a fixed size.

    If a context is blocked on a future then that engine cannot execute the
    sparks from that context.  Instead it tries to find global work, which is
    more expensive than necessary.

    If there are a few dozen contexts then there may be just as many work
    stealing queues to take work from.  The density of work in these queues
    will be higher if there are fewer of them, so work stealing will be more
    successful on average.

This change associates spark deques with Mercury Engines rather than Contexts
to avoid these problems.

This has invalidated some invariants that allowed the runtime system to make
some worthwhile optimisations.  These optimisations have been maintained.
Mercury's idle loop has been reimplemented to allow for this.  This
re-implementation has allowed for a number of other improvements:

    Polling was used to check for new global sparks.  This has been removed and
    each engine now sleeps using its own semaphore.

    Checks for work can be done in different orders depending on how an engine
    joins the idle loop.

    When global work becomes available a particular engine can be woken up
    rather than any arbitrary engine.  We take advantage of this when making
    contexts runnable, we try to schedule them on the engine that last executed
    them.

    When an engine is woken up it can be instructed with what it should do upon
    waking up.

    When an engine looks for a context to run, it will try to pick a context
    that was last executed on it.  This may avoid cache misses when the context
    begins to run.
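
The last improvement above can be sketched as a run-queue scan that prefers a
context whose advisory "last engine" matches the caller and otherwise falls back
to the head of the queue. The example_* types are hypothetical, locking is
omitted, and required (non-advisory) engine constraints are not modelled.

```c
#include <stddef.h>

typedef struct example_run_ctxt {
    int                      id;
    int                      last_engine;   // advisory: engine that last ran it
    struct example_run_ctxt *next;
} example_run_ctxt;

// Remove and return a context from the queue.  Prefer the first context
// that last ran on engine_id; if there is none, take the head.
// Returns NULL if the queue is empty.
example_run_ctxt *example_pick_context(example_run_ctxt **queue,
    int engine_id)
{
    example_run_ctxt **link = queue;

    while (*link != NULL && (*link)->last_engine != engine_id) {
        link = &(*link)->next;
    }
    if (*link == NULL) {
        link = queue;           // no match: fall back to the head
    }
    example_run_ctxt *picked = *link;
    if (picked != NULL) {
        *link = picked->next;   // unlink it from the queue
        picked->next = NULL;
    }
    return picked;
}
```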

In the future we should consider:
    Experiment with telling engines which context to run.

    Improve the selection of which engine work should be scheduled on to be
    hardware and memory-hierarchy aware.

Things that need doing next (probably next week):
    ./configure should check for POSIX semaphore support.

    Profiling times have been broken by this change, they will need fixing.

    The threadscope event log now breaks an invariant that the threadscope
    graphical tool requires.

    Semaphores are set up but never released; this is not a big problem but the
    manual page says that some implementations may leak resources.

runtime/mercury_context.h:
runtime/mercury_context.c:
    Remove the spark deque field from the MR_Context structure.

    Export the new array of spark deques so that other modules may fill in
    elements as engines are set up.

    Modify the resume_owner_thread field of the MR_Context structure.  This was
    used to ensure that a context returning through C code would be resumed on
    the engine with the correct C stack and depth.  This field is now an engine
    id and has been renamed to resume_owner_engine; it is advisory unless
    resume_engine_required is also set.  This way it is used to advise which
    engine most recently executed this context and therefore may have a warm
    cache.

    Remove code that dynamically resized the array of spark deques, including
    the lock that protected against updating this array while it was being read
    from other threads.

    Introduce code that initialises the statically sized array of spark deques.

    Reimplement the idle loop.  This replaces MR_runnext and MR_do_runnext with
    MR_idle and MR_do_idle respectively.  There are also two new entry points
    into the idle loop.  Which one to use depends on the state of the engine.

    Introduce new mechanisms for waking a particular engine.  For example the
    engine that last executed a context that is now runnable.

    Change the algorithm for selecting which context to run: try to select
    contexts that were last used on the current engine to avoid cache misses.

    Use an engine's victim counter rather than a global victim counter when
    trying to steal work.

    Introduce some conditionally-compiled code that can be used to profile how
    quickly new contexts can be created.

    Rename MR_init_thread_stuff and MR_finalize_thread_stuff.  The term thread
    has been replaced with context since they're in mercury_context.c.  This
    allows the creation of a new function MR_init_thread_stuff() in
    mercury_thread.c.  I also found the mismatch between the function names and
    file name confusing.  Move some of the code from MR_init_context_stuff to
    the new MR_init_thread_stuff function where it belongs.

    Refactor the thread pinning code so that even when thread pinning is
    disabled it can be used to allocate each thread to a CPU but not actually
    pin them.

    Fix some whitespace errors.

runtime/mercury_thread.h:
runtime/mercury_thread.c:
    In MR_init_engine():
        Allocate an engine id for each engine.

        A number of arrays had one slot per engine and were set up using a
        lock.  Now engine ids are used to index each array and setup is done
        without a lock; each engine simply sets up its own slot.

        Set up the new per-engine work stealing deques.

    The MR_all_engine_bases array has been moved to this file.

    Implement a new MR_init_thread_stuff function which initialises some global
    variables and locks.  Some of MR_init_thread_stuff has been moved from
    mercury_context.c

    Pin threads as part of MR_init_thread, excluding the primordial thread
    which must be pinned before threadscope is initialised.

    Add functions for debugging the use of semaphores.

    Add corresponding macros that can be used to redirect semaphore calls to
    debugging functions as above.

    Improved thread debugging code, ensured that stderr is flushed after every
    use, and that logging is done after calls return as well as before they're
    called.

    Conform to changes in mercury_context.h

runtime/mercury_engine.h:
runtime/mercury_engine.c:
    Add spark deque and victim counter fields to the MercuryEngine structure.

    Make the MR_eng_id field of the MercuryEngine structure available in all
    thread safe grades; formerly it was used only in threadscope grades.

    Move the MR_all_engine_bases variable to mercury_thread.[ch]

    Put a reference to the engine's spark queue into the global array.  This is
    done here, so that it is after thread pinning, because the original plan was
    to have this array sorted by CPU rather than engine.  We may yet do this in
    the future.

    Initialise an engine's spark deque when an engine is initialised.

    Set up the engine specific threadscope data in mercury_thread.c

    Conform to changes in mercury_context.h

runtime/mercury_wrapper.c:
    The engine base array is no longer set up here; that code has been moved to
    mercury_thread.c

    Conform to changes in mercury_context.h and mercury_thread.h

runtime/mercury_wsdeque.h:
runtime/mercury_wsdeque.c:
    The original implementation allocated an array for a spark queue only if
    one wasn't already allocated, which could happen when a context was reused.
    Now that spark queues are associated with engines, arrays are always
    allocated.

    Replaced two macros with a single macro since there's no longer a
    distinction between global and local work queues; all work queues are
    local.

runtime/mercury_wsdeque.c:
runtime/mercury_wsdeque.h:
    Remove the --worksteal-max-attempts and --worksteal-sleep-msecs options as
    they are no longer used.

runtime/mercury_threadscope.h:
runtime/mercury_threadscope.c:
    The MR_EngineId type has been moved to mercury_types.h

    Engine IDs are no longer allocated here; this is done in mercury_thread.c

    The run spark and steal spark messages now write 0xFFFFFFFF for the context
    id if there is no current context.  Previously this would dereference a
    null pointer.
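
    The null-pointer fix can be sketched as below.  This is an illustration
    under assumptions, not the code in mercury_threadscope.c: SketchContext,
    SKETCH_NO_CONTEXT_ID and sketch_event_context_id are hypothetical names;
    only the 0xFFFFFFFF sentinel comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel written to the event log when there is no current context. */
#define SKETCH_NO_CONTEXT_ID 0xFFFFFFFFu

/* Hypothetical stand-in for MR_Context; only the id matters here. */
typedef struct {
    uint32_t id;
} SketchContext;

/* Return the id to log, without dereferencing a null context pointer. */
uint32_t
sketch_event_context_id(const SketchContext *ctxt)
{
    return (ctxt != NULL) ? ctxt->id : SKETCH_NO_CONTEXT_ID;
}
```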

runtime/mercury_memory_zones.c:
    When checking for an existing memory zone check the free_zones_list
    variable before taking a lock.  This can prevent taking the lock in cases
    where there are no free zones.
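
    A hedged sketch of the "peek before locking" pattern described above.  All
    names (SketchZone, sketch_free_zones, sketch_take_free_zone,
    sketch_release_zone) are hypothetical, and the real runtime uses its own
    lock macros and list layout; this only shows the shape of the technique.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct SketchZone {
    struct SketchZone *next;
} SketchZone;

static _Atomic(SketchZone *) sketch_free_zones = NULL;
static pthread_mutex_t sketch_zones_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return a cached zone, or NULL if the caller must allocate a new one. */
SketchZone *
sketch_take_free_zone(void)
{
    SketchZone *zone;

    /* Cheap unlocked peek: an empty list costs no lock traffic at all. */
    if (atomic_load(&sketch_free_zones) == NULL) {
        return NULL;
    }

    /* The list may have changed since the peek, so re-check under the lock. */
    pthread_mutex_lock(&sketch_zones_lock);
    zone = atomic_load(&sketch_free_zones);
    if (zone != NULL) {
        atomic_store(&sketch_free_zones, zone->next);
    }
    pthread_mutex_unlock(&sketch_zones_lock);
    return zone;
}

/* Put a zone back on the free list; always done under the lock. */
void
sketch_release_zone(SketchZone *zone)
{
    pthread_mutex_lock(&sketch_zones_lock);
    zone->next = atomic_load(&sketch_free_zones);
    atomic_store(&sketch_free_zones, zone);
    pthread_mutex_unlock(&sketch_zones_lock);
}
```

    The unlocked peek may miss a zone that is released a moment later; the
    caller then simply allocates a fresh zone, which is safe.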

    Introduce some conditionally-compiled code that can be used to profile how
    quickly new contexts can be created.

runtime/mercury_bootstrap.h:
    Remove macros that no longer resolve to functions due to changes in the
    runtime system.

runtime/mercury_types.h:
    Move the MR_EngineId type from mercury_threadscope.h to mercury_types.h

runtime/mercury_grade.h:
    Introduce a parallel grade version number; this change breaks binary
    compatibility with existing parallel code.

runtime/mercury_backjump.c:
runtime/mercury_par_builtin.c:
runtime/mercury_mm_own_stacks.c:
library/stm_builtin.m:
library/thread.m:
library/thread.semaphore.m:
    Conform to changes in mercury_context.h.

library/io.m:
    Make this module compatible with MR_debug_threads.

doc/user_guide.texi:
    Remove the documentation for the --worksteal-max-attempts and
    --worksteal-sleep-msecs options.  The documentation was already commented
    out.
2011-04-13 13:19:42 +00:00
Paul Bone
5d5db67aad Adjust some default settings for automatic parallelism, and parallel runtime
options.

deep_profiler/mdprof_feedback.m:
    Enable parallelisation of dependent conjunctions by default.

    Enable the --report option by default.

    Increase the desired amount of parallelism from 4.0 to 8.0.

    Reduce the cost of waits from 250csc to 200csc.

    Reduce the clique cost threshold from 100,000csc to 2,000csc.

    Reduce the call site cost threshold from 50,000csc to 2,000csc.

runtime/mercury_context.h:
runtime/mercury_context.c:
runtime/mercury_wrapper.c:
doc/user_guide.texi:
    Enable thread pinning by default.

    Modify how the thread pinning option is handled.

runtime/mercury_wrapper.c:
    Change the default for --max-contexts-per-thread so that it is much more
    generous in stack segment grades.
2011-04-13 06:29:57 +00:00
Paul Bone
ca7878f01a Make improvements to stack segments code.
The main benefits of these changes are:

    Stack segments (and other memory zones) are cached when they are released
    and can be re-used.

    Some thread-safety fixes have been added.

    All stack segments on all stacks are now the same size:
        Small contexts (which had small stacks) aren't used with stack
        segments.

        The first segment on any stack is the same size as any other segment.

    The first segment on any stack no longer has a redzone.

    Hard zones on all memory zones have been set to the minimum of one page
    rather than one MR_unit which is usually two pages.

The caching of stack segments results in the following benchmark results.  The
benefit is negligible under normal circumstances, but becomes important when
small segment sizes are used.  Small segment sizes are common in
asm_fast.gc.par.stseg configurations as they reduce the memory required for
suspended contexts.

Non-segmented stack (32MB)
    asm_fast.gc                      average of 5 with ignore=1     18.16 (1.00)

With 512KB (normal) segments:
    asm_fast.gc.stseg and NO caching average of 5 with ignore=1     19.20 (1.06)
    asm_fast.gc.stseg WITH caching   average of 5 with ignore=1     19.16 (1.06)

With 4KB segments:
    asm_fast.gc.stseg and NO caching average of 5 with ignore=1     20.66 (1.14)
    asm_fast.gc.stseg WITH caching   average of 5 with ignore=1     19.66 (1.08)

Other changes include corrections in code comments, clearer function names and
a documentation fix.

runtime/mercury_memory_zones.h:
runtime/mercury_memory_zones.c:
    Re-write a lot of the code that managed the zone lists.  The old code did
    not re-use previously allocated but saved zones.  The changes ensure that
    MR_create_or_reuse_zone (formerly MR_create_zone) checks for a free zone
    of at least the required size before allocating a new one.  When zones are
    released they are put on the free list.
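
    A single-threaded sketch of the create-or-reuse idea, assuming a simple
    first-fit search.  The names below only echo the real function names; the
    SketchZone type, the list layout and the use of malloc are assumptions
    made for illustration, and the locking described elsewhere in this change
    is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct SketchZone {
    size_t size;                /* usable size of this zone in bytes */
    struct SketchZone *next;    /* link in the free list */
} SketchZone;

static SketchZone *sketch_free_list = NULL;

/* Reuse the first free zone that is big enough, else allocate a new one. */
SketchZone *
sketch_create_or_reuse_zone(size_t min_size)
{
    SketchZone **link;
    SketchZone *zone;

    for (link = &sketch_free_list; *link != NULL; link = &(*link)->next) {
        if ((*link)->size >= min_size) {
            zone = *link;
            *link = zone->next;     /* unlink from the free list */
            zone->next = NULL;
            return zone;
        }
    }

    zone = malloc(sizeof(*zone));
    if (zone != NULL) {
        zone->size = min_size;
        zone->next = NULL;
    }
    return zone;
}

/* Released zones are cached on the free list rather than freed. */
void
sketch_release_zone(SketchZone *zone)
{
    zone->next = sketch_free_list;
    sketch_free_list = zone;
}
```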

    As above, MR_create_zone is now MR_create_or_reuse_zone.

    MR_unget_zone is now MR_release_zone.

    MR_construct_zone has been removed; it was only ever called by
    MR_create_or_reuse_zone.  MR_create_or_reuse_zone now contains the code for
    MR_construct_zone.

    To avoid an unnecessary synchronisation in parallel code some zones are not
    added to the used list.  The only zones put on the used list are those that
    are useful to have on the used list because they have a non-default signal
    handler or a redzone.

    Updates to used_memory_zones now use a pthread mutex so that only one
    thread may be updating the list at once.  This lock is shared with the
    free_memory_zones structure.

    Updates to used_memory_zones now use memory barriers to guarantee that
    concurrent reads always read a consistent, but possibly incomplete,
    data structure.  This is necessary because it is read from a signal handler
    which cannot call pthread_mutex_lock().

    Rename MR_get_used_memory_zones() to MR_get_used_memory_zones_readonly()
    and document that the zone lists may be incomplete.

    Make the MR_zone_next field of the MR_MemoryZone_Struct structure volatile.

    Remove MAX_ZONES, it wasn't being used anywhere.

    Insert some calls to MR_debug_log_message to help with debugging.

    Use the correct printf integer length modifier for MR_Unsigned values.

    Rename MR_context_id_counter to zone_id_counter, protect it with a lock in
    HLC thread-safe grades and use atomic operations in LLC thread-safe
    grades.

    The offset at which we start using a memory zone is allocated in sequence
    from a table.  This table was protected by Mercury's global lock; this is
    now a CAS operation, which prevents deadlocks when using trail segment,
    parallel grades.
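
    A sketch of replacing a lock with a compare-and-swap loop when handing out
    offsets in sequence.  It uses C11 atomics rather than the runtime's own
    MR_ atomic macros, and the table contents and every sketch_ name are
    invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_NUM_OFFSETS 4

/* Hypothetical offset table; the real values are not given in this log. */
static const size_t sketch_offsets[SKETCH_NUM_OFFSETS] = { 0, 64, 128, 192 };
static atomic_size_t sketch_offset_counter = 0;

/* Hand out table entries in sequence, wrapping around, without a lock. */
size_t
sketch_next_offset(void)
{
    size_t old = atomic_load(&sketch_offset_counter);
    size_t new_index;

    do {
        new_index = (old + 1) % SKETCH_NUM_OFFSETS;
        /* On failure, old is reloaded with the current counter value. */
    } while (!atomic_compare_exchange_weak(&sketch_offset_counter,
        &old, new_index));
    return sketch_offsets[old];
}
```

    Because no lock is held, a thread that is interrupted or blocked here
    cannot prevent other threads from making progress.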

runtime/mercury_stacks.c:
    Conform to changes in mercury_memory_zones.c.

    Use MR_debug_log_message for printf-style debugging rather than printf.

runtime/mercury_wrapper.h:
runtime/mercury_wrapper.c:
    Remove support for the smaller sized stacks in grades with stack segments.

    Disable redzones when using stack segments.  The MR_(non)detstack_zone_size
    variables affect the first segment on every stack, regardless of the type
    of context that owns that stack.

    Conform to changes in runtime/mercury_memory_zones.h.

runtime/mercury_context.h:
runtime/mercury_context.c:
    Removed an extra declaration for MR_init_context_maybe_generator

    Small contexts are problematic since it's unclear to the programmer which
    computations will be executed on smaller contexts and therefore whether
    their stacks would overflow.

    Conform to changes in runtime/mercury_memory_zones.h.
    Conform to changes in runtime/mercury_wrapper.h.

runtime/mercury_memory.c:
    Adjust the definition of MR_unit.  It is now guaranteed to be a multiple of
    the page size which is required by its use in mercury_memory_zones.c

    Conform to changes in mercury_wrapper.h.

runtime/mercury_engine.c:
runtime/mercury_memory_handlers.c:
runtime/mercury_trail.c:
    Conform to changes in runtime/mercury_memory_zones.h.

runtime/mercury_memory_handlers.c:
    Use the correct printf integer length modifier for MR_Unsigned values.

runtime/mercury_misc.c:
    Print out the meaning of errno if it is nonzero in MR_fatal_error.

    Use the correct printf integer length modifier for MR_Unsigned values.

runtime/mercury_atomic_ops.h:
    Define MR_THREADSAFE_VOLATILE to expand to volatile when MR_THREADSAFE is
    defined.  Otherwise it expands to nothing.
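
    The macro described above presumably looks something like the following;
    the exact text in mercury_atomic_ops.h may differ.  The SketchNode type
    and sketch_list_length function are hypothetical and only show how such a
    qualifier might be applied to a link field.

```c
#include <assert.h>
#include <stddef.h>

/* Expands to volatile only when building a thread safe grade. */
#ifdef MR_THREADSAFE
  #define MR_THREADSAFE_VOLATILE volatile
#else
  #define MR_THREADSAFE_VOLATILE
#endif

/* Hypothetical list node whose link may be updated by another thread. */
typedef struct SketchNode {
    struct SketchNode * MR_THREADSAFE_VOLATILE next;
    int value;
} SketchNode;

/* Walk the list; in thread safe grades each next is re-read from memory. */
int
sketch_list_length(const SketchNode *node)
{
    int length = 0;

    while (node != NULL) {
        length++;
        node = node->next;
    }
    return length;
}
```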

    Make memory fence macros and atomic operations available in all thread safe
    grades, not just low level C grades.

doc/user_guide.texi:
    Corrected the default detstack size.
2011-04-05 10:27:26 +00:00
Paul Bone
322feaf217 Add more threadscope instrumentation.
This change introduces instrumentation that tracks sparks as well as parallel
conjunctions and their conjuncts.  This should hopefully give us more
information to diagnose runtime performance issues.

As of this date the ThreadScope program hasn't been updated to read or
understand these new events.

runtime/mercury_threadscope.[ch]:
    Added a function and types to register all the threadscope strings from an
    array.

    Add functions to post the new events (see below).

runtime/mercury_threadscope.c:
    Added support for 5 new threadscope events.
        Registering a string so that other messages may refer to a constant
        string.

        Marking the beginning and ends of parallel conjunctions.

        Creating a spark for a parallel conjunct.

        Finishing a parallel conjunct.

    Re-arranged event IDs.  I've started allocating IDs from 38 onwards for
    general purposes and 100 onwards for Mercury-specific events after talking
    with Duncan Coutts.

    Trimmed excess whitespace from the end of lines.

runtime/mercury_context.h:
    Post a beginning parallel conjunction message when the sync term for the
    parallel conjunction is initialized.

    Post an event when creating a spark for a parallel conjunction.

    Add a MR_spark_id field to the MR_Spark structure; these IDs identify
    sparks to threadscope.

runtime/mercury_context.c:
    Post threadscope messages when a spark is about to be executed.

    Post a threadscope event when a parallel conjunct is completed.

    Add a missing memory barrier.

runtime/mercury_wrapper.[ch]:
    Create a global function pointer for the code that registers strings in the
    threadscope string table; this is filled in by mkinit.

    Call this function pointer immediately after setting up threadscope.

runtime/mercury_wsdeque.[ch]:
    Modify MR_wsdeque_pop_bottom to return the spark pointer (which points onto
    the queue) rather than returning a result through a pointer plus a bool
    indicating whether the operation was successful.  This pointer is safe to
    dereference until MR_wsdeque_push_bottom is used.

runtime/mercury_wsdeque.c:
    Corrected a code comment.

runtime/mercury_engine.h:
    Documented some of the fields of the engine structure that hadn't been
    documented.

    Add a next spark ID field to the engine structure.

    Change the type of the engine ID field to MR_uint_least16_t

compiler/llds.m:
    Add a third field to the init_sync_term instruction that stores the index
    into the threadscope string table of the static conjunction ID.

    Add a field to the c_file structure containing the threadscope string
    table.

compiler/layout.m:
    Added a new layout array name for the threadscope string table.

compiler/layout_out.m:
    Implement code to write out the threadscope string table.

compiler/llds_out_file.m:
    Write out the threadscope string table when writing out the c_file.

compiler/par_conj_gen.m:
    Create strings that statically identify parallel conjunctions for each
    init_sync_term LLDS instruction.  These strings are added to a table in the
    !CodeInfo and the index of the string is added to the init_sync_term
    instruction.

    Add an extra instruction after a parallel conjunction to post the message
    that the parallel conjunction has completed.

compiler/global_data.m:
    Add fields to the global data structure to represent the threadscope string
    table and its current size.

    Add predicates to update and retrieve the table.

    Handle merging of threadscope string tables in global data by allowing the
    references to the strings to be remapped.

    Refactored remapping code so that a caller such as proc_gen only needs to
    call one remapping predicate after merging global data.

compiler/code_info.m:
    Add a table of strings for use with threadscope to the code_info_persistent
    type.

    Modify the code_info_init to initialise the threadscope string table fields.

    Add a predicate to get the string table and another to update it.

compiler/proc_gen.m:
    Build the containing goal map before code generation for procedures with
    parallel conjunctions in a parallel grade.  par_conj_gen.m depends on this.

    Conform to changes in code_info.m and global_data.m

compiler/llds_out_instr.m:
    Write out the extra parameter in the init_sync_term instruction.

compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_to_x86_64.m:
compiler/mercury_compile_llds_back_end.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/peephole.m:
compiler/reassign.m:
compiler/use_local_vars.m:
    Conform to changes in llds.m

compiler/opt_debug.m:
    Conform to changes in layout.m

compiler/mercury_compile_llds_back_end.m:
    Fix some trailing whitespace.

util/mkinit.c:
    Build an initialisation function that registers all the strings in
    threadscope string tables.

    Correct the layout of a comment.
2011-03-25 03:13:42 +00:00
Paul Bone
bf6a35f5ec Fix bugs 144 and 171.
Bug 144 is a pathological case where right-recursion is used in a parallel
conjunction and the conjuncts cannot be re-ordered.  This can cause excess
stack allocation and abysmal performance.  The --max-contexts-per-thread
runtime option is used to reduce the impact of these cases by reducing the
amount of parallelism gained at runtime.

Bug 171 is a simple case where the threadscope grade could not be compiled
without enabling the Boehm garbage collector.

runtime/mercury_threadscope.c:
    Enclose boehm GC specific code within #ifdef MR_BOEHM_GC

runtime/mercury_context.[ch]:
    Record the number of contexts running or suspended at any time in a new
    variable, MR_num_outstanding_contexts

    Remove counts of other in-use objects such as the sum of outstanding
    contexts and sparks.

    Remove two granularity control macros that haven't been used for some time.

compiler/granularity.m:
    Ensure that the runtime granularity decision is updated for when it is
    available.

library/par_builtin.m:
    Remove granularity decisions for which support has been removed in the
    runtime.

tests/par_conj/Mmakefile:
tests/par_conj/pathological_right_recursion.{m,exp}:
    Add a test case for bug 144.
2011-02-08 03:48:11 +00:00
Paul Bone
b9c889f881 Update granularity control to ensure that it works with the current runtime system.
Granularity control now uses the length of a context's run queue as the measure
of how busy the system is and whether it should fork off work.  It is now
configured at runtime rather than compile time and therefore the
--parallelism-target option has been removed from the compiler.

Running some simple tests shows that granularity control has little effect on
most programs.  The effect is probably negligible on programs that use few,
large grains of parallelism.  On programs that represent pathological cases,
such as parallel naive Fibonacci, granularity control has a significant effect.
Parallel Fibonacci runs roughly four times faster than sequential Fibonacci
on an eight core machine, but ten times slower if granularity control is
disabled.

Granularity control slightly improves the performance of very-dependant and
parallelism.  However the sequential versions of these programs are faster as
there is close to zero 'parallel overlap'.

These tests were informal; more formal testing is required, especially for
tuning.

compiler/granularity.m:
    Updated granularity control to use a new macro in the runtime to test if a
    new task should be spawned.

    Use a runtime option to tune runtime granularity rather than a compile time
    option.

    Mark the runtime test as thread safe to avoid locking - which is
    unnecessary.

compiler/options.m:
    Removed --parallelism-target compilation option.  Granularity control is
    now configured at run-time.

runtime/mercury_wrapper.h:
    Create two new global variables, MR_granularity_wsdeque_length and
    MR_granularity_wsdeque_length_factor.  MR_granularity_wsdeque_length is
    MR_granularity_wsdeque_length_factor * MR_num_threads.
    MR_granularity_wsdeque_length_factor and MR_engines are both configurable via
    the MERCURY_OPTIONS environment variable.

    This test calculates the length of the wsdeque each time.  A comment is
    provided to justify this design.

runtime/mercury_wrapper.c:
    Initialise MR_granularity_wsdeque_length during startup of the runtime.

    Parse the new runtime option --runtime-granularity-wsdeque-length-factor.
    The default value for this option is 8; this has been chosen somewhat
    arbitrarily.  In the future we should test the effects of different values
    of this option.

runtime/mercury_context.h:
    Implement a new granularity control test that is linked to the length of a
    local thread's run queue.  The test compares the length of the queue to
    MR_granularity_wsdeque_length.
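
    A sketch of the granularity test under stated assumptions.  The sketch_
    variables stand in for the MR_ globals named above, the thread count of 4
    is arbitrary, and the strict "less than" comparison is a guess at the
    direction of the test: spawn only while the local queue is short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the runtime globals. */
static size_t sketch_num_threads = 4;               /* arbitrary example */
static size_t sketch_wsdeque_length_factor = 8;     /* documented default */
static size_t sketch_wsdeque_length_limit = 0;

/* Recompute the limit; must be redone if the thread count is re-detected. */
void
sketch_init_granularity(void)
{
    sketch_wsdeque_length_limit =
        sketch_wsdeque_length_factor * sketch_num_threads;
}

/* Spawn parallel work only while the local deque is below the limit. */
bool
sketch_should_spawn(size_t local_deque_length)
{
    return local_deque_length < sketch_wsdeque_length_limit;
}
```

    With these example values the limit is 32, so a deque holding 31 sparks
    still accepts new parallel work and one holding 32 does not.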

runtime/mercury_context.c:
    Re-initialise MR_granularity_wsdeque_length after auto-detection of
    MR_num_threads.

runtime/mercury_wsdeque.h:
    Provide a new inline function to get the length of a wsdeque.

doc/user_guide.texi:
    Updated documentation to reflect changes to compiler and runtime options.

    The new runtime option's documentation is commented out; it is intended for
    developers who understand its operational semantics.
2010-10-07 23:38:45 +00:00
Paul Bone
edc230406e Fix a number of errors and warnings in the runtime picked up by GCC 4.x in
parallel and threadscope grades.

We had been using types with the wrong signedness when calling atomic operations.
GCC 4.x also picked up an error where #elif was used instead of #else.

While testing these changes on a 32bit system more bugs were found on the i386
architecture and on AMD brand processors.

runtime/mercury_atomic_ops.h:
runtime/mercury_atomic_ops.c:
    Add unsigned variants of the following atomic operations:
        increment,
        add,
        add_and_fetch,
        dec_and_is_zero,

    Add a signed variant for compare and swap.

    Rename the MR_atomic_dec_<type>_and_is_zero operation to move the type to
    the end of the name.

    Use volatile storage in the MR_Stats structure.

    A 32bit machine cannot do atomic operations on 64bit values and MR_Stats
    must use 64bit values.  Therefore 64bit values in the MR_Stats structure
    are now protected by a lock on 32bit machines.

runtime/mercury_atomic_ops.h:
    Fix a typo in the i386 version of MR_atomic_dec_and_is_zero_uint().

runtime/mercury_atomic_ops.c:
    AMD CPUs do not conform to Intel's specification for being able to
    extract the CPU clock speed from the brand string.  When we cannot
    determine the CPU's clock speed then we write out threadscope
    timestamps in raw clock cycles rather than nanoseconds.

    On i386 machines the ebx register is used to implement PIC code,
    however the CPUID instruction uses it to output information.  Save
    this register on C's stack while we issue CPUID and retrieve the
    result in ebx.

    We now pass native machine sized values to the inline assembler code
    that implements RDTSC and RDTSCP.

    Fix commenting style in some places.

runtime/mercury_atomic_ops.c:
    Fix some incorrect C preprocessor code for conditional compilation.

runtime/mercury_grade.h:
    Increment binary compatibility number.  This should have been done in a
    prior change when the MR_runnext macro changed which broke binary
    compatibility in the parallel low-level C grades.

runtime/mercury_context.h:
    In MR_SyncTerm_Struct use an unsigned value for the number of conjuncts
    remaining before the conjunction is complete.

runtime/mercury_threadscope.c:
    Record raw cpu clock ticks rather than milliseconds when we don't
    know the processor's clock speed.

runtime/mercury_context.c:
runtime/mercury_wsdeque.h:
runtime/mercury_wsdeque.c:
    Conform to changes in mercury_atomic_ops.h
2010-03-20 10:15:51 +00:00
Paul Bone
83a6f14708 Create a threadscope grade component.
Threadscope grades are enabled by using the grade component 'threadscope'.
They are supported only with low-level C parallel grades.  Support for
threadscope in high level C grades is intended in the future but does not work
now.

runtime/mercury_conf_param.h:
    Create the MR_THREADSCOPE macro that is defined if the grade is a
    threadscope grade.

    Define MR_PROFILE_FOR_PARALLEL_EXECUTION if MR_THREADSCOPE is defined.

    Emit an error if MR_LL_PARALLEL_CONJ is defined before it is implied by
    MR_THREADSAFE and ! MR_HIGHLEVEL_CODE

runtime/mercury_grade.h:
    Update the grade symbol for the threadscope grade component.

runtime/mercury_atomic_ops.c:
runtime/mercury_atomic_ops.h:
runtime/mercury_context.c:
runtime/mercury_context.h:
runtime/mercury_engine.c:
runtime/mercury_engine.h:
runtime/mercury_thread.c:
runtime/mercury_threadscope.c:
runtime/mercury_threadscope.h:
runtime/mercury_wrapper.c:
    Now that MR_PROFILE_FOR_IMPLICIT_PARALLELISM is implied by MR_THREADSAFE we
    don't need to test for MR_THREADSAFE when we test for
    MR_PROFILE_FOR_IMPLICIT_PARALLELISM.  The same is true for
    MR_LL_PARALLEL_CONJ which is implied by MR_THREADSAFE &&
    !MR_HIGHLEVEL_CODE.

    Replace some occurrences of MR_PROFILE_FOR_IMPLICIT_PARALLELISM with
    MR_THREADSCOPE where the conditionally compiled code is used to support
    threadscope profiling.

scripts/init_grade_options.sh-subr:
scripts/canonical_grade.sh-subr:
scripts/parse_grade_options.sh-subr:
scripts/final_grade_options.sh-subr:
scripts/mgnuc.in:
compiler/handle_options.m:
compiler/options.m:
compiler/compile_target_code.m:
configure.in:
    Add support for the new grade component.

    Pass -DMR_THREADSCOPE to the C compiler when using a threadscope grade.

    Add assertions to ensure that the 'threadscope' grade component is used
    only with the 'par' grade component.

doc/user_guide.texi:
    Added commented-out documentation for the threadscope grade component.

    Adjusted documentation of the --profile-parallel-execution runtime option
    to describe the correct prerequisite compile time options.

    Added my name to the authors list.

runtime/mercury_context.c:
    Corrected grammar and prose in comments in the MR_do_join_and_continue code.
2010-01-10 04:53:40 +00:00
Paul Bone
5cfd73644a Implement work stealing.
This patch is heavily based on earlier, uncommitted work by Peter Wang.  It
has been updated so that it applies against the current version of the source.
A number of other changes have been made.  Peter's original ChangeLog
follows:

	Implement work stealing for parallel conjunctions.  This builds on an
	older patch which introduced work-stealing deques to the runtime but
	didn't perform work stealing.

	Previously when we came across a parallel conjunct, we would place a spark
	into either the _global spark queue_ or the _local spark stack_ of the
	Mercury context.  A spark on the global spark queue may be picked up for
	parallel execution by an idle Mercury engine, whereas a spark on a local
	spark stack is confined to execution in the context that originated it.

	The problem is that we have to decide, ahead of time, where to put a
	spark.  Ideally, we should have just enough sparks in the global queue to
	keep the available Mercury engines busy, and leave the rest of the sparks
	to execute in their original contexts since that is more efficient.  But
	we can't predict the future so have to make do with guesses using simple
	heuristics.  A bad decision, once made, cannot be reversed.  An engine may
	sit idle due to an empty global spark queue, even while there are sparks
	available in some local spark stacks.

	In the work stealing scheme, sparks are always placed into each context's
	_local spark deque_.  Idle engines actively try to steal sparks from
	random spark deques.  We don't need to make irreversible and potentially
	suboptimal decisions about where to put sparks.  Making a spark available
	for parallel execution is cheap and happens by default because of the
	work-stealing deques; putting a spark on a global queue implies
	synchronisation with other threads.  The downside is that idle engines
	need to expend more time and effort to find the work from multiple places
	instead of just one place.

	Practically, the new scheme seems to work as well as the old scheme and
	vice versa, except that the old scheme often required
	`--max-contexts-per-thread' to be set "correctly" to get good results.

	Only tested on x86-64, which has a relatively constrained memory model.

My modifications include:

	The difference between 'shared' and 'private' synchronisation terms has
	been removed.  All sync terms are assumed to be shared and thread-safe
	operations are used everywhere.  This allows us to remove complicated code
	used when a private synchronisation term became shared.  This may change
	the performance of thread stealing, in particular it may become slower due
	to the assumption that all sync terms are shared and therefore atomic
	operations must always be used when decrementing their count field.

	I've re-factored MR_do_join_and_continue.  It is now much simpler as the
	conditional code in it enumerates the possible cases clearly.

This change bootchecks and successfully runs the test suite in asm_fast.gc
asm_fast.gc.par hlc.gc and hlc.par; no other grades were tested.  I have not
yet tested performance.

runtime/mercury_context.c:
runtime/mercury_context.h:
	Keep pointers to all spark deques in a flat array, so we have access
    to them for stealing.

	Added functions to manage the global array of spark deques.

	Modify MR_do_runnext, it now attempts to steal work from other contexts'
	spark queues.  Threads sleeping on the condition variable in
	MR_do_runnext now use a timed wait so they can wake up and try to steal
	sparks.

	Re-factored MR_do_join_and_continue.

	MR_num_idle_engines is used by atomic operations; it has been made an
	MR_Integer so that its size matches the expectations of the atomic
	operations we have defined.

	Modified the MR_SyncTerm and MR_Spark structures.  Sparks now point to
	their sync terms.  The parent stack pointer has been moved into the
	SyncTerm structure.  The MR_st_is_shared field in the MR_SyncTerm
	structure has been removed.

runtime/mercury_atomic_ops.c:
runtime/mercury_atomic_ops.h:
	Implement a new atomic operation: decrement integer and is zero.  On the
	x86/x86_64 one can't atomically decrement an integer and fetch the result
	in a single instruction, a loop with a 'compare and exchange' instruction
	is necessary.  However since we only want to test if the value has become
	zero after the decrement we can use the processor's flags.  This can be
	done in two instructions, but more importantly a loop is not required and
	only one instruction is atomic.
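
	The change above implements this with inline assembler and the
	processor's flags.  As a portable illustration only, the same
	"decrement and test for zero" contract can be sketched with C11
	atomics, which is a substituted technique and not the runtime's code.
	The function name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
** Atomically decrement *addr and report whether it reached zero.
** atomic_fetch_sub returns the value held before the decrement, so the
** counter has just become zero exactly when that old value was 1.
*/
bool
sketch_atomic_dec_and_is_zero(atomic_int *addr)
{
    return atomic_fetch_sub(addr, 1) == 1;
}
```

	A join barrier can use this so that exactly one of the finishing
	conjuncts, the last one, sees true and continues after the
	conjunction.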

runtime/mercury_wrapper.c:
runtime/mercury_wrapper.h:
	Added runtime tunable options for work stealing.  These control the number
	of attempts an idle engine will make when looking for work, and the
	duration to sleep after failing to find any work.

runtime/mercury_thread.c:
runtime/mercury_thread.h:
	Added MR_COND_TIMED_WAIT, which waits on condition variables like
	MR_COND_WAIT except that it may time out.

runtime/mercury_wsdeque.h:
runtime/mercury_wsdeque.c:
	MR_wsdeque_pop_bottom now uses its second argument to return the code
	address to jump to rather than the whole spark.

runtime/mercury_conf.h.in:
configure.in:
	Test for sched_yield()

	Change the synchronisation term structure.

doc/user_guide.texi:
    Add commented out documentation for two new tunable parameters,
    `--worksteal-max-attempts' and `--worksteal-sleep-msecs'.
    Implementors may want to experiment with different values but end
    users shouldn't need to know about them.
2009-12-15 02:29:07 +00:00
Paul Bone
92afa23af5 Support for threadscope profiling of the parallel runtime.
This change adds support for threadscope profiling of the parallel runtime in
low level C grades.  It can be enabled by compiling _all_ code with the
MR_PROFILE_PARALLEL_EXECUTION_SUPPORT C macro defined.  The runtime, libraries
and applications must all have this flag defined as it alters the MercuryEngine
and MR_Context structures.

See Don Jones Jr, Simon Marlow, Satnam Singh - Parallel Performance Tuning for
Haskell.

This change also includes:

    Smarter thread pinning (the primordial thread is pinned to the CPU that
    it is currently running on).

    The addition of callbacks from the Boehm GC to notify the runtime of
    stop the world garbage collections.

    Implement some userspace spin loops and conditions.  These are cheaper than
    their POSIX equivalents, do not support sleeping, and are signal handler
    safe.
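
A userspace lock of the kind listed above can be sketched with a C11
`atomic_flag`. The type and function names are hypothetical and the runtime's
own implementation may differ; the sketch only shows why such a lock never
sleeps and makes no system calls.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical type; the runtime's own lock type is not shown in this log. */
typedef struct {
    atomic_flag held;
} ExampleSpinLock;

#define EXAMPLE_SPIN_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; returns true on success. */
static bool
example_spin_try_lock(ExampleSpinLock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->held,
        memory_order_acquire);
}

/* Busy-wait until the lock is free.  The calling thread never sleeps. */
static void
example_spin_lock(ExampleSpinLock *lock)
{
    while (!example_spin_try_lock(lock)) {
        /* spin */
    }
}

/* Release the lock so that another spinning thread can take it. */
static void
example_spin_unlock(ExampleSpinLock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```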

boehm_gc/alloc.h:
boehm_gc/alloc.c:
    Declare and define the new callback functions.

boehm_gc/alloc.c:
    Call the start and stop collect callbacks when we start and stop a
    stop-the-world collection.

    Correct how we record the time spent collecting; it now includes
    collections that stop prematurely.

boehm_gc/pthread_stop_world.c:
    Call the pause and resume thread callbacks in each thread where the GC
    arranges for that thread to be stopped during a stop-the-world collection.

runtime/mercury_threadscope.c:
runtime/mercury_threadscope.h:
    New files implementing the threadscope support.

runtime/mercury_atomic_ops.c:
runtime/mercury_atomic_ops.h:
    Rename MR_configure_profiling_timers to MR_do_cpu_feature_detection.

    Add a new function MR_read_cpu_tsc() to read the TSC register from the CPU;
    this simply abstracts the static MR_rdtsc function.

runtime/mercury_atomic_ops.h:
    Modify the C inline assembler to ensure we tell the C compiler that the
    value in the register mapped to the 'old' parameter is also an output from
    the instructions.  That is, the C compiler must not depend on the value of
    'old' being the same before and after the instruction is executed.  This
    has never been a problem in practice though.

    Implement some cheap userspace mutual exclusion locks and condition
    variables.  These will be faster than pthread's mutexes when critical
    sections are short and threads are pinned to separate CPUs.

runtime/mercury_context.c:
runtime/mercury_context.h:
    Add a new function for pinning the primordial thread.  If the OS supports
    sched_getcpu we use it to determine which CPU the primordial thread should
    use.  No other thread will be pinned to this CPU.

    Add a numeric id field to each context, this id is uniquely assigned and
    identifies each context for threadscope.

    MR_schedule_context posts the 'context runnable' threadscope event.

    MR_do_runnext has been modified to destroy engines differently: it ensures
    that they clean up properly, so that their threadscope events are flushed,
    and then calls pthread_exit(0).

    MR_do_runnext posts events for threadscope.

    MR_do_join_and_continue posts events for threadscope.

runtime/mercury_engine.h:
    Add new fields to the MercuryEngine structure including a buffer of
    threadscope events, a clock offset (used to synchronize the TSC clocks) and
    a unique identifier for the engine.

runtime/mercury_engine.c:
    Call MR_threadscope_setup_engine() and MR_threadscope_finalize_engine for
    newly created and about-to-be-destroyed engines.

    When the main context finishes on a thread that's not the primordial
    thread, post a 'context is yielding' message before re-scheduling the
    context on the primordial thread.

runtime/mercury_thread.c:
    Added an XXX comment about a potential problem, it's only relevant for
    programs using thread.spawn.

    Added calls to the TSC synchronisation code used for threadscope profiling.
    This appears to be unnecessary on modern x86 machines, so the calls have
    been commented out.

    Post a threadscope event when we create a new context.

    Don't call pthread_exit in MR_destroy_thread, we now do this in
    MR_do_runnext so that we can unlock the runqueue mutex after cleaning up.

runtime/mercury_wrapper.c:
    Conform to changes in mercury_atomic_ops.[ch]

    Post an event immediately before calling main to mark the beginning of the
    program in the threadscope profile.

    Post a "context finished" event at the end of the program.

    Wait until all engines have exited before cleaning up global data, this is
    important for finishing writing the threadscope data file.

configure.in:
runtime/mercury_conf.h.in:
    Test for the sched_getcpu C function and utmpx.h header file, these are
    used for thread pinning.

runtime/Mmakefile:
    Include the mercury_threadscope.[hc] files in the list of runtime headers
    and sources respectively.
2009-12-03 05:28:00 +00:00
Paul Bone
6807e11661 Re-factor the MR_join_and_continue macro.
This change replaces the MR_join_and_continue macro with a C procedure.  A
smaller macro named MR_join_and_continue wraps the new C procedure and provides
a trampoline to prevent C stack leaks.  MR_join_and_continue will now have the
additional cost of a C procedure call rather than always being inlined.  This
code is only used in the implementation of parallel conjunctions in the low
level C grades, it does not affect other grades.
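
The trampoline idea mentioned above can be illustrated in plain C: each step
returns the address of the next step instead of calling it, and a small driver
loop performs the jump, so the C stack does not grow. All names below are
hypothetical and this is a generic sketch, not the MR_join_and_continue macro.

```c
#include <stddef.h>

/* Hypothetical types: a step returns the next step to run, or NULL to stop. */
typedef struct ExampleNext ExampleNext;
typedef ExampleNext (*ExampleCode)(int *counter);
struct ExampleNext {
    ExampleCode code;
};

/* One step: decrement the counter and ask to be run again until it is zero. */
static ExampleNext
example_count_down(int *counter)
{
    ExampleNext next;

    *counter -= 1;
    next.code = (*counter > 0) ? example_count_down : NULL;
    return next;
}

/* The trampoline: every step returns here before the next one starts. */
static void
example_trampoline(ExampleCode start, int *counter)
{
    ExampleCode code = start;

    while (code != NULL) {
        code = code(counter).code;
    }
}
```

Running `example_trampoline(example_count_down, &n)` with a large `n` uses
constant stack depth, which is the property the wrapper macro is described as
providing.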

An earlier revision of this code was causing deadlocks.  To debug them, support
was added to the MR_SIGNAL, MR_BROADCAST and MR_WAIT macros for better logging
of the use of condition variables when MR_DEBUG_THREADS is defined at compile
time.

This change passes bootcheck and the test suite in the asm_fast.gc.par grade.

runtime/mercury_context.h:
runtime/mercury_context.c:
    Created MR_do_join_and_continue procedure from old MR_join_and_continue
    macro.
    Added additional comments to this procedure, describing how it works.
    Created a new macro MR_join_and_continue that wraps the new procedure.
    Conform to changes in the MR_WAIT, MR_SIGNAL and MR_BROADCAST macros.

runtime/mercury_thread.h:
    Added a from parameter to the MR_WAIT, MR_SIGNAL and MR_BROADCAST macros.
    Added a from parameter to the C procedures' declarations that implement the
    debugging versions of the condition operations above.
    Adjusted the formatting of these declarations to match the C style used in
    the project.

runtime/mercury_thread.c:
    The C procedures implementing the debugging versions of the condition
    operations now print out their from parameter.
    MR_cond_broadcast now uses "broadcast" in its log message rather than
    "signal".
    MR_cond_wait's log message now more clearly specifies which argument is the
    lock and which is the condition variable.
2009-11-27 03:51:20 +00:00
Paul Bone
8c51e02c6b Remove broken parallel execution profiling code.
Parallel execution profiling can be enabled by defining
MR_PROFILE_PARALLEL_EXECUTION_SUPPORT when building the runtime, libraries and
applications.  This collects information about the performance of various
parallel execution operations such as the management of contexts and sparks.

In the case of sparks this was broken.  It had gone unnoticed since it was only
compiled in if the MR_join_and_continue macro was evaluated while
MR_PROFILE_PARALLEL_EXECUTION_SUPPORT was defined.  The C compiler would be
unable to compile any program making use of parallel conjunctions.

This commit removes the broken code, I plan to fix it and re-add it later.

runtime/mercury_context.h:
    As above.
2009-11-17 06:30:26 +00:00
Paul Bone
db9a526d6b Corrections in response to review comments on recent parallel runtime changes.
Made thread pinning off by default; the operating system should handle this
unless we have a good reason to do it ourselves.

configure.in:
    Removed newly added declaration checking code.

doc/user_guide.texi:
    Documentation corrections.
    Adjusted the --pin-threads runtime option default.

runtime/mercury_atomic_ops.c:
runtime/mercury_atomic_ops.h:
    Use __x86_64__ instead of __amd64__
    Altered comments at the beginning of sections of the file to better
    describe the contents of that section.
    Placed comments at the end of long conditional compilation blocks that
    match the condition at the beginning of the block.

runtime/mercury_conf.h.in:
    Added editor hint for vim at the top of the file.
    Remove newly added declarations section.

runtime/mercury_context.c:
    Adjusted default behaviour of --pin-threads
    Fixed some style issues.

runtime/mercury_context.h:
    Fixed grammatical error.

runtime/mercury_wrapper.c:
    Fixed grammatical error.
    Fixed a missing break statement in a switch statement.
2009-11-05 05:47:40 +00:00
Paul Bone
d5d4457463 Parallel runtime thread pinning.
This change introduces two new features in the Mercury runtime:
pinning of threads to CPU cores/threads and runtime detection of the number of
CPU cores/threads available.

If MR_num_threads has not been specified in the runtime options with the -P
flag we use the sysconf(_SC_NPROCESSORS_ONLN) call if available to detect the
number of CPUs online and set MR_num_threads accordingly.  As before, this
defaults to 1.
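
The detection step can be sketched as follows. The helper name
`example_num_threads` and its `requested` parameter are hypothetical; the
sketch assumes a POSIX system and does not show the sched_setaffinity() call
used for pinning.

```c
#include <unistd.h>

/*
 * Hypothetical helper: choose the number of worker threads.  A positive
 * `requested` value (as if given with -P) wins; otherwise ask the OS how
 * many CPUs are online, falling back to 1 if that cannot be determined.
 */
static long
example_num_threads(long requested)
{
    long detected = -1;

    if (requested > 0) {
        return requested;
    }
#ifdef _SC_NPROCESSORS_ONLN
    detected = sysconf(_SC_NPROCESSORS_ONLN);
#endif
    return (detected > 0) ? detected : 1;
}
```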

Thread pinning is enabled if the runtime was able to detect the number of CPUs
on the machine or the user specifically requests thread pinning with the
--thread-pinning runtime option.  The sched_setaffinity() call is used to pin
each thread to a specific CPU.

I believe that in some cases thread pinning can achieve better performance,
this is yet to be determined and it may depend on the machine's architecture.
It does make profiling of the runtime system more reliable where the RDTSCP
instruction is not available, by ensuring that a thread is not migrated to a
different CPU between samples of the CPU's TSC.

configure.in:
runtime/mercury_conf.h.in:
	Detect the presence of sched.h, sysconf(), sched_setaffinity() and
	_SC_NPROCESSORS_ONLN.

doc/user_guide.texi:
	Document the new --thread-pinning runtime option.
	Adjust the documentation of -P to reflect the new behaviour.

runtime/mercury_context.c:
	Add the MR_pin_thread() function.
	Create a new global MR_bool MR_pin_threads;
	Add the calculation of the number of threads to use to
	MR_init_thread_stuff()
	Correct a bug in a format string in my previous patch.

runtime/mercury_context.h:
	Export the new MR_pin_thread() function.
	Export the new MR_pin_threads global.
	Correct a previous spelling mistake.
	Adjust the documentation of MR_init_thread_stuff to reflect the new
	behaviour.

runtime/mercury_wrapper.c:
	Pin the primordial thread to a CPU after it spawns the other threads.
	Add the --thread-pinning runtime configuration option.
	Move the calculation of MR_max_outstanding_contexts until after
	MR_init_thread_stuff() so that it is calculated after the number of CPUs
	available has been determined.
	Add a pause instruction to a spinloop for better behaviour on later
	i386/x86_64 processors.  See the documentation for MR_ATOMIC_PAUSE.

runtime/mercury_thread.c:
	After a thread is spawned, call MR_pin_thread() to pin the thread to a CPU
	if the thread has been created to pick up work from the global work queue.
2009-08-23 22:52:35 +00:00
Paul Bone
b04af52232 Fix some overlong lines and other formatting issues in my previous patch.
runtime/mercury_context.c:
runtime/mercury_context.h:
runtime/mercury_atomic_ops.h:
    As above.
2009-08-16 10:47:55 +00:00
Paul Bone
4f1bfc2ebc Parallel runtime profiling improvements.
Improve the profiling of the parallel runtime code in two main ways:
	+ Record data for more events.
	+ Record high-precision timing data on x86 machines via the TSC where
	  access to the TSC is available.

Access to the TSC is available via two machine instructions: RDTSC (read
TSC) and RDTSCP (read TSC and processor ID).  We prefer the latter as a
process migrated between two calls to RDTSC may cause an incorrect time
duration to be calculated (since TSC counts are seldom synchronized).  We
fall back to RDTSC when RDTSCP is not available and gracefully record no
timing information when neither is available.  Availability is detected via
the CPUID instruction, see MR_configure_profiling_timers().

runtime/mercury_context.c:
runtime/mercury_context.h:
	Runtime profiling changes as above.

runtime/mercury_atomic_ops.c:
runtime/mercury_atomic_ops.h:
	Add runtime profiling timing code.
	Add new add and subtract atomic operations.

runtime/mercury_wrapper.c:
	Call the new MR_configure_profiling_timers() procedure to detect the CPU
	and configure access to the TSC.

Mmakefile:
runtime/Mmakefile:
	'mmake tags' at the top level now builds the tags file for the runtime
	directory.
	The tags target in the runtime directory is now marked as PHONY so it is
	generated even if it already exists.
2009-08-16 10:18:36 +00:00
Paul Bone
f57e7efbd8 Introduce some parallel execution profiling code in the runtime system.
This change adds profiling code that can be enabled with the
MR_PROFILE_PARALLEL_EXECUTION C macro and --profile-parallel-execution runtime
option.  Currently it records some very simple information about the global
scheduling of sparks.

runtime/mercury_context.c:
runtime/mercury_context.h:
	Introduce profiling of parallel execution.
	Record the number of sparks that are executed from the global spark queue.
	Record the number of contexts created in order to execute sparks from the
	global queue.
	Rename the MR_finalize_runqueue function to MR_finalize_thread_stuff and
	have it write out the profiling information when profiling is enabled.

runtime/mercury_wrapper.c:
	Parse the new --profile-parallel-execution runtime option and use it to
	enable profiling of parallel execution when support is compiled-in.
	Call MR_finalize_thread_stuff() when finalizing the runtime system.  The
	former function MR_finalize_runqueue was not being called anywhere.

runtime/mercury_bootstrap.h:
	Remove the unused finalize_runqueue() macro.

doc/user_guide.texi:
	Document the --max-contexts-per-thread runtime option.
	Document and comment-out the new --profile-parallel-execution runtime
	option.
2009-07-13 05:27:12 +00:00
Paul Bone
4d41cf6c23 Rename the runtime granularity control macros, variables and predicates.
Estimated hours taken: 3
Branches: main

Rename the runtime granularity control macros, variables and predicates.

Names of the runtime granularity control macros, variables and predicates are
now more descriptive and more consistent.

An alternative runtime granularity control predicate and macro is now
available; it considers the number of contexts and all sparks, whereas the
original predicate and macro considers only the number of contexts and sparks
on the global queue.

A new predicate has been added to determine the number of worker threads that
the mercury runtime is configured to use.


library/par_builtin.m:
	Renamed predicates.
	Conform to changes in runtime/mercury_thread.h
	Added the new predicates.
	Removed some old foreign procedure attributes.
	Addressed an XXX comment left by Zoltan.

runtime/mercury_context.c:
runtime/mercury_context.h:
	Rename existing runtime granularity control variables and macros.
	Add new runtime granularity control variable and macro.

runtime/mercury_wrapper.c:
runtime/mercury_wrapper.h:
	Export MR_num_threads variable.
	Make this variable an MR_Unsigned.

runtime/mercury_atomic_ops.c:
runtime/mercury_atomic_ops.h:
	Introduce new atomic increment and decrement instructions.  These are used
	to count the number of local sparks created which is done outside of a
	critical section.

library/Mmakefile:
	Rebuild the par_builtin module when either runtime/mercury_context.h or
	runtime/mercury_thread.h change.

compiler/granularity.m:
	Conform to changes in runtime/mercury_context.h
2009-06-17 03:26:00 +00:00
Zoltan Somogyi
541059b691 Fix spelling error in Paul's change, and fix some old bad indentation.
Estimated hours taken: 0.1
Branches: main

runtime/mercury_context.[ch]:
library/par_builtin.m:
	Fix spelling error in Paul's change, and fix some old bad indentation.
2009-06-04 08:07:10 +00:00
Paul Bone
d1719ad812 Provide access to runtime granularity control primitives.
Estimated hours taken: 10
Branches: main

Provide access to runtime granularity control primitives.

In 2007 Zoltan implemented some runtime granularity control primitives in the
runtime system.  This change provides access to them from procedures in the
par_builtin module, this is useful for testing.  Recording of parallelization
decisions has also been implemented to help verify and debug this approach.
The variables that the runtime granularity control method depends upon have
been made volatile.

The runtime granularity control method is:
	 NumCPUs > (executable contexts + sparks on global queue).
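
Written as C, the test above is a single comparison. The function and
parameter names are hypothetical, and reading a true result as "go parallel"
is an assumption based on the description rather than on the runtime source.

```c
#include <stdbool.h>

/*
 * Hypothetical helper restating the test above: returns true while there
 * are more CPUs than pieces of work that are already available to run.
 */
static bool
example_should_parallelise(unsigned long num_cpus,
    unsigned long executable_contexts, unsigned long global_sparks)
{
    return num_cpus > executable_contexts + global_sparks;
}
```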

I previously thought that this was incorrect, but it was my testing code that
was broken; with that fixed, this works correctly.

library/par_builtin.m:
	Define a procedure as above so that it can be called explicitly from code;
	this is useful for testing.

runtime/mercury_context.c:
runtime/mercury_context.h:
	Make the variables used by the granularity control method volatile.
	Provide debugging code enabled by a C macro.
2009-06-04 06:42:13 +00:00
Julien Fischer
9c98a60bd5 Add a mechanism for dynamically growing the trail by adding new segments to it
Estimated hours taken: 20
Branches: main

Add a mechanism for dynamically growing the trail by adding new segments to it
in a similar fashion to what we do for the stacks with stack segments.  The
mechanism is enabled by the trseg (trail segments) grade component.  Unlike
stack segments the trail segment mechanism also works with the high-level C
backend.

The mechanism works by adding a test to MR_trail_{value,function} that checks
if we are about to run out of trail and allocates a new trail segment if
that test succeeds.
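
The check-then-grow pattern described above can be sketched with a small
segmented array. The types, the tiny segment size and the function names are
all hypothetical; the runtime's trail entries and memory zones are more
involved than this.

```c
#include <stddef.h>
#include <stdlib.h>

#define EXAMPLE_SEGMENT_ENTRIES 4   /* tiny, to make growth easy to see */

/* Hypothetical types; the runtime's trail entries and zones differ. */
typedef struct ExampleSegment {
    long                    entries[EXAMPLE_SEGMENT_ENTRIES];
    struct ExampleSegment   *prev;      /* previous (full) segment */
} ExampleSegment;

typedef struct {
    ExampleSegment  *current;           /* segment being filled */
    size_t          next;               /* next free slot in current */
    size_t          num_segments;
} ExampleTrail;

/* Start with no segments; the first push allocates one. */
static void
example_trail_init(ExampleTrail *trail)
{
    trail->current = NULL;
    trail->next = EXAMPLE_SEGMENT_ENTRIES;
    trail->num_segments = 0;
}

/* Add an entry, allocating a new segment first if the current one is full. */
static int
example_trail_push(ExampleTrail *trail, long value)
{
    if (trail->next == EXAMPLE_SEGMENT_ENTRIES) {
        ExampleSegment *segment = malloc(sizeof(*segment));
        if (segment == NULL) {
            return -1;
        }
        segment->prev = trail->current;
        trail->current = segment;
        trail->next = 0;
        trail->num_segments++;
    }
    trail->current->entries[trail->next++] = value;
    return 0;
}

/* Release every segment, newest first. */
static void
example_trail_free(ExampleTrail *trail)
{
    while (trail->current != NULL) {
        ExampleSegment *prev = trail->current->prev;
        free(trail->current);
        trail->current = prev;
    }
    example_trail_init(trail);
}
```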

Extend mdb's trail_details command to print the current number of trail
segments in trseg grades.

Fix a bug where the MR_trail_ptr was not being reset correctly after
a trail reset.

runtime/mercury_grade.h:
	Add the new grade component.

runtime/mercury_conf_param.h:
	Document the new grade component, and the option used to debug
	trail segments.

runtime/mercury_memory_zones.h:
	Shift the definition of MR_MemoryZones to this file in order to break
	a cyclic dependency between header files.

runtime/mercury_context.h:
	Add a new field to the context structure to hold a list of previous
	trail segments.

	Delete the definition of the type MR_MemoryZones from here.

runtime/mercury_trail.[ch]:
	When adding a new trail entry in trseg grades first check whether we
	need to extend the trail and do so if necessary.

	Export the definitions of MR_TRAIL_{BASE,ZONE}.

	Add a macro, MR_PREV_TRAIL_ZONES, for accessing the list of trail zones
	in a grade independent manner.

	Fix a typo in a comment.

	Add functions for creating and destroying trail segments.

	Handle trail segments in the code that handles untrailing and
	resets.  This also fixes a bug with trail reset where MR_trail_ptr
	was not being reset along with the rest of the trail state.

compiler/options.m:
compiler/handle_options.m:
compiler/compile_target_code.m:
scripts/canonical_grade.sh-subr:
scripts/init_grade_option.sh-subr:
scripts/mgnuc.in:
scripts/parse_grade_options.sh-subr:
	Handle the new grade component.

trace/mercury_trace_cmd_developer.c:
	Make the trail_details command print out the number of trail segments
	in trseg grades.

tests/trailing/Mmakefile:
tests/trailing/tr_reset_bug.{m,exp}:
	Regression test for the bug with trail resets.
2008-09-05 11:19:34 +00:00
Julien Fischer
5c49b5dac9 Add support for thread-local backjumping in grades that support concurrency.
Estimated hours taken: 7
Branches: main

Add support for thread-local backjumping in grades that support concurrency.

library/backjump.m:
	Shift the C definition of backjump handlers and choice ids into the
	runtime.  This is needed because MR_Context now refers to them and
	the runtime should not depend on the library.

	Delete the XXX comment regarding pred_is_external/3.  (This has been
	fixed below.)

runtime/mercury_backjump.h:
runtime/mercury_backjump.c:
	New module that defines those parts of the backjumping support that
	the runtime requires access to.

	In high-level C .par grades make the global state required by
	backjumping thread-specific.

	Conform to the usual coding conventions in the runtime in the new
	versions of the data structures that were originally in backjump.m.

	Rename ML_Choice_Id to MR_BackJumpChoiceId, the latter is less
	ambiguous.

runtime/mercury_context.h:
runtime/mercury_context.c:
	In low-level C grades add two extra fields to the MR_Context structure
	to hold the global state required by backjumping.

	In high-level C .par grades initialise the thread-specific data that
	is used to store the backjump global state at program startup.

	Reformat a function prototype.

runtime/mercury_thread.h:
	Reformat a function prototype.

runtime/Mmakefile:
	Include the new files.

mdbcomp/program_representation.m:
	Update pred_is_external/3 to include backjump.builtin_choice_id/1
	and backjump.builtin_backjump/1.

tests/hard_coded/Mmakefile:
tests/hard_coded/tl_backjump_test.m:
tests/hard_coded/tl_backjump_test.exp:
	Test thread-local backjumping.

tests/hard_coded/tl_backjump_test.exp2:
	Expected output for the above test case for grades in which spawn/3
	does not work.
2008-03-19 05:30:01 +00:00
Peter Wang
fa80b9a01a Make the parallel conjunction execution mechanism more efficient.
Branches: main

Make the parallel conjunction execution mechanism more efficient.

1. Don't allocate sync terms on the heap.  Sync terms are now allocated in
the stack frame of the procedure call which originates a parallel
conjunction.

2. Don't allocate individual sparks on the heap.  Sparks are now stored in
preallocated, growing arrays using an algorithm that doesn't use locks.

3. Don't have one mutex per sync term.  Just use one mutex to protect
concurrent accesses to all sync terms (it is rarely needed anyway).  This
makes sync terms smaller and saves initialising a mutex for each parallel
conjunction encountered.

4. We don't bother to acquire the global sync term lock if we know a parallel
conjunction couldn't be executing in parallel.  In a highly parallel program,
the majority of parallel conjunctions will be executed sequentially so
protecting the sync terms from concurrent accesses is unnecessary.


par_fib(39) is ~8.4 times faster (user time) on my laptop (Linux 2.6, x86_64),
which is ~3.5 times as slow as sequential execution.


configure.in:
	Update the configuration for a changed MR_SyncTerm structure.

compiler/llds.m:
	Make the fork instruction take a second argument, which is the base
	stack slot of the sync term.

	Rename it to fork_new_child to match the macro name in the runtime.

compiler/par_conj_gen.m:
	Change the generated code for parallel conjunctions to allocate sync
	terms on the stack and to pass the sync term to fork_new_child.

compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_out.m:
compiler/llds_to_x86_64.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/reassign.m:
compiler/use_local_vars.m:
	Conform to the change in the fork instruction.

compiler/liveness.m:
compiler/proc_gen.m:
	Disable use of the parallel conjunction operator in the compiler as
	older versions of the compiler will generate code incompatible with
	the new runtime.

runtime/mercury_context.c:
runtime/mercury_context.h:
	Remove the next pointer field from MR_Spark as it's no longer needed.

	Remove the mutex from MR_SyncTerm.  Add a field to record if a spark
	belonging to the sync term was scheduled globally, i.e. if the
	parallel conjunction might be executed in parallel.

	Define MR_SparkDeque and MR_SparkArray.

	Use MR_SparkDeques to hold per-context sparks and global sparks.

	Change the abstract machine instructions MR_init_sync_term,
	MR_fork_new_child, MR_join_and_continue as per the main change log.

	Use a preprocessor macro MR_LL_PARALLEL_CONJ as a shorthand for
	!MR_HIGHLEVEL_CODE && MR_THREAD_SAFE.

	Take the opportunity to clean things up a bit.

runtime/mercury_wsdeque.c:
runtime/mercury_wsdeque.h:
	New files containing an implementation of work-stealing deques.  We
	don't do work stealing yet but we use the underlying data structure.

runtime/mercury_atomic.c:
runtime/mercury_atomic.h:
	New files to contain atomic operations.  Currently it just contains
	compare-and-swap for gcc/x86_64, gcc/x86 and gcc-4.1.
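
For readers unfamiliar with the primitive, compare-and-swap can be sketched
with C11 atomics. The new files use compiler- and architecture-specific code
instead, so the function below, including its name, is only an illustration of
the semantics.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper: if *addr holds old_value, replace it with new_value
 * and return true; otherwise leave *addr unchanged and return false.
 */
static bool
example_compare_and_swap(atomic_long *addr, long old_value, long new_value)
{
    /*
     * On failure C11 stores the value it observed into old_value; here
     * that is a local copy, so the caller's variable is not modified.
     */
    return atomic_compare_exchange_strong(addr, &old_value, new_value);
}
```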

runtime/Mmakefile:
	Add the new files.

runtime/mercury_engine.h:
runtime/mercury_mm_own_stacks.c:
runtime/mercury_wrapper.c:
	Conform to runtime changes.

runtime/mercury_conf_param.h:
	Update an outdated comment.
2007-10-11 11:45:22 +00:00
Julien Fischer
1d50d41883 Add support for thread local trailing in grades that support parallel
Estimated hours taken: 14
Branches: main

Add support for thread local trailing in grades that support parallel
execution.  Previously the trail state for each context was copied into the
global variables defined in mercury_trail.[ch] when the context was loaded,
i.e. we had thread-local trailing in grades that did not support parallel
execution.

This change extends the runtime to support thread local trailing in the
presence of parallel execution.  We do this by adding the relevant fields to
the MercuryEngine structure and redefining the abstract registers related to
trailing to make accesses to the trail state go via the relevant engine.

This also works for parallel trailing grades with the high-level C backend
because in those grades the trail state is stored in a thread-local dummy
engine and context.  (The context, and its attached trail state, are created
when the thread is spawned.)

XXX the coupling between the high-level and low-level versions of the runtimes
w.r.t trailing is not good.  I intend to fix it as a separate change.
(It also affects thread local mutables.)

runtime/mercury_context.h:
	Fix a comment: a context's trail_zone field should be accessed
	via MR_eng_context, not via an abstract machine register.

runtime/mercury_engine.h:
	Extend engines so that in parallel trailing grades they contain the
	information necessary to keep track of the state of the trail.

runtime/mercury_regs.h:
	In parallel grades MR_trail_ptr, MR_ticket_counter and
	MR_ticket_high_water need to be accessed w.r.t the engine that the
	context is being executed on.

	Modify the definition of MR_restore_trail_registers() so that it
	accesses the trail state via abstract registers rather than
	manipulating the global variables defined in mercury_trail.[ch]
	directly.

runtime/mercury_trail.[ch]:
	Only define global variables to hold the trail state in non-parallel
	grades.

	Document the difference between parallel and non-parallel trailing
	grades.

library/benchmarking.m:
	In .par.tr grades, ML_report_stats should print the size of the trail
	for the current context.

NEWS:
	Announce support for thread local trailing in parallel grades.
2007-05-10 05:24:16 +00:00
Zoltan Somogyi
9ce35377f6 Fix a problem with the representation of goal paths for switches on unbounded
Estimated hours taken: 2
Branches: main

Fix a problem with the representation of goal paths for switches on unbounded
numbers of function symbols, such as ints, floats and strings. We used to
represent such switches as a negative number in the field representing the
number of function symbols in the switched-on type, but some parts of the
system weren't handling negative numbers specially, and they couldn't be
properly parsed back in anyway (since "-" was also the character separating
the case's ordinal number from this number).

mdbcomp/program_representation.m:
	Change the switch goal path step to use a maybe type to record the
	number of function symbols in the switched-on type, with a "no" meaning
	the type is unbounded. Update the goal path step printing and parsing
	code accordingly.

	Make the names of some other function symbols more expressive.

browser/declarative_execution.m:
	Conform to the change to program_representation.m.

compiler/goal_path.m:
compiler/deep_profiling.m:
compiler/unneeded_code.m:
	Update the code that creates goal path steps.

	In deep_profiling.m, fix an old bug: the two integers (case number,
	number of function symbols in switched-on type) were swapped.

	In deep_profiling.m, give a prefix to the field names of the main
	structure passed around, to try to make them unique.

tests/debugger/switch_on_unbounded.{m,inp,exp}:
	New test case to test this fix.

tests/debugger/Mmakefile:
	Enable the new test case.
2007-04-19 04:24:53 +00:00
Zoltan Somogyi
7989f17311 Fix some software rot that prevented I/O operations from working in mmos
Estimated hours taken: 6
Branches: main

Fix some software rot that prevented I/O operations from working in mmos
grades. The problem was the change to the I/O module to make it use thread
local storage via a new field of the MR_Context structure which was accessed
via the MR_eng_this_context field of the engine, instead of via the
MR_eng_context field. The new field was not set by the code for initializing
the contexts used by own stack minimal model tabling.

runtime/mercury_context.h:
runtime/mercury_engine.h:
	Add significant new documentation about how fields of the MR_Context
	structure are accessed, both because the documentation is useful and to
	make similar mistakes less likely in future.

	Add a macro for use by own stack minimal model tabling.

runtime/mercury_thread.c:
	Add a comment about a link to mercury_engine.h.

runtime/mercury_thread.h:
	Convert to four-space indentation, and fix some formatting.

runtime/mercury_mm_own_stacks.c:
	Add code for filling in the missing fields of newly created contexts.

runtime/mercury_wrapper.c:
	In own stack minimal model grades, set up the main context properly.
	The previous code was based on a flawed understanding of the
	relationship between MR_eng_context and MR_eng_this_context.

tests/debugger/mmos_print.{m,inp,exp}:
	Add a new test case (which we don't yet pass due to a problem with
	formatting of mdb output) to test the fix. The old versions of the
	compiler don't pass this test case, because the "p *" commands of the
	debugger invoke I/O code in the Mercury standard library, which fails
	with a segfault due to the thread local fields of generators' contexts
	being uninitialized.

	Note that the .inp aborts execution, because without the abort the
	execution would go into an infinite loop since mmos grades don't yet
	have code for detecting completion.

tests/debugger/Mmakefile:
	Enable the new test case in mmos grades.

	Fix inconsistent indentation.

tests/tabling/Mmakefile:
	Do not try to execute minimal tests in mmos grades, since we don't pass
	them yet, and the symptom is in many cases an infinite loop.
2007-04-17 05:38:22 +00:00
Peter Wang
b2f14e1afa Some bug fixes to do with threads.
Branches: main

Some bug fixes to do with threads.

library/io.m:
	ML_maybe_make_err_msg() was not thread-safe but was called from some
	`thread_safe' foreign_procs.  Make ML_maybe_make_err_msg() acquire the
	global lock if the caller does not acquire the global lock itself.

library/thread.m:
runtime/mercury_thread.c:
	Create threads in the detached state so that resources will be
	automatically freed when threads terminate (we don't call
	pthread_join() anywhere).
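
Creating a thread in the detached state with POSIX threads looks roughly like
this. The helper and worker names are hypothetical and error handling is kept
minimal; it is a sketch of the technique, not the code in mercury_thread.c.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical worker used for illustration: sets the flag it is given. */
static void *
example_worker(void *arg)
{
    atomic_store((atomic_int *) arg, 1);
    return NULL;
}

/*
 * Hypothetical helper: start func(arg) on a new detached thread, so its
 * resources are released on exit without anyone calling pthread_join().
 * Returns 0 on success or a pthread error number.
 */
static int
example_spawn_detached(void *(*func)(void *), void *arg)
{
    pthread_attr_t  attr;
    pthread_t       thread;
    int             rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0) {
        rc = pthread_create(&thread, &attr, func, arg);
    }
    pthread_attr_destroy(&attr);
    return rc;
}
```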

library/thread.semaphore.m:
	Wake up waiting threads in FIFO order, instead of LIFO order.

runtime/mercury_context.c:
runtime/mercury_context.h:
runtime/mercury_engine.c:
runtime/mercury_engine.h:
	Change the way we enforce that a Mercury context returning from Mercury
	code back into a C function runs on the original Mercury engine that
	called the C function.

	Previously, if a C function called into Mercury code, the Mercury
	context would be "owned" by that Mercury engine until the C function
	finished.  If the Mercury code suspended (e.g. waiting on a semaphore),
	it could not be resumed by another Mercury engine.  This was
	unnecessarily conservative.

	Now any Mercury engine can resume a suspended context.  Just before
	returning into C functions, we check that the context is actually
	running on the Mercury engine in which the C function was started.  If
	not, *then* we reschedule the context so that it will only be picked up
	by the right Mercury engine.

	Add a comment that none of this is implemented for grades not using gcc
	non-local gotos (nor was it implemented before).
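
The check described above might be pictured roughly as follows. This is a simplified sketch under assumed names (Engine, Context, resume_owner, before_return_to_c, engine_may_resume); the real runtime data structures and scheduler are more involved.

```c
#include <stddef.h>

typedef struct Engine {
    int id;
} Engine;

typedef struct Context {
    /* Engine that must resume this context, or NULL if any engine may. */
    Engine *resume_owner;
} Context;

typedef enum { CONTINUE_HERE, RESCHEDULE_FOR_OWNER } ReturnAction;

/*
** Called just before Mercury code returns into a C function that was
** started on c_call_engine. Only if we are on a different engine do we
** pin the context to the original engine and ask for a reschedule.
*/
static ReturnAction
before_return_to_c(Context *ctxt, Engine *current, Engine *c_call_engine)
{
    if (current == c_call_engine) {
        return CONTINUE_HERE;
    }
    ctxt->resume_owner = c_call_engine;
    return RESCHEDULE_FOR_OWNER;
}

/* Scheduler side: may engine eng pick up the suspended context ctxt? */
static int
engine_may_resume(const Context *ctxt, const Engine *eng)
{
    return ctxt->resume_owner == NULL || ctxt->resume_owner == eng;
}
```

The point of the design is that the ownership restriction is applied lazily, at the return into C, rather than for the whole duration of the C function's call into Mercury.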

runtime/mercury_memory_zones.c:
	Fix an off-by-one bug and a thread-safety bug in MR_next_offset().
2007-03-03 03:43:35 +00:00
Peter Wang
81b8e55825 Add support for thread-local mutables. These can take on a different value for
Estimated hours taken: 15
Branches: main

Add support for thread-local mutables.  These can take on a different value for
each Mercury thread.  Child threads automatically inherit the thread-local
values of the parent thread that spawned them.

compiler/make_hlds_passes.m:
compiler/prog_io.m:
compiler/prog_item.m:
compiler/prog_mutable.m:
	Accept a `thread_local' attribute for mutables and update the
	source-to-source transformation.

doc/reference_manual.texi:
	Document the `thread_local' attribute as a Melbourne Mercury compiler
	extension.

runtime/mercury_context.c:
runtime/mercury_context.h:
	Add a `thread_local_mutables' field to MR_Context, which points to an
	array which holds all the values of thread-local mutables in the
	program.  Each thread-local mutable has an associated index into the
	array, which is allocated during initialisation.  A child thread
	inherits the parent's thread-locals simply by copying the array.

	Add a `thread_local_mutables' field to MR_Spark and update the parallel
	conjunction implementation to take into account thread-locals.
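
A minimal sketch of the array-per-context scheme described above, assuming a fixed maximum number of thread-local mutables. All names here (ThreadLocalMuts, new_thread_local_index, and so on) are illustrative and are not the runtime's actual identifiers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef long Word;                      /* stand-in for a machine word */

#define MAX_THREAD_LOCALS 64            /* arbitrary limit for the sketch */

typedef struct {
    Word values[MAX_THREAD_LOCALS];
} ThreadLocalMuts;

static size_t num_thread_locals = 0;    /* indexes handed out so far */

/* Called once per thread-local mutable during initialisation. */
static size_t
new_thread_local_index(void)
{
    assert(num_thread_locals < MAX_THREAD_LOCALS);
    return num_thread_locals++;
}

/* Array for the initial context; all slots start out as zero. */
static ThreadLocalMuts *
create_thread_locals(void)
{
    return calloc(1, sizeof(ThreadLocalMuts));
}

/* A child inherits the parent's values simply by copying the array. */
static ThreadLocalMuts *
clone_thread_locals(const ThreadLocalMuts *parent)
{
    ThreadLocalMuts *child = malloc(sizeof(ThreadLocalMuts));

    if (child != NULL) {
        memcpy(child, parent, sizeof(ThreadLocalMuts));
    }
    return child;
}
```

Because the child gets its own copy, later updates in either thread are invisible to the other, which is the behaviour the commit message describes.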

runtime/mercury_thread.c:
runtime/mercury_thread.h:
	Add the functions and macros which are used by the code generated for
	thread-local mutables.

runtime/mercury_wrapper.c:
	Allocate a thread-local mutable array for the initial context at
	startup.

extras/concurrency/spawn.m:
	Update the spawn/3 implementation to make child threads inherit the
	thread-local values of the parent.

	Make different threads in high-level C grades use different
	MR_Contexts.  This makes it possible to use the same implementation of
	thread-local mutables as in the low-level C grades.

tests/hard_coded/mutable_decl.exp:
tests/hard_coded/mutable_decl.m:
tests/hard_coded/pure_mutable.exp:
tests/hard_coded/pure_mutable.m:
tests/invalid/bad_mutable.err_exp:
tests/invalid/bad_mutable.m:
	Add some thread-local mutables to these test cases.

NEWS:
	Announce the addition.
2007-01-12 05:00:32 +00:00
Zoltan Somogyi
b61ea9de44 Implement a large chunk of the code that was previously missing for .mmos
Estimated hours taken: 20
Branches: main

Implement a large chunk of the code that was previously missing for .mmos
grades. The system now correctly computes several answers for the tc_minimal
test case, before going into an infinite loop (since the code for recognizing
the absence of further solutions is not yet there).

Significantly improve the infrastructure for debugging such changes.

compiler/table_gen.m:
	Complete the mmos transformation.

compiler/proc_gen.m:
	Handle the special return requirements of mmos generators, which must
	return not to a caller (since each generator is the root of its own
	SLD tree), but to a consumer in another SLD tree that is waiting for an
	answer.

compiler/hlds_pred.m:
	Provide a mechanism whereby table_gen.m can communicate to proc_gen.m
	the requirement for this special return.

compiler/trace_gen.m:
	When generating events, include the port and the goal path in a
	comment. This makes the generated C code significantly easier to
	understand.

compiler/layout_out.m:
	Export a function for trace_gen.m to use.

compiler/hlds_goal.m:
	Change goal_path_to_string to a function to make it easier to use.

compiler/*.m:
	Conform to the change to goal_path_to_string.

runtime/mercury_context.[ch]:
	In .mmos grades, include the current debugger call sequence number,
	depth, and event number in contexts, to be saved and loaded with
	the contexts. This allows each context to have its own separate
	sequence of events.

	This capability depends not directly on the grade, but on the macro
	MR_EXEC_TRACE_INFO_IN_CONTEXT. For now, this is defined only in .mmos
	grades, but in future, it may be useful in other grades as well.
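
To illustrate why saving these counters with the context gives each context its own event sequence, here is a small sketch. The type and function names (TraceInfo, save_context, load_context, live) are assumptions made for the example, not the runtime's own.

```c
/* The three debugger counters named in the commit message. */
typedef struct {
    unsigned long call_seqno;
    unsigned long call_depth;
    unsigned long event_number;
} TraceInfo;

typedef struct {
    TraceInfo trace;        /* saved copy, valid while not running */
} Context;

/* The counters the debugger updates while a context is running. */
static TraceInfo live;

/* Store the live counters into a context that is being suspended. */
static void
save_context(Context *ctxt)
{
    ctxt->trace = live;
}

/* Restore the counters of a context that is being resumed. */
static void
load_context(const Context *ctxt)
{
    live = ctxt->trace;
}
```

Switching from the main context to a generator and back therefore leaves the main context's event numbers exactly where they were.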

runtime/mercury_conf_param.h:
	Define and document MR_EXEC_TRACE_INFO_IN_CONTEXT.

runtime/mercury_mm_own_stacks.[ch]:
runtime/mercury_tabling_preds.h:
	Implement some predicates needed by the own stack transformation.
	Implement the code for generators returning answers to consumers,
	and the code for consumers scheduling generators when they need
	more answers. At the moment, the code for detecting when generators
	depend on each other is not yet written.

	Provide better facilities for debugging own stack minimal model grades.

	Fix a cut-and-paste bug (wrong macro name guarding the handwritten
	C module).

runtime/Mmakefile:
	Rebuild only what needs to be rebuilt when mercury_tabling_preds.h
	changes.

runtime/mercury_label.[ch]:
	Add a utility function for returning the name of an arbitrary label
	(internal or entry).

	Rename some fields to give them MR_ prefixes.

	Always define the functions for recording both entry and internal
	labels, even if they are not called from most modules, since they
	may be called from a few handwritten modules in the runtime.

	Rename a function to avoid a clash with the name of a macro,
	and thus allow the change to mercury_goto.h.

runtime/mercury_goto.h:
	Fix a bug with MR_init_entry_an. This macro was supposed to always
	insert the entry label that is its argument into the entry table,
	but instead of calling the function it was meant to call, it called
	a macro that could be (and usually was) defined to expand to nothing.

	The fix is to give the function a different name from the macro,
	and to call the function, not the macro.
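
The class of bug described above can be reproduced in a few lines. The sketch below uses made-up names (insert_entry_label, do_insert_entry_label, REGISTER_ALL_LABELS, init_entry_always) rather than the real ones in mercury_goto.h and mercury_label.h.

```c
#include <stddef.h>

#define MAX_ENTRIES 16

static const char *entry_table[MAX_ENTRIES];
static size_t      num_entries = 0;

/* The real work lives in a function whose name differs from the macro's. */
static void
do_insert_entry_label(const char *name)
{
    if (num_entries < MAX_ENTRIES) {
        entry_table[num_entries++] = name;
    }
}

/* Optional registration: expands to nothing unless explicitly enabled. */
#ifdef REGISTER_ALL_LABELS
  #define insert_entry_label(name)  do_insert_entry_label(name)
#else
  #define insert_entry_label(name)  ((void) 0)
#endif

/*
** An "always register" initialiser must call the function directly.
** Had it been written in terms of insert_entry_label(), it would
** silently do nothing in most configurations.
*/
#define init_entry_always(name)     do_insert_entry_label(name)
#define init_entry_maybe(name)      insert_entry_label(name)
```

When the function and the macro share a name, the preprocessor rewrites every call to the function into the macro's expansion, which is why the rename is a precondition for the fix.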

runtime/mercury_wrapper.c:
	In own stack minimal model grades, create a main context separate
	from the current context, since the current context may be needed
	to hold a generator's state. Make MR_eng_this_context point to
	this context.

	Register all labels in the debugging variants of minimal model grades.

runtime/mercury_accurate_gc.c:
runtime/mercury_agc_debug.c:
runtime/mercury_debug.c:
library/exception.m:
	Conform to the change to runtime/mercury_label.h.

runtime/mercury_stack_trace.c:
	Conform to the change to runtime/mercury_label.h.

	Document the link to trace/mercury_trace_internal.c.

trace/mercury_trace.[ch]:
trace/mercury_trace_cmd_forward.c:
	Split the GOTO command into two: STEP and GOTO. STEP always stops
	at the next event (without any test), even if it is in a different
	context (and possibly with a lower event number than the immediately
	previous event, since the event numbers in different contexts are
	not related). As before, GOTO always goes to the specified event
	number, but in .dmmos grades it can now be told that this event number
	should be matched only in a specified context. The specification is
	done by an extra argument specifying the short name of the context's
	generator; the absence of such an argument means the main context.
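
The difference between the two commands can be summarised by the stop test each one applies at an event. The following sketch uses hypothetical types (TraceCmd, Event) and treats GOTO as an exact match on the event number, which is a simplification of whatever test the debugger actually performs.

```c
#include <stddef.h>
#include <string.h>

typedef enum { CMD_STEP, CMD_GOTO } CmdKind;

typedef struct {
    CmdKind        kind;
    unsigned long  stop_event;       /* used by GOTO only */
    const char    *generator_name;   /* GOTO only; NULL means main context */
} TraceCmd;

typedef struct {
    unsigned long  event_number;
    const char    *generator_name;   /* NULL when in the main context */
} Event;

/* Two context names match if both are NULL or both are the same string. */
static int
same_context(const char *a, const char *b)
{
    if (a == NULL || b == NULL) {
        return a == b;
    }
    return strcmp(a, b) == 0;
}

/* Should the debugger stop at event ev while executing cmd? */
static int
should_stop(const TraceCmd *cmd, const Event *ev)
{
    if (cmd->kind == CMD_STEP) {
        return 1;       /* next event, whatever context it is in */
    }
    return ev->event_number == cmd->stop_event
        && same_context(cmd->generator_name, ev->generator_name);
}
```

STEP needs no comparison at all, which is consistent with the remark elsewhere in this log that using it in place of GOTO is very slightly faster.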

trace/mercury_trace_cmd_internal.c:
	In own stack grades, when the current context is that of a generator,
	print the subgoal the generator is working on before the event number,
	call depth, call sequence number and the rest of the event report.

	Document the link to runtime/mercury_stack_trace.c, which has similar
	code.

trace/mercury_trace_cmd_external.c:
trace/mercury_trace_cmd_declararive.c:
	Use the STEP command where GOTO was used for this simpler job,
	since this is (very slightly) faster.

trace/mercury_trace_cmd_developer.c:
	Fix some bugs with handling own stack tables.

doc/user_guide.texi:
	Document the new functionality of the goto mdb command. The
	documentation is commented out, since .mmos grades are for developers
	only at the moment.

tools/lmc.in:
	Turn off C optimizations when C debugging is enabled. For some reason,
	the default value of --cflags-for-debug does not include -O0.
2007-01-03 05:17:21 +00:00
Zoltan Somogyi
82eab0e65e Add an optional pass that tries to avoid generating too many parallel goals.
Estimated hours taken: 12
Branches: main

Add an optional pass that tries to avoid generating too many parallel goals.
The first transformation implemented by this pass is to transform parallel
conjunctions into goals of the form

	( queues already contain a lot of work ->
		sequential version of parallel conjunction
	;
		parallel conjunction as before
	)

if they contain recursive calls.

The effect of this transformation is to reduce the overhead of the new par_fib
test case from:

	fib(35): sequential 189 vs parallel 5770

to

	fib(35): sequential 189 vs parallel 1090

i.e. a speedup of more than a factor of five.
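
The shape of the transformed code can be imitated in C as below. This is only a single-threaded simulation: queued_work stands in for the length of the work queues, QUEUE_THRESHOLD is an invented limit, and no real parallel execution takes place.

```c
#include <stddef.h>

#define QUEUE_THRESHOLD 4           /* hypothetical "a lot of work" limit */

static size_t queued_work = 0;      /* simulated amount of queued work */
static size_t sparks_created = 0;   /* how often we chose the parallel arm */

static long
fib(int n)
{
    long a;
    long b;

    if (n < 2) {
        return n;
    }
    if (queued_work >= QUEUE_THRESHOLD) {
        /* Sequential version of the parallel conjunction. */
        a = fib(n - 1);
        b = fib(n - 2);
    } else {
        /* Parallel conjunction as before; here we only count it. */
        queued_work++;
        sparks_created++;
        a = fib(n - 1);
        b = fib(n - 2);
        queued_work--;
    }
    return a + b;
}
```

In this simulation only the calls nearest the root take the parallel arm, so the number of sparks stays small no matter how large n is, which is the overhead reduction the transformation is after.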

compiler/granularity.m:
	New module that implements this transformation. I intend to add other,
	more sophisticated transformations in the future.

compiler/transform_hlds.m:
	Add granularity.m as one of the submodules of transform_hlds.m.

compiler/mercury_compile.m:
	Invoke the new pass.

	Invoke dep_par_conj only if needed.

	Fix some stage numbers.

compiler/notes/compiler_design.html:
	Document the new module.

	Document some modules that should have been documented earlier.

	Fix a hurried deletion of a reference to the Aditi backend.

compiler/goal_util.m:
	Add some utility functions for use by the new module.

compiler/simplify.m:
	Record the information mercury_compile.m needs in order to check
	whether we have any parallelism for granularity.m and dep_par_conj.m
	to process.

compiler/hlds_module.m:
	Add a slot to the module_info to record the information from
	simplify.m. Clean up some interfaces.

compiler/add_type.m:
	Conform to the change to hlds_module.m.

compiler/dependency_graph.m:
	Delete unnecessary module qualifications, and rename some predicates
	to avoid potential ambiguities.

compiler/options.m:
	Add the options required for controlling the new transformation.

	Rename an option's internal name to avoid conflict with a language
	keyword (the user-visible name remains unchanged).

	Move some options around to put them in logical groups.

doc/user_guide.texi:
	Document the new options.

	Fix some omissions in some earlier options.

compiler/handle_options.m:
compiler/termination.m:
	Conform to the option rename.

compiler/quantification.m:
	Rename some predicates to avoid ambiguities.

library/par_builtin.m:
	Add a predicate for use by the new transformation.

tests/par_conj/par_fib.{m,exp}:
	A new test case: a version of fib for use in testing parallelism.

tests/par_conj/Mmakefile:
	Enable the new test case.
2006-11-03 08:31:22 +00:00
Zoltan Somogyi
ecf1ee3117 Add a mechanism for growing the stacks on demand by adding new segments
Estimated hours taken: 20
Branches: main

Add a mechanism for growing the stacks on demand by adding new segments
to them. You can ask for the new mechanism via a new grade component, stseg
(short for "stack segments").

The mechanism works by adding a test to each increment of a stack pointer (sp
or maxfr). If the test indicates that we are about to run out of stack, we
allocate a new stack segment, allocate a placeholder frame on the new segment,
and then allocate the frame we wanted in the first place on top of the
placeholder. We also override succip to make it point to code that will (1)
release the new segment when the newly created stack frame returns, and then
(2) go to the place indicated by the original, overridden succip.

For leaf procedures on the det stack, we optimize away the check of the stack
pointer. We can do this because we reserve some space on each stack for the
use of such stack frames.
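
The allocation test and the leaf optimization can be sketched as follows. The sizes and names (SEGMENT_WORDS, LEAF_RESERVE, Stack, incr_sp, decr_sp) are assumptions for the example; the succip override and the release of segments on return are not modelled.

```c
#include <assert.h>
#include <stdlib.h>

#define SEGMENT_WORDS       64      /* illustrative segment size */
#define LEAF_RESERVE        8       /* space kept free for leaf frames */
#define PLACEHOLDER_WORDS   1       /* size of the placeholder frame */
#define THRESHOLD           (SEGMENT_WORDS - LEAF_RESERVE)

typedef struct Segment {
    struct Segment *prev;           /* list of previous segments */
    long            words[SEGMENT_WORDS];
} Segment;

typedef struct {
    Segment *cur;                   /* segment currently in use */
    size_t   used;                  /* words used in cur */
    int      num_segments;
} Stack;

/* Add a fresh, empty segment on top of the stack. */
static int
push_segment(Stack *s)
{
    Segment *seg = malloc(sizeof(Segment));

    if (seg == NULL) {
        return -1;
    }
    seg->prev = s->cur;
    s->cur = seg;
    s->used = 0;
    s->num_segments++;
    return 0;
}

/* Allocate a frame of n words; leaf frames skip the threshold test. */
static long *
incr_sp(Stack *s, size_t n, int is_leaf)
{
    long *frame;

    if (is_leaf) {
        assert(n <= LEAF_RESERVE);  /* must fit in the reserved space */
    } else if (s->used + n > THRESHOLD) {
        assert(n + PLACEHOLDER_WORDS <= THRESHOLD);
        if (push_segment(s) != 0) {
            return NULL;
        }
        s->used = PLACEHOLDER_WORDS;    /* placeholder frame comes first */
    }
    frame = &s->cur->words[s->used];
    s->used += n;
    return frame;
}

/* Pop a frame of n words. */
static void
decr_sp(Stack *s, size_t n)
{
    assert(n <= s->used);
    s->used -= n;
}
```

The leaf case is safe in this sketch because every checked allocation leaves at least LEAF_RESERVE words free, and a leaf frame is popped before any further frame is pushed.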

My intention is that doc/user_guide.texi and NEWS will be updated once we have
used the feature ourselves for a while and it seems to be stable.

runtime/mercury_grade.h:
	Add the new grade component.

runtime/mercury_conf_param.h:
	Document the new grade component, and the option used to debug stack
	segments.

runtime/mercury_context.[ch]:
	Add new fields to contexts to hold the list of previous segments of the
	det and nondet stacks.

runtime/mercury_memory_zones.[ch]:
	Include a threshold in all zones, for use in stack segments.
	Set it when a zone is allocated.

	Restore the previously #ifdef'd out function MR_unget_zone, for use
	when freeing stack segments that execution has fallen out of.

runtime/mercury_debug.[ch]:
	When printing the offsets of pointers into the det and nondet stacks,
	print the number of the segment the pointer points into (unless it is
	the first, in which case we suppress this in the interest of brevity
	and simplicity).

	Make all the functions in this module take a FILE * as an input
	argument; don't print to stdout by default.

runtime/mercury_stacks.[ch]:
	Modify the macros that allocate stack frames to invoke the code for
	adding new stack segments when we are about to run out of stack.

	Standardize on "nondet" over "nond" as the abbreviation referring to
	the nondet stack.

	Conform to the changes in mercury_debug.c.

runtime/mercury_stack_trace.c:
	When traversing the stack, step over the placeholder stack frames
	at the bottoms of stack segments.

	Conform to the changes in mercury_debug.c.

runtime/mercury_wrapper.[ch]:
	Make the default stack size small in grades that support stack
	segments.

	Standardize on "nondet" over "nond" as the abbreviation referring to
	the nondet stack.

	Conform to the changes in mercury_debug.c.

runtime/mercury_memory.c:
	Standardize on "nondet" over "nond" as the abbreviation referring to
	the nondet stack.

runtime/mercury_engine.[ch]:
runtime/mercury_overflow.h:
	Standardize on "nondet" over "nond" as the abbreviation referring to
	the nondet stack.

	Convert these files to four-space indentation.

runtime/mercury_minimal_model.c:
trace/mercury_trace.c:
trace/mercury_trace_util.c:
	Conform to the changes in mercury_debug.c.

compiler/options.m:
	Add the new grade option for stack segments.

compiler/compile_target_code.m:
compiler/handle_options.m:
	Add the new grade component, and handle its exclusions with other grade
	components and optimizations.

compiler/llds.m:
	Extend the incr_sp instruction to record whether the stack frame
	is for a leaf procedure.

compiler/llds_out.m:
	Output the extended incr_sp instruction.

compiler/proc_gen.m:
	Fill in the extra slot in incr_sp instructions.

compiler/goal_util.m:
	Provide a predicate for testing whether a procedure body is a leaf.

compiler/delay_slot.m:
compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/frameopt.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/peephole.m:
compiler/reassign.m:
compiler/use_local_vars.m:
	Conform to the change in llds.m.

scripts/canonicate_grade.sh-subr:
scripts/init_grade_options.sh-subr:
scripts/parse_grade_options.sh-subr:
scripts/final_grade_options.sh-subr:
scripts/mgnuc.in:
	Handle the new grade component.

	Convert parse_grade_options.sh-subr to four-space indentation.

Mmake.workspace:
	Fix an old bug that prevented bootcheck from working in the new grade:
	when computing the gc grade, use the workspace's version of ml (which
	in this case understands the new grade components), rather than the
	installed ml (which does not).

	(This was a devil to track down, because neither make --debug nor
	strace on make revealed how the installed ml was being invoked,
	and there was no explicit invocation in the Makefile either; the error
	message appeared to come out of thin air just before the completion
	of the stage 2 library. It turned out the invocation happened
	implicitly, as a result of expanding a make variable.)
2006-11-01 02:31:19 +00:00