Commit Graph

8 Commits

Author SHA1 Message Date
Mark Brown
d465fa53cb Update the COPYING.LIB file and references to it.
Discussion of these changes can be found on the Mercury developers
mailing list archives from June 2018.

COPYING.LIB:
    Add a special linking exception to the LGPL.

*:
    Update references to COPYING.LIB.

    Clean up some minor errors that have accumulated in copyright
    messages.
2018-06-09 17:43:12 +10:00
Zoltan Somogyi
53b573692a Convert C code to use // style comments.
runtime/*.[ch]:
trace/*.[chyl]:
    As above. In some places, improve comments, e.g. by expanding contractions
    such as "we've". Add #ifndef guards against double inclusion around
    the trace/*.h files that did not already have them.

tools/*:
    Make the corresponding changes in shell scripts that generate .[ch] files
    in the runtime.

tests/*:
    Conform to a slight change in the text of a message.
2016-07-14 13:57:35 +02:00
Zoltan Somogyi
67326f16e4 Fix style issues in the runtime.
Move all .h and .c files to four-space indentation without tabs,
if they weren't there already.

Use the same vim line for all .h and .c files.

Align all backslashes at the ends of lines in macro definitions.
Align close comment signs.

In some places, fix inconsistent indentation.

Fix a bunch of comments. Add XXXs to a few of them.
2016-07-09 12:14:00 +02:00
Peter Wang
7e26b55e74 Implement a new form of memory profiling, which tells the user what memory
Branches: main

Implement a new form of memory profiling, which tells the user what memory
is being retained during a program run.  This is done by allocating an extra
word before each cell, which is used to "attribute" the cell to an
allocation site.  The attribution, or "allocation id", is an address to an
MR_AllocSiteInfo structure generated by the Mercury compiler, giving the
procedure, filename and line number of the allocation, and the type
constructor and arity of the cell that it allocates.
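
The extra-word technique described above can be sketched in plain C.  This is
only an illustration under stated assumptions: AllocSiteInfo is a hypothetical
stand-in for MR_AllocSiteInfo (its field names are guesses), malloc stands in
for the garbage-collected allocator, and alloc_attributed, cell_alloc_site and
free_attributed are made-up names rather than runtime functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for MR_AllocSiteInfo; field names are assumptions. */
typedef struct {
    const char *proc_name;
    const char *file_name;
    int         line_number;
    const char *type_name;
    int         arity;
} AllocSiteInfo;

typedef uintptr_t Word;

/* Allocate num_words words plus one hidden attribution word in front. */
static Word *alloc_attributed(size_t num_words, const AllocSiteInfo *site)
{
    Word *raw = malloc((num_words + 1) * sizeof(Word));
    if (raw == NULL) {
        return NULL;
    }
    raw[0] = (Word) site;   /* the "allocation id" */
    return raw + 1;         /* callers only ever see the cell proper */
}

/* Recover the allocation site of a cell made by alloc_attributed(). */
static const AllocSiteInfo *cell_alloc_site(const Word *cell)
{
    return (const AllocSiteInfo *) cell[-1];
}

/* Analogue of free() that accounts for the one-word offset. */
static void free_attributed(Word *cell)
{
    assert(cell != NULL);
    free(cell - 1);
}
```

A profiler walking the live objects would call something like
cell_alloc_site() on each one and add up cells and words per site.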

The user must manually instrument the program with calls to
`benchmarking.report_memory_attribution', which forces a GC and summarises
the live objects on the heap using the attributions.  The mprof tool is
extended with a new mode to parse and present that data.

Objects which are unattributed (e.g. by hand-written C code which hasn't
been updated) are still accounted for, but show up in profiles as "unknown".

Currently this profiling mode only works in conjunction with the Boehm
garbage collector, though in principle it can work with any memory allocator
for which we can access a list of the live objects.  Since term size
profiling relies on the same technique of using an extra word per memory
cell, the two profiling modes are incompatible.

The output from `mprof -s' looks like this:

------ [1] some label ------
   cells            words         cumul  procedure / type (location)
   14150            38872                total

*   1949/ 13.8%      4872/ 12.5%  12.5%  <predicate `parser.parse_rest/7' mode 0>
     975/  6.9%      1950/  5.0%         list.list/1 (parser.m:502)
     487/  3.4%      1948/  5.0%         term.term/1 (parser.m:501)
     487/  3.4%       974/  2.5%         term.const/0 (parser.m:501)

*   1424/ 10.1%      4272/ 11.0%  23.5%  <predicate `parser.parse_simple_term_2/6' mode 0>
     708/  5.0%      2832/  7.3%         term.term/1 (parser.m:643)
     708/  5.0%      1416/  3.6%         term.const/0 (parser.m:643)
...


boehm_gc/alloc.c:
boehm_gc/include/gc.h:
boehm_gc/misc.c:
boehm_gc/reclaim.c:
	Add a callback function to be called for every live object after a GC.

	Add a function to write out the GC_size_map array.

compiler/layout.m:
	Define the alloc_site_info type which is equivalent to the
	MR_AllocSiteInfo C structure.

	Add alloc_site_array as a kind of "layout" array.

compiler/llds.m:
	Add allocation sites to `cfile' structure.

	Replace TypeMsg argument (which was also for profiling) on `incr_hp'
	instructions by an allocation site identifier.

	Add a new foreign_proc_component for allocation site ids.

compiler/code_info.m:
compiler/global_data.m:
compiler/proc_gen.m:
	Keep the set of allocation sites in the code_info and global_data
	structures.

compiler/unify_gen.m:
	Add allocation sites to LLDS allocation instructions.

compiler/layout_out.m:
compiler/llds_out_file.m:
compiler/llds_out_instr.m:
	Output MR_AllocSiteInfo arrays in generated C files.

	Output code to register the MR_AllocSiteInfo array with the Mercury
	runtime.

	Output allocation site ids for memory allocation instructions.

compiler/llds_out_util.m:
	Add allocation sites to llds_out_info.

compiler/pragma_c_gen.m:
compiler/ml_foreign_proc_gen.m:
	Generate a macro MR_ALLOC_ID which resolves to an allocation site
	structure, for every foreign_proc whose C code contains the string
	"MR_ALLOC_ID".  This is to be used by hand-written C code which
	allocates memory.

	MR_PROC_LABELs are retained for backwards compatibility.  Though
	they were introduced for profiling, they seem to have been co-opted
	for printf-debugging since then.

compiler/ml_global_data.m:
	Add allocation site structures to the MLDS global data.

compiler/mlds.m:
compiler/ml_unify_gen.m:
	Add allocation site id to `new_object' instruction.

compiler/mlds_to_c.m:
	Output allocation site arrays and allocation ids in high-level C code.

	Output a call to register the allocation site array with the Mercury
	runtime.

	Delete an unused predicate.

compiler/exprn_aux.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/mercury_compile_llds_back_end.m:
compiler/middle_rec.m:
compiler/ml_accurate_gc.m:
compiler/ml_elim_nested.m:
compiler/ml_optimize.m:
compiler/ml_util.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_gcc.m:
compiler/mlds_to_il.m:
compiler/mlds_to_java.m:
compiler/mlds_to_managed.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/use_local_vars.m:
compiler/var_locn.m:
	Conform to changes.

compiler/pickle.m:
compiler/prog_event.m:
compiler/timestamp.m:
	Conform to changes in memory allocation macros.

library/benchmarking.m:
	Add the `report_memory_attribution' instrumentation predicates.

	Conform to changes to MR_memprof_record.

library/array.m:
library/bit_buffer.m:
library/bitmap.m:
library/construct.m:
library/deconstruct.m:
library/dir.m:
library/io.m:
library/mutvar.m:
library/store.m:
library/string.m:
library/thread.semaphore.m:
library/version_array.m:
	Use attributed memory allocation throughout the standard library so
	that objects don't show up in the memory profile as "unknown".

	Replace MR_PROC_LABEL by MR_ALLOC_ID.

mdbcomp/program_representation.m:
mdbcomp/rtti_access.m:
	Replace MR_PROC_LABEL by MR_ALLOC_ID.

profiler/Mercury.options:
profiler/globals.m:
profiler/mercury_profile.m:
profiler/options.m:
profiler/output.m:
profiler/snapshots.m:
	Add a new mode to `mprof' to parse and present the data from
	`Prof.Snapshots' files.

	Add options for the new profiling mode.

profiler/process_file.m:
	Fix a typo.

runtime/mercury_conf_param.h:
	#define MR_MPROF_PROFILE_MEMORY_ATTRIBUTION if memory profiling
	is enabled and we are using Boehm GC.

runtime/mercury.h:
	Make MR_new_object take an allocation id argument.

	Conform to changes in memory allocation macros.

runtime/mercury_memory.c:
runtime/mercury_memory.h:
runtime/mercury_types.h:
	Define MR_AllocSiteInfo.

	Add memory allocation functions and macros which take into
	account the additional word necessary for the new profiling mode.
	These should be used in preference to the raw memory allocation
	functions wherever possible so that objects do not show up in the
	profile as "unknown".

	Add analogues of realloc/free which take into account the offset
	introduced by the attribution word.

	Add function versions of the MR_new_object macros, which can't be
	written in standard C.  They are only used when necessary.

	Add built-in allocation site ids, to be used in the runtime and
	other hand-written code when context-specific ids are unavailable.

runtime/mercury_heap.h:
	Make MR_tag_offset_incr_hp_msg and MR_tag_offset_incr_hp_atomic_msg
	allocate an extra word when memory attribution is desired, and store
	the allocation id there.

	Similarly for MR_create{1,2,3}_msg.

	Replace proclabel arguments in allocation macros by alloc_id
	arguments.

	Replace MR_hp_alloc_atomic by MR_hp_alloc_atomic_msg.  It was only
	used for boxing floats.

	Conform to change to MR_new_object macro.

runtime/mercury_bootstrap.h:
	Delete obsolete macro hp_alloc_atomic.

runtime/mercury_heap_profile.c:
runtime/mercury_heap_profile.h:
	Add the code to summarise the live objects on the Boehm GC heap and
	write out the data to `Prof.Snapshots', for display by mprof.

	Don't store the procedure name in MR_memprof_record: the procedure
	address is enough and faster to compare.

runtime/mercury_prof.c:
	Finish and close the `Prof.Snapshots' file when the program
	terminates.

	Conform to changes in MR_memprof_record.

runtime/mercury_misc.h:
	Add a macro to expand to the name of the allocation sites array
	in LLDS grades.

runtime/mercury_bitmap.c:
runtime/mercury_bitmap.h:
	Pass allocation id through bitmap allocation functions.

	Delete unused function MR_string_to_bitmap.

runtime/mercury_string.h:
	Add MR_make_aligned_string_copy_msg.

	Make string allocation macros take allocation id arguments.

runtime/mercury.c:
runtime/mercury_array_macros.h:
runtime/mercury_context.c:
runtime/mercury_deconstruct.c:
runtime/mercury_deconstruct_macros.h:
runtime/mercury_dlist.c:
runtime/mercury_engine.c:
runtime/mercury_float.h:
runtime/mercury_hash_table.c:
runtime/mercury_ho_call.c:
runtime/mercury_label.c:
runtime/mercury_prof_mem.c:
runtime/mercury_stacks.c:
runtime/mercury_stm.c:
runtime/mercury_string.c:
runtime/mercury_thread.c:
runtime/mercury_trace_base.c:
runtime/mercury_trail.c:
runtime/mercury_type_desc.c:
runtime/mercury_type_info.c:
runtime/mercury_wsdeque.c:
	Use attributed memory allocation throughout the runtime so that
	objects don't show up in the profile as "unknown".

runtime/mercury_memory_zones.c:
	Attribute memory zones to the Mercury runtime.

runtime/mercury_tabling.c:
runtime/mercury_tabling.h:
	Use attributed memory allocation macros for tabling structures.

	Delete unused MR_table_realloc_* and MR_table_copy_bytes macros.

runtime/mercury_deep_copy_body.h:
	Try to retain the original attribution word when copying values.

runtime/mercury_ml_expand_body.h:
	Conform to changes in memory allocation macros.

runtime/mercury_tags.h:
	Replace proclabel arguments by alloc_id arguments in allocation macros.

runtime/mercury_wrapper.c:
	If memory attribution is enabled, tell Boehm GC that pointers may be
	displaced by an extra word.

trace/mercury_trace.c:
trace/mercury_trace_tables.c:
	Conform to changes in memory allocation macros.

extras/net/tcp.m:
extras/solver_types/library/any_array.m:
extras/trailed_update/tr_array.m:
	Conform to changes in memory allocation macros.

doc/user_guide.texi:
	Document the new profiling mode.

doc/reference_manual.texi:
	Update a commented out example.
2011-05-20 04:16:58 +00:00
Paul Bone
f1779bd1e8 Improve work stealing. Spark deques have been associated with contexts so far.
This is a problem for the following reasons:

    The work stealing code must take a lock to access the resizeable array of
    work stealing deques.  This adds global contention that can be avoided if
    this array has a fixed size.

    If a context is blocked on a future then that engine cannot execute the
    sparks from that context; instead it tries to find global work, which is
    more expensive than necessary.

    If there are a few dozen contexts then there may be just as many work
    stealing queues to take work from.  With fewer queues, each queue is more
    likely to hold work, so work stealing will be more successful on
    average.

This change associates spark deques with Mercury Engines rather than Contexts
to avoid these problems.
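
A rough sketch of the resulting layout, assuming a fixed number of engines
known at start-up.  All names here (Engine, SparkDeque, spark_deques,
try_steal) are hypothetical, and the toy deque is neither concurrent nor
circular; the real runtime uses a lock-free work-stealing deque.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENGINES 4   /* assumed to be fixed at start-up */
#define DEQUE_CAP   8

/* Toy, non-concurrent deque used only to show the indexing scheme. */
typedef struct {
    int items[DEQUE_CAP];
    int top;            /* thieves take from here */
    int bottom;         /* the owning engine pushes here */
} SparkDeque;

typedef struct {
    unsigned   id;
    unsigned   victim_counter;  /* per-engine, so no shared counter */
    SparkDeque deque;
} Engine;

/* Fixed-size table: slot i belongs to engine i, so no lock is needed
   to look up a deque. */
static SparkDeque *spark_deques[MAX_ENGINES];

static void engine_init(Engine *eng, unsigned id)
{
    assert(id < MAX_ENGINES);
    eng->id = id;
    eng->victim_counter = id + 1;
    eng->deque.top = 0;
    eng->deque.bottom = 0;
    spark_deques[id] = &eng->deque;  /* each engine fills in its own slot */
}

static int deque_push_bottom(SparkDeque *dq, int spark)
{
    if (dq->bottom == DEQUE_CAP) {
        return 0;
    }
    dq->items[dq->bottom++] = spark;
    return 1;
}

static int deque_steal_top(SparkDeque *dq, int *spark)
{
    if (dq->top == dq->bottom) {
        return 0;
    }
    *spark = dq->items[dq->top++];
    return 1;
}

/* Visit each slot at most once, starting from this engine's own counter. */
static int try_steal(Engine *thief, int *spark)
{
    unsigned attempt;

    for (attempt = 0; attempt < MAX_ENGINES; attempt++) {
        unsigned victim = thief->victim_counter++ % MAX_ENGINES;
        if (victim == thief->id || spark_deques[victim] == NULL) {
            continue;
        }
        if (deque_steal_top(spark_deques[victim], spark)) {
            return 1;
        }
    }
    return 0;
}
```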

This has invalidated some invariants that allowed the runtime system to make
some worth-while optimisations.  These optimisations have been maintained.
Mercury's idle loop has been reimplemented to allow for this.  This
re-implementation has allowed for a number of other improvements:

    Polling was used to check for new global sparks.  This has been removed and
    each engine now sleeps using its own semaphore.

    Checks for work can be done in different orders depending on how an engine
    joins the idle loop.

    When global work becomes available a particular engine can be woken up
    rather than any arbitrary engine.  We take advantage of this when making
    contexts runnable, we try to schedule them on the engine that last executed
    them.

    When an engine is woken up it can be instructed with what it should do upon
    waking up.

    When an engine looks for a context to run, it will try to pick a context
    that was last executed on it.  This may avoid cache misses when the context
    begins to run.
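
The sleep/wake protocol in the list above might look roughly like the
following, using POSIX unnamed semaphores.  EngineSleepSync, WakeAction and
the three functions are hypothetical names, and the sketch assumes at most one
waker targets an engine at a time; it does not show how the runtime arbitrates
between several wakers.

```c
#include <assert.h>
#include <semaphore.h>

typedef enum {
    ACTION_NONE,
    ACTION_RUN_CONTEXT,   /* a context became runnable */
    ACTION_STEAL_WORK,    /* sparks are available elsewhere */
    ACTION_SHUTDOWN
} WakeAction;

typedef struct {
    sem_t      sleep_sem;    /* only this engine sleeps on it */
    WakeAction wake_action;  /* set by the waker before posting */
} EngineSleepSync;

/* Returns 0 on success, like sem_init(). */
static int engine_sleep_init(EngineSleepSync *es)
{
    es->wake_action = ACTION_NONE;
    return sem_init(&es->sleep_sem, 0, 0);
}

/* Called by another engine: record what to do, then wake the target. */
static void wake_engine(EngineSleepSync *target, WakeAction action)
{
    assert(action != ACTION_NONE);
    target->wake_action = action;
    sem_post(&target->sleep_sem);
}

/* Called by the idle engine: block, without polling, until woken. */
static WakeAction engine_sleep(EngineSleepSync *self)
{
    WakeAction action;

    while (sem_wait(&self->sleep_sem) != 0) {
        /* Interrupted by a signal; just wait again. */
    }
    action = self->wake_action;
    self->wake_action = ACTION_NONE;
    return action;
}
```

Because each engine has its own semaphore, a waker can pick the engine that
last ran a context instead of waking an arbitrary one.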

In the future we should consider:
    Experiment with telling engines which context to run.

    Improve the selection of which engine work should be scheduled on to be
    hardware and memory-hierarchy aware.

Things that need doing next (probably next week):
    ./configure should check for POSIX semaphore support.

    Profiling times have been broken by this change, they will need fixing.

    The threadscope event log now breaks an invariant that the threadscope
    graphical tool requires.

    Semaphores are set up but never released; this is not a big problem but the
    manual page says that some implementations may leak resources.

runtime/mercury_context.h:
runtime/mercury_context.c:
    Remove the spark deque field from the MR_Context structure.

    Export the new array of spark deques so that other modules may fill in
    elements as engines are setup.

    Modify the resume_owner_thread field of the MR_Context structure, this was
    used to ensure that a context returning through C code would be resumed on
    the engine with the correct C stack and depth.  This field is now an engine
    id and has been renamed to resume_owner_engine, it is advisory unless
    resume_engine_required is also set.  This way it is used to advise which
    engine most recently executed this context and therefore may have a warm
    cache.

    Remove code that dynamically resized the array of spark deques, including
    the lock that protected against updating this array while it was being read
    from another thread.

    Introduce code that initialises the statically sized array of spark deques.

    Reimplement the idle loop.  This replaces MR_runnext and MR_do_runnext with
    MR_idle and MR_do_idle respectively.  There are also two new entry points
    into the idle loop.  Which one to use depends on the state of the engine.

    Introduce new mechanisms for waking a particular engine.  For example the
    engine that last executed a context that is now runnable.

    Change the algorithm for selecting which context to run, try to select
    contexts that were last used on the current engine to avoid cache misses.

    Use an engine's victim counter rather than a global victim counter when
    trying to steal work.

    Introduce some conditionally-compiled code that can be used to profile how
    quickly new contexts can be created.

    Rename MR_init_thread_stuff and MR_finalize_thread_stuff.  The term thread
    has been replaced with context since they're in mercury_context.c.  This
    allows the creation of a new function MR_init_thread_stuff() in
    mercury_thread.c.  I also found the mismatch between the function names and
    file name confusing.  Move some of the code from MR_init_context_stuff to
    the new MR_init_thread_stuff function where it belongs.

    Refactor the thread pinning code so that even when thread pinning is
    disabled it can be used to allocate each thread to a CPU but not actually
    pin them.

    Fix some whitespace errors.

runtime/mercury_thread.h:
runtime/mercury_thread.c:
    In MR_init_engine():
        Allocate an engine id for each engine.

        A number of arrays had one slot per engine and were set up using a
        lock.  Now engine ids are used to index each array and setup is done
        without a lock, each engine simply sets up its own slot.

        Setup the new per-engine work stealing deques.

    The MR_all_engine_bases array has been moved to this file.

    Implement a new MR_init_thread_stuff function which initialises some global
    variables and locks.  Some of MR_init_thread_stuff has been moved from
    mercury_context.c

    Pin threads as part of MR_init_thread, excluding the primordial thread
    which must be pinned before threadscope is initialised.

    Add functions for debugging the use of semaphores.

    Add corresponding macros that can be used to redirect semaphore calls to
    debugging functions as above.

    Improved thread debugging code, ensured that stderr is flushed after every
    use, and that logging is done after calls return as well as before they're
    called.

    Conform to changes in mercury_context.h

runtime/mercury_engine.h:
runtime/mercury_engine.c:
    Add spark deque and victim counter fields to the MercuryEngine structure.

    Make the MR_eng_id field of the MercuryEngine structure available in all
    thread safe grades, formerly it was used in only threadscope grades.

    Move the MR_all_engine_bases variable to mercury_thread.[ch]

    Put a reference to the engine's spark queue into the global array.  This is
    done here, so that it is after thread pinning because the original plan was
    to have this array sorted by CPU rather than engine - we may yet do this in
    the future.

    Initialise an engine's spark deque when an engine is initialised.

    Setup the engine specific threadscope data in mercury_thread.c

    Conform to changes in mercury_context.h

runtime/mercury_wrapper.c:
    The engine base array is no longer setup here, that code has been moved to
    mercury_thread.c

    Conform to changes in mercury_context.h and mercury_thread.h

runtime/mercury_wsdeque.h:
runtime/mercury_wsdeque.c:
    The original implementation allocated an array for a spark queue only if
    one wasn't already allocated, which could happen when a context was reused.
    Now that spark queues are associated with engines arrays are always
    allocated.

    Replaced two macros with a single macro since there's no longer a
    distinction between global and local work queues, all work queues are
    local.

runtime/mercury_wsdeque.c:
runtime/mercury_wsdeque.h:
    Remove the --worksteal-max-attempts and --worksteal-sleep-msecs options as
    they are no longer used.

runtime/mercury_threadscope.h:
runtime/mercury_threadscope.c:
    The MR_EngineId type has been moved to mercury_types.h

    Engine IDs are no longer allocated here, this is done in mercury_thread.c

    The run spark and steal spark messages now write 0xFFFFFFFF for the context
    id if there is no current context.  Previously this would dereference a
    null pointer.

runtime/mercury_memory_zones.c:
    When checking for an existing memory zone check the free_zones_list
    variable before taking a lock.  This can prevent taking the lock in cases
    where there are no free zones.
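
    A minimal sketch of that check-before-lock pattern follows.  The Zone
    type, release_zone, get_free_zone and the lock_acquisitions counter are
    hypothetical; only the free_zones_list name comes from the text above.
    The sketch uses a C11 atomic pointer so that the unlocked read is well
    defined, which is an assumption and not necessarily what the runtime does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct Zone {
    struct Zone *next;
} Zone;

static _Atomic(Zone *) free_zones_list = NULL;
static pthread_mutex_t free_zones_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned        lock_acquisitions = 0;  /* for illustration only */

/* Put a zone back on the free list; always needs the lock. */
static void release_zone(Zone *zone)
{
    assert(zone != NULL);
    pthread_mutex_lock(&free_zones_lock);
    lock_acquisitions++;
    zone->next = atomic_load(&free_zones_list);
    atomic_store(&free_zones_list, zone);
    pthread_mutex_unlock(&free_zones_lock);
}

/* Returns a recycled zone, or NULL if the caller must create a new one. */
static Zone *get_free_zone(void)
{
    Zone *zone;

    /* Cheap unlocked check: if the list looks empty, skip the lock. */
    if (atomic_load(&free_zones_list) == NULL) {
        return NULL;
    }

    pthread_mutex_lock(&free_zones_lock);
    lock_acquisitions++;
    /* Re-read under the lock; another thread may have taken the zone. */
    zone = atomic_load(&free_zones_list);
    if (zone != NULL) {
        atomic_store(&free_zones_list, zone->next);
    }
    pthread_mutex_unlock(&free_zones_lock);
    return zone;
}
```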

    Introduce some conditionally-compiled code that can be used to profile how
    quickly new contexts can be created.

runtime/mercury_bootstrap.h:
    Remove macros that no longer resolve to functions due to changes in the
    runtime system.

runtime/mercury_types.h:
    Move the MR_EngineId type from mercury_threadscope.h to mercury_types.h

runtime/mercury_grade.h:
    Introduce a parallel grade version number, this change breaks binary
    compatibility with existing parallel code.

runtime/mercury_backjump.c:
runtime/mercury_par_builtin.c:
runtime/mercury_mm_own_stacks.c:
library/stm_builtin.m:
library/thread.m:
library/thread.semaphore.m:
    Conform to changes in mercury_context.h.

library/io.m:
    Make this module compatible with MR_debug_threads.

doc/user_guide.texi:
    Remove the documentation for the --worksteal-max-attempts and
    --worksteal-sleep-msecs options.  The documentation was already commented
    out.
2011-04-13 13:19:42 +00:00
Paul Bone
322feaf217 Add more threadscope instrumentation.
This change introduces instrumentation that tracks sparks as well as parallel
conjunctions and their conjuncts.  This should hopefully give us more
information to diagnose runtime performance issues.

As of this date the ThreadScope program hasn't been updated to read or
understand these new events.

runtime/mercury_threadscope.[ch]:
    Added a function and types to register all the threadscope strings from an
    array.

    Add functions to post the new events (see below).

runtime/mercury_threadscope.c:
    Added support for 5 new threadscope events.
        Registering a string so that other messages may refer to a constant
        string.

        Marking the beginning and ends of parallel conjunctions.

        Creating a spark for a parallel conjunct.

        Finishing a parallel conjunct.

    Re-arranged event IDs, I've started allocating IDs from 38 onwards for
    general purposes and 100 onwards for mercury specific events after talking
    with Duncan Coutts.

    Trimmed excess whitespace from the end of lines.

runtime/mercury_context.h:
    Post a beginning parallel conjunction message when the sync term for the
    parallel conjunction is initialized.

    Post an event when creating a spark for a parallel conjunction.

    Add a MR_spark_id field to the MR_Spark structure, these identify sparks to
    threadscope.

runtime/mercury_context.c:
    Post threadscope messages when a spark is about to be executed.

    Post a threadscope event when a parallel conjunct is completed.

    Add a missing memory barrier.

runtime/mercury_wrapper.[ch]:
    Create a global function pointer for the code that registers strings in the
    threadscope string table, this is filled in by mkinit.

    Call this function pointer immediately after setting up threadscope.

runtime/mercury_wsdeque.[ch]:
    Modify MR_wsdeque_pop_bottom to return the spark pointer (which points onto
    the queue) rather than returning a result through a pointer and bool if the
    operation was successful.  This pointer is safe to dereference until
    MR_wsdeque_push_bottom is used.
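
    The interface change can be illustrated with a toy, single-threaded
    deque.  ToyDeque, Spark and the toy_* functions are hypothetical and much
    simpler than the real MR_wsdeque code; the point is only that the result
    is a pointer into the queue's own storage, or NULL when the deque is
    empty.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DEQUE_SIZE 16

typedef struct {
    unsigned spark_id;   /* hypothetical payload */
} Spark;

typedef struct {
    Spark  slots[TOY_DEQUE_SIZE];
    size_t bottom;       /* index one past the newest spark */
} ToyDeque;

/* Returns 1 on success, 0 if the deque is full. */
static int toy_push_bottom(ToyDeque *dq, Spark spark)
{
    assert(dq != NULL);
    if (dq->bottom == TOY_DEQUE_SIZE) {
        return 0;
    }
    dq->slots[dq->bottom++] = spark;
    return 1;
}

/* Returns a pointer to the popped spark, or NULL if the deque is empty.
   The slot is not overwritten until the next toy_push_bottom() call, so
   the pointer may be dereferenced until then. */
static Spark *toy_pop_bottom(ToyDeque *dq)
{
    assert(dq != NULL);
    if (dq->bottom == 0) {
        return NULL;
    }
    return &dq->slots[--dq->bottom];
}
```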

runtime/mercury_wsdeque.c:
    Corrected a code comment.

runtime/mercury_engine.h:
    Documented some of the fields of the engine structure that hadn't been
    documented.

    Add a next spark ID field to the engine structure.

    Change the type of the engine ID field to MR_uint_least16_t

compiler/llds.m:
    Add a third field to the init_sync_term instruction that stores the index
    into the threadscope string table of the static conjunction ID.

    Add a field to the c_file structure containing the threadscope string
    table.

compiler/layout.m:
    Added a new layout array name for the threadscope string table.

compiler/layout_out.m:
    Implement code to write out the threadscope string table.

compiler/llds_out_file.m:
    Write out the threadscope string table when writing out the c_file.

compiler/par_conj_gen.m:
    Create strings that statically identify parallel conjunctions for each
    init_sync_term LLDS instruction.  These strings are added to a table in the
    !CodeInfo and the index of the string is added to the init_sync_term
    instruction.

    Add an extra instruction after a parallel conjunction to post the message
    that the parallel conjunction has completed.

compiler/global_data.m:
    Add fields to the global data structure to represent the threadscope string
    table and its current size.

    Add predicates to update and retrieve the table.

    Handle merging of threadscope string tables in global data by allowing the
    references to the strings to be remapped.

    Refactored remapping code so that a caller such as proc_gen only needs to
    call one remapping predicate after merging global data.

compiler/code_info.m:
    Add a table of strings for use with threadscope to the code_info_persistent
    type.

    Modify the code_info_init to initialise the threadscope string table fields.

    Add a predicate to get the string table and another to update it.

compiler/proc_gen.m:
    Build the containing goal map before code generation for procedures with
    parallel conjunctions in a parallel grade.  par_conj_gen.m depends on this.

    Conform to changes in code_info.m and global_data.m

compiler/llds_out_instr.m:
    Write out the extra parameter in the init_sync_term instruction.

compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_to_x86_64.m:
compiler/mercury_compile_llds_back_end.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/peephole.m:
compiler/reassign.m:
compiler/use_local_vars.m:
    Conform to changes in llds.m

compiler/opt_debug.m:
    Conform to changes in layout.m

compiler/mercury_compile_llds_back_end.m:
    Fix some trailing whitespace.

util/mkinit.c:
    Build an initialisation function that registers all the strings in
    threadscope string tables.

    Correct the layout of a comment.
2011-03-25 03:13:42 +00:00
Paul Bone
edc230406e Fix a number of errors and warnings in the runtime picked up by GCC 4.x in
parallel and threadscope grades.

We had been using types with the wrong signedness when calling atomic operations.
GCC 4.x also picked up an error where #elif was used instead of #else.

While testing these changes on a 32bit system more bugs were found on the i386
architecture and on AMD brand processors.

runtime/mercury_atomic_ops.h:
runtime/mercury_atomic_ops.c:
    Add unsigned variants of the following atomic operations:
        increment,
        add,
        add_and_fetch,
        dec_and_is_zero,

    Add a signed variant for compare and swap.

    Rename the MR_atomic_dec_<type>_and_is_zero operation to move the type to
    the end of the name.

    Use volatile storage in the MR_Stats structure.

    A 32bit machine cannot do atomic operations on 64bit values and MR_Stats
    must use 64bit values.  Therefore 64bit values in the MR_Stats structure
    are now protected by a lock on 32bit machines.
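
    One way to picture the 32bit fallback is the sketch below.  The Stats
    type and the stats_* functions are hypothetical stand-ins for MR_Stats
    and its helpers, the word-size test via UINTPTR_MAX is an assumption,
    and the 64bit branch relies on the GCC __sync_fetch_and_add builtin.
    Note that in the lock-free branch the two fields are updated separately,
    not as one atomic pair.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Assume a lock is needed when pointers are narrower than 64 bits. */
#if UINTPTR_MAX < UINT64_MAX
  #define STATS_NEED_LOCK 1
#endif

typedef struct {
    volatile uint64_t sum;
    volatile uint64_t count;
#ifdef STATS_NEED_LOCK
    pthread_mutex_t   lock;
#endif
} Stats;

static void stats_init(Stats *s)
{
    assert(s != NULL);
    s->sum = 0;
    s->count = 0;
#ifdef STATS_NEED_LOCK
    pthread_mutex_init(&s->lock, NULL);
#endif
}

/* Record one sample. */
static void stats_add(Stats *s, uint64_t value)
{
#ifdef STATS_NEED_LOCK
    /* 32bit: protect the 64bit fields with a lock. */
    pthread_mutex_lock(&s->lock);
    s->sum += value;
    s->count++;
    pthread_mutex_unlock(&s->lock);
#else
    /* 64bit: each field can be updated with one atomic operation. */
    __sync_fetch_and_add(&s->sum, value);
    __sync_fetch_and_add(&s->count, 1);
#endif
}
```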

runtime/mercury_atomic_ops.h:
    Fix a typo in the i386 version of MR_atomic_dec_and_is_zero_uint().

runtime/mercury_atomic_ops.c:
    AMD CPUs do not conform to Intel's specification for being able to
    extract the CPU clock speed from the brand string.  When we cannot
    determine the CPU's clock speed then we write out threadscope
    timestamps in raw clock cycles rather than nanoseconds.

    On i386 machines the ebx register is used to implement PIC code,
    however the CPUID instruction uses it to output information.  Save
    this register on C's stack while we issue CPUID and retrieve the
    result in ebx.

    We now pass native machine sized values to the inline assembler code
    that implements RDTSC and RDTSCP.

    Fix commenting style in some places.

runtime/mercury_atomic_ops.c:
    Fix some incorrect C preprocessor code for conditional compilation.

runtime/mercury_grade.h:
    Increment binary compatibility number.  This should have been done in a
    prior change when the MR_runnext macro changed which broke binary
    compatibility in the parallel low-level C grades.

runtime/mercury_context.h:
    In MR_SyncTerm_Struct use an unsigned value for the number of conjuncts
    remaining before the conjunction is complete.

runtime/mercury_threadscope.c:
    Record raw cpu clock ticks rather than nanoseconds when we don't
    know the processor's clock speed.

runtime/mercury_context.c:
runtime/mercury_wsdeque.h:
runtime/mercury_wsdeque.c:
    Conform to changes in mercury_atomic_ops.h
2010-03-20 10:15:51 +00:00
Peter Wang
fa80b9a01a Make the parallel conjunction execution mechanism more efficient.
Branches: main

Make the parallel conjunction execution mechanism more efficient.

1. Don't allocate sync terms on the heap.  Sync terms are now allocated in
the stack frame of the procedure call which originates a parallel
conjunction.

2. Don't allocate individual sparks on the heap.  Sparks are now stored in
preallocated, growing arrays using an algorithm that doesn't use locks.

3. Don't have one mutex per sync term.  Just use one mutex to protect
concurrent accesses to all sync terms (it is rarely needed anyway).  This
makes sync terms smaller and saves initialising a mutex for each parallel
conjunction encountered.

4. We don't bother to acquire the global sync term lock if we know a parallel
conjunction couldn't be executing in parallel.  In a highly parallel program,
the majority of parallel conjunctions will be executed sequentially so
protecting the sync terms from concurrent accesses is unnecessary.
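
Points 1, 3 and 4 can be sketched together as follows.  SyncTerm,
init_sync_term and join_conjunct are hypothetical simplifications of
MR_SyncTerm and the MR_join_and_continue logic, and global_lock_uses exists
only to make the effect visible.  The sketch assumes the scheduled_globally
flag is set before any other engine can touch the sync term; how the runtime
guarantees that is not shown here.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    unsigned remaining;           /* conjuncts not yet finished */
    int      scheduled_globally;  /* has a spark been made globally visible? */
} SyncTerm;

/* One lock shared by every sync term, instead of one mutex per term. */
static pthread_mutex_t sync_term_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned        global_lock_uses = 0;  /* for illustration only */

/* The sync term is expected to live in the caller's stack frame. */
static void init_sync_term(SyncTerm *st, unsigned num_conjuncts)
{
    st->remaining = num_conjuncts;
    st->scheduled_globally = 0;
}

/* Returns 1 if the caller has just finished the last conjunct. */
static int join_conjunct(SyncTerm *st)
{
    int last;

    assert(st->remaining > 0);
    if (st->scheduled_globally) {
        /* Other engines may touch this sync term: take the lock. */
        pthread_mutex_lock(&sync_term_lock);
        global_lock_uses++;
        last = (--st->remaining == 0);
        pthread_mutex_unlock(&sync_term_lock);
    } else {
        /* Purely sequential execution: no locking needed. */
        last = (--st->remaining == 0);
    }
    return last;
}
```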


par_fib(39) is ~8.4 times faster (user time) on my laptop (Linux 2.6, x86_64),
which is ~3.5 times as slow as sequential execution.


configure.in:
	Update the configuration for a changed MR_SyncTerm structure.

compiler/llds.m:
	Make the fork instruction take a second argument, which is the base
	stack slot of the sync term.

	Rename it to fork_new_child to match the macro name in the runtime.

compiler/par_conj_gen.m:
	Change the generated code for parallel conjunctions to allocate sync
	terms on the stack and to pass the sync term to fork_new_child.

compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_out.m:
compiler/llds_to_x86_64.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/reassign.m:
compiler/use_local_vars.m:
	Conform to the change in the fork instruction.

compiler/liveness.m:
compiler/proc_gen.m:
	Disable use of the parallel conjunction operator in the compiler as
	older versions of the compiler will generate code incompatible with
	the new runtime.

runtime/mercury_context.c:
runtime/mercury_context.h:
	Remove the next pointer field from MR_Spark as it's no longer needed.

	Remove the mutex from MR_SyncTerm.  Add a field to record if a spark
	belonging to the sync term was scheduled globally, i.e. if the
	parallel conjunction might be executed in parallel.

	Define MR_SparkDeque and MR_SparkArray.

	Use MR_SparkDeques to hold per-context sparks and global sparks.

	Change the abstract machine instructions MR_init_sync_term,
	MR_fork_new_child, MR_join_and_continue as per the main change log.

	Use a preprocessor macro MR_LL_PARALLEL_CONJ as a shorthand for
	!MR_HIGHLEVEL_CODE && MR_THREAD_SAFE.

	Take the opportunity to clean things up a bit.

runtime/mercury_wsdeque.c:
runtime/mercury_wsdeque.h:
	New files containing an implementation of work-stealing deques.  We
	don't do work stealing yet but we use the underlying data structure.

runtime/mercury_atomic.c:
runtime/mercury_atomic.h:
	New files to contain atomic operations.  Currently it just contains
	compare-and-swap for gcc/x86_64, gcc/x86 and gcc-4.1.
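
	For the gcc-4.1 case, a compare-and-swap wrapper could be as small as
	the sketch below, which uses the __sync_bool_compare_and_swap builtin
	available from GCC 4.1 onwards.  The names compare_and_swap_word and
	atomic_increment are hypothetical and not taken from
	mercury_atomic.h; the x86 inline assembly variants are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t Integer;

/* Atomically: if *addr == old_val, store new_val and return nonzero;
   otherwise leave *addr alone and return 0. */
static int compare_and_swap_word(volatile Integer *addr,
    Integer old_val, Integer new_val)
{
    assert(addr != NULL);
    return __sync_bool_compare_and_swap(addr, old_val, new_val);
}

/* Example of building another operation from a CAS retry loop. */
static Integer atomic_increment(volatile Integer *addr)
{
    Integer old_val;

    do {
        old_val = *addr;
    } while (!compare_and_swap_word(addr, old_val, old_val + 1));
    return old_val + 1;
}
```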

runtime/Mmakefile:
	Add the new files.

runtime/mercury_engine.h:
runtime/mercury_mm_own_stacks.c:
runtime/mercury_wrapper.c:
	Conform to runtime changes.

runtime/mercury_conf_param.h:
	Update an outdated comment.
2007-10-11 11:45:22 +00:00