mirror of
https://github.com/Mercury-Language/mercury.git
synced 2026-04-23 13:23:47 +00:00
083d376e6598628362ee91c2da170febd83590f4
8 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
d465fa53cb |
Update the COPYING.LIB file and references to it.
Discussion of these changes can be found on the Mercury developers
mailing list archives from June 2018.
COPYING.LIB:
Add a special linking exception to the LGPL.
*:
Update references to COPYING.LIB.
Clean up some minor errors that have accumulated in copyright
messages.
|
||
|
|
53b573692a |
Convert C code to use // style comments.
runtime/*.[ch]:
trace/*.[chyl]:
As above. In some places, improve comments, e.g. by expanding contractions
such as "we've". Add #ifndef guards against double inclusion around
the trace/*.h files that did not already have them.
tools/*:
Make the corresponding changes in shell scripts that generate .[ch] files
in the runtime.
tests/*:
Conform to a slight change in the text of a message.
|
||
|
|
67326f16e4 |
Fix style issues in the runtime.
Move all .h and .c files to four-space indentation without tabs, if they weren't there already. Use the same vim line for all .h and .c files. Align all backslashes at the ends of lines in macro definitions. Align close comment signs. In some places, fix inconsistent indentation. Fix a bunch of comments. Add XXXs to a few of them. |
||
|
|
7e26b55e74 |
Implement a new form of memory profiling, which tells the user what memory
Branches: main
Implement a new form of memory profiling, which tells the user what memory
is being retained during a program run. This is done by allocating an extra
word before each cell, which is used to "attribute" the cell to an
allocation site. The attribution, or "allocation id", is an address to an
MR_AllocSiteInfo structure generated by the Mercury compiler, giving the
procedure, filename and line number of the allocation, and the type
constructor and arity of the cell that it allocates.
The user must manually instrument the program with calls to
`benchmarking.report_memory_attribution', which forces a GC and summarises
the live objects on the heap using the attributions. The mprof tool is
extended with a new mode to parse and present that data.
Objects which are unattributed (e.g. by hand-written C code which hasn't
been updated) are still accounted for, but show up in profiles as "unknown".
Currently this profiling mode only works in conjunction with the Boehm
garbage collector, though in principle it can work with any memory allocator
for which we can access a list of the live objects. Since term size
profiling relies on the same technique of using an extra word per memory
cell, the two profiling modes are incompatible.
The output from `mprof -s' looks like this:
------ [1] some label ------
cells words cumul procedure / type (location)
14150 38872 total
* 1949/ 13.8% 4872/ 12.5% 12.5% <predicate `parser.parse_rest/7' mode 0>
975/ 6.9% 1950/ 5.0% list.list/1 (parser.m:502)
487/ 3.4% 1948/ 5.0% term.term/1 (parser.m:501)
487/ 3.4% 974/ 2.5% term.const/0 (parser.m:501)
* 1424/ 10.1% 4272/ 11.0% 23.5% <predicate `parser.parse_simple_term_2/6' mode 0>
708/ 5.0% 2832/ 7.3% term.term/1 (parser.m:643)
708/ 5.0% 1416/ 3.6% term.const/0 (parser.m:643)
...
boehm_gc/alloc.c:
boehm_gc/include/gc.h:
boehm_gc/misc.c:
boehm_gc/reclaim.c:
Add a callback function to be called for every live object after a GC.
Add a function to write out the GC_size_map array.
compiler/layout.m:
Define the alloc_site_info type which is equivalent to the
MR_AllocSiteInfo C structure.
Add alloc_site_array as a kind of "layout" array.
compiler/llds.m:
Add allocation sites to `cfile' structure.
Replace TypeMsg argument (which was also for profiling) on `incr_hp'
instructions by an allocation site identifier.
Add a new foreign_proc_component for allocation site ids.
compiler/code_info.m:
compiler/global_data.m:
compiler/proc_gen.m:
Keep the set of allocation sites in the code_info and global_data
structures.
compiler/unify_gen.m:
Add allocation sites to LLDS allocation instructions.
compiler/layout_out.m:
compiler/llds_out_file.m:
compiler/llds_out_instr.m:
Output MR_AllocSiteInfo arrays in generated C files.
Output code to register the MR_AllocSiteInfo array with the Mercury
runtime.
Output allocation site ids for memory allocation instructions.
compiler/llds_out_util.m:
Add allocation sites to llds_out_info.
compiler/pragma_c_gen.m:
compiler/ml_foreign_proc_gen.m:
Generate a macro MR_ALLOC_ID which resolves to an allocation site
structure, for every foreign_proc whose C code contains the string
"MR_ALLOC_ID". This is to be used by hand-written C code which
allocates memory.
MR_PROC_LABELs are retained for backwards compatibility. Though
they were introduced for profiling, they seem to have been co-opted
for printf-debugging since then.
compiler/ml_global_data.m:
Add allocation site structures to the MLDS global data.
compiler/mlds.m:
compiler/ml_unify_gen.m:
Add allocation site id to `new_object' instruction.
compiler/mlds_to_c.m:
Output allocation site arrays and allocation ids in high-level C code.
Output a call to register the allocation site array with the Mercury
runtime.
Delete an unused predicate.
compiler/exprn_aux.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/mercury_compile_llds_back_end.m:
compiler/middle_rec.m:
compiler/ml_accurate_gc.m:
compiler/ml_elim_nested.m:
compiler/ml_optimize.m:
compiler/ml_util.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_gcc.m:
compiler/mlds_to_il.m:
compiler/mlds_to_java.m:
compiler/mlds_to_managed.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/use_local_vars.m:
compiler/var_locn.m:
Conform to changes.
compiler/pickle.m:
compiler/prog_event.m:
compiler/timestamp.m:
Conform to changes in memory allocation macros.
library/benchmarking.m:
Add the `report_memory_attribution' instrumentation predicates.
Conform to changes to MR_memprof_record.
library/array.m:
library/bit_buffer.m:
library/bitmap.m:
library/construct.m:
library/deconstruct.m:
library/dir.m:
library/io.m:
library/mutvar.m:
library/store.m:
library/string.m:
library/thread.semaphore.m:
library/version_array.m:
Use attributed memory allocation throughout the standard library so
that objects don't show up in the memory profile as "unknown".
Replace MR_PROC_LABEL by MR_ALLOC_ID.
mdbcomp/program_representation.m:
mdbcomp/rtti_access.m:
Replace MR_PROC_LABEL by MR_ALLOC_ID.
profiler/Mercury.options:
profiler/globals.m:
profiler/mercury_profile.m:
profiler/options.m:
profiler/output.m:
profiler/snapshots.m:
Add a new mode to `mprof' to parse and present the data from
`Prof.Snapshots' files.
Add options for the new profiling mode.
profiler/process_file.m:
Fix a typo.
runtime/mercury_conf_param.h:
#define MR_MPROF_PROFILE_MEMORY_ATTRIBUTION if memory profiling
is enabled and we are using Boehm GC.
runtime/mercury.h:
Make MR_new_object take an allocation id argument.
Conform to changes in memory allocation macros.
runtime/mercury_memory.c:
runtime/mercury_memory.h:
runtime/mercury_types.h:
Define MR_AllocSiteInfo.
Add memory allocation functions and macros which take into the
account the additional word necessary for the new profiling mode.
These should be used in preferences to the raw memory allocation
functions wherever possible so that objects do not show up in the
profile as "unknown".
Add analogues of realloc/free which take into account the offset
introduced by the attribution word.
Add function versions of the MR_new_object macros, which can't be
written in standard C. They are only used when necessary.
Add built-in allocation site ids, to be used in the runtime and
other hand-written code when context-specific ids are unavailable.
runtime/mercury_heap.h:
Make MR_tag_offset_incr_hp_msg and MR_tag_offset_incr_hp_atomic_msg
allocate an extra word when memory attribution is desired, and store
the allocation id there.
Similarly for MR_create{1,2,3}_msg.
Replace proclabel arguments in allocation macros by alloc_id
arguments.
Replace MR_hp_alloc_atomic by MR_hp_alloc_atomic_msg. It was only
used for boxing floats.
Conform to change to MR_new_object macro.
runtime/mercury_bootstrap.h:
Delete obsolete macro hp_alloc_atomic.
runtime/mercury_heap_profile.c:
runtime/mercury_heap_profile.h:
Add the code to summarise the live objects on the Boehm GC heap and
writes out the data to `Prof.Snapshots', for display by mprof.
Don't store the procedure name in MR_memprof_record: the procedure
address is enough and faster to compare.
runtime/mercury_prof.c:
Finish and close the `Prof.Snapshots' file when the program
terminates.
Conform to changes in MR_memprof_record.
runtime/mercury_misc.h:
Add a macro to expand to the name of the allocation sites array
in LLDS grades.
runtime/mercury_bitmap.c:
runtime/mercury_bitmap.h:
Pass allocation id through bitmap allocation functions.
Delete unused function MR_string_to_bitmap.
runtime/mercury_string.h:
Add MR_make_aligned_string_copy_msg.
Make string allocation macros take allocation id arguments.
runtime/mercury.c:
runtime/mercury_array_macros.h:
runtime/mercury_context.c:
runtime/mercury_deconstruct.c:
runtime/mercury_deconstruct_macros.h:
runtime/mercury_dlist.c:
runtime/mercury_engine.c:
runtime/mercury_float.h:
runtime/mercury_hash_table.c:
runtime/mercury_ho_call.c:
runtime/mercury_label.c:
runtime/mercury_prof_mem.c:
runtime/mercury_stacks.c:
runtime/mercury_stm.c:
runtime/mercury_string.c:
runtime/mercury_thread.c:
runtime/mercury_trace_base.c:
runtime/mercury_trail.c:
runtime/mercury_type_desc.c:
runtime/mercury_type_info.c:
runtime/mercury_wsdeque.c:
Use attributed memory allocation throughout the runtime so that
objects don't show up in the profile as "unknown".
runtime/mercury_memory_zones.c:
Attribute memory zones to the Mercury runtime.
runtime/mercury_tabling.c:
runtime/mercury_tabling.h:
Use attributed memory allocation macros for tabling structures.
Delete unused MR_table_realloc_* and MR_table_copy_bytes macros.
runtime/mercury_deep_copy_body.h:
Try to retain the original attribution word when copying values.
runtime/mercury_ml_expand_body.h:
Conform to changes in memory allocation macros.
runtime/mercury_tags.h:
Replace proclabel arguments by alloc_id arguments in allocation macros.
runtime/mercury_wrapper.c:
If memory attribution is enabled, tell Boehm GC that pointers may be
displaced by an extra word.
trace/mercury_trace.c:
trace/mercury_trace_tables.c:
Conform to changes in memory allocation macros.
extras/net/tcp.m:
extras/solver_types/library/any_array.m:
extras/trailed_update/tr_array.m:
Conform to changes in memory allocation macros.
doc/user_guide.texi:
Document the new profiling mode.
doc/reference_manual.texi:
Update a commented out example.
|
||
|
|
f1779bd1e8 |
Improve work stealing. Spark deques have been associated with contexts so far.
This is a problem for the following reasons:
The work stealing code must take a lock to access the resizeable array of
work stealing dequeues. This adds global contention that can be avoided if
this array has a fixed size.
If a context is blocked on a future then that engine cannot execute the
sparks from that context, instead it tries to find global work, this is
more expensive than necessary.
If there are a few dozen contexts then there may be just as many work
stealing queues to take work from, the density of these queues will be
higher if they are fewer. Therefore work stealing will be more successful
on average.
This change associates spark deques with Mercury Engines rather than Contexts
to avoid these problems.
This has invalidated some invariants that allowed the runtime system to make
some worth-while optimisations. These optimisations have been maintained.
Mercury's idle loop has been reimplemented to allow for this. This
re-implementation has allowed for a number of other improvements:
Polling was used to check for new global sparks. This has been removed and
each engine now sleeps using it's own semaphore.
Checks for work can be done in different orders depending on how an engine
joins the idle loop.
When global work becomes available a particular engine can be woken up
rather than any arbitrary engine. We take advantage of this when making
contexts runnable, we try to schedule them on the engine that last executed
them.
When an engine is woken up it can be instructed with what it should do upon
waking up.
When a engine looks for a context to run, it will try to pick a context
that was last executed on it. This may avoid cache misses when the context
begins to run.
In the future we should consider:
Experiment with telling engines which context to run.
Improve the selection of which engine work should be scheduled on to be
hardware and memory-hierarchy aware.
Things that need doing next (probably next week):
./configure should check for POSIX semaphore support.
Profiling times have been broken by this change, they will need fixing.
The threadscope event long now breaks an invariants that the threadscope
graphical tool requires.
Semaphores are setup but never released, this is not a big problem but the
manual page says that some implementations may leak resources.
runtime/mercury_context.h:
runtime/mercury_context.c:
Remove the spark deque field from the MR_Context structure.
Export the new array of spark deques so that other modules may fill in
elements as engines are setup.
Modify the resume_owner_thread field of the MR_Context structure, this was
used to ensure that a context returning through C code would be resumed on
the engine with the correct C stack and depth. This field is now an engine
id and has been renamed to resume_owner_engine, it is advisory unless
resume_engine_required is also set. This way it is used to advise which
engine most recently executed this context and therefore may have a warm
cache.
Remove code that dynamically resized the array of spark deques. Including
the lock that protected against updating this array while it was being read
from other thread.
Introduce code that initialises the statically sized array of spark deques.
Reimplement the idle loop. This replaces MR_runnext and MR_do_runnext with
MR_idle and MR_do_idle respectively. There are also two new entry points
into the idle loop. Which one to use depends on the state of the engine.
Introduce new mechanisms for waking a particular engine. For example the
engine that last executed a context that is now runnable.
Change the algorithm for selecting which context to run, try to select
contexts that where last used on the current engine to avoid cache misses.
Use an engine's victim counter rather than a global victim counter when
trying to steal work.
Introduce some conditionally-compiled code that can be used to profile how
quickly new contexts can be created.
Rename MR_init_thread_stuff and MR_finalize_thread_stuff. The term thread
has been replaced with context since they're in mercury_context.c. This
allows the creation of a new function MR_init_thread_stuff() in
mercury_thread.c I also found the mismatch between the function names and
file name confusing. Move some of the code from MR_init_context_stuff to
the new MR_init_thread_stuff function where it belongs.
Refactor the thread pinning code so that even when thread pinning is
disabled it can be used to allocate each thread to a CPU but not actually
pin them.
Fix some whitespace errors.
runtime/mercury_thread.h:
runtime/mercury_thread.c:
In MR_init_engine():
Allocate an engine id for each engine.
A number of arrays had one slot per engine and where setup using a
lock. Now engine ids are used to index each array and setup is done
without a lock, each engine simply sets up its own slot.
Setup the new per-engine work stealing deques.
The MR_all_engine_bases array has been moved to this file.
Implement a new MR_init_thread_stuff function which initialises some global
variables and locks. Some of MR_init_thread_stuff has been moved from
mercury_context.c
Pin threads as part of MR_init_thread, excluding the primordial thread
which must be pinned before threadscope is initialised.
Add functions for debugging the use of semaphores.
Add corresponding macros that can be used to redirect semaphore calls to
debugging functions as above.
Improved thread debugging code, ensured that stderr is flushed after every
use, and that logging is done after calls return as well as before they're
called.
Conform to changes in mercury_context.h
runtime/mercury_engine.h:
runtime/mercury_engine.c:
Add spark deque and victim counter fields to the MercuryEngine structure.
Make the MR_eng_id field of the MercuryEngine structure available in all
thread safe grades, formerly it was used in only threadscope grades.
Move the MR_all_engine_bases variable to mercury_thread.[ch]
Put a reference to the engine's spark queue into the global array. This is
done here, so that it is after thread pinning because the original plan was
to have this array sorted by CPU rather then engine - we may yet do this in
the future.
Initialise an engine's spark deque when an engine is initialised.
Setup the engine specific threadscope data in mercury_thread.c
Conform to changes in mercury_context.h
runtime/mercury_wrapper.c:
The engine base array is no longer setup here, that code has been moved to
mercury_thread.c
Conform to changes in mercury_context.h and mercury_thread.h
runtime/mercury_wsdeque.h:
runtime/mercury_wsdeque.c:
The original implementation allocated an array for a spark queue only if
one wasn't already allocated, which could happen when a context was reused.
Now that spark queues are associated with engines arrays are always
allocated.
Replaced two macros with a single macro since there's no-longer a
distinction between global and local work queues, all work queues are
local.
runtime/mercury_wsdeque.c:
runtime/mercury_wsdeque.h:
Remove the --worksteal-max-attempts and --worksteal-sleep-msecs options as
they are no-longer used.
runtime/mercury_threadscope.h:
runtime/mercury_threadscope.c:
The MR_EngineId type has been moved to mercury_types.h
Engine IDs are no-longer allocated here, this is done in mercury_thread.c
The run spark and steal spark messages now write 0xFFFFFFFF for the context
id if there is no current context. Previously this would dereference a
null pointer.
runtime/mercury_memory_zones.c:
When checking for an existing memory zone check the free_zones_list
variable before taking a lock. This can prevent taking the lock in cases
where there are no free zones.
Introduce some conditionally-compiled code that can be used to profile how
quickly new contexts can be created.
runtime/mercury_bootstrap.h:
Remove macros that no-longer resolve to functions due to changes in the
runtime system.
runtime/mercury_types.h:
Move the MR_EngineId type from mercury_threadscope.h to mercury_types.h
runtime/mercury_grade.h:
Introduce a parallel grade version number, this change brakes binary
compatibility with existing parallel code.
runtime/mercury_backjump.c:
runtime/mercury_par_builtin.c:
runtime/mercury_mm_own_stacks.c:
library/stm_builtin.m:
library/thread.m:
library/thread.semaphore.m:
Conform to changes in mercury_context.h.
library/io.m:
Make this module compatible with MR_debug_threads.
doc/user_guide.texi
Remove the documentation for the --worksteal-max-attempts and
--worksteal-sleep-msecs options. The documentation was already commented
out.
|
||
|
|
322feaf217 |
Add more threadscope instrumentation.
This change introduces instrumentation that tracks sparks as well as parallel
conjunctions and their conjuncts. This should hopefully give us more
information to diagnose runtime performance issues.
As of this date the ThreadScope program hasn't been updated to read or
understand these new events.
runtime/mercury_threadscope.[ch]:
Added a function and types to register all the threadscope strings from an
array.
Add functions to post the new events (see below).
runtime/mercury_threadscope.c:
Added support for 5 new threadscope events.
Registering a string so that other messages may refer to a constant
string.
Marking the beginning and ends of parallel conjunctions.
Creating a spark for a parallel conjunct.
Finishing a parallel conjunct.
Re-arranged event IDs, I've started allocating IDs from 38 onwards for
general purposes and 100 onwards for mercury specific events after talking
with Duncan Coutts.
Trimmed excess whitespace from the end of lines.
runtime/mercury_context.h:
Post a beginning parallel conjunction message when the sync term for the
parallel conjunction is initialized.
Post an event when creating a spark for a parallel conjunction.
Add a MR_spark_id field to the MR_Spark structure, these identify sparks to
threadscope.
runtime/mercury_context.c:
Post threadscope messages when a spark is about to be executed.
Post a threadscope event when a parallel conjunct is completed.
Add a missing memory barrier.
runtime/mercury_wrapper.[ch]:
Create a global function pointer for the code that registers strings in the
threadscope string table, this is filled in by mkinit.
Call this function pointer immediatly after setting up threadscope.
runtime/mercury_wsdeque.[ch]:
Modify MR_wsdeque_pop_bottom to return the spark pointer (which points onto
the queue) rather then returning a result through a pointer and bool if the
operation was successful. This pointer is safe to dereference until
MR_wsdeque_push_bottom is used.
runtime/mercury_wsdeque.c:
Corrected a code comment.
runtime/mercury_engine.h:
Documented some of the fields of the engine structure that hadn't been
documented.
Add a next spark ID field to the engine structure.
Change the type of the engine ID field to MR_uint_least16_t
compiler/llds.m:
Add a third field to the init_sync_term instruction that stores the index
into the threadscope string table of the static conjunction ID.
Add a field to the c_file structure containing the threadscope string
table.
compiler/layout.m:
Added a new layout array name for the threadscope string table.
compiler/layout_out.m:
Implement code to write out the threadscope string table.
compiler/llds_out_file.m:
Write out the threadscope string table when writing out the c_file.
compiler/par_conj_gen.m:
Create strings that statically identify parallel conjunctions for each
init_sync_term LLDS instruction. These strings are added to a table in the
!CodeInfo and the index of the string is added to the init_sync_term
instruction.
Add an extra instruction after a parallel conjunction to post the message
that the parallel conjunction has completed.
compiler/global_data.m:
Add fields to the global data structure to represent the threadscope string
table and its current size.
Add predicates to update and retrieve the table.
Handle merging of threadscope string tables in global data by allowing the
references to the strings to be remapped.
Refactored remapping code so that a caller such as proc_gen only needs to
call one remapping predicate after merging global data..
compiler/code_info.m:
Add a table of strings for use with threadscope to the code_info_persistent
type.
Modify the code_info_init to initialise the threadscope string table fields.
Add a predicate to get the string table and another to update it.
compiler/proc_gen.m:
Build the containing goal map before code generation for procedures with
parallel conjunctions in a parallel grade. par_conj_gen.m depends on this.
Conform to changes in code_info.m and global_data.m
compiler/llds_out_instr.m:
Write out the extra parameter in the init_sync_term instruction.
compiler/dupelim.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/llds_to_x86_64.m:
compiler/mercury_compile_llds_back_end.m:
compiler/middle_rec.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/peephole.m:
compiler/reassign.m:
compiler/use_local_vars.m:
Conform to changes in llds.m
compiler/opt_debug.m:
Conform to changes in layout.m
compiler/mercury_compile_llds_back_end.m:
Fix some trailing whitespace.
util/mkinit.c:
Build an initialisation function that registers all the strings in
threadscope string tables.
Correct the layout of a comment.
|
||
|
|
edc230406e |
Fix a number of errors and warnings in the runtime picked up by GCC 4.x in
parallel and threadscope grades.
We had been using types with the wrong signedness well calling atomic operations.
GCC 4.x also picked up an error where #elif was used instead of #else.
While testing these changes on a 32bit system more bugs where found on the i386
architecture and on AMD brand processors.
runtime/mercury_atomic_ops.h:
runtime/mercury_atomic_ops.c:
Add unsigned variants of the following atomic operations:
increment,
add,
add_and_fetch,
dec_and_is_zero,
Add a signed variant for compare and swap.
Rename the MR_atomic_dec_<type>_and_is_zero operation to move the type to
the end of the name.
Use volatile storage in the MR_Stats structure.
A 32bit machine cannot do atomic operations on 64bit values and MR_Stats
must use 64bit values. Therefore 64bit values in the MR_Stats structure
are now protected by a lock on 32bit machines.
runtime/mercury_atomic_ops.h:
Fix a typeo in the i386 version of MR_atomic_dec_and_is_zero_uint().
runtime/mercury_atomic_ops.c:
AMD CPUs do not conform to Intel's specification for being able to
extract the CPU clock speed from the brand string. When we cannot
determine the CPU's clock speed then we write out threadscope
timestamps in raw clock cycles rather than nanoseconds.
On i386 machines the ebx register is used to implement PIC code,
however the CPUID instruction uses it to output information. Save
this register on C's stack while we issue CPUID and retrieve the
result in ebx.
We now pass native machine sized values to the inline assembler code
that implements RDTSC and RDTSCP.
Fix commenting style in some places.
runtime/mercury_atomic_ops.c:
Fix some incorrect C preprocessor code for conditional compilation.
runtime/mercury_grade.h:
Increment binary compatibility number. This should have been done in a
prior change when the MR_runnext macro changed which broke binary
compatibility in the parallel low-level C grades.
runtime/mercury_context.h:
In MR_SyncTerm_Struct use an unsigned value for the number of conjuncts
remaining before the conjunction is complete.
runtime/mercury_threadscope.c:
Record raw cpu clock ticks rather than milliseconds when we don't
know the processor's clock speed.
runtime/mercury_context.c:
runtime/mercury_wsdeque.h:
runtime/mercury_wsdeque.c:
Conform to changes in mercury_atomic_ops.h
|
||
|
|
fa80b9a01a |
Make the parallel conjunction execution mechanism more efficient.
Branches: main Make the parallel conjunction execution mechanism more efficient. 1. Don't allocate sync terms on the heap. Sync terms are now allocated in the stack frame of the procedure call which originates a parallel conjunction. 2. Don't allocate individual sparks on the heap. Sparks are now stored in preallocated, growing arrays using an algorithm that doesn't use locks. 3. Don't have one mutex per sync term. Just use one mutex to protect concurrent accesses to all sync terms (it's is rarely needed anyway). This makes sync terms smaller and saves initialising a mutex for each parallel conjunction encountered. 4. We don't bother to acquire the global sync term lock if we know a parallel conjunction couldn't be executing in parallel. In a highly parallel program, the majority of parallel conjunctions will be executed sequentially so protecting the sync terms from concurrent accesses is unnecessary. par_fib(39) is ~8.4 times faster (user time) on my laptop (Linux 2.6, x86_64), which is ~3.5 as slow as sequential execution. configure.in: Update the configuration for a changed MR_SyncTerm structure. compiler/llds.m: Make the fork instruction take a second argument, which is the base stack slot of the sync term. Rename it to fork_new_child to match the macro name in the runtime. compiler/par_conj_gen.m: Change the generated code for parallel conjunctions to allocate sync terms on the stack and to pass the sync term to fork_new_child. compiler/dupelim.m: compiler/dupproc.m: compiler/exprn_aux.m: compiler/global_data.m: compiler/jumpopt.m: compiler/livemap.m: compiler/llds_out.m: compiler/llds_to_x86_64.m: compiler/middle_rec.m: compiler/opt_debug.m: compiler/opt_util.m: compiler/reassign.m: compiler/use_local_vars.m: Conform to the change in the fork instruction. compiler/liveness.m: compiler/proc_gen.m: Disable use of the parallel conjunction operator in the compiler as older versions of the compiler will generate code incompatible with the new runtime. runtime/mercury_context.c: runtime/mercury_context.h: Remove the next pointer field from MR_Spark as it's no longer needed. Remove the mutex from MR_SyncTerm. Add a field to record if a spark belonging to the sync term was scheduled globally, i.e. if the parallel conjunction might be executed in parallel. Define MR_SparkDeque and MR_SparkArray. Use MR_SparkDeques to hold per-context sparks and global sparks. Change the abstract machine instructions MR_init_sync_term, MR_fork_new_child, MR_join_and_continue as per the main change log. Use a preprocessor macro MR_LL_PARALLEL_CONJ as a shorthand for !MR_HIGHLEVEL_CODE && MR_THREAD_SAFE. Take the opportunity to clean things up a bit. runtime/mercury_wsdeque.c: runtime/mercury_wsdeque.h: New files containing an implementation of work-stealing deques. We don't do work stealing yet but we use the underlying data structure. runtime/mercury_atomic.c: runtime/mercury_atomic.h: New files to contain atomic operations. Currently it just contains compare-and-swap for gcc/x86_64, gcc/x86 and gcc-4.1. runtime/Mmakefile: Add the new files. runtime/mercury_engine.h: runtime/mercury_mm_own_stacks.c: runtime/mercury_wrapper.c: Conform to runtime changes. runtime/mercury_conf_param.h: Update an outdated comment. |