Branches: main, 11.07
Optimise some UTF-8 routines in C grades and fix a few bugs.
library/string.m:
Avoid function calls in unsafe_index, unsafe_index_next, and
unsafe_prev_index in the ASCII case.
Handle illegal code unit at start of string in first_char(in, uo, in)
and first_char(in, uo, uo) modes.
runtime/mercury_string.c:
runtime/mercury_string.h:
Fix a bug where MR_utf8_next would not advance from pos 0. Fortunately
MR_utf8_next is only rarely called, to skip past illegal code units.
Delete redundant initial test in MR_utf8_prev.
Add MR_utf8_get_mb to extract multibyte code points only.
Unroll a loop.
Add MR_utf8_get_next_mb to extract multibyte code points only.
Make MR_utf8_prev_get avoid an extra function call in the ASCII case.
Use MR_Integer consistently for string offsets instead of int.
Branches: main
Implement a new form of memory profiling, which tells the user what memory
is being retained during a program run. This is done by allocating an extra
word before each cell, which is used to "attribute" the cell to an
allocation site. The attribution, or "allocation id", is an address to an
MR_AllocSiteInfo structure generated by the Mercury compiler, giving the
procedure, filename and line number of the allocation, and the type
constructor and arity of the cell that it allocates.
The user must manually instrument the program with calls to
`benchmarking.report_memory_attribution', which forces a GC and summarises
the live objects on the heap using the attributions. The mprof tool is
extended with a new mode to parse and present that data.
Objects which are unattributed (e.g. by hand-written C code which hasn't
been updated) are still accounted for, but show up in profiles as "unknown".
Currently this profiling mode only works in conjunction with the Boehm
garbage collector, though in principle it can work with any memory allocator
for which we can access a list of the live objects. Since term size
profiling relies on the same technique of using an extra word per memory
cell, the two profiling modes are incompatible.
The output from `mprof -s' looks like this:
------ [1] some label ------
cells words cumul procedure / type (location)
14150 38872 total
* 1949/ 13.8% 4872/ 12.5% 12.5% <predicate `parser.parse_rest/7' mode 0>
975/ 6.9% 1950/ 5.0% list.list/1 (parser.m:502)
487/ 3.4% 1948/ 5.0% term.term/1 (parser.m:501)
487/ 3.4% 974/ 2.5% term.const/0 (parser.m:501)
* 1424/ 10.1% 4272/ 11.0% 23.5% <predicate `parser.parse_simple_term_2/6' mode 0>
708/ 5.0% 2832/ 7.3% term.term/1 (parser.m:643)
708/ 5.0% 1416/ 3.6% term.const/0 (parser.m:643)
...
boehm_gc/alloc.c:
boehm_gc/include/gc.h:
boehm_gc/misc.c:
boehm_gc/reclaim.c:
Add a callback function to be called for every live object after a GC.
Add a function to write out the GC_size_map array.
compiler/layout.m:
Define the alloc_site_info type which is equivalent to the
MR_AllocSiteInfo C structure.
Add alloc_site_array as a kind of "layout" array.
compiler/llds.m:
Add allocation sites to `cfile' structure.
Replace TypeMsg argument (which was also for profiling) on `incr_hp'
instructions by an allocation site identifier.
Add a new foreign_proc_component for allocation site ids.
compiler/code_info.m:
compiler/global_data.m:
compiler/proc_gen.m:
Keep the set of allocation sites in the code_info and global_data
structures.
compiler/unify_gen.m:
Add allocation sites to LLDS allocation instructions.
compiler/layout_out.m:
compiler/llds_out_file.m:
compiler/llds_out_instr.m:
Output MR_AllocSiteInfo arrays in generated C files.
Output code to register the MR_AllocSiteInfo array with the Mercury
runtime.
Output allocation site ids for memory allocation instructions.
compiler/llds_out_util.m:
Add allocation sites to llds_out_info.
compiler/pragma_c_gen.m:
compiler/ml_foreign_proc_gen.m:
Generate a macro MR_ALLOC_ID which resolves to an allocation site
structure, for every foreign_proc whose C code contains the string
"MR_ALLOC_ID". This is to be used by hand-written C code which
allocates memory.
MR_PROC_LABELs are retained for backwards compatibility. Though
they were introduced for profiling, they seem to have been co-opted
for printf-debugging since then.
compiler/ml_global_data.m:
Add allocation site structures to the MLDS global data.
compiler/mlds.m:
compiler/ml_unify_gen.m:
Add allocation site id to `new_object' instruction.
compiler/mlds_to_c.m:
Output allocation site arrays and allocation ids in high-level C code.
Output a call to register the allocation site array with the Mercury
runtime.
Delete an unused predicate.
compiler/exprn_aux.m:
compiler/jumpopt.m:
compiler/livemap.m:
compiler/mercury_compile_llds_back_end.m:
compiler/middle_rec.m:
compiler/ml_accurate_gc.m:
compiler/ml_elim_nested.m:
compiler/ml_optimize.m:
compiler/ml_util.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_gcc.m:
compiler/mlds_to_il.m:
compiler/mlds_to_java.m:
compiler/mlds_to_managed.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/use_local_vars.m:
compiler/var_locn.m:
Conform to changes.
compiler/pickle.m:
compiler/prog_event.m:
compiler/timestamp.m:
Conform to changes in memory allocation macros.
library/benchmarking.m:
Add the `report_memory_attribution' instrumentation predicates.
Conform to changes to MR_memprof_record.
library/array.m:
library/bit_buffer.m:
library/bitmap.m:
library/construct.m:
library/deconstruct.m:
library/dir.m:
library/io.m:
library/mutvar.m:
library/store.m:
library/string.m:
library/thread.semaphore.m:
library/version_array.m:
Use attributed memory allocation throughout the standard library so
that objects don't show up in the memory profile as "unknown".
Replace MR_PROC_LABEL by MR_ALLOC_ID.
mdbcomp/program_representation.m:
mdbcomp/rtti_access.m:
Replace MR_PROC_LABEL by MR_ALLOC_ID.
profiler/Mercury.options:
profiler/globals.m:
profiler/mercury_profile.m:
profiler/options.m:
profiler/output.m:
profiler/snapshots.m:
Add a new mode to `mprof' to parse and present the data from
`Prof.Snapshots' files.
Add options for the new profiling mode.
profiler/process_file.m:
Fix a typo.
runtime/mercury_conf_param.h:
#define MR_MPROF_PROFILE_MEMORY_ATTRIBUTION if memory profiling
is enabled and we are using Boehm GC.
runtime/mercury.h:
Make MR_new_object take an allocation id argument.
Conform to changes in memory allocation macros.
runtime/mercury_memory.c:
runtime/mercury_memory.h:
runtime/mercury_types.h:
Define MR_AllocSiteInfo.
Add memory allocation functions and macros which take into the
account the additional word necessary for the new profiling mode.
These should be used in preferences to the raw memory allocation
functions wherever possible so that objects do not show up in the
profile as "unknown".
Add analogues of realloc/free which take into account the offset
introduced by the attribution word.
Add function versions of the MR_new_object macros, which can't be
written in standard C. They are only used when necessary.
Add built-in allocation site ids, to be used in the runtime and
other hand-written code when context-specific ids are unavailable.
runtime/mercury_heap.h:
Make MR_tag_offset_incr_hp_msg and MR_tag_offset_incr_hp_atomic_msg
allocate an extra word when memory attribution is desired, and store
the allocation id there.
Similarly for MR_create{1,2,3}_msg.
Replace proclabel arguments in allocation macros by alloc_id
arguments.
Replace MR_hp_alloc_atomic by MR_hp_alloc_atomic_msg. It was only
used for boxing floats.
Conform to change to MR_new_object macro.
runtime/mercury_bootstrap.h:
Delete obsolete macro hp_alloc_atomic.
runtime/mercury_heap_profile.c:
runtime/mercury_heap_profile.h:
Add the code to summarise the live objects on the Boehm GC heap and
writes out the data to `Prof.Snapshots', for display by mprof.
Don't store the procedure name in MR_memprof_record: the procedure
address is enough and faster to compare.
runtime/mercury_prof.c:
Finish and close the `Prof.Snapshots' file when the program
terminates.
Conform to changes in MR_memprof_record.
runtime/mercury_misc.h:
Add a macro to expand to the name of the allocation sites array
in LLDS grades.
runtime/mercury_bitmap.c:
runtime/mercury_bitmap.h:
Pass allocation id through bitmap allocation functions.
Delete unused function MR_string_to_bitmap.
runtime/mercury_string.h:
Add MR_make_aligned_string_copy_msg.
Make string allocation macros take allocation id arguments.
runtime/mercury.c:
runtime/mercury_array_macros.h:
runtime/mercury_context.c:
runtime/mercury_deconstruct.c:
runtime/mercury_deconstruct_macros.h:
runtime/mercury_dlist.c:
runtime/mercury_engine.c:
runtime/mercury_float.h:
runtime/mercury_hash_table.c:
runtime/mercury_ho_call.c:
runtime/mercury_label.c:
runtime/mercury_prof_mem.c:
runtime/mercury_stacks.c:
runtime/mercury_stm.c:
runtime/mercury_string.c:
runtime/mercury_thread.c:
runtime/mercury_trace_base.c:
runtime/mercury_trail.c:
runtime/mercury_type_desc.c:
runtime/mercury_type_info.c:
runtime/mercury_wsdeque.c:
Use attributed memory allocation throughout the runtime so that
objects don't show up in the profile as "unknown".
runtime/mercury_memory_zones.c:
Attribute memory zones to the Mercury runtime.
runtime/mercury_tabling.c:
runtime/mercury_tabling.h:
Use attributed memory allocation macros for tabling structures.
Delete unused MR_table_realloc_* and MR_table_copy_bytes macros.
runtime/mercury_deep_copy_body.h:
Try to retain the original attribution word when copying values.
runtime/mercury_ml_expand_body.h:
Conform to changes in memory allocation macros.
runtime/mercury_tags.h:
Replace proclabel arguments by alloc_id arguments in allocation macros.
runtime/mercury_wrapper.c:
If memory attribution is enabled, tell Boehm GC that pointers may be
displaced by an extra word.
trace/mercury_trace.c:
trace/mercury_trace_tables.c:
Conform to changes in memory allocation macros.
extras/net/tcp.m:
extras/solver_types/library/any_array.m:
extras/trailed_update/tr_array.m:
Conform to changes in memory allocation macros.
doc/user_guide.texi:
Document the new profiling mode.
doc/reference_manual.texi:
Update a commented out example.
Branches: main
Improve Unicode support.
Declare that we use the Unicode character set, and UTF-8 or UTF-16 for the
internal string representation (depending on the backend). User code may be
written to those assumptions. Other external encodings can be supported in
the future by translating to/from Unicode internally.
The `char' type now represents a Unicode code point.
NOTE: questions about how to handle unpaired surrogate code points, etc. have
been left for later.
library/char.m:
Define a `char' to be a Unicode code point and extend ranges
appropriately.
Add predicates: to_utf8, to_utf16, is_surrogate, is_noncharacter.
Update some documentation.
library/io.m:
Declare I/O predicates on text streams to read/write code points, not
ambiguous "characters". Text files are expected to use UTF-8 encoding.
Supporting other encodings is for future work.
Update the C and Erlang implementations to understand UTF-8 encoding.
Update Java and C# implementations to read/write code points (Mercury
char) instead of UTF-16 code units.
Add `may_not_duplicate' attributes to some foreign_procs.
Improve Erlang implementations of seeking and getting the stream size.
library/string.m:
Declare the string representations, as described earlier.
Distinguish between code units and code points everywhere.
Existing functions and predicates which take offset and length
arguments continue to take them in terms of code units.
Add procedures: count_code_units, count_codepoints, codepoint_offset,
to_code_unit_list, from_code_unit_list, index_next, unsafe_index_next,
unsafe_prev_index, unsafe_index_code_unit, split_by_codepoint,
left_by_codepoint, right_by_codepoint, substring_by_codepoint.
Make index, index_det call error/1 if an illegal sequence is detected,
as they already do for invalid offsets.
Clarify that is_all_alpha, is_all_alnum_or_underscore,
is_alnum_or_underscore only succeed for the ASCII characters under each
of those categories.
Clarify that whitespace stripping functions only strip whitespace
characters in the ASCII range.
Add comments about the future treatment of surrogate code points
(not yet implemented).
Use Mercury format implementation when necessary instead of `sprintf'.
The %c specifier does not work for code points which require multi-byte
representation. The field width modifier for %s only works if the
string contains only single-byte code points.
library/lexer.m:
Conform to string encoding changes.
Simplify code dealing with \uNNNN escapes now that encoding/decoding
is handled by the string module.
library/term_io.m:
Allow code points above 126 directly in Mercury source.
NOTE: \x and \o codes are treated as code points by this change.
runtime/mercury_types.h:
Redefine `MR_Char' to be `int' to hold a Unicode code point.
`MR_String' has to be defined as a pointer to `char' instead of a
pointer to `MR_Char'. Some C foreign code will be affected by this
change.
runtime/mercury_string.c:
runtime/mercury_string.h:
Add UTF-8 helper routines and macros.
Make hash routines conform to type changes.
compiler/c_util.m:
Fix output_quoted_string_lang so that it correctly outputs non-ASCII
characters for each of the target languages.
Fix quote_char for non-ASCII characters.
compiler/elds_to_erlang.m:
Write out code points above 126 normally instead of using escape
syntax.
Conform to string encoding changes.
compiler/mlds_to_cs.m:
Change Mercury `char' to be represented by C# `int'.
compiler/mlds_to_java.m:
Change Mercury `char' to be represented by Java `int'.
doc/reference_manual.texi:
Uncomment description of \u and \U escapes in string literals.
Update description of C# and Java representations for Mercury `char'
which are now `int'.
tests/debugger/tailrec1.m:
Conform to renaming.
tests/general/string_replace.exp:
tests/general/string_replace.m:
Test non-ASCII characters to string.replace.
tests/general/string_test.exp:
tests/general/string_test.m:
Test non-ASCII characters to string.duplicate_char,
string.pad_right, string.pad_left, string.format_table.
tests/hard_coded/char_unicode.exp:
tests/hard_coded/char_unicode.m:
Add test for new procedures in `char' module.
tests/hard_coded/contains_char_2.m:
Test non-ASCII characters to string.contains_char.
tests/hard_coded/nonascii.exp:
tests/hard_coded/nonascii.m:
tests/hard_coded/nonascii_gen.c:
Add code points above 255 to this test case.
Change test data encoding to UTF-8.
tests/hard_coded/string_class.exp:
tests/hard_coded/string_class.m:
Add test case for string.is_alpha, etc.
tests/hard_coded/string_codepoint.exp:
tests/hard_coded/string_codepoint.exp2:
tests/hard_coded/string_codepoint.m:
Add test case for new string procedures dealing with code points.
tests/hard_coded/string_first_char.exp:
tests/hard_coded/string_first_char.m:
Add test case for all modes of string.first_char.
tests/hard_coded/string_hash.m:
Don't use buggy random.random/5 predicate which can overflow on
a large range (such as the range of code points).
tests/hard_coded/string_presuffix.exp:
tests/hard_coded/string_presuffix.m:
Add test case for string.prefix, string.suffix, etc.
tests/hard_coded/string_set_char.m:
Test non-ASCII characters to string.set_char.
tests/hard_coded/string_strip.exp:
tests/hard_coded/string_strip.m:
Test non-ASCII characters to string stripping procedures.
tests/hard_coded/string_sub_string_search.m:
Test non-ASCII characters to string.sub_string_search.
tests/hard_coded/unicode_test.exp:
Update expected output due to change of behaviour of
`string.to_char_list'.
tests/hard_coded/unicode_test.m:
Test non-ASCII character in separator string argument to
string.join_list.
tests/hard_coded/utf8_io.exp:
tests/hard_coded/utf8_io.m:
Add tests for UTF-8 I/O.
tests/hard_coded/words_separator.exp:
tests/hard_coded/words_separator.m:
Add test case for `string.words_separator'.
tests/hard_coded/Mmakefile:
Add new test cases.
Make special_char test case run on all backends.
tests/hard_coded/special_char.exp:
tests/valid/mercury_java_parser_follow_code_bug.m:
Reencode these files in UTF-8.
NEWS:
Add a news entry.
Branches: main, 11.01
runtime/mercury_string.c:
runtime/mercury_string.h:
Add missing non-macro definitions of MR_hash_string2, MR_hash_string3.
These are required for non-gcc compilers.
Estimated hours taken: 0.5
Branches: main
runtime/*.c:
Convert all remaining C source files to four-space indentation.
Fix some deviations from our style guide.
Estimated hours taken: 0.5
Branches: main, release
runtime/mercury_string.h:
runtime/mercury_string.c:
Fix bug which caused the results of MR_hash_string() and
string__hash to differ when `sizeof(int) != sizeof(MR_Integer)'.
runtime/mercury_misc.c:
MR_hash_string() was defined here for non-GCC compilers
for historical reasons. Move it to mercury_string.c.
Estimated hours taken: 4
Branches: main
Add MR_ prefixes to the remaining non-prefixed symbols.
This change will require all workspaces to be updated
The compiler will start generating references to MR_TRUE,
MR_bool, etc., which are not defined in the old runtime
header files.
runtime/mercury_std.h:
Add MR_ prefixes to bool, TRUE, FALSE, max, min,
streq, strdiff, strtest, strntest, strneq, strndiff,
strntest, NO_RETURN.
Delete a commented out definition of `reg'.
runtime/mercury_tags.h:
Add an MR_ prefix to TAGBITS.
configure.in:
runtime/mercury_goto.h:
runtime/machdeps/i386_regs.h/mercury_goto.h:
Add an MR_ prefix to PIC.
runtime/mercury_conf_param.h:
Allow non-prefixed PIC and HIGHTAGS to be defined on
the command line.
runtime/mercury_bootstrap.h:
Add backwards compatibility definitions.
RESERVED_MACRO_NAMES:
Remove the renamed macros.
compiler/export.m:
compiler/ml_code_gen.m:
Use MR_bool rather than MR_Bool (MR_Bool is
meant to be for references to the Mercury type
bool__bool).
runtime/mercury_types.h:
Add a comment the MR_Bool is for references to
bool__bool.
*/*.c:
*/*.h:
*/*.m:
Add MR_ prefixes.
Estimated hours taken: 2.5
Branches: main
Add MR_ prefixes to uses of configuration macros.
Bootcheck now succeeds with MR_NO_CONF_BACKWARDS_COMPAT.
Mmake.common.in:
Define MR_NO_CONF_BACKWARDS_COMPAT when checking
for namespace cleanliness.
RESERVED_MACRO_NAMES:
Remove the configuration macros.
runtime/mercury_conf_bootstrap.h:
Remove a duplicate definition of BOXED_FLOAT.
configure.in:
*/*.c:
*/*.h:
*/*.m:
Add MR_ prefixes.
Estimated hours taken: 1
runtime/mercury_string.c:
Fix a bug where MR_make_string introduced a space leak under windows
when the created string was greater then the fixed buffer size as
the conditional compilation wasn't including the call to MR_free.
Estimated hours taken: 10
Make everything in the runtime use MR_ prefixes, and make the compiler
bootstrap with -DMR_NO_BACKWARDS_COMPAT.
runtime/mercury_*.[ch]
Add MR_ prefixes to all functions, global variables and almost all
macros that could pollute the namespace. The (intentional) exceptions
are
1. some function, variable, type and label names that already start
with MR_, mercury_, Mercury or _entry;
2. some standard C macros in mercury_std.h;
3. the macros used in autoconfiguration (since they are used in scripts
as well as the runtime, the MR_ prefix may not be appropriate for
those).
In some cases, I deleted things instead of adding prefixes
if the "things" were obsolete and not user visible.
runtime/mercury_bootstrap.h:
Provide MR_-less forms of the macros for bootstrapping and for
backward compatibility for user code.
runtime/mercury_debug.[ch]:
Add a FILE * parameter to a function that needs it.
compiler/code_info.m:
compiler/export.m:
compiler/fact_table.m:
compiler/llds.m:
compiler/llds_out.m:
compiler/pragma_c_gen.m:
compiler/trace.m:
Add MR_ prefixes to the C code generated by the compiler.
library/*.m:
Add MR_ prefixes to handwritten code.
trace/mercury_trace_*.c:
util/mkinit.c:
Add MR_ prefixes as necessary.
extras/concurrency/semaphore.m:
Add MR_ prefixes as necessary.
Estimated hours taken: 1.5
Fix a bug in petdr's recent changes to string__format that
broke things in non-gc grades on sparcs.
runtime/mercury_string.c:
In MR_make_string(), wrap calls to restore/save_transient_hp()
around the call to MR_make_aligned_string_msg(), as mentioned
in the documentation for MR_make_aligned_string_msg().
runtime/mercury_string.h:
Document that calls to MR_make_string need to be wrapped inside
calls to save/restore_transient_hp().
library/string.m:
Wrap calls to MR_make_string inside calls to
save/restore_transient_hp().
Estimated hours taken: 8
runtime/mercury_string.c:
Fix two bugs:
- the variable size should hold the size of the fixed array initially
- when resizing the array record the pointer to the new resized array.
Estimated hours taken: 0.5
runtime/mercury_string.c:
Modify MR_make_string so that it first tries printing into a
fixed size buffer. Only if that buffer is not big enough do we
allocate a buffer on the heap.
Estimated hours taken: 0.5
Check for the availability of _vsnprintf.
configure.in:
Check for the availability of _vsnprintf.
runtime/mercury_conf.h.in:
Define HAVE__VSNPRINTF.
runtime/mercury_string.c:
If vsnprintf isn't available and _vsnprintf is, use _vsnprintf
in place of vsnprintf.
Estimated hours taken: 24
Reimplement string__format so that it uses less memory and it's
behaviour is closer to the C standard.
library/string.m:
Reimplement string__format. The new implementation parses the
format string building a control structure. The control structure
is then traversed to generate the output.
Implement string__to_char_list in C.
Implement string__append_list in C as this results in no garbage
being generated due to the creation of the intermediate strings,
which can be quite a significant saving.
Remove the string__append_list(out, in) mode as it generates an
infinite number of solutions.
Use MR_allocate_aligned_string_msg for all memory allocations.
runtime/mercury_string.h:
Add a new macro MR_allocate_aligned_string_msg which allocates
word aligned space for storage of a string in.
runtime/mercury_string.c:
Define a new function MR_make_string which provides sprintf like
functionality for creating MR_Strings. This function is safe from
buffer overflows providing the vsnprintf function is available.
runtime/Mmakefile:
Add mercury_string.c
configure.in:
Check for the vsnprintf function.
runtime/mercury_conf.h.in:
Define HAVE_VSNPRINTF.