library/array.m:
library/char.m:
library/float.m:
library/int.m:
library/int16.m:
library/int32.m:
library/int64.m:
library/int8.m:
library/list.m:
library/one_or_more.m:
library/string.m:
library/tree234.m:
library/uint.m:
library/uint16.m:
library/uint32.m:
library/uint64.m:
library/uint8.m:
library/version_array.m:
Mark the X_to_doc function in each of these modules as obsolete,
and make it a forwarding function to the actual implementation
in pretty_printer.m. The intention is that when these forwarding
functions are eventually removed, this will also remove the dependency
of these modules on pretty_printer.m. This should help at least some
of these modules escape the giant SCC in the library's dependency graph.
(It does not make sense that a library module that adds code to increment
an int thereby becomes dependent on pretty_printer.m through int.m.)
library/pretty_printer.m:
Move all the X_to_doc functions from the above modules here.
Fix the one_or_more_to_doc function, which was
- missing the comma between the two arguments of the one_or_more
function symbol, and
- would print "..., ...]" instead of just "...]" at the end of the
tail list when that list exceeded the limits of the specified pp_params.
Rename one of the moved types along with its function symbols,
to reduce ambiguity.
Put arrays before their indexes in the argument lists of some of
the moved functions.
Some of the moved X_to_doc functions for compound types returned
a doc that had an indent wrapper. These indents differed between the
various X_to_doc functions without any visible reason, but they are
also redundant. The callers can trivially add such wrappers if they
want to, but taking them off, if they want them off, is harder.
Eliminate the problem by deleting all such indent wrappers.
Add formatters for the intN, uintN and one_or_more types to the
default formatter map. Their previous absence was an oversight.
Add a function, get_formatter_map_entry_types, that returns the ids
of the types in the formatter_map given to the function. It is intended
for tests/hard_coded/test_pretty_printer_defaults.m, but is exported
for anyone to use.
tests/hard_coded/test_pretty_printer_defaults.{m,exp}:
Use get_formatter_map_entry_types to print the default formatter map
in a format that is much more easily readable.
NEWS:
Announce all the user-visible changes above.
The recent change to sparse_bitsets broke the lex library in extras.
Specifically, we now now need to make characters an instance of the
uenum typeclass. This diff does so.
library/char.m:
Add predicates and functions for converting between unsigned integers
and characters.
Make characters an instance of the uenum typeclass.
tests/hard_coded/Mmakefile:
tests/hard_coded/char_uint_conv.{m,exp,exp2}:
Add a test of the above conversions.
NEWS:
Announce the additions.
extras/lex/lex.m:
Conform to recent changes.
library/char.m:
Add to_utf8_uint8/2 and to_utf16_uint16/2. These do the
same thing as the existing to_utf8/2 and to_utf16/2 predicates,
but return the code units as uint8s or uint16s respectively.
Update the comment at the head of the module about which version
of the Unicode standard this module conforms to.
(For the purposes of this module, nothing has changed between
versions 10 and 13.)
NEWS:
Announce the additions.
library/string.m:
Test whether base string is between 2 and 36 just once.
library/char.m:
Provide a predicate to convert a character to a base digit
that does not check whether the base is between 2 and 36,
leaving that to the caller.
Make the error message for bases not in that range more informative,
in the predicates that *do* check the base.
NEWS:
Announce unsafe_base_digit_to_int.
library/*.m:
Delete Erlang foreign code and foreign types.
Delete documentation specific to Erlang targets.
library/deconstruct.m:
Add pragma no_determinism_warning to allow functor_number_cc/3
to compile for now.
library/Mercury.options:
Delete workaround only needed when targetting Erlang.
browser/listing.m:
mdbcomp/rtti_access.m:
Delete Erlang foreign code and foreign types.
library/*.m:
Specifically, delete any predicates and functions whose `pragma obsolete'
dates from 2018 or before. Keep the ones that were obsoleted
only this year or last year.
NEWS:
Announce the changes.
tests/debugger/io_tab_goto.m:
tests/debugger/tabled_read.m:
tests/declarative_debugger/io_stream_test.m:
tests/declarative_debugger/tabled_read_decl.m:
tests/declarative_debugger/tabled_read_decl_goto.m:
tests/general/array_test.m:
tests/hard_coded/mutable_init_impure.m:
tests/hard_coded/remove_file.m:
tests/tabling/mercury_java_parser_dead_proc_elim_bug.m:
tests/tabling/mercury_java_parser_dead_proc_elim_bug2.m:
tests/valid/mercury_java_parser_follow_code_bug.m:
Replace references to predicates and functions that this diff deletes
with their suggested replacements.
In several test cases, bring the programming style up to date.
tests/hard_coded/shift_test.{m,exp}:
Most of this test case tested the now-deleted legacy shift operations.
Replace these with tests of their non-legacy versions, including
testing for the expected exceptions.
tests/hard_coded/shift_test.{m,exp}:
Don't pass --no-warn-obsolete when compiling shift_test.m anymore.
Currently, the hash functions for (some of) the primitive types are defined in
three places:
1. The hash_table module.
2. The version_hash_table module (duplicates of the above).
3. Some (but not all) of the library modules for the primitive types.
This change makes the library module for a primitive type provide the hash
function for that type and deprecates the versions in the hash table modules.
Additionally, deprecate the "generic" has functions in the hash table modules.
library/hash_table.m:
library/version_hash_table.m:
As above.
library/char.m:
library/int.m:
library/uint.m:
Add hash/1 and hash/2.
library/float.m:
Add hash/2.
library/robdd.m:
Replace a call to the deprecated function.
NEWS:
Announce the above additions and deprecations.
tests/hard_coded/hash_table_delete.m:
tests/hard_coded/hash_table_test.m:
tests/hard_coded/version_hash_table_delete.m:
tests/hard_coded/version_hash_table_test.m:
Conform to the above change.
Also, throw software_error/1 exceptions rather than directly throwing strings
in a few spots.
Undo special Ralph-style formatting.
library/*.:
As above.
tests/hard_coded/array2d_from_array.exp:
tests/hard_coded/array2d.exp:
tests/hard_coded/test_injection.exp:
Update to conform with the above change.
browser/collect_lib.m:
browser/declarative_execution.m:
browser/dl.m:
browser/io_action.m:
compiler/make.util.m:
compiler/pickle.m:
compiler/process_util.m:
compiler/prog_event.m:
library/array.m:
library/benchmarking.m:
library/bit_buffer.m:
library/builtin.m:
library/char.m:
library/deconstruct.m:
library/dir.m:
library/erlang_rtti_implementation.m:
library/int.m:
library/int16.m:
library/int32.m:
library/int64.m:
library/int8.m:
library/io.m:
library/math.m:
library/mutvar.m:
library/private_builtin.m:
library/profiling_builtin.m:
library/rtti_implementation.m:
library/store.m:
library/string.format.m:
library/string.m:
library/table_builtin.m:
library/term_size_prof_builtin.m:
library/thread.m:
library/time.m:
library/type_desc.m:
library/uint16.m:
library/uint32.m:
library/uint64.m:
library/uint8.m:
ssdb/ssdb.m:
As above. This mostly involved two things.
The first was grouping foreign_procs by predicate instead of by language.
In a few cases, this revealed that some predicates *had* no foreign_proc
for a language, while related predicates did have one that just aborted
if called. This diff adds similar aborting foreign_procs to predicate/
language combinations that were missing them, when this seemed obviously
the right thing to do.
The second was moving pragmas about a predicate from the middle of the
block of clauses of that predicate to the start of that block.
Discussion of these changes can be found on the Mercury developers
mailing list archives from June 2018.
COPYING.LIB:
Add a special linking exception to the LGPL.
*:
Update references to COPYING.LIB.
Clean up some minor errors that have accumulated in copyright
messages.
library/assoc_list.m:
library/bag.m:
library/bimap.m:
library/calendar.m:
library/char.m:
library/digraph.m:
library/list.m:
library/map.m:
library/multi_map.m:
library/psqueue.m:
library/rbtree.m:
library/string.m:
library/term.m:
library/tree234.m:
library/type_desc.m:
library/univ.m:
library/varset.m:
Replace most occurrences of "abort" with "throw an exception".
Slightly improve the documentation for map.search, map.lookup,
map.inverse_search.
library/deconstruct.m:
Replace "abort" with "runtime abort" where that is meant.
library/char.m:
Present list of whitespace characters for char.whitespace
in tabular form.
library/io.m:
library/lexer.m:
library/pprint.m:
library/stream.m:
library/string.m:
Mention char.is_whitespace in the documentation of predicates which
use that definition of whitespace characters.
Module qualify some calls to char.is_whitespace for clarity.
library/char.m:
Delete an out-of-date comment: Mercury characters are Unicode
code points, so the set of characters supported and their mapping
to integers is not implementation dependent.
library/char.m:
Add the predicates is_control/1, is_space_separator/1,
is_paragraph_separator/1, is_line_separator/1 and is_private_use/1.
Reword some documentation from "Succeed if ..." to "True iff ...".
This is more consistent with the other documentation in this module.
NEWS:
Announce the additions.
library/bimap.m:
library/bitmap.m:
library/calendar.m:
library/char.m:
library/cord.m:
library/deconstruct.m:
library/diet.m:
library/dir.m:
library/eqvclass.m:
library/map.m:
library/pprint.m:
library/pqueue.m:
library/stream.string_writer.m:
library/term_conversion.m:
library/time.m:
library/type_desc.m:
library/version_array.m:
library/version_array2d.m:
library/version_bitmap.m:
library/version_hash_table.m:
Fix inconsistencies between (a) the order in which functions and predicates
are declared, and (b) the order in which they are defined.
In most of these modules, either the order of the declarations
or the order of the definitions made sense, and I changed the other
to match. In a few modules, neither made sense, so I changed *both*
to an order that *does* make sense (i.e. it has related predicates
together).
In some places, put dividers between groups of related
functions/predicates, to make the groups themselves more visible.
In some places, fix comments or programming style, give some auxiliary
(non-exported) predicates better names, or delete some unneeded module
qualifications.
In some places, have the function form of a procedure forward
the work to the predicate form, instead of vice versa, where this is
more natural (usually because it allows the use of state variables).
However, this is the only kind of "algorithmic" change in this diff;
the rest is just moving code around.
The functions det_int_to_{binary,octal,decimal,hex)_digit/1 were supposed to
have been added at the same time as the corresponding semidet predicates but
weren't: add them.
In the char module, don't call unexpected/3 in cases where the error arises as
a result of bad inputs: throwing an exception in that case *is* expected.
library/char.m:
As above.
NEWS:
Announce the above additions.
samples/e.m:
Avoid a call to an obsolete predicate and update syntax.
For input validation and character encoding transformations
it is essential to check for surrogate characters and their correct
sequence.
NEWS:
Announce the addition of is_(leading|trailing)_surrogate/1.
library/char.m:
Add is_leading_surrogate/1 which succeeds if a character is a
leading surrogate character.
is_trailing_surrogate/1 succeeds if the character is a
trailing surrogate character.
library/string.m:
Avoid unnecessary use of higher order code in the implementation of
string.format. Avoid the unnecessary insertion of empty manifest strings
between actual format conversions.
Represent the optional width and precision in string.format as integers,
not strings, since the strings take up more space and require more time
to create and to process.
Simplify the code implementing word_wrap.
Give some variables better names.
Clarify some documentation.
Call unexpected instead of error in cases where the source of a problem
can only be a software bug inside string.m, and not bad input supplied
from outside of it.
Don't mention the arity of a predicate that is the source of an exception,
since the predicate's identity is unambiguous even without the arity,
and since other exceptions also don't mention arities.
library/require.m:
Add two-argument versions of error/1 and func_error/1, to allow
the name of the predicate and the error message to be passed separately,
possible with the first being specified as $pred. string.m had several
places where these were being appended on the fly.
library/char.m:
Clarify some code.
NEWS:
Mention the additions to require.m.
NOTE: this change does not affect the io module -- I've left that for a
separate change.
library/*.m:
As per the recent change to the coding standard, avoid module
qualification in library interfaces where possible.
Reformat declarations and descriptive comments to better utilise
any space freed up by the above.
(1) The behaviour of digit_to_int/2 was inconsistent with that of is_digit/2.
The former succeeds for all of 0-9, a-z and A-Z while the latter succeeds only
for 0-9 (i.e. it was possible for digit_to_int/2 to succeed for non-decimal
characters, which is not what was intended in many of it uses).
(2) Predicates involving hexadecimal digits were inconsistently named, they
were "hex digits" in one predicate name, "hex chars" in another.
This change ensures that the following operations are supported for binary,
octal, decimal and hexadecimal digits and that we use a consistent naming
scheme for the predicates that implement them:
- testing if a character is a digit of the given base
- conversion to an int
- conversion from an int
In addition, we also add predicates for supporting these operations for user
defined bases, ranging from 2-36.
library/char.m:
Add the predicate is_decimal_digit/1, which is a synonym for is_digit/1.
Add the predicate is_base_digit/2.
Add the predicates int_to_{binary,octal,decimal,hex}_digit/2 and
base_int_to_digit/3.
Add the predicates {binary,octal,decimal,hex}_digit_to_int/2 and
base_digit_to_int/3.
Add det function versions of the above.
Delete the function det_digit_to_int/1 that I added the other day.
Mark the following as obsolete:
- is_hex_digit/2
- int_to_hex_char/2
- int_to_digit/2
- det_int_to_digit/1
- det_int_to_digit/2
Avoid redundant module qualification in the implementation.
Mark some C foreign_procs as not modifying the trail.
Re-order some declarations according to how the coding standard says they
should be ordered.
library/bitmap.m:
library/integer.m:
library/parsing_utils.m:
library/string.m:
compiler/prog_rep_tables.m:
Replace calls to obsolete predicates or functions.
NEWS:
Announce the above changes.
Add note advising users of digit_to_int/2 to check their code for the
problem described above.
tests/hard_coded/Mmakefile:
tests/hard_coded/test_char_digits.m:
tests/hard_coded/test_char_digits.exp:
Add a systematic test for the above predicates.
library/char.m:
Document explicitly the range of code points that char
predicates work with so that there is no ambiguity,
and to give pause to anyone considering to extend them
in future.
Branches: main
library/cord.m:
Add cord.init/0 for consistency with other modules.
library/int.m:
library/char.m:
Group function definitions with the corresponding
predicate definition.
library/version_store.m:
Fix indentation.
NEWS:
Announce the addition of cord.init/0.
Fix some earlier entries.
Branches: main
Improve Unicode support.
Declare that we use the Unicode character set, and UTF-8 or UTF-16 for the
internal string representation (depending on the backend). User code may be
written to those assumptions. Other external encodings can be supported in
the future by translating to/from Unicode internally.
The `char' type now represents a Unicode code point.
NOTE: questions about how to handle unpaired surrogate code points, etc. have
been left for later.
library/char.m:
Define a `char' to be a Unicode code point and extend ranges
appropriately.
Add predicates: to_utf8, to_utf16, is_surrogate, is_noncharacter.
Update some documentation.
library/io.m:
Declare I/O predicates on text streams to read/write code points, not
ambiguous "characters". Text files are expected to use UTF-8 encoding.
Supporting other encodings is for future work.
Update the C and Erlang implementations to understand UTF-8 encoding.
Update Java and C# implementations to read/write code points (Mercury
char) instead of UTF-16 code units.
Add `may_not_duplicate' attributes to some foreign_procs.
Improve Erlang implementations of seeking and getting the stream size.
library/string.m:
Declare the string representations, as described earlier.
Distinguish between code units and code points everywhere.
Existing functions and predicates which take offset and length
arguments continue to take them in terms of code units.
Add procedures: count_code_units, count_codepoints, codepoint_offset,
to_code_unit_list, from_code_unit_list, index_next, unsafe_index_next,
unsafe_prev_index, unsafe_index_code_unit, split_by_codepoint,
left_by_codepoint, right_by_codepoint, substring_by_codepoint.
Make index, index_det call error/1 if an illegal sequence is detected,
as they already do for invalid offsets.
Clarify that is_all_alpha, is_all_alnum_or_underscore,
is_alnum_or_underscore only succeed for the ASCII characters under each
of those categories.
Clarify that whitespace stripping functions only strip whitespace
characters in the ASCII range.
Add comments about the future treatment of surrogate code points
(not yet implemented).
Use Mercury format implementation when necessary instead of `sprintf'.
The %c specifier does not work for code points which require multi-byte
representation. The field width modifier for %s only works if the
string contains only single-byte code points.
library/lexer.m:
Conform to string encoding changes.
Simplify code dealing with \uNNNN escapes now that encoding/decoding
is handled by the string module.
library/term_io.m:
Allow code points above 126 directly in Mercury source.
NOTE: \x and \o codes are treated as code points by this change.
runtime/mercury_types.h:
Redefine `MR_Char' to be `int' to hold a Unicode code point.
`MR_String' has to be defined as a pointer to `char' instead of a
pointer to `MR_Char'. Some C foreign code will be affected by this
change.
runtime/mercury_string.c:
runtime/mercury_string.h:
Add UTF-8 helper routines and macros.
Make hash routines conform to type changes.
compiler/c_util.m:
Fix output_quoted_string_lang so that it correctly outputs non-ASCII
characters for each of the target languages.
Fix quote_char for non-ASCII characters.
compiler/elds_to_erlang.m:
Write out code points above 126 normally instead of using escape
syntax.
Conform to string encoding changes.
compiler/mlds_to_cs.m:
Change Mercury `char' to be represented by C# `int'.
compiler/mlds_to_java.m:
Change Mercury `char' to be represented by Java `int'.
doc/reference_manual.texi:
Uncomment description of \u and \U escapes in string literals.
Update description of C# and Java representations for Mercury `char'
which are now `int'.
tests/debugger/tailrec1.m:
Conform to renaming.
tests/general/string_replace.exp:
tests/general/string_replace.m:
Test non-ASCII characters to string.replace.
tests/general/string_test.exp:
tests/general/string_test.m:
Test non-ASCII characters to string.duplicate_char,
string.pad_right, string.pad_left, string.format_table.
tests/hard_coded/char_unicode.exp:
tests/hard_coded/char_unicode.m:
Add test for new procedures in `char' module.
tests/hard_coded/contains_char_2.m:
Test non-ASCII characters to string.contains_char.
tests/hard_coded/nonascii.exp:
tests/hard_coded/nonascii.m:
tests/hard_coded/nonascii_gen.c:
Add code points above 255 to this test case.
Change test data encoding to UTF-8.
tests/hard_coded/string_class.exp:
tests/hard_coded/string_class.m:
Add test case for string.is_alpha, etc.
tests/hard_coded/string_codepoint.exp:
tests/hard_coded/string_codepoint.exp2:
tests/hard_coded/string_codepoint.m:
Add test case for new string procedures dealing with code points.
tests/hard_coded/string_first_char.exp:
tests/hard_coded/string_first_char.m:
Add test case for all modes of string.first_char.
tests/hard_coded/string_hash.m:
Don't use buggy random.random/5 predicate which can overflow on
a large range (such as the range of code points).
tests/hard_coded/string_presuffix.exp:
tests/hard_coded/string_presuffix.m:
Add test case for string.prefix, string.suffix, etc.
tests/hard_coded/string_set_char.m:
Test non-ASCII characters to string.set_char.
tests/hard_coded/string_strip.exp:
tests/hard_coded/string_strip.m:
Test non-ASCII characters to string stripping procedures.
tests/hard_coded/string_sub_string_search.m:
Test non-ASCII characters to string.sub_string_search.
tests/hard_coded/unicode_test.exp:
Update expected output due to change of behaviour of
`string.to_char_list'.
tests/hard_coded/unicode_test.m:
Test non-ASCII character in separator string argument to
string.join_list.
tests/hard_coded/utf8_io.exp:
tests/hard_coded/utf8_io.m:
Add tests for UTF-8 I/O.
tests/hard_coded/words_separator.exp:
tests/hard_coded/words_separator.m:
Add test case for `string.words_separator'.
tests/hard_coded/Mmakefile:
Add new test cases.
Make special_char test case run on all backends.
tests/hard_coded/special_char.exp:
tests/valid/mercury_java_parser_follow_code_bug.m:
Reencode these files in UTF-8.
NEWS:
Add a news entry.
Branches: main, 10.04
Allow inlining of Java foreign_procs.
This revealed a problem with directly using the `succeeded' flag directly as
the success indicator in Java foreign_procs. When the code of the foreign_proc
becomes a nested function, and after nested functions are eliminated, there may
not be a variable called `succeeded' in that context; it is moved into
environment struct, and the transformation is not able to update handwritten
code to reflect that. The solution is to declare a local variable for the
foreign_proc, let the handwritten code assign that, then assign its final
value to the `succeeded' flag with an MLDS statement.
We take the opportunity to name the local variable `SUCCESS_INDICATOR', in
line with other backends.
compiler/inlining.m:
Allow inlining of Java foreign_procs.
compiler/ml_foreign_proc_gen.m:
In the code generated for semidet Java foreign_procs, declare a local
`SUCCESS_INDICATOR' variable and assign its value to the `succeeded'
flag afterwards.
Add braces to give the foreign_proc variables a limited scope.
compiler/make_hlds_warn.m:
Conform to renaming.
doc/reference_manual.texi:
Update documentation for the renaming of the `succeeded' variable.
library/array.m:
library/bitmap.m:
library/builtin.m:
library/char.m:
library/construct.m:
library/dir.m:
library/exception.m:
library/float.m:
library/int.m:
library/io.m:
library/math.m:
library/private_builtin.m:
library/rtti_implementation.m:
library/string.m:
library/thread.m:
library/time.m:
library/type_desc.m:
library/version_array.m:
Conform to renaming.
Fix problems with Java foreign_procs that may now be copied into other
modules when intermodule optimisation is enabled, some by disallowing
the procedures from being duplicated, some by making referenced
classes/fields `public'.
[Some of the `may_not_duplicate' attributes may not indicate actual
problems, just that it seems not worthwhile inlining calls to the
procedure.]
extras/solver_types/library/any_array.m:
tests/hard_coded/equality_pred_which_requires_boxing.m:
tests/hard_coded/external_unification_pred.m:
tests/hard_coded/java_test.m:
tests/hard_coded/redoip_clobber.m:
tests/hard_coded/user_compare.m:
tests/valid/exported_foreign_type2.m:
tests/warnings/warn_succ_ind.m:
tests/warnings/warn_succ_ind.exp3:
Conform to renaming.
Estimated hours taken: 3
Branches: main
library/lexer.m:
This module has four versions of the get_token predicate, some
of which consume significant time during typical compilations.
This diff factors out some of the code common among them,
and restructures this common code from long sequences
of nested if-then-elses to switches.
This yields an overall speedup of 3.2% on tools/speedtest.
Don't disable vim's wrapmargin functionality.
library/char.m:
Add notes to the predicates which are now linked to code in lexer.m.
Estimated hours taken: 4
Branches: main
Define pretty_printer formatters for some common standard library types.
Include these formatters in the default pretty_printer formatter_map.
Add a useful function to the pretty_printer interface.
library/array.m:
library/char.m:
library/float.m:
library/int.m:
library/list.m:
library/string.m:
library/tree234.m:
Add <type>_to_doc functions.
library/pretty_printer.m:
Added function format_arg/1.
Initialise the default formatter_map to use the <type>_to_doc
functions.
tests/hard_coded/Mmakefile:
tests/hard_coded/test_pretty_printer_defaults.exp:
tests/hard_coded/test_pretty_printer_defaults.m:
Test case.
Estimated hours taken: 16
Branches: main
Implement the TypeCtorInfo RTTI so that generic compare
and unify work via rtti_implementation.
However after discussion with zs, this current design of
RTTI has to be adapted to become more erlang specific, due
to the different data representation on the erlang backend.
The current code will serve as a template for this new design
though, which is why it is being checked in.
Add various erlang library implementations which are
needed to run useful programs when testing the erlang backend.
compiler/elds.m:
Add a type_info_id.
compiler/elds_to_erlang.m:
We now generate type_ctor_info's so call them.
compiler/erl_rtti.m:
After discussions with zs, type_info and pseudo_type_infos
should never occur on the erlang backend as they are needed
for gc and the debugger so throw an exception for them.
Add an implementation of creating a static type_info, but which
isn't used in case we need it again later.
Create type_ctor_info with all the fields except the
TypeFunctors, the TypeLayout and the FunctorNumberMap
compiler/special_pred.m:
Make sure we generate RTTI for the builtin types.
library/builtin.m:
library/char.m:
library/exception.m:
library/float.m:
library/int.m:
library/io.m:
library/lexer.m:
library/math.m:
library/mutvar.m:
library/ops.m:
library/par_builtin.m:
library/private_builtin.m:
library/rtti_implementation.m:
library/solutions.m:
library/store.m:
library/string.m:
library/table_builtin.m:
library/thread.semaphore.m:
library/time.m:
library/type_desc.m:
Erlang implementations of std library functions.
Estimated hours taken: 15
Branches: main
Make all functions which create strings from characters throw an exception
or fail if the list of characters contains a null character.
This removes a potential source of security vulnerabilities where one
part of the program performs checks against the whole of a string passed
in by an attacker (processing the string as a list of characters or using
`unsafe_index' to look past the null character), but then passes the string
to another part of the program or an operating system call that only sees
up to the first null character. Even if Mercury stored the length with
the string, allowing the creation of strings containing nulls would be a
bad idea because it would be too easy to pass a string to foreign code
without checking.
For examples see:
<http://insecure.org/news/P55-07.txt>
<http://www.securiteam.com/securitynews/5WP0B1FKKQ.html>
<http://www.securityfocus.com/archive/1/445788>
<http://www.securityfocus.com/archive/82/368750>
<http://secunia.com/advisories/16420/>
NEWS:
Document the change.
library/string.m:
Throw an exception if null characters are found in
string.from_char_list and string.from_rev_char_list.
Add string.from_char_list_semidet and string.from_rev_char_list_semidet
which fail rather throwing an exception. This doesn't match the
normal naming convention, but string.from_{,rev_}char_list are widely
used, so changing their determinism would be a bit too disruptive.
Don't allocate an unnecessary extra word for each string created by
from_char_list and from_rev_char_list.
Explain that to_upper and to_lower only work on un-accented
Latin letters.
library/lexer.m:
Check for invalid characters when reading Mercury strings and
quoted names.
Improve error messages by skipping to the end of any string
or quoted name containing an error. Previously we just stopped
processing at the error leaving an unmatched quote.
library/io.m:
Make io.read_line_as_string and io.read_file_as_string return
an error code if the input file contains a null character.
Fix an XXX: '\0\' is not recognised as a character constant,
but char.det_from_int can be used to make a null character.
library/char.m:
Explain the workaround for '\0\' not being accepted as a char
constant.
Explain that to_upper and to_lower only work on un-accented
Latin letters.
compiler/layout.m:
compiler/layout_out.m:
compiler/c_util.m:
compiler/stack_layout.m:
compiler/llds.m:
compiler/mlds.m:
compiler/ll_backend.*.m:
compiler/ml_backend.*.m:
Don't pass around strings containing null characters (the string
tables for the debugger). This doesn't cause any problems now,
but won't work with the accurate garbage collector. Use lists
of strings instead, and add the null characters when writing the
strings out.
tests/hard_coded/null_char.{m,exp}:
Change an existing test case to test that creation of a string
containing a null throws an exception.
tests/hard_coded/null_char.exp2:
Deleted because alternative output is no longer needed.
tests/invalid/Mmakefile:
tests/invalid/null_char.m:
tests/invalid/null_char.err_exp:
Test error messages for construction of strings containing null
characters by the lexer.
tests/invalid/unicode{1,2}.err_exp:
Update the expected output after the change to the handling of
invalid quoted names and strings.
Estimated hours taken: 80
Branches: main
Improvements for bitmap.m, to make it useable as a general container
for binary data.
library/bitmap.m:
runtime/mercury_bitmap.c:
runtime/mercury_bitmap.h:
Specialize the representation of bitmaps to an array of unsigned
bytes defined as a foreign type.
This is better than building on top of array(int) because it:
- is better for interfacing with foreign code
- has a more sensible machine-independent comparison order
(same as array(bool))
- avoids storing the size twice
- has more efficient copying, unification, comparison and tabling
(although we should probably specialize the handling of array(int)
and isomorphic types as well)
- uses GC_MALLOC_ATOMIC to avoid problems with bit patterns that look
like pointers (although we should do that for array(int) as well)
XXX The code for the Java and IL backends is untested.
Building the library in grade Java with Sun JDK 1.6 failed (but
at least passed error checking), and I don't have access to a
copy of MSVS.NET. The foreign code that needs to be tested is
trivial.
Add fields `bit', `bits' and `byte' to get/set a single bit,
multiple bits (from an int) or an 8 bit byte.
Add functions for converting bitmaps to hex strings and back,
for use by stream.string_writer.write and deconstruct.functor/4.
bitmap.intersect was buggy in the case where the input bitmaps
had a different size. Given that bitmaps are implemented with
a fixed domain (lookups out of range throw an exception), it
makes more sense to throw an exception in that case anyway,
so all of the set operations do that now.
The difference operation actually performed xor. Fix it and
add an xor function.
library/version_bitmap.m:
This hasn't been fully updated to be the same as bitmap.m.
The payoff would be much less because foreign code can't
really do anything with version_bitmaps.
Add a `bit' field.
Deprecate the `get/2' function in favour of the `bit' field.
Fix the union, difference, intersection and xor functions
as for bitmap.m.
Fix comparison of version_arrays so that it uses the same
method as array.m: compare size then elements in order.
The old code found version_arrays to be equal if one was
a suffix of the other.
library/char.m:
Add predicates for converting between hex digits and integers.
library/io.m:
library/stream.string_writer.m:
library/term.m:
Read and write bitmaps.
runtime/mercury_type_info.h:
runtime/mercury_deep_copy_body.h:
runtime/mercury_mcpp.h:
runtime/mercury_table_type_body.h:
runtime/mercury_tabling_macros.h:
runtime/mercury_unify_compare_body.h:
runtime/mercury_construct.c:
runtime/mercury_deconstruct.c:
runtime/mercury_term_size.c:
runtime/mercury_string.h:
library/construct.m:
library/deconstruct.m
compiler/prog_type.m:
compiler/mlds_to_gcc.m:
compiler/rtti.m:
Add a MR_TypeCtorRep for bitmaps, and handle it in the library
and runtinme.
library/Mercury.options:
Compile bitmap.m with `--no-warn-insts-without-matching-type'.
runtime/mercury_type_info.h:
Bump MR_RTTI_VERSION.
NEWS:
Document the changes.
tests/hard_coded/Mmakefile:
tests/hard_coded/bitmap_test.m:
tests/hard_coded/bitmap_simple.m:
tests/hard_coded/bitmap_tester.m:
tests/hard_coded/bitmap_test.exp:
tests/tabling/Mmakefile:
tests/tabling/expand_bitmap.m:
tests/tabling/expand_bitmap.exp:
tests/hard_coded/version_array_test.m:
tests/hard_coded/version_array_test.exp:
Test cases.
Estimated hours taken: 12
Branches: main
compiler/use_local_vars.m:
Extend this optimization to handle temporaries being both defined in
and used by foreign_proc_code instructions. This should eliminate
unnecessary accesses to the MR_fake_reg array, and thus speed up
programs that use foreign code a lot, including typeclass- and
tabling-intensive programs, since those features are implemented using
inline foreign code. I/O intensive should also benefit, but not much,
since the cost of the I/O itself overwhelms the cost of the
MR_fake_reg accesses.
Group together the LLDS instructions that are handled similarly.
Factor out some common code.
compiler/opt_util.m:
Allow for the fact that foreign_proc_codes can now refer to
temporaries.
compiler/opt_debug.m:
Print more useful information about foreign_proc_code components.
compiler/prog_data.m:
Rename the types and function symbols of the recently added
foreign_proc attributes to avoid clashing with the keywords
representing them in source code.
Add a new foreign_proc attribute, proc_may_duplicate that governs
whether the body of foreign code is allowed to be duplicated.
compiler/table_gen.m:
Include does_not_affect_liveness among the annotations for the
foreign_proc calls generated by this module. Some of these procedures
affect memory beyond their arguments, but that memory is in tables,
not in unlisted registers.
Allow some of the smaller code fragments generated by this module
to be duplicated.
compiler/inlining.m:
Respect the may_not_duplicate foreign_proc attribute.
compiler/pragma_c_gen.m:
Transmit any annotations about liveness from the HLDS to the LLDS,
since without does_not_affect_liveness annotations use_local_vars.m
cannot optimize foreign_proc_codes.
Transmit any annotations about may_duplicate from the HLDS to the LLDS,
since with them jumpopt can do a better job.
compiler/llds.m:
Use the new foreign_proc attribute instead of a boolean to represent
whether a foreign code fragment may be duplicated.
compiler/simplify.m:
Generate an error message if a may_duplicate or may_not_duplicate
attribute on a foreign_proc conflicts with a no_inline or inline pragma
(respectively) on the predicate it belongs to.
compiler/hlds_pred.m:
Fix some comment rot.
compiler/jumpopt.m:
compiler/livemap.m:
compiler/proc_gen.m:
compiler/trace_gen.m:
Conform to the changes above.
doc/reference_manual.texi:
Document the new foreign_proc attribute.
library/array.m:
library/builtin.m:
library/char.m:
library/dir.m:
library/float.m:
library/int.m:
library/io.m:
library/lexer.m:
library/math.m:
library/private_builtin.m:
library/string.m:
library/version_array.m:
Add does_not_affect_liveness annotations to the C foreign_procs that
deserve them.
configure.in:
Require the installed compiler to support does_not_affect_liveness.
tests/invalid/test_may_duplicate.{m,err_exp}:
Add a new test case to test the error checking code in simplify.m.
tests/invalid/Mmakefile:
Enable the new test case.
Estimated hours taken: 0.2
Branches: main, release
library/*.m:
Improve the library reference manual by formatting the beginning of
library modules consistently.
library/integer.m:
Fix some bad indentation.
Estimated hours taken: 1
Branches: main
library/*.m:
Replace __ with . as the module qualifier everywhere.
tests/hard_coded/test_injection.exp:
Replace __ with . as the module qualifier in expected exceptions.
Estimated hours taken: 0.5
Branches: main
library/*.m:
Annotate foreign_procs with trail usage information throughout most of
the standard library.
Fix an out of date comment in string.m.
Fix some minor formatting problems.
Estimated hours taken: 4
Branches: main
library/*.m:
Convert to four-space indentation most of the library modules that
weren't already indented that way. Use predmode syntax where possible.
In some modules, shorten long lines by deleting module name prefixes.
Fix departures from our coding standards.
In some modules, simplify code, mostly using field names and/or state
variables.
There are no changes in algorithms, except for neg_list in integer.m.
Estimated hours taken: 3.5
Branches: main
Make the positioning of descriptive comments conform
to the coding standard for the following library modules.
Convert preds to predmode syntax where possible.
Make the ordering of related predicates and functions
conform to the coding standard, where the descriptive
comment makes it possible to do that.
Other minor changes are listed below.
library/bimap.m:
Fix capitalisation of a few comments.
library/dir.m:
s/throw an exception/throws an exception/.
library/exception.m:
Fix the comment about the exception_result/1 type.
There is only one type and an inst following the comment.
library/map.m:
Remove the unique modes for map.set/4, map.delete/3 and
map.delete_list/3.
library/rbtree.m:
Remove the unique modes for rbtree.set/4, rbtree.delete/3,
rbtree.remove/4, rbtree.remove_smallest/4 and rbtree.remove_largest/4.
library/tree234.m:
Remove left over unique modes for preds.
library/set.m:
XXX the ordering of procedures in this module is a bit strange.
library/set_bbbtree.m:
library/set_unordlist.m:
Remove various unique modes for set operations like
delete/3. (Some of these were commented out anyway).
library/term_to_xml.m:
Fix a spot where line width exceeded 79 characters.
library/array.m:
library/assoc_list.m:
library/random.m:
library/multi_map.m:
library/pqueue.m:
library/queue.m:
library/bool.m:
library/char.m:
library/construct.m:
library/counter.m:
library/deconstruct.m:
library/eqvclass.m:
library/gc.m:
library/io.m:
library/sparse_bitset.m:
library/stack.m:
library/std_util.m:
library/store.m:
library/string.m:
library/term.m:
library/term_io.m:
library/type_desc.m:
library/varset.m:
As above.
Estimated hours taken: 2
Branches: main
Add some library functions I discovered I needed while working on my scanner
generator.
library/char.m:
Add char__int_to_char, a more expressive name for the reverse mode
of char__to_int, and a deterministic version, char__det_int_to_char.
library/list.m:
Add some more arities of the foldl and map_foldl predicates.
Use a consistent naming scheme for the type variables in signatures
of the various versions of the foldl and map_foldl predicates.
library/map.m:
Add a new predicate reverse_map, which turns a map(K, V) into
a map(V, set(K)).
library/svarray.m:
library/svqueue.m:
library/svbimap.m:
New modules, which contain state-variable-friendly versions of the
relevant predicates from array.m, queue.m and bimap.m respectively.
library/library.m:
NEWS:
Mention the three new modules.
Estimated hours taken: 8
Branches: main
library/*.m:
Bring these modules up to date with our current style guidelines.
Use predmode declarations where appropriate. Use state variable syntax
where appropriate. Reorder arguments where this makes it possible to
to use state variable syntax. Standardize format of predicate
description comments. Standardize indentation.
Estimatetimated hours taken: 0.5
Branches: main
Implemented some library functions for the char library in java.
library/char.m:
Implemented the following predicates in java:
char__to_int/2
char__max_char_value/1
Estimated hours taken: 2
Branches: main
Begin porting the the library just to use C# as its foreign_proc
language.
library/array.m:
library/char.m:
library/exception.m:
library/float.m:
library/int.m:
library/math.m:
library/private_builtin.m:
library/rtti_implementation.m:
library/std_util.m:
Trivial changes to convert MC++ to C#.
library/table_builtin.m:
Delete some unused MC++ functions.