library/string.m:
We have long had uint_to_hex_string and uint_to_uc_hex_string. Add
uint_to_lc_hex_string as well, and make uint_to_hex_string call it.
This way, users don't have to remember which of the upper and lower
case versions is defined, and which is missing.
Do the same for the 64-bit version.
NEWS:
Announce the new functions.
library/string.format.m:
Call the new functions.
library/string.m:
Add functions for converting uint64s to strings of base 8 or base 16
digits. For most integer types we can cast to a uint and then use the
uint versions of these operations, but for 64-bit types we cannot, since
on some of our supported platforms uints are 32-bit.
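A minimal C sketch of why 64-bit values need their own conversions, assuming a C library that provides the <inttypes.h> format macros. The helper names u64_to_lc_hex, u64_to_uc_hex and u64_to_octal are hypothetical, not the Mercury library's code: they format the whole 64-bit value directly, whereas casting to unsigned int first would drop the upper 32 bits wherever that type is 32 bits wide.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helpers: each formats the full 64-bit value using the
 * <inttypes.h> format macros. Casting n to unsigned int before formatting
 * would silently discard the upper 32 bits on platforms where unsigned int
 * is only 32 bits wide.
 */
static void u64_to_lc_hex(uint64_t n, char *buf, size_t size)
{
    snprintf(buf, size, "%" PRIx64, n);
}

static void u64_to_uc_hex(uint64_t n, char *buf, size_t size)
{
    snprintf(buf, size, "%" PRIX64, n);
}

static void u64_to_octal(uint64_t n, char *buf, size_t size)
{
    snprintf(buf, size, "%" PRIo64, n);
}
```

For example, UINT64_MAX becomes "ffffffffffffffff" in hex and a 22-digit octal string starting with 1, regardless of the width of unsigned int.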
NEWS:
Announce the additions.
tests/hard_coded/Mmakefile:
tests/hard_coded/uint64_string_conv.{m,exp}:
Add a test of the new functions.
Currently, the Mercury implementation of string formatting handles uints by
casting them to ints and then using the code for formatting signed integers as
unsigned values. Add an implementation that works directly on uints and make
the code that formats signed integers as unsigned integers use that instead.
The new implementation is simpler and avoids unnecessary conversions to
arbitrary precision integers.
Add new functions for converting uint values directly to octal and hexadecimal
strings that use functionality provided by the underlying platforms; replace
the Mercury code that previously did that with calls to these new functions.
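A minimal C sketch of the approach described above, assuming the C printf conversions stand in for the "functionality provided by the underlying platforms". format_uint_c and format_int_as_unsigned_c are hypothetical names that loosely mirror format_uint/6 and format_unsigned_int/6: the signed value is converted to unsigned int (well defined in C, modulo UINT_MAX + 1) and handed to the unsigned formatter, with no arbitrary precision integer involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical analogue of format_uint: format an unsigned value using the
   platform's printf conversions. Returns snprintf's result, or -1 for an
   unsupported conversion specifier. */
static int format_uint_c(unsigned int n, char conv, char *buf, size_t size)
{
    switch (conv) {
    case 'u': return snprintf(buf, size, "%u", n);
    case 'o': return snprintf(buf, size, "%o", n);
    case 'x': return snprintf(buf, size, "%x", n);
    case 'X': return snprintf(buf, size, "%X", n);
    default:  return -1;
    }
}

/* Hypothetical analogue of format_unsigned_int: reinterpret the signed value
   as unsigned and delegate to the unsigned formatter. */
static int format_int_as_unsigned_c(int n, char conv, char *buf, size_t size)
{
    return format_uint_c((unsigned int) n, conv, buf, size);
}
```

With this design, formatting -1 with 'X' yields the same text as formatting ~0u, so the signed path needs no separate octal or hex code of its own.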
library/string.m:
Add the functions uint_to_hex_string/1, uint_to_uc_hex_string/1 and
uint_to_octal_string/1.
library/string.format.m:
Make format_uint/6 operate directly on uints instead of casting the value
to a signed int and calling format_unsigned_int/6.
Make format_unsigned_int/6 cast the int value to a uint and then call
format_uint/6.
Delete predicates and functions used to convert ints to octal and
hexadecimal strings. We now just use the functions exported by
the string module.
NEWS:
Announce the additions to the string module.
tests/hard_coded/Mmakefile:
tests/hard_coded/uint_string_conv.{m,exp*}:
Add a test of uint string conversion.
library/string.m:
Add {i,u}{8,16,32,64} as function symbols in the poly_type type,
each with a single argument containing an integer with the named
signedness and size.
The idea is that each of these poly_type values works exactly
the same way as the i(_) poly_type (if signed) or the u(_) poly_type
(if unsigned), with the exception that the value specified by the call
is cast to int or uint before being processed.
library/string.parse_runtime.m:
Parse the new kinds of poly_types. Change the representation of the result
of the parsing to allow recording of the sizes of ints and uints.
Put the code that does the parsing into a predicate of its own.
library/string.format.m:
Do a cast to int or uint if the size information recorded in the
specification of a signed or unsigned integer value calls for it.
Provide functions to do the casting that do not require the import
of {int,uint}{8,16,32,64}.m. This is to allow the compiler to generate
calls to do such casts without having to implicitly import those modules.
Abort if a 64-bit number is being cast to a 32-bit word.
compiler/parse_string_format.m:
Make the same changes as in string.parse_runtime.m, mutatis mutandis.
compiler/format_call.m:
Handle the new kinds of poly_types by adding a cast to int or uint
if necessary, using the predicates added to library/string.format.m.
Use a convenience function to make code creating instmap deltas
more readable.
library/io.m:
library/pprint.m:
library/string.parse_util.m:
tests/invalid/string_format_bad.m:
tests/invalid/string_format_unknown.m:
Conform to the changes above.
tests/string_format/string_format_d.m:
tests/string_format/string_format_u.m:
Test the printing of some of the new poly_types.
tests/string_format/string_format_d.exp2:
tests/string_format/string_format_u.exp2:
Update the expected output of these tests on 64-bit platforms.
tests/string_format/string_format_lib.m:
Update programming style.
library/*.m:
Delete Erlang foreign code and foreign types.
Delete documentation specific to Erlang targets.
library/deconstruct.m:
Add pragma no_determinism_warning to allow functor_number_cc/3
to compile for now.
library/Mercury.options:
Delete workaround only needed when targeting Erlang.
browser/listing.m:
mdbcomp/rtti_access.m:
Delete Erlang foreign code and foreign types.
Extend the operations that perform formatted conversion, such as
string.format/2, to be able to handle values of type uint directly. We have
always supported formatting values of type int as unsigned values, but
currently the only way to format uint values is by explicitly casting them to
an int. This addresses Mantis issue #502.
library/string.m:
Add a new alternative to the poly_type/0 type that wraps uint
values.
Update the documentation for string.format. uint values may
now be formatted using the u, x, X, o or p conversion specifiers.
library/string.format.m:
Add the necessary machinery for handling formatting of uint values.
library/string.parse_runtime.m:
library/string.parse_util.m:
Handle uint poly_types.
library/io.m:
Handle uint values in the write_many predicates.
library/pprint.m:
Handle uint values in the poly/1 function.
compiler/format_call.m:
compiler/parse_string_format.m:
Conform to the above changes.
compiler/options.m:
Add a way to detect if a compiler supports this change.
NEWS:
Announce the above changes.
tests/hard_coded/stream_format.{m,exp}:
Extend this test to cover uints.
tests/invalid/string_format_bad.m:
tests/invalid/string_format_unknown.m:
Conform to the above changes.
tests/string_format/Mmakefile:
tests/string_format/string_format_uint_o.{m,exp,exp2}:
tests/string_format/string_format_uint_u.{m,exp,exp2}:
tests/string_format/string_format_uint_x.{m,exp,exp2}:
Add tests of string.format with uints.
library/string.m:
Add index_next_repl, unsafe_index_next_repl, prev_index_repl,
unsafe_prev_index_repl predicates that return an indication of whether a
replacement character was returned because an ill-formed code unit
sequence was encountered.
Add more pragma inlines for indexing predicates.
Remove may_not_duplicate attribute on the Erlang version of
unsafe_prev_index_repl, which would conflict with the pragma inline
declaration. This requires the helper function do_unsafe_prev_index
to be exported.
tests/hard_coded/string_append_ooi_ilseq.m:
tests/hard_coded/string_set_char_ilseq.m:
Use index_next_repl in test cases.
NEWS:
Announce additions.
library/string.m:
Add missing word.
Just write "code points" instead of "character" followed by
clarification in a few spots.
Delete _underscores_ which aren't particularly helpful.
library/string.m:
Define behaviour of set_char, det_set_char and unsafe_set_char on
ill-formed sequences. Also define them to throw an exception on an
attempt to set a null character or surrogate code point in a UTF-8
string.
Delete claim that unsafe_set_char is constant time. That would only
be true for the destructive mode of unsafe_set_char, and that mode
has been disabled for a long time.
Implement the defined behaviour for C and C# versions of
unsafe_set_char. The Java version already behaved as defined.
Use unsafe_set_char to implement set_char instead of duplicating
foreign code.
Replace a couple of uses of strcpy with MR_memcpy as it was
convenient to do so. (On OpenBSD, the linker issues a warning
whenever strcpy is used. Avoiding the warning is not high priority
but we might still like to eliminate all uses of strcpy eventually.)
tests/hard_coded/Mmakefile:
tests/hard_coded/string_set_char_ilseq.exp:
tests/hard_coded/string_set_char_ilseq.exp2:
tests/hard_coded/string_set_char_ilseq.m:
Add test case.
library/string.m:
Use unsafe_append_string_pieces in Mercury implementations of
append_list and join_list. This has no practical effect as we have
foreign code implementations of both, for all target languages.
library/string.m:
Implement string.replace_all using unsafe_append_string_pieces to
avoid intermediate strings. Use unsafe_sub_string_search_start to
avoid repeated range checks.
library/string.m:
Delete long-deprecated substring/3 function and substring/4 predicate.
The newly introduced `string_piece' type has a substring/3 data
constructor which takes (start, end) offsets into the base string,
whereas the function and predicate take (start, count) arguments.
To reduce potential confusion, delete the deprecated function and
predicate.
Delete other deprecated substring predicates and functions as well.
tests/general/Mercury.options:
tests/general/string_foldl_substring.exp:
tests/general/string_foldl_substring.m:
tests/general/string_foldr_substring.exp:
tests/general/string_foldr_substring.m:
tests/hard_coded/Mercury.options:
tests/hard_coded/string_substring.m:
Delete tests for deprecated predicates.
tests/tabling/mercury_java_parser_dead_proc_elim_bug.m:
tests/tabling/mercury_java_parser_dead_proc_elim_bug2.m:
tests/valid/mercury_java_parser_follow_code_bug.m:
Replace calls to unsafe_substring with unsafe_between.
NEWS:
Announce the changes.
library/string.m:
Define behaviour of string.replace_all on ill-formed code unit
sequences when the pattern is empty.
Implement that behaviour.
Use better variable names in documentation of string.replace and
string.replace_all.
tests/general/string_replace.exp:
tests/general/string_replace.exp2:
tests/general/string_replace.m:
Extend test case.
Update code style.
library/string.m:
Make generic implementations of string.to_upper and string.to_lower
preserve ill-formed sequences. (The foreign language implementations
already did so.)
library/string.m:
Document that count_utf8_code_units throws an exception if the
string contains an unpaired surrogate code point.
Make the exception message thrown more useful to callers.
Delete unnecessary foreign_procs.
library/string.m:
Document that from_code_unit_list fails if the result string would
contain a null character, and enforce that in the Java and C#
implementations. It was already enforced in the C implementation.
Make from_code_unit_list fail if the code unit list contains an
invalid value (negative, or above 0xff for UTF-8 strings or above
0xffff for UTF-16 strings).
Document that from_utf{8,16}_code_unit_list fails if the result
string would contain a null character.
Make from_utf8_code_unit_list call semidet_from_rev_char_list rather
than from_rev_char_list so that it fails as documented instead of
throwing an exception if the code unit list correctly encodes a list
of code points, but the code points cannot be encoded into a string.
Similarly for from_utf16_code_unit_list.
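A minimal C sketch of the documented failure conditions on code unit values, assuming the code unit list is given as an int array. code_units_acceptable is a hypothetical helper, not the library's code; max_unit would be 0xff for UTF-8 strings and 0xffff for UTF-16 strings, and a zero code unit is rejected because the result would contain a null character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: returns false if any code unit is negative, exceeds
 * max_unit (0xff for UTF-8, 0xffff for UTF-16), or is zero (which would
 * put a null character in the result string).
 */
static bool code_units_acceptable(const int *units, size_t n, int max_unit)
{
    for (size_t i = 0; i < n; i++) {
        if (units[i] < 0 || units[i] > max_unit) {
            return false;   /* not a valid code unit for this encoding */
        }
        if (units[i] == 0) {
            return false;   /* the string would contain a null character */
        }
    }
    return true;
}
```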
tests/hard_coded/Mmakefile:
tests/hard_coded/string_from_code_unit_list.exp:
tests/hard_coded/string_from_code_unit_list.exp2:
tests/hard_coded/string_from_code_unit_list.m:
Add test case.
library/string.m:
Define sub_string_search_start to fail if the BeginAt parameter is
negative or past the end of the string to search. The original C
implementation did not check for an out-of-range starting offset,
and could crash the program. The C implementation was later amended
to fail instead, but not other implementations.
Check for negative starting offset in non-C implementations of
sub_string_search_start.
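A minimal C sketch of the defined behaviour, using strstr on NUL-terminated byte strings. search_start is a hypothetical name, and -1 stands in for failure. The range check happens before any pointer arithmetic, which is the check the original C implementation lacked.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical analogue of sub_string_search_start: returns the offset of
 * the first occurrence of pat in str at or after begin_at, or -1 ("fail")
 * if there is none or if begin_at is negative or past the end of str.
 */
static long search_start(const char *str, const char *pat, long begin_at)
{
    size_t len = strlen(str);

    if (begin_at < 0 || (size_t) begin_at > len) {
        return -1;   /* out-of-range start: fail instead of reading out of bounds */
    }
    const char *hit = strstr(str + begin_at, pat);
    return hit != NULL ? (long) (hit - str) : -1;
}
```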
tests/hard_coded/string_sub_string_search.m:
Extend test case.
library/string.m:
Make C# implementation of sub_string_search perform ordinal
(Unicode code point) based string search, instead of a
culture-sensitive search.
library/string.m:
Make split_at_separator never consider ill-formed sequences in UTF-8
strings as potential separators, as they cannot contain any code
points that could satisfy any given DelimP predicate on code points.
Previously, split_at_separator would call DelimP(U+FFFD) for every
code unit in an ill-formed sequence.
library/string.m:
Make words_separator never consider ill-formed sequences in UTF-8
strings as potential separators, as they cannot contain any code
points that could satisfy any given SepP predicate on code points.
Previously, words_separator would call SepP(U+FFFD) for every code
unit in an ill-formed sequence.
library/string.m:
Make all_match(Pred, String) always fail if strings use UTF-8 encoding
and the string contains an ill-formed code unit sequence.
Such sequences do not contain any code points that could satisfy a
test on code points. Previously, all_match would call Pred(U+FFFD)
for every code unit in an ill-formed sequence.
Define all_match to rule out an interpretation that could ignore
ill-formed sequences.
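A minimal C sketch of the new all_match behaviour on a UTF-8 string, assuming a decoder that reports ill-formed sequences instead of substituting U+FFFD. utf8_next, all_match_utf8 and the two example predicates are hypothetical names. The decoder rejects stray continuation bytes, truncated and overlong sequences, surrogates, and values above U+10FFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Decode one code point from s[*pos .. len), where *pos < len. Returns the
   code point and advances *pos, or returns -1 for an ill-formed sequence. */
static long utf8_next(const unsigned char *s, size_t len, size_t *pos)
{
    size_t i = *pos;
    unsigned char b0 = s[i];
    long cp, min;
    int extra;

    if (b0 < 0x80) { *pos = i + 1; return b0; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; min = 0x10000; }
    else return -1;                         /* stray continuation or bad lead byte */

    if (len - i - 1 < (size_t) extra) return -1;           /* truncated */
    for (int k = 1; k <= extra; k++) {
        unsigned char b = s[i + k];
        if ((b & 0xC0) != 0x80) return -1;                  /* not a continuation */
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;                          /* overlong, out of range, or surrogate */
    }
    *pos = i + 1 + extra;
    return cp;
}

/* Succeeds only if every code point satisfies pred; an ill-formed sequence
   contains no code point to test, so it makes the whole call fail. */
static bool all_match_utf8(bool (*pred)(long), const unsigned char *s, size_t len)
{
    size_t pos = 0;
    while (pos < len) {
        long cp = utf8_next(s, len, &pos);
        if (cp < 0 || !pred(cp)) {
            return false;
        }
    }
    return true;
}

/* Example predicates for illustration. */
static bool is_ascii_lower(long cp) { return cp >= 'a' && cp <= 'z'; }
static bool accept_any(long cp) { (void) cp; return true; }
```

Even a predicate that accepts every code point, such as accept_any, now fails on a string containing a stray 0xFF byte, whereas the old behaviour would have tested U+FFFD and succeeded.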
library/string.m:
Make prefix_length and suffix_length stop at an ill-formed sequence
in UTF-8 strings. Such a sequence does not contain any code point
that could satisfy a test on code points. Previously, prefix_length
and suffix_length would call Pred(U+FFFD) for every code unit
in an ill-formed sequence.
Tweak documentation.
Delete obsolete comments.
library/string.m:
Fix C implementation of contains_char to fail when asked to test for
a surrogate code point in a string. It previously would (always)
succeed, which is a bug.
Fix generic implementation so that contains_char(String, '\uFFFD')
will not succeed just because String contains an ill-formed sequence
(in UTF-8 grades).
Delete obsolete comment.
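A minimal C sketch of the corrected contains_char semantics for UTF-8 strings, assuming the search is done on encoded bytes. utf8_encode and contains_char_utf8 are hypothetical names, not the library's foreign code. A surrogate has no UTF-8 encoding, so the search fails immediately, and because a match always begins at a lead byte followed by its own continuation bytes, searching for the encoding of U+FFFD only finds genuine occurrences of that code point, never an ill-formed sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical encoder: writes the UTF-8 encoding of cp into buf (at least
 * 5 bytes) followed by a NUL terminator. Returns false for values that can
 * never occur in a UTF-8 string: surrogates, values above U+10FFFF, and
 * U+0000 (which would terminate the C string).
 */
static bool utf8_encode(long cp, char *buf)
{
    unsigned char *b = (unsigned char *) buf;

    if (cp <= 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return false;
    }
    if (cp < 0x80) {
        b[0] = (unsigned char) cp;
        b[1] = 0;
    } else if (cp < 0x800) {
        b[0] = (unsigned char) (0xC0 | (cp >> 6));
        b[1] = (unsigned char) (0x80 | (cp & 0x3F));
        b[2] = 0;
    } else if (cp < 0x10000) {
        b[0] = (unsigned char) (0xE0 | (cp >> 12));
        b[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        b[2] = (unsigned char) (0x80 | (cp & 0x3F));
        b[3] = 0;
    } else {
        b[0] = (unsigned char) (0xF0 | (cp >> 18));
        b[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        b[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        b[3] = (unsigned char) (0x80 | (cp & 0x3F));
        b[4] = 0;
    }
    return true;
}

/* Search for the encoded bytes of cp rather than decoding with replacement. */
static bool contains_char_utf8(const char *str, long cp)
{
    char enc[5];

    if (!utf8_encode(cp, enc)) {
        return false;   /* e.g. a surrogate: no encoding, so never present */
    }
    return strstr(str, enc) != NULL;
}
```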
library/string.m:
Add index_next_repl, unsafe_index_next_repl, prev_index_repl,
unsafe_prev_index_repl predicates. These are internal for now,
so we can try them out in the string module without committing
to the interface.
library/string.m:
Fix definition of index/3 and index_next/4 to account for an offset
into a non-initial code unit in a well-formed code unit sequence.
Similarly for prev_index/4.
library/string.m:
Define behaviour of char_to_string when the string is not
well-formed or when the char is a surrogate code point.
Implement char_to_string/2 using multiple clauses
as the described behaviour doesn't match to_char_list/2.
tests/hard_coded/Mmakefile:
tests/hard_coded/char_to_string.exp:
tests/hard_coded/char_to_string.exp2:
tests/hard_coded/char_to_string.m:
Add test case.
library/string.m:
Define first_char/3 to fail if the input string begins with an
ill-formed code unit sequence.
Define the reverse mode to throw an exception on an attempt to
encode a null character or surrogate code point in the output
string.
Reimplement first_char/3 in Mercury.
hard_coded/Mmakefile:
hard_coded/string_first_char_ilseq.exp:
hard_coded/string_first_char_ilseq.m:
Add test case.
library/string.m:
Make from_char_list, from_rev_char_list, to_char_list throw an
exception if the list of chars includes a surrogate code point that
cannot be encoded in a UTF-8 string.
Make semidet_from_char_list, semidet_from_rev_char_list,
to_char_list fail if the list of chars includes a surrogate code
point that cannot be encoded in a UTF-8 string.
runtime/mercury_string.h:
Document return value of MR_utf8_width.
tests/hard_coded/Mmakefile:
tests/hard_coded/string_from_char_list_ilseq.exp:
tests/hard_coded/string_from_char_list_ilseq.exp2:
tests/hard_coded/string_from_char_list_ilseq.m:
Add test case.
tests/hard_coded/null_char.exp:
Expect new message in exceptions thrown by from_char_list,
from_rev_char_list.
tests/hard_coded/string_hash.m:
Don't generate surrogate code points in random strings.