library/string.m:
Even though format_table_max is a minor tweak on format_table,
its implementation used to be completely separate. Act on an old XXX
and make format_table use the same primitive ops as format_table_max.
Document the operation of format_table a bit better.
Change the way that format_table_max handles column-width limits,
by accepting overlong column contents *without* starting a new line.
Document the new semantics.
Use predmode decls when possible.
tests/general/string_test.{m,exp}:
Add a test of format_table_max, which previously did not have one.
library/io.m:
Add the predicates
read_named_file_as_string_wf
read_named_file_as_lines_wf
read_file_as_string_wf
read_file_as_string_and_num_code_units_wf
which extend their base predicates (without the _wf suffix) by checking
whether the string read from the file is well formed, and returning
an error if it is not.
library/io.stream_op.m:
Fix and expand a comment.
library/io.text_read.m:
Add a comment.
library/string.m:
Add check_well_formedness, a predicate that checks whether a string
is well formed, and (if relevant) specifies the offset of the first
non-well-formed character.
Make the documentation of index_next and index_next_repl more detailed.
runtime/mercury_string.[ch]:
Define MR_utf8_find_ill_formed_char, a version of MR_utf8_verify
that returns the offset of the first ill-formed UTF-8 char in the
given string, if there is one. This is used in the implementation of
check_well_formedness for C.
NEWS.md:
Announce the new library predicates.
Sort some lists of pred names.
library/string.m:
... in favor of the s/codepoint/code_point/ versions.
NEWS.md:
Mention that these predicates and functions have been marked obsolete.
Mention that all the X_to_doc functions in modules other than
pretty_printer.m have been marked obsolete in favour of the versions
in pretty_printer.m.
Standardize on indenting lists of function and predicate names
by four spaces, not three. (There were more than five times as many
that were indented by four than by three.)
library/string.m:
For each predicate and function whose name includes "codepoint",
- create a version in which "codepoint" is replaced by "code_point",
- make this version the main implementation, making the "codepoint"
versions forward to the "code_point" versions,
- add obsolete pragmas for the "codepoint" versions, though these are
commented out for now. This is so that an installed compiler containing
this change will already have the recommended alternative available
when the commenting-out is removed (maybe in a week or so).
NEWS.md:
Announce the new predicates and functions.
compiler/c_util.m:
compiler/const_prop.m:
compiler/inst_check.m:
compiler/parse_tree_out_term.m:
compiler/structure_reuse.direct.choose_reuse.m:
compiler/write_error_spec.m:
library/pprint.m:
library/pretty_printer.m:
library/string.format.m:
Replace all uses of the "codepoint" versions with the "code_point"
versions.
library/array.m:
library/char.m:
library/float.m:
library/int.m:
library/int16.m:
library/int32.m:
library/int64.m:
library/int8.m:
library/list.m:
library/one_or_more.m:
library/string.m:
library/tree234.m:
library/uint.m:
library/uint16.m:
library/uint32.m:
library/uint64.m:
library/uint8.m:
library/version_array.m:
Mark the X_to_doc function in each of these modules as obsolete,
and make it a forwarding function to the actual implementation
in pretty_printer.m. The intention is that when these forwarding
functions are eventually removed, this will also remove the dependency
of these modules on pretty_printer.m. This should help at least some
of these modules escape the giant SCC in the library's dependency graph.
(It does not make sense that a library module that adds code to increment
an int thereby becomes dependent on pretty_printer.m through int.m.)
library/pretty_printer.m:
Move all the X_to_doc functions from the above modules here.
Fix the one_or_more_to_doc function, which was
- missing the comma between the two arguments of the one_or_more
function symbol, and
- would print "..., ...]" instead of just "...]" at the end of the
tail list when that list exceeded the limits of the specified pp_params.
Rename one of the moved types along with its function symbols,
to reduce ambiguity.
Put arrays before their indexes in the argument lists of some of
the moved functions.
Some of the moved X_to_doc functions for compound types returned
a doc that had an indent wrapper. These indents differed between the
various X_to_doc functions without any visible reason, but they are
also redundant. The callers can trivially add such wrappers if they
want to, but taking them off, if they want them off, is harder.
Eliminate the problem by deleting all such indent wrappers.
Add formatters for the intN, uintN and one_or_more types to the
default formatter map. Their previous absence was an oversight.
Add a function, get_formatter_map_entry_types, that returns the ids
of the types in the formatter_map given to the function. It is intended
for tests/hard_coded/test_pretty_printer_defaults.m, but is exported
for anyone to use.
tests/hard_coded/test_pretty_printer_defaults.{m,exp}:
Use get_formatter_map_entry_types to print the default formatter map
in a format that is much more easily readable.
NEWS:
Announce all the user-visible changes above.
Mercury inherited its original system of operator priorities from Prolog,
because during its initial development, we wanted to be able execute
the Mercury compiler using NU-Prolog and later SICStus Prolog.
That consideration has long been obsolete, and now we may fix the
design error that gifted Prolog with its counter-intuitive system
of operator priorities, in which higher *numerical* operator priorities
mean lower *actual* priorities. This diff does that.
library/ops.m:
Change the meaning of operator priorities, to make higher numerical
priorities mean also higher actual priorities.
This semantic change requires corresponding changes in any other module
that uses ops.m. To force this change to occur, change the type
representing priorities from being a synonym for a bare int to being
a notag wrapper around an uint.
The old "assoc" type had a misleading name, since it is related to
associativity but is not itself a representation of associativity.
Its two function symbols, which used to be just "x" and "y", meant that
the priority of an argument must be (respectively) greater than,
or greater than equal to, the priority of the operator. So rename
x to arg_gt, y to arg_ge, and assoc to arg_prio_gt_or_ge.
Rename the old adjust_priority_for_assoc predicate to min_priority_for_arg,
which better expresses its semantics. Turn it into a function, since
some of its users want it that way, and move its declaration to the
public part of the interface.
Add a method named tightest_op_priority to replace the use of 0
as a priority.
Rename the max_priority method as the loosest_op_priority method.
Add a method named universal_priority to replace the use of
max_priority + 1 as a priority.
Add a function to return the priority of the comma operator,
to allow other modules to stop hardcoding it.
Add operations for comparing priorities and for incrementing/decrementing
priorities.
Change the prefix on the names of the predicates that take op_infos
as inputs from "mercury_op_table_" to "op_infos_", since the old prefix
was misleading.
Add a note on an significant old problem with an exported type synonym.
library/mercury_term_parser.m:
Conform to the changes above.
Delete unnecessary module qualifiers, since they were just clutter.
Add "XXX OPS" to note further opportunities for improvement.
library/pprint.m:
Conform to the changes above.
Rename a function to avoid ambiguity.
library/pretty_printer.m:
library/stream.string_writer.m:
library/string.to_string.m:
library/term_io.m:
Conform to the changes above.
library/string.m:
Add a note on an significant old problem.
NEWS:
Announce the user-visible changes.
tests/hard_coded/bug383.m:
Update this test case to use the new system of operator priorities.
tests/hard_coded/term_io_test.{m,inp}:
Fix white space.
extras/old_library_modules/old_mercury_term_parser.m:
extras/old_library_modules/old_ops.m:
The old contents of the mercury_term_parser and ops modules,
in case anyone wants to continue using them instead of updating
their code to use their updated equivalents.
samples/calculator2.m:
Import the old versions of mercury_term_parser and ops.
configure.ac:
Require the installed compiler to support disable_warning scopes
for unknown_format_calls.
compiler/Mercury.options:
library/Mercury.options:
Do not disable unknown_format_call warnings in whole files.
compiler/parse_tree_out_info.m:
compiler/pd_debug.m:
library/io.m:
library/stream.string_writer.m:
library/string.m:
Disable unknown_format_call warnings for just the format calls
that need it.
library/string.m:
Fix string.all_match to fail if the string being tested contains
any ill-formed code unit sequences.
Fix the Mercury implementation of string.contains_char to continue
searching for the character past any ill-formed code unit sequences.
The foreign code implementations already did so.
Fix unsafe_index_next_repl and unsafe_prev_index_repl in C grades.
We indexed the C string with `ReplacedCodeUnit = Str[Index]' but
since Mercury strings have the C type `char *', ReplacedCodeUnit
could take on a negative value. When ReplacedCodeUnit == -1,
the caller would assume there is a valid encoding of a code point
beginning at Index, and therefore return `not_replaced' instead of
`replaced_code_unit(255)'.
Add casts to `unsigned char' in other places where we index C
strings.
Clarify documentation.
Document that string.duplicate_char may throw an exception.
tests/hard_coded/string_all_match.m:
tests/hard_coded/string_all_match.exp:
tests/hard_coded/string_all_match.exp2:
Add test for string.all_match.
tests/hard_coded/contains_char_2.m:
tests/hard_coded/contains_char_2.exp:
Delete older test case for string.contains_char.
tests/hard_coded/string_contains_char.m:
tests/hard_coded/string_contains_char.exp:
tests/hard_coded/string_contains_char.exp2:
Add better test for string.contains_char.
tests/hard_coded/Mmakefile:
Enable the tests.
NEWS:
Announce the changes.
Add a new predicate that tests if string contains any characters that succeed
for a given test predicate.
library/string.m:
Add the new predicate.
compiler/options.m:
Replace the predicate string_contains_whitespace/1 defined here
with a call to the new library predicate.
NEWS:
Announce the new predicate.
tests/hard_coded/Mmakefile:
tests/hard_coded/string_contains_match.{m,exp}:
Add a test of the new predicate.
compiler/add_clause.m:
Generate a warning for mode-specific clauses when the clause is for
a predicate that has only one mode, provided that the warning is enabled.
compiler/options.m:
Add an option to enable this warning.
doc/user_guide.texi:
Document this option.
library/exception.m:
library/int.m:
library/rtti_implementation.m:
library/string.m:
Delete modes from clause heads that would get this warning.
tests/valid/spurious_purity_warning.m:
Delete modes from clause heads that would get this warning.
Do not interleave predicate definitions.
tests/warnings/unneeded_mode_specific_clause.{m,exp}:
A test case for this warning.
tests/warnings/Mmakefile:
Enable the new test case.
tests/invalid/multimode_syntax.err_exp:
Expect the new warning.
library/string.m:
Delete procedures that have been marked as obsolete since 2019.
NEWS:
Announce the deletions.
tests/hard_coded/string_append_ooi.m:
tests/hard_coded/string_append_ooi_ilseq.m:
tests/hard_coded/string_presuffix.{m,exp}:
Conform to the above changes.
In the Mercury standard library, every exported predicate or function
has (or at least *should* have) a comment that documents it, including
the meanings of its arguments. About 35-40% of these modules put `'s
(left and right quotes) around the names of the variable representing
those arguments. Some tried to do it consistently (though there were spots
with unquoted or half quoted names), while some did it only a few places.
This is inconsistent: we should either do it everywhere, or nowhere.
This diff makes it nowhere, because
- this is what the majority of the standard library modules do;
- this is what virtually all of the modules in the compiler, profiler,
deep_profiler etc directories do;
- typing all those quotes when adding new predicates in modules that
follow this convention is a pain in the ass; and because
- on many modern terminals, `' looks non-symmetrical and weird.
Likewise, the comment explaining a predicate often started with
% `predname(arguments)' returns ...
This diff deletes these quotes as well, since they add nothing useful.
This diff does leave in place quotes around code fragments, both terms
and goals, where this helps delineate the boundaries of that fragment.
In C grades, uint32_to_string/1 is currently implemented by doing the
following:
1. Calling sprintf() to write the string into a buffer on the stack.
2. Calling strlen() on the buffer to determine the actual number of digits.
3. Allocate the appropriate amount of space on the heap.
4. Copying the string from the buffer on to the heap.
The benchmark included in this diff contains a number of alternative
implementations, all of which avoid much of the above overhead.
All of these implementations work by computing the required number of digits,
allocating space on the heap and then writing the digits into the string
backwards.
library/string.m:
Replace uint32_to_string with a faster implementation.
(This is alternative 1 in the benchmark program below.)
tests/hard_coded/Mmakefile:
tests/hard_coded/uint32_to_string.{m,exp}:
A test of uint32 to string.
benchmarks/progs/integer_to_string/uint32_conversion.m:
A program for benchmarking various uint32 to string conversion
implementations.
The int_to_string conversion operation is currently implemented using
int_to_base_string/3. For the other integer types, we implement the equivalent
conversion using library code in the target languages. Benchmarking
int_to_string shows that it is *much* slower than the equivalent target
language code. When converting 10,000,000 randomly generated (64-bit) integers,
the timings in asm_fast.gc are:
int_to_base_string: 37100 ms
target language: 8920 ms
(Repeated six times using benchmark_det_io/7).
library/string.m:
Implement int_to_string using foreign code.
library/string.m:
Test whether base string is between 2 and 36 just once.
library/char.m:
Provide a predicate to convert a character to a base digit
that does not check whether the base is between 2 and 36,
leaving that to the caller.
Make the error message for bases not in that range more informative,
in the predicates that *do* check the base.
NEWS:
Announce unsafe_base_digit_to_int.
library/uint16.m:
library/uint32.m:
library/uint64.m:
library/uint8.m:
Add to each of these modules whichever subset of {from_uint, det_from_uint,
cast_from_uint} makes sense for that module.
tests/hard_coded/from_uint_uint8.{m,exp}:
tests/hard_coded/from_uint_uint16.{m,exp}:
tests/hard_coded/from_uint_uint32.{m,exp}:
Three new test cases to test the changes to uint{8,16,32}.m.
The change to uint64.m will be tested later.
tests/hard_coded/Mmakefile:
Enable the new test cases.
library/string.m:
Add ways to convert strings to uints.
NEWS:
Announce the new predicates and functions.
library/io.m:
Add two new predicates.
The new predicate read_named_file_as_string reads in a named file,
and returns its contents as a single giant string. This is effectively
a combination of open_input, read_file_as_string, and close_input.
The new predicate read_named_file_as_lines reads in a named file,
and returns its contents as a list of strings, each containing
the contents one line (minus the newline char at the end, if any).
This is effectively a combination of open_input, read_file_as_string,
split_into_lines (see below), and close_input.
library/string.m:
Add a new function, split_into_lines, which breaks up a string into
its constituent lines, returning each line minus its newline.
This differs from the old split_at_char('\n', ...) in that it
expects the input string to end with a newline, and does not return
an empty string after a final newline. (If there *are* characters
after the final newline, it does return *them* as the final line.)
NEWS:
Announce the new predicates and function.
compiler/source_file_map.m:
Use the new functionality to simplify the code that reads in
Mercury.modules files.
tests/hard_coded/string_split.{m,exp}:
Add tests for split_into_lines.
library/string.m:
We long had uint_to_hex_string and uint_to_uc_hex_string. Add
uint_to_lc_hex_string as well, and make uint_to_hex_string call it.
This way, users don't have to remember which of the upper and lower
case versions is defined, and which is missing.
Do the same for the 64 bit version.
NEWS:
Announce the new functions.
library/string.format.m:
Call the new functions.
library/string.m:
Add functions for converting uint64s to strings of base 8 or base 16
digits. For most integer types we can cast to a uint and then use the
uint versions of these operations but for 64-bit types we cannot since
on some of our supported platforms uints are 32-bit.
NEWS:
Announce the additions.
tests/hard_coded/Mmakefile:
tests/hard_coded/uint64_string_conv.{m,exp}:
Add a test of the new functions.
Currently, the Mercury implementation of string formatting handles uints by
casting them to ints and then using the code for formatting signed integers as
unsigned values. Add an implementation that works directly on uints and make
the code that formats signed integers as unsigned integers use that instead.
The new implementation is simpler and avoids unnecessary conversions to
arbitrary precision integers.
Add new functions for converting uint values directly to octal and hexadecimal
strings that use functionality provided by the underlying platforms; replace
the Mercury code that previously did that with calls to these new functions.
library/string.m:
Add the functions uint_to_hex_string/1, uint_to_uc_hex_string/1 and
uint_to_octal_string/1.
library/string.format.m:
Make format_uint/6 operate directly on uints instead of casting the value
to a signed int and calling format_unsigned_int/6.
Make format_unsigned_int/6 cast the int value to a uint and then call
format_uint/6.
Delete predicates and functions used to convert ints to octal and
hexadecimal strings. We now just use the functions exported by
the string module.
NEWS:
Announce the additions to the string module.
tests/hard_coded/Mmakefile:
tests/hard_coded/uint_string_conv.{m,exp*}:
Add a test of uint string conversion.
library/string.m:
Add {i,u}{8.16,32,64} as function symbols in the poly_type type,
each with a single argument containing an integer with the named
signedness and size.
The idea is that each of these poly_type values works exactly
the same way as the i(_) poly_type (if signed) or the u(_) poly_type
(if unsigned), with the exception that the value specified by the call
is cast to int or uint before being processed.
library/string.parse_runtime.m:
Parse the new kinds of poly_types. Change the representation of the result
of the parsing to allow recording of the sizes of ints and uints.
Put the code that does the parsing into a predicate of its own.
library/string.format.m:
Do a cast to int or uint if the size information recorded in the
specification of a signed or unsigned integer value calls for it.
Provide functions to do the casting that do not require the import
of {int,uint}{8,16,32,64}.m. This is to allow the compiler to generate
calls to do such casts without having to implicitly import those modules.
Abort if a 64 bit number is being cast to a 32 bit word.
compiler/parse_string_format.m:
Make the same changes as in string.parse_runtime.m, mutatis mutandis.
compiler/format_call.m:
Handle the new kinds of poly_types by adding a cast to int or uint
if necessary, using the predicates added to library/string.format.m.
Use a convenience function to make code creating instmap deltas
more readable.
library/io.m:
library/pprint.m:
library/string.parse_util.m:
tests/invalid/string_format_bad.m:
tests/invalid/string_format_unknown.m:
Conform to the changes above.
tests/string_format/string_format_d.m:
tests/string_format/string_format_u.m:
Test the printing of some of the new poly_types.
tests/string_format/string_format_d.exp2:
tests/string_format/string_format_u.exp2:
Update the expected output of these tests on 64-bit platforms.
tests/string_format/string_format_lib.m:
Update programming style.
library/*.m:
Delete Erlang foreign code and foreign types.
Delete documentation specific to Erlang targets.
library/deconstruct.m:
Add pragma no_determinism_warning to allow functor_number_cc/3
to compile for now.
library/Mercury.options:
Delete workaround only needed when targetting Erlang.
browser/listing.m:
mdbcomp/rtti_access.m:
Delete Erlang foreign code and foreign types.
Extend the operations that perform formatted conversion, such as
string.format/2, to be able to handle values of type uint directly. We have
always supported formatting values of type int as unsigned values, but
currently the only way to format uint values is by explicitly casting them to
an int. This addresses Mantis issue #502.
library/string.m:
Add a new alternative to the poly_type/0 type that wraps uint
values.
Update the documentation for string.format. uint values may
now be formatted using the u, x, X, o or p conversion specifiers.
library/string.format.m:
Add the necessary machinery for handling formatting of uint values.
library/string.parse_runtime.m:
library/string.parse_util.m:
Handle uint poly_types.
library/io.m:a
Handle uint values in the write_many predicates.
library/pprint.m:
Handle uint values in the poly/1 function.
compiler/format_call.m:
compiler/parse_string_format.m:
Conform to the above changes.
compiler/options.m:
Add a way to detect if a compiler supports this change.
NEWS:
Announce the above changes.
tests/hard_coded/stream_format.{m,exp}:
Extend this test to cover uints.
tests/invalid/string_format_bad.m:
tests/invalid/string_format_unknown.m:
Conform to the above changes.
tests/string_format/Mmakefile:
tests/string_format/string_format_uint_o.{m,exp,exp2}:
tests/string_format/string_format_uint_u.{m,exp,exp2}:
tests/string_format/string_format_uint_x.{m,exp,exp2}:
Add tests of string.format with uints.
library/string.m:
Add index_next_repl, unsafe_index_next_repl, prev_index_repl,
unsafe_prev_index_repl predicates that return an indication if a
replacement character was returned because an ill-formed code unit
sequence was encountered.
Add more pragma inlines for indexing predicates.
Remove may_not_duplicate attribute on the Erlang version of
unsafe_prev_index_repl, which would conflict with the pragma inline
declaration. This requires the helper function do_unsafe_prev_index
to be exported.
tests/hard_coded/string_append_ooi_ilseq.m:
tests/hard_coded/string_set_char_ilseq.m:
Use index_next_repl in test cases.
NEWS:
Announce additions.
library/string.m:
Add missing word.
Just write "code points" instead of "character" followed by
clarification in a few spots.
Delete _underscores_ which aren't particularly helpful.
library/string.m:
Define behaviour of set_char, det_set_char and unsafe_set_char on
ill-formed sequences. Also define them to throw an exception on an
attempt to set a null character or surrogate code point in a UTF-8
string.
Delete claim that unsafe_set_char is constant time. That would only
be true for the destructive mode of unsafe_set_char, and that mode
has been disabled for a long time.
Implement the defined behaviour for C and C# versions of
unsafe_set_char. The Java version already behaved as defined.
Use unsafe_set_char to implement set_char instead of duplicating
foreign code.
Replace a couple of uses of strcpy with MR_memcpy as it was
convenient to do so. (On OpenBSD, the linker issues a warning
whenever strcpy is used. Avoiding the warning is not high priority
but we might still like to eliminate all uses of strcpy eventually.)
tests/hard_coded/Mmakefile:
tests/hard_coded/string_set_char_ilseq.exp:
tests/hard_coded/string_set_char_ilseq.exp2:
tests/hard_coded/string_set_char_ilseq.m:
Add test case.
library/string.m:
Use unsafe_append_string_pieces in Mercury implementations of
append_list and join_list. This has no practical effect as we have
foreign code implementations of both, for all target languages.
library/string.m:
Implement string.replace_all using unsafe_append_string_pieces to
avoid intermediate strings. Use unsafe_sub_string_search_start to
avoid repeated range checks.