20465 Commits

Author SHA1 Message Date
AlaskanEmily
416fbe954a Add option for link-time optimization/link-time code generation
The option is disabled by default.

configure.ac:
    Add --enable-lto option to the configure script to enable LTO/LTCG.

scripts/mgnuc.in:
scripts/ml.in:
    Pass LTO options to the C compiler/linker.

compiler/compile_target_code.m:
compiler/options.m:
scripts/Mercury.config.in:
    Add internal options to specify C compiler and linker LTO/LTCG flags.
2019-11-15 16:51:13 +11:00
Peter Wang
9a042f4fb1 Minor documentation changes.
library/string.m:
    Add missing word.

    Just write "code points" instead of "character" followed by
    clarification in a few spots.

    Delete _underscores_ which aren't particularly helpful.
2019-11-14 15:45:40 +11:00
Peter Wang
7ef407e937 Enable pragma obsolete_proc declarations.
library/string.m:
    Enable pragma obsolete_proc declarations since we now require a
    recent enough compiler version.
2019-11-14 11:28:25 +11:00
Zoltan Somogyi
44d2436556 Delete an obsolete test, and move related code next to each other.
2019-11-12 23:06:04 +11:00
Peter Wang
5c3b392ed0 Implement string.(un)capitalize_first more efficiently.
library/string.m:
    Avoid creating temporary string in capitalize_first and
    uncapitalize_first.
2019-11-12 17:16:50 +11:00
Peter Wang
f71b5f20ed Define behaviour of string.set_char etc on ill-formed sequences.
library/string.m:
    Define behaviour of set_char, det_set_char and unsafe_set_char on
    ill-formed sequences. Also define them to throw an exception on an
    attempt to set a null character or surrogate code point in a UTF-8
    string.

    Delete claim that unsafe_set_char is constant time. That would only
    be true for the destructive mode of unsafe_set_char, and that mode
    has been disabled for a long time.

    Implement the defined behaviour for C and C# versions of
    unsafe_set_char. The Java version already behaved as defined.

    Use unsafe_set_char to implement set_char instead of duplicating
    foreign code.

    Replace a couple of uses of strcpy with MR_memcpy as it was
    convenient to do so. (On OpenBSD, the linker issues a warning
    whenever strcpy is used. Avoiding the warning is not high priority
    but we might still like to eliminate all uses of strcpy eventually.)

tests/hard_coded/Mmakefile:
tests/hard_coded/string_set_char_ilseq.exp:
tests/hard_coded/string_set_char_ilseq.exp2:
tests/hard_coded/string_set_char_ilseq.m:
    Add test case.
2019-11-12 17:16:34 +11:00
Zoltan Somogyi
ac0050fb97 Record more info about imports/uses for smart recompilation.
compiler/module_imports.m:
    Replace the recomp_need_qualifier type with the recomp_avail type,
    which contains more information about how a module became available
    to the module being compiled. Specifically, besides recording whether
    it was the subject of an import_module or use_module declaration
    (which governs whether references must be fully qualified or not),
    it says in which section that declaration was (i.e. where references
    may appear at all), whether there was a use_module in the interface
    and an import_module in the implementation section (in which case
    references must be fully qualified in the interface section but not
    in the implementation section).

    Update the discussion of module timestamps to reflect our current
    understanding of it.

compiler/grab_modules.m:
    Indirectly fill in the affected slot of module timestamps with
    more detailed info.

compiler/recompilation.usage.m:
    Record the newly available information about module timestamps.

compiler/recompilation.check.m:
    Read in the newly available information about module timestamps.
    For the time being, accept both the old and new ways of recording
    such information.

    Update the code that makes decisions based on this information.
    The new code should always make the same decisions as the old code,
    but since those decisions look to be appropriate in only *some* cases,
    mark each such code fragment with XXX RECOMP401. (The 401 part
    is there because, as discussed on m-rev, one probable bug resembles
    Mantis bug #401.)
2019-11-12 01:14:09 +11:00
Zoltan Somogyi
1b093f34ee Stop using a type where it does not belong.
compiler/module_imports.m:
    The module_timestamp type has long had a field of the need_qualifier type.
    For a year or more now, this field has had a big comment on it explaining
    why its use here does not match the semantics of the need_qualifier type.

    This diff gives this field a new, bespoke type, that is isomorphic
    to need_qualifier, but is completely separate. Document the reason why
    I *can't* document the semantics of this new type :-(

compiler/grab_modules.m:
    Do the same kind of replacement on values that are used only to set
    this field of module_timestamps.

compiler/recompilation.check.m:
compiler/recompilation.usage.m:
    Use the values of the new type to make the same decisions we used to make
    using values of the need_qualifier type.

tests/recompilation/*.m*:
    Bring the programming style of these modules up to date.

tests/recompilation/*.err_exp.2:
    Expect the updated line numbers in messages.

Delete the need_qualifier field from timestamps.
2019-11-09 17:53:43 +11:00
Zoltan Somogyi
cafc75477f Make orders of declarations and definitions match in exception.m.
2019-11-09 08:35:47 +11:00
Peter Wang
ae2dda693e Avoid range checks in string.split_at_separator.
library/string.m:
    Avoid unnecessary range checks in split_at_separator.
2019-11-08 14:25:23 +11:00
Peter Wang
b68548d4dc Avoid garbage in Mercury versions of string.append_list/join_list.
library/string.m:
    Use unsafe_append_string_pieces in Mercury implementations of
    append_list and join_list. This has no practical effect as we have
    foreign code implementations of both, for all target languages.
2019-11-08 14:25:23 +11:00
Peter Wang
68ae33c426 Avoid intermediate strings in string.replace_all.
library/string.m:
    Implement string.replace_all using unsafe_append_string_pieces to
    avoid intermediate strings. Use unsafe_sub_string_search_start to
    avoid repeated range checks.
2019-11-08 14:25:23 +11:00
Peter Wang
3daee4fc23 Avoid intermediate strings in string.replace.
library/string.m:
    Implement string.replace using unsafe_append_string_pieces.
2019-11-08 14:25:23 +11:00
Peter Wang
7eb78c66d1 Add string.unsafe_sub_string_search_start.
library/string.m:
    Add unsafe_sub_string_search_start/4.

NEWS:
    Announce addition.
2019-11-08 14:25:23 +11:00
Peter Wang
3621cfa650 Delete deprecated substring predicates and functions.
library/string.m:
    Delete long-deprecated substring/3 function and substring/4 predicate.
    The newly introduced `string_piece' type has a substring/3 data
    constructor which takes (start, end) offsets into the base string,
    whereas the function and predicate take (start, count) arguments.
    To reduce potential confusion, delete the deprecated function and
    predicate.

    Delete other deprecated substring predicates and functions as well.

tests/general/Mercury.options:
tests/general/string_foldl_substring.exp:
tests/general/string_foldl_substring.m:
tests/general/string_foldr_substring.exp:
tests/general/string_foldr_substring.m:
tests/hard_coded/Mercury.options:
tests/hard_coded/string_substring.m:
    Delete tests for deprecated predicates.

tests/tabling/mercury_java_parser_dead_proc_elim_bug.m:
tests/tabling/mercury_java_parser_dead_proc_elim_bug2.m:
tests/valid/mercury_java_parser_follow_code_bug.m:
    Replace calls to unsafe_substring with unsafe_between.

NEWS:
    Announce the changes.
2019-11-08 14:25:23 +11:00
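
The confusion that commit 3621cfa650 above aims to avoid can be seen in a minimal sketch. It is written in Python rather than Mercury, and the helper names `between` and `by_count` are hypothetical stand-ins for the two argument conventions, not the library's definitions. Python indexes by code point, whereas Mercury string offsets are code unit offsets, so this only illustrates the (start, end) versus (start, count) difference.

```python
def between(s, start, end):
    """(start, end) convention: the piece runs from offset start up to, not including, end."""
    return s[start:end]


def by_count(s, start, count):
    """(start, count) convention: the piece starts at offset start and has count elements."""
    return s[start:start + count]


# The same three arguments select different pieces under the two conventions.
print(between("hello world", 6, 9))   # wor
print(by_count("hello world", 6, 9))  # world
```
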
Peter Wang
96b2caf536 Add string.unsafe_append_string_pieces.
library/string.m:
    Add unsafe_append_string_pieces/2 predicate.

NEWS:
    Announce addition.
2019-11-08 14:23:06 +11:00
Peter Wang
f2e0998651 Add string.append_string_pieces.
library/string.m:
    Add append_string_pieces/2 predicate.

library/io.m:
    Add a comment about a potential future change.

tests/hard_coded/Mmakefile:
tests/hard_coded/string_append_pieces.exp:
tests/hard_coded/string_append_pieces.m:
    Add test case.

NEWS:
    Announce addition.
2019-11-08 14:23:06 +11:00
Peter Wang
d2c3ede17d Make string.replace_all with empty pattern preserve ill-formed sequences.
library/string.m:
    Define behaviour of string.replace_all on ill-formed code unit
    sequences when the pattern is empty.

    Implement that behaviour.

    Use better variable names in documentation of string.replace and
    string.replace_all.

tests/general/string_replace.exp:
tests/general/string_replace.exp2:
tests/general/string_replace.m:
    Extend test case.

    Update code style.
2019-11-08 13:57:38 +11:00
Peter Wang
8a3404d59c Extend string_replace test case.
tests/general/string_replace.exp:
tests/general/string_replace.m:
    As above.
2019-11-08 13:57:38 +11:00
Zoltan Somogyi
6245b6e54d Allow reading parse_tree_{plain,trans}_opt.
compiler/prog_item.m:
    Fix a bug: provide a slot for foreign_procs in parse_tree_plain_opts.

    Eliminate unnecessary differences between parse_tree_{plain,trans}_opt
    and parse_tree_int[0123].

compiler/parse_module.m:
    As a temporary measure for testing,

    - convert every .opt and .trans_opt file read in from its generic
      parse_tree_opt representation to its specific parse_tree_plain_opt
      or parse_tree_trans_opt representation, in order to check for
      items that should not occur in the purpose-specific representations,
    - and then convert it back.

    Provide (not yet used) predicates for reading in optimization files
    into their specific parse tree formats. These do the first conversion
    but not the second.

compiler/parse_tree_out.m:
    Add predicates for writing out parse_tree_{plain,trans}_opt directly,
    without conversion to a generic opt file.

compiler/convert_parse_tree.m:
    Make the above possible by adding conversions to and from
    parse_tree_{plain,trans}_opt.

    Give exported predicates more expressive names.

compiler/comp_unit_interface.m:
compiler/intermod.m:
    Conform to the changes above.

compiler/add_pred.m:
compiler/grab_modules.m:
    Minor style improvements.

compiler/add_foreign_proc.m:
    Fix the wording of an error message.

tests/invalid/fp_dup_bug.err_exp:
    Expect the updated wording.
2019-11-08 13:52:10 +11:00
Zoltan Somogyi
43cf1c1cde Put the contents of io.m into a sensible order.
library/io.m:
    Put the contents of the implementation section of io.m into
    the same order as the interface section, after some minor reordering
    in the interface section.

library/Mercury.options:
    Don't specify --no-warn-inconsistent-pred-order-clauses for io.m
    anymore.
2019-11-08 12:08:56 +11:00
Zoltan Somogyi
bfb985c0b4 Enable --halt-at-invalid-interface by default.
compiler/options.m:
    As above.

configure.ac:
    Require a recent-enough compiler, one that does not generate
    invalid-by-the-recently-changed-standard interfaces.
2019-11-07 20:38:07 +11:00
Peter Wang
0a1f289b6d Make generic versions of string.to_upper/lower preserve ill-formed sequences.
library/string.m:
    Make generic implementations of string.to_upper and string.to_lower
    preserve ill-formed sequences. (The foreign language implementations
    already did so.)
2019-11-06 13:43:54 +11:00
Peter Wang
031b6d915d Document that string.count_utf8_code_units throws exceptions.
library/string.m:
    Document that count_utf8_code_units throws an exception if the
    string contains an unpaired surrogate code point.

    Make the exception message thrown more useful to callers.

    Delete unnecessary foreign_procs.
2019-11-06 13:43:54 +11:00
Peter Wang
2e5f6ddef9 Make string.to_utf16_code_unit_list throw exception for ill-formed UTF-8.
library/string.m:
    As above.
2019-11-06 13:43:54 +11:00
Peter Wang
67234fc898 Document that string.to_utf8_code_unit_list throws exceptions.
library/string.m:
    Document that string.to_utf8_code_unit_list throws an exception
    if the string contains an unpaired surrogate code point.
2019-11-06 13:43:54 +11:00
Peter Wang
1e85dcb99e Add string.from_code_unit_list_allow_ill_formed.
library/string.m:
    Add string.from_code_unit_list_allow_ill_formed/2.

tests/hard_coded/string_from_code_unit_list.exp:
tests/hard_coded/string_from_code_unit_list.exp2:
tests/hard_coded/string_from_code_unit_list.m:
    Extend test case.

NEWS:
    Announce addition.
2019-11-06 13:43:54 +11:00
Peter Wang
adbf4c51c8 Tighten up string.from_code_unit_list et al.
library/string.m:
    Document that from_code_unit_list fails if the result string would
    contain a null character, and enforce that in the Java and C#
    implementations. It was already enforced in the C implementation.

    Make from_code_unit_list fail if the code unit list contains an
    invalid value (negative or >0xff or >0xffff).

    Document that from_utf{8,16}_code_unit_list fails if the result
    string would contain a null character.

    Make from_utf8_code_unit_list call semidet_from_rev_char_list rather
    than from_rev_char_list so that it fails as documented instead of
    throwing an exception if the code unit list correctly encodes a list
    of code points, but the code points cannot be encoded into a string.

    Similarly for from_utf16_code_unit_list.

tests/hard_coded/Mmakefile:
tests/hard_coded/string_from_code_unit_list.exp:
tests/hard_coded/string_from_code_unit_list.exp2:
tests/hard_coded/string_from_code_unit_list.m:
    Add test case.
2019-11-06 13:43:54 +11:00
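
A rough Python sketch of the failure conditions described in commit adbf4c51c8 above, for the UTF-8 case only. Failure is modelled by returning `None`. The rejection of ill-formed input is an assumption inferred from the later addition of from_code_unit_list_allow_ill_formed; this is not the library's implementation.

```python
def from_code_unit_list(units):
    """Build a string from UTF-8 code units, or return None to model failure."""
    # Invalid values: negative, or larger than a single 8-bit code unit.
    if any(u < 0 or u > 0xFF for u in units):
        return None
    # The result string must not contain a null character.
    if 0 in units:
        return None
    # Assumption: an ill-formed code unit sequence also makes the call fail.
    try:
        return bytes(units).decode("utf-8")
    except UnicodeDecodeError:
        return None
```
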
Peter Wang
389d973a1b Skip two tests in profdeep grades.
tests/hard_coded/Mmakefile:
    Skip two tests in profdeep grades that need to catch exceptions.
2019-11-06 13:35:27 +11:00
Julien Fischer
a6c4040ae0 Clarify an XXX.
compiler/compile_target_code.m:
    As above.
2019-11-03 23:32:56 +11:00
Peter Wang
33da7c82b4 Fix use of MR_strerror.
extras/posix/posix.strerror.m:
    Fix use of MR_strerror (it may return a pointer to a static string
    instead of modifying the provided buffer).

    Allocate string such that it will be attributed to the predicate
    when profiling memory retention.
2019-10-31 23:47:18 +11:00
Julien Fischer
da5fe0974e Adjust a predicate.
extras/posix/posix.m:
    Always return a valid error number from error_to_cerrno/2.
2019-10-31 21:43:00 +11:00
Peter Wang
0c6778c89f Simplify Erlang implementation of sub_string_search_start.
library/string.m:
    As above. (Not that simple in the end.)
2019-10-31 17:20:09 +11:00
Peter Wang
c4fcbdaea3 Make generic version of string.sub_string_search_start more efficient.
library/string.m:
    Use unsafe_compare_substrings in generic version of
    sub_string_search_start.
2019-10-31 17:20:09 +11:00
Peter Wang
91868fe7ef Define string.sub_string_search_start for out-of-range starting offset.
library/string.m:
    Define sub_string_search_start to fail if the BeginAt parameter is
    negative or past the end of the string to search. The original C
    implementation did not check for an out-of-range starting offset,
    and could crash the program. The C implementation was later amended
    to fail instead, but not other implementations.

    Check for negative starting offset in non-C implementations of
    sub_string_search_start.

tests/hard_coded/string_sub_string_search.m:
    Extend test case.
2019-10-31 17:20:09 +11:00
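
The range check defined in commit 91868fe7ef above can be sketched in Python as follows. Failure is modelled by returning `None`, and the parameter names are illustrative. Without the explicit check, Python's `str.find` would silently reinterpret a negative start as an offset from the end, which is the kind of unintended behaviour an out-of-range offset can cause.

```python
def sub_string_search_start(string, sub, begin_at):
    """Return the offset of the first occurrence of sub at or after begin_at.

    Returns None (modelling failure) if begin_at is negative or past the
    end of string, or if sub does not occur.
    """
    if begin_at < 0 or begin_at > len(string):
        return None
    index = string.find(sub, begin_at)
    return index if index >= 0 else None
```
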
Peter Wang
30d0933f59 Fix C# version of string.sub_string_search to be culture-insensitive.
library/string.m:
    Make C# implementation of sub_string_search perform ordinal
    (Unicode code point) based string search, instead of a
    culture-sensitive search.
2019-10-31 15:56:24 +11:00
Julien Fischer
2c0351aea9 Add a binding to strerror().
Contributed by Volker Wysk.

extras/posix/posix.strerror.m:
extras/posix/posix.m:
    As above.
2019-10-31 11:26:03 +11:00
Peter Wang
d40ab1ab44 Slightly improve string stripping functions.
library/string.m:
    Use unsafe_between for chomp, lstrip_pred, rstrip_pred
    to avoid range checks.
2019-10-30 16:51:00 +11:00
Peter Wang
09512195fc Make string.split_at_separator skip ill-formed sequences in UTF-8 strings.
library/string.m:
    Make split_at_separator never consider ill-formed sequences in UTF-8
    strings as potential separators, as they cannot contain any code
    points that could satisfy any given DelimP predicate on code points.
    Previously, split_at_separator would call DelimP(U+FFFD) for every
    code unit in an ill-formed sequence.
2019-10-30 16:51:00 +11:00
Peter Wang
1b91cf375c Make string.words_separator skip ill-formed sequences in UTF-8.
library/string.m:
    Make words_separator never consider ill-formed sequences in UTF-8
    strings as potential separators, as they cannot contain any code
    points that could satisfy any given SepP predicate on code points.
    Previously, words_separator would call SepP(U+FFFD) for every code
    unit in an ill-formed sequence.
2019-10-30 16:51:00 +11:00
Peter Wang
de2af8cdd7 Make string.all_match fail on UTF-8 string containing ill-formed sequence.
library/string.m:
    Make all_match(Pred, String) always fail if the string contains an
    ill-formed code unit sequence, and strings use UTF-8 encoding.
    Such sequences do not contain any code points that could satisfy a
    test on code points. Previously, all_match would call Pred(U+FFFD)
    for every code unit in an ill-formed sequence.

    Define all_match to rule out an interpretation that could ignore
    ill-formed sequences.
2019-10-30 16:51:00 +11:00
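
The behaviour defined for all_match in commit de2af8cdd7 above can be approximated in Python, treating the string as raw UTF-8 bytes. This is a sketch of the semantics only: an ill-formed sequence contains no code point that could satisfy the test, so the whole call fails instead of the predicate being applied to U+FFFD.

```python
def all_match(pred, data: bytes) -> bool:
    """True iff data is well-formed UTF-8 and pred holds for every code point."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # An ill-formed sequence is present, so the test cannot succeed.
        return False
    return all(pred(ch) for ch in text)
```

Under this sketch, even a predicate that accepts every code point fails on input containing an ill-formed sequence.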
Peter Wang
817cf44efd Make string.prefix_length/suffix_length stop at ill-formed sequence.
library/string.m:
    Make prefix_length and suffix_length stop at an ill-formed sequence
    in UTF-8 strings. Such a sequence does not contain any code point
    that could satisfy a test on code points. Previously, prefix_length
    and suffix_length would call Pred(U+FFFD) for every code unit
    in an ill-formed sequence.

    Tweak documentation.

    Delete obsolete comments.
2019-10-30 16:51:00 +11:00
Peter Wang
265ffa15f0 Fix two bugs in string.contains_char.
library/string.m:
    Fix C implementation of contains_char to fail when asked to test for
    a surrogate code point in a string. It previously would (always)
    succeed, which is a bug.

    Fix generic implementation so that contains_char(String, '\uFFFD')
    will not succeed just because String contains an ill-formed sequence
    (in UTF-8 grades).

    Delete obsolete comment.
2019-10-30 16:51:00 +11:00
Peter Wang
6c0c337568 Add string indexing predicates that indicate if the char was replaced.
library/string.m:
    Add index_next_repl, unsafe_index_next_repl, prev_index_repl,
    unsafe_prev_index_repl predicates. These are internal for now,
    so we can try them out in the string module without committing
    to the interface.
2019-10-30 16:51:00 +11:00
Peter Wang
7da7c103df Improve definition of string.index, index_next, prev_index.
library/string.m:
    Fix definition of index/3 and index_next/4 to account for an offset
    into a non-initial code unit in a well-formed code unit sequence.

    Similarly for prev_index/4.
2019-10-30 16:51:00 +11:00
Zoltan Somogyi
09a95acca2 Add forgotten file.
2019-10-30 15:28:51 +11:00
Zoltan Somogyi
c76a9c70d7 Move io.format to its own section.
2019-10-30 12:51:06 +11:00
Zoltan Somogyi
c7bc31f2d2 Rename convert_interface.m to convert_parse_tree.m.
compiler/convert_interface.m:
    As above. I am about to add code to convert optimization files as well.

compiler/parse_tree.m:
    Include the module under its new name.

compiler/notes/compiler_design.html:
    Document the module under its new name.

compiler/comp_unit_interface.m:
compiler/intermod.m:
compiler/parse_module.m:
    Import the module under its new name.
2019-10-30 12:10:16 +11:00
Zoltan Somogyi
576f099176 Fix comment.
2019-10-30 12:10:16 +11:00
Peter Wang
9bee18553c Correct documentation for string.from_char_list.
2019-10-30 12:02:42 +11:00