mercury

mirror of https://github.com/Mercury-Language/mercury.git synced 2026-04-18 02:43:40 +00:00

Author	SHA1	Message	Date
Zoltan Somogyi	040d6717a6	Fix comments.	2021-01-26 23:22:54 +11:00
Zoltan Somogyi	9c248726a6	Add uint{64,}_to_lc_hex_string. library/string.m: We long had uint_to_hex_string and uint_to_uc_hex_string. Add uint_to_lc_hex_string as well, and make uint_to_hex_string call it. This way, users don't have to remember which of the upper and lower case versions is defined, and which is missing. Do the same for the 64 bit version. NEWS: Announce the new functions. library/string.format.m: Call the new functions.	2021-01-22 17:18:57 +11:00
Julien Fischer	52b31f5089	Add uint64 to string conversion for bases 8 and 16. library/string.m: Add functions for converting uint64s to strings of base 8 or base 16 digits. For most integer types we can cast to a uint and then use the uint versions of these operations but for 64-bit types we cannot since on some of our supported platforms uints are 32-bit. NEWS: Announce the additions. tests/hard_coded/Mmakefile: tests/hard_coded/uint64_string_conv.{m,exp}: Add a test of the new functions.	2020-12-15 22:45:31 +11:00
Julien Fischer	f8e65add3a	Format uints directly. Currently, the Mercury implementation of string formatting handles uints by casting them to ints and then using the code for formatting signed integers as unsigned values. Add an implementation that works directly on uints and make the code that formats signed integers as unsigned integers use that instead. The new implementation is simpler and avoids unnecessary conversions to arbitrary precision integers. Add new functions for converting uint values directly to octal and hexadecimal strings that use functionality provided by the underlying platforms; replace the Mercury code that previously did that with calls to these new functions. library/string.m: Add the functions uint_to_hex_string/1, uint_to_uc_hex_string/1 and uint_to_octal_string/1. library/string.format.m: Make format_uint/6 operate directly on uints instead of casting the value to a signed int and calling format_unsigned_int/6. Make format_unsigned_int/6 cast the int value to a uint and then call format_uint/6. Delete predicates and functions used to convert ints to octal and hexadecimal strings. We now just use the functions exported by the string module. NEWS: Announce the additions to the string module. tests/hard_coded/Mmakefile: tests/hard_coded/uint_string_conv.{m,exp*}: Add a test of uint string conversion.	2020-11-20 23:07:52 +11:00
Julien Fischer	8f35be65f5	Delete default Mercury clauses previously used for the Erlang backend. library/string.m: As above.	2020-11-14 14:39:08 +11:00
Zoltan Somogyi	d4861d739d	Allow formatting of sized integers. library/string.m: Add {i,u}{8,16,32,64} as function symbols in the poly_type type, each with a single argument containing an integer with the named signedness and size. The idea is that each of these poly_type values works exactly the same way as the i(_) poly_type (if signed) or the u(_) poly_type (if unsigned), with the exception that the value specified by the call is cast to int or uint before being processed. library/string.parse_runtime.m: Parse the new kinds of poly_types. Change the representation of the result of the parsing to allow recording of the sizes of ints and uints. Put the code that does the parsing into a predicate of its own. library/string.format.m: Do a cast to int or uint if the size information recorded in the specification of a signed or unsigned integer value calls for it. Provide functions to do the casting that do not require the import of {int,uint}{8,16,32,64}.m. This is to allow the compiler to generate calls to do such casts without having to implicitly import those modules. Abort if a 64 bit number is being cast to a 32 bit word. compiler/parse_string_format.m: Make the same changes as in string.parse_runtime.m, mutatis mutandis. compiler/format_call.m: Handle the new kinds of poly_types by adding a cast to int or uint if necessary, using the predicates added to library/string.format.m. Use a convenience function to make code creating instmap deltas more readable. library/io.m: library/pprint.m: library/string.parse_util.m: tests/invalid/string_format_bad.m: tests/invalid/string_format_unknown.m: Conform to the changes above. tests/string_format/string_format_d.m: tests/string_format/string_format_u.m: Test the printing of some of the new poly_types. tests/string_format/string_format_d.exp2: tests/string_format/string_format_u.exp2: Update the expected output of these tests on 64-bit platforms. tests/string_format/string_format_lib.m: Update programming style.	2020-11-10 11:00:47 +11:00
Peter Wang	0d3fcbaae3	Delete Erlang code from library/mdbcomp/browser directories. library/*.m: Delete Erlang foreign code and foreign types. Delete documentation specific to Erlang targets. library/deconstruct.m: Add pragma no_determinism_warning to allow functor_number_cc/3 to compile for now. library/Mercury.options: Delete workaround only needed when targeting Erlang. browser/listing.m: mdbcomp/rtti_access.m: Delete Erlang foreign code and foreign types.	2020-10-28 14:10:56 +11:00
Zoltan Somogyi	a36eed702d	Add add_suffix to the standard library. compiler/write_deps_file.m: library/string.m: Move a generally-useful function to the library. NEWS: Announce the addition.	2020-10-19 15:52:47 +11:00
Julien Fischer	9528f326d2	Formatting of uints using string.format etc. Extend the operations that perform formatted conversion, such as string.format/2, to be able to handle values of type uint directly. We have always supported formatting values of type int as unsigned values, but currently the only way to format uint values is by explicitly casting them to an int. This addresses Mantis issue #502. library/string.m: Add a new alternative to the poly_type/0 type that wraps uint values. Update the documentation for string.format. uint values may now be formatted using the u, x, X, o or p conversion specifiers. library/string.format.m: Add the necessary machinery for handling formatting of uint values. library/string.parse_runtime.m: library/string.parse_util.m: Handle uint poly_types. library/io.m: Handle uint values in the write_many predicates. library/pprint.m: Handle uint values in the poly/1 function. compiler/format_call.m: compiler/parse_string_format.m: Conform to the above changes. compiler/options.m: Add a way to detect if a compiler supports this change. NEWS: Announce the above changes. tests/hard_coded/stream_format.{m,exp}: Extend this test to cover uints. tests/invalid/string_format_bad.m: tests/invalid/string_format_unknown.m: Conform to the above changes. tests/string_format/Mmakefile: tests/string_format/string_format_uint_o.{m,exp,exp2}: tests/string_format/string_format_uint_u.{m,exp,exp2}: tests/string_format/string_format_uint_x.{m,exp,exp2}: Add tests of string.format with uints.	2020-05-23 14:01:01 +10:00
Zoltan Somogyi	a6228a9e1a	Fix too-long lines.	2020-04-10 03:22:40 +10:00
Zoltan Somogyi	a2bdcece54	Improve English in some comments.	2020-04-07 22:24:00 +10:00
Peter Wang	ff0c363ea4	Define int to string conversions more precisely. library/string.m: As above.	2020-01-21 16:19:27 +11:00
Peter Wang	7d52b9f593	Announce recent changes to string type and string module. NEWS: Announce changes regarding ill-formed code unit sequences in strings. library/string.m: Delete a note about ongoing work.	2019-11-19 14:23:15 +11:00
Peter Wang	78da14c581	Add string indexing predicates that indicate a code unit was replaced. library/string.m: Add index_next_repl, unsafe_index_next_repl, prev_index_repl, unsafe_prev_index_repl predicates that return an indication if a replacement character was returned because an ill-formed code unit sequence was encountered. Add more pragma inlines for indexing predicates. Remove may_not_duplicate attribute on the Erlang version of unsafe_prev_index_repl, which would conflict with the pragma inline declaration. This requires the helper function do_unsafe_prev_index to be exported. tests/hard_coded/string_append_ooi_ilseq.m: tests/hard_coded/string_set_char_ilseq.m: Use index_next_repl in test cases. NEWS: Announce additions.	2019-11-19 14:23:15 +11:00
Peter Wang	9a042f4fb1	Minor documentation changes. library/string.m: Add missing word. Just write "code points" instead of "character" followed by clarification in a few spots. Delete _underscores_ which aren't particularly helpful.	2019-11-14 15:45:40 +11:00
Peter Wang	7ef407e937	Enable pragma obsolete_proc declarations. library/string.m: Enable pragma obsolete_proc declarations since we now require a recent enough compiler version.	2019-11-14 11:28:25 +11:00
Peter Wang	5c3b392ed0	Implement string.(un)capitalize_first more efficiently. library/string.m: Avoid creating temporary string in capitalize_first and uncapitalize_first.	2019-11-12 17:16:50 +11:00
Peter Wang	f71b5f20ed	Define behaviour of string.set_char etc on ill-formed sequences. library/string.m: Define behaviour of set_char, det_set_char and unsafe_set_char on ill-formed sequences. Also define them to throw an exception on an attempt to set a null character or surrogate code point in a UTF-8 string. Delete claim that unsafe_set_char is constant time. That would only be true for the destructive mode of unsafe_set_char, and that mode has been disabled for a long time. Implement the defined behaviour for C and C# versions of unsafe_set_char. The Java version already behaved as defined. Use unsafe_set_char to implement set_char instead of duplicating foreign code. Replace a couple of uses of strcpy with MR_memcpy as it was convenient to do so. (On OpenBSD, the linker issues a warning whenever strcpy is used. Avoiding the warning is not high priority but we might still like to eliminate all uses of strcpy eventually.) tests/hard_coded/Mmakefile: tests/hard_coded/string_set_char_ilseq.exp: tests/hard_coded/string_set_char_ilseq.exp2: tests/hard_coded/string_set_char_ilseq.m: Add test case.	2019-11-12 17:16:34 +11:00
Peter Wang	ae2dda693e	Avoid range checks in string.split_at_separator. library/string.m: Avoid unnecessary range checks in split_at_separator.	2019-11-08 14:25:23 +11:00
Peter Wang	b68548d4dc	Avoid garbage in Mercury versions of string.append_list/join_list. library/string.m: Use unsafe_append_string_pieces in Mercury implementations of append_list and join_list. This has no practical effect as we have foreign code implementations of both, for all target languages.	2019-11-08 14:25:23 +11:00
Peter Wang	68ae33c426	Avoid intermediate strings in string.replace_all. library/string.m: Implement string.replace_all using unsafe_append_string_pieces to avoid intermediate strings. Use unsafe_sub_string_search_start to avoid repeated range checks.	2019-11-08 14:25:23 +11:00
Peter Wang	3daee4fc23	Avoid intermediate strings in string.replace. library/string.m: Implement string.replace using unsafe_append_string_pieces.	2019-11-08 14:25:23 +11:00
Peter Wang	7eb78c66d1	Add string.unsafe_sub_string_search_start. library/string.m: Add unsafe_sub_string_search_start/4. NEWS: Announce addition.	2019-11-08 14:25:23 +11:00
Peter Wang	3621cfa650	Delete deprecated substring predicates and functions. library/string.m: Delete long-deprecated substring/3 function and substring/4 predicate. The newly introduced `string_piece' type has a substring/3 data constructor which takes (start, end) offsets into the base string, whereas the function and predicate take (start, count) arguments. To reduce potential confusion, delete the deprecated function and predicate. Delete other deprecated substring predicates and functions as well. tests/general/Mercury.options: tests/general/string_foldl_substring.exp: tests/general/string_foldl_substring.m: tests/general/string_foldr_substring.exp: tests/general/string_foldr_substring.m: tests/hard_coded/Mercury.options: tests/hard_coded/string_substring.m: Delete tests for deprecated predicates. tests/tabling/mercury_java_parser_dead_proc_elim_bug.m: tests/tabling/mercury_java_parser_dead_proc_elim_bug2.m: tests/valid/mercury_java_parser_follow_code_bug.m: Replace calls to unsafe_substring with unsafe_between. NEWS: Announce the changes.	2019-11-08 14:25:23 +11:00
Peter Wang	96b2caf536	Add string.unsafe_append_string_pieces. library/string.m: Add unsafe_append_string_pieces/2 predicate. NEWS: Announce addition.	2019-11-08 14:23:06 +11:00
Peter Wang	f2e0998651	Add string.append_string_pieces. library/string.m: Add append_string_pieces/2 predicate. library/io.m: Add a comment about a potential future change. tests/hard_coded/Mmakefile: tests/hard_coded/string_append_pieces.exp: tests/hard_coded/string_append_pieces.m: Add test case. NEWS: Announce addition.	2019-11-08 14:23:06 +11:00
Peter Wang	d2c3ede17d	Make string.replace_all with empty pattern preserve ill-formed sequences. library/string.m: Define behaviour of string.replace_all on ill-formed code unit sequences when the pattern is empty. Implement that behaviour. Use better variable names in documentation of string.replace and string.replace_all. tests/general/string_replace.exp: tests/general/string_replace.exp2: tests/general/string_replace.m: Extend test case. Update code style.	2019-11-08 13:57:38 +11:00
Peter Wang	0a1f289b6d	Make generic versions of string.to_upper/lower preserve ill-formed sequences. library/string.m: Make generic implementations of string.to_upper and string.to_lower preserve ill-formed sequences. (The foreign language implementations already did so.)	2019-11-06 13:43:54 +11:00
Peter Wang	031b6d915d	Document that string.count_utf8_code_units throws exceptions. library/string.m: Document that count_utf8_code_units throws an exception if the string contains an unpaired surrogate code point. Make the exception message thrown more useful to callers. Delete unnecessary foreign_procs.	2019-11-06 13:43:54 +11:00
Peter Wang	2e5f6ddef9	Make string.to_utf16_code_unit_list throw exception for ill-formed UTF-8. library/string.m: As above.	2019-11-06 13:43:54 +11:00
Peter Wang	67234fc898	Document that string.to_utf8_code_unit_list throws exceptions. library/string.m: Document that string.to_utf8_code_unit_list throws an exception if the string contains an unpaired surrogate code point.	2019-11-06 13:43:54 +11:00
Peter Wang	1e85dcb99e	Add string.from_code_unit_list_allow_ill_formed. library/string.m: Add string.from_code_unit_list_allow_ill_formed/2. tests/hard_coded/string_from_code_unit_list.exp: tests/hard_coded/string_from_code_unit_list.exp2: tests/hard_coded/string_from_code_unit_list.m: Extend test case. NEWS: Announce addition.	2019-11-06 13:43:54 +11:00
Peter Wang	adbf4c51c8	Tighten up string.from_code_unit_list et al. library/string.m: Document that from_code_unit_list fails if the result string would contain a null character, and enforce that in the Java and C# implementations. It was already enforced in the C implementation. Make from_code_unit_list fail if the code unit list contains an invalid value (negative or >0xff or >0xffff). Document that from_utf{8,16}_code_unit_list fails if the result string would contain a null character. Make from_utf8_code_unit_list call semidet_from_rev_char_list rather than from_rev_char_list so that it fails as documented instead of throwing an exception if the code unit list correctly encodes a list of code points, but the code points cannot be encoded into a string. Similarly for from_utf16_code_unit_list. tests/hard_coded/Mmakefile: tests/hard_coded/string_from_code_unit_list.exp: tests/hard_coded/string_from_code_unit_list.exp2: tests/hard_coded/string_from_code_unit_list.m: Add test case.	2019-11-06 13:43:54 +11:00
Peter Wang	0c6778c89f	Simplify Erlang implementation of sub_string_search_start. library/string.m: As above. (Not that simple in the end.)	2019-10-31 17:20:09 +11:00
Peter Wang	c4fcbdaea3	Make generic version of string.sub_string_search_start more efficient. library/string.m: Use unsafe_compare_substrings in generic version of sub_string_search_start.	2019-10-31 17:20:09 +11:00
Peter Wang	91868fe7ef	Define string.sub_string_search_start for out-of-range starting offset. library/string.m: Define sub_string_search_start to fail if the BeginAt parameter is negative or past the end of the string to search. The original C implementation did not check for an out-of-range starting offset, and could crash the program. The C implementation was later amended to fail instead, but not other implementations. Check for negative starting offset in non-C implementations of sub_string_search_start. tests/hard_coded/string_sub_string_search.m: Extend test case.	2019-10-31 17:20:09 +11:00
Peter Wang	30d0933f59	Fix C# version of string.sub_string_search to be culture-insensitive. library/string.m: Make C# implementation of sub_string_search perform ordinal (Unicode code point) based string search, instead of a culture-sensitive search.	2019-10-31 15:56:24 +11:00
Peter Wang	d40ab1ab44	Slightly improve string stripping functions. library/string.m: Use unsafe_between for chomp, lstrip_pred, rstrip_pred to avoid range checks.	2019-10-30 16:51:00 +11:00
Peter Wang	09512195fc	Make string.split_at_separator skip ill-formed sequences in UTF-8 strings. library/string.m: Make split_at_separator never consider ill-formed sequences in UTF-8 strings as potential separators, as they cannot contain any code points that could satisfy any given DelimP predicate on code points. Previously, split_at_separator would call DelimP(U+FFFD) for every code unit in an ill-formed sequence.	2019-10-30 16:51:00 +11:00
Peter Wang	1b91cf375c	Make string.words_separator skip ill-formed sequences in UTF-8. library/string.m: Make words_separator never consider ill-formed sequences in UTF-8 strings as potential separators, as they cannot contain any code points that could satisfy any given SepP predicate on code points. Previously, words_separator would call SepP(U+FFFD) for every code unit in an ill-formed sequence.	2019-10-30 16:51:00 +11:00
Peter Wang	de2af8cdd7	Make string.all_match fail on UTF-8 string containing ill-formed sequence. library/string.m: Make all_match(Pred, String) always fail if the string contains an ill-formed code unit sequence, and strings use UTF-8 encoding. Such sequences do not contain any code points that could satisfy a test on code points. Previously, all_match would call Pred(U+FFFD) for every code unit in an ill-formed sequence. Define all_match to rule out an interpretation that could ignore ill-formed sequences.	2019-10-30 16:51:00 +11:00
Peter Wang	817cf44efd	Make string.prefix_length/suffix_length stop at ill-formed sequence. library/string.m: Make prefix_length and suffix_length stop at an ill-formed sequence in UTF-8 strings. Such a sequence does not contain any code point that could satisfy a test on code points. Previously, prefix_length and suffix_length would call Pred(U+FFFD) for every code unit in an ill-formed sequence. Tweak documentation. Delete obsolete comments.	2019-10-30 16:51:00 +11:00
Peter Wang	265ffa15f0	Fix two bugs in string.contains_char. library/string.m: Fix C implementation of contains_char to fail when asked to test for a surrogate code point in a string. It previously would (always) succeed, which is a bug. Fix generic implementation so that contains_char(String, '\uFFFD') will not succeed just because String contains an ill-formed sequence (in UTF-8 grades). Delete obsolete comment.	2019-10-30 16:51:00 +11:00
Peter Wang	6c0c337568	Add string indexing predicates that indicate if the char was replaced. library/string.m: Add index_next_repl, unsafe_index_next_repl, prev_index_repl, unsafe_prev_index_repl predicates. These are internal for now, so we can try them out in the string module without committing to the interface.	2019-10-30 16:51:00 +11:00
Peter Wang	7da7c103df	Improve definition of string.index, index_next, prev_index. library/string.m: Fix definition of index/3 and index_next/4 to account for an offset into a non-initial code unit in a well-formed code unit sequence. Similarly for prev_index/4.	2019-10-30 16:51:00 +11:00
Peter Wang	9bee18553c	Correct documentation for string.from_char_list.	2019-10-30 12:02:42 +11:00
Peter Wang	831003f042	Delete outdated todo.	2019-10-30 11:21:02 +11:00
Peter Wang	658c8a5ad5	Define behaviour of string.char_to_string on edge cases. library/string.m: Define behaviour of char_to_string when the string is not well-formed or if the char is a surrogate code point. Implement char_to_string/2 using multiple clauses as the described behaviour doesn't match to_char_list/2. tests/hard_coded/Mmakefile: tests/hard_coded/char_to_string.exp: tests/hard_coded/char_to_string.exp2: tests/hard_coded/char_to_string.m: Add test case.	2019-10-30 11:21:02 +11:00
Peter Wang	56687d235e	Define behaviour of string.first_char/3 on edge cases. library/string.m: Define first_char/3 to fail if the input string begins with an ill-formed code unit sequence. Define the reverse mode to throw an exception on an attempt to encode a null character or surrogate code point in the output string. Reimplement first_char/3 in Mercury. hard_coded/Mmakefile: hard_coded/string_first_char_ilseq.exp: hard_coded/string_first_char_ilseq.m: Add test case.	2019-10-30 11:21:02 +11:00
Peter Wang	025bee0549	Check for surrogates when converting list of char to string. library/string.m: Make from_char_list, from_rev_char_list, to_char_list throw an exception if the list of chars includes a surrogate code point that cannot be encoded in a UTF-8 string. Make semidet_from_char_list, semidet_from_rev_char_list, to_char_list fail if the list of chars includes a surrogate code point that cannot be encoded in a UTF-8 string. runtime/mercury_string.h: Document return value of MR_utf8_width. tests/hard_coded/Mmakefile: tests/hard_coded/string_from_char_list_ilseq.exp: tests/hard_coded/string_from_char_list_ilseq.exp2: tests/hard_coded/string_from_char_list_ilseq.m: Add test case. tests/hard_coded/null_char.exp: Expect new message in exceptions thrown by from_char_list, from_rev_char_list. tests/hard_coded/string_hash.m: Don't generate surrogate code points in random strings.	2019-10-30 11:21:02 +11:00


467 Commits