mercury

mirror of https://github.com/Mercury-Language/mercury.git synced 2026-04-29 00:04:55 +00:00

Author	SHA1	Message	Date
Peter Wang	3621cfa650	Delete deprecated substring predicates and functions. library/string.m: Delete long-deprecated substring/3 function and substring/4 predicate. The newly introduced `string_piece' type has a substring/3 data constructor which takes (start, end) offsets into the base string, whereas the function and predicate take (start, count) arguments. To reduce potential confusion, delete the deprecated function and predicate. Delete other deprecated substring predicates and functions as well. tests/general/Mercury.options: tests/general/string_foldl_substring.exp: tests/general/string_foldl_substring.m: tests/general/string_foldr_substring.exp: tests/general/string_foldr_substring.m: tests/hard_coded/Mercury.options: tests/hard_coded/string_substring.m: Delete tests for deprecated predicates. tests/tabling/mercury_java_parser_dead_proc_elim_bug.m: tests/tabling/mercury_java_parser_dead_proc_elim_bug2.m: tests/valid/mercury_java_parser_follow_code_bug.m: Replace calls to unsafe_substring with unsafe_between. NEWS: Announce the changes.	2019-11-08 14:25:23 +11:00
Peter Wang	96b2caf536	Add string.unsafe_append_string_pieces. library/string.m: Add unsafe_append_string_pieces/2 predicate. NEWS: Announce addition.	2019-11-08 14:23:06 +11:00
Peter Wang	f2e0998651	Add string.append_string_pieces. library/string.m: Add append_string_pieces/2 predicate. library/io.m: Add a comment about a potential future change. tests/hard_coded/Mmakefile: tests/hard_coded/string_append_pieces.exp: tests/hard_coded/string_append_pieces.m: Add test case. NEWS: Announce addition.	2019-11-08 14:23:06 +11:00
Peter Wang	d2c3ede17d	Make string.replace_all with empty pattern preserve ill-formed sequences. library/string.m: Define behaviour of string.replace_all on ill-formed code unit sequences when the pattern is empty. Implement that behaviour. Use better variable names in documentation of string.replace and string.replace_all. tests/general/string_replace.exp: tests/general/string_replace.exp2: tests/general/string_replace.m: Extend test case. Update code style.	2019-11-08 13:57:38 +11:00
Peter Wang	0a1f289b6d	Make generic versions of string.to_upper/lower preserve ill-formed sequences. library/string.m: Make generic implementations of string.to_upper and string.to_lower preserve ill-formed sequences. (The foreign language implementations already did so.)	2019-11-06 13:43:54 +11:00
Peter Wang	031b6d915d	Document that string.count_utf8_code_units throws exceptions. library/string.m: Document that count_utf8_code_units throws an exception if the string contains an unpaired surrogate code point. Make the exception message thrown more useful to callers. Delete unnecessary foreign_procs.	2019-11-06 13:43:54 +11:00
Peter Wang	2e5f6ddef9	Make string.to_utf16_code_unit_list throw exception for ill-formed UTF-8. library/string.m: As above.	2019-11-06 13:43:54 +11:00
Peter Wang	67234fc898	Document that string.to_utf8_code_unit_list throws exceptions. library/string.m: Document that string.to_utf8_code_unit_list throws an exception if the string contains an unpaired surrogate code point.	2019-11-06 13:43:54 +11:00
Peter Wang	1e85dcb99e	Add string.from_code_unit_list_allow_ill_formed. library/string.m: Add string.from_code_unit_list_allow_ill_formed/2. tests/hard_coded/string_from_code_unit_list.exp: tests/hard_coded/string_from_code_unit_list.exp2: tests/hard_coded/string_from_code_unit_list.m: Extend test case. NEWS: Announce addition.	2019-11-06 13:43:54 +11:00
Peter Wang	adbf4c51c8	Tighten up string.from_code_unit_list et al. library/string.m: Document that from_code_unit_list fails if the result string would contain a null character, and enforce that in the Java and C# implementations. It was already enforced in the C implementation. Make from_code_unit_list fail if the code unit list contains an invalid value (negative or >0xff or >0xffff). Document that from_utf{8,16}_code_unit_list fails if the result string would contain a null character. Make from_utf8_code_unit_list call semidet_from_rev_char_list rather than from_rev_char_list so that it fails as documented instead of throwing an exception if the code unit list correctly encodes a list of code points, but the code points cannot be encoded into a string. Similarly for from_utf16_code_unit_list. tests/hard_coded/Mmakefile: tests/hard_coded/string_from_code_unit_list.exp: tests/hard_coded/string_from_code_unit_list.exp2: tests/hard_coded/string_from_code_unit_list.m: Add test case.	2019-11-06 13:43:54 +11:00
Peter Wang	0c6778c89f	Simplify Erlang implementation of sub_string_search_start. library/string.m: As above. (Not that simple in the end.)	2019-10-31 17:20:09 +11:00
Peter Wang	c4fcbdaea3	Make generic version of string.sub_string_search_start more efficient. library/string.m: Use unsafe_compare_substrings in generic version of sub_string_search_start.	2019-10-31 17:20:09 +11:00
Peter Wang	91868fe7ef	Define string.sub_string_search_start for out-of-range starting offset. library/string.m: Define sub_string_search_start to fail if the BeginAt parameter is negative or past the end of the string to search. The original C implementation did not check for an out-of-range starting offset, and could crash the program. The C implementation was later amended to fail instead, but not other implementations. Check for negative starting offset in non-C implementations of sub_string_search_start. tests/hard_coded/string_sub_string_search.m: Extend test case.	2019-10-31 17:20:09 +11:00
Peter Wang	30d0933f59	Fix C# version of string.sub_string_search to be culture-insensitive. library/string.m: Make C# implementation of sub_string_search perform ordinal (Unicode code point) based string search, instead of a culture-sensitive search.	2019-10-31 15:56:24 +11:00
Peter Wang	d40ab1ab44	Slightly improve string stripping functions. library/string.m: Use unsafe_between for chomp, lstrip_pred, rstrip_pred to avoid range checks.	2019-10-30 16:51:00 +11:00
Peter Wang	09512195fc	Make string.split_at_separator skip ill-formed sequences in UTF-8 strings. library/string.m: Make split_at_separator never consider ill-formed sequences in UTF-8 strings as potential separators, as they cannot contain any code points that could satisfy any given DelimP predicate on code points. Previously, split_at_separator would call DelimP(U+FFFD) for every code unit in an ill-formed sequence.	2019-10-30 16:51:00 +11:00
Peter Wang	1b91cf375c	Make string.words_separator skip ill-formed sequences in UTF-8. library/string.m: Make words_separator never consider ill-formed sequences in UTF-8 strings as potential separators, as they cannot contain any code points that could satisfy any given SepP predicate on code points. Previously, words_separator would call SepP(U+FFFD) for every code unit in an ill-formed sequence.	2019-10-30 16:51:00 +11:00
Peter Wang	de2af8cdd7	Make string.all_match fail on UTF-8 string containing ill-formed sequence. library/string.m: Make all_match(Pred, String) always fail if the string contains an ill-formed code unit sequence, and strings use UTF-8 encoding. Such sequences do not contain any code points that could satisfy a test on code points. Previously, all_match would call Pred(U+FFFD) for every code unit in an ill-formed sequence. Define all_match to rule out an interpretation that could ignore ill-formed sequences.	2019-10-30 16:51:00 +11:00
Peter Wang	817cf44efd	Make string.prefix_length/suffix_length stop at ill-formed sequence. library/string.m: Make prefix_length and suffix_length stop at an ill-formed sequence in UTF-8 strings. Such a sequence does not contain any code point that could satisfy a test on code points. Previously, prefix_length and suffix_length would call Pred(U+FFFD) for every code unit in an ill-formed sequence. Tweak documentation. Delete obsolete comments.	2019-10-30 16:51:00 +11:00
Peter Wang	265ffa15f0	Fix two bugs in string.contains_char. library/string.m: Fix C implementation of contains_char to fail when asked to test for a surrogate code point in a string. It previously would (always) succeed, which is a bug. Fix generic implementation so that contains_char(String, '\uFFFD') will not succeed just because String contains an ill-formed sequence (in UTF-8 grades). Delete obsolete comment.	2019-10-30 16:51:00 +11:00
Peter Wang	6c0c337568	Add string indexing predicates that indicate if the char was replaced. library/string.m: Add index_next_repl, unsafe_index_next_repl, prev_index_repl, unsafe_prev_index_repl predicates. These are internal for now, so we can try them out in the string module without committing to the interface.	2019-10-30 16:51:00 +11:00
Peter Wang	7da7c103df	Improve definition of string.index, index_next, prev_index. library/string.m: Fix definition of index/3 and index_next/4 to account for an offset into a non-initial code unit in a well-formed code unit sequence. Similarly for prev_index/4.	2019-10-30 16:51:00 +11:00
Peter Wang	9bee18553c	Correct documentation for string.from_char_list.	2019-10-30 12:02:42 +11:00
Peter Wang	831003f042	Delete outdated todo.	2019-10-30 11:21:02 +11:00
Peter Wang	658c8a5ad5	Define behaviour of string.char_to_string on edge cases. library/string.m: Define behaviour of char_to_string when the string is not well-formed or if the char is a surrogate code point. Implement char_to_string/2 using multiple clauses as the described behaviour doesn't match to_char_list/2. tests/hard_coded/Mmakefile: tests/hard_coded/char_to_string.exp: tests/hard_coded/char_to_string.exp2: tests/hard_coded/char_to_string.m: Add test case.	2019-10-30 11:21:02 +11:00
Peter Wang	56687d235e	Define behaviour of string.first_char/3 on edge cases. library/string.m: Define first_char/3 to fail if the input string begins with an ill-formed code unit sequence. Define the reverse mode to throw an exception on an attempt to encode a null character or surrogate code point in the output string. Reimplement first_char/3 in Mercury. hard_coded/Mmakefile: hard_coded/string_first_char_ilseq.exp: hard_coded/string_first_char_ilseq.m: Add test case.	2019-10-30 11:21:02 +11:00
Peter Wang	025bee0549	Check for surrogates when converting list of char to string. library/string.m: Make from_char_list, from_rev_char_list, to_char_list throw an exception if the list of chars includes a surrogate code point that cannot be encoded in a UTF-8 string. Make semidet_from_char_list, semidet_from_rev_char_list, to_char_list fail if the list of chars includes a surrogate code point that cannot be encoded in a UTF-8 string. runtime/mercury_string.h: Document return value of MR_utf8_width. tests/hard_coded/Mmakefile: tests/hard_coded/string_from_char_list_ilseq.exp: tests/hard_coded/string_from_char_list_ilseq.exp2: tests/hard_coded/string_from_char_list_ilseq.m: Add test case. tests/hard_coded/null_char.exp: Expect new message in exceptions thrown by from_char_list, from_rev_char_list. tests/hard_coded/string_hash.m: Don't generate surrogate code points in random strings.	2019-10-30 11:21:02 +11:00
Peter Wang	527fae384e	Make compare_ignore_case_ascii loop over code units. library/string.m: Make compare_ignore_case_ascii loop over code units instead of code points, allowing it to work on strings that contain ill-formed code unit sequences.	2019-10-29 11:16:23 +11:00
Peter Wang	0ed3599f26	Deprecate multi modes of string.prefix and string.suffix. The two modes of string.prefix and string.suffix are not equivalent in the presence of ill-formed code unit sequences. The solution is to deprecate the lesser used mode of each. library/string.m: As above. Delete outdated comments. NEWS: Announce the changes.	2019-10-25 15:10:45 +11:00
Peter Wang	cdddf3a047	Simplify string.prefix and string.suffix implementations. library/string.m: Implement prefix(in, in) and suffix(in, in) using compare_substrings.	2019-10-25 15:10:45 +11:00
Peter Wang	a12ee1907e	Add string.append(uo, in, in) mode. library/string.m: Add string.append(uo, in, in) mode. The comment about it being multi instead of semidet was written back when string.append was implemented in terms of list.append. Implement remove_suffix using the new procedure (more efficient). Implement remove_suffix_if_present using remove_suffix (more efficient). Add comments about the argument orders of remove_suffix, det_remove_suffix, remove_suffix_if_present.	2019-10-25 15:10:45 +11:00
Peter Wang	a12663ea76	Deprecate string.append(out, out, in) mode. Mark pointed out that the string.append(out, out, in) mode does not match the forward modes. The simplest solution is to deprecate and eventually remove it. library/string.m: Deprecate string.append(out, out, in) mode. Add string.nondet_append/3 as its replacement. Add more documentation. NEWS: Announce changes.	2019-10-24 12:58:28 +11:00
Peter Wang	cd899271c6	Make string.append(out, out, in) work with ill-formed sequences. library/string.m: Simplify string.append(out, out, in) and make it work sensibly in the presence of ill-formed code unit sequences, breaking the input string after each code point or code unit in an ill-formed sequence. tests/hard_coded/Mmakefile: tests/hard_coded/string_append_ooi_ilseq.exp: tests/hard_coded/string_append_ooi_ilseq.exp2: tests/hard_coded/string_append_ooi_ilseq.m: Add test case.	2019-10-24 12:31:29 +11:00
Peter Wang	30f287951c	Simplify string.append(in, in, in) implementation. library/string.m: Replace foreign code implementations with Mercury code.	2019-10-24 12:31:29 +11:00
Peter Wang	93bf252632	Simplify string.append(in, out, in) implementation. library/string.m: Replace foreign code implementations with Mercury code.	2019-10-24 12:31:29 +11:00
Peter Wang	3c68d3d8f2	Implement string.unsafe_compare_substrings with foreign code. library/string.m: Add C and C# native implementations of unsafe_compare_substrings.	2019-10-24 12:31:29 +11:00
Peter Wang	bf1f624632	Add string.compare_substrings and unsafe_compare_substrings. library/string.m: Add the new predicates. tests/hard_coded/Mmakefile: tests/hard_coded/string_compare_substrings.exp: tests/hard_coded/string_compare_substrings.m: Add test case. NEWS: Announce additions.	2019-10-24 12:31:29 +11:00
Peter Wang	e28ff5bbe7	Define string.between_codepoints more precisely and fix bug. library/string.m: Define string.between_codepoints in terms of codepoint_offset. Fix behaviour in the case where Start < 0, End < 0, End > Start. tests/hard_coded/string_codepoint.exp: tests/hard_coded/string_codepoint.exp2: tests/hard_coded/string_codepoint.m: Extend test case.	2019-10-24 12:31:29 +11:00
Peter Wang	8143b07257	Simplify string.between implementation. library/string.m: Replace foreign code implementations with Mercury code.	2019-10-24 12:31:29 +11:00
Peter Wang	5e52d45cc4	Make string.left and string.right not create unused substrings. library/string.m: Implement string.left and string.right using string.between instead of string.split so as not to create unused substrings.	2019-10-24 12:31:29 +11:00
Peter Wang	b18a47c32f	Simplify string.split implementation. library/string.m: Replace foreign code implementations with Mercury code.	2019-10-24 12:31:29 +11:00
Peter Wang	778bff560d	Deprecate modes of string predicates that imply round-trippability. Mark pointed out that to_char_list/2 having multiple modes implies the ability to round trip convert between a string and list of chars, which is not true if to_char_list replaces code units in ill-formed sequences with U+FFFD; converting the list of chars back to a string may produce a different string from the original input. library/string.m: Deprecate reverse modes of to_char_list/2, to_rev_char_list/2 and from_char_list/2. Add commented out `obsolete_proc' pragmas to be enabled at a later date. Delete the unused Mercury implementation of string.append/3 that depends on multi-moded to_char_list/2. The implementation is incorrect anyway in the presence of ill-formed code unit sequences. Add comment about a future change to char_to_string. NEWS: Announce changes.	2019-10-24 09:24:50 +11:00
Peter Wang	7350e7f0b6	Define behaviour of string.codepoint_offset on ill-formed sequences. library/string.m: Define how string.codepoint_offset counts code units in ill-formed sequences. Delete C and C# foreign implementations in favour of the Mercury implementation that has the intended behaviour. (The Java implementation uses String.offsetByCodePoints which also matches our intended behaviour.) tests/hard_coded/Mmakefile: tests/hard_coded/string_codepoint_offset_ilseq.exp2: tests/hard_coded/string_codepoint_offset_ilseq.m: Add test case.	2019-10-24 09:22:13 +11:00
Peter Wang	edfbeb1d9a	Define behaviour of string.foldl etc on ill-formed sequences. library/string.m: As above. tests/hard_coded/Mmakefile: tests/hard_coded/string_fold_ilseq.exp: tests/hard_coded/string_fold_ilseq.exp2: tests/hard_coded/string_fold_ilseq.m: Add test case.	2019-10-24 09:22:13 +11:00
Peter Wang	250b5bcc2e	Define behaviour of string.count_codepoints with ill-formed sequences. library/string.m: Make each code unit in an ill-formed sequence contribute one to the value of string.count_codepoints. tests/hard_coded/Mmakefile: tests/hard_coded/string_count_codepoints_ilseq.exp: tests/hard_coded/string_count_codepoints_ilseq.exp2: tests/hard_coded/string_count_codepoints_ilseq.m: Add test case.	2019-10-24 09:14:46 +11:00
Peter Wang	9b25e167e1	Define behaviour of string.to_char_list (and rev) on ill-formed sequences. library/string.m: Define string.to_char_list and string.to_rev_char_list to either replace code units in ill-formed sequences with U+FFFD or return unpaired surrogate code points. Use Mercury version of do_to_char_list instead of updating the foreign language implementations. tests/hard_coded/Mmakefile: tests/hard_coded/string_char_list_ilseq.exp: tests/hard_coded/string_char_list_ilseq.exp2: tests/hard_coded/string_char_list_ilseq.m: Add test case.	2019-10-24 09:14:46 +11:00
Peter Wang	0c9bdf2587	Define behaviour of string.prev_index on ill-formed sequences. library/string.m: Make string.prev_index and string.unsafe_prev_index return either U+FFFD or an unpaired surrogate code point when an ill-formed code unit sequence is detected. tests/hard_coded/Mmakefile: tests/hard_coded/string_prev_index_ilseq.exp: tests/hard_coded/string_prev_index_ilseq.exp2: tests/hard_coded/string_prev_index_ilseq.m: Add test case.	2019-10-24 09:14:46 +11:00
Peter Wang	d055627fd2	Define behaviour of string.index_next on ill-formed sequences. library/string.m: Make string.index_next and string.unsafe_index_next return either U+FFFD or an unpaired surrogate code point when an ill-formed code unit sequence is detected. tests/hard_coded/Mmakefile: tests/hard_coded/string_index_next_ilseq.exp: tests/hard_coded/string_index_next_ilseq.exp2: tests/hard_coded/string_index_next_ilseq.m: Add test case.	2019-10-24 09:14:46 +11:00
Peter Wang	47d0f70ea4	Define behaviour of string.index on ill-formed sequences. library/string.m: Make string.index/3 and string.unsafe_index/3 return either U+FFFD or an unpaired surrogate code point when an ill-formed code unit sequence is detected. tests/hard_coded/Mmakefile: tests/hard_coded/string_index_ilseq.exp: tests/hard_coded/string_index_ilseq.exp2: tests/hard_coded/string_index_ilseq.m: Add test case.	2019-10-24 09:14:46 +11:00
Peter Wang	1a619af68e	Add more TODOs relating to ill-formed code unit sequences.	2019-09-13 15:51:02 +10:00
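
Several of the commits above (for example 250b5bcc2e) specify that, in UTF-8 strings, each code unit of an ill-formed sequence counts as one toward string.count_codepoints. The following is a minimal Python sketch of that counting rule, not Mercury's implementation: it assumes UTF-8 encoded input and uses Python's strict UTF-8 decoder as the definition of a well-formed sequence (it rejects overlong forms, encoded surrogates and values above U+10FFFF). The function name count_codepoints_model is hypothetical. It does not model the UTF-16 backends, where unpaired surrogates are returned as code points instead.

```python
def count_codepoints_model(data: bytes) -> int:
    """Hypothetical model of the count_codepoints rule for UTF-8 strings:
    a well-formed sequence counts as one code point, and each code unit
    (byte) of an ill-formed sequence also counts as one."""
    count = 0
    i = 0
    while i < len(data):
        step = 1  # default: data[i] is a single ill-formed code unit
        # Try to decode exactly one well-formed sequence of 1..4 bytes at i.
        for n in range(1, 5):
            try:
                # Strict decoding rejects overlongs, surrogates, truncation.
                data[i:i + n].decode("utf-8")
            except UnicodeDecodeError:
                continue
            step = n
            break
        count += 1
        i += step
    return count
```

For example, the truncated sequence b"\xe2\x82" followed by "A" counts as three, and the encoded surrogate b"\xed\xa0\x80" also counts as three, one per code unit.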


494 Commits