Delete the old random number generator; it has been deprecated since Mercury 20.01.
library/random.m:
Delete the old generator.
library/array.m:
Delete the predicate random_permutation/4.
NEWS:
Announce the above.
tests/hard_coded/string_hash.m:
Update this test to use the new RNG framework.
tests/hard_coded/Mmakefile:
tests/hard_coded/random_permutation.{m,exp}:
tests/hard_coded/random_simple.{m,exp}:
Delete these tests; they were specific to the old RNG.
extras/curs/samples/nibbles.m:
extras/solver_types/library/any_array.m:
Replace use of the old RNG.
library/string.m:
Make from_char_list, from_rev_char_list, to_char_list throw an
exception if the list of chars includes a surrogate code point that
cannot be encoded in a UTF-8 string.
Make semidet_from_char_list, semidet_from_rev_char_list,
to_char_list fail if the list of chars includes a surrogate code
point that cannot be encoded in a UTF-8 string.
runtime/mercury_string.h:
Document return value of MR_utf8_width.
tests/hard_coded/Mmakefile:
tests/hard_coded/string_from_char_list_ilseq.exp:
tests/hard_coded/string_from_char_list_ilseq.exp2:
tests/hard_coded/string_from_char_list_ilseq.m:
Add test case.
tests/hard_coded/null_char.exp:
Expect new message in exceptions thrown by from_char_list,
from_rev_char_list.
tests/hard_coded/string_hash.m:
Don't generate surrogate code points in random strings.
Estimated hours taken: 12
Branches: main
Further improvements in the implementation of string switches, along with
some bug fixes.
If the chosen hash function does not yield any collisions for the strings
in the switch arms, then we can optimize away the table column that we would
otherwise need for open addressing. This was implemented in a previous diff.
For an ordinary (non-lookup) string switch, the hash table has two columns
in the presence of collisions and one column in their absence. Therefore,
if doubling the number of rows in the table allows us to eliminate
collisions, the table's overall size is unchanged (twice the rows, but half
the columns), though the array of labels we have to put into the
computed_goto instruction we generate has to double as well.
Thus the only cost of such doubling is an increase in "code" size, and
for small tables, the elimination of the open addressing loop may compensate
for this, at least partially.
For lookup string switches, doubling the table size this way has a bigger
space cost, but the elimination of the open addressing loop still brings
a useful speed boost.
We therefore now DO double the table size if this eliminates collisions.
In the library, compiler etc directories, this eliminates collisions in
19 out of 47 string switches that had collisions with the standard table size.
compiler/switch_util.m:
Replace the separate sets of predicates we used to have for computing
hash maps (one for lookup switches and one for non-lookup switches)
with a single set that works for both.
Change this set to double the table size if this eliminates collisions.
This requires it to decide the table size, a task previously done
separately by each of its callers.
One version of this set had an old bug, which caused it to effectively
ignore the second and third string hash functions. This diff fixes it.
There were two bugs in my previous diff: the unneeded table column
was not being optimized away from several_soln lookup switches, and the
lookup code for one_soln lookup switches used the wrong column offset.
This diff fixes these too.
Since doubling the table size requires recalculating all the hash
values, decouple the computation of the hash values from the generation
of code for each switch arm; the latter should not be done more than
once.
Add a note on an old problem.
compiler/ml_string_switch.m:
compiler/string_switch.m:
Bring the code for generating code for the arms of string switches
here from switch_util.m.
tests/hard_coded/Mmakefile:
Fix the reason why the bugs mentioned above were not detected:
the relevant test cases weren't enabled.
tests/hard_coded/string_hash.m:
Update this test case to test the correspondence of the compiler's
and the runtime's versions of not just the first hash function,
but also the second and third.
runtime/mercury_string.h:
Fix a typo in a comment.
Branches: main
Improve Unicode support.
Declare that we use the Unicode character set, and UTF-8 or UTF-16 for the
internal string representation (depending on the backend). User code may be
written to those assumptions. Other external encodings can be supported in
the future by translating to/from Unicode internally.
The `char' type now represents a Unicode code point.
NOTE: questions about how to handle unpaired surrogate code points, etc. have
been left for later.
library/char.m:
Define a `char' to be a Unicode code point and extend ranges
appropriately.
Add predicates: to_utf8, to_utf16, is_surrogate, is_noncharacter.
Update some documentation.
library/io.m:
Declare I/O predicates on text streams to read/write code points, not
ambiguous "characters". Text files are expected to use UTF-8 encoding.
Supporting other encodings is for future work.
Update the C and Erlang implementations to understand UTF-8 encoding.
Update Java and C# implementations to read/write code points (Mercury
char) instead of UTF-16 code units.
Add `may_not_duplicate' attributes to some foreign_procs.
Improve Erlang implementations of seeking and getting the stream size.
library/string.m:
Declare the string representations, as described earlier.
Distinguish between code units and code points everywhere.
Existing functions and predicates which take offset and length
arguments continue to take them in terms of code units.
Add procedures: count_code_units, count_codepoints, codepoint_offset,
to_code_unit_list, from_code_unit_list, index_next, unsafe_index_next,
unsafe_prev_index, unsafe_index_code_unit, split_by_codepoint,
left_by_codepoint, right_by_codepoint, substring_by_codepoint.
Make index, index_det call error/1 if an illegal sequence is detected,
as they already do for invalid offsets.
Clarify that is_all_alpha, is_all_alnum_or_underscore,
is_alnum_or_underscore only succeed for the ASCII characters under each
of those categories.
Clarify that whitespace stripping functions only strip whitespace
characters in the ASCII range.
Add comments about the future treatment of surrogate code points
(not yet implemented).
Use the Mercury format implementation when necessary instead of `sprintf'.
The %c specifier does not work for code points which require multi-byte
representation. The field width modifier for %s only works if the
string contains only single-byte code points.
library/lexer.m:
Conform to string encoding changes.
Simplify code dealing with \uNNNN escapes now that encoding/decoding
is handled by the string module.
library/term_io.m:
Allow code points above 126 directly in Mercury source.
NOTE: \x and \o codes are treated as code points by this change.
runtime/mercury_types.h:
Redefine `MR_Char' to be `int' to hold a Unicode code point.
`MR_String' has to be defined as a pointer to `char' instead of a
pointer to `MR_Char'. Some C foreign code will be affected by this
change.
runtime/mercury_string.c:
runtime/mercury_string.h:
Add UTF-8 helper routines and macros.
Make hash routines conform to type changes.
compiler/c_util.m:
Fix output_quoted_string_lang so that it correctly outputs non-ASCII
characters for each of the target languages.
Fix quote_char for non-ASCII characters.
compiler/elds_to_erlang.m:
Write out code points above 126 normally instead of using escape
syntax.
Conform to string encoding changes.
compiler/mlds_to_cs.m:
Change Mercury `char' to be represented by C# `int'.
compiler/mlds_to_java.m:
Change Mercury `char' to be represented by Java `int'.
doc/reference_manual.texi:
Uncomment description of \u and \U escapes in string literals.
Update description of C# and Java representations for Mercury `char'
which are now `int'.
tests/debugger/tailrec1.m:
Conform to renaming.
tests/general/string_replace.exp:
tests/general/string_replace.m:
Test passing non-ASCII characters to string.replace.
tests/general/string_test.exp:
tests/general/string_test.m:
Test passing non-ASCII characters to string.duplicate_char,
string.pad_right, string.pad_left and string.format_table.
tests/hard_coded/char_unicode.exp:
tests/hard_coded/char_unicode.m:
Add test for new procedures in `char' module.
tests/hard_coded/contains_char_2.m:
Test passing non-ASCII characters to string.contains_char.
tests/hard_coded/nonascii.exp:
tests/hard_coded/nonascii.m:
tests/hard_coded/nonascii_gen.c:
Add code points above 255 to this test case.
Change test data encoding to UTF-8.
tests/hard_coded/string_class.exp:
tests/hard_coded/string_class.m:
Add test case for string.is_alpha, etc.
tests/hard_coded/string_codepoint.exp:
tests/hard_coded/string_codepoint.exp2:
tests/hard_coded/string_codepoint.m:
Add test case for new string procedures dealing with code points.
tests/hard_coded/string_first_char.exp:
tests/hard_coded/string_first_char.m:
Add test case for all modes of string.first_char.
tests/hard_coded/string_hash.m:
Don't use buggy random.random/5 predicate which can overflow on
a large range (such as the range of code points).
tests/hard_coded/string_presuffix.exp:
tests/hard_coded/string_presuffix.m:
Add test case for string.prefix, string.suffix, etc.
tests/hard_coded/string_set_char.m:
Test passing non-ASCII characters to string.set_char.
tests/hard_coded/string_strip.exp:
tests/hard_coded/string_strip.m:
Test passing non-ASCII characters to the string stripping procedures.
tests/hard_coded/string_sub_string_search.m:
Test passing non-ASCII characters to string.sub_string_search.
tests/hard_coded/unicode_test.exp:
Update expected output due to change of behaviour of
`string.to_char_list'.
tests/hard_coded/unicode_test.m:
Test non-ASCII character in separator string argument to
string.join_list.
tests/hard_coded/utf8_io.exp:
tests/hard_coded/utf8_io.m:
Add tests for UTF-8 I/O.
tests/hard_coded/words_separator.exp:
tests/hard_coded/words_separator.m:
Add test case for `string.words_separator'.
tests/hard_coded/Mmakefile:
Add new test cases.
Make special_char test case run on all backends.
tests/hard_coded/special_char.exp:
tests/valid/mercury_java_parser_follow_code_bug.m:
Reencode these files in UTF-8.
NEWS:
Add a news entry.
Estimated hours taken: 15
Branches: main
Make all functions which create strings from characters throw an exception
or fail if the list of characters contains a null character.
This removes a potential source of security vulnerabilities where one
part of the program performs checks against the whole of a string passed
in by an attacker (processing the string as a list of characters or using
`unsafe_index' to look past the null character), but then passes the string
to another part of the program or an operating system call that only sees
up to the first null character. Even if Mercury stored the length with
the string, allowing the creation of strings containing nulls would be a
bad idea because it would be too easy to pass a string to foreign code
without checking.
For examples see:
<http://insecure.org/news/P55-07.txt>
<http://www.securiteam.com/securitynews/5WP0B1FKKQ.html>
<http://www.securityfocus.com/archive/1/445788>
<http://www.securityfocus.com/archive/82/368750>
<http://secunia.com/advisories/16420/>
NEWS:
Document the change.
library/string.m:
Throw an exception if null characters are found in
string.from_char_list and string.from_rev_char_list.
Add string.from_char_list_semidet and string.from_rev_char_list_semidet
which fail rather than throwing an exception. This doesn't match the
normal naming convention, but string.from_{,rev_}char_list are widely
used, so changing their determinism would be a bit too disruptive.
Don't allocate an unnecessary extra word for each string created by
from_char_list and from_rev_char_list.
Explain that to_upper and to_lower only work on un-accented
Latin letters.
library/lexer.m:
Check for invalid characters when reading Mercury strings and
quoted names.
Improve error messages by skipping to the end of any string
or quoted name containing an error. Previously we just stopped
processing at the error, leaving an unmatched quote.
library/io.m:
Make io.read_line_as_string and io.read_file_as_string return
an error code if the input file contains a null character.
Fix an XXX: '\0\' is not recognised as a character constant,
but char.det_from_int can be used to make a null character.
library/char.m:
Explain the workaround for '\0\' not being accepted as a char
constant.
Explain that to_upper and to_lower only work on un-accented
Latin letters.
compiler/layout.m:
compiler/layout_out.m:
compiler/c_util.m:
compiler/stack_layout.m:
compiler/llds.m:
compiler/mlds.m:
compiler/ll_backend.*.m:
compiler/ml_backend.*.m:
Don't pass around strings containing null characters (the string
tables for the debugger). This doesn't cause any problems now,
but won't work with the accurate garbage collector. Use lists
of strings instead, and add the null characters when writing the
strings out.
tests/hard_coded/null_char.{m,exp}:
Change an existing test case to test that creation of a string
containing a null throws an exception.
tests/hard_coded/null_char.exp2:
Deleted because alternative output is no longer needed.
tests/invalid/Mmakefile:
tests/invalid/null_char.m:
tests/invalid/null_char.err_exp:
Test error messages for construction of strings containing null
characters by the lexer.
tests/invalid/unicode{1,2}.err_exp:
Update the expected output after the change to the handling of
invalid quoted names and strings.
Estimated hours taken: 1
Branches: main, release
runtime/mercury_string.h:
Fix a bug which caused the results of MR_hash_string()
and string__hash to differ -- cast each character to
MR_UnsignedChar before combining it with the hash value.
tests/hard_coded/Mmakefile:
tests/hard_coded/string_hash.{m,exp}:
Test case.