mercury

mirror of https://github.com/Mercury-Language/mercury.git synced 2026-04-20 11:54:02 +00:00

Author	SHA1	Message	Date
Zoltan Somogyi	d769b04a96	Base string.format_table{,_max} on common code. library/string.m: Even though format_table_max is a minor tweak on format_table, its implementation used to be completely separate. Act on an old XXX and make format_table use the same primitive ops as format_table_max. Document the operation of format_table a bit better. Change the way that format_table_max handles column-width limits, by accepting overlong column contents without starting a new line. Document the new semantics. Use predmode decls when possible. tests/general/string_test.{m,exp}: Add a test of format_table_max, which previously did not have one.	2023-05-22 19:23:37 +10:00
Peter Wang	3788a9d6fb	Improve Unicode support. Branches: main Declare that we use the Unicode character set, and UTF-8 or UTF-16 for the internal string representation (depending on the backend). User code may be written to those assumptions. Other external encodings can be supported in the future by translating to/from Unicode internally. The `char' type now represents a Unicode code point. NOTE: questions about how to handle unpaired surrogate code points, etc. have been left for later. library/char.m: Define a `char' to be a Unicode code point and extend ranges appropriately. Add predicates: to_utf8, to_utf16, is_surrogate, is_noncharacter. Update some documentation. library/io.m: Declare I/O predicates on text streams to read/write code points, not ambiguous "characters". Text files are expected to use UTF-8 encoding. Supporting other encodings is for future work. Update the C and Erlang implementations to understand UTF-8 encoding. Update Java and C# implementations to read/write code points (Mercury char) instead of UTF-16 code units. Add `may_not_duplicate' attributes to some foreign_procs. Improve Erlang implementations of seeking and getting the stream size. library/string.m: Declare the string representations, as described earlier. Distinguish between code units and code points everywhere. Existing functions and predicates which take offset and length arguments continue to take them in terms of code units. Add procedures: count_code_units, count_codepoints, codepoint_offset, to_code_unit_list, from_code_unit_list, index_next, unsafe_index_next, unsafe_prev_index, unsafe_index_code_unit, split_by_codepoint, left_by_codepoint, right_by_codepoint, substring_by_codepoint. Make index, index_det call error/1 if an illegal sequence is detected, as they already do for invalid offsets. Clarify that is_all_alpha, is_all_alnum_or_underscore, is_alnum_or_underscore only succeed for the ASCII characters under each of those categories. 
Clarify that whitespace stripping functions only strip whitespace characters in the ASCII range. Add comments about the future treatment of surrogate code points (not yet implemented). Use Mercury format implementation when necessary instead of `sprintf'. The %c specifier does not work for code points which require multi-byte representation. The field width modifier for %s only works if the string contains only single-byte code points. library/lexer.m: Conform to string encoding changes. Simplify code dealing with \uNNNN escapes now that encoding/decoding is handled by the string module. library/term_io.m: Allow code points above 126 directly in Mercury source. NOTE: \x and \o codes are treated as code points by this change. runtime/mercury_types.h: Redefine `MR_Char' to be `int' to hold a Unicode code point. `MR_String' has to be defined as a pointer to `char' instead of a pointer to `MR_Char'. Some C foreign code will be affected by this change. runtime/mercury_string.c: runtime/mercury_string.h: Add UTF-8 helper routines and macros. Make hash routines conform to type changes. compiler/c_util.m: Fix output_quoted_string_lang so that it correctly outputs non-ASCII characters for each of the target languages. Fix quote_char for non-ASCII characters. compiler/elds_to_erlang.m: Write out code points above 126 normally instead of using escape syntax. Conform to string encoding changes. compiler/mlds_to_cs.m: Change Mercury `char' to be represented by C# `int'. compiler/mlds_to_java.m: Change Mercury `char' to be represented by Java `int'. doc/reference_manual.texi: Uncomment description of \u and \U escapes in string literals. Update description of C# and Java representations for Mercury `char' which are now `int'. tests/debugger/tailrec1.m: Conform to renaming. tests/general/string_replace.exp: tests/general/string_replace.m: Test non-ASCII characters to string.replace. 
tests/general/string_test.exp: tests/general/string_test.m: Test non-ASCII characters to string.duplicate_char, string.pad_right, string.pad_left, string.format_table. tests/hard_coded/char_unicode.exp: tests/hard_coded/char_unicode.m: Add test for new procedures in `char' module. tests/hard_coded/contains_char_2.m: Test non-ASCII characters to string.contains_char. tests/hard_coded/nonascii.exp: tests/hard_coded/nonascii.m: tests/hard_coded/nonascii_gen.c: Add code points above 255 to this test case. Change test data encoding to UTF-8. tests/hard_coded/string_class.exp: tests/hard_coded/string_class.m: Add test case for string.is_alpha, etc. tests/hard_coded/string_codepoint.exp: tests/hard_coded/string_codepoint.exp2: tests/hard_coded/string_codepoint.m: Add test case for new string procedures dealing with code points. tests/hard_coded/string_first_char.exp: tests/hard_coded/string_first_char.m: Add test case for all modes of string.first_char. tests/hard_coded/string_hash.m: Don't use buggy random.random/5 predicate which can overflow on a large range (such as the range of code points). tests/hard_coded/string_presuffix.exp: tests/hard_coded/string_presuffix.m: Add test case for string.prefix, string.suffix, etc. tests/hard_coded/string_set_char.m: Test non-ASCII characters to string.set_char. tests/hard_coded/string_strip.exp: tests/hard_coded/string_strip.m: Test non-ASCII characters to string stripping procedures. tests/hard_coded/string_sub_string_search.m: Test non-ASCII characters to string.sub_string_search. tests/hard_coded/unicode_test.exp: Update expected output due to change of behaviour of `string.to_char_list'. tests/hard_coded/unicode_test.m: Test non-ASCII character in separator string argument to string.join_list. tests/hard_coded/utf8_io.exp: tests/hard_coded/utf8_io.m: Add tests for UTF-8 I/O. tests/hard_coded/words_separator.exp: tests/hard_coded/words_separator.m: Add test case for `string.words_separator'. 
tests/hard_coded/Mmakefile: Add new test cases. Make special_char test case run on all backends. tests/hard_coded/special_char.exp: tests/valid/mercury_java_parser_follow_code_bug.m: Reencode these files in UTF-8. NEWS: Add a news entry.	2011-04-04 07:10:42 +00:00
Ian MacLarty	3acc99dc4e	Add string.word_wrap/2 which breaks a string into multiple lines, preserving whole words where possible. Estimated hours taken: 2 Branches: main NEWS: Mention string.word_wrap/2. library/string.m: Add string.word_wrap/2. Rearrange some code so it's in top-down order. tests/general/string_test.exp: tests/general/string_test.m: Test string.word_wrap/2.	2005-03-30 10:52:05 +00:00
Ian MacLarty	7ac48dfd3f	Add a function to the string module to generate a formatted text table. Estimated hours taken: 3 Branches: main library/string.m: Add format_table/2 which generates a formatted text table from a list of columns. tests/general/string_test.exp: tests/general/string_test.m: Test the new function.	2005-02-04 05:55:16 +00:00
Ian MacLarty	d3891bc782	Add predicates and functions to the string module to format integers with thousand separators. Estimated hours taken: 2 Branches: main library/string.m: Add a predicate and function to convert an int to a string with commas as thousand separators. Add a predicate and function to convert an int to any base with any string between any number of digits. tests/general/string_test.exp: tests/general/string_test.m: Test the new functionality.	2005-02-03 09:06:37 +00:00
Zoltan Somogyi	ae2ab72716	Compare actual outputs with the outputs computed by NU-Prolog. Estimated hours taken: 1 runtests: Compare actual outputs with the outputs computed by NU-Prolog. Mmake: Enable the dnf test. commit_bug.m: Use more readable formatting. environment.m: Since the expected output may be generated on a different machine than the one on which the test is run, don't print the value of a possibly machine-specific environment variable such as PATH. semidet_lambda.m: Fix the name of the module. univ.m: Add a couple of tests to exercise the typeinfo comparison routine. unreachable.m: Fix a comment. *.exp: The expected output files.	1996-11-04 07:08:57 +00:00

6 Commits