Commit Graph

93 Commits

Author SHA1 Message Date
Zoltan Somogyi
5cbcfaa0ed Move X_to_doc functions to pretty_printer.m.
library/array.m:
library/char.m:
library/float.m:
library/int.m:
library/int16.m:
library/int32.m:
library/int64.m:
library/int8.m:
library/list.m:
library/one_or_more.m:
library/string.m:
library/tree234.m:
library/uint.m:
library/uint16.m:
library/uint32.m:
library/uint64.m:
library/uint8.m:
library/version_array.m:
    Mark the X_to_doc function in each of these modules as obsolete,
    and make it a forwarding function to the actual implementation
    in pretty_printer.m. The intention is that when these forwarding
    functions are eventually removed, this will also remove the dependency
    of these modules on pretty_printer.m. This should help at least some
    of these modules escape the giant SCC in the library's dependency graph.
    (It does not make sense that a library module that adds code to increment
    an int thereby becomes dependent on pretty_printer.m through int.m.)

library/pretty_printer.m:
    Move all the X_to_doc functions from the above modules here.

    Fix the one_or_more_to_doc function, which was

    - missing the comma between the two arguments of the one_or_more
      function symbol, and

    - would print "..., ...]" instead of just "...]" at the end of the
      tail list when that list exceeded the limits of the specified pp_params.

    Rename one of the moved types along with its function symbols,
    to reduce ambiguity.

    Put arrays before their indexes in the argument lists of some of
    the moved functions.

    Some of the moved X_to_doc functions for compound types returned
    a doc that had an indent wrapper. These indents differed between the
    various X_to_doc functions without any visible reason, but they are
    also redundant. The callers can trivially add such wrappers if they
    want to, but taking them off, if they want them off, is harder.
    Eliminate the problem by deleting all such indent wrappers.

    Add formatters for the intN, uintN and one_or_more types to the
    default formatter map. Their previous absence was an oversight.

    Add a function, get_formatter_map_entry_types, that returns the ids
    of the types in the formatter_map given to the function. It is intended
    for tests/hard_coded/test_pretty_printer_defaults.m, but is exported
    for anyone to use.

tests/hard_coded/test_pretty_printer_defaults.{m,exp}:
    Use get_formatter_map_entry_types to print the default formatter map
    in a format that is much more easily readable.

NEWS:
    Announce all the user-visible changes above.
2022-12-27 18:27:52 +11:00
Julien Fischer
5274170784 Make characters an instance of the uenum typeclass
The recent change to sparse_bitsets broke the lex library in extras.
Specifically, we now now need to make characters an instance of the
uenum typeclass. This diff does so.

library/char.m:
     Add predicates and functions for converting between unsigned integers
     and characters.

     Make characters an instance of the uenum typeclass.

tests/hard_coded/Mmakefile:
tests/hard_coded/char_uint_conv.{m,exp,exp2}:
     Add a test of the above conversions.

NEWS:
     Announce the additions.

extras/lex/lex.m:
     Conform to recent changes.
2022-12-20 20:18:54 +11:00
Zoltan Somogyi
b64f0fbedc Add a "mercury_term_" prefix to lexer.m/parser.m.
library/mercury_term_lexer.m:
library/mercury_term_parser.m:
    As above.

NEWS:
    Announce the change.

browser/interactive_query.m:
compiler/analysis.file.m:
compiler/fact_table.m:
compiler/make.module_dep_file.m:
compiler/parse_module.m:
compiler/parse_tree_out_term.m:
compiler/recompilation.used_file.m:
library/MODULES_DOC:
library/char.m:
library/integer.m:
library/io.m:
library/library.m:
library/ops.m:
library/term.m:
library/term_io.m:
mdbcomp/trace_counts.m:
tests/hard_coded/impl_def_lex.m:
tests/hard_coded/impl_def_lex_string.m:
tests/hard_coded/lco_pack_args_3.m:
tests/hard_coded/lexer_bigint.m:
tests/hard_coded/lexer_ints.m:
tests/hard_coded/lexer_zero.m:
tests/hard_coded/parse_number_from_string.m:
tests/valid_seq/nested_module_bug.m:
    Conform to the change.
2021-12-31 02:38:07 +11:00
Julien Fischer
e4bb91fc10 Encoding chars as uint8s or uint16s.
library/char.m:
    Add to_utf8_uint8/2 and to_utf16_uint16/2. These do the
    same thing as the existing to_utf8/2 and to_utf16/2 predicates,
    but return the code units as uint8s or uint16s respectively.

    Update the comment at the head of the module about which version
    of the Unicode standard this module conforms to.
    (For the purposes of this module, nothing has changed between
    versions 10 and 13.)

NEWS:
    Announce the additions.
2021-07-21 18:18:57 +10:00
Zoltan Somogyi
3d2a22382f Speed up converting strings to ints/uints.
library/string.m:
    Test whether base string is between 2 and 36 just once.

library/char.m:
    Provide a predicate to convert a character to a base digit
    that does not check whether the base is between 2 and 36,
    leaving that to the caller.

    Make the error message for bases not in that range more informative,
    in the predicates that *do* check the base.

NEWS:
    Announce unsafe_base_digit_to_int.
2021-07-06 12:04:52 +10:00
Peter Wang
0d3fcbaae3 Delete Erlang code from library/mdbcomp/browser directories.
library/*.m:
    Delete Erlang foreign code and foreign types.

    Delete documentation specific to Erlang targets.

library/deconstruct.m:
    Add pragma no_determinism_warning to allow functor_number_cc/3
    to compile for now.

library/Mercury.options:
    Delete workaround only needed when targetting Erlang.

browser/listing.m:
mdbcomp/rtti_access.m:
    Delete Erlang foreign code and foreign types.
2020-10-28 14:10:56 +11:00
Zoltan Somogyi
58ea6ffff2 Delete old obsolete predicates and functions.
library/*.m:
    Specifically, delete any predicates and functions whose `pragma obsolete'
    dates from 2018 or before. Keep the ones that were obsoleted
    only this year or last year.

NEWS:
    Announce the changes.

tests/debugger/io_tab_goto.m:
tests/debugger/tabled_read.m:
tests/declarative_debugger/io_stream_test.m:
tests/declarative_debugger/tabled_read_decl.m:
tests/declarative_debugger/tabled_read_decl_goto.m:
tests/general/array_test.m:
tests/hard_coded/mutable_init_impure.m:
tests/hard_coded/remove_file.m:
tests/tabling/mercury_java_parser_dead_proc_elim_bug.m:
tests/tabling/mercury_java_parser_dead_proc_elim_bug2.m:
tests/valid/mercury_java_parser_follow_code_bug.m:
    Replace references to predicates and functions that this diff deletes
    with their suggested replacements.

    In several test cases, bring the programming style up to date.

tests/hard_coded/shift_test.{m,exp}:
    Most of this test case tested the now-deleted legacy shift operations.
    Replace these with tests of their non-legacy versions, including
    testing for the expected exceptions.

tests/hard_coded/shift_test.{m,exp}:
    Don't pass --no-warn-obsolete when compiling shift_test.m anymore.
2020-08-18 11:57:47 +10:00
Julien Fischer
d33647299a Rationalise hash functions across the standard library.
Currently, the hash functions for (some of) the primitive types are defined in
three places:

   1. The hash_table module.
   2. The version_hash_table module (duplicates of the above).
   3. Some (but not all) of the library modules for the primitive types.

This change makes the library module for a primitive type provide the hash
function for that type and deprecates the versions in the hash table modules.

Additionally, deprecate the "generic" has functions in the hash table modules.

library/hash_table.m:
library/version_hash_table.m:
    As above.

library/char.m:
library/int.m:
library/uint.m:
    Add hash/1 and hash/2.

library/float.m:
    Add hash/2.

library/robdd.m:
    Replace a call to the deprecated function.

NEWS:
    Announce the above additions and deprecations.

tests/hard_coded/hash_table_delete.m:
tests/hard_coded/hash_table_test.m:
tests/hard_coded/version_hash_table_delete.m:
tests/hard_coded/version_hash_table_test.m:
    Conform to the above change.
2020-02-11 14:22:42 +11:00
Julien Fischer
f89054f165 Use error($pred, "...") in more spots in the standard library.
Also, throw software_error/1 exceptions rather than directly throwing strings
in a few spots.

Undo special Ralph-style formatting.

library/*.:
    As above.

tests/hard_coded/array2d_from_array.exp:
tests/hard_coded/array2d.exp:
tests/hard_coded/test_injection.exp:
   Update to conform with the above change.
2019-10-15 17:48:36 +11:00
Zoltan Somogyi
95f8f56716 Delete unneeded $module args from calls to expect/unexpected. 2019-07-03 22:37:19 +02:00
Zoltan Somogyi
c7f8ebbe2f Avoid warnings from --warn-non-contiguous-{clauses,foreign-procs}.
browser/collect_lib.m:
browser/declarative_execution.m:
browser/dl.m:
browser/io_action.m:
compiler/make.util.m:
compiler/pickle.m:
compiler/process_util.m:
compiler/prog_event.m:
library/array.m:
library/benchmarking.m:
library/bit_buffer.m:
library/builtin.m:
library/char.m:
library/deconstruct.m:
library/dir.m:
library/erlang_rtti_implementation.m:
library/int.m:
library/int16.m:
library/int32.m:
library/int64.m:
library/int8.m:
library/io.m:
library/math.m:
library/mutvar.m:
library/private_builtin.m:
library/profiling_builtin.m:
library/rtti_implementation.m:
library/store.m:
library/string.format.m:
library/string.m:
library/table_builtin.m:
library/term_size_prof_builtin.m:
library/thread.m:
library/time.m:
library/type_desc.m:
library/uint16.m:
library/uint32.m:
library/uint64.m:
library/uint8.m:
ssdb/ssdb.m:
    As above. This mostly involved two things.

    The first was grouping foreign_procs by predicate instead of by language.
    In a few cases, this revealed that some predicates *had* no foreign_proc
    for a language, while related predicates did have one that just aborted
    if called. This diff adds similar aborting foreign_procs to predicate/
    language combinations that were missing them, when this seemed obviously
    the right thing to do.

    The second was moving pragmas about a predicate from the middle of the
    block of clauses of that predicate to the start of that block.
2018-10-19 11:01:33 +11:00
Zoltan Somogyi
a12692a0de Replace /* */ comments with // in the library.
Keep the old style comments where they do not go to the end of the line,
or where it is important that the comment line not have a // on it.
2018-06-21 18:55:08 +02:00
Mark Brown
d465fa53cb Update the COPYING.LIB file and references to it.
Discussion of these changes can be found on the Mercury developers
mailing list archives from June 2018.

COPYING.LIB:
    Add a special linking exception to the LGPL.

*:
    Update references to COPYING.LIB.

    Clean up some minor errors that have accumulated in copyright
    messages.
2018-06-09 17:43:12 +10:00
Peter Wang
4af0c874af Clarify meaning of "abort" in library documentation.
library/assoc_list.m:
library/bag.m:
library/bimap.m:
library/calendar.m:
library/char.m:
library/digraph.m:
library/list.m:
library/map.m:
library/multi_map.m:
library/psqueue.m:
library/rbtree.m:
library/string.m:
library/term.m:
library/tree234.m:
library/type_desc.m:
library/univ.m:
library/varset.m:
    Replace most occurrences of "abort" with "throw an exception".

    Slightly improve the documentation for map.search, map.lookup,
    map.inverse_search.

library/deconstruct.m:
    Replace "abort" with "runtime abort" where that is meant.
2017-10-09 21:48:29 +11:00
Peter Wang
70eb332e92 Give Unicode names of whitespace characters.
library/char.m:
    As above.
2017-10-02 17:31:34 +11:00
Peter Wang
0b36d737f8 Document what whitespace means to library predicates.
library/char.m:
    Present list of whitespace characters for char.whitespace
    in tabular form.

library/io.m:
library/lexer.m:
library/pprint.m:
library/stream.m:
library/string.m:
    Mention char.is_whitespace in the documentation of predicates which
    use that definition of whitespace characters.

    Module qualify some calls to char.is_whitespace for clarity.
2017-10-02 14:30:08 +11:00
Zoltan Somogyi
c42cb358b5 Fix white space. 2017-09-02 15:38:50 +10:00
Peter Wang
de1b2aa7c0 Add char.is_ascii/1.
library/char.m:
    Add char.is_ascii/1.

NEWS:
    Announce addition.
2017-08-15 16:10:44 +10:00
Julien Fischer
cfaee70768 Update header comment in char module.
library/char.m:
    Delete an out-of-date comment: Mercury characters are Unicode
    code points, so the set of characters supported and their mapping
    to integers is not implementation dependent.
2017-07-26 22:26:06 +10:00
Julien Fischer
70f9ef4a7f Add more character classification predicates.
library/char.m:
     Add the predicates is_control/1, is_space_separator/1,
     is_paragraph_separator/1, is_line_separator/1 and is_private_use/1.

     Reword some documentation from "Succeed if ..." to "True iff ...".
     This is more consistent with the other documentation in this module.

NEWS:
     Announce the additions.
2017-07-26 14:33:56 +10:00
Zoltan Somogyi
5898b127db Fix some warnings from --warn-inconsistent-pred-order-clauses.
library/bimap.m:
library/bitmap.m:
library/calendar.m:
library/char.m:
library/cord.m:
library/deconstruct.m:
library/diet.m:
library/dir.m:
library/eqvclass.m:
library/map.m:
library/pprint.m:
library/pqueue.m:
library/stream.string_writer.m:
library/term_conversion.m:
library/time.m:
library/type_desc.m:
library/version_array.m:
library/version_array2d.m:
library/version_bitmap.m:
library/version_hash_table.m:
    Fix inconsistencies between (a) the order in which functions and predicates
    are declared, and (b) the order in which they are defined.

    In most of these modules, either the order of the declarations
    or the order of the definitions made sense, and I changed the other
    to match. In a few modules, neither made sense, so I changed *both*
    to an order that *does* make sense (i.e. it has related predicates
    together).

    In some places, put dividers between groups of related
    functions/predicates, to make the groups themselves more visible.

    In some places, fix comments or programming style, give some auxiliary
    (non-exported) predicates better names, or delete some unneeded module
    qualifications.

    In some places, have the function form of a procedure forward
    the work to the predicate form, instead of vice versa, where this is
    more natural (usually because it allows the use of state variables).
    However, this is the only kind of "algorithmic" change in this diff;
    the rest is just moving code around.
2017-04-27 11:44:24 +10:00
Julien Fischer
14639c8b03 Fix omissions in the char module.
The functions det_int_to_{binary,octal,decimal,hex)_digit/1 were supposed to
have been added at the same time as the corresponding semidet predicates but
weren't: add them.

In the char module, don't call unexpected/3 in cases where the error arises as
a result of bad inputs: throwing an exception in that case *is* expected.

library/char.m:
     As above.

NEWS:
     Announce the above additions.

samples/e.m:
     Avoid a call to an obsolete predicate and update syntax.
2015-08-30 14:49:22 +10:00
Zoltan Somogyi
b6b9e4063a Convert (C->T;E) to (if C then T else E). 2015-08-22 22:03:57 +10:00
Sebastian Godelet
03be91476c Add is_(leading|trailing)_surrogate/1 to char
For input validation and character encoding transformations
it is essential to check for surrogate characters and their correct
sequence.

NEWS:
    Announce the addition of is_(leading|trailing)_surrogate/1.

library/char.m:
    Add is_leading_surrogate/1 which succeeds if a character is a
    leading surrogate character.
    is_trailing_surrogate/1 succeeds if the character is a
    trailing surrogate character.
2014-12-22 14:14:58 +08:00
Zoltan Somogyi
7f9791aa26 Standardize divider line lengths in the library.
library/*.m:
    As above.

tool/stdlines:
    A new shell script to do the job.
2014-11-23 22:05:34 +11:00
Zoltan Somogyi
9f61bf03d3 Clean up library/string.m.
library/string.m:
    Avoid unnecessary use of higher order code in the implementation of
    string.format. Avoid the unnecessary insertion of empty manifest strings
    between actual format conversions.

    Represent the optional width and precision in string.format as integers,
    not strings, since the strings take up more space and require more time
    to create and to process.

    Simplify the code implementing word_wrap.

    Give some variables better names.

    Clarify some documentation.

    Call unexpected instead of error in cases where the source of a problem
    can only be a software bug inside string.m, and not bad input supplied
    from outside of it.

    Don't mention the arity of a predicate that is the source of an exception,
    since the predicate's identity is unambiguous even without the arity,
    and since other exceptions also don't mention arities.

library/require.m:
    Add two-argument versions of error/1 and func_error/1, to allow
    the name of the predicate and the error message to be passed separately,
    possible with the first being specified as $pred. string.m had several
    places where these were being appended on the fly.

library/char.m:
    Clarify some code.

NEWS:
    Mention the additions to require.m.
2014-11-10 11:29:02 +11:00
Julien Fischer
bed96b93ff Avoid module qualification in library interfaces where possible.
NOTE: this change does not affect the io module -- I've left that for a
separate change.

library/*.m:
	As per the recent change to the coding standard, avoid module
	qualification in library interfaces where possible.

	Reformat declarations and descriptive comments to better utilise
	any space freed up by the above.
2014-10-10 15:08:24 +11:00
Julien Fischer
696829e6c1 Fix problems with the char module.
(1) The behaviour of digit_to_int/2 was inconsistent with that of is_digit/2.
The former succeeds for all of 0-9, a-z and A-Z while the latter succeeds only
for 0-9 (i.e. it was possible for digit_to_int/2 to succeed for non-decimal
characters, which is not what was intended in many of it uses).

(2) Predicates involving hexadecimal digits were inconsistently named, they
were "hex digits" in one predicate name, "hex chars" in another.

This change ensures that the following operations are supported for binary,
octal, decimal and hexadecimal digits and that we use a consistent naming
scheme for the predicates that implement them:
    - testing if a character is a digit of the given base
    - conversion to an int
    - conversion from an int

In addition, we also add predicates for supporting these operations for user
defined bases, ranging from 2-36.

library/char.m:
    Add the predicate is_decimal_digit/1, which is a synonym for is_digit/1.

    Add the predicate is_base_digit/2.

    Add the predicates int_to_{binary,octal,decimal,hex}_digit/2 and
    base_int_to_digit/3.

    Add the predicates {binary,octal,decimal,hex}_digit_to_int/2 and
    base_digit_to_int/3.

    Add det function versions of the above.

    Delete the function det_digit_to_int/1 that I added the other day.

    Mark the following as obsolete:
        - is_hex_digit/2
        - int_to_hex_char/2
        - int_to_digit/2
        - det_int_to_digit/1
        - det_int_to_digit/2

    Avoid redundant module qualification in the implementation.

    Mark some C foreign_procs as not modifying the trail.

    Re-order some declarations according to how the coding standard says they
    should be ordered.

library/bitmap.m:
library/integer.m:
library/parsing_utils.m:
library/string.m:
compiler/prog_rep_tables.m:
    Replace calls to obsolete predicates or functions.

NEWS:
    Announce the above changes.

    Add note advising users of digit_to_int/2 to check their code for the
    problem described above.

tests/hard_coded/Mmakefile:
tests/hard_coded/test_char_digits.m:
tests/hard_coded/test_char_digits.exp:
    Add a systematic test for the above predicates.
2014-09-12 01:16:00 +10:00
Julien Fischer
f680ddc56a Add char.det_digit_to_int/1 to the library.
library/char.m:
	As above.

NEWS:
	Announce the addition.
2014-09-05 15:05:05 +10:00
Peter Wang
e4b5b500fe Document that char predicates work in ASCII range.
library/char.m:
	Document explicitly the range of code points that char
	predicates work with so that there is no ambiguity,
	and to give pause to anyone considering to extend them
	in future.
2013-06-25 15:43:04 +10:00
Julien Fischer
4255cdd025 Add cord.init/0 for consistency with other modules.
Branches: main

library/cord.m:
	Add cord.init/0 for consistency with other modules.

library/int.m:
library/char.m:
	Group function definitions with the corresponding
	predicate definition.

library/version_store.m:
	Fix indentation.

NEWS:
	Announce the addition of cord.init/0.

	Fix some earlier entries.
2011-05-10 07:02:28 +00:00
Peter Wang
3788a9d6fb Improve Unicode support.
Branches: main

Improve Unicode support.

Declare that we use the Unicode character set, and UTF-8 or UTF-16 for the
internal string representation (depending on the backend).  User code may be
written to those assumptions.  Other external encodings can be supported in
the future by translating to/from Unicode internally.

The `char' type now represents a Unicode code point.

NOTE: questions about how to handle unpaired surrogate code points, etc. have
been left for later.


library/char.m:
        Define a `char' to be a Unicode code point and extend ranges
        appropriately.

        Add predicates: to_utf8, to_utf16, is_surrogate, is_noncharacter.

	Update some documentation.

library/io.m:
	Declare I/O predicates on text streams to read/write code points, not
	ambiguous "characters".  Text files are expected to use UTF-8 encoding.
	Supporting other encodings is for future work.

        Update the C and Erlang implementations to understand UTF-8 encoding.

	Update Java and C# implementations to read/write code points (Mercury
	char) instead of UTF-16 code units.

	Add `may_not_duplicate' attributes to some foreign_procs.

	Improve Erlang implementations of seeking and getting the stream size.

library/string.m:
	Declare the string representations, as described earlier.

        Distinguish between code units and code points everywhere.
	Existing functions and predicates which take offset and length
	arguments continue to take them in terms of code units.

        Add procedures: count_code_units, count_codepoints, codepoint_offset,
	to_code_unit_list, from_code_unit_list, index_next, unsafe_index_next,
	unsafe_prev_index, unsafe_index_code_unit, split_by_codepoint,
	left_by_codepoint, right_by_codepoint, substring_by_codepoint.

	Make index, index_det call error/1 if an illegal sequence is detected,
	as they already do for invalid offsets.

	Clarify that is_all_alpha, is_all_alnum_or_underscore,
	is_alnum_or_underscore only succeed for the ASCII characters under each
	of those categories.

        Clarify that whitespace stripping functions only strip whitespace
        characters in the ASCII range.

	Add comments about the future treatment of surrogate code points
	(not yet implemented).

	Use Mercury format implementation when necessary instead of `sprintf'.
	The %c specifier does not work for code points which require multi-byte
	representation.  The field width modifier for %s only works if the
	string contains only single-byte code points.

library/lexer.m:
        Conform to string encoding changes.

        Simplify code dealing with \uNNNN escapes now that encoding/decoding
        is handled by the string module.

library/term_io.m:
        Allow code points above 126 directly in Mercury source.

        NOTE: \x and \o codes are treated as code points by this change.

runtime/mercury_types.h:
        Redefine `MR_Char' to be `int' to hold a Unicode code point.

	`MR_String' has to be defined as a pointer to `char' instead of a
	pointer to `MR_Char'.  Some C foreign code will be affected by this
	change.

runtime/mercury_string.c:
runtime/mercury_string.h:
        Add UTF-8 helper routines and macros.

        Make hash routines conform to type changes.

compiler/c_util.m:
        Fix output_quoted_string_lang so that it correctly outputs non-ASCII
        characters for each of the target languages.

        Fix quote_char for non-ASCII characters.

compiler/elds_to_erlang.m:
        Write out code points above 126 normally instead of using escape
        syntax.

        Conform to string encoding changes.

compiler/mlds_to_cs.m:
        Change Mercury `char' to be represented by C# `int'.

compiler/mlds_to_java.m:
        Change Mercury `char' to be represented by Java `int'.

doc/reference_manual.texi:
        Uncomment description of \u and \U escapes in string literals.

        Update description of C# and Java representations for Mercury `char'
	which are now `int'.

tests/debugger/tailrec1.m:
        Conform to renaming.

tests/general/string_replace.exp:
tests/general/string_replace.m:
	Test non-ASCII characters to string.replace.

tests/general/string_test.exp:
tests/general/string_test.m:
	Test non-ASCII characters to string.duplicate_char,
	string.pad_right, string.pad_left, string.format_table.

tests/hard_coded/char_unicode.exp:
tests/hard_coded/char_unicode.m:
	Add test for new procedures in `char' module.

tests/hard_coded/contains_char_2.m:
	Test non-ASCII characters to string.contains_char.

tests/hard_coded/nonascii.exp:
tests/hard_coded/nonascii.m:
tests/hard_coded/nonascii_gen.c:
        Add code points above 255 to this test case.

	Change test data encoding to UTF-8.

tests/hard_coded/string_class.exp:
tests/hard_coded/string_class.m:
	Add test case for string.is_alpha, etc.

tests/hard_coded/string_codepoint.exp:
tests/hard_coded/string_codepoint.exp2:
tests/hard_coded/string_codepoint.m:
	Add test case for new string procedures dealing with code points.

tests/hard_coded/string_first_char.exp:
tests/hard_coded/string_first_char.m:
	Add test case for all modes of string.first_char.

tests/hard_coded/string_hash.m:
	Don't use buggy random.random/5 predicate which can overflow on
	a large range (such as the range of code points).

tests/hard_coded/string_presuffix.exp:
tests/hard_coded/string_presuffix.m:
	Add test case for string.prefix, string.suffix, etc.

tests/hard_coded/string_set_char.m:
	Test non-ASCII characters to string.set_char.

tests/hard_coded/string_strip.exp:
tests/hard_coded/string_strip.m:
	Test non-ASCII characters to string stripping procedures.

tests/hard_coded/string_sub_string_search.m:
	Test non-ASCII characters to string.sub_string_search.

tests/hard_coded/unicode_test.exp:
        Update expected output due to change of behaviour of
        `string.to_char_list'.

tests/hard_coded/unicode_test.m:
	Test non-ASCII character in separator string argument to
	string.join_list.

tests/hard_coded/utf8_io.exp:
tests/hard_coded/utf8_io.m:
	Add tests for UTF-8 I/O.

tests/hard_coded/words_separator.exp:
tests/hard_coded/words_separator.m:
        Add test case for `string.words_separator'.

tests/hard_coded/Mmakefile:
	Add new test cases.

	Make special_char test case run on all backends.

tests/hard_coded/special_char.exp:
tests/valid/mercury_java_parser_follow_code_bug.m:
	Reencode these files in UTF-8.

NEWS:
	Add a news entry.
2011-04-04 07:10:42 +00:00
Peter Wang
c053f90088 Allow inlining of Java foreign_procs.
Branches: main, 10.04

Allow inlining of Java foreign_procs.

This revealed a problem with directly using the `succeeded' flag directly as
the success indicator in Java foreign_procs.  When the code of the foreign_proc
becomes a nested function, and after nested functions are eliminated, there may
not be a variable called `succeeded' in that context; it is moved into
environment struct, and the transformation is not able to update handwritten
code to reflect that.  The solution is to declare a local variable for the
foreign_proc, let the handwritten code assign that, then assign its final
value to the `succeeded' flag with an MLDS statement.

We take the opportunity to name the local variable `SUCCESS_INDICATOR', in
line with other backends.

compiler/inlining.m:
        Allow inlining of Java foreign_procs.

compiler/ml_foreign_proc_gen.m:
        In the code generated for semidet Java foreign_procs, declare a local
        `SUCCESS_INDICATOR' variable and assign its value to the `succeeded'
        flag afterwards.

        Add braces to give the foreign_proc variables a limited scope.

compiler/make_hlds_warn.m:
        Conform to renaming.

doc/reference_manual.texi:
        Update documentation for the renaming of the `succeeded' variable.

library/array.m:
library/bitmap.m:
library/builtin.m:
library/char.m:
library/construct.m:
library/dir.m:
library/exception.m:
library/float.m:
library/int.m:
library/io.m:
library/math.m:
library/private_builtin.m:
library/rtti_implementation.m:
library/string.m:
library/thread.m:
library/time.m:
library/type_desc.m:
library/version_array.m:
        Conform to renaming.

        Fix problems with Java foreign_procs that may now be copied into other
        modules when intermodule optimisation is enabled, some by disallowing
        the procedures from being duplicated, some by making referenced
        classes/fields `public'.

        [Some of the `may_not_duplicate' attributes may not indicate actual
        problems, just that it seems not worthwhile inlining calls to the
        procedure.]

extras/solver_types/library/any_array.m:
tests/hard_coded/equality_pred_which_requires_boxing.m:
tests/hard_coded/external_unification_pred.m:
tests/hard_coded/java_test.m:
tests/hard_coded/redoip_clobber.m:
tests/hard_coded/user_compare.m:
tests/valid/exported_foreign_type2.m:
tests/warnings/warn_succ_ind.m:
tests/warnings/warn_succ_ind.exp3:
        Conform to renaming.
2010-05-07 03:12:27 +00:00
Zoltan Somogyi
fd58d815cb This module has four versions of the get_token predicate, some
Estimated hours taken: 3
Branches: main

library/lexer.m:
	This module has four versions of the get_token predicate, some
	of which consume significant time during typical compilations.
	This diff factors out some of the code common among them,
	and restructures this common code from long sequences
	of nested if-then-elses to switches.

	This yields an overall speedup of 3.2% on tools/speedtest.

	Don't disable vim's wrapmargin functionality.

library/char.m:
	Add notes to the predicates which are now linked to code in lexer.m.
2008-05-20 00:52:31 +00:00
Ralph Becket
67ec243a51 Define pretty_printer formatters for some common standard library types.
Estimated hours taken: 4
Branches: main

Define pretty_printer formatters for some common standard library types.
Include these formatters in the default pretty_printer formatter_map.
Add a useful function to the pretty_printer interface.

library/array.m:
library/char.m:
library/float.m:
library/int.m:
library/list.m:
library/string.m:
library/tree234.m:
	Add <type>_to_doc functions.

library/pretty_printer.m:
	Added function format_arg/1.
	Initialise the default formatter_map to use the <type>_to_doc
	functions.

tests/hard_coded/Mmakefile:
tests/hard_coded/test_pretty_printer_defaults.exp:
tests/hard_coded/test_pretty_printer_defaults.m:
	Test case.
2007-08-14 04:21:09 +00:00
Peter Ross
e3f8bbf9aa Implement the TypeCtorInfo RTTI so that generic compare
Estimated hours taken: 16
Branches: main

Implement the TypeCtorInfo RTTI so that generic compare
and unify work via rtti_implementation.

However after discussion with zs, this current design of
RTTI has to be adapted to become more erlang specific, due
to the different data representation on the erlang backend.
The current code will serve as a template for this new design
though, which is why it is being checked in.

Add various erlang library implementations which are
needed to run useful programs when testing the erlang backend.

compiler/elds.m:
	Add a type_info_id.

compiler/elds_to_erlang.m:
	We now generate type_ctor_info's so call them.

compiler/erl_rtti.m:
	After discussions with zs, type_info and pseudo_type_infos
	should never occur on the erlang backend as they are needed
	for gc and the debugger so throw an exception for them.
	Add an implementation of creating a static type_info, but which
	isn't used in case we need it again later.
	Create type_ctor_info with all the fields except the
	TypeFunctors, the TypeLayout and the FunctorNumberMap

compiler/special_pred.m:
	Make sure we generate RTTI for the builtin types.

library/builtin.m:
library/char.m:
library/exception.m:
library/float.m:
library/int.m:
library/io.m:
library/lexer.m:
library/math.m:
library/mutvar.m:
library/ops.m:
library/par_builtin.m:
library/private_builtin.m:
library/rtti_implementation.m:
library/solutions.m:
library/store.m:
library/string.m:
library/table_builtin.m:
library/thread.semaphore.m:
library/time.m:
library/type_desc.m:
	Erlang implementations of std library functions.
2007-05-30 08:16:09 +00:00
Simon Taylor
5647714667 Make all functions which create strings from characters throw an exception
Estimated hours taken: 15
Branches: main

Make all functions which create strings from characters throw an exception
or fail if the list of characters contains a null character.

This removes a potential source of security vulnerabilities where one
part of the program performs checks against the whole of a string passed
in by an attacker (processing the string as a list of characters or using
`unsafe_index' to look past the null character), but then passes the string
to another part of the program or an operating system call that only sees
up to the first null character.  Even if Mercury stored the length with
the string, allowing the creation of strings containing nulls would be a
bad idea because it would be too easy to pass a string to foreign code
without checking.

For examples see:
<http://insecure.org/news/P55-07.txt>
<http://www.securiteam.com/securitynews/5WP0B1FKKQ.html>
<http://www.securityfocus.com/archive/1/445788>
<http://www.securityfocus.com/archive/82/368750>
<http://secunia.com/advisories/16420/>

NEWS:
	Document the change.

library/string.m:
	Throw an exception if null characters are found in
	string.from_char_list and string.from_rev_char_list.

	Add string.from_char_list_semidet and string.from_rev_char_list_semidet
	which fail rather throwing an exception.  This doesn't match the
	normal naming convention, but string.from_{,rev_}char_list are widely
	used, so changing their determinism would be a bit too disruptive.

	Don't allocate an unnecessary extra word for each string created by
	from_char_list and from_rev_char_list.

	Explain that to_upper and to_lower only work on un-accented
	Latin letters.

library/lexer.m:
	Check for invalid characters when reading Mercury strings and
	quoted names.

	Improve error messages by skipping to the end of any string
	or quoted name containing an error.  Previously we just stopped
	processing at the error leaving an unmatched quote.

library/io.m:
	Make io.read_line_as_string and io.read_file_as_string return
	an error code if the input file contains a null character.

	Fix an XXX: '\0\' is not recognised as a character constant,
	but char.det_from_int can be used to make a null character.

library/char.m:
	Explain the workaround for '\0\' not being accepted as a char
	constant.

	Explain that to_upper and to_lower only work on un-accented
	Latin letters.

compiler/layout.m:
compiler/layout_out.m:
compiler/c_util.m:
compiler/stack_layout.m:
compiler/llds.m:
compiler/mlds.m:
compiler/ll_backend.*.m:
compiler/ml_backend.*.m:
	Don't pass around strings containing null characters (the string
	tables for the debugger).  This doesn't cause any problems now,
	but won't work with the accurate garbage collector.  Use lists
	of strings instead, and add the null characters when writing the
	strings out.

tests/hard_coded/null_char.{m,exp}:
	Change an existing test case to test that creation of a string
	containing a null throws an exception.

tests/hard_coded/null_char.exp2:
	Deleted because alternative output is no longer needed.

tests/invalid/Mmakefile:
tests/invalid/null_char.m:
tests/invalid/null_char.err_exp:
	Test error messages for construction of strings containing null
	characters by the lexer.

tests/invalid/unicode{1,2}.err_exp:
	Update the expected output after the change to the handling of
	invalid quoted names and strings.
2007-03-18 23:35:04 +00:00
Simon Taylor
9c650e1d83 Improvements for bitmap.m, to make it useable as a general container
Estimated hours taken: 80
Branches: main

Improvements for bitmap.m, to make it useable as a general container
for binary data.

library/bitmap.m:
runtime/mercury_bitmap.c:
runtime/mercury_bitmap.h:
	Specialize the representation of bitmaps to an array of unsigned
	bytes defined as a foreign type.

	This is better than building on top of array(int) because it:
	- is better for interfacing with foreign code
	- has a more sensible machine-independent comparison order
	  (same as array(bool))
	- avoids storing the size twice
	- has more efficient copying, unification, comparison and tabling
	  (although we should probably specialize the handling of array(int)
	  and isomorphic types as well)
	- uses GC_MALLOC_ATOMIC to avoid problems with bit patterns that look
	  like pointers (although we should do that for array(int) as well)

	XXX The code for the Java and IL backends is untested.
	Building the library in grade Java with Sun JDK 1.6 failed (but
	at least passed error checking), and I don't have access to a
	copy of MSVS.NET.  The foreign code that needs to be tested is
	trivial.

	Add fields `bit', `bits' and `byte' to get/set a single bit,
	multiple bits (from an int) or an 8 bit byte.

	Add functions for converting bitmaps to hex strings and back,
	for use by stream.string_writer.write and deconstruct.functor/4.

	bitmap.intersect was buggy in the case where the input bitmaps
	had a different size.  Given that bitmaps are implemented with
	a fixed domain (lookups out of range throw an exception), it
	makes more sense to throw an exception in that case anyway,
	so all of the set operations do that now.

	The difference operation actually performed xor.  Fix it and
	add an xor function.

library/version_bitmap.m:
	This hasn't been fully updated to be the same as bitmap.m.
	The payoff would be much less because foreign code can't
	really do anything with version_bitmaps.

	Add a `bit' field.

	Deprecate the `get/2' function in favour of the `bit' field.

	Fix the union, difference, intersection and xor functions
	as for bitmap.m.

	Fix comparison of version_arrays so that it uses the same
	method as array.m: compare size then elements in order.
	The old code found version_arrays to be equal if one was
	a suffix of the other.

library/char.m:
	Add predicates for converting between hex digits and integers.

library/io.m:
library/stream.string_writer.m:
library/term.m:
	Read and write bitmaps.

runtime/mercury_type_info.h:
runtime/mercury_deep_copy_body.h:
runtime/mercury_mcpp.h:
runtime/mercury_table_type_body.h:
runtime/mercury_tabling_macros.h:
runtime/mercury_unify_compare_body.h:
runtime/mercury_construct.c:
runtime/mercury_deconstruct.c:
runtime/mercury_term_size.c:
runtime/mercury_string.h:
library/construct.m:
library/deconstruct.m
compiler/prog_type.m:
compiler/mlds_to_gcc.m:
compiler/rtti.m:
	Add a MR_TypeCtorRep for bitmaps, and handle it in the library
	and runtinme.

library/Mercury.options:
	Compile bitmap.m with `--no-warn-insts-without-matching-type'.

runtime/mercury_type_info.h:
	Bump MR_RTTI_VERSION.

NEWS:
	Document the changes.

tests/hard_coded/Mmakefile:
tests/hard_coded/bitmap_test.m:
tests/hard_coded/bitmap_simple.m:
tests/hard_coded/bitmap_tester.m:
tests/hard_coded/bitmap_test.exp:
tests/tabling/Mmakefile:
tests/tabling/expand_bitmap.m:
tests/tabling/expand_bitmap.exp:
tests/hard_coded/version_array_test.m:
tests/hard_coded/version_array_test.exp:
	Test cases.
2007-02-13 01:59:04 +00:00
Zoltan Somogyi
7b7dabb89a Extend this optimization to handle temporaries being both defined in
Estimated hours taken: 12
Branches: main

compiler/use_local_vars.m:
	Extend this optimization to handle temporaries being both defined in
	and used by foreign_proc_code instructions. This should eliminate
	unnecessary accesses to the MR_fake_reg array, and thus speed up
	programs that use foreign code a lot, including typeclass- and
	tabling-intensive programs, since those features are implemented using
	inline foreign code. I/O intensive should also benefit, but not much,
	since the cost of the I/O itself overwhelms the cost of the
	MR_fake_reg accesses.

	Group together the LLDS instructions that are handled similarly.
	Factor out some common code.

compiler/opt_util.m:
	Allow for the fact that foreign_proc_codes can now refer to
	temporaries.

compiler/opt_debug.m:
	Print more useful information about foreign_proc_code components.

compiler/prog_data.m:
	Rename the types and function symbols of the recently added
	foreign_proc attributes to avoid clashing with the keywords
	representing them in source code.

	Add a new foreign_proc attribute, proc_may_duplicate that governs
	whether the body of foreign code is allowed to be duplicated.

compiler/table_gen.m:
	Include does_not_affect_liveness among the annotations for the
	foreign_proc calls generated by this module. Some of these procedures
	affect memory beyond their arguments, but that memory is in tables,
	not in unlisted registers.

	Allow some of the smaller code fragments generated by this module
	to be duplicated.

compiler/inlining.m:
	Respect the may_not_duplicate foreign_proc attribute.

compiler/pragma_c_gen.m:
	Transmit any annotations about liveness from the HLDS to the LLDS,
	since without does_not_affect_liveness annotations use_local_vars.m
	cannot optimize foreign_proc_codes.

	Transmit any annotations about may_duplicate from the HLDS to the LLDS,
	since with them jumpopt can do a better job.

compiler/llds.m:
	Use the new foreign_proc attribute instead of a boolean to represent
	whether a foreign code fragment may be duplicated.

compiler/simplify.m:
	Generate an error message if a may_duplicate or may_not_duplicate
	attribute on a foreign_proc conflicts with a no_inline or inline pragma
	(respectively) on the predicate it belongs to.

compiler/hlds_pred.m:
	Fix some comment rot.

compiler/jumpopt.m:
compiler/livemap.m:
compiler/proc_gen.m:
compiler/trace_gen.m:
	Conform to the changes above.

doc/reference_manual.texi:
	Document the new foreign_proc attribute.

library/array.m:
library/builtin.m:
library/char.m:
library/dir.m:
library/float.m:
library/int.m:
library/io.m:
library/lexer.m:
library/math.m:
library/private_builtin.m:
library/string.m:
library/version_array.m:
	Add does_not_affect_liveness annotations to the C foreign_procs that
	deserve them.

configure.in:
	Require the installed compiler to support does_not_affect_liveness.

tests/invalid/test_may_duplicate.{m,err_exp}:
	Add a new test case to test the error checking code in simplify.m.

tests/invalid/Mmakefile:
	Enable the new test case.
2007-01-15 02:24:04 +00:00
Julien Fischer
e0f5ac47db Make it easier for vi to jump past the initial comments
Estimated hours taken: 0.1
Branches: main

library/*.m:
	Make it easier for vi to jump past the initial comments
	at the head of a module.
2006-04-19 05:18:00 +00:00
Julien Fischer
5e92224eec Improve the library reference manual by formatting the beginning of
Estimated hours taken: 0.2
Branches: main, release

library/*.m:
	Improve the library reference manual by formatting the beginning of
	library modules consistently.

library/integer.m:
	Fix some bad indentation.
2006-04-13 06:08:05 +00:00
Zoltan Somogyi
b293bd999d Replace __ with . as the module qualifier everywhere.
Estimated hours taken: 1
Branches: main

library/*.m:
	Replace __ with . as the module qualifier everywhere.

tests/hard_coded/test_injection.exp:
	Replace __ with . as the module qualifier in expected exceptions.
2006-03-07 22:23:58 +00:00
Julien Fischer
7a8681f774 Annotate foreign_procs with trail usage information throughout most of
Estimated hours taken: 0.5
Branches: main

library/*.m:
	Annotate foreign_procs with trail usage information throughout most of
	the standard library.

	Fix an out of date comment in string.m.

	Fix some minor formatting problems.
2005-12-14 10:33:56 +00:00
Zoltan Somogyi
57b8f436eb Convert to four-space indentation most of the library modules that
Estimated hours taken: 4
Branches: main

library/*.m:
	Convert to four-space indentation most of the library modules that
	weren't already indented that way. Use predmode syntax where possible.
	In some modules, shorten long lines by deleting module name prefixes.
	Fix departures from our coding standards.

	In some modules, simplify code, mostly using field names and/or state
	variables.

	There are no changes in algorithms, except for neg_list in integer.m.
2005-10-17 11:35:22 +00:00
Julien Fischer
1de1dbb394 Fix more typos in library documentation.
Estimated hours taken: 0.5
Branches: main

Fix more typos in library documentation.

library/builtin.m:
	s/wheiter/whether.

library/char.m:
	s/consise/concise/

library/getopt.m:
library/getopt_io.m:
	s/keept/keep/

library/int.m:
	s/expontiation/exponentiation/

library/ops.m:
	s/precendence/precedence/

library/robdd.m:
	s/efficent/efficient/

library/set_bbbtree.m:
	s/noticable/noticeable/

library/term_to_xml.m:
	s/funtor/functor/
	s/attrinutes/attributes/
	s/fuctor/functor/

library/version_array.m:
	s/incurr/incur/
2005-01-27 03:59:27 +00:00
Julien Fischer
b13a50c7f6 Make the positioning of descriptive comments conform
Estimated hours taken: 3.5
Branches: main

Make the positioning of descriptive comments conform
to the coding standard for the following library modules.

Convert preds to predmode syntax where possible.

Make the ordering of related predicates and functions
conform to the coding standard, where the descriptive
comment makes it possible to do that.

Other minor changes are listed below.

library/bimap.m:
	Fix capitalisation of a few comments.

library/dir.m:
	s/throw an exception/throws an exception/.

library/exception.m:
	Fix the comment about the exception_result/1 type.
	There is only one type and an inst following the comment.

library/map.m:
	Remove the unique modes for map.set/4, map.delete/3 and
	map.delete_list/3.

library/rbtree.m:
	Remove the unique modes for rbtree.set/4, rbtree.delete/3,
	rbtree.remove/4, rbtree.remove_smallest/4 and rbtree.remove_largest/4.

library/tree234.m:
	Remove left over unique modes for preds.

library/set.m:
	XXX the ordering of procedures in this module is a bit strange.
library/set_bbbtree.m:
library/set_unordlist.m:
	Remove various unique modes for set operations like
	delete/3.  (Some of these were commented out anyway).

library/term_to_xml.m:
	Fix a spot where line width exceeded 79 characters.

library/array.m:
library/assoc_list.m:
library/random.m:
library/multi_map.m:
library/pqueue.m:
library/queue.m:
library/bool.m:
library/char.m:
library/construct.m:
library/counter.m:
library/deconstruct.m:
library/eqvclass.m:
library/gc.m:
library/io.m:
library/sparse_bitset.m:
library/stack.m:
library/std_util.m:
library/store.m:
library/string.m:
library/term.m:
library/term_io.m:
library/type_desc.m:
library/varset.m:
	As above.
2005-01-24 23:16:40 +00:00
Zoltan Somogyi
39295e0f3a Add some library functions I discovered I needed while working on my scanner
Estimated hours taken: 2
Branches: main

Add some library functions I discovered I needed while working on my scanner
generator.

library/char.m:
	Add char__int_to_char, a more expressive name for the reverse mode
	of char__to_int, and a deterministic version, char__det_int_to_char.

library/list.m:
	Add some more arities of the foldl and map_foldl predicates.

	Use a consistent naming scheme for the type variables in signatures
	of the various versions of the foldl and map_foldl predicates.

library/map.m:
	Add a new predicate reverse_map, which turns a map(K, V) into
	a map(V, set(K)).

library/svarray.m:
library/svqueue.m:
library/svbimap.m:
	New modules, which contain state-variable-friendly versions of the
	relevant predicates from array.m, queue.m and bimap.m respectively.

library/library.m:
NEWS:
	Mention the three new modules.
2005-01-06 05:08:15 +00:00
Zoltan Somogyi
6909674f14 Bring these modules up to date with our current style guidelines.
Estimated hours taken: 8
Branches: main

library/*.m:
	Bring these modules up to date with our current style guidelines.
	Use predmode declarations where appropriate. Use state variable syntax
	where appropriate. Reorder arguments where this makes it possible to
	to use state variable syntax. Standardize format of predicate
	description comments. Standardize indentation.
2004-03-15 23:49:36 +00:00
James Goddard
28b2e4f771 Implemented some library functions for the char library in java.
Estimatetimated hours taken: 0.5
Branches: main

Implemented some library functions for the char library in java.

library/char.m:
        Implemented the following predicates in java:

                char__to_int/2
                char__max_char_value/1
2003-12-12 02:20:30 +00:00
Peter Ross
d28ac0ec53 Begin porting the the library just to use C# as its foreign_proc
Estimated hours taken: 2
Branches: main

Begin porting the the library just to use C# as its foreign_proc
language.

library/array.m:
library/char.m:
library/exception.m:
library/float.m:
library/int.m:
library/math.m:
library/private_builtin.m:
library/rtti_implementation.m:
library/std_util.m:
	Trivial changes to convert MC++ to C#.

library/table_builtin.m:
	Delete some unused MC++ functions.
2003-11-07 16:51:36 +00:00