mirror of
https://github.com/Mercury-Language/mercury.git
synced 2026-04-15 01:13:30 +00:00
library/string.m:
The documentation comment for to_uppercase gives the conversion range for
to_lowercase.
7031 lines
240 KiB
Mathematica
7031 lines
240 KiB
Mathematica
%---------------------------------------------------------------------------%
|
|
% vim: ts=4 sw=4 et ft=mercury
|
|
%---------------------------------------------------------------------------%
|
|
% Copyright (C) 1993-2012 The University of Melbourne.
|
|
% Copyright (C) 2013-2026 The Mercury team.
|
|
% This file is distributed under the terms specified in COPYING.LIB.
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% File: string.m.
|
|
% Main authors: fjh, petdr, wangp.
|
|
% Stability: high.
|
|
%
|
|
% This module provides basic string handling facilities.
|
|
%
|
|
% Mercury strings are Unicode strings. They use either the UTF-8 or UTF-16
|
|
% encoding, depending on the target language.
|
|
%
|
|
% When Mercury is compiled to C, strings are UTF-8 encoded, with a null
|
|
% character as the string terminator. With UTF-8, each code unit is one byte,
|
|
% and a single code point requires one to four of these code units to encode.
|
|
%
|
|
% When Mercury is compiled to Java, strings are represented using Java's
|
|
% String type. When Mercury is compiled to C#, strings are represented using
|
|
% C#'s `System.String' type. Both of these types use the UTF-16 encoding.
|
|
% With UTF-16, each code unit is a 16 bit integer, and a single code point
|
|
% requires one or two of these code units to encode.
|
|
%
|
|
% The Mercury compiler will only allow well-formed UTF-8 or UTF-16 string
|
|
% constants. However, it is possible to produce strings containing invalid
|
|
% UTF-8 or UTF-16 via I/O, foreign code, and substring operations.
|
|
% Predicates or functions that inspect strings may fail, throw an exception,
|
|
% or else behave in some other special way when they encounter an ill-formed
|
|
% code unit sequence.
|
|
%
|
|
% Unexpected null characters embedded in the middle of strings can be a source
|
|
% of security vulnerabilities, so the Mercury library predicates and functions
|
|
% which create strings from (lists of) characters throw an exception if they
|
|
% detect such a null character. Programmers must not create strings that might
|
|
% contain null characters using the foreign language interface.
|
|
%
|
|
% The builtin comparison operation on strings is also dependent on the target
|
|
% language. The current implementation performs string comparison using
|
|
%
|
|
% - C's strcmp() function, when compiling to C;
|
|
% - Java's String.compareTo() method, when compiling to Java; and
|
|
% - C#'s System.String.CompareOrdinal() method, when compiling to C#.
|
|
%
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% This module is divided into several sections. These sections are:
|
|
%
|
|
% - Wrapper types that associate particular semantics with raw strings.
|
|
% - Identifying the Unicode encoding form used by the current platform.
|
|
% - Converting between strings and lists of characters.
|
|
% - Reading characters from strings.
|
|
% - Writing characters to strings.
|
|
% - Determining the lengths of strings.
|
|
% - Computing hashes of strings.
|
|
% - Tests on strings.
|
|
% - Appending strings.
|
|
% - Splitting up strings.
|
|
% - Dealing with prefixes and suffixes.
|
|
% - Transformations of strings.
|
|
% - Folds over the characters in strings.
|
|
% - Formatting tables.
|
|
% - Converting strings to docs.
|
|
% - Converting strings to values of builtin types.
|
|
% - Converting values of builtin types to strings.
|
|
% - Converting values of arbitrary types to strings.
|
|
% - Converting values to strings based on a format string.
|
|
%
|
|
%---------------------------------------------------------------------------%
|
|
|
|
:- module string.
|
|
:- interface.
|
|
|
|
:- include_module builder.
|
|
|
|
:- import_module assoc_list.
|
|
:- import_module char.
|
|
:- import_module deconstruct.
|
|
:- import_module list.
|
|
:- import_module maybe.
|
|
:- import_module ops.
|
|
:- import_module pretty_printer.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Wrapper types that associate particular semantics with raw strings.
|
|
%
|
|
% These types are useful for defining stream typeclass instances
|
|
% where you want different instances for strings representing different
|
|
% semantic entities. Using the string type itself, without a wrapper,
|
|
% would be ambiguous in such situations.
|
|
%
|
|
% While each module that associates semantics with strings could define
|
|
% its own wrapper types, the notions of lines and text files are so common
|
|
% that it is simpler to define them just once, and this is the logical
|
|
% place to do that.
|
|
%
|
|
|
|
% A line is:
|
|
%
|
|
% - a possibly empty sequence of non-newline characters terminated by a
|
|
% newline character; or
|
|
% - a non-empty sequence of non-newline characters terminated by the end
|
|
% of the file.
|
|
%
|
|
:- type line
|
|
---> line(string).
|
|
|
|
% A text file is a possibly empty sequence of characters
|
|
% terminated by the end of the file.
|
|
%
|
|
:- type text_file
|
|
---> text_file(string).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
|
|
:- type string_encoding
|
|
---> utf8
|
|
; utf16.
|
|
|
|
% Return the internal string encoding on the current platform.
|
|
%
|
|
:- func internal_string_encoding = string_encoding.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Conversions between strings and lists of characters.
|
|
%
|
|
|
|
% Convert the string to a list of characters (code points).
|
|
%
|
|
% If strings use UTF-8 encoding, then each code unit in an ill-formed
|
|
% sequence is replaced by U+FFFD REPLACEMENT CHARACTER in the list.
|
|
% If strings use UTF-16 encoding, then each unpaired surrogate code point
|
|
% is returned as a separate code point in the list.
|
|
%
|
|
:- func to_char_list(string) = list(char).
|
|
:- pred to_char_list(string::in, list(char)::out) is det.
|
|
|
|
% Convert the string to a list of characters (code points) in reverse
|
|
% order.
|
|
%
|
|
% If strings use UTF-8 encoding, then each code unit in an ill-formed
|
|
% sequence is replaced by U+FFFD REPLACEMENT CHARACTER in the list.
|
|
% If strings use UTF-16 encoding, then each unpaired surrogate code point
|
|
% is returned as a separate code point in the list.
|
|
%
|
|
:- func to_rev_char_list(string) = list(char).
|
|
:- pred to_rev_char_list(string::in, list(char)::out) is det.
|
|
|
|
% Convert a list of characters (code points) to a string.
|
|
% Throws an exception if the list contains a null character or code point
|
|
% that cannot be encoded in a string. (Namely, surrogate code points cannot
|
|
% be encoded in UTF-8 strings.)
|
|
%
|
|
:- func from_char_list(list(char)::in) = (string::uo) is det.
|
|
:- pred from_char_list(list(char)::in, string::uo) is det.
|
|
|
|
% As above, but fail instead of throwing an exception if the list contains
|
|
% a null character or code point that cannot be encoded in a string.
|
|
%
|
|
:- pred semidet_from_char_list(list(char)::in, string::uo) is semidet.
|
|
|
|
% Same as from_char_list, except that it reverses the order
|
|
% of the characters.
|
|
% Throws an exception if the list contains a null character or code point
|
|
% that cannot be encoded in a string. (Namely, surrogate code points cannot
|
|
% be encoded in UTF-8 strings.)
|
|
%
|
|
:- func from_rev_char_list(list(char)::in) = (string::uo) is det.
|
|
:- pred from_rev_char_list(list(char)::in, string::uo) is det.
|
|
|
|
% As above, but fail instead of throwing an exception if the list contains
|
|
% a null character or code point that cannot be encoded in a string.
|
|
%
|
|
:- pred semidet_from_rev_char_list(list(char)::in, string::uo) is semidet.
|
|
|
|
% Convert a string into a list of code units of the string encoding used
|
|
% by the current process.
|
|
%
|
|
:- pred to_code_unit_list(string::in, list(int)::out) is det.
|
|
|
|
% Convert a string into a list of UTF-8 code units.
|
|
% Throws an exception if the string contains an unpaired surrogate code
|
|
% point, as the encoding of surrogate code points is prohibited in UTF-8.
|
|
%
|
|
:- pred to_utf8_code_unit_list(string::in, list(int)::out) is det.
|
|
|
|
% Convert a string into a list of UTF-16 code units.
|
|
% Throws an exception if strings use UTF-8 encoding and the given string
|
|
% contains an ill-formed code unit sequence, as arbitrary bytes cannot be
|
|
% represented in UTF-16 (even allowing for ill-formed sequences).
|
|
%
|
|
:- pred to_utf16_code_unit_list(string::in, list(int)::out) is det.
|
|
|
|
% Convert a list of code units to a string.
|
|
% Fails if the list does not contain a valid encoding of a string
|
|
% (in the encoding expected by the current process),
|
|
% or if the string would contain a null character.
|
|
%
|
|
:- pred from_code_unit_list(list(int)::in, string::uo) is semidet.
|
|
|
|
% Convert a list of code units to a string.
|
|
% The resulting string may contain ill-formed sequences.
|
|
% Fails if the list contains a code unit that is out of range
|
|
% or if the string would contain a null character.
|
|
%
|
|
:- pred from_code_unit_list_allow_ill_formed(list(int)::in, string::uo)
|
|
is semidet.
|
|
|
|
% Convert a list of UTF-8 code units to a string.
|
|
% Fails if the list does not contain a valid encoding of a string
|
|
% or if the string would contain a null character.
|
|
%
|
|
:- pred from_utf8_code_unit_list(list(int)::in, string::uo) is semidet.
|
|
|
|
% Convert a list of UTF-16 code units to a string.
|
|
% Fails if the list does not contain a valid encoding of a string
|
|
% or if the string would contain a null character.
|
|
%
|
|
:- pred from_utf16_code_unit_list(list(int)::in, string::uo) is semidet.
|
|
|
|
% duplicate_char(Char, Count, String):
|
|
%
|
|
% Construct a string consisting of Count occurrences of Char code points
|
|
% in sequence, returning the empty string if Count is less than or equal
|
|
% to zero. Throws an exception if Char is a null character or code point
|
|
% that cannot be encoded in a string. (Namely, surrogate code points cannot
|
|
% be encoded in UTF-8 strings.)
|
|
%
|
|
:- func duplicate_char(char::in, int::in) = (string::uo) is det.
|
|
:- pred duplicate_char(char::in, int::in, string::uo) is det.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Reading characters from strings.
|
|
%
|
|
|
|
% This type is used by the _repl indexing predicates to distinguish a
|
|
% U+FFFD code point that is actually in a string from a U+FFFD code point
|
|
% generated when the predicate encounters an ill-formed code unit sequence
|
|
% in a UTF-8 string.
|
|
%
|
|
:- type maybe_replaced
|
|
---> not_replaced
|
|
; replaced_code_unit(uint8).
|
|
|
|
% index(String, Index, Char):
|
|
%
|
|
% If Index is the initial code unit offset of a well-formed code unit
|
|
% sequence in String then Char is the code point encoded by that
|
|
% sequence.
|
|
%
|
|
% Otherwise, if Index is in range, Char is either a U+FFFD REPLACEMENT
|
|
% CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
|
|
% code point at Index (when strings are UTF-16 encoded).
|
|
%
|
|
% Fails if Index is out of range (negative, or greater than or equal to
|
|
% the length of String).
|
|
%
|
|
:- pred index(string::in, int::in, char::uo) is semidet.
|
|
|
|
% det_index(String, Index, Char):
|
|
%
|
|
% Like index/3 but throws an exception if Index is out of range
|
|
% (negative, or greater than or equal to the length of String).
|
|
%
|
|
:- func det_index(string, int) = char.
|
|
:- pred det_index(string::in, int::in, char::uo) is det.
|
|
|
|
% unsafe_index(String, Index, Char):
|
|
%
|
|
% Like index/3 but does not check that Index is in range.
|
|
%
|
|
% WARNING: behavior is UNDEFINED if Index is out of range
|
|
% (negative, or greater than or equal to the length of String).
|
|
% This version is constant time, whereas det_index
|
|
% may be linear in the length of the string. Use with care!
|
|
%
|
|
:- func unsafe_index(string, int) = char.
|
|
:- pred unsafe_index(string::in, int::in, char::uo) is det.
|
|
|
|
% A synonym for det_index/2:
|
|
% String ^ elem(Index) = det_index(String, Index).
|
|
%
|
|
:- func string ^ elem(int) = char.
|
|
|
|
% A synonym for unsafe_index/2:
|
|
% String ^ unsafe_elem(Index) = unsafe_index(String, Index).
|
|
%
|
|
:- func string ^ unsafe_elem(int) = char.
|
|
|
|
% index_next(String, Index, NextIndex, Char):
|
|
%
|
|
% Succeeds if and only if Index is between 0 and Len-1 (both inclusive)
|
|
% where Len is the number of code units in String.
|
|
%
|
|
% If Index is the initial code unit offset of a well-formed code unit
|
|
% sequence in String, then Char will be set to the code point encoded
|
|
% by that sequence, and NextIndex will be set to the offset of the code
|
|
% unit immediately following that sequence.
|
|
%
|
|
% If Index is *not* the initial code unit offset of a well-formed
|
|
% code unit sequence, NextIndex will be set to Index + 1, but the value
|
|
% of Char will depend on string encoding used by the target platform.
|
|
%
|
|
% - On platforms that encode strings using UTF-8 (i.e. when targeting C)
|
|
% Char will be set to U+FFFD (the Unicode replacement character).
|
|
%
|
|
% - On platforms that encode strings using UTF-16 (i.e. when targeting
|
|
% C# or Java), Char will be set to the unpaired surrogate code point
|
|
% at Index. (For more details, see the comment just below.)
|
|
%
|
|
:- pred index_next(string::in, int::in, int::out, char::uo) is semidet.
|
|
|
|
% index_next_repl(String, Index, NextIndex, Char, MaybeReplaced):
|
|
%
|
|
% Does the same job as index_next/4 but on success, it also returns
|
|
% MaybeReplaced, which will specify whether Char is the result
|
|
% of the replacement of a non-well-formed UTF-8 character with U+FFFD.
|
|
%
|
|
% On platforms that encode strings using UTF-8 (i.e. when targeting C),
|
|
% there are three cases.
|
|
%
|
|
% - If Char is not U+FFFD, then MaybeReplaced will be `not_replaced'.
|
|
%
|
|
% - If Char is U+FFFD because there is a well-formed code point encoded
|
|
% in String starting at Index, and that code point is U+FFFD, then
|
|
% MaybeReplaced will also be `not_replaced'.
|
|
%
|
|
% - If Char is U+FFFD but there is *no* well formed code point encoded
|
|
% in String starting at Index, then MaybeReplaced will be
|
|
% `replaced_code_unit(CodeUnit)', where CodeUnit is the code unit
|
|
% at offset Index in String.
|
|
%
|
|
% On platforms that encode strings using UTF-16 (i.e. when targeting C#
|
|
% or Java), MaybeReplaced will always be bound to `not_replaced'.
|
|
% The only ways that a UTF-16 string may be non-well-formed are
|
|
%
|
|
% - by having a high surrogate code unit (between 0xD800 and 0xDBFF)
|
|
% that is not immediately followed by a low surrogate code unit
|
|
% (between 0xDC00 and 0xDFFF), or
|
|
%
|
|
% - by having a low surrogate code unit that is not immediately preceded
|
|
% by a high surrogate code unit.
|
|
%
|
|
% In both cases, index_next_repl will return the unpaired surrogate
|
|
% unchanged as Char. There is no replacement required, because
|
|
%
|
|
% - surrogate code units are all in the range 0xD800 to 0xDFFF, and
|
|
% - the Unicode standard deliberately does not assign any characters
|
|
% to the code points in this range.
|
|
%
|
|
% This means that if Char is in this range, then it must be an unpaired
|
|
% surrogate, but since Char actually appears in String, it won't be
|
|
% a *replacement* of another character.
|
|
%
|
|
:- pred index_next_repl(string::in, int::in, int::out, char::uo,
|
|
maybe_replaced::out) is semidet.
|
|
|
|
% unsafe_index_next(String, Index, NextIndex, Char):
|
|
%
|
|
% Like index_next/4 but does not check that Index is in range.
|
|
% Fails if Index is equal to the length of String.
|
|
%
|
|
% WARNING: behavior is UNDEFINED if Index is out of range
|
|
% (negative, or greater than the length of String).
|
|
%
|
|
:- pred unsafe_index_next(string::in, int::in, int::out, char::uo) is semidet.
|
|
|
|
% unsafe_index_next_repl(String, Index, NextIndex, Char, MaybeReplaced):
|
|
%
|
|
% Like index_next_repl/5 but does not check that Index is in range.
|
|
% Fails if Index is equal to the length of String.
|
|
%
|
|
% WARNING: behavior is UNDEFINED if Index is out of range
|
|
% (negative, or greater than the length of String).
|
|
%
|
|
:- pred unsafe_index_next_repl(string::in, int::in, int::out, char::uo,
|
|
maybe_replaced::out) is semidet.
|
|
|
|
% prev_index(String, Index, PrevIndex, Char):
|
|
%
|
|
% If Index - 1 is the final code unit offset of a well-formed sequence in
|
|
% String then Char is the code point encoded by that sequence, and
|
|
% PrevIndex is the initial code unit offset of that sequence.
|
|
%
|
|
% Otherwise, if Index is in range, Char is either a U+FFFD REPLACEMENT
|
|
% CHARACTER (when strings are UTF-8 encoded) or the unpaired surrogate
|
|
% code point at Index - 1 (when strings are UTF-16 encoded), and
|
|
% PrevIndex is Index - 1.
|
|
%
|
|
% Fails if Index is out of range (non-positive, or greater than the
|
|
% length of String).
|
|
%
|
|
:- pred prev_index(string::in, int::in, int::out, char::uo) is semidet.
|
|
|
|
% prev_index_repl(String, Index, PrevIndex, Char, MaybeReplaced):
|
|
%
|
|
% Like prev_index/4 but also returns MaybeReplaced on success.
|
|
% When Char is not U+FFFD, then MaybeReplaced is always `not_replaced'.
|
|
% When Char is U+FFFD (the Unicode replacement character), then there are
|
|
% two cases:
|
|
%
|
|
% - If there is a U+FFFD code point encoded in String at
|
|
% [PrevIndex, Index) then MaybeReplaced is `not_replaced'.
|
|
%
|
|
% - Otherwise, MaybeReplaced is `replaced_code_unit(CodeUnit)' where
|
|
% CodeUnit is the code unit in String at Index - 1.
|
|
%
|
|
:- pred prev_index_repl(string::in, int::in, int::out, char::uo,
|
|
maybe_replaced::out) is semidet.
|
|
|
|
% unsafe_prev_index(String, Index, PrevIndex, Char):
|
|
%
|
|
% Like prev_index/4 but does not check that Index is in range.
|
|
% Fails if Index is zero.
|
|
%
|
|
% WARNING: behavior is UNDEFINED if Index is out of range
|
|
% (negative, or greater than the length of String).
|
|
%
|
|
:- pred unsafe_prev_index(string::in, int::in, int::out, char::uo) is semidet.
|
|
|
|
% unsafe_prev_index_repl(String, Index, PrevIndex, Char, MaybeReplaced):
|
|
%
|
|
% Like prev_index_repl/5 but does not check that Index is in range.
|
|
% Fails if Index is zero.
|
|
%
|
|
% WARNING: behavior is UNDEFINED if Index is out of range
|
|
% (negative, or greater than the length of String).
|
|
%
|
|
:- pred unsafe_prev_index_repl(string::in, int::in, int::out, char::uo,
|
|
maybe_replaced::out) is semidet.
|
|
|
|
% unsafe_index_code_unit(String, Index, CodeUnit):
|
|
%
|
|
% CodeUnit is the code unit in String at the offset Index.
|
|
% WARNING: behavior is UNDEFINED if Index is out of range
|
|
% (negative, or greater than or equal to the length of String).
|
|
%
|
|
:- pred unsafe_index_code_unit(string::in, int::in, int::out) is det.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Writing characters to strings.
|
|
%
|
|
|
|
% set_char(Char, Index, String0, String):
|
|
%
|
|
% String is String0, with the code unit sequence beginning at Index
|
|
% replaced by the encoding of Char. If the code unit at Index is the
|
|
% initial code unit in a valid encoding of a code point, then that entire
|
|
% code unit sequence is replaced. Otherwise, only the code unit at Index
|
|
% is replaced.
|
|
%
|
|
% Fails if Index is out of range (negative, or greater than or equal to
|
|
% the length of String0).
|
|
%
|
|
% Throws an exception if Char is the null character or a code point that
|
|
% cannot be encoded in a string (namely, surrogate code points cannot be
|
|
% encoded in UTF-8 strings).
|
|
%
|
|
:- pred set_char(char, int, string, string).
|
|
:- mode set_char(in, in, in, out) is semidet.
|
|
% NOTE This mode is disabled because the compiler puts constant strings
|
|
% into static data even when they might be updated.
|
|
% :- mode set_char(in, in, di, uo) is semidet.
|
|
|
|
% det_set_char(Char, Index, String0, String):
|
|
%
|
|
% Same as set_char/4 but throws an exception if Index is out of range
|
|
% (negative, or greater than or equal to the length of String0).
|
|
%
|
|
:- func det_set_char(char, int, string) = string.
|
|
:- pred det_set_char(char, int, string, string).
|
|
:- mode det_set_char(in, in, in, out) is det.
|
|
% NOTE This mode is disabled because the compiler puts constant strings
|
|
% into static data even when they might be updated.
|
|
% :- mode det_set_char(in, in, di, uo) is det.
|
|
|
|
% unsafe_set_char(Char, Index, String0, String):
|
|
%
|
|
% Same as set_char/4 but does not check if Index is in range.
|
|
% WARNING: behavior is UNDEFINED if Index is out of range
|
|
% (negative, or greater than or equal to the length of String0).
|
|
% Use with care!
|
|
%
|
|
:- func unsafe_set_char(char, int, string) = string.
|
|
:- mode unsafe_set_char(in, in, in) = out is det.
|
|
% NOTE This mode is disabled because the compiler puts constant strings
|
|
% into static data even when they might be updated.
|
|
% :- mode unsafe_set_char(in, in, di) = uo is det.
|
|
:- pred unsafe_set_char(char, int, string, string).
|
|
:- mode unsafe_set_char(in, in, in, out) is det.
|
|
% NOTE This mode is disabled because the compiler puts constant strings
|
|
% into static data even when they might be updated.
|
|
% :- mode unsafe_set_char(in, in, di, uo) is det.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Determining the lengths of strings.
|
|
%
|
|
|
|
% Determine the length of a string, in code units.
|
|
% An empty string has length zero.
|
|
%
|
|
% NOTE: code points (characters) are encoded using one or more code units,
|
|
% i.e. bytes for UTF-8; 16-bit integers for UTF-16.
|
|
%
|
|
:- func length(string::in) = (int::uo) is det.
|
|
:- pred length(string, int).
|
|
:- mode length(in, uo) is det.
|
|
:- mode length(ui, uo) is det.
|
|
|
|
% Synonyms for length.
|
|
%
|
|
:- func count_code_units(string) = int.
|
|
:- pred count_code_units(string::in, int::out) is det.
|
|
|
|
% Determine the number of code points in a string.
|
|
%
|
|
% Each valid code point, and each code unit that is part of an ill-formed
|
|
% sequence, contributes one to the result.
|
|
% (This matches the number of steps it would take to iterate over the
|
|
% string using string.index_next or string.prev_index.)
|
|
%
|
|
:- func count_code_points(string) = int.
|
|
:- pred count_code_points(string::in, int::out) is det.
|
|
|
|
% NOTE We are changing all occurrences of "codepoint" in the
|
|
% names of predicates and functions to "code_point", for consistency
|
|
% with predicate and function names that talk about code_units.
|
|
%
|
|
:- func count_codepoints(string) = int.
|
|
:- pred count_codepoints(string::in, int::out) is det.
|
|
:- pragma obsolete(func(count_codepoints/1), [count_code_points/1]).
|
|
:- pragma obsolete(pred(count_codepoints/2), [count_code_points/2]).
|
|
|
|
% count_utf8_code_units(String) = Length:
|
|
%
|
|
% Return the number of code units required to represent a string in
|
|
% UTF-8 encoding (with allowance for ill-formed sequences).
|
|
% Equivalent to Length = length(to_utf8_code_unit_list(String)).
|
|
%
|
|
% Throws an exception if strings use UTF-16 encoding but the given string
|
|
% contains an unpaired surrogate code point. Surrogate code points cannot
|
|
% be represented in UTF-8.
|
|
%
|
|
:- func count_utf8_code_units(string) = int.
|
|
|
|
% code_point_offset(String, StartOffset, Count, Offset):
|
|
%
|
|
% Let S be the substring of String from code unit StartOffset to the
|
|
% end of the string. Offset is code unit offset after advancing Count
|
|
% steps in S, where each step skips over either:
|
|
% - one encoding of a Unicode code point, or
|
|
% - one code unit that is part of an ill-formed sequence.
|
|
%
|
|
% Fails if StartOffset is out of range (negative, or greater than the
|
|
% length of String), or if there are fewer than Count steps possible in S.
|
|
%
|
|
:- pred code_point_offset(string::in, int::in, int::in, int::out) is semidet.
|
|
|
|
% NOTE We are changing all occurrences of "codepoint" in the
|
|
% names of predicates and functions to "code_point", for consistency
|
|
% with predicate and function names that talk about code_units.
|
|
%
|
|
:- pred codepoint_offset(string::in, int::in, int::in, int::out) is semidet.
|
|
:- pragma obsolete(pred(codepoint_offset/4), [code_point_offset/4]).
|
|
|
|
% code_point_offset(String, Count, Offset):
|
|
%
|
|
% Same as `code_point_offset(String, 0, Count, Offset)'.
|
|
%
|
|
:- pred code_point_offset(string::in, int::in, int::out) is semidet.
|
|
|
|
% NOTE We are changing all occurrences of "codepoint" in the
|
|
% names of predicates and functions to "code_point", for consistency
|
|
% with predicate and function names that talk about code_units.
|
|
%
|
|
:- pred codepoint_offset(string::in, int::in, int::out) is semidet.
|
|
:- pragma obsolete(pred(codepoint_offset/3), [code_point_offset/3]).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Computing hashes of strings.
|
|
%
|
|
|
|
% Compute a hash value for a string.
|
|
%
|
|
:- func hash(string) = int.
|
|
:- pred hash(string::in, int::out) is det.
|
|
|
|
% Two other hash functions for strings.
|
|
%
|
|
:- func hash2(string) = int.
|
|
:- func hash3(string) = int.
|
|
|
|
% Cross-compilation-friendly versions of hash, hash2 and hash3
|
|
% respectively.
|
|
:- func hash4(string) = int.
|
|
:- func hash5(string) = int.
|
|
:- func hash6(string) = int.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Tests on strings.
|
|
%
|
|
|
|
% True if string is the empty string.
|
|
%
|
|
:- pred is_empty(string::in) is semidet.
|
|
|
|
% True if the string is a valid UTF-8 or UTF-16 string.
|
|
% In target languages that use UTF-8 string encoding, `is_well_formed(S)'
|
|
% is true if-and-only-if S consists of a well-formed UTF-8 code unit
|
|
% sequence.
|
|
% In target languages that use UTF-16 string encoding, `is_well_formed(S)'
|
|
% is true if-and-only-if S consists of a well-formed UTF-16 code unit
|
|
% sequence.
|
|
%
|
|
:- pred is_well_formed(string::in) is semidet.
|
|
|
|
% Values of this type record whether a string is well or ill formed.
|
|
% In the latter case, the integer gives the offset in the string
|
|
% (as a count of either UTF-8 or UTF-16 code units, depending on the
|
|
% target language) of the first position at which the string departs
|
|
% from well-formedness.
|
|
%
|
|
:- type well_or_ill_formed
|
|
---> well_formed
|
|
; ill_formed(int).
|
|
|
|
% Does the same job as is_well_formed, but if the string is NOT well
|
|
% formed, it will return the offset (as a count of code units) of the
|
|
% first position at which the string departs from well-formedness.
|
|
%
|
|
:- pred check_well_formedness(string::in, well_or_ill_formed::out) is det.
|
|
|
|
% True if string contains only alphabetic characters [A-Za-z].
|
|
%
|
|
:- pred is_all_alpha(string::in) is semidet.
|
|
|
|
% True if string contains only alphabetic characters [A-Za-z] and digits
|
|
% [0-9].
|
|
%
|
|
:- pred is_all_alnum(string::in) is semidet.
|
|
|
|
% True if string contains only alphabetic characters [A-Za-z] and
|
|
% underscores.
|
|
%
|
|
:- pred is_all_alpha_or_underscore(string::in) is semidet.
|
|
|
|
% True if string contains only alphabetic characters [A-Za-z],
|
|
% digits [0-9], and underscores.
|
|
%
|
|
:- pred is_all_alnum_or_underscore(string::in) is semidet.
|
|
|
|
% True if the string contains only decimal digits (0-9).
|
|
%
|
|
:- pred is_all_digits(string::in) is semidet.
|
|
|
|
% all_match(TestPred, String):
|
|
%
|
|
% True if-and-only-if all code points in String satisfy TestPred,
|
|
% and String contains no ill-formed code unit sequences.
|
|
%
|
|
:- pred all_match(pred(char)::in(pred(in) is semidet), string::in) is semidet.
|
|
|
|
% contains_match(TestPred, String):
|
|
%
|
|
% True if-and-only-if String contains at least one code point
|
|
% that satisfies TestPred. Any ill-formed code unit sequences in String
|
|
% are ignored as they do not encode code points.
|
|
%
|
|
:- pred contains_match(pred(char)::in(pred(in) is semidet), string::in)
|
|
is semidet.
|
|
|
|
% contains_char(String, Char):
|
|
%
|
|
% Succeed if the code point Char occurs in String.
|
|
% Any ill-formed code unit sequences within String are ignored
|
|
% as they will not contain Char.
|
|
%
|
|
:- pred contains_char(string::in, char::in) is semidet.
|
|
|
|
% compare_substrings(Res, X, StartX, Y, StartY, Length):
|
|
%
|
|
% Compare two substrings by code unit order. The two substrings are
|
|
% the substring of X between StartX and StartX + Length, and
|
|
% the substring of Y between StartY and StartY + Length.
|
|
% StartX, StartY and Length are all in terms of code units.
|
|
%
|
|
% Fails if StartX or StartX + Length are not within [0, length(X)],
|
|
% or if StartY or StartY + Length are not within [0, length(Y)],
|
|
% or if Length is negative.
|
|
%
|
|
:- pred compare_substrings(comparison_result::uo, string::in, int::in,
|
|
string::in, int::in, int::in) is semidet.
|
|
|
|
% unsafe_compare_substrings(Res, X, StartX, Y, StartY, Length):
|
|
%
|
|
% Same as compare_substrings/6 but without range checks.
|
|
% WARNING: if any of StartX, StartY, StartX + Length or
|
|
% StartY + Length are out of range, or if Length is negative,
|
|
% then the behaviour is UNDEFINED. Use with care!
|
|
%
|
|
:- pred unsafe_compare_substrings(comparison_result::uo, string::in, int::in,
|
|
string::in, int::in, int::in) is det.
|
|
|
|
% compare_ignore_case_ascii(Res, X, Y):
|
|
%
|
|
% Compare two strings by code unit order, ignoring the case of letters
|
|
% (A-Z, a-z) in the ASCII range.
|
|
% Equivalent to `compare(Res, to_lower(X), to_lower(Y))'
|
|
% but more efficient.
|
|
%
|
|
:- pred compare_ignore_case_ascii(comparison_result::uo,
|
|
string::in, string::in) is det.
|
|
|
|
% prefix_length(Pred, String):
|
|
%
|
|
% The length (in code units) of the maximal prefix of String consisting
|
|
% entirely of code points satisfying Pred.
|
|
%
|
|
:- func prefix_length(pred(char)::in(pred(in) is semidet), string::in)
|
|
= (int::out) is det.
|
|
|
|
% suffix_length(Pred, String):
|
|
%
|
|
% The length (in code units) of the maximal suffix of String consisting
|
|
% entirely of code points satisfying Pred.
|
|
%
|
|
:- func suffix_length(pred(char)::in(pred(in) is semidet), string::in)
|
|
= (int::out) is det.
|
|
|
|
% sub_string_search(String, SubString, Index):
|
|
%
|
|
% Index is the code unit position in String where the first
|
|
% occurrence of SubString begins. Indices start at zero, so if
|
|
% SubString is a prefix of String, this will return Index = 0.
|
|
%
|
|
:- pred sub_string_search(string::in, string::in, int::out) is semidet.
|
|
|
|
% sub_string_search_start(String, SubString, BeginAt, Index):
|
|
%
|
|
% Index is the code unit position in String where the first
|
|
% occurrence of SubString occurs such that 'Index' is greater than or
|
|
% equal to BeginAt. Indices start at zero.
|
|
% Fails if either BeginAt is negative, or greater than
|
|
% length(String) - length(SubString).
|
|
%
|
|
:- pred sub_string_search_start(string::in, string::in, int::in, int::out)
|
|
is semidet.
|
|
|
|
% unsafe_sub_string_search_start(String, SubString, BeginAt, Index):
|
|
%
|
|
% Same as sub_string_search_start/4 but does not check that BeginAt
|
|
% is in range.
|
|
% WARNING: if BeginAt is either negative, or greater than length(String),
|
|
% then the behaviour is UNDEFINED. Use with care!
|
|
%
|
|
:- pred unsafe_sub_string_search_start(string::in, string::in, int::in,
|
|
int::out) is semidet.
|
|
|
|
% find_first_char(String, Char, Index):
|
|
%
|
|
% Find the first occurrence of the code point Char in String.
|
|
% On success, Index is the code unit offset of that code point.
|
|
%
|
|
:- pred find_first_char(string::in, char::in, int::out) is semidet.
|
|
|
|
% find_first_char_start(String, Char, BeginAt, Index):
|
|
%
|
|
% Find the first occurrence of the code point Char in String,
|
|
% beginning from the code unit offset BeginAt in String.
|
|
% On success, Index is the code unit offset of that code point.
|
|
%
|
|
% Fails if BeginAt is out of range (negative, or greater than or equal
|
|
% to the length of String).
|
|
%
|
|
:- pred find_first_char_start(string::in, char::in, int::in, int::out)
|
|
is semidet.
|
|
|
|
% unsafe_find_first_char_start(String, Char, BeginAt, Index):
|
|
%
|
|
% Same as find_first_char_start/4 but does not check that BeginAt
|
|
% is in range.
|
|
% WARNING: if BeginAt is either negative, or greater than length(String),
|
|
% then the behaviour is UNDEFINED. Use with care!
|
|
%
|
|
:- pred unsafe_find_first_char_start(string::in, char::in, int::in, int::out)
|
|
is semidet.
|
|
|
|
% find_last_char(String, Char, Index):
|
|
%
|
|
% Find the last occurrence of the code point Char in String.
|
|
% On success, Index is the code unit offset of that code point.
|
|
%
|
|
:- pred find_last_char(string::in, char::in, int::out) is semidet.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Appending strings.
|
|
%
|
|
|
|
% Append two strings together.
|
|
%
|
|
:- func append(string::in, string::in) = (string::uo) is det.
|
|
|
|
% append(S1, S2, S3):
|
|
%
|
|
% Append two strings together. S3 consists of the code units of S1
|
|
% followed by the code units of S2, in order.
|
|
%
|
|
% An ill-formed code unit sequence at the end of S1 may join with an
|
|
% ill-formed code unit sequence at the start of S2 to produce a valid
|
|
% encoding of a code point in S3.
|
|
%
|
|
:- pred append(string, string, string).
|
|
:- mode append(in, in, in) is semidet. % implied
|
|
:- mode append(in, uo, in) is semidet.
|
|
:- mode append(in, in, uo) is det.
|
|
:- mode append(uo, in, in) is semidet.
|
|
|
|
% nondet_append(S1, S2, S3):
|
|
%
|
|
% Non-deterministically return S1 and S2, where S1 ++ S2 = S3.
|
|
% S3 is split after each code point or code unit in an ill-formed sequence.
|
|
%
|
|
:- pred nondet_append(string::out, string::out, string::in) is multi.
|
|
|
|
% S1 ++ S2 = S :- append(S1, S2, S).
|
|
%
|
|
% Append two strings together using nicer inline syntax.
|
|
%
|
|
:- func string ++ string = string.
|
|
:- mode in ++ in = uo is det.
|
|
|
|
% Append a list of strings together.
|
|
%
|
|
:- func append_list(list(string)::in) = (string::uo) is det.
|
|
:- pred append_list(list(string)::in, string::uo) is det.
|
|
|
|
% join_list(Separator, Strings) = JoinedString:
|
|
%
|
|
% Append together the strings in Strings, putting Separator between
|
|
% each pair of adjacent strings. If Strings is the empty list,
|
|
% return the empty string.
|
|
%
|
|
:- func join_list(string::in, list(string)::in) = (string::uo) is det.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Making strings from smaller pieces.
|
|
%
|
|
|
|
:- type string_piece
|
|
---> string(string)
|
|
; substring(string, int, int). % string, start, end offset
|
|
|
|
% append_string_pieces(Pieces, String):
|
|
%
|
|
% Append together the strings and substrings in Pieces into a string.
|
|
% Throws an exception if Pieces contains an element
|
|
% `substring(S, Start, End)' where Start or End are not within
|
|
% the range [0, length(S)], or if Start > End.
|
|
%
|
|
:- pred append_string_pieces(list(string_piece)::in, string::uo) is det.
|
|
|
|
% Same as append_string_pieces/2 but without range checks.
|
|
% WARNING: if any piece `substring(S, Start, End)' has Start or End
|
|
% outside the range [0, length(S)], or if Start > End,
|
|
% then the behaviour is UNDEFINED. Use with care!
|
|
%
|
|
:- pred unsafe_append_string_pieces(list(string_piece)::in, string::uo)
|
|
is det.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Splitting up strings.
|
|
%
|
|
|
|
% first_char(String, Char, Rest) is true if-and-only-if String begins
|
|
% with a well-formed code unit sequence, Char is the code point encoded by
|
|
% that sequence, and Rest is the rest of String after that sequence.
|
|
%
|
|
% The (uo, in, in) mode throws an exception if Char cannot be encoded in
|
|
% a string, or if Char is a surrogate code point (for consistency with
|
|
% the other modes).
|
|
%
|
|
% WARNING: first_char makes a copy of Rest because the garbage collector
|
|
% doesn't handle references into the middle of an object, at least not the
|
|
% way we use it. This means that repeated use of first_char to iterate
|
|
% over a string will result in very poor performance. If you want to
|
|
% iterate over the characters in a string, use foldl or to_char_list
|
|
% instead.
|
|
%
|
|
:- pred first_char(string, char, string).
|
|
:- mode first_char(in, in, in) is semidet. % implied
|
|
:- mode first_char(in, uo, in) is semidet. % implied
|
|
:- mode first_char(in, in, uo) is semidet. % implied
|
|
:- mode first_char(in, uo, uo) is semidet.
|
|
:- mode first_char(uo, in, in) is det.
|
|
|
|
% split(String, Index, LeftSubstring, RightSubstring):
|
|
%
|
|
% Split a string into two substrings at the code unit offset Index.
|
|
% (If Index is out of the range [0, length of String], it is treated
|
|
% as if it were the nearest end-point of that range.)
|
|
%
|
|
:- pred split(string::in, int::in, string::out, string::out) is det.
|
|
|
|
% split_by_code_point(String, Count, LeftSubstring, RightSubstring):
|
|
%
|
|
% LeftSubstring is the left-most Count code points of String,
|
|
% and RightSubstring is the remainder of String.
|
|
% (If Count is out of the range [0, length of String], it is treated
|
|
% as if it were the nearest end-point of that range.)
|
|
%
|
|
:- pred split_by_code_point(string::in, int::in, string::out, string::out)
|
|
is det.
|
|
|
|
% NOTE We are changing all occurrences of "codepoint" in the
|
|
% names of predicates and functions to "code_point", for consistency
|
|
% with predicate and function names that talk about code_units.
|
|
%
|
|
:- pred split_by_codepoint(string::in, int::in, string::out, string::out)
|
|
is det.
|
|
:- pragma obsolete(pred(split_by_codepoint/4), [split_by_code_point/4]).
|
|
|
|
% left(String, Count, LeftSubstring):
|
|
%
|
|
% LeftSubstring is the left-most Count code units of String.
|
|
% (If Count is out of the range [0, length of String], it is treated
|
|
% as if it were the nearest end-point of that range.)
|
|
%
|
|
:- func left(string::in, int::in) = (string::out) is det.
|
|
:- pred left(string::in, int::in, string::out) is det.
|
|
|
|
% left_by_code_point(String, Count, LeftSubstring):
|
|
%
|
|
% LeftSubstring is the left-most Count code points of String.
|
|
% (If Count is out of the range [0, length of String], it is treated
|
|
% as if it were the nearest end-point of that range.)
|
|
%
|
|
:- func left_by_code_point(string::in, int::in) = (string::out) is det.
|
|
:- pred left_by_code_point(string::in, int::in, string::out) is det.
|
|
|
|
% NOTE We are changing all occurrences of "codepoint" in the
|
|
% names of predicates and functions to "code_point", for consistency
|
|
% with predicate and function names that talk about code_units.
|
|
%
|
|
:- func left_by_codepoint(string::in, int::in) = (string::out) is det.
|
|
:- pred left_by_codepoint(string::in, int::in, string::out) is det.
|
|
:- pragma obsolete(func(left_by_codepoint/2), [left_by_code_point/2]).
|
|
:- pragma obsolete(pred(left_by_codepoint/3), [left_by_code_point/3]).
|
|
|
|
% right(String, Count, RightSubstring):
|
|
%
|
|
% RightSubstring is the right-most Count code units of String.
|
|
% (If Count is out of the range [0, length of String], it is treated
|
|
% as if it were the nearest end-point of that range.)
|
|
%
|
|
:- func right(string::in, int::in) = (string::out) is det.
|
|
:- pred right(string::in, int::in, string::out) is det.
|
|
|
|
% right_by_code_point(String, Count, RightSubstring):
|
|
%
|
|
% RightSubstring is the right-most Count code points of String.
|
|
% (If Count is out of the range [0, length of String], it is treated
|
|
% as if it were the nearest end-point of that range.)
|
|
%
|
|
:- func right_by_code_point(string::in, int::in) = (string::out) is det.
|
|
:- pred right_by_code_point(string::in, int::in, string::out) is det.
|
|
|
|
% NOTE We are changing all occurrences of "codepoint" in the
|
|
% names of predicates and functions to "code_point", for consistency
|
|
% with predicate and function names that talk about code_units.
|
|
%
|
|
:- func right_by_codepoint(string::in, int::in) = (string::out) is det.
|
|
:- pred right_by_codepoint(string::in, int::in, string::out) is det.
|
|
:- pragma obsolete(func(right_by_codepoint/2), [right_by_code_point/2]).
|
|
:- pragma obsolete(pred(right_by_codepoint/3), [right_by_code_point/3]).
|
|
|
|
% between(String, Start, End, Substring):
|
|
%
|
|
% Substring consists of the segment of String within the half-open
|
|
% interval [Start, End), where Start and End are code unit offsets.
|
|
% (If Start is out of the range [0, length of String], it is treated
|
|
% as if it were the nearest end-point of that range.
|
|
% If End is out of the range [Start, length of String],
|
|
% it is treated as if it were the nearest end-point of that range.)
|
|
%
|
|
:- func between(string::in, int::in, int::in) = (string::uo) is det.
|
|
:- pred between(string::in, int::in, int::in, string::uo) is det.
|
|
|
|
% between_code_points(String, Start, End, Substring):
|
|
%
|
|
% Substring is the part of String between the code point positions
|
|
% Start and End. The result is equivalent to:
|
|
%
|
|
% between(String, StartOffset, EndOffset, Substring)
|
|
%
|
|
% where:
|
|
%
|
|
% StartOffset is from code_point_offset(String, Start, StartOffset)
|
|
% if Start is in [0, count_code_points(String)],
|
|
% StartOffset = 0 if Start < 0,
|
|
% StartOffset = length(String) otherwise;
|
|
%
|
|
% EndOffset is from code_point_offset(String, End, EndOffset)
|
|
% if End is in [0, count_code_points(String)],
|
|
% EndOffset = 0 if End < 0,
|
|
% EndOffset = length(String) otherwise.
|
|
%
|
|
% between/4 will enforce StartOffset =< EndOffset.
|
|
%
|
|
:- func between_code_points(string::in, int::in, int::in)
|
|
= (string::uo) is det.
|
|
:- pred between_code_points(string::in, int::in, int::in, string::uo) is det.
|
|
|
|
% NOTE We are changing all occurrences of "codepoint" in the
|
|
% names of predicates and functions to "code_point", for consistency
|
|
% with predicate and function names that talk about code_units.
|
|
%
|
|
:- func between_codepoints(string::in, int::in, int::in)
|
|
= (string::uo) is det.
|
|
:- pred between_codepoints(string::in, int::in, int::in, string::uo) is det.
|
|
:- pragma obsolete(func(between_codepoints/3), [between_code_points/3]).
|
|
:- pragma obsolete(pred(between_codepoints/4), [between_code_points/4]).
|
|
|
|
% unsafe_between(String, Start, End, Substring):
|
|
%
|
|
% Substring consists of the segment of String within the half-open
|
|
% interval [Start, End), where Start and End are code unit offsets.
|
|
% WARNING: if Start is out of the range [0, length of String] or
|
|
% End is out of the range [Start, length of String]
|
|
% then the behaviour is UNDEFINED. Use with care!
|
|
% This version takes time proportional to the length of the substring,
|
|
% whereas substring may take time proportional to the length
|
|
% of the whole string.
|
|
%
|
|
:- func unsafe_between(string::in, int::in, int::in) = (string::uo) is det.
|
|
:- pred unsafe_between(string::in, int::in, int::in, string::uo) is det.
|
|
|
|
% words_separator(SepP, String) returns the list of non-empty
|
|
% substrings of String (in first to last order) that are delimited
|
|
% by non-empty sequences of code points matched by SepP.
|
|
% For example,
|
|
%
|
|
% words_separator(char.is_whitespace, " the cat sat on the mat") =
|
|
% ["the", "cat", "sat", "on", "the", "mat"]
|
|
%
|
|
% Note the difference to split_at_separator.
|
|
%
|
|
:- func words_separator(pred(char), string) = list(string).
|
|
:- mode words_separator(in(pred(in) is semidet), in) = out is det.
|
|
|
|
% words(String) =
|
|
% words_separator(char.is_whitespace, String).
|
|
%
|
|
:- func words(string) = list(string).
|
|
|
|
% split_at_separator(SepP, String) returns the list of (possibly empty)
|
|
% substrings of String (in first to last order) that are delimited
|
|
% by code points matched by SepP. For example,
|
|
%
|
|
% split_at_separator(char.is_whitespace, " the cat sat on the mat")
|
|
% = ["", "the", "cat", "", "sat", "on", "the", "", "mat"]
|
|
%
|
|
% Note the difference to words_separator.
|
|
%
|
|
:- func split_at_separator(pred(char), string) = list(string).
|
|
:- mode split_at_separator(in(pred(in) is semidet), in) = out is det.
|
|
|
|
% split_at_char(Char, String) =
|
|
% split_at_separator(unify(Char), String)
|
|
%
|
|
:- func split_at_char(char, string) = list(string).
|
|
|
|
% split_at_string(Separator, String) returns the list of substrings
|
|
% of String that are delimited by Separator. For example,
|
|
%
|
|
% split_at_string("|||", "|||fld2|||fld3") = ["", "fld2", "fld3"]
|
|
%
|
|
% Always the first match of Separator is used to break the String, for
|
|
% example: split_at_string("aa", "xaaayaaaz") = ["x", "ay", "az"]
|
|
%
|
|
:- func split_at_string(string, string) = list(string).
|
|
|
|
% split_into_lines(String) breaks String into a sequence of lines,
|
|
% with each line consisting of a possibly empty sequence of non-newline
|
|
% characters, followed either by a newline character, or by the end
|
|
% of the string. The string returned for a line will not contain
|
|
% the newline character.
|
|
%
|
|
:- func split_into_lines(string) = list(string).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Dealing with prefixes and suffixes.
|
|
%
|
|
|
|
% prefix(String, Prefix) is true if-and-only-if
|
|
% Prefix is a prefix of String.
|
|
% Same as append(Prefix, _, String).
|
|
%
|
|
:- pred prefix(string::in, string::in) is semidet.
|
|
|
|
% suffix(String, Suffix) is true if-and-only-if
|
|
% Suffix is a suffix of String.
|
|
% Same as append(_, Suffix, String).
|
|
%
|
|
:- pred suffix(string::in, string::in) is semidet.
|
|
|
|
% remove_prefix(Prefix, String, Suffix):
|
|
%
|
|
% This is a synonym for append(Prefix, Suffix, String) but with the
|
|
% arguments in a more convenient order for use with higher-order code.
|
|
%
|
|
% WARNING: the argument order differs from remove_suffix.
|
|
%
|
|
:- pred remove_prefix(string::in, string::in, string::out) is semidet.
|
|
|
|
% det_remove_prefix(Prefix, String, Suffix):
|
|
%
|
|
% This is a synonym for append(Prefix, Suffix, String) but with the
|
|
% arguments in a more convenient order for use with higher-order code.
|
|
%
|
|
% WARNING: the argument order differs from remove_suffix.
|
|
%
|
|
:- pred det_remove_prefix(string::in, string::in, string::out) is det.
|
|
|
|
% remove_prefix_if_present(Prefix, String) = Suffix returns String minus
|
|
% Prefix if String begins with Prefix, and String if it doesn't.
|
|
%
|
|
:- func remove_prefix_if_present(string, string) = string.
|
|
|
|
% add_prefix(Prefix, Str) = PrefixStr:
|
|
%
|
|
% Does the same job as Prefix ++ Str = PrefixStr, but allows
|
|
% using list.map to add the same prefix to many strings.
|
|
%
|
|
:- func add_prefix(string, string) = string.
|
|
|
|
% remove_suffix(String, Suffix, Prefix):
|
|
%
|
|
% The same as append(Prefix, Suffix, String).
|
|
%
|
|
% WARNING: the argument order differs from both remove_prefix and
|
|
% remove_suffix_if_present.
|
|
%
|
|
:- pred remove_suffix(string::in, string::in, string::out) is semidet.
|
|
|
|
% det_remove_suffix(String, Suffix) returns the same value as
|
|
% remove_suffix, except it throws an exception if String does not end
|
|
% with Suffix.
|
|
%
|
|
% WARNING: the argument order differs from both remove_prefix and
|
|
% remove_suffix_if_present.
|
|
%
|
|
:- func det_remove_suffix(string, string) = string.
|
|
|
|
% remove_suffix_if_present(Suffix, String) returns String minus Suffix
|
|
% if String ends with Suffix, and String if it doesn't.
|
|
%
|
|
% WARNING: the argument order differs from remove_suffix and
|
|
% det_remove_suffix.
|
|
%
|
|
:- func remove_suffix_if_present(string, string) = string.
|
|
|
|
% add_suffix(Suffix, Str) = StrSuffix:
|
|
%
|
|
% Does the same job as Str ++ Suffix = StrSuffix, but allows
|
|
% using list.map to add the same suffix to many strings.
|
|
%
|
|
:- func add_suffix(string, string) = string.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Transformations of strings.
|
|
%
|
|
|
|
% Convert the first character (if any) of a string to uppercase.
|
|
% Only letters (a-z) in the ASCII range are converted.
|
|
%
|
|
% This function transforms the initial code point of a string,
|
|
% whether or not the code point occurs as part of a combining sequence.
|
|
%
|
|
:- func capitalize_first(string) = string.
|
|
:- pred capitalize_first(string::in, string::out) is det.
|
|
|
|
% Convert the first character (if any) of a string to lowercase.
|
|
% Only letters (A-Z) in the ASCII range are converted.
|
|
%
|
|
% This function transforms the initial code point of a string,
|
|
% whether or not the code point occurs as part of a combining sequence.
|
|
%
|
|
:- func uncapitalize_first(string) = string.
|
|
:- pred uncapitalize_first(string::in, string::out) is det.
|
|
|
|
% Converts a string to uppercase.
|
|
% Only letters (a-z) in the ASCII range are converted.
|
|
%
|
|
% This function transforms each code point individually.
|
|
% Letters that occur within a combining sequence will be converted,
|
|
% whereas the precomposed character equivalent to the combining
|
|
% sequence would not be converted. For example:
|
|
%
|
|
% to_upper("a\u0301") ==> "A\u0301" % á decomposed
|
|
% to_upper("\u00E1") ==> "\u00E1" % á precomposed
|
|
%
|
|
:- func to_upper(string::in) = (string::uo) is det.
|
|
:- pred to_upper(string, string).
|
|
:- mode to_upper(in, uo) is det.
|
|
:- mode to_upper(in, in) is semidet. % implied
|
|
|
|
% Converts a string to lowercase.
|
|
% Only letters (a-z) in the ASCII range are converted.
|
|
%
|
|
% This function transforms each code point individually.
|
|
% Letters that occur within a combining sequence will be converted,
|
|
% whereas the precomposed character equivalent to the combining
|
|
% sequence would not be converted. For example:
|
|
%
|
|
% to_lower("A\u0301") ==> "a\u0301" % Á decomposed
|
|
% to_lower("\u00C1") ==> "\u00C1" % Á precomposed
|
|
%
|
|
:- func to_lower(string::in) = (string::uo) is det.
|
|
:- pred to_lower(string, string).
|
|
:- mode to_lower(in, uo) is det.
|
|
:- mode to_lower(in, in) is semidet. % implied
|
|
|
|
% pad_left(String0, PadChar, Width, String):
|
|
%
|
|
% Insert PadChars at the left of String0 until it is at least as long
|
|
% as Width, giving String. Width is currently measured as the number
|
|
% of code points.
|
|
%
|
|
:- func pad_left(string, char, int) = string.
|
|
:- pred pad_left(string::in, char::in, int::in, string::out) is det.
|
|
|
|
% pad_right(String0, PadChar, Width, String):
|
|
%
|
|
% Insert PadChars at the right of String0 until it is at least as long
|
|
% as Width, giving String. Width is currently measured as the number
|
|
% of code points.
|
|
%
|
|
:- func pad_right(string, char, int) = string.
|
|
:- pred pad_right(string::in, char::in, int::in, string::out) is det.
|
|
|
|
% chomp(String):
|
|
%
|
|
% Return String minus any single trailing newline character.
|
|
%
|
|
:- func chomp(string) = string.
|
|
|
|
% strip(String):
|
|
%
|
|
% Returns String minus any initial and trailing ASCII whitespace
|
|
% characters, i.e. characters satisfying char.is_whitespace.
|
|
%
|
|
:- func strip(string) = string.
|
|
|
|
% lstrip(String):
|
|
%
|
|
% Return String minus any initial ASCII whitespace characters,
|
|
% i.e. characters satisfying char.is_whitespace.
|
|
%
|
|
:- func lstrip(string) = string.
|
|
|
|
% rstrip(String):
|
|
%
|
|
% Returns String minus any trailing ASCII whitespace characters,
|
|
% i.e. characters satisfying char.is_whitespace.
|
|
%
|
|
:- func rstrip(string) = string.
|
|
|
|
% lstrip_pred(Pred, String):
|
|
%
|
|
% Returns String minus the maximal prefix consisting entirely
|
|
% of code points satisfying Pred.
|
|
%
|
|
:- func lstrip_pred(pred(char)::in(pred(in) is semidet), string::in)
|
|
= (string::out) is det.
|
|
|
|
% rstrip_pred(Pred, String):
|
|
%
|
|
% Returns String minus the maximal suffix consisting entirely
|
|
% of code points satisfying Pred.
|
|
%
|
|
:- func rstrip_pred(pred(char)::in(pred(in) is semidet), string::in)
|
|
= (string::out) is det.
|
|
|
|
% replace(String0, Pattern, Subst, String):
|
|
%
|
|
% Replaces the first occurrence of Pattern in String0 with Subst to give
|
|
% String. Fails if Pattern does not occur in String0.
|
|
%
|
|
:- pred replace(string::in, string::in, string::in, string::uo) is semidet.
|
|
|
|
% replace_all(String0, Pattern, Subst, String):
|
|
%
|
|
% Replaces any occurrences of Pattern in String0 with Subst to give
|
|
% String.
|
|
%
|
|
% If Pattern is the empty string, then Subst is inserted at every point
|
|
% in String0 except between two code units in an encoding of a code point.
|
|
% For example, these are true:
|
|
%
|
|
% replace_all("", "", "|", "|")
|
|
% replace_all("a", "", "|", "|a|")
|
|
% replace_all("ab", "", "|", "|a|b|")
|
|
%
|
|
:- func replace_all(string::in, string::in, string::in) = (string::uo) is det.
|
|
:- pred replace_all(string::in, string::in, string::in, string::uo) is det.
|
|
|
|
% replace_all_sv(Pattern, Subst, String0, String):
|
|
%
|
|
% Does the exact same job as replace_all, but takes the arguments
|
|
% in a different order. The advantage is that this order is easier to use
|
|
% with higher order code such as
|
|
%
|
|
% list.map(replace_all_sv(Pattern, Subst), Strings0, Strings)
|
|
%
|
|
% and with state variables, in code such as
|
|
%
|
|
% list.map(replace_all_sv(Pattern, Subst), !Strings)
|
|
%
|
|
:- pred replace_all_sv(string::in, string::in, string::in, string::uo) is det.
|
|
|
|
% word_wrap(Str, LineLen) = Wrapped:
|
|
%
|
|
% Wrapped is Str with newlines inserted between words (separated by ASCII
|
|
% space characters) so that at most LineLen code points appear on any line,
|
|
% and each line contains as many whole words as possible subject to that
|
|
% constraint. If any one word exceeds LineLen code points in length, then
|
|
% it will be broken over two (or more) lines. Sequences of whitespace
|
|
% characters are replaced by a single space.
|
|
%
|
|
% See char.is_whitespace for the definition of whitespace characters
|
|
% used by this predicate.
|
|
%
|
|
:- func word_wrap(string, int) = string.
|
|
|
|
% word_wrap_separator(Str, LineLen, BrokenWordSeparator) = Wrapped:
|
|
%
|
|
% word_wrap_separator/3 is like word_wrap/2, except that words that
|
|
% need to be broken up over multiple lines have BrokenWordSeparator
|
|
% inserted between each pair of pieces. If the number of code points in
|
|
% BrokenWordSeparator is greater than or equal to LineLen, then this
|
|
% function ignores the separator, since it would leave no room on a line
|
|
% for any actual words.
|
|
%
|
|
:- func word_wrap_separator(string, int, string) = string.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Folds over the characters in strings.
|
|
%
|
|
|
|
% foldl(Pred, String, !Acc):
|
|
%
|
|
% Pred is an accumulator predicate which is to be called for each
|
|
% code point of the string String in turn.
|
|
% If String contains ill-formed sequences, Pred is called for each
|
|
% code unit in an ill-formed sequence. If strings use UTF-8 encoding,
|
|
% U+FFFD is passed to Pred in place of each such code unit.
|
|
% If strings use UTF-16 encoding, each code unit in an ill-formed sequence
|
|
% is an unpaired surrogate code point, which will be passed to Pred.
|
|
%
|
|
% The initial value of the accumulator is !.Acc and the final value is
|
|
% !:Acc.
|
|
% (foldl(Pred, String, !Acc) is equivalent to
|
|
% to_char_list(String, Chars),
|
|
% list.foldl(Pred, Chars, !Acc)
|
|
% but is implemented more efficiently.)
|
|
%
|
|
:- func foldl(func(char, A) = A, string, A) = A.
|
|
:- pred foldl(pred(char, A, A), string, A, A).
|
|
:- mode foldl(in(pred(in, di, uo) is det), in, di, uo) is det.
|
|
:- mode foldl(in(pred(in, in, out) is det), in, in, out) is det.
|
|
:- mode foldl(in(pred(in, in, out) is semidet), in, in, out) is semidet.
|
|
:- mode foldl(in(pred(in, in, out) is nondet), in, in, out) is nondet.
|
|
:- mode foldl(in(pred(in, in, out) is multi), in, in, out) is multi.
|
|
|
|
% foldl2(Pred, String, !Acc1, !Acc2):
|
|
% A variant of foldl with two accumulators.
|
|
%
|
|
:- pred foldl2(pred(char, A, A, B, B), string, A, A, B, B).
|
|
:- mode foldl2(in(pred(in, di, uo, di, uo) is det),
|
|
in, di, uo, di, uo) is det.
|
|
:- mode foldl2(in(pred(in, in, out, di, uo) is det),
|
|
in, in, out, di, uo) is det.
|
|
:- mode foldl2(in(pred(in, in, out, in, out) is det),
|
|
in, in, out, in, out) is det.
|
|
:- mode foldl2(in(pred(in, in, out, in, out) is semidet),
|
|
in, in, out, in, out) is semidet.
|
|
:- mode foldl2(in(pred(in, in, out, in, out) is nondet),
|
|
in, in, out, in, out) is nondet.
|
|
:- mode foldl2(in(pred(in, in, out, in, out) is multi),
|
|
in, in, out, in, out) is multi.
|
|
|
|
% foldl_between(Pred, String, Start, End, !Acc)
|
|
% is equivalent to foldl(Pred, SubString, !Acc)
|
|
% where SubString = between(String, Start, End).
|
|
%
|
|
% Start and End are in terms of code units.
|
|
%
|
|
:- func foldl_between(func(char, A) = A, string, int, int, A) = A.
|
|
:- pred foldl_between(pred(char, A, A), string, int, int, A, A).
|
|
:- mode foldl_between(in(pred(in, in, out) is det), in, in, in,
|
|
in, out) is det.
|
|
:- mode foldl_between(in(pred(in, di, uo) is det), in, in, in,
|
|
di, uo) is det.
|
|
:- mode foldl_between(in(pred(in, in, out) is semidet), in, in, in,
|
|
in, out) is semidet.
|
|
:- mode foldl_between(in(pred(in, in, out) is nondet), in, in, in,
|
|
in, out) is nondet.
|
|
:- mode foldl_between(in(pred(in, in, out) is multi), in, in, in,
|
|
in, out) is multi.
|
|
|
|
% foldl2_between(Pred, String, Start, End, !Acc1, !Acc2)
|
|
% A variant of foldl_between with two accumulators.
|
|
%
|
|
% Start and End are in terms of code units.
|
|
%
|
|
:- pred foldl2_between(pred(char, A, A, B, B),
|
|
string, int, int, A, A, B, B).
|
|
:- mode foldl2_between(in(pred(in, di, uo, di, uo) is det),
|
|
in, in, in, di, uo, di, uo) is det.
|
|
:- mode foldl2_between(in(pred(in, in, out, di, uo) is det),
|
|
in, in, in, in, out, di, uo) is det.
|
|
:- mode foldl2_between(in(pred(in, in, out, in, out) is det),
|
|
in, in, in, in, out, in, out) is det.
|
|
:- mode foldl2_between(in(pred(in, in, out, in, out) is semidet),
|
|
in, in, in, in, out, in, out) is semidet.
|
|
:- mode foldl2_between(in(pred(in, in, out, in, out) is nondet),
|
|
in, in, in, in, out, in, out) is nondet.
|
|
:- mode foldl2_between(in(pred(in, in, out, in, out) is multi),
|
|
in, in, in, in, out, in, out) is multi.
|
|
|
|
% foldr(Pred, String, !Acc):
|
|
% As foldl/4, except that processing proceeds right-to-left.
|
|
%
|
|
:- func foldr(func(char, T) = T, string, T) = T.
|
|
:- pred foldr(pred(char, T, T), string, T, T).
|
|
:- mode foldr(in(pred(in, in, out) is det), in, in, out) is det.
|
|
:- mode foldr(in(pred(in, di, uo) is det), in, di, uo) is det.
|
|
:- mode foldr(in(pred(in, in, out) is semidet), in, in, out) is semidet.
|
|
:- mode foldr(in(pred(in, in, out) is nondet), in, in, out) is nondet.
|
|
:- mode foldr(in(pred(in, in, out) is multi), in, in, out) is multi.
|
|
|
|
% foldr_between(Pred, String, Start, End, !Acc)
|
|
% is equivalent to foldr(Pred, SubString, !Acc)
|
|
% where SubString = between(String, Start, End).
|
|
%
|
|
% Start and End are in terms of code units.
|
|
%
|
|
:- func foldr_between(func(char, T) = T, string, int, int, T) = T.
|
|
:- pred foldr_between(pred(char, T, T), string, int, int, T, T).
|
|
:- mode foldr_between(in(pred(in, in, out) is det), in, in, in,
|
|
in, out) is det.
|
|
:- mode foldr_between(in(pred(in, di, uo) is det), in, in, in,
|
|
di, uo) is det.
|
|
:- mode foldr_between(in(pred(in, in, out) is semidet), in, in, in,
|
|
in, out) is semidet.
|
|
:- mode foldr_between(in(pred(in, in, out) is nondet), in, in, in,
|
|
in, out) is nondet.
|
|
:- mode foldr_between(in(pred(in, in, out) is multi), in, in, in,
|
|
in, out) is multi.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Formatting tables.
|
|
%
|
|
|
|
:- type justified_column
|
|
---> left(list(string))
|
|
; right(list(string)).
|
|
|
|
% format_table(Columns, Separator) = Table:
|
|
%
|
|
% This function takes a list of columns and a column separator,
|
|
% and returns a formatted table, where
|
|
%
|
|
% - the N'th line contains the N'th string in each column;
|
|
% - that string will be padded to the width of the widest string
|
|
% in that column;
|
|
% - each field will be left justified within that width if the column
|
|
% has a "left()" wrapper, and right justified if it has a "right()"
|
|
% wrapper;
|
|
% - the fields on each line are separated with Separator;
|
|
% - successive lines are separated by newlines.
|
|
%
|
|
% There won't be a newline at the end of Table, to allow callers to decide
|
|
% whether they want to add one or not.
|
|
%
|
|
% This predicate considers the length of a string to be the number of
|
|
% code points in the string. Note that this is only an approximation:
|
|
% it will be inaccurate in the presence of e.g. combining characters.
|
|
%
|
|
% This predicate requires all the columns to contain the same number
|
|
% of strings, and throws an exception if this is not the case.
|
|
%
|
|
% An example:
|
|
%
|
|
% format_table([right(["a", "bb", "ccc"]), left(["1", "22", "333"])],
|
|
% " * ")
|
|
%
|
|
% would return the table:
|
|
%
|
|
% a * 1
|
|
% bb * 22
|
|
% ccc * 333
|
|
%
|
|
:- func format_table(list(justified_column), string) = string.
|
|
|
|
% format_table_max(Columns, Separator) does the same job as format_table,
|
|
% but allows the caller to associate a maximum width with each column.
|
|
% If some column had strings of e.g. lengths 18, 20, 35 and 45, then
|
|
% format_table would format that column as being 45 characters wide
|
|
% in all rows, but if a call to format_table_max specified 30 as the
|
|
% max width of that column, then format_table_max would format that column
|
|
% as being 30 characters wide in the first two rows, and would widen the
|
|
% column only when the value does not fit in the maximum, and in each case,
|
|
% it would widen the column only as much as necessary. In this example,
|
|
% the column would be 35 and 45 characters wide respectively in the
|
|
% last two rows.
|
|
%
|
|
:- func format_table_max(assoc_list(justified_column, maybe(int)), string)
|
|
= string.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Converting strings to docs.
|
|
%
|
|
|
|
% Convert a string to a pretty_printer.doc for formatting.
|
|
%
|
|
:- func string_to_doc(string) = pretty_printer.doc.
|
|
:- pragma obsolete(func(string_to_doc/1), [pretty_printer.string_to_doc/1]).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Converting strings to values of builtin types.
|
|
%
|
|
|
|
% Convert a string to an int. The string must contain only digits [0-9],
|
|
% optionally preceded by a plus or minus sign. If the string does
|
|
% not match this syntax or the number is not in the range
|
|
% [min_int, max_int], to_int fails.
|
|
%
|
|
:- pred to_int(string::in, int::out) is semidet.
|
|
|
|
% Convert a signed base 10 string to an int. Throws an exception if the
|
|
% string argument does not match the regexp [+-]?[0-9]+ or the number is
|
|
% not in the range [min_int, max_int].
|
|
%
|
|
:- func det_to_int(string) = int.
|
|
|
|
% Convert a string in the specified base (2-36) to an int. The string
|
|
% must contain one or more digits in the specified base, optionally
|
|
% preceded by a plus or minus sign. For bases > 10, digits 10 to 35
|
|
% are represented by the letters A-Z or a-z. If the string does not match
|
|
% this syntax or the number is not in the range [min_int, max_int],
|
|
% the predicate fails.
|
|
%
|
|
:- pred base_string_to_int(int::in, string::in, int::out) is semidet.
|
|
|
|
% Convert a signed base N string to an int. Throws an exception
|
|
% if the string argument is not precisely an optional sign followed by
|
|
% a non-empty string of base N digits, or if the number is not in
|
|
% the range [min_int, max_int].
|
|
%
|
|
:- func det_base_string_to_int(int, string) = int.
|
|
|
|
%---------------------%
|
|
|
|
% Convert a string to a uint. The string must contain only digits [0-9].
|
|
% If the string does not match this syntax or the number is not
|
|
% in the range [0, max_uint], to_uint fails.
|
|
%
|
|
:- pred to_uint(string::in, uint::out) is semidet.
|
|
|
|
% Convert an unsigned base 10 string to a uint. Throws an exception if the
|
|
% string argument does not match the regexp [0-9]+ or the number is
|
|
% not in the range [0, max_uint].
|
|
%
|
|
:- func det_to_uint(string) = uint.
|
|
|
|
% Convert a string in the specified base (2-36) to a uint. The string
|
|
% must contain one or more digits in the specified base. For bases > 10,
|
|
% digits 10 to 35 are represented by the letters A-Z or a-z. If the string
|
|
% does not match this syntax or the number is not in the range
|
|
% [0, max_uint], the predicate fails.
|
|
%
|
|
:- pred base_string_to_uint(int::in, string::in, uint::out) is semidet.
|
|
|
|
% Convert an unsigned base N string to a uint. Throws an exception
|
|
% if the string argument is not precisely a non-empty string of base N
|
|
% digits, or if the number is not in the range [0, max_uint].
|
|
%
|
|
:- func det_base_string_to_uint(int, string) = uint.
|
|
|
|
%---------------------%
|
|
|
|
% Convert a string to a float, returning infinity or -infinity if the
|
|
% conversion overflows. Fails if the string is not a syntactically correct
|
|
% float literal.
|
|
%
|
|
:- pred to_float(string::in, float::out) is semidet.
|
|
|
|
% Convert a string to a float, returning infinity or -infinity if the
|
|
% conversion overflows. Throws an exception if the string is not a
|
|
% syntactically correct float literal.
|
|
%
|
|
:- func det_to_float(string) = float.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Converting values of builtin types to strings.
|
|
%
|
|
|
|
%---------------------%
|
|
% Converting chars to strings.
|
|
|
|
|
|
% char_to_string(Char, String):
|
|
%
|
|
% Converts a character to a string, or vice versa.
|
|
% True if String is the well-formed string that encodes the code point
|
|
% Char; or, if strings are UTF-16 encoded, Char is a surrogate code
|
|
% point and String is the string that contains only that surrogate code
|
|
% point. Otherwise, `char_to_string(Char, String)' is false.
|
|
%
|
|
% Throws an exception if Char is the null character or a code point that
|
|
% cannot be encoded in a string (namely, surrogate code points cannot be
|
|
% encoded in UTF-8 strings).
|
|
%
|
|
:- func char_to_string(char::in) = (string::uo) is det.
|
|
:- pred char_to_string(char, string).
|
|
:- mode char_to_string(in, uo) is det.
|
|
:- mode char_to_string(out, in) is semidet.
|
|
|
|
% A synonym for char_to_string/1.
|
|
%
|
|
:- func from_char(char::in) = (string::uo) is det.
|
|
|
|
%---------------------%
|
|
% Converting integers to strings.
|
|
|
|
% The more complex conversions that build on the simpler conversions below.
|
|
|
|
% int_to_base_string(Int, Base, String):
|
|
%
|
|
% Convert an integer to a string in a given Base.
|
|
% String will consist of a minus sign (U+002D HYPHEN-MINUS)
|
|
% if Int is negative, followed by one or more decimal digits (0-9)
|
|
% or uppercase letters (A-Z). There will be no leading zeros.
|
|
%
|
|
% Base must be between 2 and 36, both inclusive; if it is not,
|
|
% the predicate will throw an exception.
|
|
%
|
|
:- func int_to_base_string(int::in, int::in) = (string::uo) is det.
|
|
:- pred int_to_base_string(int::in, int::in, string::uo) is det.
|
|
|
|
% Convert an integer to a string in base 10 with commas as thousand
|
|
% separators.
|
|
%
|
|
:- func int_to_string_thousands(int::in) = (string::uo) is det.
|
|
|
|
% int_to_base_string_group(Int, Base, GroupLength, Separator, String):
|
|
%
|
|
% Convert an integer to a string in a given Base,
|
|
% in the same format as int_to_base_string,
|
|
% with Separator inserted between every GroupLength digits
|
|
% (grouping from the end of the string).
|
|
% If GroupLength is less than one, no separators will appear
|
|
% in the output. Useful for formatting numbers like "1,300,000".
|
|
%
|
|
% Base must be between 2 and 36, both inclusive; if it is not,
|
|
% the predicate will throw an exception.
|
|
%
|
|
:- func int_to_base_string_group(int, int, int, string) = string.
|
|
:- mode int_to_base_string_group(in, in, in, in) = uo is det.
|
|
|
|
% The simpler conversions.
|
|
|
|
% Convert an integer to a string in base 10.
|
|
% See int_to_base_string for the string format.
|
|
%
|
|
:- func int_to_string(int::in) = (string::uo) is det.
|
|
:- pred int_to_string(int::in, string::uo) is det.
|
|
|
|
% A synonym for int_to_string/1.
|
|
%
|
|
:- func from_int(int::in) = (string::uo) is det.
|
|
|
|
% Convert an unsigned integer to a string in base 10.
|
|
%
|
|
:- func uint_to_string(uint::in) = (string::uo) is det.
|
|
|
|
% Convert an unsigned integer to a string in base 16.
|
|
% Alphabetic digits will be lowercase (e.g. a-f).
|
|
%
|
|
:- func uint_to_hex_string(uint::in) = (string::uo) is det.
|
|
:- func uint_to_lc_hex_string(uint::in) = (string::uo) is det.
|
|
|
|
% Convert an unsigned integer to a string in base 16.
|
|
% Alphabetic digits will be uppercase (e.g. A-F).
|
|
%
|
|
:- func uint_to_uc_hex_string(uint::in) = (string::uo) is det.
|
|
|
|
% Convert an unsigned integer to a string in base 8.
|
|
%
|
|
:- func uint_to_octal_string(uint::in) = (string::uo) is det.
|
|
|
|
% Convert a signed/unsigned 8/16/32/64 bit integer to a string.
|
|
%
|
|
:- func int8_to_string(int8::in) = (string::uo) is det.
|
|
:- func uint8_to_string(uint8::in) = (string::uo) is det.
|
|
:- func int16_to_string(int16::in) = (string::uo) is det.
|
|
:- func uint16_to_string(uint16::in) = (string::uo) is det.
|
|
:- func int32_to_string(int32::in) = (string::uo) is det.
|
|
:- func uint32_to_string(uint32::in) = (string::uo) is det.
|
|
:- func int64_to_string(int64::in) = (string::uo) is det.
|
|
:- func uint64_to_string(uint64::in) = (string::uo) is det.
|
|
|
|
% Convert an unsigned 64-bit integer to a string in base 16.
|
|
% Alphabetic digits will be lowercase (e.g. a-f).
|
|
%
|
|
:- func uint64_to_hex_string(uint64::in) = (string::uo) is det.
|
|
:- func uint64_to_lc_hex_string(uint64::in) = (string::uo) is det.
|
|
|
|
% Convert an unsigned 64-bit integer to a string in base 16.
|
|
% Alphabetic digits will be uppercase (e.g. A-F).
|
|
%
|
|
:- func uint64_to_uc_hex_string(uint64::in) = (string::uo) is det.
|
|
|
|
% Convert an unsigned 64-bit integer to a string in base 8.
|
|
%
|
|
:- func uint64_to_octal_string(uint64::in) = (string::uo) is det.
|
|
|
|
%---------------------%
|
|
% Converting floats to strings.
|
|
|
|
% Convert a float to a string.
|
|
% In the current implementation, the resulting float will be in the form
|
|
% that it was printed using the format string "%#.<prec>g".
|
|
% <prec> will be in the range p to (p+2)
|
|
% where p = floor(mantissa_digits * log2(base_radix) / log2(10)).
|
|
% The precision chosen from this range will be such as to allow
|
|
% a successful decimal -> binary conversion of the float.
|
|
%
|
|
:- func float_to_string(float::in) = (string::uo) is det.
|
|
:- pred float_to_string(float::in, string::uo) is det.
|
|
|
|
% A synonym for float_to_string/1.
|
|
%
|
|
:- func from_float(float::in) = (string::uo) is det.
|
|
|
|
%---------------------%
|
|
% Converting c_pointers to strings.
|
|
|
|
% Convert a c_pointer to a string. The format is "c_pointer(0xXXXX)"
|
|
% where XXXX is the hexadecimal representation of the pointer.
|
|
%
|
|
:- func c_pointer_to_string(c_pointer::in) = (string::uo) is det.
|
|
:- pred c_pointer_to_string(c_pointer::in, string::uo) is det.
|
|
|
|
% A synonym for c_pointer_to_string/1.
|
|
%
|
|
:- func from_c_pointer(c_pointer::in) = (string::uo) is det.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Converting values of arbitrary types to strings.
|
|
%
|
|
|
|
% string(X): Returns a canonicalized string representation of the value X
|
|
% using the standard Mercury operators.
|
|
%
|
|
:- func string(T) = string.
|
|
|
|
% As above, but using the supplied table of operators.
|
|
% NOTE_TO_IMPLEMENTORS: XXX The only table we accept is the Mercury table.
|
|
%
|
|
:- func string_ops(ops.table, T) = string.
|
|
|
|
% string_ops_noncanon(NonCanon, OpTable, X, String)
|
|
%
|
|
% As above, but the caller specifies what behaviour should occur for
|
|
% non-canonical terms (i.e. terms where multiple representations
|
|
% may compare as equal):
|
|
%
|
|
% - `do_not_allow' will throw an exception if (any subterm of)
|
|
% the argument is not canonical;
|
|
% - `canonicalize' will substitute a string indicating the presence
|
|
% of a non-canonical subterm;
|
|
% - `include_details_cc' will show the structure of any non-canonical
|
|
% subterms, but can only be called from a committed choice context.
|
|
%
|
|
:- pred string_ops_noncanon(noncanon_handling, ops.table, T, string).
|
|
:- mode string_ops_noncanon(in(do_not_allow), in, in, out) is det.
|
|
:- mode string_ops_noncanon(in(canonicalize), in, in, out) is det.
|
|
:- mode string_ops_noncanon(in(include_details_cc), in, in, out) is cc_multi.
|
|
:- mode string_ops_noncanon(in, in, in, out) is cc_multi.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Converting values to strings based on a format string.
|
|
%
|
|
|
|
% NOTE_TO_IMPLEMENTORS If you modify this type, you will also need to modify
|
|
% NOTE_TO_IMPLEMENTORS the type that represents abstract poly_types,
|
|
% NOTE_TO_IMPLEMENTORS abstract_poly_type in compiler/parse_string_format.m,
|
|
% NOTE_TO_IMPLEMENTORS as well as the predicates that parse concrete and
|
|
% NOTE_TO_IMPLEMENTORS abstract poly_types, in library/string.parse_runtime.m
|
|
% NOTE_TO_IMPLEMENTORS and in compiler/parse_string_format.m.
|
|
:- type poly_type
|
|
---> f(float)
|
|
; i(int)
|
|
; i8(int8)
|
|
; i16(int16)
|
|
; i32(int32)
|
|
; i64(int64)
|
|
; u(uint)
|
|
; u8(uint8)
|
|
; u16(uint16)
|
|
; u32(uint32)
|
|
; u64(uint64)
|
|
; s(string)
|
|
; c(char).
|
|
|
|
% A function similar to sprintf() in C.
|
|
%
|
|
% For example,
|
|
% format("%s %i %c %f\n",
|
|
% [s("Square-root of"), i(2), c('='), f(1.41)], String)
|
|
% will return
|
|
% String = "Square-root of 2 = 1.41\n".
|
|
%
|
|
% The following options available in C are supported: flags [0+-# ],
|
|
% a field width (or *), and a precision (could be a ".*").
|
|
%
|
|
% Valid conversion character types are {dioxXucsfeEgGp%}. %n is not
|
|
% supported. format will not return the length of the string.
|
|
%
|
|
% conv var output form. effect of '#'.
|
|
% char. type(s).
|
|
%
|
|
% d int signed integer
|
|
% i int signed integer
|
|
% o int, uint unsigned octal with '0' prefix
|
|
% x,X int, uint unsigned hex with '0x', '0X' prefix
|
|
% u int, uint unsigned integer
|
|
% c char character
|
|
% s string string
|
|
% f float rational number with '.', if precision 0
|
|
% e,E float [-]m.dddddE+-xx with '.', if precision 0
|
|
% g,G float either e or f with trailing zeros.
|
|
% p int, uint integer
|
|
%
|
|
% The valid conversion characters for int8, int16, int32 and int64
|
|
% are the same as for int, and the valid conversion characters for
|
|
% uint8, uint16, uint32 and uint64 are the same as for uint.
|
|
%
|
|
% An option of zero will cause any padding to be zeros rather than spaces.
|
|
% A '-' will cause the output to be left-justified in its 'space'.
|
|
% (Without a `-', the default is for fields to be right-justified.)
|
|
% A '+' forces a sign to be printed. This is not sensible for string
|
|
% and character output. A ' ' causes a space to be printed before a thing
|
|
% if there is no sign there. The other option is the '#', which modifies
|
|
% the output string's format. These options are normally put directly
|
|
% after the '%'.
|
|
%
|
|
% Notes:
|
|
%
|
|
% %#.0e, %#.0E now prints a '.' before the 'e'.
|
|
%
|
|
% Asking for more precision than a float actually has will result in
|
|
% potentially misleading output.
|
|
%
|
|
% Numbers are now rounded by precision value, not truncated as previously.
|
|
%
|
|
% The implementation uses the sprintf() function in C grades,
|
|
% so the actual output will depend on the C standard library.
|
|
%
|
|
:- func format(string, list(poly_type)) = string.
|
|
:- pred format(string::in, list(poly_type)::in, string::out) is det.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%---------------------------------------------------------------------------%
|
|
|
|
:- implementation.
|
|
|
|
% The modules string.format and string.parse_util have to be visible
|
|
% from outside the string module, since they need to be visible to the
|
|
% compiler (specifically, to format_call.m and its submodule
|
|
% parse_format_string.m.). However, they should not be part of the
|
|
% publicly documented interface of the Mercury standard library,
|
|
% since we do not want any user code to depend on the implementation
|
|
% details they contain.
|
|
:- interface.
|
|
:- include_module format.
|
|
:- include_module parse_util.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
|
|
:- implementation.
|
|
|
|
:- include_module parse_runtime.
|
|
:- include_module to_string.
|
|
|
|
:- import_module bool.
|
|
:- import_module cord.
|
|
:- import_module int.
|
|
:- import_module pair.
|
|
:- import_module require.
|
|
:- import_module string.format.
|
|
:- import_module string.to_string.
|
|
:- import_module uint.
|
|
:- import_module uint8.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
|
|
% Many routines in this module are implemented using foreign language code.
|
|
% And many of the integer-to-string conversion operations use common
|
|
% components.
|
|
|
|
:- pragma foreign_decl("C",
|
|
"
|
|
#include <ctype.h>
|
|
#include <string.h>
|
|
#include <stdio.h>
|
|
#include <inttypes.h>
|
|
|
|
#include ""mercury_string.h"" // for MR_allocate_aligned_string*() etc.
|
|
#include ""mercury_tags.h"" // for MR_list_*()
|
|
|
|
// Doing a binary search for the correct value of num_digits
|
|
// would yield better expected-case performance if values of U8
|
|
// were randomly distributed along its entire range, but this
|
|
// linear search is better for use cases where smaller values of the U8
|
|
// are more likely than larger values.
|
|
|
|
#define get_num_decimal_digits_in_uint8(U8, num_digits) \
|
|
do { \
|
|
if ((U8) < 10U) { /* 10^1 */ \
|
|
(num_digits) = 1; \
|
|
} else if ((U8) < 100U) { /* 10^2 */ \
|
|
(num_digits) = 2; \
|
|
} else { \
|
|
/* UINT8_MAX = (2^8)-1 = 255, which needs 3 digits. */ \
|
|
(num_digits) = 3; \
|
|
} \
|
|
} while (0)
|
|
|
|
#define get_num_decimal_digits_in_uint16(U16, num_digits) \
|
|
do { \
|
|
if ((U16) < 10U) { /* 10^1 */ \
|
|
(num_digits) = 1; \
|
|
} else if ((U16) < 100U) { /* 10^2 */ \
|
|
(num_digits) = 2; \
|
|
} else if ((U16) < 1000U) { /* 10^3 */ \
|
|
(num_digits) = 3; \
|
|
} else if ((U16) < 10000U) { /* 10^4 */ \
|
|
(num_digits) = 4; \
|
|
} else { \
|
|
/* UINT16_MAX = (2^16)-1 = 65_535 which needs 5 digits. */ \
|
|
(num_digits) = 5; \
|
|
} \
|
|
} while (0)
|
|
|
|
#define get_num_decimal_digits_in_uint32(U32, num_digits) \
|
|
do { \
|
|
if ((U32) < 10U) { /* 10^1 */ \
|
|
(num_digits) = 1; \
|
|
} else if ((U32) < 100U) { /* 10^2 */ \
|
|
(num_digits) = 2; \
|
|
} else if ((U32) < 1000U) { /* 10^3 */ \
|
|
(num_digits) = 3; \
|
|
} else if ((U32) < 10000U) { /* 10^4 */ \
|
|
(num_digits) = 4; \
|
|
} else if ((U32) < 100000U) { /* 10^5 */ \
|
|
(num_digits) = 5; \
|
|
} else if ((U32) < 1000000U) { /* 10^6 */ \
|
|
(num_digits) = 6; \
|
|
} else if ((U32) < 10000000U) { /* 10^7 */ \
|
|
(num_digits) = 7; \
|
|
} else if ((U32) < 100000000U) { /* 10^8 */ \
|
|
(num_digits) = 8; \
|
|
} else if ((U32) < 1000000000U) { /* 10^9 */ \
|
|
(num_digits) = 9; \
|
|
} else { \
|
|
/* UINT32_MAX = (2^32)-1 = 4_294_967_295, */ \
|
|
/* which needs 10 digits. */ \
|
|
(num_digits) = 10; \
|
|
} \
|
|
} while (0)
|
|
|
|
#define get_num_decimal_digits_in_uint64(U64, num_digits) \
|
|
do { \
|
|
if ((U64) < 10U) { /* 10^1 */ \
|
|
(num_digits) = 1; \
|
|
} else if ((U64) < 100U) { /* 10^2 */ \
|
|
(num_digits) = 2; \
|
|
} else if ((U64) < 1000U) { /* 10^3 */ \
|
|
(num_digits) = 3; \
|
|
} else if ((U64) < 10000U) { /* 10^4 */ \
|
|
(num_digits) = 4; \
|
|
} else if ((U64) < 100000U) { /* 10^5 */ \
|
|
(num_digits) = 5; \
|
|
} else if ((U64) < 1000000U) { /* 10^6 */ \
|
|
(num_digits) = 6; \
|
|
} else if ((U64) < 10000000U) { /* 10^7 */ \
|
|
(num_digits) = 7; \
|
|
} else if ((U64) < 100000000U) { /* 10^8 */ \
|
|
(num_digits) = 8; \
|
|
} else if ((U64) < 1000000000U) { /* 10^9 */ \
|
|
(num_digits) = 9; \
|
|
} else if ((U64) < 10000000000U) { /* 10^10 */ \
|
|
(num_digits) = 10; \
|
|
} else if ((U64) < 100000000000U) { /* 10^11 */ \
|
|
(num_digits) = 11; \
|
|
} else if ((U64) < 1000000000000U) { /* 10^12 */ \
|
|
(num_digits) = 12; \
|
|
} else if ((U64) < 10000000000000U) { /* 10^13 */ \
|
|
(num_digits) = 13; \
|
|
} else if ((U64) < 100000000000000U) { /* 10^14 */ \
|
|
(num_digits) = 14; \
|
|
} else if ((U64) < 1000000000000000U) { /* 10^15 */ \
|
|
(num_digits) = 15; \
|
|
} else if ((U64) < 10000000000000000U) { /* 10^16 */ \
|
|
(num_digits) = 16; \
|
|
} else if ((U64) < 100000000000000000U) { /* 10^17 */ \
|
|
(num_digits) = 17; \
|
|
} else if ((U64) < 1000000000000000000U) { /* 10^18 */ \
|
|
(num_digits) = 18; \
|
|
} else if ((U64) < 10000000000000000000U) { /* 10^19 */ \
|
|
(num_digits) = 19; \
|
|
} else { \
|
|
/* UINT64_MAX = (2^64)-1 = 18_446_744_073_709_551_615, */ \
|
|
/* which needs 20 digits. */ \
|
|
(num_digits) = 20; \
|
|
} \
|
|
} while (0)
|
|
|
|
#define get_num_octal_digits_in_uint32(U32, num_digits) \
|
|
do { \
|
|
if ((U32) < 010) { /* 8^1 */ \
|
|
(num_digits) = 1; \
|
|
} else if ((U32) < 0100) { /* 8^2 */ \
|
|
(num_digits) = 2; \
|
|
} else if ((U32) < 01000) { /* 8^3 */ \
|
|
(num_digits) = 3; \
|
|
} else if ((U32) < 010000) { /* 8^4 */ \
|
|
(num_digits) = 4; \
|
|
} else if ((U32) < 0100000) { /* 8^5 */ \
|
|
(num_digits) = 5; \
|
|
} else if ((U32) < 01000000) { /* 8^6 */ \
|
|
(num_digits) = 6; \
|
|
} else if ((U32) < 010000000) { /* 8^7 */ \
|
|
(num_digits) = 7; \
|
|
} else if ((U32) < 0100000000) { /* 8^8 */ \
|
|
(num_digits) = 8; \
|
|
} else if ((U32) < 01000000000) { /* 8^9 */ \
|
|
(num_digits) = 9; \
|
|
} else if ((U32) < 010000000000) { /* 8^10 */ \
|
|
(num_digits) = 10; \
|
|
} else { \
|
|
/* 32 bits is 10 groups of three bits, plus two bits. */ \
|
|
(num_digits) = 11; \
|
|
} \
|
|
} while (0)
|
|
|
|
#define get_num_octal_digits_in_uint64(U64, num_digits) \
|
|
do { \
|
|
if ((U64) < 010) { /* 8^1 */ \
|
|
(num_digits) = 1; \
|
|
} else if ((U64) < 0100) { /* 8^2 */ \
|
|
(num_digits) = 2; \
|
|
} else if ((U64) < 01000) { /* 8^3 */ \
|
|
(num_digits) = 3; \
|
|
} else if ((U64) < 010000) { /* 8^4 */ \
|
|
(num_digits) = 4; \
|
|
} else if ((U64) < 0100000) { /* 8^5 */ \
|
|
(num_digits) = 5; \
|
|
} else if ((U64) < 01000000) { /* 8^6 */ \
|
|
(num_digits) = 6; \
|
|
} else if ((U64) < 010000000) { /* 8^7 */ \
|
|
(num_digits) = 7; \
|
|
} else if ((U64) < 0100000000) { /* 8^8 */ \
|
|
(num_digits) = 8; \
|
|
} else if ((U64) < 01000000000) { /* 8^9 */ \
|
|
(num_digits) = 9; \
|
|
} else if ((U64) < 010000000000) { /* 8^10 */ \
|
|
(num_digits) = 10; \
|
|
} else if ((U64) < 0100000000000) { /* 8^11 */ \
|
|
(num_digits) = 11; \
|
|
} else if ((U64) < 01000000000000) { /* 8^12 */ \
|
|
(num_digits) = 12; \
|
|
} else if ((U64) < 010000000000000) { /* 8^13 */ \
|
|
(num_digits) = 13; \
|
|
} else if ((U64) < 0100000000000000) { /* 8^14 */ \
|
|
(num_digits) = 14; \
|
|
} else if ((U64) < 01000000000000000) { /* 8^15 */ \
|
|
(num_digits) = 15; \
|
|
} else if ((U64) < 010000000000000000) { /* 8^16 */ \
|
|
(num_digits) = 16; \
|
|
} else if ((U64) < 0100000000000000000) { /* 8^17 */ \
|
|
(num_digits) = 17; \
|
|
} else if ((U64) < 01000000000000000000) { /* 8^18 */ \
|
|
(num_digits) = 18; \
|
|
} else if ((U64) < 010000000000000000000) { /* 8^19 */ \
|
|
(num_digits) = 19; \
|
|
} else if ((U64) < 0100000000000000000000) { /* 8^20 */ \
|
|
(num_digits) = 20; \
|
|
} else if ((U64) < 01000000000000000000000) { /* 8^21 */ \
|
|
(num_digits) = 21; \
|
|
} else { \
|
|
/* 64 bits is 21 groups of three bits, plus one bit. */ \
|
|
(num_digits) = 22; \
|
|
} \
|
|
} while (0)
|
|
|
|
#define get_num_hex_digits_in_uint32(U32, num_digits) \
|
|
do { \
|
|
if ((U32) < 0x10) { /* 16^1 */ \
|
|
(num_digits) = 1; \
|
|
} else if ((U32) < 0x100) { /* 16^2 */ \
|
|
(num_digits) = 2; \
|
|
} else if ((U32) < 0x1000) { /* 16^3 */ \
|
|
(num_digits) = 3; \
|
|
} else if ((U32) < 0x10000) { /* 16^4 */ \
|
|
(num_digits) = 4; \
|
|
} else if ((U32) < 0x100000) { /* 16^5 */ \
|
|
(num_digits) = 5; \
|
|
} else if ((U32) < 0x1000000) { /* 16^6 */ \
|
|
(num_digits) = 6; \
|
|
} else if ((U32) < 0x10000000) { /* 16^7 */ \
|
|
(num_digits) = 7; \
|
|
} else if ((U32) < 0x100000000) { /* 16^8 */ \
|
|
(num_digits) = 8; \
|
|
} else { \
|
|
/* 32 bits is 8 groups of four bits. */ \
|
|
(num_digits) = 16; \
|
|
} \
|
|
} while (0)
|
|
|
|
#define get_num_hex_digits_in_uint64(U64, num_digits) \
|
|
do { \
|
|
if ((U64) < 0x10) { /* 16^1 */ \
|
|
(num_digits) = 1; \
|
|
} else if ((U64) < 0x100) { /* 16^2 */ \
|
|
(num_digits) = 2; \
|
|
} else if ((U64) < 0x1000) { /* 16^3 */ \
|
|
(num_digits) = 3; \
|
|
} else if ((U64) < 0x10000) { /* 16^4 */ \
|
|
(num_digits) = 4; \
|
|
} else if ((U64) < 0x100000) { /* 16^5 */ \
|
|
(num_digits) = 5; \
|
|
} else if ((U64) < 0x1000000) { /* 16^6 */ \
|
|
(num_digits) = 6; \
|
|
} else if ((U64) < 0x10000000) { /* 16^7 */ \
|
|
(num_digits) = 7; \
|
|
} else if ((U64) < 0x100000000) { /* 16^8 */ \
|
|
(num_digits) = 8; \
|
|
} else if ((U64) < 0x1000000000) { /* 16^9 */ \
|
|
(num_digits) = 9; \
|
|
} else if ((U64) < 0x10000000000) { /* 16^10 */ \
|
|
(num_digits) = 10; \
|
|
} else if ((U64) < 0x100000000000) { /* 16^11 */ \
|
|
(num_digits) = 11; \
|
|
} else if ((U64) < 0x1000000000000) { /* 16^12 */ \
|
|
(num_digits) = 12; \
|
|
} else if ((U64) < 0x10000000000000) { /* 16^13 */ \
|
|
(num_digits) = 13; \
|
|
} else if ((U64) < 0x100000000000000) { /* 16^14 */ \
|
|
(num_digits) = 14; \
|
|
} else if ((U64) < 0x1000000000000000) { /* 16^15 */ \
|
|
(num_digits) = 15; \
|
|
} else { \
|
|
/* 64 bits is 16 groups of four bits. */ \
|
|
(num_digits) = 16; \
|
|
} \
|
|
} while (0)
|
|
|
|
// NOTE The expression <('0' + (U % 10))> is likely to be faster than
|
|
// <\"0123456789\"[U % 10]>. However, we have to use the latter approach
|
|
// for hex digits, both because ASCII 'a' does not immediately follow '9',
|
|
// and because we want both lower case and upper case versions.
|
|
|
|
// NOTE All the following macros fill in S back-to-front.
|
|
|
|
#define fill_string_with_unsigned_decimal(S, U, num_digits, alloc) \
|
|
do { \
|
|
MR_allocate_aligned_string_msg((S), (num_digits), (alloc)); \
|
|
(S)[(num_digits)] = '\\0'; \
|
|
int i = (num_digits) - 1; \
|
|
do { \
|
|
(S)[i] = (char) ('0' + ((U) % 10)); \
|
|
(U) = (U) / 10; \
|
|
i--; \
|
|
} while ((U) > 0); \
|
|
} while (0)
|
|
|
|
#define fill_string_with_negative_unsigned_decimal(S, U, num_digits, alloc) \
|
|
do { \
|
|
MR_allocate_aligned_string_msg((S), (1 + (num_digits)), (alloc)); \
|
|
(S)[1 + (num_digits)] = '\\0'; \
|
|
int i = (num_digits); \
|
|
do { \
|
|
(S)[i] = (char) ('0' + ((U) % 10)); \
|
|
(U) = (U) / 10; \
|
|
i--; \
|
|
} while ((U) > 0); \
|
|
(S)[0] = '-'; \
|
|
} while (0)
|
|
|
|
#define fill_string_with_unsigned_octal(S, U, num_digits, alloc) \
|
|
do { \
|
|
MR_allocate_aligned_string_msg((S), (num_digits), (alloc)); \
|
|
(S)[(num_digits)] = '\\0'; \
|
|
int i = (num_digits) - 1; \
|
|
do { \
|
|
(S)[i] = (char) ('0' + ((U) & 07)); \
|
|
(U) = (U) >> 3; \
|
|
i--; \
|
|
} while ((U) > 0); \
|
|
} while (0)
|
|
|
|
#define fill_string_with_unsigned_hex_lc(S, U, num_digits, alloc) \
|
|
do { \
|
|
MR_allocate_aligned_string_msg((S), (num_digits), (alloc)); \
|
|
(S)[(num_digits)] = '\\0'; \
|
|
int i = (num_digits) - 1; \
|
|
do { \
|
|
(S)[i] = \"0123456789abcdef\"[(U) & 0xf]; \
|
|
(U) = (U) >> 4; \
|
|
i--; \
|
|
} while ((U) > 0); \
|
|
} while (0)
|
|
|
|
#define fill_string_with_unsigned_hex_uc(S, U, num_digits, alloc) \
|
|
do { \
|
|
MR_allocate_aligned_string_msg((S), (num_digits), (alloc)); \
|
|
(S)[(num_digits)] = '\\0'; \
|
|
int i = (num_digits) - 1; \
|
|
do { \
|
|
(S)[i] = \"0123456789ABCDEF\"[(U) & 0xf]; \
|
|
(U) = (U) >> 4; \
|
|
i--; \
|
|
} while ((U) > 0); \
|
|
} while (0)
|
|
").
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% String encoding.
|
|
%
|
|
|
|
:- pragma foreign_export_enum("C", string_encoding/0, [uppercase]).
|
|
:- pragma foreign_export_enum("C#", string_encoding/0, [uppercase]).
|
|
:- pragma foreign_export_enum("Java", string_encoding/0, [uppercase]).
|
|
|
|
:- pragma foreign_proc("C",
|
|
internal_string_encoding = (Encoding::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Encoding = UTF8;
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
internal_string_encoding = (Encoding::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Encoding = UTF16;
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
internal_string_encoding = (Encoding::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Encoding = UTF16;
|
|
").
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Conversions between strings and lists of characters.
|
|
%
|
|
|
|
to_char_list(S) = Cs :-
|
|
to_char_list(S, Cs).
|
|
|
|
to_char_list(Str, CharList) :-
|
|
do_to_char_list_loop(Str, string.count_code_units(Str), [], CharList).
|
|
|
|
:- pred do_to_char_list_loop(string::in, int::in,
|
|
list(char)::in, list(char)::out) is det.
|
|
|
|
do_to_char_list_loop(Str, Index0, !CharList) :-
|
|
( if string.unsafe_prev_index(Str, Index0, Index1, C) then
|
|
!:CharList = [C | !.CharList],
|
|
do_to_char_list_loop(Str, Index1, !CharList)
|
|
else
|
|
true
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
to_rev_char_list(S) = Cs :-
|
|
to_rev_char_list(S, Cs).
|
|
|
|
to_rev_char_list(Str, RevCharList) :-
|
|
do_to_rev_char_list_loop(Str, 0, [], RevCharList).
|
|
|
|
:- pred do_to_rev_char_list_loop(string::in, int::in,
|
|
list(char)::in, list(char)::out) is det.
|
|
|
|
do_to_rev_char_list_loop(Str, Index0, !RevCharList) :-
|
|
( if string.unsafe_index_next(Str, Index0, Index1, C) then
|
|
!:RevCharList = [C | !.RevCharList],
|
|
do_to_rev_char_list_loop(Str, Index1, !RevCharList)
|
|
else
|
|
true
|
|
).
|
|
|
|
%---------------------%
|
|
%
|
|
% XXX There is an inconsistency in that
|
|
%
|
|
% - from_char_list/from_rev_char_list throw exceptions
|
|
% - but from_code_unit_list/from_{utf8,utf16}_code_unit_list fail
|
|
%
|
|
% when the list of code points cannot be encoded in a string.
|
|
|
|
from_char_list(Cs) = S :-
|
|
from_char_list(Cs, S).
|
|
|
|
from_char_list(Chars, Str) :-
|
|
( if semidet_from_char_list(Chars, Str0) then
|
|
Str = Str0
|
|
else
|
|
unexpected($pred, "null character or surrogate code point in list")
|
|
).
|
|
|
|
:- pragma foreign_proc("C",
|
|
semidet_from_char_list(CharList::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, may_not_duplicate, no_sharing],
|
|
"{
|
|
MR_Word char_list_ptr;
|
|
size_t size;
|
|
|
|
// Loop to calculate list length + sizeof(MR_Word) in `size'
|
|
// using list in `char_list_ptr'.
|
|
SUCCESS_INDICATOR = MR_TRUE;
|
|
size = 0;
|
|
char_list_ptr = CharList;
|
|
while (! MR_list_is_empty(char_list_ptr)) {
|
|
MR_Char c = (MR_Char) MR_list_head(char_list_ptr);
|
|
if (c == '\\0') {
|
|
SUCCESS_INDICATOR = MR_FALSE;
|
|
break;
|
|
}
|
|
if (MR_is_ascii(c)) {
|
|
size++;
|
|
} else {
|
|
size_t csize = MR_utf8_width(c);
|
|
if (csize == 0) {
|
|
// c is a surrogate code point (or even out of range,
|
|
// but that is not supposed to happen).
|
|
SUCCESS_INDICATOR = MR_FALSE;
|
|
break;
|
|
}
|
|
size += csize;
|
|
}
|
|
char_list_ptr = MR_list_tail(char_list_ptr);
|
|
}
|
|
|
|
if (SUCCESS_INDICATOR) {
|
|
// Allocate heap space for string.
|
|
MR_allocate_aligned_string_msg(Str, size, MR_ALLOC_ID);
|
|
|
|
// Loop to copy the characters from the char_list to the string.
|
|
size = 0;
|
|
char_list_ptr = CharList;
|
|
while (! MR_list_is_empty(char_list_ptr)) {
|
|
MR_Char c = (MR_Char) MR_list_head(char_list_ptr);
|
|
if (MR_is_ascii(c)) {
|
|
Str[size] = c;
|
|
size++;
|
|
} else {
|
|
size += MR_utf8_encode(Str + size, c);
|
|
}
|
|
char_list_ptr = MR_list_tail(char_list_ptr);
|
|
}
|
|
|
|
Str[size] = '\\0';
|
|
}
|
|
}").
|
|
:- pragma foreign_proc("C#",
|
|
semidet_from_char_list(CharList::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness],
|
|
"
|
|
SUCCESS_INDICATOR = true;
|
|
System.Text.StringBuilder sb = new System.Text.StringBuilder();
|
|
while (!list.is_empty(CharList)) {
|
|
int cp = (int) list.det_head(CharList);
|
|
if (cp == 0x0000) {
|
|
SUCCESS_INDICATOR = false;
|
|
break;
|
|
} else if (cp <= 0xffff) {
|
|
sb.Append((char) cp);
|
|
} else {
|
|
sb.Append(System.Char.ConvertFromUtf32(cp));
|
|
}
|
|
CharList = list.det_tail(CharList);
|
|
}
|
|
Str = sb.ToString();
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
semidet_from_char_list(CharList::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness],
|
|
"
|
|
java.lang.StringBuilder sb = new StringBuilder();
|
|
Iterable<Integer> iterable = new list.ListIterator<Integer>(CharList);
|
|
SUCCESS_INDICATOR = true;
|
|
for (int c : iterable) {
|
|
if (c == 0x0000) {
|
|
SUCCESS_INDICATOR = false;
|
|
break;
|
|
} else if (c <= 0xffff) {
|
|
// Fast path.
|
|
sb.append((char) c);
|
|
} else {
|
|
sb.append(java.lang.Character.toChars(c));
|
|
}
|
|
}
|
|
Str = sb.toString();
|
|
").
|
|
|
|
semidet_from_char_list(CharList, Str) :-
|
|
(
|
|
CharList = [],
|
|
Str = ""
|
|
;
|
|
CharList = [C | Cs],
|
|
not char.to_int(C, 0),
|
|
internal_encoding_is_utf8 => not char.is_surrogate(C),
|
|
semidet_from_char_list(Cs, Str0),
|
|
first_char(Str, C, Str0)
|
|
).
|
|
|
|
%---------------------%
|
|
%
|
|
% We could implement from_rev_char_list using list.reverse and from_char_list,
|
|
% but the optimized implementation in C below is there for efficiency.
|
|
% At the time this predicate was added, it improved the overall speed of
|
|
% parsing by about 7%.
|
|
%
|
|
|
|
from_rev_char_list(Cs) = S :-
|
|
from_rev_char_list(Cs, S).
|
|
|
|
from_rev_char_list(Chars, Str) :-
|
|
( if semidet_from_rev_char_list(Chars, Str0) then
|
|
Str = Str0
|
|
else
|
|
unexpected($pred, "null character or surrogate code point in list")
|
|
).
|
|
|
|
:- pragma foreign_proc("C",
|
|
semidet_from_rev_char_list(Chars::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, may_not_duplicate, no_sharing],
|
|
"{
|
|
MR_Word list_ptr;
|
|
MR_Word size;
|
|
|
|
// Loop to calculate list length in `size' using list in `list_ptr'.
|
|
SUCCESS_INDICATOR = MR_TRUE;
|
|
size = 0;
|
|
list_ptr = Chars;
|
|
while (!MR_list_is_empty(list_ptr)) {
|
|
MR_Char c = (MR_Char) MR_list_head(list_ptr);
|
|
if (c == '\\0') {
|
|
SUCCESS_INDICATOR = MR_FALSE;
|
|
break;
|
|
}
|
|
if (MR_is_ascii(c)) {
|
|
size++;
|
|
} else {
|
|
size_t csize = MR_utf8_width(c);
|
|
if (csize == 0) {
|
|
// c is a surrogate code point (or even out of range,
|
|
// but that is not supposed to happen).
|
|
SUCCESS_INDICATOR = MR_FALSE;
|
|
break;
|
|
}
|
|
size += csize;
|
|
}
|
|
list_ptr = MR_list_tail(list_ptr);
|
|
}
|
|
|
|
if (SUCCESS_INDICATOR) {
|
|
// Allocate heap space for string.
|
|
MR_allocate_aligned_string_msg(Str, size, MR_ALLOC_ID);
|
|
|
|
// Set size to be the offset of the end of the string
|
|
// (i.e. the \\0) and null terminate the string.
|
|
Str[size] = '\\0';
|
|
|
|
// Loop to copy the characters from the list_ptr to the string
|
|
// in reverse order.
|
|
list_ptr = Chars;
|
|
while (! MR_list_is_empty(list_ptr)) {
|
|
MR_Char c = (MR_Char) MR_list_head(list_ptr);
|
|
if (MR_is_ascii(c)) {
|
|
size--;
|
|
Str[size] = c;
|
|
} else {
|
|
size -= MR_utf8_width(c);
|
|
MR_utf8_encode(Str + size, c);
|
|
}
|
|
list_ptr = MR_list_tail(list_ptr);
|
|
}
|
|
}
|
|
}").
|
|
:- pragma foreign_proc("C#",
|
|
semidet_from_rev_char_list(Chars::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, may_not_duplicate, no_sharing],
|
|
"
|
|
int size = 0;
|
|
list.List_1 list_ptr = Chars;
|
|
while (!list.is_empty(list_ptr)) {
|
|
int c = (int) list.det_head(list_ptr);
|
|
if (c <= 0xffff) {
|
|
size++;
|
|
} else {
|
|
size += 2;
|
|
}
|
|
list_ptr = list.det_tail(list_ptr);
|
|
}
|
|
|
|
char[] arr = new char[size];
|
|
list_ptr = Chars;
|
|
SUCCESS_INDICATOR = true;
|
|
while (!list.is_empty(list_ptr)) {
|
|
int c = (int) list.det_head(list_ptr);
|
|
if (c == 0x0000) {
|
|
SUCCESS_INDICATOR = false;
|
|
break;
|
|
} else if (c <= 0xffff) {
|
|
arr[--size] = (char) c;
|
|
} else {
|
|
string s = System.Char.ConvertFromUtf32(c);
|
|
arr[--size] = s[1];
|
|
arr[--size] = s[0];
|
|
}
|
|
list_ptr = list.det_tail(list_ptr);
|
|
}
|
|
|
|
Str = new string(arr);
|
|
").
|
|
|
|
semidet_from_rev_char_list(Chars, Str) :-
|
|
semidet_from_char_list(list.reverse(Chars), Str).
|
|
|
|
%---------------------%
|
|
|
|
to_code_unit_list(String, CodeUnits) :-
|
|
NumCodeUnits = string.count_code_units(String),
|
|
to_code_unit_list_loop(String, 0, NumCodeUnits, CodeUnits).
|
|
|
|
:- pred to_code_unit_list_loop(string::in, int::in, int::in,
|
|
list(int)::out) is det.
|
|
|
|
to_code_unit_list_loop(String, Index, End, CodeUnits) :-
|
|
( if Index >= End then
|
|
CodeUnits = []
|
|
else
|
|
unsafe_index_code_unit(String, Index, CodeUnit),
|
|
to_code_unit_list_loop(String, Index + 1, End, TailCodeUnits),
|
|
CodeUnits = [CodeUnit | TailCodeUnits]
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
to_utf8_code_unit_list(String, CodeUnits) :-
|
|
Encoding = internal_string_encoding,
|
|
(
|
|
Encoding = utf8,
|
|
to_code_unit_list(String, CodeUnits)
|
|
;
|
|
Encoding = utf16,
|
|
string.foldr(encode_utf8, String, [], CodeUnits)
|
|
).
|
|
|
|
:- pred encode_utf8(char::in, list(int)::in, list(int)::out) is det.
|
|
|
|
encode_utf8(Char, StrCodeUnits0, StrCodeUnits) :-
|
|
( if char.to_utf8(Char, CharCodeUnits) then
|
|
StrCodeUnits = CharCodeUnits ++ StrCodeUnits0
|
|
else
|
|
unexpected($pred, "surrogate code point")
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
to_utf16_code_unit_list(String, Utf16CodeUnits) :-
|
|
Encoding = internal_string_encoding,
|
|
(
|
|
Encoding = utf8,
|
|
NumUtf8CodeUnits = string.count_code_units(String),
|
|
utf8_to_utf16_code_units_rev_loop(String, NumUtf8CodeUnits,
|
|
[], Utf16CodeUnits)
|
|
;
|
|
Encoding = utf16,
|
|
to_code_unit_list(String, Utf16CodeUnits)
|
|
).
|
|
|
|
:- pred utf8_to_utf16_code_units_rev_loop(string::in, int::in,
|
|
list(int)::in, list(int)::out) is det.
|
|
|
|
utf8_to_utf16_code_units_rev_loop(String, Index, !StrUtf16CodeUnits) :-
|
|
( if
|
|
unsafe_prev_index_repl(String, Index, PrevIndex, Char, MaybeReplaced)
|
|
then
|
|
(
|
|
MaybeReplaced = replaced_code_unit(_),
|
|
unexpected($pred, "ill-formed code unit sequence")
|
|
;
|
|
MaybeReplaced = not_replaced,
|
|
( if char.to_utf16(Char, CharUtf16CodeUnits) then
|
|
% We add CharUtf16CodeUnits to the start of !StrUtf16CodeUnits
|
|
% because we are iterating from the end of String backwards
|
|
% towards its start. We first add the CharUtf16CodeUnits
|
|
% of the last character in String to !StrUtf16CodeUnits,
|
|
% then the one before, and so on, until at the end, we add
|
|
% the CharUtf16CodeUnits of the first character.
|
|
!:StrUtf16CodeUnits = CharUtf16CodeUnits ++ !.StrUtf16CodeUnits
|
|
else
|
|
unexpected($pred, "char.to_utf16 failed")
|
|
)
|
|
),
|
|
utf8_to_utf16_code_units_rev_loop(String, PrevIndex,
|
|
!StrUtf16CodeUnits)
|
|
else
|
|
true
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
from_code_unit_list(CodeList, Str) :-
|
|
Verify = yes,
|
|
do_from_code_unit_list(CodeList, Verify, Str).
|
|
|
|
from_code_unit_list_allow_ill_formed(CodeList, Str) :-
|
|
Verify = no,
|
|
do_from_code_unit_list(CodeList, Verify, Str).
|
|
|
|
:- pred do_from_code_unit_list(list(int)::in, bool::in, string::uo) is semidet.
|
|
|
|
:- pragma foreign_proc("C",
|
|
do_from_code_unit_list(CodeList::in, Verify::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, may_not_duplicate, no_sharing],
|
|
"
|
|
MR_Word list_ptr;
|
|
size_t size;
|
|
|
|
size = 0;
|
|
list_ptr = CodeList;
|
|
while (! MR_list_is_empty(list_ptr)) {
|
|
size++;
|
|
list_ptr = MR_list_tail(list_ptr);
|
|
}
|
|
|
|
MR_allocate_aligned_string_msg(Str, size, MR_ALLOC_ID);
|
|
|
|
SUCCESS_INDICATOR = MR_TRUE;
|
|
size = 0;
|
|
list_ptr = CodeList;
|
|
while (! MR_list_is_empty(list_ptr)) {
|
|
unsigned c = (unsigned) MR_list_head(list_ptr);
|
|
// Check for null character or invalid code unit.
|
|
if (c == 0 || c > 0xff) {
|
|
SUCCESS_INDICATOR = MR_FALSE;
|
|
break;
|
|
}
|
|
Str[size] = c;
|
|
size++;
|
|
list_ptr = MR_list_tail(list_ptr);
|
|
}
|
|
|
|
Str[size] = '\\0';
|
|
|
|
if (SUCCESS_INDICATOR && Verify == MR_YES) {
|
|
SUCCESS_INDICATOR = MR_utf8_verify(Str);
|
|
}
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
do_from_code_unit_list(CodeList::in, Verify::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness],
|
|
"
|
|
java.lang.StringBuilder sb = new java.lang.StringBuilder();
|
|
|
|
SUCCESS_INDICATOR = true;
|
|
|
|
if (Verify == bool.YES) {
|
|
boolean prev_high = false;
|
|
Iterable<Integer> iterable = new list.ListIterator<Integer>(CodeList);
|
|
for (int i : iterable) {
|
|
// Check for null character or invalid code unit.
|
|
if (i <= 0 || i > 0xffff) {
|
|
SUCCESS_INDICATOR = false;
|
|
break;
|
|
}
|
|
char c = (char) i;
|
|
if (prev_high) {
|
|
if (!java.lang.Character.isLowSurrogate(c)) {
|
|
SUCCESS_INDICATOR = false;
|
|
break;
|
|
}
|
|
prev_high = false;
|
|
} else if (java.lang.Character.isHighSurrogate(c)) {
|
|
prev_high = true;
|
|
} else if (java.lang.Character.isLowSurrogate(c)) {
|
|
SUCCESS_INDICATOR = false;
|
|
break;
|
|
}
|
|
sb.append(c);
|
|
}
|
|
SUCCESS_INDICATOR = SUCCESS_INDICATOR && !prev_high;
|
|
} else {
|
|
Iterable<Integer> iterable = new list.ListIterator<Integer>(CodeList);
|
|
for (int i : iterable) {
|
|
// Check for null character or invalid code unit.
|
|
if (i <= 0 || i > 0xffff) {
|
|
SUCCESS_INDICATOR = false;
|
|
break;
|
|
}
|
|
char c = (char) i;
|
|
sb.append(c);
|
|
}
|
|
}
|
|
|
|
if (SUCCESS_INDICATOR) {
|
|
Str = sb.toString();
|
|
} else {
|
|
Str = """";
|
|
}
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
do_from_code_unit_list(CodeList::in, Verify::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness],
|
|
"
|
|
System.Text.StringBuilder sb = new System.Text.StringBuilder();
|
|
|
|
SUCCESS_INDICATOR = true;
|
|
|
|
if (Verify == mr_bool.YES) {
|
|
bool prev_high = false;
|
|
while (!list.is_empty(CodeList)) {
|
|
int i = (int) list.det_head(CodeList);
|
|
// Check for null character or invalid code unit.
|
|
if (i <= 0 || i > 0xffff) {
|
|
SUCCESS_INDICATOR = false;
|
|
break;
|
|
}
|
|
char c = (char) i;
|
|
if (prev_high) {
|
|
if (!System.Char.IsLowSurrogate(c)) {
|
|
SUCCESS_INDICATOR = false;
|
|
break;
|
|
}
|
|
prev_high = false;
|
|
} else if (System.Char.IsHighSurrogate(c)) {
|
|
prev_high = true;
|
|
} else if (System.Char.IsLowSurrogate(c)) {
|
|
SUCCESS_INDICATOR = false;
|
|
break;
|
|
}
|
|
sb.Append(c);
|
|
CodeList = list.det_tail(CodeList);
|
|
}
|
|
SUCCESS_INDICATOR = SUCCESS_INDICATOR && !prev_high;
|
|
} else {
|
|
while (!list.is_empty(CodeList)) {
|
|
int i = (int) list.det_head(CodeList);
|
|
// Check for null character or invalid code unit.
|
|
if (i <= 0 || i > 0xffff) {
|
|
SUCCESS_INDICATOR = false;
|
|
break;
|
|
}
|
|
char c = (char) i;
|
|
sb.Append(c);
|
|
CodeList = list.det_tail(CodeList);
|
|
}
|
|
}
|
|
|
|
if (SUCCESS_INDICATOR) {
|
|
Str = sb.ToString();
|
|
} else {
|
|
Str = """";
|
|
}
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
from_utf8_code_unit_list(CodeUnits, String) :-
|
|
Encoding = internal_string_encoding,
|
|
(
|
|
Encoding = utf8,
|
|
from_code_unit_list(CodeUnits, String)
|
|
;
|
|
Encoding = utf16,
|
|
acc_rev_chars_from_utf8_code_units(CodeUnits, [], RevChars),
|
|
% XXX This checks whether RevChars represents a well-formed string.
|
|
% Why? The call to acc_rev_chars_from_utf8_code_units has ensured that
|
|
% already.
|
|
semidet_from_rev_char_list(RevChars, String)
|
|
).
|
|
|
|
:- pred acc_rev_chars_from_utf8_code_units(list(int)::in,
|
|
list(char)::in, list(char)::out) is semidet.
|
|
|
|
acc_rev_chars_from_utf8_code_units([], !RevChars).
|
|
acc_rev_chars_from_utf8_code_units([A | FollowA], !RevChars) :-
|
|
( if A < 0 then
|
|
fail
|
|
else if A =< 0x7f then % 1-byte sequence
|
|
CodePointInt = A,
|
|
Rest = FollowA
|
|
else if A =< 0xc1 then
|
|
fail
|
|
else if A =< 0xdf then % 2-byte sequence
|
|
FollowA = [B | Rest],
|
|
utf8_is_trail_byte(B),
|
|
CodePointInt = (A /\ 0x1f) << 6
|
|
\/ (B /\ 0x3f),
|
|
CodePointInt >= 0x80
|
|
else if A =< 0xef then % 3-byte sequence
|
|
FollowA = [B, C | Rest],
|
|
utf8_is_trail_byte(B),
|
|
utf8_is_trail_byte(C),
|
|
CodePointInt = (A /\ 0x0f) << 12
|
|
\/ (B /\ 0x3f) << 6
|
|
\/ (C /\ 0x3f),
|
|
CodePointInt >= 0x800,
|
|
not char.char_int_is_surrogate(CodePointInt)
|
|
else if A =< 0xf4 then % 4-byte sequence
|
|
FollowA = [B, C, D | Rest],
|
|
utf8_is_trail_byte(B),
|
|
utf8_is_trail_byte(C),
|
|
utf8_is_trail_byte(D),
|
|
CodePointInt = (A /\ 0x07) << 18
|
|
\/ (B /\ 0x3f) << 12
|
|
\/ (C /\ 0x3f) << 6
|
|
\/ (D /\ 0x3f),
|
|
CodePointInt >= 0x10000
|
|
else
|
|
fail
|
|
),
|
|
char.from_int(CodePointInt, Char),
|
|
!:RevChars = [Char | !.RevChars],
|
|
acc_rev_chars_from_utf8_code_units(Rest, !RevChars).
|
|
|
|
:- pred utf8_is_trail_byte(int::in) is semidet.
|
|
|
|
utf8_is_trail_byte(C) :-
|
|
(C /\ 0xc0) = 0x80.
|
|
|
|
%---------------------%
|
|
|
|
from_utf16_code_unit_list(CodeUnits, String) :-
|
|
Encoding = internal_string_encoding,
|
|
(
|
|
Encoding = utf8,
|
|
acc_rev_chars_from_utf16_code_units(CodeUnits, [], RevChars),
|
|
semidet_from_rev_char_list(RevChars, String)
|
|
;
|
|
Encoding = utf16,
|
|
from_code_unit_list(CodeUnits, String)
|
|
).
|
|
|
|
:- pred acc_rev_chars_from_utf16_code_units(list(int)::in,
|
|
list(char)::in, list(char)::out) is semidet.
|
|
|
|
acc_rev_chars_from_utf16_code_units([], !RevChars).
|
|
acc_rev_chars_from_utf16_code_units([A | FollowA], !RevChars) :-
|
|
( if A < 0 then
|
|
fail
|
|
else if A < 0xd800 then
|
|
CodePointInt = A,
|
|
Rest = FollowA
|
|
else if A < 0xdc00 then
|
|
FollowA = [B | Rest],
|
|
B >= 0xdc00,
|
|
B =< 0xdfff,
|
|
CodePointInt = (A << 10) + B - 0x35fdc00
|
|
else if A =< 0xffff then
|
|
CodePointInt = A,
|
|
Rest = FollowA
|
|
else
|
|
fail
|
|
),
|
|
char.from_int(CodePointInt, Char),
|
|
!:RevChars = [Char | !.RevChars],
|
|
acc_rev_chars_from_utf16_code_units(Rest, !RevChars).
|
|
|
|
%---------------------%
|
|
|
|
duplicate_char(C, N) = S :-
|
|
duplicate_char(C, N, S).
|
|
|
|
duplicate_char(Char, Count, String) :-
|
|
String = from_char_list(list.duplicate(Count, Char)).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Reading characters from strings.
|
|
%
|
|
|
|
% It is important to inline predicates that index into strings,
|
|
% so that the compiler can do loop invariant hoisting on calls to them
|
|
% that occur in loops.
|
|
|
|
:- pragma inline(pred(index/3)).
|
|
:- pragma inline(pred(det_index/3)).
|
|
:- pragma inline(pred(index_next/4)).
|
|
:- pragma inline(pred(index_next_repl/5)).
|
|
:- pragma inline(pred(unsafe_index_next/4)).
|
|
:- pragma inline(pred(unsafe_index_next_repl/5)).
|
|
:- pragma inline(pred(unsafe_index_next_repl_2/5)).
|
|
:- pragma inline(pred(prev_index/4)).
|
|
:- pragma inline(pred(prev_index_repl/5)).
|
|
:- pragma inline(pred(unsafe_prev_index/4)).
|
|
:- pragma inline(pred(unsafe_prev_index_repl/5)).
|
|
:- pragma inline(pred(unsafe_prev_index_repl_2/5)).
|
|
|
|
index(Str, Index, Char) :-
|
|
Len = string.count_code_units(Str),
|
|
( if private_builtin.in_range(Index, Len) then
|
|
unsafe_index(Str, Index, Char)
|
|
else
|
|
fail
|
|
).
|
|
|
|
det_index(S, N) = C :-
|
|
det_index(S, N, C).
|
|
|
|
det_index(String, Int, Char) :-
|
|
( if index(String, Int, Char0) then
|
|
Char = Char0
|
|
else
|
|
unexpected($pred, "index out of range")
|
|
).
|
|
|
|
unsafe_index(S, N) = C :-
|
|
unsafe_index(S, N, C).
|
|
|
|
:- pragma foreign_proc("C",
|
|
unsafe_index(Str::in, Index::in, Ch::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
Ch = (unsigned char) Str[Index];
|
|
if (!MR_is_ascii(Ch)) {
|
|
int width;
|
|
Ch = MR_utf8_get_mb(Str, Index, &width);
|
|
if (Ch < 0) {
|
|
Ch = 0xFFFD;
|
|
}
|
|
}
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
unsafe_index(Str::in, Index::in, Ch::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
char c1 = Str[Index];
|
|
Ch = c1;
|
|
if (System.Char.IsSurrogate(c1)) {
|
|
try {
|
|
char c2 = Str[Index + 1];
|
|
Ch = System.Char.ConvertToUtf32(c1, c2);
|
|
} catch (System.ArgumentOutOfRangeException) {
|
|
// Return unpaired surrogate code point.
|
|
} catch (System.IndexOutOfRangeException) {
|
|
// Return unpaired surrogate code point.
|
|
}
|
|
}
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
unsafe_index(Str::in, Index::in, Ch::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Ch = Str.codePointAt(Index);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
String ^ elem(Index) = det_index(String, Index).
|
|
String ^ unsafe_elem(Index) = unsafe_index(String, Index).
|
|
|
|
%---------------------%
|
|
|
|
index_next(Str, Index, NextIndex, Char) :-
|
|
index_next_repl(Str, Index, NextIndex, Char, _MaybeReplaced).
|
|
|
|
index_next_repl(Str, Index, NextIndex, Char, MaybeReplaced) :-
|
|
Len = string.count_code_units(Str),
|
|
( if private_builtin.in_range(Index, Len) then
|
|
unsafe_index_next_repl(Str, Index, NextIndex, Char, MaybeReplaced)
|
|
else
|
|
fail
|
|
).
|
|
|
|
unsafe_index_next(Str, Index, NextIndex, Ch) :-
|
|
unsafe_index_next_repl_2(Str, Index, NextIndex, Ch, _ReplacedCodeUnit).
|
|
|
|
unsafe_index_next_repl(Str, Index, NextIndex, Ch, MaybeReplaced) :-
|
|
unsafe_index_next_repl_2(Str, Index, NextIndex, Ch, ReplacedCodeUnit),
|
|
( if ReplacedCodeUnit = -1 then
|
|
MaybeReplaced = not_replaced
|
|
else
|
|
CodeUnit = uint8.cast_from_int(ReplacedCodeUnit),
|
|
MaybeReplaced = replaced_code_unit(CodeUnit)
|
|
).
|
|
|
|
:- pred unsafe_index_next_repl_2(string::in, int::in, int::out, char::uo,
|
|
int::out) is semidet.
|
|
|
|
:- pragma foreign_proc("C",
|
|
unsafe_index_next_repl_2(Str::in, Index::in, NextIndex::out, Ch::uo,
|
|
ReplacedCodeUnit::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
Ch = (unsigned char) Str[Index];
|
|
ReplacedCodeUnit = -1;
|
|
if (MR_is_ascii(Ch)) {
|
|
NextIndex = Index + 1;
|
|
SUCCESS_INDICATOR = (Ch != 0);
|
|
} else {
|
|
NextIndex = Index;
|
|
Ch = MR_utf8_get_next_mb(Str, &NextIndex);
|
|
if (Ch < 0) {
|
|
Ch = 0xfffd;
|
|
ReplacedCodeUnit = (unsigned char) Str[Index];
|
|
NextIndex = Index + 1;
|
|
}
|
|
SUCCESS_INDICATOR = MR_TRUE;
|
|
}
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
unsafe_index_next_repl_2(Str::in, Index::in, NextIndex::out, Ch::uo,
|
|
ReplacedCodeUnit::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
ReplacedCodeUnit = -1;
|
|
try {
|
|
Ch = System.Char.ConvertToUtf32(Str, Index);
|
|
if (Ch <= 0xffff) {
|
|
NextIndex = Index + 1;
|
|
} else {
|
|
NextIndex = Index + 2;
|
|
}
|
|
SUCCESS_INDICATOR = true;
|
|
} catch (System.ArgumentOutOfRangeException) {
|
|
Ch = 0;
|
|
NextIndex = Index;
|
|
SUCCESS_INDICATOR = false;
|
|
} catch (System.ArgumentException) {
|
|
// Return unpaired surrogate code point.
|
|
Ch = Str[Index];
|
|
NextIndex = Index + 1;
|
|
SUCCESS_INDICATOR = true;
|
|
}
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
unsafe_index_next_repl_2(Str::in, Index::in, NextIndex::out, Ch::uo,
|
|
ReplacedCodeUnit::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
ReplacedCodeUnit = -1;
|
|
try {
|
|
Ch = Str.codePointAt(Index);
|
|
NextIndex = Index + java.lang.Character.charCount(Ch);
|
|
SUCCESS_INDICATOR = true;
|
|
} catch (IndexOutOfBoundsException e) {
|
|
Ch = 0;
|
|
NextIndex = Index;
|
|
SUCCESS_INDICATOR = false;
|
|
}
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
prev_index(Str, Index, PrevIndex, Char) :-
|
|
prev_index_repl(Str, Index, PrevIndex, Char, _MaybeReplaced).
|
|
|
|
prev_index_repl(Str, Index, PrevIndex, Char, MaybeReplaced) :-
|
|
Len = string.count_code_units(Str),
|
|
( if private_builtin.in_range(Index - 1, Len) then
|
|
unsafe_prev_index_repl(Str, Index, PrevIndex, Char, MaybeReplaced)
|
|
else
|
|
fail
|
|
).
|
|
|
|
unsafe_prev_index(Str, Index, PrevIndex, Ch) :-
|
|
unsafe_prev_index_repl_2(Str, Index, PrevIndex, Ch, _ReplacedCodeUnit).
|
|
|
|
unsafe_prev_index_repl(Str, Index, PrevIndex, Ch, MaybeReplaced) :-
|
|
unsafe_prev_index_repl_2(Str, Index, PrevIndex, Ch, ReplacedCodeUnit),
|
|
( if ReplacedCodeUnit = -1 then
|
|
MaybeReplaced = not_replaced
|
|
else
|
|
CodeUnit = uint8.cast_from_int(ReplacedCodeUnit),
|
|
MaybeReplaced = replaced_code_unit(CodeUnit)
|
|
).
|
|
|
|
:- pred unsafe_prev_index_repl_2(string::in, int::in, int::out, char::uo,
|
|
int::out) is semidet.
|
|
|
|
:- pragma foreign_proc("C",
|
|
unsafe_prev_index_repl_2(Str::in, Index::in, PrevIndex::out, Ch::uo,
|
|
ReplacedCodeUnit::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
ReplacedCodeUnit = -1;
|
|
if (Index <= 0) {
|
|
PrevIndex = Index;
|
|
Ch = 0;
|
|
SUCCESS_INDICATOR = MR_FALSE;
|
|
} else {
|
|
PrevIndex = Index - 1;
|
|
Ch = (unsigned char) Str[PrevIndex];
|
|
if (! MR_is_ascii(Ch)) {
|
|
Ch = MR_utf8_prev_get(Str, &PrevIndex);
|
|
// XXX MR_utf8_prev_get currently just scans backwards to find a
|
|
// lead byte, so we need a separate check to ensure no bytes are
|
|
// unaccounted for.
|
|
if (Ch < 0 || PrevIndex + MR_utf8_width(Ch) != Index) {
|
|
Ch = 0xfffd;
|
|
ReplacedCodeUnit = (unsigned char) Str[Index - 1];
|
|
PrevIndex = Index - 1;
|
|
}
|
|
}
|
|
SUCCESS_INDICATOR = MR_TRUE;
|
|
}
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
unsafe_prev_index_repl_2(Str::in, Index::in, PrevIndex::out, Ch::uo,
|
|
ReplacedCodeUnit::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
ReplacedCodeUnit = -1;
|
|
if (Index <= 0) {
|
|
Ch = 0;
|
|
PrevIndex = Index;
|
|
SUCCESS_INDICATOR = false;
|
|
} else {
|
|
char c2 = Str[Index - 1];
|
|
if (System.Char.IsLowSurrogate(c2)) {
|
|
try {
|
|
char c1 = Str[Index - 2];
|
|
Ch = System.Char.ConvertToUtf32(c1, c2);
|
|
PrevIndex = Index - 2;
|
|
} catch (System.ArgumentOutOfRangeException) {
|
|
// Return unpaired surrogate code point.
|
|
Ch = (int) c2;
|
|
PrevIndex = Index - 1;
|
|
} catch (System.IndexOutOfRangeException) {
|
|
// Return unpaired surrogate code point.
|
|
Ch = (int) c2;
|
|
PrevIndex = Index - 1;
|
|
}
|
|
} else {
|
|
Ch = (int) c2;
|
|
PrevIndex = Index - 1;
|
|
}
|
|
SUCCESS_INDICATOR = true;
|
|
}
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
unsafe_prev_index_repl_2(Str::in, Index::in, PrevIndex::out, Ch::uo,
|
|
ReplacedCodeUnit::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
ReplacedCodeUnit = -1;
|
|
try {
|
|
Ch = Str.codePointBefore(Index);
|
|
PrevIndex = Index - java.lang.Character.charCount(Ch);
|
|
SUCCESS_INDICATOR = true;
|
|
} catch (IndexOutOfBoundsException e) {
|
|
Ch = 0;
|
|
PrevIndex = Index;
|
|
SUCCESS_INDICATOR = false;
|
|
}
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
unsafe_index_code_unit(Str::in, Index::in, Code::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
const unsigned char *s = (const unsigned char *) Str;
|
|
Code = s[Index];
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
unsafe_index_code_unit(Str::in, Index::in, Code::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Code = Str[Index];
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
unsafe_index_code_unit(Str::in, Index::in, Code::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Code = Str.charAt(Index);
|
|
").
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Writing characters to strings.
|
|
%
|
|
|
|
set_char(Char, Index, Str0, Str) :-
|
|
( if char.to_int(Char, 0) then
|
|
unexpected($pred, "null character")
|
|
else if
|
|
internal_string_encoding = utf8,
|
|
char.is_surrogate(Char)
|
|
then
|
|
unexpected($pred, "surrogate code point")
|
|
else
|
|
Len0 = string.count_code_units(Str0),
|
|
( if private_builtin.in_range(Index, Len0) then
|
|
unsafe_set_char_copy_string(Char, Index, Len0, Str0, Str)
|
|
else
|
|
fail
|
|
)
|
|
).
|
|
|
|
det_set_char(Char, Index, Str0) = Str :-
|
|
det_set_char(Char, Index, Str0, Str).
|
|
|
|
det_set_char(Char, Index, Str0, Str) :-
|
|
( if set_char(Char, Index, Str0, StrPrime) then
|
|
Str = StrPrime
|
|
else
|
|
unexpected($pred, "index out of range")
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
unsafe_set_char(C, N, S0) = S :-
|
|
unsafe_set_char(C, N, S0, S).
|
|
|
|
unsafe_set_char(Char, Index, Str0, Str) :-
|
|
( if char.to_int(Char, 0) then
|
|
unexpected($pred, "null character")
|
|
else if
|
|
internal_string_encoding = utf8,
|
|
char.is_surrogate(Char)
|
|
then
|
|
unexpected($pred, "surrogate code point")
|
|
else
|
|
Len0 = string.count_code_units(Str0),
|
|
unsafe_set_char_copy_string(Char, Index, Len0, Str0, Str)
|
|
).
|
|
|
|
:- pred unsafe_set_char_copy_string(char, int, int, string, string).
|
|
:- mode unsafe_set_char_copy_string(in, in, in, in, uo) is det.
|
|
|
|
:- pragma foreign_proc("C",
|
|
unsafe_set_char_copy_string(Ch::in, Index::in, Len0::in,
|
|
Str0::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness],
|
|
"
|
|
int b;
|
|
size_t oldlen;
|
|
size_t oldwidth;
|
|
size_t newwidth;
|
|
size_t newlen;
|
|
|
|
// The cast to (unsigned char *) is to prevent sign extension.
|
|
b = ((const unsigned char *) Str0)[Index];
|
|
if (MR_utf8_is_lead_byte(b)) {
|
|
MR_Integer next_index = Index;
|
|
int oldc = MR_utf8_get_next_mb(Str0, &next_index);
|
|
if (oldc < 0) {
|
|
oldwidth = 1;
|
|
} else {
|
|
oldwidth = next_index - Index;
|
|
}
|
|
} else {
|
|
oldwidth = 1;
|
|
}
|
|
|
|
if (MR_is_ascii(Ch)) {
|
|
// Fast path.
|
|
newwidth = 1;
|
|
} else {
|
|
newwidth = MR_utf8_width(Ch);
|
|
}
|
|
|
|
oldlen = Len0;
|
|
newlen = oldlen - oldwidth + newwidth;
|
|
|
|
MR_allocate_aligned_string_msg(Str, newlen, MR_ALLOC_ID);
|
|
MR_memcpy(Str, Str0, Index);
|
|
if (MR_is_ascii(Ch)) {
|
|
// Fast path.
|
|
Str[Index] = Ch;
|
|
} else {
|
|
MR_utf8_encode(Str + Index, Ch);
|
|
}
|
|
MR_memcpy(Str + Index + newwidth,
|
|
Str0 + Index + oldwidth,
|
|
oldlen - Index - oldwidth + 1);
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
unsafe_set_char_copy_string(Ch::in, Index::in, _Len0::in,
|
|
Str0::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
int oldwidth;
|
|
if (System.Char.IsHighSurrogate(Str0, Index)
|
|
&& Index + 1 < Str0.Length
|
|
&& System.Char.IsLowSurrogate(Str0, Index + 1))
|
|
{
|
|
oldwidth = 2;
|
|
} else {
|
|
oldwidth = 1;
|
|
}
|
|
Str = Str0.Substring(0, Index)
|
|
+ System.Char.ConvertFromUtf32(Ch)
|
|
+ Str0.Substring(Index + oldwidth);
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
unsafe_set_char_copy_string(Ch::in, Index::in, _Len0::in,
|
|
Str0::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
int oldc = Str0.codePointAt(Index);
|
|
int oldwidth = java.lang.Character.charCount(oldc);
|
|
Str = Str0.subSequence(0, Index)
|
|
+ new String(Character.toChars(Ch))
|
|
+ Str0.subSequence(Index + oldwidth, Str0.length());
|
|
").
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Determining the lengths of strings.
|
|
%
|
|
|
|
length(S) = L :-
|
|
length(S, L).
|
|
|
|
:- pragma promise_equivalent_clauses(pred(length/2)).
|
|
|
|
:- pragma foreign_proc("C",
|
|
length(Str::in, Length::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
Length = strlen(Str);
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
length(Str::in, Length::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Length = Str.Length;
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
length(Str::in, Length::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Length = Str.length();
|
|
").
|
|
|
|
:- pragma foreign_proc("C",
|
|
length(Str::ui, Length::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
Length = strlen(Str);
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
length(Str::ui, Length::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Length = Str.Length;
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
length(Str::ui, Length::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Length = Str.length();
|
|
").
|
|
|
|
length(Str, Len) :-
|
|
to_code_unit_list(Str, CodeList),
|
|
list.length(CodeList, Len0),
|
|
Len = unsafe_promise_unique(Len0).
|
|
|
|
count_code_units(Str) = length(Str).
|
|
|
|
count_code_units(Str, Length) :-
|
|
length(Str, Length).
|
|
|
|
%---------------------%
|
|
|
|
count_code_points(String) = Count :-
|
|
count_code_points(String, Count).
|
|
|
|
:- pragma foreign_proc("Java",
|
|
count_code_points(String::in, Count::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe, may_not_duplicate],
|
|
"
|
|
Count = String.codePointCount(0, String.length());
|
|
").
|
|
|
|
count_code_points(String, Count) :-
|
|
count_code_points_loop(String, 0, 0, Count).
|
|
|
|
:- pred count_code_points_loop(string::in, int::in, int::in, int::out) is det.
|
|
|
|
count_code_points_loop(String, I, Count0, Count) :-
|
|
( if unsafe_index_next(String, I, J, _) then
|
|
count_code_points_loop(String, J, Count0 + 1, Count)
|
|
else
|
|
Count = Count0
|
|
).
|
|
|
|
count_codepoints(String) = Count :-
|
|
count_code_points(String, Count).
|
|
|
|
count_codepoints(String, Count) :-
|
|
count_code_points(String, Count).
|
|
|
|
%---------------------%
|
|
|
|
count_utf8_code_units(String) = NumUtf8CodeUnits :-
|
|
Encoding = internal_string_encoding,
|
|
(
|
|
Encoding = utf8,
|
|
NumUtf8CodeUnits = string.count_code_units(String)
|
|
;
|
|
Encoding = utf16,
|
|
string.foldl(count_utf16_to_utf8_code_units, String,
|
|
0, NumUtf8CodeUnits)
|
|
).
|
|
|
|
:- pred count_utf16_to_utf8_code_units(char::in, int::in, int::out) is det.
|
|
|
|
count_utf16_to_utf8_code_units(Char, !NumUtf8CodeUnits) :-
|
|
char.to_int(Char, CharInt),
|
|
( if CharInt =< 0x7f then
|
|
!:NumUtf8CodeUnits = !.NumUtf8CodeUnits + 1
|
|
else if char.to_utf8(Char, CharUtf8CodeUnits) then
|
|
!:NumUtf8CodeUnits = !.NumUtf8CodeUnits +
|
|
list.length(CharUtf8CodeUnits)
|
|
else
|
|
error($pred, "surrogate code point")
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("Java",
|
|
code_point_offset(String::in, StartOffset::in, N::in, Index::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe, may_not_duplicate],
|
|
"
|
|
try {
|
|
Index = String.offsetByCodePoints(StartOffset, N);
|
|
SUCCESS_INDICATOR = (Index < String.length());
|
|
} catch (IndexOutOfBoundsException e) {
|
|
Index = -1;
|
|
SUCCESS_INDICATOR = false;
|
|
}
|
|
").
|
|
|
|
code_point_offset(String, StartOffset, N, Index) :-
|
|
StartOffset >= 0,
|
|
NumCodeUnits = string.count_code_units(String),
|
|
code_point_offset_loop(String, StartOffset, NumCodeUnits, N, Index).
|
|
|
|
:- pred code_point_offset_loop(string::in, int::in, int::in, int::in, int::out)
|
|
is semidet.
|
|
|
|
code_point_offset_loop(String, Offset, NumCodeUnits, N, Index) :-
|
|
Offset < NumCodeUnits,
|
|
( if N = 0 then
|
|
Index = Offset
|
|
else
|
|
unsafe_index_next(String, Offset, NextOffset, _),
|
|
code_point_offset_loop(String, NextOffset, NumCodeUnits, N - 1, Index)
|
|
).
|
|
|
|
codepoint_offset(String, StartOffset, N, Index) :-
|
|
code_point_offset(String, StartOffset, N, Index).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
|
|
code_point_offset(String, N, Index) :-
|
|
code_point_offset(String, 0, N, Index).
|
|
|
|
codepoint_offset(String, N, Index) :-
|
|
code_point_offset(String, 0, N, Index).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Computing hashes of strings.
|
|
%
|
|
% Note that these functions are also defined in runtime/mercury_string.h.
|
|
% The definition here and in mercury_string.h must be kept equivalent.
|
|
%
|
|
|
|
hash(String) = HashVal :-
|
|
hash(String, HashVal).
|
|
|
|
hash(String, HashVal) :-
|
|
string.count_code_units(String, NumCodeUnits),
|
|
hash_loop(String, 0, NumCodeUnits, 0, HashVal1),
|
|
HashVal = HashVal1 `xor` NumCodeUnits.
|
|
|
|
:- pred hash_loop(string::in, int::in, int::in, int::in, int::out)
|
|
is det.
|
|
|
|
hash_loop(String, Index, NumCodeUnits, !HashVal) :-
|
|
( if Index < NumCodeUnits then
|
|
unsafe_index_code_unit(String, Index, C),
|
|
!:HashVal = !.HashVal `xor` (!.HashVal `unchecked_left_shift` 5),
|
|
!:HashVal = !.HashVal `xor` C,
|
|
hash_loop(String, Index + 1, NumCodeUnits, !HashVal)
|
|
else
|
|
true
|
|
).
|
|
|
|
hash2(String) = HashVal :-
|
|
string.count_code_units(String, NumCodeUnits),
|
|
hash2_loop(String, 0, NumCodeUnits, 0, HashVal1),
|
|
HashVal = HashVal1 `xor` NumCodeUnits.
|
|
|
|
:- pred hash2_loop(string::in, int::in, int::in, int::in, int::out)
|
|
is det.
|
|
|
|
hash2_loop(String, Index, NumCodeUnits, !HashVal) :-
|
|
( if Index < NumCodeUnits then
|
|
unsafe_index_code_unit(String, Index, C),
|
|
!:HashVal = !.HashVal * 37,
|
|
!:HashVal = !.HashVal + C,
|
|
hash2_loop(String, Index + 1, NumCodeUnits, !HashVal)
|
|
else
|
|
true
|
|
).
|
|
|
|
hash3(String) = HashVal :-
|
|
string.count_code_units(String, NumCodeUnits),
|
|
hash3_loop(String, 0, NumCodeUnits, 0, HashVal1),
|
|
HashVal = HashVal1 `xor` NumCodeUnits.
|
|
|
|
:- pred hash3_loop(string::in, int::in, int::in, int::in, int::out)
|
|
is det.
|
|
|
|
hash3_loop(String, Index, NumCodeUnits, !HashVal) :-
|
|
( if Index < NumCodeUnits then
|
|
unsafe_index_code_unit(String, Index, C),
|
|
!:HashVal = !.HashVal * 49,
|
|
!:HashVal = !.HashVal + C,
|
|
hash3_loop(String, Index + 1, NumCodeUnits, !HashVal)
|
|
else
|
|
true
|
|
).
|
|
|
|
hash4(String) = HashVal :-
|
|
string.count_code_units(String, NumCodeUnits),
|
|
hash4_loop(String, 0, NumCodeUnits, 0, HashVal1),
|
|
HashVal = HashVal1 `xor` NumCodeUnits.
|
|
|
|
:- pred hash4_loop(string::in, int::in, int::in, int::in, int::out)
|
|
is det.
|
|
|
|
hash4_loop(String, Index, NumCodeUnits, !HashVal) :-
|
|
( if Index < NumCodeUnits then
|
|
unsafe_index_code_unit(String, Index, C),
|
|
!:HashVal = keep_30_bits(!.HashVal `xor`
|
|
(!.HashVal `unchecked_left_shift` 5)),
|
|
!:HashVal = !.HashVal `xor` C,
|
|
hash4_loop(String, Index + 1, NumCodeUnits, !HashVal)
|
|
else
|
|
true
|
|
).
|
|
|
|
hash5(String) = HashVal :-
|
|
string.count_code_units(String, NumCodeUnits),
|
|
hash5_loop(String, 0, NumCodeUnits, 0, HashVal1),
|
|
HashVal = HashVal1 `xor` NumCodeUnits.
|
|
|
|
:- pred hash5_loop(string::in, int::in, int::in, int::in, int::out)
|
|
is det.
|
|
|
|
hash5_loop(String, Index, NumCodeUnits, !HashVal) :-
|
|
( if Index < NumCodeUnits then
|
|
unsafe_index_code_unit(String, Index, C),
|
|
!:HashVal = keep_30_bits(!.HashVal * 37),
|
|
!:HashVal = keep_30_bits(!.HashVal + C),
|
|
hash5_loop(String, Index + 1, NumCodeUnits, !HashVal)
|
|
else
|
|
true
|
|
).
|
|
|
|
hash6(String) = HashVal :-
|
|
string.count_code_units(String, NumCodeUnits),
|
|
hash6_loop(String, 0, NumCodeUnits, 0, HashVal1),
|
|
HashVal = HashVal1 `xor` NumCodeUnits.
|
|
|
|
:- pred hash6_loop(string::in, int::in, int::in, int::in, int::out)
|
|
is det.
|
|
|
|
hash6_loop(String, Index, NumCodeUnits, !HashVal) :-
|
|
( if Index < NumCodeUnits then
|
|
unsafe_index_code_unit(String, Index, C),
|
|
!:HashVal = keep_30_bits(!.HashVal * 49),
|
|
!:HashVal = keep_30_bits(!.HashVal + C),
|
|
hash6_loop(String, Index + 1, NumCodeUnits, !HashVal)
|
|
else
|
|
true
|
|
).
|
|
|
|
:- func keep_30_bits(int) = int.
|
|
|
|
keep_30_bits(N) = N /\ ((1 `unchecked_left_shift` 30) - 1).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Tests on strings.
|
|
%
|
|
|
|
is_empty("").
|
|
|
|
%---------------------%
|
|
|
|
is_well_formed(String) :-
|
|
find_first_ill_formed_pos(String, FirstIllFormedPos),
|
|
FirstIllFormedPos < 0.
|
|
|
|
check_well_formedness(String, Result) :-
|
|
find_first_ill_formed_pos(String, FirstIllFormedPos),
|
|
( if FirstIllFormedPos < 0 then
|
|
Result = well_formed
|
|
else
|
|
Result = ill_formed(FirstIllFormedPos)
|
|
).
|
|
|
|
% Return the position (as an offset in the code unit appropriate
|
|
% to the UTF version used by the target platform) of the first place
|
|
% in the string that is not well formed.
|
|
%
|
|
:- pred find_first_ill_formed_pos(string::in, int::out) is det.
|
|
|
|
:- pragma foreign_proc("C",
|
|
find_first_ill_formed_pos(S::in, FirstIllFormedPos::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness],
|
|
"
|
|
FirstIllFormedPos = MR_utf8_find_ill_formed_char(S);
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
find_first_ill_formed_pos(S::in, FirstIllFormedPos::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness],
|
|
"
|
|
FirstIllFormedPos = -1;
|
|
for (int i = 0; i < S.length(); i++) {
|
|
if (java.lang.Character.isLowSurrogate(S.charAt(i))) {
|
|
FirstIllFormedPos = i;
|
|
break;
|
|
}
|
|
if (java.lang.Character.isHighSurrogate(S.charAt(i))) {
|
|
i++;
|
|
if (i >= S.length() ||
|
|
!java.lang.Character.isLowSurrogate(S.charAt(i)))
|
|
{
|
|
FirstIllFormedPos = i-1; // Could also be just i.
|
|
break;
|
|
}
|
|
}
|
|
}
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
find_first_ill_formed_pos(S::in, FirstIllFormedPos::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness],
|
|
"
|
|
FirstIllFormedPos = -1;
|
|
for (int i = 0; i < S.Length; i++) {
|
|
if (System.Char.IsLowSurrogate(S[i])) {
|
|
FirstIllFormedPos = i;
|
|
break;
|
|
}
|
|
if (System.Char.IsHighSurrogate(S[i])) {
|
|
i++;
|
|
if (i >= S.Length || !System.Char.IsLowSurrogate(S[i])) {
|
|
FirstIllFormedPos = i-1; // Could also be just i.
|
|
break;
|
|
}
|
|
}
|
|
}
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
% For speed, most of these predicates have C versions as well as
|
|
% Mercury versions. XXX why not all?
|
|
|
|
:- pragma foreign_proc("C",
|
|
is_all_alpha(S::in),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, may_duplicate, no_sharing],
|
|
"
|
|
const char *p;
|
|
|
|
SUCCESS_INDICATOR = MR_TRUE;
|
|
for (p = S; *p != '\\0'; p++) {
|
|
switch (*p) {
|
|
case 'a': case 'b': case 'c': case 'd': case 'e':
|
|
case 'f': case 'g': case 'h': case 'i': case 'j':
|
|
case 'k': case 'l': case 'm': case 'n': case 'o':
|
|
case 'p': case 'q': case 'r': case 's': case 't':
|
|
case 'u': case 'v': case 'w': case 'x': case 'y':
|
|
case 'z':
|
|
|
|
case 'A': case 'B': case 'C': case 'D': case 'E':
|
|
case 'F': case 'G': case 'H': case 'I': case 'J':
|
|
case 'K': case 'L': case 'M': case 'N': case 'O':
|
|
case 'P': case 'Q': case 'R': case 'S': case 'T':
|
|
case 'U': case 'V': case 'W': case 'X': case 'Y':
|
|
case 'Z':
|
|
continue;
|
|
|
|
default:
|
|
SUCCESS_INDICATOR = MR_FALSE;
|
|
break;
|
|
}
|
|
}
|
|
").
|
|
|
|
is_all_alpha(S) :-
|
|
all_match(char.is_alpha, S).
|
|
|
|
is_all_alnum(S) :-
|
|
all_match(char.is_alnum, S).
|
|
|
|
:- pragma foreign_proc("C",
|
|
is_all_alpha_or_underscore(S::in),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, may_duplicate, no_sharing],
|
|
"
|
|
const char *p;
|
|
|
|
SUCCESS_INDICATOR = MR_TRUE;
|
|
for (p = S; *p != '\\0'; p++) {
|
|
switch (*p) {
|
|
case 'a': case 'b': case 'c': case 'd': case 'e':
|
|
case 'f': case 'g': case 'h': case 'i': case 'j':
|
|
case 'k': case 'l': case 'm': case 'n': case 'o':
|
|
case 'p': case 'q': case 'r': case 's': case 't':
|
|
case 'u': case 'v': case 'w': case 'x': case 'y':
|
|
case 'z':
|
|
|
|
case 'A': case 'B': case 'C': case 'D': case 'E':
|
|
case 'F': case 'G': case 'H': case 'I': case 'J':
|
|
case 'K': case 'L': case 'M': case 'N': case 'O':
|
|
case 'P': case 'Q': case 'R': case 'S': case 'T':
|
|
case 'U': case 'V': case 'W': case 'X': case 'Y':
|
|
case 'Z':
|
|
|
|
case '_':
|
|
continue;
|
|
|
|
default:
|
|
SUCCESS_INDICATOR = MR_FALSE;
|
|
break;
|
|
}
|
|
}
|
|
").
|
|
|
|
is_all_alpha_or_underscore(S) :-
|
|
all_match(char.is_alpha_or_underscore, S).
|
|
|
|
:- pragma foreign_proc("C",
|
|
is_all_alnum_or_underscore(S::in),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, may_duplicate, no_sharing],
|
|
"
|
|
const char *p;
|
|
|
|
SUCCESS_INDICATOR = MR_TRUE;
|
|
for (p = S; *p != '\\0'; p++) {
|
|
switch (*p) {
|
|
case '0': case '1': case '2': case '3': case '4':
|
|
case '5': case '6': case '7': case '8': case '9':
|
|
|
|
case 'a': case 'b': case 'c': case 'd': case 'e':
|
|
case 'f': case 'g': case 'h': case 'i': case 'j':
|
|
case 'k': case 'l': case 'm': case 'n': case 'o':
|
|
case 'p': case 'q': case 'r': case 's': case 't':
|
|
case 'u': case 'v': case 'w': case 'x': case 'y':
|
|
case 'z':
|
|
|
|
case 'A': case 'B': case 'C': case 'D': case 'E':
|
|
case 'F': case 'G': case 'H': case 'I': case 'J':
|
|
case 'K': case 'L': case 'M': case 'N': case 'O':
|
|
case 'P': case 'Q': case 'R': case 'S': case 'T':
|
|
case 'U': case 'V': case 'W': case 'X': case 'Y':
|
|
case 'Z':
|
|
|
|
case '_':
|
|
continue;
|
|
|
|
default:
|
|
SUCCESS_INDICATOR = MR_FALSE;
|
|
break;
|
|
}
|
|
}
|
|
").
|
|
|
|
is_all_alnum_or_underscore(S) :-
|
|
all_match(char.is_alnum_or_underscore, S).
|
|
|
|
% The C version is faster than the Mercury version.
|
|
:- pragma foreign_proc("C",
|
|
is_all_digits(S::in),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, may_duplicate, no_sharing],
|
|
"
|
|
const char *p;
|
|
|
|
SUCCESS_INDICATOR = MR_TRUE;
|
|
for (p = S; *p != '\\0'; p++) {
|
|
switch (*p) {
|
|
case '0': case '1': case '2': case '3': case '4':
|
|
case '5': case '6': case '7': case '8': case '9':
|
|
continue;
|
|
|
|
default:
|
|
SUCCESS_INDICATOR = MR_FALSE;
|
|
break;
|
|
}
|
|
}
|
|
").
|
|
|
|
is_all_digits(S) :-
|
|
all_match(char.is_digit, S).
|
|
|
|
%---------------------%
|
|
|
|
all_match(P, String) :-
|
|
all_match_loop(P, String, 0).
|
|
|
|
:- pred all_match_loop(pred(char)::in(pred(in) is semidet), string::in,
|
|
int::in) is semidet.
|
|
|
|
all_match_loop(P, String, Cur) :-
|
|
( if unsafe_index_next_repl(String, Cur, Next, Char, MaybeReplaced) then
|
|
MaybeReplaced = not_replaced,
|
|
P(Char),
|
|
all_match_loop(P, String, Next)
|
|
else
|
|
true
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
contains_match(P, String) :-
|
|
contains_match_loop(P, String, 0).
|
|
|
|
:- pred contains_match_loop(pred(char)::in(pred(in) is semidet), string::in,
|
|
int::in) is semidet.
|
|
|
|
contains_match_loop(P, String, Cur) :-
|
|
unsafe_index_next_repl(String, Cur, Next, Char, MaybeReplaced),
|
|
( if
|
|
MaybeReplaced = not_replaced,
|
|
P(Char)
|
|
then
|
|
true
|
|
else
|
|
contains_match_loop(P, String, Next)
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
contains_char(Str, Char) :-
|
|
find_first_char(Str, Char, _).
|
|
|
|
%---------------------%
|
|
|
|
compare_substrings(Res, X, StartX, Y, StartY, Length) :-
|
|
LengthX = string.count_code_units(X),
|
|
LengthY = string.count_code_units(Y),
|
|
( if
|
|
Length >= 0,
|
|
StartX >= 0,
|
|
StartY >= 0,
|
|
StartX + Length =< LengthX,
|
|
StartY + Length =< LengthY
|
|
then
|
|
unsafe_compare_substrings(Res, X, StartX, Y, StartY, Length)
|
|
else
|
|
fail
|
|
).
|
|
|
|
:- pragma foreign_proc("C",
|
|
unsafe_compare_substrings(Res::uo, X::in, StartX::in, Y::in, StartY::in,
|
|
Length::in),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
int res = memcmp(X + StartX, Y + StartY, Length);
|
|
Res = ((res < 0) ? MR_COMPARE_LESS
|
|
: (res == 0) ? MR_COMPARE_EQUAL
|
|
: MR_COMPARE_GREATER);
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
unsafe_compare_substrings(Res::uo, X::in, StartX::in, Y::in, StartY::in,
|
|
Length::in),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
int res = System.String.CompareOrdinal(X, StartX, Y, StartY, Length);
|
|
Res = ((res < 0) ? builtin.COMPARE_LESS
|
|
: (res == 0) ? builtin.COMPARE_EQUAL
|
|
: builtin.COMPARE_GREATER);
|
|
").
|
|
|
|
unsafe_compare_substrings(Res, X, StartX, Y, StartY, Length) :-
|
|
unsafe_compare_substrings_loop(X, Y, StartX, StartY, Length, Res).
|
|
|
|
:- pred unsafe_compare_substrings_loop(string::in, string::in,
|
|
int::in, int::in, int::in, comparison_result::uo) is det.
|
|
|
|
unsafe_compare_substrings_loop(X, Y, IX, IY, Rem, Res) :-
|
|
( if Rem = 0 then
|
|
Res = (=)
|
|
else
|
|
unsafe_index_code_unit(X, IX, CodeX),
|
|
unsafe_index_code_unit(Y, IY, CodeY),
|
|
compare(Res0, CodeX, CodeY),
|
|
(
|
|
Res0 = (=),
|
|
unsafe_compare_substrings_loop(X, Y, IX + 1, IY + 1, Rem - 1, Res)
|
|
;
|
|
( Res0 = (<)
|
|
; Res0 = (>)
|
|
),
|
|
Res = Res0
|
|
)
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
compare_ignore_case_ascii(Res, X, Y) :-
|
|
LenX = string.count_code_units(X),
|
|
LenY = string.count_code_units(Y),
|
|
CommonLen = min(LenX, LenY),
|
|
compare_ignore_case_ascii_loop(X, Y, 0, CommonLen, Res0),
|
|
(
|
|
Res0 = (=),
|
|
compare(Res, LenX, LenY)
|
|
;
|
|
( Res0 = (<)
|
|
; Res0 = (>)
|
|
),
|
|
Res = Res0
|
|
).
|
|
|
|
:- pred compare_ignore_case_ascii_loop(string::in, string::in,
|
|
int::in, int::in, comparison_result::uo) is det.
|
|
|
|
compare_ignore_case_ascii_loop(X, Y, I, CommonLen, Res) :-
|
|
( if I = CommonLen then
|
|
Res = (=)
|
|
else
|
|
unsafe_index_code_unit(X, I, CodeX),
|
|
unsafe_index_code_unit(Y, I, CodeY),
|
|
to_lower_code_unit(CodeX, LowerCodeX),
|
|
to_lower_code_unit(CodeY, LowerCodeY),
|
|
compare(Res0, LowerCodeX, LowerCodeY),
|
|
(
|
|
Res0 = (=),
|
|
compare_ignore_case_ascii_loop(X, Y, I + 1, CommonLen, Res)
|
|
;
|
|
( Res0 = (<)
|
|
; Res0 = (>)
|
|
),
|
|
Res = Res0
|
|
)
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
prefix_length(P, S) = Index :-
|
|
prefix_length_loop(P, S, 0, Index).
|
|
|
|
:- pred prefix_length_loop(pred(char)::in(pred(in) is semidet),
|
|
string::in, int::in, int::out) is det.
|
|
|
|
prefix_length_loop(P, S, I, Index) :-
|
|
( if
|
|
unsafe_index_next_repl(S, I, J, Char, not_replaced),
|
|
P(Char)
|
|
then
|
|
prefix_length_loop(P, S, J, Index)
|
|
else
|
|
Index = I
|
|
).
|
|
|
|
suffix_length(P, S) = End - Index :-
|
|
End = string.count_code_units(S),
|
|
suffix_length_loop(P, S, End, Index).
|
|
|
|
:- pred suffix_length_loop(pred(char)::in(pred(in) is semidet),
|
|
string::in, int::in, int::out) is det.
|
|
|
|
suffix_length_loop(P, S, I, Index) :-
|
|
( if
|
|
unsafe_prev_index_repl(S, I, J, Char, not_replaced),
|
|
P(Char)
|
|
then
|
|
suffix_length_loop(P, S, J, Index)
|
|
else
|
|
Index = I
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
sub_string_search(WholeString, Pattern, Index) :-
|
|
sub_string_search_start(WholeString, Pattern, 0, Index).
|
|
|
|
sub_string_search_start(WholeString, Pattern, BeginAt, Index) :-
|
|
( if
|
|
(
|
|
BeginAt = 0
|
|
;
|
|
BeginAt > 0,
|
|
BeginAt =< string.count_code_units(WholeString)
|
|
)
|
|
then
|
|
unsafe_sub_string_search_start(WholeString, Pattern, BeginAt, Index)
|
|
else
|
|
fail
|
|
).
|
|
|
|
:- pragma foreign_proc("C",
|
|
unsafe_sub_string_search_start(WholeString::in, Pattern::in, BeginAt::in,
|
|
Index::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"{
|
|
char *match = strstr(WholeString + BeginAt, Pattern);
|
|
if (match) {
|
|
Index = match - WholeString;
|
|
SUCCESS_INDICATOR = MR_TRUE;
|
|
} else {
|
|
SUCCESS_INDICATOR = MR_FALSE;
|
|
}
|
|
}").
|
|
:- pragma foreign_proc("C#",
|
|
unsafe_sub_string_search_start(WholeString::in, Pattern::in, BeginAt::in,
|
|
Index::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"{
|
|
Index = WholeString.IndexOf(Pattern, BeginAt,
|
|
System.StringComparison.Ordinal);
|
|
SUCCESS_INDICATOR = (Index >= 0);
|
|
}").
|
|
:- pragma foreign_proc("Java",
|
|
unsafe_sub_string_search_start(WholeString::in, Pattern::in, BeginAt::in,
|
|
Index::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Index = WholeString.indexOf(Pattern, BeginAt);
|
|
SUCCESS_INDICATOR = (Index >= 0);
|
|
").
|
|
|
|
unsafe_sub_string_search_start(String, SubString, BeginAt, Index) :-
|
|
Len = string.count_code_units(String),
|
|
SubLen = string.count_code_units(SubString),
|
|
LastStart = Len - SubLen,
|
|
unsafe_sub_string_search_start_loop(String, SubString, BeginAt, LastStart,
|
|
SubLen, Index).
|
|
|
|
% Brute force string searching. For short Strings this is good;
|
|
% for longer strings Boyer-Moore is much better.
|
|
%
|
|
:- pred unsafe_sub_string_search_start_loop(string::in, string::in, int::in,
|
|
int::in, int::in, int::out) is semidet.
|
|
|
|
unsafe_sub_string_search_start_loop(String, SubString, I, LastI,
|
|
SubLen, Index) :-
|
|
I =< LastI,
|
|
( if unsafe_compare_substrings((=), String, I, SubString, 0, SubLen) then
|
|
Index = I
|
|
else
|
|
unsafe_sub_string_search_start_loop(String, SubString, I + 1, LastI,
|
|
SubLen, Index)
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
find_first_char(Str, Char, Index) :-
|
|
unsafe_find_first_char_start(Str, Char, 0, Index).
|
|
|
|
find_first_char_start(Str, Char, BeginAt, Index) :-
|
|
( if
|
|
(
|
|
BeginAt = 0
|
|
;
|
|
BeginAt > 0,
|
|
BeginAt < string.count_code_units(Str)
|
|
)
|
|
then
|
|
unsafe_find_first_char_start(Str, Char, BeginAt, Index)
|
|
else
|
|
fail
|
|
).
|
|
|
|
:- pragma foreign_proc("C",
|
|
unsafe_find_first_char_start(Str::in, Ch::in, BeginAt::in, Index::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
char *p = NULL;
|
|
|
|
if (MR_is_ascii(Ch)) {
|
|
// strchr will always find the null terminator, but the terminator
|
|
// is not part of the string.
|
|
if (Ch != '\\0') {
|
|
p = strchr(Str + BeginAt, Ch);
|
|
}
|
|
} else {
|
|
char buf[5];
|
|
size_t len;
|
|
|
|
len = MR_utf8_encode(buf, Ch);
|
|
if (len > 0) {
|
|
buf[len] = '\\0';
|
|
p = strstr(Str + BeginAt, buf);
|
|
}
|
|
}
|
|
if (p != NULL) {
|
|
SUCCESS_INDICATOR = MR_TRUE;
|
|
Index = (p - Str);
|
|
} else {
|
|
SUCCESS_INDICATOR = MR_FALSE;
|
|
Index = -1;
|
|
}
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
unsafe_find_first_char_start(Str::in, Ch::in, BeginAt::in, Index::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
if (Ch <= 0xffff) {
|
|
Index = Str.IndexOf((char) Ch, BeginAt);
|
|
} else {
|
|
string s = System.Char.ConvertFromUtf32(Ch);
|
|
Index = Str.IndexOf(s, BeginAt, System.StringComparison.Ordinal);
|
|
}
|
|
SUCCESS_INDICATOR = (Index >= 0);
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
unsafe_find_first_char_start(Str::in, Ch::in, BeginAt::in, Index::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Index = Str.indexOf(Ch, BeginAt);
|
|
SUCCESS_INDICATOR = (Index >= 0);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
find_last_char(Str, Char, Index) :-
|
|
% For C grades, find_last_char_2 only works for a single code unit.
|
|
% i.e. an ASCII character.
|
|
( if
|
|
string.internal_string_encoding = utf8,
|
|
not char.is_ascii(Char)
|
|
then
|
|
find_last_char_loop(Str, Char, 0, string.count_code_units(Str), Index)
|
|
else
|
|
find_last_char_2(Str, Char, Index)
|
|
).
|
|
|
|
% NOTE: the C implementation expects an ASCII character only.
|
|
%
|
|
:- pred find_last_char_2(string::in, char::in, int::out) is semidet.
|
|
|
|
:- pragma foreign_proc("C",
|
|
find_last_char_2(Str::in, Ch::in, Index::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
char *p = NULL;
|
|
// strchr will always find the null terminator, but the terminator
|
|
// is not part of the string.
|
|
if (Ch != '\\0') {
|
|
p = strrchr(Str, Ch);
|
|
}
|
|
if (p != NULL) {
|
|
SUCCESS_INDICATOR = MR_TRUE;
|
|
Index = (p - Str);
|
|
} else {
|
|
SUCCESS_INDICATOR = MR_FALSE;
|
|
Index = -1;
|
|
}
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
find_last_char_2(Str::in, Ch::in, Index::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
if (Ch <= 0xffff) {
|
|
Index = Str.LastIndexOf((char) Ch);
|
|
} else {
|
|
string s = System.Char.ConvertFromUtf32(Ch);
|
|
Index = Str.LastIndexOf(s, System.StringComparison.Ordinal);
|
|
}
|
|
SUCCESS_INDICATOR = (Index >= 0);
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
find_last_char_2(Str::in, Ch::in, Index::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Index = Str.lastIndexOf(Ch);
|
|
SUCCESS_INDICATOR = (Index >= 0);
|
|
").
|
|
|
|
% Pre-condition: LowerIndex =< Index0
|
|
%
|
|
:- pred find_last_char_loop(string::in, char::in, int::in, int::in, int::out)
|
|
is semidet.
|
|
|
|
find_last_char_loop(Str, MatchChar, LowerIndex, Index0, MatchIndex) :-
|
|
string.unsafe_prev_index_repl(Str, Index0, Index1, Char, MaybeReplaced),
|
|
LowerIndex =< Index1,
|
|
( if
|
|
MaybeReplaced = not_replaced,
|
|
MatchChar = Char
|
|
then
|
|
MatchIndex = Index1
|
|
else
|
|
find_last_char_loop(Str, MatchChar, LowerIndex, Index1, MatchIndex)
|
|
).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Appending strings.
|
|
%
|
|
|
|
append(S1, S2) = S3 :-
|
|
append(S1, S2, S3).
|
|
|
|
:- pragma promise_equivalent_clauses(pred(append/3)).
|
|
|
|
append(S1::in, S2::in, S3::in) :-
|
|
append_iii(S1, S2, S3).
|
|
append(S1::in, S2::uo, S3::in) :-
|
|
append_ioi(S1, S2, S3).
|
|
append(S1::in, S2::in, S3::uo) :-
|
|
append_iio(S1, S2, S3).
|
|
append(S1::uo, S2::in, S3::in) :-
|
|
append_oii(S1, S2, S3).
|
|
|
|
:- pred append_iii(string::in, string::in, string::in) is semidet.
|
|
|
|
append_iii(S1, S2, S3) :-
|
|
Len1 = string.count_code_units(S1),
|
|
Len2 = string.count_code_units(S2),
|
|
Len3 = string.count_code_units(S3),
|
|
( if Len3 = Len1 + Len2 then
|
|
unsafe_compare_substrings((=), S1, 0, S3, 0, Len1),
|
|
unsafe_compare_substrings((=), S2, 0, S3, Len1, Len2)
|
|
else
|
|
fail
|
|
).
|
|
|
|
:- pred append_ioi(string::in, string::uo, string::in) is semidet.
|
|
|
|
append_ioi(S1, S2, S3) :-
|
|
Len1 = string.count_code_units(S1),
|
|
Len3 = string.count_code_units(S3),
|
|
( if
|
|
Len1 =< Len3,
|
|
unsafe_compare_substrings((=), S1, 0, S3, 0, Len1)
|
|
then
|
|
string.unsafe_between(S3, Len1, Len3, S2)
|
|
else
|
|
fail
|
|
).
|
|
|
|
:- pred append_iio(string::in, string::in, string::uo) is det.
|
|
|
|
:- pragma foreign_proc("C",
|
|
append_iio(S1::in, S2::in, S3::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"{
|
|
size_t len_1, len_2;
|
|
len_1 = strlen(S1);
|
|
len_2 = strlen(S2);
|
|
MR_allocate_aligned_string_msg(S3, len_1 + len_2, MR_ALLOC_ID);
|
|
strcpy(S3, S1);
|
|
strcpy(S3 + len_1, S2);
|
|
}").
|
|
:- pragma foreign_proc("C#",
|
|
append_iio(S1::in, S2::in, S3::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"{
|
|
S3 = System.String.Concat(S1, S2);
|
|
}").
|
|
:- pragma foreign_proc("Java",
|
|
append_iio(S1::in, S2::in, S3::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S3 = S1.concat(S2);
|
|
").
|
|
|
|
:- pred append_oii(string::uo, string::in, string::in) is semidet.
|
|
|
|
append_oii(S1, S2, S3) :-
|
|
Len2 = string.count_code_units(S2),
|
|
Len3 = string.count_code_units(S3),
|
|
( if
|
|
Len2 =< Len3,
|
|
Len1 = Len3 - Len2,
|
|
compare_substrings((=), S3, Len1, S2, 0, Len2)
|
|
then
|
|
string.unsafe_between(S3, 0, Len1, S1)
|
|
else
|
|
fail
|
|
).
|
|
|
|
nondet_append(S1, S2, S3) :-
|
|
Len3 = string.count_code_units(S3),
|
|
nondet_append_2(0, Len3, S1, S2, S3).
|
|
|
|
:- pred nondet_append_2(int::in, int::in, string::out, string::out,
|
|
string::in) is multi.
|
|
|
|
nondet_append_2(Start2, Len3, S1, S2, S3) :-
|
|
(
|
|
unsafe_between(S3, 0, Start2, S1),
|
|
unsafe_between(S3, Start2, Len3, S2)
|
|
;
|
|
unsafe_index_next(S3, Start2, NextStart2, _Char),
|
|
nondet_append_2(NextStart2, Len3, S1, S2, S3)
|
|
).
|
|
|
|
S1 ++ S2 = append(S1, S2).
|
|
|
|
%---------------------%
|
|
%
|
|
% We implement append_list in foreign code as the Mercury version
|
|
% creates some unnecessary garbage.
|
|
%
|
|
|
|
:- pragma foreign_proc("C",
|
|
append_list(Strs::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, may_not_duplicate, no_sharing],
|
|
"{
|
|
MR_Word list = Strs;
|
|
size_t len;
|
|
|
|
// Determine the total length of all strings.
|
|
len = 0;
|
|
while (!MR_list_is_empty(list)) {
|
|
len += strlen((MR_String) MR_list_head(list));
|
|
list = MR_list_tail(list);
|
|
}
|
|
|
|
// Allocate enough word aligned memory for the string.
|
|
MR_allocate_aligned_string_msg(Str, len, MR_ALLOC_ID);
|
|
|
|
// Copy the strings into the new memory.
|
|
len = 0;
|
|
list = Strs;
|
|
while (!MR_list_is_empty(list)) {
|
|
strcpy((MR_String) Str + len, (MR_String) MR_list_head(list));
|
|
len += strlen((MR_String) MR_list_head(list));
|
|
list = MR_list_tail(list);
|
|
}
|
|
|
|
Str[len] = '\\0';
|
|
}").
|
|
:- pragma foreign_proc("C#",
|
|
append_list(Strs::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness],
|
|
"
|
|
System.Text.StringBuilder sb = new System.Text.StringBuilder();
|
|
while (!list.is_empty(Strs)) {
|
|
sb.Append((string) list.det_head(Strs));
|
|
Strs = list.det_tail(Strs);
|
|
}
|
|
Str = sb.ToString();
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
append_list(Strs::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness],
|
|
"
|
|
java.lang.StringBuilder sb = new java.lang.StringBuilder();
|
|
|
|
Iterable<String> iterable = new list.ListIterator<String>(Strs);
|
|
for (String s : iterable) {
|
|
sb.append(s);
|
|
}
|
|
|
|
Str = sb.toString();
|
|
").
|
|
|
|
append_list(Strs) = Str :-
|
|
append_list(Strs, Str).
|
|
|
|
append_list(Strs, Str) :-
|
|
Pieces = map(make_string_piece, Strs),
|
|
unsafe_append_string_pieces(Pieces, Str).
|
|
|
|
:- func make_string_piece(string) = string_piece.
|
|
|
|
make_string_piece(S) = substring(S, 0, string.count_code_units(S)).
|
|
|
|
%---------------------%
|
|
%
|
|
% We implement join_list in foreign code as the Mercury version
|
|
% creates some unnecessary garbage.
|
|
%
|
|
|
|
:- pragma foreign_proc("C",
|
|
join_list(Sep::in, Strs::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, may_not_duplicate, no_sharing],
|
|
"{
|
|
MR_Word list;
|
|
size_t len;
|
|
size_t sep_len;
|
|
MR_bool add_sep;
|
|
|
|
list = Strs;
|
|
len = 0;
|
|
sep_len = strlen(Sep);
|
|
|
|
// Determine the total length of all strings.
|
|
len = 0;
|
|
add_sep = MR_FALSE;
|
|
while (!MR_list_is_empty(list)) {
|
|
if (add_sep) {
|
|
len += sep_len;
|
|
}
|
|
|
|
len += strlen((MR_String) MR_list_head(list));
|
|
list = MR_list_tail(list);
|
|
add_sep = MR_TRUE;
|
|
}
|
|
|
|
MR_allocate_aligned_string_msg(Str, len, MR_ALLOC_ID);
|
|
|
|
// Copy the strings into the new memory.
|
|
len = 0;
|
|
list = Strs;
|
|
add_sep = MR_FALSE;
|
|
while (!MR_list_is_empty(list)) {
|
|
if (add_sep) {
|
|
strcpy((MR_String) Str + len, Sep);
|
|
len += sep_len;
|
|
}
|
|
|
|
strcpy((MR_String) Str + len, (MR_String) MR_list_head(list));
|
|
len += strlen((MR_String) MR_list_head(list));
|
|
list = MR_list_tail(list);
|
|
add_sep = MR_TRUE;
|
|
}
|
|
|
|
Str[len] = '\\0';
|
|
}").
|
|
:- pragma foreign_proc("C#",
|
|
join_list(Sep::in, Strs::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness],
|
|
"
|
|
System.Text.StringBuilder sb = new System.Text.StringBuilder();
|
|
bool add_sep = false;
|
|
|
|
while (!list.is_empty(Strs)) {
|
|
if (add_sep) {
|
|
sb.Append(Sep);
|
|
}
|
|
sb.Append((string) list.det_head(Strs));
|
|
add_sep = true;
|
|
Strs = list.det_tail(Strs);
|
|
}
|
|
|
|
Str = sb.ToString();
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
join_list(Sep::in, Strs::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness],
|
|
"
|
|
java.lang.StringBuilder sb = new java.lang.StringBuilder();
|
|
boolean add_sep = false;
|
|
|
|
Iterable<String> iterable = new list.ListIterator<String>(Strs);
|
|
for (String s : iterable) {
|
|
if (add_sep) {
|
|
sb.append(Sep);
|
|
}
|
|
sb.append(s);
|
|
add_sep = true;
|
|
}
|
|
|
|
Str = sb.toString();
|
|
").
|
|
|
|
join_list(_Sep, []) = "".
|
|
join_list(Sep, [H | T]) = Str :-
|
|
join_list_loop(make_string_piece(Sep), T, TailPieces),
|
|
Pieces = [make_string_piece(H) | TailPieces],
|
|
unsafe_append_string_pieces(Pieces, Str).
|
|
|
|
:- pred join_list_loop(string_piece::in, list(string)::in,
|
|
list(string_piece)::out) is det.
|
|
|
|
join_list_loop(_Sep, [], []).
|
|
join_list_loop(Sep, [H | T], Pieces) :-
|
|
join_list_loop(Sep, T, TailPieces),
|
|
Pieces = [Sep, make_string_piece(H) | TailPieces].
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Making strings from smaller pieces.
|
|
%
|
|
|
|
:- type string_buffer
|
|
---> string_buffer(string).
|
|
|
|
:- pragma foreign_type("C", string_buffer, "char *",
|
|
[can_pass_as_mercury_type]).
|
|
:- pragma foreign_type("C#", string_buffer, "char[]").
|
|
:- pragma foreign_type("Java", string_buffer, "java.lang.StringBuilder").
|
|
|
|
:- pred alloc_buffer(int::in, string_buffer::uo) is det.
|
|
|
|
:- pragma foreign_proc("C",
|
|
alloc_buffer(Size::in, Buffer::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
MR_allocate_aligned_string_msg(Buffer, Size, MR_ALLOC_ID);
|
|
Buffer[Size] = '\\0';
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
alloc_buffer(Size::in, Buffer::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Buffer = new char[Size];
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
alloc_buffer(Size::in, Buffer::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Buffer = new java.lang.StringBuilder(Size);
|
|
").
|
|
|
|
alloc_buffer(_Size, Buffer) :-
|
|
Buffer = string_buffer("").
|
|
|
|
:- pred buffer_to_string(string_buffer::di, string::uo) is det.
|
|
|
|
:- pragma foreign_proc("C",
|
|
buffer_to_string(Buffer::di, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe,
|
|
does_not_affect_liveness],
|
|
"
|
|
Str = Buffer;
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
buffer_to_string(Buffer::di, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Str = new string(Buffer);
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
buffer_to_string(Buffer::di, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Str = Buffer.toString();
|
|
").
|
|
|
|
buffer_to_string(Buffer, Str) :-
|
|
Buffer = string_buffer(Str).
|
|
|
|
:- pred copy_into_buffer(string_buffer::di, string_buffer::uo,
|
|
int::in, int::out, string::in, int::in, int::in) is det.
|
|
|
|
:- pragma foreign_proc("C",
|
|
copy_into_buffer(Dest0::di, Dest::uo, DestOffset0::in, DestOffset::out,
|
|
Src::in, SrcStart::in, SrcEnd::in),
|
|
[will_not_call_mercury, promise_pure, thread_safe,
|
|
does_not_affect_liveness],
|
|
"
|
|
size_t count;
|
|
|
|
MR_CHECK_EXPR_TYPE(Dest0, char *);
|
|
MR_CHECK_EXPR_TYPE(Dest, char *);
|
|
|
|
count = SrcEnd - SrcStart;
|
|
Dest = Dest0;
|
|
MR_memcpy(Dest + DestOffset0, Src + SrcStart, count);
|
|
DestOffset = DestOffset0 + count;
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
copy_into_buffer(Dest0::di, Dest::uo, DestOffset0::in, DestOffset::out,
|
|
Src::in, SrcStart::in, SrcEnd::in),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
int count = SrcEnd - SrcStart;
|
|
Dest = Dest0;
|
|
Src.CopyTo(SrcStart, Dest, DestOffset0, count);
|
|
DestOffset = DestOffset0 + count;
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
copy_into_buffer(Dest0::di, Dest::uo, DestOffset0::in, DestOffset::out,
|
|
Src::in, SrcStart::in, SrcEnd::in),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
// The Java implementation does not actually use the dest offsets.
|
|
Dest = Dest0;
|
|
Dest.append(Src, SrcStart, SrcEnd);
|
|
DestOffset = DestOffset0 + (SrcEnd - SrcStart);
|
|
").
|
|
|
|
copy_into_buffer(Dest0, Dest, DestOffset0, DestOffset, Src,
|
|
SrcStart, SrcEnd) :-
|
|
Dest0 = string_buffer(Buffer0),
|
|
Buffer = Buffer0 ++ string.unsafe_between(Src, SrcStart, SrcEnd),
|
|
DestOffset = DestOffset0 + (SrcEnd - SrcStart),
|
|
Dest = string_buffer(Buffer).
|
|
|
|
%---------------------%
|
|
|
|
append_string_pieces(Pieces, String) :-
|
|
DoCheck = yes,
|
|
sum_piece_lengths($pred, DoCheck, Pieces, 0, BufferLen),
|
|
do_append_string_pieces(Pieces, BufferLen, String).
|
|
|
|
unsafe_append_string_pieces(Pieces, String) :-
|
|
DoCheck = no,
|
|
sum_piece_lengths($pred, DoCheck, Pieces, 0, BufferLen),
|
|
do_append_string_pieces(Pieces, BufferLen, String).
|
|
|
|
:- pred sum_piece_lengths(string::in, bool::in, list(string_piece)::in,
|
|
int::in, int::out) is det.
|
|
|
|
sum_piece_lengths(PredName, DoCheck, Pieces, !Len) :-
|
|
(
|
|
Pieces = []
|
|
;
|
|
Pieces = [Piece | TailPieces],
|
|
(
|
|
Piece = string(Str),
|
|
PieceLen = string.count_code_units(Str)
|
|
;
|
|
Piece = substring(BaseStr, Start, End),
|
|
(
|
|
DoCheck = yes,
|
|
BaseLen = string.count_code_units(BaseStr),
|
|
( if
|
|
Start >= 0,
|
|
Start =< BaseLen,
|
|
End >= Start,
|
|
End =< BaseLen
|
|
then
|
|
true
|
|
else
|
|
unexpected(PredName, "substring index out of range")
|
|
)
|
|
;
|
|
DoCheck = no
|
|
),
|
|
PieceLen = End - Start
|
|
),
|
|
!:Len = !.Len + PieceLen,
|
|
sum_piece_lengths(PredName, DoCheck, TailPieces, !Len)
|
|
).
|
|
|
|
:- pred do_append_string_pieces(list(string_piece)::in, int::in, string::uo)
|
|
is det.
|
|
|
|
do_append_string_pieces(Pieces, BufferLen, String) :-
|
|
alloc_buffer(BufferLen, Buffer0),
|
|
list.foldl2(copy_piece_into_buffer, Pieces, 0, End, Buffer0, Buffer),
|
|
expect(unify(End, BufferLen), $pred, "End != BufferLen"),
|
|
buffer_to_string(Buffer, String).
|
|
|
|
:- pred copy_piece_into_buffer(string_piece::in, int::in, int::out,
|
|
string_buffer::di, string_buffer::uo) is det.
|
|
|
|
copy_piece_into_buffer(Piece, !DestOffset, !DestBuffer) :-
|
|
(
|
|
Piece = string(Src),
|
|
SrcStart = 0,
|
|
SrcEnd = string.count_code_units(Src)
|
|
;
|
|
Piece = substring(Src, SrcStart, SrcEnd)
|
|
),
|
|
copy_into_buffer(!DestBuffer, !DestOffset, Src, SrcStart, SrcEnd).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Splitting up strings.
|
|
%
|
|
|
|
:- pragma promise_equivalent_clauses(pred(first_char/3)).
|
|
|
|
first_char(Str::in, First::in, Rest::in) :-
|
|
first_char_rest_in(Str, First, Rest).
|
|
first_char(Str::in, First::uo, Rest::in) :-
|
|
first_char_rest_in(Str, First, Rest).
|
|
first_char(Str::in, First::in, Rest::uo) :-
|
|
first_char_rest_out(Str, First, Rest).
|
|
first_char(Str::in, First::uo, Rest::uo) :-
|
|
first_char_rest_out(Str, First, Rest).
|
|
first_char(Str::uo, First::in, Rest::in) :-
|
|
first_char_str_out(Str, First, Rest).
|
|
|
|
:- pred first_char_rest_in(string, char, string).
|
|
:- mode first_char_rest_in(in, in, in) is semidet.
|
|
:- mode first_char_rest_in(in, uo, in) is semidet.
|
|
|
|
first_char_rest_in(Str, First, Rest) :-
|
|
index_next_repl(Str, 0, NextIndex, First0, not_replaced),
|
|
not is_surrogate(First0),
|
|
unsafe_promise_unique(First0, First),
|
|
unsafe_compare_substrings((=), Str, NextIndex, Rest,
|
|
0, string.count_code_units(Rest)).
|
|
|
|
:- pred first_char_rest_out(string, char, string).
|
|
:- mode first_char_rest_out(in, in, uo) is semidet.
|
|
:- mode first_char_rest_out(in, uo, uo) is semidet.
|
|
|
|
first_char_rest_out(Str, First, Rest) :-
|
|
index_next_repl(Str, 0, NextIndex, First0, not_replaced),
|
|
not is_surrogate(First0),
|
|
unsafe_promise_unique(First0, First),
|
|
unsafe_between(Str, NextIndex, string.count_code_units(Str), Rest).
|
|
|
|
:- pred first_char_str_out(string, char, string).
|
|
:- mode first_char_str_out(uo, in, in) is det.
|
|
|
|
first_char_str_out(Str, First, Rest) :-
|
|
( if char.to_int(First, 0) then
|
|
unexpected($pred, "null character")
|
|
else if char.is_surrogate(First) then
|
|
unexpected($pred, "surrogate code point")
|
|
else
|
|
Str = char_to_string(First) ++ Rest
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
split(Str, Index, Left, Right) :-
|
|
( if Index =< 0 then
|
|
Left = "",
|
|
Right = Str
|
|
else
|
|
Len = string.count_code_units(Str),
|
|
( if Index >= Len then
|
|
Left = Str,
|
|
Right = ""
|
|
else
|
|
unsafe_between(Str, 0, Index, Left),
|
|
unsafe_between(Str, Index, Len, Right)
|
|
)
|
|
).
|
|
|
|
split_by_code_point(Str, Count, Left, Right) :-
|
|
( if code_point_offset(Str, Count, Offset) then
|
|
split(Str, Offset, Left, Right)
|
|
else if Count =< 0 then
|
|
Left = "",
|
|
Right = Str
|
|
else
|
|
Left = Str,
|
|
Right = ""
|
|
).
|
|
|
|
split_by_codepoint(Str, Count, Left, Right) :-
|
|
split_by_code_point(Str, Count, Left, Right).
|
|
|
|
%---------------------%
|
|
|
|
left(S1, N) = S2 :-
|
|
left(S1, N, S2).
|
|
|
|
left(String, Count, LeftString) :-
|
|
between(String, 0, Count, LeftString).
|
|
|
|
left_by_code_point(String, Count) = LeftString :-
|
|
left_by_code_point(String, Count, LeftString).
|
|
|
|
left_by_code_point(String, Count, LeftString) :-
|
|
split_by_code_point(String, Count, LeftString, _RightString).
|
|
|
|
left_by_codepoint(String, Count) = LeftString :-
|
|
left_by_code_point(String, Count, LeftString).
|
|
|
|
left_by_codepoint(String, Count, LeftString) :-
|
|
split_by_code_point(String, Count, LeftString, _RightString).
|
|
|
|
right(S1, N) = S2 :-
|
|
right(S1, N, S2).
|
|
|
|
right(String, RightCount, RightString) :-
|
|
string.count_code_units(String, Length),
|
|
Start = Length - RightCount,
|
|
between(String, Start, Length, RightString).
|
|
|
|
right_by_code_point(String, RightCount) = RightString :-
|
|
right_by_code_point(String, RightCount, RightString).
|
|
|
|
right_by_code_point(String, RightCount, RightString) :-
|
|
count_code_points(String, TotalCount),
|
|
LeftCount = TotalCount - RightCount,
|
|
split_by_code_point(String, LeftCount, _LeftString, RightString).
|
|
|
|
right_by_codepoint(String, RightCount) = RightString :-
|
|
right_by_code_point(String, RightCount, RightString).
|
|
|
|
right_by_codepoint(String, RightCount, RightString) :-
|
|
right_by_code_point(String, RightCount, RightString).
|
|
|
|
%---------------------%
|
|
|
|
between(Str, Start, End) = SubString :-
|
|
between(Str, Start, End, SubString).
|
|
|
|
between(Str, Start, End, SubStr) :-
|
|
Len = string.count_code_units(Str),
|
|
( if Start =< 0 then
|
|
ClampStart = 0
|
|
else if Start >= Len then
|
|
ClampStart = Len
|
|
else
|
|
ClampStart = Start
|
|
),
|
|
( if End =< ClampStart then
|
|
ClampEnd = ClampStart
|
|
else if End >= Len then
|
|
ClampEnd = Len
|
|
else
|
|
ClampEnd = End
|
|
),
|
|
unsafe_between(Str, ClampStart, ClampEnd, SubStr).
|
|
|
|
%---------------------%
|
|
|
|
between_code_points(Str, Start, End) = SubString :-
|
|
between_code_points(Str, Start, End, SubString).
|
|
|
|
between_code_points(Str, Start, End, SubString) :-
|
|
( if Start < 0 then
|
|
StartOffset = 0
|
|
else if code_point_offset(Str, Start, StartOffset0) then
|
|
StartOffset = StartOffset0
|
|
else
|
|
StartOffset = string.count_code_units(Str)
|
|
),
|
|
( if End < 0 then
|
|
EndOffset = 0
|
|
else if code_point_offset(Str, End, EndOffset0) then
|
|
EndOffset = EndOffset0
|
|
else
|
|
EndOffset = string.count_code_units(Str)
|
|
),
|
|
% between/4 will enforce StartOffset =< EndOffset.
|
|
between(Str, StartOffset, EndOffset, SubString).
|
|
|
|
between_codepoints(Str, Start, End) = SubString :-
|
|
between_code_points(Str, Start, End, SubString).
|
|
|
|
between_codepoints(Str, Start, End, SubString) :-
|
|
between_code_points(Str, Start, End, SubString).
|
|
|
|
%---------------------%
|
|
|
|
unsafe_between(Str, Start, End) = SubString :-
|
|
unsafe_between(Str, Start, End, SubString).
|
|
|
|
:- pragma foreign_proc("C",
|
|
unsafe_between(Str::in, Start::in, End::in, SubString::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"{
|
|
MR_Integer Count;
|
|
|
|
Count = End - Start;
|
|
MR_allocate_aligned_string_msg(SubString, Count, MR_ALLOC_ID);
|
|
MR_memcpy(SubString, Str + Start, Count);
|
|
SubString[Count] = '\\0';
|
|
}").
|
|
:- pragma foreign_proc("C#",
|
|
unsafe_between(Str::in, Start::in, End::in, SubString::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"{
|
|
SubString = Str.Substring(Start, End - Start);
|
|
}").
|
|
:- pragma foreign_proc("Java",
|
|
unsafe_between(Str::in, Start::in, End::in, SubString::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
SubString = Str.substring(Start, End);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
words_separator(SepP, String) = Words :-
|
|
skip_to_next_word_start(SepP, String, 0, WordStart),
|
|
words_loop(SepP, String, WordStart, Words).
|
|
|
|
words(String) = words_separator(char.is_whitespace, String).
|
|
|
|
:- pred words_loop(pred(char)::in(pred(in) is semidet), string::in, int::in,
|
|
list(string)::out) is det.
|
|
|
|
words_loop(SepP, String, WordStartPos, Words) :-
|
|
skip_to_word_end(SepP, String, WordStartPos, PastWordEndPos),
|
|
( if PastWordEndPos = WordStartPos then
|
|
Words = []
|
|
else
|
|
unsafe_between(String, WordStartPos, PastWordEndPos, HeadWord),
|
|
skip_to_next_word_start(SepP, String, PastWordEndPos,
|
|
NextWordStartPos),
|
|
( if PastWordEndPos = NextWordStartPos then
|
|
Words = [HeadWord]
|
|
else
|
|
words_loop(SepP, String, NextWordStartPos, TailWords),
|
|
Words = [HeadWord | TailWords]
|
|
)
|
|
).
|
|
|
|
% Return the smallest NextWordStartPos >= CurPos such that
|
|
% `not SepP(String[NextWordStartPos])'.
|
|
%
|
|
:- pred skip_to_next_word_start(pred(char)::in(pred(in) is semidet),
|
|
string::in, int::in, int::out) is det.
|
|
|
|
skip_to_next_word_start(SepP, String, CurPos, NextWordStartPos) :-
|
|
( if
|
|
unsafe_index_next_repl(String, CurPos, NextPos, Char, not_replaced),
|
|
SepP(Char)
|
|
then
|
|
skip_to_next_word_start(SepP, String, NextPos, NextWordStartPos)
|
|
else
|
|
NextWordStartPos = CurPos
|
|
).
|
|
|
|
% Return the smallest NextWordStartPos >= CurPos such that
|
|
% SepP(String[NextWordStartPos]).
|
|
%
|
|
:- pred skip_to_word_end(pred(char)::in(pred(in) is semidet),
|
|
string::in, int::in, int::out) is det.
|
|
|
|
skip_to_word_end(SepP, String, CurPos, PastWordEndPos) :-
|
|
( if
|
|
unsafe_index_next_repl(String, CurPos, NextPos, Char, MaybeReplaced)
|
|
then
|
|
( if
|
|
MaybeReplaced = not_replaced,
|
|
SepP(Char)
|
|
then
|
|
PastWordEndPos = CurPos
|
|
else
|
|
skip_to_word_end(SepP, String, NextPos, PastWordEndPos)
|
|
)
|
|
else
|
|
PastWordEndPos = CurPos
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
split_at_separator(DelimP, Str) = Segments :-
|
|
Len = string.count_code_units(Str),
|
|
split_at_separator_loop(DelimP, Str, Len, Len, [], Segments).
|
|
|
|
:- pred split_at_separator_loop(pred(char)::in(pred(in) is semidet),
|
|
string::in, int::in, int::in, list(string)::in, list(string)::out) is det.
|
|
|
|
split_at_separator_loop(DelimP, Str, CurPos, PastSegEnd, !Segments) :-
|
|
% We walk Str backwards, extending the accumulated list of segments
|
|
% as we find code points matching DelimP.
|
|
%
|
|
% Invariant: 0 =< CurPos =< length(Str).
|
|
% PastSegEnd is one past the last index of the current segment.
|
|
%
|
|
( if unsafe_prev_index_repl(Str, CurPos, PrevPos, Char, MaybeReplaced) then
|
|
( if
|
|
MaybeReplaced = not_replaced,
|
|
DelimP(Char)
|
|
then
|
|
% Chop here.
|
|
SegStart = CurPos,
|
|
Segment = unsafe_between(Str, SegStart, PastSegEnd),
|
|
!:Segments = [Segment | !.Segments],
|
|
split_at_separator_loop(DelimP, Str, PrevPos, PrevPos, !Segments)
|
|
else
|
|
% Extend current segment.
|
|
split_at_separator_loop(DelimP, Str, PrevPos, PastSegEnd,
|
|
!Segments)
|
|
)
|
|
else
|
|
% We have reached the beginning of the string.
|
|
Segment = unsafe_between(Str, 0, PastSegEnd),
|
|
!:Segments = [Segment | !.Segments]
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
split_at_char(C, String) =
|
|
split_at_separator(unify(C), String).
|
|
|
|
%---------------------%
|
|
|
|
split_at_string(Separator, Str) = Segments :-
|
|
split_at_string_loop(Separator, string.count_code_units(Separator), Str, 0,
|
|
Segments).
|
|
|
|
:- pred split_at_string_loop(string::in, int::in, string::in, int::in,
|
|
list(string)::out) is det.
|
|
|
|
split_at_string_loop(Separator, SeparatorLen, Str, CurPos, Segments) :-
|
|
( if unsafe_sub_string_search_start(Str, Separator, CurPos, SepPos) then
|
|
HeadSegment = unsafe_between(Str, CurPos, SepPos),
|
|
% This call is tail recursive when targeting C because we compile
|
|
% all library modules with --optimize-constructor-last-call.
|
|
%
|
|
% When targeting languages other than C, that option has no effect,
|
|
% but that is ok, because in the vast majority of cases, Str is
|
|
% not very long.
|
|
split_at_string_loop(Separator, SeparatorLen,
|
|
Str, SepPos + SeparatorLen, TailSegments),
|
|
Segments = [HeadSegment | TailSegments]
|
|
else
|
|
unsafe_between(Str, CurPos, string.count_code_units(Str), LastSegment),
|
|
Segments = [LastSegment]
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
split_into_lines(Str) = Lines :-
|
|
split_into_lines_loop(Str, 0, [], RevLines),
|
|
list.reverse(RevLines, Lines).
|
|
|
|
:- pred split_into_lines_loop(string::in, int::in,
|
|
list(string)::in, list(string)::out) is det.
|
|
|
|
split_into_lines_loop(Str, CurPos, !RevLines) :-
|
|
( if unsafe_sub_string_search_start(Str, "\n", CurPos, SepPos) then
|
|
Line = unsafe_between(Str, CurPos, SepPos),
|
|
!:RevLines = [Line | !.RevLines],
|
|
% Unlike split_at_string, split_into_lines can absolutely be expected
|
|
% to be invoked on huge strings fairly frequently, so we want this
|
|
% to be tail recursive even if we are not targeting C. This is why
|
|
% this is a tail call. The price of making this a tail call is
|
|
% the call to list.reverse in our parent.
|
|
split_into_lines_loop(Str, SepPos + 1, !RevLines)
|
|
else
|
|
StrLen = string.count_code_units(Str),
|
|
( if CurPos = StrLen then
|
|
true
|
|
else
|
|
unsafe_between(Str, CurPos, StrLen, LastLine),
|
|
!:RevLines = [LastLine | !.RevLines]
|
|
)
|
|
).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Dealing with prefixes and suffixes.
|
|
%
|
|
|
|
prefix(String, Prefix) :-
|
|
compare_substrings((=), String, 0, Prefix, 0,
|
|
string.count_code_units(Prefix)).
|
|
|
|
suffix(String, Suffix) :-
|
|
StringLength = string.count_code_units(String),
|
|
SuffixLength = string.count_code_units(Suffix),
|
|
StringStart = StringLength - SuffixLength,
|
|
compare_substrings((=), String, StringStart, Suffix, 0, SuffixLength).
|
|
|
|
%---------------------%
|
|
|
|
remove_prefix(Prefix, String, Suffix) :-
|
|
append(Prefix, Suffix, String).
|
|
|
|
det_remove_prefix(Prefix, String, Suffix) :-
|
|
( if remove_prefix(Prefix, String, SuffixPrime) then
|
|
Suffix = SuffixPrime
|
|
else
|
|
unexpected($pred, "string does not have the given prefix")
|
|
).
|
|
|
|
remove_prefix_if_present(Prefix, String) = Out :-
|
|
( if remove_prefix(Prefix, String, Suffix) then
|
|
Out = Suffix
|
|
else
|
|
Out = String
|
|
).
|
|
|
|
add_prefix(Prefix, String) = Prefix ++ String.
|
|
|
|
remove_suffix(String, Suffix, Prefix) :-
|
|
append(Prefix, Suffix, String).
|
|
|
|
det_remove_suffix(String, Suffix) = Prefix :-
|
|
( if remove_suffix(String, Suffix, PrefixPrime) then
|
|
Prefix = PrefixPrime
|
|
else
|
|
unexpected($pred, "string does not have the given suffix")
|
|
).
|
|
|
|
remove_suffix_if_present(Suffix, String) = Out :-
|
|
( if remove_suffix(String, Suffix, Prefix) then
|
|
Out = Prefix
|
|
else
|
|
Out = String
|
|
).
|
|
|
|
add_suffix(Suffix, Str) = Str ++ Suffix.
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Transformations of strings.
|
|
%
|
|
|
|
capitalize_first(S1) = S2 :-
|
|
capitalize_first(S1, S2).
|
|
|
|
capitalize_first(S0, S) :-
|
|
( if
|
|
unsafe_index_next(S0, 0, _NextIndex, C),
|
|
char.to_upper(C, UpperC),
|
|
C \= UpperC
|
|
then
|
|
unsafe_set_char(UpperC, 0, S0, S)
|
|
else
|
|
S = S0
|
|
).
|
|
|
|
uncapitalize_first(S1) = S2 :-
|
|
uncapitalize_first(S1, S2).
|
|
|
|
uncapitalize_first(S0, S) :-
|
|
( if
|
|
unsafe_index_next(S0, 0, _NextIndex, C),
|
|
char.to_lower(C, LowerC),
|
|
C \= LowerC
|
|
then
|
|
unsafe_set_char(LowerC, 0, S0, S)
|
|
else
|
|
S = S0
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
to_upper(S1) = S2 :-
|
|
to_upper(S1, S2).
|
|
|
|
:- pragma promise_equivalent_clauses(pred(to_upper/2)).
|
|
|
|
to_upper(StrIn::in, StrOut::uo) :-
|
|
% Use to_code_unit_list instead of to_char_list to preserve ill-formed
|
|
% sequences.
|
|
to_code_unit_list(StrIn, CodeList0),
|
|
list.map(to_upper_code_unit, CodeList0, CodeList),
|
|
( if from_code_unit_list_allow_ill_formed(CodeList, StrPrime) then
|
|
StrOut = StrPrime
|
|
else
|
|
unexpected($pred, "string.from_code_unit_list_allow_ill_formed failed")
|
|
).
|
|
|
|
to_upper(X::in, Y::in) :-
|
|
string.count_code_units(X, LenX),
|
|
string.count_code_units(Y, LenY),
|
|
( if LenX = LenY then
|
|
check_upper_loop(X, Y, 0, LenX)
|
|
else
|
|
fail
|
|
).
|
|
|
|
:- pragma foreign_proc("C",
|
|
to_upper(StrIn::in, StrOut::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
MR_Integer i;
|
|
|
|
MR_make_aligned_string_copy_msg(StrOut, StrIn, MR_ALLOC_ID);
|
|
|
|
for (i = 0; StrOut[i] != '\\0'; i++) {
|
|
if (StrOut[i] >= 'a' && StrOut[i] <= 'z') {
|
|
StrOut[i] = StrOut[i] - 'a' + 'A';
|
|
}
|
|
}
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
to_upper(StrIn::in, StrOut::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
char[] cs = StrIn.ToCharArray();
|
|
for (int i = 0; i < cs.Length; i++) {
|
|
if (cs[i] >= 'a' && cs[i] <= 'z') {
|
|
cs[i] = (char)(cs[i] - 'a' + 'A');
|
|
}
|
|
}
|
|
StrOut = new System.String(cs);
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
to_upper(StrIn::in, StrOut::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
char[] cs = StrIn.toCharArray();
|
|
for (int i = 0; i < cs.length; i++) {
|
|
if (cs[i] >= 'a' && cs[i] <= 'z') {
|
|
cs[i] = (char)(cs[i] - 'a' + 'A');
|
|
}
|
|
}
|
|
StrOut = new String(cs);
|
|
").
|
|
|
|
:- pred check_upper_loop(string::in, string::in, int::in, int::in) is semidet.
|
|
|
|
check_upper_loop(X, Y, Index, End) :-
|
|
( if Index = End then
|
|
true
|
|
else
|
|
unsafe_index_code_unit(X, Index, CodeX),
|
|
unsafe_index_code_unit(Y, Index, CodeY),
|
|
to_upper_code_unit(CodeX, CodeY),
|
|
check_upper_loop(X, Y, Index + 1, End)
|
|
).
|
|
|
|
:- pred to_upper_code_unit(int::in, int::out) is det.
|
|
|
|
to_upper_code_unit(Code0, Code) :-
|
|
( if
|
|
Code0 >= to_int('a'),
|
|
Code0 =< to_int('z')
|
|
then
|
|
Code = Code0 - to_int('a') + to_int('A')
|
|
else
|
|
Code = Code0
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
to_lower(S1) = S2 :-
|
|
to_lower(S1, S2).
|
|
|
|
:- pragma promise_equivalent_clauses(pred(to_lower/2)).
|
|
|
|
to_lower(StrIn::in, StrOut::uo) :-
|
|
% Use to_code_unit_list instead of to_char_list to preserve ill-formed
|
|
% sequences.
|
|
to_code_unit_list(StrIn, CodeList0),
|
|
list.map(to_lower_code_unit, CodeList0, CodeList),
|
|
( if from_code_unit_list_allow_ill_formed(CodeList, StrPrime) then
|
|
StrOut = StrPrime
|
|
else
|
|
unexpected($pred, "string.from_code_unit_list_allow_ill_formed failed")
|
|
).
|
|
|
|
to_lower(X::in, Y::in) :-
|
|
string.count_code_units(X, LenX),
|
|
string.count_code_units(Y, LenY),
|
|
( if LenX = LenY then
|
|
check_lower_loop(X, Y, 0, LenX)
|
|
else
|
|
fail
|
|
).
|
|
|
|
:- pragma foreign_proc("C",
|
|
to_lower(StrIn::in, StrOut::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
MR_Integer i;
|
|
|
|
MR_make_aligned_string_copy_msg(StrOut, StrIn, MR_ALLOC_ID);
|
|
|
|
for (i = 0; StrOut[i] != '\\0'; i++) {
|
|
if (StrOut[i] >= 'A' && StrOut[i] <= 'Z') {
|
|
StrOut[i] = StrOut[i] - 'A' + 'a';
|
|
}
|
|
}
|
|
").
|
|
:- pragma foreign_proc("C#",
|
|
to_lower(StrIn::in, StrOut::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
char[] cs = StrIn.ToCharArray();
|
|
for (int i = 0; i < cs.Length; i++) {
|
|
if (cs[i] >= 'A' && cs[i] <= 'Z') {
|
|
cs[i] = (char)(cs[i] - 'A' + 'a');
|
|
}
|
|
}
|
|
StrOut = new System.String(cs);
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
to_lower(StrIn::in, StrOut::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
char[] cs = StrIn.toCharArray();
|
|
for (int i = 0; i < cs.length; i++) {
|
|
if (cs[i] >= 'A' && cs[i] <= 'Z') {
|
|
cs[i] = (char)(cs[i] - 'A' + 'a');
|
|
}
|
|
}
|
|
StrOut = new String(cs);
|
|
").
|
|
|
|
:- pred check_lower_loop(string::in, string::in, int::in, int::in) is semidet.
|
|
|
|
check_lower_loop(X, Y, Index, End) :-
|
|
( if Index = End then
|
|
true
|
|
else
|
|
unsafe_index_code_unit(X, Index, CodeX),
|
|
unsafe_index_code_unit(Y, Index, CodeY),
|
|
to_lower_code_unit(CodeX, CodeY),
|
|
check_lower_loop(X, Y, Index + 1, End)
|
|
).
|
|
|
|
:- pred to_lower_code_unit(int::in, int::out) is det.
|
|
|
|
to_lower_code_unit(Code0, Code) :-
|
|
( if
|
|
Code0 >= to_int('A'),
|
|
Code0 =< to_int('Z')
|
|
then
|
|
Code = Code0 - to_int('A') + to_int('a')
|
|
else
|
|
Code = Code0
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
pad_left(S1, C, N) = S2 :-
|
|
pad_left(S1, C, N, S2).
|
|
|
|
pad_left(String0, PadChar, Width, String) :-
|
|
count_code_points(String0, Length),
|
|
( if Length < Width then
|
|
Count = Width - Length,
|
|
duplicate_char(PadChar, Count, PadString),
|
|
append(PadString, String0, String)
|
|
else
|
|
String = String0
|
|
).
|
|
|
|
pad_right(S1, C, N) = S2 :-
|
|
pad_right(S1, C, N, S2).
|
|
|
|
pad_right(String0, PadChar, Width, String) :-
|
|
count_code_points(String0, Length),
|
|
( if Length < Width then
|
|
Count = Width - Length,
|
|
duplicate_char(PadChar, Count, PadString),
|
|
append(String0, PadString, String)
|
|
else
|
|
String = String0
|
|
).
|
|
|
|
chomp(S) = Chomp :-
|
|
( if prev_index(S, string.count_code_units(S), Index, '\n') then
|
|
Chomp = unsafe_between(S, 0, Index)
|
|
else
|
|
Chomp = S
|
|
).
|
|
|
|
strip(S0) = S :-
|
|
L = prefix_length(char.is_whitespace, S0),
|
|
R = suffix_length(char.is_whitespace, S0),
|
|
Start = L,
|
|
End = max(L, string.count_code_units(S0) - R),
|
|
S = unsafe_between(S0, Start, End).
|
|
|
|
lstrip(S) = lstrip_pred(char.is_whitespace, S).
|
|
|
|
rstrip(S) = rstrip_pred(char.is_whitespace, S).
|
|
|
|
lstrip_pred(P, S0) = S :-
|
|
L = prefix_length(P, S0),
|
|
S = unsafe_between(S0, L, string.count_code_units(S0)).
|
|
|
|
rstrip_pred(P, S0) = S :-
|
|
R = suffix_length(P, S0),
|
|
S = unsafe_between(S0, 0, string.count_code_units(S0) - R).
|
|
|
|
%---------------------%
|
|
|
|
replace(Str, Pat, Subst, Result) :-
|
|
sub_string_search(Str, Pat, PatStart),
|
|
Pieces = [
|
|
substring(Str, 0, PatStart),
|
|
substring(Subst, 0, string.count_code_units(Subst)),
|
|
substring(Str, PatStart + string.count_code_units(Pat),
|
|
string.count_code_units(Str))
|
|
],
|
|
unsafe_append_string_pieces(Pieces, Result).
|
|
|
|
replace_all(Str, Pat, Subst) = Result :-
|
|
replace_all(Str, Pat, Subst, Result).
|
|
|
|
replace_all(Str, Pat, Subst, Result) :-
|
|
( if Pat = "" then
|
|
replace_all_empty_pat(Str, Subst, Result)
|
|
else
|
|
% Using substring instead of string avoids two calls to string.length
|
|
% every time that SubstPiece appears in Pieces.
|
|
SubstPiece = substring(Subst, 0, string.count_code_units(Subst)),
|
|
replace_all_loop(Str, Pat, string.count_code_units(Pat), SubstPiece, 0,
|
|
[], RevPieces),
|
|
list.reverse(RevPieces, Pieces),
|
|
unsafe_append_string_pieces(Pieces, Result)
|
|
).
|
|
|
|
replace_all_sv(Pat, Subst, Str, Result) :-
|
|
replace_all(Str, Pat, Subst, Result).
|
|
|
|
:- pred replace_all_empty_pat(string::in, string::in, string::uo) is det.
|
|
|
|
replace_all_empty_pat(Str, Subst, Result) :-
|
|
% This implementation is not the most efficient, but it is not expected
|
|
% to be used much in practice.
|
|
to_code_unit_list(Subst, SubstCodes),
|
|
Codes0 = SubstCodes,
|
|
replace_all_empty_pat_loop(Str, SubstCodes, string.count_code_units(Str),
|
|
Codes0, Codes),
|
|
( if string.from_code_unit_list_allow_ill_formed(Codes, ResultPrime) then
|
|
Result = ResultPrime
|
|
else
|
|
unexpected($pred, "string.from_code_unit_list_allow_ill_formed failed")
|
|
).
|
|
|
|
:- pred replace_all_empty_pat_loop(string::in, list(int)::in, int::in,
|
|
list(int)::in, list(int)::out) is det.
|
|
|
|
replace_all_empty_pat_loop(Str, Subst, Index, Codes0, Codes) :-
|
|
( if unsafe_prev_index(Str, Index, PrevIndex, Char) then
|
|
char.to_int(Char, CharInt),
|
|
( if CharInt =< 0x7f then
|
|
% Fast path for single code unit code points.
|
|
Codes1 = [CharInt | Codes0]
|
|
else
|
|
prepend_code_units(Str, PrevIndex, Index - 1, Codes0, Codes1)
|
|
),
|
|
Codes2 = Subst ++ Codes1,
|
|
replace_all_empty_pat_loop(Str, Subst, PrevIndex, Codes2, Codes)
|
|
else
|
|
Codes = Codes0
|
|
).
|
|
|
|
:- pred prepend_code_units(string::in, int::in, int::in,
|
|
list(int)::in, list(int)::out) is det.
|
|
|
|
prepend_code_units(Str, FirstIndex, Index, Codes0, Codes) :-
|
|
unsafe_index_code_unit(Str, Index, Code),
|
|
Codes1 = [Code | Codes0],
|
|
( if Index = FirstIndex then
|
|
Codes = Codes1
|
|
else
|
|
prepend_code_units(Str, FirstIndex, Index - 1, Codes1, Codes)
|
|
).
|
|
|
|
:- pred replace_all_loop(string::in, string::in, int::in, string_piece::in,
|
|
int::in, list(string_piece)::in, list(string_piece)::out) is det.
|
|
|
|
replace_all_loop(Str, Pat, PatLength, SubstPiece, BeginAt,
|
|
RevPieces0, RevPieces) :-
|
|
( if unsafe_sub_string_search_start(Str, Pat, BeginAt, PatStart) then
|
|
InitialPiece = substring(Str, BeginAt, PatStart),
|
|
RevPieces1 = [SubstPiece, InitialPiece | RevPieces0],
|
|
replace_all_loop(Str, Pat, PatLength, SubstPiece, PatStart + PatLength,
|
|
RevPieces1, RevPieces)
|
|
else
|
|
TailPiece = substring(Str, BeginAt, string.count_code_units(Str)),
|
|
RevPieces = [TailPiece | RevPieces0]
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
word_wrap(Str, N) = word_wrap_separator(Str, N, "").
|
|
|
|
word_wrap_separator(Str, N, BrokenWordSep0) = Wrapped :-
|
|
Words = words_separator(char.is_whitespace, Str),
|
|
BrokenWordSepLen0 = count_code_points(BrokenWordSep0),
|
|
( if BrokenWordSepLen0 >= N then
|
|
BrokenWordSep = "",
|
|
BrokenWordSepLen = 0
|
|
else
|
|
BrokenWordSep = BrokenWordSep0,
|
|
BrokenWordSepLen = BrokenWordSepLen0
|
|
),
|
|
CurCol = 1,
|
|
MaxCol = N,
|
|
word_wrap_loop(Words, BrokenWordSep, BrokenWordSepLen, CurCol, MaxCol,
|
|
cord.init, WordsSpacesNlsCord),
|
|
WordsSpacesNls = cord.list(WordsSpacesNlsCord),
|
|
Wrapped = append_list(WordsSpacesNls).
|
|
|
|
% word_wrap_loop(Words, BrokenWordSep, BrokenWordSepLen, CurCol, MaxCol,
|
|
% !WordsSpacesNlsCord):
|
|
%
|
|
% This predicate loops over a list of words to wrap and returns
|
|
% a list of strings, with each containing a word, a spaces or a newline.
|
|
% When this list of strings (which we build as an accumulator)
|
|
% is reversed and appended together, the result should be
|
|
% the linewrapped version of the original word stream.
|
|
%
|
|
% Words is the list of words to process. BrokenWordSep is the string to use
|
|
% as a separator if a word has to split between two lines, because it is
|
|
% too long to fit on one line. BrokenWordSepLen is the length of
|
|
% BrokenWordSep.
|
|
%
|
|
% CurCol is the column where the next character should be written
|
|
% if there is space for a whole word, and MaxCol is the number of
|
|
% columns in a line.
|
|
%
|
|
:- pred word_wrap_loop(list(string)::in, string::in, int::in,
|
|
int::in, int::in, cord(string)::in, cord(string)::out) is det.
|
|
|
|
word_wrap_loop([], _, _, _, _, !WordsSpacesNlsCord).
|
|
word_wrap_loop([Word | Words], BrokenWordSep, BrokenWordSepLen, CurCol, MaxCol,
|
|
!WordsSpacesNlsCord) :-
|
|
WordLen = count_code_points(Word),
|
|
( if
|
|
% We are on the first column and the length of the word
|
|
% is less than the line length.
|
|
CurCol = 1,
|
|
WordLen < MaxCol
|
|
then
|
|
NewWords = Words,
|
|
cord.snoc(Word, !WordsSpacesNlsCord),
|
|
NewCol = CurCol + WordLen
|
|
else if
|
|
% The word takes up the whole line.
|
|
CurCol = 1,
|
|
WordLen = MaxCol
|
|
then
|
|
NewWords = Words,
|
|
cord.snoc(Word, !WordsSpacesNlsCord),
|
|
% We only add a newline if there are more words to follow.
|
|
(
|
|
NewWords = []
|
|
;
|
|
NewWords = [_ | _],
|
|
cord.snoc("\n", !WordsSpacesNlsCord)
|
|
),
|
|
NewCol = 1
|
|
else if
|
|
% If we add a space and the current word to the line,
|
|
% we will still be within the line length limit.
|
|
CurCol + WordLen < MaxCol
|
|
then
|
|
NewWords = Words,
|
|
cord.snoc(" ", !WordsSpacesNlsCord),
|
|
cord.snoc(Word, !WordsSpacesNlsCord),
|
|
NewCol = CurCol + WordLen + 1
|
|
else if
|
|
% Adding the word and a space takes us to the end of the line exactly.
|
|
CurCol + WordLen = MaxCol
|
|
then
|
|
NewWords = Words,
|
|
cord.snoc(" ", !WordsSpacesNlsCord),
|
|
cord.snoc(Word, !WordsSpacesNlsCord),
|
|
% We only add a newline if there are more words to follow.
|
|
(
|
|
NewWords = []
|
|
;
|
|
NewWords = [_ | _],
|
|
cord.snoc("\n", !WordsSpacesNlsCord)
|
|
),
|
|
NewCol = 1
|
|
else
|
|
% Adding the word would take us over the line limit.
|
|
( if CurCol = 1 then
|
|
% Break up words that are too big to fit on a line.
|
|
% Here, we break off a piece that *just* fits on this line.
|
|
% We let the recursive call check whether the remainder
|
|
% is also too big to fit in a line.
|
|
%
|
|
% Note that we cannot handle the last piece resulting from
|
|
% breaking up Word here without duplicating the code above
|
|
% handling e.g. the case where the last word *just* fits on a line.
|
|
%
|
|
% Without duplicating the code above, we also cannot judge
|
|
% whether the last piece *should* be split off from the previous
|
|
% piece, because this depends on the relative lengths of the
|
|
% last piece and BrokenWordSep. (See the contents of tests/general/
|
|
% string_test.exp as of 2024 mar 23.)
|
|
%
|
|
% And in any case, leaving RightWordPiece to be handled by the
|
|
% recursive call is not a performance problem. Words that do not
|
|
% fit on one line are rare, but words that do not fit on *two*
|
|
% lines are practically nonexistent.
|
|
split_by_code_point(Word, MaxCol - BrokenWordSepLen,
|
|
LeftWordPiece, RightWordPiece),
|
|
NewWords = [RightWordPiece | Words],
|
|
cord.snoc(LeftWordPiece, !WordsSpacesNlsCord),
|
|
cord.snoc(BrokenWordSep, !WordsSpacesNlsCord),
|
|
cord.snoc("\n", !WordsSpacesNlsCord),
|
|
NewCol = 1
|
|
else
|
|
NewWords = [Word | Words],
|
|
cord.snoc("\n", !WordsSpacesNlsCord),
|
|
NewCol = 1
|
|
)
|
|
),
|
|
word_wrap_loop(NewWords, BrokenWordSep, BrokenWordSepLen, NewCol, MaxCol,
|
|
!WordsSpacesNlsCord).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Folds over the characters in strings.
|
|
%
|
|
|
|
foldl(Func, S, A) = B :-
|
|
Pred = ( pred(X::in, Y::in, Z::out) is det :- Z = Func(X, Y) ),
|
|
foldl(Pred, S, A, B).
|
|
|
|
foldl(Pred, String, !Acc) :-
|
|
string.count_code_units(String, Length),
|
|
foldl_between(Pred, String, 0, Length, !Acc).
|
|
|
|
foldl2(Pred, String, !Acc1, !Acc2) :-
|
|
string.count_code_units(String, Length),
|
|
foldl2_between(Pred, String, 0, Length, !Acc1, !Acc2).
|
|
|
|
foldl_between(Func, S, Start, End, A) = B :-
|
|
P = ( pred(X::in, Y::in, Z::out) is det :- Z = Func(X, Y) ),
|
|
foldl_between(P, S, Start, End, A, B).
|
|
|
|
foldl_between(Pred, String, Start0, End0, !Acc) :-
|
|
Start = int.max(0, Start0),
|
|
End = int.min(End0, string.count_code_units(String)),
|
|
foldl_between_loop(Pred, String, Start, End, !Acc).
|
|
|
|
foldl2_between(Pred, String, Start0, End0, !Acc1, !Acc2) :-
|
|
Start = max(0, Start0),
|
|
End = min(End0, string.count_code_units(String)),
|
|
foldl2_between_loop(Pred, String, Start, End, !Acc1, !Acc2).
|
|
|
|
:- pred foldl_between_loop(pred(char, A, A), string, int, int, A, A).
|
|
:- mode foldl_between_loop(in(pred(in, di, uo) is det), in, in, in,
|
|
di, uo) is det.
|
|
:- mode foldl_between_loop(in(pred(in, in, out) is det), in, in, in,
|
|
in, out) is det.
|
|
:- mode foldl_between_loop(in(pred(in, in, out) is semidet), in, in, in,
|
|
in, out) is semidet.
|
|
:- mode foldl_between_loop(in(pred(in, in, out) is nondet), in, in, in,
|
|
in, out) is nondet.
|
|
:- mode foldl_between_loop(in(pred(in, in, out) is multi), in, in, in,
|
|
in, out) is multi.
|
|
|
|
foldl_between_loop(Pred, String, I, End, !Acc) :-
|
|
( if
|
|
I < End,
|
|
unsafe_index_next(String, I, J, Char),
|
|
J =< End
|
|
then
|
|
Pred(Char, !Acc),
|
|
foldl_between_loop(Pred, String, J, End, !Acc)
|
|
else
|
|
true
|
|
).
|
|
|
|
:- pred foldl2_between_loop(pred(char, A, A, B, B), string, int, int,
|
|
A, A, B, B).
|
|
:- mode foldl2_between_loop(in(pred(in, di, uo, di, uo) is det),
|
|
in, in, in, di, uo, di, uo) is det.
|
|
:- mode foldl2_between_loop(in(pred(in, in, out, di, uo) is det),
|
|
in, in, in, in, out, di, uo) is det.
|
|
:- mode foldl2_between_loop(in(pred(in, in, out, in, out) is det),
|
|
in, in, in, in, out, in, out) is det.
|
|
:- mode foldl2_between_loop(in(pred(in, in, out, in, out) is semidet),
|
|
in, in, in, in, out, in, out) is semidet.
|
|
:- mode foldl2_between_loop(in(pred(in, in, out, in, out) is nondet),
|
|
in, in, in, in, out, in, out) is nondet.
|
|
:- mode foldl2_between_loop(in(pred(in, in, out, in, out) is multi),
|
|
in, in, in, in, out, in, out) is multi.
|
|
|
|
foldl2_between_loop(Pred, String, I, End, !Acc1, !Acc2) :-
|
|
( if
|
|
I < End,
|
|
unsafe_index_next(String, I, J, Char),
|
|
J =< End
|
|
then
|
|
Pred(Char, !Acc1, !Acc2),
|
|
foldl2_between_loop(Pred, String, J, End, !Acc1, !Acc2)
|
|
else
|
|
true
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
foldr(Func, String, Acc0) = Acc :-
|
|
Pred = ( pred(X::in, Y::in, Z::out) is det :- Z = Func(X, Y)),
|
|
foldr(Pred, String, Acc0, Acc).
|
|
|
|
foldr(Pred, String, !Acc) :-
|
|
foldr_between(Pred, String, 0, string.count_code_units(String), !Acc).
|
|
|
|
foldr_between(Func, String, Start, Count, Acc0) = Acc :-
|
|
Pred = ( pred(X::in, Y::in, Z::out) is det :- Z = Func(X, Y) ),
|
|
foldr_between(Pred, String, Start, Count, Acc0, Acc).
|
|
|
|
foldr_between(Pred, String, Start0, End0, !Acc) :-
|
|
Start = max(0, Start0),
|
|
End = min(End0, string.count_code_units(String)),
|
|
foldr_between_2(Pred, String, Start, End, !Acc).
|
|
|
|
:- pred foldr_between_2(pred(char, T, T), string, int, int, T, T).
|
|
:- mode foldr_between_2(in(pred(in, in, out) is det), in, in, in,
|
|
in, out) is det.
|
|
:- mode foldr_between_2(in(pred(in, di, uo) is det), in, in, in,
|
|
di, uo) is det.
|
|
:- mode foldr_between_2(in(pred(in, in, out) is semidet), in, in, in,
|
|
in, out) is semidet.
|
|
:- mode foldr_between_2(in(pred(in, in, out) is nondet), in, in, in,
|
|
in, out) is nondet.
|
|
:- mode foldr_between_2(in(pred(in, in, out) is multi), in, in, in,
|
|
in, out) is multi.
|
|
|
|
foldr_between_2(Pred, String, Start, I, !Acc) :-
|
|
( if
|
|
I > Start,
|
|
unsafe_prev_index(String, I, J, Char),
|
|
J >= Start
|
|
then
|
|
Pred(Char, !Acc),
|
|
foldr_between_2(Pred, String, Start, J, !Acc)
|
|
else
|
|
true
|
|
).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Formatting tables.
|
|
%
|
|
% Currently, format_table simply assumes each code point occupies
|
|
% a single column in a fixed-width output device. Thus the output will
|
|
% only be aligned if limited to an (important) subset of characters,
|
|
% namely ASCII and European characters (excluding combining characters).
|
|
% It would be relatively easy to support CJK double-width characters
|
|
% and zero-width characters (see wcswidth), which would be enough
|
|
% to cover the needs of very many people.
|
|
%
|
|
% These considerations may also apply to predicates such as pad_left,
|
|
% pad_right, format (with field widths), word_wrap, etc.
|
|
%
|
|
|
|
format_table(Columns, Separator) = Table :-
|
|
list.map3(find_max_width, Columns, SWs, Counts, ColumnStrs),
|
|
( if
|
|
Counts = [HeadCount | TailCounts],
|
|
column_counts_match(HeadCount, TailCounts)
|
|
then
|
|
SepLen = count_code_points(Separator),
|
|
generate_rows(Separator, SepLen, SWs, ColumnStrs, cord.init, RowCord),
|
|
Rows = cord.list(RowCord),
|
|
Table = string.join_list("\n", Rows)
|
|
else
|
|
(
|
|
Counts = [],
|
|
unexpected($pred, "no columns")
|
|
;
|
|
Counts = [_ | _],
|
|
unexpected($pred, "different columns have different lengths")
|
|
)
|
|
).
|
|
|
|
format_table_max(ColumnsLimits, Separator) = Table :-
|
|
list.map3(find_max_width_with_limit, ColumnsLimits,
|
|
SWs, Counts, ColumnStrs),
|
|
( if
|
|
Counts = [HeadCount | TailCounts],
|
|
column_counts_match(HeadCount, TailCounts)
|
|
then
|
|
SepLen = count_code_points(Separator),
|
|
generate_rows(Separator, SepLen, SWs, ColumnStrs, cord.init, RowCord),
|
|
Rows = cord.list(RowCord),
|
|
Table = string.join_list("\n", Rows)
|
|
else
|
|
(
|
|
Counts = [],
|
|
unexpected($pred, "no columns")
|
|
;
|
|
Counts = [_ | _],
|
|
unexpected($pred, "different columns have different lengths")
|
|
)
|
|
).
|
|
|
|
:- pred column_counts_match(int::in, list(int)::in) is semidet.
|
|
|
|
column_counts_match(_HeadCount, []).
|
|
column_counts_match(HeadCount, [HeadTailCount | TailTailCounts]) :-
|
|
HeadCount = HeadTailCount,
|
|
column_counts_match(HeadCount, TailTailCounts).
|
|
|
|
:- pred generate_rows(string::in, int::in,
|
|
list(sense_width)::in, list(list(string))::in,
|
|
cord(string)::in, cord(string)::out) is det.
|
|
|
|
generate_rows(Separator, SepLen, SWs, Columns0, !RowCord) :-
|
|
( if get_next_line(Columns0, Line, Columns) then
|
|
pad_row(SWs, Line, Separator, SepLen, 0, Row),
|
|
cord.snoc(Row, !RowCord),
|
|
generate_rows(Separator, SepLen, SWs, Columns, !RowCord)
|
|
else
|
|
true
|
|
).
|
|
|
|
:- pred get_next_line(list(list(string))::in,
|
|
list(string)::out, list(list(string))::out) is semidet.
|
|
|
|
get_next_line([], [], []).
|
|
get_next_line([Column | Columns], [ColumnTop | ColumnTops],
|
|
[ColumnRest | ColumnRests]) :-
|
|
Column = [ColumnTop | ColumnRest],
|
|
get_next_line(Columns, ColumnTops, ColumnRests).
|
|
|
|
:- pred pad_row(list(sense_width)::in, list(string)::in,
|
|
string::in, int::in, int::in, string::out) is det.
|
|
|
|
pad_row([], [], _, _, _, "").
|
|
pad_row([], [_ | _], _, _, _, _) :-
|
|
error($pred, "list length mismatch").
|
|
pad_row([_ | _], [], _, _, _, _) :-
|
|
error($pred, "list length mismatch").
|
|
pad_row([SenseWidth | SenseWidths], [ColumnStr0 | ColumnStrs0],
|
|
Separator, SepLen, CurColumn, Line) :-
|
|
SenseWidth = sense_width(JustifySense, ColumnWidth),
|
|
NextColumn = CurColumn + ColumnWidth + SepLen,
|
|
% XXX Counting code points here is an approximation. What we actually want
|
|
% is the *width* of ColumnStr0, which may be less than ColumnWidth
|
|
% in the presence of combining characters.
|
|
( if count_code_points(ColumnStr0) =< ColumnWidth then
|
|
(
|
|
JustifySense = just_left,
|
|
ColumnStr = pad_right(ColumnStr0, ' ', ColumnWidth)
|
|
;
|
|
JustifySense = just_right,
|
|
ColumnStr = pad_left(ColumnStr0, ' ', ColumnWidth)
|
|
)
|
|
else
|
|
% This is wider the "max width" of this column, but, as per
|
|
% the discussion on m-rev on 2023 May 22, splitting up ColumnStr0
|
|
% at ColumnWidth would be a bad idea, because there is a nontrivial
|
|
% chance that the cut would come between two combining Unicode
|
|
% characters.
|
|
%
|
|
% As it is, leaving ColumnStr0 uncut will probably make this cell
|
|
% in the table bulge out to the right.
|
|
ColumnStr = ColumnStr0
|
|
),
|
|
(
|
|
SenseWidths = [],
|
|
Line = ColumnStr
|
|
;
|
|
SenseWidths = [_ | _],
|
|
pad_row(SenseWidths, ColumnStrs0, Separator, SepLen, NextColumn,
|
|
LineRest),
|
|
Line = ColumnStr ++ Separator ++ LineRest
|
|
).
|
|
|
|
:- type justify_sense
|
|
---> just_left
|
|
; just_right.
|
|
|
|
:- type sense_width
|
|
---> sense_width(justify_sense, int).
|
|
|
|
:- pred find_max_width(justified_column::in,
|
|
sense_width::out, int::out, list(string)::out) is det.
|
|
|
|
find_max_width(JustColumn, SW, Count, Strings) :-
|
|
(
|
|
JustColumn = left(Strings),
|
|
Sense = just_left
|
|
;
|
|
JustColumn = right(Strings),
|
|
Sense = just_right
|
|
),
|
|
list.foldl2(count_and_find_max_str_length, Strings,
|
|
0, Count, 0, MaxWidth),
|
|
SW = sense_width(Sense, MaxWidth).
|
|
|
|
:- pred find_max_width_with_limit(pair(justified_column, maybe(int))::in,
|
|
sense_width::out, int::out, list(string)::out) is det.
|
|
|
|
find_max_width_with_limit(JustColumn - MaybeLimit, SW, Count, Strings) :-
|
|
(
|
|
JustColumn = left(Strings),
|
|
Sense = just_left
|
|
;
|
|
JustColumn = right(Strings),
|
|
Sense = just_right
|
|
),
|
|
list.foldl2(count_and_find_max_str_length, Strings,
|
|
0, Count, 0, MaxWidth0),
|
|
(
|
|
MaybeLimit = yes(Limit),
|
|
( if MaxWidth0 > Limit then
|
|
MaxWidth = Limit
|
|
else
|
|
MaxWidth = MaxWidth0
|
|
)
|
|
;
|
|
MaybeLimit = no,
|
|
MaxWidth = MaxWidth0
|
|
),
|
|
SW = sense_width(Sense, MaxWidth).
|
|
|
|
:- pred count_and_find_max_str_length(string::in,
|
|
int::in, int::out, int::in, int::out) is det.
|
|
|
|
count_and_find_max_str_length(Str, Count0, Count, PrevMaxLen, MaxLen) :-
|
|
Count = Count0 + 1,
|
|
Length = count_code_points(Str),
|
|
( if Length > PrevMaxLen then
|
|
MaxLen = Length
|
|
else
|
|
MaxLen = PrevMaxLen
|
|
).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Converting strings to docs.
|
|
%
|
|
|
|
string_to_doc(S) = pretty_printer.string_to_doc(S).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Converting strings to values of builtin types.
|
|
%
|
|
|
|
to_int(String, Int) :-
|
|
base_string_to_int(10, String, Int).
|
|
|
|
det_to_int(S) = det_base_string_to_int(10, S).
|
|
|
|
base_string_to_int(Base, String, Int) :-
|
|
string.index(String, 0, Char),
|
|
End = string.count_code_units(String),
|
|
( if
|
|
( Char = ('-'), FoldPred = base_negative_int_accumulator(Base)
|
|
; Char = ('+'), FoldPred = base_positive_int_accumulator(Base)
|
|
)
|
|
then
|
|
% Start at the first digit, which *should* be just after the sign.
|
|
End > 1,
|
|
foldl_between(FoldPred, String, 1, End, 0, Int)
|
|
else
|
|
FoldPred = base_positive_int_accumulator(Base),
|
|
foldl_between(FoldPred, String, 0, End, 0, Int)
|
|
).
|
|
|
|
det_base_string_to_int(Base, S) = N :-
|
|
( if base_string_to_int(Base, S, N0) then
|
|
N = N0
|
|
else
|
|
unexpected($pred, "conversion failed")
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
:- func base_positive_int_accumulator(int) = pred(char, int, int).
|
|
:- mode base_positive_int_accumulator(in) =
|
|
out(pred(in, in, out) is semidet) is det.
|
|
|
|
base_positive_int_accumulator(Base) = Pred :-
|
|
% Avoid allocating a closure for the common bases. A more general, but
|
|
% finicky, way to avoid the allocation is to inline foldl_between so that
|
|
% the higher-order calls in base_string_to_int can be specialised.
|
|
% The redundant closures will also need to be deleted by unused argument
|
|
% elimination.
|
|
( if Base = 10 then
|
|
Pred = accumulate_int(10)
|
|
else if Base = 16 then
|
|
Pred = accumulate_int(16)
|
|
else if Base = 8 then
|
|
Pred = accumulate_int(8)
|
|
else if Base = 2 then
|
|
Pred = accumulate_int(2)
|
|
else if 2 =< Base, Base =< 36 then
|
|
Pred = accumulate_int(Base)
|
|
else
|
|
string.format("the base must be between 2 and 36; %d is not",
|
|
[i(Base)], Msg),
|
|
unexpected($pred, Msg)
|
|
).
|
|
|
|
:- pred accumulate_int(int::in, char::in, int::in, int::out) is semidet.
|
|
|
|
accumulate_int(Base, Char, N0, N) :-
|
|
char.unsafe_base_digit_to_int(Base, Char, M),
|
|
N = (Base * N0) + M,
|
|
% Fail on overflow.
|
|
% XXX depends on undefined behaviour
|
|
N0 =< N.
|
|
|
|
:- func base_negative_int_accumulator(int) = pred(char, int, int).
|
|
:- mode base_negative_int_accumulator(in) = out(pred(in, in, out) is semidet)
|
|
is det.
|
|
|
|
base_negative_int_accumulator(Base) = Pred :-
|
|
% Avoid allocating a closure for the common bases.
|
|
( if Base = 10 then
|
|
Pred = accumulate_negative_int(10)
|
|
else if Base = 16 then
|
|
Pred = accumulate_negative_int(16)
|
|
else if Base = 8 then
|
|
Pred = accumulate_negative_int(8)
|
|
else if Base = 2 then
|
|
Pred = accumulate_negative_int(2)
|
|
else if 2 =< Base, Base =< 36 then
|
|
Pred = accumulate_negative_int(Base)
|
|
else
|
|
string.format("the base must be between 2 and 36; %d is not",
|
|
[i(Base)], Msg),
|
|
unexpected($pred, Msg)
|
|
).
|
|
|
|
:- pred accumulate_negative_int(int::in, char::in,
|
|
int::in, int::out) is semidet.
|
|
|
|
accumulate_negative_int(Base, Char, N0, N) :-
|
|
char.unsafe_base_digit_to_int(Base, Char, M),
|
|
N = (Base * N0) - M,
|
|
% Fail on overflow.
|
|
% XXX depends on undefined behaviour
|
|
N =< N0.
|
|
|
|
%---------------------%
|
|
|
|
to_uint(String, UInt) :-
|
|
base_string_to_uint(10, String, UInt).
|
|
|
|
det_to_uint(S) = det_base_string_to_uint(10, S).
|
|
|
|
base_string_to_uint(Base, String, UInt) :-
|
|
End = string.count_code_units(String),
|
|
foldl_between(base_uint_accumulator(Base), String,
|
|
0, End, 0u, UInt).
|
|
|
|
det_base_string_to_uint(Base, S) = N :-
|
|
( if base_string_to_uint(Base, S, N0) then
|
|
N = N0
|
|
else
|
|
unexpected($pred, "conversion failed")
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
:- func base_uint_accumulator(int) = pred(char, uint, uint).
|
|
:- mode base_uint_accumulator(in) =
|
|
out(pred(in, in, out) is semidet) is det.
|
|
|
|
base_uint_accumulator(Base) = Pred :-
|
|
% Avoid allocating a closure for the common bases. A more general, but
|
|
% finicky, way to avoid the allocation is to inline foldl_between so that
|
|
% the higher-order calls in base_string_to_int can be specialised.
|
|
% The redundant closures will also need to be deleted by unused argument
|
|
% elimination.
|
|
( if Base = 10 then
|
|
Pred = accumulate_uint(10u, 10)
|
|
else if Base = 16 then
|
|
Pred = accumulate_uint(16u, 16)
|
|
else if Base = 8 then
|
|
Pred = accumulate_uint(8u, 8)
|
|
else if Base = 2 then
|
|
Pred = accumulate_uint(2u, 2)
|
|
else if 2 =< Base, Base =< 36 then
|
|
Pred = accumulate_uint(uint.det_from_int(Base), Base)
|
|
else
|
|
string.format("the base must be between 2 and 36; %d is not",
|
|
[i(Base)], Msg),
|
|
unexpected($pred, Msg)
|
|
).
|
|
|
|
:- pred accumulate_uint(uint::in, int::in, char::in, uint::in, uint::out)
|
|
is semidet.
|
|
|
|
accumulate_uint(Base, BaseInt, Char, N0, N) :-
|
|
char.unsafe_base_digit_to_int(BaseInt, Char, M),
|
|
N = (Base * N0) + uint.det_from_int(M),
|
|
% Fail on overflow.
|
|
% XXX depends on undefined behaviour
|
|
N0 =< N.
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_export("C", to_float(in, out),
|
|
"ML_string_to_float").
|
|
|
|
:- pragma foreign_proc("C",
|
|
to_float(FloatString::in, FloatVal::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"{
|
|
// The %c checks for any erroneous characters appearing after the float;
|
|
// if there are some, then sscanf() will return 2 rather than 1.
|
|
char tmpc;
|
|
SUCCESS_INDICATOR =
|
|
(!MR_isspace(FloatString[0])) &&
|
|
(sscanf(FloatString, MR_FLT_FMT ""%c"", &FloatVal, &tmpc) == 1);
|
|
// MR_TRUE if sscanf succeeds, MR_FALSE otherwise.
|
|
}").
|
|
:- pragma foreign_proc("C#",
|
|
to_float(FloatString::in, FloatVal::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"{
|
|
// FloatVal must be initialized to suppress error messages
|
|
// when the predicate fails.
|
|
FloatVal = 0.0;
|
|
|
|
// Leading or trailing whitespace is not allowed.
|
|
if (FloatString.Length == 0 ||
|
|
System.Char.IsWhiteSpace(FloatString, 0) ||
|
|
System.Char.IsWhiteSpace(FloatString, FloatString.Length - 1))
|
|
{
|
|
SUCCESS_INDICATOR = false;
|
|
} else {
|
|
try {
|
|
FloatVal = System.Convert.ToDouble(FloatString);
|
|
SUCCESS_INDICATOR = true;
|
|
} catch (System.FormatException) {
|
|
SUCCESS_INDICATOR = false;
|
|
} catch (System.OverflowException) {
|
|
if (FloatString[0] == '-') {
|
|
FloatVal = System.Double.NegativeInfinity;
|
|
} else {
|
|
FloatVal = System.Double.PositiveInfinity;
|
|
}
|
|
SUCCESS_INDICATOR = true;
|
|
}
|
|
}
|
|
}").
|
|
:- pragma foreign_proc("Java",
|
|
to_float(FloatString::in, FloatVal::out),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
// FloatVal must be initialized to suppress error messages
|
|
// when the predicate fails.
|
|
FloatVal = 0.0;
|
|
|
|
// Leading or trailing whitespace is not allowed.
|
|
if (FloatString.length() == 0 || FloatString.trim() != FloatString) {
|
|
SUCCESS_INDICATOR = false;
|
|
} else {
|
|
try {
|
|
FloatVal = java.lang.Double.parseDouble(FloatString);
|
|
SUCCESS_INDICATOR = true;
|
|
} catch(java.lang.NumberFormatException e) {
|
|
// At this point it *should* in theory be safe just to set
|
|
// SUCCESS_INDICATOR = false, since the Java API claims that
|
|
// Double.parseDouble() will handle all the cases we require.
|
|
// However, it turns out that in practice (tested with Sun's
|
|
// Java 2 SDK, Standard Edition, version 1.3.1_04) Java actually
|
|
// throws a NumberFormatException when you give it NaN or
|
|
// infinity, so we handle these cases below.
|
|
|
|
if (FloatString.equalsIgnoreCase(""nan"")) {
|
|
FloatVal = java.lang.Double.NaN;
|
|
SUCCESS_INDICATOR = true;
|
|
} else if (FloatString.equalsIgnoreCase(""infinity"")) {
|
|
FloatVal = java.lang.Double.POSITIVE_INFINITY;
|
|
SUCCESS_INDICATOR = true;
|
|
} else if (FloatString.substring(1).
|
|
equalsIgnoreCase(""infinity""))
|
|
{
|
|
if (FloatString.charAt(0) == '+') {
|
|
FloatVal = java.lang.Double.POSITIVE_INFINITY;
|
|
SUCCESS_INDICATOR = true;
|
|
} else if (FloatString.charAt(0) == '-') {
|
|
FloatVal = java.lang.Double.NEGATIVE_INFINITY;
|
|
SUCCESS_INDICATOR = true;
|
|
} else {
|
|
SUCCESS_INDICATOR = false;
|
|
}
|
|
} else {
|
|
SUCCESS_INDICATOR = false;
|
|
}
|
|
}
|
|
}
|
|
").
|
|
|
|
det_to_float(FloatString) = Float :-
|
|
( if to_float(FloatString, FloatPrime) then
|
|
Float = FloatPrime
|
|
else
|
|
unexpected($pred, "conversion failed.")
|
|
).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Converting values of builtin types to strings.
|
|
%
|
|
|
|
char_to_string(C) = S1 :-
|
|
char_to_string(C, S1).
|
|
|
|
:- pragma promise_equivalent_clauses(pred(char_to_string/2)).
|
|
|
|
char_to_string(Char::in, String::uo) :-
|
|
from_char_list([Char], String).
|
|
char_to_string(Char::out, String::in) :-
|
|
index_next_repl(String, 0, NextIndex, Char, not_replaced),
|
|
% Check that String contains nothing else after Char.
|
|
string.count_code_units(String, NextIndex).
|
|
|
|
from_char(Char) = char_to_string(Char).
|
|
|
|
%---------------------%
|
|
|
|
int_to_base_string(N1, N2) = S2 :-
|
|
int_to_base_string(N1, N2, S2).
|
|
|
|
int_to_base_string(N, Base, Str) :-
|
|
( if 2 =< Base, Base =< 36 then
|
|
true
|
|
else
|
|
string.format("the base must be between 2 and 36; %d is not",
|
|
[i(Base)], Msg),
|
|
unexpected($pred, Msg)
|
|
),
|
|
% Note that in order to handle MININT correctly, we need to do the
|
|
% conversion of the absolute number into digits using negative numbers.
|
|
% we can't use positive numbers, because -MININT overflows.
|
|
( if N < 0 then
|
|
int_to_base_string_loop(N, Base, ['-'], RevChars)
|
|
else
|
|
NegN = 0 - N,
|
|
int_to_base_string_loop(NegN, Base, [], RevChars)
|
|
),
|
|
from_rev_char_list(RevChars, Str).
|
|
|
|
:- pred int_to_base_string_loop(int::in, int::in,
|
|
list(char)::in, list(char)::out) is det.
|
|
|
|
int_to_base_string_loop(NegN, Base, !RevChars) :-
|
|
% int_to_base_string_loop/3 is almost identical to
|
|
% int_to_base_string_group_loop/6 below so any changes here might
|
|
% also need to be applied to int_to_base_string_group_loop/3.
|
|
( if NegN > -Base then
|
|
N = -NegN,
|
|
DigitChar = char.det_base_int_to_digit(Base, N),
|
|
!:RevChars = [DigitChar | !.RevChars]
|
|
else
|
|
NegN1 = NegN // Base,
|
|
N10 = (NegN1 * Base) - NegN,
|
|
DigitChar = char.det_base_int_to_digit(Base, N10),
|
|
int_to_base_string_loop(NegN1, Base, !RevChars),
|
|
!:RevChars = [DigitChar | !.RevChars]
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
int_to_string_thousands(N) =
|
|
int_to_base_string_group(N, 10, 3, ",").
|
|
|
|
int_to_base_string_group(N, Base, GroupLength, Sep) = Str :-
|
|
( if 2 =< Base, Base =< 36 then
|
|
true
|
|
else
|
|
string.format("the base must be between 2 and 36; %d is not",
|
|
[i(Base)], Msg),
|
|
unexpected($pred, Msg)
|
|
),
|
|
% Note that in order to handle MININT correctly, we need to do
|
|
% the conversion of the absolute number into digits using negative numbers.
|
|
% We can't use positive numbers, because -MININT overflows.
|
|
( if N < 0 then
|
|
int_to_base_string_group_loop(N, Base, 0, GroupLength, Sep, Str1),
|
|
string.append("-", Str1, Str)
|
|
else
|
|
N1 = 0 - N,
|
|
int_to_base_string_group_loop(N1, Base, 0, GroupLength, Sep, Str)
|
|
).
|
|
|
|
% int_to_base_string_group_loop(NegN, Base, Curr, GroupLength, Sep, Str):
|
|
%
|
|
% GroupLength is how many digits there should be between separators.
|
|
% Curr is how many digits have been processed since the last separator
|
|
% was inserted.
|
|
% int_to_base_string_group_loop/6 is almost identical to
|
|
% int_to_base_string_loop/3 above so any changes here might also
|
|
% need to be applied to int_to_base_string_loop/3.
|
|
%
|
|
:- pred int_to_base_string_group_loop(int::in, int::in, int::in, int::in,
|
|
string::in, string::uo) is det.
|
|
|
|
int_to_base_string_group_loop(NegN, Base, Curr, GroupLength, Sep, Str) :-
|
|
( if
|
|
Curr = GroupLength,
|
|
GroupLength > 0
|
|
then
|
|
int_to_base_string_group_loop(NegN, Base, 0, GroupLength, Sep, Str1),
|
|
string.append(Str1, Sep, Str)
|
|
else
|
|
( if NegN > -Base then
|
|
N = -NegN,
|
|
DigitChar = char.det_base_int_to_digit(Base, N),
|
|
string.char_to_string(DigitChar, Str)
|
|
else
|
|
NegN1 = NegN // Base,
|
|
N10 = (NegN1 * Base) - NegN,
|
|
DigitChar = char.det_base_int_to_digit(Base, N10),
|
|
string.char_to_string(DigitChar, DigitString),
|
|
int_to_base_string_group_loop(NegN1, Base, Curr + 1,
|
|
GroupLength, Sep, Str1),
|
|
string.append(Str1, DigitString, Str)
|
|
)
|
|
).
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
int_to_string(I::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing, may_not_export_body],
|
|
"
|
|
int num_digits;
|
|
MR_Unsigned U;
|
|
if (I < 0) {
|
|
U = (MR_Unsigned) -I;
|
|
#ifdef MR_MERCURY_IS_64_BITS
|
|
get_num_decimal_digits_in_uint64(U, num_digits);
|
|
#else
|
|
get_num_decimal_digits_in_uint32(U, num_digits);
|
|
#endif
|
|
fill_string_with_negative_unsigned_decimal(S, U, num_digits,
|
|
MR_ALLOC_ID);
|
|
} else {
|
|
U = (MR_Unsigned) I;
|
|
#ifdef MR_MERCURY_IS_64_BITS
|
|
get_num_decimal_digits_in_uint64(U, num_digits);
|
|
#else
|
|
get_num_decimal_digits_in_uint32(U, num_digits);
|
|
#endif
|
|
fill_string_with_unsigned_decimal(S, U, num_digits, MR_ALLOC_ID);
|
|
}
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
int_to_string(I::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = I.ToString();
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
int_to_string(I::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = java.lang.Integer.toString(I);
|
|
").
|
|
|
|
int_to_string(N, Str) :-
|
|
Str = int_to_string(N).
|
|
|
|
from_int(N) = int_to_string(N).
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
uint_to_string(U::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing, may_not_export_body],
|
|
"
|
|
int num_digits;
|
|
#ifdef MR_MERCURY_IS_64_BITS
|
|
get_num_decimal_digits_in_uint64(U, num_digits);
|
|
#else
|
|
get_num_decimal_digits_in_uint32(U, num_digits);
|
|
#endif
|
|
fill_string_with_unsigned_decimal(Str, U, num_digits, MR_ALLOC_ID);
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
uint_to_string(U::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Str = U.ToString();
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
uint_to_string(U::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Str = java.lang.Long.toString(U & 0xffffffffL);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
uint_to_hex_string(UInt) =
|
|
uint_to_lc_hex_string(UInt).
|
|
|
|
:- pragma inline(func(uint_to_lc_hex_string/1)).
|
|
|
|
:- pragma foreign_proc("C",
|
|
uint_to_lc_hex_string(U::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing, may_not_export_body],
|
|
"
|
|
int num_digits;
|
|
#ifdef MR_MERCURY_IS_64_BITS
|
|
get_num_hex_digits_in_uint64(U, num_digits);
|
|
#else
|
|
get_num_hex_digits_in_uint32(U, num_digits);
|
|
#endif
|
|
fill_string_with_unsigned_hex_lc(Str, U, num_digits, MR_ALLOC_ID);
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
uint_to_lc_hex_string(U::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Str = U.ToString(""x"");
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
uint_to_lc_hex_string(U::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Str = java.lang.Integer.toHexString(U);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
uint_to_uc_hex_string(U::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing, may_not_export_body],
|
|
"
|
|
int num_digits;
|
|
#ifdef MR_MERCURY_IS_64_BITS
|
|
get_num_hex_digits_in_uint64(U, num_digits);
|
|
#else
|
|
get_num_hex_digits_in_uint32(U, num_digits);
|
|
#endif
|
|
fill_string_with_unsigned_hex_uc(Str, U, num_digits, MR_ALLOC_ID);
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
uint_to_uc_hex_string(U::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Str = U.ToString(""X"");
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
uint_to_uc_hex_string(U::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Str = java.lang.Integer.toHexString(U).toUpperCase();
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
uint_to_octal_string(U::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing, may_not_export_body],
|
|
"
|
|
int num_digits;
|
|
#ifdef MR_MERCURY_IS_64_BITS
|
|
get_num_octal_digits_in_uint64(U, num_digits);
|
|
#else
|
|
get_num_octal_digits_in_uint32(U, num_digits);
|
|
#endif
|
|
fill_string_with_unsigned_octal(Str, U, num_digits, MR_ALLOC_ID);
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
uint_to_octal_string(U::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Str = System.Convert.ToString(U, 8);
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
uint_to_octal_string(U::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Str = java.lang.Integer.toOctalString(U);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
int8_to_string(I8::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing, may_not_export_body],
|
|
"
|
|
int num_digits;
|
|
uint8_t U8;
|
|
if (I8 < 0) {
|
|
U8 = (uint8_t) -I8;
|
|
get_num_decimal_digits_in_uint8(U8, num_digits);
|
|
fill_string_with_negative_unsigned_decimal(S, U8, num_digits,
|
|
MR_ALLOC_ID);
|
|
} else {
|
|
U8 = (uint8_t) I8;
|
|
get_num_decimal_digits_in_uint8(U8, num_digits);
|
|
fill_string_with_unsigned_decimal(S, U8, num_digits, MR_ALLOC_ID);
|
|
}
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
int8_to_string(I8::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = I8.ToString();
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
int8_to_string(I8::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = java.lang.Integer.toString(I8);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
uint8_to_string(U8::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing, may_not_export_body],
|
|
"
|
|
int num_digits;
|
|
get_num_decimal_digits_in_uint8(U8, num_digits);
|
|
fill_string_with_unsigned_decimal(S, U8, num_digits, MR_ALLOC_ID);
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
uint8_to_string(U8::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = U8.ToString();
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
uint8_to_string(U8::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = java.lang.Integer.toString(U8 & 0xff);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
int16_to_string(I16::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing, may_not_export_body],
|
|
"
|
|
int num_digits;
|
|
uint16_t U16;
|
|
if (I16 < 0) {
|
|
U16 = (uint16_t) -I16;
|
|
get_num_decimal_digits_in_uint16(U16, num_digits);
|
|
fill_string_with_negative_unsigned_decimal(S, U16, num_digits,
|
|
MR_ALLOC_ID);
|
|
} else {
|
|
U16 = (uint16_t) I16;
|
|
get_num_decimal_digits_in_uint16(U16, num_digits);
|
|
fill_string_with_unsigned_decimal(S, U16, num_digits, MR_ALLOC_ID);
|
|
}
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
int16_to_string(I16::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = I16.ToString();
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
int16_to_string(I16::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = java.lang.Integer.toString(I16);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
uint16_to_string(U16::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing, may_not_export_body],
|
|
"
|
|
int num_digits;
|
|
get_num_decimal_digits_in_uint16(U16, num_digits);
|
|
fill_string_with_unsigned_decimal(S, U16, num_digits, MR_ALLOC_ID);
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
uint16_to_string(U16::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = U16.ToString();
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
uint16_to_string(U16::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = java.lang.Integer.toString(U16 & 0xffff);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
int32_to_string(I32::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing, may_not_export_body],
|
|
"
|
|
int num_digits;
|
|
uint32_t U32;
|
|
if (I32 < 0) {
|
|
U32 = (uint32_t) -I32;
|
|
get_num_decimal_digits_in_uint32(U32, num_digits);
|
|
fill_string_with_negative_unsigned_decimal(S, U32, num_digits,
|
|
MR_ALLOC_ID);
|
|
} else {
|
|
U32 = (uint32_t) I32;
|
|
get_num_decimal_digits_in_uint32(U32, num_digits);
|
|
fill_string_with_unsigned_decimal(S, U32, num_digits, MR_ALLOC_ID);
|
|
}
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
int32_to_string(I32::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = I32.ToString();
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
int32_to_string(I32::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = java.lang.Integer.toString(I32);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
uint32_to_string(U32::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing, may_not_export_body],
|
|
"
|
|
int num_digits;
|
|
get_num_decimal_digits_in_uint32(U32, num_digits);
|
|
fill_string_with_unsigned_decimal(S, U32, num_digits, MR_ALLOC_ID);
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
uint32_to_string(U32::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = U32.ToString();
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
uint32_to_string(U32::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = java.lang.Long.toString(U32 & 0xffffffffL);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
int64_to_string(I64::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing, may_not_export_body],
|
|
"
|
|
int num_digits;
|
|
uint64_t U64;
|
|
if (I64 < 0) {
|
|
U64 = (uint64_t) -I64;
|
|
get_num_decimal_digits_in_uint64(U64, num_digits);
|
|
fill_string_with_negative_unsigned_decimal(S, U64, num_digits,
|
|
MR_ALLOC_ID);
|
|
} else {
|
|
U64 = (uint64_t) I64;
|
|
get_num_decimal_digits_in_uint64(U64, num_digits);
|
|
fill_string_with_unsigned_decimal(S, U64, num_digits, MR_ALLOC_ID);
|
|
}
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
int64_to_string(I64::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = I64.ToString();
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
int64_to_string(I64::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = java.lang.Long.toString(I64);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
uint64_to_string(U64::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing, may_not_export_body],
|
|
"
|
|
int num_digits;
|
|
get_num_decimal_digits_in_uint64(U64, num_digits);
|
|
fill_string_with_unsigned_decimal(S, U64, num_digits, MR_ALLOC_ID);
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
uint64_to_string(U64::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = U64.ToString();
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
uint64_to_string(U64::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = java.lang.Long.toUnsignedString(U64);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
uint64_to_hex_string(UInt) =
|
|
uint64_to_lc_hex_string(UInt).
|
|
|
|
:- pragma inline(func(uint64_to_lc_hex_string/1)).
|
|
|
|
:- pragma foreign_proc("C",
|
|
uint64_to_lc_hex_string(U64::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail],
|
|
"
|
|
char buffer[17]; // 16 for digits, 1 for nul.
|
|
sprintf(buffer, ""%"" PRIx64, U64);
|
|
MR_allocate_aligned_string_msg(S, strlen(buffer), MR_ALLOC_ID);
|
|
strcpy(S, buffer);
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
uint64_to_lc_hex_string(U64::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = U64.ToString(""x"");
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
uint64_to_lc_hex_string(U64::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = java.lang.Long.toHexString(U64);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
uint64_to_uc_hex_string(U64::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail],
|
|
"
|
|
char buffer[17]; // 16 for digits, 1 for nul.
|
|
sprintf(buffer, ""%"" PRIX64, U64);
|
|
MR_allocate_aligned_string_msg(S, strlen(buffer), MR_ALLOC_ID);
|
|
strcpy(S, buffer);
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
uint64_to_uc_hex_string(U64::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = U64.ToString(""X"");
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
uint64_to_uc_hex_string(U64::in) = (S::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
S = java.lang.Long.toHexString(U64).toUpperCase();
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
:- pragma foreign_proc("C",
|
|
uint64_to_octal_string(U64::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
char buffer[23]; // 22 for digits, 1 for nul.
|
|
sprintf(buffer, ""%"" PRIo64, U64);
|
|
MR_allocate_aligned_string_msg(Str, strlen(buffer), MR_ALLOC_ID);
|
|
strcpy(Str, buffer);
|
|
").
|
|
|
|
:- pragma foreign_proc("C#",
|
|
uint64_to_octal_string(U64::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
// We need to cast to a long here since C# does not provide an overloading
|
|
// of ToString() for ulongs. This works since ToString() will use the
|
|
// unsigned representation for non-decimal bases.
|
|
Str = System.Convert.ToString((long) U64, 8);
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
uint64_to_octal_string(U64::in) = (Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
Str = java.lang.Long.toOctalString(U64);
|
|
").
|
|
|
|
%---------------------%
|
|
|
|
float_to_string(Float) = S2 :-
|
|
float_to_string(Float, S2).
|
|
|
|
:- pragma foreign_proc("C",
|
|
float_to_string(Flt::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"{
|
|
// For efficiency reasons, we duplicate the C implementation
|
|
// of lowlevel_float_to_string.
|
|
MR_float_to_string(Flt, Str, MR_ALLOC_ID);
|
|
}").
|
|
:- pragma foreign_proc("C#",
|
|
float_to_string(Flt::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
if (System.Double.IsNaN(Flt)) {
|
|
Str = ""nan"";
|
|
} else if (System.Double.IsPositiveInfinity(Flt)) {
|
|
Str = ""infinity"";
|
|
} else if (System.Double.IsNegativeInfinity(Flt)) {
|
|
Str = ""-infinity"";
|
|
} else {
|
|
Str = Flt.ToString(""R"");
|
|
|
|
// Append '.0' if there is no 'e' or '.' in the string.
|
|
bool contains = false;
|
|
foreach (char c in Str) {
|
|
if (c == 'e' || c == 'E' || c == '.') {
|
|
contains = true;
|
|
break;
|
|
}
|
|
}
|
|
if (!contains) {
|
|
Str = Str + "".0"";
|
|
}
|
|
}
|
|
").
|
|
|
|
:- pragma foreign_proc("Java",
|
|
float_to_string(Flt::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe, will_not_modify_trail,
|
|
does_not_affect_liveness, no_sharing],
|
|
"
|
|
if (Double.isNaN(Flt)) {
|
|
Str = ""nan"";
|
|
} else if (Double.isInfinite(Flt)) {
|
|
if (Flt < 0.0) {
|
|
Str = ""-infinity"";
|
|
} else {
|
|
Str = ""infinity"";
|
|
}
|
|
} else {
|
|
Str = java.lang.Double.toString(Flt);
|
|
}
|
|
").
|
|
|
|
float_to_string(Float, unsafe_promise_unique(String)) :-
|
|
% XXX This implementation has problems when the mantissa
|
|
% cannot fit in an int.
|
|
%
|
|
% XXX The unsafe_promise_unique is needed because in float_to_string_loop,
|
|
% the call to to_float doesn't have a (ui, out) mode, which means that
|
|
% the output string cannot be unique.
|
|
String = float_to_string_loop(min_precision, Float).
|
|
|
|
:- func float_to_string_loop(int, float) = (string) is det.
|
|
|
|
float_to_string_loop(Prec, Float) = String :-
|
|
disable_warning [unknown_format_calls] (
|
|
format("%#." ++ int_to_string(Prec) ++ "g", [f(Float)], Tmp)
|
|
),
|
|
( if Prec = max_precision then
|
|
String = Tmp
|
|
else
|
|
( if to_float(Tmp, Float) then
|
|
String = Tmp
|
|
else
|
|
String = float_to_string_loop(Prec + 1, Float)
|
|
)
|
|
).
|
|
|
|
% XXX For efficiency reasons, we assume that on non-C backends we use
|
|
% double precision floats. However the commented out code provides
|
|
% a general mechanism for calculating the required precision.
|
|
%
|
|
:- func min_precision = int.
|
|
|
|
min_precision = 15.
|
|
% min_precision =
|
|
% floor_to_int(float(mantissa_digits) * log2(float(radix)) / log2(10.0)).
|
|
|
|
:- func max_precision = int.
|
|
|
|
max_precision = min_precision + 2.
|
|
|
|
from_float(Float) = float_to_string(Float).
|
|
|
|
%---------------------%
|
|
|
|
c_pointer_to_string(P) = S :-
|
|
c_pointer_to_string(P, S).
|
|
|
|
:- pragma foreign_proc("C#",
|
|
c_pointer_to_string(C_Pointer::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
// Within the spirit of the function, at least.
|
|
if (C_Pointer == null) {
|
|
Str = ""null"";
|
|
} else {
|
|
Str = C_Pointer.ToString();
|
|
}
|
|
").
|
|
:- pragma foreign_proc("Java",
|
|
c_pointer_to_string(C_Pointer::in, Str::uo),
|
|
[will_not_call_mercury, promise_pure, thread_safe],
|
|
"
|
|
// Within the spirit of the function, at least.
|
|
if (C_Pointer == null) {
|
|
Str = ""null"";
|
|
} else {
|
|
Str = C_Pointer.toString();
|
|
}
|
|
").
|
|
|
|
c_pointer_to_string(C_Pointer, Str) :-
|
|
private_builtin.unsafe_type_cast(C_Pointer, Int),
|
|
Str = "c_pointer(0x" ++ int_to_base_string(Int, 16) ++ ")".
|
|
|
|
from_c_pointer(P) = S :-
|
|
c_pointer_to_string(P, S).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Converting values of arbitrary types to strings.
|
|
%
|
|
|
|
string(X) =
|
|
to_string.string_impl(X).
|
|
|
|
string_ops(OpTable, X) =
|
|
to_string.string_ops_impl(OpTable, X).
|
|
|
|
string_ops_noncanon(NonCanon, OpTable, X, String) :-
|
|
to_string.string_ops_noncanon_impl(NonCanon, OpTable, X, String).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
%
|
|
% Converting values to strings based on a format string.
|
|
%
|
|
|
|
format(S1, PT) = S2 :-
|
|
disable_warning [unknown_format_calls] (
|
|
format(S1, PT, S2)
|
|
).
|
|
|
|
format(FormatString, PolyList, String) :-
|
|
format.format_impl(FormatString, PolyList, String).
|
|
|
|
%---------------------------------------------------------------------------%
|
|
:- end_module string.
|
|
%---------------------------------------------------------------------------%
|