mirror of
https://github.com/Mercury-Language/mercury.git
synced 2026-04-22 04:43:53 +00:00
For now, the implementation covers only non-lookup switches.

compiler/builtin_ops.m:
    Generalize the existing offset_str_eq binary op by adding an optional
    size parameter, which, if present, restricts the equality test to look at
    the given number of code units at most.

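The difference between the two forms of the op can be sketched in C. This is only an illustration under stated assumptions: the function names `offset_str_eq` and `offset_str_eq_sized` are hypothetical and are not the names of the runtime macros, and both helpers assume that the offset does not exceed the length of either string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
** Hypothetical helpers, NOT the names used in the Mercury runtime.
** Both assume that `offset` does not exceed the length of either string.
*/

/* Unsized form: the strings must agree from `offset` up to their ends. */
bool
offset_str_eq(const char *s1, const char *s2, size_t offset)
{
    return strcmp(s1 + offset, s2 + offset) == 0;
}

/*
** Sized form: look at no more than `size` code units starting at `offset`.
** strncmp stops at a NUL in either string, so it does not read past the end.
*/
bool
offset_str_eq_sized(const char *s1, const char *s2, size_t offset,
    size_t size)
{
    return strncmp(s1 + offset, s2 + offset, size) == 0;
}
```

With these definitions, comparing "switch_on" and "switch_off" from offset 7 fails in the unsized form, but succeeds in the sized form when the size is 1, because only the shared code unit `o` is examined.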
compiler/llds_out_data.m:
compiler/mlds_to_c_data.m:
    Generalize the output of binop rvals whose operation is offset_str_eq.
    In llds_out_data.m, fix a bug in the original code. (This bug did not
    lead to problems because before this diff, we never generated this op.)

compiler/string_switch_util.m:
    Add a predicate that recognizes when a trie node that is NOT a leaf
    nevertheless represents the top of a stick, which means that it has
    only one possible next code unit, which itself may have only one
    possible next code unit, and so on, until we reach a node that *does*
    have two or more next code units. (One of those may be the code unit
    of the string-ending NULL character.)

compiler/ml_string_switch.m:
    Use the new predicate in string_switch_util.m to generate better code
    for sticks. Instead of comparing each character in the stick individually
    against the relevant code unit of the string being switched on, compare
    them all at once using the new binary op.

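The idea of a stick can be sketched in C as follows. This is a simplified illustration, not the compiler's data structure: the `TrieNode` type, the `MAX_CHILDREN` bound and both function names are hypothetical, and for simplicity the walk also stops at a leaf, whereas the predicate described above is about non-leaf nodes.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4      /* hypothetical bound, for illustration only */

/* Hypothetical trie node: each outgoing edge is labelled by a code unit. */
typedef struct TrieNode {
    size_t                  num_children;
    unsigned char           units[MAX_CHILDREN];
    const struct TrieNode   *children[MAX_CHILDREN];
} TrieNode;

/*
** Walk down from `node` while each node has exactly one outgoing edge.
** Store the edge labels in `buf` (at most `buf_size` of them), set `*end`
** to the node at which the walk stopped, and return the stick length.
*/
size_t
find_stick(const TrieNode *node, unsigned char *buf, size_t buf_size,
    const TrieNode **end)
{
    size_t len = 0;

    while (node->num_children == 1 && len < buf_size) {
        buf[len++] = node->units[0];
        node = node->children[0];
    }
    *end = node;
    return len;
}

/*
** Test a whole stick against the switched-on string with one comparison,
** instead of one test per code unit. Assumes offset <= strlen(str).
*/
int
stick_matches(const char *str, size_t offset, const unsigned char *stick,
    size_t len)
{
    return strncmp(str + offset, (const char *) stick, len) == 0;
}
```

For a trie holding "bara" and "barb", the walk from the root collects the stick "bar" and stops at the node that branches on `a` and `b`; the generated code then needs a single sized comparison before dispatching on the fourth code unit.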
compiler/ml_switch_gen.m:
    Insist on both the host machine and the target machine
    using the C backend.

compiler/string_switch.m:
    Implement non-lookup trie switches. The code follows the approach used
    in ml_string_switch.m as much as possible, but there are plenty of
    differences caused by targeting the LLDS.

    Rename some predicates to specify which switch implementation method
    they belong to.

    Write a comment just once, and refer to it from elsewhere instead of
    duplicating it at each reference site.

compiler/switch_gen.m:
    Enable the use of trie switches when the option values call for it,
    and when the switch is not a lookup switch.

compiler/cse_detection.m:
    Do not flood the output of mmc -V with messages that have nothing to do
    with the module being compiled.

compiler/options.m:
    Add a way to specify --no-allow-inlining on the command line.
    This can help debug code generator changes like this one, by disallowing
    a transform that can modify the Mercury code whose compilation process
    you are trying to debug. (The documentation of the --inlining option
    implies that --no-inlining should do the same job, but it does not.)

    The option is not documented for users.

compiler/string_encoding.m:
    Provide a version of from_code_unit_list_in_encoding that allows
    non-well-formed code unit sequences as input, and provide det versions
    of both versions. This is for use by both string_switch.m and
    ml_string_switch.m.

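The contrast between the strict and the permissive conversion can be illustrated with a small C sketch of the permissive one for UTF-8. The function name, the buffer-based interface and the range check on each unit are assumptions made for this illustration; the only behaviour taken from the description is that ill-formed sequences are accepted while a zero code unit is rejected.

```c
#include <stdbool.h>
#include <stddef.h>

/*
** Hypothetical sketch: copy UTF-8 code units into a NUL-terminated buffer
** without checking that they form well-formed UTF-8.
**
** Fails if any unit is zero (it would end the string early). As assumptions
** of this sketch only, it also fails if a unit does not fit in one byte,
** or if `out` has no room for the units plus the terminating NUL.
*/
bool
code_units_to_string_allow_ill_formed(const int *units, size_t n,
    char *out, size_t out_size)
{
    size_t i;

    if (n >= out_size) {
        return false;
    }
    for (i = 0; i < n; i++) {
        if (units[i] <= 0 || units[i] > 0xff) {
            return false;
        }
        out[i] = (char) (unsigned char) units[i];
    }
    out[n] = '\0';
    return true;
}
```

A trie stick may end in the middle of a multi-byte character, so a list such as 0xc3 alone (a lead byte with no continuation byte) is accepted here, even though a strict UTF-8 decoder would reject it.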
compiler/hlds_goal.m:
    Document the properties of case_ids.

compiler/llds.m:
    Document the possibility that string constants are not well formed.

compiler/bytecode.m:
compiler/code_util.m:
compiler/mlds_dump.m:
compiler/ml_global_data.m:
compiler/mlds_to_cs_data.m:
compiler/mlds_to_java_data.m:
compiler/opt_debug.m:
    Conform to the changes above.

library/string.m:
    Replace the non-exported test predicate internal_encoding_is_utf8 with
    an exported function that returns an enum specifying the string encoding.

NEWS.md:
    Announce the new function.

runtime/mercury_string.h:
    Add the C macro that implements the new form of the offset_str_eq
    binary op.

tests/hard_coded/string_switch4.{m,exp}:
    We have long had three copies of the exact same code, in string_switch.m,
    string_switch2.m and string_switch3.m, which were compiled with

    - no smart switch implementation
    - smart switch implementation forced to use the hash table method
    - smart switch implementation forced to use binary search method

    Add this new copy, which is compiled with

    - smart switch implementation forced to use the new trie method

tests/hard_coded/Mmakefile:
    Add the new test case.

tests/hard_coded/Mercury.options:
    Update the options of the test cases, and specify them for the new
    test case.

tests/hard_coded/string_switch.m:
tests/hard_coded/string_switch2.m:
tests/hard_coded/string_switch3.m:
    Update the top-of-module comment block to be identical in all four copies
    of this module.
%----------------------------------------------------------------------------%
% vim: ft=mercury ts=4 sw=4 et
%----------------------------------------------------------------------------%
% Copyright (C) 2015, 2024 The Mercury team.
% This file may only be copied under the terms of the GNU General
% Public License - see the file COPYING in the Mercury distribution.
%----------------------------------------------------------------------------%

:- module backend_libs.string_encoding.
:- interface.

:- import_module libs.
:- import_module libs.globals.

:- import_module list.
:- import_module string.

    % target_char_range(Target, Min, Max):
    %
    % Return the smallest and largest integers that represent
    % valid code points in the encoding we use on the given target platform.
    %
:- pred target_char_range(compilation_target::in, int::out, int::out) is det.

    % Return the string_encoding we use on the given target platform.
    %
:- func target_string_encoding(compilation_target) = string_encoding.

    % Convert a string to the list of its code units in the given encoding.
    %
:- pred to_code_unit_list_in_encoding(string_encoding::in, string::in,
    list(int)::out) is det.

    % Convert a list of code units in the given encoding to a string.
    % Fails if the list does not follow the rules of the encoding.
    %
:- pred from_code_unit_list_in_encoding(string_encoding::in, list(int)::in,
    string::out) is semidet.
:- pred det_from_code_unit_list_in_encoding(string_encoding::in, list(int)::in,
    string::out) is det.

    % Convert a list of code units in the given encoding to a string.
    % Allows ill-formed sequences, and will succeed *unless* the given list
    % includes a zero, signifying a null character.
    %
    % At the moment, it works only when the encoding specified by the first
    % argument is utf8, *and* the compiler's own encoding is utf8.
    % If either encoding is utf16, it will throw an exception.
    %
:- pred from_code_unit_list_in_encoding_allow_ill_formed(string_encoding::in,
    list(int)::in, string::out) is semidet.
:- pred det_from_code_unit_list_in_encoding_allow_ill_formed(
    string_encoding::in, list(int)::in, string::out) is det.

%----------------------------------------------------------------------------%
%----------------------------------------------------------------------------%

:- implementation.

:- import_module require.

target_char_range(_Target, Min, Max) :-
    % The range of `char' is the same for all existing targets.
    Min = 0,
    Max = 0x10ffff.

target_string_encoding(Target) = Encoding :-
    (
        Target = target_c,
        Encoding = utf8
    ;
        ( Target = target_java
        ; Target = target_csharp
        ),
        Encoding = utf16
    ).

to_code_unit_list_in_encoding(Encoding, String, CodeUnits) :-
    require_complete_switch [Encoding]
    (
        Encoding = utf8,
        string.to_utf8_code_unit_list(String, CodeUnits)
    ;
        Encoding = utf16,
        string.to_utf16_code_unit_list(String, CodeUnits)
    ).

from_code_unit_list_in_encoding(Encoding, CodeUnits, String) :-
    require_complete_switch [Encoding]
    (
        Encoding = utf8,
        string.from_utf8_code_unit_list(CodeUnits, String)
    ;
        Encoding = utf16,
        string.from_utf16_code_unit_list(CodeUnits, String)
    ).

det_from_code_unit_list_in_encoding(Encoding, CodeUnits, String) :-
    ( if from_code_unit_list_in_encoding(Encoding, CodeUnits, StringPrime) then
        String = StringPrime
    else
        unexpected($pred, "from_code_unit_list_in_encoding failed")
    ).

from_code_unit_list_in_encoding_allow_ill_formed(Encoding, CodeUnits, String) :-
    require_complete_switch [Encoding]
    (
        Encoding = utf8,
        InternalEncoding = internal_string_encoding,
        (
            InternalEncoding = utf8,
            string.from_code_unit_list_allow_ill_formed(CodeUnits, String)
        ;
            InternalEncoding = utf16,
            unexpected($pred, "implementing on utf16 is nyi")
        )
    ;
        Encoding = utf16,
        unexpected($pred, "utf16 is nyi")
    ).

det_from_code_unit_list_in_encoding_allow_ill_formed(Encoding,
        CodeUnits, String) :-
    ( if
        from_code_unit_list_in_encoding_allow_ill_formed(Encoding,
            CodeUnits, StringPrime)
    then
        String = StringPrime
    else
        unexpected($pred,
            "from_code_unit_list_in_encoding_allow_ill_formed failed")
    ).

%----------------------------------------------------------------------------%
:- end_module backend_libs.string_encoding.
%----------------------------------------------------------------------------%
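
To make concrete why the code unit list of a string depends on the target's encoding, here is a small C sketch that encodes one code point as either UTF-8 or UTF-16 code units. It is an illustration only: the `StringEncoding` enum and `encode_code_point` are hypothetical names, not part of the Mercury compiler or library, and rejecting surrogate code points is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two encodings named in the module above. */
typedef enum { ENC_UTF8, ENC_UTF16 } StringEncoding;

/*
** Encode the code point `cp` in the given encoding, writing its code units
** to `out` and returning their number (1..4 for UTF-8, 1..2 for UTF-16).
** Returns 0 if `cp` is above 0x10ffff or is a surrogate.
*/
size_t
encode_code_point(StringEncoding enc, uint32_t cp, unsigned out[4])
{
    if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
        return 0;
    }
    if (enc == ENC_UTF16) {
        if (cp < 0x10000) {
            out[0] = cp;
            return 1;
        }
        cp -= 0x10000;
        out[0] = 0xd800 | (cp >> 10);       /* high surrogate */
        out[1] = 0xdc00 | (cp & 0x3ff);     /* low surrogate */
        return 2;
    }
    if (cp < 0x80) {
        out[0] = cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = 0xc0 | (cp >> 6);
        out[1] = 0x80 | (cp & 0x3f);
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = 0xe0 | (cp >> 12);
        out[1] = 0x80 | ((cp >> 6) & 0x3f);
        out[2] = 0x80 | (cp & 0x3f);
        return 3;
    }
    out[0] = 0xf0 | (cp >> 18);
    out[1] = 0x80 | ((cp >> 12) & 0x3f);
    out[2] = 0x80 | ((cp >> 6) & 0x3f);
    out[3] = 0x80 | (cp & 0x3f);
    return 4;
}
```

For example, U+00E9 becomes the two UTF-8 code units 0xc3 0xa9 but the single UTF-16 code unit 0xe9, so a trie built over code units has a different shape for a utf8 target than for a utf16 target.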