96 Commits

Author SHA1 Message Date
Zoltan Somogyi
386160f937 s/dont/do_not/ in the compiler directory.
compiler/*.m:
    Standardize on the do_not spelling over the dont contraction
    in the compiler directory. (We used to have a lot of both spellings.)
2024-08-12 12:49:23 +02:00
Zoltan Somogyi
28ab8c2ade Group together related builtin operations.
compiler/builtin_ops.m:
    Replace six individual builtin comparison ops for str_{eq,ne,lt,le,gt,ge}
    with a single str_cmp/1 function symbol, whose *argument*
    is one of {eq,ne,lt,le,gt,ge}. Do the same with comparison operations
    on integers (including the operations that compare signed integers
    as if they were unsigned) and floats. The eq and ne operations
    on integers had names that did not fit into the scheme used by the
    other binops; this diff fixes that.

    Replace five individual builtin arithmetic ops for int_{add,sub,mul,div,mod}
    with a single int_arith/2 function symbol, one of whose arguments
    is one of {add,sub,mul,div,rem}. (This diff renames the "mod" (modulus)
    op to "rem" (remainder), as an XXX has been asking for a long time.)
    The other argument specifies *which* integer type the operation is on.
    Do a similar change for float arithmetic ops, with the exception that
    floats don't support the remainder op.

    The points of the above changes are

    - to allow us to factor out commonalities between operations,
      both between e.g. all comparison operations on integers,
      and between e.g. lt comparisons on values of different types.

    - to stop forcing switches on binops to make distinctions that
      they do not actually care about.

    Rename the old str_cmp op, which returns a negative, zero or positive
    result (as does strcmp in C) to str_nzp, since the str_cmp name
    is now used for something else.

    Add some utility functions here, to allow the deletion of the
    many existing copies of the bodies of those functions elsewhere
    in the compiler.
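
    A minimal C sketch of the grouping idea, for illustration only: the
    compiler's own representation is a Mercury type, and every name below
    (cmp_op, apply_cmp, int_cmp) is a hypothetical stand-in. It shows one
    comparison entry point parameterized by the operation, layered on top of
    a negative/zero/positive primitive in the style of str_nzp.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the {eq,ne,lt,le,gt,ge} argument. */
enum cmp_op { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE };

/* Interpret a negative/zero/positive result under the given comparison. */
static bool apply_cmp(enum cmp_op op, int nzp)
{
    switch (op) {
        case CMP_EQ: return nzp == 0;
        case CMP_NE: return nzp != 0;
        case CMP_LT: return nzp < 0;
        case CMP_LE: return nzp <= 0;
        case CMP_GT: return nzp > 0;
        case CMP_GE: return nzp >= 0;
    }
    return false; /* not reached for valid ops */
}

/* Analogue of str_nzp: negative, zero or positive, as strcmp in C. */
static int str_nzp(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Analogue of str_cmp(Op): one entry point instead of six. */
static bool str_cmp(enum cmp_op op, const char *a, const char *b)
{
    return apply_cmp(op, str_nzp(a, b));
}

/* The same operation argument is shared by comparisons on other types. */
static bool int_cmp(enum cmp_op op, long a, long b)
{
    return apply_cmp(op, (a > b) - (a < b));
}
```

    With this shape, code that only cares that an operation is "some string
    comparison" can match on str_cmp once, and code that only cares about
    the lt case can share logic across types.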

compiler/closure_gen.m:
compiler/code_util.m:
compiler/dense_switch.m:
compiler/disj_gen.m:
compiler/ite_gen.m:
compiler/jumpopt.m:
compiler/llds.m:
compiler/llds_out_data.m:
compiler/lookup_switch.m:
compiler/middle_rec.m:
compiler/ml_disj_gen.m:
compiler/ml_foreign_proc_gen.m:
compiler/ml_global_data.m:
compiler/ml_lookup_switch.m:
compiler/ml_optimize.m:
compiler/ml_simplify_switch.m:
compiler/ml_string_switch.m:
compiler/ml_unify_gen.m:
compiler/ml_unify_gen_test.m:
compiler/mlds_dump.m:
compiler/mlds_to_c_data.m:
compiler/mlds_to_cs_data.m:
compiler/mlds_to_java_data.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/peephole.m:
compiler/pragma_c_gen.m:
compiler/string_switch.m:
compiler/tag_switch.m:
compiler/trace_gen.m:
compiler/transform_llds.m:
compiler/unify_gen.m:
compiler/unify_gen_test.m:
    Conform to the changes above, by either generating or consuming
    binops in their new form.
2024-07-13 15:02:08 +02:00
Zoltan Somogyi
2a63738b8e Implement det/semidet string trie lookup switches.
compiler/string_switch.m:
    Implement single-solution string trie lookup switches.
    The code managing the lookup table is new, while the code managing
    the trie search generalizes existing code. The latter required
    some redrawing of the predicate boundaries within that existing code,
    as well as adjusting some types and variable names.

    Include "jump" in the name of the non-lookup versions of string switches.

    Put state var arguments last in some predicate signatures.

compiler/switch_gen.m:
    Enable single-solution string trie lookup switches.

compiler/string_switch_util.m:
    Delete the call to build_str_case_id_list from the create_trie predicate,
    since it is needed only by its old caller, the implementation of string
    trie JUMP switches (which now does it itself), and not by its new caller,
    the implementation of string trie LOOKUP switches.

compiler/lookup_util.m:
compiler/code_util.m:
    Give some predicates more expressive names.

compiler/code_loc_dep.m:
compiler/disj_gen.m:
compiler/jumpopt.m:
compiler/lookup_switch.m:
compiler/middle_rec.m:
compiler/ml_string_switch.m:
compiler/tag_switch.m:
compiler/unify_gen_test.m:
    Conform to the changes above.

compiler/hlds_goal.m:
    Fix a comment.

tests/hard_coded/space.m:
    This test case caught a bug in an early version of this diff.
    Document this fact.

    Make the code more readable by

    - aligning the columns in some tables,
    - renaming some function symbols to avoid ambiguity,
    - replacing the remnants of calls to Prolog's "is" predicate
      with idiomatic Mercury code, and
    - deleting commented-out dead code that duplicated the body of a predicate.

tests/hard_coded/Mercury.options:
    Make space.m's role as a test case for string trie switches official
    by compiling it with options that force trie switches.
2024-04-03 09:19:37 +11:00
Zoltan Somogyi
9dbee8bdb4 Implement trie string switches for the LLDS backend.
For now, the implementation covers only non-lookup switches.

compiler/builtin_ops.m:
    Generalize the existing offset_str_eq binary op by adding an optional
    size parameter, which, if present, restricts the equality test to look at
    the given number of code units at most.

compiler/llds_out_data.m:
compiler/mlds_to_c_data.m:
    Generalize the output of binop rvals whose operation is offset_str_eq.
    In llds_out_data.m, fix a bug in the original code. (This bug did not
    lead to problems because before this diff, we never generated this op.)

compiler/string_switch_util.m:
    Add a predicate that recognizes when a trie node that is NOT a leaf
    nevertheless represents the top of a stick, which means that it has
    only one possible next code unit, which itself may have only one
    possible next code unit, and so on, until we reach a node that *does*
    have two or more next code units. (One of those may be the code unit
    of the string-ending NULL character.)
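
    The stick test described above could be sketched in C as follows. This
    is only an illustration under assumed data structures: the trie_node
    layout and the find_stick name are hypothetical, and the real predicate
    is written in Mercury.

```c
#include <stddef.h>

/* Hypothetical trie node: one labelled edge per possible next code unit. */
struct trie_node {
    size_t num_children;
    const unsigned char *units;     /* code unit labelling each edge */
    struct trie_node **children;    /* child reached via each edge */
};

/*
 * Follow the chain of single-child nodes starting at `node`.
 * Store the code units along the chain in `stick` (capacity `cap`),
 * set *end to the node at which the chain stops, and return the
 * number of code units stored. A result of zero means that `node`
 * is not the top of a stick.
 */
static size_t find_stick(const struct trie_node *node,
    unsigned char *stick, size_t cap, const struct trie_node **end)
{
    size_t len = 0;
    while (node->num_children == 1 && len < cap) {
        stick[len++] = node->units[0];
        node = node->children[0];
    }
    *end = node;
    return len;
}
```

    For a trie holding "cat" and "car", calling find_stick on the root
    would yield the stick "ca" and stop at the node that branches on
    't' and 'r'.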

compiler/ml_string_switch.m:
    Use the new predicate in string_switch_util.m to generate better code
    for sticks. Instead of comparing each character in the stick individually
    against the relevant code unit of the string being switched on, compare
    them all at once using the new binary op.
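
    To illustrate the difference, here is a hedged C sketch of the two ways
    of matching a stick. The function names are hypothetical, and strncmp
    merely stands in for the size-limited form of offset_str_eq; the actual
    runtime macro is not shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare the stick one code unit at a time.
 * Assumes offset <= strlen(s) and that the stick contains no NUL.
 */
static bool match_stick_slow(const char *s, size_t offset,
    const char *stick, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (s[offset + i] != stick[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Compare at most `len` code units starting at s + offset in one call.
 * strncmp stops at the terminating NUL, so a too-short `s` simply fails.
 */
static bool match_stick_fast(const char *s, size_t offset,
    const char *stick, size_t len)
{
    return strncmp(s + offset, stick, len) == 0;
}
```

    Both functions give the same answer; the second hands the whole stick
    to a single library call instead of emitting one test per code unit.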

compiler/ml_switch_gen.m:
    Insist on both the host machine and the target machine
    using the C backend.

compiler/string_switch.m:
    Implement non-lookup trie switches. The code follows the approach used
    in ml_string_switch.m as much as possible, but there are plenty of
    differences caused by targeting the LLDS.

    Rename some predicates to specify which switch implementation method
    they belong to.

    Write a comment just once, and refer to it from elsewhere instead of
    duplicating it at each reference site.

compiler/switch_gen.m:
    Enable the use of trie switches when the option values call for it,
    and when the switch is not a lookup switch.

compiler/cse_detection.m:
    Do not flood the output of mmc -V with messages that have nothing to do
    with the module being compiled.

compiler/options.m:
    Add a way to specify --no-allow-inlining on the command line.
    This can help debug code generator changes like this, by disallowing
    a transform that can modify the Mercury code whose compilation process
    you are trying to debug. (The documentation of the --inlining option
    implies that --no-inlining should do the same job, but it does not.)
    The option is not documented for users.

compiler/string_encoding.m:
    Provide a version of from_code_unit_list_in_encoding that allows
    non-well-formed code unit sequences as input, and provide det versions
    of both versions. This is for use by both string_switch.m and
    ml_string_switch.m.

compiler/hlds_goal.m:
    Document the properties of case_ids.

compiler/llds.m:
    Document the possibility that string constants are not well formed.

compiler/bytecode.m:
compiler/code_util.m:
compiler/mlds_dump.m:
compiler/ml_global_data.m:
compiler/mlds_to_cs_data.m:
compiler/mlds_to_java_data.m:
compiler/opt_debug.m:
    Conform to the changes above.

library/string.m:
    Replace the non-exported test predicate internal_encoding_is_utf8 with
    an exported function that returns an enum specifying the string encoding.

NEWS.md:
    Announce the new function.

runtime/mercury_string.h:
    Add the C macro that implements the new form of the offset_str_eq
    binary op.

tests/hard_coded/string_switch4.{m,exp}:
    We have long had three copies of the exact same code, in string_switch.m,
    string_switch2.m and string_switch3.m, which were compiled with

    - no smart switch implementation
    - smart switch implementation forced to use the hash table method
    - smart switch implementation forced to use the binary search method

    Add this new copy, which is compiled with

    - smart switch implementation forced to use the new trie method

tests/hard_coded/Mmakefile:
    Add the new test case.

tests/hard_coded/Mercury.options:
    Update the options of the test cases, and specify them for the new one.

tests/hard_coded/string_switch.m:
tests/hard_coded/string_switch2.m:
tests/hard_coded/string_switch3.m:
    Update the top-of-module comment block to be identical in all four copies
    of this module.
2024-03-26 21:17:31 +11:00
Zoltan Somogyi
d5190e93c5 Fix LLDS/MLDS diffs in control of string switches.
Some of these diffs involve string lookup switches.

compiler/ml_lookup_switch.m:
    When testing whether a switch is a lookup switch for the MLDS,
    we sometimes need to update the code generator's state. We used to
    return the updated state whether or not the switch is a lookup switch,
    which is incorrect if the switch is NOT a lookup switch.
    (The incorrectness used to show up as allocated but unused entities,
    such as slots in the global const data table, which are harmless enough
    not to lead to crashes.)

    Fix this by putting the updated code generator state as a new argument
    into the function symbol that we return only when the switch *is*
    a lookup switch.
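
    A rough C analogue of this fix, with hypothetical names throughout
    (gen_state, lookup_test, test_lookup_switch): the tentative state
    updates reach the caller only as part of the success result, so a
    failed test cannot leak them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the code generator's state. */
struct gen_state {
    size_t next_const_slot;   /* next free slot in a global const table */
};

/* The updated state is part of the result only in the success case. */
struct lookup_test {
    bool is_lookup;
    struct gen_state state;   /* meaningful only if is_lookup is true */
};

/*
 * case_is_const[i] says whether arm i just binds outputs to constants.
 * Each such arm tentatively claims one const table slot.
 */
static struct lookup_test test_lookup_switch(struct gen_state start,
    const bool *case_is_const, size_t num_cases)
{
    struct gen_state tentative = start;
    for (size_t i = 0; i < num_cases; i++) {
        if (!case_is_const[i]) {
            /* Not a lookup switch: discard the tentative updates. */
            return (struct lookup_test) { false, start };
        }
        tentative.next_const_slot++;
    }
    return (struct lookup_test) { true, tentative };
}
```

    In Mercury the same effect comes from making the state an argument of
    the success function symbol, so the type system rules out reading it
    in the failure case; C can only express that by convention.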

compiler/lookup_switch.m:
    The predicate we used to test whether a switch is a lookup switch
    for the LLDS used to be semidet, so it did not have the above problem.
    It did have the problem that calls to it returned only the info
    appropriate for success; they did not return an indication about
    *whether* they succeed in a way that could be stored. This meant
    every method of implementing string switches had to repeat the call.

    Fix this by changing the predicate to return a success/failure indication,
    making it det. Make it use the new technique in ml_lookup_switch.m
    to avoid using inappropriate code generator states. In this case,
    that also means moving the code that remembers the branch start position
    of the code generator state here from our caller. This move allows us
    to delete a reset to the just-remembered position, which was never needed.

compiler/ml_string_switch.m:
compiler/string_switch.m:
    Conform to the changes in ml_lookup_switch.m/lookup_switch.m.

compiler/ml_switch_gen.m:
    Move producers of variables to just the code branches that need
    the value of that variable.

    Conform to the changes in ml_lookup_switch.m.

compiler/switch_gen.m:
    Move the code that decides how to implement a smart switch
    on a string value to a predicate of its own, to match
    ml_switch_gen.m. Change the structure of the moved code
    to follow the structure in ml_switch_gen.m.

    Conform to the changes in ml_lookup_switch.m.
2024-03-22 20:18:46 +11:00
Zoltan Somogyi
d385cdca37 Replace reversed lists with cords.
Add ml_ prefixes to some predicate names.

Put a piece of code into a predicate of its own, to prevent it from
distracting readers of the original predicate with its not-very-important
detail.
2024-03-22 04:10:15 +11:00
Zoltan Somogyi
0ed6e2d0d4 Prepare for string trie switches in the LLDS.
compiler/ml_string_switch.m:
compiler/string_switch_util.m:
    Move the backend-agnostic part of the existing MLDS implementation
    of string trie switches from ml_string_switch.m to string_switch_util.m.
    Clean it up a bit for more general use.

compiler/string_encoding.m:
    Document the exported predicates and functions.
2024-03-20 02:16:16 +11:00
Zoltan Somogyi
a6d81a3bb9 Carve three new modules out of switch_util.m.
compiler/lookup_switch_util.m:
compiler/string_switch_util.m:
compiler/tag_switch_util.m:
    Carve these three new modules out of switch_util.m. As their names imply,
    they contain the parts of the old switch_util.m that are concerned with
    lookup switches, switches on strings, and switches on tags respectively.

compiler/switch_util.m:
    Delete the code moved to the new modules.

compiler/backend_libs.m:
    Include the new modules in the backend_libs package.

compiler/notes/compiler_design.html:
    Document the new modules.

compiler/dense_switch.m:
compiler/lookup_switch.m:
compiler/ml_lookup_switch.m:
compiler/ml_simplify_switch.m:
compiler/ml_string_switch.m:
compiler/ml_switch_gen.m:
compiler/ml_tag_switch.m:
compiler/simplify_goal_switch.m:
compiler/string_switch.m:
compiler/switch_case.m:
compiler/switch_gen.m:
compiler/tag_switch.m:
    Conform to the changes above by importing one, or sometimes two, of
    the new modules, usually instead of switch_util.m, sometimes
    in addition to switch_util.m.

    In a few cases, delete explicit module qualifications that
    this diff has made incorrect.
2024-03-12 22:13:59 +11:00
Julien Fischer
f765494ec9 Fix spelling.
compiler/ml_string_switch.m:
    As above.
2023-04-05 00:38:24 +10:00
Zoltan Somogyi
c138bbb632 Fix a bug in string trie jump switches.
compiler/ml_string_switch.m:
    Fix a too-strong sanity check. It insisted on a semidet switch
    containing code to handle the failure of the switch, but a switch
    on strings can be both semidet and cannot_fail if

    - the switched-on variable's inst is known to contain only the strings
      handled by the arms of the switch, and

    - one or more of the switch arms contain semidet code.

    In that case, the switch does not need a default case, since it would be
    unreachable.

compiler/options.m:
    Provide a way to test for the presence of this fix.
2022-12-08 19:15:56 +11:00
Zoltan Somogyi
72e0014003 Rename more predicates to avoid ambiguities.
2022-07-07 06:24:09 +10:00
Zoltan Somogyi
9012395ec2 Don't let ml_tag_switch.m generate duplicate fields.
This fixes the second problem identified by Mantis bug #548.

compiler/ml_tag_switch.m:
    Detect the circumstances in which this problem would arise.
    In such cases, simply fail, and let ml_switch_gen.m fall back
    to implementing the switch as an if-then-else chain.

compiler/ml_switch_gen.m:
    Implement that fallback.

compiler/switch_util.m:
    The new code in ml_tag_switch.m needs to thread a fourth piece of state
    through the predicate it passes to group_cases_by_ptag, so change
    its argument list to accommodate such predicates. And since some other
    modules pass the same predicates to group_cases_by_ptag and
    string_binary_cases, make the same change in the argument list
    of that predicate as well.

    Delete one stray comment, and note that another comment seems misplaced.

compiler/ml_string_switch.m:
compiler/string_switch.m:
compiler/switch_case.m:
compiler/tag_switch.m:
    Conform to the changes in switch_util.m.

tests/hard_coded/bug548.exp:
tests/hard_coded/Mmakefile:
    Enable the previously-added test case for Mantis #548, after
    adding an .exp file for it.
2022-02-11 21:32:53 +11:00
Zoltan Somogyi
9ddb180757 Handle const_var_maps left by add_trail_ops.m.
This fixes Mantis bug #544.

The code of add_trail_ops.m can transform

    <code that adds an entry to const_var_map>

into

    (
        ...
        <code that adds an entry to const_var_map>
        ...
    ;
        ...,
        fail
    )

where the const_var_map in the MLDS code generator records which variables'
values are available as ground terms.

The MLDS code generator used to reset the const_var_map in its main data
structure, the ml_gen_info, at the end of every disjunction (actually,
at the end of every branched control structure) to the value it had
at the start. This was intended to prevent the code following the branched
structure from relying on const_var_map entries that were added to the
const_var_map on *some* branches, but not others. However, in this case,
it has the effect of forgetting the entry added by the first disjunct,
even though

- the code after the disjunction can be reached *only* via the first disjunct,
  and

- the code after the disjunction (legitimately, until add_trail_ops) depended
  on that entry being available.

The fix is to allow the code after a branched control structure to depend
on any const_var_map entry that is present in the final const_var_map
in every branch of the branched control structure whose end is reachable.
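
The consensus rule can be sketched in C as below. This is an illustration
under assumptions, not the compiler's code: const_var_maps are modelled as
flat arrays of (variable, constant) pairs, and all of the names are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry: variable `var` is known to equal constant `const_id`. */
struct const_entry { int var; int const_id; };

/* The final const_var_map of one branch, plus its reachability. */
struct branch_result {
    bool end_reachable;
    const struct const_entry *entries;
    size_t num_entries;
};

static bool branch_has(const struct branch_result *b, struct const_entry e)
{
    for (size_t i = 0; i < b->num_entries; i++) {
        if (b->entries[i].var == e.var &&
            b->entries[i].const_id == e.const_id)
        {
            return true;
        }
    }
    return false;
}

/*
 * Write to `out` the entries present in every branch whose end is
 * reachable, and return how many were written. If no branch end is
 * reachable, the code after the structure cannot run, and this sketch
 * simply returns zero entries.
 */
static size_t consensus(const struct branch_result *branches, size_t n,
    struct const_entry *out, size_t cap)
{
    size_t first = 0;
    while (first < n && !branches[first].end_reachable) {
        first++;
    }
    if (first == n) {
        return 0;
    }
    size_t count = 0;
    for (size_t i = 0; i < branches[first].num_entries && count < cap; i++) {
        struct const_entry e = branches[first].entries[i];
        bool in_all = true;
        for (size_t b = first + 1; b < n && in_all; b++) {
            if (branches[b].end_reachable && !branch_has(&branches[b], e)) {
                in_all = false;
            }
        }
        if (in_all) {
            out[count++] = e;
        }
    }
    return count;
}
```

Applied to the disjunction above, the second disjunct ends in fail, so its
end is unreachable and it is ignored; the entry added by the first disjunct
survives.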

The LLDS code generator was not affected by the bug, because it uses
totally separate systems both for implementing trailing, and for keeping
track of what variables' values are available statically. In particular,
it does not rely on the operations inserted and the annotations left on
unifications by the add_trail_ops and mark_static_terms passes,
having been written long before either module existed.

compiler/hlds_goal.m:
    Document the update above to what may be marked static.

compiler/ml_gen_info.m:
    Document the updated protocol for handling the const_var_map field.

    Use a named type instead of its expansion.

compiler/ml_code_gen.m:
    Make the predicates that generate code for a branch in a branched
    control structure return the final const_var_maps from the branches
    whose endpoints are reachable.

    Add a predicate that computes the consensus of all the gathered
    const_var_maps.

    Compute consensus const_var_maps for if-then-elses and negations.

    Fix some inconsistencies in variable naming.

    Simplify some code.

compiler/ml_disj_gen.m:
    Compute consensus const_var_maps for disjunctions.

compiler/ml_string_switch.m:
compiler/ml_switch_gen.m:
compiler/ml_tag_switch.m:
    Compute consensus const_var_maps for various kinds of switches.

    In some predicates, put related arguments next to each other.

compiler/ml_unify_gen_construct.m:
    Delete "dynamic" from the names of several predicates that also handled
    non-dynamic construction unifications.

    Fix an out-of-date comment.

compiler/mark_static_terms.m:
    Fix grammar in a comment.

library/map.m:
    Fix a careless bug: when doing a merge in map.common_subset_loop,
    we threw away an entry from the wrong list in one of three cases.

    Make such bugs harder to overlook by

    - deleting the common parts from variable names, leaving the differences
      easier to see, and

    - replacing numeric suffixes for completely separate data structures
      with A and B suffixes.
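
    For illustration, the three-case merge could look like this in C,
    assuming both inputs are sorted by key without duplicates and that the
    common subset keeps only pairs whose key and value both match. The kv
    and common_subset names are hypothetical, and A/B suffixes are used in
    the spirit of the renaming described above.

```c
#include <stddef.h>

struct kv { int key; int value; };

/*
 * Copy to `out` the pairs that occur, with the same value, in both
 * sorted inputs. `out` must have room for min(numA, numB) pairs.
 */
static size_t common_subset(const struct kv *listA, size_t numA,
    const struct kv *listB, size_t numB, struct kv *out)
{
    size_t posA = 0;
    size_t posB = 0;
    size_t count = 0;
    while (posA < numA && posB < numB) {
        if (listA[posA].key < listB[posB].key) {
            /* Key occurs only in A: throw away the entry from A. */
            posA++;
        } else if (listA[posA].key > listB[posB].key) {
            /* Key occurs only in B: throw away the entry from B. */
            posB++;
        } else {
            /* Same key: keep the pair only if the values agree. */
            if (listA[posA].value == listB[posB].value) {
                out[count++] = listA[posA];
            }
            posA++;
            posB++;
        }
    }
    return count;
}
```

    Advancing the wrong index in either of the first two cases is the kind
    of slip described above: it silently skips an entry that could still
    have matched.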

tests/valid/bug544.m:
    A new test case for the bug.

tests/valid/Mercury.options:
tests/valid/Mmakefile:
    Enable the test case for the bug, and run it with -O5.
2022-02-07 17:30:32 +11:00
Peter Wang
95f59cf7c9 Fix lookup switches on subtype enums.
compiler/switch_util.m:
    Rename dont_need_bit_vec_check variant of need_bit_vec_check to
    dont_need_bit_vec_check_no_gaps.

    Add dont_need_bit_vec_check_with_gaps (see below).

    Make type_range return the correct min and max values used by a
    subtype enum type. For now, it fails unless the range of values
    is contiguous.

    Make find_int_lookup_switch_params use the min and max values for a
    type returned by type_range, not assuming 0 to the max value.

    Make find_int_lookup_switch_params return
    dont_need_bit_vec_check_with_gaps when a bit vector check is not
    required before a table lookup, yet the table is expected to contain
    dummy rows. This is the case for a cannot_fail switch on a subtype
    enum type, where the subtype does not use some values between
    the min and max values.
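
    A small C sketch of a lookup table with gaps, under assumed values: a
    hypothetical subtype that uses only the enum values 2, 3 and 5 of its
    base type. The table is indexed from the minimum value rather than from
    0, and the row for the unused value 4 is a dummy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subtype using enum values 2, 3 and 5 of its base type. */
enum { RANGE_MIN = 2, RANGE_MAX = 5 };

/* One row per value in [RANGE_MIN, RANGE_MAX]; the row for 4 is a dummy. */
static const int lookup_table[RANGE_MAX - RANGE_MIN + 1] = {
    20,     /* value 2 */
    30,     /* value 3 */
    -1,     /* value 4: dummy row, never selected at run time */
    50      /* value 5 */
};

/*
 * The switch cannot fail, so no bit vector check is needed before
 * the lookup; the index is offset by the minimum value of the range.
 */
static int lookup(int value)
{
    assert(value >= RANGE_MIN && value <= RANGE_MAX);
    return lookup_table[value - RANGE_MIN];
}
```

    The dummy row costs one table entry but keeps the index computation a
    single subtraction.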

compiler/dense_switch.m:
    Make tagged_case_list_is_dense_switch use the min and max values for
    a type returned by type_range, not assuming 0 to the max value.

compiler/ml_lookup_switch.m:
    Expect the generated lookup table to contain dummy rows or not
    depending on dont_need_bit_vec_check_{with_gaps,no_gaps}.

    Conform to change to need_bit_vec_check.

compiler/lookup_switch.m:
compiler/ml_string_switch.m:
    Conform to change to need_bit_vec_check.

tests/hard_coded/Mmakefile:
tests/hard_coded/dense_lookup_switch4.exp:
tests/hard_coded/dense_lookup_switch4.m:
tests/hard_coded/dense_lookup_switch_non2.exp:
tests/hard_coded/dense_lookup_switch_non2.m:
    Add test cases.
2021-04-09 17:41:23 +10:00
Zoltan Somogyi
b66f45e4db Tighten the mlds_type type.
compiler/mlds.m:
    Make two changes to mlds_type.

    The simpler change is the deletion of the maybe(foreign_type_assertions)
    field from the MLDS representations of Mercury types. It was never used,
    because Mercury types that are defined in a foreign language that is
    acceptable for the current MLDS target platform are represented
    as mlds_foreign_type, not as mercury_type.

    The more involved change is to change the representation of builtin types.
    Until now, we had separate function symbols in mlds_type to represent
    ints, uints, floats and chars, but not strings or values of the sized
    types {int,uint}{8,16,32,64}; those had to be represented as Mercury types.
    This is an unnecessary inconsistency. It also had two allowed
    representations for ints, uints, floats and chars, which meant that
    some of the code handling those conceptual types had to be duplicated
    to handle both representations.

    This diff provides mlds_builtin_type_{int(_),float,string,char} function
    symbols to represent every builtin type, and changes mercury_type
    to mercury_nb_type to make clear that it is NOT to be used for builtins
    (the nb is short for "not builtin").

compiler/ml_code_util.m:
compiler/ml_util.m:
    Delete functions that used to construct MLDS representations of builtin
    types. The new representation of those types is so simple that using
    such functions is no less cumbersome than writing down the representations
    directly.

compiler/ml_accurate_gc.m:
compiler/ml_call_gen.m:
compiler/ml_closure_gen.m:
compiler/ml_disj_gen.m:
compiler/ml_foreign_proc_gen.m:
compiler/ml_global_data.m:
compiler/ml_lookup_switch.m:
compiler/ml_proc_gen.m:
compiler/ml_rename_classes.m:
compiler/ml_simplify_switch.m:
compiler/ml_string_switch.m:
compiler/ml_switch_gen.m:
compiler/ml_tag_switch.m:
compiler/ml_type_gen.m:
compiler/ml_unify_gen_construct.m:
compiler/ml_unify_gen_deconstruct.m:
compiler/ml_unify_gen_util.m:
compiler/mlds_dump.m:
compiler/mlds_to_c_data.m:
compiler/mlds_to_c_export.m:
compiler/mlds_to_c_func.m:
compiler/mlds_to_c_global.m:
compiler/mlds_to_c_stmt.m:
compiler/mlds_to_c_type.m:
compiler/mlds_to_cs_data.m:
compiler/mlds_to_cs_stmt.m:
compiler/mlds_to_cs_type.m:
compiler/mlds_to_java_data.m:
compiler/mlds_to_java_stmt.m:
compiler/mlds_to_java_type.m:
compiler/mlds_to_java_wrap.m:
compiler/rtti_to_mlds.m:
    Conform to the changes above.
2018-09-28 23:07:23 +10:00
Zoltan Somogyi
6a915eef05 Optimize field updates inside packed arg words.
Since June, we have been copying words containing packed-together
sub-word-sized arguments all in one piece if possible, for hlc grades.
This means that given a type such as

:- type t
    --->    f1(int8, bool, int8, int, bool, int8, bool).

whose first three and last three arguments are packed into one word each,
and a predicate such as

    p(T0, T) :-
        T0 = f1(A, B, C, _, E, F, G),
        D = 42,
        T  = f1(A, B, C, D, E, F, G).

we generated code such as

    MR_Integer D_12 = (MR_Integer) 42;
    MR_Unsigned packed_args_0 =
        (MR_Unsigned) ((MR_hl_field(MR_mktag(0), T0_3, (MR_Integer) 0)));
    MR_Unsigned packed_args_1 =
        (MR_Unsigned) ((MR_hl_field(MR_mktag(0), T0_3, (MR_Integer) 2)));

    base = (MR_Word) MR_new_object(MR_Word,
        ((MR_Integer) 3 * sizeof(MR_Word)), NULL, NULL);
    *T_4 = base;
    MR_hl_field(MR_mktag(0), base, 0) = (MR_Box) (packed_args_0);
    MR_hl_field(MR_mktag(0), base, 1) = ((MR_Box) (D_12));
    MR_hl_field(MR_mktag(0), base, 2) = (MR_Box) (packed_args_1);

which does NOT pick up the values A, B, C, E, F and G individually.
However, until now, we could reuse packed-together words only in their
unchanged form.

This diff lifts that limitation, which means that now, we can *also*
optimize code such as

    p(T0, T) :-
        T0 = f1(A, B, _, D, E, _, G),
        C = 42i8,
        F = 43i8,
        T  = f1(A, B, C, D, E, F, G).

by generating code like this:

    base = (MR_Word) MR_new_object(MR_Word,
        (3 * sizeof(MR_Word)), NULL, NULL);
    *T_4 = base;
    MR_hl_field(MR_mktag(0), base, 0) = (MR_Box)
        ((((packed_word_0 & (~((MR_Unsigned) 255U)))) |
        (MR_Unsigned) ((uint8_t) (C_12))));
    MR_hl_field(MR_mktag(0), base, 1) = ((MR_Box) (D_8));
    MR_hl_field(MR_mktag(0), base, 2) = (MR_Box)
        ((((packed_word_1 & (~((MR_Unsigned) 510U)))) |
        (((MR_Unsigned) ((uint8_t) (F_13)) << 1))));

The general scheme when reusing *part* of a word is: first set the bits
not being reused to zero, and then OR in new values of those bits.

Make this optimization as general as possible by making it work
not just for

- words in memory cells containing only arguments,

but also for

- words in memory cells containing a remote sectag as well as arguments, and
- words in registers containing a ptag and a local sectag as well as
  arguments.
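
The mask-then-OR scheme can be written as a small standalone C helper. This
is a sketch for illustration; update_bitfield is a hypothetical name, and
the generated code shown earlier inlines the equivalent expressions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Replace the `width`-bit field at bit offset `shift` in `word`
 * with the low `width` bits of `value`, leaving other bits untouched.
 */
static uint64_t update_bitfield(uint64_t word, unsigned shift,
    unsigned width, uint64_t value)
{
    assert(width >= 1 && width < 64 && shift + width <= 64);
    /* Step 1: build a mask covering exactly the bits being replaced. */
    uint64_t mask = ((UINT64_C(1) << width) - 1) << shift;
    /* Step 2: zero those bits, then OR in the new value. */
    return (word & ~mask) | ((value << shift) & mask);
}
```

For an 8-bit field at offset 0 the mask is 255, and for an 8-bit field at
offset 1 it is 510, matching the constants in the generated code above.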

compiler/ml_gen_info.m:
    Generalize the data structure we use to represent information about
    packed words to make possible approximate as well as exact lookups.
    The key in the old map was "these bitfields with the values of these
    variables in them", while the key in the new map is just "these bitfields",
    with the associated value being a list, each element of which says
    "the word with these values in those bitfields is available in this rval".
    This makes it possible to look for words that have some, but not
    all, of the right values in the bitfields.

    Since the packed words may now contain tags as well as arguments,
    rename "packed args" to "packed word".

compiler/ml_unify_gen_deconstruct.m:
    When deconstructing a term containing packed words, add them to the
    packed word map even when one of the bitfields inside the packed word
    contains tag information.

    Move the code that adds a packed word to the map into a separate predicate,
    now that it is needed from more than one place.

compiler/ml_unify_gen_construct.m:
    Change the code that handles packed words to work in terms of filled
    bitfields. Use this not only to implement the optimization described
    at the top, but also to make the handling of bitfields more systematic.
    At least one previous bug was caused by doing sign extension differently
    for the bitfield containing the first packed argument in a word than for
    the later packed arguments in that word; with the new design, such
    inconsistencies should not happen.

compiler/ml_unify_gen_util.m:
    Add utility predicates now needed for both construct and deconstruct
    unifications.

compiler/mlds.m:
    Document the new use of lvnc_packed_word (renamed from lvnc_packed_args).

compiler/ml_code_gen.m:
compiler/ml_code_util.m:
compiler/ml_commit_gen.m:
compiler/ml_disj_gen.m:
compiler/ml_string_switch.m:
compiler/ml_switch_gen.m:
compiler/ml_tag_switch.m:
    Conform to the changes above (mostly the packed_word rename).

compiler/mlds_to_c_data.m:
compiler/mlds_to_c_stmt.m:
    Omit unneeded casts from the output. Specifically, don't put (MR_Integer)
    casts in front of integer constants being used either as shift amounts,
    or as the number of words that a new_object MLDS operation should allocate.
    The casts only cluttered the output, making it harder to read, and
    therefore to judge its correctness.
2018-09-10 16:17:17 +10:00
Zoltan Somogyi
b06b2621b3 Move towards packing args with secondary tags.
compiler/hlds_data.m:
    Add bespoke types to record information about local and remote secondary
    tags. The one for local secondary tags includes the value of the
    primary and secondary tag together, since construct unifications
    need to assign this value, and it is better to compute this once,
    instead leaving the target language compiler to do it, potentially
    many times.

    Use a wrapped uint8 to record primary tag values, and wrapped uints
    to record secondary tag values. The wrap is to prevent any accidental
    confusion with other values. The use of uint8 and uint has two purposes.
    First, using the tightest possible representation. Tags are never negative,
    and primary tags cannot exceed 7. Second, using these types in the compiler
    helps us eat our own dogfood; if a change causes a problem affecting
    these types, its bootcheck should fail, alerting us to the problem.

    Add commented-out types and fields that will be needed for packing
    sub-word-sized arguments together with both local and remote secondary
    tags.
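
    As a hedged illustration of the wrapped tag values, here is a C sketch.
    The struct names are hypothetical, and the bit layout (primary tag in
    the low three bits, local secondary tag above it) is an assumption made
    for this example, not something this log entry specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: primary tags range over 0..7, so they fit in three bits. */
enum { PTAG_BITS = 3 };

/* Wrappers keep tag values from being confused with other integers. */
struct ptag { uint8_t value; };
struct local_sectag { unsigned value; };

/*
 * Compute the combined primary and local secondary tag once, so that
 * each construction can assign the result directly.
 */
static uintptr_t ptag_and_local_sectag(struct ptag p, struct local_sectag s)
{
    assert(p.value <= 7);
    return ((uintptr_t) s.value << PTAG_BITS) | p.value;
}
```

    Passing a struct local_sectag where a struct ptag is expected is a
    compile-time error in C, which mirrors the purpose of the wrapping
    described above.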

compiler/du_type_layout.m:
    Generate references to tags in the new format.

compiler/ml_unify_gen.m:
compiler/unify_gen.m:

compiler/modecheck_goal.m:
    Conform to the changes above.

    Fix an old bug: the inst corresponding to a constant with a primary
    and a local secondary tag is not the secondary tag alone, but both tags
    together.

compiler/bytecode.m:
compiler/bytecode_gen.m:
compiler/closure_gen.m:
compiler/disj_gen.m:
compiler/export.m:
compiler/hlds_code_util.m:
compiler/jumpopt.m:
compiler/lco.m:
compiler/llds_out_data.m:
compiler/llds_out_instr.m:
compiler/lookup_switch.m:
compiler/lookup_util.m:
compiler/ml_accurate_gc.m:
compiler/ml_call_gen.m:
compiler/ml_closure_gen.m:
compiler/ml_code_util.m:
compiler/ml_elim_nested.m:
compiler/ml_string_switch.m:
compiler/ml_switch_gen.m:
compiler/ml_tag_switch.m:
compiler/ml_type_gen.m:
compiler/mlds_dump.m:
compiler/mlds_to_c_data.m:
compiler/mlds_to_c_stmt.m:
compiler/opt_debug.m:
compiler/peephole.m:
compiler/rtti.m:
compiler/rtti_out.m:
compiler/rtti_to_mlds.m:
compiler/string_switch.m:
compiler/switch_util.m:
compiler/tag_switch.m:
compiler/type_ctor_info.m:
    Conform to the change to hlds_data.m.

    In two places, in rtti_out.m and rtti_to_mlds.m, delete old code
    that was needed only to implement reserved tags, which we
    stopped supporting a few months ago.

library/uint8.m:
library/uint16.m:
library/uint32.m:
library/uint64.m:
    Add predicates to cast from each of these types to uint.
2018-06-06 03:35:20 +02:00
Zoltan Somogyi
ec6a40ed85 Put related args of ml_field next to each other.
compiler/mlds.m:
    Put the *type* of the pointer next to the *value* of the pointer.

compiler/ml_accurate_gc.m:
compiler/ml_call_gen.m:
compiler/ml_closure_gen.m:
compiler/ml_code_util.m:
compiler/ml_elim_nested.m:
compiler/ml_optimize.m:
compiler/ml_rename_classes.m:
compiler/ml_string_switch.m:
compiler/ml_type_gen.m:
compiler/ml_unify_gen.m:
compiler/ml_unused_assign.m:
compiler/ml_util.m:
compiler/mlds_dump.m:
compiler/mlds_to_c_data.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
compiler/mlds_to_target_util.m:
compiler/rtti_to_mlds.m:
    Conform to the change above.
2018-06-04 23:28:19 +02:00
Zoltan Somogyi
bbe0f28f3b Copy packed arguments all at once.
Copy words containing packed-together sub-word-sized arguments all
in one piece if possible, for hlc grades.

Given a type such as

:- type t
    --->    f1(int8, bool, int8, int, bool, int8, bool).

whose first three and last three arguments are packed into one word each,
and a predicate such as

p(T0, T) :-
    T0 = f1(A, B, C, _, E, F, G),
    D = 42,
    T  = f1(A, B, C, D, E, F, G).

we used to generate code that picked up each of the six named arguments
from T0, and used them to construct T. With this diff, we now translate
the above to

    MR_Integer D_12 = (MR_Integer) 42;
    MR_Unsigned packed_args_0 =
        (MR_Unsigned) ((MR_hl_field(MR_mktag(0), T0_3, (MR_Integer) 0)));
    MR_Unsigned packed_args_1 =
        (MR_Unsigned) ((MR_hl_field(MR_mktag(0), T0_3, (MR_Integer) 2)));

    base = (MR_Word) MR_new_object(MR_Word,
        ((MR_Integer) 3 * sizeof(MR_Word)), NULL, NULL);
    *T_4 = base;
    MR_hl_field(MR_mktag(0), base, 0) = (MR_Box) (packed_args_0);
    MR_hl_field(MR_mktag(0), base, 1) = ((MR_Box) (D_12));
    MR_hl_field(MR_mktag(0), base, 2) = (MR_Box) (packed_args_1);
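
Why copying the whole word is safe can be seen from a small standalone
sketch. The bit layout below (shift amounts, field widths) and the helper
names pack3, unpack_a/b/c, copy_by_fields and copy_whole_word are assumptions
made only for this illustration; the real layout is chosen by the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one packed word holding an int8, a bool and
   another int8. These shift amounts are assumptions for this sketch. */
enum { A_SHIFT = 9, B_SHIFT = 8, C_SHIFT = 0 };

/* OR together the shifted values of the three sub-word-sized fields. */
static uintptr_t pack3(int8_t a, int b, int8_t c)
{
    assert(b == 0 || b == 1);
    return ((uintptr_t) (uint8_t) a << A_SHIFT)
         | ((uintptr_t) b << B_SHIFT)
         | ((uintptr_t) (uint8_t) c << C_SHIFT);
}

/* Shift and mask each field back out of the word. */
static int8_t unpack_a(uintptr_t w)
{
    return (int8_t) (uint8_t) ((w >> A_SHIFT) & 0xff);
}

static int unpack_b(uintptr_t w)
{
    return (int) ((w >> B_SHIFT) & 0x1);
}

static int8_t unpack_c(uintptr_t w)
{
    return (int8_t) (uint8_t) ((w >> C_SHIFT) & 0xff);
}

/* Old scheme: pick up each field, then rebuild the word from them. */
static uintptr_t copy_by_fields(uintptr_t w)
{
    return pack3(unpack_a(w), unpack_b(w), unpack_c(w));
}

/* New scheme: copy the word in one piece, with no shifts, ANDs or ORs. */
static uintptr_t copy_whole_word(uintptr_t w)
{
    return w;
}
```

Both copy functions return the same word for any word built by pack3,
which is the property the optimization relies on.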

compiler/ml_unify_gen.m:
    Implement the two main parts of this optimization.

    Part one is the change to deconstruction unifications. When we generate
    assignments from all the fields packed together into a word to their
    corresponding argument variables (such as A/B/C or E/F/G above),
    create a fresh variable (such as packed_args_0 above), assign to it
    the value of the whole word, and record in a new data structure (the
    packed_args_map) that these argument variables, in these positions
    within the word, are now available in the newly created variable.
    (We still define the argument variables as well, since they may be needed;
    deleting them if they are *not* needed is the job of ml_unused_assign.m.)

    Part two is the change to construction unifications. When we generate code
    to OR together the shifted and/or masked values of two or more variables
    to fill in one word in a new heap cell, we search the packed_args_map
    to see whether those variables, in the positions we need, are available
    in one of the variables created in part one. If yes, we discard
    the whole OR-ing together operation and we use that variable instead.

    Since part one can now create local variable definitions, return these
    upwards as needed.

compiler/ml_gen_info.m:
    Add two fields to the ml_gen_info structure (actually, to one of its
    substructures). One is the packed_args_map described above, the other
    is a counter we use to give a unique name to all the fresh variables.

    When creating ml_gen_infos, put the code defining each field of a
    substructure next to the creation of that substructure.

compiler/mlds.m:
    Add a kind of compiler-generated variable holding packed argument words.
    It is used in part one above.

compiler/ml_code_gen.m:
compiler/ml_code_util.m:
compiler/ml_commit_gen.m:
compiler/ml_disj_gen.m:
compiler/ml_string_switch.m:
compiler/ml_switch_gen.m:
compiler/ml_tag_switch.m:
    Save, reset and restore the packed_args_map as necessary to ensure that
    a construction unification sees an entry in that map only if the
    deconstruction unification that created that entry *had* to be executed
    before execution reaches the construction unification.

    This means that when we process a branched control structure, we have to
    make sure that (a) entries created by one branch are not seen when
    we generate code for the other branches, and (b) that code *after* the
    branched control structure sees only the entries created *before* the
    branched control structure, since such code following cannot use an entry
    that was created by a branch that may or may NOT have been executed
    on the way there.
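
    The save/reset/restore discipline for branches can be sketched in C
    as below. This is only an illustration: the packed_map type, the
    string keys, and the functions map_add, map_lookup and gen_branches
    are hypothetical stand-ins, not the compiler's data structures.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Hypothetical stand-in for the packed_args_map: the word packing the
   arguments named by key[i] is available in the variable var[i]. */
typedef struct {
    const char *key[MAX_ENTRIES];
    const char *var[MAX_ENTRIES];
    int count;
} packed_map;

static void map_add(packed_map *m, const char *key, const char *var)
{
    assert(m->count < MAX_ENTRIES);
    m->key[m->count] = key;
    m->var[m->count] = var;
    m->count++;
}

static const char *map_lookup(const packed_map *m, const char *key)
{
    for (int i = 0; i < m->count; i++) {
        if (strcmp(m->key[i], key) == 0) {
            return m->var[i];
        }
    }
    return NULL;
}

typedef void (*branch_gen)(packed_map *m);

/* Generate code for each branch of a branched control structure. */
static void gen_branches(packed_map *m, branch_gen *branches, int n)
{
    packed_map saved = *m;  /* entries created before the branch point */
    for (int i = 0; i < n; i++) {
        *m = saved;         /* (a) hide entries made by other branches */
        branches[i](m);
    }
    *m = saved;             /* (b) later code sees pre-branch entries only */
}

/* Two example branches. The first does a deconstruction that records
   an entry; the second notes whether it can see that entry. */
static int sibling_entry_seen = 0;

static void branch_one(packed_map *m)
{
    map_add(m, "E/F/G", "packed_args_1");
}

static void branch_two(packed_map *m)
{
    if (map_lookup(m, "E/F/G") != NULL) {
        sibling_entry_seen = 1;
    }
}
```

    After gen_branches returns, an entry added before the branch point
    is still visible, while the entry added by branch_one is not.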

    We also reset the packed_args_map to empty when generating code
    that will end up inside a nested function, for two reasons. First,
    I am not sure whether the code in ml_elim_nested.m that flattens out
    nested functions is general enough to handle the new kind of compiler
    generated variable correctly. And second, even if it is, the additional
    memory traffic for putting those variables into environments, and later
    pulling them out again, would definitely reduce and maybe completely
    eliminate the speedup from optimizing constructions.

compiler/ml_closure_gen.m:
    Conform to the change in ml_unify_gen.m.

compiler/ml_proc_gen.m:
    Invoke ml_unused_assign.m in both branches of an if-then-else.
    Previously, it was invoked in only the rarely executed branch,
    which is what hid its bugs.

    Fix one bug: for model_semi procedures, include the succeeded variable
    in the set of variables whose values are needed after the generated
    function body.

    Work around another bug: ml_unused_assign.m cannot yet handle
    nested functions properly, so throw away its output in their presence.

compiler/ml_unused_assign.m:
    As part of the same workaround, if a block contains nested functions,
    tell ml_proc_gen.m to use the original code.

    Fix several other bugs.

    Don't delete variables from the seen_set when the backwards traversal
    finds an assignment to them, because the variable's absence from
    the seen_set would lead to the declaration of the variable being deleted.

    Delete a sanity check that made sense only in the presence of such
    deletions.

    Never delete assignments to compiler-generated variables; we generate
    such assignments only when their results *will* be needed.

    When exiting the traversal of a block, *do* delete the variables
    declared locally in that block from the seen_set; being undeclared there,
    they cannot possibly be seen before that block. Leaving them in
    does not compromise correctness, but does reduce performance
    by making operations on the seen_set slower than necessary.

    If deleting unused assignments makes the else part of an if-then-else
    empty, then delete the whole else part.

compiler/mlds_to_c_stmt.m:
    Generate a valid C statement even for an MLDS comment. When a buggy
    version of ml_unused_assign.m (incorrectly) deleted assignments to
    succeeded, it sometimes left an else part containing only a comment,
    which led gcc to report syntax errors.
2018-06-02 18:56:40 +02:00
Zoltan Somogyi
b9afc8b78e Delete the mlds_unary_op type.
compiler/mlds.m:
    We used to have a function symbol ml_unop in the mlds_rval type
    that applied one of four kinds of operations to an argument mlds_rval:
    boxing, unboxing, casting or a standard unary operation, with a value
    of type mlds_unary_op selecting between the four. Replace this system
    with four separate function symbols in the mlds_rval type directly,
    and delete the mlds_unary_op type.

    The new arrangement requires fewer memory cells to be allocated,
    and less indirection; it also leads to shorter and somewhat
    more readable code.

compiler/ml_optimize.m:
    Conform to the change above.

    Recognize that a cast has negligible cost.

compiler/ml_code_util.m:
    Conform to the change above.

    Keep private a predicate that is not used by any other module,
    after merging it with another previously-exported predicate
    that only *it* uses.

    Delete some other predicates that are not used anywhere.

compiler/ml_accurate_gc.m:
compiler/ml_call_gen.m:
compiler/ml_closure_gen.m:
compiler/ml_code_gen.m:
compiler/ml_disj_gen.m:
compiler/ml_elim_nested.m:
compiler/ml_foreign_proc_gen.m:
compiler/ml_global_data.m:
compiler/ml_lookup_switch.m:
compiler/ml_rename_classes.m:
compiler/ml_string_switch.m:
compiler/ml_tag_switch.m:
compiler/ml_unify_gen.m:
compiler/ml_unused_assign.m:
compiler/ml_util.m:
compiler/mlds_to_c.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
compiler/mlds_to_target_util.m:
compiler/rtti_to_mlds.m:
    Conform to the change above.
2018-05-13 12:23:38 +02:00
Zoltan Somogyi
fcefbb948d Delete assignments to dead variables in the MLDS.
At the moment, we tend not to generate such assignments, with the exception
of assignments to the MLDS versions of HLDS variables of dummy types.
The reason I am nevertheless adding this optimization is that I intend
to soon add code to ml_unify_gen.m that *will* generate assignments
to dead variables.

The idea is to optimize field updates involving packed arguments.
Given a type such as

:- type t
    --->    f(
                f1      :: bool,
                f2      :: bool,
                f3      :: enum1,
                f4      :: int
            ).

we currently implement a field update such as "T = T0 ^ f4 := 42",
whose HLDS representation is the two unifications

    T0 = f(T0f1, T0f2, T0f3, _),
    T = f(T0f1, T0f2, T0f3, 42)

using code that looks like this:

    T0f1 = (T0[0] >> ...) & ...
    T0f2 = (T0[0] >> ...) & ...
    T0f3 = (T0[0] >> ...) & ...
    T = allocate memory for new memory cell, put on primary tag
    T[0] = (T0f1 << ...) | (T0f2 << ...) | (T0f3 << ...)
    T[1] = 42

I want to implement it using code that looks like this:

    T0w0 = T0[0]
    T = allocate memory for new memory cell, put on primary tag
    T[0] = T0w0
    T[1] = 42

where T0w0 contains the entire first word of the memory cell of T0.
This code avoids a bunch of shifts, ORs and ANDs.

I propose to translate the T0 = f(T0f1, T0f2, T0f3, _) unification into

    T0w0 = T0[0]
    T0f1 = (T0[0] >> ...) & ...
    T0f2 = (T0[0] >> ...) & ...
    T0f3 = (T0[0] >> ...) & ...

while recording in the ml_gen_info/code_info that this *specific* packing of
T0f1, T0f2 and T0f3 is available in T0w0. When translating the following
unification, the code generator will see this, and this will allow it to
generate

    T[0] = T0w0

instead of

    T[0] = (T0f1 << ...) | (T0f2 << ...) | (T0f3 << ...)

However, by this time the assignments to T0f1, T0f2 and T0f3 have already
been generated. Whether or not they are dead assignments depends on whether
other code needs the values of those fields of T0. Deciding this
requires knowledge that the code generator can't have when translating
the deconstruction of T0. Hence the need for a new MLDS-to-MLDS optimization.

compiler/ml_unused_assign.m:
    A new compiler module implementing the new optimization.
    It is not part of ml_optimize.m because ml_optimize.m traverses
    the MLDS forwards, while this optimization requires a backwards traversal:
    you cannot know whether an assignment is dead unless you know that the
    following code does not need the value of the variable it assigns to.
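
    A minimal C sketch of such a backwards traversal, for straight-line
    code only, is given below. It is not the code of ml_unused_assign.m:
    the assign type, the 32-variable bitmask and the function name
    mark_dead_assignments are assumptions made for this example. The
    sketch is deliberately conservative, in that it never removes a
    variable from the seen set.

```c
#include <assert.h>
#include <stdbool.h>

/* A toy assignment "target = f(uses...)". Variables are numbered
   0..31, and the variables read by the right hand side are a bitmask. */
typedef struct {
    int target;
    unsigned uses;
    bool keep;      /* filled in by mark_dead_assignments */
} assign;

/* Walk the statements from last to first. live_after is the set of
   variables whose values are needed after the last statement. */
static void mark_dead_assignments(assign *stmts, int n, unsigned live_after)
{
    unsigned seen = live_after;
    for (int i = n - 1; i >= 0; i--) {
        assert(stmts[i].target >= 0 && stmts[i].target < 32);
        unsigned bit = 1u << stmts[i].target;
        if (seen & bit) {
            /* Later code needs this value, so its inputs are needed too. */
            stmts[i].keep = true;
            seen |= stmts[i].uses;
        } else {
            /* Nothing after this point reads the target: dead. */
            stmts[i].keep = false;
        }
    }
}
```

    With variables T0 = 0, T0w0 = 1, T0f1 = 2 and T = 3, the statements
    "T0w0 = T0[0]; T0f1 = ...T0...; T = ...T0w0..." with only T live
    afterwards keep the first and third assignments and drop the second.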

compiler/ml_backend.m:
compiler/notes/compiler_design.html:
    Include the new module.

compiler/mlds.m:
    The new optimization needs extra information about loops.
    When it enters into the loop body, it knows which variables
    are needed *after* the loop, but it does not know which variables
    the loop body first reads and then writes. Without this knowledge,
    it would optimize away assignments to loop control variables,
    such as the increment of i in the loop

    i = 0;
    while (...) {
        ...; i = i+1; ...
    }

    Traditionally, compilers have solved this problem by doing fixpoint
    iteration, adding to the live set at each program point until
    no more additions are possible. We can do better, because we generate
    loops in the MLDS in only two kinds of cases:

    - loops implementing tail recursion, in which case the only extra
      variables that we need to preserve assignments to in the loop body
      are the input arguments of the procedure, and
    - loops created by the compiler itself to loop over a set of alternatives,
      for which the only extra variables that we need to preserve assignments
      to in the loop body are the variables the compiler uses to control
      the loop.

    To make it possible for ml_unused_assign.m to do its job without
    a fixpoint iteration, include in the MLDS representation of every
    while loop a list of these variables.
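
    The effect of that list can be sketched in C as follows. The names
    assign and body_keeps, and the bitmask encoding of variable sets,
    are assumptions made for this example; the sketch also ignores the
    variables read by the loop condition.

```c
#include <assert.h>
#include <stdbool.h>

/* A toy assignment "target = f(uses...)", with variables numbered
   0..31 and the variables it reads given as a bitmask. */
typedef struct {
    int target;
    unsigned uses;
} assign;

/* Scan a loop body backwards and report whether body[idx] is kept.
   The scan starts from the variables needed after the loop plus the
   variables listed in the loop itself (loop_vars), so no fixpoint
   iteration is needed. */
static bool body_keeps(const assign *body, int n, int idx,
    unsigned live_after_loop, unsigned loop_vars)
{
    assert(idx >= 0 && idx < n);
    unsigned seen = live_after_loop | loop_vars;
    for (int i = n - 1; i > idx; i--) {
        if (seen & (1u << body[i].target)) {
            seen |= body[i].uses;
        }
    }
    return (seen & (1u << body[idx].target)) != 0;
}
```

    For a body "acc = acc + i; i = i + 1" where only acc is needed after
    the loop, passing an empty loop_vars would drop the increment of i,
    while listing i in loop_vars keeps it.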

    Add a type to represent the identity of an MLDS local var,
    for use by some of the modules below. They used to store this info
    in the form of mlds_lvals, but that is not specific enough
    to be used to fill in the new field in while loops.

compiler/ml_proc_gen.m:
    Compute the information needed by the new pass, and invoke it
    if the relevant option is set.

compiler/options.m:
    Add this option. It is for developers only, so it is undocumented.

compiler/ml_util.m:
    Add a utility function needed in several places.

compiler/ml_accurate_gc.m:
compiler/ml_disj_gen.m:
compiler/ml_elim_nested.m:
compiler/ml_lookup_switch.m:
compiler/ml_optimize.m:
compiler/ml_rename_classes.m:
compiler/ml_string_switch.m:
compiler/mlds_to_c.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
compiler/mlds_to_target_util.m:
    Conform to the changes in mlds.m.
2018-05-09 23:56:28 +02:00
Zoltan Somogyi
dc4196e5af Separate breaks from loops and breaks from switches in MLDS.
compiler/mlds.m:
    Replace goto_break with goto_break_loop and goto_break_switch, each
    intended to break from a particular construct. It was confusion between
    the two kinds of breaks that led to the earlier bug that broke
    --prefer-while-loop-over-jump-mutual; this separation should make
    such bugs easy to detect.

    Rename goto_continue as goto_continue_loop to match the new naming scheme.

compiler/mlds_to_c.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
    When emitting goto_break_loop and goto_break_switch, check whether
    the nearest enclosing break-able scope is a loop or switch respectively.

    To make this check possible, record the nearest break-able scope.

    While these additions make the compiler do extra work, the performance
    impact is negligible.
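
    The check can be pictured with the following C sketch. The enum and
    function names here are hypothetical and chosen for this example;
    they are not the names used in mlds_to_{c,cs,java}.m.

```c
#include <assert.h>
#include <stdbool.h>

/* The nearest enclosing break-able scope, if any. */
typedef enum { SCOPE_NONE, SCOPE_LOOP, SCOPE_SWITCH } break_scope;

/* The two kinds of break that replace the old goto_break. */
typedef enum { GOTO_BREAK_LOOP, GOTO_BREAK_SWITCH } break_kind;

/* A break is valid only if the nearest break-able scope is the kind
   of construct that the break intends to leave. */
static bool break_is_valid(break_kind kind, break_scope nearest)
{
    switch (kind) {
        case GOTO_BREAK_LOOP:
            return nearest == SCOPE_LOOP;
        case GOTO_BREAK_SWITCH:
            return nearest == SCOPE_SWITCH;
    }
    return false;
}

/* Emit the target language statement, aborting on a mismatch. */
static const char *emit_break(break_kind kind, break_scope nearest)
{
    assert(break_is_valid(kind, nearest));
    return "break;";
}
```

    A goto_break_loop nested directly inside a switch is rejected here,
    since in C a plain "break" there would leave the switch, not the loop.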

compiler/mlds_to_target_util.m:
    Add the type that mlds_to_{c,cs,java}.m all use to identify
    break-able scopes.

compiler/ml_call_gen.m:
compiler/ml_proc_gen.m:
compiler/ml_string_switch.m:
    Update the code that generates gotos.
2017-11-11 12:35:54 +11:00
Zoltan Somogyi
034cb97988 Don't module- or type-qualify MLDS local variables.
Some global variables generated by the MLDS backend need to be visible
across module boundaries, and therefore mlds_data definitions, which
contained global as well as other variables, used to have their names
qualified; usually module-qualified, though sometimes type-qualified.

However, since the diff that partitioned mlds_data_defns into the
definitions of local variables, global variables and field variables,
the qualification of local variables has *not* been necessary, so this diff
removes such qualifications. This makes the MLDS code generating references
to local variables simpler, more readable, and slightly faster.
The generated code is also shorter and easier to read.

There are two exceptional cases in which local variables *did* need
qualification, both of which stretch the meaning of "local".

One such case is the "local" variable dummy_var, which (by definition)
is only ever assigned to, and never used. It is also never defined
in MLDS-generated code; instead, it is defined in private_builtin.m
(for the Java and C# backends) or the runtime (for C). All three backends
currently require references to this variable in the runtime to be module
qualified. There are three possible fixes to this problem, which is caused
by the fact that this "local" variable is in fact global.

- Fix 1a would be to make dummy_vars global, not local.
- Fix 1b is to special-case dummy_vars in mlds_to_{c,cs,java}.m, and put
  the fixed "private_builtin" qualifier in front of it.
- Fix 1c would be to modify the compiler to never generate any references
  to dummy vars at all.

This diff uses fix 1b, because it is simple. I (zs) will explore fix 1c
in the future, and see if it is viable.

The second such case occurs when generating code for unifications
involving function symbols represented by the addresses of reserved objects.
These addresses used to be represented as the addresses of mlds_data
definitions, then as addresses of field variables cast as qualified
local variables. Since this diff makes all local variables unqualified,
this can't continue. Two possible fixes are

- Fix 2a: introduce an mlds const rval representing the address of a field
  variable, which solves the problem because unlike local variables,
  field variables can still be either module- or type-qualified.
- Fix 2b: prohibit the use of the addresses of reserved objects as tags.

After a (short) discussion on m-dev, this diff uses fix 2b.

compiler/mlds.m:
    Delete the qual_local_var_name type, and replace all its uses
    with the mlds_local_var_name type. Delete the module qualifier field
    in mlds_data_addr_local_var consts.

compiler/ml_code_util.m:
    Simplify the predicates and functions whose task is to build references
    to local variables. Delete the arguments that they don't need anymore.
    Delete one function entirely, since calling it now takes both more
    characters and more code than its shortened body does.

compiler/ml_accurate_gc.m:
compiler/ml_call_gen.m:
compiler/ml_closure_gen.m:
compiler/ml_code_gen.m:
compiler/ml_commit_gen.m:
compiler/ml_disj_gen.m:
compiler/ml_elim_nested.m:
compiler/ml_foreign_proc_gen.m:
compiler/ml_lookup_switch.m:
compiler/ml_optimize.m:
compiler/ml_rename_classes.m:
compiler/ml_string_switch.m:
compiler/ml_tailcall.m:
compiler/ml_type_gen.m:
compiler/ml_unify_gen.m:
compiler/ml_util.m:
compiler/mlds_to_target_util.m:
compiler/rtti_to_mlds.m:
    Conform to the changes above. Stop qualifying local variable names,
    and stop passing the parameters that used to be used *only* for
    qualifying local variable names.

compiler/mlds_to_c.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
    Conform to the changes above, and implement fix 1b.

NEWS:
compiler/options.m:
compiler/make_tags.m:
    Implement fix 2b by disabling the --num-reserved-objects option.
    This ensures that we don't use the addresses of reserved objects as tags.

library/private_builtin.m:
    Move the C# definition of dummy_var next to the Java definition,
    and fix the comments on them.
2017-08-09 18:23:53 +02:00
Zoltan Somogyi
1c01ed85eb Fix lines. 2017-07-29 14:15:15 +02:00
Zoltan Somogyi
91790794f1 Define the MLDS "succeeded" variable only if needed.
This makes the generated MLDS code less cluttered and easier to work on.

compiler/ml_gen_info.m:
    Add a field for recording whether the succeeded variable has been used.

compiler/ml_code_util.m:
    Change the predicates that return references to the succeeded variable
    to record that it has been used.

compiler/ml_call_gen.m:
compiler/ml_closure_gen.m:
compiler/ml_code_gen.m:
compiler/ml_commit_gen.m:
compiler/ml_disj_gen.m:
compiler/ml_foreign_proc_gen.m:
compiler/ml_lookup_switch.m:
compiler/ml_string_switch.m:
compiler/ml_unify_gen.m:
    Use the updated forms of the predicates in ml_code_util.m.

compiler/ml_proc_gen.m:
    Define the succeeded variable only if the new slot says it has been used.

compiler/ml_optimize.m:
    Fix a bug triggered by the above change: when a tail recursive call
    was the *entire body* of a MLDS function, ml_optimize.m did not find it,
    and thus did not do the setup needed to prepare for the tail recursion.
    Previously, the always-present declaration of "succeeded" made it
    impossible for the tail call to be the only thing in the body.
2017-07-29 01:40:56 +02:00
Zoltan Somogyi
b390231f22 Use mlds_target_lang in the MLDS backend.
The overall compilation target language (which is recorded in the globals)
can be C, Java, C# or Erlang. The target language of the MLDS backend
can only be the first three. Use the mlds_target_lang type (which has
three functors) instead of the compilation_target type (which has four)
to make target-specific decisions in the MLDS backend.

compiler/mercury_compile_mlds_back_end.m:
    Compute the MLDS target (which can be C, Java or C#) from the compilation
    target (which can also be Erlang).

compiler/ml_closure_gen.m:
compiler/ml_disj_gen.m:
compiler/ml_elim_nested.m:
compiler/ml_foreign_proc_gen.m:
compiler/ml_gen_info.m:
compiler/ml_global_data.m:
compiler/ml_proc_gen.m:
compiler/ml_string_switch.m:
compiler/ml_tag_switch.m:
compiler/ml_type_gen.m:
compiler/ml_unify_gen.m:
compiler/mlds.m:
compiler/rtti_to_mlds.m:
    Use the mlds_target_lang value computed in mercury_compile_mlds_back_end.m
    to make decisions. Code in most modules gets this from the ml_gen_info;
    in some others, it is passed around, usually instead of the globals.

compiler/ml_code_util.m:
    Unify two separate copies of a comment.
2017-07-27 03:33:20 +02:00
Zoltan Somogyi
11c232f060 Store different kinds of definitions in blocks separately.
An ml_stmt_block contains some definitions and some statements.
The definitions were traditionally stored in a single list of mlds_defns,
but lots of code knew that some kinds of mlds_defns just couldn't occur
in blocks. This diff, by storing the definitions of (a) local variables
and (b) continuation functions in separate fields in ml_stmt_blocks,
gets the type system to enforce the invariant that other kinds of definitions
can't occur in blocks.

This also allows the compiler to do less work, since definitions
don't have to be wrapped and then later unwrapped, and code that wants to
look at only e.g. the function definitions in a block doesn't have to traverse
the definitions of local variables (of which there are many more).

compiler/mlds.m:
    Make the change described above.

compiler/ml_accurate_gc.m:
compiler/ml_call_gen.m:
compiler/ml_closure_gen.m:
compiler/ml_code_gen.m:
compiler/ml_code_util.m:
compiler/ml_commit_gen.m:
compiler/ml_disj_gen.m:
compiler/ml_elim_nested.m:
compiler/ml_lookup_switch.m:
compiler/ml_optimize.m:
compiler/ml_proc_gen.m:
compiler/ml_simplify_switch.m:
compiler/ml_string_switch.m:
compiler/ml_switch_gen.m:
compiler/ml_tailcall.m:
compiler/ml_type_gen.m:
compiler/ml_unify_gen.m:
compiler/ml_util.m:
compiler/mlds_to_c.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
compiler/mlds_to_target_util.m:
    Conform to the change above. This allows us to avoid lots of wrapping
    up definitions.

    In some cases, after this change, we don't need to process mlds_defns
    *in general*, which leaves the predicates that used to do that,
    and some of the predicates that they used to call, unused. Delete these.

    In code that generated MLDS code, consistently use names containing
    the word "Defn", instead of "Decl", for variables that contain
    mlds_local_var_defns or mlds_function_defns. Some such predicates
    generate lists of both local var definitions and function definitions,
    but most generate only one, and some generate neither.
2017-07-26 00:57:13 +02:00
Zoltan Somogyi
47f1df4a0a Split mlds_data_defn into three separate types.
We used to use mlds_data_defns to represent three related but nevertheless
distinct kinds of entities: global variables, local variables, and fields
in classes. This diff replaces the mlds_data_defn type with three separate
types: mlds_global_var_defn, mlds_local_var_defn and mlds_field_var_defn
respectively, with corresponding changes to related types, such as
mlds_data_name.

The global variables are completely separate from the other two kinds.
Local and field variables are *mostly* separate from each other, but they
are related in one way. When we flatten out nested functions, the child
nested function can no longer access its parent function's local variables,
so we pass those variables to it as fields of an environment structure.
This requires turning local variables to fields of that structure,
and the code in the flattened previously-nested function that accesses
those fields naturally wants to treat them as if they were local variables
(as indeed they sort-of were before the flattening). There are therefore
ways to convert each of local and field vars into the other.

This restructuring makes clear several invariants of the MLDS we generate
that were previously hidden. For example, variables with certain kinds of
names (in the before-this-diff, general version of the mlds_var_name type)
could appear only as function arguments or as locals in ml_stmt_blocks,
not in ml_global_data, while for some other names the opposite was the case.
And in several cases, functions used to take a general mlds_data_defn
as argument but aborted if given the "wrong kind" of mlds_data_defn.

This diff also makes possible further simplifications. For example,
local vars should not need some flags (since e.g. they are never per-instance),
and should never need either module or type qualification, while global
variables (which are also never per-instance) should never need type
qualification (since they are not fields of a type). The definitions
in blocks should consist of local variables and (before flattening) functions,
not global variables, field variables or classes, while the members in classes
should be only field variables and functions (and maybe classes), not
global or local variables. Those changes will be in future diffs;
this is already large enough.

compiler/mlds.m:
    Make the changes described above.

    Use tighter types where possible.

    Use (a generalized version of) the mlconst_named_const functor
    to represent values of enum types defined in the runtimes
    of the target platforms.

compiler/ml_global_data.m:
    Store *only* global variables in fields that previously stored general
    mlds_datas (that by design were always global).

    Store *only* closure wrapper functions in the previous non-flat-defns
    field. Before this diff, the code generator only put closure wrapper
    functions in this field, but then ml_elim_nested.m put everything
    resulting from the expansion of those functions back into those fields
    as well, some of which were not functions. It now puts those non-function
    things into the MLDS data structure directly.

compiler/ml_code_util.m:
compiler/ml_util.m:
    Conform to the changes above.

    Use tighter types where possible. If appropriate, change the name
    of the function or predicate accordingly.

    Represent references to enum constants defined in the runtime of the
    target language as named constants (since that is what they are),
    instead of representing them as MLDS "variables", which required
    the code of mlds_to_cs.m to special-case the treatment
    of those "variables".

compiler/ml_elim_nested.m:
    Conform to the changes above.

    Use tighter types where possible.

    Don't put the environment types resulting from flattening nested scopes
    back into the non-flat-defns slot of the ml_elim_info; instead, return
    them separately to code that puts them directly in the MLDS.

compiler/rtti.m:
    When returning the names of enum constants in the C runtime, return also
    the prefixes that you need to place in front of these to obtain their names
    in the Java and C# runtimes.

compiler/mercury_compile_mlds_back_end.m:
compiler/ml_accurate_gc.m:
compiler/ml_call_gen.m:
compiler/ml_closure_gen.m:
compiler/ml_code_gen.m:
compiler/ml_commit_gen.m:
compiler/ml_disj_gen.m:
compiler/ml_foreign_proc_gen.m:
compiler/ml_gen_info.m:
compiler/ml_lookup_switch.m:
compiler/ml_optimize.m:
compiler/ml_proc_gen.m:
compiler/ml_string_switch.m:
compiler/ml_switch_gen.m:
compiler/ml_tailcall.m:
compiler/ml_type_gen.m:
compiler/ml_unify_gen.m:
compiler/mlds_to_c.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
compiler/mlds_to_target_util.m:
compiler/rtti_out.m:
compiler/rtti_to_mlds.m:
    Conform to the changes above.

    Move a utility function from ml_util.m to mlds_to_target_util.m,
    since it is used only in mlds_to_*.m.
2017-07-22 00:20:40 +02:00
Julien Fischer
8a240ba3f0 Add builtin 8, 16 and 32 bit integer types -- Part 1.
Add the new builtin types: int8, uint8, int16, uint16, int32 and uint32.
Support for these new types will need to be bootstrapped over several changes.
This is the first such change and does the following:

- Extends the compiler to recognise 'int8', 'uint8', 'int16', 'uint16', 'int32'
  and 'uint32' as builtin types.
- Extends the set of builtin arithmetic, bitwise and relational operators to
  cover the new types.
- Extends all of the code generators to handle new types.  There are currently lots
  of limitations and placeholders marked by 'XXX FIXED SIZE INT'.  These will
  be lifted in later changes.
- Extends the runtimes to support the new types.
- Adds new modules to the standard library intended to hold the basic
  operations on the new types.  (These are currently empty and not documented.)

This change does not introduce the two 64-bit types, 'int64' and 'uint64'.
Their implementation is more complicated and is best left to a separate change.

compiler/prog_type.m:
compiler/prog_data.m:
compiler/builtin_lib_types.m:
    Recognise int8, uint8, int16, uint16, int32 and uint32 as builtin types.

    Add a new type, int_type/0, that enumerates all the possible integer types.

    Extend the cons_id/0 type to cover the new types.

compiler/builtin_ops.m:
    Parameterize the integer operations in the unary_op/0 and binary_op/0
    types by the new int_type/0 type.

    Add builtin operations for all the new types.

compiler/hlds_data.m:
    Add new tag types for the new types.

compiler/hlds_pred.m:
    Parameterize integers in the table_trie_step/0 type.

compiler/ctgc.selector.m:
compiler/dead_proc_elim.m:
compiler/export.m:
compiler/foreign.m:
compiler/goal_util.m:
compiler/higher_order.m:
compiler/hlds_code_util.m:
compiler/hlds_dependency_graph.m:
compiler/hlds_out_pred.m:
compiler/hlds_out_util.m:
compiler/implementation_defined_literals.m:
compiler/inst_check.m:
compiler/mercury_to_mercury.m:
compiler/mode_util.m:
compiler/module_qual.qualify_items.m:
compiler/opt_debug.m:
compiler/opt_util.m:
compiler/parse_tree_out_info.m:
compiler/parse_tree_to_term.m:
compiler/parse_type_name.m:
compiler/polymorphism.m:
compiler/prog_out.m:
compiler/prog_rep.m:
compiler/prog_rep_tables.m:
compiler/prog_util.m:
compiler/rbmm.execution_path.m:
compiler/rtti.m:
compiler/rtti_to_mlds.m:
compiler/switch_util.m:
compiler/table_gen.m:
compiler/type_constraints.m:
compiler/type_ctor_info.m:
compiler/type_util.m:
compiler/typecheck.m:
compiler/unify_gen.m:
compiler/unify_proc.m:
compiler/unused_imports.m:
compiler/xml_documentation.m:
    Conform to the above changes to the parse tree and HLDS.

compiler/c_util.m:
    Support generating the builtin operations for the new types.

doc/reference_manual.texi:
    Add the new types to the list of reserved type names.

    Add the mapping from the new types to their target language types.
    These are commented out for now.

compiler/llds.m:
    Replace the lt_integer/0 and lt_unsigned/0 functors of the llds_type/0
    type with a single lt_int/1 functor that is parameterized by the
    int_type/0 type.

    Add representations for constants of the new types to the LLDS.

compiler/call_gen.m:
compiler/dupproc.m:
compiler/exprn_aux.m:
compiler/global_data.m:
compiler/jumpopt.m:
compiler/llds_out_data.m:
compiler/llds_out_global.m:
compiler/llds_out_instr.m:
compiler/lookup_switch.m:
compiler/middle_rec.m:
compiler/peephole.m:
compiler/pragma_c_gen.m:
compiler/stack_layout.m:
compiler/string_switch.m:
compiler/switch_gen.m:
compiler/tag_switch.m:
compiler/trace_gen.m:
compiler/transform_llds.m:
    Support the new types in the LLDS code generator.

compiler/mlds.m:
    Support constants of the new types in the MLDS.

compiler/ml_accurate_gc.m:
compiler/ml_call_gen.m:
compiler/ml_code_util.m:
compiler/ml_disj_gen.m:
compiler/ml_foreign_proc_gen.m:
compiler/ml_global_data.m:
compiler/ml_lookup_switch.m:
compiler/ml_simplify_switch.m:
compiler/ml_string_switch.m:
compiler/ml_switch_gen.m:
compiler/ml_tailcall.m:
compiler/ml_type_gen.m:
compiler/ml_unify_gen.m:
compiler/ml_util.m:
compiler/mlds_to_target_util.m:
    Conform to the above changes to the MLDS.

compiler/mlds_to_c.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
    Generate the appropriate target code for constants of the new
    types and operations involving them.

compiler/bytecode.m:
compiler/bytecode_gen.m:
    Handle the new types in the bytecode generator; we just abort if we
    encounter them for now.

compiler/elds.m:
compiler/elds_to_erlang.m:
compiler/erl_call_gen.m:
compiler/erl_code_util.m:
compiler/erl_rtti.m:
compiler/erl_unify_gen.m:
    Handle the new types in the Erlang code generator.

library/private_builtin.m:
    Add placeholders for the builtin unify and compare operations for
    the new types.  Since the bootstrapping compiler will not recognise
    the new types, we give them polymorphic arguments.  These can be
    replaced after this change has bootstrapped.

    Update the Java list of TypeCtorRep constants.

library/int8.m:
library/int16.m:
library/int32.m:
library/uint8.m:
library/uint16.m:
library/uint32.m:
    New modules that will eventually contain builtin operations
    on the new types.

library/library.m:
library/MODULES_UNDOC:
    Do not include the above modules in the library documentation
    for now.

library/construct.m:
library/erlang_rtti_implementation.m:
library/rtti_implementation.m:
deep_profiler/program_representation_utils.m:
mdbcomp/program_representation.m:
    Handle the new types.

runtime/mercury_dotnet.cs.in:
java/runtime/TypeCtorRep.java:
runtime/mercury_type_info.h:
    Update the list of TypeCtorReps.

configure.ac:
runtime/mercury_conf.h.in:
    Check for the header stdint.h.

runtime/mercury_std.h:
    Include stdint.h; abort if that header is not present.

runtime/mercury_builtin_types.[ch]:
runtime/mercury_builtin_types_proc_layouts.h:
runtime/mercury_construct.c:
runtime/mercury_deconstruct.c:
runtime/mercury_deep_copy_body.h:
runtime/mercury_ml_expand_body.h:
runtime/mercury_table_type_body.h:
runtime/mercury_tabling_macros.h:
runtime/mercury_tabling_preds.h:
runtime/mercury_term_size.c:
runtime/mercury_unify_compare_body.h:
    Add the new builtin types and handle them throughout the runtime.
2017-07-18 01:31:01 +10:00
Zoltan Somogyi
30ec420984 Fix an anomaly in how the MLDS treats scalar commons.
compiler/mlds.m:
    The MLDS used to have two different ways to refer to scalar common
    data structures. It had an rval for the *name* of the scalar common,
    and an mlds_name for its *address*. The name could then be wrapped up
    inside a mlconst_data_addr function symbol to convert it to an rval.

    An mlds_name is intended to be used for the names of data definitions
    in the MLDS, but scalar commons were never defined in this way.
    And the name and address of a scalar common differ in C only by
    the addition of an "&" operator in front, so the fact that they
    had to be processed by different code (due to them having different types)
    *required* double maintenance.
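
    A minimal C sketch of the point about "&", not the code the compiler
    emits. The struct layout and every sketch_* name are hypothetical and
    exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cell layout; real scalar commons have compiler-chosen types. */
struct sketch_cell {
    int tag;
    const char *text;
};

/* A statically allocated constant structure, standing in for a scalar common. */
static const struct sketch_cell sketch_scalar_common_0 = { 1, "hello" };

/* Reference by name: the structure itself. */
int sketch_tag_by_name(void)
{
    return sketch_scalar_common_0.tag;
}

/* Reference by address: the same identifier with "&" in front. */
const struct sketch_cell *sketch_common_addr(void)
{
    return &sketch_scalar_common_0;
}

/* Code that receives the address reads the same data through the pointer. */
int sketch_tag_via_address(const struct sketch_cell *p)
{
    assert(p != NULL);
    return p->tag;
}
```

    Both access paths read the same object, which is why handling them
    in separate code paths amounted to double maintenance.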

    This diff fixes this anomaly by making both the name and the address
    of a scalar common its own specific function symbol in the mlds_rval type.
    They differ in the presence or absence of an "_addr" suffix.

    Since all references to a vector common are to its address, give
    the existing mlds_rval function symbol for vector commons the "_addr"
    suffix as well, for consistency.

    Replace the general mlconst_data_addr function symbol in the
    mlds_rval_const with its remaining instances. This allows the code
    constructing them to be smaller and simpler, and enables them
    to be treated differently in the future, if needed.

compiler/mlds_to_c.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
    Conform to the changes in mlds.m.

    Put the code translating the various common structures next to each other,
    where they weren't before. Add XXXs about the differences between them
    that are probably unnecessary and may possibly be latent problems.

compiler/ml_util.m:
    Conform to the changes in mlds.m.

    Change the interface to a set of predicates that looks for variables
    inside various MLDS constructs to take a variable name, not a data name,
    as the thing being looked for.

compiler/ml_closure_gen.m:
compiler/ml_code_util.m:
compiler/ml_elim_nested.m:
compiler/ml_global_data.m:
compiler/ml_optimize.m:
compiler/ml_proc_gen.m:
compiler/ml_string_switch.m:
compiler/ml_tailcall.m:
compiler/ml_unify_gen.m:
compiler/mlds_to_target_util.m:
compiler/rtti_to_mlds.m:
    Conform to the changes in mlds.m, and maybe ml_util.m.

    In ml_proc_gen.m, put related arguments of some predicates and functions
    next to each other.
2017-07-13 13:36:51 +02:00
Zoltan Somogyi
0d5dac8018 Delete output args that always return the same value. 2017-07-10 00:51:41 +02:00
Zoltan Somogyi
083f990dbb Simplify the use of contexts in the MLDS.
compiler/mlds.m:
    This diff fixes two minor annoyances imposed by the old use of the
    mlds_context type in the MLDS.

    The first annoyance was that the mlds_context type used to be an
    abstract type that was privately defined to be a trivial wrapper
    around a prog_context. It had the exact same information content
    as a prog_context, but you had to go through translation functions
    to translate prog_contexts to mlds_contexts and vice versa.
    I think Fergus's idea was that we may want to add other information
    to the mlds_context type. However, since we haven't felt the need
    to do anything like that in the 18 years (almost to the day) that the
    mlds_context type existed, I think this turned out to be a classic
    case of YAGNI (you ain't gonna need it).

    This diff deletes the mlds_context type, and replaces its uses
    with prog_context.

    The second annoyance was that actual MLDS code, i.e. values of the
    mlds_stmt type, always had to be wrapped up inside a term of the statement
    type, a term which paired a context with the mlds_stmt.

    This diff moves the context information (now prog_context, not
    mlds_context) into each function symbol of the mlds_stmt type,
    deletes the statement type, and replaces its uses with the now-expanded
    mlds_stmt type. This simplifies most code that deals with MLDS code.

compiler/ml_util.m:
    Add a function, get_mlds_stmt_context, for the (very rare) occasions
    where we want to know the context of an mlds_stmt *before* testing
    to see what function symbol it is bound to.

compiler/ml_accurate_gc.m:
compiler/ml_call_gen.m:
compiler/ml_closure_gen.m:
compiler/ml_code_gen.m:
compiler/ml_code_util.m:
compiler/ml_commit_gen.m:
compiler/ml_disj_gen.m:
compiler/ml_elim_nested.m:
compiler/ml_foreign_proc_gen.m:
compiler/ml_global_data.m:
compiler/ml_lookup_switch.m:
compiler/ml_optimize.m:
compiler/ml_proc_gen.m:
compiler/ml_simplify_switch.m:
compiler/ml_string_switch.m:
compiler/ml_switch_gen.m:
compiler/ml_tag_switch.m:
compiler/ml_tailcall.m:
compiler/ml_type_gen.m:
compiler/ml_unify_gen.m:
compiler/mlds_to_c.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
compiler/rtti_to_mlds.m:
    Conform to the changes above.

    In some cases, a function was given two separate contexts, sometimes from
    two separate sources; a prog_context and an mlds_context. In such cases,
    keep only one source.

    Standardize on Stmt as the variable name for "statement".

    Delete redundant $module references from unexpected and other abort
    predicates.

    In one case, delete a function that was a duplicate of another function.

    Give some predicates and functions more meaningful names.
2017-07-09 18:44:05 +02:00
Zoltan Somogyi
869605956c Make MLDS definitions self-contained.
Until now, we used a single type, mlds_defn, to contain both

- generic information that we need for all MLDS definitions, such as
  name and context, and

- information that is specific to each different kind of MLDS definition,
  such as a variable's initializer or a function's list of parameter types.

The former were contained in three fields in the mlds_defn directly,
while the latter were contained in a fourth field that was a discriminated
union of mlds_data_defn, mlds_function_defn and mlds_class_defn.

While seemingly parsimonious, this design meant that if we had e.g. a list
of variable definitions, we would have to wrap the mlds_defn/4 wrapper around
them to give them their names, and thereafter, any code that processed
that list would have to be prepared to process not just variables but also
functions and classes.

This diff moves the three generic fields into each of the mlds_data_defn,
mlds_function_defn and mlds_class_defn types, making each of those types
self-contained, and leaving mlds_defn as nothing more than a discriminated
union of those types.

In the few places that want to look at the generic fields *without*
caring about what kind of entity is being defined, this design requires
a bit of extra work compared to the old design, but in many other places,
the new design allows us to return mlds_data_defns, mlds_function_defns
or mlds_class_defns instead of just mlds_defns.

compiler/mlds.m:
    Make the change described above.

    Store type definitions (for high level data) and table structure definitions
    separately from other definitions in the MLDS type, since we can now
    give them tighter types.

compiler/ml_global_data.m:
    Change the fields that store flat cells from storing mlds_defns to
    storing mlds_data_defns, since we can now do so.

    Add an XXX about an obsolete comment.

compiler/mercury_compile_mlds_back_end.m:
compiler/ml_accurate_gc.m:
compiler/ml_call_gen.m:
compiler/ml_closure_gen.m:
compiler/ml_code_gen.m:
compiler/ml_code_util.m:
compiler/ml_commit_gen.m:
compiler/ml_disj_gen.m:
compiler/ml_elim_nested.m:
compiler/ml_foreign_proc_gen.m:
compiler/ml_gen_info.m:
compiler/ml_lookup_switch.m:
compiler/ml_optimize.m:
compiler/ml_proc_gen.m:
compiler/ml_string_switch.m:
compiler/ml_switch_gen.m:
compiler/ml_tailcall.m:
compiler/ml_type_gen.m:
compiler/ml_util.m:
compiler/mlds_to_c.m:
compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
compiler/rtti_to_mlds.m:
    Conform to the changes above. Where possible with only local changes,
    return mlds_data_defns, mlds_function_defns or mlds_class_defns instead
    of just mlds_defns. Put the mlds_data(_), mlds_function(_) or mlds_class(_)
    wrapper around those definitions as late as possible (typically, when
    our current code wants to put it into the same list as some other kind
    of definition), in the hope that in the future, that wrapping can be
    delayed even later, or even avoided altogether. Mark the places where
    such improvements may be possible with "XXX MLDS_DEFN".

    In some places, the tighter data representation allows us to *delete*
    "XXX MLDS_DEFN" markers.

    Move some common code from mlds_to_{cs,java}.m to ml_util.m.

    In mlds_to_{cs,java}.m, add prefixes to the function symbols in a type
    to reduce ambiguity.
2017-05-24 09:45:21 +02:00
Zoltan Somogyi
8dbea9f096 Use a structured representation for MLDS variables.
compiler/mlds.m:
    Replace the old definition of mlds_var_name, which was a string
    with an optional integer. The integer was intended to be the number
    of a HLDS variable, while auxiliary variables created by the compiler,
    which do not correspond to a HLDS variable, would not have the optional
    integer.

    This design has a couple of minor problems. The first is that there is
    no place in the compiler where all the variable names are visible at once,
    and without such a place, we cannot be sure that two names constructed
    for different purposes don't accidentally end up with the same name.
    The probability of such a clash used to be astronomically small
    (which is why this hasn't been a problem), but it was not zero.

    The second problem is that several kinds of compiler-created MLDS variables
    want to have numerical suffixes too, usually with the suffix being a
    unique sequence number used as a means of disambiguation. Most of the
    places where these were created put the numerical suffix into the name
    string itself, while some put the sequence number as the optional integer.

    As it happens, neither of those actions is good when one wants to take
    the independently generated MLDS code of several procedures in an SCC
    and merge them into a single piece of MLDS code. For this, we want to
    rename apart both the HLDS variable numbers and the sequence numbers.
    Having the sequence number baked into the strings themselves obviously
    makes such renumbering unnecessarily hard, while having sequence numbers
    in the slots intended for HLDS variable numbers makes the job impossible
    to do safely.

    This diff switches to a new representation of mlds_var_names that
    has a separate function symbol for each different "origin story"
    that is possible for MLDS variables. This addresses both problems.

    The single predicate that converts this structured representation
    to a string is the place where we can ensure that two semantically
    different MLDS variables never get translated to the same string.
    The current version of this predicate does *not* offer this guarantee,
    but later versions could.

    And having all the integers used in mlds_var_names for different purposes
    stored as arguments of different function symbols (that clearly indicate
    their meaning) makes it possible to rename apart different sets
    of MLDS variables easily and reliably.
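
    The idea can be sketched in C as a tagged structure with one
    conversion function. This is only an illustration: the two origins,
    the field names and the string formats are hypothetical, and the real
    mlds_var_name type has its own set of function symbols.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical "origin stories"; the real type distinguishes more of them. */
enum sketch_var_kind { SKETCH_HLDS_VAR, SKETCH_AUX_VAR };

struct sketch_var_name {
    enum sketch_var_kind kind;
    const char *base;  /* source-level name, or the purpose of an aux var */
    int num;           /* HLDS variable number, or a sequence number */
};

/* The single place that turns a structured name into a string. */
void sketch_var_name_to_string(struct sketch_var_name v,
    char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    switch (v.kind) {
    case SKETCH_HLDS_VAR:
        snprintf(buf, buflen, "%s_%d", v.base, v.num);
        break;
    case SKETCH_AUX_VAR:
        snprintf(buf, buflen, "aux_%s_%d", v.base, v.num);
        break;
    }
}

/* Renaming apart: shift the number without parsing or editing any string. */
struct sketch_var_name sketch_renumber(struct sketch_var_name v, int offset)
{
    v.num += offset;
    return v;
}
```

    Because the number is a separate field whose meaning is given by the
    tag, renumbering is a field update, and any naming scheme only has to
    be enforced in the one conversion function.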

    Move the code for converting mlds_var_names from ml_code_util.m to here,
    to make it easier to maintain it together with the mlds_var_name type.

compiler/ml_code_util.m:
    Conform to the above change by generating structured MLDS var names.

    Delete a predicate that is not needed with structured var names.

    Delete the code moved to mlds.m.

    Delete a predicate that has been unused since we deleted the IL backend.

    Add ml_make_boxed_type as a version of ml_make_boxed_types that returns
    exactly one type. This simplifies some code elsewhere.

    Add "hld" to some predicate names to make clear that they are intended
    for use only with --high-level-data.

compiler/ml_type_gen.m:
    Conform to the above change by generating structured MLDS var names.

    Add "hld" to the names of the (many) predicates here that are used
    only with --high-level-data to make clear that fact.

compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
    Conform to the above change by generating structured MLDS var names.

    Add a "for_csharp" or "for_java" suffix to some predicate names
    to avoid ambiguities.

compiler/ml_accurate_gc.m:
compiler/ml_call_gen.m:
compiler/ml_closure_gen.m:
compiler/ml_commit_gen.m:
compiler/ml_disj_gen.m:
compiler/ml_elim_nested.m:
compiler/ml_foreign_proc_gen.m:
compiler/ml_gen_info.m:
compiler/ml_global_data.m:
compiler/ml_lookup_switch.m:
compiler/ml_optimize.m:
compiler/ml_string_switch.m:
compiler/ml_unify_gen.m:
compiler/ml_util.m:
compiler/mlds_to_c.m:
    Conform to the above change by generating structured MLDS var names.

compiler/prog_type.m:
    Add var_to_type, as a version of var_list_to_type_list that returns
    exactly one type. This simplifies some code elsewhere.

compiler/java_names.m:
    Give some predicates and functions better names.

compiler/ml_code_gen.m:
    Fix typo.
2017-04-24 15:16:36 +10:00
Zoltan Somogyi
5de235065d Fix too-long lines. 2015-11-16 00:09:26 +11:00
Zoltan Somogyi
cc9912faa8 Don't import anything in packages.
Packages are modules whose only job is to serve as a container for submodules.
Modules like top_level.m, hlds.m, parse_tree.m and ll_backend.m are packages
in this (informal) sense.

Besides the include_module declarations for their submodules, most of the
packages in the compiler used to import some modules, mostly other packages
whose component modules their submodules may need. For example, ll_backend.m
used to import parse_tree.m. This meant that modules in the ll_backend package
did not have to import parse_tree.m before importing modules in the parse_tree
package.

However, this had a price. When we add a new module to the parse_tree package,
parse_tree.int would change, and this would require the recompilation of ALL
the modules in the ll_backend package, even the ones that did NOT import ANY
of the modules in the parse_tree package.

This happened even at one remove. Pretty much all modules in every one
of the backends have to import one or more modules in the hlds package,
and they therefore have to import hlds.m. Since hlds.m imported transform_hlds.m,
any addition of a new middle pass to the transform_hlds package required
the recompilation of all backend modules, even in the usual case of the two
having nothing to do with each other.

This diff removes all import_module declarations from the packages,
and replaces them with import_module declarations in the modules that need
them. This includes only a SUBSET of their child modules and of the non-child
modules that import them.
2015-11-13 15:03:20 +11:00
Zoltan Somogyi
7654ec847e Convert (C->T;E) to (if C then T else E). 2015-09-18 09:37:29 +10:00
Zoltan Somogyi
c1e0499140 Fix the fail code for model_non trie string switches.
This was Mantis bug #383.

compiler/ml_string_switch.m:
    For model_non switches in MLDS grades, a failure is indicated by
    a fall through. This can be represented by an empty sequence of
    MLDS statements, but the code that generated string trie switches
    took such an empty sequence to mean that the switch could not fail.
    Fix this incorrect assumption.

tests/hard_coded/bug383.{m,inp,exp}:
    A regression test for the bug.

tests/hard_coded/Mmakefile:
    Enable the new test case.
2015-03-25 19:51:08 +11:00
Peter Wang
9979764072 Build string switch tries in the target string encoding.
The compiler should work in code units of the TARGET string encoding
when building tries for string switches.  Using its own string encoding
would be incorrect if it differs from the target encoding.  Currently
that would only occur if the compiler is built in a java/csharp grade
(uses UTF-16 internally) and invoked to target high-level C (uses UTF-8).
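
To illustrate why the encoding matters, the following C sketch counts how
many code units one code point occupies in each encoding. The function names
are hypothetical; they are not the helper predicates added by this change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of UTF-8 code units (bytes) needed to encode one code point. */
size_t sketch_utf8_units(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        return 1;
    } else if (cp < 0x800) {
        return 2;
    } else if (cp < 0x10000) {
        return 3;
    } else {
        return 4;
    }
}

/* Number of UTF-16 code units needed to encode one code point. */
size_t sketch_utf16_units(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return (cp < 0x10000) ? 1 : 2;
}
```

For U+00E9 the two functions return 2 and 1 respectively, so a trie built
over UTF-16 code units would have a different shape from the one the
generated UTF-8 C code needs to walk.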

Another motivation for this change is to remove a place where the
compiler behaviour depends on the setting of `--cross-compiling'.
As of now, the `--cross-compiling' option has no effect.

compiler/backend_libs.m:
compiler/string_encoding.m:
	Add new module with helper predicates.

compiler/ml_string_switch.m:
	Convert strings to/from code units in the target string
	encoding.

compiler/ml_switch_gen.m:
	Remove restriction on compiling string switches using tries when
	`--cross-compiling' is enabled.

compiler/notes/compiler_design.html:
	Document the new module.
2015-03-23 14:16:20 +11:00
Zoltan Somogyi
d041b83943 Implement string switches via tries for the MLDS backend.
The code we emit to decide which arm of the switch is selected looks like this:

    case_num = -1;
    switch (MR_nth_code_unit(switchvar, 0)) {
        case '98':
            switch (MR_nth_code_unit(switchvar, 1)) {
                case '99':
                    if (MR_offset_streq(2, switchvar, "abc"))
                        case_num = 0;
                    break;
                case '100':
                    if (MR_offset_streq(2, switchvar, "aceg"))
                        case_num = 1;
                    break;
            }
            break;
        case '99':
            if (MR_offset_streq(2, switchvar, "bbb"))
                case_num = 2;
            break;
    }

The part that acts on this will look like this for lookup switches:

    if (case_num < 0)
        succeeded = MR_FALSE;
    else {
        outvar1 = vector_common[case_num].f1;
        ...
        outvarn = vector_common[case_num].fn;
        succeeded = MR_TRUE;
    }

and like this for non-lookup switches:

    switch (case_num) {
    case 0:
        <code for case 0>
        break;
    ...
    case n:
        <code for case n>
        break;
    default:                    /* if the switch is can_fail */
        <code for failure>
        break;
    }
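
A self-contained C sketch of the first stage, for a hypothetical switch on
the strings "bcx", "bdyz" and "cq". The two sketch_* helpers are stand-ins
written for this example; the real MR_nth_code_unit and MR_offset_streq
macros live in runtime/mercury_string.h and may be defined differently.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for MR_nth_code_unit: the nth code unit, with no bounds check. */
static unsigned char sketch_nth_code_unit(const char *s, size_t n)
{
    return (unsigned char) s[n];
}

/* Stand-in for MR_offset_streq: compare both strings from offset n onwards. */
static int sketch_offset_streq(size_t n, const char *s1, const char *s2)
{
    return strcmp(s1 + n, s2 + n) == 0;
}

/* Return the number of the selected case, or -1 if no case matches. */
int sketch_trie_case(const char *switchvar)
{
    int case_num = -1;

    assert(switchvar != NULL);
    switch (sketch_nth_code_unit(switchvar, 0)) {
        case 'b':
            /* Unit 0 was not NUL, so reading unit 1 is in bounds. */
            switch (sketch_nth_code_unit(switchvar, 1)) {
                case 'c':
                    if (sketch_offset_streq(2, switchvar, "bcx"))
                        case_num = 0;
                    break;
                case 'd':
                    if (sketch_offset_streq(2, switchvar, "bdyz"))
                        case_num = 1;
                    break;
            }
            break;
        case 'c':
            if (sketch_offset_streq(1, switchvar, "cq"))
                case_num = 2;
            break;
    }
    return case_num;
}
```

The unchecked indexing is safe in this sketch because each level is reached
only after the previous code unit matched a non-NUL character.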

compiler/ml_string_switch.m:
    Implement both non-lookup and lookup string switches via tries,
    along the lines shown above.

compiler/ml_switch_gen.m:
    Invoke the predicates that implement string switches via tries
    in the circumstances in which option values call for them.

    For now, we generate tries only for the C backend. Once the
    problems identified for mlds_to_{cs,java,managed} below are fixed,
    we can enable them on those backends as well.

compiler/options.m:
doc/user_guide.texi:
    Add an option that governs the minimum size of trie switches.

compiler/ml_lookup_switch.m:
    Factor out the code common to the implementation of all model-non
    lookup switches, both in ml_lookup_switch.m and ml_string_switch.m,
    and put it all into a new exported predicate.

    The previously existing MLDS implementation methods for lookup switches
    all build their lookup tables from maps that maps each cons_id
    in the switch cases to the values of the output arguments of those cases.
    For switch cases that apply to more than one cons_id, this map had
    one entry for each of those cons_ids. For tries, we need a map
    from *case ids*, not *cons ids* to the outputs. Since it is
    easier to convert the one-to-one case_id->outputs map to the
    many-to-one cons_id->outputs map than vice versa, change the
    main data structure from which lookup tables are built to store data
    in a case_id->outputs format, and provide predicates for its conversion
    to the other (previously the only) format.

    Rename ml_gen_lookup_switch to ml_gen_atomic_lookup_switch to distinguish
    it from other predicates that also generate (other kinds of) lookup
    switches.

compiler/switch_util.m:
    Have the types representing lookup tables represent their contents
    as a map, not as the assoc list derived from the map. Previously,
    we didn't do anything with the map other than flatten it to the assoc list,
    but for the MLDS backend, we may now also need to convert it to another
    form of map (see immediately above).

compiler/builtin_ops.m:
    Add two new builtin ops. The first, string_unsafe_index_code_unit,
    returns the nth code unit in a string; the second, offset_str_eq,
    does a string equality test on the nth and later code units of
    two strings. They are used in the implementation of tries.

compiler/c_util.m:
    Add a new binop category for each new binop, since they are not like
    existing binops.

    Put some existing binops into their own categories as well, since
    bundling them with the other ops they were bundled with seems like
    a bad idea.

compiler/hlds_goal.m:
    Make the identifier of switch arms in tagged_cases a separate type
    from int.

compiler/mlds_to_c.m:
compiler/llds_out_data.m:
    Handle the new kinds of binops.

    When writing out binop expressions, we used to do a switch on the binop
    to get its category, and then another switch on the category. We now
    switch on the binop directly, since this makes it much harder to write out
    code using new binops badly, and should be faster to boot.

    In mlds_to_c.m, also make some cosmetic changes to the output to make it
    easier to read, and thus to debug.

compiler/mlds_to_il.m:
    Handle the new kinds of binops.

compiler/mlds_to_cs.m:
compiler/mlds_to_java.m:
compiler/mlds_to_managed.m:
    Do not handle the new kinds of binops, since doing so would require
    changing the whole approach of how these modules handle binops.

    Clean up some predicates.

compiler/bytecode.m:
compiler/erl_call_gen.m:
compiler/lookup_switch.m:
compiler/ml_global_data.m:
compiler/ml_optimize.m:
compiler/ml_tag_switch.m:
compiler/opt_debug.m:
compiler/string_switch.m:
    Conform to the changes above.

compiler/ml_code_gen.m:
    Put the predicates of this module into a consistent order.

library/string.m:
    Fix white space.

runtime/mercury_string.h:
    Add a macro for each of the two new builtin operations.
2015-02-24 16:03:30 +11:00
Zoltan Somogyi
7ca1a07296 Allow the MLDS backend to generate indexing switches (switches implemented
Estimated hours taken: 16
Branches: main

Allow the MLDS backend to generate indexing switches (switches implemented
more efficiently than just an if-then-else chain) for strings even if the target
language does not support gotos.

Previously, we always used gotos to break out of search loops
after we found a match:

    do {
        if (we have a match) {
            ... handle the match ...
            goto end
        } else {
            ... handle nonmatches ...
        }
    } while (loop should continue);
    maybe some code to handle the failure of the search
end:

Now, if the "maybe some code" is empty, we prefer to use break statements
if the target language supports this:

    do {
        if (we have a match) {
            ... handle the match ...
            break;
        } else {
            ... handle nonmatches ...
        }
    } while (loop should continue)

If we cannot use either gotos or break statements, we instead use
a boolean variable named "stop_loop":

    stop_loop = 0;
    do {
        if (we have a match) {
            ... handle the match ...
            stop_loop = 1;
        } else {
            ... handle nonmatches ...
        }
    } while (stop_loop == 0 && loop should continue);
    if (stop_loop == 0) {
        maybe some code to handle the failure of the search
    }

We omit the final if statement if the then-part would be empty.
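
As a concrete illustration, here is a C sketch of the second and third
methods applied to a linear search over a small string table. The table and
the function names are hypothetical; the generated code searches hash tables,
and its loop bodies differ.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table of switch arm strings. */
static const char *const sketch_table[] = { "apple", "banana", "cherry" };
enum { SKETCH_TABLE_SIZE = 3 };

/* Second method: leave the search loop with a break statement. */
int sketch_find_with_break(const char *key)
{
    int result = -1;
    int i = 0;

    assert(key != NULL);
    do {
        if (strcmp(sketch_table[i], key) == 0) {
            result = i;     /* handle the match */
            break;
        } else {
            i++;            /* handle the nonmatch */
        }
    } while (i < SKETCH_TABLE_SIZE);
    return result;
}

/* Third method: neither goto nor break, only a stop_loop flag. */
int sketch_find_with_stop_loop(const char *key)
{
    int result = -1;
    int i = 0;
    int stop_loop = 0;

    assert(key != NULL);
    do {
        if (strcmp(sketch_table[i], key) == 0) {
            result = i;     /* handle the match */
            stop_loop = 1;
        } else {
            i++;            /* handle the nonmatch */
        }
    } while (stop_loop == 0 && i < SKETCH_TABLE_SIZE);
    if (stop_loop == 0) {
        result = -1;        /* code to handle the failure of the search */
    }
    return result;
}
```

Both functions return the same index for every key; the flag version costs
one extra variable and one extra test per iteration.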

The break method generates the smallest code, followed by the goto code.
I don't have information on speed, since we don't have a benchmark that
runs long enough, and the compiler itself does not spend any significant
amount of time on string switches. Probably the break method is also the
fastest, simply because it leaves the code looking most like normal C code.
(Some optimizations are harder to apply to code containing gotos, and some
optimizer writers do not bother.)

For C, we now normally prefer to generate code using the second method
(breaks), if we can, though normally "maybe some code" is not empty,
in which case we use the first method (goto).

However, if the value of the --experiment option is set to "use_stop_loop",
we always use the third method, and if it is set to "use_end_label", we always
use the first, even when we could use the second. This allows us to test all
three approaches using the C back end.

With backends that support neither gotos nor break, we always use the third
method (stop_loop).

With backends that don't support gotos but do support breaks, we also always
use the third method. This is because trying to use the second method would
require us to commit to not creating the stop_loop variable BEFORE we know
that the "maybe some code to handle the failure of the search" is empty,
and if it isn't empty, then we don't have the goto method to fall back on.

compiler/ml_string_switch.m:
	Make the change described above. Where possible, make the required
	change not to the original code, but to a version in which common
	parts have been factored out. (Previously, the duplicated code was
	small; now, it would be big.)

compiler/ml_target_util.m:
	A new module containing existing functions that test various properties
	of the target language. Keeping some of those functions in their
	original modules would have introduced a circular dependency.

compiler/ml_switch_gen.m:
	Enable the new functionality by removing the tests that previously
	prevented the compiler from using indexing switches on strings
	if the target language did not support gotos.

	Remove the code moved to ml_target_util.m.

compiler/ml_optimize.m:
compiler/ml_unify_gen.m:
	Remove the code moved to ml_target_util.m.

compiler/ml_backend.m:
compiler/notes/compiler_design.m:
	Add the new module.

compiler/ml_proc_gen.m:
	Delete a predicate that hasn't been used for a long time.

tools/makebatch:
	Fix an old pair of typos.
2011-08-15 06:23:20 +00:00
Zoltan Somogyi
de56f9b77c Implement lookup table versions of hash and binary search switches for strings
Estimated hours taken: 24
Branches: main

Implement lookup table versions of hash and binary search switches for strings
in the MLDS backend (those versions already exist in the LLDS backend).

compiler/ml_string_switch.m:
	Make the above change.

	Where possible, factor out and reuse existing code.

compiler/ml_lookup_switch.m:
	Break up the predicate that used to both test whether a switch is a
	lookup switch and also generate code for it if it was, into two parts,
	each doing just one job. The first part is now useful for switches on
	strings as well.

	Group auxiliary predicates with the main predicates they support.

	Factor out some code into new predicates, and export them for use
	by the new code in ml_string_switch.m.

	Make some predicates tail recursive.

	Remove some predicates made unnecessary by changes to lookup_switch.m.

compiler/ml_switch_gen.m:
	Invoke the new code when appropriate, and conform to the updated
	interface of ml_lookup_switch.m.

compiler/switch_util.m:
	Change some types, and the predicates that operate on them, to make
	them useful for lookup switches for the MLDS backend as well as the LLDS
	backend.

	Add some utility predicates.

compiler/lookup_switch.m:
	Change the interface of some of the predicates in this module to allow
	us to factor out some common code from the higher order values passed
	by callers.

	Conform to the changes in switch_util.m.

compiler/string_switch.m:
	Conform to changes in switch_util.m.

compiler/switch_gen.m:
	Conform to changes in lookup_switch.m.
2011-08-09 05:34:35 +00:00
Zoltan Somogyi
6dabcc0aa1 Implement binary search switches for strings in the MLDS backend (they already
Estimated hours taken: 16
Branches: main

Implement binary search switches for strings in the MLDS backend (they already
exist in the LLDS backend). Binary search switches have higher big-O
complexity than hash table search switches, but lower startup costs,
and so are appropriate for switches involving smaller tables of strings.
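
A minimal C sketch of the technique, assuming a table sorted by strcmp order.
The table contents and function name are hypothetical, and the code the MLDS
backend generates is laid out differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical switch arm strings; they must be sorted in strcmp order. */
static const char *const sketch_sorted[] = { "blue", "green", "red", "white" };
enum { SKETCH_NUM_STRINGS = 4 };

/* Return the index of key in the sorted table, or -1 if it is absent. */
int sketch_binary_search_case(const char *key)
{
    int lo = 0;
    int hi = SKETCH_NUM_STRINGS - 1;

    assert(key != NULL);
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int cmp = strcmp(key, sketch_sorted[mid]);
        if (cmp == 0) {
            return mid;
        } else if (cmp < 0) {
            hi = mid - 1;   /* key sorts before the midpoint */
        } else {
            lo = mid + 1;   /* key sorts after the midpoint */
        }
    }
    return -1;
}
```

The search needs no hash computation before the first comparison, which is
the lower startup cost referred to above.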

compiler/ml_string_switch.m:
	Implement binary search switches.

	Where possible, factor out and reuse code that already existed for
	implementing hash switches.

compiler/ml_switch_gen.m:
	Invoke the new code when appropriate.

compiler/switch_gen.m:
	Avoid executing the same test (NumArms > 1) more than once.

compiler/mlds.m:
	Fix a typo in a comment.

compiler/string_switch.m:
	Delete stray text from a comment.
2011-08-02 03:02:05 +00:00
Zoltan Somogyi
b4092d2e4e Further improvements in the implementation of string switches, along with
Estimated hours taken: 12
Branches: main

Further improvements in the implementation of string switches, along with
some bug fixes.

If the chosen hash function does not yield any collisions for the strings
in the switch arms, then we can optimize away the table column that we would
otherwise need for open addressing. This was implemented in a previous diff.

For an ordinary (non-lookup) string switch, the hash table has two columns
in the presence of collisions and one column in their absence. Therefore if
doubling the size of the table allows us to eliminate collisions, the table
size is unaffected, though the corresponding array of labels we have to put
into the computed_goto instruction we generate has to double as well.
Thus the only cost of such doubling is an increase in "code" size, and
for small tables, the elimination of the open addressing loop may compensate
for this, at least partially.

For lookup string switches, doubling the table size this way has a bigger
space cost, but the elimination of the open addressing loop still brings
a useful speed boost.

We therefore now DO double the table size if this eliminates collisions.
In the library, compiler etc directories, this eliminates collisions in
19 out of 47 string switches that had collisions with the standard table size.
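
The decision can be sketched in C as follows. This is an illustration under
stated assumptions: sketch_hash is a made-up hash function, not one of the
three string hash functions shared by the compiler and the runtime, and the
table size is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hash function, used only for this example. */
unsigned sketch_hash(const char *s)
{
    unsigned h = 0;
    while (*s != '\0') {
        h = h * 31u + (unsigned char) *s;
        s++;
    }
    return h;
}

/* Return 1 if any two keys share a slot in a table of table_size slots. */
int sketch_has_collisions(const char *const *keys, size_t n,
    unsigned table_size)
{
    size_t i, j;

    /* The mask below is only valid for power-of-two sizes. */
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    for (i = 0; i < n; i++) {
        for (j = 0; j < i; j++) {
            unsigned slot_i = sketch_hash(keys[i]) & (table_size - 1);
            unsigned slot_j = sketch_hash(keys[j]) & (table_size - 1);
            if (slot_i == slot_j) {
                return 1;
            }
        }
    }
    return 0;
}

/* Keep the standard size unless doubling it removes every collision. */
unsigned sketch_choose_table_size(const char *const *keys, size_t n,
    unsigned standard_size)
{
    if (sketch_has_collisions(keys, n, standard_size)
        && !sketch_has_collisions(keys, n, 2 * standard_size))
    {
        return 2 * standard_size;
    }
    return standard_size;
}
```

With this hash, the keys "a" and "e" collide in a 4-slot table but not in an
8-slot one, so the sketch picks 8; "a" and "i" collide at both sizes, so
doubling buys nothing and the size stays at 4.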

compiler/switch_util.m:
	Replace the separate sets of predicates we used to have for computing
	hash maps (one for lookup switches and one for non-lookup switches)
	with a single set that works for both.

	Change this set to double the table size if this eliminates collisions.
	This requires it to decide the table size, a task previously done
	separately by each of its callers.

	One version of this set had an old bug, which caused it to effectively
	ignore the second and third string hash functions. This diff fixes it.

	There were two bugs in my previous diff: the unneeded table column
	was not being optimized away from several_soln lookup switches, and the
	lookup code for one_soln lookup switches used the wrong column offset.
	This diff fixes these too.

	Since doubling the table size requires recalculating all the hash
	values, decouple the computation of the hash values from generating
	code for each switch arm, since the latter shouldn't be done more than
	once.

	Add a note on an old problem.

compiler/ml_string_switch.m:
compiler/string_switch.m:
	Bring the code for generating code for the arms of string switches
	here from switch_util.m.

tests/hard_coded/Mmakefile:
	Fix the reason why the bugs mentioned above were not detected:
	the relevant test cases weren't enabled.

tests/hard_coded/string_hash.m:
	Update this test case to test the correspondence of the compiler's
	and the runtime's versions of not just the first hash function,
	but also the second and third.

runtime/mercury_string.h:
	Fix a typo in a comment.
2011-08-02 00:05:44 +00:00
Zoltan Somogyi
065a440492 Simplify some code.
Estimated hours taken: 0.1
Branches: main

compiler/ml_string_switch.m:
	Simplify some code.
2011-07-27 01:16:15 +00:00
Zoltan Somogyi
fe566dbf42 When doing hash table lookup as part of the implementation of switches on
Estimated hours taken: 8
Branches: main

When doing hash table lookup as part of the implementation of switches on
strings, we use open addressing to handle collisions. However, if the chosen
hash function does not yield any collisions for the strings in the switch arms,
then open addressing is unnecessary: if a lookup does not find the string bound
to the switch variable in its home bucket, it won't be in the hash table
at all.

This diff optimizes such cases, by not generating for them the loop we would
otherwise use for open addressing, and optimizing away the table column
telling that loop where to check next.
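
The difference between the two lookup schemes can be sketched roughly as
follows. This is an illustrative Python sketch, not the code the compiler
generates: all names are hypothetical, simple_hash is a stand-in hash
function, and the two-column layout (string, next slot) is an assumption made
for the example.

```python
# Illustrative sketch only; every name and the table layout are hypothetical.

def simple_hash(s):
    """A deterministic stand-in for the chosen string hash function."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h

def build_table(strings, size, hash_fn):
    """Build a two-column table: each used slot holds (string, next_slot).

    The first string hashing to a slot goes into that home slot; later ones
    go into slots that are nobody's home and are linked via next_slot
    (-1 ends a chain). Returns (table, num_collisions).
    """
    assert size >= len(strings)
    buckets = {}
    for s in strings:
        buckets.setdefault(hash_fn(s) % size, []).append(s)
    free = [i for i in range(size) if i not in buckets]
    table = [None] * size
    collisions = 0
    for home, members in buckets.items():
        slot = home
        for i, s in enumerate(members):
            if i == len(members) - 1:
                nxt = -1
            else:
                nxt = free.pop()
                collisions += 1
            table[slot] = (s, nxt)
            slot = nxt
    return table, collisions

def lookup_with_chain(table, key, hash_fn):
    """General case: follow the next_slot column until a match or chain end."""
    slot = hash_fn(key) % len(table)
    while slot != -1 and table[slot] is not None:
        s, nxt = table[slot]
        if s == key:
            return slot
        slot = nxt
    return -1

def strip_next_column(table):
    """With zero collisions, the next_slot column carries no information."""
    return [entry[0] if entry is not None else None for entry in table]

def lookup_home_only(single_column_table, key, hash_fn):
    """Collision-free case: one probe of the home slot, no loop."""
    slot = hash_fn(key) % len(single_column_table)
    return slot if single_column_table[slot] == key else -1
```

When build_table reports zero collisions, strip_next_column plus
lookup_home_only give the same answers as lookup_with_chain, with one column
and one comparison per lookup.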

compiler/string_switch.m:
	Implement the above optimization both for ordinary switches on strings,
	and for lookup table switches (both one_soln and several_soln) on
	strings.

compiler/ml_string_switch.m:
	Implement the above optimization for ordinary switches on strings.
	This module does not (yet) implement lookup table switches on strings.

compiler/switch_util.m:
	When deciding what hash function to use, return the number of
	collisions for string_switch and ml_string_switch to use.

	Rename the other_switch category to float_switch, since the only
	type category it covers is switches on floats.

compiler/switch_gen.m:
compiler/ml_switch_gen.m:
	Make the module header comments more organized, and use the same
	template for both, so one can see the differences more easily.

	Put the switch arms for the smart indexing methods into the same
	order in both files.

	Fix an old problem in ml_switch_gen.m: the test to see whether we can
	apply a smart indexing method that uses switches on integers was
	testing not the availability of int switches in the target, but
	the availability of computed gotos. While ml_simplify_switch
	would transform the int-switch-using code to computed-goto-using
	code or an if-then-else chain in *some* cases, it would not do so
	in *all* cases.

	In ml_switch_gen.m, remove a test that could not succeed, and
	a procedure that was used only in that test.

	Conform to the changes in switch_util.m.

compiler/lookup_switch.m:
compiler/ml_simplify_switch.m:
	Update comments.
2011-07-26 00:25:22 +00:00
Zoltan Somogyi
295415090e Convert almost all remaining modules in the compiler to use
Estimated hours taken: 6
Branches: main

compiler/*.m:
	Convert almost all remaining modules in the compiler to use
	"$module, $pred" instead of "this_file" in error messages.

	In a few cases, the old error message was misleading, since it
	contained an incorrect, out-of-date or cut-and-pasted predicate name.

tests/invalid/unresolved_overloading.err_exp:
	Update an expected output containing an updated error message.
2011-05-23 05:08:24 +00:00
Julien Fischer
9f68c330f0 Change the argument order of many of the predicates in the map, bimap, and
Branches: main

Change the argument order of many of the predicates in the map, bimap, and
multi_map modules so they are more conducive to the use of state variable
notation, i.e. make the order the same as in the sv* modules.

Prepare for the deprecation of the sv{bimap,map,multi_map} modules by
removing their use throughout the system.

library/bimap.m:
library/map.m:
library/multi_map.m:
	As above.

NEWS:
	Announce the change.

	Separate out the "highlights" from the "detailed listing" for
	the post-11.01 NEWS.

	Reorganise the announcement of the Unicode support.

benchmarks/*/*.m:
browser/*.m:
compiler/*.m:
deep_profiler/*.m:
extras/*/*.m:
mdbcomp/*.m:
profiler/*.m:
tests/*/*.m:
ssdb/*.m:
samples/*/*.m:
slice/*.m:
	Conform to the above change.

	Remove any dependencies on the sv{bimap,map,multi_map} modules.
2011-05-03 04:35:04 +00:00
Zoltan Somogyi
022b559584 Make error messages for require_complete_switch scopes report the missing
Estimated hours taken: 8
Branches: main

Make error messages for require_complete_switch scopes report the missing
functors.

Knowing which functors are missing requires knowing not only the set of
functors in the switched-on variable's type, but also which of these functors
have been eliminated by earlier tests, which requires having the instmap at
the point of entry to the switch. Simplification, which initially detected
unmet require_complete_switch requirements, does not have the instmap, and
threading the instmap through it would make it significantly less efficient.
So instead we now detect any problems with require_complete_switch scopes
(and require_detism scopes, which are similar) during determinism checking.
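
The computation of the missing functors can be sketched roughly as follows.
This is an illustrative Python sketch, not the Mercury code in det_report.m:
the function name and its three parameters are hypothetical, and the instmap
is reduced to a plain set of functors still possible at entry to the switch.

```python
# Illustrative sketch only; the name and parameters are hypothetical.

def missing_functors(type_functors, possible_at_entry, arm_functors):
    """Return the functors the switch fails to cover, in declaration order.

    type_functors:     all functors of the switched-on variable's type.
    possible_at_entry: functors not eliminated by earlier tests
                       (the information the instmap provides).
    arm_functors:      functors handled by the arms of the switch.
    """
    return [
        f for f in type_functors
        if f in possible_at_entry and f not in arm_functors
    ]
```

For example, if a type has the functors red, green, blue and black, an
earlier test has eliminated red, and the switch has an arm only for green,
then the functors to report are blue and black; red is not reported.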

compiler/det_report.m:
	Factor out the code for finding the missing functors in conventional
	determinism errors, to allow it to be used for this new purpose.

	Check whether the requirements of require_complete_switch and
	require_detism scopes are met IF the predicate has any such scopes.

compiler/det_analysis.m:
compiler/det_util.m:
	Record whether the predicate has any such scopes.

compiler/hlds_pred.m:
	Add a predicate marker that allows this recording.

compiler/simplify.m:
	Delete the code that checks the require_complete_switch and
	require_detism scopes. Keep the code that deletes those scopes.
	(We have to do that here because determinism error reporting
	never updates the goal.)

compiler/prog_out.m:
	Delete an unused predicate.

compiler/*.m:
	Remove unnecessary imports as flagged by --warn-unused-imports.
2011-01-02 14:38:08 +00:00
Zoltan Somogyi
8a28e40c9b Add the predicates sorry, unexpected and expect to library/error.m.
Estimated hours taken: 2
Branches: main

Add the predicates sorry, unexpected and expect to library/error.m.

compiler/compiler_util.m:
library/error.m:
	Move the predicates sorry, unexpected and expect from compiler_util
	to error.

	Put the predicates in error.m into the same order as their
	declarations.

compiler/*.m:
	Change imports as needed.

compiler/lp.m:
compiler/lp_rational.m:
	Change imports as needed, and make some minor cleanups.

deep_profiler/*.m:
	Switch to using the new library predicates, instead of calling error
	directly. Some other minor cleanups.

NEWS:
	Mention the new predicates in the standard library.
2010-12-15 06:30:36 +00:00