mercury/runtime/mercury_deconstruct.h
Zoltan Somogyi 86f563a94d Pack subword-sized arguments next to a remote sectag.
compiler/du_type_layout.m:
    If the --allow-packing-remote-sectag option is set, then try to pack
    an initial subsequence of subword-sized arguments next to remote sectags.
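
    As a rough illustration of what packing next to a remote sectag means
    at the word level, here is a minimal C sketch. The bit layout (a 2-bit
    sectag in the low bits, arguments above it) and every name in it are
    assumptions made for this example; they are not the layout that
    du_type_layout.m actually chooses.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical layout for illustration: the remote sectag occupies the
   low SECTAG_BITS bits of the tagword, packed arguments sit above it. */
#define SECTAG_BITS 2u

typedef struct {
    unsigned shift;     /* bit offset of the argument within the tagword */
    unsigned num_bits;  /* width of the argument in bits */
} PackedArgLocn;

static uintptr_t field_mask(unsigned num_bits)
{
    assert(num_bits < sizeof(uintptr_t) * CHAR_BIT);
    return (((uintptr_t) 1) << num_bits) - 1;
}

/* Combine the sectag and the subword-sized arguments into one word. */
static uintptr_t make_tagword(uintptr_t sectag, const PackedArgLocn *locns,
    const uintptr_t *args, int num_packed)
{
    uintptr_t tagword = sectag & field_mask(SECTAG_BITS);

    for (int i = 0; i < num_packed; i++) {
        tagword |= (args[i] & field_mask(locns[i].num_bits))
            << locns[i].shift;
    }
    return tagword;
}

/* Example: f(bool, uint8, int) with sectag 3. The bool and the uint8
   share cell word 0 with the sectag; the int would go in cell word 1. */
uintptr_t example_tagword(void)
{
    const PackedArgLocn locns[2] = { { 2, 1 }, { 3, 8 } };
    const uintptr_t packed_args[2] = { 1, 200 };

    return make_tagword(3, locns, packed_args, 2);
}
```

    In this sketch the cell for f(yes, 200, 42) needs two words instead of
    the four an unpacked representation with a full-word sectag would use.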

    To allow the polymorphism transformation to put the type_infos and/or
    typeclass_infos it adds to a function symbol's argument list at the
    *front* of that argument list, pack arguments next to remote sectags
    only in function symbols that won't have any such extra arguments
    added to them.

    Do not write all new code for the new optimization; instead, generalize
    the code that already does a very similar job for packing args next to
    local sectags.

    Delete the code we used to have that picked the packed representation
    over the base unpacked representation only if it reduced the
    "rounded-to-even" number of words. A case could be made for its usefulness,
    but in the presence of the new optimization the extra code complexity
    it requires is not worth it (in my opinion).

    Extend the code that informs users about possible argument order
    rearrangements that yield better packing to take packing next to sectags
    into account.

compiler/hlds_data.m:
    Provide a representation for cons_tags that use the new optimization.
    Instead of adding a new cons_tag, we do this by replacing several old
    cons_tags that all represent pointers to memory cells with a single
    cons_tag named remote_args_tag with an argument that selects among
    the old cons_tags being replaced, and adding a new alternative inside
    this new type. The new alternative is remote_args_shared with a
    remote_sectag whose size is rsectag_subword(...).

    Instead of representing the value of the "data" field in classes
    on the Java and C# backends as a strange kind of secondary tag
    that is added to a memory cell by a class constructor instead of
    having to be explicitly added to the front of the argument vector
    by the code of a unification, represent it more directly as a separate
    kind of remote_args_tag. Continuing to treat it as a sectag would have
    been very confusing to readers of the code of ml_unify_gen_*.m in the
    presence of the new optimization.

    Replacing several cons_tags that were usually treated similarly with
    one cons_tag simplifies many switches. Instead of a switch that
    branches to the same switch arm for single_functor_tag, unshared_tag
    and shared_remote_tag, and then switches on these three tags again
    to get e.g. the primary tag of each, the new code of the switch arm
    is executed for just one cons_tag value (remote_args_tag), and switches
    on the various kinds of remote args tags only when it needs to.
    It is also more natural to pass around the argument of remote_args_tag
    than to pass around a variable of type cons_tag that can be bound to only
    single_functor_tag, unshared_tag or shared_remote_tag.
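
    The shape of this change can be sketched in C as one outer tag with
    a nested description. Apart from the ideas taken from the text above
    (one remote_args_tag, an argument that selects among the old tags,
    a possibly subword-sized remote sectag), all type, field and function
    names below are hypothetical, as is the choice of 0 as the ptag for
    the kinds that carry no explicit ptag.

```c
#include <assert.h>

typedef enum {
    CONS_LOCAL_CONST,   /* hypothetical stand-in for all non-cell tags */
    CONS_REMOTE_ARGS    /* models remote_args_tag */
} ConsTagKind;

typedef enum {
    REMOTE_ARGS_ONLY_FUNCTOR,   /* role of the old single_functor_tag */
    REMOTE_ARGS_UNSHARED,       /* role of the old unshared_tag */
    REMOTE_ARGS_SHARED,         /* role of the old shared_remote_tag */
    REMOTE_ARGS_CTOR            /* "data" field on the Java/C# backends */
} RemoteArgsKind;

typedef struct {
    RemoteArgsKind kind;
    unsigned ptag;          /* used by UNSHARED and SHARED */
    unsigned sectag;        /* used by SHARED */
    unsigned sectag_bits;   /* SHARED: 0 = full word, else subword width */
    unsigned ctor_data;     /* used by CTOR */
} RemoteArgsTag;

typedef struct {
    ConsTagKind kind;
    RemoteArgsTag remote;   /* valid only if kind == CONS_REMOTE_ARGS */
} ConsTag;

/* The inner switch is needed only by code that cares about the kind. */
unsigned remote_args_ptag(const RemoteArgsTag *remote)
{
    switch (remote->kind) {
    case REMOTE_ARGS_UNSHARED:
    case REMOTE_ARGS_SHARED:
        return remote->ptag;
    case REMOTE_ARGS_ONLY_FUNCTOR:
    case REMOTE_ARGS_CTOR:
        return 0;   /* assumption of this sketch */
    }
    assert(!"unknown RemoteArgsKind");
    return 0;
}

/* The outer switch has a single arm for every memory cell representation. */
int cons_tag_uses_memory_cell(const ConsTag *tag)
{
    switch (tag->kind) {
    case CONS_REMOTE_ARGS:
        return 1;
    case CONS_LOCAL_CONST:
        return 0;
    }
    return 0;
}
```

    Code that only needs to know "is this a pointer to a cell" stops at
    the outer switch; helpers can take a RemoteArgsTag directly instead
    of a ConsTag that is known to be one of three values.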

    Add an XXX about possible further steps along these lines, such as
    making a new cons_tag named something like "user_const_tag" represent
    all user-visible constants.

compiler/unify_gen_construct.m:
compiler/unify_gen_deconstruct.m:
compiler/unify_gen_test.m:
compiler/unify_gen_util.m:
compiler/ml_unify_gen_construct.m:
compiler/ml_unify_gen_deconstruct.m:
compiler/ml_unify_gen_test.m:
compiler/ml_unify_gen_util.m:
    Implement X = f(Yi) unifications where f uses the new representation,
    i.e. some of its arguments are stored next to a remote sectag.

    Some of the Yi are stored in a tagword (a word that also contains a tag,
    in this case the remote secondary tag), while some are stored in other
    words in a memory cell. This means that such unifications have similarities
    both to unifications involving arguments being packed next to local
    sectags, and to unifications involving ordinary arguments in memory cells.
    Therefore wherever possible, their implementation uses suitably generalized
    versions of existing code that did those two jobs for two separate kinds of
    cons_tags.

    Making such generalizations possible in some cases required shifting the
    boundary between predicates, moving work from a caller to a callee
    or vice versa.

    In unify_gen_deconstruct.m, stop using uni_vals to represent *either* a var
    *or* a word in a memory cell. While this enabled us to factor out some
    common code, the predicate boundaries it led to are unsuitable for the
    generalizations we now need.

    Consistently use unsigned ints to represent both the whole and the parts
    of words containing packed arguments (and maybe sectags), except when
    comparing ptag constants with the result of applying the "tag" unop
    to a word (since that unop returns an int, at least for now).
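
    In C terms, the convention looks roughly like the sketch below: the
    whole word and every field extracted from it are unsigned, and only
    the primary tag comes back as an int. The function names and the
    3-bit primary tag width are assumptions of this example, not what
    the code generators emit.

```c
#include <assert.h>
#include <stdint.h>

#define PTAG_BITS 3u    /* hypothetical primary tag width */

/* Extract a packed field; both the word and the result are unsigned,
   so the shift and mask are well defined even if the top bit is set. */
uint64_t extract_field(uint64_t word, unsigned shift, unsigned num_bits)
{
    assert(shift < 64 && num_bits >= 1 && num_bits < 64);
    return (word >> shift) & ((UINT64_C(1) << num_bits) - 1);
}

/* Models a "tag" operation that yields an int, so callers compare
   its result against int ptag constants. */
int ptag_of(uint64_t word)
{
    return (int) (word & ((UINT64_C(1) << PTAG_BITS) - 1));
}
```

    Keeping everything else unsigned sidesteps C's rules for shifting
    signed values, where right-shifting a negative value is
    implementation-defined and left-shifting one is undefined.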

    In a few cases, avoid the recomputation of some information that we
    already know. The motivation is not efficiency, since the recomputation
    we avoid is usually cheap, but the simplification of the code's correctness
    argument.

    Use more consistent terminology in things such as variable names.

    Note the possibility of further future improvements in several places.

compiler/ml_foreign_proc_gen.m:
    Delete a long-unused predicate.

compiler/mlds.m:
    Add an XXX documenting a possible improvement.

compiler/rtti.m:
    Update the compiler's internal representation of RTTI data structures
    to make them able to describe secondary tags that are smaller than
    a full word.

compiler/rtti_out.m:
    Conform to the changes above, and delete a long-unused predicate.

compiler/type_ctor_info.m:
    Use the RTTI's du_hl_rep to represent cons_tags that distinguish
    between function symbols using a field in a class.

compiler/ml_type_gen.m:
    Provide a specialized form of a function for code in ml_unify_gen_*.m.
    Conform to the changes above.

compiler/add_special_pred.m:
compiler/bytecode_gen.m:
compiler/export.m:
compiler/hlds_code_util.m:
compiler/lco.m:
compiler/ml_closure_gen.m:
compiler/ml_switch_gen.m:
compiler/ml_tag_switch.m:
compiler/rtti_to_mlds.m:
compiler/switch_util.m:
compiler/tag_switch.m:
    Conform to the changes above.

runtime/mercury_type_info.h:
    Update the runtime's representation of RTTI data structures to make them
    able to describe remote secondary tags that are smaller than a full word.

runtime/mercury_deconstruct.[ch]:
runtime/mercury_deconstruct.h:
runtime/mercury_deconstruct_macros.h:
runtime/mercury_ml_expand_body.h:
runtime/mercury_ml_arg_body.h:
runtime/mercury_ml_deconstruct_body.h:
runtime/mercury_ml_functor_body.h:
    These modules collectively implement the predicates in deconstruct.m
    in the library, and provide access to its functionality to other C code,
    e.g. in the debugger. Update these to be able to handle terms with the
    new data representation optimization.

    This update requires a significant change in the distribution of work
    between these files for the predicates deconstruct.deconstruct and
    deconstruct.limited_deconstruct. We used to have mercury_ml_expand_body.h
    fill in the fields of their expand_info structures (whose types are
    defined in mercury_deconstruct.h) with pointers to three vectors:
    (a) a vector of arg_locns with one element per argument, with a NULL
    pointer being equivalent to a vector with a given element in every slot;
    (b) a vector of type_infos with one element per argument, constructed
    dynamically (and later freed) if necessary; and (c) a vector of argument
    words. Once upon a time, before double-word and sub-word arguments,
    vector (c) also had one word per argument, but that hasn't been true
    for a while; we added vector (a) to help the consumers of the expand_info
    decode the difference. The consumers of this info always used these
    vectors to build up a Mercury term containing a list of univs,
    with one univ for each argument.

    This structure could be stretched to handle function symbols that store
    *all* their arguments in a tagword next to a local sectag, but I found
    that stretching it to cover function symbols that have *some* of their
    arguments packed next to a remote sectag and *some other* of their
    arguments in a memory cell as usual would have required a well-nigh
    incomprehensibly complex, and therefore almost undebuggable, interface
    between mercury_ml_expand_body.h and the other files above. This diff
    therefore changes the interface to have mercury_ml_expand_body.h
    build the list of univs directly. This makes its code relatively simple
    and self-contained, and it should be somewhat faster than the old code
    as well, since it never needs to allocate, fill in and then free
    vectors of type_infos (each such typeinfo now gets put into a univ
    as soon as it is constructed). The downside is that if we ever wanted
    to get all the arguments at once for a purpose other than constructing
    a list of univs from them, it would nevertheless require constructing
    that list of univs anyway as an intermediate data structure. I don't see
    this downside as significant, because (a) I don't think such a use case
    is very likely, and (b) even if one arises, debuggable but a bit slow
    is probably preferable to faster but very hard to debug.
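
    The new division of work can be pictured with the following
    simplified C sketch, in which the expansion code walks the arguments
    once and conses each one onto the result list as soon as it has been
    decoded, whether it came from the tagword or from an ordinary cell
    word. Univ, UnivList, ArgLocn and the functions are hypothetical
    stand-ins; the real code works with MR_TypeInfo, MR_Word and the
    runtime's RTTI structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    const char *type_name;  /* stand-in for a type_info */
    uintptr_t value;
} Univ;

typedef struct UnivList {
    Univ head;
    struct UnivList *tail;
} UnivList;

typedef struct {
    const char *type_name;
    int in_tagword;         /* nonzero: packed next to the sectag in cell[0] */
    unsigned shift;         /* tagword args: bit offset */
    unsigned num_bits;      /* tagword args: width in bits */
    unsigned cell_offset;   /* other args: word index within the cell */
} ArgLocn;

static UnivList *univ_cons(Univ head, UnivList *tail)
{
    UnivList *node = malloc(sizeof *node);

    if (node == NULL) {
        abort();
    }
    node->head = head;
    node->tail = tail;
    return node;
}

static void univ_list_free(UnivList *list)
{
    while (list != NULL) {
        UnivList *tail = list->tail;
        free(list);
        list = tail;
    }
}

/* Build the list back to front so that it ends up in argument order.
   No intermediate vectors of locations, type_infos or words are needed. */
static UnivList *expand_args(const uintptr_t *cell, const ArgLocn *locns,
    int arity)
{
    UnivList *list = NULL;

    assert(arity >= 0);
    for (int i = arity - 1; i >= 0; i--) {
        Univ univ;

        univ.type_name = locns[i].type_name;
        if (locns[i].in_tagword) {
            uintptr_t mask = (((uintptr_t) 1) << locns[i].num_bits) - 1;
            univ.value = (cell[0] >> locns[i].shift) & mask;
        } else {
            univ.value = cell[locns[i].cell_offset];
        }
        list = univ_cons(univ, list);
    }
    return list;
}

/* Example: f(bool, uint8, int) whose tagword holds a 2-bit sectag. */
uintptr_t example_arg_value(int index)
{
    const ArgLocn locns[3] = {
        { "bool",  1, 2, 1, 0 },
        { "uint8", 1, 3, 8, 0 },
        { "int",   0, 0, 0, 1 }
    };
    const uintptr_t cell[2] = { 0x647, 42 };
    UnivList *list = expand_args(cell, locns, 3);
    UnivList *node = list;
    uintptr_t value;

    assert(index >= 0 && index < 3);
    for (int i = 0; i < index; i++) {
        node = node->tail;
    }
    value = node->head.value;
    univ_list_free(list);
    return value;
}
```

    A consumer that wants a single argument still has to walk the list in
    this sketch, which mirrors the downside described above.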

    Reduce the level of indentation of some of these files to make the code
    easier to edit. Do this by

    - not adding an indent level from switch statements to their cases; and
    - not adding an indent level when a case in a switch has a local block.

    Move the break or return ending a case inside that case's block,
    if it has one.
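
    One plausible reading of these layout rules is sketched below on a
    made-up function. The exact brace placement used in the runtime may
    differ; the example only shows cases at the same indent level as
    their switch, block bodies that are not indented further, and the
    return placed inside the block.

```c
#include <assert.h>

typedef enum { SHAPE_POINT, SHAPE_SQUARE, SHAPE_RECT } ShapeKind;

int shape_area(ShapeKind kind, int width, int height)
{
    switch (kind) {

    case SHAPE_POINT:
        return 0;

    case SHAPE_SQUARE:
        {
        /* Local block: its body stays at the level of the braces. */
        int side = width;
        return side * side;     /* the return lives inside the block */
        }

    case SHAPE_RECT:
        {
        int area = width * height;
        if (area < 0) {
            area = -area;
        }
        return area;
        }

    }

    assert(!"unknown ShapeKind");
    return -1;
}
```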

runtime/mercury_deep_copy_body.h:
runtime/mercury_table_type_body.h:
    Update these to enable the copying or tabling of terms whose
    representations use the new optimization.

    Use the techniques listed above to reduce the level of indentation
    to make the code easier to edit.

runtime/mercury_tabling.c:
runtime/mercury_term_size.c:
    Conform to the changes above.

runtime/mercury_unify_compare_body.h:
    Make this code compile after the changes above. It does not need to work
    correctly, since we only ever used this code to compare the speed
    of unify-by-rtti with the speed of unify-by-compiler-generated-code,
    and in real life, we always use the latter. (It hasn't been updated
    to work right with previous arg packing changes either.)

library/construct.m:
    Update to enable the code to construct terms whose representations
    use the new optimization.

    Add some sanity checks.

library/private_builtin.m:
runtime/mercury_dotnet.cs.in:
java/runtime/Sectag_Locn.java:
    Update the list of possible sectag kinds.

library/store.m:
    Conform to the changes above.

trace/mercury_trace_vars.c:
    Conform to the changes above.

tests/hard_coded/deconstruct_arg.{m,exp,exp2}:
    Extend this test to test the deconstruction of terms whose
    representations use the new optimization.

    Modify some of the existing terms being tested to make them more diverse,
    in order to make the output easier to navigate.

tests/hard_coded/construct_packed.{m,exp}:
    A new test case to test the construction of terms whose
    representations use the new optimization.

tests/debugger/browse_packed.{m,exp}:
    A new test case to test access to the fields of terms whose
    representations use the new optimization.

tests/tabling/test_packed.{m,exp}:
    A new test case to test the tabling of terms whose
    representations use the new optimization.

tests/debugger/Mmakefile:
tests/hard_coded/Mmakefile:
tests/tabling/Mmakefile:
    Enable the new test cases.
2018-08-30 05:14:38 +10:00


// vim: ts=4 sw=4 expandtab ft=c
// Copyright (C) 2002, 2005, 2007, 2011 The University of Melbourne.
// Copyright (C) 2015-2016, 2018 The Mercury team.
// This file is distributed under the terms specified in COPYING.LIB.
// mercury_deconstruct.h
//
// This file declares utility functions for deconstructing terms,
// for use by the standard library and the debugger.
#ifndef MERCURY_DECONSTRUCT_H
#define MERCURY_DECONSTRUCT_H
#include "mercury_imp.h"
#include <stdio.h>
typedef struct {
    int             arity;
    int             functor_number;
    MR_ConstString  functor;
    MR_Word         arg_univs_list;
} MR_ExpandFunctorArgsInfo;

typedef struct {
    int             arity;
    int             functor_number;
    MR_ConstString  functor;
    MR_Word         arg_univs_list;
    MR_bool         limit_reached;
} MR_ExpandFunctorArgsLimitInfo;

typedef struct {
    int             arity;
    int             functor_number;
    MR_ConstString  functor_only;
} MR_ExpandFunctorOnlyInfo;

typedef struct {
    int             arity;
    MR_Word         arg_univs_list;
} MR_ExpandArgsOnlyInfo;

typedef struct {
    int             arity;
    MR_bool         chosen_index_exists;
    MR_TypeInfo     chosen_arg_type_info;
    MR_Word         chosen_arg_term;
    MR_Word         *chosen_arg_word_sized_ptr;
} MR_ExpandChosenArgOnlyInfo;
// MR_NONCANON_ABORT asks that deconstructions of noncanonical types should
// cause a runtime abort.
//
// MR_NONCANON_ALLOW asks that deconstructions of noncanonical types should
// return a constant that indicates this fact.
//
// MR_NONCANON_CC asks that deconstruction of noncanonical types should
// deconstruct the term as if it were canonical. Since by definition,
// noncanonical types may have more than one representation for the same value,
// this requires the caller to be in a committed choice context.
typedef enum {
    MR_NONCANON_ABORT,
    MR_NONCANON_ALLOW,
    MR_NONCANON_CC
} MR_noncanon_handling;
// The MR_expand_* functions do the heavy lifting in the implementation
// of the main predicates of library/deconstruct.m. Given a term and its type,
// they can find out and return the name and the arity of the term's top
// function symbol, a list of the values of all its arguments, or the value
// of just one argument, chosen by index or by field name. Each variant
// returns some subset of this information. The subset is given by the fields
// of the MR_Expand* type referred to by its last argument.
// XXX In each of these functions, the name of the argument that represents
// the whole term is
//
// - Term in deconstruct.m in the library,
// - term_ptr in mercury_deconstruct.c, and
// - data_word_ptr here and in mercury_ml_expand_body.h.
//
// It would be nice if the terminology was consistent.
extern void MR_expand_functor_args(MR_TypeInfo type_info,
                MR_Word *data_word_ptr, MR_noncanon_handling noncanon,
                MR_ExpandFunctorArgsInfo *expand_info);
extern void MR_expand_functor_args_limit(MR_TypeInfo type_info,
                MR_Word *data_word_ptr, MR_noncanon_handling noncanon,
                int max_arity,
                MR_ExpandFunctorArgsLimitInfo *expand_info);
extern void MR_expand_functor_only(MR_TypeInfo type_info,
                MR_Word *data_word_ptr, MR_noncanon_handling noncanon,
                MR_ExpandFunctorOnlyInfo *expand_info);
extern void MR_expand_args_only(MR_TypeInfo type_info,
                MR_Word *data_word_ptr, MR_noncanon_handling noncanon,
                MR_ExpandArgsOnlyInfo *expand_info);
extern void MR_expand_chosen_arg_only(MR_TypeInfo type_info,
                MR_Word *data_word_ptr, MR_noncanon_handling noncanon,
                int chosen, MR_ExpandChosenArgOnlyInfo *expand_info);
extern void MR_expand_named_arg_only(MR_TypeInfo type_info,
                MR_Word *data_word_ptr, MR_noncanon_handling noncanon,
                MR_ConstString chosen_name,
                MR_ExpandChosenArgOnlyInfo *expand_info);
// MR_arg() takes the address of a term, its type, and an argument position
// (the first argument being at position 1), as well as an indication
// of the desired `canonicality' of the decomposition.
//
// If the given term has an argument at the specified position, MR_arg returns
// MR_TRUE, and fills in *arg_type_info_ptr and *arg_term_ptr with the
// type_info and value of the argument at the selected position. It also
// fills in *word_sized_arg_ptr with the address of the argument
// if the argument's size is exactly one word, or with NULL if the size
// is anything else (double word, subword, or nothing for dummies).
//
// If the given term does not have an argument at the specified position,
// MR_arg fails, i.e. it returns MR_FALSE.
//
// You need to wrap MR_{save/restore}_transient_hp() around
// calls to this function.
extern MR_bool MR_arg(MR_TypeInfo type_info, MR_Word *term,
                MR_noncanon_handling noncanon, int arg_index,
                MR_TypeInfo *arg_type_info_ptr, MR_Word *arg_term_ptr,
                MR_Word **word_sized_arg_ptr);
// MR_named_arg() is just like MR_arg, except the argument is selected by name,
// not by position.
//
// You need to wrap MR_{save/restore}_transient_hp() around
// calls to this function.
extern MR_bool MR_named_arg(MR_TypeInfo type_info, MR_Word *term,
                MR_noncanon_handling noncanon, MR_ConstString arg_name,
                MR_TypeInfo *arg_type_info_ptr, MR_Word *arg_term_ptr,
                MR_Word **word_sized_arg_ptr);
// MR_named_arg_num() takes the address of a term, its type, and an argument
// name. If the given term has an argument with the given name, it succeeds,
// i.e. it returns MR_TRUE, and stores the argument number (counted starting
// from 0) of that argument in *arg_num_ptr.
// If it doesn't, it fails, i.e. returns MR_FALSE.
//
// You need to wrap MR_{save/restore}_transient_hp() around
// calls to this function.
extern MR_bool MR_named_arg_num(MR_TypeInfo type_info, MR_Word *term_ptr,
                const char *arg_name, int *arg_num_ptr);
#endif // MERCURY_DECONSTRUCT_H