mercury/bytecode/Bytecode-doc


Summary of types
----------------

	- byte
		unsigned char 0-255
	- cstring
		Sequence of non-zero bytes terminated by zero-byte.
		XXX: May change this later to allow embedded
		zero-bytes in strings.
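	- short
		2 bytes interpreted as an unsigned short.
		MSB is first byte read.
		XXX: `unsigned' conflicts with 2's complement, and
		endof_disjunct uses -1 as a label id. The signedness
		of short needs to be settled.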
	- int
		4 bytes interpreted as a signed int.
		MSB read first.
		2's complement.
	- float
		XXX: Not yet supported, but presumably:
		4 bytes interpreted as float.
		MSB read first.
		Must be IEEE float format.
	- list of T
		- contiguous sequence of T
	- determinism
		one byte interpreted as follows
			- 0 = det
			- 1 = semidet
			- 2 = multidet
			- 3 = nondet
			- 4 = cc_multidet
			- 5 = cc_nondet
			- 6 = erroneous
			- 7 = failure
	- tag is one of:
		% XXX Need explanation of all these.
		- 0 (byte) (simple tag)
			- primary (byte)
		- 1 (byte) (complicated tag)
			- primary (byte)
			- secondary (int)
		- 2 (byte) (complicated constant tag)
			- primary (byte)
			- secondary (int)
		- 3 (byte) (enum tag)
			Enumeration of pure constants.
		- 4 (byte) (no_tag)
	- cons_id (constructor id) is one of:
		% Note that not all of these alternatives are
		% meaningful in all bytecodes that have arguments of
		% type cons_id. XXX: Specify exactly which cases
		% are meaningful.
		- 0 (byte) (cons)
			- functor name (cstring)
			- arity (short)
			- tag (tag)
		- 1 (byte) (int const)
			- integer constant (int)
		- 2 (byte) (string const)
			- string constant (cstring) XXX: no '\0' in strings!
		- 3 (byte) (float const)
			- float constant (float) XXX: not yet supported
		- 4 (byte) (pred const)
			- module id (cstring)
			- predicate id (cstring)
			- arity (short)
			- procedure id (byte)
		- 5 (byte) (code addr const)
			- module id (cstring)
			- predicate id (cstring)
			- arity (short)
			- procedure id (byte)
		- 6 (byte) (base type info const)
			- module id (cstring)
			- type name (cstring)
			- type arity (byte)
	- op_arg (argument to an operator) is one of:
		- 0 (byte)
			- variable slot (short)
		- 1 (byte)
			- integer constant (int)
		- 2 (byte)
			- float constant (float) XXX: not yet supported
	- dir (direction of information movement in general unification)
	  is one of:
			- 0 (byte) to_arg
			- 1 (byte) to_var
			- 2 (byte) to_none
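
% Sketch: one way to read the basic types above from a byte buffer.
% Reader, read_determinism and read_tag are hypothetical helpers, not
% part of the bytecode format. Shorts are decoded as unsigned here,
% which is an assumption, and enum tags are read without a payload
% because none is listed above.

```python
import struct

DETERMINISM = ["det", "semidet", "multidet", "nondet",
               "cc_multidet", "cc_nondet", "erroneous", "failure"]

class Reader:
    """Cursor over a bytes object (hypothetical helper)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read_byte(self):
        value = self.data[self.pos]
        self.pos += 1
        return value

    def read_short(self):
        # 2 bytes, MSB first. Decoded as unsigned (assumption).
        (value,) = struct.unpack_from(">H", self.data, self.pos)
        self.pos += 2
        return value

    def read_int(self):
        # 4 bytes, MSB first, signed 2's complement.
        (value,) = struct.unpack_from(">i", self.data, self.pos)
        self.pos += 4
        return value

    def read_cstring(self):
        # Non-zero bytes up to, but not including, the zero byte.
        end = self.data.index(0, self.pos)
        value = self.data[self.pos:end]
        self.pos = end + 1
        return value

def read_determinism(r):
    return DETERMINISM[r.read_byte()]

def read_tag(r):
    kind = r.read_byte()
    if kind == 0:
        return ("simple", r.read_byte())
    if kind in (1, 2):
        name = "complicated" if kind == 1 else "complicated_constant"
        primary = r.read_byte()
        secondary = r.read_int()
        return (name, primary, secondary)
    if kind == 3:
        return ("enum",)    # no payload is listed for enum tags
    if kind == 4:
        return ("no_tag",)
    raise ValueError("unknown tag alternative %d" % kind)
```

% Strings are returned as raw bytes, since no character encoding is
% specified for cstring.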


Summary of Bytecodes
--------------------

% Note: Currently we specify only the static layout of bytecodes.
% We also need to specify the operational semantics of the bytecodes,
% which can be done by specifying state transitions on the abstract
% machine. That is, to specify the meaning of a bytecode, we simply
% say how the state of the abstract machine has changed from before
% interpreting the bytecode to after interpreting the bytecode.

- enter_pred (0)
	- predicate name (cstring)
	- number of procedures in predicate (short)

- endof_pred (1)

- enter_proc (2)
	- procedure id (byte)	XXX: should use short instead?
		procedure id is used to distinguish the procedures
		in a predicate.
	- determinism of the procedure (determinism)
	- label count (short)
		Number of labels in the procedure. Used for allocating a
		table of labels in the interpreter.
	- temp count (short)
		Number of temporary variables needed for this procedure. (?)
	- length of list (short)
		Number of items in next arg
	- list of
		- Variable info (cstring)
		XXX: we should also have typeinfo for each variable.
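
% Sketch: decoding an enter_proc record with the layout above.
% It assumes the bytecode number (2) is written as a single byte in
% front of the arguments and that shorts are unsigned; neither is
% stated in this document. decode_enter_proc is a hypothetical name.

```python
import struct

ENTER_PROC = 2
HEADER = ">BBBHHH"  # opcode, proc id, determinism, labels, temps, var count

def read_cstring(data, pos):
    # Returns the string bytes and the position after the zero byte.
    end = data.index(0, pos)
    return data[pos:end], end + 1

def decode_enter_proc(data, pos=0):
    opcode, proc_id, det, labels, temps, nvars = struct.unpack_from(
        HEADER, data, pos)
    if opcode != ENTER_PROC:
        raise ValueError("not an enter_proc bytecode")
    pos += struct.calcsize(HEADER)
    var_infos = []
    for _ in range(nvars):
        info, pos = read_cstring(data, pos)
        var_infos.append(info)
    record = {
        "proc_id": proc_id,
        "determinism": det,
        "label_count": labels,
        "temp_count": temps,
        "var_infos": var_infos,
    }
    return record, pos
```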

- endof_proc (3)

- label (4)
	- code label (short)
		Used for jumps, switches, if-then-else, etc.
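
% Sketch: filling the per-procedure table of labels that the label
% count in enter_proc allows the interpreter to allocate. The
% instruction list of (name, args...) tuples is a hypothetical
% decoded form, and label ids are assumed to be indices below the
% label count.

```python
def build_label_table(instructions, label_count):
    # table[label id] = index of the label bytecode in the procedure.
    table = [None] * label_count
    for index, (name, *args) in enumerate(instructions):
        if name == "label":
            table[args[0]] = index
    return table
```

% A jump to label id L can then continue at table[L].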

- enter_disjunction (5)
	- label id (short)
		Label refers to the label immediately after the disjunction.

- endof_disjunction (6)

- enter_disjunct (7)
	- label id (short)
		Label refers to label for next disjunct.

- endof_disjunct (8)
	- label id (short)
		Label refers to label for next disjunct. (?)
		Is -1 if there is no next disjunct in this disjunction.

- enter_switch (9)
	- variable in slots on which we are switching (short)
	- label immediately after the switch (short)
		Label refers to the label immediately after the
		corresponding endof_switch. We jump to this label
		after we've performed the switch.

- endof_switch (10)

- enter_switch_arm (11)
	- constructor id (cons_id)
	- label id (short)
		label refers to label for next switch arm.

- endof_switch_arm (12)
	- label id (short)
		Label id refers to label immediately before next switch arm.
		(?)

- enter_if (13)
	- else label id (short)
	- follow label id (short)
		label refers to label at endof_if
		Note that we must've pushed a failure context
		before entering the enter_if. If the condition
		fails, we follow the failure context.
	- frame pointer temp (short)
		XXX: hmm... dunno..


- enter_then (14)
	- frame pointer temp (short)
		XXX: what's this for?
	XXX: should have flag here? [I wrote this note in a meeting.
	What in hell did I mean?]

- endof_then (15) XXX: enter_else is a better name.
	- follow label (short)
		XXX: label just before endof_if ???

- endof_if (16)

- enter_negation (17)
	- label id (short)
		label refers to label at endof_negation.
		Note: As with if-then-else, we must push a failure
		context just before entering enter_negation. If the
		negation fails, we follow the failure context.

- endof_negation (18)

- enter_commit (19)
	- temp (short)
		XXX: what's this for?
	% XXX: how does this work?

- endof_commit (20)
	- temp (short)
		XXX: what's this for?

- assign (21)
	- Variable A in slots (short)
	- Variable B in slots (short)
	A := B. Copy contents of slot B to slot A.

- test (22)
	- Variable A in slots (short)
	- Variable B in slots (short)
	Used to test atomic values (int, float, etc.). Before entering
	test, a failure context must be pushed. If the test fails,
	the failure context is followed.
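
% Sketch: assign and test written as state transitions on a toy
% abstract machine. The Machine type is hypothetical. Modelling a
% failure context as a stack of resume addresses, and test as an
% equality check, are both assumptions; neither is specified here.

```python
from dataclasses import dataclass, field

@dataclass
class Machine:
    slots: list
    pc: int = 0
    # Resume addresses pushed by earlier bytecodes (assumed model).
    failure_stack: list = field(default_factory=list)

def do_assign(m, a, b):
    # A := B, then continue with the next bytecode.
    m.slots[a] = m.slots[b]
    m.pc += 1

def do_test(m, a, b):
    if m.slots[a] == m.slots[b]:
        m.pc += 1                      # success: fall through
    else:
        m.pc = m.failure_stack.pop()   # failure: follow the context
```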


- construct (23)
	- variable slot (short)
	- constructor id (cons_id)
	- list length of next arg (short)
	- list of:
		- variable slot (short)
	Apply constructor to list of arguments (in list of variable slots)
	and store result in a variable slot.
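
% Sketch: the bytes of a construct bytecode for a functor with a
% simple tag. It assumes the bytecode number (23) is written as one
% byte in front of the arguments. encode_simple_cons and
% encode_construct are hypothetical names.

```python
import struct

CONSTRUCT = 23

def encode_simple_cons(name, arity, primary):
    # cons_id alternative 0 (cons) carrying tag alternative 0 (simple).
    return (bytes([0]) + name + b"\0" + struct.pack(">H", arity)
            + bytes([0, primary]))

def encode_construct(result_slot, cons_id, arg_slots):
    out = bytes([CONSTRUCT]) + struct.pack(">H", result_slot) + cons_id
    out += struct.pack(">H", len(arg_slots))   # list length of next arg
    for slot in arg_slots:
        out += struct.pack(">H", slot)
    return out
```

% For example, encode_construct(5, encode_simple_cons(b"f", 2, 1), [3, 4])
% describes storing f(slot 3, slot 4) in slot 5.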

- deconstruct (24)
	- variable slot Var (short)
	- constructor id (cons_id)
	- list length of next arg (short)
	- list of:
		- variable slot (short)
	If cons_id is:
		- a functor applied to some args, then remove functor
			and put args into variable slots.
		- an integer constant, then check for equality of
			the constant and the value in the variable slot
		- a float constant, then check for equality of
			the constant and the value in the variable slot.
		- anything else, then it makes no sense and the
			interpreter should raise an error. XXX: correct?
	Note: We must push a failure context before entering deconstruct.
		If the deconstruct fails (i.e. functor of Var isn't
		the same as cons_id, or ints are not equal, or floats are
		not equal), then we must follow the failure context.
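
% Sketch: the case analysis above on a toy term representation.
% Terms are (functor name, argument list) tuples and cons_ids are
% small tuples; both are hypothetical stand-ins. Raising Fail stands
% in for following the failure context.

```python
class Fail(Exception):
    """Stands in for following the failure context."""

def deconstruct(slots, var, cons_id, arg_slots):
    kind = cons_id[0]
    value = slots[var]
    if kind == "cons":
        _, name, arity = cons_id
        matches = (isinstance(value, tuple) and value[0] == name
                   and len(value[1]) == arity)
        if not matches:
            raise Fail()
        # Remove the functor and put the args into the variable slots.
        for slot, arg in zip(arg_slots, value[1]):
            slots[slot] = arg
    elif kind in ("int_const", "float_const"):
        if value != cons_id[1]:
            raise Fail()
    else:
        # Any other cons_id makes no sense for deconstruct.
        raise ValueError("cons_id not meaningful here: %r" % (kind,))
```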

- complex_construct (25)
	- var (short)
	- cons id (cons_id)
	- list length (short)
	- list of:
		- var (short)
		- direction (dir)

	This is used for general unification using partially instantiated
	terms. This is made possible by bromage's aliasing work.

- complex_deconstruct (26)
	- variable slot (short)
	- constructor id (cons_id)
	- list length of next arg (short)
	- list of
		- variable slot (short)
		- direction (dir)
	Note: This is a generalised deconstruct. The directions specify
	which way bindings move. XXX: This is still not 100% crystal clear.

- place_arg (27)
	- register number (byte)
		XXX: Do we have at most 256 registers?
	- variable number (short)
	Move argument from variable slot to register.
	(Note: See notes for pickup_arg.)
	XXX: We will need to #include imp.h from the Mercury runtime,
	since this specifies the usage of registers. For example, we
	need to know whether we're using the compact or non-compact
	register allocation method for parameter passing. (The compact
	method reuses input registers as output registers. In the
	non-compact mode, input and output registers are distinct.)

- pickup_arg (28)
	- register number (byte)
	- variable number in variable slots (short)
	Move argument from register to variable slot.
	(Note: We currently don't make use of floating-point registers.
	The datatype for pickup_arg in the bytecode generator allows
	for distinguishing register `types', that is floating-point
	register or normal registers. We may later want to spit out
	another byte `r' or `f' to identify the type of register.)
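
% Sketch: place_arg and pickup_arg as moves between a register file
% and the variable slots. The list-based register file is a
% hypothetical model and only covers normal (non-floating-point)
% registers.

```python
def place_arg(registers, slots, reg, var):
    # Variable slot -> register, e.g. before a call.
    registers[reg] = slots[var]

def pickup_arg(registers, slots, reg, var):
    # Register -> variable slot, e.g. on procedure entry or after a call.
    slots[var] = registers[reg]
```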


- call (29)
	- module id (cstring)
	- predicate id (cstring)
	- arity (short)
	- procedure id (byte)
	XXX: If we call a Mercury builtin, the module name is `mercury_builtin'.
	What if the user has a module called mercury_builtin?
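
% Sketch: writing and reading a call bytecode with the layout above.
% It assumes the bytecode number (29) is written as one byte in front
% of the arguments and that shorts are unsigned. encode_call and
% decode_call are hypothetical names.

```python
import struct

CALL = 29

def encode_call(module, pred, arity, proc_id):
    return (bytes([CALL]) + module + b"\0" + pred + b"\0"
            + struct.pack(">H", arity) + bytes([proc_id]))

def decode_call(data, pos=0):
    if data[pos] != CALL:
        raise ValueError("not a call bytecode")
    pos += 1
    end = data.index(0, pos)          # module id (cstring)
    module, pos = data[pos:end], end + 1
    end = data.index(0, pos)          # predicate id (cstring)
    pred, pos = data[pos:end], end + 1
    arity, proc_id = struct.unpack_from(">HB", data, pos)
    return (module, pred, arity, proc_id), pos + 3
```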

- higher_order_call (30)
	- var (short)
	- input variable count (short)
	- output variable count (short)
	- determinism (determinism)

- builtin_binop (31)
	- binary operator (byte)
		This single byte is an index into a table of binary
		operators.
	- argument to binary operator (op_arg)
	- another argument to binary operator (op_arg)
	- variable slot which receives result of binary operation (short)
	XXX: Floating point operations must be distinguished from
	int operations. In the interpreter, we should use a lookup table
	that maps bytes to the operations.
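
% Sketch: the lookup table suggested above. The operator numbering
% in BINOPS is invented for illustration; the real table of binary
% operators is not given in this document. Each op_arg is modelled
% as a (kind, payload) pair following the op_arg type.

```python
import operator

# Hypothetical numbering, for illustration only.
BINOPS = [operator.add, operator.sub, operator.mul]

def eval_op_arg(slots, arg):
    kind, payload = arg
    if kind == 0:
        return slots[payload]     # variable slot
    if kind == 1:
        return payload            # integer constant
    raise NotImplementedError("float constants are not yet supported")

def builtin_binop(slots, op_index, left, right, result_slot):
    fn = BINOPS[op_index]
    slots[result_slot] = fn(eval_op_arg(slots, left),
                            eval_op_arg(slots, right))
```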

- builtin_unop (32)
	- unary operator (byte)
		An index into a table of unary operators.
	- argument to unary operator (op_arg)
	- variable slot which receives result of unary operation (short)

- builtin_bintest (33)
	- binary operator (byte)
		An index into a table of binary operators.
	- argument to binary test (op_arg)
	- another argument to binary test (op_arg)
	Note we must first push a choice point which we may follow should
	the test fail.

- builtin_untest (34)
	- unary operator (byte)
		An index into a table of unary operators.
	- argument to unary operator (op_arg)
	Note we must first push a choice point which we may follow should
	the test fail.

- semidet_succeed (35)

- semidet_success_check (36)

- fail (37)

- context (38)
	- line number in Mercury source that the current bytecode
		line corresponds to. (short)
	XXX: Still not clear how we should implement `step' in a debugger
	since a single context may have other contexts interleaved in it.

- not_supported (39)
	Some unsupported feature is used. Inline C in Mercury code,
	for instance. Any procedure that contains inline C
	(or is compiled Mercury?) must have the format:
		enter_pred ...
		not_supported
		endof_pred
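
% Sketch: the byte sequence for the format above, assuming each
% bytecode number is written as a single byte in front of its
% arguments. encode_unsupported_pred is a hypothetical name.

```python
import struct

ENTER_PRED, ENDOF_PRED, NOT_SUPPORTED = 0, 1, 39

def encode_unsupported_pred(name, proc_count):
    # enter_pred: predicate name (cstring), number of procedures (short)
    enter = bytes([ENTER_PRED]) + name + b"\0" + struct.pack(">H", proc_count)
    return enter + bytes([NOT_SUPPORTED]) + bytes([ENDOF_PRED])
```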