mercury/compiler/notes/coding_standards.html

<!--
vim: ts=4 sw=4 expandtab ft=html
-->

<html>
<head>
<title>Mercury Coding Standard for the Mercury Project</title>
</head>

<body>

<h1>Mercury Coding Standard for the Mercury Project</h1>

<h2>Documentation</h2>

<p>
Each module should contain header comments
which state the module's name, main author(s), and purpose,
and give an overview of what the module does,
what are the major algorithms and data structures it uses, etc.
<p>
Everything that is exported from a module
should have sufficient documentation
that it can be understood without reference
to the module's implementation section.
<p>
Each procedure that is implemented using foreign code
should have sufficient documentation about its interface
that it can be implemented just by referring to that documentation,
without reference to the module's implementation section.
<p>
Each predicate other than trivial access predicates
should have a short comment describing what the predicate is supposed to do,
and what the meaning of the arguments is.
Ideally this description should also note any conditions
under which the predicate can fail or throw an exception.
<p>
There should be a comment for each field of a structure
saying what the field represents.
<p>
Any user-visible changes such as new compiler options or new features
should be documented in the appropriate section(s) of the Mercury documentation
(usually the Mercury User's Guide and/or the Mercury Reference Manual).
Any major new features should be documented in the NEWS file,
as should even small changes to the library interface,
or anything else that might cause anyone's existing code to break.
<p>
Any new compiler modules or other major design changes
should be documented in <code>compiler/notes/compiler_design.html</code>.
<p>
Any feature which is
left in an incompletely implemented form for a nontrivial length of time
should be mentioned in <code>compiler/notes/work_in_progress.html</code>.

<h2>Naming</h2>

<p>
Variables should always be given meaningful names,
unless they are irrelevant to the code in question.
<p>
Different states or different versions of the same entity
should be referred to using state variable notation,
although if there are only a few versions,
and their number is not likely to grow,
they may be named Foo0, Foo1, Foo2, ..., Foo.
<p>
Predicates which get or set a field of a structure or ADT
should be named typename_get_fieldname and typename_set_fieldname respectively.

<h2>Coding</h2>

<p>
Your code should reuse existing code as much possible, generalizing if needed.
"cut-and-paste" style reuse is highly discouraged.
<p>
Your code should be efficient.
Performance is a quite serious issue for the Mercury compiler.
<p>
No fixed limits please!
(If you really must have a fixed limit,
include detailed documentation explaining why it was so hard to avoid.)
<p>
Don't use DCG notation for new code.
<p>
Use state variables for threading state, such as the IO state.
The conventional IO state variable name is <code>!IO</code>.

<h2> Error handling </h2>

<p>
Code should check for both erroneous inputs from the user
and also invalid data being passed from other parts of the Mercury compiler.
You should also always check to make sure that
the routines that you call have succeeded;
make sure you don't silently ignore failures.
(This last point almost goes without saying in Mercury,
but is particularly important to bear in mind
if you are writing any C code or shell scripts,
or if you are interfacing with the OS.)
<p>
Calls to error/1, directly or indirectly,
should always indicate an internal software error,
not merely incorrect inputs from the user,
or failure of some library routine or system call.
In the compiler, do not call error/1 directly;
call it indirectly using unexpected/2 or sorry/2 from compiler_util.m.
Likewise, use expect/3 from compiler_util.m rather than require/2.
<p>
Error messages should be constructed using the facilities in error_util.m.
Compiler error messages should be complete sentences;
they should start with a capital letter and end in a full stop.
If including a full explanation
of the meaning of an error and/or of its likely causes
would be too tedious for everyday use
(probably because the error is frequent enough that
most non-novice users of Mercury would be expected to be familiar with it)
you should include that extra information
in a verbose_only part of the error message,
which causes it to be printed if the --verbose-errors option is set.
<p>
Error messages from the runtime system
should begin with the text "Mercury Runtime:",
preferably by using the MR_fatal_error() routine.
<p>
If a system call or C library function that sets errno fails,
the error message should be printed with perror()
or should contain MR_strerror(errno, errbuf, sizeof(errbuf)).
If it was a function manipulating some file,
the error message should include the filename.

<h2>Input / output</h2>

All new code in the compiler that reads from a stream or writes to a stream
should explicitly pass the stream being accessed
to the predicates that do the reading or writing.
Do not depend on the library's record
of what the current input and output streams are;
do not read from the current input stream,
or write to the current output stream.

<h2>Layout</h2>

<p>
Each module should be indented consistently,
with 4 spaces per level of indentation.
The indentation should be done using only spaces;
there should be no tabs.
The only exception should be Makefiles,
which should use one tab per level of indentation.
<p>
All files other than Makefiles
should have something like this at the top,
even before the copyright line:
<pre>
	% vim: ts=4 sw=4 expandtab ft=mercury
</pre>

Makefiles should have something like this instead:
<pre>
	% vim: ts=8 sw=8 noexpandtab
</pre>

<p>
No line should extend beyond 79 characters.
The reason we don't allow 80 character lines is that
these lines wrap around in diffs,
since diff adds an extra character at the start of each line.
<p>
Since "empty" lines that have spaces or tabs on them
prevent the proper functioning of paragraph-oriented commands in vi,
lines shouldn't have trailing white space.
They can be removed with a vi macro such as the following.
(Each pair of square brackets contains a space and a tab.)

<pre>
	map ;x :g/[     ][      ]*$/s///^M
</pre>

<p>
String literals that don't fit on a single line should be split
by writing them as two or more strings concatenated using the "++" operator;
the compiler will evaluate this at compile time,
if --optimize-constant-propagation is enabled (i.e. at -O3 or higher).
<p>
Predicates that have only one mode should use predmode declarations
rather than having a separate mode declaration.
<p>
If-then-elses should always be parenthesized,
except when an if-then-else that occurs as the else part
of another if-then-else doesn't need to be parenthesized.
We prefer the
<code>( if <i>C</i> then <i>T</i> else <i>E</i> )</code> syntax
over the
<code>( <i>C</i> -&gt; <i>T</i> ; <i>E</i> )</code> syntax.
<p>
The condition can either be on the same line as the '<code>if</code>' keyword:

<pre>
    ( if test1 then
        goal1
    else if test2 then
        goal2
    else
        goal
    )
</pre>

or, if the test is complicated, on separate line(s) between the
'<code>if</code>' and "<code>then</code>' keywords,

<pre>
    ( if
        very_long_test_that_does_not_fit_on_one_line(VeryLongArgument1,
            VeryLongArgument2)
    then
        goal1
    else if
        test2a,
        test2b
    then
        goal2
    else if
        test3    % would fit one one line, but separate for consistency
    then
        goal3
    else
        goal
    ).
</pre>

<!--
When using the
<code>( <i>C</i> -&gt; <i>T</i> ; <i>E</i> )</code> syntax,
the condition can either be
on the same line as the opening parenthesis and the
`<code>-&gt;</code>',

<pre>
	( test1 ->
		goal1
	; test2 ->
		goal2
	;
		goal
	)
</pre>

or, if the test is complicated, it can be on a line of its own:

<pre>
	(
		very_long_test_that_does_not_fit_on_one_line(VeryLongArgument1,
			VeryLongArgument2)
	->
		goal1
	;
		test2a,
		test2b
	->
		goal2
	;
		test3	% would fit one one line, but separate for consistency
	->
		goal3
	;
		goal
	).
</pre>
-->

<p>
Disjunctions should always be parenthesized.
The semicolon of a disjunction should never be at the end of a line
-- put it at the start of the next line instead.
<p>
Predicates and functions implemented via foreign code
should be formatted like this:

<pre>
    :- pragma foreign_proc("C",
        int.to_float(IntVal::in, FloatVal::out),
        [will_not_call_mercury, promise_pure],
    "
        FloatVal = IntVal;
    ").
</pre>

The predicate name and arguments should be on a line on their own,
as should the list of annotations.
The foreign code should also be on lines of its own;
it shouldn't share lines with the double quote marks surrounding it.
<p>
Type definitions should be formatted in one of the following styles:

<pre>
	:- type my_type
		--->	my_type(
                    % comment explaining field
                    some_other_type
                ).

	:- type some_other_type == int.

	:- type foo
		--->	bar(
                    int,		% short comment explaining field
                    float		% short comment explaining field
                )
        ;	    baz
        ;	    quux.

	:- type complicated
		--->	complicated_f(
                    % long comment explaining first field
                    f1_arg_name     :: int,

                    % long comment explaining second and third fields
                    % this is ok if they are related to each other
                    f2_arg_name     :: float,
                    f3_arg_name     :: float
                ).
</pre>

<p>
If an individual clause is long, it should be broken into sections,
and each section should have a "block comment" describing what it does;
blank lines should be used to show the separation into sections.
Comments should precede the code to which they apply, rather than following it.

<pre>
	% This is a block comment; it applies to the code in the next
	% section (up to the next blank line, or the next block comment).

	blah,
	blah,
	blahblah,
	blah,
</pre>

If a particular line or two needs explanation, a "line" comment

<pre>
	% This is a "line" comment; it applies to the next line or two
	% of code
	blahblah
</pre>

or an "inline" comment

<pre>
	blahblah	% This is an "inline" comment
</pre>

should be used.

<h2>Structuring</h2>

Modules should generally be arranged so that
their procedures, types, insts etc.
are listed in top-down order, not bottom-up.
<p>
Code should be grouped into bunches of related predicates, functions, etc.,
and sections of code that are conceptually separate
should be separated with dashed lines:

<pre>
%---------------------------------------------------------------------------%
</pre>

Ideally such sections should be identified
by "section heading" comments identifying the contents of the section,
optionally followed by a more detailed description.
These should be laid out like this:

<pre>
%---------------------------------------------------------------------------%
%
% Section title.
%

% Detailed description of the contents of the section and/or
% general comments about the contents of the section.
% This part may go one for several lines.
%
% It can even contain several paragraphs.

The actual code starts here.
</pre>

For example

<pre>
%---------------------------------------------------------------------------%
%
% Exception handling.
%

% This section contains all the code that deals with throwing or catching
% exceptions, including saving and restoring the virtual machine registers
% if necessary.
%
% Note that we need to take care to ensure that this code is thread-safe!

:- type foo ---&gt; ...
</pre>

Double dashed lines, i.e.

<pre>
%---------------------------------------------------------------------------%
%---------------------------------------------------------------------------%
</pre>

can also be used to indicate divisions into major sections.
Note that these dividing lines should not exceed the 79 character limit
(see above).

The tools/stdlines script will standardize the lengths of these dashed lines.

<h2>Module imports</h2>

Each group of :- import_module items should list only one module per line,
since this makes it much easier to read diffs
that change the set of imported modules.
In the compiler, when e.g. an interface section imports modules
from both the compiler and the standard library,
there should be two groups of imports,
the imports from the compiler first and then the ones from the library.
For the purposes of this rule,
consider the modules of mdbcomp to belong to the compiler.
<p>
Each group of import_module items should be sorted,
since this makes it easier to detect duplicate imports and missing imports.
It also groups together the imported modules from the same package.
There should be no blank lines between
the imports of modules from different packages,
since they would make it harder
to resort the group with a single editor command.

<h2>Standard library predicates</h2>

The descriptive comment for any predicate or function
that occurs in the interface of a standard library module
must be positioned above the predicate or function declaration.
It should be formatted as in the following example:

<pre>
		% Description of predicate foo.
		%
	:- pred foo(...
	:- mode foo(...
</pre>

A group of related predicate, mode and function declarations
may be grouped together under a single description,
provided that it is formatted as above.
If there is a function declaration in such a grouping,
then it should be listed before the others.

For example:

<pre>
		% Insert a new key and corresponding value into a map.
		% Fail if the key already exists.
		%
	:- func insert(map(K, V), K, V) = map(K, V).
	:- pred insert(map(K, V)::in, K::in, V::in, map(K, V)::out) is det.
</pre>

The reason for using this particular style is that
the reference manual for the standard library
is automatically generated from the module interfaces,
and we want to maintain a uniform appearance as much as is possible.
<p>
Avoid module qualification in the interface sections of library modules
except where necessary to resolve ambiguity.
<p>
A predicate or function which throws exceptions under certain conditions
should be described as such, using the phrase "throws an exception".
A predicate or function that may cause a runtime abort
should be described as such, using the phrase "runtime abort".

<h2>Testing</h2>

<p>
Every change should be tested before being committed.
The level of testing required depends on the nature of the change.
If the change does not touch the semantics of the code
(e.g. it modifies comments, renames predicates or adds module qualifications),
then checking whether recompilation succeeds is enough.
If the change replaces an abort with nonaborting code,
then just compiling it and running some tests by hand is sufficient.
For pretty much everything else, where the change might break the compiler,
you should run a bootstrap check (using the `bootcheck' script)
before committing.
<p>
If the change means that old versions of the compiler
will not be able to compile the new version of the compiler,
you must follow the procedures described in
<a href="bootstrapping.html">compiler/notes/bootstrapping.html</a>.
<p>
In addition to testing before a change is committed,
you need to make sure that the code will not get broken in the future.
Every time you add a new feature,
you should add some test cases for that new feature to the test suite.
Every time you fix a bug, you should add a regression test to the test suite.

<h2>Committing changes</h2>

<p>
When committing a nontrivial change,
you should get someone else to review your changes.
If the risk of breakage is low, you can ask for a review post commit.
Otherwise, get a review, and act on its recommendations, before you commit.
<p>

The file <a href="reviews.html">compiler/notes/reviews.html</a>
contains more information on review policy.

</body>
</html>