
<!--
vim: ts=4 sw=4 expandtab ft=html
-->

<html>
<head>
<title>C Coding Standard for the Mercury Project</title>
</head>

<body>

<h1>C Coding Standard for the Mercury Project</h1>

These coding guidelines are presented in the briefest manner possible
and therefore do not include rationales.
<p>
Because the coding standard has been kept deliberately brief,
there are some items missing
that would be included in a more comprehensive standard.
For more on commonsense C programming, consult
the <a href="ftp://ftp.cs.toronto.edu/doc/programming/ihstyle.ps">
Indian Hill C coding standard </a>
or the <a href="https://c-faq.com">comp.lang.c FAQ</a>.
<p>

<h2>1. File organization</h2>

<h3>1.1. Modules and interfaces</h3>

We impose a discipline on C to allow us to emulate (poorly)
the modules of languages such as Ada and Modula-3.

<ul>
<li>
Every .c file has a corresponding .h file with the same basename.
For example, list.c and list.h.
<li>
We consider the .c file to be the module's implementation
and the .h file to be the module's interface.
We will just use the terms `source file' and `header'.
<li>
All items exported from a source file must be declared in the header.
These items include functions, variables, #defines, typedefs, enums, structs,
and so on.
Such declarations must not allocate storage,
so qualify each variable declaration with the `extern' keyword.
(If a variable declaration in a header file lacks the `extern',
the C compiler will consider the declaration to be a definition as well,
and will therefore allocate storage for the variable
in every source file that includes that header file.)
Likewise, function prototypes in header files should have the `extern' keyword.
In their case, this is not required
to distinguish declaration from definition
(the absence of the function body will do that);
it is just for improving readability.
<li>
We import a module by including its header.
Never give extern declarations for imported functions in source files;
always include the header of the module instead.
<li>
Each header must #include any other headers on which it depends.
Hence it is imperative that every header file
be protected against multiple inclusion.
Also, take care to avoid circular dependencies.
<li>
Always include system headers using the angle brackets syntax,
rather than double quotes.
That is
<font color="#0000ff"><tt>#include &lt;stdio.h&gt;</tt></font>.
<li>
Mercury-specific headers should be included using the double quotes syntax.
That is
<font color="#0000ff"><tt>#include "mercury_module.h"</tt></font>.
<li>
Do not put root-relative or `..'-relative directories in #includes.
</ul>
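<p>
As a rough illustration of the rules above,
here is a minimal sketch of a hypothetical module named ex_counter.
The module, its EX_ prefix, and every name in it are assumptions
made up for this example; none of them exist in the Mercury sources.
The header and the source file are shown in one listing,
so the #include of the module's own header appears only as a comment.

```c
// ---- ex_counter.h: the interface of a hypothetical module ----
#ifndef EX_COUNTER_H
#define EX_COUNTER_H

// Declarations only: `extern' stops the header from allocating storage.
extern int  EX_counter_total;
extern void EX_counter_add(int amount);
extern int  EX_counter_get(void);

#endif // not EX_COUNTER_H

// ---- ex_counter.c: the implementation ----
#include <assert.h>
// A real source file would also have: #include "ex_counter.h"

// The one and only definition of the exported variable.
int EX_counter_total = 0;

// Add `amount', which must not be negative, to the running total.
void
EX_counter_add(int amount) {
    assert(amount >= 0);
    EX_counter_total += amount;
}

// Return the running total.
int
EX_counter_get(void) {
    return EX_counter_total;
}
```

A client of this module would include ex_counter.h
rather than writing its own extern declarations.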

<h3>1.2. Organization within a file</h3>

<h4>1.2.1. Source files</h4>

Items in source files should in general be in this order:
<ul>
<li>
A prologue comment containing the following.
    <ul>
    <li>
    Vim tag line: "vim: ts=4 sw=4 expandtab ft=c"
    <li>
    Copyright notice.
    <li>
    Licence information (e.g. GPL or LGPL).
    <li>
    A short description of the purpose of the module.
    </ul>
<li>
Comments about the module in general
that would be helpful for readers and/or maintainers of the module,
such as any design principles governing the module,
its main data structures, and so on.
This may include
    <ul>
    <li>
    design approaches that have been considered but rejected
    (including <em>why</em> they were rejected),
    <li>
    any limitations and/or bugs that would be nice to fix
    but have not been fixed yet, and
    <li>
    possible future extensions.
    </ul>
<li>
#includes of system header files (such as stdio.h and unistd.h)
<li>
#includes of header files specific to this project.
But note that for technical reasons,
<font color="#0000ff">mercury_imp.h</font>
must be the first #include.
<li>
Any local #defines.
<li>
Definitions of any global variables defined in the module,
whether exported or not.
<li>
Prototypes for any non-exported functions.
(The prototypes for any exported functions should go in the header file.)
<li>
The definitions of all the functions.
</ul>

Within each section,
items should generally be listed in top-down order, not bottom-up.
That is, if foo() calls bar(),
then the definition of foo() should precede the definition of bar().
(An exception to this rule is functions that are explicitly declared inline;
in that case, the definition should precede the call,
to make it easier for the C compiler
to perform the desired inlining.)
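<p>
The ordering described above might look roughly like this
in a small source file.
This is only a sketch: the file name ex_report.c and all identifiers
are hypothetical, and the prologue comment and project-specific #includes
are left out to keep it self-contained.

```c
// Hypothetical source file ex_report.c.

// System headers.
#include <assert.h>
#include <stddef.h>

// Local #defines.
#define EX_MAX_SCORE    100

// Global variables defined in this module.
static int  ex_num_reports = 0;

// Prototypes for non-exported functions.
static int  ex_clamp_score(int score);

// Function definitions, top-down: the caller precedes its callee.

// Return the sum of the first `count' scores,
// counting each score as at most EX_MAX_SCORE.
int
EX_report_total(const int *scores, size_t count) {
    int     total = 0;
    size_t  i;

    assert(scores != NULL || count == 0);
    for (i = 0; i < count; i++) {
        total += ex_clamp_score(scores[i]);
    }
    ex_num_reports++;
    return total;
}

static int
ex_clamp_score(int score) {
    if (score > EX_MAX_SCORE) {
        return EX_MAX_SCORE;
    } else {
        return score;
    }
}
```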

<h4>1.2.2. Header files</h4>

Items in headers should in general be in this order:
<ul>
<li>
A prologue comment containing
the same information as for source files (see above).
<li>
Any exported type definitions,
whether typedefs, or the definitions of structs, unions, or enums.
<li>
Extern declarations of any global variables exported by the module.
<li>
Extern declarations specifying the prototypes
of any functions exported by the module.
<li>
Any exported #define macros.
</ul>

However, it is probably more important
to group items which are conceptually related
than to strictly follow this order.
<p>
Also note that #defines which
either define configuration macros used for conditional compilation,
or define constants that are used for array sizes,
will need to come before the code that uses them.
But in general,
configuration macros should be isolated in separate files
(e.g. runtime/mercury_conf.h.in and runtime/mercury_conf_param.h),
and fixed-length limits should be avoided,
so those cases should not arise often.
<p>

Every header should be protected against multiple inclusion
using the following idiom:
<font color="#0000ff">
<pre>
#ifndef MODULE_H
#define MODULE_H

... body of module.h ...

#endif // not MODULE_H
</pre>
</font>

<h2>2. Comments</h2>

<h3>2.1. What should be commented</h3>

<h4>2.1.1. Functions</h4>

Each function should have a one-sentence description of what it does.
<ul>
<li>
The comment should describe both the inputs and outputs,
including outputs for which the caller passes a pointer.
<li>
The comment should describe
any side-effects not passing through the explicit inputs and outputs.
<li>
If the function allocates any memory,
the comment should specify
whether that memory should ever be deallocated
(some, e.g. interned constant string literals, should live forever),
and if so, who is responsible for deallocation.
<li>
If the function updates memory that is not related to its argument list,
by updating either global variables or function-static data, mention it.
</ul>
<p>
Note: memory allocation for C code
that must interface with Mercury code or the Mercury runtime
should be done using the routines defined and documented in
mercury/runtime/mercury_memory.h and/or mercury/runtime/mercury_heap.h,
according to the documentation in those files,
in mercury/trace/README,
and in the Mercury Language Reference Manual.
<p>

Such function comments should be present in header files
for each function exported from a source file.
Ideally, a client of the module should not have to look at the implementation,
only the interface.
In C terminology, the header should suffice
for working out how an exported function works.
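<p>
A function comment covering the points listed above
could look something like the following sketch.
The function EX_join_path is hypothetical.
It uses plain malloc() only to keep the example self-contained;
code that interfaces with Mercury should use
the allocation routines mentioned in the note above instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// EX_join_path(dir, file, len_ptr):
//
// Return a newly allocated string containing `dir', a '/', and `file'.
// Inputs: `dir' and `file' are NUL-terminated strings; neither is modified.
// Outputs: the return value, and the length of the result (excluding
// the terminating NUL), which is stored in *len_ptr.
// Returns NULL, leaving *len_ptr untouched, if memory allocation fails.
// Side effects: none; no global or function-static data is updated.
// Memory: the result is allocated with malloc(), and the caller
// is responsible for releasing it with free().

extern char *EX_join_path(const char *dir, const char *file,
                size_t *len_ptr);

char *
EX_join_path(const char *dir, const char *file, size_t *len_ptr) {
    size_t  dir_len;
    size_t  file_len;
    char    *result;

    assert(dir != NULL && file != NULL && len_ptr != NULL);
    dir_len = strlen(dir);
    file_len = strlen(file);
    result = malloc(dir_len + 1 + file_len + 1);
    if (result == NULL) {
        return NULL;
    }
    memcpy(result, dir, dir_len);
    result[dir_len] = '/';
    // Copy the terminating NUL of `file' as well.
    memcpy(result + dir_len + 1, file, file_len + 1);
    *len_ptr = dir_len + 1 + file_len;
    return result;
}
```

In a real module, the comment and the extern declaration
would live in the header, and the definition in the source file.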

<h4>2.1.2. Macros</h4>

Each non-trivial macro should be documented just as for functions (see above).
It is also a good idea to document
the types of macro arguments and return values,
e.g. by including a function declaration in a comment.
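<p>
For instance, a macro might be documented along these lines.
The macro EX_max and the function EX_largest are hypothetical names
invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

// EX_max(a, b):
//
// Expands to the larger of `a' and `b'. It behaves like the function
//
//      int EX_max(int a, int b);
//
// except that each argument may be evaluated more than once,
// so the arguments must not have side effects.

#define EX_max(a, b)    (((a) > (b)) ? (a) : (b))

// EX_largest(items, count):
//
// Return the largest of the first `count' elements of `items'.
// `count' must be at least 1. Neither argument is modified.
int
EX_largest(const int *items, int count) {
    int     largest;
    int     i;

    assert(items != NULL && count > 0);
    largest = items[0];
    for (i = 1; i < count; i++) {
        largest = EX_max(largest, items[i]);
    }
    return largest;
}
```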

<h4>2.1.3. Global variables</h4>

Any global variable should be excruciatingly documented;
it should be crystal clear what every one of its possible values would mean.
This is especially true when globals are exported from a module.
In general, there are very few circumstances that justify use of a global.

<h3>2.2. Comment style</h3>

New comments should use this form:
<font color="#0000ff">
<pre>
    // Here is a comment.
    // And here is some more comment.
</pre>
</font>
Older comments had this form:
<font color="#0000ff">
<pre>
    /*
    ** Here is a comment.
    ** And here's some more comment.
    */
</pre>
</font>
New annotations to a single line of code should use this form:
<font color="#0000ff">
<pre>
    i += 3; // Here is a comment about this line of code.
</pre>
</font>
Older annotations had this form:
<font color="#0000ff">
<pre>
    i += 3; /* Here's a comment about this line of code. */
</pre>
</font>

<h3>2.3. Guidelines for comments</h3>

<h4>2.3.1. Revisits</h4>

Any code that needs to be revisited because it is a temporary hack
(or some other expediency) must have a comment of the form:
<font color="#0000ff">
<pre>
    // XXX: &lt;reason for revisit&gt;
</pre>
</font>

The &lt;reason for revisit&gt; should explain the problem
in a way that can be understood by developers
other than the author of the comment.

<h4>2.3.2. Comments on preprocessor statements</h4>

The <tt>#ifdef</tt> constructs should be commented like so
if they extend for more than a few lines of code:
<font color="#0000ff">
<pre>
#ifdef SOME_VAR
    ...
#else   // not SOME_VAR
    ...
#endif  // not SOME_VAR
</pre>
</font>

Similarly for
<font color="#0000ff"><tt>#ifndef</tt></font>.
<p>
Use the GNU convention of comments that indicate whether
the variable is true in the #if and #else parts of an #ifdef or #ifndef.
For instance:
<font color="#0000ff">
<pre>
#ifdef SOME_VAR
    ...
#endif // SOME_VAR

#ifdef SOME_VAR
    ...
#else  // not SOME_VAR
    ...
#endif // not SOME_VAR

#ifndef SOME_VAR
    ...
#else  // SOME_VAR
    ...
#endif // SOME_VAR
</pre>
</font>

<h2>3. Declarations</h2>

<h3>3.1. Pointer declarations</h3>

Attach the pointer qualifier to the variable name.
<font color="#0000ff">
<pre>
    char    *str1, *str2;
</pre>
</font>

<h3>3.2. Static and extern declarations</h3>

Limit module exports to the absolute essentials.
Make as much static (that is, local) as possible,
since this keeps interfaces to modules simpler.

<h3>3.3. Typedefs</h3>

Use typedefs to make code self-documenting.
They are especially useful on structs, unions, and enums.

<h2>4. Naming conventions</h2>

<h3>4.1. Functions, function-like macros, and variables</h3>

Apart from the package prefix (see section 4.5),
use all lowercase with underscores to separate words.
For instance, <tt>MR_soul_machine</tt>.

<h3>4.2. Enumeration constants, #define constants, and non-function-like macros</h3>

Use all uppercase with underscores to separate words.
For instance, <tt>ML_MAX_HEADROOM</tt>.

<h3>4.3. Typedefs</h3>

Other than the MR_ prefix, we use CamelCase
for type names composed of more than one word.
This means that each word has uppercase only for its first letter,
and that the boundaries between successive words
are indicated only by the change in capitalization.
For instance, <tt>MR_DirectoryEntry</tt>.

<h3>4.4. Structs and unions</h3>

If something is both a struct and a typedef,
the name for the struct should be formed
by appending `_Struct' to the typedef name:
<font color="#0000ff">
<pre>
    typedef struct MR_DirectoryEntry_Struct {
        ...
    } MR_DirectoryEntry;
</pre>
</font>

For unions, append `_Union' to the typedef name.

<h3>4.5. Mercury specifics</h3>

Every symbol that is externally visible (i.e. declared in a header file)
should be prefixed with a prefix
that is specific to the package that it comes from.

For anything exported from mercury/runtime, prefix it with MR_.
For anything exported from mercury/library, prefix it with ML_.

<h2>5. Syntax and layout</h2>

<h3>5.1. Minutiae</h3>

<ul>
<li>
Never use any tabs.
<li>
Use four-space indentation.
<li>
No line should be longer than 79 characters.
<li>
If a statement is too long,
continue it on the next line <em>indented two levels deeper</em>.
If the statement extends over more than two lines,
then make sure the subsequent lines are indented to the same depth
as the second line.
For example:
<font color="#0000ff">
<pre>
    here = is_a_really_long_statement_that_does_not_fit +
            on_one_line + in_fact_it_doesnt_even_fit +
            on_two_lines;

    if (this_is_a_somewhat_long_conditional_test(
            in_the_condition_of_an +
            if_then))
    {
        ...
    }
</pre>
</font>
</ul>

<h3>5.2. Statements</h3>

Use one statement per line.

What follows are example layout styles for the various syntactic constructs.

<h4>5.2.1. If statement</h4>

Always put braces around the then-part,
and the else-part if it exists,
even if they contain only a single statement.

If an if-then-else statement is longer than a page,
consider whether it can be shortened
by moving some of its code to named functions.
If you decide against this,
then add an "// end if" comment at the end.

<font color="#0000ff">
<pre>
// Curlies are placed according to the K&amp;R one true brace style.
// And comments look like this.
if (blah) {
    // Always use curlies, even when there is only one statement
    // in the block.
} else {
    ...
} // end if

// If the condition is so long that the open curly doesn't fit
// on the same line as the `if', put it on a line of its own.
if (a_very_long_condition() &amp;&amp;
    another_long_condition_that_forces_a_line_wrap())
{
    ...
}
</pre>
</font>

<h4>5.2.2. Functions</h4>

Function names are flush against the left margin.
This makes it easier to grep for function definitions
(as opposed to their invocations).
In argument lists, put a space after each comma.
And if the function is longer than a page,
add a comment naming the function, such as <tt>// end func()</tt>,
after the curly that ends its definition.

<font color="#0000ff">
<pre>
int
rhododendron(int a, float b, double c) {
    ...
} // end rhododendron()
</pre>
</font>

<h4>5.2.3. Variables</h4>

Variable names in variable declarations and definitions
shouldn't be flush left, however;
they should be preceded by the type.
<font color="#0000ff">
<pre>
int x = 0, y = 3, z;

int a[] = {
    1,2,3,4,5
};
</pre>
</font>

<h4>5.2.4. Switches</h4>

<font color="#0000ff">
<pre>
switch (blah) {
    case BLAH1:
        ...
        break;
    case BLAH2: {
        int i;

        ...
        break;
    }
    default:
        ...
        break;
} // switch
</pre>
</font>

<h4>5.2.5. Structs, unions, and enums</h4>

<font color="#0000ff">
<pre>
struct Point {
    int     tag;
    union   cool {
        int     ival;
        double  dval;
    } cool;
};
enum Stuff {
    STUFF_A, STUFF_B ...
};
</pre>
</font>

<h4>5.2.6. Loops</h4>

<font color="#0000ff">
<pre>
while (stuff) {
    ...
}

do {
    ...
} while (stuff);

for (this; that; those) {
    // Always use curlies, even if there is no body.
}

// If no body, do this...
while (stuff) {
    // Do nothing.
}
for (this; that; those) {
    // Do nothing.
}
</pre>
</font>

<h3>5.3. Preprocessing</h3>

<h4>5.3.1. Nesting</h4>

Nested #ifdefs, #ifndefs and #ifs should be indented
by two spaces for each level of nesting.
For example:

<font color="#0000ff">
<pre>
#ifdef GUAVA
  #ifndef PAPAYA
  #else  // PAPAYA
  #endif // PAPAYA
#else  // not GUAVA
#endif // not GUAVA
</pre>
</font>

<h2>6. Portability</h2>

<h3>6.1. Architecture specifics</h3>

Avoid relying on properties of a specific machine architecture
unless necessary, and if necessary localise such dependencies.
One solution is to have architecture-specific macros
to hide access to machine-dependent code.

Some machine-specific properties are:
<ul>
<li>
Size (in bits) of C builtin datatypes (short, int, long, float, double).
<li>
Byte-order. Big- or little-endian (or other).
<li>
Alignment requirements.
</ul>
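<p>
One way to localise a byte-order dependency
is to hide it behind a small function.
The sketch below uses a hypothetical function named EX_read_be32.
It assumes only what the C standard guarantees,
namely that unsigned long has at least 32 bits.

```c
#include <assert.h>
#include <stddef.h>

// EX_read_be32(bytes):
//
// Decode the four bytes at `bytes', stored most significant byte first,
// and return their value. `bytes' is not modified.
// Shifts operate on values rather than on memory layout,
// so the result is the same on big- and little-endian hosts,
// and `bytes' does not need any particular alignment.
unsigned long
EX_read_be32(const unsigned char *bytes) {
    assert(bytes != NULL);
    return ((unsigned long) bytes[0] << 24) |
            ((unsigned long) bytes[1] << 16) |
            ((unsigned long) bytes[2] << 8) |
            (unsigned long) bytes[3];
}
```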

<h3>6.2. Operating system specifics</h3>

Operating system APIs differ from platform to platform.
Although most support standard POSIX calls
such as `read', `write' and `unlink',
you cannot rely on the presence of, for instance,
System V shared memory, or BSD sockets.
<p>
Adhere to POSIX-supported operating system calls whenever possible
since they are widely supported, even by Windows and VMS.
<p>
When POSIX doesn't provide the required functionality,
ensure that the operating system specific calls are localised.

<h3>6.3. Compiler and C library specifics</h3>

ANSI C compilers are now widespread
and hence we needn't pander to old K&amp;R compilers.
However, compilers (in particular the GNU C compiler)
often provide non-ANSI extensions.
Ensure that any use of compiler extensions
is localised and protected by #ifdefs.
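<p>
As a sketch of what localising an extension can look like,
the following wraps the GNU C builtin <tt>__builtin_expect</tt>
in a macro with a portable fallback.
The names EX_likely and EX_percent are hypothetical.

```c
#include <assert.h>

// EX_likely(cond):
//
// Expands to a nonzero value if `cond' is true, and to zero otherwise.
// With GNU C, it also hints that `cond' is usually true.
// This is the only place that mentions the extension.
#ifdef __GNUC__
  #define EX_likely(cond)   __builtin_expect(!!(cond), 1)
#else  // not __GNUC__
  #define EX_likely(cond)   (!!(cond))
#endif // not __GNUC__

// EX_percent(part, whole):
//
// Return `part' as a whole-number percentage of `whole',
// or 0 if `whole' is zero. Neither argument may be negative.
int
EX_percent(int part, int whole) {
    assert(part >= 0 && whole >= 0);
    if (EX_likely(whole != 0)) {
        return (part * 100) / whole;
    } else {
        return 0;
    }
}
```

Code that uses EX_likely compiles unchanged
with compilers that lack the builtin.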
<p>
Don't rely on features whose behaviour is undefined
according to the ANSI C standard.
For that matter, don't rely on C arcana even if they <em>are</em> defined.
For instance, <tt>setjmp/longjmp</tt> and ANSI signals
often have subtle differences in behaviour between platforms.
<p>
If you write threaded code,
make sure any non-reentrant code is appropriately protected
via mutual exclusion.
The biggest cause of non-reentrant (non-threadsafe) code
is function-static data.
Note that some C library functions may be non-reentrant.
This may or may not be documented in their man pages.
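<p>
The difference between the two styles can be sketched as follows.
Both functions are hypothetical.
The first returns a pointer to function-static data
and so is not reentrant;
the second has the caller supply the storage.

```c
#include <assert.h>
#include <stdio.h>

// Not reentrant: every call shares one function-static buffer,
// so a later call overwrites the result of an earlier one.
const char *
EX_format_id_static(int id) {
    static char buf[32];

    // The buffer is assumed to be big enough for any int.
    (void) snprintf(buf, sizeof(buf), "id-%d", id);
    return buf;
}

// Reentrant: the caller supplies the storage, so calls share no state.
// Returns 1 if the whole identifier fitted into `buf', and 0 otherwise.
int
EX_format_id(int id, char *buf, size_t buf_size) {
    int     len;

    assert(buf != NULL && buf_size > 0);
    len = snprintf(buf, buf_size, "id-%d", id);
    return (len >= 0 && (size_t) len < buf_size);
}
```

With the static version, two results cannot be in use at the same time,
even in single-threaded code.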

<h3>6.4. Environment specifics</h3>

This is one of the most important sections in the coding standard.
Here we mention what other tools Mercury depends on.
Mercury <em>must</em> depend on some tools;
however, every tool that is needed to use Mercury
reduces the potential user base.
<p>
Bear this in mind when tempted to add YetAnotherTool<sup>TM</sup>.

<h4>6.4.1. Tools required for Mercury</h4>

In order to run Mercury (given that you have the binary installation),
you need:
<ul>
<li>
A shell compatible with Bourne shell (sh)
<li>
GNU make
<li>
One of:
<ul>
<li>
The GNU C compiler
<li>
Any ANSI C compiler
</ul>
</ul>

In order to build the Mercury compiler, you need the above and also:
<ul>
<li>
gzip
<li>
tar
<li>
Various POSIX utilities: <br>
awk basename cat cp dirname echo egrep expr false fgrep grep head
ln mkdir mv rmdir rm sed sort tail
<li>
Some Unix utilities: <br>
test true uniq xargs
</ul>

<p>

In order to modify and maintain the source code of the Mercury compiler,
you need the above and also:
<ul>
<li>
Perl <font color="#ff0000">XXX: Which version?</font>
<li>
git
<li>
autoconf
<li>
texinfo
<li>
TeX
</ul>

<h4>6.4.2. Documenting the tools</h4>

If further tools are required, you should add them to the above list.
And similarly, if you eliminate dependence on a tool,
remove it from the above list.

<h2>7. Coding specifics</h2>

<ul>

<li>
Do not assume arbitrary limits in data structures.
Don't just allocate `lots' and hope that's enough.
Either it is too much,
or it will eventually hit the wall and have to be debugged.
Using highwater-marking is one possible solution for strings, for instance.

<li>
Always check return values when they exist, even those of malloc and realloc.

<li>
Always give prototypes (function declarations) for functions.
When the prototype is in a header, import the header;
do not write the prototype for an extern function.

<li>
Stick to ANSI C whenever possible.
Stick to POSIX when ANSI doesn't provide what you need.
Avoid platform specific code unless necessary.

<li>
Use signals with extreme austerity.
They are messy and subject to platform idiosyncrasies even within POSIX.

<li>
Don't assume the sizes of C data types.
Don't assume the byte order of the platform.

<li>
Prefer enums to lists of #defines.
Note that enum constants are of type int,
hence if you want an enumeration of chars or shorts,
then you must use lists of #defines.

<li>
Parameters to macros should be in parentheses.
<font color="#0000ff">
<pre>
    #define STREQ(s1, s2)   (strcmp((s1), (s2)) == 0)
</pre>
</font>

</ul>
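<p>
The first two points above can be combined
in a buffer that grows on demand instead of assuming a fixed limit.
The following is only a sketch of one possible approach:
the type EX_StringBuf and the function EX_stringbuf_append are hypothetical,
and the starting size of 16 bytes and the doubling policy
are arbitrary choices.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// A hypothetical growable string. `max' is the high-water mark,
// i.e. the number of bytes currently allocated; it grows on demand.
typedef struct EX_StringBuf_Struct {
    char    *data;      // NUL-terminated contents, or NULL if empty.
    size_t  len;        // Length of the contents, excluding the NUL.
    size_t  max;        // Number of bytes allocated for `data'.
} EX_StringBuf;

// EX_stringbuf_append(buf, str):
//
// Append `str' to `buf', growing the allocation if necessary.
// Returns 1 on success. Returns 0, leaving `buf' unchanged,
// if memory cannot be allocated.
// The caller eventually releases buf->data with free().
int
EX_stringbuf_append(EX_StringBuf *buf, const char *str) {
    size_t  str_len;
    size_t  needed;
    size_t  new_max;
    char    *new_data;

    assert(buf != NULL && str != NULL);
    str_len = strlen(str);
    needed = buf->len + str_len + 1;
    if (needed > buf->max) {
        new_max = (buf->max == 0) ? 16 : buf->max;
        while (new_max < needed) {
            new_max *= 2;
        }
        // Check realloc's result before overwriting the old pointer.
        new_data = realloc(buf->data, new_max);
        if (new_data == NULL) {
            return 0;
        }
        buf->data = new_data;
        buf->max = new_max;
    }
    // Copy the terminating NUL of `str' as well.
    memcpy(buf->data + buf->len, str, str_len + 1);
    buf->len += str_len;
    return 1;
}
```

An EX_StringBuf starts out as <tt>{ NULL, 0, 0 }</tt>,
so no size has to be guessed in advance.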

<hr>
<p>
Note: This coding standard
is an amalgam of suggestions from the entire Mercury team,
not necessarily the opinion of any single author.
<p>
Comments?
See our <a href="https://www.mercurylang.org/contact.html">contact</a> page.

</body>
</html>