mercury/analysis/README

-----------------------------------------------------------------------------
| Copyright (C) 2003, 2006, 2008 The University of Melbourne.
| This file may only be copied under the terms of the GNU General
| Public License - see the file COPYING in the Mercury distribution.
-----------------------------------------------------------------------------

This directory formerly contained an implementation of the inter-module
analysis framework described in

	Nicholas Nethercote. The Analysis Framework of HAL,
	Chapter 7: Inter-module Analysis, Master's Thesis,
	University of Melbourne, September 2001, revised April 2002.
	<http://www.cl.cam.ac.uk/~njn25/pubs/masters2001.ps.gz>.

The code has now been moved into the compiler directory, in the package
`analysis'. The files that remain (Mmakefile, etc.) are unused.

This framework records call and answer patterns for arbitrary analyses,
and performs dependency analysis to force recompilation where necessary
when modules change.

TODO:
- dependency tracking and invalidation after source modifications
  (partially done but requires testing with more analyses)
- garbage collection of unused versions
- least fixpoint analyses

DESIGN:

The analysis framework is a library which links into the client
compiler, allowing the class methods to examine compiler data
structures. The interface is as compiler-independent as possible,
so that compilers which can interface with Mercury code via .NET
can use it.

Clients of the library must define an instance of the typeclass
`analysis__compiler', which describes the analyses the compiler
wants to perform.

Each analysis is described by a call pattern type and an
answer pattern type. A call pattern describes the information
known about the argument variables before analysing a call
(by executing it in the abstract domain used by the analysis).
An answer pattern describes the information known after analysing
the call. Call and answer patterns must form a partial order, and
must be convertible to strings.

Analysis database
=================

Before analysing a module, the client should call
`analysis__lookup_results' and `analysis__lookup_requests' to find the
call patterns for which the module should be analysed, in addition to
the "default" call pattern (top).  `lookup_results' will give the call
patterns which we have encountered before whereas `lookup_requests'
will give the call patterns which have been recently requested by
other modules.

When analysing a module, at each call to an imported function
the client should call `analysis__lookup_results' or
`analysis__lookup_best_result' to find the results which
match the call pattern.

If no results exist, the client should call `analysis__record_request',
to ask that a specialized version be created on the next compilation
of the client module.  The client should also record the suboptimal
result that was used in place of a more specialized result in the
imported module's analysis registry.

There is currently no way to analyse higher-order or class method
calls. It might be possible to analyse such calls where the set of
possibly called predicates is known, but it is better to optimize away
higher-order or class method calls where possible.

The client should call `analysis__record_dependency' for each external
analysis result that is made use of by the module.

When compilation of a module is complete, the client should
call `analysis__write_analysis_files' to write out all
information collected during the compilation.

Called by analysis passes to record analysis requests and lookup
answers for imported functions. The status of each answer recorded
in the database is one of the following:

* invalid - the answer was computed using information which has changed,
  and must be recomputed. `invalid' entries may not be used in analysis
  or in generating code.

* fixpoint_invalid - the entry is for a least fixpoint analysis, and
  depends on an answer which has changed so that the new answer
  is strictly less precise than the old answer (moving towards to
  correct answer). `fixpoint_invalid' entries may be used when analysing
  a module, but code must not be generated which uses `fixpoint_invalid'
  results (even indirectly). In addition, code must not be generated when
  compiling a module in a strongly connected component of the analysis
  dependency graph which contains `fixpoint_invalid' entries. (Note that
  the method for handling least fixpoint analyses is not described in
  Nicholas Nethercote's thesis).
  This is not yet implemented.

* suboptimal - the entry does not depend on any `invalid' or
  `fixpoint_invalid' entries, but may be improved by further
  recompilation. `suboptimal' entries do not need to be recompiled,
  but efficiency may be improved if they are. `suboptimal' annotations
  are only possible for greatest fixpoint analyses (least fixpoint
  analyses start with a "super-optimal" answer and work towards the
  correct answer).

* optimal - the entry does not depend on any `invalid', `fixpoint_invalid'
  or `suboptimal' results. Modules containing only `optimal' entries do
  not need recompilation.

Analysis dependency checker (NYI)
=================================

Examines the dependencies between analysis results and the state
of the compilation, then orders recompilations so that there are no
`invalid' or `fixpoint_invalid' entries (with an option to eliminate
`suboptimal' entries).

Each client compiler should have an option which invokes the analysis
dependency checker rather than compiling code. This adjusts the status
of entries in the database, then invokes the compiler's build tools
(through a typeclass method) to recompile modules in the correct order.

If the implementation of a function changes, all of its answers are
marked as invalid, and the results of the functions it directly uses
in the SCC of the analysis dependency graph containing it are reset
to `top' (marked `suboptimal') for greatest fixpoint analyses, or
`bottom' (marked `fixpoint_invalid') for least fixpoint analyses.
This ensures that the new result for the function is not computed
using potentially invalid information.

After each compilation, the dependency checker examines the changes
in the analysis results for each function.

For greatest fixpoint analyses, if the new answer is
- less precise than or incomparable with the old result,
  all users of the call pattern are marked `invalid'.
- equal to the old result, no entries need to be marked.
- more precise than the old result, callers are marked
  as `suboptimal'.

For least fixpoint analyses, if the new answer is
- less precise than or incomparable with the old result,
  all users of the call pattern are marked `invalid'.
- equal to the old result, no entries need to be marked.
- more precise than the old result, callers are marked
  as `fixpoint_invalid'.

The new answer itself will be marked as `optimal'. This isn't
necessarily correct -- further recompilations may change its status
to `fixpoint_invalid' or `suboptimal' (or `invalid' if there
are source code changes).

Recompilation must proceed until there are no `invalid' or `fixpoint_invalid'
entries. Optionally, optimization can proceed until there are no new requests
or `suboptimal' answers.

It the responsibility of the analysis implementor to ensure termination of
the analysis process by not generating an infinite number of requests.

Note: although `mmc' does not have an analysis dependency checker as
described above, `mmc --make' now understands the intermodule analysis
framework.  Before the pass to compile modules to target code, there
is an analysis pass.  This is similar to the optimisation interface
pass which generated `.opt' and `.trans_opt' files.  `mmc --make' will
repeat the analysis pass as long as there are invalid analysis
registries, with the option to repeat the analysis pass as many times
as desired in order to bring `suboptimal' modules to `optimal'.

Granularity of dependencies
===========================

The description in Nicholas Nethercote's thesis uses fine-grained
dependency tracking, where for each exported answer only the imported
analysis results used to compute that answer are recorded.

For simplicity, the initial Mercury implementation will only record
dependencies of entire modules on particular analysis results
(effectively the exported results depend on all imported analysis
results used in that compilation). This is worthwhile because none of
the analyses in the Mercury compiler currently record the information
required for the more precise approach, and I would expect that other
compilers not designed for inter-module analysis would also not
record that information.

Differences of the implementation from Nick's thesis
====================================================

We do not store "version calling patterns" in the .analysis files
However, we have .request files which contain calling patterns for
which we had no answer, and lookup_best_result() predicate
automatically matches a calling pattern to the best possible result
available.

We do not support "perfect calls" any differently from any other
calls.

Our IMDGs only record entire modules as being dependent on some
analysis result so if an analysis result changes, we can only
mark/invalidate an entire module which makes use of the result.
This means having an analysis status associated with each individial
analysis result is somewhat pointless at the moment, as individial
statuses will never be marked/invalidated, and we also can't make
use of them to reanalyse only the procedures that actually need
analysing.

Still to do
===========

Page numbers are references to Nick's thesis.

We do not yet remove analysis results for procedures which no longer
exist. (p. 107)

In the get_ext_answer procedure Nick uses a glb function in addition
to a is_more_precise(Ans', Ans_glb) call.  I don't know if this is
necessary. (p. 108)

Garbage collection of calling patterns that are no longer used in the
program.  Our IMDGs do not record exactly which calling patterns are
in use so I'm not sure how this would be done.

Removal of IMDG arcs for modules that used to import a module, but no
longer do so.  This would not be hard if we remember which
modules were imported by M previously in M.imdg. (p. 110)

There is no special handling yet of procedure level cycles.
We cannot follow the approach of p. 116 as we have not enough
information in our IMDGs.

Our treatment of library modules is fairly dumb.  We just assume that
library modules have read-only analysis files, and that we cannot make
requests to or reanalysis library modules. (p. 117)

Modules are currently always analysed from the bottom of the module
dependency graph upwards.  When some analyses start making meaningful
requests (unlike now where all call patterns are of the type
`any_call'), the order that modules are analysed in should be changed
to reduce unnecessary reanalyses.