mirror of
https://github.com/Mercury-Language/mercury.git
synced 2025-12-10 11:23:15 +00:00
Estimated hours taken: 1 Branches: main extras/moose/README: extras/moose/BUGS: Document a bug with the handling of epsilon productions.
Moose is a parser generator for Mercury.
It does the same sort of thing for Mercury that Yacc and Bison do for C.

Please note that Moose is relatively immature. It works reasonably
well, but it has problems with epsilon productions, error handling
could be greatly improved, and there is room for adding quite a few
bells and whistles. See the files BUGS and TODO for more information.


Moose input files should be given a `.moo' suffix.
Moose input files contain Mercury code plus some additional
kinds of declarations and clauses that specify a grammar.
The `moose' program takes a Moose input file and converts it into
ordinary Mercury code.

Each Moose input file should contain:

- One Moose parser declaration, of the form

	:- parse(<StartSymbol>, <TokenType>, <EndToken>, <Prefix>, <In>, <Out>).

  Here <StartSymbol> is the <Name>/<Arity> of the starting symbol for the
  grammar,
  <TokenType> is the name of the Mercury type for tokens in this grammar,
  <EndToken> is the token that signifies end-of-file,
  <Prefix> is intended to be used as a prefix to the generated predicate
  names (this is currently unimplemented), and
  <In> and <Out> are the modes to use for the parser state.

- One or more Moose rule declarations, of the form

	:- rule <Name>(<ArgumentTypes>).

  A `:- rule' declaration declares a non-terminal symbol in the grammar.
  Here <Name> is the name of the non-terminal symbol, and
  <ArgumentTypes> gives the types of the arguments (i.e. attributes)
  of that non-terminal.

- One or more Moose clauses.
  The Moose clauses specify the productions for this grammar.
  Each must be of the form

	<NonTerminal> ---> <ProductionBody>.

  Here <NonTerminal> is of the form <Name> or <Name>(<Arguments>),
  where <Name> is the name of the non-terminal symbol, and
  <Arguments> specify the arguments (i.e. attributes) for
  that non-terminal.
  <ProductionBody> must be of one of the following forms:

	[<TerminalList>]
	<NonTerminal>
	<ProductionBody> , <ProductionBody>
	<ProductionBody> ; <ProductionBody>
	{ <Action> }

  [<TerminalList>] denotes a list of terminal symbols.
  Each of the terminal symbols must be an element of the token type
  specified in the `:- parse' declaration. The list can be empty.
  <NonTerminal> denotes a non-terminal. Each non-terminal must be
  declared with a `:- rule' declaration.
  <ProductionBody> , <ProductionBody> denotes sequence.
  <ProductionBody> ; <ProductionBody> denotes alternatives.
  { <Action> } denotes a grammar action. Here <Action> is an arbitrary
  Mercury goal. Grammar actions can be used to compute attributes.

- Zero or more Moose action declarations, of the form

	:- action(<Name>/<Arity>, <FuncName>).

  Each action declaration will add a method called <FuncName>
  to the type class parser_state/1. The method will have the same
  argument types as the rule given by <Name>/<Arity>, plus an <In> mode
  argument for the parser state, and will return an <Out> mode result of
  the same type as the parser state.

  For example
	:- rule foo(int).
	:- action(foo/1, process_foo).
  will generate
	:- typeclass parser_state(T) where [
		... get_token and any other action methods ...
		func process_foo(int, T) = T,
		mode process_foo(in, <In>) = <Out> is det
	].

  Whenever the parser reduces using a rule, it will invoke the associated
  action method for that rule (if there is one). Since the parser state
  is threaded through all action methods, it can be used to implement
  inherited attributes. Actions can also modify the token stream in the
  parser state (see below).


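As an illustration of how the declarations above fit together, here is a
minimal sketch of a complete input file. The file name `calc.moo', the
module name, the token type and its constructors, the rule names and the
placeholder prefix `xx' are all hypothetical, and the sketch has not been
run through `moose'; in particular, the generated code may need further
`:- import_module' declarations. It deliberately avoids empty productions.

	% calc.moo (hypothetical): sums of integers with parentheses.
	:- module calc.
	:- interface.
	:- import_module int.

	% The token type named as <TokenType> below.
	% `eof' is the constructor named as <EndToken>.
	:- type token
		--->	num(int)
		;	plus
		;	lparen
		;	rparen
		;	eof.

	% Start symbol expr/1; the parser state is threaded
	% with modes `in' and `out'.
	:- parse(expr/1, token, eof, xx, in, out).

	:- implementation.

	% Each non-terminal carries one int attribute: its value.
	:- rule expr(int).
	expr(Sum)	--->	expr(A), [plus], term(B), { Sum = A + B }.
	expr(Val)	--->	term(Val).

	:- rule term(int).
	term(Val)	--->	[num(Val)].
	term(Val)	--->	[lparen], expr(Val), [rparen].

The braces in the first `expr' clause hold a grammar action that computes
the attribute of the left-hand side from those of the right-hand side.
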
In order to use the Moose parser, you need to provide a lexer, which
generates tokens of the type <TokenType> given in the Moose parse
declaration. To allow flexibility in implementing the lexer, the parser
requests tokens using a type class.

The parser_state type class is the set of all operations that must be
implemented in order to use the parser. parser_state will contain at
least two methods:

	:- typeclass parser_state(T) where [

		pred get_token(token, T, T),
		mode get_token(out, in, out) is det,

		func unget_token(token, T) = T,
		mode unget_token(in, <In>) = <Out> is det,

		... any action methods ...
	].

get_token returns the next token in the token stream. get_token should
return the <EndToken> on reaching the end of the input stream.

get_token and unget_token should satisfy the following property:

	all [Tok, T0] some [T] get_token(Tok, unget_token(Tok, T0), T)

That is, a token that has just been pushed back with unget_token must be
the next token that get_token returns.

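One simple way to meet these requirements is to keep the remaining tokens
in a list. The following is a minimal sketch, not code from the Moose
distribution: the wrapper type `token_stream' and the names `next_token'
and `push_token' are hypothetical, the token type is assumed to have a
constructor `eof' that was named as <EndToken>, the parser state modes are
assumed to be `in' and `out', and the grammar is assumed to have no action
declarations (each of which would need a further method here).

	:- import_module list.

	% A wrapper type is used because the instance is declared
	% for a type constructor rather than directly for list(token).
	:- type token_stream
		--->	token_stream(list(token)).

	:- instance parser_state(token_stream) where [
		pred(get_token/3) is next_token,
		func(unget_token/2) is push_token
	].

	% Return the next token, or `eof' once the list is exhausted.
	:- pred next_token(token::out, token_stream::in, token_stream::out)
		is det.

	next_token(Tok, token_stream(Toks0), token_stream(Toks)) :-
		(
			Toks0 = [],
			Tok = eof,
			Toks = []
		;
			Toks0 = [Tok | Toks]
		).

	% Push a token back onto the front of the stream.
	:- func push_token(token, token_stream) = token_stream.

	push_token(Tok, token_stream(Toks)) = token_stream([Tok | Toks]).

With these definitions, calling next_token on push_token(Tok, T0) yields
Tok and leaves the state equal to T0, so the property above holds.
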
The other methods in parser_state will be dictated by the Moose action
declarations. To use the Moose-generated parser, simply call the
generated parse predicate with an instance of the parser_state type
class.

The parse predicate will have the following signature:

	:- pred parse(parse_result, P, P) <= parser_state(P).
	:- mode parse(out, <In>, <Out>) is det.

where

	:- type parse_result
		--->	<StartSymbolName>(...)
		;	error(string).

and the argument types of the <StartSymbolName> constructor match those
for the corresponding rule declaration.


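A caller might use the result as in the sketch below. It assumes a grammar
whose start symbol is a hypothetical `expr/1' with a single int attribute,
so that parse_result has the constructors expr(int) and error(string), and
a hypothetical type `token_stream' that wraps a list(token) and has been
made an instance of parser_state. The predicate name `evaluate' is also
made up for this example.

	:- import_module io, list, string.

	% Parse a list of tokens and print either the computed value
	% or the error message returned by the parser.
	:- pred evaluate(list(token)::in, io::di, io::uo) is det.

	evaluate(Toks, !IO) :-
		parse(Result, token_stream(Toks), _Rest),
		(
			Result = expr(Value),
			io.write_int(Value, !IO),
			io.nl(!IO)
		;
			Result = error(Msg),
			io.write_string("parse error: " ++ Msg ++ "\n", !IO)
		).

Since parse is det, a failed parse is reported through the error(string)
alternative rather than by the call failing, so the caller must handle
both constructors.

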
The samples directory contains some simple grammars, lexers
and implementations of the parser state.