% vim: ts=4 sw=4 et ff=unix

Copyright (C) 2002 The University of Melbourne

THE LEX MODULE

The lex module provides tools for writing lexical analyzers.
A lexical analyzer parses a stream of chars (e.g. from a string or the
standard input stream) against a list of regular expressions,
returning the first, longest match along with an indication of which
regular expression was matched.

QUICK START GUIDE

A lexer is compiled from a list of lexemes and a predicate that will
read the next char from the input stream.

A lexeme is a pair consisting of a regular expression and a function
that will convert a string matched by the regular expression into a
token, which may be returned as a result by the lexical analyzer
(hereafter referred to as a `lexer').

The lex module provides a language for composing regular expressions
including literal strings, alternation, Kleene closure, grouping and
various other useful combinators, as well as a rich set of pre-defined
regular expressions such as identifier, signed_int, real and so forth.
(Also, consider the regexp/1 function defined in the regex module,
which supports the construction of regular expressions from strings
similar to those recognised by tools such as grep and sed.)

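As an illustration of the composition language, the fragment below
builds two regular expressions from smaller pieces. It is a sketch
that has not been compiled here. It assumes that the lex module
exports `++' for sequencing, `or' for alternation, +(R) for one or
more repetitions, and any/1 for "any one char from this string";
check these spellings against the interface of lex.m. The names
hex_literal and keyword are hypothetical.

```mercury
    % Hypothetical example: "0x" followed by one or more hex digits.
    % ++, +/1 and any/1 are assumed to be exported by lex.
:- func hex_literal = regexp.

hex_literal = "0x" ++ +(any("0123456789abcdefABCDEF")).

    % Hypothetical example: any one of three keywords.
    % The outer parentheses are required because `or' binds
    % less tightly than `='.
:- func keyword = regexp.

keyword = ("if" or "then" or "else").
```

Either value could then appear on the left of `->' in a lexeme, in the
same position as identifier or signed_int.
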
A lexer may be created as in the following example (this lexer works
over the standard input stream):

    :- type token
        --->    id(string)
        ;       int(int)
        ;       float(float)
        ;       lpar
        ;       rpar
        ;       comment.

    Lexer = lex.init([
        ( identifier    ->  func(Id)    = id(Id) ),
        ( signed_int    ->  func(Int)   = int(string.det_to_int(Int)) ),
        ( real          ->  func(Float) = float(string.det_to_float(Float)) ),
        ( "("           ->  return(lpar) ),
        ( ")"           ->  return(rpar) ),
        ( "%" ++ junk   ->  return(comment) )
    ], read_from_stdin).

The combinator return/1 is defined such that return(X) = (func(_) = X),
that is, it simply discards the matched string and returns X.

(There is also lex.init/3, which takes an extra argument, namely a
predicate that is used to silently ignore certain tokens, such as
whitespace.)

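The following fragment sketches how the extra argument of lex.init/3
might be used. It has not been compiled here and rests on two
assumptions: that the ignore predicate takes a token as input and is
semidet (succeeding for tokens to be skipped), and that lex exports a
pre-defined regular expression named whitespace. The token type and
the predicate name skip are hypothetical.

```mercury
:- type token
    --->    id(string)
    ;       space
    ;       comment.

    % Succeeds for tokens that the lexer should skip silently.
    % (Assumed shape of the ignore predicate: pred(in) is semidet.)
:- pred skip(token::in) is semidet.

skip(space).
skip(comment).

    % Inside a clause body:
    Lexer = lex.init([
        ( identifier    ->  (func(Id) = id(Id)) ),
        ( whitespace    ->  return(space) ),
        ( "%" ++ junk   ->  return(comment) )
    ], read_from_stdin, skip)
```

With a lexer built this way, whitespace and comments would still be
matched, but only id/1 tokens would be returned to the caller.
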
A lexer is activated by calling lex.start/2, which returns a (unique)
lexer state:

    !:LexerState = lex.start(Lexer, !.IO)

The lex.read/3 predicate searches for the next, longest match in the
input stream and returns the corresponding token (or an error message
if there is no immediate match in the input stream):

    lex.read(Result, !LexerState),
    (
        Result = eof,
        ...
    ;
        Result = ok(Token),
        ...
    ;
        Result = error(Message, Offset),
        ...
    )

When lexical analysis is complete, the input source may be obtained
by calling lex.stop/1:

    !:IO = lex.stop(!.LexerState)
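
Putting the start, read and stop steps together, a complete program
might look like the sketch below. It has not been compiled here. The
module name lex_demo, the token type and the predicates is_space and
read_all are hypothetical. The sketch also assumes that the lexer
state type is written lexer_state(token, io), that the ignore
predicate has mode (in) is semidet, and that lex exports whitespace.

```mercury
:- module lex_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module lex.
:- import_module list.
:- import_module string.

:- type token
    --->    id(string)
    ;       int(int)
    ;       space.

main(!IO) :-
    Lexer = lex.init([
        ( identifier    ->  (func(Id) = id(Id)) ),
        ( signed_int    ->  (func(Int) = int(string.det_to_int(Int))) ),
        ( whitespace    ->  return(space) )
    ], read_from_stdin, is_space),
    % Hand the I/O state to the lexer for the duration of the scan.
    LexerState0 = lex.start(Lexer, !.IO),
    read_all([], RevTokens, Outcome, LexerState0, LexerState),
    % Recover the I/O state before doing any output.
    !:IO = lex.stop(LexerState),
    io.print(list.reverse(RevTokens), !IO),
    io.nl(!IO),
    io.write_string(Outcome, !IO),
    io.nl(!IO).

    % Tokens for which this succeeds are dropped by the lexer.
:- pred is_space(token::in) is semidet.

is_space(space).

    % Accumulate tokens (in reverse order) until eof or an error,
    % and describe how the scan ended.
:- pred read_all(list(token)::in, list(token)::out, string::out,
    lexer_state(token, io)::di, lexer_state(token, io)::uo) is det.

read_all(!RevTokens, Outcome, !LexerState) :-
    lex.read(Result, !LexerState),
    (
        Result = eof,
        Outcome = "end of input"
    ;
        Result = ok(Token),
        !:RevTokens = [Token | !.RevTokens],
        read_all(!RevTokens, Outcome, !LexerState)
    ;
        Result = error(Message, Offset),
        Outcome = string.format("error at offset %d: %s",
            [i(Offset), s(Message)])
    ).
```

Collecting the tokens first and printing them only after lex.stop/1
keeps all output outside the period in which the lexer state holds
the I/O state.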