The recent change to sparse_bitsets broke the lex library in extras.
Specifically, we now need to make characters an instance of the
uenum typeclass. This diff does so.
library/char.m:
Add predicates and functions for converting between unsigned integers
and characters.
Make characters an instance of the uenum typeclass.
tests/hard_coded/Mmakefile:
tests/hard_coded/char_uint_conv.{m,exp,exp2}:
Add a test of the above conversions.
NEWS:
Announce the additions.
extras/lex/lex.m:
Conform to recent changes.
These are due to:
- differences between module and file names.
- redundant imports
- the recent change from math.domain_error -> exception.domain_error.
benchmarks/*/*.m:
extras/*/*.m:
As above.
Discussion of these changes can be found on the Mercury developers
mailing list archives from June 2018.
COPYING.LIB:
Add a special linking exception to the LGPL.
*:
Update references to COPYING.LIB.
Clean up some minor errors that have accumulated in copyright
messages.
Remove old .cvsignore files, moving their contents to .gitignore files.
There are now no .cvsignore files in the repository.
I've also sorted some .gitignore files and avoided repeating a pattern in a
subdirectory's .gitignore file when it is already mentioned in the parent
.gitignore file.
This file-specific setting will override a default setting of expandtabs
in $HOME/.vimrc.
*/Makefile:
*/Mmakefile:
As above.
tests/hard_coded/.gitignore:
Don't ignore the purity subdir. This ignore must have been left over
from when purity.m was a test in hard_coded, not hard_coded/purity,
and it ignored an executable, not a directory.
The recent change to the compiler that enabled stricter checking of non-ground
final insts "broke" the lex library in extras in three separate places. The
underlying problem is the same in all of them: subtype insts are not preserved
through calls to various procedures in the list module. This diff rewrites
two of those places to avoid the need for a call to a list procedure that
will cause the subtype information to be lost. In the third case, we add a
runtime check and an unsafe cast.
extras/lex/lex.convert_NFA_to_DFA.m:
Rewrite map_state_set_transitions_to_numbers/2 in order to avoid a call
to list.map/2.
Use set.map/2 in a spot where it would be appropriate.
extras/lex/lex.regexp.m:
Rewrite trans_closure/5 in order to avoid a call to list.map/2. Also,
fold directly over a set rather than converting it into a list in a
couple of spots.
Add a runtime check to ensure that add_atom_transitions/2 only
returns atom_transitions and add a call to an unsafe cast predicate
to restore the correct subtype inst.
Add a comment explaining all of this.
extras/*/*.m:
Replace the use of '__' as a module qualifier in the rest of
the extras.
s/io\.state/io/ in the extras.
Use '=' instead of is/2 in a spot.
lex_demo.m wasn't compiling because io.print_line doesn't exist. I also
found that drawing the "> " prompt wasn't working as expected. This patch
fixes both these issues.
extras/lex/samples/lex_demo.m:
As above.
/[a-z]{10}/ in this way: `Regex = range('a', 'z') * 10'.
extras/lex/lex.m:
Removed unused and unsafe str_foldr function,
added (T * int) = regexp function.
extras/lex/samples/lex_demo.m:
Removed whitespace in comments,
added an input prompt,
added a lexeme for '//' C++ comments using the new '*' operator.
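
The repetition operator described above can be sketched as follows. This is
an illustration in Python, not the Mercury code in lex.m: the Regex class and
the char_range helper are hypothetical names, and the sketch assumes that
`R * N' means exactly N consecutive copies of R, as the /[a-z]{10}/ example
suggests.

```python
import re


class Regex:
    """A tiny regex builder wrapping a Python `re` pattern (hypothetical)."""

    def __init__(self, pattern):
        self.pattern = pattern

    def __mul__(self, count):
        # Assumption: R * N means exactly N consecutive copies of R.
        if count < 0:
            raise ValueError("repetition count must be non-negative")
        return Regex("(?:%s){%d}" % (self.pattern, count))

    def fullmatch(self, text):
        # True if the whole of text matches this regex.
        return re.fullmatch(self.pattern, text) is not None


def char_range(first, last):
    """Match any single character between first and last, inclusive."""
    return Regex("[%s-%s]" % (re.escape(first), re.escape(last)))


ten_lower = char_range("a", "z") * 10
```

Here ten_lower accepts only strings of exactly ten lower-case ASCII letters.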
This is a follow-up to my previous patch; it extends the .gitignore
files to work more thoroughly on Windows.
browser/.gitignore:
ignore .net .dll assemblies
compiler/.gitignore:
ignore *.obj
extras/.gitignore:
Ignore the tags directory
Ignore: *.bat, *.lib,
Ignore *.dll rather than lib*.dll because the .net assemblies are not
prefixed.
extras/dynamic_linking/.gitignore:
Ignore *.out (test output files)
extras/error/.gitignore:
Ignoring unix binary for the error utility
extras/lex/tests/.gitignore:
Ignore the test_regex binary
extras/moose/tests/.gitignore:
Ignore array_based.m because it is generated from array_based.moo
extras/.gitignore:
Mercury ignores the --use-{grade}-subdir dir
Mercury/** for git 1.8+ this recursively ignores all build files
ignoring *.mh and *.init files
*.err output files
lib*.{dll|so|a|dylib} ignores target compiler library output
*.jar ignores the Java grade output
*.exe for Windows executables
extras/dynamic_linking/.gitignore:
ignoring the copy of dl.m, name_mangle.m
ignoring hello lib and dl_test* executables
extras/moose/samples/.gitignore:
ignoring cgram.m small.m alpha.m expr.m which are
generated from the .moo grammar files
extras/graphics/mercury_cairo/samples/.gitignore:
ignoring *.png output and all executables
extras/**/.gitignore:
In each sample/test/example folder the linux executable/test
output is ignored
Lists of Unicode characters can be very long as there are many characters.
This change represents ranges of characters by storing only the first and
last character in each range.
extras/lex/lex.m:
As above.
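
A minimal sketch of the range representation, written in Python for
illustration only. The function names to_ranges and in_ranges are
hypothetical and do not appear in lex.m; the sketch assumes ranges are
inclusive and stored as (first, last) code point pairs.

```python
def to_ranges(chars):
    """Collapse characters into sorted, inclusive (first, last) code point ranges."""
    ranges = []
    for code in sorted({ord(c) for c in chars}):
        if ranges and code == ranges[-1][1] + 1:
            # Adjacent to the current range: extend it by one code point.
            ranges[-1] = (ranges[-1][0], code)
        else:
            # Not adjacent: start a new single-character range.
            ranges.append((code, code))
    return ranges


def in_ranges(ranges, char):
    """Return True if char falls inside any of the inclusive ranges."""
    code = ord(char)
    return any(first <= code <= last for first, last in ranges)
```

With this representation, a block of 256 consecutive characters costs one
pair rather than a 256-element list.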
The NFA is now using the sparse_bitset to compile the state transitions.
This significantly reduces the compile time for the regular expression, thus
enabling the use of a much larger Unicode range for character recognition.
The helper function anybut/1 has been made aware of Unicode, i.e.
anybut/1 now recognises [`0x1` .. `0xffff`] - [but chars].
In addition, the regexp(string) instance now accepts Unicode chars;
the previous code was only valid for ASCII chars (when using the UTF-8
encoded C-strings in the C-based grades).
extras/lex/lex.convert_NFA_to_DFA.m:
Use the sparse_bitset ADT to build the tables used by the library.
extras/lex/lex.m:
Add a new constructor to the regexp type allowing regexps to be built
directly from character sets.
extras/lex/lex.regexp.m:
Build regexps from character sets.
extras/lex/samples/lex_demo.m:
Demonstrate the use of wide characters in the lexer demo.
extras/lex/lex.automata.m:
extras/lex/lex.lexeme.m:
Conform to above changes.
In lex.lexeme, all char codepoints were downcast to 8 bits, thus rendering
Unicode matching impossible. This change removes this limitation.
extras/lex/lex.lexeme.m:
As above.
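
To illustrate why an 8-bit downcast defeats Unicode matching, here is a small
Python sketch. It assumes the downcast kept only the low 8 bits of each code
point; the log does not say exactly how the old code truncated, and
downcast_8bit is a hypothetical name.

```python
def downcast_8bit(char):
    """Keep only the low 8 bits of the character's code point (assumed behaviour)."""
    return ord(char) & 0xFF


# ASCII characters are unaffected by the truncation.
ascii_a = downcast_8bit("a")

# U+03BB (Greek small lambda) and U+00BB (right guillemet) share their low
# byte, so after truncation the two are indistinguishable.
lam = downcast_8bit("\u03bb")
guillemet = downcast_8bit("\u00bb")
```

Any two code points that agree in their low byte collide in the same way, so
a lexeme for one would also match the other.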
extras/lex/lex.m:
Document some caveats with the charset functions.
Add a range check to the two-argument charset function.
Simplify the anybut function.
+ Add a charset type that can be used to represent sets of characters.
These are useful when constructing lexers.
+ Start adding support for wider character sets.
+ Add support for matching on ranges of characters.
extras/lex/lex.m:
As above.
anybut/1 now uses Unicode as a basis.
extras/lex/samples/lex_demo.m:
Demonstrate the new features.
Branches: main, 11.07
Make compilation of extras/references more reliable.
extras/references/Makefile:
extras/references/Mmakefile:
extras/references/Mercury.options:
Use mmc --make to build and install this library (as we do with
some of the other extras packages) instead of mmake. This allows
us to use the grade filtering mechanism in mmc --make to ensure
that we only install the library in grades that support trailing.
Fix a number of problems that prevent this library installing cleanly:
+ don't require the presence of asm_fast grades; build the library in
the default grade with the trail segment component added.
+ use trail segment grades instead of fixed sized trail grades (the latter
are not installed anymore unless specifically requested by the user).
+ install the C header file that is part of this library.
+ delete ancient workarounds for supporting shared libraries on Linux.
extras/references/tests/Mmakefile:
extras/references/samples/Mmakefile:
Conform to the above changes.
Don't assume that the extension for static libraries is .a; it's
not on some systems.
extras/references/global.m:
Add a feature set pragma specifying that trailing is required.
extras/references/nb_reference.m:
s/__/./
extras/lex/lex.m:
Unrelated change: avoid using an obsolete function.
Branches: main, 11.07
Fix top-level invocations of mmake in the extras distribution. They were
breaking because the lex subdirectory didn't have an Mmakefile. (It uses
mmc --make and a normal Makefile instead.)
Make more of the extras distribution build from the top-level.
extras/lex/Mmakefile:
Add an Mmakefile that contains the targets required by the top-level
extras distribution Mmakefile. Each of the targets just forwards
the work to the actual Makefile.
extras/Mmakefile:
Update the list of things that won't compile ``out-of-the-box''.
(XXX we should use autoconf to configure these.)
Build the base64 encoding library, the fixed point arithmetic library
and the error utility by default.
extras/README:
Update the description of the lazy_evaluation subdirectory.
extras/base64/Makefile:
extras/base64/Mmakefile:
extras/base64/mercury_base64.m:
extras/base64/Mercury.options:
Build and install base64 as a library. We use mmc --make, controlled
from a normal Makefile, to do this and then put a forwarding Mmakefile
in place so that compilation from the top-level of the extras
distribution works. (One reason for doing this is that mmc --make
provides grade filtering capabilities which are needed here since
this library will only work in C grades.)
extras/base64/base64.m:
Avoid a compilation error: sizeof cannot be used on things with
an incomplete type.
extras/fixed/Makefile:
extras/fixed/Mmakefile:
extras/fixed/Mercury.options:
extras/fixed/mercury_fixed.m:
Build and install fixed as a library. As with base64, use mmc --make
and add a forwarding Mmakefile.
extras/fixed/fixed.m:
Style and formatting fixes.
extras/lex/Makefile:
Add a realclean target.
extras/lex/lex.lexeme.m:
Replace a call to a deprecated procedure.
Estimated hours taken: 0
Branches: main
Avoid calling a deprecated function.
extras/lex/lex.lexeme.m:
Call bitmap.bit/2 in place of the now deprecated bitmap.get/2.
Estimated hours taken: 1
Branches: main
Replaced lex Mmakefile with Makefile to promote 'mmc --make' instead of
'mmake'. Moreover, lex installed with mmake was not usable with 'mmc --make'.
extras/lex/Makefile:
extras/lex/Mmakefile:
extras/lex/samples/Makefile:
extras/lex/samples/Mmakefile:
extras/lex/tests/Makefile:
extras/lex/tests/Mmakefile:
Replaced each Mmakefile with a corresponding Makefile and modified/added
rules to keep the functionality.
Estimated hours taken: 0.5
Branches: main
Fix compilation problems in the extras distribution caused by recent
changes.
extras/stream/stream.m:
Provide a definition for the type stream/1.
extras/*/*.m:
Conform to the recent changes to the standard library.
Estimated hours taken: 60
Branches: main
Added a new module, regex, as a companion to lex. The new module provides
functionality for converting conventional Unix-style regular expressions
into regexps for use with lex and a number of search and search-and-replace
functions for strings.
The new functionality has been tested fairly thoroughly (and led to several
bugs in lex being identified and fixed.)
NEWS:
Reported new additions.
extras/lex/README:
Now just points the reader to README.lex and README.regex.
extras/lex/README.lex:
extras/lex/README.regex:
Added. Brief introductions to the two libraries.
extras/lex/lex.automata.m:
extras/lex/lex.buf.m:
extras/lex/lex.convert_NFA_to_DFA.m:
extras/lex/lex.regexp.m:
Trivial formatting changes.
extras/lex/lex.lexeme.m:
Removed the parameter on inst compiled_lexeme.
extras/lex/lex.m:
Various formatting changes.
Added pred offset_from_start/3 which can be used to identify
the `current' point in the input stream with respect to lexing.
Added pred read_char/3 which can be used to read the `next'
char from the input stream without doing any lexing.
Added a field init_winner_func to the lexer_instance type. This
is used to resolve a bug whereby regular expressions that match
the empty string were not being spotted at the start of the input
stream.
Solved some bugs whereby an exception was incorrectly thrown in
some circumstances when the end of the input stream was reached.
extras/lex/regex.m:
Added. This file defines the functions for converting Unix-style
regular expression strings into regexps for use with lex and into
regexes for use with the string search(-and-replace) predicates
defined in this module.
extras/lex/Mmakefile:
Improved the installation instructions and included a check target.
extras/lex/tests:
extras/lex/tests/Mmakefile:
extras/lex/tests/test_regex:
extras/lex/tests/test_regex.in:
extras/lex/tests/test_regex.exp:
Added a test suite.
extras/lex/tests/cmp_regex_gawk:
This program looks for differences in behaviour between gawk and
regex.
extras/lex/samples/demo.m:
Moved to lex_demo.m
extras/lex/samples/lex_demo.m:
Was demo.m; slightly changed to include a match for unexpected
characters.
extras/lex/samples/regex_demo.m:
Added.
extras/lex/samples/Mmakefile:
Updated.
Estimated hours taken: 2
Branches: main
Add appropriate infrastructure to support doing `mmake depend && mmake
&& mmake install' in the `extras' directory, and having it automatically
install and build as much as possible.
extras/Mmakefile:
New file.
extras/cgi/Mmakefile:
Delete bogus reference to `ALL_LIBGRADES' and `mercury-config'.
extras/complex_numbers/Mmakefile:
extras/concurrency/Mmakefile:
extras/concurrency/concurrency.m:
extras/curses/Mmakefile:
extras/dynamic_linking/Mmakefile:
extras/lazy_evaluation/Mmakefile:
extras/lex/Mmakefile:
extras/moose/Mmakefile:
extras/xml/Mmakefile:
Add rules for `mmake install'.
extras/references/scoped_update.m:
Add macro guard around typedef in `c_header_code'.
extras/curs/Mmakefile:
extras/curses/Mmakefile:
Define MERCURY_BOOTSTRAP_H, to avoid name clash on `bool'
that is caused by our bootstrap code in runtime/Merucry.h.
extras/curs/curs.m:
Add `promise_pure' declarations for all pure procedures.
XXX Should this be needed? These predicates all take
io__state arguments.
Estimated hours taken: 1.5
Branches: main
extras/lex/lex.m:
Changed the lexing behaviour so that in ambiguous cases, the token
returned is that for the first competing lexeme given in the call
to init/[2,3]. This brings lex.m in line with the standard C lex.
Also included minor syntactic clean-ups.
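
The tie-breaking rule can be sketched as follows, in Python for illustration
only. The sketch assumes that "ambiguous" means two lexemes match the same
longest prefix, as in standard lex; next_token and the token names are
hypothetical and are not part of lex.m.

```python
import re


def next_token(lexemes, text, pos=0):
    """Return (token, matched_text) for the best lexeme at pos, or None.

    lexemes is a list of (pattern, token) pairs in priority order,
    mirroring the order of the list passed to init.
    """
    best = None
    for pattern, token in lexemes:
        m = re.compile(pattern).match(text, pos)
        if m is None:
            continue
        # Only a strictly longer match replaces the current winner, so on a
        # tie the lexeme listed first is kept.
        if best is None or len(m.group()) > len(best[1]):
            best = (token, m.group())
    return best


lexemes = [(r"if", "KW_IF"), (r"[a-z]+", "IDENT")]
```

With this ordering, the input "if" yields KW_IF because both lexemes match
two characters and the keyword is listed first, while "iffy" yields IDENT
because that match is longer.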
Estimated hours taken: 1
Branches: main
Recovered the lost check-ins from the CVS archive, obtained and
applied the patches and checked that all is well.
extras/lex/Mmakefile:
extras/lex/README:
extras/lex/lex.automata.m:
extras/lex/lex.buf.m:
extras/lex/lex.convert_NFA_to_DFA.m:
extras/lex/lex.lexeme.m:
extras/lex/lex.m:
extras/lex/lex.regexp.m:
extras/lex/samples/demo.m:
Patched to recover the lost check-in.
Estimated hours taken: 20
Branches: main
These changes were made by Holger Krug <hkrug@rationalizer.com> and
checked in by Ralph Becket.
The interface to lex has received a complete overhaul: unfortunately
any old code (was there any?) using lex will have to be changed to
suit.
The syntax for defining regular expressions has been simplified
through judicious use of type classes.
The annotated_lexeme type has been changed: rather than distinguishing
between value, noval and ignore tags, each lexeme is associated with a
function to turn the matched string into a token.
An argument to lex__init/3 allows the user to supply a predicate
indicating which tokens should be ignored in the input stream.
Various utility predicates and functions have been provided.
lex.buf.m:
lex.convert.m:
Minor cosmetic changes.
lex.lexeme.m:
Cosmetic changes and alterations to deal with the new
token reporting interface.
lex.regexp.m:
Cosmetic changes and simplification of the regexp definition
language.
lex.m:
Changes to the API to deal with the new token reporting
interface. Some cosmetic changes.
COPYING.DOC:
COPYING.LGPL:
Added to comply with the licence for these changes.
README:
Updated to describe the new API.
samples/demo.m:
Updated to use the new API.