mirror of
https://github.com/Mercury-Language/mercury.git
synced 2026-04-15 01:13:30 +00:00
When you had an unclosed (, [ or { in a clause, the diagnostic
you got did not tell you
- where the unclosed parenthesis was,
- which kind of parenthesis it was.
Fix this by including both pieces of information in the diagnostic.
Likewise, print more useful info for mixed-up parentheses,
such as [(]).
library/mercury_term_parser.m:
When consuming a (, [ or { token, push it and its context on a stack.
When consuming a ), ] or } token, pop off the top item from this stack,
and generate a diagnostic if the close token does not match it.
The one exception to this pushing and popping is for code that
handles the case where the open is followed immediately by
the matching close, such as when parsing [] or {}.
Print the contents of the stack also when getting to either
the end of the term, or the end of the input, with a nonempty stack.
Maintaining this stack has a small performance cost, but I expect
it to be negligible, especially compared to the usefulness
of the new detail in diagnostics.
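As a rough illustration of the stack discipline described above, here is a minimal Python sketch. It is not the Mercury code in mercury_term_parser.m: the function name, the (text, line) token representation and the message wording are all assumptions made for this example, and it does not model the special case for an open followed immediately by its matching close.

```python
# Hypothetical sketch of the open-bracket stack; names and message
# wording are illustrative, not taken from mercury_term_parser.m.
CLOSER_FOR = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(CLOSER_FOR.values())


def check_brackets(tokens):
    """tokens is an iterable of (text, line) pairs.

    Returns a list of diagnostic strings, each naming both the kind
    of bracket involved and the line where it was opened.
    """
    stack = []          # (open_text, open_line) for each unclosed bracket
    diagnostics = []
    for text, line in tokens:
        if text in CLOSER_FOR:
            # Opening bracket: remember it together with its context.
            stack.append((text, line))
        elif text in CLOSERS:
            if not stack:
                diagnostics.append(
                    f"line {line}: {text} has no matching open bracket.")
                continue
            # Closing bracket: pop the most recent open and compare.
            open_text, open_line = stack.pop()
            if CLOSER_FOR[open_text] != text:
                diagnostics.append(
                    f"line {line}: found {text}, but the {open_text} "
                    f"on line {open_line} needs {CLOSER_FOR[open_text]}.")
    # End of term or end of input: report whatever is still open,
    # innermost bracket first.
    for open_text, open_line in reversed(stack):
        diagnostics.append(
            f"line {open_line}: the {open_text} opened here is never closed.")
    return diagnostics
```

For the mixed-up input [(]), this sketch reports two mismatches, one for each close token, and for a lone ( it reports the line on which that parenthesis was opened.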
Completely rework the error handling parts of this module.
The main changes are the following.
First, the old code used to include *part* of the intended message
in the pr_error structures it created, with a "Syntax error: "
prefix being added later. Since this makes it hard to ensure
that the error messages follow the rules of English, change this
to generate each error message all at once.
Second, the old code included the list of the remaining tokens
in each pr_error structure. This was overkill, because the only part
of this list that was used was the id and the context of the
first token in the list. Apart from being inelegant, the main flaw
of this approach was that in the case of premature end-of-file
errors, the only token list available was token_nil, which
of course contains neither a token nor its context. The old code
compensated for it later by using the context of the *first* token
of the whole term being parsed, which is ... less than useful.
(The missing token is trivially replaced by "end-of-file".)
The new code replaces the token list with the context, if it
is available; if it is not, then later we compute the context
of the last token in the whole token list. The new code
does not return the token itself; instead, it includes
its string version in the generated error message where appropriate.
Third, as mentioned above, we now include info about unbalanced
(), [] and {} pairs in diagnostics, as extra sentences.
(These extra sentences are preceded by \n characters;
see the change to parse_module.m below.)
Fourth, to make the above possible without adding unnecessary
complications, the diagnostic texts this module generates
now always include the period at the ends of sentences:
they are no longer added by the compiler.
Fifth, we now consistently use "Syntax error at token abc:
expected def, fgh, or xyz" phraseology.
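To make the phraseology concrete, here is a small Python sketch of building a whole message in one step, period included. The function name and its arguments are hypothetical; the output shape is modelled on the text quoted above and on the end-of-file message in the updated test output.

```python
# Hypothetical helper; only the message shape is taken from the commit.
def syntax_error_message(where, expected):
    """where: description of the error location, e.g. "token abc"
    or "end-of-file". expected: non-empty list of alternatives."""
    if len(expected) == 1:
        alternatives = expected[0]
    else:
        # The last alternative is introduced by ", or ", even when
        # there are only two alternatives.
        alternatives = ", ".join(expected[:-1]) + ", or " + expected[-1]
    # The period is part of the message; callers must not add another.
    return f"Syntax error at {where}: expected {alternatives}."
```

Calling it with "end-of-file" and the alternatives "an operator" and "`.'" yields the same text as the updated expected output for read_term("-_123").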
library/mercury_term_lexer.m:
Stop requiring the customers of this module to handle
- integer_dot tokens, which are needed only by, and are
an implementation detail of, the get_* family of predicates, and
- eof tokens, which the lexer also never returns, converting each one
into the end of its token list instead.
The fact that the lexer never returned integer_dot tokens was
documented, but the fact that it never returned eof tokens was not.
The reason for this change was simply that I did not want to write
two pieces of code to handle the out-of-input case in each affected
spot in the parser: once for an eof token, and once for token_nil.
library/stack.m:
Add a utility function needed by new code in mercury_term_parser.m.
compiler/parse_module.m:
Stop adding a period at the ends of error messages generated by
mercury_term_parser.m; mercury_term_parser.m now adds those itself.
Do post-process those messages by turning any \n characters in them
into nl format_pieces.
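The post-processing step can be pictured with the following Python sketch. The tuples ("words", text) and ("nl",) are stand-ins chosen for this example, not the compiler's actual format_piece representation, and the function name is hypothetical.

```python
# Hypothetical sketch: split a parser message on "\n" and emit a
# newline piece between the parts. No period is appended, since the
# parser's messages already end with one.
def message_to_pieces(message):
    pieces = []
    for index, part in enumerate(message.split("\n")):
        if index > 0:
            pieces.append(("nl",))
        if part:
            pieces.append(("words", part))
    return pieces
```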
NEWS.md:
Announce the change in mercury_term_lexer.m, and the
new function in stack.m.
library/io.text_read.m:
Unrelated bug fix, for which I discovered the need while
working on the other library files: add a missing foreign import.
tests/invalid_nodepend/unbalanced.{m,err_exp}:
A new test case to check the updated diagnostics.
tests/invalid_nodepend/Mmakefile:
Enable the new test case.
tests/hard_coded/parse_number_from_string.exp:
tests/invalid_nodepend/impl_def_literal_syntax.err_exp:
tests/invalid_nodepend/invalid_binary_literal.err_exp:
tests/invalid_nodepend/invalid_float_literal.err_exp:
tests/invalid_nodepend/invalid_hex_literal.err_exp:
tests/invalid_nodepend/invalid_octal_literal.err_exp:
tests/invalid_nodepend/null_char.err_exp:
tests/invalid_nodepend/typeclass_test_1.err_exp:
tests/invalid_nodepend/unicode_1.err_exp:
tests/invalid_nodepend/unicode_2.err_exp:
tests/invalid_purity/purity_nonsense_2.err_exp:
Expect the updated diagnostics.
114 lines
8.4 KiB
Plaintext
Valid decimal literals:
read_term("0.") = functor(integer(base_10, i(0, []), signed, size_word), [], context("", 1))
read_term("-0.") = functor(integer(base_10, i(0, []), signed, size_word), [], context("", 1))
read_term("00.") = functor(integer(base_10, i(0, []), signed, size_word), [], context("", 1))
read_term("0_0.") = functor(integer(base_10, i(0, []), signed, size_word), [], context("", 1))
read_term("10.") = functor(integer(base_10, i(1, [10]), signed, size_word), [], context("", 1))
read_term("-10.") = functor(integer(base_10, i(-1, [-10]), signed, size_word), [], context("", 1))
read_term("1_0.") = functor(integer(base_10, i(1, [10]), signed, size_word), [], context("", 1))
read_term("-1_0.") = functor(integer(base_10, i(-1, [-10]), signed, size_word), [], context("", 1))
read_term("01.") = functor(integer(base_10, i(1, [1]), signed, size_word), [], context("", 1))
read_term("0_1.") = functor(integer(base_10, i(1, [1]), signed, size_word), [], context("", 1))
read_term("1_000_000_000_000_000_000_000.") = functor(integer(base_10, i(5, [13877, 12907, 7261, 14976, 0]), signed, size_word), [], context("", 1))
read_term("-1_000_000_000_000_000_000_000.") = functor(integer(base_10, i(-5, [-13877, -12907, -7261, -14976, 0]), signed, size_word), [], context("", 1))

Invalid decimal literals:
read_term("123_.") = Syntax error: an underscore should separate two digits; it should not appear just before a decimal point.
read_term("-123_.") = Syntax error: an underscore should separate two digits; it should not appear just before a decimal point.
read_term("-_123") = Syntax error at end-of-file: expected an operator, or `.'.

Valid binary literals:
read_term("0b0.") = functor(integer(base_2, i(0, []), signed, size_word), [], context("", 1))
read_term("-0b0.") = functor(integer(base_2, i(0, []), signed, size_word), [], context("", 1))
read_term("0b_1.") = functor(integer(base_2, i(1, [1]), signed, size_word), [], context("", 1))
read_term("-0b_1.") = functor(integer(base_2, i(-1, [-1]), signed, size_word), [], context("", 1))
read_term("0b_1000_100.") = functor(integer(base_2, i(1, [68]), signed, size_word), [], context("", 1))
read_term("-0b_1000_100.") = functor(integer(base_2, i(-1, [-68]), signed, size_word), [], context("", 1))

Invalid binary literals:
read_term("0b.") = Syntax error: 0b is not followed by binary digits.
read_term("-0b.") = Syntax error: 0b is not followed by binary digits.
read_term("0b_.") = Syntax error: 0b is not followed by binary digits.
read_term("-0b_.") = Syntax error: 0b is not followed by binary digits.
read_term("0b11_.") = Syntax error: a binary literal cannot end with an underscore.
read_term("-0b11_.") = Syntax error: a binary literal cannot end with an underscore.

Valid octal literals:
read_term("0o77.") = functor(integer(base_8, i(1, [63]), signed, size_word), [], context("", 1))
read_term("-0o77.") = functor(integer(base_8, i(-1, [-63]), signed, size_word), [], context("", 1))
read_term("0o_77.") = functor(integer(base_8, i(1, [63]), signed, size_word), [], context("", 1))
read_term("-0o_77.") = functor(integer(base_8, i(-1, [-63]), signed, size_word), [], context("", 1))
read_term("0o_7_7.") = functor(integer(base_8, i(1, [63]), signed, size_word), [], context("", 1))
read_term("-0o_7_7.") = functor(integer(base_8, i(-1, [-63]), signed, size_word), [], context("", 1))
read_term("0o_7__7___7.") = functor(integer(base_8, i(1, [511]), signed, size_word), [], context("", 1))
read_term("-0o_7__7___7.") = functor(integer(base_8, i(-1, [-511]), signed, size_word), [], context("", 1))

Invalid octal literals:
read_term("0o.") = Syntax error: 0o is not followed by octal digits.
read_term("-0o") = Syntax error: 0o is not followed by octal digits.
read_term("0o_.") = Syntax error: 0o is not followed by octal digits.
read_term("-0o_.") = Syntax error: 0o is not followed by octal digits.
read_term("0o77_.") = Syntax error: an octal literal cannot end with an underscore.
read_term("-0o77_.") = Syntax error: an octal literal cannot end with an underscore.

Valid hexadecimal literals:
read_term("0xff.") = functor(integer(base_16, i(1, [255]), signed, size_word), [], context("", 1))
read_term("-0xff.") = functor(integer(base_16, i(-1, [-255]), signed, size_word), [], context("", 1))
read_term("0x_ff.") = functor(integer(base_16, i(1, [255]), signed, size_word), [], context("", 1))
read_term("-0x_ff.") = functor(integer(base_16, i(-1, [-255]), signed, size_word), [], context("", 1))
read_term("0xf_f.") = functor(integer(base_16, i(1, [255]), signed, size_word), [], context("", 1))
read_term("-0xf_f.") = functor(integer(base_16, i(-1, [-255]), signed, size_word), [], context("", 1))
read_term("0x_f_f__f.") = functor(integer(base_16, i(1, [4095]), signed, size_word), [], context("", 1))
read_term("-0x_f_f__f.") = functor(integer(base_16, i(-1, [-4095]), signed, size_word), [], context("", 1))
read_term("0xfffffffffffffffffffffffff.") = functor(integer(base_16, i(8, [3, 16383, 16383, 16383, 16383, 16383, 16383, 16383]), signed, size_word), [], context("", 1))
read_term("-0xfffffffffffffffffffffffff.") = functor(integer(base_16, i(-8, [-3, -16383, -16383, -16383, -16383, -16383, -16383, -16383]), signed, size_word), [], context("", 1))

Invalid hexadecimal literals:
read_term("0x.") = Syntax error: 0x is not followed by hexadecimal digits.
read_term("-0x.") = Syntax error: 0x is not followed by hexadecimal digits.
read_term("0x_.") = Syntax error: 0x is not followed by hexadecimal digits.
read_term("-0x_.") = Syntax error: 0x is not followed by hexadecimal digits.
read_term("0xff_.") = Syntax error: a hexadecimal literal cannot end with an underscore.
read_term("-0xff_.") = Syntax error: a hexadecimal literal cannot end with an underscore.

Valid float literals:
read_term("0.123.") = functor(float(0.123), [], context("", 1))
read_term("-0.123.") = functor(float(-0.123), [], context("", 1))
read_term("0.1_2__3.") = functor(float(0.123), [], context("", 1))
read_term("-0.1_2__3.") = functor(float(-0.123), [], context("", 1))
read_term("1.123.") = functor(float(1.123), [], context("", 1))
read_term("-1.123.") = functor(float(-1.123), [], context("", 1))
read_term("1_2.123.") = functor(float(12.123), [], context("", 1))
read_term("-1_2.123.") = functor(float(-12.123), [], context("", 1))
read_term("1__2.1_2__3.") = functor(float(12.123), [], context("", 1))
read_term("-1__2.1_2__3.") = functor(float(-12.123), [], context("", 1))
read_term("1_2_3e1_1.") = functor(float(12300000000000.0), [], context("", 1))
read_term("1_2_3E1_1.") = functor(float(12300000000000.0), [], context("", 1))
read_term("1_2e+1_1.") = functor(float(1200000000000.0), [], context("", 1))
read_term("1_2E+1_1.") = functor(float(1200000000000.0), [], context("", 1))
read_term("1_2e-1_1.") = functor(float(1.2e-10), [], context("", 1))
read_term("1_2E-1_1.") = functor(float(1.2e-10), [], context("", 1))
read_term("00.0.") = functor(float(0.0), [], context("", 1))
read_term("0_0.0.") = functor(float(0.0), [], context("", 1))
read_term("01.0.") = functor(float(1.0), [], context("", 1))
read_term("0_1.0.") = functor(float(1.0), [], context("", 1))

Invalid float literals:
read_term("1_2_3.1_2_3_.") = Syntax error: fractional part of float terminated by underscore.
read_term("1_2_3e1_2_3_.") = Syntax error: unterminated exponent in float literal.
read_term("123_._123.") = Syntax error: an underscore should separate two digits; it should not appear just before a decimal point.
read_term("123._123.") = Syntax error: underscore following decimal point.
read_term("123_.123.") = Syntax error: an underscore should separate two digits; it should not appear just before a decimal point.
read_term("123_e12.") = functor(float(123000000000000.0), [], context("", 1))
read_term("123_E12.") = functor(float(123000000000000.0), [], context("", 1))
read_term("123e_12.") = Syntax error: unterminated exponent in float literal.
read_term("123E_12.") = Syntax error: unterminated exponent in float literal.
read_term("123e12_.") = Syntax error: unterminated exponent in float literal.
read_term("123E12_.") = Syntax error: unterminated exponent in float literal.
read_term("12_e11.") = functor(float(1200000000000.0), [], context("", 1))
read_term("12_E11.") = functor(float(1200000000000.0), [], context("", 1))
read_term("123.12e-_12.") = Syntax error: unterminated exponent in float literal.
read_term("123.12e+_12.") = Syntax error: unterminated exponent in float literal.
read_term("123.12e12_.") = Syntax error: unterminated exponent in float literal.
read_term("123.12E12_.") = Syntax error: unterminated exponent in float literal.