Files
mercury/compiler/c_util.m
Simon Taylor 5647714667 Make all functions which create strings from characters throw an exception
Estimated hours taken: 15
Branches: main

Make all functions which create strings from characters throw an exception
or fail if the list of characters contains a null character.

This removes a potential source of security vulnerabilities where one
part of the program performs checks against the whole of a string passed
in by an attacker (processing the string as a list of characters or using
`unsafe_index' to look past the null character), but then passes the string
to another part of the program or an operating system call that only sees
up to the first null character.  Even if Mercury stored the length with
the string, allowing the creation of strings containing nulls would be a
bad idea because it would be too easy to pass a string to foreign code
without checking.

For examples see:
<http://insecure.org/news/P55-07.txt>
<http://www.securiteam.com/securitynews/5WP0B1FKKQ.html>
<http://www.securityfocus.com/archive/1/445788>
<http://www.securityfocus.com/archive/82/368750>
<http://secunia.com/advisories/16420/>

NEWS:
	Document the change.

library/string.m:
	Throw an exception if null characters are found in
	string.from_char_list and string.from_rev_char_list.

	Add string.from_char_list_semidet and string.from_rev_char_list_semidet
	which fail rather throwing an exception.  This doesn't match the
	normal naming convention, but string.from_{,rev_}char_list are widely
	used, so changing their determinism would be a bit too disruptive.

	Don't allocate an unnecessary extra word for each string created by
	from_char_list and from_rev_char_list.

	Explain that to_upper and to_lower only work on un-accented
	Latin letters.

library/lexer.m:
	Check for invalid characters when reading Mercury strings and
	quoted names.

	Improve error messages by skipping to the end of any string
	or quoted name containing an error.  Previously we just stopped
	processing at the error leaving an unmatched quote.

library/io.m:
	Make io.read_line_as_string and io.read_file_as_string return
	an error code if the input file contains a null character.

	Fix an XXX: '\0\' is not recognised as a character constant,
	but char.det_from_int can be used to make a null character.

library/char.m:
	Explain the workaround for '\0\' not being accepted as a char
	constant.

	Explain that to_upper and to_lower only work on un-accented
	Latin letters.

compiler/layout.m:
compiler/layout_out.m:
compiler/c_util.m:
compiler/stack_layout.m:
compiler/llds.m:
compiler/mlds.m:
compiler/ll_backend.*.m:
compiler/ml_backend.*.m:
	Don't pass around strings containing null characters (the string
	tables for the debugger).  This doesn't cause any problems now,
	but won't work with the accurate garbage collector.  Use lists
	of strings instead, and add the null characters when writing the
	strings out.

tests/hard_coded/null_char.{m,exp}:
	Change an existing test case to test that creation of a string
	containing a null throws an exception.

tests/hard_coded/null_char.exp2:
	Deleted because alternative output is no longer needed.

tests/invalid/Mmakefile:
tests/invalid/null_char.m:
tests/invalid/null_char.err_exp:
	Test error messages for construction of strings containing null
	characters by the lexer.

tests/invalid/unicode{1,2}.err_exp:
	Update the expected output after the change to the handling of
	invalid quoted names and strings.
2007-03-18 23:35:04 +00:00

458 lines
15 KiB
Mathematica

%-----------------------------------------------------------------------------%
% vim: ft=mercury ts=4 sw=4 et
%-----------------------------------------------------------------------------%
% Copyright (C) 1999-2007 The University of Melbourne.
% This file may only be copied under the terms of the GNU General
% Public License - see the file COPYING in the Mercury distribution.
%-----------------------------------------------------------------------------%
%
% File: c_util.m.
% Main author: fjh.
%
% This module defines utility routines that are useful when generating and/or
% emitting C code. Some of these routines are also useful with other languages
% whose syntax is similar to C.
%
% NOTE: changes to this module may require changes to be made to java_util.m.
%
%-----------------------------------------------------------------------------%
:- module backend_libs.c_util.
:- interface.
:- import_module backend_libs.builtin_ops.
:- import_module char.
:- import_module io.
:- import_module list.
%-----------------------------------------------------------------------------%
%
% Line numbering.
% set_line_num(FileName, LineNum):
%
% Emit a #line directive to set the specified filename and linenumber
% so that C compiler error messages etc. will refer to the correct location
% in the original source file location.
%
:- pred set_line_num(string::in, int::in, io::di, io::uo) is det.
% Emit a #line directive to cancel the effect of any previous #line
% directives, so that C compiler error messages etc. will refer to the
% appropriate location in the generated .c file.
%
:- pred reset_line_num(io::di, io::uo) is det.
%-----------------------------------------------------------------------------%
%
% String and character handling.
% Print out a string suitably escaped for use as a C string literal.
% This doesn't actually print out the enclosing double quotes --
% that is the caller's responsibility.
%
:- pred output_quoted_string(string::in, io::di, io::uo) is det.
% output_quoted_multi_string is like list.foldl(output_quoted_string)
% except that a null character will be written between each string
% in the list.
%
:- type multi_string == list(string).
:- pred output_quoted_multi_string(multi_string::in, io::di, io::uo) is det.
% Print out a char suitably escaped for use as a C char literal.
% This doesn't actually print out the enclosing single quotes --
% that is the caller's responsibility.
%
:- pred output_quoted_char(char::in, io::di, io::uo) is det.
% Convert a string to a form that is suitably escaped for use as a
% C string literal. This doesn't actually add the enclosing double quotes
% -- that is the caller's responsibility.
%
:- pred quote_string(string::in, string::out) is det.
% Convert a character to a form that is suitably escaped for use as a
% C character literal. This doesn't actually add the enclosing single
% quotes -- that is the caller's responsibility.
%
:- pred quote_char(char::in, string::out) is det.
%-----------------------------------------------------------------------------%
%
% Float literals.
% Convert a float to a string suitable for use as a C (or Java, or IL)
% floating point literal.
%
:- func make_float_literal(float) = string.
% As above, but write the string to the current output stream
% rather than returning it.
%
:- pred output_float_literal(float::in, io::di, io::uo) is det.
%-----------------------------------------------------------------------------%
%
% Operators.
%
% The following predicates all take as input an operator, check if it is
% an operator of the specified kind, and if so, return the name of the
% corresponding C operator that can be used to implement it.
% The operator returned will be <, >, etc.;
% it can be used in the form `strcmp(<Arg1>, <Arg2>) <Op> 0'.
%
:- pred string_compare_op(binary_op::in, string::out) is semidet.
% The operator returned will be +, *, etc.;
% the arguments should be floats and the result will be a float.
%
:- pred float_op(binary_op::in, string::out) is semidet.
% The operator returned will be <, >, etc.;
% the arguments should be floats and the result will be a boolean.
%
:- pred float_compare_op(binary_op::in, string::out) is semidet.
% The operator returned will be an infix operator. The arguments should be
% cast to MR_Unsigned, and the result will be a boolean.
%
:- pred unsigned_compare_op(binary_op::in, string::out) is semidet.
% The operator returned will be either a prefix operator or a macro
% or function name. The operand needs to be placed in parentheses
% after the operator name.
%
:- pred unary_prefix_op(unary_op::in, string::out) is det.
% The operator returned will be an infix operator. The arguments should be
% integer or booleans and the result will be an integer or a boolean.
%
:- pred binary_infix_op(binary_op::in, string::out) is semidet.
%-----------------------------------------------------------------------------%
% output_c_file_intro_and_grade(SourceFileName, Version):
%
% Outputs a comment which includes the settings used to generate
% the C file. This is used by configure to check the any existing C files
% are consistent with the current configuration. SourceFileName is the
% name of the file from which the C is generated, while Version is the
% version name of the mercury compiler.
%
:- pred output_c_file_intro_and_grade(string::in, string::in,
io::di, io::uo) is det.
%-----------------------------------------------------------------------------%
%-----------------------------------------------------------------------------%
:- implementation.
:- import_module libs.
:- import_module libs.globals.
:- import_module libs.options.
:- import_module bool.
:- import_module int.
:- import_module list.
:- import_module string.
%-----------------------------------------------------------------------------%
%
% Line numbering.
set_line_num(File, Line, !IO) :-
globals.io_lookup_bool_option(line_numbers, LineNumbers, !IO),
(
LineNumbers = yes,
(
Line > 0,
File \= ""
->
io.write_string("#line ", !IO),
io.write_int(Line, !IO),
io.write_string(" """, !IO),
output_quoted_string(File, !IO),
io.write_string("""\n", !IO)
;
reset_line_num(!IO)
)
;
LineNumbers = no
).
reset_line_num(!IO) :-
% We want to generate another #line directive to reset the C compiler's
% idea of what it is processing back to the file we are generating.
io.get_output_line_number(Line, !IO),
io.output_stream_name(FileName, !IO),
globals.io_lookup_bool_option(line_numbers, LineNumbers, !IO),
(
Line > 0,
FileName \= "",
LineNumbers = yes
->
io.write_string("#line ", !IO),
io.write_int(Line + 1, !IO),
io.write_string(" """, !IO),
output_quoted_string(FileName, !IO),
io.write_string("""\n", !IO)
;
true
).
%-----------------------------------------------------------------------------%
%
% String and character handling.
output_quoted_string(S, !IO) :-
output_quoted_string(0, length(S), S, !IO).
output_quoted_multi_string([], !IO).
output_quoted_multi_string([S | Ss], !IO) :-
output_quoted_string(S, !IO),
output_quoted_char(char.det_from_int(0), !IO),
output_quoted_multi_string(Ss, !IO).
:- pred output_quoted_string(int::in, int::in, string::in,
io::di, io::uo) is det.
output_quoted_string(Cur, Len, S, !IO) :-
( Cur < Len ->
% Avoid a limitation in the MSVC compiler where string literals
% can be no longer then 2048 chars. However if you output the string
% in chunks, eg "part a" "part b" it will accept a string longer than
% 2048 chars, go figure!
(
Cur \= 0,
Cur mod 512 = 0
->
io.write_string("\" \"", !IO)
;
true
),
string.unsafe_index(S, Cur, Char),
output_quoted_char(Char, !IO),
% Check for trigraph sequences in string literals. We break the
% trigraph by breaking the string into multiple chunks. For example,
% "??-" gets converted to "?" "?-".
(
Char = '?',
Cur < Len + 2
->
(
string.unsafe_index(S, Cur + 1, '?'),
string.unsafe_index(S, Cur + 2, ThirdChar),
is_trigraph_char(ThirdChar)
->
io.write_string("\" \"", !IO)
;
true
)
;
true
),
output_quoted_string(Cur + 1, Len, S, !IO)
;
true
).
output_quoted_char(Char, !IO) :-
quote_char(Char, EscapedChars),
io.write_string(EscapedChars, !IO).
quote_char(Char, QuotedChar) :-
quote_one_char(Char, [], RevQuotedChar),
string.from_rev_char_list(RevQuotedChar, QuotedChar).
quote_string(String, QuotedString) :-
string.foldl(quote_one_char, String, [], RevQuotedChars),
string.from_rev_char_list(RevQuotedChars, QuotedString).
:- pred quote_one_char(char::in, list(char)::in, list(char)::out) is det.
quote_one_char(Char, RevChars0, RevChars) :-
( escape_special_char(Char, EscapeChar) ->
RevChars = [EscapeChar, '\\' | RevChars0]
; is_c_source_char(Char) ->
RevChars = [Char | RevChars0]
; char.to_int(Char, 0) ->
RevChars = ['0', '\\' | RevChars0]
;
escape_any_char(Char, EscapeChars),
reverse_append(EscapeChars, RevChars0, RevChars)
).
:- pred escape_special_char(char::in, char::out) is semidet.
escape_special_char('"', '"').
escape_special_char('''', '''').
escape_special_char('\\', '\\').
escape_special_char('\n', 'n').
escape_special_char('\t', 't').
escape_special_char('\b', 'b').
escape_special_char('\a', 'a').
escape_special_char('\v', 'v').
escape_special_char('\r', 'r').
escape_special_char('\f', 'f').
% Succeed if the given character, prefixed with "??", is a trigraph.
%
:- pred is_trigraph_char(char::in) is semidet.
is_trigraph_char('(').
is_trigraph_char(')').
is_trigraph_char('<').
is_trigraph_char('>').
is_trigraph_char('=').
is_trigraph_char('/').
is_trigraph_char('\'').
is_trigraph_char('!').
is_trigraph_char('-').
% This succeeds iff the specified character is allowed as an (unescaped)
% character in standard-conforming C source code.
%
:- pred is_c_source_char(char::in) is semidet.
is_c_source_char(Char) :-
( char.is_alnum(Char)
; string.contains_char(c_graphic_chars, Char)
).
% This returns a string containing all the characters that the C standard
% specifies as being included in the "basic execution character set",
% except for the letters (a-z A-Z) and digits (0-9).
%
:- func c_graphic_chars = string.
c_graphic_chars = " !\"#%&'()*+,-./:;<=>?[\\]^_{|}~".
% reverse_append(Xs, Ys, Zs) <=> Zs = list.reverse(Xs) ++ Ys.
%
:- pred reverse_append(list(T)::in, list(T)::in, list(T)::out) is det.
reverse_append([], L, L).
reverse_append([X | Xs], L0, L) :-
reverse_append(Xs, [X | L0], L).
:- pred escape_any_char(char::in, list(char)::out) is det.
% Convert a character to the corresponding C octal escape code.
% XXX This assumes that the target language compiler's representation
% of characters is the same as the Mercury compiler's.
%
escape_any_char(Char, EscapeCodeChars) :-
char.to_int(Char, Int),
string.int_to_base_string(Int, 8, OctalString0),
string.pad_left(OctalString0, '0', 3, OctalString),
EscapeCodeChars = ['\\' | string.to_char_list(OctalString)].
%-----------------------------------------------------------------------------%
%
% Floating point literals.
%
% XXX These routines do not yet handle infinities and NaNs properly.
make_float_literal(Float) = string.format("%#.17g", [f(Float)]).
% This is used by the C, Java, and IL back-ends,
% so the output must be valid syntax in all three languages.
%
% We output literals using 17 digits of precision. This is the minimum
% needed to be able to convert IEEE double-precision floating point values
% to strings and back again without losing precision.
output_float_literal(Float, !IO) :-
io.write_string(make_float_literal(Float), !IO).
%-----------------------------------------------------------------------------%
%
% Operators.
unary_prefix_op(mktag, "MR_mktag").
unary_prefix_op(tag, "MR_tag").
unary_prefix_op(unmktag, "MR_unmktag").
unary_prefix_op(mkbody, "MR_mkbody").
unary_prefix_op(unmkbody, "MR_unmkbody").
unary_prefix_op(strip_tag, "MR_strip_tag").
unary_prefix_op(hash_string, "MR_hash_string").
unary_prefix_op(bitwise_complement, "~").
unary_prefix_op(logical_not, "!").
string_compare_op(str_eq, "==").
string_compare_op(str_ne, "!=").
string_compare_op(str_le, "<=").
string_compare_op(str_ge, ">=").
string_compare_op(str_lt, "<").
string_compare_op(str_gt, ">").
unsigned_compare_op(unsigned_le, "<=").
float_op(float_plus, "+").
float_op(float_minus, "-").
float_op(float_times, "*").
float_op(float_divide, "/").
float_compare_op(float_eq, "==").
float_compare_op(float_ne, "!=").
float_compare_op(float_le, "<=").
float_compare_op(float_ge, ">=").
float_compare_op(float_lt, "<").
float_compare_op(float_gt, ">").
binary_infix_op(int_add, "+").
binary_infix_op(int_sub, "-").
binary_infix_op(int_mul, "*").
binary_infix_op(int_div, "/").
binary_infix_op(unchecked_left_shift, "<<").
binary_infix_op(unchecked_right_shift, ">>").
binary_infix_op(bitwise_and, "&").
binary_infix_op(bitwise_or, "|").
binary_infix_op(bitwise_xor, "^").
binary_infix_op(int_mod, "%").
binary_infix_op(eq, "==").
binary_infix_op(ne, "!=").
binary_infix_op(logical_and, "&&").
binary_infix_op(logical_or, "||").
binary_infix_op(int_lt, "<").
binary_infix_op(int_gt, ">").
binary_infix_op(int_le, "<=").
binary_infix_op(int_ge, ">=").
%-----------------------------------------------------------------------------%
output_c_file_intro_and_grade(SourceFileName, Version, !IO) :-
globals.io_lookup_int_option(num_tag_bits, NumTagBits, !IO),
string.int_to_string(NumTagBits, NumTagBitsStr),
globals.io_lookup_bool_option(unboxed_float, UnboxedFloat, !IO),
UnboxedFloatStr = convert_bool_to_string(UnboxedFloat),
io.write_strings([
"/*\n",
"** Automatically generated from `", SourceFileName, "'\n",
"** by the Mercury compiler,\n",
"** version ", Version, ".\n",
"** Do not edit.\n",
"**\n",
"** The autoconfigured grade settings governing\n",
"** the generation of this C file were\n",
"**\n",
"** TAG_BITS=", NumTagBitsStr, "\n",
"** UNBOXED_FLOAT=", UnboxedFloatStr, "\n",
"**\n",
"** END_OF_C_GRADE_INFO\n",
"*/\n",
"\n"
], !IO).
:- func convert_bool_to_string(bool) = string.
convert_bool_to_string(no) = "no".
convert_bool_to_string(yes) = "yes".
%-----------------------------------------------------------------------------%