mercury/deep_profiler/DESIGN

This file describes, explains and justifies the changes made in the design
of the deep profiler since the deep profiler paper.

The mdprof_cgi program is the core of the deep profiler. Every web page the
user looks at in a profile viewing session is generated by a separate
invocation of mdprof_cgi. We don't have a choice about this; the design
of the CGI interface to web servers dictates this mode of operation.
The CGI interface also requires each invocation of mdprof_cgi to exit
when it finishes generating a web page; the web server doesn't know
that the web page is complete until the program generating it exits.

Unfortunately, reading and processing a deep profiling data file takes a
significant amount of time, so we don't want to reread the profiling data file
on every query. In the solution we have adopted, mdprof_cgi forks into two
processes after it has generated a web page, provided that no server for its
profiling data file exists yet. The original
process (the parent) exits, to allow the web browser to display the generated
page. The child process sticks around as a server, processing queries from
later invocations of mdprof_cgi that specify the same profiling data file.
This server process communicates with these later invocations through a pair
of named pipes, whose names include the name of the profiling data file
(in a mangled form in which each `/' is replaced by another character).
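
The naming and forking scheme above can be sketched as follows. This is only
an illustration, not the actual implementation: the directory, the file name
prefix, the substitute character and the function names are all assumptions
made for the example. Only the mangling of `/' and the toserver/fromserver
distinction come from the description above.

```python
import os

# Hypothetical values: the text does not say where the pipes live, what
# prefix they use, or which character replaces '/'.
PIPE_DIR = "/var/tmp"
PREFIX = "mdprof"
SLASH_SUBSTITUTE = ":"


def mangle(data_file_name):
    """Replace each '/' so that the path fits in one file name component."""
    return data_file_name.replace("/", SLASH_SUBSTITUTE)


def pipe_names(data_file_name):
    """Return the (toserver, fromserver) pipe paths for one data file."""
    mangled = mangle(data_file_name)
    to_server = os.path.join(PIPE_DIR, PREFIX + "_toserver_" + mangled)
    from_server = os.path.join(PIPE_DIR, PREFIX + "_fromserver_" + mangled)
    return to_server, from_server


def detach_server(serve):
    """Fork after the page has been generated.

    The child runs serve() and never returns to the caller. The parent gets
    the child's pid back; in the design described above it would then exit,
    so that the web server knows the page is complete.
    """
    pid = os.fork()
    if pid == 0:
        try:
            serve()
        finally:
            os._exit(0)
    return pid
```

With these assumed values, pipe_names("/home/user/Deep.data") yields
/var/tmp/mdprof_toserver_:home:user:Deep.data and the corresponding
fromserver name, so two invocations naming the same data file agree on
where to find the server.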

This design has an inherent race condition, which arises when two mdprof_cgi
processes are created close together in time, both specify the same profiling
data file, and no server process for that data file exists. (If they specify
different profiling data files, then they are handled independently; if there
is already a server process for that data file, they will both send their
queries to it.) The problem is that the two processes compete to become the
server.

The solution we adopt is to create a critical section, a piece of code that can
be executed by only one mdprof_cgi process at a time. To be maximally portable,
we use the `open' system call with the O_CREAT and O_EXCL flags, which make
the open call fail if the file we want to create already exists. Since the rate
at which we get new CGI requests is limited by human typing and/or clicking
speed, we can tolerate the overhead of this method of gaining mutual exclusion.
Releasing the mutual exclusion involves simply removing the lock file.
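
A minimal sketch of this locking technique, using Python's os.open, which
passes the flags straight through to the `open' system call. The function
names and the polling interval are assumptions made for the example; the
sketch also does not deal with stale lock files left behind by a crashed
process.

```python
import os
import time


def try_lock(lock_path):
    """Try once to create the lock file; return True if we now hold the lock."""
    try:
        # O_CREAT | O_EXCL makes the call fail if the file already exists,
        # and the existence test and the creation happen atomically.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def acquire_lock(lock_path, retry_interval=0.1):
    """Poll until the lock file can be created (hypothetical interval)."""
    while not try_lock(lock_path):
        time.sleep(retry_interval)


def release_lock(lock_path):
    """Leave the critical section by removing the lock file."""
    os.unlink(lock_path)
```

Polling is wasteful compared to a blocking lock, but as noted above, the
request rate is low enough that this overhead is tolerable.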

The code in the critical section includes both checking for the prior existence
of a server (which consists of checking for the existence of the two named
pipes) and, in the absence of an existing server process, the creation of the
named pipes and the commitment to become the server behind those pipes.
This solves the race condition described above: whichever mdprof_cgi process
gains entry to the critical section first will become the server, and all other
mdprof_cgi processes will send their queries to it (i.e. they will become
clients of the server).
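
The decision made inside the critical section can be sketched like this.
The function name decide_role and its parameters are hypothetical, and the
sketch deliberately leaves out the work that surrounds the decision (reading
the profiling data file, sending the query); it shows only that the existence
test and the creation of the pipes happen under the same lock.

```python
import os
import time


def decide_role(lock_path, to_server, from_server):
    """Return "client" or "server" for this process.

    The existence test and the creation of the pipes both happen while the
    lock file is held, so at most one process can decide to be the server.
    """
    # Enter the critical section by creating the lock file exclusively.
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
            os.close(fd)
            break
        except FileExistsError:
            time.sleep(0.05)  # hypothetical polling interval
    try:
        # The existence of both pipes denotes a committed server.
        if os.path.exists(to_server) and os.path.exists(from_server):
            return "client"
        # No server yet: creating the pipes commits us to serving.
        os.mkfifo(to_server, 0o600)
        os.mkfifo(from_server, 0o600)
        return "server"
    finally:
        # Leave the critical section.
        os.unlink(lock_path)
```

If two processes call decide_role at about the same time with the same
arguments, one of them gets "server" and the other gets "client", whichever
order they enter the critical section in.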

Of course, we don't want server processes to live and thus consume resources
forever. The server process therefore has a timeout: it deletes its pipes
and exits when it has been idle for a given amount of time. The timeout
sets up another race condition, since there is a time window between the
delivery of the timeout alarm signal and the deletion of the pipes. If another
mdprof_cgi process tests for the existence of the pipes during this interval,
it could find that they do exist, and therefore commit to becoming a client,
only to find that the pipes, or the process behind them, no longer exist
when it actually tries to send its query to the server.

The solution we use to eliminate this race condition has two parts. First,
we make the timeout code in the server get mutual exclusion on the same lock
file that we use to solve client/client races, thus ensuring that all
operations that create or delete the pipes are guarded by the same mutex.
This is necessary because we use the existence of the pipes to denote the
existence of a server process committed to serving queries on the associated
profiling data file. However, this doesn't prevent the server from getting
the mutex, deleting the pipes and exiting immediately after another mdprof_cgi
process got the mutex, found that the pipes existed, committed to becoming
a client, and released the mutex. Therefore inside the critical section
in the timeout code, we check for the existence of waiting clients, and
abort the timeout if any exist. Since all operations on one end of a named
pipe block until another process performs the complementary operation on
the other end, we cannot use tests on the pipes to check for waiting clients.
Instead, each client creates a file with a known prefix to indicate that it
wants to use the services of a server, if one already exists. Would-be clients
create this file before releasing the mutex (actually, before even getting
the mutex) and do not delete it until they exit, unless they become the server
instead, in which case they delete it when they make that decision.
The server aborts the timeout if it finds any of these `want' files.
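
A sketch of the want file mechanism follows. The prefix and the use of the
process id as a suffix are assumptions made for the example; the text above
only says that the files share a known prefix.

```python
import glob
import os

WANT_PREFIX = "mdprof_want_"  # hypothetical prefix


def create_want_file(directory):
    """Announce that this process may need a server; return the file's path."""
    # Using the pid as the suffix (an assumption) keeps the names distinct.
    path = os.path.join(directory, WANT_PREFIX + str(os.getpid()))
    open(path, "w").close()
    return path


def remove_want_file(path):
    """Withdraw the announcement, on exit or on becoming the server."""
    os.unlink(path)


def any_want_files(directory):
    """Server side: is any would-be client waiting?"""
    pattern = os.path.join(glob.escape(directory), WANT_PREFIX + "*")
    return bool(glob.glob(pattern))
```

Unlike a test on the pipes themselves, the check in any_want_files never
blocks, which is why it can safely run inside the timeout code.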

On a very slow machine, it is possible for the server to abort a timeout
because of a want file that its creator is about to delete just before exiting.
However, this is OK, because when the server aborts its timeout, it sets up
another timeout, so the exit of the server is not delayed indefinitely.

The following is a high level description of mdprof_cgi in pseudocode:

	create "want" file
	get mutual exclusion (i.e. create mutex file)
	if the named pipes exist
		commit to being a client
		send the query to the server through the toserver pipe
		release mutual exclusion (i.e. delete mutex file)
		receive the result from the server through the fromserver pipe
		remove "want" file
	else
		read the profile data file and preprocess it
		if the reading and preprocessing found any errors
			report the error
			release mutual exclusion (i.e. delete mutex file)
			remove "want" file
		else
			report the result of the initial query
			commit to being a server
			create the named pipes
			release mutual exclusion (i.e. delete mutex file)
			remove "want" file
			loop forever
				set up timeout
				receive query on toserver pipe
				send result to fromserver pipe
			done
		fi
	fi

The pseudocode of the timeout handler:

	try to get mutual exclusion (i.e. create mutex file)
	if we got mutual exclusion
		check whether any want files exist
		if some do
			release mutual exclusion (i.e. delete mutex file)
			abort the timeout
		else
			clean up, deleting the mutex file last
			exit
		fi
	else
		some client has the lock and therefore needs a server,
			so abort the timeout
	fi
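
The timeout handler above might be sketched as follows. The function names,
parameters and the use of SIGALRM via Python's signal module are assumptions
made for the example. The sketch also assumes that the handler releases the
mutex before aborting the timeout when it finds want files, since otherwise
no client could ever enter the critical section again.

```python
import glob
import os
import signal


def on_timeout(lock_path, want_pattern, pipes):
    """Return True if the server should exit, False to abort the timeout."""
    try:
        # Try once, without waiting, to create the mutex file.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        os.close(fd)
    except FileExistsError:
        # Some client holds the lock and needs a server.
        return False
    if glob.glob(want_pattern):
        # A would-be client is waiting: release the mutex and keep serving.
        os.unlink(lock_path)
        return False
    # Clean up, deleting the mutex file last.
    for pipe in pipes:
        os.unlink(pipe)
    os.unlink(lock_path)
    return True


def install_timeout(seconds, lock_path, want_pattern, pipes):
    """Arm an alarm that runs on_timeout after `seconds` of idleness."""
    def handler(signum, frame):
        if on_timeout(lock_path, want_pattern, pipes):
            os._exit(0)
        # The timeout was aborted, so set up another one.
        signal.alarm(seconds)

    signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
```

A server loop would call install_timeout again before waiting for each
query, which matches the "set up timeout" step in the main pseudocode.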

The initial design of the deep profiler had two separate programs. The old
mdprof_cgi was strictly a client: it *always* sent the query to a process
running the other program, mdprof_server. The problem with this approach
is that having mdprof_cgi invoke mdprof_server via fork and exec made
it very difficult to debug mdprof_server and its interaction with mdprof_cgi,
and to control the level of detail in the diagnostics we print about any
problems we discover (whether in the profiling data file or in the manipulation
of the underlying infrastructure, e.g. the named pipes). The current design
allows a single debugging session to debug any part of the deep profiler
simply by specifying an option that inhibits the fork that normally detaches
the server process, and we can set option flags controlling diagnostics
directly, instead of setting the server's flags indirectly by specifying flags
to the client.