Copyright (C) The University of Melbourne 2012

There is one definite problem with the current implementation and several
potential ones.  I'm in the process of testing the following proposal.


Engine states and notifications
-------------------------------

An engine may be in one of the following states; see the es_state field of
engine_sleep_sync_i.

working      The engine has work to do and is working on it.
             The engine will not check for notifications; all
             notifications will be ignored.

idle         The engine finished its work and is looking for
             more work.  It is looking for a context to resume or a local
             spark.  If found, the engine will move to the working state,
             if not, it will check for notifications and if there are
             none it moves to the stealing state.  Only notify an idle
             engine with notifications that may be ignored.

stealing     The engine is now attempting to work steal.  It has now
             incremented the idle engine count to make it easier to
             receive notifications.  If it finds a spark it will decrement
             the count and execute the spark.  Otherwise it checks for
             notifications and moves to the sleeping state.  This state
             is similar to idle but is kept separate so that another
             engine can tell whether this engine has modified the idle
             engine count (which we don't want to do in the idle state,
             as an idle engine will often find a local spark to execute).

sleeping     The engine has committed to going to sleep.  To wake it up
             one must post to its sleep semaphore, ensuring that it does
             not sleep.  Any notification can be sent at this stage as
             all will be acted upon, including the context notification,
             which cannot be dropped.

notified
             The engine has received a notification; it cannot receive
             another notification now.  This state is initiated by the
             notifier, and therefore is done with either a compare and
             swap or a lock depending on the state of the engine.  See
             try_wake_engine and try_notify_engine.  Upon receiving the
             notification the engine will set its new status
             appropriately.

An engine can move itself through the following transitions of states
without locking or other protection.

working -> idle
idle -> working
stealing -> working
                     As the engine starts and finishes work it moves
                     between these states with a minimum of overhead.
                     These transitions may be made without a CAS or
                     locking.  We simply use write ordering to guarantee
                     that the new state (such as idle) is visible before
                     the engine acquires the runqueue lock.

notified -> working
                     An engine wakes up and finds work.

notified -> idle
notified -> stealing
                     An engine wakes up but doesn't find work; it goes
                     idle and checks for global work.

An engine can move itself through the following transitions provided that
it uses a CAS to do so.  This is so that it is guaranteed to observe the
notified state if another engine has set that state.

idle -> stealing
                     About to attempt work stealing.

stealing -> sleeping
                     The engine is about to call sem_wait, and MUST call
                     sem_wait after advertising the sleeping state.
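
The two kinds of self-transition described above can be sketched as
follows.  This is a minimal illustration using C11 atomics, not the
runtime's actual code: the enum values, struct engine_sync and both
function names are hypothetical, and a sequentially consistent store
stands in for the write ordering mentioned earlier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names for the five states described in this note. */
enum engine_state {
    STATE_WORKING,
    STATE_IDLE,
    STATE_STEALING,
    STATE_SLEEPING,
    STATE_NOTIFIED
};

/* Hypothetical stand-in for the structure that holds es_state. */
struct engine_sync {
    _Atomic int es_state;
};

/*
** Unprotected transition, e.g. working -> idle or idle -> working.
** The default (sequentially consistent) store is a conservative way to
** make the new state visible before any later lock acquisition.
** A notification that arrived while the engine was idle may be
** overwritten here, which is why an idle engine may only be sent
** notifications that can be ignored.
*/
static void
engine_set_state(struct engine_sync *esync, enum engine_state new_state)
{
    atomic_store(&esync->es_state, new_state);
}

/*
** CAS transition, e.g. idle -> stealing or stealing -> sleeping.
** Returns true if the engine made the transition.  Returns false if
** another engine changed the state first (in this protocol, to
** notified); the caller must then act on the notification instead.
*/
static bool
engine_try_transition(struct engine_sync *esync,
    enum engine_state from, enum engine_state to)
{
    int expected = from;

    /* Only a notifier may set the notified state. */
    assert(to != STATE_NOTIFIED);
    return atomic_compare_exchange_strong(&esync->es_state, &expected, to);
}
```

An engine that gets false back from engine_try_transition(esync,
STATE_STEALING, STATE_SLEEPING) must not call sem_wait; it has been
notified and should handle that notification.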

A notifier may notify another engine with the following transitions.

sleeping -> notified
                     Wake an engine while holding the wake_lock.  The
                     notifying engine must also post to the sleep
                     semaphore and decrement the idle engines count.
                     See try_wake_engine.

idle -> notified
stealing -> notified
                     Notify an engine of an event.  This must use a
                     CAS, so that it coordinates with the engine's own
                     CAS transitions.
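
The notifier's side of these transitions might look like the sketch
below.  It is not the code of try_wake_engine or try_notify_engine.
The names sketch_wake, sketch_notify, struct engine_sync, its field
names and num_idle_engines are all hypothetical, and the sketch assumes
POSIX threads and unnamed semaphores.  It also leaves the idle engine
count untouched on the CAS path, since this note does not say who
adjusts it in that case.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names for the five states described in this note. */
enum engine_state {
    STATE_WORKING,
    STATE_IDLE,
    STATE_STEALING,
    STATE_SLEEPING,
    STATE_NOTIFIED
};

/* Hypothetical layout of an engine's sleep structure. */
struct engine_sync {
    _Atomic int     es_state;
    pthread_mutex_t es_wake_lock;
    sem_t           es_sleep_semaphore;
};

/* Hypothetical global idle engine count. */
static _Atomic int num_idle_engines;

/*
** idle -> notified or stealing -> notified.
** Uses a CAS so that it coordinates with the engine's own CAS
** transitions.  Returns false if the engine has already moved on.
*/
static bool
sketch_notify(struct engine_sync *esync, enum engine_state seen)
{
    int expected = seen;

    assert(seen == STATE_IDLE || seen == STATE_STEALING);
    return atomic_compare_exchange_strong(&esync->es_state, &expected,
        STATE_NOTIFIED);
}

/*
** sleeping -> notified.
** The state is rechecked under the wake lock; if the engine is still
** sleeping, the notifier marks it notified, decrements the idle engine
** count and posts to the sleep semaphore.
*/
static bool
sketch_wake(struct engine_sync *esync)
{
    bool woken = false;

    pthread_mutex_lock(&esync->es_wake_lock);
    if (atomic_load(&esync->es_state) == STATE_SLEEPING) {
        atomic_store(&esync->es_state, STATE_NOTIFIED);
        atomic_fetch_sub(&num_idle_engines, 1);
        sem_post(&esync->es_sleep_semaphore);
        woken = true;
    }
    pthread_mutex_unlock(&esync->es_wake_lock);
    return woken;
}
```

Either function may fail, so a caller that must deliver its
notification has to retry, possibly with a different engine.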

See also par_engine_state.dot for a graph of these states and transitions.

The RTS can run in a polling mode where sem_timedwait is used instead of
sem_wait.  Define MR_WORKSTEAL_POLLING to enable this; it must also be
defined when compiling the application, as mercury_context.h will
define different macros.  When running in this mode an engine may move
itself from the sleeping state to the working state by using the lock in
its sleep structure.
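
A rough sketch of such a polling sleep follows.  It assumes that a
notifier changes the state and posts to the semaphore while holding the
same lock; the function name, the struct and its field names, and the
millisecond timeout parameter are hypothetical and are not taken from
the runtime.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical names for the five states described in this note. */
enum engine_state {
    STATE_WORKING,
    STATE_IDLE,
    STATE_STEALING,
    STATE_SLEEPING,
    STATE_NOTIFIED
};

/* Hypothetical layout of an engine's sleep structure. */
struct engine_sync {
    _Atomic int     es_state;
    pthread_mutex_t es_wake_lock;
    sem_t           es_sleep_semaphore;
};

/*
** Sleep for at most timeout_ms milliseconds.  Returns true if a
** notifier woke the engine, and false if the wait timed out and the
** engine moved itself from sleeping to working.
*/
static bool
sketch_polling_sleep(struct engine_sync *esync, long timeout_ms)
{
    struct timespec deadline;

    /* sem_timedwait takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    while (sem_timedwait(&esync->es_sleep_semaphore, &deadline) != 0) {
        if (errno == EINTR) {
            continue;               /* Interrupted by a signal: retry. */
        }
        assert(errno == ETIMEDOUT);

        /* Timed out: try to leave the sleeping state under the lock. */
        pthread_mutex_lock(&esync->es_wake_lock);
        if (atomic_load(&esync->es_state) == STATE_SLEEPING) {
            atomic_store(&esync->es_state, STATE_WORKING);
            pthread_mutex_unlock(&esync->es_wake_lock);
            return false;
        }
        pthread_mutex_unlock(&esync->es_wake_lock);

        /*
        ** A notifier got in first.  Under the assumption above it has
        ** already posted, so consume that post before returning.
        */
        while (sem_wait(&esync->es_sleep_semaphore) != 0
            && errno == EINTR)
        {
            /* Retry after a signal. */
        }
        return true;
    }
    return true;
}
```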

This has been set up specifically to ensure that engines can be notified
individually and that work is never lost.  Any future changes must
continue to prevent the following races.

Suppose there are two engines.  A becomes idle and checks for contexts to
execute, then B schedules a new context.  B cannot give it directly to A
because A is not sleeping, so A never sees the context.  Worse
still, if this context can only be executed by A then the work is never
continued and the whole system deadlocks.  Therefore, after placing the
context on the runqueue, a context advice message is given to any
engine that is in the idle, stealing or sleeping states (currently only
for cases where the context may only be executed by a single engine).

Similarly, if engine A is creating a spark, engine B, which is in the
stealing state, may have already checked A's deque.  So it's a good idea
to notify an engine of a spark if it is in the stealing or sleeping
states.
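
The rules above about which notification may be sent to an engine in
which state can be summarised as a small predicate.  This is only a
sketch: the enum and function names are hypothetical, and treating
context advice and spark notifications as droppable is an inference
from the state descriptions rather than something the runtime defines
in these terms.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the five states described in this note. */
enum engine_state {
    STATE_WORKING,
    STATE_IDLE,
    STATE_STEALING,
    STATE_SLEEPING,
    STATE_NOTIFIED
};

/* Hypothetical names for the notifications mentioned in this note. */
enum notification {
    NOTE_CONTEXT,           /* Hands over a context; cannot be dropped. */
    NOTE_CONTEXT_ADVICE,    /* "Check the runqueue"; may be ignored.    */
    NOTE_SPARK              /* A spark is available; may be ignored.    */
};

/*
** Returns true if a notifier may attempt to send `note' to an engine
** that it has observed in `state'.
*/
static bool
may_send(enum engine_state state, enum notification note)
{
    switch (state) {
        case STATE_SLEEPING:
            /* A sleeping engine acts on every notification. */
            return true;
        case STATE_STEALING:
            /* May have already checked the runqueue and other deques. */
            return note == NOTE_CONTEXT_ADVICE || note == NOTE_SPARK;
        case STATE_IDLE:
            /* Has not started stealing, so only advice is useful. */
            return note == NOTE_CONTEXT_ADVICE;
        case STATE_WORKING:
        case STATE_NOTIFIED:
            /* Working engines ignore notifications, and a notified */
            /* engine cannot receive another one.                   */
            return false;
    }
    assert(!"unknown engine state");
    return false;
}
```

A true result only permits the attempt.  The state may change before
the notification is delivered, so delivery itself still has to use the
CAS or the lock described earlier.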