Paul DuBois
dubois@primate.wisc.edu
Wisconsin Regional Primate Research Center
Revision date: 10 April 1997
This document describes Exception and Termination Manager (ETM),
a simple(-minded) library to manage exceptional conditions that
arise during program execution, and to provide for orderly program
shutdown.
There are at least a couple of approaches one may adopt for handling
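error conditions within an application: try to recover from each
error and continue running, or let the program exit when a serious
error occurs.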
Each approach has strengths and weaknesses. A difficulty with
the first is that actions composed of many subsidiary actions,
each of which may itself succeed or fail, can easily become
very unwieldy when an attempt is made to handle all possible outcomes.
However, such a program will also continue in the face of extreme
adversity.
An advantage of the second approach is that it is, conceptually
at least, simpler to let a program die when a serious error occurs.
The difficulty lies in making sure the program cleans up and shuts
down properly before it exits. This can be a problem especially
when a program uses a number of independent modules which can
each encounter exceptional conditions and need to be shut down,
and which may know nothing of each other. ETM is designed to alleviate
the difficulties of this second approach.
The general architecture assumed for this discussion is that of
an application which uses zero or more subsystems which may be
more or less independent of each other, and which may each require
initialization and/or termination. Also, other application-specific
initialization and/or termination actions may need to be performed
which are unrelated to those of the subsystems, e.g., temporary
files created at the beginning of the application need to be removed
before final termination, network connections need to be shut
down, terminal state needs to be restored.
Ideally, when an application executes normally, it will initialize,
perform the main processing, then shut down in an orderly fashion.
This does not always occur. Exceptional conditions may be detected
which necessitate a "panic" (an immediate program exit)
because processing cannot continue further, or because it is judged
too burdensome to try to continue.
An individual subsystem may be easily written such that a panic
within itself causes its own shutdown code to be invoked. It is
more difficult to arrange for other subsystems to be notified
of the panic so that they can shut down as well, since the subsystem
in which the panic occurs may not even know about them.
An additional difficulty is that some exceptions may occur for
reasons not related to algorithmically detectable conditions.
For instance, the user of an application may cause a signal to
be delivered to it at any time. This has nothing to do with normal
execution and cannot be predicted.
The goals of ETM are thus twofold:
The model used by ETM is that the application initializes subsystems
in the order required by any dependencies among them, and then
terminates them in the reverse order. The presumption here is
that if subsystem ss2 is dependent upon subsystem ss1,
then ss1 should be initialized first and terminated last;
the dependency is unlikely to make it wise to shut down ss1
before ss2.
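The reverse-order model can be illustrated with a small stack of
routine pointers. The following is a minimal sketch, not ETM's
actual implementation; every name in it (sketch_add_shutdown_proc,
sketch_run_shutdown_procs, the ss1_end and ss2_end stand-ins, and
the fixed table size) is hypothetical and chosen only for
illustration.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_PROCS 16                 /* hypothetical fixed capacity */

typedef void (*sketch_proc)(void);          /* stand-in for a shutdown routine pointer */

static sketch_proc sketch_procs[SKETCH_MAX_PROCS];
static size_t sketch_count = 0;

char sketch_trace[SKETCH_MAX_PROCS + 1];    /* records the order in which routines ran */

/* Register a shutdown routine. Returns 0 on success, -1 if p is NULL
   or the table is full. */
int sketch_add_shutdown_proc(sketch_proc p)
{
    if (p == NULL || sketch_count == SKETCH_MAX_PROCS)
        return -1;
    sketch_procs[sketch_count++] = p;
    return 0;
}

/* Call the registered routines, most recently registered first.
   Each entry is popped before it is called, so it cannot run twice. */
void sketch_run_shutdown_procs(void)
{
    while (sketch_count > 0) {
        sketch_proc p = sketch_procs[--sketch_count];
        p();
    }
}

/* Hypothetical shutdown routines for two subsystems; ss2 depends on ss1. */
void ss1_end(void) { strcat(sketch_trace, "1"); }
void ss2_end(void) { strcat(sketch_trace, "2"); }
```

If ss1_end is registered first and ss2_end second, a call to
sketch_run_shutdown_procs() runs ss2_end before ss1_end, so ss1 is
still available while ss2 shuts down.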
ETM must itself be initialized before any other subsystem which
uses it. The initialization call, ETMInit(), takes as
an argument a pointer to a routine which performs any application-specific
cleanup not related to its subsystems, or NULL if there
is no such routine.
Each of the subsystems should then be initialized. A subsystem's
initialization routine should call ETMAddShutdownProc()
to register its own shutdown routine with ETM, if there is one.
(Some subsystems may require no explicit initialization or termination.
However, if there is a shutdown routine, you should at least call
ETMAddShutdownProc() to register it.)
When the program detects an exceptional condition, it calls ETMPanic()
to describe the problem and exit. ETMPanic() is also
called automatically when a signal is caught. A message is printed,
and all the shutdown routines that have been registered are automatically
executed, including the application-specific one.
ETM is designed to handle shutting down under unusual circumstances,
but it also works well for terminating normally. Instead of calling
ETMPanic(), the application calls ETMEnd(), which takes care of
calling all the shutdown routines that have been registered.
This is much like calling ETMPanic(), except that no
error message is printed, and ETMEnd() returns to the caller.
It is evident that the functionality provided by ETM is somewhat
like that of the atexit() routine provided on some systems.
Some differences between the two are:
Here is a short example of how to set up and shut down using ETM.
    main ()
    {
        . . .
        ETMInit (Cleanup);  /* register application-specific cleanup */
        SS1Init ();         /* registers SS1End() for shutdown */
        SS2Init ();         /* registers SS2End() for shutdown */
        SS3Init ();         /* registers SS3End() for shutdown */
        ... main processing here ...
        ETMEnd ();          /* calls SS3End (), SS2End () and SS1End () */
        exit (0);
    }
Subsystems that are themselves built on other subsystems may follow
this model, except that they would not call ETMInit()
or ETMEnd().
If there is no special initialization or shutdown activity, and
you don't care about catching signals, it is not necessary to
call ETMInit() and ETMEnd(). The application
may still call ETMPanic() to print error messages and
terminate. (Even if the application does use ETMInit()
and ETMEnd(), it is safe to call ETMPanic()
before any initialization has been done, because nothing needs
to be shut down at that point yet.)
If ETM itself encounters an exceptional condition (e.g., it cannot
allocate memory when it needs to), it will--of course--trigger
a panic. This should be rare, but if it occurs, ETM will generate
a message indicating what the problem was.
Caveats
Shutdown routines shouldn't call ETMPanic(), since ETMPanic()
causes shutdown routines to be executed. ETM detects loops of
this sort, but their occurrence indicates a flaw in program logic.
Similarly, if you install a print routine to redirect ETM's output
somewhere other than stderr, the routine shouldn't call
ETM to print any messages.
kill -9 is uncatchable and there's nothing you can do about
it.
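The loop detection mentioned in the first caveat can be pictured
as a simple "already shutting down" flag. This is only a sketch of
the general technique under that assumption; the source does not
say how ETM detects loops, and all names here (sketch_shutdown,
sketch_register, sketch_bad_routine, sketch_loops_detected) are
hypothetical.

```c
#include <stddef.h>

typedef void (*sketch_proc)(void);

static sketch_proc sketch_routine = NULL;   /* one registered routine, for brevity */
static int sketch_in_shutdown = 0;          /* nonzero while a shutdown is running */
int sketch_loops_detected = 0;              /* number of re-entrant calls refused */

void sketch_register(sketch_proc p) { sketch_routine = p; }

/* Run the registered routine. Returns 0 normally, or -1 if called
   again while a shutdown is already in progress (a loop). */
int sketch_shutdown(void)
{
    if (sketch_in_shutdown) {
        sketch_loops_detected++;
        return -1;
    }
    sketch_in_shutdown = 1;
    if (sketch_routine != NULL)
        sketch_routine();
    sketch_in_shutdown = 0;
    return 0;
}

/* A flawed shutdown routine that tries to start another shutdown. */
void sketch_bad_routine(void) { (void) sketch_shutdown(); }
```

With sketch_bad_routine registered, the outer call to
sketch_shutdown() completes, and the nested call is refused and
counted rather than recursing without end.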
Programming Interface
The ETM library should be installed in /usr/lib/libetm.a
or local equivalent, and applications should link in the ETM library
with the -letm flag. Source files that use ETM routines
should include etm.h. If you use ETM functions in a source
file without including etm.h, you will get undefined
symbol errors at link time.
The abstract types ETMProcRetType and ETMProcPtr
may be used for declaring and passing pointers to functions that
are passed to ETM routines. By default these will be void
and void(*)(), but on deficient systems with C compilers
lacking void pointers they will be int and int(*)(),
the usual C defaults for functions.
These types make it easier to declare properly typed functions
and NULL pointers. For instance, if you don't pass any
shutdown routine to ETMInit(), use
ETMInit ((ETMProcPtr) NULL);
If you do, use
    ETMProcRetType
    ShutdownProc ()
    {
        . . .
    }
    . . .
    main ()
    {
        . . .
        ETMInit (ShutdownProc);
        . . .
    }
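As an illustration of how such abstract types might be set up, the
sketch below selects the return type with a configuration macro.
It is an assumption about the general approach, not a copy of
etm.h: SKETCH_NO_VOID, SketchProcRetType, SketchProcPtr and the
helper routines are all hypothetical names, and the sketch uses
ANSI (void) parameter lists rather than the older style shown
above.

```c
#include <stddef.h>

/* SKETCH_NO_VOID is a hypothetical configuration macro, defined only
   when building with a compiler that lacks the void type. */
#ifdef SKETCH_NO_VOID
typedef int SketchProcRetType;
#else
typedef void SketchProcRetType;
#endif

/* Pointer to a routine returning SketchProcRetType. */
typedef SketchProcRetType (*SketchProcPtr)(void);

int sketch_calls = 0;                       /* counts calls to sketch_cleanup */

/* A routine declared with the abstract return type. */
SketchProcRetType sketch_cleanup(void)
{
    sketch_calls++;
}

/* Call p unless it is NULL; returns 1 if a routine was called. */
int sketch_call_if_set(SketchProcPtr p)
{
    if (p == NULL)
        return 0;
    p();
    return 1;
}
```

Callers write (SketchProcPtr) NULL or pass sketch_cleanup directly,
and neither form needs to change if the underlying return type does.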
Descriptions of the ETM routines follow.
ETMProcRetType ETMInit (p)
    ETMProcPtr p;
ETMProcRetType ETMEnd ()
ETMProcRetType ETMPanic (fmt, ...)
    char *fmt;
ETMPanic() may be called at any time, including prior
to calling ETMInit(), but only those shutdown routines
which have been registered are invoked.
A common problem with applications that encounter exceptional
conditions such as segmentation faults is that you often don't
see all the output your application has produced. This is because
stdout is often buffered. To alleviate this problem,
stdout is flushed before any message is printed, so that
any pending application output is flushed and appears before the
error message.
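The flush-then-print ordering can be sketched with standard C
library calls. The function name sketch_report is hypothetical, and
this shows only the ordering described above, not ETM's own code.

```c
#include <stdio.h>

/* Flush pending stdout output, then write msg to stderr.
   The message is attempted even if the flush fails.
   Returns 0 on success, EOF if either stream reports an error. */
int sketch_report(const char *msg)
{
    int rc = 0;

    if (fflush(stdout) == EOF)
        rc = EOF;
    if (fputs(msg, stderr) == EOF)
        rc = EOF;
    return rc;
}
```

The flush matters most when stdout is redirected to a file or
pipe, where it is usually fully buffered, while stderr is typically
unbuffered and would otherwise appear first.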
By default, ETMPanic() prints the message on stderr.
This behavior may be modified with ETMSetPrintProc().
The default exit() value is 1. This may be modified with
ETMSetExitStatus().
ETMProcRetType ETMMsg (fmt, ...)
    char *fmt;
ETMMsg() may be called at any time, including prior to
calling ETMInit().
ETMProcRetType ETMAddShutdownProc (p)
    ETMProcPtr p;
ETMProcRetType ETMRemoveShutdownProc (p)
    ETMProcPtr p;
ETMProcRetType ETMSetSignalProc (signo, p)
    int signo;
    ETMProcPtr p;
To return a signal to its default action or to cause a signal
to be ignored, pass the following values for p (these
are defined in etm.h):
    ETMSigIgnore     signal is ignored
    ETMSigDefault    signal default action is restored
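For comparison, the standard C signal() interface offers the same
three choices: a handler, ignore, or default. The sketch below
wraps it in a hypothetical sketch_set_signal_proc(); SIG_IGN and
SIG_DFL are the standard constants and merely play roles similar to
those described for ETMSigIgnore and ETMSigDefault. How ETM defines
its own values is not shown here, and the handler signature taking
an int is the standard one, which may differ from ETMProcPtr.

```c
#include <signal.h>

typedef void (*sketch_sig_proc)(int);

volatile sig_atomic_t sketch_last_signal = 0;   /* last signal seen by the handler */

/* Handler that only records which signal arrived. */
void sketch_on_signal(int signo)
{
    sketch_last_signal = signo;
}

/* Install p for signo. p may be a handler, SIG_IGN, or SIG_DFL.
   Returns 0 on success, -1 if signal() fails. */
int sketch_set_signal_proc(int signo, sketch_sig_proc p)
{
    return signal(signo, p) == SIG_ERR ? -1 : 0;
}
```

After sketch_set_signal_proc(SIGINT, sketch_on_signal), a call to
raise(SIGINT) runs the handler; after passing SIG_IGN the same
raise() has no effect, and passing SIG_DFL restores the default
action.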
ETMProcPtr ETMGetSignalProc (signo)
    int signo;
ETMProcRetType ETMSetPrintProc (p)
    ETMProcPtr p;
To override the default, pass the address of an alternate print
routine to ETMSetPrintProc(). The routine should take
one argument, a pointer to a character string, and return no value.
The argument will be the fully formatted panic message, complete
with a newline on the end. To restore the default, pass NULL.
The printing routine shouldn't call ETMPanic() or ETMMsg()
or a loop will be detected and ETM will conveniently panic as
a service to let you know you have a logic error in your program.
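A replaceable print routine of this kind can be sketched as a
function pointer with a default. The code below is an illustration
under that assumption rather than ETM's implementation; the names
sketch_set_print_proc, sketch_msg, sketch_capture and the 256-byte
buffer size are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef void (*sketch_print_proc)(const char *);

char sketch_captured[256];                  /* filled by sketch_capture below */

/* Default routine: write the message to stderr. */
static void sketch_default_print(const char *msg)
{
    fputs(msg, stderr);
}

static sketch_print_proc sketch_printer = sketch_default_print;

/* Install an alternate print routine; NULL restores the default. */
void sketch_set_print_proc(sketch_print_proc p)
{
    sketch_printer = (p != NULL) ? p : sketch_default_print;
}

/* Format a message, append a newline, and hand the finished string
   to the current print routine. Over-long messages are truncated. */
void sketch_msg(const char *fmt, ...)
{
    char buf[256];
    va_list ap;
    size_t len;

    va_start(ap, fmt);
    vsnprintf(buf, sizeof buf - 1, fmt, ap);    /* leave room for '\n' */
    va_end(ap);
    len = strlen(buf);
    buf[len] = '\n';
    buf[len + 1] = '\0';
    sketch_printer(buf);
}

/* Alternate routine: keep a copy of the message instead of printing it. */
void sketch_capture(const char *msg)
{
    strncpy(sketch_captured, msg, sizeof sketch_captured - 1);
    sketch_captured[sizeof sketch_captured - 1] = '\0';
}
```

The alternate routine receives one fully formatted string ending in
a newline, matching the contract described above, and it does not
call back into sketch_msg().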
ETMProcPtr ETMGetPrintProc ()
ETMProcRetType ETMSetExitStatus (status)
    int status;
If ETMSetAbort() has been called to force an abort()
on a panic, the exit status is not returned.
int ETMGetExitStatus ()
ETMProcRetType ETMSetAbort (val)
    int val;
ETMSetAbort() is meaningless on systems with no concept
of a core image. Also, if you install a signal catcher for SIGABRT,
you may end up in a panic loop.
int ETMGetAbort ()