Author: Maria Scott <maria-12648430(at)hnc-agency(dot)org>, Jan Uhlig <juhlig(at)hnc-agency(dot)org>
Status: Draft
Type: Standards Track
Created: 04-Mar-2021
Erlang-Version: 24.0
Post-History: 08-Mar-2021, 17-Mar-2021, 23-Mar-2021, 31-Mar-2021
Replaces:
This EEP introduces a way of automatically terminating supervisors based on the termination of specifically marked significant children.
This document is based on the discussion in OTP-PR 4521.
Children under a supervisor often represent a work unit, that is, a group of cooperating processes, as opposed to just a single process. Such work unit supervisors (called group supervisors in the context of this document) are themselves typically hosted by a simple_one_for_one supervisor, via which they are started as needed.
At the time of this writing, however, there is no good, canonical way of stopping such group supervisors once the work unit they represent has finished its work and the respective child processes have terminated, meaning the group supervisors will hang around, idle forever unless stopped manually one way or another.
This has been addressed in applications in a variety of ways, none of which can be called truly good, straightforward, or canonical:
* Letting a child shut down its group supervisor via a call to supervisor:terminate_child/2. As this is a blocking call, a process has to be spawned for it. Also, this will cause the top supervisor to be blocked until the group supervisor has been shut down, so it will not accept other requests until then.
Both of the above approaches suffer from the fact that the children responsible for the shutdown have to know things about their surroundings, namely...:
This may be tackled by having a dedicated overseer child that watches the other children and acts according to their behavior. However, this requires considerable boilerplate code for tasks that would be better handled by the supervisor. There is also the problem that the overseer process must keep the list of children it watches up to date should any of them be restarted, either by having the children register with it on start (for which they in turn must know the overseer process' pid), or by asking the supervisor for it.
Another approach that is often used is to make the children responsible for the shutdown of the group supervisor permanent and to set the supervisor's restart intensity to 0. This has the downside that such a child will not be restarted when it exits abnormally, even in cases where a restart would be possible; instead, its exit causes the supervisor to shut down. Another downside is that this approach produces error messages (crash reports) even if the shutdown is intended.
Last but not least, some people have taken the approach of cloning the OTP supervisor and customizing it to their needs, for the reasons outlined here and others.
This EEP provides a means to alleviate the problems outlined in the
motivation by introducing a way to mark specific children as significant
via a new child spec flag, and a way to configure supervisors to shut down
automatically depending on the exit of significant children via a new
supervisor flag.
In order to keep backwards compatibility, the new flags will only be usable in the map forms of child specs and supervisor flags, and for the same reason the default values for the new flags are chosen such that, in their absence, the supervisor behaves the same as it does to date.
The new child spec flag is named significant with possible values true and
false, false being the default.
The new supervisor flag is named auto_shutdown with possible values never,
any_significant, and all_significant, never being the default.
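As an illustration of how the two flags might appear together, the following is a minimal sketch of a group supervisor callback module. The module names group_sup, job_worker and job_helper are hypothetical and not part of this proposal, and the intensity and period values are arbitrary.

```erlang
-module(group_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    %% The new flags are only usable in the map forms.
    SupFlags = #{strategy => one_for_one,
                 intensity => 1,
                 period => 5,
                 auto_shutdown => any_significant},
    %% job_worker and job_helper are hypothetical modules (assumption).
    Worker = #{id => worker,
               start => {job_worker, start_link, []},
               restart => transient,
               significant => true},
    %% No significant key here, so it defaults to false.
    Helper = #{id => helper,
               start => {job_helper, start_link, []},
               restart => permanent},
    {ok, {SupFlags, [Worker, Helper]}}.
```

Under these assumptions, a normal exit of the worker child would be expected to shut down the supervisor, and with it the helper child.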
With the supervisor auto_shutdown flag set to never, the child spec flag
significant is not allowed to be true. The never value and the restriction
on the significant value are intended as a safety measure against
unintended automatic shutdowns, for example by the exit of a significant child
which was added later via supervisor:start_child/2. As the spec for such a
child would not be present in the supervisor:init/1 callback code but
somewhere else, debugging such unexplained supervisor shutdowns might be
difficult.
Otherwise, the following rules apply when a significant child exits on its own:
* A transient child will be restarted (not cause a supervisor shutdown) if it exits abnormally. If it exits normally...
  * ... and the supervisor's auto_shutdown flag is set to any_significant, the supervisor will shut down
  * ... and the supervisor's auto_shutdown flag is set to all_significant, the supervisor will shut down if the child was the last active significant child
* A temporary child will never be restarted. If it exits normally or abnormally, the same rules as for transient children apply, in regard to the supervisor auto_shutdown flag.
If the restart type is permanent, the significant flag is not allowed to
be true, as this combination does not make sense.
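The rules for the exit of a significant child can be summarized as a small decision function. This is only a model of the rules as stated in this document, not the supervisor's actual code. The module name significant_rules, the function on_exit/4 and its result atoms are hypothetical, and the exit reason is abstracted into normal or abnormal.

```erlang
-module(significant_rules).
-export([on_exit/4]).

%% on_exit(Restart, Exit, AutoShutdown, LastSignificant) -> Action
%%   Restart         :: permanent | transient | temporary
%%   Exit            :: normal | abnormal
%%   AutoShutdown    :: never | any_significant | all_significant
%%   LastSignificant :: boolean(), true if no other significant
%%                      child is still active
%% Models the exit of a *significant* child that exits on its own.
on_exit(permanent, _Exit, _Auto, _Last) ->
    invalid_spec;       % permanent and significant must not be combined
on_exit(_Restart, _Exit, never, _Last) ->
    invalid_spec;       % significant children need auto_shutdown /= never
on_exit(transient, abnormal, _Auto, _Last) ->
    restart;            % abnormal exit of a transient child: restart it
on_exit(_Restart, _Exit, any_significant, _Last) ->
    shutdown;
on_exit(_Restart, _Exit, all_significant, true) ->
    shutdown;           % the last active significant child has exited
on_exit(_Restart, _Exit, all_significant, false) ->
    keep_running.
```

For instance, on_exit(temporary, abnormal, all_significant, false) evaluates to keep_running in this model, since other significant children are still active.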
To be clear, the above rules only apply when significant children exit
by themselves, that is, not when being terminated manually via
supervisor:terminate_child/2, not when other non-significant children exit,
and not when being terminated as a consequence of a sibling's death in the
one_for_all or rest_for_one strategies.
The approach proposed here could also be used to the effect of "shutdown when
empty" by marking all children as significant and setting the supervisor
auto_shutdown flag to all_significant.
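A "shutdown when empty" setup might look like the following sketch, in which every child is temporary and significant. The module names batch_sup and batch_task, as well as the number of children, are hypothetical choices made for illustration only.

```erlang
-module(batch_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    SupFlags = #{strategy => one_for_one,
                 auto_shutdown => all_significant},
    %% batch_task is a hypothetical worker module (assumption).
    %% Every child is marked significant, so the supervisor is
    %% expected to shut down once the last of them has exited.
    Specs = [#{id => {task, N},
               start => {batch_task, start_link, [N]},
               restart => temporary,
               significant => true}
             || N <- lists:seq(1, 3)],
    {ok, {SupFlags, Specs}}.
```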
It is worth mentioning that the
simple_one_for_one strategy poses a special
case, as it can have only a single child spec that applies to all children.
That means that either all children are significant ones, or none is.
Using temporary significant children in one_for_all or rest_for_one
supervisors may lead to an edge case scenario in which an intended automatic
shutdown will not happen. Temporary children will not be restarted, not even
when their termination was caused by a sibling's death. On the other hand,
the automatic shutdown of a supervisor is not triggered when a significant
child is terminated as a consequence of a sibling's death. Thus, a temporary
significant child intended to automatically shut down its supervisor will
be lost if it is terminated as a consequence of a sibling's death.
The changes proposed in this document introduce no incompatible changes, as the new child spec and supervisor flags are optional and default to values that result in the current behavior. Also, all the current workarounds outlined in the Motivation will still work.
Although the proposed changes are backwards compatible, applications using this enhancement may not be compatible when compiled with previous OTP versions unless proper care is taken. Such an application compiled with older OTP versions will leak processes, as the automatic supervisor shutdowns it relies on to remove unused parts of its supervision tree will not happen. Taking care of this issue is at the discretion of implementors if they expect an application which uses the significant child behavior to be compiled with an OTP version that predates its appearance.
A reference implementation which will be updated to reflect the state of this document can be found in OTP-PR 4638.
This document has been placed in the public domain.