Redundant nodes

Fri Feb 22 03:09:28 CET 2002

Our shelf hardware was designed to support a pair of redundant control
cards, only one of which is "active" at any given time, the other intended
to be a "hot standby". However, the specifics of the hardware design are such
that only the currently active card can talk to the outside world or to any
of the line cards in the shelf.  The standby card can talk only to the
active card, through a dedicated channel between them across which they
exchange the data that needs to be mirrored.  The intention was that, from
the point of view of an external entity (such as a management system) and
from the point of view of the line cards, there appears to be only one
control card (entity, control point). The idea is that since only
one of the control cards can talk to the world at any given time, both cards
can offer the same IP address (even the same MAC address) to the world.
When a switchover occurs between the redundant control cards in the shelf,
the world will see a temporary disruption: TCP sessions (and perhaps their
associated processes) will have to be re-established or restarted, and any
transactions in progress may be aborted.  Otherwise the shelf responds at
the same addresses and its peers pick up where they left off.  The same
holds from the line cards' point of view.
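
To make the scheme concrete, here is a rough toy model of it in Python.
It is only a sketch of the behaviour described above, not our actual
software: the class names, the slot labels "A" and "B", and the method
names are all made up for illustration, and nothing in it models Erlang
itself.

```python
from dataclasses import dataclass, field


@dataclass
class ControlCard:
    slot: str                                     # hypothetical slot label
    mirrored: dict = field(default_factory=dict)  # state received over the dedicated channel


class Shelf:
    """Toy model: two control cards behind one externally visible address."""

    def __init__(self, address):
        self.address = address          # same IP (and MAC) whichever card is active
        self.active = ControlCard("A")
        self.standby = ControlCard("B")
        self.sessions = set()           # TCP sessions terminate on the active card only

    def connect(self, peer):
        # An external entity or line card opens a session to the shelf address.
        self.sessions.add(peer)

    def write(self, key, value):
        # Only the active card is reachable; it mirrors the change to the standby.
        self.active.mirrored[key] = value
        self.standby.mirrored[key] = value

    def switchover(self):
        # The standby becomes active under the same address. Existing sessions
        # do not survive and must be re-established by the peers.
        dropped = self.sessions
        self.sessions = set()
        self.active, self.standby = self.standby, self.active
        return dropped
```

For example, after `shelf = Shelf("10.0.0.1")` (an arbitrary example
address), `shelf.connect("mgmt")` and `shelf.write("cfg", 1)`, a call to
`shelf.switchover()` returns the dropped session set, leaves
`shelf.address` unchanged, and the newly active card still holds the
mirrored `cfg` entry.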

Assuming that all the processors within the shelf are running Erlang, as are
some processors outside the shelf, is the above scheme feasible with the
current design of the Erlang message distribution mechanism?  Or would a
switchover in the above scheme necessarily wreak havoc?

-- Kurt Luoto