[erlang-questions] auto-syncing mnesia after a network split

Tue Dec 2 21:30:05 CET 2008

On Tue, December 2, 2008 2:03 pm, Joel Reymont wrote:
> Alex,
>
> On Dec 2, 2008, at 5:18 PM, Alex wrote:
>
>> what happens when you have multiple updates to both sides of the
>> split?  if you just pick the highest vnum, you lose all the
>> transactions from the other side of the split when it rejoins.
>
>
> You can pick up new (inserted) records by doing a diff of primary keys
> for each table.
>
> You cannot do anything about deleted records, I think, so you'll just
> have to delete those again somehow. You could assume that the table
> replica with the latest timestamp is the right one and just delete the
> extra records from the other table.
>
> Imagine a bank account that's distributed across the split nodes,
> where a customer deposits money twice and the deposits are split
> across the nodes. You'll pick up the latest deposit on one node and
> miss the other deposit.
>
> I think you can overcome this programmatically, with a timestamp _and_
> a version number. You can have a version table per node with three
> columns: table name, vnum and timestamp. The rest of the tables would
> have just the vnum in their records.
>
> When updating table T, you will first update the version table by
> storing the current time and bumping the vnum for the key T. You will
> then store the vnum in the record of table T that you are updating.
>
> You will be able to find the split time by looking at the version
> tables and figuring out when the vnums started to diverge. You can
> then invoke a merge function that figures out, for example, how to
> merge a bunch of bank deposit transactions into a single balance.
>
> You will know the vnum at split time and will only need to consider
> the transactions that happened after. Shouldn't be a lot of
> transactions for a short split time.
>
> What do you think?
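
As a minimal sketch of the scheme quoted above: each node keeps a version
table (table name -> vnum, timestamp), every record carries the vnum it was
written at, and a merge function only has to reconcile records written after
the split. The names here (Node, split_vnum, merged_balance) and the extra
"origin" field in each record are assumptions made for illustration; they are
not from Joel's message and are not Mnesia APIs. The split point is assumed to
be the last vnum at which both replicas hold the same record.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Node:
    """One replica: a version table plus a log of deposit records (hypothetical)."""
    name: str
    versions: dict = field(default_factory=dict)  # table name -> (vnum, timestamp)
    deposits: list = field(default_factory=list)  # (vnum, origin, account, amount)

    def deposit(self, account, amount):
        # Bump the vnum for the table first, then stamp the record with it.
        vnum = self.versions.get("deposits", (0, 0.0))[0] + 1
        self.versions["deposits"] = (vnum, time.time())
        self.deposits.append((vnum, self.name, account, amount))


def split_vnum(a, b):
    """Return the last vnum at which both replicas hold the same record."""
    last = 0
    for rec_a, rec_b in zip(a.deposits, b.deposits):
        if rec_a != rec_b:
            break
        last = rec_a[0]
    return last


def merged_balance(a, b, account):
    """Sum the shared history once, plus every post-split deposit from both sides."""
    cut = split_vnum(a, b)
    shared = [r for r in a.deposits if r[0] <= cut]
    after = [r for node in (a, b) for r in node.deposits if r[0] > cut]
    return sum(amount for _, _, acct, amount in shared + after if acct == account)


# One deposit replicated to both nodes, then one deposit on each side of the split.
a = Node("a")
a.deposit("alice", 100)
b = Node("b", versions=dict(a.versions), deposits=list(a.deposits))
a.deposit("alice", 50)
b.deposit("alice", 25)
print(merged_balance(a, b, "alice"))
```

This only works because deposits commute: adding them in any order gives the
same balance. It says nothing about operations that need a consistent view
before they can be allowed, such as withdrawals.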

The ideas are interesting, though I pray my bank never adopts such software.

I don't think bank software can continue to allow transactions like
deposit and withdrawal during a network partition--I just don't see how
that can be made to work while maintaining a consistent view of the
various accounts across disconnected nodes (e.g. how can a bank ATM allow
me to withdraw funds if it cannot reach its peer node(s) at my bank to
determine the availability of such funds?).

Most systems I work with implement a recovery procedure similar to what
Ulf has posted in the past on this list. This works in my _special case_
because I am tracking real-time telephony stats used to route calls (vs.
manage bank account information).

Because the systems I am referring to value high availability over 100%
data consistency, this is perfectly OK (and works quite well). With issues
like telecom "glare" I couldn't be 100% accurate all the time anyway.

So, to recover from a partition it is enough to pick any functioning node
as the new "master" and have the others restart and/or force-load tables
from it. Clients keep pushing new stats into the system the entire time, so
everything "converges on reality" after a recovery attempt anyway.
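
The recovery idea can be illustrated with a toy model in which each node's
table is a plain dictionary. This is only a sketch: recover and push_stat are
hypothetical names, the trunk counters are made-up sample data, and the real
procedure would use Mnesia itself (mnesia:set_master_nodes/1 and
mnesia:force_load_table/1 are the usual building blocks, though the exact
steps are not shown in this thread).

```python
def recover(nodes, master):
    """Overwrite every other replica's table with a copy of the master's."""
    for name, table in nodes.items():
        if name != master:
            table.clear()
            table.update(nodes[master])


def push_stat(nodes, key, value):
    """A client update that reaches every connected replica."""
    for table in nodes.values():
        table[key] = value


# The two replicas disagree about trunk1 after the partition.
nodes = {
    "a": {"trunk1": 3, "trunk2": 7},
    "b": {"trunk1": 5, "trunk2": 7},
}

recover(nodes, master="a")       # b's view of trunk1 is discarded
push_stat(nodes, "trunk1", 4)    # fresh stats overwrite whatever was kept
print(nodes["a"] == nodes["b"])
```

Whatever node b recorded during the split is simply thrown away. That is
acceptable for stats that are refreshed continuously, and it is exactly the
loss that would be unacceptable for account balances.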

This system works extremely well, but again I wouldn't dream of using it
to implement ATM software for managing bank accounts.

-Rick