timer_server loses its brains (Fwd: Re: [erlang-questions] System limit bringing down rex and the VM)

Ulf Wiger ulf.wiger@REDACTED
Fri Sep 10 09:39:09 CEST 2010


I thought I'd forward this to the erlang-bugs list as well.
The timer_server should make up its mind: if it intends to
restart after crashing, it must remember all existing timers.
Otherwise, it would be better for it to act as a true kernel
process and bring down the node.

BR,
Ulf W

-------- Original Message --------
Subject: Re: [erlang-questions] System limit bringing down rex and the VM
Date: Fri, 10 Sep 2010 09:33:27 +0200
From: Ulf Wiger <ulf.wiger@REDACTED>
Organization: Erlang Solutions, Ltd.
To: erlang-questions@REDACTED

On 09/09/2010 07:33 PM, Musumeci, Antonio S wrote:
> 
> I'm seeing mnesia, rex and timer_server in my dump. If you
> kill timer_server, though, it restarts.

Actually, I consider this a bug.

Let's see what happens when we kill timer_server.

Eshell V5.7.5  (abort with ^G)
1> F = fun() ->
         timer:send_after(15000,self(),hello),
         receive
            Msg ->
               io:fwrite("got ~p~n", [Msg])
            end
        end.
#Fun<erl_eval.20.67289768>
2> f(P), P = spawn(F), time().
{9,25,48}
got hello
3> time().
{9,26,6}
4> whereis(timer_server).
<0.38.0>
5> f(P), P = spawn(F), time().
{9,26,22}
6> exit(whereis(timer_server),kill).
true
7> whereis(timer_server).
<0.43.0>
8> time().
{9,27,0}
9> process_info(P).
[{current_function,{erl_eval,receive_clauses,6}},
 {initial_call,{erlang,apply,2}},
 {status,waiting},
 {message_queue_len,0},
 {messages,[]},
 ...

So killing timer_server caused it to bounce back, but in the process
it forgot all outstanding requests. The process P spawned at 9:26:22
asked for a message after 15 seconds, yet at 9:27:00 it is still
waiting with an empty message queue. Any process depending on the
reliable service of the timer server is left hanging, with no
indication whatsoever that something went wrong.
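Two client-side sketches, offered under stated assumptions, illustrate how a caller might avoid the silent hang in the transcript. Neither fixes timer_server itself. The module name `timer_workarounds` and both function names are hypothetical. `wait_or_down/3` monitors whichever process holds the caller's request (assumed here to be the pid returned by `whereis(timer_server)`), so a crash becomes an error rather than an indefinite wait. `runtime_send_after_wait/3` uses `erlang:send_after/3`, whose timers are kept by the runtime system rather than in the state of the timer_server process. How `timer:send_after/3` is implemented may differ between OTP releases.

```erlang
-module(timer_workarounds).
-export([wait_or_down/3, runtime_send_after_wait/3]).

%% Wait for Msg, but stop waiting if Server dies first.
%% Assumption: Server is the pid of the process holding our request,
%% e.g. whereis(timer_server) in the release discussed above.
-spec wait_or_down(pid(), term(), timeout()) ->
          {ok, term()} | {error, {server_down, term()}} | timeout.
wait_or_down(Server, Msg, Timeout) when is_pid(Server) ->
    %% If Server is already dead, 'DOWN' arrives at once with reason noproc.
    MRef = erlang:monitor(process, Server),
    receive
        Msg ->
            %% Remove the monitor and any 'DOWN' message already queued.
            erlang:demonitor(MRef, [flush]),
            {ok, Msg};
        {'DOWN', MRef, process, Server, Reason} ->
            {error, {server_down, Reason}}
    after Timeout ->
        erlang:demonitor(MRef, [flush]),
        timeout
    end.

%% Arm the timer with erlang:send_after/3 instead of timer:send_after/3.
%% The timer lives in the runtime system, so restarting timer_server
%% cannot make it disappear.
-spec runtime_send_after_wait(non_neg_integer(), term(), non_neg_integer()) ->
          {ok, term()} | timeout.
runtime_send_after_wait(Time, Msg, Slack) ->
    TRef = erlang:send_after(Time, self(), Msg),
    receive
        Msg -> {ok, Msg}
    after Time + Slack ->
        %% Safety net only; not expected to be reached.
        _ = erlang:cancel_timer(TRef),
        timeout
    end.
```

For example, `timer_workarounds:runtime_send_after_wait(15000, hello, 1000)` should return `{ok, hello}` even if timer_server is killed during the wait, while `wait_or_down/3` trades the hang for an explicit `{error, {server_down, Reason}}` that the caller can act on.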

Personally, I think it would be much better if the timer server
stayed dead and brought the whole node down with it, or else made
sure that its dying and restarting were truly transparent. Choosing
a middle way of merely pretending to be robust is the worst possible
choice.

Rather than concluding that the OTP team are incompetent in matters
of robustness (as there is overwhelming evidence that they are
anything but), I'd like to see this as yet another example of how
desperately difficult and dangerous it is to go down the path you're
suggesting. It may seem like a respectful thing to do, but you take
on a very heavy burden, and may well be more likely to compound
the problem than to solve it.

BR,
Ulf W




