[erlang-questions] non-trivial supervisor idioms? -- An open-source plea!

Wed Oct 27 09:49:56 CEST 2010

That puts a nice perspective on things... obvious now I was trying to use  
included apps for something they were not designed for. But your story  
really got me thinking...

Sometimes I wish our friends at Ericsson would open-source one of their  
larger Erlang projects. Those of us who have NOT picked up Erlang from  
industrial environments have a problem of finding code we can reliably  
read for inspiration.

I (and I think many others) heard about Erlang from somewhere, went and
read Joe's book (a superb introduction which covers the 'spirit' of
Erlang), identified with the problems Erlang tries to solve, and maybe read
Cesarini and Thompson as a follow-up (an excellent reference). The problem
is that no matter how good these books are, there's only so much a book can
teach you. A book certainly cannot advise you on how to write a complete
application or how to avoid writing programs that become unwieldy as they
grow. The question is: what next?

The obvious answer is to look at some open-source code. But when the
authors of most open-source projects have learned Erlang the same way, it
becomes a case of the blind leading the blind. Problems that must have
already been experienced and dealt with at Ericsson are repeated. A large
production-ready open-source project written by battle-hardened,
experienced Erlang programmers would really fill that void. Something big
and sufficiently complex, like a soft-switch.

- Edmond -

On Tue, 28 Sep 2010 17:07:16 +1000, Ulf Wiger  
<ulf.wiger@REDACTED> wrote:

>
> On 27 Sep 2010, at 20:05, Edmond Begumisa wrote:
>
>> Hi again Ulf,
>>
>> It's great to get 'the guy' on this subject online so I'm going to take  
>> full advantage and ask two more questions that have been dogging me...
>>
>> Firstly, is there an open-source project you know of that uses  
>> included-applications and/or start phases properly that I could take a  
>> peek at? Maybe in OTP source itself?
>
> Off the top of my head, I really can't think of any. :)
>
> The area where start phases really come in handy is when your
> application needs to support failover/takeover behaviour. This is also
> when the StartType argument is needed. One can implement takeover by
> writing a special start phase that instructs the processes to take over
> processing from the other side. In general, it is best to do this at a
> point where all processes have been started and are ready to process
> incoming requests.
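
As a rough illustration of the takeover idea described above, here is a
minimal sketch of an application callback module. The names myapp_app,
myapp_sup, myapp_server and take_over_from/1 are hypothetical and not from
this thread; only the start/2 and start_phase/3 callback shapes and the
StartType values come from the OTP application behaviour.

```erlang
-module(myapp_app).
-behaviour(application).

-export([start/2, start_phase/3, stop/1]).

%% StartType is normal | {takeover, Node} | {failover, Node}.
%% Only the supervision tree is started here; no takeover work yet.
start(_StartType, _StartArgs) ->
    myapp_sup:start_link().          % hypothetical top supervisor

%% Called once for each phase listed under start_phases in the .app
%% file, after the supervision tree is up.
start_phase(go, {takeover, FromNode}, _PhaseArgs) ->
    %% Hypothetical call: ask an already running worker to take over
    %% processing from the node that ran the application before.
    %% It is assumed to return ok.
    myapp_server:take_over_from(FromNode);
start_phase(_Phase, _StartType, _PhaseArgs) ->
    ok.

stop(_State) ->
    ok.
```

For a normal start the phase hook does nothing, so
myapp_app:start_phase(go, normal, []) simply returns ok.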
>
> The initial reason for start phases was that the complex call-handling
> applications at Ericsson had some pretty horrendous dependencies to sort
> out before they could start accepting calls, and doing this work in the
> init function of the processes simply wasn't feasible. Also, when a
> process dies in the init function, this is interpreted as a start error,
> and the application start will fail, whereas individual processes have
> proper supervision while they are responding to requests from the start
> phase code (which runs in the application_starter process).
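
The split described above, a cheap init followed by real work requested
later by the start phase code, could be sketched roughly as follows. The
module name call_handler and the activate/0 function are assumptions made
for illustration; the gen_server callbacks are standard OTP.

```erlang
-module(call_handler).
-behaviour(gen_server).

-export([start_link/0, activate/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Intended to be called from a start_phase/3 hook, once the whole
%% supervision tree is running.
activate() ->
    gen_server:call(?MODULE, activate).

%% init/1 does no heavy work, so it is unlikely to fail and abort the
%% application start.
init([]) ->
    {ok, idle}.

handle_call(activate, _From, idle) ->
    %% Expensive dependency resolution would go here (placeholder).
    %% A crash at this point is handled by the supervisor like any
    %% other worker failure.
    {reply, ok, active};
handle_call(activate, _From, active) ->
    {reply, ok, active};
handle_call(_Other, _From, State) ->
    {reply, {error, unknown_request}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```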
>
> Included applications were mainly introduced since the same call-handling
> applications needed to move as one during failover and takeover, and
> starting a dozen or so top applications made that much more difficult.
> It was just too much code and too many modules to integrate into one
> single application without one more structuring layer.
>
> Initially, I wrote some code that read .appSrc files in each
> sub-application and integrated them into one larger application, using a
> top-level resource file - I think it had the extension .appLm (as in
> load module - never mind; it made sense at Ericsson, and it was so long
> ago that I may be remembering wrong). This was later generalised by OTP
> into included_applications.
>
> The O&M applications also had a problem during takeover: The snmp code
> assumed that the snmp agent was locally registered on the same node,
> which wasn't necessarily the case during the transition - either on the
> node taking over or on the node where it ran before. We then created a
> wrapper application that included all the O&M applications, and called
> the individual start functions for each included app.
>
> Later, we moved away from that, as we had to also support applications
> that were written according to a different timeline, and therefore
> couldn't be integrated the same way as our other apps. I came up with a
> solution for starting and stopping included apps and plugging in their
> start phase hooks in the right places in the startup flow, but for some
> reason people found it complicated... :)
>
> The better solution was to make use of the fact that the application
> controller now had a message passing interface for controlling the
> starting and stopping of apps. We were already using this in our cluster
> controller, so we could extend it by specifying distributed start
> dependencies and which applications needed to do takeover in parallel.
> This way, the cluster controller knew in which order to move
> applications during takeover, and in which order to terminate them, once
> migrated. Unfortunately, all this code is proprietary. It's on my long
> list of things I'd like to do, but that list just keeps growing, without
> much ever being removed from it...
>
> A long time ago, I made a prototype (and sent to OTP) that introduced
> start phase dependencies. This would IMHO make it much easier to specify
> dependencies between applications. As an example, mnesia loads tables in
> the background, so when the application:start() function returns, one
> cannot assume that tables are loaded, and has to call
> mnesia:wait_for_tables() (which can time out, and has some corner cases
> where tables will never be loaded without intervention - not that the
> function itself will tell you when they occur). It might be better if
> mnesia had a load_tables start phase, which other applications could
> depend on.
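
Until something like that exists, the explicit wait mentioned above might
look like this minimal sketch. The module name myapp_tables and the
function ensure_loaded/2 are hypothetical; mnesia:wait_for_tables/2 and its
three return shapes are the real API. The wrapper only turns a timeout into
an error; it cannot detect the corner cases where tables never load.

```erlang
-module(myapp_tables).

-export([ensure_loaded/2]).

%% Waits up to TimeoutMs milliseconds for Tables to become accessible.
%% Suitable for calling from a start phase hook or from a process that
%% needs the tables before serving requests.
ensure_loaded(Tables, TimeoutMs) ->
    case mnesia:wait_for_tables(Tables, TimeoutMs) of
        ok ->
            ok;
        {timeout, Missing} ->
            %% Some tables were still not loaded when the timer expired.
            {error, {tables_not_loaded, Missing}};
        {error, Reason} ->
            {error, Reason}
    end.
```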
>
> BR,
> Ulf W
>
>>
>> Secondly, I've always liked the idea of using included applications not  
>> necessarily for start phases but as a delayed/start-on-demand mechanism  
>> (taking advantage of the fact that included apps are automatically  
>> loaded but not started.) That is, manually calling  
>> application:start(foo) only if a particular feature of my app is used.  
>> But I have one query that made attempts for such use short-lived... the  
>> fact that an application can only be included by one other application.  
>> I think this limitation makes it harder to use included apps and start  
>> phases especially if you're using apps that are not in-house.
>>
>> For example, let's say CouchDB starts using mnesia (ok that's dumb
>> but...) and decides to start it up using start phases (and therefore
>> adds it as an included application in couch.app). Then I have my
>> FunkyApp that's been using mnesia too as an included application. I then
>> decide to use CouchDB for a new funky feature of FunkyApp. Now things
>> break because mnesia is being used by both FunkyApp and CouchDB. To fix
>> this, I not only have to modify my in-house app, I have to modify the
>> out-house CouchDB too.
>>
>> Is there an obvious fix to this I've been missing?
>>
>> - Edmond -
>>
>> On Tue, 28 Sep 2010 03:19:29 +1000, Ulf Wiger  
>> <ulf.wiger@REDACTED> wrote:
>>
>>
>> On 27 Sep 2010, at 18:14, Edmond Begumisa wrote:
>>
>> Ulf,
>>
>> I've been doing such initialisation in the init function of a worker  
>> manager process. Using Daniel's example, I might have a gen_server  
>> child of the main supervisor called db_mgr and set up the mnesia schema  
>> in db_mgr:init
>>
>> Have I been doing the 'wrong' thing OTP-wise?
>>
>> Not necessarily, but my personal preference is to cleanly separate
>> setup code from application startup. This is in part because I used to
>> work on a very complex product, where the setup was decidedly
>> non-trivial, and the startup process had to be optimised in several
>> steps.
>>
>> Still, even there, I believe that the setup logic was bootstrapped into
>> the startup phase, but the code was still kept cleanly separated. The
>> only thing that was part of the startup was a simple check to see if the
>> setup code had been run.
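
One possible shape for such a check, assuming that "setup has been run"
can be approximated by "all required mnesia tables exist", is sketched
below. The module name myapp_setup and the function is_done/1 are
hypothetical, and mnesia is assumed to be running already.

```erlang
-module(myapp_setup).

-export([is_done/1]).

%% Returns true if every table in RequiredTables is known to the local
%% mnesia node. This is only the check; the code that creates the
%% schema and tables is assumed to live in a separate setup procedure.
is_done(RequiredTables) ->
    Existing = mnesia:system_info(tables),
    (RequiredTables -- Existing) =:= [].
```

At startup the application would call myapp_setup:is_done/1 and refuse to
start, or trigger the separate setup procedure, if it returns false.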
>>
>> BR,
>> Ulf W
>>
>>
>>
>> - Edmond  -
>>
>> On Tue, 28 Sep 2010 00:31:47 +1000, Ulf Wiger  
>> <ulf.wiger@REDACTED> wrote:
>>
>> On 27/09/2010 16:15, Daniel Goertzen wrote:
>> I've read the documentation on supervision and have seen a few
>> tutorials, but they don't seem to move beyond the core concepts. For
>> example, what happens if you want to check and optionally set up an
>> mnesia schema during startup... where should this code go? In the
>> supervisor init() or start_link() function? Should I have my supervisor
>> create a worker process whose sole job is to do this kind of setup and
>> then dynamically add other workers (or supervisors) to the supervisor
>> with start_child()?
>>
>> I strongly recommend doing that sort of thing in a separate procedure,
>> rather than in the startup phase.
>>
>> If you want your application to be able to bootstrap itself, I would
>> suggest that you either:
>>
>> - create a special application that runs before your other apps,
>>  and verifies that the installation is ok. To this end, it might be
>>  useful to know that you can pre-sort the .rel file. The systools lib
>>  will only change the sort order if needed to respect start
>>  dependencies.
>> - Introduce start_phases, then do minimal work in the init function,
>>  and push the rest to functions that are called from start phase
>>  hooks. This also has the advantage that you know that your processes
>>  are all started and ready to respond during the init phase.
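
For the second option, start phases are declared in the application
resource file. The fragment below is a sketch only: the application name
myapp, the module names and the phase names init and go are made up for
illustration, while the mod and start_phases keys are standard .app keys.

```erlang
%% myapp.app (hypothetical application resource file)
{application, myapp,
 [{description, "Example application using start phases"},
  {vsn, "0.1.0"},
  {modules, [myapp_app, myapp_sup]},
  {registered, [myapp_sup]},
  {applications, [kernel, stdlib]},
  %% start_phases requires a callback module to be given via mod.
  {mod, {myapp_app, []}},
  %% After myapp_app:start/2 has returned, each phase results in a call
  %% to myapp_app:start_phase(Phase, StartType, PhaseArgs), in order.
  {start_phases, [{init, []}, {go, []}]}
 ]}.
```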
>>
>> Start phases are documented in
>> http://www.erlang.org/doc/apps/kernel/application.html#Module:start_phase-3
>>
>> BR,
>> Ulf W
>>
>>
>>
>> --
>> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
>>
>> Ulf Wiger, CTO, Erlang Solutions, Ltd.
>> http://erlang-solutions.com
>>
>>
>>
>>
>> ________________________________________________________________
>> erlang-questions (at) erlang.org mailing list.
>> See http://www.erlang.org/faq.html
>> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/