[erlang-questions] The Erlang Rationale

Thu Oct 2 08:55:02 CEST 2008

On Thu, Oct 2, 2008 at 1:23 AM, Richard O'Keefe <ok@REDACTED> wrote:

>
> On 2 Oct 2008, at 2:49 am, Edwin Fine wrote:
>
>> This is a dissenting vote regarding macros.
>>
>> Macros *can* make maintenance harder,
>>
>
> You've just agreed; this is not dissent.
>

Sorry, but I did not agree.  You just read it that way. What I perhaps
should have written is that you bluntly asserted that macros make
maintenance harder. I am bluntly asserting that it is not necessarily so.
That's dissent. Or maybe partial dissent, but there's dissent in there
somewhere, I'm sure :)

>
>> just like gotos *can* create spaghetti code. Neither of them is
>> intrinsically bad, merely easy to misuse.
>>
>
> Maintenance of large amounts of code has to be done with
> tools.  Macros make it harder to produce accurate tools.
> The only freely available program I've come across that
> does a decent job of cross-referencing for *both* macros
> *and* the underlying symbols for C is CScout.  (It isn't
> an Open Source program, but it is available in executable
> form for no money.)  As far as I know there is no real
> equivalent for Erlang.
>

I don't think this is a particularly compelling argument. I've been involved
in maintenance of multi-million line systems and had much bigger fish to fry
than macro-related difficulties with cross-reference tools. Maybe your
experience has been different.

>
>
>> Used with care and discipline, they both arguably have a place in good
>> programming practice.
>>
>
> As a matter of fact, I still use M4 on Java code.
> (Remember, Java 1.5 generics do *not* accept primitive types
> as arguments.  If you want to do that without the very heavy
> overhead of Java boxing and unboxing, M4 is the only game in town.)
>

So macros are NOT that evil :)

>
>
>> Now I may get shot down in flames for saying this, but tail recursion in
>> Erlang is effectively a restricted form of goto,
>>
>
> Perfectly true.  There's even a famous paper
> "Lambda, the Ultimate Goto".
>

So the evil GOTO is not always evil, only sometimes. So it is for macros, or
any dirty trick we have to pull because we never thought of the right way to
do it originally and now we have to retrofit a bolt-on solution.

>
>
>> and it's used a lot (not by choice over some other construct, though - the
>> language design forces the usage).
>>
>> There are some things that macros can do that I have not found as easy (or
>> possible) to do some other way, for example:
>>
>> -define(LOG_DBG(Msg, ArgList), iutil_log:log_debug(?MODULE, ?LINE, Msg,
>> ArgList)).
>>
>> Example usage:
>>
>> ?LOG_DBG("Received ~p from ~p~n", [Msg, Socket]).
>>
>
> This is a rather interesting one.
> What could replace it?
>
>        -module(flog).
>        -export([flog/2]).
>
>        flog(Format, Arguments) ->
>            {Module, Line, _} = erlang:call_site(),
>            iutil_log:log_debug(Module, Line, Format, Arguments).
>
> Some kind of primitive that extracted a return address and
> consulted a line number table.  That could do it.  In fact it
> could provide more information, such as {Function,Arity}.  As
> a debugging tool, presumably it would not need to be fast.
>

But it's not available *today*, and I need something that works *today* to
give my customer.

>
>
>>
>> iutil_log:log_debug(?MODULE, ?LINE, "Received ~p from ~p~n", [Msg,
>> Socket]).
>>
>
> But what if, instead of or in addition to ?MODULE and ?LINE,
> the system provided you with ?HERE, expanding to
> {Module, Line, {Function, Arity}}.  Then
>
>        iutil_log:flog(?HERE, "Received ~p from ~p~n", [Msg,Socket])
>
> doesn't seem _that_ horrible.
>

Sorry to belabor the point, but what if it did? It doesn't. What if we all
lived in peace and harmony and respected each other's rights?

>
>
>>
>> If I find that debug logging is causing too much overhead, I can decide to
>> conditionally compile it:
>>
>> -ifdef(DEBUG).
>>    -define(LOG_DBG(Msg, ArgList), iutil_log:log_debug(?MODULE, ?LINE, Msg,
>> ArgList)).
>> -else.
>>    -define(LOG_DBG(Msg, ArgList), ok).
>> -endif.
>>
>
> Suppose we had top-level variables instead.  So
>

But we don't. I wish we did.

>
>        Debug = false.
>
>        -inline([flog/3]).
>
>        flog({Module,Line,_}, Format, Arguments) when Debug ->
>            iutil_log:log_debug(Module, Line, Format, Arguments);
>        flog(_, _, _) ->
>            ok.
>
> Now we are down to
>
>        ... flog(?HERE, "Received ~p from ~p~n", [Msg,Socket]) ...
>
> with the *same* efficiency as the macro, as easily enabled or
> disabled, and no preprocessor.  The only thing we need that we
> don't have now (for we do have inlining) is ?HERE, which is no
> harder to provide than ?LINE.
>
> I've been experimentally rewriting some Erlang modules to see
> what top level variables would look like.  Rather nice, in fact.
> Well, we don't have those now, and we'd need non-trivial compiler
> changes to get them.  So let's do without.

Exactly my point.

>
>
>        -inline([debug/0, flog/3]).
>
>        debug() -> false.
>
>        flog(Where, Format, Arguments) ->
>            case debug()
>              of true ->
>                 {Module,Line,_} = Where,
>                 iutil_log:log_debug(Module, Line, Format, Arguments)
>               ; false ->
>                 ok
>            end.
>
> These functions we can write today.
>

True - but I was working towards my ultimate point that having to recompile
is a pain and wanted the ability to change run-time log levels. And, using
the above code, won't you get copious compiler or Dialyzer warnings
complaining that only one branch of the case will ever be reached? How would
you suppress those? Or are you going to invent a suitable -pragma() for
Erlang :) ?

>
>
>>
>> A minor inconvenience of the above is that if it uses variables that are
>> not otherwise used, you can get compile warnings when debug logging is
>> disabled. This is easily fixed by using the underscore: _Unused.
>>
>
> And the version using an inlined function doesn't have the problem
> in the first place.
>

True. Does the inlined function stay inlined even if you compile with
[debug_info]? I truly don't know - in C++ this often disables inlining.

>
>
>> If I don't like having to recompile the code to enable and disable debug
>> logging, but want to turn it on and off at run-time (and still have
>> negligible overhead when debug logging is disabled), I can do this (and the
>> source code using this does not change in any way, but of course must
>> undergo a once-off recompilation):
>>
>> -define(
>>    LOG_DBG(Msg, ArgList),
>>    case iutil_log:ok_to_log(debug) of
>>        true ->
>>            iutil_log:log_debug(?MODULE, ?LINE, Msg, ArgList);
>>        false ->
>>            ok
>>    end
>> ).
>>
>
> Ah, the old don't-evaluate-the-arguments trick.
> I've run into code that only worked when you had
> assertions enabled, because it relied on the
> argument of assert() being evaluated...
>

Ah, the old I've-seen-crap-code trick. Yes, absolutely true, and anyone
worth their salt will never put important code into log statements that
won't get evaluated.
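
To make the point concrete, here is a minimal sketch of what "won't get
evaluated" means. Everything in it is made up for illustration: the module
name lazy_log_demo, the helper expensive_dump/0, and the Enabled flag, which
stands in for the iutil_log:ok_to_log(debug) call in my real macro. The
process dictionary is used only so that the evaluation can be observed.

```erlang
-module(lazy_log_demo).
-export([run/1]).

%% Simplified stand-in for LOG_DBG: the argument list sits inside the
%% 'true' branch, so it is only built when logging is enabled.
-define(LOG_DBG(Enabled, Msg, ArgList),
        case Enabled of
            true  -> io:format(Msg, ArgList);
            false -> ok
        end).

%% Hypothetical costly argument; records that it ran.
expensive_dump() ->
    put(dump_evaluated, true),
    lists:seq(1, 5).

%% Returns true if the log argument was evaluated, false otherwise.
run(Enabled) ->
    erase(dump_evaluated),
    ?LOG_DBG(Enabled, "dump: ~p~n", [expensive_dump()]),
    get(dump_evaluated) =:= true.
```

With logging disabled, expensive_dump/0 is never called, which is the saving
I am after and also the reason nothing with a side effect that matters
belongs in a log argument.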

>
>>
>> Checking if the Msg and ArgList should be evaluated before calling saves,
>> at the cost of an efficient function call, potentially enormous amounts of
>> unnecessary list creation and destruction (and garbage collection), not to
>> mention any evaluation of the list elements that might be needed.
>>
>
> True.  It also means that the debugging version and the non-debugging
> version of your program do not do the same thing.
>

Only if I'm careless. Anyway, the whole point of having a debugging and
non-debugging version is that they DON'T do the same thing - otherwise why
have two versions? But I am splitting hairs here - I know you mean that from
a functional standpoint, they won't do the same unless you take care. But
that's true of anything to do with programming.

> If the expressions involve only constants, variables, control
> structures, and calls to known pure functions, I would hope that
>

And they absolutely should!

>
>        -inline([flog/3]).
>        flog(Where, Format, Arguments) when Debug -> %%% Huh??
>
>            case iutil_log:ok_to_log(debug)
>              of true ->
>                 {Module, Line, _} = Where,
>                 iutil:log_debug(Module, Line, Format, Arguments)
>               ; false ->
>                 ok
>            end;
>        flog(_, _, _) ->
>            ok.
>
>        ... flog(?HERE, "......", [.....]) ...
>
> would push the evaluation of the format and arguments into the one
> case branch that uses them.  If it doesn't, we have far worse
> performance issues to worry about than this one.
>

Where does Debug come from? Is it a (non-existent) top-level variable?

> If the expressions involve side effects and calls to possibly
> impure functions, then you had better make sure they are _always_
> evaluated, otherwise what you log won't be what happens when you
> are not logging.

Totally agreed. But that's true of any conditional code, not just that to do
with logging. You have to be careful. I suppose that with a macro whose
arguments you don't realize might not always be evaluated, you run a higher
risk of screwing up. Ah, if only languages would give us what we need without
macros....!

>
>
> In this context, I find BitC's distinction between pure and impure
> functions _in the type system_ interesting.
>

That would be very useful - does that mean the compiler automatically flags
functions as pure or impure based on their side-effects?

>
>>
>> In this case I would argue that the macro makes the code *more*
>> maintainable and easier to read, while keeping it efficient. Agreed, if the
>> macro changes (and this class of macro seldom changes) I will need to
>> recompile the dependent code, and this is definitely not good, but as Lord
>> Farquaad said, "[Many of you may be killed, but] it is a sacrifice I am
>> willing to make." ;-)
>>
>
> The problem is that ?LOG_DBG could do *anything*,
> and it isn't as easy as it should be to find the definition.

That's also its strength (to be able to do ANYTHING and you don't have to
change the code). To find the definition using find and grep isn't *that*
hard, is it? I have searched the entire Erlang source code base for things
of interest in under a minute with find/grep. Maybe it's slow using that
500MHz UltraSparc you are saddled with :(

> (I was going to explain what etags does with macros, since
> etags on my box claims to support Erlang, but what
> cd stdlib/src; etags -o fred *.erl
> does is to crash in strncpy().  Since it fails to note the
> arity of functions, it's dubiously useful anyway.)

The authors probably weren't careful with the side-effects of their debug
log statements :)

>
>
> The curious thing is that people keep on trotting out the *same*
> example of why the preprocessor is useful.  We *have* inlining.

We keep trotting out the *same* example because there are no *existing*
solutions that do what I want without using it!!

>
> If only we had ?HERE,

But we don't. I have to work with the real world. If ?HERE was here I would
use it!

> then
>
>    ?LOG_DBG("Received ~p from ~p~n", [Msg,Socket])
>
> would be
>
>    flog(?HERE, "Received ~p from ~p~n", [Msg,Socket])
>
> which I for one don't regard as unduly burdensome.  In
> fact the visible presence of ?HERE as an argument tells
> me that the location is being passed on, which is not so
> obvious in LOG_DBG.
>

Maybe so, but it gives me the flexibility to remove the location being
passed later if I find it is causing a problem, say with performance.

> Actually, if we had ?HERE, things could get even better.
> If the compiler handled constant terms specially (using a
> single static copy instead of building a new copy on the
> heap), passing ?HERE could be as cheap as passing 42, and
> we could pass around one possibly detailed location term
> instead of separate module/line arguments.  This would be
> nice to do for format strings too.
>

Well, I for one would vote for a compiler that detected constant terms and
kept them statically!!

The one important thing that I perhaps didn't understand from your
discussion is how you proposed to change the logging level at run time,
given the *existing* features of Erlang.

Understand that I gave the LOG_DBG macro as an example. As it happens, I
have LOG_TRACE, LOG_INFO, LOG_WARN, and LOG_ERROR as well. I know that
Erlang has trace facilities, but they are not retrospective. I need to go
back in the logs and see what happened historically. I don't like the idea
of sending all logging operations to a central manager process that decides
to throw away the ones it doesn't want, because I don't want the sending
process to be spewing out useless messages and using up CPU and GC time, and
the manager process to be throwing things away. I have found historical logs
to be an invaluable part of the fault-finding process, so I pepper my code
liberally with all kinds of log statements - and I want to do that with the
least runtime cost for the most debugging benefit.

Let me tell you what I have done, and this will probably give you nightmares
because it's a terrible hack, but it works well. I hacked it like this
because I didn't want to (a) pass log level variables in the parameter list
of every single function I ever write, (b) store the log level in the process
dictionary where it has to be looked up thousands of times a second, or (c)
even worse, store it in an ETS table. In the absence of top-level variables, I
faked a top-level variable as follows.

I wrote a module that exports only one function, log_level(). This function
is hard-coded to return (say) the atom 'info', e.g.

log_level() -> info.

If I want to change the logging level to debug at runtime, I simply rewrite
the one line of code (plus module and export statements) in a string,
compile and load it at runtime, then purge the old version. The logging level
then changes
and I get debug, warning, error and info logs. (And I do understand that the
historical aspect I find so important can be affected by changing the
logging level, but I ensure that I log the most critical information at all
times). My understanding is that a call to mod:func is very cheap, probably
more so than any process dictionary or ETS lookups. So this way, for the
price of a bit of skulduggery, I get tremendous flexibility at a low runtime
cost. I am sure that the gurus will find some horrible flaw in this scheme,
which I'd like to hear about so I can fix it, but for now it is working very
well indeed.
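
For anyone who wants to experiment, here is a minimal sketch of the idea
rather than my production code. The module names log_level_gen and
iutil_log_level are made up for illustration, and the ordering of the levels
in rank/1 is an assumption. It uses only the standard erl_scan, erl_parse,
compile and code modules.

```erlang
-module(log_level_gen).
-export([set_level/1, ok_to_log/1]).

%% Regenerate the (hypothetical) module iutil_log_level so that
%% iutil_log_level:log_level() returns Level.
set_level(Level) when is_atom(Level) ->
    Source = ["-module(iutil_log_level).",
              "-export([log_level/0]).",
              lists:flatten(io_lib:format("log_level() -> ~p.", [Level]))],
    Forms = [parse_form(S) || S <- Source],
    {ok, Mod, Binary} = compile:forms(Forms),
    %% Remove any old version left over from a previous change; loading
    %% fails while old code for the module is still around.
    code:purge(Mod),
    {module, Mod} = code:load_binary(Mod, "iutil_log_level.erl", Binary),
    ok.

%% Assumed ordering, lowest to highest: trace, debug, info, warn, error.
ok_to_log(Level) ->
    rank(Level) >= rank(iutil_log_level:log_level()).

rank(trace) -> 0;
rank(debug) -> 1;
rank(info)  -> 2;
rank(warn)  -> 3;
rank(error) -> 4.

%% Turn one source line (ending in a full stop) into an abstract form.
parse_form(String) ->
    {ok, Tokens, _End} = erl_scan:string(String),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

In this sketch set_level/1 has to be called once at startup before anything
calls ok_to_log/1, since the generated module does not exist until then.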

I do look forward to the implementation of an EEP for top-level variables,
though, then I can throw this ugliness away.