Author:: Richard A. O'Keefe <ok(at)cs(dot)otago(dot)ac(dot)nz>
Status:: Draft
Type:: Standards Track
Created:: 27-May-2011
Erlang-Version:: OTP_R14B04
Source:: eep-0038.md

EEP 38: -discontiguous directive #

Abstract #

A -discontiguous directive is to be added so that specified functions may be presented as more than one group of clauses, possibly separated by other directives and function clause groups.

Specification #

A new directive

-discontiguous( Name_And_Arity_List ).

is added. Each function named in such a list must have at least one clause group in that module, and may have more. It remains an error for any function not named in such a list to have more than one clause group.

A function named in a -discontiguous directive need not have more than one clause group. If it does, it is if the clause groups were moved together without reordering and the full stop of each group but the last changed to a semicolon. The compiler should make no comment about the existence of multiple clause groups or their fusion into single clause groups.

The parser stage would do the regrouping and would not include any representation of the -discontiguous directive in its output, so that downstream tools would never know that -discontiguous had been there.

Motivation #

There are three problems which a single mechanism can solve.

The first is that Erlang has conditional compilation, but there is no really satisfactory to use it to select some but not all of the clauses of a function.

The -discontiguous directive allows you to write

-discontiguous([f/3]).

f(a, X, Y) -> .... .
-if(Cond).
f(b, X, Y) -> .... .
-endif.
f(c, X, T) -> .... .

The second may be called “topic-oriented programming”. It relates to human structuring of code around the data values computed on rather than the code they compute. I have found this in dealing with a virtual machine: I’ve wanted to place the code that assembles an instruction, the code that peephole optimises it, the code that encodes it into memory, and the code that interprets it into one place (involving different function), rather than organising it by function, thus scattering related information the length and breadth of the module.

It may be clearest to start with an example. The code in erl_syntax.erl reads:

-type syntaxTree() :: #tree{} | #wrapper{} | tuple().

%% All `erl_parse' tree nodes are represented by tuples
%% whose second field is the position information (usually
%% an integer), *with the exceptions of*
%% `{error, ...}' (type `error_marker') and
%% `{warning, ...}' (type `warning_marker'),
%% which only contain the associated line number *of the
%% error descriptor*; this is all handled transparently
%% by `get_pos' and `set_pos'.

get_pos(#tree{attr = Attr}) ->
    Attr#attr.pos;
get_pos(#wrapper{attr = Attr}) ->
    Attr#attr.pos;
get_pos({error, {Pos, _, _}}) ->
    Pos;
get_pos({warning, {Pos, _, _}}) ->
    Pos;
get_pos(Node) ->
    %% Here, we assume that we have an `erl_parse' node
    %% with position information in element 2.
    element(2, Node).

set_pos(Node, Pos) ->
    case Node of
        #tree{attr = Attr} ->
            Node#tree{attr = Attr#attr{pos = Pos}};
        #wrapper{attr = Attr} ->
            Node#wrapper{attr = Attr#attr{pos = Pos}};
        _ ->
            %% We then assume we have an `erl_parse' node,
            %% and create a wrapper around it to make
            %% things more uniform.
            set_pos(wrap(Node), Pos)
    end.

The type here is a little vague. The additional tuples appear to be {error,{Pos,_,_}}, {warning,{Pos,_,_}}, and the {Tag,Pos...} tuples returned by erl_parse. The thing here is that there are five different cases. For some purposes, it would be better to write

-discontiguous([get_pos/1,set_pos/2]).

get_pos(#tree{attr = Attr}) -> Attr#attr.pos.
set_pos(#tree{attr = Attr} = Node, Pos) ->
    Node#tree{attr = Attr#attr{pos = Pos}}.

get_pos(#wrapper{attr = Attr}) -> Attr#attr.pos.
set_pos(#wrapper{attr = Attr} = Node, Pos) ->
    Node#wrapper{attr = Attr#attr{pos = Pos}}.

get_pos({error, {Pos,_,_}}) -> Pos.
% What should set_pos/2 do in this case?

get_pos({warning, {Pos,_,_}}) -> Pos.
% What should set_pos/2 do in this case?

get_pos(Node) -> element(2, Node).  % assume erl_parse node
set_pos(Node, Pos) ->               % assume erl_parse node
    set_pos(wrap(Node), Pos).       % wrap it for uniformity

This brings out the parallel between the two functions, and the way the parallel fails, more clearly than any other possible layout. It nags at you to either finish the parallel with the obvious

set_pos({error, {_,X,Y}}, Pos) ->
    {error, {Pos,X,Y}}.

and

set_pos({warning, {_,X,Y}), Pos) ->
    {warning, {Pos,X,Y}}.

clauses or to at least change the comments to

% set_pos/2 falls through to the last case.

comments.

We have the same pattern, without the failure of parallelism, in two more functions from that file:

get_com(#tree{attr = Attr}) -> Attr#attr.com;
get_com(#wrapper{attr = Attr}) -> Attr#attr.com;
get_com(_) -> none.

set_com(Node, Com) ->
    case Node of
        #tree{attr = Attr} ->
            Node#tree{attr = Attr#attr{com = Com}};
        #wrapper{attr = Attr} ->
            Node#wrapper{attr = Attr#attr{com = Com}};
        _ ->
            set_com(wrap(Node), Com)
    end.

These could be

-discontiguous([get_com/1,set_com/1]).

get_com(#tree{attr = Attr}) -> Attr#attr.com.
set_com(#tree{attr = Attr} = Node, Com) ->
    Node#tree{attr = Attr#attr{com = Com}}.

get_com(#wrapper{attr = Attr}) -> Attr#attr.com.
set_com(#wrapper{attr = Attr} = Node, Com) ->
    Node#wrapper{attr = Attr#attr{com = Com}}.

get_com(_) -> none.  % error, warning, erl_parse.
set_com(Node, Com) ->
    set_com(wrap(Node), Com).

Well, once again the parallel is not quite perfect. The documentation for wrap/1 says that it assumes its argument is a class erl_parse tuple, which here means that it appears that it should NOT be an error or warning.

The point of interest here is that just looking at the existing functions didn’t ring any alarms; it was not until I said “these seem to be about the same data structure; I wonder if interleaving can make the connection clearer and make it easier to ensure that getters and setters are properly related?” that my attention was properly drawn to the differences.

It’s particularly interesting that the very first Erlang/OTP source file I looked at provided examples. The third is like the second, but relates to code written by a computer, not a human. For example, if generating a functional representation of some sort of state machine, it can be convenient to organise the output around the states, but the present scheme requires it to be organised around the functions that deal with the states.

Rationale #

Prolog systems have supported a :- discontiguous declaration for 20+ years. The approach is a proven one. It is a simple generalisation of the language which can be hidden from all “downstream” tools. Only tools that try to deal with Erlang syntax without fully parsing it could notice the difference, and they should largely ignore it.

Backwards Compatibility #

All existing Erlang code remains acceptable with unchanged semantics. Existing language-processing tools are unaffected if they rely on erl_parse.

Reference Implementation #

None in this draft.

References #

None.

Copyright #

This document has been placed in the public domain.