Author:
Richard A. O'Keefe <ok(at)cs(dot)otago(dot)ac(dot)nz>
Status:
Final/R15B Implemented in OTP release R15B
Type:
Standards Track
Content-Type:
text/plain
Created:
27-May-2011
Erlang-Version:
R14B04

EEP 37: Funs with names #

Abstract #

The syntax of funs is extended to allow a variable name to be consistently present before each argument list. This allows funs to be recursive. The knot is tied in the opcodes that apply a fun to its arguments, so no change to the garbage collector design is required.

Specification #

Currently, there are three forms for a fun:

fun Name/Arity
fun Module:Name/Arity

and

fun Fun_Clause {; Fun_Clause}... end

We add another form:

fun Variable Fun_Clause {; Variable Fun_Clause}... end

If any Fun_Clause has a Variable, all must, and they must all be the same variable. Like the variables in the argument list(s), this variable is local to the fun, not shared with its context. Within the fun, the variable is bound to the value of the fun expression.

There are several possible ways to implement this. One is rather neat because it preserves the cycle-freedom of the data structures the garbage collector has to deal with.

One way to implement existing funs is this:

  • a Create an auxiliary function with a generated name

      <foo>(...,X1,...,Xk) ...;
      ...
      <foo>(...,X1,...,Xk) ....
    

    having the same argument lists, guards, and clause bodies as the fun, except that each variable shared with the context appears as an extra argument.

  • b Translate the fun expression as

      '%mk-fun'({fun <foo>/n+k, X1, ..., Xk})
    

    which gives the tuple a special tag to say that it represents a fun value.

  • c Translate Foo(E1,...,Em)

    as A1 := E1, ..., Am := Em; funcall_m(Foo) where the funcall_m instruction checks that its argument is a closure expecting m arguments, moves the X1,...,Xk fields of the tuple to argument registers A<m+1>..A<m+k>, and then jumps to the address in the first field.

All it takes to implement recursive funs is

  • a’ Create an auxiliary function

      <foo>(...,X1,...,Xk,Variable) ...;
      ...
      <foo>(...,X1,...,Xk,Variable) ....
    
  • b’ Translate the fun expression as

      '%mk-rec-fun'({fun <foo>/<n+k+1>, X1, ..., Xk})
    

    which simply applies a second special tag.

  • c’ The funcall_m opcode acts the same for both old and

    recursive funs, except that just before jumping, it adds tne fun value Foo itself as argument A<m+k+1>. This “ties the knot”.

So a recursive fun takes no more space or time to create than an existing one, and does not involve creating any cycles of pointers. Its code can be inserted into the failure path for the funcall_m instructions, whatever their form.

Motivation #

Fun names can serve three purposes.

First, they can simply be documentation. For example,

cfun_files(CFun) ->
    fun(F1, F2) ->
        [[?OBJ(T1,_) | _] | _] = F1,
        [[?OBJ(T2,_) | _] | _] = F2,
        CFun(T1, T2)
    end.

can be written as

cfun_files(CFun) ->
    fun Compare([[?OBJ(T1,_)|_]|_], [[?OBJ(T2,_)|_]|_]) ->
    CFun(T1, T2)
    end.

A named fun whose name is not used can be implemented as if the name were not there.

Second, the fun’s name can be built into its generated name. At the time of writing, we might have

'-F/N-fun-K-'

where F/N is the name of the function that includes the fun and K is the number of earlier funs in F/N. We could build the name in instead, using

'-F/N-fun-Name-[K-]'

where K is present only if the outer function contains more than one fun with the same name. The point of this is that such names are more likely to be useful after hot loading. For example, if we start with

f(...Xs, Ys, ...) ->
    ...
    sort(Xs, fun X_Key({_,N,_}) -> N end),
    sort(Ys, fun Y_Key({N,_,_}) -> N end),
    ...

and then we revise it, swapping the two calls to sort/2. With named funs, the two funs retain their generated names, and the module is safe. With anonymous functions, the chances are that the two funs with swap names; oops!

Third, a frequently asked question in the Erlang mailing list is “why can’t I have recursive funs?” to which we will now be able to rely, “you can; here is what they look like.”

This still does not permit mutually recursive funs, but people do not seem to ask for that much.

Finally, the next time someone argues that Erlang syntax is inconsistent because function clauses have repeated names and fun clauses do not, we shall be able to reply “but fun clauses CAN have repeated names and probably should.”

Rationale #

There really seemed to be only two main questions.

What should the scope of the fun name variable be? Some variables in a fun are shared between the fun and its context. Doing that would let us write

f(...) ->
    fun G(...) -> ... end,
    fun H(...) -> ... end,
    ... use G and H ...

rather like using nested “define” in Scheme, except that while H could use G, G couldn’t use H.

Since you do not get mutual recursion this way, you should not be tricked into thinking you might. It’s better that you have to write

f(...) ->
    GG = fun G(...) -> ... end,
    HH = fun H(...) -> ... end,
    ... use GG and HH ...

so that you understand clearly what you are getting.

While variables in the body of a fun clause may be shared with the context, variables in the arguments are not, something I have found confusing. At least this way the fun name follows the same scope rule as the variables in the argument list right next to it.

The other main question was whether recursive fun values should be exactly the same representation as existing fun values, but with a cycle in it (tying the knot at construction time), or whether to introduce a new tag (tying the knot at call time). The lack of cycles in Erlang heaps has been a major factor in the design of several garbage collectors. I would expect changing that to be an order of magnitude harder than the changes required for this proposal. It was seeing that the knot could be tied at call with (without slowing down calls to existing funs) that made me dare to hope that this proposal might some day be accepted.

The main issue now is that this does not let us define a group of mutually recursive local functions. Adopting this proposal now might get in the way of a better proposal that handles mutual recursion as well.

I don’t see such a proposal as being likely to arrive soon.

There is a special case of this where the fun name is used only in tail call positions, which can be handled entirely by the compiler generating a jump back to the beginning. This need not have any consequences for the run time system at all.

Backwards Compatibility #

Code that does not use the new feature does not change its meaning. There may be code that relies on the form of generated function names; that would need changing.

All syntax tools would need to be revised to handle the new form. Existing parse transforms might well fail on code containing the new form, but would work unchanged on code that does not.

At least one new instruction is needed to create suitably distinguished closures. Existing programs that analyse BEAM files will not understand this until they are revised.

As described under ‘motivation’, naming functions is useful even if you do not use the name in any clause body. This means that we can have a staged delivery of the feature.

  1. Make the parser recognise fun names and check their identity. Have it report an error if the fun name is used in a body. Have it erase the fun names from the AST before any downstream tool sees it.

    At this stage, fun names may serve as documentation.

  2. Upgrade the downstream tools to recognise an extended 'fun' AST node with two extra fields: the fun name as an atom and a flag saying whether it is not used, used only in tail position, or used more generally.

    Upgrade the parser to report fun names, but retain the check that they are not used. Test the down stream tools.

  3. Modify the compiler to use the new, safer, form of generated name. Ensure that the generated names are accessed only through an interface, so all is consistent.

    At this stage, fun names help to reduce the danger from code revisions that add, remove, or re-order funs; a change that does not alter the number of funs with a particular name in a function should not change its name.

  4. (Optional.) Revise the code generator to accept the fun name in tail call position and generate a jump. Modify the parser to allow this.

    At this point, it is possible to pass a loop as a parameter, like a list traversal or a binary search. No changes to the representation of Erlang terms or the BEAM engine have been required yet.

  5. Add a new tag. Revise the funcall instructions to check for it if the existing check fails, and push the closure itself. Add a new instruction to make a new closure. Revise the Erlang term representation to encode recursive funs. Revise the type test instructions to recognise the new values. Teach HiPE what to do.

    This is the last stage.

Reference Implementation #

None in this draft. Stage 1 can be done fairly easily. Stage 2 would be hard for me because I’m not even sure what all the relevant modules are.

References #

None.

Copyright #

This document has been placed in the public domain.