4 The Abstract Format

This document describes the standard representation of parse trees for Erlang programs as Erlang terms. This representation is known as the abstract format. Functions dealing with such parse trees are compile:forms/[1,2] and functions in the modules epp, erl_eval, erl_lint, erl_pp, erl_parse, and io. They are also used as input and output for parse transforms (see the module compile).

We use the function Rep to denote the mapping from an Erlang source construct C to its abstract format representation R, and write R = Rep(C).

The word LINE below represents an integer, and denotes the number of the line in the source file where the construction occurred. Several instances of LINE in the same construction may denote different lines.

Since operators are not terms in their own right, when operators are mentioned below, the representation of an operator should be taken to be the atom with a printname consisting of the same characters as the operator.
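
As a rough illustration of the mapping Rep, the following sketch uses erl_scan and erl_parse to turn source text into its abstract format. The module name rep_demo and its function names are our own and purely illustrative. Since the text is scanned as a single line starting at line 1, every LINE in the result is expected to be 1.

```erlang
-module(rep_demo).
-export([rep_exprs/1, rep_form/1]).

%% Rep for a body: scan the source text, then parse it as a
%% sequence of expressions. The text must end with a full stop.
rep_exprs(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    Exprs.

%% Rep for a single form (an attribute or a function declaration).
rep_form(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

For example, rep_demo:rep_exprs("X + 1.") should return [{op,1,'+',{var,1,'X'},{integer,1,1}}], where the operator + appears as the atom '+'.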

4.1 Module declarations and forms

A module declaration consists of a sequence of forms that are either function declarations or attributes.

If D is a module declaration consisting of the forms F_1, ..., F_k, then Rep(D) = [Rep(F_1), ..., Rep(F_k)].
If F is an attribute -module(Mod), then Rep(F) = {attribute,LINE,module,Mod}.
If F is an attribute -export([Fun_1/A_1, ..., Fun_k/A_k]), then Rep(F) = {attribute,LINE,export,[{Fun_1,A_1}, ..., {Fun_k,A_k}]}.
If F is an attribute -import(Mod,[Fun_1/A_1, ..., Fun_k/A_k]), then Rep(F) = {attribute,LINE,import,{Mod,[{Fun_1,A_1}, ..., {Fun_k,A_k}]}}.
If F is an attribute -compile(Options), then Rep(F) = {attribute,LINE,compile,Options}.
If F is an attribute -file(File,Line), then Rep(F) = {attribute,LINE,file,{File,Line}}.
If F is a record declaration -record(Name,{V_1, ..., V_k}), then Rep(F) = {attribute,LINE,record,{Name,[Rep(V_1), ..., Rep(V_k)]}}. For Rep(V), see below.
If F is a wild attribute -A(T), then Rep(F) = {attribute,LINE,A,T}.
If F is a function declaration Name Fc_1 ; ... ; Name Fc_k, where each Fc_i is a function clause with a pattern sequence of the same length Arity, then Rep(F) = {function,LINE,Name,Arity,[Rep(Fc_1), ...,Rep(Fc_k)]}.
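
The list of forms can also be written by hand and given to compile:forms/1. The sketch below builds a small module declaration this way; the names forms_demo, hello_abs and double are hypothetical, and the line numbers are arbitrary.

```erlang
-module(forms_demo).
-export([forms/0, run/0]).

%% Rep(D) for the module declaration
%%   -module(hello_abs).
%%   -export([double/1]).
%%   double(X) -> 2 * X.
forms() ->
    [{attribute, 1, module, hello_abs},
     {attribute, 2, export, [{double, 1}]},
     {function, 3, double, 1,
      [{clause, 3, [{var, 3, 'X'}], [],
        [{op, 3, '*', {integer, 3, 2}, {var, 3, 'X'}}]}]}].

%% Compile the forms, load the resulting code, and call the function.
run() ->
    {ok, Mod, Binary} = compile:forms(forms()),
    {module, Mod} = code:load_binary(Mod, "hello_abs.beam", Binary),
    Mod:double(21).
```

Calling forms_demo:run() should compile and load hello_abs and return 42.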

Record fields

Each field in a record declaration may have an optional explicit default initializer expression.

If V is A, then Rep(V) = {record_field,LINE,Rep(A)}.
If V is A = E, then Rep(V) = {record_field,LINE,Rep(A),Rep(E)}.
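
A minimal sketch of both cases, assuming the record declaration is parsed with erl_scan and erl_parse. The module name record_demo and the record point are illustrative only.

```erlang
-module(record_demo).
-export([rep_record/0]).

%% Parse a record declaration with one plain field and one field
%% that has a default initializer.
rep_record() ->
    Source = "-record(point, {x, y = 0}).",
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

The expected result is {attribute,1,record,{point,[{record_field,1,{atom,1,x}},{record_field,1,{atom,1,y},{integer,1,0}}]}}.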

Representation of parse errors and end of file

In addition to the representations of forms, the list that represents a module declaration (as returned by functions in erl_parse and epp) may contain tuples {error,E} and {warning,W}, denoting syntactically incorrect forms and warnings, and {eof,LINE}, denoting an end of stream encountered before a complete form had been parsed.

4.2 Atomic literals

There are five kinds of atomic literals (integers, characters, floats, strings and atoms), which are represented in the same way in patterns, expressions and guards:

If L is an integer or character literal, then Rep(L) = {integer,LINE,L}.
If L is a float literal, then Rep(L) = {float,LINE,L}.
If L is a string literal consisting of the characters C_1, ..., C_k, then Rep(L) = {string,LINE,[C_1, ..., C_k]}.
If L is an atom literal, then Rep(L) = {atom,LINE,L}.

Note that negative integer and float literals do not occur as such; they are parsed as an application of the unary negation operator.
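
The following sketch shows a few literals and a negative number as returned by the parser. The module name literal_demo is our own; character literals are left out of the example.

```erlang
-module(literal_demo).
-export([rep/1, examples/0]).

%% Parse source text holding exactly one expression and return
%% its abstract format.
rep(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

%% An integer, a float, a string, an atom, and a negative integer.
examples() ->
    [rep("42."), rep("1.5."), rep("\"ab\"."), rep("foo."), rep("-1.")].
```

Here literal_demo:examples() should return [{integer,1,42},{float,1,1.5},{string,1,"ab"},{atom,1,foo},{op,1,'-',{integer,1,1}}]. The last element shows that -1 is an application of the unary operator - to the literal 1.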

4.3 Patterns

If Ps is a sequence of patterns P_1, ..., P_k, then Rep(Ps) = [Rep(P_1), ..., Rep(P_k)]. Such sequences occur as the list of arguments to a function or fun.

Individual patterns are represented as follows:

If P is an atomic literal L, then Rep(P) = Rep(L).
If P is a compound pattern P_1 = P_2, then Rep(P) = {match,LINE,Rep(P_1),Rep(P_2)}.
If P is a variable pattern V, then Rep(P) = {var,LINE,A}, where A is an atom with a printname consisting of the same characters as V.
If P is a universal pattern _, then Rep(P) = {var,LINE,'_'}.
If P is a tuple pattern {P_1, ..., P_k}, then Rep(P) = {tuple,LINE,[Rep(P_1), ..., Rep(P_k)]}.
If P is a nil pattern [], then Rep(P) = {nil,LINE}.
If P is a cons pattern [P_h | P_t], then Rep(P) = {cons,LINE,Rep(P_h),Rep(P_t)}.
If P is a binary pattern <<P_1:Size_1/TSL_1, ..., P_k:Size_k/TSL_k>>, then Rep(P) = {bin,LINE,[{bin_element,LINE,Rep(P_1),Rep(Size_1),Rep(TSL_1)}, ..., {bin_element,LINE,Rep(P_k),Rep(Size_k),Rep(TSL_k)}]}. For Rep(TSL), see below. An omitted Size is represented by default. An omitted TSL (type specifier list) is represented by default.
If P is P_1 Op P_2, where Op is a binary operator (this is either an occurrence of ++ applied to a literal string or character list, or an occurrence of an expression that can be evaluated to a number at compile time), then Rep(P) = {op,LINE,Op,Rep(P_1),Rep(P_2)}.
If P is Op P_0, where Op is a unary operator (this is an occurrence of an expression that can be evaluated to a number at compile time), then Rep(P) = {op,LINE,Op,Rep(P_0)}.
If P is a record pattern #Name{Field_1=P_1, ..., Field_k=P_k}, then Rep(P) = {record,LINE,Name, [{record_field,LINE,Rep(Field_1),Rep(P_1)}, ..., {record_field,LINE,Rep(Field_k),Rep(P_k)}]}.
If P is #Name.Field, then Rep(P) = {record_index,LINE,Name,Rep(Field)}.
If P is ( P_0 ), then Rep(P) = Rep(P_0), i.e., parenthesized patterns cannot be distinguished from their bodies.

Note that every pattern has the same source form as some expression, and is represented the same way as the corresponding expression.
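
As a small sketch, the pattern on the left-hand side of a match can be inspected by parsing the whole match expression. The module name pattern_demo is an assumption made for this example.

```erlang
-module(pattern_demo).
-export([rep_pattern/1]).

%% Parse "Pattern = Expr." and return only Rep(Pattern).
rep_pattern(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [{match, _Line, Pattern, _Expr}]} = erl_parse:parse_exprs(Tokens),
    Pattern.
```

For instance, pattern_demo:rep_pattern("{X, [H | _]} = Y.") should return {tuple,1,[{var,1,'X'},{cons,1,{var,1,'H'},{var,1,'_'}}]}, which combines the tuple, variable, cons and universal pattern cases.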

4.4 Expressions

A body B is a sequence of expressions E_1, ..., E_k, and Rep(B) = [Rep(E_1), ..., Rep(E_k)].

An expression E is one of the following alternatives:

If E is an atomic literal L, then Rep(E) = Rep(L).
If E is P = E_0, then Rep(E) = {match,LINE,Rep(P),Rep(E_0)}.
If E is a variable V, then Rep(E) = {var,LINE,A}, where A is an atom with a printname consisting of the same characters as V.
If E is a tuple skeleton {E_1, ..., E_k}, then Rep(E) = {tuple,LINE,[Rep(E_1), ..., Rep(E_k)]}.
If E is [], then Rep(E) = {nil,LINE}.
If E is a cons skeleton [E_h | E_t], then Rep(E) = {cons,LINE,Rep(E_h),Rep(E_t)}.
If E is a binary constructor <<V_1:Size_1/TSL_1, ..., V_k:Size_k/TSL_k>>, then Rep(E) = {bin,LINE,[{bin_element,LINE,Rep(V_1),Rep(Size_1),Rep(TSL_1)}, ..., {bin_element,LINE,Rep(V_k),Rep(Size_k),Rep(TSL_k)}]}. For Rep(TSL), see below. An omitted Size is represented by default. An omitted TSL (type specifier list) is represented by default.
If E is E_1 Op E_2, where Op is a binary operator, then Rep(E) = {op,LINE,Op,Rep(E_1),Rep(E_2)}.
If E is Op E_0, where Op is a unary operator, then Rep(E) = {op,LINE,Op,Rep(E_0)}.
If E is #Name{Field_1=E_1, ..., Field_k=E_k}, then Rep(E) = {record,LINE,Name, [{record_field,LINE,Rep(Field_1),Rep(E_1)}, ..., {record_field,LINE,Rep(Field_k),Rep(E_k)}]}.
If E is E_0#Name{Field_1=E_1, ..., Field_k=E_k}, then Rep(E) = {record,LINE,Rep(E_0),Name, [{record_field,LINE,Rep(Field_1),Rep(E_1)}, ..., {record_field,LINE,Rep(Field_k),Rep(E_k)}]}.
If E is #Name.Field, then Rep(E) = {record_index,LINE,Name,Rep(Field)}.
If E is E_0#Name.Field, then Rep(E) = {record_field,LINE,Rep(E_0),Name,Rep(Field)}.
If E is catch E_0, then Rep(E) = {'catch',LINE,Rep(E_0)}.
If E is E_0(E_1, ..., E_k), then Rep(E) = {call,LINE,Rep(E_0),[Rep(E_1), ..., Rep(E_k)]}.
If E is E_m:E_0(E_1, ..., E_k), then Rep(E) = {call,LINE,{remote,LINE,Rep(E_m),Rep(E_0)},[Rep(E_1), ..., Rep(E_k)]}.
If E is a list comprehension [E_0 || W_1, ..., W_k], where each W_i is a generator or a filter, then Rep(E) = {lc,LINE,Rep(E_0),[Rep(W_1), ..., Rep(W_k)]}. For Rep(W), see below.
If E is a binary comprehension <<E_0 || W_1, ..., W_k>>, where each W_i is a generator or a filter, then Rep(E) = {bc,LINE,Rep(E_0),[Rep(W_1), ..., Rep(W_k)]}. For Rep(W), see below.
If E is begin B end, where B is a body, then Rep(E) = {block,LINE,Rep(B)}.
If E is if Ic_1 ; ... ; Ic_k end, where each Ic_i is an if clause, then Rep(E) = {'if',LINE,[Rep(Ic_1), ..., Rep(Ic_k)]}.
If E is case E_0 of Cc_1 ; ... ; Cc_k end, where E_0 is an expression and each Cc_i is a case clause, then Rep(E) = {'case',LINE,Rep(E_0),[Rep(Cc_1), ..., Rep(Cc_k)]}.
If E is try B catch Tc_1 ; ... ; Tc_k end, where B is a body and each Tc_i is a catch clause, then Rep(E) = {'try',LINE,Rep(B),[],[Rep(Tc_1), ..., Rep(Tc_k)],[]}.
If E is try B of Cc_1 ; ... ; Cc_k catch Tc_1 ; ... ; Tc_n end, where B is a body, each Cc_i is a case clause and each Tc_j is a catch clause, then Rep(E) = {'try',LINE,Rep(B),[Rep(Cc_1), ..., Rep(Cc_k)],[Rep(Tc_1), ..., Rep(Tc_n)],[]}.
If E is try B after A end, where B and A are bodies, then Rep(E) = {'try',LINE,Rep(B),[],[],Rep(A)}.
If E is try B of Cc_1 ; ... ; Cc_k after A end, where B and A are bodies and each Cc_i is a case clause, then Rep(E) = {'try',LINE,Rep(B),[Rep(Cc_1), ..., Rep(Cc_k)],[],Rep(A)}.
If E is try B catch Tc_1 ; ... ; Tc_k after A end, where B and A are bodies and each Tc_i is a catch clause, then Rep(E) = {'try',LINE,Rep(B),[],[Rep(Tc_1), ..., Rep(Tc_k)],Rep(A)}.
If E is try B of Cc_1 ; ... ; Cc_k catch Tc_1 ; ... ; Tc_n after A end, where B and A are bodies, each Cc_i is a case clause and each Tc_j is a catch clause, then Rep(E) = {'try',LINE,Rep(B),[Rep(Cc_1), ..., Rep(Cc_k)],[Rep(Tc_1), ..., Rep(Tc_n)],Rep(A)}.
If E is receive Cc_1 ; ... ; Cc_k end, where each Cc_i is a case clause, then Rep(E) = {'receive',LINE,[Rep(Cc_1), ..., Rep(Cc_k)]}.
If E is receive Cc_1 ; ... ; Cc_k after E_0 -> B_t end, where each Cc_i is a case clause, E_0 is an expression and B_t is a body, then Rep(E) = {'receive',LINE,[Rep(Cc_1), ..., Rep(Cc_k)],Rep(E_0),Rep(B_t)}.
If E is fun Name / Arity, then Rep(E) = {'fun',LINE,{function,Name,Arity}}.
If E is fun Module:Name/Arity, then Rep(E) = {'fun',LINE,{function,Rep(Module),Rep(Name),Rep(Arity)}}. (Before the R15 release: Rep(E) = {'fun',LINE,{function,Module,Name,Arity}}.)
If E is fun Fc_1 ; ... ; Fc_k end, where each Fc_i is a function clause, then Rep(E) = {'fun',LINE,{clauses,[Rep(Fc_1), ..., Rep(Fc_k)]}}.
If E is query [E_0 || W_1, ..., W_k] end, where each W_i is a generator or a filter, then Rep(E) = {'query',LINE,{lc,LINE,Rep(E_0),[Rep(W_1), ..., Rep(W_k)]}}. For Rep(W), see below.
If E is E_0.Field, a Mnesia record access inside a query, then Rep(E) = {record_field,LINE,Rep(E_0),Rep(Field)}.
If E is ( E_0 ), then Rep(E) = Rep(E_0), i.e., parenthesized expressions cannot be distinguished from their bodies.
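
The sketch below parses single expressions and, as one possible use of the result, evaluates them with erl_eval:expr/2. The module name expr_demo and its functions are illustrative and not part of any library.

```erlang
-module(expr_demo).
-export([rep/1, eval/1]).

%% Parse source text holding exactly one expression.
rep(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

%% Evaluate the abstract format of an expression without
%% any variable bindings.
eval(Source) ->
    {value, Value, _Bindings} =
        erl_eval:expr(rep(Source), erl_eval:new_bindings()),
    Value.
```

With these definitions, expr_demo:rep("lists:reverse(L).") should return {call,1,{remote,1,{atom,1,lists},{atom,1,reverse}},[{var,1,'L'}]}. The parentheses in "(1 + 2) * 3." leave no trace of their own: the result is {op,1,'*',{op,1,'+',{integer,1,1},{integer,1,2}},{integer,1,3}}, and expr_demo:eval("(1 + 2) * 3.") should return 9.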

Generators and filters

When W is a generator or a filter (in the body of a list or binary comprehension), then:

If W is a generator P <- E, where P is a pattern and E is an expression, then Rep(W) = {generate,LINE,Rep(P),Rep(E)}.
If W is a generator P <= E, where P is a pattern and E is an expression, then Rep(W) = {b_generate,LINE,Rep(P),Rep(E)}.
If W is a filter E, which is an expression, then Rep(W) = Rep(E).
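
A short sketch of a list comprehension with one generator and one filter, again obtained through erl_scan and erl_parse. The module name comprehension_demo is hypothetical.

```erlang
-module(comprehension_demo).
-export([rep/1, qualifiers/1]).

%% Parse source text holding exactly one expression.
rep(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

%% Return [Rep(W_1), ..., Rep(W_k)] for a list or binary comprehension.
qualifiers(Source) ->
    case rep(Source) of
        {lc, _Line, _Template, Qualifiers} -> Qualifiers;
        {bc, _Line, _Template, Qualifiers} -> Qualifiers
    end.
```

For example, comprehension_demo:qualifiers("[X * 2 || X <- L, X > 0].") should return [{generate,1,{var,1,'X'},{var,1,'L'}},{op,1,'>',{var,1,'X'},{integer,1,0}}]. The filter X > 0 is represented as an ordinary expression.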

Binary element type specifiers

A type specifier list TSL for a binary element is a sequence of type specifiers TS_1 - ... - TS_k. Rep(TSL) = [Rep(TS_1), ..., Rep(TS_k)].

When TS is a type specifier for a binary element, then:

If TS is an atom A, Rep(TS) = A.
If TS is a pair A:Value, where A is an atom and Value is an integer, Rep(TS) = {A, Value}.
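
The following sketch shows a binary with one fully specified element and one element whose Size is omitted. The module name bin_demo is an assumption made for this example.

```erlang
-module(bin_demo).
-export([elements/1]).

%% Parse a binary constructor and return its list of
%% bin_element tuples.
elements(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [{bin, _Line, Elements}]} = erl_parse:parse_exprs(Tokens),
    Elements.
```

Here bin_demo:elements("<<X:4/little-signed-integer-unit:8, Rest/binary>>.") should return [{bin_element,1,{var,1,'X'},{integer,1,4},[little,signed,integer,{unit,8}]},{bin_element,1,{var,1,'Rest'},default,[binary]}]. An element written without Size and TSL, as in "<<1>>.", should give {bin_element,1,{integer,1,1},default,default}.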

4.5 Clauses

There are function clauses, if clauses, case clauses and catch clauses.

A clause C is one of the following alternatives:

If C is a function clause ( Ps ) -> B where Ps is a pattern sequence and B is a body, then Rep(C) = {clause,LINE,Rep(Ps),[],Rep(B)}.
If C is a function clause ( Ps ) when Gs -> B where Ps is a pattern sequence, Gs is a guard sequence and B is a body, then Rep(C) = {clause,LINE,Rep(Ps),Rep(Gs),Rep(B)}.
If C is an if clause Gs -> B where Gs is a guard sequence and B is a body, then Rep(C) = {clause,LINE,[],Rep(Gs),Rep(B)}.
If C is a case clause P -> B where P is a pattern and B is a body, then Rep(C) = {clause,LINE,[Rep(P)],[],Rep(B)}.
If C is a case clause P when Gs -> B where P is a pattern, Gs is a guard sequence and B is a body, then Rep(C) = {clause,LINE,[Rep(P)],Rep(Gs),Rep(B)}.
If C is a catch clause P -> B where P is a pattern and B is a body, then Rep(C) = {clause,LINE,[Rep({throw,P,_})],[],Rep(B)}.
If C is a catch clause X : P -> B where X is an atomic literal or a variable pattern, P is a pattern and B is a body, then Rep(C) = {clause,LINE,[Rep({X,P,_})],[],Rep(B)}.
If C is a catch clause P when Gs -> B where P is a pattern, Gs is a guard sequence and B is a body, then Rep(C) = {clause,LINE,[Rep({throw,P,_})],Rep(Gs),Rep(B)}.
If C is a catch clause X : P when Gs -> B where X is an atomic literal or a variable pattern, P is a pattern, Gs is a guard sequence and B is a body, then Rep(C) = {clause,LINE,[Rep({X,P,_})],Rep(Gs),Rep(B)}.
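
The three helper functions in the sketch below extract the clause lists from a function declaration, a case expression and a try expression. The module name clause_demo and the helpers are our own. The exact shape of the third element in a catch clause pattern may vary between releases, so only the case without a guard is shown.

```erlang
-module(clause_demo).
-export([function_clauses/1, case_clauses/1, catch_clauses/1]).

%% Clauses of a function declaration.
function_clauses(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, {function, _Line, _Name, _Arity, Clauses}} =
        erl_parse:parse_form(Tokens),
    Clauses.

%% Clauses of a case expression.
case_clauses(Source) ->
    {'case', _Line, _Expr, Clauses} = expr(Source),
    Clauses.

%% Catch clauses of a try expression.
catch_clauses(Source) ->
    {'try', _Line, _Body, _CaseClauses, CatchClauses, _After} = expr(Source),
    CatchClauses.

%% Parse source text holding exactly one expression.
expr(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.
```

For example, clause_demo:function_clauses("f(X) when X > 0 -> X.") should return [{clause,1,[{var,1,'X'}],[[{op,1,'>',{var,1,'X'},{integer,1,0}}]],[{var,1,'X'}]}]. For clause_demo:catch_clauses("try f() catch error:R -> R end."), the single pattern of the clause is expected to be a tuple whose first two elements are {atom,1,error} and {var,1,'R'}.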

4.6 Guards

A guard sequence Gs is a sequence of guards G_1; ...; G_k, and Rep(Gs) = [Rep(G_1), ..., Rep(G_k)]. If the guard sequence is empty, Rep(Gs) = [].

A guard G is a nonempty sequence of guard tests Gt_1, ..., Gt_k, and Rep(G) = [Rep(Gt_1), ..., Rep(Gt_k)].

A guard test Gt is one of the following alternatives:

If Gt is an atomic literal L, then Rep(Gt) = Rep(L).
If Gt is a variable V, then Rep(Gt) = {var,LINE,A}, where A is an atom with a printname consisting of the same characters as V.
If Gt is a tuple skeleton {Gt_1, ..., Gt_k}, then Rep(Gt) = {tuple,LINE,[Rep(Gt_1), ..., Rep(Gt_k)]}.
If Gt is [], then Rep(Gt) = {nil,LINE}.
If Gt is a cons skeleton [Gt_h | Gt_t], then Rep(Gt) = {cons,LINE,Rep(Gt_h),Rep(Gt_t)}.
If Gt is a binary constructor <<Gt_1:Size_1/TSL_1, ..., Gt_k:Size_k/TSL_k>>, then Rep(Gt) = {bin,LINE,[{bin_element,LINE,Rep(Gt_1),Rep(Size_1),Rep(TSL_1)}, ..., {bin_element,LINE,Rep(Gt_k),Rep(Size_k),Rep(TSL_k)}]}. For Rep(TSL), see above. An omitted Size is represented by default. An omitted TSL (type specifier list) is represented by default.
If Gt is Gt_1 Op Gt_2, where Op is a binary operator, then Rep(Gt) = {op,LINE,Op,Rep(Gt_1),Rep(Gt_2)}.
If Gt is Op Gt_0, where Op is a unary operator, then Rep(Gt) = {op,LINE,Op,Rep(Gt_0)}.
If Gt is #Name{Field_1=Gt_1, ..., Field_k=Gt_k}, then Rep(Gt) = {record,LINE,Name, [{record_field,LINE,Rep(Field_1),Rep(Gt_1)}, ..., {record_field,LINE,Rep(Field_k),Rep(Gt_k)}]}.
If Gt is #Name.Field, then Rep(Gt) = {record_index,LINE,Name,Rep(Field)}.
If Gt is Gt_0#Name.Field, then Rep(Gt) = {record_field,LINE,Rep(Gt_0),Name,Rep(Field)}.
If Gt is A(Gt_1, ..., Gt_k), where A is an atom, then Rep(Gt) = {call,LINE,Rep(A),[Rep(Gt_1), ..., Rep(Gt_k)]}.
If Gt is A_m:A(Gt_1, ..., Gt_k), where A_m is the atom erlang and A is an atom or an operator, then Rep(Gt) = {call,LINE,{remote,LINE,Rep(A_m),Rep(A)},[Rep(Gt_1), ..., Rep(Gt_k)]}.
If Gt is {A_m,A}(Gt_1, ..., Gt_k), where A_m is the atom erlang and A is an atom or an operator, then Rep(Gt) = {call,LINE,Rep({A_m,A}),[Rep(Gt_1), ..., Rep(Gt_k)]}.
If Gt is ( Gt_0 ), then Rep(Gt) = Rep(Gt_0), i.e., parenthesized guard tests cannot be distinguished from their bodies.

Note that every guard test has the same source form as some expression, and is represented the same way as the corresponding expression.
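
The nesting of guard sequences, guards and guard tests can be seen in the following sketch, which extracts Rep(Gs) from a function declaration with a single clause. The module name guard_demo is illustrative.

```erlang
-module(guard_demo).
-export([guard_sequence/1]).

%% Parse a function declaration with one clause and return the
%% representation of its guard sequence.
guard_sequence(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, {function, _, _, _, [Clause]}} = erl_parse:parse_form(Tokens),
    {clause, _Line, _Patterns, Guards, _Body} = Clause,
    Guards.
```

For the source "f(X) when is_integer(X), X > 0; is_atom(X) -> ok." the result should be a list of two guards. The first guard holds two guard tests and the second holds one: [[{call,1,{atom,1,is_integer},[{var,1,'X'}]},{op,1,'>',{var,1,'X'},{integer,1,0}}],[{call,1,{atom,1,is_atom},[{var,1,'X'}]}]]. A clause without a guard, such as "f(X) -> ok.", gives [].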

4.7 The abstract format after preprocessing

The compilation option debug_info can be given to the compiler to have the abstract code stored in the abstract_code chunk in the BEAM file (for debugging purposes).

In OTP R9C and later, the abstract_code chunk will contain

{raw_abstract_v1,AbstractCode}

where AbstractCode is the abstract code as described in this document.

In releases of OTP prior to R9C, the abstract code after some more processing was stored in the BEAM file. The first element of the tuple would be either abstract_v1 (R7B) or abstract_v2 (R8B).
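
As a sketch of how the stored abstract code can be read back, the following example compiles a small module from forms with debug_info and fetches the chunk with beam_lib:chunks/2. The names abstract_chunk_demo and chunk_abs are hypothetical, and the example assumes a release in which the chunk has the raw_abstract_v1 format.

```erlang
-module(abstract_chunk_demo).
-export([abstract_code/0]).

%% Compile a module from hand-written forms with debug_info and
%% return the contents of its abstract code chunk.
abstract_code() ->
    Forms = [{attribute, 1, module, chunk_abs},
             {attribute, 2, export, [{id, 1}]},
             {function, 3, id, 1,
              [{clause, 3, [{var, 3, 'X'}], [], [{var, 3, 'X'}]}]}],
    {ok, chunk_abs, Binary} = compile:forms(Forms, [debug_info]),
    {ok, {chunk_abs, [{abstract_code, Chunk}]}} =
        beam_lib:chunks(Binary, [abstract_code]),
    Chunk.
```

The value returned by abstract_chunk_demo:abstract_code() should be a tuple {raw_abstract_v1,AbstractCode}, where AbstractCode is a list of forms.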