[Ericsson AB]

8 External Term Format

8.1 Introduction

The external term format is mainly used in the distribution mechanism of Erlang.

Since Erlang has a fixed number of types, there is no need for a programmer to define a specification for the external format used within some application. All Erlang terms have an external representation, and the interpretation of the different terms is application specific.

In Erlang, the BIF term_to_binary/1,2 is used to convert a term into the external format. To convert binary data encoding a term back into a term, the BIF binary_to_term/1 is used.

The distribution does this implicitly when sending messages across node boundaries.

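The overall layout of the external term format is shown below. In each table, the first row gives the field size in bytes and the second row gives the field content: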

1 1 N
131 Tag Data

A compressed term looks like this:

1 1 4 N
131 80 UncompressedSize Zlib-compressedData

Uncompressed Size (unsigned 32 bit integer in big-endian byte order) is the size of the data before it was compressed. The compressed data has the following format when it has been expanded:

1 Uncompressed Size
Tag Data
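
The two layouts above can be handled by one small routine. The following Python sketch strips the version byte and, for a compressed term, expands the zlib data. The function name unwrap_term is hypothetical; only the byte values 131 and 80 and the field sizes come from the text. The size check assumes that Uncompressed Size counts the whole expanded data (Tag plus Data).

```python
import struct
import zlib

VERSION_MAGIC = 131   # first byte of every external term
COMPRESSED_TAG = 80   # marks a zlib-compressed term


def unwrap_term(data: bytes) -> bytes:
    """Return the Tag byte followed by Data, expanding the term if needed."""
    if not data or data[0] != VERSION_MAGIC:
        raise ValueError("missing version byte 131")
    if len(data) > 1 and data[1] == COMPRESSED_TAG:
        # 4-byte big-endian size of the data before compression
        (size,) = struct.unpack(">I", data[2:6])
        body = zlib.decompress(data[6:])
        # Assumption: the size covers Tag plus Data of the expanded term.
        if len(body) != size:
            raise ValueError("uncompressed size mismatch")
        return body
    return data[1:]
```

The result always starts with a Tag byte, so the same decoder can be applied to compressed and uncompressed input.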

8.2 SMALL_INTEGER_EXT

1 1
97 Int

Unsigned 8 bit integer.

8.3 INTEGER_EXT

1 4
98 Int

Signed 32 bit integer in big-endian format (i.e. MSB first).
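
As an illustration of the two integer layouts, a minimal Python sketch follows. The function name decode_integer is hypothetical, and the input is assumed to start at the tag byte (the leading 131 already removed).

```python
import struct


def decode_integer(buf: bytes) -> int:
    """Decode a SMALL_INTEGER_EXT (97) or INTEGER_EXT (98) term."""
    tag = buf[0]
    if tag == 97:
        return buf[1]                            # unsigned 8 bit value
    if tag == 98:
        return struct.unpack(">i", buf[1:5])[0]  # signed 32 bit, MSB first
    raise ValueError(f"not an integer tag: {tag}")
```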

8.4 FLOAT_EXT

1 31
99 Float String

A float is stored in string format. The format used in sprintf to format the float is "%.20e" (there are more bytes allocated than necessary). To unpack the float, use sscanf with format "%lf".

This term is used in minor version 0 of the external format; it has been superseded by NEW_FLOAT_EXT.
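
The string layout can be sketched in Python, whose % operator accepts the same "%.20e" format as sprintf. The function names are hypothetical. The sketch assumes that the unused bytes of the 31 byte field are filled with zero bytes, which the text above does not state.

```python
def encode_float_ext(value: float) -> bytes:
    """Encode a float as FLOAT_EXT: tag 99 plus a 31 byte string field."""
    text = ("%.20e" % value).encode("ascii")
    # Assumption: the remainder of the field is padded with zero bytes.
    return bytes([99]) + text.ljust(31, b"\x00")


def decode_float_ext(buf: bytes) -> float:
    """Decode a FLOAT_EXT term that starts at its tag byte."""
    if buf[0] != 99:
        raise ValueError("not a FLOAT_EXT term")
    text = buf[1:32].split(b"\x00", 1)[0]   # drop the padding
    return float(text.decode("ascii"))
```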

8.5 ATOM_EXT

1 2 Len
100 Len AtomName

An atom is stored with a 2 byte unsigned length in big-endian order, followed by Len 8 bit characters that form the AtomName. Note: The maximum allowed value for Len is 255.
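
A minimal Python sketch of reading an ATOM_EXT term follows. The function name decode_atom_ext and the returned (name, next offset) pair are illustrative choices; decoding the 8 bit characters as Latin-1 is an assumption.

```python
import struct


def decode_atom_ext(buf: bytes, offset: int = 0):
    """Return (atom_name, next_offset) for an ATOM_EXT term at offset."""
    if buf[offset] != 100:
        raise ValueError("not an ATOM_EXT term")
    (length,) = struct.unpack_from(">H", buf, offset + 1)
    if length > 255:
        raise ValueError("Len must not exceed 255")
    start = offset + 3
    name = buf[start:start + length]
    if len(name) != length:
        raise ValueError("truncated atom")
    # Assumption: 8 bit characters are interpreted as Latin-1.
    return name.decode("latin-1"), start + length
```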

8.6 REFERENCE_EXT

1 N 4 1
101 Node ID Creation

Encode a reference object (an object generated with make_ref/0). The Node term is an encoded atom, i.e. ATOM_EXT, NEW_CACHE or CACHED_ATOM. The ID field contains a big-endian unsigned integer, but should be regarded as uninterpreted data since this field is node specific. Creation is a byte containing a node serial number that makes it possible to separate old (crashed) nodes from a new one.

In ID, only 18 bits are significant; the rest should be 0. In Creation, only 2 bits are significant; the rest should be 0. See NEW_REFERENCE_EXT.

8.7 PORT_EXT

1 N 4 1
102 Node ID Creation

Encode a port object (obtained from open_port/2). The ID is a node specific identifier for a local port. Port operations are not allowed across node boundaries. The Creation works just like in REFERENCE_EXT.

8.8 PID_EXT

1 N 4 4 1
103 Node ID Serial Creation

Encode a process identifier object (obtained from spawn/3 or friends). The ID and Creation fields work just like in REFERENCE_EXT, while the Serial field is used to improve safety. In ID, only 15 bits are significant; the rest should be 0.
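
The field order of PID_EXT can be sketched as follows. The function name decode_pid_ext and the dictionary result are hypothetical, and for brevity the Node field is assumed to be an ATOM_EXT (the cached atom forms are not handled).

```python
import struct


def decode_pid_ext(buf: bytes) -> dict:
    """Decode a PID_EXT term whose Node field is an ATOM_EXT."""
    if buf[0] != 103:
        raise ValueError("not a PID_EXT term")
    if buf[1] != 100:
        raise ValueError("this sketch only handles ATOM_EXT nodes")
    (atom_len,) = struct.unpack_from(">H", buf, 2)
    node = buf[4:4 + atom_len].decode("latin-1")
    # ID (4 bytes), Serial (4 bytes), Creation (1 byte), all big-endian
    pid_id, serial, creation = struct.unpack_from(">IIB", buf, 4 + atom_len)
    return {
        "node": node,
        "id": pid_id & 0x7FFF,        # only 15 bits are significant
        "serial": serial,
        "creation": creation & 0x3,   # only 2 bits are significant
    }
```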

8.9 SMALL_TUPLE_EXT

1 1 N
104 Arity Elements

SMALL_TUPLE_EXT encodes a tuple. The Arity field is an unsigned byte that determines how many elements follow in the Elements section.

8.10 LARGE_TUPLE_EXT

1 4 N
105 Arity Elements

Same as SMALL_TUPLE_EXT with the exception that Arity is an unsigned 4 byte integer in big endian format.

8.11 NIL_EXT

1
106

The representation for an empty list, i.e. the Erlang syntax [].

8.12 STRING_EXT

1 2 Len
107 Length Characters

String does NOT have a corresponding Erlang representation, but is an optimization for sending lists of bytes (integers in the range 0-255) more efficiently over the distribution. Since the Length field is an unsigned 2 byte integer (big endian), implementations must make sure that lists longer than 65535 elements are encoded as LIST_EXT.
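
The length rule can be illustrated with a Python sketch of an encoder for lists of bytes. The function name encode_byte_list is hypothetical. Using NIL_EXT for the empty list and SMALL_INTEGER_EXT for the elements of the LIST_EXT fallback are choices made for this sketch.

```python
import struct


def encode_byte_list(values) -> bytes:
    """Encode a list of integers 0-255, using STRING_EXT when it fits."""
    if any(not 0 <= v <= 255 for v in values):
        raise ValueError("all elements must be in the range 0-255")
    if len(values) == 0:
        return bytes([106])                               # NIL_EXT
    if len(values) <= 65535:
        # STRING_EXT: tag, 2 byte length, raw bytes
        return bytes([107]) + struct.pack(">H", len(values)) + bytes(values)
    # Too long for a 2 byte length: LIST_EXT with a NIL_EXT tail
    elements = b"".join(bytes([97, v]) for v in values)   # SMALL_INTEGER_EXT
    return bytes([108]) + struct.pack(">I", len(values)) + elements + bytes([106])
```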

8.13 LIST_EXT

1 4    
108 Length Elements Tail

Length is the number of elements that follow in the Elements section. Tail is the final tail of the list; it is NIL_EXT for a proper list, but may be any type if the list is improper (for instance [a|b]).
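
To show how Length, Elements and Tail fit together, here is a small recursive decoder in Python. It is only a sketch: the function name decode is hypothetical, only SMALL_INTEGER_EXT, ATOM_EXT, NIL_EXT and LIST_EXT are handled, atoms are returned as strings, and an improper list is returned as a tuple tagged "improper" (a representation invented for this example).

```python
import struct


def decode(buf: bytes, pos: int = 0):
    """Decode one term starting at buf[pos]; return (value, next_pos)."""
    tag = buf[pos]
    if tag == 97:                                   # SMALL_INTEGER_EXT
        return buf[pos + 1], pos + 2
    if tag == 100:                                  # ATOM_EXT
        (n,) = struct.unpack_from(">H", buf, pos + 1)
        return buf[pos + 3:pos + 3 + n].decode("latin-1"), pos + 3 + n
    if tag == 106:                                  # NIL_EXT
        return [], pos + 1
    if tag == 108:                                  # LIST_EXT
        (length,) = struct.unpack_from(">I", buf, pos + 1)
        pos += 5
        items = []
        for _ in range(length):
            item, pos = decode(buf, pos)
            items.append(item)
        if buf[pos] == 106:                         # proper list
            return items, pos + 1
        tail, pos = decode(buf, pos)                # improper list
        return ("improper", items, tail), pos
    raise ValueError(f"unsupported tag {tag}")
```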

8.14 BINARY_EXT

1 4 Len
109 Len Data

Binaries are generated with bit syntax expressions or with list_to_binary/1, term_to_binary/1, or as input from binary ports. The Len length field is an unsigned 4 byte integer (big endian).

8.15 SMALL_BIG_EXT

1 1 1 n
110 n Sign d(0) ... d(n-1)

Bignums are stored in sign-magnitude form, with a Sign byte that is 0 if the bignum is positive and 1 if it is negative. The digits are stored with the least significant byte first. To calculate the integer the following formula can be used:
B = 256
(d(0)*B^0 + d(1)*B^1 + d(2)*B^2 + ... + d(n-1)*B^(n-1))
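
The formula translates directly into code. The following Python sketch applies it to a SMALL_BIG_EXT term; the function name decode_small_big is hypothetical.

```python
def decode_small_big(buf: bytes) -> int:
    """Decode a SMALL_BIG_EXT term that starts at its tag byte."""
    if buf[0] != 110:
        raise ValueError("not a SMALL_BIG_EXT term")
    n, sign = buf[1], buf[2]
    digits = buf[3:3 + n]
    if len(digits) != n:
        raise ValueError("truncated bignum")
    B = 256
    # d(0)*B^0 + d(1)*B^1 + ... + d(n-1)*B^(n-1), least significant byte first
    magnitude = sum(d * B**i for i, d in enumerate(digits))
    return -magnitude if sign == 1 else magnitude
```

The sum is equivalent to int.from_bytes(digits, "little"); the explicit form is kept to mirror the formula.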

8.16 LARGE_BIG_EXT

1 4 1 n
111 n Sign d(0) ... d(n-1)

Same as SMALL_BIG_EXT with the difference that the length field is an unsigned 4 byte integer.
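
Since the two bignum forms differ only in the width of the length field, an encoder can choose between them by digit count. A Python sketch, with the hypothetical name encode_big, follows. It always emits a bignum term, even for values that would fit in SMALL_INTEGER_EXT or INTEGER_EXT.

```python
import struct


def encode_big(value: int) -> bytes:
    """Encode an integer as SMALL_BIG_EXT or LARGE_BIG_EXT."""
    magnitude = abs(value)
    n = max(1, (magnitude.bit_length() + 7) // 8)   # number of base 256 digits
    digits = magnitude.to_bytes(n, "little")        # least significant byte first
    sign = 1 if value < 0 else 0
    if n <= 255:
        return bytes([110, n, sign]) + digits       # 1 byte length field
    return bytes([111]) + struct.pack(">I", n) + bytes([sign]) + digits
```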

8.17 NEW_CACHE

1 1 2 Len
78 index Len Atom name

NEW_CACHE works just like ATOM_EXT, but it must also cache the atom in the atom cache in the location given by index. The atom cache is currently only used between real Erlang nodes (not between Erlang nodes and C or Java nodes).

8.18 CACHED_ATOM

1 1
67 index

When the atom cache is in use, index is the slot number in which the atom MUST be located.
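
The interplay between NEW_CACHE and CACHED_ATOM can be sketched with a small Python class. The class name AtomCache and the use of a dictionary for the slots are illustrative assumptions, not a description of how the emulator stores the cache.

```python
import struct


class AtomCache:
    """Receiver-side atom cache keyed by slot index."""

    def __init__(self):
        self.slots = {}

    def decode(self, buf: bytes, pos: int = 0):
        """Decode NEW_CACHE or CACHED_ATOM; return (atom_name, next_pos)."""
        tag = buf[pos]
        if tag == 78:                                # NEW_CACHE
            index = buf[pos + 1]
            (n,) = struct.unpack_from(">H", buf, pos + 2)
            name = buf[pos + 4:pos + 4 + n].decode("latin-1")
            self.slots[index] = name                 # cache the atom at index
            return name, pos + 4 + n
        if tag == 67:                                # CACHED_ATOM
            index = buf[pos + 1]
            if index not in self.slots:
                raise KeyError(f"no atom cached in slot {index}")
            return self.slots[index], pos + 2
        raise ValueError(f"not a cached atom tag: {tag}")
```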

8.19 NEW_REFERENCE_EXT

1 2 N 1 N'
114 Len Node Creation ID ...

Node and Creation are as in REFERENCE_EXT.

ID contains a sequence of big-endian unsigned integers (4 bytes each, so N' is a multiple of 4), but should be regarded as uninterpreted data.

N' = 4 * Len.

In the first word (four bytes) of ID, only 18 bits are significant, the rest should be 0. In Creation, only 2 bits are significant, the rest should be 0.

NEW_REFERENCE_EXT was introduced with distribution version 4. In version 4, N' should be at most 12.

See REFERENCE_EXT.
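
Because Len precedes Node while the ID words come last, a decoder has to carry Len across the variable-length Node field. A Python sketch follows; the function name decode_new_reference is hypothetical, the Node field is assumed to be an ATOM_EXT, and ID is returned as raw bytes since it should be regarded as uninterpreted data.

```python
import struct


def decode_new_reference(buf: bytes):
    """Return (node, creation, id_bytes) for a NEW_REFERENCE_EXT term."""
    if buf[0] != 114:
        raise ValueError("not a NEW_REFERENCE_EXT term")
    (length,) = struct.unpack_from(">H", buf, 1)     # number of 4 byte ID words
    if buf[3] != 100:
        raise ValueError("this sketch only handles ATOM_EXT nodes")
    (atom_len,) = struct.unpack_from(">H", buf, 4)
    node = buf[6:6 + atom_len].decode("latin-1")
    pos = 6 + atom_len
    creation = buf[pos] & 0x3                        # only 2 bits are significant
    id_bytes = buf[pos + 1:pos + 1 + 4 * length]     # N' = 4 * Len
    if len(id_bytes) != 4 * length:
        raise ValueError("truncated ID")
    return node, creation, id_bytes
```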

8.20 FUN_EXT

1 4 N1 N2 N3 N4 N5
117 NumFree Pid Module Index Uniq Free vars ...

Pid
is a process identifier as in PID_EXT. It represents the process in which the fun was created.
Module
is encoded as an atom, using ATOM_EXT, NEW_CACHE or CACHED_ATOM. This is the module that the fun is implemented in.
Index
is an integer encoded using SMALL_INTEGER_EXT or INTEGER_EXT. It is typically a small index into the module's fun table.
Uniq
is an integer encoded using SMALL_INTEGER_EXT or INTEGER_EXT. Uniq is the hash value of the parse tree for the fun.
Free vars
is NumFree number of terms, each one encoded according to its type.

8.21 NEW_FUN_EXT

1 4 1 16 4 4 N1 N2 N3 N4 N5
112 Size Arity Uniq Index NumFree Module OldIndex OldUniq Pid Free vars

This is the new encoding of internal funs: fun F/A and fun(Arg1,..) -> ... end.

Size
is the total number of bytes, including the Size field.
Arity
is the arity of the function implementing the fun.
Uniq
is the 16 byte MD5 of the significant parts of the Beam file.
Index
is an index number. Each fun within a module has a unique index. Index is stored in big-endian byte order.
NumFree
is the number of free variables.
Module
is encoded as an atom, using ATOM_EXT, NEW_CACHE or CACHED_ATOM. This is the module that the fun is implemented in.
OldIndex
is an integer encoded using SMALL_INTEGER_EXT or INTEGER_EXT. It is typically a small index into the module's fun table.
OldUniq
is an integer encoded using SMALL_INTEGER_EXT or INTEGER_EXT. OldUniq is the hash value of the parse tree for the fun.
Pid
is a process identifier as in PID_EXT. It represents the process in which the fun was created.
Free vars
is NumFree number of terms, each one encoded according to its type.

8.22 EXPORT_EXT

1 N1 N2 N3
113 Module Function Arity

This term is the encoding for external funs: fun M:F/A.

Module and Function are atoms (encoded using ATOM_EXT, NEW_CACHE or CACHED_ATOM).

Arity is an integer encoded using SMALL_INTEGER_EXT.
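
As an example of how the three fields are concatenated, the following Python sketch encodes fun M:F/A. The function names encode_atom and encode_export are hypothetical, and both atoms are written as ATOM_EXT with Latin-1 characters, which is an assumption of this sketch.

```python
import struct


def encode_atom(name: str) -> bytes:
    """Encode an atom as ATOM_EXT (assumes Latin-1 characters)."""
    raw = name.encode("latin-1")
    if len(raw) > 255:
        raise ValueError("atom name too long")
    return bytes([100]) + struct.pack(">H", len(raw)) + raw


def encode_export(module: str, function: str, arity: int) -> bytes:
    """Encode fun Module:Function/Arity as EXPORT_EXT."""
    if not 0 <= arity <= 255:
        raise ValueError("arity must fit in SMALL_INTEGER_EXT")
    return (bytes([113]) + encode_atom(module) + encode_atom(function)
            + bytes([97, arity]))                    # Arity as SMALL_INTEGER_EXT
```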

8.23 BIT_BINARY_EXT

1 4 1 Len
77 Len Bits Data

This term represents a bitstring whose length in bits is not a multiple of 8 (created using the bit syntax in R12B and later). The Len field is an unsigned 4 byte integer (big endian). The Bits field is the number of bits that are used in the last byte in the data field, counting from the most significant bit towards the least significant.
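
The role of the Bits field can be shown with a Python sketch that packs a bit sequence into a BIT_BINARY_EXT term. The function name encode_bit_binary and the use of a string of "0" and "1" characters as input are choices made for this example. Filling the unused low bits of the last byte with zeros is an assumption.

```python
import struct


def encode_bit_binary(bits: str) -> bytes:
    """Encode a string of '0'/'1' characters as BIT_BINARY_EXT."""
    used = len(bits) % 8            # bits used in the last byte
    if used == 0:
        raise ValueError("length is a multiple of 8; use BINARY_EXT instead")
    # Bits fill the last byte from the most significant bit downwards.
    padded = bits + "0" * (8 - used)
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return bytes([77]) + struct.pack(">I", len(data)) + bytes([used]) + data
```

For the three bit sequence 101, this yields one data byte with Bits set to 3.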

8.24 NEW_FLOAT_EXT

1 8
70 IEEE float

A float is stored as 8 bytes in big-endian IEEE format.

This term is used in minor version 1 of the external format.
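
A minimal Python sketch of the 8 byte layout follows, using the struct format ">d" for a big-endian IEEE double. The function names are hypothetical.

```python
import struct


def encode_new_float(value: float) -> bytes:
    """Encode a float as NEW_FLOAT_EXT: tag 70 plus 8 big-endian bytes."""
    return bytes([70]) + struct.pack(">d", value)


def decode_new_float(buf: bytes) -> float:
    """Decode a NEW_FLOAT_EXT term that starts at its tag byte."""
    if buf[0] != 70:
        raise ValueError("not a NEW_FLOAT_EXT term")
    return struct.unpack(">d", buf[1:9])[0]
```

Unlike the string form of FLOAT_EXT, this layout carries the bits of the double directly, so a value passes through an encode and decode cycle unchanged.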


erts 5.6.4
Copyright © 1991-2008 Ericsson AB