Author:
Björn-Egil Dahlberg <egil(at)Erlang.org>
Status:
Final/R17B Implemented in OTP R17B (except ...)
Type:
Standards Track
Created:
04-Apr-2013
Erlang-Version:
R17A

EEP 43: Maps #

Abstract #

The journey of Maps and this EEP has been long and by no means a straight-forward and continuous one. I had a crystal clear picture of what I wanted Maps to be when we first started discussing it within OTP about two-three years ago. This EEP resembles that vision but it has had a lot of contributions of other ideas from both within and outside of OTP.

The idea was a data-type, a syntax aware mapping of key-value associations with pattern matching. A syntax similar to records but without the hassle of compile-time dependency and with arbitrary terms as keys. Order was not important and it could be implemented with a Hash-Array-Mapped-Trie with good performance and memory trade-offs. This was a different approach than to replace records. It was meant to replace records where suitable and in other regards not be a replacement but its own thing.

From the community there has been many wishes of a Map like data-type and a few suggestions. The one suggestion that stands out is of course the Frames proposal from Richard O’Keefe. It is the most complete proposal I’ve seen and is very well thought out. Its goal is to be a record replacement and the proposal satisfies this goal very well.

  • If Frames are that good, why a separate EEP?
  • It boils down to goals and constraints.

A record replacement is just that, a replacement. It’s like asking the question, “What do we have?” instead of “What can we get?” The instant rebuttal would be “What do we need?” I say Maps.

Frames has certainly inspired and influenced Maps. In many regards Maps also encompasses Frames but Maps tries to do more. In the end I believe they are two different things and have different goals.

This EEP suggests a new built in data-type for Erlang, the map, #{ Key => Value }.

The new data-type shall have semantics, syntax and operations that:

  • provides an association set from key terms to value terms which can be constructed, accessed and updated using language syntax
  • can be uniquely distinguished from every other data-type in the language
  • has no compile-time dependency for constructing, accessing or updating contents of maps nor for passing maps between modules, processes or over Erlang distribution
  • can be used in matching expressions in the language
  • has a one-to-one association between printing and parsing the data-type
  • has a well defined order between terms of the type and other Erlang types
  • has at most O(log N) time complexity in insert and lookup operations, where N is the number of key-value associations.

Similar data-types exists in other languages, i.e. perl hashes, ruby hashes, python dictionaries, or scala maps.

Specification #

A map M consists of a number of associations and keeps an association from key terms K1..Kn to value terms V1..Vn where no two keys match. Any term, compound or otherwise, is a viable key or value. Terms of type Map are recognized by guard tests erlang:is_map/1. There are no operators acting on maps. Within maps there are two infix operators. An association operator, =>, pairs a key to a value and is used in creation and updates. A set-value operator, :=, is used to update a value on an already existing and matching key. The set-value operator is also used in matching to get the associated value from a key.

Terminology #

The size of a map is the number of associations in its set.

An association is a key-value pair of key K to value V in a Map.

Two keys, K1 and K2 are matching if, true = K1 =:= K2.

Syntax #

Defined syntax for declaring, updating and matching maps.

Construction syntax #

Constructing a new map is done by letting an expression K be associated to another expression V:

#{ K => V }

New maps may include multiple associations at construction by listing every association:

#{ K1 => V1, .. Kn => Vn }

An empty map is constructed by not associating any terms with each other:

#{}

All keys and values in the map are terms. Any expression is first evaluated and then the resulting terms are used as key and value respectively.

Keys and values are separated by the => arrow and associations are separated by ,.

Examples:

M0 = #{},                   % empty map
M1 = #{ a => <<"hello">> }, % single association with literals
M2 = #{ 1 => 2, b => b },   % multiple associations with literals
M3 = #{ A => B },           % single association with variables
M4 = #{ {A, B} => f() }.    % compound key associated to an evaluated expression

where, A and B are any expressions and M0 through M4 are the resulting map terms.

If two matching keys are declared, the latter key will take precedent.

Example:

1> #{ 1 => a, 1 => b }.
#{ 1 => b }
2> #{ 1.0 => a, 1 => b }.
#{ 1 => b, 1.0 => a }

The order in which the expressions constructing the keys and their associated values are evaluated is not defined. The syntactic order of the key-value pairs in the construction is of no relevance, except in the above mentioned case of two matching keys.

A simple BNF grammar for the construction follows:

 <map-construct>  ::= '#{' <key-value-exprs> '}'
<key-value-exprs> ::= /* empty */
                    | <key-value-list>
 <key-value-list> ::= <key-value-assoc>
                    | <key-value-assoc> ',' <key-value-list>
<key-value-assoc> ::= <expr> '=>' <expr>
           <expr> ::= <Erlang-expression>

Update syntax #

Updating a map has similar syntax as constructing it.

An expression defining the map to be updated is put in front of the expression defining the keys to be updated and their respective values.

M#{ K => V }

where M is an term of type map and K and V are any expression.

If key K does not match any existing key in the map, a new association will be created from key K to value V. If key K matches an existing key in map M its associated value will be replaced by the new value V. In both cases the evaluated map expression will return a new map.

If M is not of type map an exception of type badmap is thrown.

To only update an existing value, the following syntax is used,

M#{ K := V }

where M is an term of type map, V is an expression and K is an expression which evaluates to an existing key in M.

If key K does not match any existing keys in map M an exception of type badarg will be triggered at runtime. If a matching key K is present in map M its associated value will be replaced by the new value V and the evaluated map expression returns a new map.

If M is not of type map an exception of type badmap is thrown.

Examples:

M0 = #{},
M1 = M0#{ a => 0 },
M2 = M1#{ a => 1, b => 2 },
M3 = M2#{ "function" => fun() -> f() end },
M4 = M3#{ a := 2, b := 3 }.  % 'a' and 'b' was added in `M1` and `M2`.

where M0 is any map. It follows that M1 .. M4 are maps as well.

More Examples:

1> M = #{ 1 => a }.
#{ 1 => a }

2> M#{ 1.0 => b }.
#{ 1 => a, 1.0 => b }.

3> M#{ 1 := b }.
#{ 1 => b }

4> M#{ 1.0 := b }.
** exception error: bad argument

As in construction, the order in which the key and value expressions are evaluated are not defined. The syntactic order of the key-value pairs in the update is of no relevance, except in the case where two keys match, in which case the latter value is used.

A simple BNF grammar for map updates follows:

 <map-construct>  ::= <map-expr> '#{' <key-value-exprs> '}'
<key-value-exprs> ::= /* empty */
                    | <key-value-list>
 <key-value-list> ::= <key-value>
                    | <key-value> ',' <key-value-list>
      <key-value> ::= <key-value-assoc>
                    | <key-value-exact>
<key-value-assoc> ::= <expr> '=>' <expr>
<key-value-exact> ::= <expr> ':=' <expr>
       <map-expr> ::= <Erlang expression evaluating to a term of type map>
           <expr> ::= <Erlang expression>

Accessing a single value #

For accessing single values in maps, let us use an de-association:

V = M#{ K }.

Where M is a Map and K is any term.

If key K matches to an existing key in map M the associated value will be bound to V. If key K does not match to any existing key in map M an exception badarg will occur in runtime.

Examples:

M1 = #{ a => 1, c => 3 },
3 = M1#{ c }.

M2 = #{ 1.0 => a },
a = M2#{ 1 }.

Matching syntax #

Matching of key-value associations from maps, exemplified with the matching operator, is done in the following way:

#{ K := V } = M

where M is any map. The key K has to be an expression with bound variables or a literals, and V can be any pattern with either bound or unbound variables. If variables in V are unbound, it will be bound to the value associated with the key K, which has to exist in the map M. If variables in V are bound, it has to match the value associated with K in M.

Example:

M = #{ a => {1,2}},
#{ a := {1,B}} = M.

This will bind variable B to integer 2.

Similarly, multiple values from the map may be matched:

#{ K1 := V1, .., Kn := Vn } = M

where keys K1 .. Kn are any expressions with literals or bound variables. If all keys exists in map M all variables in V1 .. Vn will be matched to the associated values of there respective keys.

If the matching conditions are not met, the match will fail, either with

  1. a badmatch exception, if used in the context of the matching operator as in the example,
  2. or resulting in the next clause being tested in function heads and case expressions.

Matching in maps only allows for := as delimiters of associations.

The order in which keys are declared in matching has no relevance.

Duplicate keys are allowed in matching and will match each pattern associated to the keys.

#{ K := V1, K := V2 } = M

Matching an expression against an empty map literal will match its type but no variables will be bound:

#{} = Expr

This expression will match if the expression Expr is of type map, otherwise it will fail with an exception badmatch.

The grammar for the matching syntax is similar to that of construction.

Matching syntax: Example with literals in function heads #

Matching of literals as keys are allowed in function heads.

%% only start if not_started
handle_call(start, From, #{ state := not_started } = S) ->
...
    {reply, ok, S#{ state := start }};

%% only change if started
handle_call(change, From, #{ state := start } = S) ->
...
    {reply, ok, S#{ state := changed }};

Matching syntax: Frequency example #

More matching syntax, calculating frequency of terms in a list.

freq(Is)                    -> freq(Is, #{}).
freq([I|Is], #{I := C} = M) -> freq(Is, M#{ I := C + 1});
freq([I|Is], M)             -> freq(Is, M#{ I => 1 });
freq([], M)                 -> maps:to_list(M).

Equivalent code with gb_trees for comparison:

freq(Is)        -> freq(Is, gb_trees:empty()).
freq([I|Is], T) ->
    case gb_trees:lookup(I, T) of
        none       -> freq(Is, gb_trees:enter(I, 1), T);
        {value, V} -> freq(Is, gb_trees:enter(I, V + 1, T))
    end;
freq([], T) -> gb_trees:to_list(T).

Matching syntax: File information example #

Old API’s could be refined to use map syntax:

1> {ok, #{ type := Type, mtime := Mtime }} = file:read_file_info(File).
2> io:format("type: ~p, mtime: ~p~n", [Type, Mtime]).
type: regular, mtime: {{2012,7,18},{19,59,18}}
ok
3>

Map comprehension syntax #

Map comprehension declaration:

M1 = #{ E0 => E1 || K := V <- M0  }

where M0 is any Map, E0 and E1 are any erlang expression, K and V constitutes the pattern to be matched by each association in M0.

For each sequence in the generator an association is created from the evaluated expression E0 to the evaluated expression E1.

If M0 is not a Map, then a runtime exception of type {bad_generator, M0} will be generated.

A simple BNF grammar for map comprehension follows:

      <comprehension> ::= '#{' <key-value-assoc> '||' <comprehension-exprs> '}'
<comprehension-exprs> ::= <comprehension-expr>
                        | <comprehension-exprs> ',' <comprehension-expr>
 <comprehension-expr> ::= <generator>
                        | <filter>
          <generator> ::= <key-value-exact> '<-' <expr>
             <filter> ::= <expr>
    <key-value-assoc> ::= <expr> '=>' <expr>
    <key-value-exact> ::= <expr> ':=' <expr>
               <expr> ::= <Erlang expression>

Each association from all generators, which satisfies the filters, has an environment that consist of the initial environment and the environment for the association.

Examples:

M0 = #{ K => V*2  || K := V <- map() },
M1 = #{ I => f(I) || I <- list() },
M2 = #{ K => V    || <<L:8,K:L/binary,V/float>> <= binary() }.

Map generators may also be used in binary and list comprehensions.

Examples:

B1 = << <<V:8>> || _ := V <- map() >>,
L1 = [ {K,V} || K := V <- map() ].

Dialyzer and Type specification #

Keys known before hand can be specified directly and uniquely for a map.

-spec func(Opt, M) -> #{ 'status' => S, 'c' => integer() } when
      Opt :: 'inc' | 'dec',
        M :: #{ 'status' => S, 'c' => integer() },
        S :: 'update' | 'keep'.

func(inc, #{ status := update, c := C} = M) -> M#{ c := C + 1};
func(dec, #{ status := update, c := C} = M) -> M#{ c := C - 1};
func(_,   #{ status := keep } = M)          -> M.

It could also be specified by type only.

-spec plist_to_map(Ls) -> #{ binary() => integer() } when
      Ls :: [{binary(), integer()}].

plist_to_map([], M) ->
    M;
plist_to_map([{K,V}|Ls], M) when is_binary(K), is_integer(V) ->
    plist_to_map(Ls, M#{ K => V });
plist_to_map([_|Ls], M) ->
    plist_to_map(Ls, M).

It can similarly be specified as a type.

-type map1() :: #{ binary() => integer() }.
-type map2() :: #{ <<"list1">> | <<"list2">> => [numbers()] }.

Functions and Semantics #

The module implementing the functions is currently named in plural, maps in the same spirit as lists, gb_trees, sets etc but the singular map is shorter and may be more desirable.

Functions and semantics for maps. Originally much inspiration was from Richard O’Keefes frames proposal.

Erlang module extension #

erlang:is_map(M :: term()) -> boolean(). #

This function is a guard function.

Syntax equivalence: try #{} = M, true catch error:{badmatch,_} -> false end.

The function returns true if M is a map otherwise false.

erlang:map_size(M :: map()) -> non_neg_integer(). #

This function is a guard function.

Syntax equivalence: none.

The function returns the number of key-value pairs in the map. This operation happens in constant time.

Same as maps:size(M).

maps module #

maps:remove(K0 :: term(), M0 :: map()) -> M1 :: map(). #

Syntax equivalence: #{ K => V || K := V <- M0, K =/= K0 }. Only with comprehensions

The function removes the key K0, if it exists, and its associated value from map M0 and returns a new map M1 without key K0.

Same as, maps:from_list([{K,V}||{K,V} <- maps:to_list(M0), K =/= K0])

maps:get(K :: term(), M :: map()) -> V :: term(). #

Syntax equivalence: M#{ K }.

Returns the value V associated with key K if map M contains a key that matches K. If no value is associated with key K then the call will fail with an exception.

maps:keys(M :: map()) -> [K1, .., Kn]. #

Syntax equivalence: [K || K := _ <- M].

Returns a complete list of Keys, in arbitrary order, which resides within map M.

Same as, [K || {K,_} <- maps:to_list(M)].

maps:find(K :: term(), M :: map()) -> {ok, V :: term()} | error. #

Syntax equivalence: try #{ K := V } = M, V catch error:{badmatch,_} -> error end.

Returns a tuple {ok, V} with value V associated with key K if map M contains key K. If no value is associated with key K then the function will return error.

maps:fold(F :: fun((K :: term(), V :: term(), In :: term()) -> Out :: term()), A :: term(), M :: map()) -> Result :: term(). #

Syntax equivalence: none

Calls F(K, V, In) for every key K to value V association in map M in arbitrary order. The function fun F/3 must return a new accumulator which is passed to the next successive call. maps:fold/3 returns the final value of the accumulator. The initial accumulator value A is returned if the map is empty.

Same as, lists:foldl(fun({K,V}, In) -> F(K,V,In) end, A, maps:to_list(M)).

maps:from_list([{K1,V1}, .., {Kn,Vn}]) -> M :: map(). #

Syntax equivalence: #{ K1 => V1, .., Kn => Vn }

The function takes a list of key-value tuples elements and builds a map. The associations may be in any order and both keys and values in the association may be of any term. If the same key appears more than once, the latter (rightmost) value is used and the previous values are ignored.

maps:is_key(K :: term(), M :: map()) -> bool(). #

Syntax equivalence: try #{ K := _ } = M, true catch error:{badmatch, _} -> false end.

Returns true if map M contains a key that matches K.

maps:map(F :: function(), M0 :: map()) -> M1 :: map(). #

Syntax equivalence: #{ K => F(K,V) || K := V <- M }. Only with comprehensions

The function produces a new map M1 by calling the function fun F(K, V) for every key K to value V association in map M0 in arbitrary order. The function fun F/2 must return the value to be associated with key K for the new map M1.

Same as, maps:from_list(lists:map(fun({K,V}) -> {K, F(K,V)} end, maps:to_list(M))).

maps:new() -> M :: map(). #

Syntax equivalence: #{}.

Returns a new empty map.

Same as, maps:from_list([]).

maps:size(M :: map()) -> Size :: non_neg_integer(). #

Syntax equivalence: none.

The function returns the number of key-value associations in the map. This operation happens in constant time.

Same as erlang:map_size(M).

maps:put(K :: term(), V :: term(), M0 :: map()) -> M1 :: map(). #

Syntax equivalence: M0#{ K => V }.

Associates key K with value V and inserts the association into map M0. If a key exists that matches K, the old associated value is replaced by value V. The function returns a new map M1 containing the new association.

Same as, maps:from_list(maps:to_list(M0) ++ [{K,V}]).

maps:to_list(M :: map()) -> [{K1,V1}, ..., {Kn,Vn}]. #

Syntax equivalence: [{K, V} || K := V <- M].

Where the pairs, [{K1,V1}, ..., {Kn,Vn}], are returned in arbitrary order.

maps:update(K :: term(), V :: term, M0 :: map()) -> M1 :: map() #

Syntax equivalence: M0#{ K := V }.

If a key exists that matches K, the old associated value is replaced by value V. The function returns a new map M1 containing the new associated value.

Same as, maps:from_list(maps:to_list(M0) ++ [{K,V}]).

maps:values(M :: map()) -> [V1, .., Vn]. #

Syntax equivalence: [V || _ := V <- M].

Returns a complete list of values, in arbitrary order, contained in map M.

Same as, [V || {_,V} <- maps:to_list(M)].

maps:without([K1, .., Kn] = Ks, M0 :: map()) -> M1 :: map(). #

Syntax equivalence: #{ K => V || K := V <- M0, not lists:member(K, Ks) }. Only with comprehensions

Removes keys K1 through Kn, and their associated values, from map M0 and returns a new map M1.

Same as, maps:from_list([{K,V}||{K,V} <- maps:to_list(M0), not lists:member(K, Ks)]).

maps:merge(M0 :: map(), M1 :: map()) -> M2 :: map(). #

Syntax equivalence: none

Merges two maps into a single map. If two matching keys exists in both maps the value in map M0 will be superseded by the value in map M1.

Equality and Ordering #

Equality #

In the case of term A and term B both are maps,

  • If A and B have different sizes, then A and B are not equal.
  • Otherwise, if all corresponding keys of A and B are pair- wise equal with their corresponding values, then A and B are equal.
  • Otherwise, A and B are not equal.

It follows that two maps are equal if, and only if, they are of the same, type, size and all corresponding key-value associations are pairwise equal.

Ordering #

The term order is defined in Erlang specification 4.7 and quoted below:

  • The terms are primarily ordered according to their type, in the following order:

      numbers < atoms < refs < ports < PIDs < tuples < empty list < conses < binary
    

The specification is incomplete here, the actual term order is:

numbers < atoms < refs < funs < ports < pids < tuples < empty list < conses < binaries

The Maps data-type are ordered next after tuples:

numbers < atoms < refs < funs < ports < pids < tuples < maps < empty list < conses < binaries
                                                        ----

Maps are then ordered first by their size and then according to their respective keys and lastly by the associated values in Erlang term order.

Given two maps, M1 and M2, with the same size, they are compared so that each key, in Erlang term order of the keys, in M1 is compared to the corresponding key of M2. All keys are compared first, then the values, until a difference is found. If a key or value differs, the order of the respective terms, in Erlang term order, is the order of the maps. If no key-value pairs differ, the maps are considered equal.

Example:

> #{ b => 2 } > #{ a => 2 }.         % b > a
true
> #{ b => 2 } > #{ a => 1, b => 2 }. % size 1 < size 2
false
> #{ b => 1 } > #{ b => 2}.          % 1 < 2
false
> #{ b => 2, c => 3 } > #{ b => 1, d => 3}.  % c > d, compared before 2 and 1
false
> #{ b => 1 } > #{ 1 => 1}.          % b > 1
true
> #{ 1.0 => a } == #{ 1 => a }.      % 1.0 == 1
true
> #{ 1.0 => a } =:= #{ 1 => a }.     % 1.0 =:= 1
false

Maps are printed with keys in arbitrary order.

Operator Precedence #

Map association operator and set-value operator is ordered last, after match-operator and catch.

:
#
Unary + - bnot not
/ * div rem band and          Left associative
+ - bor bxor bsl bsr or xor   Left associative
++ --                         Right associative
== /= =< < >= > =:= =/=
andalso
orelse
= !                           Right associative
catch
=> :=

It follows that the map expression:

#{ key => C = 1 + 2 }

will evaluate in the following order:

#{ key => ( C = ( 1 + 2 ) ) }

Pattern matching #

Pattern matching is very powerful Erlang tool. Maps introduces a couple of new features with its pattern matching.

Pattern matching: Basics #

We will exemplify using the match operator.

Pattern matching with maps is similar to records on the surface. Keys requested in a LHS pattern will be bound with the values which is found in the RHS association map.

1> #{ a := V } = #{ a => 1 }.
#{ a => 1 }
2> 1 = V.
1

Keys requested in a LHS pattern which is not found in the RHS map will produce an exception, exception error: no match of right hand side value ....

1> #{ not_in_map := V } = #{ a => 1 }.
** exception error: no match of right hand side value #{ a => 1 }

Similarly, if a value for a requested key in the LHS pattern does not match the keys associated value in the RHS map the match will produce an exception.

1> #{ a := 10 } = #{ a => 1 }.
** exception error: no match of right hand side value #{ a => 1 }

Only the keys requested will bind to associations. Any unrequested keys which resides within the map being matched will be ignored.

1> #{ a := V1, b := V2 } = #{ a => 1, b => 2, c => 3}.
#{ a => 1, b => 2, c => 3 }
2> 1 = V1.
1
3> 2 = V2.
2

The order of keys requested has no significance when pattern matching a map.

1> #{ a := "1", b := "2" } = #{ a => "1", b => "2" }.
#{ a => "1", b => "2" }
2> #{ b := "2", a := "1" } = #{ a => "1", b => "2" }.
#{ a => "1", b => "2" }

Pattern matching: Continued #

The example below is a constructed example to illustrate the power of map pattern matching.

A match expression is evaluated so that variables used as keys in the expression are bound before they are evaluated (if possible).

As an example, keys can be bound by other key-value associations.

1> #{ K := V, id := K } = M = #{ id => b, a => 1, b => 2, c => 3}.
#{ id => b, a => 1, b => 2, c => 3}
2> b = K.
b
3> 2 = V.
2

In this case, the bound key id is evaluated first and looked up in M, binding the variable K. The K bound to b can then be used to bind V to 2.

Binding variables used as keys requires that there is a possible order of binding without cycles. The reordering extends to all terms in a matching expression, so that:

1> { #{ X := Y }, X } = { #{ 5 => 10 }, 5 }.

with X and Y unbound, results in a successful match binding X to 5 and Y to 10.

This is particular useful when updating specifics in map associations:

%% Function declared in module map_example
update_values([{K, V1}|Ls], #{ K := V0 } = M) -> update_values(Ls, M#{ K := V0 + V1 });
update_values([_|Ls], M) -> update_values(Ls, M);
update_values([], M)     -> M.

The first function clause is important here. Key K is bound in the tuple and will be used to request value V0 from map M. The map M is then updated to associate key K with the new value V0 + V1.

%% In the Erlang shell
1> M = #{ "a" => 1, "b" => 2, "c" => 3 }.
#{ "a" => 1, "b" => 2, "c" => 3 }

2> map_example:update_values([{"b", 10}, {"c", 20}, {"d", 30 }], M).
#{ "a" => 1, "b" => 12, "c" => 23 }

Note that since key "d" does not reside in map M it fails to match the first clause and does not update the map with the association "d" => 40.

An expression where the dependencies in the LHS of the match are cyclic, like:

1>  #{X := Y, Y := X} = #{5 => 10, 10 => 5}.

will result in an evaluator error (variable is unbound) or a compilation error.

External Term Format #

There are 255 tags that can be used to encode terms for external binary distribution. All tags are defined in external.h. The encoding starts by a magic byte 131.

The encoding tags used in R15B01 are the following:

SMALL_INTEGER_EXT        'a'     97
INTEGER_EXT              'b'     98
FLOAT_EXT                'c'     99
ATOM_EXT                 'd'    100
SMALL_ATOM_EXT           's'    115
REFERENCE_EXT            'e'    101
NEW_REFERENCE_EXT        'r'    114
PORT_EXT                 'f'    102
NEW_FLOAT_EXT            'F'     70
PID_EXT                  'g'    103
SMALL_TUPLE_EXT          'h'    104
LARGE_TUPLE_EXT          'i'    105
NIL_EXT                  'j'    106
STRING_EXT               'k'    107
LIST_EXT                 'l'    108
BINARY_EXT               'm'    109
BIT_BINARY_EXT           'M'     77
SMALL_BIG_EXT            'n'    110
LARGE_BIG_EXT            'o'    111
NEW_FUN_EXT              'p'    112
EXPORT_EXT               'q'    113
FUN_EXT                  'u'    117

DIST_HEADER              'D'     68
ATOM_CACHE_REF           'R'     82
ATOM_INTERNAL_REF2       'I'     73
ATOM_INTERNAL_REF3       'K'     75
BINARY_INTERNAL_REF      'J'     74
BIT_BINARY_INTERNAL_REF  'L'     76
COMPRESSED               'P'     80

For Maps we define tag MAP_EXT to 116 (t).

Data layout:

MAP_EXT
|-----------------------------------
|  1  |    4   |        |          |
|-----------------------------------
| 116 |  Size  |  Keys  |  Values  |
|-----------------------------------

The Size specifies the number of keys and values that follows the size descriptor.

An open questions, optimize for:

  1. encoding/decoding speed?
  2. ease of access?
  3. memory size?

Memory size should be a priority since we send this data over the wire. We should promote ease of access so other languages can integrate towards the format.

This leads to a flat and simple structure. It follows that encoding/decoding takes a performance hit.

Motivation #

Why would we need maps when we have records, dicts, gb_trees, ets and proplists?

Maps are envisioned to be an easy to use, lightweight yet powerful key-value association store.

Maps utilizes one of Erlang’s major strengths, pattern matching, to enrich user experience and provide a powerful tool to simplify code development. Pattern matching gives Maps a clear edge over dicts, gb_trees or proplists in usability.

Maps provides the possibility to associate arbitrary terms as keys, not only atoms, with arbitrary terms as values in a matching capable data-type.

Maps does not claim to be an replacement to records as the frames proposal does. Instead maps targets a larger usage domain and wishes to be a complement to records and supersede them where suitable.

Maps - Two approaches #

  1. Maps as an association array with pattern matching and syntax for constructing, accessing or updating them.
  2. Maps as a record replacement.

Maps were not envisioned as a record replacement at first, it was a hopeful requirement added later. A record replacement approach does not necessarily restrain any semantics but it may put some constraint on the implementation and underlying structure.

Records #

Records are powerful under the right circumstances:

  • fast lookups, O(1), due to compile time indexing of keys, and fast stores for small record sizes (~50 values),
  • no memory overhead to store keys, only values and a name: 2 + N words consumption,
  • ease of use in function head matching.

However some of the drawbacks are:

  • compile-time dependency and forces header file inclusions for inter-module usage,
  • only atoms as keys,
  • keys are not accessible in runtime,
  • no dynamic access of values, i.e. we cannot use variables to access values,
  • it is not a data-type and cannot be distinguished from tuples.

When comparing maps with records the drawbacks are easily remedied by Maps, however the positive effects is not as easy to replicate in a built-in data-type where values are determined at runtime instead of at compile time.

  • Being faster than direct-indexing array, where indices and possibly the resulting value are determined at compile time, is hard. In fact it is impossible.
  • A memory model for Maps where the efficiency was near that of records could be achieved by essentially using two tuples, one for keys and one for values as demonstrated in Frames. This would be impact performance of updates on Maps with a large number of entries and thus constrain the capability of a dictionary approach.
  • Maps would be as easy, or even easier, to use with matching in function heads.

Protocol Construction #

Arguments for a simpler a JSON representation using frames or maps has been raised. Using frames and thereby atoms for dynamic creation of keys would be a serious drawback. Using maps would grant the possibility of string binaries to represent keys and would not put the global atom pool in disarray.

Pattern Matching Example: The JSON Files #

Changing json-decoding. Mochiwebs mochijson decodes Json dictionaries the as the following:

{"key": "value"} -> {struct, [{"key", "value"}]}

This could instead be:

{"key": "value"} -> #{ "key" => "value"}

Consider the following JSON examples, from json.org.

{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}

mochijson:decode/1 will currently look like:

{struct, [
    {"menu", {struct, [
        {"id","file"},
        {"value","File"},
        {"popup", {struct, [
            {"menuitem", {array, [
                {struct, [{"value","New"},{"onclick","CreateNewDoc()"}]},
                {struct, [{"value","Open"},{"onclick","OpenDoc()"}]},
                {struct, [{"value","Close"}, {"onclick","CloseDoc()"}]}
            ]}}
        ]}}
    ]}}
]}

mochijson:decode/1 could look like:

#{ "menu" => #{
    "id" => "file",
    "value" => "File",
    "popup" => #{
        "menuitem" => [
          #{ "value" => "New",   "onclick" => "CreateNewDoc()"},
          #{ "value" => "Open",  "onclick" => "OpenDoc()"},
          #{ "value" => "Close", "onclick" => "CloseDoc()"}
        ]
    }
}}

Let us find "menu" -> "popup" -> "menuitem".

Traversing the first structure is a bit awkward. We would have to do the following:

Decoded         = mochijson:decode(Json),
{struct, Menu}  = proplists:get_value("menu", Decoded),
{struct, PopUp} = proplists:get_value("popup", Menu),
{struct, Items} = proplists:get_value("menuitem", PopUp),

With maps it could look like the following:

#{ "menu"     := Menu  } = mochijson:decode(Json),
#{ "popup"    := PopUp } = Menu,
#{ "menuitem" := Items } = PopUp,

or even:

Decoded = mochijson:decode(Json),
#{ "menu" := #{ "popup" := #{ "menuitem" := Items } } } = Decoded,

With maps, and single value access, it could look really simple:

Decoded = mochijson:decode(Json),
Items = Decoded#{ "menu" }#{ "popup" }#{ "menuitem "}.

Open Questions #

We have some usage scenarios that are still open for debate. A proposed answer is given for each question and stems from discussions in this proposal.

  1. What type of keys will we need to store in our Map? Will atoms suffice?
    • It is the authors view that we should refrain from any key restrictions unless there is overwhelmingly evidence that we can gain something from such a restriction.
    • A non atom key restriction satisfies our goal of a powerful Map mechanism.
    • Proposal: Any term as keys.
  2. How many key-value associations will we store in our map?
    • This question has less to do with syntax and semantics than it has with the choice of the underlying implementation.
    • If we enable the user to add key-value pairs dynamically surely he will use it. Since it is a powerful mechanism not afforded to us by records the usage pattern will also be different. This will in all likelihood produce larger Maps than records are today. This implies that we cannot compare records sizes with that of maps sizes since the usage scenario would be different.
    • Proposal: Arbitrary number of keys-value pairs.
  3. How many Map instances will we create for each Map with a specific set of keys?
    • This question is closely related to how we use records and if Maps should emulate this behavior and this should have no impact on semantics, only implementation.
    • The significant difference is the memory overhead in the storing structure. Since memory overhead for keys and values has the same behavior as in any compound term or abstract data-type, i.e. dict or gb_trees, the main difference occurs when comparing maps to records and frames. To ensure a logarithmic worst-case performance in update or retrieval some sort tree structure would likely be used for maps. Maps would then stores keys together with its values whereas frames stores keys outside its value structure and records generates key indexes at compile-time. This would indicate a memory overhead for Maps over Frames and records for each instance.
    • Proposal: Two tier approach, similar to binaries. Use flat compact, key-sharing approach for few associations (~50 associations). Use sorted tree approach and store keys with values beyond first tier limit. The rationale being it is more likely to have multiple instance where we have few keys.
  4. Only allow updates of already defined keys within a Map in syntax?
    • The question stems from a record replacement approach and the argument for it is to mitigate typos, i.e. trying to update key behavior where key behaviour was actually intended. Instead of getting two different keys, a runtime exception occurs at this point.
    • This approach will deny any dictionary like behavior, for instance storing spawned processes as keys in the map using syntax.
    • Proposal: Allow for any key to be stored by default syntax, existing or not, and use a special syntax for setting of values of existing keys only.

The answers from these questions are instrumental to how we should design and implement Maps. What we also should keep in the back of our minds is that we will never get rid of records completely and some of the frames arguments might thus be superfluous.

Rationale #

What should we expect from our syntax? #

As stated earlier, the current syntax is not set in stone but what should we expect from it?

  1. First and foremost it has to be unambiguous, meaning the syntax must produce single clearly defined behavior that cannot be misinterpreted by humans nor machines.

    1.1 Here we also include the notion that similar syntax should have similar behavior, or at least not completely different behavior. For example records, #record{ key = Value }, have O(1) performance and 2 + N words memory overhead. If we use a similar syntax, i.e. #{ key = Value }, we should also expect similar behavior both in regard to semantics and performance for any and all sizes of the map.

  2. The syntax must be as short as possible. It must not use more characters than necessary to describe a certain behavior as long as it does not violate rule 1.

    2.2 We want to avoid verbosity in the language. Verbosity pushes away information from our field of vision and obfuscates it. This needs to be avoided.

Syntax choice for Maps #

The author argues for: => as delimiter in ‘set-or-update’ and := in ‘set-existing’ and ‘matching’ syntax.

In the examples below we use #{ Key => Value } to describe map semantics and use-cases, but this is only one suggestion out of many.

Several syntax proposals exists, frames proposes <{ key ~ Value }> syntax and another syntax suggestion is very similar to record syntax #{ key = Value }.

The current variable and atom definitions puts restrictions on what we can use as delimiters and leaves us ~ and = as the only sane single character delimiters we can use. The case is very well argued in Richard O’Keefes No more need for records (fifth draft).

Delimiter discussion #

Arguments against a = delimiter are:

  • It lacks distinction from match, consider #{ A = B = v }, does A match B or does B match v? Which = is a match operation and which = delimits the key-value pair?
  • It might be interpreted as Key ‘equal’ Value,

and hence = is in violation of rule #1 from What do we expect from our syntax?. The interpretation of this syntax is ambiguous.

Arguments against a ~ delimiter are:

  • it might be interpreted as Key ‘NOT’ Value as ~ is the bitwise NOT operator in C,
  • it might be interpreted as Key ‘about equal’ Value as ~ is similar to mathematics ,
  • it lacks typographical distinction, i.e. it lacks distinction from - in certain fonts, ex. K ~ V might be interpreted as K - V, consider #{ K - 1 ~ 2 - V },

and hence this is in violation of rule #1 from What do we expect from our syntax?. The interpretation of this syntax is ambiguous.

Two two-character delimiter suggestions are #{ Key := Value } and #{ Key => Value}, where := is a common denominator for assignment and => should be read as maps to. A two-character delimiter should be avoided if at all possible since it increases the syntax footprint of the source code.

The assignment delimiter reads well for just assignment but suffers from the same reversed logic flaw as records when it comes to pattern matching. The match #{ key := Value } = M reads match M to the map pattern where Value is equal to key. That does not read well unless we call the assignment delimiter := for something that its not meant to be.

However, := is also similar to =:= which means “is exactly equal”, i.e. matches. This is a valuable meaning since we have a difference between == and =:= when dealing with numbers and thus := could be a more correct delimiter for matching syntax.

The delimiter -> would be suitable choice if it weren’t for the fact that it would overload the function clause meaning.

Both -> and => might be confusing when dealing with binaries. Consider #{ <<"key">> -> <<"value">> } and #{ <<"key">> => <<"value">> }, where => appears to be slightly more confusing than ->.

Listing of delimiters from above perceived desirability:

  1. #{ K => V } - No ambiguity, no overloading, reads as an association
  2. #{ K := V } - No ambiguity, no overloading, reads as an assignment or exact match
  3. #{ K ~ V } - Typographical ambiguity, no overloading, no clear meaning
  4. #{ K -> V } - Overloads function clause head and body separator, reads as an association
  5. #{ K = V } - Overloads match operator, reads as a match or an assignment

Using := assignment for existing keys seems as a good choice. The choice for set-or-update is between => and ~.

The case for two set-or-update semantics and its syntax #

A case for two different ways to update values in a Map is proposed.

One syntax if, and only if, we want to update a value for an already existing key and another if we want to update the Map with any key.

  • Use M#{ K => V } to declare new key value pairs or update already existing keys
  • Use M#{ K := V } to update already existing keys.
  • Use #{ K := V } = M to match maps.

Example 1:

foo() ->
    M = #{ key1 => 1, key2 => 2 }, % M is declared with keys 'key1' and 'key2'
    bar(M).

bar(M) ->
    M#{
        key1 := "1",  %% 'key1' will be set to "1"
        key2 := "2",  %% 'key2' will be set to "2"
        key3 := "3"   %% this causes an exception since 'key3' does not exist in M
    }.

> foo().
** exception error: no match of 'key3' in map

Example 2:

foo() ->
    M = #{ key1 => 1, key2 => 2 }, % M is declared with keys 'key1' and 'key2'
    bar(M).

bar(M) ->
    M#{
        key1 => "1",  %% 'key1' will be set to "1"
        key2 => "2",  %% 'key2' will be set to "2"
        key3 => "3"   %% 'key3' will be set to "3"
    }.

> foo().
#{ key1 => 1, key2 => "2", key3 => "3" }

Impact of syntax footprint #

We must lessen the syntax footprint impact on the source code and the language.

Currently the two normal ways of sending options to a functions are either via records or property lists. Both have some drawbacks. Records are compile time dependent and syntactic sugar for tuples. Property lists are generic but produces a lot of texts when defining them and operating on them.

Consider this example when parsing a list of arguments:

args(Args) ->
     args(Args, [{analyze, false}, {suppression, false}, {target, none}]).

args(["-r" | Args], Opts) ->
    args(Args, [{analyze, true}     | proplists:delete(analyze, Opts)]);
args(["-s="++File | Args], Opts) ->
    args(Args, [{suppression, File} | proplists:delete(suppression, Opts)]);
args([Target], Opts) ->
    [{target, Target} | proplists:delete(target, Opts)].

The textual impact, the number of characters, is quite heavy when operating on property lists.

If we instead use some kind of map with syntax, how would that look?

args(Args) ->
    args(Args, #{ analyze => false, suppression => false, target => none}).

args(["-r" | Args], Opts)        -> args(Args, Opts#{ analyze := true });
args(["-s="++File | Args], Opts) -> args(Args, Opts#{ suppression := File});
args([Target], Opts)             -> Opts#{ target := Target }.

This looks cleaner in my opinion but that is a very subjective view. To use some data we can count the characters, and we see that the property lists example has 390 characters versus the map examples 306. Property lists uses almost 30% more characters in this example.

Semantics and API-functions #

List conversions #

Perhaps the most sane maps:from_list/1 semantics would be to have the key-value significance order in left to right, meaning the first association is used and the latter values with matching keys are ignored.

This differs from the dict:from_list/1 behavior.

Consider the following dict example:

[{a,2}] = dict:to_list(dict:from_list([{a,1}, {a,2}])).

By letting the leftmost be the most significant key we could simplify conversion from and to lists.

Current suggestion has the following semantics:

Ls = [{a,old}],
#{ a := old } = maps:from_list([{a,new}|Ls]).

The reversal would be:

Ls = [{a,old}],
#{ a := new } = maps:from_list([{a,new}|Ls]).

Equality and Ordering #

A restriction set on the implementation by the Erlang specification is that order is total, i.e. satisfies antisymmetry, transitivity and totality.

  • If M1 =< M2 and M2 =< M1 then M1 == M2,
  • If M1 =< M2 and M2 =< M3 then M1 =< M3,
  • If M1 =< M2 or M2 =< M1 (always comparable) where M1, M2 and M3 are any Map term.

This only holds true in Erlang if we treat floats and integers as union of types, namely numbers. In the case of a Maps, true = #{ 1.0 => V } == #{ 1 => V}.

  • The need for order arises in a few cases.
    • comparison, for example sorting, lists:sort([M1, .., Mn])
    • introspection, for example when printed.
  • Ordered maps impose restrictions on the underlying implementation and a hashing approach will be nearly impossible.
  • The underlying structure does not need to be sorted, an order could be produced when needed,
    • M1 < M2, would result in an internal sort but would cost O( N1 * lg N1 + N2 * lg N2 ), where N1 = maps:size(M1) and N2 = maps:size(M2)

Accessing a single value #

Do we need to have single access or is matching sufficient?

Consider the following,

V = M#{ K }

is shorter than

#{ K := V } = M

It also allows for easy access of associated values in deep structures.

The syntax for single value access is the least developed (and contemplated) feature in this proposal and certainly could use some input.

More over, the dot syntax must be abolished. Currently it is used for records but it will not be used for maps. Dot represents end of expression list in last clause, or end of attribute.

It cannot be used to distinguish between floats or associations.

Example:

1> M = #{ 1.1 => a, 1 => #{ 1 => b } }.
#{ 1 => #{ 1 => b }, 1.1 => a }.

2> #M.1.1.
a | b ?

Backwards Compatibility #

Erlang code written with Maps will only be parseable, loadable and executable on Erlang/OTP R17A and later releases of Erlang/OTP but not on previous releases.

Erlang code written before Erlang/OTP R17A will be perfectly compatible, i.e. parseable, loadable and executable with these Maps changes.

Distribution will not be backwards compatible.

Copyright #

This document has been placed in the public domain.