Per Gustafsson <pergu(at)it(dot)uu(dot)se>
Final/R12B-0 Proposal is implemented in OTP release R12B-0
Standards Track

EEP 4: New BIFs for bit-level binaries (bit strings) #

Abstract #

This EEP describes the introduction of bit level binaries to the Erlang programming language. They can be constructed and manipulated using the bit syntax and a new set of BIFs which operate on bit-level binaries. These new BIFs are introduced in order to not alter the semantics of existing BIFs which operate on binaries, but instead implement similar operations for bit level binaries using new BIFs.

Definitions #

A bit-level binary in this document called a bit string is a sequence of bits of any length. A binary on the other hand is a sequence of bits where the number of bits is evenly divisible by eight. These definitions implies that any binary is also a bit string.

Manipulating bit strings using the bit syntax #

A bit syntax expression:


Evaluates to a bit string, if the sum of the sizes of all segments in the expression is divisible by eight the result is also a binary. Previously such expression could only evaluate to binaries and a runtime error was raised if this was not the case.

With this extension the expression Bin = <<1:9>> which previously caused a runtime error now creates a 9-bit binary. To be able to use this bit string to build a new bigger bit string we can write:

<<Bin/bitstring, 0:1>>

Note the use of bitstring as the type. This expands to binary-unit:1 where as the binary type would have expanded to binary-unit:8. Since bitstring is a long word to write in a binary pattern there is an alias bits which is used in the rest of this proposal, similarly for binary there is a shorthand bytes.

To match out a bit-level binary we also use the bit string type as in ::

case Bin of
  <<1:1,Rest/bits>> -> Rest;
  <<0:1,_/bits>> -> 0

This allows us to avoid situations were we would have to calculate padding.

Specifications #

bitsize/1::bitstring() -> integer()

Returns the size of a bit string in bits. This BIF is allowed in guards.

list_to_bitstring/1::bitstring_list() -> bitstr()

bitstring_list = cons(char() | bitstring()| bitstring_list(), bitstring() | bitstring_list) | nil()

Concatenates the bit strings and chars in the bitstring_list to create a bit string, each char becomes an 8-bit bit string.

is_bitstring/1::any() -> bool()

Returns true if the argument is a bit string, otherwise it returns false. This BIF is allowed in guards.

bitstring_to_list/1::bitstring() -> [char()|bitstring()]

Turns a bit string into a list of characters and if the number of bits in the bit string is not evenly divisible by eight the last element in the list is a bit string consisting of the last 1-7 bits of the original bit string.

Rationale #

The current definition of binaries makes it complicated to use the bit syntax for decoding when the format is not byte oriented, because the programmer is always forced to pad the binaries that he is using to become a sequence of bytes. Allowing bit-level binaries alleviates this problem.

The new BIFs proposed here are intended to give programmers the same tools to manipulate bit-level binaries as they are used to when manipulating binaries without changing the semantics of already existing BIFs and maintain properties such as if this statement:

is_binary(X) andalso size(X) =:= 0

evaluates to true then that implies that X = <<>>.

Implementation #

The extensions described in this document are either already implemented in R11B-4, but protected by a compiler switched or can be easily implemented.

Backwards Compatibility #

This change will not be entirely backward compatible for example: N=9, <<1:N>> would cause an error in the old system and now it would evaluate to a bit string.

The new BIFs are intended to give the same expressiveness for handling bit-level binaries as we have for ordinary binaries without changing the semantics of the BIFs for binaries such as size/1, binary_to_list/1, list_to_binary/1 etc.. This means that all such BIFs will throw an exception if their arguments contains bit strings.