Tom Davies <todavies5(at)gmail(dot)com>
Standards Track

EEP 63: Lightweight UTF-8 binary string literals and patterns

Abstract

This EEP proposes new syntax for UTF-8 binary string literals and patterns to bring them into line with list-strings.

List-strings (i.e. strings represented as lists of Unicode code points) have a convenient syntax: "This is my string". The corresponding syntax for UTF-8 encoded binary strings is more cumbersome: <<"This is my string"/utf8>>.
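To illustrate the two representations with the syntax that exists today, here is a minimal sketch. The module name utf8_demo and the function run/0 are assumptions made for this example only. The character U+00E5 is written as an escape so the result does not depend on the source file encoding.

```erlang
-module(utf8_demo).
-export([run/0]).

%% Compares the list-string and UTF-8 binary forms of the
%% one-character string U+00E5 using today's syntax.
run() ->
    List = "\x{E5}",
    %% A list-string holds one integer per code point.
    [16#E5] = List,
    %% With /utf8 the code point is encoded as two bytes.
    <<16#C3, 16#A5>> = <<"\x{E5}"/utf8>>,
    %% Without /utf8 each character becomes a single byte,
    %% which here is not valid UTF-8.
    <<16#E5>> = <<"\x{E5}">>,
    %% Converting the list-string yields the same UTF-8 bytes.
    <<16#C3, 16#A5>> = unicode:characters_to_binary(List),
    ok.
```

Calling utf8_demo:run() returns ok if every match succeeds. The third match shows why the /utf8 type specifier cannot simply be dropped from the existing binary syntax.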

Here, we propose a lightweight, alternative syntax for UTF-8 binary string literals:

Str = b"This is my string".

and patterns:

case Str of
  b"This is my string" -> ok;
  _ -> error
end.

Implementation outline

Early during compilation and shell evaluation, the new syntax would be desugared to the corresponding existing syntax, e.g. b"This is my string" is rewritten to <<"This is my string"/utf8>>.
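The rewriting step could be sketched on Erlang abstract forms roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: the node {bstring, Anno, Chars} is a hypothetical stand-in for whatever the parser would produce for b"...", and the module name bstring_desugar is likewise invented. Only the target form, which is what erl_parse produces today for <<"..."/utf8>>, is existing behaviour.

```erlang
-module(bstring_desugar).
-export([desugar/1, check/0]).

%% Rewrite the hypothetical {bstring, Anno, Chars} node into the
%% abstract form of the existing syntax <<"Chars"/utf8>>.
%% Any other node is returned unchanged.
desugar({bstring, Anno, Chars}) ->
    {bin, Anno,
     [{bin_element, Anno, {string, Anno, Chars}, default, [utf8]}]};
desugar(Other) ->
    Other.

%% Parse the existing syntax, compare it with the desugared node,
%% then evaluate the desugared node to confirm it yields a binary.
check() ->
    Src = "<<\"This is my string\"/utf8>>.",
    {ok, Tokens, _} = erl_scan:string(Src),
    {ok, [Expected]} = erl_parse:parse_exprs(Tokens),
    %% Annotation 1 is the default line used by erl_scan:string/1.
    Desugared = desugar({bstring, 1, "This is my string"}),
    Expected = Desugared,
    {value, Bin, _} = erl_eval:exprs([Desugared],
                                     erl_eval:new_bindings()),
    Bin.
```

Under these assumptions, bstring_desugar:check() returns the binary <<"This is my string">>. Because the same bin form is used for both expressions and patterns in abstract code, one rewrite rule of this shape would cover literals and patterns alike.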

Reference Implementation


Backward compatibility

The new syntax is invalid in older releases, so it would not affect existing code.

The new syntax would be implemented purely as an early rewriting step that desugars it to the existing representation before the later compiler stages run. Bytecode would therefore be unaffected, but debugging and AST data would reflect the new source representation.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.