This module provides functions for string processing.
A string in this module is represented by
unicode:chardata(), that is, a list of codepoints,
binaries with UTF-8-encoded codepoints
(UTF-8 binaries), or a mix of the two.
"abcd" is a valid string
<<"abcd">> is a valid string
["abcd"] is a valid string
<<"abc..åäö"/utf8>> is a valid string
<<"abc..åäö">> is NOT a valid string,
but a binary with Latin-1-encoded codepoints
[<<"abc">>, "..åäö"] is a valid string
[atom] is NOT a valid string
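The examples above can be checked mechanically. The following is a minimal sketch, assuming that a successful conversion with unicode:characters_to_binary/1 is an acceptable test for well-formed unicode:chardata(). The module name chardata_check and the helper is_valid_string/1 are hypothetical and not part of this module. Non-ASCII characters are written as explicit codepoints and bytes so that the result does not depend on the source file encoding.

```erlang
-module(chardata_check).
-export([is_valid_string/1]).

%% Hypothetical helper (not part of the string module): returns true when
%% Term converts cleanly to a UTF-8 binary, false otherwise.
is_valid_string(Term) ->
    try unicode:characters_to_binary(Term) of
        Bin when is_binary(Bin) -> true;   % fully converted to UTF-8
        {error, _, _} -> false;            % invalid codepoint or byte sequence
        {incomplete, _, _} -> false        % truncated UTF-8 sequence at the end
    catch
        error:badarg -> false              % not chardata at all, e.g. [atom]
    end.
```

With this sketch, is_valid_string([<<"abc">>, [46,46,229,228,246]]) is true, while is_valid_string(<<"abc..",229,228,246>>) (the Latin-1 bytes) and is_valid_string([atom]) are both false.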
This module operates on grapheme clusters. A grapheme cluster
is a user-perceived character, which can be represented by several
codepoints.
"å"  [229] or [97, 778]
"e̊" [101, 778]
The string length of "ß↑e̊" is 3, even though it is represented by the
codepoints [223,8593,101,778] or the UTF-8 binary
<<195,159,226,134,145,101,204,138>>.
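To make the difference between grapheme clusters, codepoints, and bytes concrete, the sketch below measures the same four-codepoint string in three ways. The module name grapheme_demo and the function lengths/0 are illustrative only. The functions string:length/1 and string:to_graphemes/1 require Erlang/OTP 20 or later.

```erlang
-module(grapheme_demo).
-export([lengths/0]).

%% Measures one string as grapheme clusters, codepoints, and UTF-8 bytes.
lengths() ->
    Codepoints = [223, 8593, 101, 778],
    Binary = unicode:characters_to_binary(Codepoints),
    #{graphemes_list   => string:length(Codepoints),   % grapheme clusters, list input
      graphemes_binary => string:length(Binary),       % grapheme clusters, binary input
      codepoints       => length(Codepoints),          % plain list length
      bytes            => byte_size(Binary),           % size of the UTF-8 encoding
      clusters         => string:to_graphemes(Codepoints)}.
```

The grapheme count is the same for list and binary input, whereas length/1 and byte_size/1 report the size of the underlying representation.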
Grapheme clusters for codepoints of class prepend
and non-modern (or decomposed) Hangul are not handled by every function,
for performance reasons.
Splitting and appending strings must be done on grapheme cluster
boundaries. There is no verification that the results of appending
strings are valid or normalized.
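A short sketch of why this matters, using only string:slice/3, lists:sublist/2, and unicode:characters_to_nfc_list/1. The module name boundary_demo and the function split_and_append/0 are illustrative only.

```erlang
-module(boundary_demo).
-export([split_and_append/0]).

%% Contrasts grapheme-aware slicing with codepoint slicing, and shows that
%% plain list appending does not normalize its result.
split_and_append() ->
    Str = [101, 778, 98],                        % "e" + combining ring, then "b"
    Appended = [97] ++ [778],                    % "a" followed by a combining ring
    #{by_grapheme  => string:slice(Str, 0, 1),   % first grapheme cluster, kept whole
      by_codepoint => lists:sublist(Str, 1),     % first codepoint only, cluster is cut
      appended     => Appended,                  % result is left in decomposed form
      appended_nfc => unicode:characters_to_nfc_list(Appended)}.
```

Slicing by codepoint separates the combining mark from its base character, and the appended string stays as [97,778] until it is normalized explicitly.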
Most of the functions expect all input to be normalized to one form,
see for example unicode:characters_to_nfc_list/1.
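The effect of mixing normalization forms can be seen by comparing the two representations of "å". The sketch below assumes NFC as the common form. The module name normalize_demo and the function compare/0 are illustrative only.

```erlang
-module(normalize_demo).
-export([compare/0]).

%% Compares a precomposed and a decomposed spelling of the same character.
compare() ->
    Precomposed = [229],       % single codepoint
    Decomposed  = [97, 778],   % "a" followed by a combining ring above
    #{raw_equal => string:equal(Precomposed, Decomposed),             % no normalization
      nfc_equal => string:equal(Precomposed, Decomposed, false, nfc), % normalize first
      nfc_form  => unicode:characters_to_nfc_list(Decomposed)}.
```

Without normalization the two strings compare as different, even though they display as the same character. Passing nfc to string:equal/4, or normalizing the input beforehand, makes them compare equal.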
Language or locale specific handling of input is not considered
in any function.
The functions can crash for non-valid input strings. For example,
the functions expect UTF-8 binaries but not all functions
verify that all binaries are encoded correctly.
Unless otherwise specified, the return value type is the same as
the input type. That is, binary input returns binary output,
list input returns list output, and mixed input can return a
mixed string.
1> string:trim(" sarah ").
"sarah"
2> string:trim(<<" sarah ">>).
<<"sarah">>
3> string:lexemes("foo bar", " ").
["foo","bar"]
4> string:lexemes(<<"foo bar">>, " ").
[<<"foo">>,<<"bar">>]
This module has been reworked in Erlang/OTP 20 to handle
unicode:chardata() and operate on grapheme
clusters. The old
functions that only work on Latin-1 lists as input
are still available but should not be used. They will be
deprecated in a future release.