[erlang-questions] cookbook entry #1 - unicode/UTF-8 strings

Anthony Ramine nox@REDACTED
Thu Oct 20 11:18:03 CEST 2011


On 19 Oct 2011, at 22:14, Michael Uvarov wrote:

> Q: Is it easy to work with a list of code points?
> A: Both yes and no.
> Advantages:
> If you have an algorithm that is based on code-point processing, then
> it will be easy to implement. If you only pass text from point A to
> point B, I suggest keeping the string as a binary. You can also combine
> UTF-8 binaries and lists to create an iolist.
> 
> -- 
> Best regards,
> Uvarov Michael

From what I understand, iolist() has no notion of encoding whatsoever
and doesn't represent code points or characters; an iolist is just a
sequence of bytes.
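
To illustrate, here is a minimal sketch that can be pasted into erl (the
sample values are my own, not from the original question). It shows
iolist_to_binary/1 copying an integer as a single raw byte, whereas
unicode:characters_to_binary/1 treats the same integer as a code point
and encodes it as UTF-8:

```erlang
%% 16#e9 is copied as the single byte 233; the result is not valid UTF-8.
<<97,98,99,233>> = iolist_to_binary([<<"abc">>, [16#e9]]).

%% Here 16#e9 is a code point (U+00E9) and is encoded as the bytes 195,169.
<<97,98,99,195,169>> = unicode:characters_to_binary([<<"abc">>, [16#e9]]).
```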

Even though the typespec documentation says they can contain chars [1],
erl says otherwise:

1> iolist_to_binary([16#10ffff]).
** exception error: bad argument
     in function  iolist_to_binary/1
        called as iolist_to_binary([1114111])
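
As a rough sketch of where the limit lies (assuming a stock erl shell;
the match expressions are only there so that a wrong result would fail
loudly), integers in an iolist have to fit in a byte, while the unicode
module accepts the same code point and encodes it:

```erlang
%% 255 is the largest integer an iolist may contain.
<<255>> = iolist_to_binary([255]).
{'EXIT', {badarg, _}} = (catch iolist_to_binary([256])).

%% The code point from the session above, encoded as UTF-8.
<<244,143,191,191>> = unicode:characters_to_binary([16#10ffff]).
```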

See also how io:format/2's "t" modifier behaves when used with "~s" [2];
iolist() and unicode:charlist() [3] are not the same type.
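
A small sketch of that difference follows. It uses io_lib:format/2
rather than io:format/2, on the assumption that both accept the same
control sequences, so the result can be checked without depending on
the terminal's encoding:

```erlang
%% Without "t", ~s expects an iolist, so a code point above 255 is rejected.
{'EXIT', {badarg, _}} = (catch io_lib:format("~s", [[1024]])).

%% With "t", the argument is treated as Unicode character data.
[1024] = lists:flatten(io_lib:format("~ts", [[1024]])).
```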

This was already discussed on the mailing list a few months ago [4].

[1] http://www.erlang.org/doc/reference_manual/typespec.html
[2] http://www.erlang.org/doc/man/io.html#format-2
[3] http://www.erlang.org/doc/man/unicode.html
[4] http://erlang.org/pipermail/erlang-questions/2011-May/058012.html

-- 
Anthony Ramine / @nokusu
Dev:Extend — http://dev-extend.eu/
So as I pray, “Unlimited Erlang Works”



