[erlang-bugs] R16 breaks dots

Richard Carlsson carlsson.richard@REDACTED
Sat Mar 30 23:53:43 CET 2013


On 2013-03-30 10:42, Anthony Ramine wrote:
> I do want to know why dots aren't allowed in atoms anymore
> and would like to see them back too.

As Fred already mentioned, this feature was added as part of the 
"packages" feature and was removed along with it.

> It was pretty useful to be able to write unquoted fully-qualified
> node names in the prompt, e.g. foo@REDACTED

I think that many agree on this, and maybe the OTP team can be convinced 
to take this part back. It should be pretty simple to extract the 
relevant code from the commit that removes packages.

> Furthermore, it feels to me like their removal was a mistake, as
> demonstrated by this:
>
> 1> foo.bar.
> * 1: syntax error before: '.'
> 1> foo. bar.
> foo
> 2> bar.
> bar
>
> What you can see here is that the blanks after a dot are still
> mandatory to properly parse a '.' character as a 'dot' token,
> terminating an expression in the shell (or a form in a module), this
> was mandatory to distinguish dot terminators from dots in atoms.
>
> If dots are really to not be allowed anymore in atoms, the blanks
> should be made optional, to be consistent with the rest of the
> language where blanks are optional before or after a symbol (with the
> notable exception of a match '=' followed by a binary literal
> '<<...>>').

This is not quite how the grammar works. First of all, the 'dot' token 
is identified as a "." followed by whitespace or a comment or EOF, and 
the packages addition did not change that. However, periods that are not 
a dot token or part of any other token are seen as '.' tokens. For example:

1> erl_scan:string("foo.bar. ").
{ok,[{atom,1,foo},{'.',1},{atom,1,bar},{dot,1}],1}
2> erl_scan:string("foo. bar. ").
{ok,[{atom,1,foo},{dot,1},{atom,1,bar},{dot,1}],1}

Now, the Erlang parser works on complete "forms" at a time - these are 
the token sequences that are terminated by dot tokens. In the first 
case, you have one form containing three tokens. In the second case, you 
have two forms containing one token each. Blanks cannot be made optional 
after periods, because you must be able to distinguish between token 
sequences like these.
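
To make the split into forms concrete, here is a minimal sketch that cuts 
the scanner output at each dot token. The module name forms_sketch and the 
function forms/1 are made up for illustration; this is not how the shell 
or the compiler does it internally, only the same idea in a few lines.

```erlang
-module(forms_sketch).
-export([forms/1]).

%% Scan String and split the resulting token list into "forms",
%% where each form ends with (and includes) a dot token.
forms(String) ->
    {ok, Tokens, _EndLine} = erl_scan:string(String),
    split(Tokens, [], []).

%% Cur holds the tokens of the form being collected, in reverse order.
split([], [], Forms) ->
    lists:reverse(Forms);
split([], Cur, Forms) ->
    %% Trailing tokens without a terminating dot: keep them as a last,
    %% incomplete form.
    lists:reverse([lists:reverse(Cur) | Forms]);
split([{dot, _} = Dot | Rest], Cur, Forms) ->
    split(Rest, [], [lists:reverse([Dot | Cur]) | Forms]);
split([Tok | Rest], Cur, Forms) ->
    split(Rest, [Tok | Cur], Forms).
```

With this, forms_sketch:forms("foo.bar. ") yields a single form whose 
tokens are foo, '.', bar and the dot, while forms_sketch:forms("foo. bar. ") 
yields two forms. If the scanner treated every period as a terminator, the 
two inputs could no longer be told apart.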

It's also the case that you can't just change the scanning of atoms to 
allow periods as part of the atom token - in that case, the scanner 
would report a single atom for "foo.bar" instead of three tokens 'foo' 
'.' 'bar', and then the grammar would not be able to identify phrases 
like "Rec#foo.bar" or "#foo.bar". To support dotted atoms, the packages 
added a grammar rule that allowed a sequence <atom> '.' ... <atom> to be 
merged into a single atom unless it was part of another rule such as '#' 
<atom> '.' <atom>. (I think that Haskell had to do some similar tricks 
with their grammar to allow dotted names.) This could easily be put back 
in there. But at no point has it been the case in Erlang that unquoted 
atom tokens could contain periods.
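
As a rough illustration of the merging idea, the sketch below does it as a 
pass over the token list instead of as a grammar rule. That is a 
simplification on my part: the packages code did this in the parser 
grammar, and the module name dotted_sketch and the function merge/1 are 
hypothetical. It only covers the '#' <atom> '.' <atom> exception mentioned 
above and ignores any other context the real grammar would consider.

```erlang
-module(dotted_sketch).
-export([merge/1]).

%% Merge runs of <atom> '.' <atom> into a single dotted atom token,
%% except directly after '#', where '.' is the record field separator.
merge([{'#', _} = Hash, {atom, _, _} = Rec, {'.', _} = Sep,
       {atom, _, _} = Field | Rest]) ->
    %% Record access such as Rec#foo.bar or #foo.bar: leave untouched.
    [Hash, Rec, Sep, Field | merge(Rest)];
merge([{atom, Anno, A}, {'.', _}, {atom, _, B} | Rest]) ->
    Merged = list_to_atom(atom_to_list(A) ++ "." ++ atom_to_list(B)),
    %% Feed the merged atom back in so that foo.bar.baz collapses fully.
    merge([{atom, Anno, Merged} | Rest]);
merge([Tok | Rest]) ->
    [Tok | merge(Rest)];
merge([]) ->
    [].
```

Applied to the tokens of "foo.bar.baz. ", this gives one atom token 
'foo.bar.baz' followed by the dot, while the tokens of "Rec#foo.bar. " 
come back unchanged. Note that the scanner itself is not touched: it still 
produces separate atom and '.' tokens in both cases.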

     /Richard


