The Erlang Diagnostic Index is a standardized way to catalogue diagnostic messages emitted by various tools and applications within the Erlang ecosystem, including - but not limited to - the `erlc` Erlang compiler, the `dialyzer` type checker and the `ssl` application.
The indexes are not limited to Erlang/OTP, but can also be used by third parties such as the Elixir language, the eqWAlizer type checker or the Elvis code style reviewer.
Each diagnostic in an index is identified by a unique code and it is accompanied by a description, examples and possible courses of action. Diagnostic codes are namespaced based on the tool that generates them.
The namespaces for diagnostic indexes are not maintained centrally, but left up to the community to co-ordinate. It is recommended that one searches online before taking a namespace, just as when deciding on an application name.
Diagnostic codes can be leveraged by IDEs and language servers to provide better contextual information about errors and warnings and make them easier to search and reference. A standardized diagnostic index creates a common way for the community to provide extra examples and documentation.
The concept of an “Error Index” for a programming language is not a novel idea. Error catalogues already exist, for example, in the Rust and Haskell communities.
Producing meaningful error messages can sometimes be challenging for developer tools such as compilers and type checkers due to various constraints, including limited context and character count.
By associating a unique code with each diagnostic we relieve tools from having to condense a lot of textual information into a generic, sometimes cryptic, single sentence. Furthermore, as the specific wording of errors and warnings is improved over time, diagnostic codes remain constant, providing a search-engine friendly way to index and reference diagnostics.
A good example of this is the expression updates a literal warning message, introduced in Erlang/OTP 27. Given the following code:
-define(DEFAULT, #{timeout => 5000}).
updated(Value) ->
    ?DEFAULT#{timeout => Value}.
The compiler emits the following warning:
test.erl:8:11: Warning: expression updates a literal
% 8| ?DEFAULT#{timeout => Value}.
% | ^
The meaning of the warning may not be obvious to everyone. Most importantly, the compiler provides no information on why the warning is raised and what a user could do about it. The user will then have to refer to a search engine, a forum or equivalent to proceed.
Conversely, we can associate a unique identifier to the code (say, `ERL-1234`):
test.erl:8:11: Warning: expression updates a literal (ERL-1234)
% 8| ?DEFAULT#{timeout => Value}.
% | ^
The code makes it possible to link the warning message to an external resource (e.g. a wiki page), which contains all the required additional information about the error that would not be practical to present directly to the user. Here is an example of what the entry could look like for the above code:
Unique diagnostic codes also have the advantage of being easier to search for in forums and chats, where the exact message could vary, but the diagnostic code would be the same.
Finally, IDEs (e.g. via language servers) can match on diagnostic codes to provide contextual help. Both the Erlang LS and ELP language servers already use “unofficial” error codes.
A diagnostic code should be composed of two parts: an alphanumeric namespace (three or more letters) and a numeric identifier (four or more digits), separated by a dash (`-`).
A potential set of namespaces could look like the following:
Namespace | Description |
---|---|
ERL | The Erlang compiler and related tools (linter, parser, scanner) |
DIA | The Dialyzer type-checker |
ELV | The Elvis code-style reviewer |
ELP | The Erlang Language Platform |
… | … |
A set of potential diagnostic codes could look like:
ERL-0123
DIA-0009
ELV-0015
ELP-0001
The exact number of characters/digits for each namespace and code is up to the tool. There can also be multiple namespaces within the same tool, for example the parser, scanner and linter of the Erlang compiler could have separate namespaces.
A diagnostic code must not be re-used. If a tool stops emitting a diagnostic code, the deprecated code is still documented in the index, together with a deprecation notice. This is to avoid re-using a single code for multiple purposes.
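The code format described above can be checked mechanically. The following is a minimal sketch, not part of the proposal: the module name `diag_code` is hypothetical, and the pattern assumes an upper-case namespace, since the exact character set is left to each tool.

```erlang
-module(diag_code).
-export([parse/1]).

%% Assumed pattern: an upper-case alphanumeric namespace of three or more
%% characters, a dash, then four or more digits.
-define(PATTERN, "^([A-Z][A-Z0-9]{2,})-([0-9]{4,})$").

%% Splits a diagnostic code into its namespace and numeric identifier.
%% Returns error when the string does not follow the assumed pattern.
-spec parse(string()) -> {ok, {string(), string()}} | error.
parse(Code) ->
    Options = [{capture, all_but_first, list}, dollar_endonly],
    case re:run(Code, ?PATTERN, Options) of
        {match, [Namespace, Number]} -> {ok, {Namespace, Number}};
        nomatch -> error
    end.
```

A tool with a different namespace convention would only need to adjust the pattern.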
A diagnostic index will be associated with an Erlang application. The index will be placed in the `doc/diagnostics/` folder and the files should follow this format:
$NAMESPACE-$CODE(-$ALIAS)?.$EXTENSION
where:
- `$NAMESPACE` - The namespace of the diagnostic, for example `ERL`. The namespace must not contain any `-`.
- `$CODE` - The number of the diagnostic, for example `0001`. The code should consist only of digits and be at least 4 digits long.
- `$ALIAS` - An optional human-readable short-hand for the diagnostic, for example `update-literal`.
- `$EXTENSION` - Any file extension, though only `.md` can be rendered nicely by `erl`.

To be able to fetch the detailed diagnostic information easily, some new APIs will be introduced: the `application:get_diagnostic/1,2` function, a `documentation_url` application key and the `-explain` CLI argument.
application:get_diagnostic/1,2
-doc """
Fetches the data associated with the `DiagnosticCode`.
This function will search the `doc/diagnostics/` folders of all applications
in the code path looking for files with the [`rootname`](`filename:rootname/1`) of
the `DiagnosticCode` in either short or long form.
It will return all occurrences together with which application defined it,
the absolute filename of the file, the long and short diagnostic code
and the diagnostic file's contents.
Example:
```
> application:get_diagnostic("ERL-0001").
{ok, [#{application => compiler,
        filename => "/home/erlang/lib/compiler/doc/diagnostics/ERL-0001-update-literal.md",
        url => "https://erlang.org/doc/ERL-0001-update-literal.html",
        short => "ERL-0001",
        long => "ERL-0001-update-literal",
        diagnostic => ~"# ERL-0001\n..."}]}
```
If no application defines the diagnostic code, then `{ok,[]}` is returned.
""".
-spec get_diagnostic(DiagnosticCode :: unicode:chardata()) ->
          {ok, [#{application := atom(), filename := file:filename(),
                  url => uri_string:uri_string(),
                  short := string(), long := string(),
                  diagnostic := unicode:unicode_binary()}]}.
-doc """
Equivalent to `get_diagnostic/1`, but only searches a specific application for
the `DiagnosticCode`. Returns `error` if the code is not found.
""".
-spec get_diagnostic(Application :: atom(), DiagnosticCode :: unicode:chardata()) ->
          {ok, #{application := atom(), filename := file:filename(),
                 url => uri_string:uri_string(),
                 short := string(), long := string(),
                 diagnostic := unicode:unicode_binary()}} | error.
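To illustrate how the file-name format relates to the short and long forms returned by the lookup, here is a rough sketch. It is not the proposed implementation: the module `diag_lookup` and both function names are hypothetical, and it searches a single directory rather than the whole code path.

```erlang
-module(diag_lookup).
-export([parse_filename/1, find_in_dir/2]).

%% Splits a name such as "ERL-0001-update-literal.md" into its parts.
%% Returns error when the name does not follow the proposed format.
-spec parse_filename(string()) ->
          {ok, #{short := string(), long := string(),
                 extension := string()}} | error.
parse_filename(Filename) ->
    Long = filename:rootname(filename:basename(Filename)),
    %% Group 1 is $NAMESPACE-$CODE; the optional tail is -$ALIAS.
    case re:run(Long, "^([^-]+-[0-9]{4,})(-.+)?$",
                [{capture, [1], list}, dollar_endonly]) of
        {match, [Short]} ->
            {ok, #{short => Short, long => Long,
                   extension => filename:extension(Filename)}};
        nomatch ->
            error
    end.

%% Returns the entries in Dir whose short or long form equals Code.
-spec find_in_dir(file:filename(), string()) -> [map()].
find_in_dir(Dir, Code) ->
    [Info#{filename => filename:join(Dir, File)}
     || File <- filelib:wildcard("*", Dir),
        {ok, Info = #{short := Short, long := Long}} <- [parse_filename(File)],
        Short =:= Code orelse Long =:= Code].
```

A full implementation would presumably repeat this search for the `doc/diagnostics/` folder of every application in the code path and read each matching file's contents.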
documentation_url application key
The `documentation_url` will be part of the `.app` file and allow the application to specify the base address to find its documentation. This is used to create the URL for a specific diagnostic index, but can also be used by ExDoc to point to non-hexdocs.pm documentation, see elixir-lang/ex_doc#1975 for details.
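As an illustration only, an application resource file carrying the new key might look like the fragment below. The application name `my_app`, the URL and the choice of a string value are all assumptions; the EEP text does not fix the value's type.

```erlang
%% my_app.app (hypothetical application; only documentation_url is new)
{application, my_app,
 [{description, "Example application"},
  {vsn, "1.0.0"},
  {modules, []},
  {registered, []},
  {applications, [kernel, stdlib]},
  %% Assumed form: base address of the rendered documentation.
  {documentation_url, "https://example.org/doc/my_app"}]}.
```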
The command line tools `erl`, `erlc` and `dialyzer` will each gain an `-explain` (or `--explain`) option that prints the data returned by `application:get_diagnostic/1`. We will use `shell_docs` to format any markdown documents, while any other document type will be printed verbatim.
As different tools have different conventions regarding how to pass arguments, it is recommended that each tool follows its own conventions for how `explain` is prefixed. For example, `rebar3` could add a new command `rebar3 explain` that would print the explanation.
ExDoc will be extended to generate a new sidebar pane containing each diagnostic in an application together with an index page. Redirects will be generated for both the namespace-code and namespace-short-hand. That is all three of these will work:
The extension used by the diagnostic code files will be `.diagnosticmd`, following the pattern of `.livemd` and `.cheatmd`.
We should make it easy to print and fetch the diagnostic codes from rebar3, so we should add `rebar3 explain` and also make sure that `compile:file` can be customized so that the error/warning messages printed by it can have:
% help: call `rebar3 explain ERL-0001` to see a detailed explanation
in their output.
Any layout is allowed for a diagnostic index; below is a recommended style, intended to give a similar look and feel throughout the community.
# XYZ-ABCD - Short Title
## Example
```
Short example producing the error/warning/info
```
## Explanation
Longer text explaining the error/warning/info together with potential remedies,
more examples and references to the documentation.
For example:
# ERL-0001 - Function head mismatch
## Example
```erlang
%% foo.erl
-module(foo).
-export([foo/1]).
foo(0) -> 1;
boo(1) -> 2.
```
```bash
$ erlc foo.erl
foo.erl:5:1: head mismatch: previous function foo/1 is distinct from boo/1. [ERL-0001]
% 5| boo(1) -> 2.
% | ^
% help: call `erlc -explain ERL-0001` to see a detailed explanation
% help: Should the semicolon after foo/1 be replaced by a period?
```
## Explanation
The error message indicates that two function clauses belonging to the same function
differ in their name or in the number of arguments.
In Erlang, functions are uniquely identified by the module they belong to, the
function name and the number of arguments they take (known as *arity*).
Each function can be composed of multiple *clauses*, separated by a semicolon (`;`).
Therefore, all clauses belonging to the same function have to share the same name.
To fix the error you need to ensure that every function clause has the same name
and that it takes the same number of arguments.
In the above example, `boo/1` could be a second clause for the `foo/1` function,
containing a typo. In that case, the corrective action would be to fix the typo:
```erlang
foo(0) -> 1;
foo(1) -> 2.
```
It could also be that `boo/1` is intended to be a completely different function.
In that case the error can be fixed by replacing the semicolon on the previous
line with a period. Leaving an empty line between the two functions would also
be a good idea, to help the reader understand that `foo/1` and `boo/1` are two
distinct functions:
```erlang
foo(0) -> 1.
boo(1) -> 2.
```
For more information about Erlang functions please refer to the
[Reference Manual](`e:system:ref_man_functions`).
When printing diagnostics, it is recommended that the short description is followed by the diagnostic code, and that a help text explaining how to get more information is printed after it. If no diagnostic code is available for the specific diagnostic, then the help text is not displayed. For example:
> erlc t.erl
t.erl:5:5: Warning: variable 'A' is unused [ERL-1001]
% 5| foo(A) -> ok.
% | ^
% note: `+warn_unused_vars` on by default
% help: call `erlc -explain ERL-1001` to see a detailed explanation
% help: rename the variable to '_A' to avoid this warning
This mimics how rustc prints diagnostics. Where possible the `[ERL-1001]` should be printed with an HTTP link to the docs (in the terminal using the `\e]8` ANSI escape code).
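The terminal hyperlink mentioned above can be produced with the OSC 8 escape sequence. The sketch below shows one way to wrap a diagnostic code in such a link; the module name `diag_link` and the URL in the usage line are made up for illustration.

```erlang
-module(diag_link).
-export([hyperlink/2]).

%% Wraps Text in an OSC 8 hyperlink pointing at Url.
%% Sequence: ESC ] 8 ; ; Url ESC \ Text ESC ] 8 ; ; ESC \
-spec hyperlink(string(), string()) -> string().
hyperlink(Url, Text) ->
    "\e]8;;" ++ Url ++ "\e\\" ++ Text ++ "\e]8;;\e\\".
```

A tool could then print the code with `io:format("~ts~n", [diag_link:hyperlink("https://example.org/ERL-1001.html", "[ERL-1001]")])`. Whether the link is clickable depends on the terminal emulator, so a tool may want to emit it only when writing to a terminal.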
To make things easier for language servers and IDEs, tools should be able to emit diagnostics (errors and warnings) in a standardized, parsable format. This should be enabled by specifying an extra option (for example `--error-format json`).
A possible JSON format, heavily inspired by the LSP protocol, is:
{
  "uri": "file:///git/erlang/project/app/src/file.erl",
  "range": {
    "start": {
      "line": 5,
      "character": 23
    },
    "end": {
      "line": 5,
      "character": 32
    }
  },
  "severity": "warning",
  "code": "DIA-1234",
  "doc_uri": "https://erlang.org/doc/apps/dialyzer/DIA-1234.html",
  "source": "dialyzer",
  "message": "This is a descriptive error message from Dialyzer"
}
Where severity is one of `error`, `warning`, `information`, `hint`.
The standard library will be extended to help tools generate a standard look and feel for warnings and the machine-readable JSON output. The exact API in the standard library is not part of this EEP.
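Until such an API exists, a tool could produce the JSON format above with the `json` module that ships with Erlang/OTP 27. The sketch below is one possible approach under that assumption; the module `diag_json` and its functions are hypothetical, and the values are copied from the example.

```erlang
-module(diag_json).
-export([example/0, encode/1]).

%% The diagnostic from the JSON example, as an Erlang map.
%% 'end' must be quoted because end is a reserved word.
-spec example() -> map().
example() ->
    #{uri => <<"file:///git/erlang/project/app/src/file.erl">>,
      range => #{start => #{line => 5, character => 23},
                 'end' => #{line => 5, character => 32}},
      severity => <<"warning">>,
      code => <<"DIA-1234">>,
      doc_uri => <<"https://erlang.org/doc/apps/dialyzer/DIA-1234.html">>,
      source => <<"dialyzer">>,
      message => <<"This is a descriptive error message from Dialyzer">>}.

%% Encodes a diagnostic map as a JSON binary (requires OTP 27 or later).
-spec encode(map()) -> binary().
encode(Diagnostic) ->
    iolist_to_binary(json:encode(Diagnostic)).
```

Calling `diag_json:encode(diag_json:example())` yields a single-line JSON object; the order of its keys is not guaranteed.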
Originally this EEP proposed a centralized catalog to store the namespace for the community. This would be better for discoverability and would eliminate the possibility of getting namespace clashes. However, we already have the issue of possible application + module name clashes and that works relatively well without any co-ordination. So to keep things simple, there will be no central catalog. We could possibly scrape hexdocs for applications with indexes and create a page with all indexes if there was a need.
Should this functionality be called an “Error Index” even if it includes things that are not errors? Or should it use a more general name, that is “Diagnostic Index”?
rustc and Haskell seem to call theirs error code indexes. rustc has limited the index to only include errors; for warnings it instead prints the name of the warning option that triggered the warning. In Haskell they seem to have errors, warnings and information items in the “Error Index”.
In this EEP I propose we use “diagnostic” index to include errors, warnings and info, but as diagnostic is a lot longer than “error” and maybe not as obvious a name, we may want to change this to error instead.
Another aspect of this discussion is to look a bit closer at what rustc/gcc have decided to do with warnings/info. They print the flag that enables the warning, not an error code. For example:
> erlc t.erl
t.erl:5:11: variable 'B' is unbound [ERL-0001]
% 5| foo(A) -> B.
% | ^
% help: call `erlc -explain ERL-0001` to see a detailed explanation
t.erl:5:5: Warning: variable 'A' is unused [+warn_unused_vars]
% 5| foo(A) -> B.
% | ^
The two approaches do not exclude each other, however; for warnings we could print both:
t.erl:5:5: Warning: variable 'A' is unused [ERL-0002]
% 5| foo(A) -> B.
% | ^
% note: `+warn_unused_vars` on by default
% help: call `erlc -explain ERL-0002` to see a detailed explanation
We could implement the error index system only for Erlang/OTP; this is what Rust and Haskell do. It would make things simpler as we would not have to care about backward compatibility and could change things at any time.
However, there has been some interest from third-party projects in creating their own diagnostic codes (the first draft of this EEP was created by one such project), so it seems that this area can benefit from standardization.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.