Author:: Roberto Aloi <prof3ta(at)gmail(dot)com> , Lukas Backström <lukas(at)erlang(dot)org>
Status:: Accepted
Type:: Standards Track
Created:: 11-Nov-2024
Erlang-Version:: OTP-28
Post-History:: Updated April 2025 to use non-centralized index

EEP 74: Erlang Diagnostic Index #

Abstract #

The Erlang Diagnostic Index is a standardized way to catalogue diagnostic messages emitted by various tools and applications within the Erlang ecosystem, including - but not limited to - the erlc Erlang compiler, the dialyzer type checker and the ssl application.

The indexes are not limited to Erlang/OTP, but can also be used by third-parties such as the Elixir language, the EqWAlizer type-checker or the Elvis code style reviewer.

Each diagnostic in an index is identified by a unique code and it is accompanied by a description, examples and possible courses of action. Diagnostic codes are namespaced based on the tool that generates them.

The namespaces for diagnostic indexes are not maintained centrally, but left up to the community to co-ordinate. It is recommended that one searches online before taking a namespace, just as when deciding on an application name.

Diagnostic codes can be leveraged by IDEs and language servers to provide better contextual information about errors and warnings and make them easier to search and reference. A standardized diagnostic index creates a common way for the community to provide extra examples and documentation.

Rationale #

The concept of an “Error Index” for a programming language is not a novel idea. Error catalogues already exist, for example, in the Rust and Haskell Communities.

Producing meaningful error messages can sometimes be challenging for developer tools such as compilers and type checkers due to various constraints, including limited context and character count.

By associating a unique code to each diagnostic we relieve tools from having to condense a lot of textual information into a - sometime cryptic - generic, single sentence. Furthermore, as specific wording of errors and warnings is improved over time, diagnostic codes remain constant, providing a search-engine friendly way to index and reference diagnostics.

A good example of this is the expression updates a literal warning message, introduced in Erlang/OTP 27. Given the following code:

-define(DEFAULT, #{timeout => 5000}).

updated(Value) ->
  ?DEFAULT#{timeout => Value}.

The compiler emits the following warning:

test.erl:8:11: Warning: expression updates a literal
    %    8|   ?DEFAULT#{timeout => 1000}.
    %     |           ^

The meaning of the warning may not be obvious to everyone. Most importantly, the compiler provide no information on why the warning is raised and what a user could do about it. The user will then have to refer to a search engine, a forum or equivalent to proceed.

Conversely, we can associate a unique identifier to the code (say, ERL-1234):

test.erl:8:11: Warning: expression updates a literal (ERL-1234)
    %    8|   ?DEFAULT#{timeout => 1000}.
    %     |           ^

The code makes it possible to link the warning message to an external resource (e.g. a wiki page), which contains all the required, additional, information about the error that would not be practical to present directly to the user. Here is an example of what the entry could look like for the above code:

Erlang Error Index Sample Entry

Unique diagnostic codes also have the advantage to be better searchable in forums and chats, where the exact message could vary, but the diagnostic code would be the same.

Finally, diagnostic codes can be used by IDEs (e.g. via language servers) to match on diagnostic codes and provide contextual help. Both the ErlangLS and the ELP language server already use “unofficial” error codes.

Implementation #

Diagnostic Codes #

A diagnostic code should be composed by two parts: an alphanumeric namespace (three or more letters) and a numeric identifier (four or more digits), divided by a dash (-).

A potential set of namespaces could look like the following:

Namespace	Description
ERL	The Erlang compiler and related tools (linter, parser, scanner)
DIA	The Dialyzer type-checker
ELV	The Elvis code-style reviewer
ELP	The Erlang Language Platform
…	…

A set of potential diagnostic codes could look like:

ERL-0123
DIA-0009
ELV-0015
ELP-0001

The exact number of characters/digits for each namespace and code is up to the tool. There can also be multiple namespaces within the same tool, for example the parser, scanner and linter of the Erlang compiler could have separate namespaces.

A diagnostic code must not be re-used. If a tool stops emitting a diagnostic code, the deprecated code is still documented in the index, together with a deprecation notice. This is to avoid re-using a single code for multiple purposes.

Location of Diagnostic Index #

A diagnostic index will be associated with an Erlang application. The index will be placed in the doc/diagnostics/ folder and the files should follow this format:

$NAMESPACE-$CODE(-$ALIAS)?.$EXTENSION

where:

$NAMESPACE - The namespace of the diagnostic, for example ERL. The NAMESPACE must not contains any -.
$CODE - The number of the diagnostic, for example 0001. The code should only be digits and be at least 4 digits long.
$ALIAS - an optional human-readable short-hand for the diagnostic, for example update-literal.
$EXTENSION - Any file extension, though only .md can be rendered nicely by erl.

Additions to Erlang/OTP #

To be able to fetch the detailed diagnostic information easily, some new APIs will be introduced; the application:get_diagnostic/1,2 function, a documentation_url application key and the -explain CLI argument.

`application:get_diagnostic/1,2` #

-doc """
Fetches the data associated with the `DiagnosticCode`.

This function will search the `doc/diagnostics/` folders of all applications
in the code path looking for files with the [`rootname`](`file:rootname/1`) of
the `DiagnosticCode` in ether short or long form.

It will return all occurrences together with which application defined it,
the absolute filename of the file, the long and short diagnostic code
and the diagnostic file's contents.

Example:

```
> application:get_diagnostic("ERL-0001").
{ok, [#{application => compiler,
```erlang
    filename => "/home/erlang/lib/compiler/doc/diagnostics/ERL-0001-update-literal.md",
    url => "https://erlang.org/doc/ERL-0001-update-literal.html",
    short => "ERL-0001",
    long => "ERL-0001-update-literal",
    doc => ~"# ERL-0001\n..."}]}
```
```

If no application defines the diagnostic code, then `{ok,[]}` is returned.

""".
-spec get_diagnostic(DiagnosticCode :: unicode:chardata()) ->
  {ok, [#{application := atom(), filename := filename:name(),
          url => uri_string:uri_string(),
          short := string(), long := string(),
          diagnostic := unicode:binary()}]}.

-doc """
Equivalent to `get_diagnostic/1`, but only searches a specific application for
the `DiagnosticCode`. Returns `error` if the code is not found.
""".
-spec get_diagnostic(Application :: atom(), DiagnosticCode :: unicode:chardata()) ->
  {ok, #{application := atom(), filename := filename:name(),
         url => uri_string:uri_string(),
         short := string(), long := string(),
         diagnostic := unicode:binary()}} | error.

`documentation_url` application key #

The documentation_url will be part of the .app file and allow the application to specify the base address to find its documentation. This is used to create the url for a specific diagnostic index, but can also be used by ExDoc to point to non-hexdoc.pm documentation, see elixir-lang/ex_doc#1975 for details.

-explain or –explain #

The command line tools erl, erlc and dialyzer will have options added called -explain (or --explain) that can be used to print the data gotten from application:get_diagnostic/1. We will use shell_doc to format any markdown documents, while any other document type will be printed verbatim.

As different tools have different conventions regarding how to pass arguments, it is recommended to follow the specific tools design in what the prefix explain with. For example rebar3 could add a new command rebar3 explain that would print the explanation.

Additions to ExDoc #

ExDoc will be extended to generate a new sidebar pane containing each diagnostic in an application together with an index page. Redirects will be generated for both the namespace-code and namespace-short-hand. That is all three of these will work:

hexdocs.pm/elvis/ELV-0001.html
hexdocs.pm/elvis/ELV-dancing-elvis.html
hexdocs.pm/elvis/ELV-0001-dancing-elvis.html

The extension used by the diagnostic code files will be .diagnosticmd, following the pattern of .livemd and .cheatmd.

Additions to rebar3 #

We should make it easy to print and fetch the diagnostic codes from rebar3, so we should add rebar3 explain and also make sure that compile:file can by customized so that the error/warning messages printed by it can have:

%  help: call `rebar3 explain ERLC-0001` to see a detailed explanation

in their output.

Recommended style of diagnostic code file #

Any layout is allowed for a diagnostic index, below is a recommended style in order to get a similar look and feel throughout the community.

# XYZ-ABCD - Short Title

## Example

```
Short example producing the error/warning/info
```

## Explanation

Longer text explaining the error/warning/info together with potential remedies,
more examples and references to the documentation.

For example:

# ERL-0001 - Function head mismatch

## Example

```erlang
%% foo.erl
-module(foo).
-export([foo/1]).
foo(0) -> 1;
boo(1) -> 2.
```

```bash
$ erlc foo.erl
foo.erl:5:1: head mismatch: previous function foo/1 is distinct from boo/1. [ERL-0001]
%    5| boo(1) -> 2.
%     | ^
%  help: call `erlc -explain ERL-0001` to see a detailed explanation
%  help: Should the semicolon after foo/1 be replaced by a period?
```

## Explanation

The error message indicates that two function clauses belonging the same function
differ in their name or in the number of arguments.

In Erlang functions are uniquely identified by the module they belong to, the
function name and the number of argument they take (known as *arity*).
Each function can be composed by multiple *clauses*, separated by a semicolon (`;`).
Therefore, all clauses belonging to the same function have to share the same name.

To fix the error you need to ensure that every function clause has the same name
and that it takes the same number of arguments.

In the above example, `boo/1` could be a second clause for the `foo/1` function,
containing a typo. In that case, the corrective action would be to fix the typo:

```erlang
foo(0) -> 1;
foo(1) -> 2.
```

It could also be that `boo/1` is intended to be a completely different function.
In that case the error can be fixed by replacing the semicolon on the previous
line with a period. Leaving an empty line between the two functions would also
be a good idea, to help the reader understanding `foo/1` and `boo/1` are two
distinct functions:

```erlang
foo(0) -> 1.
boo(1) -> 2.
```

For more information about Erlang functions please refer to the
[Reference Manual](`e:system:ref_man_functions`).

Recommended output from tools using diagnostic codes #

When printing diagnostics, it is recommended that the short description is followed by the diagnostic code and a help text is printed explaining how to get more information is printed after. If no diagnostic code is available for the specific diagnostic, then the help text is not displayed. For example:

> erlc t.erl
t.erl:5:5: Warning: variable 'A' is unused [ERL-1001]
%    5| foo(A) -> ok.
%     |     ^
%  note: `+warn_unused_vars` on by default
%  help: call `erlc -explain ERL-1001` to see a detailed explanation
%  help: rename the variable to '_A' to avoid this warning

This mimics how rustc prints diagnostics. Where possible the [ERL-1001] should be printed with a http link to the docs (in the terminal using \e]8 ANSI escape code).

To make it easier for language servers and IDEs, tools producing diagnostics should produce diagnostics (errors and warnings) in a standardized parsable format. This should be done by specifying an extra option (for example --error-format json).

A possible JSON format, heavily inspired by the LSP protocol, is:

{
  uri: "file:///git/erlang/project/app/src/file.erl",
  range: {
    start: {
      line: 5,
      character: 23
    },
    end: {
      line: 5,
      character: 32
    }
  },
  severity: "warning",
  code: "DIA-1234",
  doc_uri: "https://erlang.org/doc/apps/dialyzer/DIA-1234.html",
  source: "dialyzer",
  message: "This a descriptive error message from Dialyzer"
}

Where:

uri: The path of the file the diagnostic refers to, expressed using the RFC 3986 format
range: The range at which the message applies, zero-based. The range should be as strict as possible. For example, if warning the user that a record is unused, the range of the diagnostic should only cover the name of the record and not the entire definition. This minimizes the distraction for the user when, for example, rendered as a squiggly line, while conveying the same information.
severity: The diagnostic’s severity. Allowed values are error, warning, information, hint.
code: A unique error code identifying the error
doc_uri: A URI to open with more information about the diagnostic error
source: A human-readable string describing the source of the diagnostic
message: A short, textual description of the error. The message should be general enough and make sense in isolation.

The standard library will be extended to help tools generate a standard look and feel for warnings and the machine readable json output. The exact API in the standard library is not part of this EEP.

Alternative solutions #

Centralized catalog #

Originally this EEP proposed a centralized catalog to store the namespace for the community. This would be better for discoverability and would eliminate the possibility of getting namespace clashes. However, we already have the issue of possible application + module name clashes and that works relatively well without any co-ordination. So to keep things simple, there will be no central catalog. We could possibly scrape hexdocs for applications with indexes and create a page with all indexes if there was a need.

Error Index or Diagnostic Index #

Should this functionality be called an “Error Index” even if it includes things that are not errors? Or should it use a more general name, that is “Diagnostic Index”?

rustc and haskell seem to call theirs error code indexes. rustc has limited the index to only include errors, instead for warnings they print the name of the warning option that triggered the warning. In Haskell they seem to have errors, warnings and information items in the “Error Index”.

In this EEP I propose we use “diagnostic” index to include errors, warnings and info, but as diagnostic is a lot longer than “error” and maybe not as obvious a name, we may want to change this to error instead.

Another aspect to this discussion is to look a but closer at what rustc/gcc have decided to to do with warnings/info. They print the flags that enables the warning, not some error code. For example:

> erlc t.erl
t.erl:5:11: variable 'B' is unbound [ERL-0001]
%    5| foo(A) -> B.
%     |           ^
%  help: call `erlc -explain ERLC-0001` to see a detailed explanation

t.erl:5:5: Warning: variable 'A' is unused [+warn_unused_vars]
%    5| foo(A) -> B.
%     |     ^

The two approaches do however not exclude each other, we could for warnings print both.

t.erl:5:5: Warning: variable 'A' is unused [ERL-0002]
%    5| foo(A) -> B.
%     |     ^
%  note: `+warn_unused_vars` on by default
%  help: call `erlc -explain ERLC-0002` to see a detailed explanation

Only for Erlang/OTP repo #

We could implement the error index system only for Erlang/OTP, this is what rust and haskell do. It would make things simpler as we would not have to care about backward compatibility and could change things at any time.

However, there has been some interest in third-party projects to also create their own diagnostic codes (the first draft of this EEP was created by one such), it seems like this area can benefit from standardization.

Copyright #

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.