2 How to interpret the Erlang crash dumps

This document describes the erl_crash.dump file generated upon abnormal exit of the Erlang runtime system.

The system writes the crash dump to the current directory of the emulator, or to the file named by the environment variable ERL_CRASH_DUMP (set in whatever way the current operating system provides). A crash dump can only be written if a writable file system is mounted.
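
The lookup rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this document, not the emulator's own code. The function name crash_dump_path is hypothetical, and treating an empty ERL_CRASH_DUMP as unset is an assumption.

```python
import os


def crash_dump_path(environ=None, cwd=None):
    """Return the path where a crash dump would be expected.

    Hypothetical helper mirroring the rule in the text: use the file
    named by ERL_CRASH_DUMP if it is set, otherwise erl_crash.dump in
    the emulator's current directory.
    """
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    override = environ.get("ERL_CRASH_DUMP")
    # Assumption: an empty value is treated the same as an unset variable.
    if override:
        return override
    return os.path.join(cwd, "erl_crash.dump")
```

Called with no arguments, the function consults the real environment and working directory of the Python process, which are only meaningful if they match those of the emulator that crashed.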

Crash dumps are written mainly for one of two reasons: either the built-in function erlang:halt/1 is called explicitly from running Erlang code, or the runtime system has detected an error that it cannot handle. Most often the system cannot handle the error because of an external limitation, such as running out of memory. A crash dump due to an internal error may be caused by the system reaching a limit in the emulator itself (such as the number of simultaneous atoms in the system, or too many simultaneous ETS tables). Usually the emulator or the operating system can be reconfigured to avoid the crash, which is why interpreting the crash dump correctly is important.

2.1 Reasons for crash dumps

The reason for the dump is noted at the beginning of the file as Slogan: <reason> (the word Slogan has historical reasons). If the system is halted by the BIF erlang:halt/1, the slogan is exactly the parameter passed to the BIF (a string); otherwise it is a description generated by the emulator or the (Erlang) kernel. Normally the message should be enough to understand the problem, but some messages are nevertheless described here. Note, however, that the suggested reasons for the crash are only suggestions. The exact reasons for the errors may vary depending on the local applications and the underlying operating system.
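
As a starting point when reading a dump, the slogan can be picked out mechanically. The sketch below assumes only what is stated above, namely that the reason appears on a line beginning with "Slogan: ". The function name read_slogan is hypothetical, and the exact layout may differ between releases.

```python
def read_slogan(dump_text):
    """Return the text following 'Slogan: ' on the first such line.

    Returns None if no line of the dump starts with that prefix.
    Assumes the layout described in the text; other releases may differ.
    """
    prefix = "Slogan: "
    for line in dump_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None
```

The function takes the dump contents as a string, so the caller decides how the file is read and decoded.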

Errors other than the ones mentioned above may occur, as the erlang:halt/1 BIF may generate any message. If the message is not generated by the BIF and does not occur in the list above, it may be due to an error in the emulator. There may, however, be unusual messages not mentioned here that are still connected to an application failure. The crash dump contains a lot more information, so reading it more thoroughly may reveal the reason for the crash. The size of processes, the number of ETS tables and the Erlang data on each process stack can be quite useful for tracking down the problem.

2.2 Process information

After the general information in the crash dump (the date, slogan and version information) follows a listing of each living Erlang process in the system. The process information for one process may look like this (except for the line numbers of course):

(1)  <0.2.0> Waiting. Registered as: erl_prim_loader
(2)  Spawned as: erl_prim_loader:start_it/4
(3)  Message buffer data: 262 words
(4)  Link list: [<0.0.0>,<0,1>]
(5)  Dictionary: [{fake, entry}]
(6)  Reductions 2194 stack+heap 987 old_heap_sz=987 
(7)  Heap unused=85 OldHeap unused=987
(8)  Stack dump:
(9)  130ef8     Blank a
(10)  130ef4    Blank a
(11) 130ef0     Blank a
(12) 130eec     <0.1.0>
(13) 130ee8     {state,[],none,get_from_port_efile,stop_port,exit_port,\ 
     <0,1>,infinity,dummy_in_handler}
(14) 130ee4     ["/ldisk/r6a_test/kernel_test","/ldisk/r6a_dev/lib/\
     kernel-2.4/ebin","/ldisk/r6a_dev/lib/stdlib-1.7/ebin","/ldisk/\
     r6a_dev/lib/sasl-1.8.1/ebin"]
(15) 130ee0     Continuation pointer b754c, 
(16) i = 0x125750, cp = 0xb754c, arity = 0, 125710: erl_prim_loader:loop/3, 
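
A minimal sketch of pulling a few values out of such a listing is given below. It is written against the sample above only: the regular expressions match the layout of lines (1) and (6) as printed here, the function names parse_header_line and parse_sizes_line are hypothetical, and dumps from other releases may well be laid out differently.

```python
import re

# Matches e.g. "<0.2.0> Waiting. Registered as: erl_prim_loader";
# the "Registered as" part is treated as optional (an assumption).
_HEADER = re.compile(
    r"(<\d+\.\d+\.\d+>)\s+(\w+)\.(?:\s+Registered as:\s+(\S+))?"
)

# Matches e.g. "Reductions 2194 stack+heap 987 old_heap_sz=987".
_SIZES = re.compile(
    r"Reductions\s+(\d+)\s+stack\+heap\s+(\d+)\s+old_heap_sz=(\d+)"
)


def parse_header_line(line):
    """Return pid, state and registered name (or None) from a header line."""
    match = _HEADER.search(line)
    if match is None:
        return None
    pid, state, name = match.groups()
    return {"pid": pid, "state": state, "registered_as": name}


def parse_sizes_line(line):
    """Return the three integer fields of the 'Reductions' line."""
    match = _SIZES.search(line)
    if match is None:
        return None
    reductions, stack_heap, old_heap = (int(g) for g in match.groups())
    return {
        "reductions": reductions,
        "stack_heap": stack_heap,
        "old_heap_sz": old_heap,
    }
```

Both functions search within the line rather than matching from its start, so they tolerate the "(1)" style numbering used in the sample as well as unnumbered lines.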

Each line of the output should be interpreted as follows:

2.3 Internal table information

This section mostly contains information for runtime system developers. The following fields can be of interest:

The rest of the information is only of interest for runtime system developers.

2.4 ETS tables

This section contains information about all the ETS tables in the system. The following fields are interesting for each table:

2.5 Timers

This section contains information about all the timers started with the BIF erlang:start_timer/3. Each line includes the message to be sent, the pid to receive the message and how many milliseconds were left until the message would have been sent.

2.6 Loaded module information

This is a list of all loaded modules, together with the memory usage of each module, in bytes. Note that loaded code is usually larger than the packed format in the beam files.

At the end of the list, the memory usage by loaded code is summarized. There is one field for "Current code", which is the latest version of each module. There is also a field for "Old code", which is code for which a newer version exists in the system but where the old version has not yet been purged.

2.7 Atoms

This section lists all the atoms in the system. It is only of interest if dynamic generation of atoms is suspected to be a problem; otherwise it can be ignored.

2.8 Disclaimer

The format of the crash dump evolves between releases of OTP. Some information here may not apply to your version. A description such as this will never be complete. It is meant as a general explanation of the crash dump and as a help when trying to find application errors, not as a complete specification.


Copyright © 1991-2000 Ericsson Utvecklings AB