4 Corrective Maintenance
Some problems that may occur are related to the installation of the system rather than to faulty Erlang code.
4.1 Trouble-shooting
If the Erlang system fails to start properly, the most likely cause is either that Erlang has been installed incorrectly, or that the advanced Erlang shell is unable to deal with the terminal type in use. The latter case is easily identified, since Erlang can then still be started with
erl -oldshell
(but the advanced editing features will, of course, not work). If this occurs, please submit a problem report, and take care to report the definition of the environment variable $TERM.
A typical example of an incorrect installation is where the top-level directory is incorrectly specified to the Install script. Check the path name specified for Rootdir in the installed erl script. If this is incorrect, rerun the Install script with the correct path name.
Users running Sun OpenWindows should note that the advanced shell does not work correctly in cmdtool or shelltool if the scrollbar is activated. This is due to deficiencies in these programs. Either deactivate the scrollbar, use erl -oldshell, or use, for example, xterm instead.
Using distributed Erlang is simpler if the environment is set up to use DNS for host name and address lookups, but that is not required. Further details about trouble-shooting for this case can be found below.
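The installation check described earlier in this section (comparing the root directory recorded in the installed erl script with what is actually on disk) can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name check_rootdir is made up here, and the sketch assumes that the erl script sets the root directory on a line of the form ROOTDIR=/some/path. The exact spelling of the variable may differ between releases, so adjust the pattern to match your script.

```shell
#!/bin/sh
# Hypothetical helper: print the root directory recorded in an installed
# erl script and report whether that directory exists.
# Assumption: the script contains a line such as ROOTDIR=/usr/local/erlang.
check_rootdir() {
    script=$1
    # Take the first assignment, then strip the variable name and any quotes.
    rootdir=$(sed -n 's/^ROOTDIR=//p' "$script" | head -n 1 | tr -d "\"'")
    if [ -z "$rootdir" ]; then
        echo "no ROOTDIR line found in $script"
        return 2
    fi
    if [ -d "$rootdir" ]; then
        echo "ok: $rootdir"
    else
        echo "missing: $rootdir"
        return 1
    fi
}
```

Called with the path of the installed erl script as its only argument, the function prints "missing: ..." when the recorded directory does not exist, which suggests rerunning the Install script with the correct path name.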
4.2 Trouble-shooting Distributed Erlang
There are a number of things that can go wrong when starting distributed Erlang. If you cannot understand the following, ask your system administrator for help.
Distributed Erlang can be started either with the -name or the -sname flag.
To use distributed Erlang in a Wide Area Network environment, it is necessary to use the -name flag when starting nodes. If this is done, Erlang will use the mechanism for lookup of IP addresses of hosts that was specified at installation.
If in subsequent operations Erlang is supplied with a node name, for example foobar@super.eua.ericsson.se, Erlang will contact epmd (Erlang Port Mapper Daemon) at the host super.eua.ericsson.se in order to find the address of the node called foobar there.
At installation of Erlang, the administrator installing the system specifies whether only DNS should be used for address lookup, or whether the mechanism provided by the underlying operating system shall be used (the latter may very well use DNS only, or a combination of DNS and lookups in the /etc/hosts file).
If DNS is used, all hosts involved must be properly configured for DNS, see the UNIX manual page named(8). The following is a list of some of the things that can go wrong on a UNIX system where DNS is supposed to work.
- Non-existent or erroneous /etc/resolv.conf file. If this file does not exist on your system, contact your system administrator, or run Erlang on a computer which has this file.
- Host not registered with DNS name server. To check if your host is known to a name server, use the program /usr/etc/nslookup (SunOS 4) or /usr/sbin/nslookup (Solaris 2):
    % nslookup
    Default Server: super.eua.ericsson.se
    Address: 134.138.199.16
    > gin
    Server: super.eua.ericsson.se
    Address: 134.138.199.16
    Name: gin.eua.ericsson.se
    Address: 134.138.199.53
    > xi-term1
    Server: super.eua.ericsson.se
    Address: 134.138.199.16
    *** super.eua.ericsson.se can't find xi-term1: Non-existent domain
    > exit
    %
  The code above is an example session with nslookup. First a question about the host gin is asked. This is ok. Then a question about the host xi-term1 is asked. This host is known to NIS at the site, but named did not recognize it. For this reason, distributed Erlang cannot run at all on the host xi-term1.
- The port number of epmd, see epmd(3), is already used by another program. In Erlang R2A (Erlang 4.5), port 4368 is used for epmd. If this port is used by another program, distributed Erlang cannot run. The following checks whether port 4368 is already in use (use /usr/ucb/netstat on SunOS 4, and /usr/bin/netstat on Solaris 2):
    % epmd -kill
    Killed
    % sleep 120
    % /usr/ucb/netstat -a | grep 4368
    %
  The code above does the following:
  - Aborts epmd at the host. The epmd executable file is under the erlang/bin directory.
  - Is inactive for a period to let TCP connections disappear.
  - Checks if any other program is using the port. If the netstat command produces any output, this counts as an error: another program is occupying the epmd port. If possible, remove this program and try again.
  The file /etc/services must also be checked:
    % cat /etc/services | grep 4368
    % ypcat services | grep 4368
  If any of the above commands produce any output, there is a problem, as it is not possible to choose a port number on the command line. The only solution is to try to move the obstructing program to another TCP port number.
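Two of the checks in the list above, the resolver configuration and the services entry, can be sketched as shell functions. This is a minimal sketch, not part of the Erlang distribution: the names has_nameserver and port_entries are hypothetical, and the file to inspect is passed as an argument so that the functions can be pointed at a copy of the system file. Unlike a plain grep for 4368, the second function compares the port field exactly, so an entry for, say, port 14368 is not reported.

```shell
#!/bin/sh
# Hypothetical helpers; file paths are parameters rather than hard-coded.

# Succeeds if the resolver file exists and names at least one name server.
has_nameserver() {
    [ -f "$1" ] && grep -q '^[[:space:]]*nameserver[[:space:]]' "$1"
}

# Prints "service port/protocol" for every entry in a services file that is
# registered on exactly the given port number.
port_entries() {
    awk -v port="$2" '
        /^[ \t]*#/ { next }        # skip comment lines
        {
            split($2, f, "/")      # second field looks like 4368/tcp
            if (f[1] == port) print $1, $2
        }
    ' "$1"
}
```

For example, `has_nameserver /etc/resolv.conf || echo "resolver not configured"` reports a missing or empty resolver configuration, and any output from `port_entries /etc/services 4368` means that the port is already assigned to another service.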
If DNS is not used at your site at all, Erlang should not be installed with the option of using DNS only.
The epmd program can be used for checking a host in a way similar to nslookup.
  % epmd -hinfo netsim-server.tei.ericsson.se
  official host name: netsim-server.tei.ericsson.se
  addr type = 2, addr length = 4
  Internet address: 141.137.93.20
  Cant't get hostbyaddr() on host 20.93.137.141.in-addr.arpa
  Bad IPaddr == 141.137.93.20
  % epmd -hinfo super
  official host name: super.eua.ericsson.se
  addr type = 2, addr length = 4
  Internet address: 134.138.199.16
  %
The above is a transcript from a session with epmd. The first host, netsim-server.tei.ericsson.se, apparently had some problems.
If distributed Erlang is run with the -sname flag, it can be run in an environment where DNS is not running at all. If the /etc/resolv.conf file is not present, the resolver library routines gethostbyname() and gethostbyaddr() will resort to reading the /etc/hosts file, which must then contain the IP addresses and names of all hosts which are to be used for distributed Erlang applications. The /etc/hosts file can also contain the names of non-local hosts (over a WAN), but if name lookup is used over a WAN, there may be problems if the initial part of a host name is the same in two different domains. If Erlang is run over a WAN, DNS is thus the recommended method.
One configuration where the -sname flag is called for, since DNS is not needed, is a set of target nodes that run disconnected from any network, for example with a laptop computer attached to the targets (possibly via SLIP). In that case it is not reasonable to require DNS.
Another example is a set of nodes on a (not networked) LAN with a small set of hosts, where DNS is not desirable or possible. Then the option of reading /etc/hosts might be appropriate.
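When /etc/hosts is the only source of host names, every host that takes part in a distributed Erlang application has to be listed there. The following sketch shows one way to verify this before starting any nodes. The function names in_hosts_file and missing_hosts are hypothetical, and the hosts file is passed as an argument; it is an illustration under these assumptions rather than a tool shipped with Erlang.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME appears as a host name or alias
# (any field after the IP address) in the given hosts file.
in_hosts_file() {
    awk -v name="$2" '
        { sub(/#.*/, "") }         # drop comments before looking at fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Print every host from the argument list that the file does not contain.
missing_hosts() {
    file=$1
    shift
    for h in "$@"; do
        in_hosts_file "$file" "$h" || echo "$h"
    done
}
```

A call such as `missing_hosts /etc/hosts gin xi-term1` prints the names that still have to be added to the file; no output means that all listed hosts are present.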