10 Academic and Historical Questions

10.1 Why is Erlang called "Erlang"?

Erlang is named after the mathematician Agner Erlang. Among other things, he worked on queueing theory. The original implementors also tend to avoid denying that Erlang sounds a little like "ERicsson LANGuage".

10.2 Who invented Erlang?

During the 1980s there was a project at the Ericsson Computer Science Laboratory which aimed to find out what aspects of computer languages made it easier to program telecommunications systems. Erlang emerged in the second half of the 80s was the result of taking those features which made writing such systems simpler and avoiding those which made them more complex or error prone.

The people involved at the start were Joe Armstrong, Robert Virding and Mike Williams. Others joined later and added things like distribution and OTP.

10.3 Where does Erlang syntax come from?

Mostly from prolog. Erlang started life as a modified prolog. ! as the send-message operator comes from CSP. Eripascal was probably responsible for , and ; being separators and not terminators.

10.4 Why is Ericsson giving away Erlang?

(The following is my personal impression. I don't speak for Ericsson!)

Nothing to lose: Ericsson's core business is telecommunications products, selling programming tools is not really a business Ericsson is interested in.

Stimulate adoption: Erlang is a great language for many sorts of systems. Releasing a good, free development environment is likely to make Erlang catch on faster.

Generate goodwill: Giving away cool software can only improve Ericsson's image, especially given the current level of media attention around "open software".

10.5 What does soft realtime mean?

Cynics will say "basically nothing".

A hard realtime system is one which can guarantee that a certain action will always be carried out in less than a certain time. Many simple embedded systems can make hard realtime guarantees, e.g. it is possible to guarantee that a particular interrupt service routine on a Z80 CPU will never take more than 34us. It gets progressively harder to make such guarantees for more complex systems.

Many telecomms systems have less strict requirements, for instance they might require a statistical guarantee along the lines of "a database lookup takes less than 20ms in 97% of cases". Soft realtime systems, such as Erlang, let you make that sort of guarantee.

A rule of thumb is that it is straightforward to write Erlang programs which can respond to external events within a few milliseconds. The parts of Erlang which help with this are:

Language features which make it hard(er) for programmers to roughly estimate performance were never added to Erlang. For instance, Erlang doesn't have lazy evaluation.
Erlang processes are very lightweight, much lighter than an operating system thread. Switching between Erlang processes is typically an order of magnitude or two faster than switching between OS threads.
Each Erlang process is garbage collected separately. This avoids the (relatively) long delay which would occur if there was a single large heap for all processes.

10.6 How did the first Erlang compiler get written?

(or: how was Erlang bootstrapped?) In Joe's words:

First I designed an abstract machine to execute Erlang. This was called the JAM machine; JAM = Joe's Abstract Machine.

Then I wrote a compiler from Erlang to JAM and an emulator to see if the machine worked. Both these were written in prolog.

At the same time Mike Williams wrote a C emulator for the JAM.

Then I rewrote the erlang-to-jam compiler in Erlang and used the prolog compiler to compile it. The resultant object code was run in the C emulator. Then we threw away prolog.

Some of this is described in an old paper

10.7 Erlang Syntax and Semantics Details

When in doubt about exactly what the language allows or does, the best place to start is the Erlang Specification. This is still a work in progress, but it covers most of the language.

10.8 Is the order of message reception guaranteed?

Yes, but only within one process.

If there is a live process and you send it message A and then message B, it's guaranteed that if message B arrived, message A arrived before it.

On the other hand, imagine processes P, Q and R. P sends message A to Q, and then message B to R. There is no guarantee that A arrives before B. (Distributed Erlang would have a pretty tough time if this was required!)

10.9 If I send a message, is it guaranteed to reach the receiver?

Most people find it simplest to program as though the answer was "yes, always".

Per Hedeland covered the issues on the mailing list (edited a bit):

"Delivery is guaranteed if nothing breaks" - and if something breaks, you will find out provided you've used link/1. I.e. you will get an EXIT signal not only if the linked process dies, but also if the entire remote node crashes, or the network is broken, or if any of these happen before you do the link.

It seems this issue of "guaranteed delivery" comes up every now and then, but I've never managed to find out exactly what it is those that are asking for it actually want:

A guarantee that the message is put into the receiver's input queue? But if the receiver dies before extracting it from there, that guarantee is useless, as the message is lost anyway.
A guarantee that the receiver extracts the message from its input queue? Well, besides the obvious problem that depending on how the receiver is written, even if it lives happily ever after it may never extract that particular message, it suffers from a variant of the previous problem: Even if you "know" that the receiver has "consumed" the message, it may die before acting on it in any way, and then again it may as well never have been sent.
A guarantee that the receiver actually processes the message? Just kidding of course, hopefully it's obvious to everyone that the only way to obtain such a guarantee, regardless of what programming and communication system you use, is that the receiver is programmed to send an explicit acknowledgment when the processing is complete (of course this may be hidden below an abstraction such as RPC, but the fundamental principle holds).

Add to this that any guarantee would have to entail some form of ack from the remote in at least a distributed system, even if it wasn't directly visible to the programmer. E.g. you could have '!' block until the ack comes back from the remote saying that the message had progressed however far you required - i.e. synchronous communication of sorts. But this would penalize those that don't require the "guarantee" and want asynchronous communication.

So, depending on your requirements, Erlang offers you at least these levels of "guarantee":

Super-safe

Receiver sends ack after processing; sender links, sends, waits for ack or EXIT. This means the sender knows, for each message, whether it was fully processed or not.

Medium-safe

Receiver doesn't send acks; Sender links, sends message(s). This means an EXIT signal informs the sender that some messages may never have been processed.

Pretty-safe

Receiver doesn't send acks; sender sends messages. :-)

There are any number of combinations of these (e.g. receiver sends ack not after each message but at some critical points in the processing).

Per concluded by pointing out that "if you think TCP guarantees delivery, which most people probably do, then so does Erlang".

10.10 What limits does Erlang have?

The Erlang language doesn't specify any limits, but different implementations have different limits on the number of processes, the maximum amount of RAM and so on. These are documented for each implementation.

10.11 Do funs behave during a code change?

Yes and no. A fun is a reference to code; it is not the code itself. Thus a fun can only be evaluated if the code it refers to is actually loaded on the evaluating node. In some cases, we can be sure that execution will never return to a fun. In these cases the old code can be purged without problems. The following code causes no surprises:

	Parent = self(),
	F = fun() -> loop(Parent) end,
	spawn(F).

On the other hand, a bound fun will not be replaced. In the following example, the old verson of F is executed even after a code change:

	-module(cc).
	-export([go/0, loop/1]).

	go() ->
		F = fun() -> 5 end,
		spawn(fun() -> loop(F) end).

	loop(F) ->
		timer:sleep(1000),
		F(),
		cc:loop(F).

This type of problem can be solved in the code_change/2 function in the standard behaviours.

10.12 Examples of bound funs causing surprises

Some typical ways to get in trouble are:

Mnesia. If you store funs in mnesia, you need to have a way to update the table (i.e. replace the fun references) when you load code.
Serialisation. The same deal. If you serialise a fun (e.g. term_to_binary/1) and then unserialise it, you can only evaluate the fun if the code it refers to still exists. In general, serialising a fun, saving it to a file and then loading it in another emulator will not do what you hoped.
Message passing. Sending a fun to a remote node always works, but the fun can only be evaluated on a node where the same version of the same module is available.

A general way to avoid problems with funs which refer to code which doesn't exist is to store the function by name instead of by reference, e.g. by writing fun M:F/A.

10.13 Is it just me, or are records ugly?

Compared to the rest of Erlang, records are rather ugly and error prone. They're ugly because they require an awful lot of typing (no pun intended). They're error prone because the usual method of defining records, the -include directive, provides no protection against multiple, incompatible definitions of records.

Several ways forward have been explored. One is lisp-like structs which have been discussed on the mailing list. Another is Richard O'Keefe's abstract patterns which is an Erlang Enhancement Proposal. Then there is also a suggestion for making records more reliable.

10.14 Can a process receive messages with varying priorities?

Sure. Here's an example of how to do it using nested receives:

        receive ->
           {priority_msg, Data1} -> priority(Data1)
        after
           0 ->
             receive
        	{priority_msg, Data1} -> priority(Data1)
                {normal_msg, Data2} -> normal(Data2)
             end
        end.

10.15 What's the history of static type checking in Erlang?

Erlang ships with a static type analysis system called the Dialyzer. Using the dialyzer is optional, though many or most serious projects use it. The Dialyzer does not require source code to be modified or annotated, though annotations increase the number of problems the Dialyzer can find.

Prior to about 2005, static type checking was used rarely in commercial Erlang-based systems. Several people experimented with various approaches to the problem, including Sven-Olof Nyström, Joe Armstrong, Philip Wadler, Simon Marlow and Thomas Arts.

10.16 How does type checking in Erlang compare to Java/Python/C/C++/Haskell?

Erlang itself, i.e. ignoring the Dialyzer, uses a dynamic type system. All type checking is done at run-time, the compiler does not check types at compile time. The run-time type system cannot be defeated. This is comparable to the type systems in Lisp, Smalltalk, Python, Javascript, Prolog and others.

Java, Eiffel and some other languages have type systems which are mostly checked at compile time, but with some remaining checking done at run time. The combination of checking cannot be defeated. Such type systems provide some guarantees about types which can be exploited by the compiler, this can be useful for optimisation.

Haskell, Mercury and some other languages have type systems which are completely checked at compile time. This type system cannot be defeated. The type system in this type of language is also a design tool, it increases the language's expressiveness.

C, pascal and C++ have type systems which are checked at compile time, but can be defeated by straightforward means provided by the language.

10.17 What's Erlang's relation to 'object oriented' programming?

Many people think of Erlang as a language oriented towards concurrency and message passing, which happens to be functional. I.e. object-oriented isn't mentioned.

Another common point of view is to try and define 'object oriented' and then look at how Erlang fits that definition. For instance, a relatively noncontroversial list of OO features is: dynamic dispatch, encapsulation, subtype polymorphism, inheritance and open recursion. If you regard an Erlang process as an object, then three of the five features are clearly there---e.g. 'open recursion' is just sending a message to self(). The two others, subtype polymorphism and inheritance, are less naturally modelled.

Still another approach is to elevate the importance of message passing in OO. Alan Kay is often brought up in such discussions for three reasons: he's credited with cointing the term 'object oriented' and message passing is the first thing he mentions when explaining what he means by 'object oriented' (for instance in this email) and he doesn't specify a rigid interpretation of how inheritance should work.

Various people have published papers on 'how to do OO in Erlang", the first book (now defunct) about Erlang featured a whole chapter on the subject. In practice, few, if any, real Erlang systems attempt to follow those 'OO Erlang' conventions. The WOOPER project is one example of a serious effort to put an OO layer on top of Erlang.

10.18 How are strings implemented and how can I speed up my string code?

Strings are represented as linked lists, so each character takes 8 octets of memory on a 32 bit machine, twice as much on a 64 bit machine. Access to the Nth element is O(N). This makes it easy to accidentally write O(N^2) code. Here's an example of how that can happen:

	slurp(Socket) -> slurp(Socket, "").

	slurp(Socket, Acc) ->
	  case gen_tcp:recv(Socket, 1) of
	    {ok, [Byte]} -> slurp(Socket, Acc ++ Byte);
	    _ -> Acc
          end.

The bit-syntax provides an alternative way of handling strings with different space/time tradeoffs to the list implementation.

Some techniques for improving performance of string-related code:

Avoid O(N^2) operations. Sometimes this means building a longer string backwards and then using reverse in the final step. Other times, it's better to build up a string as a list of lists and then let the io functions flatten the list.
Consider using an alternative representation of your data, e.g. instead of representing the text as a list of characters, it may be better to represent it as a tree of words.
If you're reading and writing data from sockets, consider using binaries instead of lists of bytes. If all you're doing is copying data from one socket to another, this will be (much) faster. More generally, the bit-syntax provides an alternative way of handling strings with different space/time tradeoffs to the list implementation.
Write speed-critical (profile first!) code in C, and then put it in your erlang system as a port-driver or a linked-in driver.
Ports (and sockets) flatten lists for you, so flattening lists yourself slows things down. This can also be used to create more readable code:
```
gen_tcp:send(Socket, ["GET ", Url, " HTTP/1.0", "\r\n\r\n"]).
```

10.19 How does the Garbage Collector (GC) work?

GC in Erlang works independently on each Erlang process, i.e. each Erlang process has its own heap, and that heap is GCed independently of other processes' heaps.

The current default GC is a "stop the world" generational mark-sweep collector. On Erlang systems running with multiple threads (the default on systems with more than one core), GC stops work on the Erlang process being GCed, but other Erlang processes on other OS threads within the same VM continue to run. The time the process spends stopped is normally short because the size of one process' heap is normally relatively small; much smaller than the combined size of all processes heaps.

The GC strategy for a newly started Erlang process is full-sweep. Once the process' live data grows above a certain size, the GC switches to a generational strategy. If the generational strategy reclaims less than a certain amount, the GC reverts to a full sweep. If the full sweep also fails to recover enough space, then the heap size is increased.

10.20 Can I exploit knowledge about my program to make the GC more efficient?

Maybe, but you might be better off expending effort on thinking of other ways to make your system go faster.

One version of spawn/4 accepts a list of options as the last argument. Ulf Wiger (AXD301) says that the gc_switch and min_heap_size can be used to obtain better performance if you do some measuring, benchmarking and thinking. One 'win' happens when you can completely avoid GC in short-lived processes by setting their initial heap large enough to avoid all GC during the process' life.

min_heap_size can be useful when you know that a certain process will rapidly grow its heap to well above the system's default size. Under such circumstances you get particularly bad GC performance with the current GC implementation.

gc_switch affects the point at which the garbage collector changes from its full-sweep algorithm to its generational algorithm. Again, in a rapidly growing heap which doesn't contain many binaries, GC might perform better with a higher threshold.

10.21 Could the Java Virtual Machine be used to run Erlang?

Yes.

Erjang does exactly that. It's an experimental system with some impressive results.

10.22 Is it possible to prove the correctness of Erlang programs?

One of the touted advantages of functional programming languages is that it is easier to formally reason about programs and prove certain properties of a given program.

Two tools used to do that are MCErlang and Concuerror.

10.23 I've heard it's a bad idea to program defensively in Erlang. Why?

The Erlang coding guidelines suggest avoiding defensive programming. The choice of the term "defensive programming" is unfortunate, because it is usually associated with good practice. The point of the recommendation is that allowing an Erlang process to exit when things go wrong inside the Erlang program is a good approach, i.e. writing code which attempts to avoid an exit is usually a bad idea.

For example, when parsing an integer it makes perfect sense to just write

	I = list_to_integer(L)

if L is not an integer, the process will exit and a supervisor somewhere will restart that part of the system, reporting an error:

	=ERROR REPORT==== 12-Mar-2003::13:04:08 ===
	Error in process <0.25.0> with exit value: {badarg,[{erlang,list_to_integer,[bla]},{erl_eval,expr,3},{erl_eval,exprs,4},{shell,eval_loop,2}]}
** exited: {badarg,[{erlang,list_to_integer,[bla]},
                    {erl_eval,expr,3},
                    {erl_eval,exprs,4},
                    {shell,eval_loop,2}]} **

If a more descriptive diagnostic is required, use a manual exit:

	uppercase_ascii(C) when C >= $a, C =< $z ->
          C - ($a - $A);
        uppercase_ascii(X) ->
          exit({"uppercase_ascii given non-lowercase argument", X}).

This separation of error detection and error handling is a key part of Erlang. It reduces complexity in fault-tolerant systems by keeping the normal and error-handling code separate.

As for most most advice, there are exceptions to the recommendation. One example is the case where input is coming from an untrusted interface, e.g. a user or an external program.

Joe's original explanation is available online.