peer

peer

peer
Start and control linked Erlang nodes.
Module peer was introduced in OTP 25.0.

This module provides functions for starting linked Erlang nodes. The node spawning new nodes is called origin, and newly started nodes are peer nodes, or peers. A peer node automatically terminates when it loses the control connection to the origin. This connection could be an Erlang distribution connection, or an alternative - TCP or standard I/O. The alternative connection provides a way to execute remote procedure calls even when Erlang Distribution is not available, allowing to test the distribution itself.

Peer node terminal input/output is relayed through the origin. If a standard I/O alternative connection is requested, console output also goes via the origin, allowing debugging of node startup and boot script execution (see -init_debug). File I/O is not redirected, contrary to slave(3) behaviour.

The peer node can start on the same or a different host (via ssh) or in a separate container (for example Docker). When the peer starts on the same host as the origin, it inherits the current directory and environment variables from the origin.

Note

This module is designed to facilitate multi-node testing with Common Test. Use the ?CT_PEER() macro to start a linked peer node according to Common Test conventions: crash dumps written to specific location, node name prefixed with module name, calling function, and origin OS process ID). Use random_name/1 to create sufficiently unique node names if you need more control.

A peer node started without alternative connection behaves similarly to slave(3). When an alternative connection is requested, the behaviour is similar to test_server:start_node(Name, peer, Args).

The following example implements a test suite starting extra Erlang nodes. It employs a number of techniques to speed up testing and reliably shut down peer nodes:

  • peers start linked to test runner process. If the test case fails, the peer node is stopped automatically, leaving no rogue nodes running in the background
  • arguments used to start the peer are saved in the control process state for manual analysis. If the test case fails, the CRASH REPORT contains these arguments
  • multiple test cases can run concurrently speeding up overall testing process, peer node names are unique even when there are multiple instances of the same test suite running in parallel
-module(my_SUITE).
-behaviour(ct_suite).
-export([all/0, groups/0]).
-export([basic/1, args/1, named/1, restart_node/1, multi_node/1]).

-include_lib("common_test/include/ct.hrl").

groups() ->
    [{quick, [parallel],
        [basic, args, named, restart_node, multi_node]}].

all() ->
    [{group, quick}].

basic(Config) when is_list(Config) ->
    {ok, Peer, _Node} = ?CT_PEER(),
    peer:stop(Peer).

args(Config) when is_list(Config) ->
    %% specify additional arguments to the new node
    {ok, Peer, _Node} = ?CT_PEER(["-emu_flavor", "smp"]),
    peer:stop(Peer).

named(Config) when is_list(Config) ->
    %% pass test case name down to function starting nodes
    Peer = start_node_impl(named_test),
    peer:stop(Peer).

start_node_impl(ActualTestCase) ->
    {ok, Peer, Node} = ?CT_PEER(#{name => ?CT_PEER_NAME(ActualTestCase)}),
    %% extra setup needed for multiple test cases
    ok = rpc:call(Node, application, set_env, [kernel, key, value]),
    Peer.

restart_node(Config) when is_list(Config) ->
    Name = ?CT_PEER_NAME(),
    {ok, Peer, Node} = ?CT_PEER(#{name => Name}),
    peer:stop(Peer),
    %% restart the node with the same name as before
    {ok, Peer2, Node} = ?CT_PEER(#{name => Name, args => ["+fnl"]}),
    peer:stop(Peer2).

The next example demonstrates how to start multiple nodes concurrently:

multi_node(Config) when is_list(Config) ->
    Peers = [?CT_PEER(#{wait_boot => {self(), tag}})
        || _ <- lists:seq(1, 4)],
    %% wait for all nodes to complete boot process, get their names:
    _Nodes = [receive {tag, {started, Node, Peer}} -> Node end
        || {ok, Peer} <- Peers],
    [peer:stop(Peer) || {ok, Peer} <- Peers].

Start a peer on a different host. Requires ssh key-based authentication set up, allowing "another_host" connection without password prompt.

Ssh = os:find_executable("ssh"),
peer:start_link(#{exec => {Ssh, ["another_host", "erl"]},
    connection => standard_io}),

The following Common Test case demonstrates Docker integration, starting two containers with hostnames "one" and "two". In this example Erlang nodes running inside containers form an Erlang cluster.

docker(Config) when is_list(Config) ->
    Docker = os:find_executable("docker"),
    PrivDir = proplists:get_value(priv_dir, Config),
    build_release(PrivDir),
    build_image(PrivDir),

    %% start two Docker containers
    {ok, Peer, Node} = peer:start_link(#{name => lambda,
        connection => standard_io,
        exec => {Docker, ["run", "-h", "one", "-i", "lambda"]}}),
    {ok, Peer2, Node2} = peer:start_link(#{name => lambda,
        connection => standard_io,
        exec => {Docker, ["run", "-h", "two", "-i", "lambda"]}}),

    %% find IP address of the second node using alternative connection RPC
    {ok, Ips} = peer:call(Peer2, inet, getifaddrs, []),
    {"eth0", Eth0} = lists:keyfind("eth0", 1, Ips),
    {addr, Ip} = lists:keyfind(addr, 1, Eth0),

    %% make first node to discover second one
    ok = peer:call(Peer, inet_db, set_lookup, [[file]]),
    ok = peer:call(Peer, inet_db, add_host, [Ip, ["two"]]),

    %% join a cluster
    true = peer:call(Peer, net_kernel, connect_node, [Node2]),
    %% verify that second peer node has only the first node visible
    [Node] = peer:call(Peer2, erlang, nodes, []),

    %% stop peers, causing containers to also stop
    peer:stop(Peer2),
    peer:stop(Peer).

build_release(Dir) ->
    %% load sasl.app file, otherwise application:get_key will fail
    application:load(sasl),
    %% create *.rel - release file
    RelFile = filename:join(Dir, "lambda.rel"),
    Release = {release, {"lambda", "1.0.0"},
        {erts, erlang:system_info(version)},
        [{App, begin {ok, Vsn} = application:get_key(App, vsn), Vsn end}
            || App <- [kernel, stdlib, sasl]]},
    ok = file:write_file(RelFile, list_to_binary(lists:flatten(
        io_lib:format("~tp.", [Release])))),
    RelFileNoExt = filename:join(Dir, "lambda"),

    %% create boot script
    {ok, systools_make, []} = systools:make_script(RelFileNoExt,
        [silent, {outdir, Dir}]),
    %% package release into *.tar.gz
    ok = systools:make_tar(RelFileNoExt, [{erts, code:root_dir()}]).

build_image(Dir) ->
    %% Create Dockerfile example, working only for Ubuntu 20.04
    %% Expose port 4445, and make Erlang distribution to listen
    %%  on this port, and connect to it without EPMD
    %% Set cookie on both nodes to be the same.
    BuildScript = filename:join(Dir, "Dockerfile"),
    Dockerfile =
      "FROM ubuntu:20.04 as runner\n"
      "EXPOSE 4445\n"
      "WORKDIR /opt/lambda\n"
      "COPY lambda.tar.gz /tmp\n"
      "RUN tar -zxvf /tmp/lambda.tar.gz -C /opt/lambda\n"
      "ENTRYPOINT [\"/opt/lambda/erts-" ++ erlang:system_info(version) ++
      "/bin/dyn_erl\", \"-boot\", \"/opt/lambda/releases/1.0.0/start\","
      " \"-kernel\", \"inet_dist_listen_min\", \"4445\","
      " \"-erl_epmd_port\", \"4445\","
      " \"-setcookie\", \"secret\"]\n",
    ok = file:write_file(BuildScript, Dockerfile),
    os:cmd("docker build -t lambda " ++ Dir).

Options that can be used when starting a peer node through start/1 and start_link/0,1.

Node name (the part before "@"). When name is not specified, but host is, peer follows compatibility behaviour and uses the origin node name.

Use long names to start a node. Default is taken from the origin using net_kernel:longnames(). If the origin is not distributed, short names is the default.

Enforces a specific host name. Can be used to override the default behaviour and start "node@localhost" instead of "node@realhostname".

Defines the peer control process behaviour when the control connection is closed from the peer node side (for example when the peer crashes or dumps core). When set to stop (default), a lost control connection causes the control process to exit normally. Setting peer_down to continue keeps the control process running, and crash will cause the controlling process to exit abnormally.

Alternative connection specification. See the connection datatype.

Alternative mechanism to start peer nodes with, for example, ssh instead of the default bash.

Defines whether to pass the -detached flag to the started peer. This option cannot be set to false using the standard_io alternative connection type. Default is true.

Extra command line arguments to append to the "erl" command. Arguments are passed as is, no escaping or quoting is needed or accepted.

Allows the user to change the arguments passed to exec before the peer is started. This can for example be useful when the exec program wants the arguments to "erl" as a single argument. Example:

peer:start(#{ name => peer:random_name(),
  exec => {os:find_executable("bash"),["-c","erl"]},
  post_process_args =>
     fun(["-c"|Args]) -> ["-c", lists:flatten(lists:join($\s, Args))] end
  }).

List of environment variables with their values. This list is applied to a locally started executable. If you need to change the environment of the remote peer, adjust args to contain -env ENV_KEY ENV_VALUE.

Specifies the start/start_link timeout. See wait_boot datatype.

Specifies the peer node stopping behaviour. See stop().

Alternative connection between the origin and the peer. When the connection closes, the peer node terminates automatically. If the peer_down startup flag is set to crash, the controlling process on the origin node exits with corresponding reason, effectively providing a two-way link.

When connection is set to a port number, the origin starts listening on the requested TCP port, and the peer node connects to the port. When it is set to an {IP, Port} tuple, the origin listens only on the specified IP. The port number can be set to 0 for automatic selection.

Using the standard_io alternative connection starts the peer attached to the origin (other connections use -detached flag to erl). In this mode peer and origin communicate via stdin/stdout.

Overrides executable to start peer nodes with. By default it is the path to "erl", taken from init:get_argument(progname). If progname is not known, peer makes best guess given the current ERTS version.

When a tuple is passed, the first element is the path to executable, and the second element is prepended to the final command line. This can be used to start peers on a remote host or in a Docker container. See the examples above.

This option is useful for testing backwards compatibility with previous releases, installed at specific paths, or when the Erlang installation location is missing from the PATH.

Specifies start/start_link timeout in milliseconds. Can be set to false, allowing the peer to start asynchronously. If {Pid, Tag} is specified instead of a timeout, the peer will send Tag to the requested process.

Uses the alternative connection to evaluate apply(Module, Function, Args) on the peer node and returns the corresponding value Result. Timeout is an integer representing the timeout in milliseconds or the atom infinity which prevents the operation from ever timing out.

When an alternative connection is not requested, this function will raise exit signal with the noconnection reason. Use erpc module to communicate over Erlang distribution.

Uses the alternative connection to evaluate apply(Module, Function, Args) on the peer node. No response is delivered to the calling process.

peer:cast/4 fails silently when the alternative connection is not configured. Use erpc module to communicate over Erlang distribution.

Uses the alternative connection to send Message to a process on the the peer node. Silently fails if no alternative connection is configured. The process can be referenced by process ID or registered name.

Returns the peer node state. Th initial state is booting; the node stays in that state until then boot script is complete, and then the node progresses to running. If the node stops (gracefully or not), the state changes to down.

Creates a sufficiently unique node name for the current host, combining a prefix, a unique number, and the current OS process ID.

Note

Use the ?CT_PEER(["erl_arg1"]) macro provided by Common Test -include_lib("common_test/include/ct.hrl") for convenience. It starts a new peer using Erlang distribution as the control channel, supplies thes calling module's code path to the peer, and uses the calling function name for the name prefix.

Types

Reason = term()

Starts a peer node with the specified start_options(). Returns the controlling process and the full peer node name, unless wait_boot is not requested and the host name is not known in advance.

Types

Reason = term()

Starts a peer node in the same way as start/1, except that the peer node is linked to the currently executing process. If that process terminates, the peer node also terminates.

Accepts start_options(). Returns the controlling process and the full peer node name, unless wait_boot is not requested and host name is not known in advance.

When the standard_io alternative connection is requested, and wait_boot is not set to false, a failed peer boot sequence causes the caller to exit with the {boot_failed, {exit_status, ExitCode}} reason.

Types

disconnect_timeout() = 1000..4294967295 | infinity

Stops a peer node. How the node is stopped depends on the shutdown option passed when starting the peer node. Currently the following shutdown options are supported:

This is the default shutdown behavior. It behaves as shutdown option {halt, DefaultTimeout} where DefaultTimeout currently equals 5000.

Triggers a call to erlang:halt() on the peer node and then waits for the Erlang distribution connection to the peer node to be taken down. If this connection has not been taken down after Timeout milliseconds, it will forcefully be taken down by peer:stop/1. See the warning below for more info about this.

Triggers a call to init:stop() on the peer node and then waits for the Erlang distribution connection to the peer node to be taken down. If this connection has not been taken down after Timeout milliseconds, it will forcefully be taken down by peer:stop/1. See the warning below for more info about this.

Close the control connection to the peer node and return. This is the fastest way for the caller of peer:stop/1 to stop a peer node.

Note that if the Erlang distribution connection is not used as control connection it might not have been taken down when peer:stop/1 returns. Also note that the warning below applies when the Erlang distribution connection is used as control connection.

Warning

In the cases where the Erlang distribution connection is taken down by peer:stop/1, other code independent of the peer code might react to the connection loss before the peer node is stopped which might cause undesirable effects. For example, global might trigger even more Erlang distribution connections to other nodes to be taken down. The potential undesirable effects are, however, not limited to this. It is hard to say what the effects will be since these effects can be caused by any code with links or monitors to something on the origin node, or code monitoring the connection to the origin node.