[erlang-bugs] Erlang nodes fail to communicate if user has no capability to change SO_PRIORITY socket options.

pan@REDACTED pan@REDACTED
Tue Jan 18 17:15:09 CET 2011


Hi!

On Mon, 17 Jan 2011, Serge Aleynikov wrote:

> At some point I was having a similar issue with SO_PRIORITY and SO_TOS bug in 
> inet_drv when trying to open a unix domain socket and pass the open file 
> descriptor to gen_tcp, which failed to function properly. After discussing 
> this issue with Per Hedeland he sent me the attached patch that worked well 
> to solve the issue.  Perhaps it will also work in your case, and if so, it 
> should be included in distribution.

Seems Per's patch covers more cases - if this solves the problem, it seems 
like the best choice to take this one into dev instead of the smaller fix 
I presented earlier.

Janek - have you tried this one?

Cheers,
/Patrik
>
> Serge
>
> On 1/17/2011 1:33 PM, Janek Wrobel wrote:
>> On Thu, Jan 13, 2011 at 5:22 PM, Patrik Nyblom<pan@REDACTED> 
>> wrote:
>>> Hi!
>>> 
>>> On Thu, 13 Jan 2011, Janek Wrobel wrote:
>>> 
>>>> Hi,
>>>> 
>>>> When trying to set up an Erlang cluster I was getting the following error
>>>> while spawning a function on a remote node:
>>>> 
>>>> =ERROR REPORT==== 13-Jan-2011::01:00:38 ===
>>>> ** Can not start hello_world:ping,[] on 'node3@REDACTED' **
>>>> 
>>>> After some investigation it turned out that nodes did not accept TCP
>>>> connections, because setting SO_PRIORITY socket option failed. Strace
>>>> follows:
>>> 
>>> What kind of node (kernel version, extra options etc) do you have?
>> 
>> Sorry for a late reply, but I was trying to investigate what
>> configuration option is responsible for this behavior and how to
>> reproduce it on a standard Linux box. Unfortunately without any
>> success. The problem here is that the socket gets created with a default
>> priority larger than the user running the process can
>> set with setsockopt, so the sequence of getsockopt(SO_PRIORITY,
>> &priority), setsockopt(SO_PRIORITY, priority) fails.
>> 
>> I was thinking that maybe some firewall rule increasing TOS of packets
>> directed and coming from a given TCP port, or some traffic shaping
>> rule ('tc' command) can have an effect of changing default priority of
>> sockets associated with the port. It does not seem to be the case.
>> Maybe someone on this list knows if default priority of Linux sockets
>> can be somehow altered?
>> 
>> One scenario in which the sequence of getsockopt(), setsockopt() can
>> fail is when the socket was created by a different OS process that had
>> the CAP_NET_ADMIN capability. The socket descriptor can then be passed to
>> an Erlang VM running without CAP_NET_ADMIN, and used in the 'listen {fd, Fd}'
>> function, causing a similar error. But this is definitely not the case
>> here.
>> 
>> 
>>> The inet_driver has a workaround for SO_PRIORITY being destroyed by SO_TOS
>>> settings, I think that's where this fails.
>>> 
>>>> 
>>>> accept(7, {sa_family=AF_INET, sin_port=htons(51602),
>>>> sin_addr=inet_addr("123.123.123.123")}, [16]) = 10
>>>> fcntl(10, F_GETFL)                      = 0x2 (flags O_RDWR)
>>>> fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
>>>> getsockopt(7, SOL_TCP, TCP_NODELAY, [29220483580821504], [4]) = 0
>>>> getsockopt(7, SOL_SOCKET, SO_KEEPALIVE, [29220483580821504], [4]) = 0
>>>> getsockopt(7, SOL_SOCKET, SO_PRIORITY, [29220483580821504], [4]) = 0
>>>> getsockopt(7, SOL_IP, IP_TOS, [29220483580821504], [4]) = 0
>>>> getsockopt(10, SOL_SOCKET, SO_PRIORITY, [-4294967281], [4]) = 0
>>>> getsockopt(10, SOL_IP, IP_TOS, [64424509440], [4]) = 0
>>>> setsockopt(10, SOL_IP, IP_TOS, [0], 4)  = 0
>>>> setsockopt(10, SOL_SOCKET, SO_PRIORITY, [15], 4) = -1 EPERM (Operation
>>>> not permitted)
>>>> close(10)
>>>> 
>>>> To make it work I needed to add '#undef SO_PRIORITY' to
>>>> erts/emulator/drivers/common/inet_drv.c and recompile.
>>>> 
>>>> Can errors from setsockopt(..., SO_PRIORITY) be ignored? According to
>>>> the socket(7) manual, it is normal for a user not to be able to change
>>>> this option ('Setting a priority outside the range 0 to 6 requires the
>>>> CAP_NET_ADMIN capability.').
>>> 
>>> 
>>> I think it would be OK if you checked that you got EPERM in the exact
>>> copy-from-listen-socket-to-result-of-accept code and ignored the result
>>> then.
>>> 
>>> I suspect you would have to patch the function setopt_prio_tos_trick in
>>> inet_drv.c like this:
>>> -----------------------------
>>> diff --combined erts/emulator/drivers/common/inet_drv.c
>>> index 818bc63,818bc63..0000000
>>> --- a/erts/emulator/drivers/common/inet_drv.c
>>> +++ b/erts/emulator/drivers/common/inet_drv.c
>>> @@@ -5095,6 -5095,6 +5095,9 @@@ static int setopt_prio_tos_tric
>>>                                            SO_PRIORITY,
>>>                                            (char *)&tmp_ival_prio,
>>>                                            tmp_arg_sz_prio);
>>> ++                      if (res != 0 && sock_errno() == EPERM) {
>>> ++                          res = 0;
>>> ++                      }
>>>                     }
>>>                 }
>>>             }
>>> -------------------------------
>>> 
>>> Try that and see if it fixes the problem.
>> 
>> This fixes the problem.
>> 
>> thanks,
>> Janek
>> 
>

