Wednesday 27 October 2021

How to fix chmod execute permissions


Problem

You've run something like the following and accidentally removed the execute permission from /bin/chmod:

        [ec2-user@ip-172-31-30-6 ~]$ sudo chmod -x /bin/chmod
        [ec2-user@ip-172-31-30-6 ~]$ ls -l /bin/chmod
        -rw-r--r-- 1 root root 54384 Jan 23  2020 /bin/chmod
        ...
        [root@ip-172-31-30-6 ~]# /bin/chmod +x /usr/bin/netstat
        -bash: /bin/chmod: Permission denied

Now you can't execute chmod to change the permissions on any file on the system, including chmod itself. Below are several ways to fix it.


Solution


Using the ld.so / ld-linux.so* dynamic loader

According to its man page [1], "The programs ld.so and ld-linux.so* find and load the shared libraries needed by a program, prepare the program to run, and then run it."

We can use this to execute chmod even though it doesn't have execute permissions, and undo our mistake. Before doing so, we first need to find the dynamic loader binary. On Amazon Linux 2, I found it at /usr/lib64/ld-2.26.so.

        [ec2-user@ip-172-31-30-6 ~]$ sudo find /usr/lib64 -name "ld*.so*"
        /usr/lib64/ld-2.26.so
        /usr/lib64/ld-linux-x86-64.so.2
        ...

Now that we've found them, we can use either one to execute chmod:

        [ec2-user@ip-172-31-30-6 ~]$ ls -l /bin/chmod
        -rw-r--r-- 1 root root 54384 Jan 23  2020 /bin/chmod
        [ec2-user@ip-172-31-30-6 ~]$ sudo /usr/lib64/ld-2.26.so /bin/chmod +x /bin/chmod

Finally, we verify that the issue is resolved and that we can execute chmod to our heart's content:

        [ec2-user@ip-172-31-30-6 ~]$ ls -l /bin/chmod
        -rwxr-xr-x 1 root root 54384 Jan 23  2020 /bin/chmod
        [ec2-user@ip-172-31-30-6 ~]$ sudo /bin/chmod +x /usr/bin/netstat
        [ec2-user@ip-172-31-30-6 ~]$ 
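
If you'd like to rehearse the loader trick before touching the real binary, something along these lines may help. It is only a sketch: it works on a scratch copy of chmod, it assumes ldd is installed and prints the loader as an absolute path containing "/ld-", and it assumes the temporary directory is not mounted noexec.

```shell
#!/bin/sh
# Rehearse the loader trick on a scratch copy, never on the real /bin/chmod.
scratch=$(mktemp -d)
cp /bin/chmod "$scratch/chmod"
chmod a-x "$scratch/chmod"          # reproduce the broken state on the copy

# Ask ldd which loader the binary uses. The awk pattern is an assumption
# about ldd's output format (a field that is an absolute path to ld-*).
loader=$(ldd /bin/chmod | awk '$1 ~ /^\/.*\/ld-/ { print $1; exit }')
echo "loader: $loader"

# Run the non-executable copy through the loader to restore its execute bit.
"$loader" "$scratch/chmod" +x "$scratch/chmod"
ls -l "$scratch/chmod"
```

Looking the loader up with ldd avoids hard-coding a path such as /usr/lib64/ld-2.26.so, which differs between distributions and architectures.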

Using Perl

Interestingly enough, Perl has its own chmod function built in [2]. Why? I have no idea, but we can use it to fix the chmod binary.

An example is shown below:

        [ec2-user@ip-172-31-30-6 ~]$ sudo chmod -x /bin/chmod
        [ec2-user@ip-172-31-30-6 ~]$ ls -l /bin/chmod
        -rw-r--r-- 1 root root 54384 Jan 23  2020 /bin/chmod
        [ec2-user@ip-172-31-30-6 ~]$ sudo perl -e 'chmod(0755, "/bin/chmod")'
        [ec2-user@ip-172-31-30-6 ~]$ ls -l /bin/chmod
        -rwxr-xr-x 1 root root 54384 Jan 23  2020 /bin/chmod
        [ec2-user@ip-172-31-30-6 ~]$ 

As you can see, chmod has execute permissions once again.

Rsync from another server

If you have the ability to rsync /bin/chmod from another server, you can use the following commands as an example to pull the file and move it into place. This replaces the existing chmod file, including file metadata (such as execute permissions).

        [ec2-user@ip-172-31-30-6 ~]$ ls -l /bin/chmod
        -rw-r--r-- 1 root root 54384 Jan 23  2020 /bin/chmod
        [ec2-user@ip-172-31-30-6 ~]$ rsync -av SOURCE_SERVER:/bin/chmod /tmp/chmod
        ...
        [ec2-user@ip-172-31-30-6 ~]$ sudo mv /tmp/chmod /bin/chmod
        [ec2-user@ip-172-31-30-6 ~]$ ls -l /bin/chmod
        -rwxr-xr-x 1 root root 54384 Jan 23  2020 /bin/chmod
        [ec2-user@ip-172-31-30-6 ~]$ 

Note that you'll need to replace SOURCE_SERVER with the IP address or DNS hostname of the source server.

When I have done this in the past, I've used a source server running the same OS, i.e. Amazon Linux 2. I'm not sure whether this would work if it were a completely different OS.

Making a copy and replacing its contents

This solution requires making a copy of an existing binary which does have execute permissions, and then rsync'ing the contents of the broken chmod binary over that copy. The copy keeps its execute permissions but now contains chmod, so we can use it to restore the execute permission on /bin/chmod and then delete it. This one is probably better explained with an example.

        [ec2-user@ip-172-31-30-6 ~]$ ls -l /bin/chmod
        -rw-r--r-- 1 root root 54384 Jan 23  2020 /bin/chmod
        [ec2-user@ip-172-31-30-6 ~]$ sudo cp /bin/chown /bin/chmod2
        [ec2-user@ip-172-31-30-6 ~]$ sudo rsync /bin/chmod /bin/chmod2
        [ec2-user@ip-172-31-30-6 ~]$ sudo /bin/chmod2 +x /bin/chmod
        [ec2-user@ip-172-31-30-6 ~]$ sudo rm -f /bin/chmod2
        [ec2-user@ip-172-31-30-6 ~]$ ls -l /bin/chmod
        -rwxr-xr-x 1 root root 54384 Jan 23  2020 /bin/chmod
        [ec2-user@ip-172-31-30-6 ~]$
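
The same idea can be tried safely on scratch files first. The sketch below swaps rsync for plain shell redirection (my substitution, in case rsync isn't installed): redirecting into an existing file rewrites its contents but leaves its permission bits alone. All paths under the scratch directory are made up for the example.

```shell
#!/bin/sh
scratch=$(mktemp -d)
mkdir "$scratch/copy"

# A scratch copy of chmod with the execute bits removed: the "broken" binary.
cp /bin/chmod "$scratch/chmod"
chmod a-x "$scratch/chmod"

# Any executable works as the donor; it keeps its mode when overwritten.
cp /bin/chown "$scratch/copy/chmod"

# Overwrite the donor's contents in place. Redirection truncates and rewrites
# the existing file, so its permission bits are left untouched.
cat "$scratch/chmod" > "$scratch/copy/chmod"

# The donor is now a working chmod; use it to repair the broken copy.
"$scratch/copy/chmod" +x "$scratch/chmod"
rm -rf "$scratch/copy"
ls -l "$scratch/chmod"
```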

Using a Live CD

Unfortunately, I can't go into much detail on this one as it's dependent on what kind of OS you're running. Essentially, you'd boot the machine with the Live CD, mount the old root volume at a temporary location, and use the Live CD's version of chmod to make your broken chmod executable once again.

References:

[1] ld-linux(8) - Linux man page
https://linux.die.net/man/8/ld-linux

[2] chmod - Perldoc Browser
https://perldoc.perl.org/functions/chmod

Enabling TCP Keepalive Functionality For Legacy Linux Applications

 

Problem

You want to enable TCP keepalive functionality, but the application either doesn't support it or overrides the setting itself.

You may have tried to configure this using the sysctl parameters mentioned below, to no avail. As a result, the connection eventually times out or is closed on its own.

The sysctl parameters you may have tried to configure are:

net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 6

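If you have configured the above-mentioned parameters and you still aren't seeing TCP Keepalive functionality enabled, then this article may be of use to you. These parameters only set the default timings for sockets that already have the SO_KEEPALIVE option enabled; they don't switch keepalives on for an application that never sets that option.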
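
To get a feel for what the three parameters mean together, here is a small arithmetic sketch. The function name is my own; it adds the idle time before the first probe to one interval per unanswered probe, which gives roughly how long a silent peer goes undetected.

```shell
#!/bin/sh
# Approximate seconds before the kernel gives up on a silent peer:
# idle time before the first probe, plus one interval per unanswered probe.
keepalive_timeout() {
    idle=$1 intvl=$2 probes=$3
    echo $(( idle + intvl * probes ))
}

keepalive_timeout 60 10 6     # the sysctl values above: prints 120
```

With the values listed above, a dead connection would be dropped after about two minutes, provided keepalives are actually enabled on the socket.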


Solution

You can install the libkeepalive library [1] and use the LD_PRELOAD environment variable to instruct the application to load it, which enables TCP Keepalive functionality.


Quick Setup Guide


        [ec2-user@ip-172-31-30-6 ~]$ wget -O libkeepalive-0.3.tar.gz "http://prdownloads.sourceforge.net/libkeepalive/libkeepalive-0.3.tar.gz?download"
        [ec2-user@ip-172-31-30-6 ~]$ tar zxf libkeepalive-0.3.tar.gz
        [ec2-user@ip-172-31-30-6 ~]$ cd libkeepalive-0.3
        [ec2-user@ip-172-31-30-6 libkeepalive-0.3]$ make
        [ec2-user@ip-172-31-30-6 libkeepalive-0.3]$ sudo cp libkeepalive.so /usr/lib64
        [ec2-user@ip-172-31-30-6 libkeepalive-0.3]$ export LD_PRELOAD=/usr/lib64/libkeepalive.so
        [ec2-user@ip-172-31-30-6 libkeepalive-0.3]$ export KEEPCNT=20
        [ec2-user@ip-172-31-30-6 libkeepalive-0.3]$ export KEEPIDLE=75
        [ec2-user@ip-172-31-30-6 libkeepalive-0.3]$ export KEEPINTVL=60
        [ec2-user@ip-172-31-30-6 libkeepalive-0.3]$ /path/to/myapplication
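
The exports in the quick setup apply to every command started from that shell afterwards, not just your application. If you'd prefer to limit the preload to a single command, a wrapper along these lines is one option. It's a sketch: run_with_keepalive and KEEPALIVE_LIB are names I've made up, and the library path is the one used in the setup above.

```shell
#!/bin/sh
# Path from the quick setup above; override KEEPALIVE_LIB if yours differs.
KEEPALIVE_LIB=${KEEPALIVE_LIB:-/usr/lib64/libkeepalive.so}

# Run one command with libkeepalive preloaded, leaving the calling shell's
# environment untouched. run_with_keepalive is a hypothetical helper name.
run_with_keepalive() {
    env LD_PRELOAD="$KEEPALIVE_LIB" KEEPCNT=20 KEEPIDLE=75 KEEPINTVL=60 "$@"
}

# Example: show what the child process sees.
run_with_keepalive sh -c 'echo "KEEPCNT=$KEEPCNT KEEPIDLE=$KEEPIDLE KEEPINTVL=$KEEPINTVL"'
```

You would then start the application with "run_with_keepalive /path/to/myapplication", and other commands in the same shell run without the library.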


How It Works


When /path/to/myapplication executes, the dynamic loader preloads the libkeepalive.so library, which enables TCP keepalive functionality for newly created TCP sockets in accordance with the KEEPCNT, KEEPIDLE, and KEEPINTVL environment variables.

I have tested this using the nc command to create a process that listens on TCP port 5000. Run the nc command via strace, and you'll see what happens when TCP Keepalives are not enabled:

        [ec2-user@ip-172-31-30-6 ~]$ strace nc -l 5000 2>&1 | grep setsockopt
        setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
        setsockopt(3, SOL_IPV6, IPV6_V6ONLY, [1], 4) = 0
        setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
        ^C
        [ec2-user@ip-172-31-30-6 ~]$

We see that the SO_KEEPALIVE socket option has not been enabled, nor have any of the other Keepalive-related settings. Now let's review the difference when TCP Keepalives are enabled for the same command:

        [ec2-user@ip-172-31-30-6 ~]$ export LD_PRELOAD=/usr/lib64/libkeepalive.so
        [ec2-user@ip-172-31-30-6 ~]$ export KEEPCNT=20
        [ec2-user@ip-172-31-30-6 ~]$ export KEEPIDLE=75
        [ec2-user@ip-172-31-30-6 ~]$ export KEEPINTVL=60
        [ec2-user@ip-172-31-30-6 ~]$ strace nc -l 5000 2>&1 | grep setsockopt
        setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
        setsockopt(3, SOL_TCP, TCP_KEEPCNT, [20], 4) = 0
        setsockopt(3, SOL_TCP, TCP_KEEPIDLE, [75], 4) = 0
        setsockopt(3, SOL_TCP, TCP_KEEPINTVL, [60], 4) = 0
        setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
        setsockopt(3, SOL_IPV6, IPV6_V6ONLY, [1], 4) = 0
        setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
        setsockopt(4, SOL_TCP, TCP_KEEPCNT, [20], 4) = 0
        setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [75], 4) = 0
        setsockopt(4, SOL_TCP, TCP_KEEPINTVL, [60], 4) = 0
        setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
        ^C
        [ec2-user@ip-172-31-30-6 ~]$

After using telnet to connect to TCP port 5000, we can use the netstat (or ss) command to see that a keepalive timer is running on the connection, which further confirms that we've enabled TCP Keepalive functionality successfully.

        [ec2-user@ip-172-31-30-6 ~]$ sudo netstat -tnopea
        Active Internet connections (servers and established)
        Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name Timer
        ...
        tcp 0 0 127.0.0.1:5000 127.0.0.1:36246 ESTABLISHED 1000 1886232 24629/nc keepalive (71.16/0/0)
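
On a busy host the netstat output can be long, so it may be handy to filter it down to the connections that have a keepalive timer. The following is a sketch only: keepalive_timers is a name I've invented, and it assumes the column layout shown above, where the last two fields are the timer name and a bracketed value whose first number is the time remaining on the timer.

```shell
#!/bin/sh
# Print "local remote seconds" for each connection whose Timer column reads
# "keepalive". Assumes the column layout of `netstat -tnopea` shown above,
# where the last two fields are the timer name and "(remaining/x/y)".
keepalive_timers() {
    awk '$(NF-1) == "keepalive" {
        timer = $NF
        gsub(/[()]/, "", timer)      # "(71.16/0/0)" -> "71.16/0/0"
        split(timer, parts, "/")
        print $4, $5, parts[1]
    }'
}

# Sample line taken from the output above.
echo 'tcp 0 0 127.0.0.1:5000 127.0.0.1:36246 ESTABLISHED 1000 1886232 24629/nc keepalive (71.16/0/0)' |
    keepalive_timers
```

In practice you would pipe the real command into it, e.g. "sudo netstat -tnopea | keepalive_timers".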

References:

[1] libkeepalive http://libkeepalive.sourceforge.net/#download

udev: renamed network interface eth0 to eth1

In the console logs, you see the following message:


udev: renamed network interface eth0 to eth1

or you may see something like:

ena 0000:00:05.0 eth1: renamed from eth0

As a result, the network fails to start, and the host isn't accessible.


Solution


There are a couple of ways to solve this issue, but both will require rebooting the machine.


Short Term Solution

In the /etc/udev/rules.d directory, there is a udev rule file ending with "-persistent-net.rules". Usually the file name is prefixed with a number (such as 70), which defines the order in which udev rules are processed. Delete the file, and when the OS is started again, the file will be generated from scratch and the network interface will not be renamed to eth1.


$ sudo rm -vf /etc/udev/rules.d/70-persistent-net.rules
$ sudo reboot
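
Since the numeric prefix isn't always 70, it can be worth listing the file before deleting it. A minimal sketch, assuming the directory mentioned above; RULES_DIR and the function name are my own additions so that the loop can be tried against a scratch directory first.

```shell
#!/bin/sh
# List every persistent-net rules file, whatever its numeric prefix.
# RULES_DIR is an assumed override so the loop can be tried on a scratch copy.
RULES_DIR=${RULES_DIR:-/etc/udev/rules.d}

list_persistent_net_rules() {
    for f in "$RULES_DIR"/*-persistent-net.rules; do
        [ -e "$f" ] || continue       # the glob matched nothing
        echo "$f"
    done
}

list_persistent_net_rules
```

If it prints nothing, there is no such file on the machine and the rename is probably coming from somewhere else.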

If you also see the following message in the console log and the network interface is attached as eth0, then you'll need to check the /etc/sysconfig/network-scripts directory (or the OS equivalent) to make sure that your network configuration scripts are named correctly, i.e. ifcfg-eth0 rather than ifcfg-eth1, and that the DEVICE and NAME parameters match the device name (eth0).

Bringing up interface eth1: Determining IP information for eth1... done.

An example is shown below of what it's supposed to look like:

[ec2-user@ip-172-31-28-71 ~]$ cat /etc/sysconfig/network-scripts/ifcfg-eth0 
DEVICE=eth0
BOOTPROTO=dhcp
ONBOOT=yes
TYPE=Ethernet
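
To check this without opening each file by hand, you could compare the DEVICE value in every ifcfg file with the file's own name. This is a rough sketch: CFG_DIR and check_ifcfg_names are names I've made up, it only looks at DEVICE (not NAME), and it assumes simple KEY=value lines like the example above.

```shell
#!/bin/sh
# Report ifcfg files whose DEVICE= value disagrees with the file name.
# CFG_DIR is an assumed override for trying this against a scratch directory.
CFG_DIR=${CFG_DIR:-/etc/sysconfig/network-scripts}

check_ifcfg_names() {
    status=0
    for f in "$CFG_DIR"/ifcfg-*; do
        [ -f "$f" ] || continue
        expected=${f##*/ifcfg-}       # device name taken from the file name
        device=$(sed -n 's/^DEVICE=//p' "$f" | tr -d "\"'" | head -n 1)
        if [ -n "$device" ] && [ "$device" != "$expected" ]; then
            echo "mismatch: $f has DEVICE=$device"
            status=1
        fi
    done
    return $status
}

check_ifcfg_names && echo "all ifcfg files match their names"
```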


Long Term Solution


Depending on the operating system, in /lib/udev or /usr/lib/udev there is a shell script called "write_net_rules". Towards the bottom of this file, you'll find the section of code which renames the network device if a rule already exists. The exact section of code I'm referring to is outlined below. To stop the issue from re-occurring in future, comment out this portion of code before proceeding to the next step. When you're done, it should look like this:


#else
#        # if a rule using the current name already exists, find a new name
#        if interface_name_taken; then
#                INTERFACE="$basename$(find_next_available "$basename[0-9]*")"
#                # prevent INTERFACE from being "eth" instead of "eth0"
#                [ "$INTERFACE" = "${INTERFACE%%[ \[\]0-9]*}" ] && INTERFACE=${INTERFACE}0
#                echo "INTERFACE_NEW=$INTERFACE"
#        fi


Once you've done that, you'll need to remove the persistent-net.rules file as per the "Short Term Solution" above.


Why Is This Happening?


When you launch a new machine from an existing image or snapshot, udev will have an existing rule for eth0. An example of this rule is shown below.


SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="01:23:45:67:89:ab", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"


Simply put, when the conditions above are met, the device name is set to eth0.


When a new machine which has not been patched is launched, it comes up with a network interface that has a different MAC address. Because of this, the new network interface will not match the above-mentioned udev rule.

The write_net_rules script will then add a new entry to the persistent-net.rules file for the new network interface. However, since a rule already exists for eth0, the script picks a device name which isn't already in use by increasing the interface number by one. This effectively forces eth0 to be renamed to eth1, because the new interface fails to match the first rule and matches the second.

If you're experiencing this behaviour, you should see output similar to what's shown below in the persistent-net.rules file.


SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="01:23:45:67:89:ab", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="ab:cd:ef:09:87:65", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"


As long as the second rule exists, the machine will always boot and rename eth0 to eth1.
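
A quick way to see which MAC address each rule pins to which name is to pull the two values out of the rules file. The sketch below uses the two sample rules from above written to a scratch file; list_pinned_names is a hypothetical helper, and the sed pattern assumes rules laid out exactly like these.

```shell
#!/bin/sh
# Print "MAC NAME" for every rule line that pins a MAC address to a name.
# The sed pattern assumes the rule layout shown above.
list_pinned_names() {
    sed -n 's/.*ATTR{address}=="\([^"]*\)".*NAME="\([^"]*\)".*/\1 \2/p' "$@"
}

# Sample rules copied from above, written to a scratch file.
rules=$(mktemp)
cat > "$rules" <<'EOF'
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="01:23:45:67:89:ab", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="ab:cd:ef:09:87:65", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
EOF

list_pinned_names "$rules"
```

Comparing that list with the interface's current MAC address (for example from /sys/class/net/eth1/address) shows which rule the machine is matching.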

Also, if you're wondering, Amazon Linux 1/2 disables the above-mentioned functionality in the ec2-net-utils package. This can be seen here:

https://github.com/aws/ec2-net-utils/blob/master/write_net_rules


How Does Udev Rule Matching Work?


Let's take the following udev rule and break it down.


SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="01:23:45:67:89:ab", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"


If a network device ( SUBSYSTEM=="net" ) is added ( ACTION=="add" ) to the system, and it's not a VLAN interface, e.g. eth0.130, or a sub-interface, e.g. eth0:0 ( DRIVERS=="?*" ), and it has a MAC address of 01:23:45:67:89:ab ( ATTR{address}=="01:23:45:67:89:ab" ), and it's an Ethernet device ( ATTR{type}=="1" ), and the kernel name of the device begins with "eth" ( KERNEL=="eth*" ), then set the name of the device to eth0 ( NAME="eth0" ).
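
The same logic can be written out as a small shell function, which makes the "every condition must match" behaviour easier to see. This is an illustration only, not how udev is implemented: apply_eth0_rule is a made-up name, the driver name ena is just a sample value, and real udev also checks DRIVERS against the device's parent devices.

```shell
#!/bin/sh
# Walk through the same checks udev applies for the rule above. Every
# condition must hold; the first failure means the rule does not apply.
# Simplification: real udev checks DRIVERS against the device's parents too.
apply_eth0_rule() {
    subsystem=$1 action=$2 driver=$3 address=$4 type=$5 kernel=$6

    [ "$subsystem" = "net" ] || return 1             # SUBSYSTEM=="net"
    [ "$action" = "add" ] || return 1                # ACTION=="add"
    case $driver in ?*) ;; *) return 1 ;; esac       # DRIVERS=="?*" (non-empty)
    [ "$address" = "01:23:45:67:89:ab" ] || return 1 # ATTR{address}
    [ "$type" = "1" ] || return 1                    # ATTR{type}=="1"
    case $kernel in eth*) ;; *) return 1 ;; esac     # KERNEL=="eth*"

    echo "NAME=eth0"                                 # all matched: assign the name
}

apply_eth0_rule net add ena 01:23:45:67:89:ab 1 eth0
apply_eth0_rule net add ena ab:cd:ef:09:87:65 1 eth0 || echo "no match"
```

The first call prints NAME=eth0; the second has a different MAC address, so it falls through and prints "no match", which is the situation a machine launched from an image finds itself in.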


If you want to read more about udev rules, there are far better explanations on the interwebs. I recommend the following online resources.


https://linuxconfig.org/tutorial-on-how-to-write-basic-udev-rules-in-linux

http://www.linuxfromscratch.org/lfs/view/6.3/chapter07/network.html