Curious problem. The network is managed by NetworkManager, and the system has a static IPv4 address assignment.
There is an application managed by systemd that launches on startup. Service file contents below:
When the application launches at boot, part of its initialization routine is to identify the IP address assigned to the system. The issue:
Every so many reboots (let’s say 10), the application fails to find its IP address during init. It looks like the enp4s0 interface occasionally takes much longer than normal to get set up (~13 seconds), and this is what causes the failure to find/bind to the interface.
start_app.sh
Attempted workaround/debugging: the service used to just exec the application. The start script was added as a wrapper around the application; if the start script finds the IP address, it starts the app. What’s confusing is that the start script will successfully identify the IP and start the app, and then the app fails to bind the address.
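One way to narrow down the gap between "the script sees the IP" and "the app can bind it" is to probe for bindability directly rather than only reading the address. The following is a minimal Python sketch, not the actual start script from this post: `wait_until_bindable` is a hypothetical helper name, and the timeout and interval defaults are arbitrary assumptions. It retries `bind()` on an ephemeral port until the kernel stops returning EADDRNOTAVAIL.

```python
import errno
import socket
import time


def wait_until_bindable(address, port=0, timeout=30.0, interval=0.5):
    """Return True once a TCP socket can bind to `address`, False on timeout.

    Hypothetical helper for illustration; port 0 asks the kernel for an
    ephemeral port so the probe does not collide with the real service.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
                probe.bind((address, port))
            return True
        except OSError as exc:
            # EADDRNOTAVAIL: the address is not (yet) assigned to any local
            # interface. Anything else is a different problem, so re-raise.
            if exc.errno != errno.EADDRNOTAVAIL:
                raise
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

If a probe like this succeeds right before the application is exec'd and the application still fails to bind, that would point more toward how the application looks up or selects its address than toward interface timing.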
As specified by the service file, the application does not start until NetworkManager has finished, which suggests that the interface is not visible to the kernel when NetworkManager is setting up.
The Ethernet driver loaded is the one for the Intel I225-LM.
I didn’t do any custom configuration of the driver, so I’m wondering if there are different configuration parameters I can init the module with that may make a difference?
network-online.target
Units that strictly require a configured network connection should pull in network-online.target
(via a Wants= type dependency) and order themselves after it. This target unit is intended to pull
in a service that delays further execution until the network is sufficiently set up. What precisely
this requires is left to the implementation of the network managing service.
That said, I have dnsmasq and vsftpd services that start successfully with just After=NetworkManager-wait-online.service added to their unit config.
Which basically says: “The multi-user.target is not completely started before my service has started,” which contradicts the other condition.
While I do not have any conclusive logs to support the claim, I installed and started the acpid daemon, and have performed ~1k reboots without the issue reproducing. So, I’m going to mark this fixed; however, I would still be very appreciative of suggestions on how to obtain proof in the form of logs.
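For collecting evidence, one option is to sample the interface's sysfs state with timestamps during early boot and compare good boots against bad ones. Below is a rough Python sketch under stated assumptions: `/sys/class/net/<iface>/operstate` and `carrier` are standard Linux sysfs attributes, but `read_attr` and `log_link_changes` are names made up for this example, and the duration and interval values are guesses.

```python
import time
from pathlib import Path


def read_attr(iface_dir, name):
    """Read one sysfs attribute; return None if it is missing or unreadable."""
    try:
        return (Path(iface_dir) / name).read_text().strip()
    except OSError:
        # Covers a not-yet-existing interface directory and the EINVAL that
        # 'carrier' returns while the interface is administratively down.
        return None


def log_link_changes(iface_dir, duration=60.0, interval=0.1, out=None):
    """Log (seconds since start, operstate, carrier) whenever either changes."""
    emit = out or (lambda line: print(line, flush=True))
    start = time.monotonic()
    last = object()  # sentinel so the first sample is always recorded
    changes = []
    while True:
        elapsed = time.monotonic() - start
        state = (read_attr(iface_dir, "operstate"), read_attr(iface_dir, "carrier"))
        if state != last:
            changes.append((round(elapsed, 3), *state))
            emit(f"{elapsed:8.3f}s operstate={state[0]} carrier={state[1]}")
            last = state
        if elapsed >= duration:
            return changes
        time.sleep(interval)
```

Something like `log_link_changes("/sys/class/net/enp4s0")` could be run from a small unit that starts early; output printed by a systemd service normally lands in the journal, so the lines can be lined up against NetworkManager's messages for the same boot. An operstate of None means the sysfs entry was not readable at that moment, which would be consistent with the interface not yet being visible.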
Seeing something like this with (of all things) the loopback interface and postfix. Fairly reliably, postfix fails to start with the complaint that there is no interface for localhost (127.0.0.1). The first time this occurred I did a “stare and compare” with a working postfix main.cf, to no avail. I then did a “systemctl restart postfix” and postfix started and ran as expected. I’ve now made checking postfix status an SOP for this system and only need to restart it to make the “missing localhost interface” problem go away. I have not modified the systemd unit file for postfix, and it includes “After=syslog.target network.target”.