I run a number of mail servers that are used only for sending mail from internal app servers to the public internet. These servers run Postfix as the MTA and OpenDKIM to sign outgoing mail for the domains configured for it. The servers are also joined to an Active Directory domain for admin user login, using realmd/sssd.
Since a package update the servers received on December 16th, OpenDKIM steadily consumes file descriptors until it hits its limit. At that point errors are thrown and mail handling begins to break, because the service stops responding the way Postfix expects.
In the mail logs I will see:
opendkim: OpenDKIM Filter: accept() returned invalid socket (Numerical result out of range), try again
postfix/smtpd: warning: milter unix:/var/spool/opendkim/opendkim.sock: can't read SMFIC_OPTNEG reply packet header: Connection reset by peer
postfix/smtpd: warning: milter unix:/var/spool/opendkim/opendkim.sock: read error in initial handshake
These messages only reach the logs after OpenDKIM has exhausted its file descriptor allotment, so they are a symptom of the problem rather than a helpful pointer to its cause.
Running lsof for the opendkim user highlights the problem. A healthy server shows only a handful of these lines, but on an affected server they keep increasing until the service breaks:
opendkim 231880 opendkim 58u unix 0xffff9352878650c0 0t0 1227972 type=STREAM (CONNECTED)
opendkim 231880 opendkim 59u unix 0xffff935291b70880 0t0 1227387 type=STREAM (CONNECTED)
opendkim 231880 opendkim 60u unix 0xffff935291b72a80 0t0 1227530 type=STREAM (CONNECTED)
opendkim 231880 opendkim 61u unix 0xffff9353654faec0 0t0 1226695 type=STREAM (CONNECTED)
opendkim 231880 opendkim 62u unix 0xffff935291b75d80 0t0 1227535 type=STREAM (CONNECTED)
opendkim 231880 opendkim 63u unix 0xffff93534a4a3300 0t0 1229268 type=STREAM (CONNECTED)
opendkim 231880 opendkim 64u unix 0xffff935287862200 0t0 1228156 type=STREAM (CONNECTED)
opendkim 231880 opendkim 65u unix 0xffff9353654fbfc0 0t0 1229846 type=STREAM (CONNECTED)
opendkim 231880 opendkim 66u unix 0xffff935283732ec0 0t0 1229863 type=STREAM (CONNECTED)
opendkim 231880 opendkim 67u unix 0xffff93534a7050c0 0t0 1227583 type=STREAM (CONNECTED)
opendkim 231880 opendkim 68u unix 0xffff9352856b2a80 0t0 1227614 type=STREAM (CONNECTED)
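For anyone who wants to watch the growth over time without re-running lsof by hand, here is a minimal sketch that counts a process's descriptors by reading /proc/<pid>/fd on Linux. The function name count_fds is my own for illustration, and the PID would be whatever the opendkim process is on your server (231880 in the output above).

```python
import os


def count_fds(pid):
    """Return (total_fds, socket_fds) for a process by reading /proc/<pid>/fd.

    Linux only. Each entry in /proc/<pid>/fd is a symlink; sockets show up
    with a target of the form "socket:[inode]".
    """
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    sockets = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        total += 1
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets
```

Calling count_fds(231880) every few seconds while mail is flowing should show the socket count climbing on an affected server and staying roughly flat on a healthy one. Reading another user's /proc/<pid>/fd normally requires root.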
In testing I have narrowed down the problem packages to these ones:
When I roll back a test server to the 2.8.2-3 versions of the above packages and restart the OpenDKIM service, the problem goes away completely.
I have been unable to find any helpful log output anywhere. To reproduce the issue I have to hit my test server with a load of about 10 emails/sec, sent from a domain that OpenDKIM is configured to sign. If all the mail is from domains that aren't in the SigningTable, the problem does not occur.
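In case it helps anyone trying to reproduce this, the load can be approximated with a small script like the sketch below. It is not the exact tool I use. The addresses, the localhost:25 target, the message count, and the choice to open a new SMTP connection per message are all assumptions to adjust for your own setup.

```python
import smtplib
import time
from email.message import EmailMessage


def build_message(sender, recipient, n):
    """Build a small numbered test message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"opendkim load test {n}"
    msg.set_content("test body\n")
    return msg


def send_at_rate(send, messages, rate_per_sec,
                 sleep=time.sleep, clock=time.monotonic):
    """Call send(msg) for each message, paced to roughly rate_per_sec."""
    interval = 1.0 / rate_per_sec
    next_due = clock()
    sent = 0
    for msg in messages:
        delay = next_due - clock()
        if delay > 0:
            sleep(delay)
        send(msg)
        sent += 1
        next_due += interval
    return sent


def send_one(msg, host="localhost", port=25):
    """Deliver one message over a fresh SMTP connection (assumed setup)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


def main():
    # Hypothetical addresses: the sender domain must be one that OpenDKIM
    # is configured to sign, otherwise the problem does not show up.
    sender = "app@signed.example"
    recipient = "sink@example.net"
    messages = (build_message(sender, recipient, i) for i in range(600))
    send_at_rate(send_one, messages, rate_per_sec=10)
```

Calling main() sends 600 messages at about 10 per second, which is roughly a minute of the load described above. The pacing is sequential, so if a single delivery takes longer than 0.1 seconds the real rate will be lower.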
Any thoughts on this would be appreciated.