PostgreSQL standby instance fails to start | memory allocation error

I have a PostgreSQL cluster (primary and standby) running in replication mode. The primary instance is running fine, but the standby instance crashes every time I try to start it, and I get this error in the logs:

2024-07-24 16:55:58.834 EEST [3071] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-07-24 16:55:58.840 EEST [3071] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-07-24 16:55:58.850 EEST [3081] LOG:  database system was interrupted while in recovery at log time 2024-07-21 00:00:00 EEST
2024-07-24 16:55:58.850 EEST [3081] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
cp: cannot stat 'pg_wal/RECOVERYHISTORY': No such file or directory
2024-07-24 16:55:58.986 EEST [3081] LOG:  entering standby mode
cp: cannot stat 'pg_wal/RECOVERYXLOG': No such file or directory
2024-07-24 16:55:58.992 EEST [3081] LOG:  redo starts at 8/D4C5EFF0
cp: cannot stat 'pg_wal/RECOVERYXLOG': No such file or directory
2024-07-24 16:55:59.115 EEST [3081] LOG:  consistent recovery state reached at 8/D57519F8
2024-07-24 16:55:59.118 EEST [3081] FATAL:  invalid memory alloc request size 2016612846
2024-07-24 16:55:59.119 EEST [3071] LOG:  startup process (PID 3081) exited with exit code 1
2024-07-24 16:55:59.119 EEST [3071] LOG:  terminating any other active server processes
2024-07-24 16:55:59.119 EEST [3071] LOG:  shutting down due to startup process failure
2024-07-24 16:55:59.132 EEST [3071] LOG:  database system is shut down

Currently, transparent huge pages are disabled. The server has 32 GiB of memory, and vm.swappiness is set to 3.

These are the PostgreSQL memory and WAL parameters:

max_connections = 200
shared_buffers = 8GB            # min 128kB
huge_pages = try                        # on, off, or try
temp_buffers = 16MB     # min 800kB
work_mem = 20971kB                              # min 64kB
maintenance_work_mem = 2GB              # min 1MB
wal_level = replica                     # minimal, replica, or logical
max_worker_processes = 8                # (change requires restart)
max_parallel_workers_per_gather = 2     # taken from max_parallel_workers
max_parallel_maintenance_workers = 2    # taken from max_parallel_workers
max_parallel_workers = 8
fsync = on                              # flush data to disk for crash safety
synchronous_commit = on         # synchronization level;
wal_log_hints = on                      # also do full page writes of non-critical updates
wal_buffers = 16MB                      # min 32kB, -1 sets based on shared_buffers
checkpoint_completion_target = 0.9      # checkpoint target duration, 0.0 - 1.0
max_wal_size = 8GB
min_wal_size = 1GB
archive_mode = on               # enables archiving; off, on, or always
max_wal_senders = 20            # max number of walsender processes
wal_keep_size = 64              # in megabytes; 0 disables
hot_standby = off                       # "off" disallows queries during recovery
effective_cache_size = 20GB

The server runs EL 8 as a virtual machine on a physical hypervisor.

Looks like it’s trying to allocate about 2 GB of memory in a single request (2016612846 bytes).
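
As a rough check of that figure, the size from the FATAL line can be converted and compared against PostgreSQL's per-request cap. The constant below is an assumption on my part: as far as I know, ordinary allocations are limited to MaxAllocSize (0x3fffffff, i.e. 1 GiB - 1), so please verify it against your version's source. If that holds, the request is rejected by PostgreSQL itself no matter how much RAM is free, which would point at a bad length being read during replay rather than at the host running out of memory.

```python
# Size taken from the FATAL line in the standby log.
REQUESTED_BYTES = 2016612846

# Assumption: PostgreSQL's cap for an ordinary single allocation
# (MaxAllocSize), 1 GiB - 1. Verify against your server version.
MAX_ALLOC_SIZE = 0x3FFFFFFF


def describe_request(requested: int, limit: int = MAX_ALLOC_SIZE) -> dict:
    """Summarise a requested allocation size against a per-request limit."""
    return {
        "gib": round(requested / 2**30, 2),
        "exceeds_limit": requested > limit,
    }


info = describe_request(REQUESTED_BYTES)
print(info)
```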

Can you run ‘free’ on the server where it’s complaining?
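
If ‘free’ is not handy, roughly the same numbers can be read from /proc/meminfo on Linux. This is only a small sketch: the helper names `parse_meminfo` and `summarize` are made up for illustration, and it assumes the usual "Name: value kB" layout of that file.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def summarize(meminfo: dict) -> str:
    """Format the main memory and swap fields in GiB, one per line."""
    fields = ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree")
    return "\n".join(
        f"{f}: {meminfo[f] / 1024**2:.1f} GiB" for f in fields if f in meminfo
    )


if __name__ == "__main__":
    path = Path("/proc/meminfo")
    if path.exists():  # only present on Linux
        print(summarize(parse_meminfo(path.read_text())))
```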

There are also other “errors” before that, so something else could be wrong. Why is it talking about “recovery”?
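
On the “recovery” part: a standby spends its whole life in recovery, continuously replaying WAL from the primary, so that wording by itself is expected. The ‘cp: cannot stat’ lines carry no timestamp or PID prefix, which suggests they are stderr from an external command, presumably the restore_command (an assumption, since the post does not show it). RECOVERYXLOG and RECOVERYHISTORY are, as far as I know, the temporary names PostgreSQL passes as the restore target, so seeing them as the path cp cannot stat may mean the command's arguments deserve a second look. A small sketch to separate the two kinds of lines (the function name `classify` is just for illustration):

```python
import re

# Matches server lines such as:
# 2024-07-24 16:55:59.118 EEST [3081] FATAL:  invalid memory alloc request size 2016612846
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (?P<tz>\S+) "
    r"\[(?P<pid>\d+)\] (?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)


def classify(lines):
    """Split log lines into server entries and foreign output (e.g. from cp)."""
    server, foreign = [], []
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            server.append((m["level"], int(m["pid"]), m["msg"]))
        elif line.strip():
            foreign.append(line.strip())
    return server, foreign


# A few lines copied from the log in the question.
sample = [
    "2024-07-24 16:55:58.986 EEST [3081] LOG:  entering standby mode",
    "cp: cannot stat 'pg_wal/RECOVERYXLOG': No such file or directory",
    "2024-07-24 16:55:59.118 EEST [3081] FATAL:  invalid memory alloc request size 2016612846",
]

server, foreign = classify(sample)
fatal = [msg for level, _pid, msg in server if level == "FATAL"]
print("server entries:", len(server))
print("external output:", foreign)
print("fatal:", fatal)
```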

Maybe there’s a more verbose log setting, or a tool that can do a low-level consistency check?
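
Two candidates I know of: log_min_messages can be raised to one of the debug levels (debug1 to debug5) for more detail, and pg_controldata and pg_waldump can inspect the control file and the WAL without starting the server. The sketch below only builds and prints the command lines, it does not run anything. The data directory is a hypothetical placeholder, the start LSN is the "redo starts at" value from the log, and pg_waldump may additionally need --timeline if the standby is not on timeline 1.

```python
import shlex

# Hypothetical data directory; replace with the standby's real one.
DATA_DIR = "/var/lib/pgsql/data"

# "redo starts at" LSN copied from the log in the question.
REDO_START_LSN = "8/D4C5EFF0"


def build_checks(data_dir: str, start_lsn: str) -> list:
    """Return read-only inspection commands as argument lists."""
    return [
        # Prints control file contents (cluster state, checkpoint and redo locations).
        ["pg_controldata", data_dir],
        # Decodes WAL records starting near the given LSN.
        ["pg_waldump", f"--path={data_dir}/pg_wal", f"--start={start_lsn}"],
    ]


commands = build_checks(DATA_DIR, REDO_START_LSN)
for cmd in commands:
    print(shlex.join(cmd))
```

If pg_waldump stops with an error at about the same place the startup process dies, that would support the idea of a damaged WAL record on the standby, in which case re-seeding the standby from the primary is probably the simplest way out.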
