我正在安装新安装的postgres 10.4,在负载测试期间,我不断收到内存不足的错误。

配置:

 $ cat /etc/security/limits.conf
 postgres   hard    memlock 508559360
 postgres   soft    memlock 508559360


$ cat /etc/sysctl.conf
vm.nr_hugepages = 248320
vm.hugetlb_shm_group = 118
vm.overcommit_memory=2
vm.swappiness=1
vm.vfs_cache_pressure=50
kernel.sem = 250 32000 32 128

系统规范:
$ cat /proc/cpuinfo | grep "core id" | wc -l
32

每个CPU都是2.10GHz @ Intel Xeon(R)CPU E5-2695 v4。如上所示,有16个物理核心,32个逻辑核心。
$ free -m
free -m
             total        used        free      shared  buff/cache   available
Mem:         510969       498048      2434      3098    10485        8343
Swap:        1071         1071        0

注意:我们将memlock设置为大约485 GB,直接用于postgres,显示在“使用过”的高栏中。
$ df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           250G   64K  250G   1% /dev/shm

$ cat /proc/meminfo
MemTotal:       523232496 kB
MemFree:         1216648 kB
MemAvailable:    8348808 kB
Buffers:           65220 kB
Cached:         10211780 kB
SwapCached:         7620 kB
Active:          5556092 kB
Inactive:        5432428 kB
Active(anon):    2208064 kB
Inactive(anon):  1593084 kB
Active(file):    3348028 kB
Inactive(file):  3839344 kB
Unevictable:       24128 kB
Mlocked:           24128 kB
SwapTotal:       1097724 kB
SwapFree:              0 kB
Dirty:                52 kB
Writeback:             0 kB
AnonPages:        728044 kB
Mapped:            62904 kB
Shmem:           3085592 kB
Slab:            1454276 kB
SReclaimable:    1345008 kB
SUnreclaim:       109268 kB
KernelStack:       11984 kB
PageTables:        22856 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    15770860 kB
Committed_AS:    7920136 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:     71680 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:   248320
HugePages_Free:    245883
HugePages_Rsvd:     1800
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      329600 kB
DirectMap2M:    534444032 kB

Postgres信息:
SELECT version();
PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit

max_connections=600
shared_buffers= '8192 MB'
work_mem = '20 MB'
maintenance_work_mem = 2GB
max_parallel_workers = 8
wal_buffers = 16MB
max_wal_size = 20GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = '364 GB'
default_statistics_target = 1000
log_timezone = 'US/Eastern'
track_activities = on
track_counts = on
track_io_timing = on
stats_temp_directory = 'pg_stat_tmp'
datestyle = 'iso, mdy'
timezone = 'US/Eastern'
default_text_search_config = 'pg_catalog.english'
transform_null_equals = on
shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 16384
track_functions = all
track_io_timing = true
pg_stat_statements.track = all
session_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '3s'
auto_explain.log_nested_statements='on'
auto_explain.log_analyze=true

注意:我已经开始使用设置为250 MB的高工作内存,并将其缓慢降低到20 MB,但仍然收到错误。我还验证了连接数不会超过120。在 session 模式下,我们在实例前面使用PGBouncer。

该错误对于堆栈溢出而言太大,但是在这里,它们是链接在一起:

https://codepad.co/snippet/derPU4E8

突出的错误是:
2018-06-18 15:02:22 EDT,28197,mydb,ERROR:  could not resize shared memory segment "/PostgreSQL.1552129380" to 192088 bytes: No space left on device

我不明白它在谈论什么设备。我没有什么似乎与OOM接近的,更不用说它谈论的很小的192088字节了。
2018-06-18 15:02:22 EDT,19708,mydb,ERROR:  out of memory
2018-06-18 15:02:22 EDT,19708,mydb,DETAIL:  Failed on request of size 7232.


2018-06-18 15:02:22 EDT,16688,,LOG:  could not fork worker process: Cannot allocate memory
2018-06-18 15:02:22 EDT,4555,,ERROR:  out of memory
2018-06-18 15:02:22 EDT,4555,,DETAIL:  Failed on request of size 78336.
2018-06-18 15:02:22 EDT,4552,,LOG:  could not open directory "/usr/lib/postgresql/10/share/timezone": Cannot allocate memory
2018-06-18 15:02:22 EDT,19935,mydb,ERROR:  could not load library "/usr/lib/postgresql/10/lib/auto_explain.so": /usr/lib/postgresql/10/lib/auto_explain.so: failed to map segment from shared object


2018-06-18 15:02:22 EDT,28193,mydb,ERROR:  out of memory
2018-06-18 15:02:22 EDT,28193,mydb,DETAIL:  Failed on request of size 8192.
2018-06-18 15:02:22 EDT,26927,mydb,ERROR:  could not resize shared memory segment "/PostgreSQL.1101931262" to 192088 bytes: No space left on device

问题:如何调试此问题,更重要的是如何解决此问题?

最佳答案

我遵循了Laurenz的建议,不再运行OOM。
vm.overcommit_ratio = 100

此外,我删除了sysctl/limits中的memlock和大页面定义。

关于postgresql - Postgres内存不足,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/50916940/

10-16 23:52