Overview

The thinkforce ARM Server is a multi-core ARM server. Its hardware resources are as follows.

CPU information:

yuxun@yuxun:/$ lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          80
On-line CPU(s) list:             0-79
Thread(s) per core:              1
Core(s) per socket:              40
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       ARM
Model:                           1
Stepping:                        r1p1
BogoMIPS:                        50.00
NUMA node0 CPU(s):               0-39
NUMA node1 CPU(s):               40-79
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp

NUMA node to vCPU mapping: node0: [0~39], node1: [40~79].

PCIe attached devices

root@yuxun:~# lspci -tv
-+-[0001:40]---00.0-[41-51]----00.0 3D Graphics controller Device 1ec8:8810
 +-[0001:00]---00.0-[01-09]----00.0-[02-09]--+-00.0-[03]----00.0  Device 025e:f1ac  [NVM Express]
 |                                           +-04.0-[04]----00.0  ASMedia Technology Inc. Device 0625  [ SATA controller AHCI 1.0]
 |                                           +-06.0-[05-06]--+-00.0  Intel Corporation I350 Gigabit Network Connection
 |                                           |               \-00.1  Intel Corporation I350 Gigabit Network Connection
 |                                           +-07.0-[07-08]----00.0-[08]----00.0  ASPEED Technology, Inc. ASPEED Graphics Family [VGA controller]
 |                                           \-08.0-[09]--
 |
 +-[0000:40]---00.0-[41-51]----00.0 3D Graphics controller Device 1ec8:8810
 \-[0000:00]---00.0-[01]--+-00.0  Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
                          \-00.1  Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

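The machine has 2 GPU cards, one 10-Gigabit NIC, one onboard Gigabit NIC, and 2 TB NVMe and AHCI solid-state drives.

Server application environment

The system runs Ubuntu 20 on the 80-core ARM Server and hosts 62 Android 11 Docker containers. To balance resource usage, the interrupt affinity of the NIC, the GPUs, and the storage is planned together.
The basic scheme reserves 8 cores on each node, 16 cores in total; the remaining 64 cores are assigned to the containers.

Check the irqbalance status

Before configuring interrupt affinity, check the state of the system's irqbalance service: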

root@yuxun:~# service irqbalance status
● irqbalance.service - irqbalance daemon
     Loaded: loaded (/lib/systemd/system/irqbalance.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-09-26 09:24:09 CST; 4h 42min ago
       Docs: man:irqbalance(1)
             https://github.com/Irqbalance/irqbalance
   Main PID: 1774 (irqbalance)
      Tasks: 2 (limit: 98302)
     Memory: 3.3M
     CGroup: /system.slice/irqbalance.service
             └─1774 /usr/sbin/irqbalance --foreground

Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp55@pci:0000:01:00.1(206) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp56@pci:0000:01:00.1(207) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp57@pci:0000:01:00.1(208) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp58@pci:0000:01:00.1(209) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp59@pci:0000:01:00.1(210) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp60@pci:0000:01:00.1(211) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp61@pci:0000:01:00.1(212) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp62@pci:0000:01:00.1(213) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ inno-drv(215) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ inno-drv(217) guessed as class 0

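The log shows eight completion interrupts (mlx5_comp55 to mlx5_comp62) of the NIC at pci:0000:01:00.1, and inno-drv is the GPU interrupt. All of them are registered with the irqbalance service and are subject to its balancing policy.

Common irqbalance operations:

Start the irqbalance service:  service irqbalance start

Stop the irqbalance service:  service irqbalance stop

Disable irqbalance at boot:  systemctl disable irqbalance  (on SysV-init systems that ship chkconfig: chkconfig --level 123456 irqbalance off)

In addition, /proc/interrupts shows the interrupt counts of every CPU, and top shows the share of CPU time spent on interrupts (the hi and si fields).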
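
As a rough illustration of reading /proc/interrupts, the sketch below sums the counters of all numbered IRQ rows per CPU. The function name irq_totals_per_cpu and its optional file argument are hypothetical helpers introduced here, not part of the original setup.

```shell
# irq_totals_per_cpu [FILE]: sum the per-CPU counters of all numbered IRQ rows.
# FILE defaults to /proc/interrupts; another file can be passed for a dry run.
irq_totals_per_cpu() {
    awk '
        NR == 1 { ncpu = NF; next }              # header row: CPU0 CPU1 ...
        $1 ~ /^[0-9]+:$/ {                       # numbered IRQs only, skip IPI rows
            for (i = 2; i <= ncpu + 1; i++) total[i - 2] += $i
        }
        END { for (c = 0; c < ncpu; c++) printf "CPU%d %d\n", c, total[c] }
    ' "${1:-/proc/interrupts}"
}
```

Running irq_totals_per_cpu before and after a load test and comparing the two outputs gives a quick view of which cores actually took the interrupts.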

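irqbalance sensitivity to the environment

When the CPUs are in Performance mode, irqbalance spreads interrupts as evenly as possible across the CPU cores, to make full use of them and improve performance.

When the CPUs are in Power-save mode, irqbalance concentrates interrupts on the first CPU so that the other idle CPUs can stay asleep longer, which lowers power consumption.

The CPUs therefore need to run in Performance mode, so that irqbalance does not re-steer the NIC and GPU interrupts onto a single CPU.

Stop irqbalance and disable it at boot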

root@yuxun:~# service irqbalance status
● irqbalance.service - irqbalance daemon
     Loaded: loaded (/lib/systemd/system/irqbalance.service; disabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-09-26 09:24:09 CST; 5h 42min ago
       Docs: man:irqbalance(1)
             https://github.com/Irqbalance/irqbalance
   Main PID: 1774 (irqbalance)
      Tasks: 2 (limit: 98302)
     Memory: 3.4M
     CGroup: /system.slice/irqbalance.service
             └─1774 /usr/sbin/irqbalance --foreground

Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp55@pci:0000:01:00.1(206) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp56@pci:0000:01:00.1(207) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp57@pci:0000:01:00.1(208) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp58@pci:0000:01:00.1(209) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp59@pci:0000:01:00.1(210) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp60@pci:0000:01:00.1(211) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp61@pci:0000:01:00.1(212) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp62@pci:0000:01:00.1(213) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ inno-drv(215) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ inno-drv(217) guessed as class 0
root@yuxun:~# systemctl stop irqbalance
root@yuxun:~# service irqbalance status
● irqbalance.service - irqbalance daemon
     Loaded: loaded (/lib/systemd/system/irqbalance.service; disabled; vendor preset: enabled)
     Active: inactive (dead)
       Docs: man:irqbalance(1)
             https://github.com/Irqbalance/irqbalance

Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp58@pci:0000:01:00.1(209) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp59@pci:0000:01:00.1(210) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp60@pci:0000:01:00.1(211) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp61@pci:0000:01:00.1(212) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ mlx5_comp62@pci:0000:01:00.1(213) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ inno-drv(215) guessed as class 0
Sep 26 11:12:18 yuxun /usr/sbin/irqbalance[1774]: IRQ inno-drv(217) guessed as class 0
Sep 26 15:07:26 yuxun systemd[1]: Stopping irqbalance daemon...
Sep 26 15:07:26 yuxun systemd[1]: irqbalance.service: Succeeded.
Sep 26 15:07:26 yuxun systemd[1]: Stopped irqbalance daemon.

When IRQ affinity is set by hand, consider stopping the irqbalance service; otherwise irqbalance will readjust the affinity settings on its own.

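How to compute the interrupt mask

Configuring interrupt affinity requires a clear understanding of how the mask is computed.
"f" is a hexadecimal value whose binary form is "1111". Each bit of the binary value stands for one CPU of the server, so the individual CPUs are written as:

          binary  hex
  CPU 0   0001    1
  CPU 1   0010    2
  CPU 2   0100    4
  CPU 3   1000    8

Combining these masks (a bitwise OR, which for distinct CPUs is the same as adding the hexadecimal values) selects several CPUs at once. For example, CPU 0 and CPU 2 together give:

          binary  hex
  CPU 0   0001    1
+ CPU 2   0100    4
  -------------------
  bitmask 0101    5

Selecting all four CPUs at once gives:

          binary  hex
  CPU 0   0001    1
  CPU 1   0010    2
  CPU 2   0100    4
+ CPU 3   1000    8
  -------------------
  bitmask 1111    f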

Worked example:
thinkforce ARM Server, 80 cores

/# cat /proc/irq/default_smp_affinity
ffff,ffffffff,ffffffff

Mask layout:

f      fff,ffffffff,ffff   f      f     f    f     //> mask
76~79  16~75               12~15  8~11  4~7  0~3   //> vCPU numbers

Each hexadecimal digit (4 bits) covers 4 cores; the GPU core binding example shows the rule in practice.
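
The mask rule can be sketched as a small shell function. This is a minimal sketch: the name cpus_to_mask is hypothetical, and it pads every 32-bit group to 8 hex digits, whereas the kernel prints the top group shortened (for 80 CPUs, "0000" rather than "00000000").

```shell
# cpus_to_mask NCPUS CPU...: print an smp_affinity-style mask
# (comma-separated 32-bit groups, most significant group first).
cpus_to_mask() {
    local ncpus group word cpu out
    ncpus=$1; shift
    group=$(( (ncpus + 31) / 32 - 1 ))       # index of the highest 32-bit group
    out=""
    while [ "$group" -ge 0 ]; do
        word=0
        for cpu in "$@"; do
            # set the bit only if this CPU belongs to the current group
            if [ $(( cpu / 32 )) -eq "$group" ]; then
                word=$(( word | (1 << (cpu % 32)) ))
            fi
        done
        out="$out$(printf '%08x' "$word")"
        if [ "$group" -gt 0 ]; then out="$out,"; fi
        group=$(( group - 1 ))
    done
    printf '%s\n' "$out"
}
```

For example, cpus_to_mask 4 0 2 prints 00000005, and cpus_to_mask 80 36 prints 00000000,00000010,00000000. When only a CPU number is needed, writing it to /proc/irq/<irq>/smp_affinity_list avoids the mask arithmetic altogether.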

<1>. GPU core binding

Current state of the thinkforce PCIe GPUs

<1.1> gpu0 slot information and interrupt

0000:41:00.0 3D controller: Device 1ec8:8810
Subsystem: Device 1ec8:8810
Flags: bus master, fast devsel, latency 0, IRQ 215, NUMA node 0
Memory at 6400000000 (64-bit, prefetchable) [size=256M]
Memory at 6000000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at d0000000 [virtual] [disabled] [size=512K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
Capabilities: [158] Secondary PCI Express
Capabilities: [178] Physical Layer 16.0 GT/s <?>
Capabilities: [1a0] Lane Margining at the Receiver <?>
Capabilities: [1c8] Single Root I/O Virtualization (SR-IOV)
Capabilities: [208] L1 PM Substates
Capabilities: [218] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [318] Data Link Feature <?>
Capabilities: [324] Vendor Specific Information: ID=0006 Rev=0 Len=018 <?>
Kernel driver in use: inno-drv

<1.2> gpu1 slot information and interrupt

0001:41:00.0 3D controller: Device 1ec8:8810
Subsystem: Device 1ec8:8810
Flags: bus master, fast devsel, latency 0, IRQ 217, NUMA node 1
Memory at e400000000 (64-bit, prefetchable) [size=256M]
Memory at e000000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at 80d0000000 [virtual] [disabled] [size=512K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
Capabilities: [158] Secondary PCI Express
Capabilities: [178] Physical Layer 16.0 GT/s <?>
Capabilities: [1a0] Lane Margining at the Receiver <?>
Capabilities: [1c8] Single Root I/O Virtualization (SR-IOV)
Capabilities: [208] L1 PM Substates
Capabilities: [218] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [318] Data Link Feature <?>
Capabilities: [324] Vendor Specific Information: ID=0006 Rev=0 Len=018 <?>
Kernel driver in use: inno-drv

Query the inno GPU interrupt numbers

yuxun@yuxun:~$ cat /proc/interrupts | grep inno | awk '{print $1}' | cut -f1 -d":"
215
217

Bind GPU0 (IRQ 215) to a core

Route the GPU0 interrupt to core 36

yuxun@yuxun:~$ cat /proc/irq/215/smp_affinity
0000,00000004,00000000
yuxun@yuxun:~$ cat /proc/irq/215/smp_affinity_list
34
root@yuxun:/home/yuxun# echo 10,00000000 > /proc/irq/215/smp_affinity
root@yuxun:/home/yuxun# cat /proc/irq/215/smp_affinity
0000,00000010,00000000
root@yuxun:/home/yuxun# cat /proc/irq/215/smp_affinity_list
36

Bind GPU1 (IRQ 217) to a core

Route IRQ 217 to core 76

root@yuxun:/home/yuxun# echo 1000,00000000,00000000 > /proc/irq/217/smp_affinity
root@yuxun:/home/yuxun# cat /proc/irq/217/smp_affinity_list
76
root@yuxun:/home/yuxun# cat /proc/irq/217/smp_affinity
1000,00000000,00000000

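GPU interrupt affinity is now set: the two GPUs are bound to core 36 on node0 and core 76 on node1.

<2>. NIC core binding

Current state of the NICs in the thinkforce PCIe slots

<2.1> Onboard PCIe NIC I350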
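
The binding steps above can be wrapped in a small helper. This is only a sketch: bind_irq and the IRQ_ROOT override are hypothetical names introduced here (IRQ_ROOT exists so the function can be tried against a scratch directory), and the helper writes the CPU number to smp_affinity_list instead of a hex mask to smp_affinity.

```shell
# bind_irq IRQ CPU: pin one IRQ to one CPU via smp_affinity_list (needs root).
# IRQ_ROOT is a hypothetical override for dry runs; it defaults to /proc/irq.
bind_irq() {
    local irq cpu target
    irq=$1; cpu=$2
    target="${IRQ_ROOT:-/proc/irq}/$irq/smp_affinity_list"
    if [ ! -w "$target" ]; then
        echo "cannot write $target (not root, or IRQ $irq does not exist)" >&2
        return 1
    fi
    echo "$cpu" > "$target"
}
```

With this helper, bind_irq 215 36 and bind_irq 217 76 would repeat the two bindings shown above. Values written under /proc/irq do not survive a reboot, so such calls would have to be rerun at boot.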

0001:05:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
        Flags: bus master, fast devsel, latency 0, IRQ 28, NUMA node 1
        Memory at ffe1b00000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 0000
        Memory at ffe1b40000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number f0-41-c8-ff-ff-c2-e4-74
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [1a0] Transaction Processing Hints
        Capabilities: [1c0] Latency Tolerance Reporting
        Capabilities: [1d0] Access Control Services
        Kernel driver in use: igb
        Kernel modules: igb

0001:05:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
        Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 1
        Memory at ffe1b20000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 0000
        Memory at ffe1b44000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number f0-41-c8-ff-ff-c2-e4-74
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [1a0] Transaction Processing Hints
        Capabilities: [1d0] Access Control Services
        Kernel driver in use: igb
        Kernel modules: igb

<2.2> MT27710 optical-port NIC

0000:01:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
        Subsystem: Mellanox Technologies ConnectX-4 Lx Stand-up dual-port 10GbE MCX4121A-XCAT
        Flags: bus master, fast devsel, latency 0, IRQ 43, NUMA node 0
        Memory at 7c00000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at 7fe0000000 [disabled] [size=1M]
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [40] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [180] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [1c0] Secondary PCI Express
        Capabilities: [230] Access Control Services
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core

0000:01:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
        Subsystem: Mellanox Technologies ConnectX-4 Lx Stand-up dual-port 10GbE MCX4121A-XCAT
        Flags: bus master, fast devsel, latency 0, IRQ 149, NUMA node 0
        Memory at 7c02000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at 7fe0100000 [disabled] [size=1M]
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
        Capabilities: [c0] Vendor Specific Information: Len=18 <?>
        Capabilities: [40] Power Management version 3
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [180] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [230] Access Control Services
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core

Get the names of the NICs in state UP

yuxun@yuxun:~$ ip a |grep "state UP" |sed "/docker/d" |awk '{print $2}' |cut -f1 -d":"
enp1s0f1

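The platform uses the PCIe MT27710 optical-port NIC; during testing it is connected through an optical-to-electrical conversion module. The interfaces are: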

enp1s0f0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether b8:59:9f:e3:92:66  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp1s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.31.52  netmask 255.255.255.0  broadcast 192.168.31.255
        inet6 fe80::ba59:9fff:fee3:9267  prefixlen 64  scopeid 0x20<link>
        ether b8:59:9f:e3:92:67  txqueuelen 1000  (Ethernet)
        RX packets 2449468  bytes 3646926557 (3.6 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 135330  bytes 13037696 (13.0 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Get the NIC bus-info

yuxun@yuxun:~$ ethtool -i enp1s0f1
driver: mlx5_core
version: 5.0-0
firmware-version: 14.25.1020 (MT_2420110004)
expansion-rom-version:
bus-info: 0000:01:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

yuxun@yuxun:~$ ethtool -i enp1s0f1 | grep bus-info | awk '{ print $2 }'
0000:01:00.1

Change the number of NIC queues

yuxun@yuxun:~$ ethtool -l enp1s0f1
Channel parameters for enp1s0f1:
Pre-set maximums:
RX:             0
TX:             0
Other:          0
Combined:       63
Current hardware settings:
RX:             0
TX:             0
Other:          0
Combined:       63
//> switch to root with sudo -s
root@yuxun:/home/yuxun# ethtool -L enp1s0f1 combined 8
root@yuxun:/home/yuxun# ethtool -l enp1s0f1
Channel parameters for enp1s0f1:
Pre-set maximums:
RX:             0
TX:             0
Other:          0
Combined:       63
Current hardware settings:
RX:             0
TX:             0
Other:          0
Combined:       8
//> list the NIC queues
# ls /sys/class/net/enp1s0f1/queues/
rx-0  rx-1  rx-10  rx-11  rx-12  rx-13  rx-14  rx-15  rx-2  rx-3  rx-4  rx-5  rx-6  rx-7  rx-8  rx-9  tx-0  tx-1  tx-2  tx-3  tx-4  tx-5  tx-6  tx-7

Get the interrupt numbers that belong to the bus-info

ethtool -i enp1s0f1 |grep bus-info|awk '{print $2}'
0000:01:00.1
root@yuxun:/home/yuxun# cat /proc/interrupts |grep "0000:01:00.1"|awk -F: '{print $1}'
150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169
170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 
190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209
210 211 212 213

Query the interrupt affinity

To see which vCPUs an interrupt is routed to, run the following command, where ${interrupt_num} is one of the NIC interrupt numbers found above: cat /proc/irq/${interrupt_num}/smp_affinity_list

yuxun@yuxun:~$ cat /proc/irq/150/smp_affinity
0000,00000000,10000000
yuxun@yuxun:~$ cat /proc/irq/150/smp_affinity_list
28
yuxun@yuxun:~$ cat /proc/irq/151/smp_affinity_list
21
yuxun@yuxun:~$ cat /proc/irq/153/smp_affinity_list
10
yuxun@yuxun:~$ cat /proc/irq/154/smp_affinity_list
30
yuxun@yuxun:~$ cat /proc/irq/155/smp_affinity_list
37
yuxun@yuxun:~$ cat /proc/irq/166/smp_affinity_list
28
yuxun@yuxun:~$ cat /proc/irq/180/smp_affinity_list
21
yuxun@yuxun:~$ cat /proc/irq/190/smp_affinity_list
14
yuxun@yuxun:~$ cat /proc/irq/200/smp_affinity_list
22
yuxun@yuxun:~$ cat /proc/irq/213/smp_affinity_list
33
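
Rather than querying the interrupts one by one, the lookup can be looped. The sketch below uses two hypothetical helpers, irqs_for and show_affinity; it is not the find_interrupt_on_cpus.sh script used later in this document, whose contents are not shown here.

```shell
# irqs_for PATTERN [FILE]: print the IRQ numbers whose row contains PATTERN.
# FILE defaults to /proc/interrupts.
irqs_for() {
    awk -v pat="$1" 'index($0, pat) { sub(/:$/, "", $1); print $1 }' \
        "${2:-/proc/interrupts}"
}

# show_affinity PATTERN: print every matching IRQ with its current CPU list.
show_affinity() {
    local irq
    for irq in $(irqs_for "$1"); do
        printf 'irq %s -> cpus %s\n' "$irq" \
            "$(cat "/proc/irq/$irq/smp_affinity_list")"
    done
}
```

For the NIC above, show_affinity 0000:01:00.1 would list all of its interrupts, and show_affinity inno the two GPU interrupts.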

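<2.3> Enable RPS

First, the kernel must be built with the CONFIG_RPS option; then set the CPUs that each queue's receive softirq work may be distributed to. The goal is to spread the receive softirqs of enp1s0f1 evenly over the 64 container CPUs. Note that the mask written here, 00f,ffffff00,ffffffff, selects CPUs 0~31 and 40~67, that is 60 CPUs; covering CPUs 0~31 and 40~71 (64 CPUs) would take the mask 0ff,ffffff00,ffffffff.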

root@yuxun:~# echo 00f,ffffff00,ffffffff > /sys/class/net/enp1s0f1/queues/rx-0/rps_cpus
root@yuxun:~# echo 00f,ffffff00,ffffffff > /sys/class/net/enp1s0f1/queues/rx-1/rps_cpus
root@yuxun:~# echo 00f,ffffff00,ffffffff > /sys/class/net/enp1s0f1/queues/rx-2/rps_cpus
root@yuxun:~# echo 00f,ffffff00,ffffffff > /sys/class/net/enp1s0f1/queues/rx-3/rps_cpus
root@yuxun:~# cat /sys/class/net/enp1s0f1/queues/rx-4/rps_cpus
0000,00000000,00000000
root@yuxun:/home/yuxun# cat /sys/class/net/enp1s0f1/queues/rx-3/rps_cpus
000f,ffffff00,ffffffff
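
The transcript above sets rx-0 to rx-3 one at a time and leaves rx-4 untouched. A loop over every receive queue could look like the sketch below; set_rps_cpus and the SYSFS_NET override are hypothetical names, the latter only there so the loop can be tried against a scratch directory.

```shell
# set_rps_cpus DEV MASK: write MASK to rps_cpus of every RX queue of DEV.
# SYSFS_NET is a hypothetical override for dry runs; default /sys/class/net.
set_rps_cpus() {
    local dev mask root q
    dev=$1; mask=$2; root=${SYSFS_NET:-/sys/class/net}
    for q in "$root/$dev"/queues/rx-*; do
        [ -e "$q/rps_cpus" ] || continue     # skip if the glob matched nothing
        echo "$mask" > "$q/rps_cpus"
    done
}
```

Run as root, set_rps_cpus enp1s0f1 00f,ffffff00,ffffffff would apply the mask used above to all receive queues.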

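<2.4> Enable RFS

RFS also requires the CONFIG_RPS kernel option, and it only takes effect once the flow table sizes are set. RFS relies on two tables: the global socket flow table (rps_sock_flow_table) and the per-device flow table (rps_dev_flow_table).
The global socket flow table records, for each flow, the CPU on which the application last handled that flow, that is, the desired CPU. A device flow table exists for every receive queue of every network device and records the CPU currently used by each flow that still has packets outstanding, that is, the current CPU.
The CPU is chosen by roughly the following rules:
<1>. If both tables record the same CPU for the flow, that CPU is used.
<2>. If the current CPU is unset or offline, the desired CPU from the global socket flow table is used.
<3>. If the two tables record different CPUs for the flow:
     (a) If earlier packets of the flow are still unprocessed, the CPU is not changed, to avoid reordering; the current CPU is kept.
     (b) If the earlier packets of the flow have all been processed, the desired CPU is used.

Global socket flow table (rps_sock_flow_table); the recommended value is 32768. It is configured through:

/proc/sys/net/core/rps_sock_flow_entries

Device flow table (rps_dev_flow_table), configured through:

/sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt

The two are related as follows:

rps_sock_flow_entries = rps_flow_cnt * N

where N is the number of queues. For a single-queue NIC the two values are therefore the same.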
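
The relation between the two tables can be applied with a short script. This is a sketch under the assumption that every receive queue gets the same share; rps_flow_cnt_for and apply_rfs are hypothetical names.

```shell
# rps_flow_cnt_for ENTRIES NQUEUES: per-queue value, ENTRIES / NQUEUES.
rps_flow_cnt_for() {
    echo $(( $1 / $2 ))
}

# apply_rfs DEV ENTRIES: size the global table and every RX queue (needs root).
apply_rfs() {
    local dev entries n cnt q
    dev=$1; entries=$2
    n=$(ls -d "/sys/class/net/$dev"/queues/rx-* 2>/dev/null | wc -l)
    if [ "$n" -eq 0 ]; then
        echo "no RX queues found for $dev" >&2
        return 1
    fi
    cnt=$(rps_flow_cnt_for "$entries" "$n")
    echo "$entries" > /proc/sys/net/core/rps_sock_flow_entries
    for q in "/sys/class/net/$dev"/queues/rx-*; do
        echo "$cnt" > "$q/rps_flow_cnt"
    done
}
```

With 32768 global entries and 8 queues, rps_flow_cnt_for 32768 8 gives 4096 per queue.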

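<2.5> Enable XPS

XPS builds a mapping from CPUs to NIC transmit queues, so that the CPU handling the transmit softirq and the CPU sending the packets are the same, which keeps transmission cache-local.
There are two ways to map transmit queues:
<1>. CPU mapping: each transmit queue is restricted to a few CPUs, which narrows the set of CPUs contending for the queue and reduces lock overhead and cache misses. The most common setup is one-to-one, similar to the receive softirq binding described above.
It is set through:

/sys/class/net/<dev>/queues/tx-<n>/xps_cpus

<2>. Receive-queue mapping: the transmit queue is chosen from the receive queue, so that a flow's receive and transmit queues are handled on the same CPU, or on a given set of CPUs.
This works well for multi-threaded systems that receive and send continuously: handling both directions on one CPU interrupts other CPUs less and lets the application send on the same CPU right after receiving, which lowers both CPU usage and packet latency.
It is set through (not supported by every NIC):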

/sys/class/net/<dev>/queues/tx-<n>/xps_rxqs
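
The one-to-one CPU mapping can be sketched as follows. Mapping tx-<n> to CPU n is an assumption made purely for illustration (a real deployment would pick CPUs on the NIC's NUMA node), and single_cpu_mask, xps_one_to_one, and the SYSFS_NET override are hypothetical names.

```shell
# single_cpu_mask CPU: hex mask with only that CPU's bit set,
# written as comma-separated 32-bit groups.
single_cpu_mask() {
    local cpu group out
    cpu=$1
    group=$(( cpu / 32 ))
    out=$(printf '%x' $(( 1 << (cpu % 32) )))
    while [ "$group" -gt 0 ]; do             # append one zero group per 32 CPUs below
        out="$out,00000000"
        group=$(( group - 1 ))
    done
    printf '%s\n' "$out"
}

# xps_one_to_one DEV: map TX queue n to CPU n (illustrative assumption).
# SYSFS_NET is a hypothetical override for dry runs; default /sys/class/net.
xps_one_to_one() {
    local dev root q n
    dev=$1; root=${SYSFS_NET:-/sys/class/net}
    for q in "$root/$dev"/queues/tx-*; do
        [ -e "$q/xps_cpus" ] || continue
        n=${q##*tx-}                         # queue index taken from the directory name
        single_cpu_mask "$n" > "$q/xps_cpus"
    done
}
```

For instance, single_cpu_mask 36 prints 10,00000000, the same value written to smp_affinity for GPU0 earlier.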

Example: enabling RPS and RFS

$ sudo -s
# echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
# echo ffff,ffff00ff,ffffff00 > /sys/class/net/enp1s0f1/queues/rx-1/rps_cpus
# echo 4096 > /sys/class/net/enp1s0f1/queues/rx-0/rps_flow_cnt

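<2.6> Enable TSO

TSO (TCP Segmentation Offload) hands the segmentation of outgoing TCP data over to the NIC, which improves the performance of applications that move large amounts of data over TCP.
With TSO the CPU is relieved of this work, which noticeably lowers CPU utilization on the sending side. Enable TSO with ethtool:

root@yuxun:~# ethtool -K enp1s0f1 tso on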
root@yuxun:~# ethtool -k enp1s0f1
Features for enp1s0f1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: on
        tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: on [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]

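No error was reported when setting it. The ethtool -k output has no entry literally named tso; the feature is listed as tcp-segmentation-offload (with tx-tcp-segmentation and tx-tcp6-segmentation), and it reads on, so the setting took effect on the MT27710.
Enabling TSO on the port gives higher throughput.

<2.7> Features enabled by default on the NIC

GSO (Generic Segmentation Offload)

GSO (Generic Segmentation Offload) is a deferred segmentation technique. It is more general than TSO because it can segment packets without hardware support.
GSO is enabled by default on this NIC.

LRO (Large Receive Offload)

LRO (Large Receive Offload) merges several packets received by the NIC into one large packet before passing it to the network stack. This raises the system's receive capacity and reduces CPU load.
LRO is not enabled in this setup.

GRO (Generic Receive Offload)

GRO (Generic Receive Offload) is the software implementation of LRO; its merge conditions are both stricter and more flexible.
GRO is enabled by default on this NIC.

<2.8> Adjust interrupt coalescing

Interrupt coalescing merges the interrupt events of several received packets and delivers them as a single interrupt, which reduces the number of interrupts the NIC generates.

Effects of interrupt coalescing:
<1>. Fewer interrupts. <2>. Lower CPU utilization. <3>. Higher response latency. <4>. Higher overall throughput.
The effect is pronounced for small-packet traffic, at a slight cost in real-time behavior, so the delay should not be set too large. Here the coalescing parameters are increased.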

Adjusting the coalescing parameters:

root@yuxun:~# ethtool -c enp1s0f1
Coalesce parameters for enp1s0f1:
Adaptive RX: on  TX: on
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 8
rx-frames: 128
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 8
tx-frames: 128
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frames-low: 0
tx-usecs-low: 0
tx-frames-low: 0

rx-usecs-high: 0
rx-frames-high: 0
tx-usecs-high: 0
tx-frames-high: 0

//> adjust the interrupt delay
root@yuxun:~# ethtool -C enp1s0f1 adaptive-rx off adaptive-tx off rx-usecs 16 rx-frames 128 tx-usecs 16 tx-frames 128
rx-frames unmodified, ignoring
tx-frames unmodified, ignoring
root@yuxun:~#  ethtool -c enp1s0f1
Coalesce parameters for enp1s0f1:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 16
rx-frames: 128
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 16
tx-frames: 128
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frames-low: 0
tx-usecs-low: 0
tx-frames-low: 0

rx-usecs-high: 0
rx-frames-high: 0
tx-usecs-high: 0
tx-frames-high: 0

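To make sure the static values are used, adaptive moderation has to be disabled by turning Adaptive RX and Adaptive TX off. The adaptive tuning thresholds (pkt-rate-low, pkt-rate-high, and the -low/-high values) all read 0 on this NIC, which suggests that they are not supported.

rx-usecs: how long the receive interrupt is delayed. tx-usecs: how long the transmit interrupt is delayed.
rx-frames: number of packets received before an interrupt is raised. tx-frames: number of packets sent before an interrupt is raised.

<2.9> Testing

The netperf tool is used to generate traffic and measure request/response (RR) performance.
Each test type is run 3 times with a 10-second sleep in between, with 100, 500, and 1500 instances respectively; each instance runs for 60 seconds.

TCP_RR 1 byte: TCP request/response performance with small packets

netperf -t TCP_RR -H $serverip -c -C -l 60

UDP_RR 1 byte: UDP request/response performance with small packets

netperf -t UDP_RR -H $serverip -c -C -l 60

TCP_RR 256 byte: TCP request/response performance with large packets

netperf -t TCP_RR -H $serverip -c -C -l 60 -- -r256,256

UDP_RR 256 byte: UDP request/response performance with large packets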

netperf -t UDP_RR -H $serverip -c -C -l 60 -- -r256,256
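
The test plan above (3 runs per test type, 10 seconds apart, with 100, 500, and 1500 parallel instances) could be driven by a script along these lines. It is only a sketch: the function names and the log directory layout are hypothetical, and it assumes netperf is installed and a netserver is reachable at the given address.

```shell
# netperf_cmd PROTO SERVER [SIZE]: print the command line for one instance.
netperf_cmd() {
    local proto server size
    proto=$1; server=$2; size=${3:-1}
    if [ "$size" -gt 1 ]; then
        echo "netperf -t ${proto}_RR -H $server -c -C -l 60 -- -r$size,$size"
    else
        echo "netperf -t ${proto}_RR -H $server -c -C -l 60"
    fi
}

# run_parallel N OUTDIR CMD: start N copies of CMD, one log file each, and wait.
run_parallel() {
    local n outdir cmd i
    n=$1; outdir=$2; cmd=$3; i=0
    mkdir -p "$outdir"
    while [ "$i" -lt "$n" ]; do
        $cmd > "$outdir/$i.log" 2>&1 &       # $cmd is split into words on purpose
        i=$(( i + 1 ))
    done
    wait
}

# run_matrix SERVER: every test type, instance count, and repetition.
run_matrix() {
    local server proto size n run
    server=$1
    for proto in TCP UDP; do
        for size in 1 256; do
            for n in 100 500 1500; do
                for run in 1 2 3; do
                    run_parallel "$n" "netperf_${proto}_${size}_${n}_run${run}" \
                        "$(netperf_cmd "$proto" "$server" "$size")"
                    sleep 10
                done
            done
        done
    done
}
```

Calling run_matrix "$serverip" would then produce one log directory per combination, with one file per netperf instance.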

<3>. NVMe storage core binding

Current state of the PCIe storage devices

<3.1> 2 TB NVMe storage
0001:03:00.0 Non-Volatile memory controller: Device 025e:f1ac (prog-if 02 [NVM Express])
        Subsystem: Device 025e:f1ac
        Flags: bus master, fast devsel, latency 0, IRQ 25, NUMA node 1
        Memory at ffe1800000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable- Count=1/32 Maskable- 64bit+
        Capabilities: [b0] MSI-X: Enable+ Count=33 Masked-
        Capabilities: [c0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [1b8] Latency Tolerance Reporting
        Capabilities: [300] Secondary PCI Express
        Capabilities: [900] L1 PM Substates
        Capabilities: [910] Data Link Feature <?>
        Capabilities: [920] Lane Margining at the Receiver <?>
        Capabilities: [9c0] Physical Layer 16.0 GT/s <?>
        Kernel driver in use: nvme
        Kernel modules: nvme

<3.2> AHCI SATA controller
0001:04:00.0 SATA controller: ASMedia Technology Inc. Device 0625 (rev 01) (prog-if 01 [AHCI 1.0])
        Subsystem: ASMedia Technology Inc. Device 1060
        Flags: bus master, fast devsel, latency 0, IRQ 25, NUMA node 1
        Memory at ffe1a80000 (32-bit, non-prefetchable) [size=8K]
        Expansion ROM at ffe1a00000 [disabled] [size=512K]
        Capabilities: [78] Power Management version 3
        Capabilities: [80] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [800] Advanced Error Reporting
        Kernel driver in use: ahci
        Kernel modules: ahci

Check the nvme interrupt numbers and their current core binding.

root@yuxun:/home/yuxun# cat /proc/interrupts | grep nvme | awk -F: '{printf $1}'
 42 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
root@yuxun:/home/yuxun# cat /proc/irq/42/smp_affinity
0000,00800000,00000000
root@yuxun:/home/yuxun# cat /proc/irq/42/smp_affinity_list
55
root@yuxun:/home/yuxun# cat /proc/irq/53/smp_affinity_list
40-42
root@yuxun:/home/yuxun# cat /proc/irq/54/smp_affinity_list
43-45
root@yuxun:/home/yuxun# cat /proc/irq/55/smp_affinity_list
46-48
root@yuxun:/home/yuxun# cat /proc/irq/56/smp_affinity_list
49-51
root@yuxun:/home/yuxun# cat /proc/irq/57/smp_affinity_list
52-54
root@yuxun:/home/yuxun# cat /proc/irq/58/smp_affinity_list
55-57
root@yuxun:/home/yuxun# cat /proc/irq/59/smp_affinity_list
58-60
root@yuxun:/home/yuxun# cat /proc/irq/60/smp_affinity_list
61-63
root@yuxun:/home/yuxun# cat /proc/irq/61/smp_affinity_list
64-65
root@yuxun:/home/yuxun# cat /proc/irq/62/smp_affinity_list
66-67
root@yuxun:/home/yuxun# cat /proc/irq/63/smp_affinity_list
68-69
root@yuxun:/home/yuxun# cat /proc/irq/64/smp_affinity_list
70-71
root@yuxun:/home/yuxun# cat /proc/irq/65/smp_affinity_list
72-73
root@yuxun:/home/yuxun# cat /proc/irq/66/smp_affinity_list
74-75
root@yuxun:/home/yuxun# cat /proc/irq/67/smp_affinity_list
76-77
root@yuxun:/home/yuxun# cat /proc/irq/68/smp_affinity_list
78-79
root@yuxun:/home/yuxun# cat /proc/irq/69/smp_affinity_list
0-2
root@yuxun:/home/yuxun# cat /proc/irq/70/smp_affinity_list
3-5
root@yuxun:/home/yuxun# cat /proc/irq/71/smp_affinity_list
6-8
root@yuxun:/home/yuxun# cat /proc/irq/72/smp_affinity_list
9-11
root@yuxun:/home/yuxun# cat /proc/irq/73/smp_affinity_list
12-14
root@yuxun:/home/yuxun# cat /proc/irq/74/smp_affinity_list
15-17
root@yuxun:/home/yuxun# cat /proc/irq/75/smp_affinity_list
18-20
root@yuxun:/home/yuxun# cat /proc/irq/76/smp_affinity_list
21-23
root@yuxun:/home/yuxun# cat /proc/irq/77/smp_affinity_list
24-25
root@yuxun:/home/yuxun# cat /proc/irq/78/smp_affinity_list
26-27
root@yuxun:/home/yuxun# cat /proc/irq/79/smp_affinity_list
28-29
root@yuxun:/home/yuxun# cat /proc/irq/80/smp_affinity_list
30-31
root@yuxun:/home/yuxun# cat /proc/irq/81/smp_affinity_list
32-33
root@yuxun:/home/yuxun# cat /proc/irq/82/smp_affinity_list
34-35
root@yuxun:/home/yuxun# cat /proc/irq/83/smp_affinity_list
36-37
root@yuxun:/home/yuxun# cat /proc/irq/84/smp_affinity_list
38-39
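
The smp_affinity mask and the smp_affinity_list value shown above are two views of the same setting. As a cross-check, a mask can be decoded back into CPU numbers with a sketch like the following; mask_to_cpus is a hypothetical helper name.

```shell
# mask_to_cpus MASK: print the CPU numbers whose bits are set in a hex mask
# such as 0000,00800000,00000000 (commas are ignored).
mask_to_cpus() {
    local hex digit bit cpu out
    hex=$(printf '%s' "$1" | tr -d ',')
    cpu=0
    out=""
    while [ -n "$hex" ]; do
        digit=${hex#"${hex%?}"}              # last hex digit = lowest 4 CPUs left
        hex=${hex%?}
        for bit in 1 2 4 8; do
            if [ $(( 0x$digit & bit )) -ne 0 ]; then
                out="$out${out:+ }$cpu"
            fi
            cpu=$(( cpu + 1 ))
        done
    done
    printf '%s\n' "$out"
}
```

mask_to_cpus 0000,00800000,00000000 prints 55, which agrees with the smp_affinity_list value of IRQ 42 above.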

kvm guest ptimer interrupt

cat /proc/interrupts |grep "kvm guest ptimer" |awk -F: '{print $1}'

arch_timer interrupt

cat /proc/interrupts |grep arch_timer |awk -F: '{print $1}'

SATA ahci interrupt

yuxun$ cat /proc/interrupts |grep ahci |awk -F: '{print $1}'
 25

Interrupt distribution with the irqbalance service stopped

yuxun@yuxun:~/yx_xd_redroid/ibox-tool_android/scripts/yuxun$ ./find_interrupt_on_cpus.sh
if_name: enp1s0f1
interrupt total: 64
inet int_id: 150 vCPU_id: 0-39
inet int_id: 151 vCPU_id: 0
inet int_id: 152 vCPU_id: 1
inet int_id: 153 vCPU_id: 2
inet int_id: 154 vCPU_id: 3
inet int_id: 155 vCPU_id: 4
inet int_id: 156 vCPU_id: 5
inet int_id: 157 vCPU_id: 6
inet int_id: 158 vCPU_id: 7
inet int_id: 159 vCPU_id: 8
inet int_id: 160 vCPU_id: 9
inet int_id: 161 vCPU_id: 10
inet int_id: 162 vCPU_id: 11
inet int_id: 163 vCPU_id: 12
inet int_id: 164 vCPU_id: 13
inet int_id: 165 vCPU_id: 14
inet int_id: 166 vCPU_id: 15
inet int_id: 167 vCPU_id: 16
inet int_id: 168 vCPU_id: 17
inet int_id: 169 vCPU_id: 18
inet int_id: 170 vCPU_id: 19
inet int_id: 171 vCPU_id: 20
inet int_id: 172 vCPU_id: 21
inet int_id: 173 vCPU_id: 22
inet int_id: 174 vCPU_id: 23
inet int_id: 175 vCPU_id: 24
inet int_id: 176 vCPU_id: 25
inet int_id: 177 vCPU_id: 26
inet int_id: 178 vCPU_id: 27
inet int_id: 179 vCPU_id: 28
inet int_id: 180 vCPU_id: 29
inet int_id: 181 vCPU_id: 30
inet int_id: 182 vCPU_id: 31
inet int_id: 183 vCPU_id: 32
inet int_id: 184 vCPU_id: 33
inet int_id: 185 vCPU_id: 34
inet int_id: 186 vCPU_id: 35
inet int_id: 187 vCPU_id: 36
inet int_id: 188 vCPU_id: 37
inet int_id: 189 vCPU_id: 38
inet int_id: 190 vCPU_id: 39
inet int_id: 191 vCPU_id: 40
inet int_id: 192 vCPU_id: 41
inet int_id: 193 vCPU_id: 42
inet int_id: 194 vCPU_id: 43
inet int_id: 195 vCPU_id: 44
inet int_id: 196 vCPU_id: 45
inet int_id: 197 vCPU_id: 46
inet int_id: 198 vCPU_id: 47
inet int_id: 199 vCPU_id: 48
inet int_id: 200 vCPU_id: 49
inet int_id: 201 vCPU_id: 50
inet int_id: 202 vCPU_id: 51
inet int_id: 203 vCPU_id: 52
inet int_id: 204 vCPU_id: 53
inet int_id: 205 vCPU_id: 54
inet int_id: 206 vCPU_id: 55
inet int_id: 207 vCPU_id: 56
inet int_id: 208 vCPU_id: 57
inet int_id: 209 vCPU_id: 58
inet int_id: 210 vCPU_id: 59
inet int_id: 211 vCPU_id: 60
inet int_id: 212 vCPU_id: 61
inet int_id: 213 vCPU_id: 62
nvme interrupt total: 33
nvme int_id: 42 vCPU_id: 0-39
nvme int_id: 53 vCPU_id: 40-42
nvme int_id: 54 vCPU_id: 43-45
nvme int_id: 55 vCPU_id: 46-48
nvme int_id: 56 vCPU_id: 49-51
nvme int_id: 57 vCPU_id: 52-54
nvme int_id: 58 vCPU_id: 55-57
nvme int_id: 59 vCPU_id: 58-60
nvme int_id: 60 vCPU_id: 61-63
nvme int_id: 61 vCPU_id: 64-65
nvme int_id: 62 vCPU_id: 66-67
nvme int_id: 63 vCPU_id: 68-69
nvme int_id: 64 vCPU_id: 70-71
nvme int_id: 65 vCPU_id: 72-73
nvme int_id: 66 vCPU_id: 74-75
nvme int_id: 67 vCPU_id: 76-77
nvme int_id: 68 vCPU_id: 78-79
nvme int_id: 69 vCPU_id: 0-2
nvme int_id: 70 vCPU_id: 3-5
nvme int_id: 71 vCPU_id: 6-8
nvme int_id: 72 vCPU_id: 9-11
nvme int_id: 73 vCPU_id: 12-14
nvme int_id: 74 vCPU_id: 15-17
nvme int_id: 75 vCPU_id: 18-20
nvme int_id: 76 vCPU_id: 21-23
nvme int_id: 77 vCPU_id: 24-25
nvme int_id: 78 vCPU_id: 26-27
nvme int_id: 79 vCPU_id: 28-29
nvme int_id: 80 vCPU_id: 30-31
nvme int_id: 81 vCPU_id: 32-33
nvme int_id: 82 vCPU_id: 34-35
nvme int_id: 83 vCPU_id: 36-37
nvme int_id: 84 vCPU_id: 38-39
ahci interrupt total: 1
ahci int_id: 25 vCPU_id: 0-79
inno-gpu interrupt total: 2
inno-gpu int_id: 215 vCPU_id: 0-39
inno-gpu int_id: 217 vCPU_id: 40-79
arch-timer interrupt total: 1
arch-time int_id: 4 vCPU_id: 0-79
kvm-ptimer interrupt total: 1
kvm-ptimer int_id: 2 vCPU_id: 0-79
yuxun@yuxun:~/yx_xd_redroid/ibox-tool_android/scripts/yuxun$ systemctl status irqbalance
● irqbalance.service - irqbalance daemon
     Loaded: loaded (/lib/systemd/system/irqbalance.service; disabled; vendor preset: enabled)
     Active: inactive (dead)
       Docs: man:irqbalance(1)
             https://github.com/Irqbalance/irqbalance

With the irqbalance service started, the interrupt distribution is as follows:

yuxun@yuxun:~/yx_xd_redroid/ibox-tool_android/scripts/yuxun$ ./find_interrupt_on_cpus.sh
if_name: enp1s0f1
interrupt total: 64
inet int_id: 150 vCPU_id: 28
inet int_id: 151 vCPU_id: 32
inet int_id: 152 vCPU_id: 34
inet int_id: 153 vCPU_id: 10
inet int_id: 154 vCPU_id: 30
inet int_id: 155 vCPU_id: 37
inet int_id: 156 vCPU_id: 9
inet int_id: 157 vCPU_id: 2
inet int_id: 158 vCPU_id: 26
inet int_id: 159 vCPU_id: 11
inet int_id: 160 vCPU_id: 36
inet int_id: 161 vCPU_id: 12
inet int_id: 162 vCPU_id: 1
inet int_id: 163 vCPU_id: 39
inet int_id: 164 vCPU_id: 8
inet int_id: 165 vCPU_id: 4
inet int_id: 166 vCPU_id: 28
inet int_id: 167 vCPU_id: 13
inet int_id: 168 vCPU_id: 3
inet int_id: 169 vCPU_id: 7
inet int_id: 170 vCPU_id: 16
inet int_id: 171 vCPU_id: 30
inet int_id: 172 vCPU_id: 32
inet int_id: 173 vCPU_id: 29
inet int_id: 174 vCPU_id: 39
inet int_id: 175 vCPU_id: 35
inet int_id: 176 vCPU_id: 18
inet int_id: 177 vCPU_id: 0
inet int_id: 178 vCPU_id: 34
inet int_id: 179 vCPU_id: 38
inet int_id: 180 vCPU_id: 3
inet int_id: 181 vCPU_id: 20
inet int_id: 182 vCPU_id: 12
inet int_id: 183 vCPU_id: 6
inet int_id: 184 vCPU_id: 11
inet int_id: 185 vCPU_id: 15
inet int_id: 186 vCPU_id: 36
inet int_id: 187 vCPU_id: 21
inet int_id: 188 vCPU_id: 14
inet int_id: 189 vCPU_id: 5
inet int_id: 190 vCPU_id: 14
inet int_id: 191 vCPU_id: 8
inet int_id: 192 vCPU_id: 10
inet int_id: 193 vCPU_id: 17
inet int_id: 194 vCPU_id: 38
inet int_id: 195 vCPU_id: 23
inet int_id: 196 vCPU_id: 16
inet int_id: 197 vCPU_id: 9
inet int_id: 198 vCPU_id: 1
inet int_id: 199 vCPU_id: 26
inet int_id: 200 vCPU_id: 22
inet int_id: 201 vCPU_id: 19
inet int_id: 202 vCPU_id: 21
inet int_id: 203 vCPU_id: 25
inet int_id: 204 vCPU_id: 5
inet int_id: 205 vCPU_id: 31
inet int_id: 206 vCPU_id: 24
inet int_id: 207 vCPU_id: 18
inet int_id: 208 vCPU_id: 22
inet int_id: 209 vCPU_id: 24
inet int_id: 210 vCPU_id: 20
inet int_id: 211 vCPU_id: 27
inet int_id: 212 vCPU_id: 7
inet int_id: 213 vCPU_id: 33
nvme interrupt total: 33
nvme int_id: 42 vCPU_id: 55
nvme int_id: 53 vCPU_id: 40-42
nvme int_id: 54 vCPU_id: 43-45
nvme int_id: 55 vCPU_id: 46-48
nvme int_id: 56 vCPU_id: 49-51
nvme int_id: 57 vCPU_id: 52-54
nvme int_id: 58 vCPU_id: 55-57
nvme int_id: 59 vCPU_id: 58-60
nvme int_id: 60 vCPU_id: 61-63
nvme int_id: 61 vCPU_id: 64-65
nvme int_id: 62 vCPU_id: 66-67
nvme int_id: 63 vCPU_id: 68-69
nvme int_id: 64 vCPU_id: 70-71
nvme int_id: 65 vCPU_id: 72-73
nvme int_id: 66 vCPU_id: 74-75
nvme int_id: 67 vCPU_id: 76-77
nvme int_id: 68 vCPU_id: 78-79
nvme int_id: 69 vCPU_id: 0-2
nvme int_id: 70 vCPU_id: 3-5
nvme int_id: 71 vCPU_id: 6-8
nvme int_id: 72 vCPU_id: 9-11
nvme int_id: 73 vCPU_id: 12-14
nvme int_id: 74 vCPU_id: 15-17
nvme int_id: 75 vCPU_id: 18-20
nvme int_id: 76 vCPU_id: 21-23
nvme int_id: 77 vCPU_id: 24-25
nvme int_id: 78 vCPU_id: 26-27
nvme int_id: 79 vCPU_id: 28-29
nvme int_id: 80 vCPU_id: 30-31
nvme int_id: 81 vCPU_id: 32-33
nvme int_id: 82 vCPU_id: 34-35
nvme int_id: 83 vCPU_id: 36-37
nvme int_id: 84 vCPU_id: 38-39
ahci interrupt total: 1
ahci int_id: 25 vCPU_id: 64
inno-gpu interrupt total: 2
inno-gpu int_id: 215 vCPU_id: 36
inno-gpu int_id: 217 vCPU_id: 79
arch-timer interrupt total: 1
arch-time int_id: 4 vCPU_id: 0-79
kvm-ptimer interrupt total: 1
kvm-ptimer int_id: 2 vCPU_id: 0-79

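For the NVMe storage controller's IRQ affinity, the interrupt to configure is the one serving the Admin Queue (SQ/CQ), a resource shared by all cores.
In this example that is nvme int_id: 42 vCPU_id: 55. It can be moved by changing the vCPU that irq 42 is bound to.
Which core to bind it to depends on your hardware platform's resources and overall deployment requirements.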
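
As a minimal sketch of rebinding an IRQ such as irq 42, the function below writes a CPU list to the standard procfs file /proc/irq/<N>/smp_affinity_list described in the kernel's IRQ-affinity documentation. The function name set_irq_affinity and the PROC_IRQ_ROOT variable are hypothetical names introduced here for illustration; they are not part of find_interrupt_on_cpus.sh. Writing to the real /proc/irq requires root.

```shell
#!/bin/sh
# Sketch only: pin one IRQ to a CPU list through procfs.
# PROC_IRQ_ROOT is a hypothetical override so the function can be dry-run
# against a scratch directory; by default it points at the real /proc/irq.
PROC_IRQ_ROOT="${PROC_IRQ_ROOT:-/proc/irq}"

# set_irq_affinity IRQ CPU_LIST
#   IRQ       interrupt number, e.g. 42
#   CPU_LIST  CPU list in kernel list syntax, e.g. "55" or "40-47"
set_irq_affinity() {
    irq="$1"
    cpus="$2"
    target="$PROC_IRQ_ROOT/$irq/smp_affinity_list"

    # Refuse to continue if this IRQ has no affinity file.
    if [ ! -e "$target" ]; then
        echo "irq $irq: $target not found" >&2
        return 1
    fi

    # The write fails without root, or if the kernel rejects the request.
    if ! printf '%s\n' "$cpus" > "$target"; then
        echo "irq $irq: failed to write affinity '$cpus'" >&2
        return 1
    fi

    # Read the value back so the caller sees what is actually in effect.
    echo "irq $irq -> vCPU $(cat "$target")"
}
```

After sourcing the function in a root shell, a call such as `set_irq_affinity 42 40` would request that irq 42 be handled on vCPU 40; the value 40 is an arbitrary illustration, not a recommendation. While irqbalance is running it may move the interrupt again, so the IRQ would likely need to be excluded from balancing (irqbalance has a `--banirq` option) or the service stopped. The per-queue NVMe I/O interrupts are typically kernel-managed and may reject such writes, which would be consistent with irq 53-84 keeping the same vCPU ranges in both listings above.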

https://kernel.org/doc/Documentation/IRQ-affinity.txt
https://www.cnblogs.com/Bozh/archive/2013/03/21/2973769.html

09-29 08:51