
A quick investigation into frequent Pod restarts caused by memory-limit OOM kills

枕头下放双臭袜子
2021-07-14 / 0 comments / 2 likes / 1,352 reads / 12,774 words

I happened to notice that the Pods of one production service had unusually high restart counts (we're a small shop and monitoring isn't fully in place yet, so bear with me). Inspecting the Pods showed the exit reason was OOM, i.e. the container had exceeded its memory limit, with the last exit around noon on July 12. The next step was to find which node the Pod had been scheduled onto and dig into the journalctl logs on that node for the root cause.

1. Finding the Pods with high restart counts

# kubectl get pod -A 
production   systembusiness-deployment-5c6969978b-6m4gx        1/1     Running   14      6d18h
production   systembusiness-deployment-5c6969978b-hx5lt        1/1     Running   12      6d18h
production   systembusiness-deployment-5c6969978b-qfh5x        1/1     Running   27      6d18h

The output shows that all three Pods under this Deployment have noticeably high restart counts.
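In a larger cluster the high-restart Pods can be surfaced instead of eyeballed. A small sketch: the `--sort-by` flag in the comment is standard kubectl (the JSONPath assumes single-container Pods), and offline the RESTARTS column of captured output can simply be sorted:

```shell
# Cluster-wide alternative (assumes one container per Pod):
#   kubectl get pods -A --sort-by='.status.containerStatuses[0].restartCount'
# Offline: sort the captured output by the RESTARTS column (5th field), descending
kubectl_output='production systembusiness-deployment-5c6969978b-6m4gx 1/1 Running 14 6d18h
production systembusiness-deployment-5c6969978b-hx5lt 1/1 Running 12 6d18h
production systembusiness-deployment-5c6969978b-qfh5x 1/1 Running 27 6d18h'
printf '%s\n' "$kubectl_output" | sort -k5 -n -r
```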

# kubectl describe pod -n production  systembusiness-deployment-5c6969978b-hx5lt
    State:          Running
      Started:      Mon, 12 Jul 2021 12:10:20 +0800
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Fri, 09 Jul 2021 20:44:31 +0800
      Finished:     Mon, 12 Jul 2021 12:10:20 +0800
    Ready:          True
    Restart Count:  27
    Limits:
      cpu:     500m
      memory:  1000Mi
    Requests:
      cpu:     100m
      memory:  100Mi

In the describe output, focus on two parts: the resource Limits and the Last State. Last State shows the container terminated with Reason OOMKilled, i.e. it exceeded its memory limit, and was killed at 12:10 on July 12.
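A quick aside on the exit code: 137 is not arbitrary. By convention a process killed by signal N exits with 128 + N, and the OOM killer delivers SIGKILL (signal 9):

```shell
# 137 = 128 + 9, i.e. the process died from signal 9 (SIGKILL)
echo $((128 + 9))   # prints 137
kill -l 9           # prints KILL
```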

Pay particular attention to the resource limits here: CPU is capped at 500m and memory at 1000Mi.
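For reference, these numbers come from the container's `resources` stanza in the Deployment spec; a sketch of what it presumably looks like, with values taken from the describe output above. Because the requests are lower than the limits, the Pod falls into the Burstable QoS class, which is why the cgroup paths later in the kernel log sit under `kubepods-burstable.slice`:

```yaml
# Presumed resources stanza of the systembusiness container
resources:
  requests:
    cpu: 100m       # scheduler guarantee
    memory: 100Mi
  limits:
    cpu: 500m       # throttled above this
    memory: 1000Mi  # exceeding this -> OOMKilled
```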

2. Investigating the memory-limit breach

To dig further, first check which node each Pod was scheduled onto, then go to that node and inspect the journal and kubelet logs.

# kubectl get pod -A -owide | grep systembusiness
production   systembusiness-deployment-5c6969978b-6m4gx         1/1     Running   15         6d22h   10.244.154.209   k8s-node-01 <none>       <none>
production   systembusiness-deployment-5c6969978b-hx5lt         1/1     Running   13         6d22h   10.244.127.255   k8s-node-04 <none>       <none>
production   systembusiness-deployment-5c6969978b-qfh5x         1/1     Running   29         6d22h   10.244.162.29    k8s-node-08 <none>       <none>

Pick one of the Pods and inspect the logs on its node (k8s-node-04):

On k8s-node-04, use journalctl to filter log entries by time:
# journalctl --since="2021-07-12 08:00:00"
7月 12 12:11:16 k8s-node-04 kernel: systembus invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=994
7月 12 12:11:16 k8s-node-04 kernel: systembusin cpuset=docker-052d52a163eb1ced8ddbd9968b31cb1d1ec74a2b389e18984737c3118fdd7a5e.scope mems_allowed=0
7月 12 12:11:16 k8s-node-04 kernel: CPU: 2 PID: 1735957 Comm: systembusin Tainted: G           OE    --------- -  - 4.18.0-193.14.2.el8_2.x86_64 #1
7月 12 12:11:16 k8s-node-04 kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8a46cfe 04/01/2014
7月 12 12:11:16 k8s-node-04 kernel: Call Trace:
7月 12 12:11:16 k8s-node-04 kernel:  dump_stack+0x5c/0x80
7月 12 12:11:16 k8s-node-04 kernel:  dump_header+0x6e/0x27a
7月 12 12:11:16 k8s-node-04 kernel:  oom_kill_process.cold.28+0xb/0x10
7月 12 12:11:16 k8s-node-04 kernel:  out_of_memory+0x1ba/0x490
7月 12 12:11:16 k8s-node-04 kernel:  mem_cgroup_out_of_memory+0x49/0x80
7月 12 12:11:16 k8s-node-04 kernel:  try_charge+0x6fa/0x780
7月 12 12:11:16 k8s-node-04 kernel:  ? __alloc_pages_nodemask+0xef/0x280
7月 12 12:11:16 k8s-node-04 kernel:  mem_cgroup_try_charge+0x8b/0x1a0
7月 12 12:11:16 k8s-node-04 kernel:  mem_cgroup_try_charge_delay+0x1c/0x40
7月 12 12:11:16 k8s-node-04 kernel:  do_anonymous_page+0xb5/0x360
7月 12 12:11:16 k8s-node-04 kernel:  __handle_mm_fault+0x662/0x6a0
7月 12 12:11:16 k8s-node-04 kernel:  handle_mm_fault+0xda/0x200
7月 12 12:11:16 k8s-node-04 kernel:  __do_page_fault+0x22d/0x4e0
7月 12 12:11:16 k8s-node-04 kernel:  do_page_fault+0x32/0x110
7月 12 12:11:16 k8s-node-04 kernel:  ? async_page_fault+0x8/0x30
7月 12 12:11:16 k8s-node-04 kernel:  async_page_fault+0x1e/0x30
7月 12 12:11:16 k8s-node-04 kernel: RIP: 0033:0x466bbf
7月 12 12:11:16 k8s-node-04 kernel: Code: 00 00 c5 fe 6f 06 c5 fe 6f 4e 20 c5 fe 6f 56 40 c5 fe 6f 5e 60 48 81 c6 80 00 00 00 c5 fd e7 07 c5 fd e7 4f 20 c5 fd e7 57 40 <c5> fd e7 5f 60 48 81 c7 80 00 00 00 48 81 eb >
7月 12 12:11:16 k8s-node-04 kernel: RSP: 002b:000000c000478a98 EFLAGS: 00010202
7月 12 12:11:16 k8s-node-04 kernel: RAX: 00000000000812dc RBX: 00000000000c06c0 RCX: 000000c0399096e0
7月 12 12:11:16 k8s-node-04 kernel: RDX: 000000c030880000 RSI: 000000c030bc9020 RDI: 000000c039848fa0
7月 12 12:11:16 k8s-node-04 kernel: RBP: 000000c000478af0 R08: 000000c039500000 R09: 000000c030880000
7月 12 12:11:16 k8s-node-04 kernel: R10: 0000000000000020 R11: 0581f4fae9181cae R12: 0000000000000002
7月 12 12:11:16 k8s-node-04 kernel: R13: 00000000016839e0 R14: 0000000000000000 R15: 0000000000465500
7月 12 12:11:16 k8s-node-04 kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podf05bdc36_dd14_4537_9b91_57cc07df0932.slice/docker-052d52a163eb1ced8ddbd9968b31cb1d1ec74a2b389e18984737c3118f>
7月 12 12:11:16 k8s-node-04 kernel: memory: usage 1024000kB, limit 1024000kB, failcnt 821
7月 12 12:11:16 k8s-node-04 kernel: memory+swap: usage 1024000kB, limit 9007199254740988kB, failcnt 0
7月 12 12:11:16 k8s-node-04 kernel: kmem: usage 5100kB, limit 9007199254740988kB, failcnt 0
7月 12 12:11:16 k8s-node-04 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podf05bdc36_dd14_4537_9b91_57cc07df0932.slice: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_>
7月 12 12:11:16 k8s-node-04 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podf05bdc36_dd14_4537_9b91_57cc07df0932.slice/docker-42aa34971c05a3bc340d91a71c026a77830d496f05>
7月 12 12:11:16 k8s-node-04 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podf05bdc36_dd14_4537_9b91_57cc07df0932.slice/docker-052d52a163eb1ced8ddbd9968b31cb1d1ec74a2b38>
7月 12 12:11:16 k8s-node-04 kernel: [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
7月 12 12:11:16 k8s-node-04 kernel: [3897555]     0 3897555      256        1    32768        0          -998 pause
7月 12 12:11:16 k8s-node-04 kernel: [1735923]     0 1735923   426082   258638  2158592        0           994 systembusin
7月 12 12:11:16 k8s-node-04 kernel: Memory cgroup out of memory: Killed process 1735923 (systembusin) total-vm:1704328kB, anon-rss:1017592kB, file-rss:17492kB, shmem-rss:0kB, UID:0
7月 12 12:11:16 k8s-node-04 kernel: oom_reaper: reaped process 1735923 (systembusin), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
7月 12 12:11:16 k8s-node-04 systemd[1]: docker-052d52a163eb1ced8ddbd9968b31cb1d1ec74a2b389e18984737c3118fdd7a5e.scope: Consumed 2min 25.839s CPU time
7月 12 12:11:16 k8s-node-04 systemd[1]: libcontainer-4178438-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
7月 12 12:11:16 k8s-node-04 systemd[1]: libcontainer-4178438-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
7月 12 12:11:16 k8s-node-04 systemd[1]: Created slice libcontainer_4178438_systemd_test_default.slice.
7月 12 12:11:16 k8s-node-04 systemd[1]: Removed slice libcontainer_4178438_systemd_test_default.slice.
7月 12 12:11:16 k8s-node-04 systemd[1]: libcontainer-4178461-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
7月 12 12:11:16 k8s-node-04 systemd[1]: libcontainer-4178461-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.
7月 12 12:11:16 k8s-node-04 systemd[1]: Created slice libcontainer_4178461_systemd_test_default.slice.
7月 12 12:11:16 k8s-node-04 systemd[1]: Removed slice libcontainer_4178461_systemd_test_default.slice.
7月 12 12:11:16 k8s-node-04 dockerd[2687797]: time="2021-07-12T12:11:16+08:00" level=error msg="stat cgroup 052d52a163eb1ced8ddbd9968b31cb1d1ec74a2b389e18984737c3118fdd7a5e" error="cgroups: cgroup deleted"
7月 12 12:11:16 k8s-node-04 systemd[1]: libcontainer-4178468-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing.

First look at this line of output:
k8s-node-04 kernel: Memory cgroup out of memory: Killed process 1735923 (systembusin) total-vm:1704328kB, anon-rss:1017592kB, file-rss:17492kB, shmem-rss:0kB, UID:0
"Memory cgroup out of memory" tells us this was a cgroup-level OOM, not the node running out of memory: the Pod's memory cgroup hit its limit exactly ("memory: usage 1024000kB, limit 1024000kB" a few lines up, i.e. the 1000Mi we noted in describe). The killed process was holding anon-rss:1017592kB (~994Mi) of resident memory; total-vm:1704328kB (~1664Mi) is only its virtual address space. So we can rule out node memory pressure: the Pod's own process consumed too much memory and breached its limit.
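The unit conversions are easy to check (1 Mi = 1024 kB, values copied from the kernel log):

```shell
# cgroup limit and the killed process's memory, converted from kB to Mi
limit_mi=$((1024000 / 1024))       # memory: limit 1024000kB
anon_rss_mi=$((1017592 / 1024))    # anon-rss of PID 1735923 (resident)
total_vm_mi=$((1704328 / 1024))    # virtual address space (not resident)
echo "limit=${limit_mi}Mi anon_rss=${anon_rss_mi}Mi total_vm=${total_vm_mi}Mi"
```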

Then focus on this part of the log:

7月 12 12:11:16 k8s-node-04 kernel: [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
7月 12 12:11:16 k8s-node-04 kernel: [3897555]     0 3897555      256        1    32768        0          -998 pause
7月 12 12:11:16 k8s-node-04 kernel: [1735923]     0 1735923   426082   258638  2158592        0           994 systembusin

From this output, the Pod offers two candidate processes to the OOM killer. The pause process creates the shared resources (network namespace and so on) for the Pod's business containers; its oom_score_adj is -998, which effectively guarantees it will never be killed for memory reasons (the lower a process's oom_score_adj, the less likely it is to be selected). The remaining process is the actual business container, and its oom_score_adj is very high (994), so it is unsurprisingly the reason the Pod was OOM-killed.

The OOM killer computes each process's oom_score from its memory usage, then fine-tunes it with oom_score_adj.

When a Pod runs multiple processes, the one using the most memory gets the highest oom_score and is the most likely to be killed. In other words, when a Pod is OOM-killed for exceeding its limit, the process the log records with the highest score is the culprit.
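Plugging the numbers from the process table above into the kernel's badness arithmetic shows why pause can never win. This is a simplified sketch of `oom_badness()`: points = rss pages + page-table pages + swap entries, then adjusted by oom_score_adj * total_pages / 1000 (4 KiB pages; the cgroup limit of 1024000 kB is 256000 pages):

```shell
total_pages=$((1024000 / 4))   # cgroup memory limit in 4 KiB pages

# pause: rss=1 page, pgtables_bytes=32768, swapents=0, oom_score_adj=-998
pause=$((1 + 32768 / 4096 + 0 + (-998) * total_pages / 1000))
# systembusin: rss=258638 pages, pgtables_bytes=2158592, swapents=0, adj=994
work=$((258638 + 2158592 / 4096 + 0 + 994 * total_pages / 1000))

echo "pause badness ~ ${pause}, systembusin badness ~ ${work}"
```

The huge negative score for pause means it is never chosen, while the +994 adjustment pushes the business container far ahead of everything else in the cgroup.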

3. Fixing the problem

Now that the problem is pinned down to the business container itself, the fix is either to raise the Pod's memory limit or to work with the developers to optimize the service's memory usage. I chose the latter ~
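Had we gone with the first option instead, the change is a one-line bump of the limit in the Deployment spec (the new value below is purely illustrative, not a recommendation):

```yaml
# Hypothetical limit increase; the memory request could also be raised
# to better reflect real usage and improve scheduling
resources:
  limits:
    cpu: 500m
    memory: 1500Mi   # was 1000Mi
```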

1. Locating a Pod's exit cause from common exit codes (reference)
2. journalctl notes (1)
3. journalctl notes (2)
4. A detailed explanation of the Linux OOM mechanism (reference)
5. Key reference: "[问题已处理]-kubernetes中2次不同的oom处理" (two different OOM cases handled in Kubernetes)
