This article looks at why threads that appear to be waiting can show high CPU utilization in profiler output, and what can be done about it. It may be a useful reference if you are troubleshooting a similar problem.

Problem Description

I'm currently running some JMeter tests to test the performance of a web service. It's using a very high amount of CPU. For one JMeter request thread, it's using anywhere from 10-30% (depending on the type of request). When I pump it up to a mere 15 threads, I get ~95% CPU utilization. Naturally, I want to figure out what's going on. I did an Hprof CPU sample (I tried the times option but it took an hour and a half to start my service and no messages would go through). Below are the snippets of the results from that sampling (over a 15 minute span).

CPU SAMPLES BEGIN (total = 220846) Fri Aug 22 13:38:54 2014
rank   self  accum   count trace method
   1 14.96% 14.96%   33038 300514 java.net.PlainSocketImpl.socketAccept
   2 14.84% 29.80%   32776 301258 sun.nio.ch.EPollArrayWrapper.epollWait
   3 12.45% 42.26%   27505 313002 sun.nio.ch.EPollArrayWrapper.epollWait
   4  7.48% 49.73%   16517 300604 java.net.PlainSocketImpl.socketAccept
   5  7.18% 56.91%   15856 303203 sun.nio.ch.EPollArrayWrapper.epollWait
   6  6.18% 63.09%   13639 313001 sun.nio.ch.ServerSocketChannelImpl.accept0
   7  6.04% 69.13%   13329 304259 sun.nio.ch.EPoll.epollWait
   8  5.11% 74.23%   11275 307102 sun.nio.ch.EPollArrayWrapper.epollWait

And the corresponding stacks for those top samples:

TRACE 300514:
    java.net.PlainSocketImpl.socketAccept(:Unknown line)
    java.net.AbstractPlainSocketImpl.accept(:Unknown line)
    java.net.ServerSocket.implAccept(:Unknown line)
    java.net.ServerSocket.accept(:Unknown line)
    sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(:Unknown line)
    sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(:Unknown line)
    java.lang.Thread.run(:Unknown line)
TRACE 301258:
    sun.nio.ch.EPollArrayWrapper.epollWait(:Unknown line)
    sun.nio.ch.EPollArrayWrapper.poll(:Unknown line)
    sun.nio.ch.EPollSelectorImpl.doSelect(:Unknown line)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(:Unknown line)
    sun.nio.ch.SelectorImpl.select(:Unknown line)
    org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:327)
TRACE 313002:
    sun.nio.ch.EPollArrayWrapper.epollWait(:Unknown line)
    sun.nio.ch.EPollArrayWrapper.poll(:Unknown line)
    sun.nio.ch.EPollSelectorImpl.doSelect(:Unknown line)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(:Unknown line)
    sun.nio.ch.SelectorImpl.select(:Unknown line)
    org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1163)
    java.lang.Thread.run(:Unknown line)
TRACE 300604:
    java.net.PlainSocketImpl.socketAccept(:Unknown line)
    java.net.AbstractPlainSocketImpl.accept(:Unknown line)
    java.net.ServerSocket.implAccept(:Unknown line)
    java.net.ServerSocket.accept(:Unknown line)
    sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(:Unknown line)
    sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(:Unknown line)
    sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(:Unknown line)
    java.lang.Thread.run(:Unknown line)
TRACE 303203:
    sun.nio.ch.EPollArrayWrapper.epollWait(:Unknown line)
    sun.nio.ch.EPollArrayWrapper.poll(:Unknown line)
    sun.nio.ch.EPollSelectorImpl.doSelect(:Unknown line)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(:Unknown line)
    sun.nio.ch.SelectorImpl.select(:Unknown line)
    net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:217)
    net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:836)
TRACE 313001:
    sun.nio.ch.ServerSocketChannelImpl.accept0(:Unknown line)
    sun.nio.ch.ServerSocketChannelImpl.accept(:Unknown line)
    org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:793)
    java.lang.Thread.run(:Unknown line)
TRACE 304259:
    sun.nio.ch.EPoll.epollWait(:Unknown line)
    sun.nio.ch.EPollPort$EventHandlerTask.poll(:Unknown line)
    sun.nio.ch.EPollPort$EventHandlerTask.run(:Unknown line)
    java.lang.Thread.run(:Unknown line)
TRACE 307102:
    sun.nio.ch.EPollArrayWrapper.epollWait(:Unknown line)
    sun.nio.ch.EPollArrayWrapper.poll(:Unknown line)
    sun.nio.ch.EPollSelectorImpl.doSelect(:Unknown line)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(:Unknown line)
    sun.nio.ch.SelectorImpl.select(:Unknown line)
    net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:217)
    net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:836)

As you can see, over half of the CPU usage seems to be coming from threads that should be waiting. Shouldn't that not be taking up CPU time?

I saw this post http://www.brendangregg.com/blog/2014-06-09/java-cpu-sampling-using-hprof.html which might make me think this result is misleading, but my "top -H" results showed maximum CPU usage, as did Zabbix monitoring. So it appears that the service actually is consuming CPU. However, that post also links to a quote from the hprof author about this behavior.

Could someone explain why this might be the case and what I can do to lessen the CPU usage in these cases? Or are all the CPU usage indicators actually misleading? If so, what would be a better way to understand true CPU utilization in my service?

Solution

As Brendan Gregg points out in the blog article you linked, hprof samples from all threads the JVM considers runnable. As you can see in the Javadoc of Thread.State, the JVM distinguishes the following thread states:

  • NEW: A thread that has not yet started is in this state.
  • RUNNABLE: A thread executing in the Java virtual machine is in this state.
  • BLOCKED: A thread that is blocked waiting for a monitor lock is in this state.
  • WAITING: A thread that is waiting indefinitely for another thread to perform a particular action is in this state.
  • TIMED_WAITING: A thread that is waiting for another thread to perform an action for up to a specified waiting time is in this state.
  • TERMINATED: A thread that has exited is in this state.

As we can see, the JVM does not have a dedicated state for a thread waiting for I/O. That's because such a thread is actually blocked by the operating system, not by the JVM. That is, as far as the JVM is concerned, a thread waiting for the network adapter is runnable. Indeed, the detailed Javadoc for the RUNNABLE state says:

    Thread state for a runnable thread. A thread in the runnable state is executing in the Java virtual machine but it may be waiting for other resources from the operating system such as processor.
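
To make this concrete, here is a minimal sketch (the class and variable names are mine) contrasting a thread parked by the JVM with one parked by the operating system. The first blocks on a CountDownLatch and reports WAITING, so hprof's CPU sampling ignores it; the second blocks in ServerSocket.accept() and reports RUNNABLE even though it burns no CPU.

    import java.net.ServerSocket;
    import java.util.concurrent.CountDownLatch;

    public class WaitingVsRunnable {
        public static void main(String[] args) throws Exception {
            // Parked by the JVM: waits on a java.util.concurrent latch.
            Thread jvmWaiter = new Thread(() -> {
                try {
                    new CountDownLatch(1).await(); // never counted down
                } catch (InterruptedException ignored) {
                }
            });

            // Parked by the OS: blocks in the kernel inside accept().
            Thread osWaiter = new Thread(() -> {
                try (ServerSocket server = new ServerSocket(0)) {
                    server.accept(); // no client ever connects
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });

            jvmWaiter.setDaemon(true); // let the JVM exit despite the blocked threads
            osWaiter.setDaemon(true);
            jvmWaiter.start();
            osWaiter.start();
            Thread.sleep(500); // give both threads time to reach their blocking call

            System.out.println("latch thread:  " + jvmWaiter.getState()); // WAITING
            System.out.println("accept thread: " + osWaiter.getState()); // RUNNABLE on typical JVMs
        }
    }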

Therefore, the presence of I/O methods in an hprof "cpu" sampling does not imply that these methods consumed CPU, as their wait time is counted as well. The CPU load that top -H and Zabbix report is real, but hprof's ranking misattributes much of it to these waiting threads.

You can either:

  • assume that the I/O methods are not responsible for the CPU consumption and focus on the other methods, or
  • use a better profiler that takes waiting for OS-level resources into account (a rough built-in alternative is sketched below).
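
If switching profilers is not immediately possible, a rough built-in alternative is the JDK's ThreadMXBean, which reports the actual CPU time each thread has consumed, excluding time spent parked in the kernel. Below is a minimal sketch (the class name is mine, and per-thread CPU time must be supported and enabled on your JVM):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ThreadCpuReport {
        public static void main(String[] args) {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            if (!mx.isThreadCpuTimeSupported()) {
                System.err.println("This JVM cannot measure per-thread CPU time");
                return;
            }
            mx.setThreadCpuTimeEnabled(true);
            for (long id : mx.getAllThreadIds()) {
                ThreadInfo info = mx.getThreadInfo(id);
                long cpuNanos = mx.getThreadCpuTime(id); // real CPU time, not wall-clock waiting
                if (info != null && cpuNanos >= 0) {
                    System.out.printf("%-40s state=%-13s cpu=%.1f ms%n",
                            info.getThreadName(), info.getThreadState(), cpuNanos / 1_000_000.0);
                }
            }
        }
    }

Run this inside the service (or query the same beans over JMX) while the JMeter load is applied; the threads sitting in epollWait or socketAccept should accumulate almost no CPU time, which makes it clear where the 95% reported by top -H actually goes.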

That concludes this look at high CPU utilization reported for threads that appear to be waiting. Hopefully the answer above is helpful.
