Netty.writeAndFlush with future successful against killed host

Problem description

We have a Netty (4.0.15) based WebSocket server running on Ubuntu v10, and during resiliency testing we do the following:

  1. kill -9 the server
  2. send some data from the client
  3. expect a writeAndFlush failure on the client

For some reason, we sometimes see:

  1. writeAndFlush succeeds, and then afterwards
  2. java.io.IOException: Connection reset by peer

So is it possible that writeAndFlush sometimes completes successfully even though the server is gone, while at other times it fails?

Could this be caused by the timing of the OS socket clean-up for killed processes?

Client test code:

    channel.writeAndFlush(new TextWebSocketFrame("blah blah")).addListeners(
    <snip>
            public void operationComplete(ChannelFuture future) {
                assert future.isSuccess() == false;  // <-- sometimes this is not triggered
            }
    </snip>

Thanks for any ideas.

Recommended answer

It's a simple race condition, and something you have to accept can happen. You can only determine that the remote host has disappeared by not receiving data from it. Generally this is achieved by setting a timer and assuming that the remote host is dead if no data has been received before the timer expires (possibly in response to a keep-alive message).
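The timer approach can be sketched in plain Java. This is only an illustration under stated assumptions: `PeerLivenessTracker` and its methods are hypothetical names, not part of Netty, and the caller supplies the timestamps. In a Netty pipeline, `io.netty.handler.timeout.IdleStateHandler` is the usual starting point for this kind of idle detection.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not a Netty class): presumes the peer dead when
 * nothing has been received from it for longer than a timeout.
 */
final class PeerLivenessTracker {
    private final long timeoutMillis;
    private final AtomicLong lastReceivedMillis;

    PeerLivenessTracker(long timeoutMillis, long nowMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        this.timeoutMillis = timeoutMillis;
        this.lastReceivedMillis = new AtomicLong(nowMillis);
    }

    /** Call whenever any data (including a keep-alive reply) arrives. */
    void onDataReceived(long nowMillis) {
        lastReceivedMillis.set(nowMillis);
    }

    /** True once the peer has been silent for longer than the timeout. */
    boolean isPeerPresumedDead(long nowMillis) {
        return nowMillis - lastReceivedMillis.get() > timeoutMillis;
    }
}
```

A client would call `onDataReceived` from its read handler and check `isPeerPresumedDead` on a periodic task, closing the channel when it returns true. Note that this only ever yields a presumption: a slow but live peer looks the same as a dead one until data arrives.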

Essentially, TCP assumes that the remote host is dead if it retransmits some data a certain number of times without receiving an acknowledgement, or if it receives no response to a keep-alive probe (keep-alive is usually off by default). However, as long as there is room in your host's send buffer, you can keep calling writeAndFlush successfully, because the data is simply queued in the network buffers. writeAndFlush is considered to have succeeded once Netty has written the data to the kernel send buffer. There is no way of determining whether the data reached the remote host without an application-level acknowledgement. Thus you may call writeAndFlush while TCP is still in the process of determining that the remote host has died, in which case writeAndFlush succeeds but the data is never delivered. Alternatively, you may call writeAndFlush at the same time as TCP determines that the remote host is dead, in which case an error is raised.
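One way to picture an application-level acknowledgement is to treat a successful write future as "handed to the kernel" only, and to track each message until the peer explicitly confirms it. The sketch below is plain Java and entirely hypothetical: `PendingAckTracker`, the message ids, and the ack message itself are assumptions, not something Netty or WebSocket provides out of the box.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not a Netty class): remembers messages whose local
 * write succeeded until the peer acknowledges them at application level.
 */
final class PendingAckTracker {
    // message id -> time at which the local write completed
    private final Map<Long, Long> pending = new LinkedHashMap<>();
    private final long ackTimeoutMillis;
    private long nextId = 1;

    PendingAckTracker(long ackTimeoutMillis) {
        this.ackTimeoutMillis = ackTimeoutMillis;
    }

    /** Register a message once its write future succeeds; returns its id. */
    synchronized long onWritten(long nowMillis) {
        long id = nextId++;
        pending.put(id, nowMillis);
        return id;
    }

    /** Call when the peer's ack for this id arrives; false if it was not pending. */
    synchronized boolean onAck(long id) {
        return pending.remove(id) != null;
    }

    /** Ids written locally but still unacknowledged after the timeout. */
    synchronized List<Long> overdue(long nowMillis) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Long> e : pending.entrySet()) {
            if (nowMillis - e.getValue() > ackTimeoutMillis) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

With something like this in place, the test in the question would assert on `overdue(...)` rather than on the write future: a message sent to a killed server shows up as overdue even when writeAndFlush itself reported success.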

There's a lot more information on TCP retransmission and keep-alive here and here.

