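We have recently had a lot of 502 alerts in production, and some of the issues we ran into while troubleshooting turned out to be quite interesting. Reading the nginx source can give some insight into where nginx status codes come from. This article is based on nginx 1.6.2 (mainly to match our production environment).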

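First, the common error codes, which are defined in ngx_http_request.h. Some of them are caused by the client and some by the upstream. Under what conditions does each one actually occur, and where should you start when investigating?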

#define NGX_HTTP_CLIENT_CLOSED_REQUEST     499
#define NGX_HTTP_INTERNAL_SERVER_ERROR     500
#define NGX_HTTP_NOT_IMPLEMENTED           501
#define NGX_HTTP_BAD_GATEWAY               502
#define NGX_HTTP_SERVICE_UNAVAILABLE       503
#define NGX_HTTP_GATEWAY_TIME_OUT          504
#define NGX_HTTP_INSUFFICIENT_STORAGE      507

access.log records the status of each request, so we need to find where status gets assigned.

grep -r status= src|grep 502

The logic for backend 5xx status codes lives mostly in ngx_http_upstream_next in ngx_http_upstream.c. This is where the failure type is mapped to a status code:

        switch(ft_type) {

        case NGX_HTTP_UPSTREAM_FT_TIMEOUT:
            status = NGX_HTTP_GATEWAY_TIME_OUT;
            break;

        case NGX_HTTP_UPSTREAM_FT_HTTP_500:
            status = NGX_HTTP_INTERNAL_SERVER_ERROR;
            break;

        case NGX_HTTP_UPSTREAM_FT_HTTP_403:
            status = NGX_HTTP_FORBIDDEN;
            break;

        case NGX_HTTP_UPSTREAM_FT_HTTP_404:
            status = NGX_HTTP_NOT_FOUND;
            break;

        default:
            status = NGX_HTTP_BAD_GATEWAY;
        }

So ft_type maps to status as follows: NGX_HTTP_UPSTREAM_FT_TIMEOUT maps to 504, NGX_HTTP_UPSTREAM_FT_HTTP_500 maps to 500, and so on one to one, while every other ft_type falls through to 502. The next step is to look at where each ft_type value is set.

#define NGX_HTTP_UPSTREAM_FT_ERROR           0x00000002
#define NGX_HTTP_UPSTREAM_FT_TIMEOUT         0x00000004
#define NGX_HTTP_UPSTREAM_FT_INVALID_HEADER  0x00000008
#define NGX_HTTP_UPSTREAM_FT_HTTP_500        0x00000010
#define NGX_HTTP_UPSTREAM_FT_HTTP_502        0x00000020
#define NGX_HTTP_UPSTREAM_FT_HTTP_503        0x00000040
#define NGX_HTTP_UPSTREAM_FT_HTTP_504        0x00000080
#define NGX_HTTP_UPSTREAM_FT_HTTP_403        0x00000100
#define NGX_HTTP_UPSTREAM_FT_HTTP_404        0x00000200
#define NGX_HTTP_UPSTREAM_FT_UPDATING        0x00000400
#define NGX_HTTP_UPSTREAM_FT_BUSY_LOCK       0x00000800
#define NGX_HTTP_UPSTREAM_FT_MAX_WAITING     0x00001000
#define NGX_HTTP_UPSTREAM_FT_NOLIVE          0x40000000
#define NGX_HTTP_UPSTREAM_FT_OFF             0x80000000
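
To make the mapping easier to experiment with, here is a minimal standalone sketch of it. The constant values are copied from the definitions above; the short FT_* names and the function status_for_ft_type are hypothetical and exist only for this illustration (in nginx the switch sits inline in ngx_http_upstream_next).

```c
#include <stdint.h>

/* Values copied from the nginx 1.6.2 definitions quoted above;
 * the shortened names are made up for this sketch. */
#define FT_ERROR           0x00000002
#define FT_TIMEOUT         0x00000004
#define FT_INVALID_HEADER  0x00000008
#define FT_HTTP_500        0x00000010
#define FT_HTTP_403        0x00000100
#define FT_HTTP_404        0x00000200
#define FT_NOLIVE          0x40000000

/* Hypothetical helper: returns the HTTP status for a failure type,
 * mirroring the switch shown earlier. */
unsigned
status_for_ft_type(uint32_t ft_type)
{
    switch (ft_type) {

    case FT_TIMEOUT:
        return 504;

    case FT_HTTP_500:
        return 500;

    case FT_HTTP_403:
        return 403;

    case FT_HTTP_404:
        return 404;

    default:
        /* FT_ERROR, FT_INVALID_HEADER, FT_NOLIVE, ... all end up here */
        return 502;
    }
}
```

Calling status_for_ft_type(FT_ERROR) or status_for_ft_type(FT_INVALID_HEADER) yields 502, which is why 502 has so many possible causes.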

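504: besides the NGX_HTTP_UPSTREAM_FT_TIMEOUT branch above, NGX_HTTP_GATEWAY_TIME_OUT is passed directly to ngx_http_upstream_finalize_request in several places in ngx_http_upstream.c: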

  • The first is ngx_http_upstream_process_upgraded:
    if (downstream->write->timedout) {
        c->timedout = 1;
        ngx_connection_error(c, NGX_ETIMEDOUT, "client timed out");
        ngx_http_upstream_finalize_request(r, u, NGX_HTTP_REQUEST_TIME_OUT);
        return;
    }

    if (upstream->read->timedout || upstream->write->timedout) {
        ngx_connection_error(c, NGX_ETIMEDOUT, "upstream timed out");
        ngx_http_upstream_finalize_request(r, u, NGX_HTTP_GATEWAY_TIME_OUT);
        return;
    }
  • The second is ngx_http_upstream_process_non_buffered_upstream:
    ngx_connection_t  *c;

    c = u->peer.connection;

    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, c->log, 0,
                   "http upstream process non buffered upstream");

    c->log->action = "reading upstream";

    if (c->read->timedout) {
        ngx_connection_error(c, NGX_ETIMEDOUT, "upstream timed out");
        ngx_http_upstream_finalize_request(r, u, NGX_HTTP_GATEWAY_TIME_OUT);
        return;
    }

    ngx_http_upstream_process_non_buffered_request(r, 0);
  • The third is ngx_http_upstream_process_body_in_memory:
    c = u->peer.connection;
    rev = c->read;

    ngx_log_debug0(NGX_LOG_DEBUG_HTTP, c->log, 0,
                   "http upstream process body on memory");

    if (rev->timedout) {
        ngx_connection_error(c, NGX_ETIMEDOUT, "upstream timed out");
        ngx_http_upstream_finalize_request(r, u, NGX_HTTP_GATEWAY_TIME_OUT);
        return;
    }

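All three take the connection from the upstream and check whether a read or write on it has timed out. So the main cause of a 504 is a timeout while reading from or writing to the upstream (backend) server.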
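
The timeout case is easy to reproduce outside nginx. The sketch below is not nginx code: it uses a socketpair as a stand-in for the upstream connection and plain poll() in place of nginx's event timers, and the names wait_for_backend and backend_status are hypothetical. A backend that stays silent past the deadline produces 504; one that answers in time does not.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define STATUS_OK                200
#define STATUS_BAD_GATEWAY       502
#define STATUS_GATEWAY_TIME_OUT  504

/* Wait up to timeout_ms for fd to become readable and translate the
 * outcome into a status code (hypothetical helper). */
int
wait_for_backend(int fd, int timeout_ms)
{
    struct pollfd  pfd;
    int            n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    n = poll(&pfd, 1, timeout_ms);

    if (n == 0) {
        return STATUS_GATEWAY_TIME_OUT;   /* nothing arrived in time */
    }

    if (n < 0) {
        return STATUS_BAD_GATEWAY;        /* poll itself failed */
    }

    return STATUS_OK;                     /* data is ready to read */
}

/* Simulate a backend that either replies immediately or stays silent.
 * Returns the resulting status, or -1 if the test setup fails. */
int
backend_status(int replies)
{
    int  sv[2], status;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }

    if (replies && write(sv[1], "ok", 2) != 2) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    status = wait_for_backend(sv[0], 50);  /* 50 ms "read timeout" */

    close(sv[0]);
    close(sv[1]);

    return status;
}
```

In practice this means a 504 points at a slow or stuck backend (or read/send timeouts that are too tight) rather than at a backend that is down.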

503: a quick grep for NGX_HTTP_SERVICE_UNAVAILABLE shows that it mainly comes from the limit (rate limiting) modules:

grep NGX_HTTP_SERVICE_UNAVAILABLE -r src

src/http/modules/ngx_http_limit_req_module.c:                              NGX_HTTP_SERVICE_UNAVAILABLE);
src/http/modules/ngx_http_limit_conn_module.c:                              NGX_HTTP_SERVICE_UNAVAILABLE);

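The source shows fairly clearly that these two hits are in ngx_http_limit_req_merge_conf and ngx_http_limit_conn_merge_conf, where NGX_HTTP_SERVICE_UNAVAILABLE is set as the default status_code when the configuration is merged. The request handlers (ngx_http_limit_req_handler and ngx_http_limit_conn_handler, the latter shown below) then return that status_code when a limit is hit, so a request rejected by rate limiting gets 503 unless the status has been overridden in the configuration.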

static ngx_int_t
ngx_http_limit_conn_handler(ngx_http_request_t *r)
{
    ...

    if (r->main->limit_conn_set) {
        return NGX_DECLINED;
    }

    lccf = ngx_http_get_module_loc_conf(r, ngx_http_limit_conn_module);
    limits = lccf->limits.elts;

    for (i = 0; i < lccf->limits.nelts; i++) {
        /* process each limit_conn rule; a rule that is exceeded
           returns lccf->status_code */
    }
    return NGX_DECLINED;
}
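
The two-step behavior (default filled in at merge time, returned at request time) can be sketched in isolation. This is a simplified model under stated assumptions, not the module's code: limit_conf_t, limit_merge_conf and limit_handler are hypothetical names, STATUS_UNSET plays the role of nginx's "unset" marker, and a single connection counter replaces the shared-memory bookkeeping.

```c
#define STATUS_UNSET                ((unsigned) -1)  /* "not configured" marker */
#define STATUS_SERVICE_UNAVAILABLE  503
#define RESULT_DECLINED             (-5)             /* same value as NGX_DECLINED */

/* Hypothetical, stripped-down per-location configuration. */
typedef struct {
    unsigned  max_conn;      /* allowed concurrent connections */
    unsigned  status_code;   /* status returned when the limit is hit */
} limit_conf_t;

/* Merge step: inherit from the parent level, or fall back to 503
 * when neither level configured a status. */
void
limit_merge_conf(limit_conf_t *conf, const limit_conf_t *prev)
{
    if (conf->status_code == STATUS_UNSET) {
        conf->status_code = (prev->status_code == STATUS_UNSET)
                            ? STATUS_SERVICE_UNAVAILABLE
                            : prev->status_code;
    }
}

/* Request step: reject with the configured status once the limit is
 * reached, otherwise decline so that later handlers run. */
int
limit_handler(const limit_conf_t *conf, unsigned active_conn)
{
    if (active_conn >= conf->max_conn) {
        return (int) conf->status_code;
    }

    return RESULT_DECLINED;
}
```

So when a 503 shows up in access.log, the first things to check are the limit_req and limit_conn settings and the matching "limiting" entries in error.log.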

502 is somewhat more complicated and shows up in more situations. Grepping for 502 and NGX_HTTP_BAD_GATEWAY turns up the following cases:

  • 1. In the resolve phase (ngx_resolve_start), a failed name resolution ends the request with NGX_HTTP_BAD_GATEWAY.

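  • 2. When a read or write on the upstream connection hits EOF, a return value of 0, or an error, the result is NGX_HTTP_BAD_GATEWAY. The recv system call returns n: a value greater than 0 is the number of bytes read, 0 means a FIN was received (the peer closed the connection), and -1 means some other error. A common example: the backend behind nginx dies, nginx receives a FIN on the upstream connection, and the client gets a 502.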

  • 3. In the upstream connect phase, when ngx_http_upstream_connect fails to connect to the backend, it passes NGX_HTTP_UPSTREAM_FT_ERROR to ngx_http_upstream_next.

    rc = ngx_event_connect_peer(&u->peer);

    ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                   "http upstream connect: %i", rc);

    if (rc == NGX_ERROR) {
        ngx_http_upstream_finalize_request(r, u,
                                           NGX_HTTP_INTERNAL_SERVER_ERROR);
        return;
    }

    u->state->peer = u->peer.name;

    if (rc == NGX_BUSY) {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "no live upstreams");
        ngx_http_upstream_next(r, u, NGX_HTTP_UPSTREAM_FT_NOLIVE);
        return;
    }

    if (rc == NGX_DECLINED) {
        ngx_http_upstream_next(r, u, NGX_HTTP_UPSTREAM_FT_ERROR);
        return;
    }
  • 4. When the upstream sends an invalid header, NGX_HTTP_UPSTREAM_FT_INVALID_HEADER is passed to ngx_http_upstream_next, for example when the header does not fit in the buffer:
    if (u->buffer.last == u->buffer.end) {
        ngx_log_error(NGX_LOG_ERR, c->log, 0,
                      "upstream sent too big header");

        ngx_http_upstream_next(r, u,
                               NGX_HTTP_UPSTREAM_FT_INVALID_HEADER);
        return;
    }
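
Case 2 above is the one most worth trying out by hand. The following sketch is not nginx code: a socketpair stands in for the upstream connection, and status_for_recv and status_when_backend are hypothetical names. It only shows how the three kinds of recv return value lead either to "keep going" or to 502.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

#define STATUS_BAD_GATEWAY  502

/* Classify a recv() return value: 0 means "keep processing",
 * anything else is the status to send (hypothetical helper). */
int
status_for_recv(ssize_t n)
{
    if (n > 0) {
        return 0;                     /* n bytes were read */
    }

    /* n == 0: peer closed the connection (FIN); n == -1: error */
    return STATUS_BAD_GATEWAY;
}

/* Simulate a backend that either dies before replying or sends the
 * start of a response. Returns -1 if the test setup fails. */
int
status_when_backend(int dies)
{
    int      sv[2];
    char     buf[16];
    ssize_t  n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }

    if (dies) {
        close(sv[1]);                 /* backend goes away: our side sees EOF */
        sv[1] = -1;

    } else if (write(sv[1], "HTTP/1.1", 8) != 8) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    n = recv(sv[0], buf, sizeof(buf), 0);

    close(sv[0]);
    if (sv[1] != -1) {
        close(sv[1]);
    }

    return status_for_recv(n);
}
```

When a 502 appears, error.log usually tells the cases apart: a resolver error, a connect failure, a connection closed by the upstream, or a header that is too big each leave a different message.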

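499 is comparatively simple. NGX_HTTP_CLIENT_CLOSED_REQUEST means that the client closed the connection on its own while the request to nginx was still in progress. nginx records 499 in that case; the status code is never sent to the client and is only logged locally.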
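
One way to detect that the client has gone away is to peek at the client socket without consuming data. As far as I can tell nginx does something similar in ngx_http_upstream_check_broken_connection, but treat the sketch below as an illustration under that assumption rather than a copy of the nginx logic; client_has_closed, logged_status and status_for_client are hypothetical names.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

#define STATUS_OK                     200
#define STATUS_CLIENT_CLOSED_REQUEST  499

/* Returns 1 if the peer has closed the connection, 0 otherwise.
 * MSG_PEEK leaves any pending data in the socket buffer and
 * MSG_DONTWAIT keeps the call from blocking. */
int
client_has_closed(int fd)
{
    char     buf[1];
    ssize_t  n;

    n = recv(fd, buf, 1, MSG_PEEK | MSG_DONTWAIT);

    return n == 0;   /* 0 = EOF; -1 with EAGAIN just means "no data yet" */
}

/* Status to write to the log: 499 if the client is gone, otherwise
 * whatever the request would have produced. */
int
logged_status(int client_fd, int normal_status)
{
    return client_has_closed(client_fd)
           ? STATUS_CLIENT_CLOSED_REQUEST
           : normal_status;
}

/* Simulate a client that either hangs up early or keeps waiting.
 * Returns -1 if the test setup fails. */
int
status_for_client(int hangs_up)
{
    int  sv[2], status;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }

    if (hangs_up) {
        close(sv[1]);
        sv[1] = -1;
    }

    status = logged_status(sv[0], STATUS_OK);

    close(sv[0]);
    if (sv[1] != -1) {
        close(sv[1]);
    }

    return status;
}
```

A burst of 499s therefore usually points at client-side timeouts or impatient users, often because the backend is slow, rather than at an error inside nginx itself.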

09-01 20:47