Apache Airflow 1.10.9: enabling statsd crashes the scheduler

Problem description

My Airflow runs in CeleryExecutor mode with PostgreSQL 12. Everything works well except when I turn statsd on:

statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow

The scheduler can render the jobs, but the jobs do not run, and the scheduler log shows the following error:

[SQL: SELECT count(*) AS count_1
FROM task_instance
WHERE task_instance.pool = %(pool_1)s AND task_instance.state IN (%(state_1)s, %(state_2)s)]
[parameters: {'pool_1': 'default_pool', 'state_1': 'running', 'state_2': 'queued'}]
(Background on this error at: http://sqlalche.me/e/4xp6)
Traceback (most recent call last):
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1246, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 588, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.ProtocolViolation: invalid frontend message type 97
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 1495, in _validate_and_run_task_instances
    self._process_and_execute_tasks(simple_dag_bag)

  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 588, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DatabaseError: (psycopg2.errors.ProtocolViolation) invalid frontend message type 97
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

If I disable statsd, everything recovers. Is this a bug in Airflow? Any advice on how to resolve it?

Recommended answer

I faced the same error, and after a few tests I was able to get statsd metrics working. Typically, you will see the error if the following conditions are met:

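  • Statsd is enabled (set to True)
  • The SQLAlchemy connection pool is enabled (set to True)
  • The scheduler's stderr log is enabled (redirecting the error log to a file is what makes this error visible)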
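
The first two conditions can be checked directly against airflow.cfg. The following is a minimal sketch, not an official Airflow tool. It assumes that in Airflow 1.10.x `statsd_on` lives in the `[scheduler]` section and that pooling is controlled by `sql_alchemy_pool_enabled` in `[core]` (defaulting to enabled). The function name `risky_settings` and the sample config are hypothetical, so verify the option names against your own file.

```python
import configparser


def risky_settings(cfg_text):
    """Report whether statsd and SQLAlchemy pooling are both switched on."""
    # interpolation=None because airflow.cfg values may contain '%' characters.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # Assumed option locations for Airflow 1.10.x; check your own airflow.cfg.
    statsd_on = parser.getboolean("scheduler", "statsd_on", fallback=False)
    pool_on = parser.getboolean("core", "sql_alchemy_pool_enabled", fallback=True)
    return {"statsd_on": statsd_on, "pool_enabled": pool_on}


# Hypothetical sample that mirrors the settings from the question.
sample = """
[core]
sql_alchemy_pool_enabled = True

[scheduler]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
"""

print(risky_settings(sample))
```

To run it against a real installation, pass the contents of your own airflow.cfg to `risky_settings`. The third condition (stderr redirection) depends on how the scheduler is started and is not visible in the config file.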

In my case, even though the scheduler kept writing these errors to the log, statsd metrics were still delivered and tasks were still scheduled as they should be. I don't know how to measure the impact, and I don't want to give up the SQLAlchemy connection pool, so I have left statsd turned off.
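
One way to check whether metrics are still being delivered is to listen on the statsd UDP port and print whatever arrives. The sketch below is only an illustration using the standard `socket` module. The helper names `open_listener` and `receive_one_datagram` are hypothetical, and the metric name in the demo is just an example payload. The demo binds to port 0 (an OS-chosen free port) and sends itself one datagram so that it runs anywhere.

```python
import socket


def open_listener(host="127.0.0.1", port=8125):
    """Bind a UDP socket where statsd datagrams are expected to arrive."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one_datagram(sock, timeout=5.0):
    """Return one datagram as text, or None if nothing arrives in time."""
    sock.settimeout(timeout)
    try:
        data, _addr = sock.recvfrom(65535)
    except socket.timeout:
        return None
    return data.decode("utf-8", errors="replace")


# Self-contained demo: port 0 lets the OS pick a free port.
listener = open_listener(port=0)
demo_port = listener.getsockname()[1]
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Illustrative payload in statsd counter format, not captured from a real run.
sender.sendto(b"airflow.scheduler_heartbeat:1|c", ("127.0.0.1", demo_port))
print(receive_one_datagram(listener))
sender.close()
listener.close()
```

For a real check, stop any running statsd daemon first (otherwise the bind fails because the port is in use), call `open_listener()` with the default port 8125 from the question, and call `receive_one_datagram` in a loop while the scheduler runs.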

(I guess other people did not see the error because they missed the third condition above.)
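
A side observation on the error text itself, offered as an unconfirmed hypothesis rather than a diagnosis: PostgreSQL reads the first byte of each client message as its type, and byte 97 is the ASCII letter "a", which is also the first byte of any statsd datagram sent with `statsd_prefix = airflow`. That is consistent with statsd data ending up on the database connection, but nothing in this thread proves it. The short sketch below only demonstrates the byte arithmetic, and the metric name in it is illustrative.

```python
# The number in "invalid frontend message type 97" is a single byte value.
message_type = 97
print(chr(message_type))  # the ASCII character for byte 97

# A statsd counter datagram has the form "<name>:<value>|c".
# "scheduler_heartbeat" is used here only as an illustrative metric name.
prefix = "airflow"  # statsd_prefix from the question
datagram = f"{prefix}.scheduler_heartbeat:1|c".encode("ascii")

# Compare the first byte of the datagram with the reported message type.
print(datagram[0] == message_type)
```

If the hypothesis holds, changing `statsd_prefix` to a value starting with a different letter should change the number in the error message, which would be a cheap way to test it.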
