Google Cloud Composer (Apache Airflow) cannot access log files

Problem description

I'm running a DAG in Google Cloud Composer (hosted Airflow) which runs fine in Airflow locally. All it does is print "Hello World". However, when I run it through Cloud Composer I receive the error:

*** Log file does not exist: /home/airflow/gcs/logs/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log
*** Fetching from: http://airflow-worker-d775d7cdd-tmzj9:8793/log/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-d775d7cdd-tmzj9', port=8793): Max retries exceeded with url: /log/matts_custom_dag/main_test/2020-04-20T23:46:53.652833+00:00/2.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8825920160>: Failed to establish a new connection: [Errno -2] Name or service not known',))

I've also tried making the DAG insert data into a database, and that actually succeeds about 50% of the time. However, the task log always shows this error message (and no other print statements or logs). Any help on why this might be happening would be much appreciated.

Recommended answer

We faced the same issue, raised a support ticket with GCP, and got the following reply.

  1. The message is related to the latency of syncing logs from the Airflow workers to the web server, which takes at least a few minutes (depending on the number of objects and their size). The total log size does not seem large, but it is enough to noticeably slow down synchronization, so we recommend cleaning up or archiving the logs.
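The cleanup suggested above could be sketched as follows. This is only an illustration, not something from the support reply: it assumes the task logs live under the `logs/` prefix of the environment's Cloud Storage bucket (which is what `/home/airflow/gcs/logs` in the error message maps to), that the `google-cloud-storage` package is installed, and that a 30-day retention window is acceptable. The function names `select_stale_logs` and `delete_stale_logs` are hypothetical. To archive rather than delete, each object would need to be copied to another bucket first.

```python
from datetime import datetime, timedelta, timezone


def select_stale_logs(objects, now, max_age_days=30, prefix="logs/"):
    """Return names of log objects last updated more than max_age_days ago.

    objects: iterable of (name, updated) pairs, where updated is a
    timezone-aware datetime. Objects outside the prefix are ignored.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [
        name
        for name, updated in objects
        if name.startswith(prefix) and updated < cutoff
    ]


def delete_stale_logs(bucket_name, max_age_days=30):
    """Delete stale task logs from the environment bucket (destructive)."""
    # Imported here so the selection logic above works without the library.
    from google.cloud import storage

    client = storage.Client()
    # Assumption: Composer task logs are stored under the "logs/" prefix.
    blobs = list(client.list_blobs(bucket_name, prefix="logs/"))
    stale = set(
        select_stale_logs(
            ((blob.name, blob.updated) for blob in blobs),
            now=datetime.now(timezone.utc),
            max_age_days=max_age_days,
        )
    )
    for blob in blobs:
        if blob.name in stale:
            blob.delete()
    return sorted(stale)
```

Running `select_stale_logs` on its own first gives a dry-run list of what would be removed, which seems safer than calling `delete_stale_logs` directly on a production bucket.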

Basically, we recommend relying on Stackdriver logs instead, because the design of this sync introduces latency.
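For reading the task output from Stackdriver (now called Cloud Logging) instead of the Airflow UI, a filter along these lines may help. It is a rough sketch under assumptions: the resource type `cloud_composer_environment` and the log name `airflow-worker` are what Composer environments normally use, but they are worth confirming in the Logs Explorer for your own environment. The helper name `composer_log_filter` and the project and environment names in the usage note are hypothetical.

```python
def composer_log_filter(project_id, environment_name, dag_id,
                        log_id="airflow-worker"):
    """Build a Cloud Logging filter for one DAG's worker logs.

    Assumptions: the resource type and log name below match the
    Composer environment; the quoted DAG id is a plain text search
    over the log entries.
    """
    parts = [
        'resource.type="cloud_composer_environment"',
        f'resource.labels.environment_name="{environment_name}"',
        f'logName="projects/{project_id}/logs/{log_id}"',
        f'"{dag_id}"',
    ]
    return " AND ".join(parts)


if __name__ == "__main__":
    # Hypothetical project and environment names.
    print(composer_log_filter("my-project", "my-env", "matts_custom_dag"))
```

The resulting string can be pasted into the Logs Explorer query box or passed to `gcloud logging read` with a `--limit` to pull the most recent entries for the DAG.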

I hope this will help you solve the problem.
