
Problem Description

I'm using Cloud Dataflow to import data from Pub/Sub messages into BigQuery tables. I'm using DynamicDestinations, since these messages can be routed to different tables.
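For context, the per-element decision that DynamicDestinations encapsulates is essentially a function from a message's attributes to a BigQuery table spec. A minimal, standard-library-only sketch of that routing logic (the `event_type` attribute name and the project/dataset names are hypothetical, not taken from the original pipeline):

```java
import java.util.Map;

public class TableRouter {
    // Hypothetical routing rule: choose the BigQuery table spec from a
    // Pub/Sub message attribute, falling back to a default table.
    // This mirrors the kind of decision DynamicDestinations makes per
    // element, without any Beam dependencies.
    static String tableSpecFor(Map<String, String> attributes) {
        String eventType = attributes.getOrDefault("event_type", "unknown");
        return "my-project:my_dataset." + eventType + "_events";
    }

    public static void main(String[] args) {
        // Routed by attribute value.
        System.out.println(tableSpecFor(Map.of("event_type", "click")));
        // No attribute present: falls back to the default table.
        System.out.println(tableSpecFor(Map.of()));
    }
}
```

In the real pipeline this logic would live in a `DynamicDestinations` subclass passed to `BigQueryIO.write().to(...)`, so each element is streamed into the table the function selects.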

I've recently noticed that the process started consuming all resources, and messages stating that the process is stuck began to appear:

Processing stuck in step Write Avros to BigQuery Table/StreamingInserts/StreamingWriteTables/StreamingWrite for at least 26h45m00s without outputting or completing in state finish
  at sun.misc.Unsafe.park(Native Method)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
  at java.util.concurrent.FutureTask.get(FutureTask.java:191)
  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:765)
  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:829)
  at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:131)
  at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:103)
  at org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn$DoFnInvoker.invokeFinishBundle(Unknown Source)

Currently, simply cancelling the pipeline and restarting it seems to solve the problem temporarily, but I can't pinpoint the reason the process is getting stuck.

The pipeline is using beam-runners-google-cloud-dataflow-java version 2.8.0 and google-cloud-bigquery version 1.56.0.

Recommended Answer

This log message may look scary, but it is not in itself indicative of a problem. What it is trying to convey is that your pipeline has been performing the same operation for a while.

That is not necessarily a problem: your files may simply be large enough that they take a while to write. If you've arrived at this question because you're concerned about seeing these messages, consider what kind of pipeline you have and whether it is plausible that it has some genuinely slow steps.

In your case, however, the pipeline has been writing for 26 hours, so this is certainly a problem. I believe it is related to a deadlock introduced by a library dependency in older versions of Beam. This should not be a problem in more recent versions (e.g. 2.15.0).
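If the project is built with Maven, the suggested fix amounts to bumping the Beam artifacts to a recent release. A sketch of the relevant `pom.xml` entries (the artifact IDs below are the standard Beam coordinates; adjust to match how your build actually declares its dependencies):

```xml
<!-- Upgrade the Dataflow runner and the GCP IO connectors together,
     so BigQueryIO picks up the deadlock fix. -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
  <version>2.15.0</version>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
  <version>2.15.0</version>
</dependency>
```

Keeping all `org.apache.beam` artifacts on the same version avoids mixing SDK and runner releases, which can reintroduce subtle incompatibilities.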
