Can a stateful DoFn have state that expires with a TTL, or is unbounded growth OK?

Question

I have a situation in Apache Beam (running on Dataflow) where I have created a simple stateful DoFn, based on this article. The upstream window is global, and changing it would affect downstream aggregations.

Currently, I am not doing anything to shrink the state, so it appears to grow without bound. Is this true? Is unbounded state growth a problem?

I would like to simply attach a TTL to the state, but I don't see this functionality.

I am considering storing my own timestamp with the data and using a timer to clean up the table periodically. Is this advisable?

The data being stored is a cache key on some event data. The cache key tells me that I need to look up past event data to hydrate the current event. The stateful DoFn works well for this, yet, as I said, I am concerned that it will grow without bound. I'm unsure whether that has any consequences in Dataflow.

Recommended answer

State is automatically garbage collected when a window expires. Since you are using the global window, it will never expire, so you will need to manage this yourself with timers.

I don't know the details of your code, but your idea sounds about right:

  • Store a timestamp with your state so you know how old it is.
  • Set an event-time timer that repeats periodically:
    • Clean up entries in the table that are older than the TTL.
    • The @OnTimer method can reset the same timer.
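
The steps in the list above can be sketched roughly as follows. This is a plain-Java model of the control flow, not Beam code: the class `TtlStateSketch` and all of its method names are hypothetical. In an actual DoFn, the map would most likely be a `MapState` declared with `@StateId`, the `nextSweepAt` field would be a `Timer` declared with `@TimerId`, and the sweep would live in the `@OnTimer` method. Re-arming the timer only while entries remain is an assumption of this sketch, not something the answer prescribes.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of "timestamp in state + periodic cleanup timer". Not Beam code. */
final class TtlStateSketch {
    private final long ttlMillis;
    private final long sweepIntervalMillis;

    // Stands in for the state table: cache key -> event time at which it was last seen.
    private final Map<String, Long> lastSeen = new HashMap<>();

    // Stands in for a single event-time timer; null means "no timer set".
    private Long nextSweepAt = null;

    TtlStateSketch(long ttlMillis, long sweepIntervalMillis) {
        if (ttlMillis <= 0 || sweepIntervalMillis <= 0) {
            throw new IllegalArgumentException("TTL and sweep interval must be positive");
        }
        this.ttlMillis = ttlMillis;
        this.sweepIntervalMillis = sweepIntervalMillis;
    }

    /** Analogue of @ProcessElement: store the timestamp and arm the timer if needed. */
    void process(String cacheKey, long eventTimeMillis) {
        lastSeen.merge(cacheKey, eventTimeMillis, Math::max);
        if (nextSweepAt == null) {
            nextSweepAt = eventTimeMillis + sweepIntervalMillis;
        }
    }

    /** Models the watermark advancing: fires the timer for every due sweep. */
    void advanceWatermark(long watermarkMillis) {
        while (nextSweepAt != null && nextSweepAt <= watermarkMillis) {
            long fireTime = nextSweepAt;
            nextSweepAt = null;
            onTimer(fireTime);
        }
    }

    /** Analogue of @OnTimer: drop entries older than the TTL, then reset the same timer. */
    private void onTimer(long fireTimeMillis) {
        lastSeen.values().removeIf(seen -> fireTimeMillis - seen >= ttlMillis);
        // Assumption of this sketch: stop re-arming once the table is empty,
        // so idle state does not keep firing timers forever.
        if (!lastSeen.isEmpty()) {
            nextSweepAt = fireTimeMillis + sweepIntervalMillis;
        }
    }

    boolean contains(String cacheKey) { return lastSeen.containsKey(cacheKey); }

    int size() { return lastSeen.size(); }

    boolean timerArmed() { return nextSweepAt != null; }

    public static void main(String[] args) {
        TtlStateSketch state = new TtlStateSketch(100L, 50L);
        state.process("a", 0L);
        state.process("b", 80L);
        state.advanceWatermark(120L);
        System.out.println("a present: " + state.contains("a"));
        System.out.println("b present: " + state.contains("b"));
    }
}
```

With a sweep interval shorter than the TTL, an entry can outlive its TTL by at most one interval, which is the usual price for firing one timer per interval instead of one per element.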

You could also set a timer directly for each element's TTL, but that will cause many more timers to fire, so it is only a good option if volume is low. (But if volume is low, you probably don't have to worry so much about unbounded growth.)

10-24 01:59