This article describes how to deal with Kafka HDFS Connect (during a Camus migration) not starting from the offset that was set. It should be a useful reference for anyone facing the same problem.

Problem description

I am currently replacing Camus with the Confluent HDFS Sink Connector (v4.0.0). We are dealing with sensitive data, so we need to maintain offset consistency during the cutover to the connector.

Transition plan:

  1. We created the HDFS sink connector and subscribed to a topic, writing to a temporary HDFS file. This creates a consumer group with the name connect-
  2. Stopped the connector using a DELETE request.
  3. Using the /usr/bin/kafka-consumer-groups script, I am able to set the current offset of the connector consumer group for the Kafka topic partition to the desired value (i.e. the last offset Camus wrote + 1); a sketch of steps 1-3 follows this list.
  4. When I restart the HDFS sink connector, it continues reading from the last committed connector offset and ignores the value I set. I expected the HDFS file name to look like: hdfs_kafka_topic_name+kafkapartition+Camus_offset+Camus_offset_plus_flush_size.format
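For concreteness, here is a minimal sketch of steps 1-3 using the Connect REST API and the stock Kafka tooling. The connector name (hdfs-sink), hosts, topic (my_topic), partition, and target offset are all hypothetical placeholders; the consumer group name follows Kafka Connect's connect-<connector name> convention for sink connectors.

    # Step 1: create the sink connector (all config values are placeholders)
    curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
      "name": "hdfs-sink",
      "config": {
        "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        "topics": "my_topic",
        "hdfs.url": "hdfs://namenode:8020",
        "flush.size": "1000"
      }
    }'

    # Step 2: stop the connector
    curl -X DELETE http://localhost:8083/connectors/hdfs-sink

    # Step 3: with the group now inactive, rewind it to the first offset
    # Camus did not write (the offset 1000 is a placeholder)
    /usr/bin/kafka-consumer-groups --bootstrap-server localhost:9092 \
      --group connect-hdfs-sink \
      --topic my_topic:0 \
      --reset-offsets --to-offset 1000 --execute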

Is my expectation of the Confluent connector behavior correct?

Recommended answer

When you restart this connector, it will use the offset embedded in the file name of the last file written to HDFS. It will not use the consumer group offset. It does this because it uses a write-ahead log to achieve exactly-once delivery to HDFS.
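To see where the connector will actually resume, list the files it has already committed; the resume point is parsed from their names, not from the consumer group. A minimal sketch, assuming the default topics.dir of topics, the default partitioner's partition=<n> directories, and the <topic>+<kafkaPartition>+<startOffset>+<endOffset>.<format> naming scheme; all paths and offsets are illustrative.

    # List committed files for partition 0 of the topic
    hdfs dfs -ls /topics/my_topic/partition=0
    # e.g. /topics/my_topic/partition=0/my_topic+0+0000001000+0000001999.avro

    # On restart the connector takes the highest end offset found here (1999)
    # and resumes consuming at 2000, ignoring the consumer group offset that
    # was set in step 3 of the transition plan.

This is why the reset in step 3 appears to have no effect: once at least one committed file (or its write-ahead log entry) exists for a topic partition, the offsets embedded in it win over anything set with kafka-consumer-groups.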

That concludes this piece on the Camus migration and Kafka HDFS Connect not starting from the set offset. We hope the recommended answer above is helpful.
