本文介绍了Kafka 喜欢 Kinesis Stream 上的偏移量吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我过去曾与 Kafka 合作过,最近需要将部分数据管道移植到 AWS Kinesis Stream 上.现在我读到 Kinesis 实际上是 Kafka 的一个分支,并且有许多相似之处.

I have worked a bit with Kafka in the past and lately there is a requirement to port part of the data pipeline on AWS Kinesis Stream. Now I have read that Kinesis is effectively a fork of Kafka and share many similarities.

然而,我没有看到我们如何让多个消费者从同一个流中读取,每个消费者都有相应的偏移量.每个数据记录都有一个序列号,但我找不到任何特定于消费者的东西(Kafka 组 ID?).

However I have failed to see how can we have multiple consumers reading from the same stream, each with their corresponding offset. There is a sequence number given to each data record, but I couldn't find anything specific to consumer(Kafka group Id?).

是否真的有可能在同一 AWS Kinesis Stream 上拥有具有不同摄取率的不同使用者?

Is it really possible to have different consumers with different ingestion rate over same AWS Kinesis Stream?

推荐答案

是.

您可以拥有多个 Kinesis 消费者应用程序.假设您有 2 个.

You can have multiple Kinesis Consumer Applications. Let's say you have 2.

  1. 第一个消费者应用程序(我认为它是 Kafka 中的消费者组"?)可以是first-app"并将其位置存储在 DynamoDBfirst-app-table"中.它可以拥有任意数量的节点(ec2 实例).
  2. 第二个消费者应用程序也可以在同一个流上工作,并将它的位置存储在另一个 DynamoDB 表上,比如第二个应用程序表".

每个表将包含应用程序 Y 在分片 X 上最后处理的位置是什么"信息.因此,这两个应用程序将相同分片的检查点存储在不同的位置,从而使它们相互独立.

Each table will contain "what is the last processed position on shard X for app Y" information. So the 2 applications store checkpoints for the same shards in a different place, which makes them independent.

关于摄取率,有一个idleTimeBetweenReadsInMillis" 使用 KCL 的消费者应用程序中的值,即 Amazon Kinesis API for Get 操作的轮询间隔.例如,第一个应用程序可以有2000"轮询间隔,因此它将每 2 秒轮询一次流的分片,以查看是否有新记录出现.

About the ingestion rate, there is a "idleTimeBetweenReadsInMillis" value in consumer applications using KCL, that is the polling interval for Amazon Kinesis API for Get operations. For example first application can have "2000" poll interval, so it will poll stream's shards every 2 seconds to see if any new record came.

我不太了解卡夫卡,但据我所知;Kafka分区"是 Kinesis 中的分片",同样 Kafka偏移"是序列号"在 Kinesis 中.Kinesis 消费者库使用术语检查点" 用于存储序列.就像你说的,概念是相似的.

I don't know Kafka well but as far as I remember; Kafka "partition" is "shard" in Kinesis, likewise Kafka "offset" is "sequence number" in Kinesis. Kinesis Consumer Library uses the term "checkpoint" for the stored sequences. Like you said, the concepts are similar.

这篇关于Kafka 喜欢 Kinesis Stream 上的偏移量吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-22 08:21