Why do Kafka consumers connect to ZooKeeper, while producers get metadata from the brokers?

Problem description

Why do consumers connect to ZooKeeper to retrieve the partition locations, while Kafka producers have to connect to one of the brokers to retrieve metadata?

My point is: what exactly is ZooKeeper used for, when every broker already has all the metadata needed to tell producers where to send their messages? Couldn't the brokers send this same information to the consumers?

I can understand why brokers keep the metadata: so they don't have to connect to ZooKeeper each time a new message is sent to them. Is there a function of ZooKeeper that I'm missing? I'm finding it hard to think of a reason why ZooKeeper is really needed within a Kafka cluster.
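
The producer path described in the question can be sketched as follows, under simplifying assumptions: the three broker IDs, the `orders` topic and its partition-to-leader layout are hypothetical, and CRC32 stands in for the murmur2 hash that Kafka's Java producer actually uses for keyed messages. Because every broker caches the same partition-to-leader map, asking any single broker is enough to route a message.

```python
import zlib

# Hypothetical cluster: every broker caches the same metadata, mapping
# topic -> leader broker id for each partition (list index = partition number).
SHARED_METADATA: dict[str, list[int]] = {"orders": [1, 2, 3, 1, 2, 3]}
BROKERS: dict[int, dict[str, list[int]]] = {b: SHARED_METADATA for b in (1, 2, 3)}


def fetch_metadata(bootstrap_id: int) -> dict[str, list[int]]:
    """Ask one (bootstrap) broker for the cluster-wide metadata."""
    return BROKERS[bootstrap_id]


def route(key: bytes, topic: str, metadata: dict[str, list[int]]) -> tuple[int, int]:
    """Pick a partition for a keyed message and return (partition, leader id)."""
    leaders = metadata[topic]
    # Stand-in hash; Kafka's Java producer uses murmur2 rather than CRC32.
    partition = zlib.crc32(key) % len(leaders)
    return partition, leaders[partition]


# Any broker gives the same answer, so the producer never has to ask ZooKeeper.
partition, leader = route(b"customer-42", "orders", fetch_metadata(bootstrap_id=2))
print(f"send to partition {partition} on broker {leader}")
```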

Recommended answer

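First of all, ZooKeeper is needed only by the high-level consumer. The SimpleConsumer does not require ZooKeeper to work: it talks to the brokers directly and leaves offset tracking and partition assignment to the application.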

The main reasons the high-level consumer needs ZooKeeper are to track consumed offsets and to handle load balancing.

Now in more detail.

Regarding offset tracking, imagine the following scenario: you start a consumer, consume 100 messages and shut the consumer down. The next time you start your consumer, you'll probably want to resume from your last consumed offset (which is 100), which means you have to store the maximum consumed offset somewhere. This is where ZooKeeper kicks in: it stores offsets for every group/topic/partition. That way, the next time you start your consumer, it can ask: "Hey ZooKeeper, what offset should I start consuming from?" Kafka is actually moving towards being able to store offsets not only in ZooKeeper but in other stores as well (for now only ZooKeeper and Kafka offset storage are available, and I'm not sure Kafka storage is fully implemented).
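
A minimal sketch of that resume-from-offset flow, assuming a plain dict in place of ZooKeeper. The `OffsetStore` and `consume` names are hypothetical; the key format is modeled on the `/consumers/<group>/offsets/<topic>/<partition>` layout used by the ZooKeeper-based consumer. The stored value is the offset of the next message to read, so a restarted consumer picks up exactly where the previous run stopped.

```python
class OffsetStore:
    """Hypothetical in-memory stand-in for ZooKeeper offset storage."""

    def __init__(self) -> None:
        self._nodes: dict[str, int] = {}

    @staticmethod
    def _path(group: str, topic: str, partition: int) -> str:
        # One node per group/topic/partition, as described above.
        return f"/consumers/{group}/offsets/{topic}/{partition}"

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        self._nodes[self._path(group, topic, partition)] = offset

    def fetch(self, group: str, topic: str, partition: int) -> int:
        # In this sketch a group that never committed starts at offset 0
        # (real consumers make this starting point configurable).
        return self._nodes.get(self._path(group, topic, partition), 0)


def consume(store: OffsetStore, log: list[str], group: str, topic: str,
            partition: int, max_messages: int) -> list[str]:
    """Resume from the stored offset, read a batch, then commit the new position."""
    start = store.fetch(group, topic, partition)
    batch = log[start:start + max_messages]
    store.commit(group, topic, partition, start + len(batch))
    return batch


log = [f"msg-{i}" for i in range(250)]  # messages in one partition
store = OffsetStore()
first_run = consume(store, log, "billing", "orders", 0, 100)   # then "shut down"
second_run = consume(store, log, "billing", "orders", 0, 100)  # restart
print(second_run[0])  # msg-100
```

Committing after the batch is read gives at-least-once behaviour if the consumer crashes mid-batch; committing before would trade that for at-most-once.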

Regarding load balancing, the volume of messages produced can be too large for one machine to handle, and you'll probably want to add computing power at some point. Let's say you have a topic with 100 partitions, and to handle this volume of messages you have 10 machines. Several questions actually arise here:

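How should these 10 machines divide the partitions among themselves?
What happens if one of the machines dies?
What happens if you want to add another machine?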

And again, this is where ZooKeeper kicks in: it tracks all consumers in the group, and each high-level consumer subscribes to changes in that group. The point is that when a consumer appears or disappears, ZooKeeper notifies all consumers and triggers a rebalance so that they split the partitions near-equally (i.e. to balance the load). This way, if one of the consumers dies, the others are guaranteed to continue processing the partitions it owned.
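
A toy simulation of this, assuming an in-memory `GroupRegistry` in place of ZooKeeper (all class and function names here are hypothetical). `range_assign` answers the first question by giving each consumer a contiguous, near-equal block of partitions, similar in spirit to Kafka's range assignment strategy; the registry answers the other two by notifying every live member whenever someone joins or leaves, after which each member recomputes the same deterministic assignment and keeps its own share. Real ZooKeeper watches are one-shot and asynchronous, which this sketch ignores.

```python
from typing import Callable


def range_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Split partitions into contiguous, near-equal ranges, one per consumer."""
    if not consumers:
        return {}
    parts = sorted(partitions)
    ordered = sorted(consumers)
    per_consumer, extra = divmod(len(parts), len(ordered))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, consumer in enumerate(ordered):
        # The first `extra` consumers each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = parts[start:start + count]
        start += count
    return assignment


class GroupRegistry:
    """Hypothetical in-memory stand-in for ZooKeeper group membership + watches."""

    def __init__(self) -> None:
        self._callbacks: dict[str, Callable[[list[str]], None]] = {}

    def join(self, consumer_id: str, on_change: Callable[[list[str]], None]) -> None:
        self._callbacks[consumer_id] = on_change
        self._notify()

    def leave(self, consumer_id: str) -> None:
        # In ZooKeeper, a crashed consumer's ephemeral node disappears when its
        # session expires; here we remove it explicitly.
        del self._callbacks[consumer_id]
        self._notify()

    def _notify(self) -> None:
        # Trigger a rebalance: every live member sees the same member list.
        members = sorted(self._callbacks)
        for callback in list(self._callbacks.values()):
            callback(members)


class Consumer:
    def __init__(self, consumer_id: str, partitions: list[int]) -> None:
        self.id = consumer_id
        self.partitions = partitions
        self.owned: list[int] = []

    def on_group_change(self, members: list[str]) -> None:
        # Same deterministic inputs on every member, so the shares never overlap.
        self.owned = range_assign(self.partitions, members)[self.id]


partitions = list(range(100))
registry = GroupRegistry()
consumers = [Consumer(f"machine-{i:02d}", partitions) for i in range(10)]
for c in consumers:
    registry.join(c.id, c.on_group_change)
print(len(consumers[0].owned))  # 10 partitions each

registry.leave("machine-03")  # simulate machine-03 dying
survivors = [c for c in consumers if c.id != "machine-03"]
print(sorted(len(c.owned) for c in survivors))  # [11, 11, 11, 11, 11, 11, 11, 11, 12]
```

Adding an eleventh machine works the same way: its `join` triggers another rebalance, and the 100 partitions are re-split across all live members.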
