This article looks at a question about ClickHouse Kafka performance and its solution; it may be a useful reference if you are hitting the same issue.

Problem Description

Following the example from the documentation: https://clickhouse.yandex/docs/en/table_engines/kafka/

I created a table with Kafka Engine and a materialized view that pushes data to a MergeTree table.

Here is the structure of my tables:

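-- Source table: the Kafka engine arguments are, in order, the broker list,
-- the topic, the consumer group, the message format, and a final '3'
-- (presumably intended as the number of consumers).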
CREATE TABLE games (
    UserId UInt32,
    ActivityType UInt8,
    Amount Float32,
    CurrencyId UInt8,
    Date String
  ) ENGINE = Kafka('XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092,XXXX.eu-west-1.compute.amazonaws.com:9092', 'games', 'click-1', 'JSONEachRow', '3');


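-- Target table, using the old-style MergeTree syntax:
-- (date column, primary key, index granularity).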
CREATE TABLE tests.games_transactions (
    day Date,
    UserId UInt32,
    Amount Float32,
    CurrencyId UInt8,
    timevalue DateTime,
    ActivityType UInt8
 ) ENGINE = MergeTree(day, (day, UserId), 8192);


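-- Materialized view that moves each consumed block into the MergeTree
-- table, stripping everything from the first dot in Date (e.g. fractional
-- seconds) before parsing it as a Date/DateTime.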
  CREATE MATERIALIZED VIEW tests.games_consumer TO tests.games_transactions
    AS SELECT toDate(replaceRegexpOne(Date,'\\..*','')) as day, UserId, Amount, CurrencyId, toDateTime(replaceRegexpOne(Date,'\\..*','')) as timevalue, ActivityType
    FROM default.games;

In the Kafka topic I am getting around 150 messages per second.

Everything works, apart from the fact that the data are updated in the table with a big delay, definitely not in real time.

It seems that the data are sent from Kafka to the table only when 65536 new messages are ready to be consumed in Kafka.
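
That 65536 figure matches the default value of max_block_size, which is a strong hint about where the batching comes from. To see the values your server is actually applying, one quick check (a small sketch; it only assumes the standard system.settings table) is:

-- List the batching/flush settings relevant to Kafka consumption.
-- max_block_size usually defaults to 65536, matching the delay above.
SELECT name, value, changed
FROM system.settings
WHERE name IN ('max_block_size', 'max_insert_block_size', 'stream_flush_interval_ms');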

Should I set some particular configuration?

I tried to change the configuration from the CLI:

SET max_insert_block_size=1048
SET max_block_size=655
SET stream_flush_interval_ms=750

But there was no improvement.

Should I change any particular configuration?
Should I have changed the above configuration before creating the tables?

Solution

There is an issue for this on the ClickHouse GitHub: https://github.com/yandex/ClickHouse/issues/2169.

Basically you need to set max_block_size (http://clickhouse-docs.readthedocs.io/en/latest/settings/settings.html#max-block-size) before the table is created; otherwise it will not work.
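
A plausible reading of that issue (my interpretation, not something the answer states outright): SET from the CLI only changes the current client session, while the Kafka engine consumes in a background server thread that takes its settings from the server-side default profile at table-creation time. You can see the session-only scope of SET with a sketch like:

-- The session sees the new value, but the background Kafka consumer,
-- which does not run in this session, never does.
SET max_block_size = 100;
SELECT value, changed FROM system.settings WHERE name = 'max_block_size';
-- value = 100 and changed = 1 here, yet consumption still batches at 65536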

I used the solution with overriding users.xml:

<yandex>
    <profiles>
        <default>
           <max_block_size>100</max_block_size>
        </default>
    </profiles>
</yandex>
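
Once the server has picked up the edited users.xml (it normally reloads that file automatically), you can confirm the override from a fresh session, e.g.:

-- Should now report 100 for any session using the default profile.
SELECT name, value FROM system.settings WHERE name = 'max_block_size';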

I deleted my table and database and recreated them. It worked for me: now my tables get updated every 100 records.
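
As an aside beyond the original answer: newer ClickHouse releases let you scope these knobs to the Kafka table itself through a SETTINGS clause (settings such as kafka_max_block_size and kafka_flush_interval_ms), which avoids lowering max_block_size for every query in the profile. A rough sketch of what that DDL could look like; treat the setting names and values as version-dependent:

-- Hypothetical per-table variant for newer ClickHouse versions; not part
-- of the original answer.
CREATE TABLE games (
    UserId UInt32,
    ActivityType UInt8,
    Amount Float32,
    CurrencyId UInt8,
    Date String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'XXXX.eu-west-1.compute.amazonaws.com:9092',
         kafka_topic_list = 'games',
         kafka_group_name = 'click-1',
         kafka_format = 'JSONEachRow',
         kafka_max_block_size = 100,     -- flush to the view after 100 rows
         kafka_flush_interval_ms = 750;  -- or after 750 ms, whichever first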

That concludes this article on ClickHouse Kafka performance. We hope the answer above helps you out.
