一、问题现象,使用flink on yarn 模式,写入数据到clickhouse,但是在yarn 集群充足的情况下一直报:Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster,表面现象是 yarn 集群资源可能不够,实际yarn 集群资源是够用的。

查看flink jobmanager的日志,发现日志中一直在出现如下报错:

Could not resolve ResourceManager address akka.tcp://flink@xxxxxxxxx.cn:38121/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://xxxxxxx.cn:38121/user/rpc/resourcemanager_*.

从这个日志来看,也就基本可以确定不是yarn集群资源的问题,是yarn 集群通信出现了问题。

1)、交叉验证,发现提交别的flink streamling 任务都不会存在该问题,只有写clickhouse的时候才会出现该问题,初步排除可能是代码问题或者该任务的jar包引起的。

2)、查看pom依赖:

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-jdbc_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_2.11</artifactId>
            <version>${flink.version}</version>
      </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>ru.yandex.clickhouse</groupId>
            <artifactId>clickhouse-jdbc</artifactId>
            <version>${clickhouse-jdbc.version}</version>
       </dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>${mysql-connector-java.version}</version>
</dependency>
08-14 18:48