本文介绍了向 Google Dataproc 提交 Uber Jar 时如何解决 Guava 依赖问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 maven shade 插件来构建 Uber jar,以便将其作为作业提交给 google dataproc 集群.Google 已在其集群上安装了 Apache Spark 2.0.2 Apache Hadoop 2.7.3.

I am using maven shade plugin to build Uber jar for submitting it as a job to google dataproc cluster.Google have installed Apache Spark 2.0.2 Apache Hadoop 2.7.3 on their cluster.

Apache spark 2.0.2 使用 com.google.guava 的 14.0.1 和 apache hadoop 2.7.3 使用 11.0.2,这两个应该已经在类路径中.

Apache spark 2.0.2 uses 14.0.1 of com.google.guava and apache hadoop 2.7.3 uses 11.0.2, these both should be in the classpath already.

<plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.0.0</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                    <!--
                        <artifactSet>
                            <includes>
                                <include>com.google.guava:guava:jar:19.0</include>
                            </includes>
                        </artifactSet>
                    -->
                        <artifactSet>
                            <excludes>
                                <exclude>com.google.guava:guava:*</exclude>
                            </excludes>
                        </artifactSet>
                    </configuration>
                </execution>
            </executions>
        </plugin>

当我在 shade 插件中包含 guava 16.0.1 jar 时,我得到这个 Eexception :

When I include guava 16.0.1 jar in shade plugin i get this Eexception :

Exception in thread "main" java.io.IOException: Failed to open native connection to Cassandra at {10.148.0.3}:9042
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:163)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:82)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:110)
at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:121)
at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:322)
at com.datastax.spark.connector.cql.Schema$.tableFromCassandra(Schema.scala:342)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.tableDef(CassandraTableRowReaderProvider.scala:50)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef$lzycompute(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.verify(CassandraTableRowReaderProvider.scala:137)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.verify(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.getPartitions(CassandraTableScanRDD.scala:232)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1913)
at org.apache.spark.rdd.RDD.count(RDD.scala:1134)
at com.test.scala.CreateVirtualTable$.main(CreateVirtualTable.scala:47)
at com.test.scala.CreateVirtualTable.main(CreateVirtualTable.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoSuchMethodError: com.google.common.util.concurrent.Futures.withFallback(Lcom/google/common/util/concurrent/ListenableFuture;Lcom/google/common/util/concurrent/FutureFallback;Ljava/util/concurrent/Executor;)Lcom/google/common/util/concurrent/ListenableFuture;
at com.datastax.driver.core.Connection.initAsync(Connection.java:177)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:731)
at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:251)
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:199)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1414)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:393)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:156)

... 32 more
17/05/10 09:07:36 INFO

如果我排除 Guava 16.0.1 那么它会抛出这个异常

And If i exclude Guava 16.0.1 then it throws this exception

Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/reflect/TypeParameter
at com.datastax.driver.core.SanityChecks.checkGuava(SanityChecks.java:50)
at com.datastax.driver.core.SanityChecks.check(SanityChecks.java:36)
at com.datastax.driver.core.Cluster.<clinit>(Cluster.java:67)
at com.datastax.spark.connector.cql.DefaultConnectionFactory$.clusterBuilder(CassandraConnectionFactory.scala:35)
at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createCluster(CassandraConnectionFactory.scala:92)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:154)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$3.apply(CassandraConnector.scala:149)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:82)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:110)
at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:121)
at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:322)
at com.datastax.spark.connector.cql.Schema$.tableFromCassandra(Schema.scala:342)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.tableDef(CassandraTableRowReaderProvider.scala:50)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef$lzycompute(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.verify(CassandraTableRowReaderProvider.scala:137)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.verify(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.getPartitions(CassandraTableScanRDD.scala:232)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1913)
at org.apache.spark.rdd.RDD.count(RDD.scala:1134)
at com.test.scala.CreateVirtualTable$.main(CreateVirtualTable.scala:47)
at com.test.scala.CreateVirtualTable.main(CreateVirtualTable.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.google.common.reflect.TypeParameter
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 38 more
17/05/11 08:24:00 INFO org.spark_project.jetty.server.ServerConnector: Stopped ServerConnector@edc6a5d{HTTP/1.1}{0.0.0.0:4040}
17/05/11 08:24:00 INFO com.datastax.spark.connector.util.SerialShutdownHooks: Successfully executed shutdown hook: Clearing session cache for C* connector

那么这里有什么问题呢?dataproc 上的类加载器是从 hadoop 中选择 guava 11.0.2 吗?因为番石榴 11.0.2 没有类 com/google/common/reflect/TypeParameter .请所有关注此标签的 google dataproc 开发人员提供帮助.

So what can be the problem here ?is classloader on dataproc picking guava 11.0.2 from hadoop ?as guava 11.0.2 does not have class com/google/common/reflect/TypeParameter .All the google dataproc developers watching this tag please help.

推荐答案

已编辑:参见 https://cloud.google.com/blog/products/data-analytics/managing-java-dependencies-apache-spark-applications-cloud-dataproc 用于 Maven 和 SBT 的完整示例.

Edited: See https://cloud.google.com/blog/products/data-analytics/managing-java-dependencies-apache-spark-applications-cloud-dataproc for a fully worked example for Maven and SBT.

原始答案当我制作 uber jars 以在 Hadoop/Spark/Dataproc 上运行时,我经常使用适合我需要的任何版本的番石榴,然后使用一个允许不同版本共存的阴影重定位:

Original AnswerWhen I make uber jars to run on Hadoop / Spark / Dataproc, I often use whichever version of guava suits my needs and then use a shade relocation which allows the different versions to co-exist without issue:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
      <artifactSet>
          <includes>
            <include>com.google.guava:*</include>
          </includes>
      </artifactSet>
      <minimizeJar>false</minimizeJar>
      <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>repackaged.com.google.common</shadedPattern>
          </relocation>
      </relocations>
      <shadedArtifactAttached>true</shadedArtifactAttached>
      </configuration>
  </execution>
</executions>
</plugin>

这篇关于向 Google Dataproc 提交 Uber Jar 时如何解决 Guava 依赖问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-04 16:34