This article describes how to manage conflicting Guava, Protobuf, and GRPC dependencies on DataProc.

Problem Description


I am working on a Scala Spark job which needs to use a Java library (youtube/vitess) that depends on newer versions of GRPC (1.01), Guava (19.0), and Protobuf (3.0.0) than those currently provided on the DataProc 1.1 image.


When running the project locally and building with Maven, the correct versions of these dependencies are loaded and the job runs without issue. When submitting the job to DataProc, the DataProc versions of these libraries are preferred, and the job references class methods that cannot be resolved.


What is the recommended way to ensure that the right versions of a dependency's dependencies get loaded when submitting a Spark job on DataProc? I'm not in a position to rewrite components of this library to use the older versions of these packages that DataProc provides.

Recommended Answer


The recommended approach is to include all of your job's dependencies in an uber jar (created with the Maven Shade plugin, for example) and relocate the dependency classes inside this uber jar to avoid conflicts with classes in the libraries provided by Dataproc.
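As a rough sketch, a Shade-plugin configuration with relocations for the three conflicting packages might look like the following. The `repackaged.` prefix is an arbitrary placeholder, and the exact relocation patterns would need to match whatever packages actually conflict in your build:

```xml
<!-- pom.xml (inside <build><plugins>) — minimal Maven Shade sketch; prefix is illustrative -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Guava -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>repackaged.com.google.common</shadedPattern>
          </relocation>
          <!-- Protobuf -->
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>repackaged.com.google.protobuf</shadedPattern>
          </relocation>
          <!-- gRPC -->
          <relocation>
            <pattern>io.grpc</pattern>
            <shadedPattern>repackaged.io.grpc</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, `mvn package` produces an uber jar whose Guava, Protobuf, and gRPC classes (and the bytecode references to them) are rewritten under the `repackaged.` prefix, so they no longer clash with the versions on the Dataproc image; you then submit that shaded jar as the job jar.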


For reference, you can take a look at how this is done in the Cloud Storage connector, which is part of the Dataproc distribution.

