1 - Why compile Hadoop yourself

Most personal installations use Apache Hadoop (there are also distributions such as CDH Hadoop).

The binary packages downloaded from the Apache site were compiled on specific machines and are not compatible with every environment, especially the native libraries (used for compression, C program support, and so on); different platforms have different constraints.

2 - Preparing the build environment

1) Local system: macOS Big Sur 11.0.1;

Make sure the machine can reach the Internet. On a Linux system, the firewall and SELinux also need to be turned off:

service iptables stop
chkconfig iptables off

# Disable SELinux
vim /etc/selinux/config
# Comment out: SELINUX=enforcing
# Add: SELINUX=disabled
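The manual vim edit above can also be done non-interactively with sed. A minimal sketch, shown here against a throwaway copy of the file so it is safe to run anywhere; on a real machine, point it at /etc/selinux/config instead:

```shell
# Work on a throwaway copy so the sketch is safe to run anywhere
cfg=$(mktemp)
printf 'SELINUX=enforcing\n' > "$cfg"

# Flip enforcing -> disabled in place (-i.bak works with both GNU and BSD sed)
sed -i.bak 's/^SELINUX=enforcing/SELINUX=disabled/' "$cfg"
grep '^SELINUX=' "$cfg"
```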

2) Configure the JDK environment variables; the version used is 1.8.0_162;

On Linux, you generally need to remove the distribution's bundled Java first:

# List the installed versions:
rpm -qa | grep java
# Remove them:
rpm -e java-1.6.0-openjdk-1.6.0.41-1.13.13.1.el6_8.x86_64  java-1.7.0-openjdk-1.7.0.131-2.6.9.0.el6_8.x86_64
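After the removal, a quick check confirms nothing is left over. The sketch below is guarded so it also runs harmlessly on systems without rpm:

```shell
# List any remaining Java packages; print a note if rpm is unavailable
if command -v rpm >/dev/null 2>&1; then
  rpm -qa | grep -i java || echo "no java packages found"
else
  echo "rpm not available (not an RPM-based system)"
fi
```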

3) Install Maven, version 3.5.2;


To speed up dependency downloads, you can add the Aliyun Maven mirror:

<mirror>
  <id>alimaven</id>
  <name>aliyun maven repo</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  <mirrorOf>central</mirrorOf>
</mirror>
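The mirror element belongs inside the &lt;mirrors&gt; section of Maven's settings file (~/.m2/settings.xml or $MAVEN_HOME/conf/settings.xml). A minimal standalone file for reference, written to /tmp here only so it can be validated without touching a real Maven setup:

```shell
cat > /tmp/settings.xml <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>alimaven</id>
      <name>aliyun maven repo</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
EOF

# Sanity-check that the XML is well-formed (uses the Python standard library)
python3 -c 'import xml.etree.ElementTree as ET; ET.parse("/tmp/settings.xml"); print("settings.xml OK")'
```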

4) Environment variables for the software above:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home
export CLASSPATH=$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar:.
export PATH=$JAVA_HOME/bin:$PATH:.

export MAVEN_HOME=/usr/local/apache-maven-3.5.2
export PATH=$PATH:$MAVEN_HOME/bin

export HADOOP_HOME=/Users/healchow/bigdata/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
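A quick sanity check that the three *_HOME variables point at real directories (the paths above are this machine's; adjust to yours):

```shell
# Print ok/missing for each configured home directory
for d in "$JAVA_HOME" "$MAVEN_HOME" "$HADOOP_HOME"; do
  if [ -d "$d" ]; then echo "ok: $d"; else echo "missing: $d"; fi
done
```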

3 - Installing the dependency libraries

1) Install gcc, cmake, and the GNU autotools:

brew install gcc cmake autoconf automake libtool

2) Install the compression libraries gzip, bzip2, zlib, snappy, etc.:

brew install gzip bzip2 zlib

Install snappy 1.1.4 by hand; other versions will fail later in the build!

# Download and extract:
wget https://github.com/google/snappy/archive/1.1.4.tar.gz
tar -zxf 1.1.4.tar.gz
cd snappy-1.1.4

# Specify an install prefix so brew can link it (otherwise it installs directly under /usr/local)
./autogen.sh
./configure --prefix=/usr/local/Cellar/snappy/1.1.4
# Build and install to the prefix above:
make && make install
# Put it on the link path:
brew link snappy
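If brew link succeeded, the snappy libraries should now be visible under /usr/local/lib. A guarded check that prints a note instead of failing when the path is missing or different:

```shell
# Look for the linked snappy library files; fall back to a note if absent
ls /usr/local/lib 2>/dev/null | grep -i snappy || echo "snappy not linked under /usr/local/lib"
```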

3) Install the openssl dependency and configure its environment variables:

brew install openssl

Add these environment variables to ~/.bash_profile:

export OPENSSL_ROOT_DIR="/usr/local/opt/openssl@1.1"
export OPENSSL_INCLUDE_DIR="$OPENSSL_ROOT_DIR/include"
export PKG_CONFIG_PATH="${OPENSSL_ROOT_DIR}/lib/pkgconfig"

# After saving, make the variables take effect immediately:
source ~/.bash_profile

4) Manually install protobuf 2.5.0:

Download link: https://github.com/protocolbuffers/protobuf/releases/tag/v2.5.0; extract, then build and install:

wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar -zxf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
# Specify an install prefix so brew can link it (otherwise it installs directly under /usr/local)
./configure --prefix=/usr/local/Cellar/protobuf/2.5.0
# Build and install to the prefix above:
make && make install
# Put it on the link path:
brew link protobuf
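The Hadoop 3.2.1 build requires exactly protoc 2.5.0, so it is worth verifying what is actually on the PATH. The sketch falls back to a message when protoc is absent, so it is safe to paste anywhere:

```shell
# Expect "libprotoc 2.5.0" on a correctly prepared machine
if command -v protoc >/dev/null 2>&1; then
  protoc --version
else
  echo "protoc not on PATH"
fi
```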

5) Optionally, install isa-l:

First install nasm: brew install nasm

Then download the source package (https://github.com/intel/isa-l/releases) and build it:

cd isa-l-2.28.0

# Generate the configure script
autoreconf --install --symlink -f
./configure --prefix=/usr/local/Cellar/isa-l --libdir=/usr/local/Cellar/isa-l/lib  AS=yasm --target=darwin

# Build and install:
make && make install

# Create symlinks:
cd /usr/local/lib
ln -s /usr/local/Cellar/isa-l/lib/libisal.2.dylib libisal.2.dylib
ln -s /usr/local/Cellar/isa-l/lib/libisal.a libisal.a
ln -s /usr/local/Cellar/isa-l/lib/libisal.dylib libisal.dylib
ln -s /usr/local/Cellar/isa-l/lib/libisal.la libisal.la

cd /usr/local/lib/pkgconfig
ln -s /usr/local/Cellar/isa-l/lib/pkgconfig/libisal.pc libisal.pc
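The repeated ln -s commands above can be collapsed into a loop. A sketch demonstrated against temporary directories so it runs anywhere; on the real machine, use src=/usr/local/Cellar/isa-l/lib and dst=/usr/local/lib:

```shell
# Temp dirs keep the sketch safe to run anywhere
src=$(mktemp -d); dst=$(mktemp -d)
touch "$src/libisal.a" "$src/libisal.dylib"

# Link every library file from src into dst
for f in libisal.a libisal.dylib; do
  ln -sf "$src/$f" "$dst/$f"
done
ls "$dst"
```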

4 - Compiling the Hadoop source

Download the Apache Hadoop source package; here we use version 3.2.1 (https://archive.apache.org/dist/hadoop/core/hadoop-3.2.1/);

After downloading, extract it to ${HOME}/bigdata/

Then build the source with this command:

cd ${HOME}/bigdata/hadoop-3.2.1-src

# To build snappy support, openssl.prefix must be specified; otherwise the macOS system openssl is used and the build fails:
# The -e -X flags print all logs produced during the build:
mvn clean package -DskipTests -Pdist,native -Dmaven.javadoc.skip -Dtar \
-Drequire.bzip2 -Dbzip2.prefix=/usr/local/Cellar/bzip2/1.0.8 \
-Drequire.openssl -Dopenssl.prefix=/usr/local/Cellar/openssl@1.1/1.1.1k \
-Drequire.snappy -Dsnappy.lib=/usr/local/Cellar/snappy/1.1.4/lib \
-Drequire.isal -Disal.prefix=/usr/local/Cellar/isa-l -Disal.lib=/usr/local/Cellar/isa-l/lib \
-e -X

5 - Problems encountered and how they were solved

5.1 Build error in the hadoop-common module

[WARNING] CMake Warning (dev) at CMakeLists.txt:47 (find_package):
[WARNING]   Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
[WARNING]   Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
[WARNING]   command to set the policy and suppress this warning.
[WARNING]
[WARNING]   Environment variable ZLIB_ROOT is set to:
[WARNING]
[WARNING]     /usr/local/Cellar/zlib/1.2.11/
[WARNING]
[WARNING]   For compatibility, CMake is ignoring the variable.
[WARNING] This warning is for project developers.  Use -Wno-dev to suppress it.
[WARNING]
[WARNING] CMake Error at /usr/local/Cellar/cmake/3.20.5/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
[WARNING]   Could NOT find ZLIB (missing: ZLIB_LIBRARY) (found version "1.2.11")
[WARNING] Call Stack (most recent call first):
[WARNING]   /usr/local/Cellar/cmake/3.20.5/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
[WARNING]   /usr/local/Cellar/cmake/3.20.5/share/cmake/Modules/FindZLIB.cmake:120 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
[WARNING]   CMakeLists.txt:47 (find_package)

It says ZLIB_LIBRARY could not be found, and that ZLIB_ROOT was ignored. My environment variables at the time:

export ZLIB_ROOT=/usr/local/Cellar/zlib/1.2.11
export ZLIB_LIBRARY=/usr/local/Cellar/zlib/1.2.11/lib
export ZLIB_INCLUDE_DIR=/usr/local/Cellar/zlib/1.2.11/include

After some digging: since CMake 3.12, a <PackageName>_ROOT variable is used as a search prefix by find_package() only when policy CMP0074 is set to NEW; otherwise the variable is ignored, exactly as the warning says.

Referring also to this author's analysis (https://github.com/MarkDana/Compile-Hadoop2.2.0-on-MacOS), I looked at the find module itself:

cd /usr/local/Cellar/cmake/3.20.5/share/cmake/Modules
vim FindZLIB.cmake

So we only need to set ZLIB_ROOT; for it to take effect, CMake's CMP0074 policy must be enabled in the CMakeLists file:

Edit the CMake configuration of the failing project:

vim hadoop-common-project/hadoop-common/src/CMakeLists.txt, and enable the new behavior:

# After the line cmake_minimum_required(VERSION 3.1 FATAL_ERROR), add:
cmake_policy(SET CMP0074 NEW)

Finally, only this one line needs to remain among the environment variables:

export ZLIB_ROOT=/usr/local/Cellar/zlib/1.2.11

After that, this error was gone.

5.2 The hadoop-common module still fails

[WARNING] CMake Warning (dev) in CMakeLists.txt:
[WARNING]   No project() command is present.  The top-level CMakeLists.txt file must
[WARNING]   contain a literal, direct call to the project() command.  Add a line of
[WARNING]   code such as
[WARNING]
[WARNING]     project(ProjectName)
[WARNING]
[WARNING]   near the top of the file, but after cmake_minimum_required().
[WARNING]
[WARNING]   CMake is pretending there is a "project(Project)" command on the first
[WARNING]   line.
[WARNING] This warning is for project developers.  Use -Wno-dev to suppress it.
[WARNING]
[WARNING] CMake Error at CMakeLists.txt:68 (message):
[WARNING]   Required bzip2 library and/or header files could not be found.
[WARNING]
[WARNING]
[WARNING] -- Configuring incomplete, errors occurred!

It cannot find the bzip2 library or its header files... yet my bzip2 environment variables were all set:

export BZIP2_ROOT=/usr/local/Cellar/bzip2/1.0.8
export BZIP2_INCLUDE_DIR=/usr/local/Cellar/bzip2/1.0.8/include
export BZIP2_LIBRARY=/usr/local/Cellar/bzip2/1.0.8

Various other environment variables, as well as LDFLAGS and CPPFLAGS, had no effect either.

After a lot of searching, it seems bzip2 support simply cannot be compiled on macOS, so I changed the check in the same CMakeLists.txt to skip it:

# Changed the condition below to test REQUIRE_BZIP2 (i.e. TRUE) directly
# if(BZIP2_INCLUDE_DIR AND BZIP2_LIBRARIES)
if(REQUIRE_BZIP2)

5.3 Build error in the MapReduce NativeTask module

[WARNING] 2 warnings and 12 errors generated.
[WARNING] make[2]: *** [CMakeFiles/nttest.dir/main/native/test/TestCompressions.cc.o] Error 1
[WARNING] make[2]: *** Waiting for unfinished jobs....
[WARNING] make[1]: *** [CMakeFiles/nttest.dir/all] Error 2
[WARNING] make: *** [all] Error 2
......
[INFO] Apache Hadoop MapReduce NativeTask ................. FAILURE [ 21.506 s]

Searching revealed the cause: the snappy installed by brew is the latest 1.1.9, which is compiled with C++11, while the Hadoop 3.2.1 build does not support C++11.

In the meantime, I also tried installing snappy 1.1.5, but the build still failed:

[WARNING] CMake Error at CMakeLists.txt:96 (message):
[WARNING]   Required snappy library could not be found.
[WARNING]   SNAPPY_LIBRARY=SNAPPY_LIBRARY-NOTFOUND, SNAPPY_INCLUDE_DIR=,
[WARNING]   CUSTOM_SNAPPY_INCLUDE_DIR=, CUSTOM_SNAPPY_PREFIX=, CUSTOM_SNAPPY_INCLUDE=

Adding environment variables did not help either:

export SNAPPY_LIBRARY=/usr/local/Cellar/snappy/1.1.5
export SNAPPY_INCLUDE_DIR=/usr/local/Cellar/snappy/1.1.5/include

# Still fails with the following error:
Required snappy library could not be found.
[WARNING]   SNAPPY_LIBRARY=SNAPPY_LIBRARY-NOTFOUND, SNAPPY_INCLUDE_DIR=,
[WARNING]   CUSTOM_SNAPPY_INCLUDE_DIR=, CUSTOM_SNAPPY_PREFIX=, CUSTOM_SNAPPY_INCLUDE=

So I installed snappy 1.1.4, as mentioned at the top, ran the build again, and it finally succeeded ✌️

6 - Build success and verification

The build command was already given in section 4 above. Here is the proof of a successful build ✌️

[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................. SUCCESS [  1.893 s]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [  4.338 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [  1.560 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [  2.337 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [  0.359 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [  1.777 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [  3.786 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [  0.951 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [  6.846 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [  1.994 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [ 54.530 s]
[INFO] Apache Hadoop NFS .................................. SUCCESS [  3.630 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [  5.173 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [  0.118 s]
[INFO] Apache Hadoop HDFS Client .......................... SUCCESS [ 27.638 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [ 31.633 s]
[INFO] Apache Hadoop HDFS Native Client ................... SUCCESS [02:38 min]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [  4.768 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [  1.722 s]
[INFO] Apache Hadoop HDFS-RBF ............................. SUCCESS [  5.303 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [  0.042 s]
[INFO] Apache Hadoop YARN ................................. SUCCESS [  0.054 s]
[INFO] Apache Hadoop YARN API ............................. SUCCESS [  6.738 s]
[INFO] Apache Hadoop YARN Common .......................... SUCCESS [  9.302 s]
[INFO] Apache Hadoop YARN Registry ........................ SUCCESS [  2.945 s]
[INFO] Apache Hadoop YARN Server .......................... SUCCESS [  0.133 s]
[INFO] Apache Hadoop YARN Server Common ................... SUCCESS [  8.103 s]
[INFO] Apache Hadoop YARN NodeManager ..................... SUCCESS [ 40.942 s]
[INFO] Apache Hadoop YARN Web Proxy ....................... SUCCESS [  1.310 s]
[INFO] Apache Hadoop YARN ApplicationHistoryService ....... SUCCESS [  2.386 s]
[INFO] Apache Hadoop YARN Timeline Service ................ SUCCESS [  1.992 s]
[INFO] Apache Hadoop YARN ResourceManager ................. SUCCESS [ 12.021 s]
[INFO] Apache Hadoop YARN Server Tests .................... SUCCESS [  1.714 s]
[INFO] Apache Hadoop YARN Client .......................... SUCCESS [  2.445 s]
[INFO] Apache Hadoop YARN SharedCacheManager .............. SUCCESS [  1.740 s]
[INFO] Apache Hadoop YARN Timeline Plugin Storage ......... SUCCESS [  1.592 s]
[INFO] Apache Hadoop YARN TimelineService HBase Backend ... SUCCESS [  0.061 s]
[INFO] Apache Hadoop YARN TimelineService HBase Common .... SUCCESS [  2.382 s]
[INFO] Apache Hadoop YARN TimelineService HBase Client .... SUCCESS [  2.167 s]
[INFO] Apache Hadoop YARN TimelineService HBase Servers ... SUCCESS [  0.124 s]
[INFO] Apache Hadoop YARN TimelineService HBase Server 1.2  SUCCESS [  2.625 s]
[INFO] Apache Hadoop YARN TimelineService HBase tests ..... SUCCESS [  3.917 s]
[INFO] Apache Hadoop YARN Router .......................... SUCCESS [  1.785 s]
[INFO] Apache Hadoop YARN Applications .................... SUCCESS [  0.119 s]
[INFO] Apache Hadoop YARN DistributedShell ................ SUCCESS [  1.679 s]
[INFO] Apache Hadoop YARN Unmanaged Am Launcher ........... SUCCESS [  1.112 s]
[INFO] Apache Hadoop MapReduce Client ..................... SUCCESS [  0.196 s]
[INFO] Apache Hadoop MapReduce Core ....................... SUCCESS [  5.185 s]
[INFO] Apache Hadoop MapReduce Common ..................... SUCCESS [  2.387 s]
[INFO] Apache Hadoop MapReduce Shuffle .................... SUCCESS [  1.852 s]
[INFO] Apache Hadoop MapReduce App ........................ SUCCESS [  3.299 s]
[INFO] Apache Hadoop MapReduce HistoryServer .............. SUCCESS [  1.948 s]
[INFO] Apache Hadoop MapReduce JobClient .................. SUCCESS [  3.972 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  1.252 s]
[INFO] Apache Hadoop YARN Services ........................ SUCCESS [  0.040 s]
[INFO] Apache Hadoop YARN Services Core ................... SUCCESS [  2.626 s]
[INFO] Apache Hadoop YARN Services API .................... SUCCESS [  1.434 s]
[INFO] Apache Hadoop Image Generation Tool ................ SUCCESS [  0.980 s]
[INFO] Yet Another Learning Platform ...................... SUCCESS [  1.346 s]
[INFO] Apache Hadoop YARN Site ............................ SUCCESS [  0.044 s]
[INFO] Apache Hadoop YARN UI .............................. SUCCESS [  0.069 s]
[INFO] Apache Hadoop YARN Project ......................... SUCCESS [  9.978 s]
[INFO] Apache Hadoop MapReduce HistoryServer Plugins ...... SUCCESS [  0.671 s]
[INFO] Apache Hadoop MapReduce NativeTask ................. SUCCESS [ 39.343 s]
[INFO] Apache Hadoop MapReduce Uploader ................... SUCCESS [  0.862 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [  1.086 s]
[INFO] Apache Hadoop MapReduce ............................ SUCCESS [  4.303 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [  0.906 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [  1.362 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [  0.496 s]
[INFO] Apache Hadoop Archive Logs ......................... SUCCESS [  0.599 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [  1.424 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [  0.968 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [  0.466 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [  0.543 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [  4.609 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [  0.960 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [  3.611 s]
[INFO] Apache Hadoop Kafka Library support ................ SUCCESS [  0.838 s]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [  2.333 s]
[INFO] Apache Hadoop Aliyun OSS support ................... SUCCESS [  0.449 s]
[INFO] Apache Hadoop Client Aggregator .................... SUCCESS [  3.429 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [  2.420 s]
[INFO] Apache Hadoop Resource Estimator Service ........... SUCCESS [  1.536 s]
[INFO] Apache Hadoop Azure Data Lake support .............. SUCCESS [  0.592 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 10.087 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [  0.048 s]
[INFO] Apache Hadoop Client API ........................... SUCCESS [01:24 min]
[INFO] Apache Hadoop Client Runtime ....................... SUCCESS [01:05 min]
[INFO] Apache Hadoop Client Packaging Invariants .......... SUCCESS [  0.262 s]
[INFO] Apache Hadoop Client Test Minicluster .............. SUCCESS [02:00 min]
[INFO] Apache Hadoop Client Packaging Invariants for Test . SUCCESS [  0.174 s]
[INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [  0.151 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [ 22.983 s]
[INFO] Apache Hadoop Client Modules ....................... SUCCESS [  0.053 s]
[INFO] Apache Hadoop Cloud Storage ........................ SUCCESS [  0.607 s]
[INFO] Apache Hadoop Cloud Storage Project ................ SUCCESS [  0.054 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:07 min
[INFO] Finished at: 2021-06-30T00:10:40+08:00
[INFO] Final Memory: 406M/2110M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "dev" could not be activated because it does not exist.


The compiled distribution ends up in the directory below; the native libraries we need are under lib/native:

${SOURCE_DIR}/hadoop-dist/target/hadoop-3.2.1
# The native libraries are here:
${SOURCE_DIR}/hadoop-dist/target/hadoop-3.2.1/lib/native

Copy the files under native into the installation directory of your existing Hadoop cluster, then check its native library support:
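Hadoop ships a built-in command for this check, hadoop checknative, run on a node where the new libraries are in place. The sketch below guards against hadoop being absent so it is safe to paste anywhere:

```shell
if command -v hadoop >/dev/null 2>&1; then
  # Lists whether zlib, snappy, bzip2, openssl, and ISA-L support was compiled in
  hadoop checknative -a
else
  echo "hadoop not on PATH"
fi
```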

No more annoying WARN messages, and compression codecs such as zlib and snappy are now available ✌️


7 - Lessons learned

1) Build on CentOS if you can. On macOS, most of the native libraries fail to build, and the process gets stuck at the CMake stage.

2) The extra environment variables used were:

# For a native Hadoop build, ZLIB_ROOT must be set, and CMake's CMP0074 policy must be enabled in the CMakeLists file:
export ZLIB_ROOT=/usr/local/Cellar/zlib/1.2.11
# export ZLIB_LIBRARY=/usr/local/Cellar/zlib/1.2.11/lib
# export ZLIB_INCLUDE_DIR=/usr/local/Cellar/zlib/1.2.11/include

export OPENSSL_ROOT_DIR="/usr/local/opt/openssl@1.1"
export OPENSSL_INCLUDE_DIR="$OPENSSL_ROOT_DIR/include"
export PKG_CONFIG_PATH="${OPENSSL_ROOT_DIR}/lib/pkgconfig"


07-02 19:12