当你想切换了集群想把hive迁移至新集群,两个集群之间又互不相通,一个最简单快捷的方法,就是批量导出元数据信息,同时把数据文件下载上传至新服务器的数据文件存放目录下

批量导出云数据:

# database 即是数据库名称
hive -e "use <database>; show tables;" | xargs -I '{}' sh -c 'hive -e "use <database>; show create table '{}';"' > tables.sql

有的时候会用这个如下命令进行导出:

hive -e "use <database>; show create table <table>;" > tables.sql

可能会报错:

FAILED: ParseException line 1:19 cannot recognize input near '*' '' '' in table name

因为在 Hive中,show create table 命令不支持使用通配符 * 来查询所有表的结构。

将原服务器中 hive 库文件复制到本地(这个会消耗比较长的时间)

hadoop fs -get /user/hive/warehouse/ods.db/ /path/to/destination/

将导下来的 tables.sql 和库文件上传至目标服务器,导入 sql 文件(前提是源服务器和目的服务器的hive版本要一致,否则需要做相关转换)

查看 hive 版本并导入

先创建数据库
hive> show databases;
OK
default
Time taken: 1.399 seconds, Fetched: 1 row(s)
hive> create database osd;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=root, access=WRITE, inode="/user":hdfs:hadoop:drwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)
        at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3101)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1123)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:696)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
)

报错的原因是:Hive没有足够的权限来在HDFS上创建数据库目录。默认情况下,Hive使用当前用户的身份来执行操作,因此您需要确保当前用户(在这种情况下为“root”)具有在HDFS上创建目录的权限

用如下命令创建:

#查看 hive 是否有 /user 目录的权限
[root@hadoop105 hive_db]# hdfs dfs -ls /
drwxr-xr-x   - hdfs hadoop              0 2023-05-26 17:52 /user
#如下操作创建数据库
[root@hadoop105 hive_db]# su - hdfs
Last login: Fri May 26 17:49:25 CST 2023 on pts/1
[hdfs@hadoop105 ~]$ hive -e "create database ods;"
#同时需要修改sql文件中以下部分内容,将hadoop40这个nameserver改为与自己hdfs namenode节点一致的主机名,因为每个集群的服务节点都不一样,名字也可能不一样
LOCATION
  'hdfs://hadoop40/user/hive/warehouse/***/***
#进行批量修改,即old_string是原主机名,new_string是新主机名
[root@hadoop105 hive_db]# vi tables.sql
:%s/old_string$/new_string/g

如果出现如下错误:

[root@hadoop105 hive_db]# hive -f test.sql 
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-common-2.1.1-cdh6.3.2.jar!/hive-log4j2.properties Async: false
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=root, access=WRITE, inode="/user/hive/warehouse/ods.db":hdfs:hadoop:drwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)
        at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:61)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3101)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1123)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:696)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
)

将文件授权给 hdfs 用户,然后使用 hdfs 用户进行执行

[root@hadoop105 hive_db]# chown hdfs:hdfs test.sql 
[root@hadoop105 hive_db]# su - hdfs
Last login: Fri May 26 18:09:45 CST 2023 on pts/0
[hdfs@hadoop105 ~]$ hive -f /opt/hive_db/test.sql 
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-common-2.1.1-cdh6.3.2.jar!/hive-log4j2.properties Async: false
OK
Time taken: 2.503 seconds

上面的test.sql我只加了一条 create 数据,但是实际导出的sql文件是包含很多条create语句的,执行会出现如下错误

[hdfs@hadoop105 ~]$ hive -f /opt/hive_db/test.sql 
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-common-2.1.1-cdh6.3.2.jar!/hive-log4j2.properties Async: false
FAILED: ParseException line 42:0 missing EOF at 'CREATE' near ')'

即语法错误,这是因为导出的sql文件中,create() 和 create() 语句直接结束后没有分号隔开,即

create(
    ***
    ***
);
create();
create()

最后一个结束不需要加分号,但是中间的都需要,测试一个两条create()数据的语句结果

[hdfs@hadoop105 ~]$ hive -f /opt/hive_db/test1.sql 
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-common-2.1.1-cdh6.3.2.jar!/hive-log4j2.properties Async: false
OK
Time taken: 2.368 seconds
OK
Time taken: 0.251 seconds

元数据导入成功之后,进行数据导入,这里也只测试了其中一个表数据

#此处只单独测试表 table001 导入
[hdfs@hadoop105 ~]$ hdfs dfs -put /opt/hive_db/table001 /user/hive/warehouse/ods.db/
#导入一般很快,不会有其他信息输出

#注意:上面执行的命令是导出所有表的
hadoop fs -get /user/hive/warehouse/ods.db/ /path/to/destination/
#单独导出表就具体到表名字
hadoop fs -get /user/hive/warehouse/ods.db/table001 /path/to/destination/

重启 hive 或刷新 hive 元数据

hive> REFRESH TABLE table_name;

最后执行计算表的统计信息,以便优化查询计划

ANALYZE TABLE table_name COMPUTE STATISTICS;
05-30 12:35