Problem description

Seeing nonsense values for user names in the folder permissions of NFS-mounted HDFS locations, while the HDFS locations themselves (using Hortonworks HDP 3.1) appear fine. E.g.

➜  ~ ls -lh /nfs_mount_root/user
total 6.5K
drwx------. 3 accumulo  hdfs    96 Jul 19 13:53 accumulo
drwxr-xr-x. 3  92668751 hadoop  96 Jul 25 15:17 admin
drwxrwx---. 3 ambari-qa hdfs    96 Jul 19 13:54 ambari-qa
drwxr-xr-x. 3 druid     hadoop  96 Jul 19 13:53 druid
drwxr-xr-x. 2 hbase     hdfs    64 Jul 19 13:50 hbase
drwx------. 5 hdfs      hdfs   160 Aug 26 10:41 hdfs
drwxr-xr-x. 4 hive      hdfs   128 Aug 26 10:24 hive
drwxr-xr-x. 5 h_etl     hdfs   160 Aug  9 14:54 h_etl
drwxr-xr-x. 3    108146 hdfs    96 Aug  1 15:43 ml1
drwxrwxr-x. 3 oozie     hdfs    96 Jul 19 13:56 oozie
drwxr-xr-x. 3 882121447 hdfs    96 Aug  5 10:56 q_etl
drwxrwxr-x. 2 spark     hdfs    64 Jul 19 13:57 spark
drwxr-xr-x. 6 zeppelin  hdfs   192 Aug 23 15:45 zeppelin
➜  ~ hadoop fs -ls /user
Found 13 items
drwx------   - accumulo   hdfs            0 2019-07-19 13:53 /user/accumulo
drwxr-xr-x   - admin      hadoop          0 2019-07-25 15:17 /user/admin
drwxrwx---   - ambari-qa  hdfs            0 2019-07-19 13:54 /user/ambari-qa
drwxr-xr-x   - druid      hadoop          0 2019-07-19 13:53 /user/druid
drwxr-xr-x   - hbase      hdfs            0 2019-07-19 13:50 /user/hbase
drwx------   - hdfs       hdfs            0 2019-08-26 10:41 /user/hdfs
drwxr-xr-x   - hive       hdfs            0 2019-08-26 10:24 /user/hive
drwxr-xr-x   - h_etl      hdfs            0 2019-08-09 14:54 /user/h_etl
drwxr-xr-x   - ml1        hdfs            0 2019-08-01 15:43 /user/ml1
drwxrwxr-x   - oozie      hdfs            0 2019-07-19 13:56 /user/oozie
drwxr-xr-x   - q_etl      hdfs            0 2019-08-05 10:56 /user/q_etl
drwxrwxr-x   - spark      hdfs            0 2019-07-19 13:57 /user/spark
drwxr-xr-x   - zeppelin   hdfs            0 2019-08-23 15:45 /user/zeppelin

Notice the difference for users ml1 and q_etl: they have numerical user values when running ls on the NFS locations, rather than their user names. Even doing something like...

[hdfs@HW04 ml1]$ hadoop fs -chown ml1 /user/ml1

does not change the NFS permissions. Even more annoying, when trying to change the NFS mount permissions as root, we see

[root@HW04 ml1]# chown ml1 /nfs_mount_root/user/ml1
chown: changing ownership of ‘/nfs_mount_root/user/ml1’: Permission denied

This causes real problems, since the differing uid means that I can't access these dirs even as the "correct" user to write to them. Not sure what to make of this. Anyone with more Hadoop experience have any debugging suggestions or fixes?
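
For context on why the numbers appear at all: `ls -l` resolves each uid to a name through the client's passwd database and falls back to printing the raw number when no local entry exists. A minimal sketch of that lookup (the large uid below is just the unmapped value from the listing above):

```shell
# Mimic ls -l's owner column: resolve a uid via the local passwd database,
# falling back to the raw number when there is no matching local entry.
uid_name() {
  getent passwd "$1" | cut -d: -f1 | grep . || echo "$1"
}

uid_name 0           # "root" on virtually every Linux system
uid_name 882121447   # no local entry, so the number itself is printed
```

So the numeric values mean the NFS client simply has no user with that uid, not that HDFS itself stored a bad owner.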

Update:

Doing a bit more testing / debugging, I found that the rules appear to be...

  1. If the NFS server node does not have a uid (or gid?) that matches the uid of the user on the node accessing the NFS mount, we get the odd uid values seen here.
  2. If there is a uid associated with the user name on the requesting node, then that is the uid we see assigned to the location when accessing it via NFS (even if that uid on the NFS server node does not actually belong to the requesting user), e.g.
[root@HW01 ~]# clush -ab id ml1
---------------
HW[01,04] (2)
---------------
uid=1025(ml1) gid=1025(ml1) groups=1025(ml1)
---------------
HW[02-03] (2)
---------------
uid=1027(ml1) gid=1027(ml1) groups=1027(ml1)
---------------
HW05
---------------
uid=1026(ml1) gid=1026(ml1) groups=1026(ml1)
[root@HW01 ~]# exit
logout
Connection to hw01 closed.
➜  ~ ls -lh /hdpnfs/user
total 6.5K
...
drwxr-xr-x. 6 atlas     hdfs   192 Aug 27 12:04 ml1
...
➜  ~ hadoop fs -ls /user
Found 13 items
...
drwxr-xr-x   - ml1        hdfs            0 2019-08-27 12:04 /user/ml1
...
[root@HW01 ~]# clush -ab id atlas
---------------
HW[01,04] (2)
---------------
uid=1027(atlas) gid=1005(hadoop) groups=1005(hadoop)
---------------
HW[02-03] (2)
---------------
uid=1024(atlas) gid=1005(hadoop) groups=1005(hadoop)
---------------
HW05
---------------
uid=1005(atlas) gid=1006(hadoop) groups=1006(hadoop)

If you're wondering why I have users on the cluster with varying uids across the cluster nodes, see the question posted here: How to properly change uid for HDP / ambari-created user? (note that these odd uid settings for hadoop service users were set up by Ambari by default).

Answer

After talking with someone more knowledgeable in HDP hadoop, I found that the problem is that when Ambari was set up and run to initially install the hadoop cluster, there may have been other preexisting users on the designated cluster nodes.

Ambari creates its various service users by giving each the next available UID from a node's available block of user UIDs. However, prior to installing Ambari and HDP on the nodes, I had created some users on the to-be namenode (and other nodes) in order to do some initial maintenance checks and tests. I should have just done this as root. Adding these extra users offset the UID counter on those nodes, so as Ambari created users and incremented the UIDs, it started from different counter values on different nodes. Thus the UIDs fell out of sync across the nodes and caused problems with HDFS NFS.
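
As a toy illustration of the offset (the passwd snippets and file names here are made up, and real useradd picks from a configured UID range rather than a simple max+1): the same sequence of user creations lands on different UIDs once one node starts with an extra pre-existing user:

```shell
# Toy passwd snippets for two nodes; node B had one extra user created
# before Ambari ran, so its auto-assigned UIDs are shifted by one.
printf 'root:x:0:0::/root:/bin/bash\nml1:x:1024:1024::/home/ml1:/bin/bash\n' > node_a.passwd
printf 'root:x:0:0::/root:/bin/bash\nextra:x:1024:1024::/home/extra:/bin/bash\nml1:x:1025:1025::/home/ml1:/bin/bash\n' > node_b.passwd

next_uid() {   # next UID to hand out: highest existing uid + 1
  awk -F: '$3>max { max=$3 } END { print max+1 }' "$1"
}

next_uid node_a.passwd   # 1025
next_uid node_b.passwd   # 1026
```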

To fix this, I...

  1. Used Ambari to stop all running HDP services
  2. Went to Service Accounts in Ambari and copied all of the expected service user name strings
  3. For each user, ran something like id <service username> to get each user's group(s). For service groups (which may have multiple members), you can do something like grep 'group-name-here' /etc/group. I recommend doing it this way, since the Ambari docs on default users and groups are missing some of the info you can get here.
  4. Used userdel and groupdel to remove all the Ambari service users and groups
  5. Then recreated all the groups across the cluster
  6. Then recreated all the users across the cluster (you may need to specify UIDs if some nodes have extra users that others don't)
  7. Restarted the HDP services (hopefully everything still runs as if nothing happened, since HDP should be looking up users by their literal string names, not their UIDs)

For the last parts, you can use something like clustershell, e.g.

# remove user
$ clush -ab userdel <service username>
# check that the UID you want to use is actually available on all nodes
$ clush -ab id <some specific UID you want to use>
# assign that UID to a new service user
$ clush -ab useradd --uid <the specific UID> --gid <groupname> <service username>

To find the next available UID (and GID) on each node, I used...

# for UID
getent passwd | awk -F: '($3>1000) && ($3<10000) && ($3>maxuid) { maxuid=$3; } END { print maxuid+1; }'
# for GID
getent passwd | awk -F: '($4>1000) && ($4<10000) && ($4>maxgid) { maxgid=$4; } END { print maxgid+1; }'
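
Those probes give a per-node answer; to pick an id that is free on every node, you can gather each node's result (e.g. by running the awk one-liner through `clush -aN`) and take the maximum. A sketch with canned per-node values:

```shell
# Given one "next free UID" value per node on stdin, pick the largest,
# which is therefore unused on every node in the cluster.
cluster_free_uid() {
  sort -n | tail -n 1
}

# e.g. three nodes reported 1025, 1027, and 1026:
printf '1025\n1027\n1026\n' | cluster_free_uid   # 1027
```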

Ambari also creates some /home dirs for users. Once you are done recreating the users, you will need to fix the ownership of those dirs (you can use something like clush there as well).
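
One way to script that last step (the `hadoop` group name and the user list below are illustrative, check your own Service Accounts page): generate the chown commands first, so you can review them before running each line on all nodes via clush:

```shell
# Emit one chown command per recreated service user's /home dir; review
# the output, then run each line on all nodes with clush -ab "<command>".
gen_home_fixups() {
  for u in "$@"; do
    echo "chown -R ${u}:hadoop /home/${u}"
  done
}

gen_home_fixups oozie spark zeppelin
```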

* Note that this was a huge pain, and you would need to manually correct the UIDs of users whenever you add another cluster node. I did this for a test cluster, but for production (or even a larger test cluster) you should just use Kerberos or SSSD + Active Directory.
