Problem Description

I need to build a utility class to test the connection to HDFS. The test should display the server-side version of HDFS and any other metadata. Although there are a lot of client demos available, none of them cover extracting the server metadata. Could anybody help?

Please note that my client is a remote Java client and doesn't have the Hadoop and HDFS config files to initialise the configuration. I need to do it on the fly by connecting to the HDFS NameNode service using its URL.

Solution

Hadoop exposes some information over HTTP that you can use; see Cloudera's article. Probably the easiest way is to connect to the NN UI and parse the content returned by the server:

URL url = new URL("http://myhost:50070/dfshealth.jsp");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
...
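
To actually extract something useful from that page, a plain substring scan for the version line is usually enough. The sketch below is a minimal example, not a robust parser: the "Version" marker and the NNUiVersionProbe class name are assumptions, and the markup of dfshealth.jsp varies between Hadoop releases, so verify the marker against your own NN UI first.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class NNUiVersionProbe {
    public static void main(String[] args) throws Exception {
        // Assumed NameNode HTTP address; adjust host and port to your cluster.
        URL url = new URL("http://myhost:50070/dfshealth.jsp");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
        String line;
        while ((line = in.readLine()) != null) {
            // "Version" is an assumed marker; the page layout varies by release.
            if (line.contains("Version")) {
                // Crude tag stripping so only the readable text is printed.
                System.out.println(line.replaceAll("<[^>]*>", " ").trim());
            }
        }
        in.close();
    }
}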

On the other hand, if you know the addresses of the NN and the JT, you can connect to them with a simple client like this (Hadoop 0.20.0-r1056497):

import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.FSConstants.DatanodeReportType;
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.util.VersionInfo;

public class NNConnTest {

    private enum NNStats {

        STATS_CAPACITY_IDX(0, 
                "Total storage capacity of the system, in bytes: ");
        //... see org.apache.hadoop.hdfs.protocol.ClientProtocol 

        private int id;
        private String desc;

        private NNStats(int id, String desc) {
            this.id = id;
            this.desc = desc;
        }

        public String getDesc() {
            return desc;
        }

        public int getId() {
            return id;
        }

    }

    private enum ClusterStats {

        //see org.apache.hadoop.mapred.ClusterStatus API docs
        USED_MEM {
            @Override
            public String getDesc() {
                String desc = "Total heap memory used by the JobTracker: ";
                return desc + clusterStatus.getUsedMemory();
            }
        };

        private static ClusterStatus clusterStatus;
        public static void setClusterStatus(ClusterStatus stat) {
            clusterStatus = stat;
        }

        public abstract String getDesc();
    }


    public static void main(String[] args) throws Exception {

        InetSocketAddress namenodeAddr = new InetSocketAddress("myhost", 8020);
        InetSocketAddress jobtrackerAddr = new InetSocketAddress("myhost", 8021);

        Configuration conf = new Configuration();

        //query NameNode
        DFSClient client = new DFSClient(namenodeAddr, conf);
        ClientProtocol namenode = client.namenode;
        long[] stats = namenode.getStats();
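        // Indices into stats[] are fixed by ClientProtocol.getStats();
        // see the *_IDX constants in org.apache.hadoop.hdfs.protocol.ClientProtocol.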

        System.out.println("NameNode info: ");
        for (NNStats sf : NNStats.values()) {
            System.out.println(sf.getDesc() + stats[sf.getId()]);
        }

        //query JobTracker
        JobClient jobClient = new JobClient(jobtrackerAddr, conf); 
        ClusterStatus clusterStatus = jobClient.getClusterStatus(true);

        System.out.println("\nJobTracker info: ");
        System.out.println("State: " + 
                clusterStatus.getJobTrackerState().toString());

        ClusterStats.setClusterStatus(clusterStatus);
        for (ClusterStats cs : ClusterStats.values()) {
            System.out.println(cs.getDesc());
        }

        System.out.println("\nHadoop build version: " 
                + VersionInfo.getBuildVersion());

        //query Datanodes
        System.out.println("\nDataNode info: ");
        DatanodeInfo[] datanodeReport = namenode.getDatanodeReport(
                DatanodeReportType.ALL);
        for (DatanodeInfo di : datanodeReport) {
            System.out.println("Host: " + di.getHostName());
            System.out.println(di.getDatanodeReport());
        }

    }

}

Make sure that your client uses the same Hadoop version as your cluster; otherwise an EOFException can occur.
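
If you want the connection test to surface a mismatch explicitly, print the client-side build so it can be compared with whatever the server reports (for example the version line from dfshealth.jsp above). This is a minimal sketch using only org.apache.hadoop.util.VersionInfo; the ClientVersionInfo class name is just for illustration:

import org.apache.hadoop.util.VersionInfo;

public class ClientVersionInfo {
    public static void main(String[] args) {
        // Version of the Hadoop jars on the client classpath; compare these
        // values against what the NameNode reports.
        System.out.println("Client Hadoop version: " + VersionInfo.getVersion());
        System.out.println("Client build version:  " + VersionInfo.getBuildVersion());
        System.out.println("Compiled by " + VersionInfo.getUser()
                + " from revision " + VersionInfo.getRevision());
    }
}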
