This article explains how to deal with the error "Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Result'". It should be a useful reference for anyone running into the same problem; read on for the details.

Problem Description


I'm trying to read data out of HBase and save it as a sequenceFile, but getting

java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.

error.

I saw two similar posts:

hadoop writables NotSerializableException with Apache Spark API

and

Spark HBase Join Error: object not serializable class: org.apache.hadoop.hbase.client.Result

Following those two posts, I registered the three classes with Kryo, but still no luck.

Here's my program:

        String tableName = "validatorTableSample";
        System.out.println("Start indexing hbase: " + tableName);
        SparkConf sparkConf = new SparkConf().setAppName("HBaseRead");
        Class[] classes = {org.apache.hadoop.io.LongWritable.class, org.apache.hadoop.io.Text.class, org.apache.hadoop.hbase.client.Result.class};
        sparkConf.registerKryoClasses(classes);
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableInputFormat.INPUT_TABLE, tableName);
//      conf.setStrings("io.serializations",
//          conf.get("io.serializations"),
//          MutationSerialization.class.getName(),
//          ResultSerialization.class.getName());
        conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");

        JavaPairRDD<ImmutableBytesWritable, Result> hBasePairRDD = sc.newAPIHadoopRDD(
            conf,
            TableInputFormat.class,
            ImmutableBytesWritable.class,
            Result.class);

        hBasePairRDD.saveAsNewAPIHadoopFile("/tmp/tempOutputPath", ImmutableBytesWritable.class, Result.class, SequenceFileOutputFormat.class);
        System.out.println("Finished readFromHbaseAndSaveAsSequenceFile() .........");

Here's the error stacktrace:

java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
    at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1254)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1156)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1112)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1095)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
17/05/25 10:58:38 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
    at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1254)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1156)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1112)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1095)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

17/05/25 10:58:38 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job

Solution

Here is what was needed to make it work.

Because we use HBase to store our data and this job writes HBase Result values through Hadoop's SequenceFile writer, Hadoop is telling us that it doesn't know how to serialize them. Registering the classes with Kryo doesn't help here: the SequenceFile writer looks up serializers through Hadoop's own SerializationFactory, which is driven by the io.serializations configuration property, not by Spark's serializer. That is why we need to help it and set the io.serializations variable on the Hadoop Configuration before saving:

conf.setStrings("io.serializations", new String[]{conf.get("io.serializations"), MutationSerialization.class.getName(), ResultSerialization.class.getName()});
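
For reference, here is a minimal sketch of the question's program with that fix applied (essentially the commented-out io.serializations lines restored). The class name ReadFromHBaseAndSaveAsSequenceFile is made up for the example, and the imports for MutationSerialization and ResultSerialization assume they live in org.apache.hadoop.hbase.mapreduce, which may vary with your HBase version:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MutationSerialization;
import org.apache.hadoop.hbase.mapreduce.ResultSerialization;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ReadFromHBaseAndSaveAsSequenceFile {
    public static void main(String[] args) {
        String tableName = "validatorTableSample";

        SparkConf sparkConf = new SparkConf().setAppName("HBaseRead");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        Configuration conf = HBaseConfiguration.create();
        conf.set(TableInputFormat.INPUT_TABLE, tableName);

        // The fix: tell Hadoop's SerializationFactory how to (de)serialize
        // HBase Result (and Mutation) values when writing the SequenceFile.
        conf.setStrings("io.serializations",
            conf.get("io.serializations"),
            MutationSerialization.class.getName(),
            ResultSerialization.class.getName());

        // Read the table as (row key, Result) pairs.
        JavaPairRDD<ImmutableBytesWritable, Result> hBasePairRDD = sc.newAPIHadoopRDD(
            conf,
            TableInputFormat.class,
            ImmutableBytesWritable.class,
            Result.class);

        // Write the pairs out as a SequenceFile.
        hBasePairRDD.saveAsNewAPIHadoopFile(
            "/tmp/tempOutputPath",
            ImmutableBytesWritable.class,
            Result.class,
            SequenceFileOutputFormat.class);

        sc.stop();
    }
}

Note that the conf.set("spark.serializer", ...) line and the Kryo registration from the original program are dropped here: spark.serializer is a Spark property that belongs in SparkConf, and in any case it only affects Spark's own shuffles and caching, not Hadoop's SequenceFile writer. Reading the file back via newAPIHadoopFile will likewise need the same io.serializations entries on the Configuration.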

That wraps up this article on "Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Result'". We hope the answer above is helpful, and thank you for your support!
