
Problem description



The output folder contains a part-00000 file with no content!

Here is the command trace, in which I see no exception:

[cloudera@localhost ~]$ hadoop jar testmr.jar TestMR /tmp/example.csv /user/cloudera/output
14/02/06 11:45:24 WARN conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
14/02/06 11:45:24 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/02/06 11:45:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/06 11:45:25 INFO mapred.FileInputFormat: Total input paths to process : 1
14/02/06 11:45:25 INFO mapred.JobClient: Running job: job_local1238439569_0001
14/02/06 11:45:25 INFO mapred.LocalJobRunner: OutputCommitter set in config null
14/02/06 11:45:25 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
14/02/06 11:45:25 INFO mapred.LocalJobRunner: Waiting for map tasks
14/02/06 11:45:25 INFO mapred.LocalJobRunner: Starting task: attempt_local1238439569_0001_m_000000_0
14/02/06 11:45:26 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
14/02/06 11:45:26 INFO util.ProcessTree: setsid exited with exit code 0
14/02/06 11:45:26 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@44aea710
14/02/06 11:45:26 INFO mapred.MapTask: Processing split: hdfs://localhost.localdomain:8020/tmp/example.csv:0+2963382
14/02/06 11:45:26 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and  BYTES_READ as counter name instead
14/02/06 11:45:26 INFO mapred.MapTask: numReduceTasks: 1
14/02/06 11:45:26 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
14/02/06 11:45:26 INFO mapred.MapTask: io.sort.mb = 50
14/02/06 11:45:26 INFO mapred.MapTask: data buffer = 39845888/49807360
14/02/06 11:45:26 INFO mapred.MapTask: record buffer = 131072/163840
14/02/06 11:45:26 INFO mapred.JobClient:  map 0% reduce 0%
14/02/06 11:45:28 INFO mapred.MapTask: Starting flush of map output
14/02/06 11:45:28 INFO compress.CodecPool: Got brand-new compressor [.snappy]
14/02/06 11:45:28 INFO mapred.Task: Task:attempt_local1238439569_0001_m_000000_0 is done. And is in the process of commiting
14/02/06 11:45:28 INFO mapred.LocalJobRunner: hdfs://localhost.localdomain:8020/tmp/example.csv:0+2963382
14/02/06 11:45:28 INFO mapred.Task: Task 'attempt_local1238439569_0001_m_000000_0' done.
14/02/06 11:45:28 INFO mapred.LocalJobRunner: Finishing task: attempt_local1238439569_0001_m_000000_0
14/02/06 11:45:28 INFO mapred.LocalJobRunner: Map task executor complete.
14/02/06 11:45:28 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
14/02/06 11:45:28 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1d382926
14/02/06 11:45:28 INFO mapred.LocalJobRunner:
14/02/06 11:45:28 INFO mapred.Merger: Merging 1 sorted segments
14/02/06 11:45:28 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
14/02/06 11:45:28 INFO mapred.Merger: Down to the last merge-pass, with 0 segments left of total size: 0 bytes
14/02/06 11:45:28 INFO mapred.LocalJobRunner:
14/02/06 11:45:28 INFO mapred.Task: Task:attempt_local1238439569_0001_r_000000_0 is done. And is in the process of commiting
14/02/06 11:45:28 INFO mapred.LocalJobRunner:
14/02/06 11:45:28 INFO mapred.Task: Task attempt_local1238439569_0001_r_000000_0 is allowed to commit now
14/02/06 11:45:28 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local1238439569_0001_r_000000_0' to hdfs://localhost.localdomain:8020/user/cloudera/output
14/02/06 11:45:28 INFO mapred.LocalJobRunner: reduce > reduce
14/02/06 11:45:28 INFO mapred.Task: Task 'attempt_local1238439569_0001_r_000000_0' done.
14/02/06 11:45:28 INFO mapred.JobClient:  map 100% reduce 100%
14/02/06 11:45:28 INFO mapred.JobClient: Job complete: job_local1238439569_0001
14/02/06 11:45:28 INFO mapred.JobClient: Counters: 26
14/02/06 11:45:28 INFO mapred.JobClient:   File System Counters
14/02/06 11:45:28 INFO mapred.JobClient:     FILE: Number of bytes read=7436
14/02/06 11:45:28 INFO mapred.JobClient:     FILE: Number of bytes written=199328
14/02/06 11:45:28 INFO mapred.JobClient:     FILE: Number of read operations=0
14/02/06 11:45:28 INFO mapred.JobClient:     FILE: Number of large read operations=0
14/02/06 11:45:28 INFO mapred.JobClient:     FILE: Number of write operations=0
14/02/06 11:45:28 INFO mapred.JobClient:     HDFS: Number of bytes read=5926764
14/02/06 11:45:28 INFO mapred.JobClient:     HDFS: Number of bytes written=0
14/02/06 11:45:28 INFO mapred.JobClient:     HDFS: Number of read operations=10
14/02/06 11:45:28 INFO mapred.JobClient:     HDFS: Number of large read operations=0
14/02/06 11:45:28 INFO mapred.JobClient:     HDFS: Number of write operations=4
14/02/06 11:45:28 INFO mapred.JobClient:   Map-Reduce Framework
14/02/06 11:45:28 INFO mapred.JobClient:     Map input records=24518
14/02/06 11:45:28 INFO mapred.JobClient:     Map output records=0
14/02/06 11:45:28 INFO mapred.JobClient:     Map output bytes=0
14/02/06 11:45:28 INFO mapred.JobClient:     Input split bytes=129
14/02/06 11:45:28 INFO mapred.JobClient:     Combine input records=0
14/02/06 11:45:28 INFO mapred.JobClient:     Combine output records=0
14/02/06 11:45:28 INFO mapred.JobClient:     Reduce input groups=0
14/02/06 11:45:28 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/02/06 11:45:28 INFO mapred.JobClient:     Reduce input records=0
14/02/06 11:45:28 INFO mapred.JobClient:     Reduce output records=0
14/02/06 11:45:28 INFO mapred.JobClient:     Spilled Records=0
14/02/06 11:45:28 INFO mapred.JobClient:     CPU time spent (ms)=0
14/02/06 11:45:28 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
14/02/06 11:45:28 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
14/02/06 11:45:28 INFO mapred.JobClient:     Total committed heap usage (bytes)=221126656
14/02/06 11:45:28 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
14/02/06 11:45:28 INFO mapred.JobClient:     BYTES_READ=2963382
[cloudera@localhost ~]$

Below is my MR code:

import java.io.IOException;
import java.util.*;
import java.text.SimpleDateFormat;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.util.*;

public class TestMR
{
    public static class Map extends MapReduceBase implements Mapper<LongWritable,Text,Text,Text>
    {
        public void map(LongWritable key, Text line, OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            final String [] split = line.toString().split(",");

            if(split[2].equals("Test"))
            {
                output.collect(new Text(split[0]), new Text(split[4] + "|" + split[7]));
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text,Text,Text,DoubleWritable>
    {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, DoubleWritable> output, Reporter reporter) throws IOException
        {
            while(values.hasNext())
            {
                long t1=0, t2=0;
                SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

                String [] tmpBuf_1 = values.next().toString().split("|");
                String v1 = tmpBuf_1[0];
                try
                {
                    t1 = df.parse(tmpBuf_1[1]).getTime();
                }
                catch (java.text.ParseException e)
                {
                    System.out.println("Unable to parse date string: "+ tmpBuf_1[1]);
                    continue;
                }

                if(!values.hasNext())
                    break;

                String [] tmpBuf_2 = values.next().toString().split("|");
                String v2 = tmpBuf_2[0];
                try
                {
                    t2 = df.parse(tmpBuf_2[1]).getTime();
                }
                catch (java.text.ParseException e)
                {
                    System.out.println("Unable to parse date string: "+ tmpBuf_2[1]);
                    continue;
                }

                int vDiff = Integer.parseInt(v2) - Integer.parseInt(v1);
                long tDiff = (t2 - t1)/1000;
                if(tDiff > 600)
                    break;

                double declineV = vDiff / tDiff;

                output.collect(key, new DoubleWritable(declineV));
            }
        }
    }

    public static void main(String[] args) throws Exception
    {
        JobConf conf = new JobConf(TestMR.class);
        conf.setJobName("TestMapReduce");
        conf.set("mapred.job.tracker", "local");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(DoubleWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

This is my first MapReduce program and I'm unable to work out why it doesn't produce any output! Please let me know if there is an issue in my code, or a better way of running the MapReduce job to get output.

FYI, the testmr.jar file is on the local file system, while the CSV file and the output folder are in HDFS.

Solution

If you look at the logs, you can see that the Map method isn't generating any output:

14/02/06 11:45:28 INFO mapred.JobClient:     Map input records=24518
14/02/06 11:45:28 INFO mapred.JobClient:     Map output records=0
14/02/06 11:45:28 INFO mapred.JobClient:     Map output bytes=0

As you can see, the map method is receiving input records but producing 0 output records, so there must be something wrong with the logic in your map method:

final String [] split = line.toString().split(",");

if(split[2].equals("Test"))
{
    output.collect(new Text(split[0]), new Text(split[4] + "|" + split[7]));
}

I suggest that you test this logic as plain Java code with some sample input data and make sure it works; then fix your MapReduce code accordingly and run the job again.
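For example, a standalone harness along these lines makes it easy to see what `split[2]` actually contains. The sample line and its field layout below are made up; substitute a real line from your example.csv:

```java
// Standalone harness for the mapper's filter logic.
// The sample line below is hypothetical -- paste a real line
// from example.csv in its place.
public class MapLogicTest
{
    public static void main(String[] args)
    {
        String line = "id-1,foo,Test,bar,42,baz,qux,2014-02-06 11:45:00";

        final String [] split = line.split(",");

        // Print the field the filter compares, bracketed so stray
        // whitespace or quoting in the real data becomes visible
        System.out.println("split[2] = [" + split[2] + "]");

        if(split[2].equals("Test"))
        {
            // The same key/value pair the mapper would emit
            System.out.println(split[0] + "\t" + split[4] + "|" + split[7]);
        }
        else
        {
            System.out.println("Filter rejected this line");
        }
    }
}
```

If `split[2]` prints with surrounding spaces or quote characters, the `equals("Test")` check will silently reject every record, which would match the `Map output records=0` counter you see.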
