I am experiencing some problems with my Hadoop application.
Whenever my client exits without closing the files (e.g. due to a crash), there are open files in Hadoop that are never closed.
When I then try to restart the client, it fails when re-opening those files to append data. (See below for the exception message.)
Is there a good way to close those files manually, or better yet, a way to check and close them directly before reopening them?
I am using Cloudera CDH5 (2.3.0-cdh5.0.0).
These are my open files after the client has exited unexpectedly:
$ hadoop fsck -openforwrite /
[root@cloudera ~]# su hdfs -c 'hadoop fsck -openforwrite /'
Connecting to namenode via http://cloudera:50070
FSCK started by hdfs (auth:SIMPLE) from /127.0.0.1 for path / at Fri May 23 08:04:20 PDT 2014
../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052100 11806743 bytes, 1 block(s), OPENFORWRITE: ../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052103 11648439 bytes, 1 block(s), OPENFORWRITE: ..../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052108 11953116 bytes, 1 block(s), OPENFORWRITE: /tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052109 12047982 bytes, 1 block(s), OPENFORWRITE: .../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052113 12010734 bytes, 1 block(s), OPENFORWRITE: ........../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052124 11674047 bytes, 1 block(s), OPENFORWRITE: /tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052100 11995602 bytes, 1 block(s), OPENFORWRITE: /tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052101 12257502 bytes, 1 block(s), OPENFORWRITE: ../tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052104 11964174 bytes, 1 block(s), OPENFORWRITE: .../tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052108 11777061 bytes, 1 block(s), OPENFORWRITE: /tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052109 12000840 bytes, 1 block(s), OPENFORWRITE: ......./tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052117 12041871 bytes, 1 block(s), OPENFORWRITE: .../tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052121 12129462 bytes, 1 block(s), OPENFORWRITE: ../tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052124 11856213 bytes, 1 block(s), OPENFORWRITE: ....../tmp/event_consumer_test/game=game3/month=201405/day=20140521/events_2014052106 11863488 bytes, 1 block(s), OPENFORWRITE: ....../tmp/event_consumer_test/game=game3/month=201405/day=20140521/events_2014052113 11707803 bytes, 1 block(s), 
OPENFORWRITE: ./tmp/event_consumer_test/game=game3/month=201405/day=20140521/events_2014052115 11690052 bytes, 1 block(s), OPENFORWRITE: ../tmp/event_consumer_test/game=game3/month=201405/day=20140521/events_2014052118 11898117 bytes, 1 block(s), OPENFORWRITE: ........../tmp/logs/hdfs/logs/application_1400845529689_0013/cloudera_8041 0 bytes, 0 block(s), OPENFORWRITE: ..................
......................................../user/history/done_intermediate/hdfs/job_1400845529689_0007.summary_tmp 0 bytes, 0 block(s), OPENFORWRITE: ...........................................................
....................................................................................................
................................................Status: HEALTHY
Total size: 1080902001 B
Total dirs: 68
Total files: 348
Total symlinks: 0
Total blocks (validated): 344 (avg. block size 3142156 B)
Minimally replicated blocks: 344 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Fri May 23 08:04:20 PDT 2014 in 25 milliseconds
The filesystem under path '/' is HEALTHY
The code (reduced to the relevant part) that creates and writes the files:
Path path = new Path(filename);
if(!this.fs.exists(path)) {
this.fs.create(path).close();
}
OutputStream out = this.fs.append(path);
out.write(... message ...);
IOUtils.closeStream(out);
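To address the "check and close before reopening" part of the question: the HDFS client API exposes `DistributedFileSystem.recoverLease(Path)` to force lease recovery and `isFileClosed(Path)` to check whether the file has been closed again. Since lease recovery completes asynchronously, the caller has to poll with a timeout. Below is a minimal sketch of that polling loop with the HDFS check abstracted behind a `BooleanSupplier` so the logic is self-contained; the class and method names are illustrative, not from the original post:

```java
import java.util.function.BooleanSupplier;

public class LeaseWait {
    /**
     * Polls the given condition (e.g. () -> dfs.isFileClosed(path)) every
     * pollMs milliseconds until it returns true or timeoutMs elapses.
     * Returns true if the condition became true, false on timeout.
     */
    public static boolean waitUntil(BooleanSupplier closed, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!closed.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // timed out; lease recovery did not finish
            }
            Thread.sleep(pollMs);
        }
        return true;
    }
}
```

Against HDFS this would be invoked roughly as `dfs.recoverLease(path)` followed by `waitUntil(() -> dfs.isFileClosed(path), 60000, 1000)` (wrapping the checked `IOException` as needed), where `dfs` is a `DistributedFileSystem`. Note that `recoverLease` and `isFileClosed` live on `DistributedFileSystem`, not on the generic `FileSystem` interface, so the handle may need a cast.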
The exception I get when trying to write to an open file:
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): failed to create file /tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052124 for DFSClient_NONMAPREDUCE_-1420767882_1 on client 127.0.0.1 because current leaseholder is trying to recreate file.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2458)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2340)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2569)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2532)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:522)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
at org.apache.hadoop.ipc.Client.call(Client.java:1409)
at org.apache.hadoop.ipc.Client.call(Client.java:1362)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy9.append(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy9.append(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:276)
at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1558)
at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1598)
at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1586)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:320)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:316)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:316)
at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1161)
at com.cmp.eventconsumer.io.HdfsOutputManager.get(HdfsOutputManager.java:46)
at com.cmp.eventconsumer.EventConsumer.fetchEvents(EventConsumer.java:68)
at com.cmp.eventconsumer.EventConsumer.main(EventConsumer.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I had the same problem. What I do is:
try {
    // ... append to the file ...
} catch (Exception e) {
    logger.info("Trying to recover file lease: " + hdfspath);
    // recoverLease/isFileClosed are HDFS-specific: fileSystem must be a DistributedFileSystem
    fileSystem.recoverLease(hdfspath);
    boolean isclosed = fileSystem.isFileClosed(hdfspath);
    Stopwatch sw = new Stopwatch().start();
    while (!isclosed) {
        if (sw.elapsedMillis() > 60 * 1000) {
            throw e; // give up after one minute
        }
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e1) {
            // ignore and keep polling
        }
        isclosed = fileSystem.isFileClosed(hdfspath);
    }
}
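As an aside for anyone on a newer release: Hadoop 2.7 and later also ship a CLI equivalent of the `recoverLease` call above, which lets you release a dead client's lease manually without writing any code (not available on the asker's 2.3.0-cdh5.0.0; the path below is just one of the files from the fsck output):

```shell
# Force lease recovery for a file left open by a crashed client (Hadoop 2.7+)
hdfs debug recoverLease \
  -path /tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052124 \
  -retries 3
```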