How to read a file in HDFS without losing column and row names

Problem description

My problem is that when I read a CSV file that contains column names (a header row), the column names disappear and are replaced by "V1", "V2", ... instead.

I have the mtcars dataset in CSV format. Here is a preview:

    model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
    Mazda RX4,21,6,160,110,3.9,2.62,16.46,0,1,4,4
    Mazda RX4 Wag,21,6,160,110,3.9,2.875,17.02,0,1,4,4
    Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1

I would like to upload it to HDFS and read it, so I went to the "HUE" platform and uploaded the file. I can view it in the file manager.

Then, in the R session, using plyrmr, I run the following code:

    filename3 <- "/user/sgerony/mtcars.csv"
    input(filename3, format = make.input.format(format = "csv", sep = ","))

and the result is this:

                     V1   V2 V3    V4  V5   V6    V7    V8 V9 V10 V11 V12
    1 Chrysler Imperial 14.7  8   440 230 3.23 5.345 17.42  0   0   3   4
    2          Fiat 128 32.4  4  78.7  66 4.08   2.2 19.47  1   1   4   1
    3       Honda Civic 30.4  4  75.7  52 4.93 1.615 18.52  1   1   4   2
    4    Toyota Corolla 33.9  4  71.1  65 4.22 1.835  19.9  1   1   4   1

As you can see, the column names have gone away. What am I doing wrong? Thanks.

Solution

This is the solution I found (I really don't like it, so if you have a better one, please share).

I split the CSV file into two CSV files, one containing only the column names (mtcars_names.csv) and the other containing the data (mtcars_no_names.csv), then uploaded both through the file manager.

    filename <- "/user/sgerony/mtcars_no_names.csv"
    filename.names <- "/user/sgerony/mtcars_names.csv"
    filename.names <- as.data.frame(
      input(filename.names, format = make.input.format(format = "csv", sep = ","))
    )

    # convert the columns to "character" type
    for (i in 1:dim(filename.names)[2]) {
      filename.names[, i] <- as.character(filename.names[, i])
    }

Now every time I write or read the file, I code:

    ### column name information is once more lost
    output(
      input(filename,
            format = make.input.format(format = "csv", sep = ",",
                                       col.names = filename.names[1, ])),
      path = "/user/sgerony/mtcars_output_csv"
    )
    input("/user/sgerony/mtcars_output_csv",
          format = make.input.format(format = "csv", sep = ",",
                                     col.names = filename.names[1, ]))

This can get quite messy if I generate data subsets: for each subset with different column names, a new file containing the column names has to be generated.
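
The workaround above relies on supplying the column names explicitly through col.names. The same idea can be illustrated locally in base R, without Hadoop or plyrmr: take the names from the header line once, then apply them to any chunk of rows and drop the header row if the chunk contains it. This is only a sketch under stated assumptions, not the author's method or a plyrmr API; the helpers `header_names` and `read_chunk` are hypothetical names introduced for illustration, and the data is the preview from the question.

```r
# Minimal base-R sketch (runs locally, no Hadoop or plyrmr required).
# Assumption: the first line of the CSV is the header and no data row
# is textually identical to it.

csv_lines <- c(
  "model,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb",
  "Mazda RX4,21,6,160,110,3.9,2.62,16.46,0,1,4,4",
  "Mazda RX4 Wag,21,6,160,110,3.9,2.875,17.02,0,1,4,4",
  "Datsun 710,22.8,4,108,93,3.85,2.32,18.61,1,1,4,1"
)

# Hypothetical helper: split the header line into a character vector of names.
header_names <- function(first_line, sep = ",") {
  strsplit(first_line, sep, fixed = TRUE)[[1]]
}

# Hypothetical helper: parse one chunk of lines using known column names.
# The header line is removed if it happens to be part of the chunk.
read_chunk <- function(lines, col_names, sep = ",") {
  header_line <- paste(col_names, collapse = sep)
  lines <- lines[lines != header_line]
  read.table(text = lines, sep = sep, col.names = col_names,
             header = FALSE, stringsAsFactors = FALSE)
}

cols <- header_names(csv_lines[1])
cars <- read_chunk(csv_lines, cols)
print(cars)
```

In the HDFS setting, `cols` would have to come from wherever the header is available (for instance the separate names file used in the answer) and could then be passed as col.names to make.input.format as in the answer's code. Whether the header row still needs to be filtered out depends on how the file is split across mappers, which this local sketch does not cover.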