If I write

    dataFrame.write.format("parquet").mode("append").save("temp.parquet")

then the temp.parquet folder ends up with as many files as there are rows. I don't think I fully understand Parquet yet, but is this expected?

Solution

Spark writes one part file per partition, so reduce the DataFrame to a single partition with coalesce before the write operation:

    dataFrame.coalesce(1).write.format("parquet").mode("append").save("temp.parquet")

EDIT-1

On a closer look, the docs do warn about coalesce:

"However, if you're doing a drastic coalesce, e.g. to numPartitions = 1, this may result in your computation taking place on fewer nodes than you like (e.g. one node in the case of numPartitions = 1)"

Therefore, as suggested by @Amar, it's better to use repartition.
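A minimal sketch of the repartition variant, written for PySpark on the assumption that the question uses the Python API (the same method chain also exists in Scala). The helper name `write_single_file`, the `num_files` parameter, the `demo` function, and the sample data are hypothetical and only serve the illustration.

```python
def write_single_file(df, path, num_files=1):
    """Append df to path as Parquet, using num_files partitions for the write."""
    # repartition() performs a full shuffle, so the stages before it keep
    # their parallelism; coalesce(1) skips the shuffle but can pull the
    # upstream computation onto a single task.
    (df.repartition(num_files)
       .write.format("parquet")
       .mode("append")
       .save(path))


def demo():
    # Imported lazily so the helper above can be defined without starting Spark.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[4]")
             .appName("single-parquet")
             .getOrCreate())
    df = spark.range(100)  # hypothetical sample data
    # Without repartitioning, each partition is written as a separate part file.
    print(df.rdd.getNumPartitions())
    write_single_file(df, "temp.parquet")
    spark.stop()
```

Calling `demo()` requires a local PySpark installation. Because the mode is "append", every call adds new part files to the folder rather than replacing the existing ones, so repeated runs still accumulate files.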