Spark: save (write) Parquet as only one file

If I write

dataFrame.write.format("parquet").mode("append").save("temp.parquet")

then the temp.parquet folder ends up with as many files as there are rows. I think I don't fully understand Parquet, but is this natural?

Solution

Yes, this is expected. temp.parquet is a directory, and Spark writes one part file per partition of the DataFrame, so the file count equals the partition count (in this case, apparently one row per partition). To get a single file, reduce the DataFrame to one partition with coalesce before the write operation:

dataFrame.coalesce(1).write.format("parquet").mode("append").save("temp.parquet")

EDIT-1

Upon a closer look, the docs do warn about coalesce: "However, if you're doing a drastic coalesce, e.g. to numPartitions = 1, this may result in your computation taking place on fewer nodes than you like (e.g. one node in the case of numPartitions = 1)."

Therefore, as suggested by @Amar, it's better to use repartition: call repartition(1) instead of coalesce(1) in the same chain. repartition adds a shuffle step, so the upstream partitions are still computed in parallel and only the final write is done by a single task.

Either way, the output is still a directory containing a single part-*.parquet data file, and because the mode is "append", each later write adds one more file to it.
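To see why the file count tracks the partition count, here is a small pure-Python toy model with no Spark involved. It assumes, as a simplification, that a write emits exactly one part file per partition. `write_partitions`, `coalesce` and `repartition` are hypothetical helpers that only mimic the Spark methods of the same names: `coalesce` merges neighbouring partitions without moving rows between them, while `repartition` deals rows out round-robin as a loose stand-in for Spark's shuffle.

```python
import os
import tempfile


def write_partitions(partitions, out_dir):
    """Toy model of a Spark write (hypothetical helper): one part file per partition."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, rows in enumerate(partitions):
        path = os.path.join(out_dir, f"part-{i:05d}.txt")
        with open(path, "w") as f:
            f.writelines(f"{row}\n" for row in rows)
        paths.append(path)
    return paths


def coalesce(partitions, n):
    """Merge contiguous partitions into at most n groups; rows are not redistributed (simplified)."""
    # Ceiling division gives the number of parent partitions per group; max() avoids a zero step.
    size = max(1, -(-len(partitions) // n))
    return [
        [row for part in partitions[i:i + size] for row in part]
        for i in range(0, len(partitions), size)
    ]


def repartition(partitions, n):
    """Redistribute every row round-robin into n partitions (a stand-in for a shuffle)."""
    out = [[] for _ in range(n)]
    rows = [row for part in partitions for row in part]
    for i, row in enumerate(rows):
        out[i % n].append(row)
    return out


if __name__ == "__main__":
    # Five rows, each in its own partition, like the DataFrame in the question.
    df = [[f"row{i}"] for i in range(5)]
    with tempfile.TemporaryDirectory() as tmp:
        many = write_partitions(df, os.path.join(tmp, "plain"))
        one = write_partitions(coalesce(df, 1), os.path.join(tmp, "coalesced"))
        print(len(many), len(one))  # 5 1
```

Both helpers give a single file when asked for one partition. The real difference in Spark is where the work runs, which this model does not capture: coalesce(1) can pull the whole upstream computation into one task, while repartition(1) keeps it parallel and only shuffles the results into one partition.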
