How to use gsutil compose in Google Cloud Shell and skip the first line?

Question

I am trying to use the "compose" command in the shell to merge the CSV files in my GCP bucket. The problem is that the command merges the files without skipping their headers.

What I end up with is a merge of 24 CSV files that also contains 24 header rows.

I also tried to do this in Python, but found no solution there either.

Any help?

Recommended answer

gsutil has no flag for skipping CSV headers, but I have a Python script that works around the problem.

This script downloads the CSV files from the bucket, appends them while skipping the headers, and then uploads the appended file back to the bucket.

import csv
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('YOUR.BUCKET.NAME')

# Download each source CSV from the bucket into the working directory.
csvs = ["FILE1.NAME", "FILE2.NAME"]
for name in csvs:
    bucket.get_blob(name).download_to_filename(name)

# Append the rows of every file, skipping the header (first row) of each one.
# The with block closes the output file so it is fully written before upload.
with open('appended_output.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for name in csvs:
        with open(name, newline='') as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header row
            writer.writerows(reader)

# Upload the merged file back to the bucket.
blob = bucket.blob("appended_output.csv")
blob.upload_from_filename("appended_output.csv")
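
Note that the script above skips the header of every file, so the merged output has no header row at all. If a single header is wanted, the append step could be written roughly as in the sketch below. The helper name merge_csvs is hypothetical, and the sketch assumes the files are already downloaded locally and all start with the same header row.

```python
import csv


def merge_csvs(paths, out_path):
    """Concatenate CSV files, keeping only the first file's header.

    Hypothetical helper: assumes every input file starts with the
    same header row and is available on the local disk.
    """
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for index, path in enumerate(paths):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader, None)  # read (and consume) the header
                if index == 0 and header is not None:
                    writer.writerow(header)  # keep the header only once
                writer.writerows(reader)     # copy the remaining data rows
```

In the script above it would stand in for the append loop, for example merge_csvs(csvs, "appended_output.csv"), with the download and upload steps left unchanged.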


11-03 10:59